MariaDB Server / MDEV-27461

Server hangs when the bufferpool is shrunk to 8M and innodb_page_size=64K

Details

    Description

      Observation on
      origin/preview-10.8-MDEV-14425-innodb f6b1e6fbae27842f78b28ad8ceb898e6008841d1 2022-01-07T18:59:05+02:00
      One RQG test starts the server with innodb_page_size 64K and bufferpool_size 256M,
      performs a bufferpool resizing down to 8M (message in server error log) and fails during or shortly after that with
      [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch.
       
      In my experience with server startup on official versions of MariaDB,
      InnoDB refuses to work when the page_size is 64K and the bufferpool_size
      is < 24M, for whatever reason.
      Nevertheless, InnoDB does not refuse a resizing of the bufferpool to 8M at runtime.
       
      The MTR based test reports on
      - origin/10.7  2022-01-06
         SET GLOBAL innodb_buffer_pool_size = 8388608;
         Warnings:
         Warning 1210    innodb_buffer_pool_size must be at least innodb_buffer_pool_chunk_size=25165824
      - origin/10.8 897d8c57b68858a58eea6cda4d971c3a94f6cc4a 2021-11-29
        a warning like above
      - origin/10.8 commit baace572dc1d2ab35145556e9db231dca8c18147 (HEAD)
         MDEV-25342: autosize innodb_buffer_pool_chunk_size
         mysqltest: At line 11: query 'SET GLOBAL innodb_buffer_pool_size = @innodb_buffer_pool_size_save' failed: ER_WRONG_ARGUMENTS (1210): Another buffer pool resize is already in progress.
         which is caused by the ongoing asynchronous buffer pool resizing to 8M issued earlier.
         The server error log shows
         2022-01-10 17:59:45 4 [Note] InnoDB: Requested to resize buffer pool. (new size: 8388608 bytes)
         2022-01-10 17:59:45 0 [Note] InnoDB: Resizing buffer pool from 25165824 to 8388608 (unit=1048576).
      
      

      Attachments

        1. BPR.tgz
          10 kB
        2. MDEV-27461.txt
          472 kB


          Activity

            mleich Matthias Leich created issue -
            danblack Daniel Black added a comment - edited

            Thanks mleich!

            Technically this is reproducible on 10.2 with the explicit minimum (1M) innodb-buffer-pool-chunk-size; MDEV-25342 just pushed the default value for this lower.

            10.2.42-MariaDB-80d33261ea1a731f228b712b0e763740f87b89be

            $ podman run  -e MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 quay.io/mariadb-foundation/mariadb-devel:10.2 --innodb-buffer-pool-size=256M --innodb-buffer-pool-chunk-size=1M --innodb-page-size=64k
            ...
            Version: '10.2.42-MariaDB-1:10.2.42+maria~bionic' as '10.2.42-MariaDB-80d33261ea1a731f228b712b0e763740f87b89be'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: Resizing buffer pool from 268435456 to 8388608 (unit=1048576).
            2022-01-11  5:25:55 140319263479552 [Note] InnoDB: 
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: Disabling adaptive hash index.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: disabled adaptive hash index.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: Withdrawing blocks to be shrunken.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : start to withdraw the last 3968 blocks.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 3799 blocks from free list. Tried to relocate 41 pages (3840/3968).
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 0 blocks from free list. Tried to relocate 0 pages (3840/3968).
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
             
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 0 blocks from free list. Tried to relocate 0 pages (3840/3968).
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 0 blocks from free list. Tried to relocate 0 pages (3840/3968).
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : will retry to withdraw later.
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: Will retry to withdraw 10 seconds later.
            

            I don't quite understand the assertion in the timeout thread, but if the endless loop above is prevented, the problem is resolved.

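            The block counts in the log follow directly from the configuration; a quick standalone check (plain C++ with made-up names, not MariaDB code):

            #include <cstdio>

            int main() {
              const unsigned long page_size = 64UL << 10;   // --innodb-page-size=64k
              const unsigned long old_size  = 256UL << 20;  // --innodb-buffer-pool-size=256M
              const unsigned long new_size  = 8UL << 20;    // the resize target of 8 MiB
              // Shrinking keeps new_size/page_size blocks and withdraws the rest.
              std::printf("blocks: %lu -> %lu, withdraw %lu\n",
                          old_size / page_size, new_size / page_size,
                          (old_size - new_size) / page_size);
              // Prints "blocks: 4096 -> 128, withdraw 3968", matching the
              // "start to withdraw the last 3968 blocks" message above.
              return 0;
            }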

            marko Marko Mäkelä added a comment -

            I analyzed the hang yesterday. It was caused by buf_pool.mutex being continuously held by buf_LRU_free_page(), invoked for buf_page_init_for_read(), while buf_pool_t::resize() in another thread was desperately trying to complete. We had innodb_buffer_pool_size=8M and innodb_page_size=64k during that hang.

            What the page read was issued for should not matter. I think that it was purge or rollback.
            danblack Daniel Black added a comment -

            From marko: My preferred fix would be to avoid the hang when trying to allocate a page while resize is running. I did not investigate how easy that would be to do.


            marko Marko Mäkelä added a comment -

            I think that the test failure demonstrates two problems. The main one is that the minimum buffer pool size is not being enforced consistently. I hope that you can fix that, danblack.

            To me, the more interesting one is that the server is hanging. I will try to determine the root cause of that, and see what it would take to fix that.

            marko Marko Mäkelä added a comment -

            I had a look at the code. The function buf_LRU_get_free_block() already works in a rather reasonable fashion. It releases the buf_pool.mutex between iterations. The infamous message InnoDB: Difficult to find free blocks in the buffer pool is suppressed while buffer pool resizing is active.

            Because buf_LRU_get_free_block() does not wait for buffer pool resizing to finish, and because buf_pool_t::resize() only polls the status every 10 seconds, the infinite loop may be unavoidable. In MDEV-27461.txt we have the following:

            • Thread 13 is attempting to allocate a buffer page for reading a page when opening a table, causing the infinite wait in buf_LRU_get_free_block().
            • Thread 11 (finishing ROLLBACK) is attempting to allocate a buffer page for reading a page.

            I do not think that either thread should be holding many buffer pool page latches. I will have to debug the trace again to see if anything else could be improved, apart from the trivial fix of enforcing a reasonable minimum buffer pool size (measured in pages of innodb_page_size, not bytes).
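            A minimal standalone model of that interaction (plain C++ with made-up names, not the InnoDB code): the allocating thread keeps re-taking the pool mutex and re-checking for a free block, while the resizing thread only retries the withdrawal on a 10-second cadence, so when no block ever becomes free, neither side makes progress.

            #include <chrono>
            #include <cstdio>
            #include <mutex>
            #include <thread>

            std::mutex pool_mutex;  // stands in for buf_pool.mutex
            int free_blocks = 0;    // the pool is exhausted; nothing here frees a block

            // Models buf_LRU_get_free_block(): poll for a free block, releasing
            // the mutex between iterations (bounded here so the sketch terminates;
            // the real loop keeps going).
            void allocator() {
              for (int i = 0; i < 1000; i++) {
                std::lock_guard<std::mutex> g(pool_mutex);
                if (free_blocks > 0) return;  // would take a block and proceed
              }
              std::puts("allocator: still no free block");
            }

            // Models buf_pool_t::resize(): attempt a withdrawal, then only
            // re-check progress after a long sleep.
            void resizer() {
              { std::lock_guard<std::mutex> g(pool_mutex); /* withdraw attempt */ }
              std::puts("resizer: will retry to withdraw 10 seconds later");
              std::this_thread::sleep_for(std::chrono::seconds(10));
            }

            int main() {
              std::thread a(allocator), r(resizer);
              a.join();
              r.join();
              return 0;
            }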

            marko Marko Mäkelä added a comment -

            I analyzed a new trace of this. In it, we have 229 pages in both buf_pool.LRU and buf_pool.flush_list, and at least 2 of those pages are really waiting to be written out to a data file. (After MDEV-25113, buf_pool.flush_list may contain clean pages.)

            The page cleaner is in an untimed wait that was introduced in MDEV-24278. The buffer pool resizer should probably wake it up. Both buf_flush_async_lsn and buf_flush_sync_lsn are 0. During the execution of the entire trace, the page cleaner was woken up only once:

            Thread 2 hit Breakpoint 3, buf_flush_page_cleaner () at /data/Server/bb-10.8-MDEV-14425D/storage/innobase/buf/buf0flu.cc:2235
            2235	    set_timespec(abstime, 1);

            I believe that a potential hang is possible in 10.5 or later even with larger buffer pool sizes, due to MDEV-24278.
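            A minimal sketch of that wait shape using standard C++ primitives (this models the described behavior, it is not the InnoDB code): an untimed wait resumes only when explicitly signalled, so the resizer has to wake the page cleaner itself.

            #include <condition_variable>
            #include <mutex>

            std::mutex m;
            std::condition_variable cv;  // stands in for the page cleaner's event
            bool work_pending = false;

            // An untimed wait like the one MDEV-24278 introduced: without an
            // explicit notify, the waiter sleeps indefinitely.
            void page_cleaner_idle() {
              std::unique_lock<std::mutex> lk(m);
              cv.wait(lk, [] { return work_pending; });
              // ... write out dirty pages, freeing space in the buffer pool ...
            }

            // What the buffer pool resizer should do: flag the work and signal,
            // so that pages get flushed and blocks can be withdrawn.
            void resizer_wake_cleaner() {
              { std::lock_guard<std::mutex> g(m); work_pending = true; }
              cv.notify_one();
            }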
            marko Marko Mäkelä added a comment - edited

            For 10.5 and later, I pushed a fix to ensure that the page cleaner will start writing out pages so that some space will become available in the buffer pool. According to mleich, it helps to reduce the problem, but it will not prevent hangs when larger values of innodb_max_dirty_pages_pct are in use.

            With that fix (fixing a minor regression caused by MDEV-24278), the problem is no longer repeatable with the default innodb_max_dirty_pages_pct=90. Therefore, I do not think that this bug should block MDEV-25342 anymore.

            Root cause:
            I was wondering why buf_flush_LRU_list_batch() fails to write out any dirty pages. The reason is that the condition UT_LIST_GET_LEN(buf_pool.LRU) > BUF_LRU_MIN_LEN does not hold, because there are fewer than 256 pages in the buf_pool.LRU list. This would seem to be the hard minimum size of the buffer pool in pages. 256*16384=4MiB, and 256*65536=16MiB. The buffer pool is smaller than the latter.

            I do not know why the minimum sizes enforced at startup are steeper than the hard limit (5MiB for up to innodb_page_size=16384, and 24MiB for larger page sizes), but it would indeed be consistent to enforce the same limit at startup and in SET GLOBAL. danblack, can you please fix the SET GLOBAL?

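            Spelling out the arithmetic above (standalone C++; the BUF_LRU_MIN_LEN value of 256 is taken from the comment above, not looked up in the source):

            #include <cstdio>

            int main() {
              const unsigned long BUF_LRU_MIN_LEN = 256;  // pages, per the analysis above
              const unsigned long page_sizes[] = {16384, 65536};
              for (unsigned long page_size : page_sizes)
                std::printf("page_size=%lu -> hard minimum %lu MiB\n",
                            page_size, BUF_LRU_MIN_LEN * page_size >> 20);
              // 256*16384 = 4 MiB; 256*65536 = 16 MiB, i.e. more than the
              // 8 MiB that the buffer pool was shrunk to.
              return 0;
            }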

            mleich Matthias Leich added a comment -

            I applied the patch provided by Marko to
                origin/bb-10.8-MDEV-14425 7f75466f539b61d3dc8696e72a2d715c59aa04d6 2022-01-14T19:52:40+02:00
                The RQG replay test starts the server with innodb_buffer_pool_size=256M
                and switches that during runtime between 256M and 8M.
            and the results were
            - innodb_max_dirty_pages_pct not assigned at server startup --> defaults to 90
              More than 400 RQG tests executed and none of them showed the problem described on top.
              This is dramatic progress compared to bb-10.8-MDEV-14425 without the patch.
            - innodb_max_dirty_pages_pct assigned at server startup between 99% and 99.975%
              Some replays of the problem. This is to be expected because we have a critical
              combination of a high innodb_max_dirty_pages_pct and a very small innodb_buffer_pool_size.
              As expected, raising innodb_max_dirty_pages_pct leads to an increasing fraction of replays of the problem.

            marko Marko Mäkelä added a comment -

            Consistent enforcement of the minimum value in SET GLOBAL innodb_buffer_pool_size will be covered by MDEV-27467.
            The scope of this ticket was reduced to only fixing the recent regression that MDEV-24278 caused in MariaDB Server 10.5.9: the page cleaner would fail to wake up while the buffer pool is being shrunk.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.14, 10.6.6, 10.7.2, 10.8.1
            Assignee Daniel Black [ danblack ] → Marko Mäkelä [ marko ]
            Resolution Fixed
            Status Open → Closed

            People

              Assignee: marko Marko Mäkelä
              Reporter: mleich Matthias Leich
