MariaDB Server / MDEV-27461

Server hangs when the bufferpool is shrunk to 8M and innodb_page_size=64K

Details

    Description

      Observation on
      origin/preview-10.8-MDEV-14425-innodb f6b1e6fbae27842f78b28ad8ceb898e6008841d1 2022-01-07T18:59:05+02:00
      One RQG test starts the server with innodb_page_size 64K and bufferpool_size 256M,
      performs a bufferpool resizing down to 8M (message in server error log) and fails during or shortly after that with
      [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch.
       
      In my experience with server startup on official versions of MariaDB,
      InnoDB refuses to work when the page_size is 64K and the bufferpool_size
      is < 24M, for whatever reason.
      Nevertheless, InnoDB does not refuse a resizing of the bufferpool to 8M at runtime.
       
      The MTR based test reports on
      - origin/10.7  2022-01-06
         SET GLOBAL innodb_buffer_pool_size = 8388608;
         Warnings:
         Warning 1210    innodb_buffer_pool_size must be at least innodb_buffer_pool_chunk_size=25165824
      - origin/10.8 897d8c57b68858a58eea6cda4d971c3a94f6cc4a 2021-11-29
        a warning like above
      - origin/10.8 commit baace572dc1d2ab35145556e9db231dca8c18147 (HEAD)
         MDEV-25342: autosize innodb_buffer_pool_chunk_size
         mysqltest: At line 11: query 'SET GLOBAL innodb_buffer_pool_size = @innodb_buffer_pool_size_save' failed: ER_WRONG_ARGUMENTS (1210): Another buffer pool resize is already in progress.
         which is caused by the ongoing asynchronous buffer pool resizing to 8M issued earlier.
         The server error log shows
         2022-01-10 17:59:45 4 [Note] InnoDB: Requested to resize buffer pool. (new size: 8388608 bytes)
         2022-01-10 17:59:45 0 [Note] InnoDB: Resizing buffer pool from 25165824 to 8388608 (unit=1048576).
      
      

      Attachments

        1. BPR.tgz
          10 kB
        2. MDEV-27461.txt
          472 kB


          Activity

            mleich Matthias Leich created issue -
            danblack Daniel Black added a comment - edited

            Thanks mleich!

            Technically this is reproducible on 10.2 with the explicit minimum (1M) innodb-buffer-pool-chunk-size; MDEV-25342 just pushed the default value for this lower.

            10.2.42-MariaDB-80d33261ea1a731f228b712b0e763740f87b89be

            $ podman run  -e MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 quay.io/mariadb-foundation/mariadb-devel:10.2 --innodb-buffer-pool-size=256M --innodb-buffer-pool-chunk-size=1M --innodb-page-size=64k
            ...
            Version: '10.2.42-MariaDB-1:10.2.42+maria~bionic' as '10.2.42-MariaDB-80d33261ea1a731f228b712b0e763740f87b89be'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: Resizing buffer pool from 268435456 to 8388608 (unit=1048576).
            2022-01-11  5:25:55 140319263479552 [Note] InnoDB: 
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: Disabling adaptive hash index.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: disabled adaptive hash index.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: Withdrawing blocks to be shrunken.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : start to withdraw the last 3968 blocks.
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 3799 blocks from free list. Tried to relocate 41 pages (3840/3968).
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 0 blocks from free list. Tried to relocate 0 pages (3840/3968).
            2022-01-11  5:25:55 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
             
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 0 blocks from free list. Tried to relocate 0 pages (3840/3968).
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (3840/3968)
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : withdrew 0 blocks from free list. Tried to relocate 0 pages (3840/3968).
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: buffer pool 0 : will retry to withdraw later.
            2022-01-11  5:28:00 140318846473984 [Note] InnoDB: Will retry to withdraw 10 seconds later.
            

            I don't quite understand the assertion in the timeout thread, but if the endless loop above is prevented, the problem is resolved.

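            The block counts in the log follow directly from the configuration; a quick standalone check (plain C++ with made-up names, not MariaDB code):

            #include <cstdio>

            int main() {
              const unsigned long page_size = 64UL << 10;   // --innodb-page-size=64k
              const unsigned long old_size  = 256UL << 20;  // --innodb-buffer-pool-size=256M
              const unsigned long new_size  = 8UL << 20;    // the resize target of 8 MiB
              // Shrinking keeps new_size/page_size blocks and withdraws the rest.
              std::printf("blocks: %lu -> %lu, withdraw %lu\n",
                          old_size / page_size, new_size / page_size,
                          (old_size - new_size) / page_size);
              // Prints "blocks: 4096 -> 128, withdraw 3968", matching the
              // "start to withdraw the last 3968 blocks" message above.
              return 0;
            }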

            marko Marko Mäkelä added a comment -

            I analyzed the hang yesterday. It was caused by buf_pool.mutex being continuously held by buf_LRU_free_page(), invoked for buf_page_init_for_read(), while buf_pool_t::resize() in another thread was desperately trying to complete. We had innodb_buffer_pool_size=8M and innodb_page_size=64k during that hang.

            What the page read was issued for should not matter. I think that it was purge or rollback.
            danblack Daniel Black added a comment -

            From marko: My preferred fix would be to avoid the hang when trying to allocate a page while resize is running. I did not investigate how easy that would be to do.


            marko Marko Mäkelä added a comment -

            I think that the test failure demonstrates two problems. The main one is that the minimum buffer pool size is not being enforced consistently. I hope that you can fix that, danblack.

            To me, the more interesting one is that the server is hanging. I will try to determine the root cause of that, and see what it would take to fix that.

            marko Marko Mäkelä added a comment -

            I had a look at the code. The function buf_LRU_get_free_block() already works in a rather reasonable fashion. It releases the buf_pool.mutex between iterations. The infamous message InnoDB: Difficult to find free blocks in the buffer pool is suppressed while buffer pool resizing is active.

            Because buf_LRU_get_free_block() does not wait for buffer pool resizing to finish, and because buf_pool_t::resize() only polls the status every 10 seconds, the infinite loop may be unavoidable. In MDEV-27461.txt we have the following:

            • Thread 13 is attempting to allocate a buffer page for reading a page when opening a table, causing the infinite wait in buf_LRU_get_free_block().
            • Thread 11 (finishing ROLLBACK) is attempting to allocate a buffer page for reading a page.

            I do not think that either thread should be holding many buffer pool page latches. I will have to debug the trace again to see if anything else could be improved, apart from the trivial fix of enforcing a reasonable minimum buffer pool size (measured in pages of innodb_page_size, not bytes).
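            A minimal standalone model of that interaction (plain C++ with made-up names, not the InnoDB code): the allocating thread keeps re-taking the pool mutex and re-checking for a free block, while the resizing thread only retries the withdrawal on a 10-second cadence, so when no block ever becomes free, neither side makes progress.

            #include <chrono>
            #include <cstdio>
            #include <mutex>
            #include <thread>

            std::mutex pool_mutex;  // stands in for buf_pool.mutex
            int free_blocks = 0;    // the pool is exhausted; nothing here frees a block

            // Models buf_LRU_get_free_block(): poll for a free block, releasing
            // the mutex between iterations (bounded here so the sketch terminates;
            // the real loop keeps going).
            void allocator() {
              for (int i = 0; i < 1000; i++) {
                std::lock_guard<std::mutex> g(pool_mutex);
                if (free_blocks > 0) return;  // would take a block and proceed
              }
              std::puts("allocator: still no free block");
            }

            // Models buf_pool_t::resize(): attempt a withdrawal, then only
            // re-check progress after a long sleep.
            void resizer() {
              { std::lock_guard<std::mutex> g(pool_mutex); /* withdraw attempt */ }
              std::puts("resizer: will retry to withdraw 10 seconds later");
              std::this_thread::sleep_for(std::chrono::seconds(10));
            }

            int main() {
              std::thread a(allocator), r(resizer);
              a.join();
              r.join();
              return 0;
            }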

            marko Marko Mäkelä added a comment -

            I analyzed a new trace of this. In it, we have 229 pages in both buf_pool.LRU and buf_pool.flush_list, and at least 2 of those pages are really waiting to be written out to a data file. (After MDEV-25113, buf_pool.flush_list may contain clean pages.)

            The page cleaner is in an untimed wait that was introduced in MDEV-24278. The buffer pool resizer should probably wake it up. Both buf_flush_async_lsn and buf_flush_sync_lsn are 0. During the execution of the entire trace, the page cleaner was woken up only once:

            Thread 2 hit Breakpoint 3, buf_flush_page_cleaner () at /data/Server/bb-10.8-MDEV-14425D/storage/innobase/buf/buf0flu.cc:2235
            2235	    set_timespec(abstime, 1);

            I believe that a potential hang is possible in 10.5 or later even with larger buffer pool sizes, due to MDEV-24278.
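            A minimal sketch of that wait shape using standard C++ primitives (this models the described behavior, it is not the InnoDB code): an untimed wait resumes only when explicitly signalled, so the resizer has to wake the page cleaner itself.

            #include <condition_variable>
            #include <mutex>

            std::mutex m;
            std::condition_variable cv;  // stands in for the page cleaner's event
            bool work_pending = false;

            // An untimed wait like the one MDEV-24278 introduced: without an
            // explicit notify, the waiter sleeps indefinitely.
            void page_cleaner_idle() {
              std::unique_lock<std::mutex> lk(m);
              cv.wait(lk, [] { return work_pending; });
              // ... write out dirty pages, freeing space in the buffer pool ...
            }

            // What the buffer pool resizer should do: flag the work and signal,
            // so that pages get flushed and blocks can be withdrawn.
            void resizer_wake_cleaner() {
              { std::lock_guard<std::mutex> g(m); work_pending = true; }
              cv.notify_one();
            }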
            marko Marko Mäkelä added a comment - edited

            For 10.5 and later, I pushed a fix to ensure that the page cleaner will start writing out pages so that some space will become available in the buffer pool. According to mleich, it helps to reduce the problem, but it will not prevent hangs when larger values of innodb_max_dirty_pages_pct are in use.

            With that fix (fixing a minor regression caused by MDEV-24278), the problem is no longer repeatable with the default innodb_max_dirty_pages_pct=90. Therefore, I do not think that this bug should block MDEV-25342 anymore.

            Root cause:
            I was wondering why buf_flush_LRU_list_batch() fails to write out any dirty pages. The reason is that the condition UT_LIST_GET_LEN(buf_pool.LRU) > BUF_LRU_MIN_LEN does not hold, because there are fewer than 256 pages in the buf_pool.LRU list. This would seem to be the hard minimum size of the buffer pool in pages. 256*16384=4MiB, and 256*65536=16MiB. The buffer pool is smaller than the latter.

            I do not know why the minimum sizes enforced at startup are steeper than the hard limit (5MiB for up to innodb_page_size=16384, and 24MiB for larger page sizes), but it would indeed be consistent to enforce the same limit at startup and in SET GLOBAL. danblack, can you please fix the SET GLOBAL?

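            Spelling out the arithmetic above (standalone C++; the BUF_LRU_MIN_LEN value of 256 is taken from the comment above, not looked up in the source):

            #include <cstdio>

            int main() {
              const unsigned long BUF_LRU_MIN_LEN = 256;  // pages, per the analysis above
              const unsigned long page_sizes[] = {16384, 65536};
              for (unsigned long page_size : page_sizes)
                std::printf("page_size=%lu -> hard minimum %lu MiB\n",
                            page_size, BUF_LRU_MIN_LEN * page_size >> 20);
              // 256*16384 = 4 MiB; 256*65536 = 16 MiB, i.e. more than the
              // 8 MiB that the buffer pool was shrunk to.
              return 0;
            }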

            mleich Matthias Leich added a comment -

            I applied the patch provided by Marko to
                origin/bb-10.8-MDEV-14425 7f75466f539b61d3dc8696e72a2d715c59aa04d6 2022-01-14T19:52:40+02:00
                The RQG replay test starts the server with innodb_buffer_pool_size=256M
                and switches that during runtime between 256M and 8M.
            and the results were
            - innodb_max_dirty_pages_pct not assigned at server startup --> defaults to 90
              More than 400 RQG tests executed and none of them showed the problem described on top.
              This is dramatic progress compared to bb-10.8-MDEV-14425 without the patch.
            - innodb_max_dirty_pages_pct assigned at server startup between 99% and 99.975%
              Some replays of the problem. This is to be expected because we have a critical
              combination of a high innodb_max_dirty_pages_pct and a very small innodb_buffer_pool_size.
              As expected, raising innodb_max_dirty_pages_pct leads to an increasing fraction of replays of the problem.

            marko Marko Mäkelä added a comment -

            Consistent enforcement of the minimum value in SET GLOBAL innodb_buffer_pool_size will be covered by MDEV-27467.
            The scope of this ticket was reduced to only fixing the recent regression that MDEV-24278 caused in MariaDB Server 10.5.9: the page cleaner would fail to wake up while the buffer pool is being shrunk.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.14, 10.6.6, 10.7.2, 10.8.1
            Assignee Daniel Black [ danblack ] → Marko Mäkelä [ marko ]
            Resolution Fixed
            Status Open → Closed

            People

              Assignee: marko Marko Mäkelä
              Reporter: mleich Matthias Leich
