MariaDB Server · MDEV-36226

Stall and crash when page cleaner fails to generate free pages during Async flush

Details

    Description

      The issue is not readily repeatable, but here is a snapshot of the relevant thread stacks from the user's case when it crashed.

      1. srv_monitor_task raises a fatal error because the dict_sys latch wait is found to exceed 600 seconds (10 minutes):

      #5  0x000056327cfbad2f in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) 
      #6  0x000056327cfb222f in srv_monitor_task.cold () 
      #7  0x000056327d865268 in tpool::thread_pool_generic::timer_generic::run (this=0x56327f13be50) 
      

      2. DML threads are waiting on the dict_sys latch while trying to open a table:

      #5  dict_sys_t::freeze (line=1039, file="storage/innobase/dict/dict0dict.cc" at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0dict.cc:1006
      #6  0x000056327d7e9f88 in dict_table_open_on_name (table_name="zabbix/applications", DICT_ERR_IGNORE_FK_NOKEY) at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0dict.cc:1039
      #7  0x000056327d699f65 in ha_innobase::open(char const*, int, unsigned int) () at /usr/src/debug/MariaDB-/src_0/storage/innobase/handler/ha_innodb.cc:6187
      #9  0x000056327d20fa82 in open_table_from_share
      

      3. Threads are stuck waiting for a free page in the buffer pool:

      #1 0x000056327d7dcd9f in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at storage/innobase/buf/buf0lru.cc:467
      

      3A. One of the above threads is fetching dict stats information while holding the dictionary latch in exclusive (X) mode, hence blocking all dictionary operations:

      #1 0x000056327d7dcd9f in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at storage/innobase/buf/buf0lru.cc:467
      #2 0x000056327d02425f in buf_page_init_for_read (mode=132, page_id={m_id = 37005438222352}, at storage/innobase/buf/buf0rea.cc:121
      #3 0x000056327d7dd6f7 in buf_read_page_low (unzip=false, zip_size=0, page_id={m_id = 37005438222352}, mode=132, sync=true, space=0x56327efbbdb8) at storage/innobase/buf/buf0rea.cc:295
      #10 0x000056327d76188b in row_sel_step (thr=thr@entry=0x7f9418606dd0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0sel.cc:2433
      #14 0x000056327d71dd1b in que_eval_sql (info=info@entry=0x7f941805d890,
      #15 0x000056327d8060a2 in dict_stats_fetch_from_ps (table=0x7f9418600e40) at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0stats.cc:3910
      

      4. The page cleaner thread, which should generate free pages, is busy flushing pages from the flush list:
      #0  0x00007fa721b3248c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x000056327d7d14bc in buf_dblwr_t::flush_buffered_writes (this=0x56327eaa3520 <buf_dblwr>, size=64) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0dblwr.cc:588
      #2  0x000056327d7d1b5b in buf_dblwr_t::add_to_batch (this=<buf_dblwr>, request= {bpage = 0x7f9e6800faa0, slot=0x0, type = IORequest::WRITE_ASYNC}, size=16384) at storage/innobase/buf/buf0dblwr.cc:647
      #3  0x000056327d7d7655 in buf_page_t::flush (this=0x7f9e6800faa0, space=0x7f92e0072e78) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/os0file.h:209
      #4  0x000056327d7d9397 in buf_do_flush_list_batch (lsn=<optimized out>, max_n=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1524
      #5  buf_flush_list_holding_mutex (lsn=<optimized out>, max_n=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1598
      #6  buf_flush_list_holding_mutex (max_n=<optimized out>, lsn=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1579
      #7  0x000056327d7d9dc9 in buf_flush_page_cleaner () at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:2503
      

      The crash happened because the page cleaner could not free up any page for 10 minutes. During a regular iteration the page cleaner flushes from the flush list up to a maximum of innodb_io_capacity_max pages and then moves on to generating free pages from the LRU tail.

      In this specific case innodb_io_capacity_max = 4000, so flushing about 4000 × 16k ≈ 64M of data pages (roughly 128M of write I/O with the doublewrite buffer) should not take that long. From detailed code analysis I found one catch: in a specific case we may skip freeing any page from the LRU tail and go straight to the next iteration of flush-list flushing. The process could repeat for a long time if the buffer pool size (in GBs) and the total number of dirty pages are large.

      I could repeat, in a debugger, the case where the LRU flush/eviction step fails to free any page, using the following scenario.

      1. Dirty pages that were modified but have not been used for a long time (set A) fill up the tail of the LRU list.

      2. At the same time, other dirty pages (set B) were modified earlier than the LRU-tail pages, but are used and modified repeatedly. These pages are at the head of the flush list and would be flushed before set A when flushing from the flush list to reduce the checkpoint age.

      3. When asynchronous flushing is triggered, i.e. when 7/8 of the LSN margin in the redo log is filled, we try to flush innodb_io_capacity_max pages from the flush list in the iteration. After that, we attempt flush and eviction from the LRU (buf_flush_LRU) with max_n = 0, i.e. the number of pages allowed to be flushed during the LRU scan is zero, because we have already flushed innodb_io_capacity_max pages from the flush list in this iteration.

      4. All pages at the LRU tail are dirty (set A), so we return without being able to evict a single page.

      This scenario can occur with a large buffer pool. It should be fixed by ensuring at least some flush quota for the LRU flush and eviction stage in each iteration.
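
      A minimal, illustrative sketch of that idea, assuming hypothetical helper names (flush_from_flush_list, flush_and_evict_from_lru_tail) and an assumed MIN_LRU_QUOTA value; the real page cleaner loop in buf0flu.cc is considerably more involved and the actual patch may differ:

      #include <algorithm>
      #include <cstddef>
      #include <cstdio>

      // Hypothetical stand-ins for the real InnoDB routines; they correspond
      // only roughly to buf_flush_list() and buf_flush_LRU() in buf0flu.cc.
      static size_t flush_from_flush_list(size_t quota) { return quota; }
      static size_t flush_and_evict_from_lru_tail(size_t quota) { return quota ? 1 : 0; }

      // Simplified page cleaner iteration illustrating the problem and the
      // idea behind the fix.
      static void page_cleaner_iteration(size_t io_capacity_max)
      {
        // Stage 1: flush-list flushing to reduce the checkpoint age. During an
        // asynchronous flush this can consume the entire I/O budget.
        size_t flushed = flush_from_flush_list(io_capacity_max);

        // Leftover budget for the LRU stage; zero in the scenario described
        // above, so the LRU stage can neither flush nor (with an all-dirty LRU
        // tail) evict anything, and no free pages are generated.
        size_t lru_quota = io_capacity_max > flushed ? io_capacity_max - flushed : 0;

        // Idea behind the fix: always reserve a minimum quota for the LRU
        // flush/eviction stage so each iteration can still generate free pages.
        const size_t MIN_LRU_QUOTA = 128;  // illustrative value only
        lru_quota = std::max(lru_quota, MIN_LRU_QUOTA);

        size_t freed = flush_and_evict_from_lru_tail(lru_quota);
        std::printf("flushed=%zu freed=%zu\n", flushed, freed);
      }

      int main() { page_cleaner_iteration(4000); }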


          Activity

            Marko Mäkelä added a comment:

            One thing that is somewhat unclear is whether this hang could have been fixed by MDEV-35409 or MDEV-34265. The 10.6.18 release is also affected by MDEV-34166, but I assume that it was already ruled out.

            Debarun Banerjee added a comment:

            marko, I checked this scenario on the latest 10.6. So I think it is not yet fixed by any other MDEV.

            commit 15139c88a8f14c535f6888892f3f6dc3765bc765 (HEAD -> 10.6, origin/10.6)
            Merge: e95a8f84de7 3a4c0295ae0
            Author: Julius Goryavsky <julius.goryavsky@mariadb.com>
            Date:   Wed Mar 5 01:54:40 2025 +0100
             
                Merge branch '10.5' into '10.6'
            
            


            Marko Mäkelä added a comment:

            There are several open bugs about the fact that InnoDB is holding dict_sys.latch while waiting for something in the buffer pool. MDEV-33594 was encountered by the same customer earlier, and some more bugs are linked in MDEV-34988.

            Debarun Banerjee added a comment:

            Created a patch fixing the issue, along with some diagnostic logging. Pull Request

            Marko Mäkelä added a comment:

            I am confident that debarun’s fix is going to improve things in this area, but it would be even better if it could be validated. We have not been able to reproduce the original hang so far. I assume that it could require a buffer pool size that is small compared to the working set, maybe also a virtual machine with few vCPUs and lots of concurrent connections.

            Debarun Banerjee added a comment:

            Here are some suggestions for attempting to reproduce the issue (a combined configuration sketch follows the list).

            1. A buffer pool large enough that it takes 15+ minutes to flush 80% of it, e.g.:

            innodb_io_capacity_max = 4000, innodb_page_size = 16k, innodb_buffer_pool_size = 72G
            

            2. Avoid page flushing based on the dirty page percentage:

            innodb_max_dirty_pages_pct= 99.9
            

            3. Apply some write load and set the redo log size (innodb_log_file_size) such that the buffer pool dirty page percentage is very high (at least > 80%) when the checkpoint age is high and close to the total redo log margin.

            4. Some concurrent read load might help push dirty pages to the LRU tail before they are actually flushed.
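
            Putting these together, a combined my.cnf sketch (the values are illustrative assumptions rather than validated settings; in particular the 512M redo log size and the innodb_log_file_size variable are assumptions on my part):

            [mysqld]
            innodb_buffer_pool_size    = 72G
            innodb_page_size           = 16k
            innodb_io_capacity_max     = 4000
            innodb_max_dirty_pages_pct = 99.9
            # A relatively small redo log so that the checkpoint age approaches the
            # asynchronous-flush margin while most of the buffer pool is still dirty.
            innodb_log_file_size       = 512M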

            Steve Shaw added a comment:

            I have managed to get 10.6.18 to hang with a HammerDB workload; it hasn't crashed, but it is stuck at 0 TPM. Just before it dropped to 0 it gave the message "2025-03-25 15:38:36 0 [Warning] InnoDB: Could not free any blocks in the buffer pool! 4737408 blocks are in use and 0 free. Consider increasing innodb_buffer_pool_size". This was after 1 hour of running the workload.
            To do this I increased the buffer pool size to 72GB and decreased the redo log size to 512MB; I also doubled the number of threads to 200, used the "use all warehouses" option, and used the parameters above. Any query against any of the tables then hangs, e.g. "select * from warehouse".
            Top functions in this state are:
            25.16% mariadbd [.] buf_flush_list_holding_mutex(unsigned long, unsigned long)
            3.50% mariadbd [.] buf_flush_page_cleaner() [clone .cold]
            3.09% mariadbd [.] fil_flush_file_spaces() [clone .part.0]
            1.57% mariadbd [.] buf_flush_list(unsigned long, unsigned long)
            The database would not shut down and needed the process to be killed, but recovered on startup.
            Will attempt to repeat to ensure it is reproducible and then try with the pull request.

            Steve Shaw added a comment:

            I can confirm that the behaviour is reproducible. After running the workload for up to an hour it prints the message "Could not free any blocks in the buffer pool! 4737408 blocks are in use and 0 free. Consider increasing innodb_buffer_pool_size", the transaction rate drops to 0, and the database hangs and will not shut down, with the same functions active.


            Debarun Banerjee added a comment:

            Hi Steve Shaw,

            That is great news! Can you please capture the following information so that I can analyze it and check whether this is the same issue?

            1. Server error log
            2. SHOW ENGINE INNODB STATUS;
            3. Stack traces of all threads, captured by attaching gdb (a minimal command sketch follows)
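
            For item 3, a minimal way to capture backtraces of all threads from the running server, assuming gdb is available (note that attaching briefly pauses the server):

            # Attach to the running mariadbd and dump a backtrace of every thread.
            gdb -p "$(pidof mariadbd)" --batch \
                -ex "set pagination off" \
                -ex "thread apply all backtrace" > mariadbd_all_threads.txt 2>&1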


            Michael Widenius added a comment:

            Steve Shaw, can you please run 10.6.21 with and without the fix and verify that the fix solves this issue?

            Debarun Banerjee added a comment:

            Here are some possible workarounds that would make this bug less likely to happen on 10.6.18+ versions. Note that this tuning could adversely affect performance.

            1. Increase the redo log size. The actual value needs to be derived by observing the checkpoint age, calculated as
            [Innodb_lsn_current - Innodb_lsn_last_checkpoint] in bytes. It is advised to keep the checkpoint age below 70% of the total redo log size (a query sketch for computing this is shown after this list).

            MariaDB [(none)]> show status like "%lsn%";
            +----------------------------+---------+
            | Variable_name              | Value   |
            +----------------------------+---------+
            | Innodb_lsn_current         | 3913386 |
            | Innodb_lsn_flushed         | 3913386 |
            | Innodb_lsn_last_checkpoint | 3913370 |
            +----------------------------+---------+
            

            2. Increase innodb_io_capacity (to more than half of innodb_io_capacity_max) to allow more aggressive flushing.
            https://mariadb.com/kb/en/innodb-system-variables/#innodb_io_capacity

            3. Reduce the value of innodb_max_dirty_pages_pct to allow some early flushing from the buffer pool.
            https://mariadb.com/kb/en/innodb-system-variables/#innodb_max_dirty_pages_pct

            4. Increase the LRU scan depth (innodb_lru_scan_depth) to make more free pages available. This could affect overall performance.
            https://mariadb.com/kb/en/innodb-system-variables/#innodb_lru_scan_depth
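
            For item 1, a sketch of a query that computes the checkpoint age and compares it with 70% of the redo log size (this assumes the INFORMATION_SCHEMA.GLOBAL_STATUS table and the innodb_log_file_size variable; adjust for your configuration):

            SELECT
              (SELECT CAST(VARIABLE_VALUE AS UNSIGNED)
                 FROM information_schema.GLOBAL_STATUS
                WHERE VARIABLE_NAME = 'Innodb_lsn_current')
            - (SELECT CAST(VARIABLE_VALUE AS UNSIGNED)
                 FROM information_schema.GLOBAL_STATUS
                WHERE VARIABLE_NAME = 'Innodb_lsn_last_checkpoint')
                AS checkpoint_age_bytes,
              @@innodb_log_file_size * 0.70 AS suggested_age_limit_bytes;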

            Steve Shaw added a comment:

            Yes, I will. I will run on the unmodified 10.6.21 first to try to reproduce on this version.
            For the earlier questions: the error log only showed
            "Could not free any blocks in the buffer pool! 4737408 blocks are in use and 0 free. Consider increasing innodb_buffer_pool_size"
            before hanging, and no other messages.
            I ran SHOW ENGINE INNODB STATUS but did not save the output; I will do this for 10.6.21.


            Debarun Banerjee added a comment:

            Thanks, Steve Shaw. Could you please also try to capture the stack traces when it hangs during the 10.6.21 run? It would be of great help in the analysis.
            Steve Shaw added a comment:

            debarun, the same test has been run against MariaDB 10.6.21 (source revision 066e8d6aeabc13242193780341e0f845528105de) and the issue has not been encountered after running for 3 hours, 3x longer than the point at which it is encountered on 10.6.18. There are also no "Could not free any blocks" messages. The issue does not appear to occur on 10.6.21 (though this does not guarantee it is absent).
            Can the pull request be applied to 10.6.18 to see if it fixes the known affected version?


            Debarun Banerjee added a comment:

            Steve Shaw, possibly some other change has made the behaviour different in 10.6.21, but as you mentioned, that doesn't guarantee that the issue is not there. Yes, if the issue is repeatable on 10.6.18, I think it makes good sense to run it with the pull request changes. By the way, the stack traces (and other details) would still be useful to me even without the patch on 10.6.18, in case the issue does not repeat with the pull-request patch.
            Steve Shaw added a comment:

            I also ran against 10.6.19 and did not see the issue or the message. I re-ran against 10.6.18, and this time it gave the message "2025-03-27 13:24:22 0 [Warning] InnoDB: Could not free any blocks in the buffer pool! 4737408 blocks are in use and 0 free. Consider increasing innodb_buffer_pool_size." yet did not hang in the subsequent hour as previously; however, stopping and restarting the test with the sessions reconnecting produced a hang immediately. So far, the "Could not free any blocks in the buffer pool!" message and the subsequent hang state have only been reproduced on 10.6.18, and on no other version.


            Pandikrishnan Gurusamy added a comment:

            MDEV-34265 must be included in ES 10.6.19-15, as it is fixed in CS 10.6.19. Thanks.

            Debarun Banerjee added a comment:

            pandi.gurusamy, right: the customer encountered the current bug on 10.6.19, and that stack trace had also confirmed earlier that it is not related to MDEV-34265. My analysis of this bug is mainly based on the customer's stack trace, and the fix here addresses a cause that should produce the exact symptom the customer is facing. However, we could not repeat this issue in black-box testing, and Steve Shaw is kindly helping with it.

            Steve Shaw, I see. On 10.6.18 you could be hitting MDEV-34265. For the current customer case (CS-211883), however, we are sure they are hitting a different issue, one that can be explained by the current bug, MDEV-36226.

            To know for sure whether you are hitting MDEV-34265, I would need to check the complete stack trace from your case when it hangs. Could you please attach gdb to the hanging mariadbd and get the stack traces of all threads?
            Also, I recommend moving to 10.6.19 for future testing of this bug, to ensure that we are not blocked by MDEV-34265.

            Marko Mäkelä added a comment:

            axel reported observing a performance improvement in MDEV-35155. This is most likely thanks to debarun’s fix.

            People

              Assignee: Debarun Banerjee
              Reporter: Debarun Banerjee
              Votes: 1
              Watchers: 10

