MariaDB Server
MDEV-26445: innodb_undo_log_truncate is unnecessarily slow

Details

    Description

      Even after version 10.5.9, which should have fixed this, the server still crashes for me:

      --Thread 139751148128000 has waited at btr0pcur.cc line 510 for 435.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221
      a writer (thread id 139751159285504) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file btr0pcur.cc line 257
      2021-08-20  7:33:31 0 [Note] InnoDB: A semaphore wait:
      --Thread 139751139219200 has waited at btr0pcur.cc line 510 for 254.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221
      a writer (thread id 139751159285504) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file btr0pcur.cc line 257
      2021-08-20  7:33:31 0 [Note] InnoDB: A semaphore wait:
      --Thread 139751134918400 has waited at btr0pcur.cc line 510 for 511.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221
      a writer (thread id 139751159285504) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file btr0pcur.cc line 257
      InnoDB: Pending reads 0, writes 0
      2021-08-20  7:33:31 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
      210820  7:33:31 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.5.11-MariaDB-1
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=346
      max_threads=402
      thread_count=330
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016043 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      Can't start addr2line
      /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x555b4482793e]
      /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x555b44333145]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f40b1317140]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f40b0e60ce1]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f40b0e4a537]
      /usr/sbin/mariadbd(+0x62fefd)[0x555b44017efd]
      /usr/sbin/mariadbd(+0x626b02)[0x555b4400eb02]
      /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x555b447c91fb]
      /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x555b447ca592]
      /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x555b447c8fbf]
      /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f40b11feed0]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f40b130bea7]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f40b0f22def]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /home/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        0                    unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             1030947              1030947              processes 
      Max open files            60000                60000                files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       1030947              1030947              signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: core
      

      Then, after this crash, the undo file is corrupted and MariaDB cannot start:

      2021-08-20 10:07:23 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 884310016 outside the bounds of the file: .//undo002
      210820 10:07:23 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.5.11-MariaDB-1
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=402
      thread_count=0
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016043 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      ??:0(my_print_stacktrace)[0x55a09b42993e]
      ??:0(handle_fatal_signal)[0x55a09af35145]
      sigaction.c:0(__restore_rt)[0x7f9b7a0e0140]
      linux/raise.c:51(__GI_raise)[0x7f9b79c29ce1]
      stdlib/abort.c:81(__GI_abort)[0x7f9b79c13537]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac19efd]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d0f2]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d137]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34b247]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34c494]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b33adb0]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b302398]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2f873a]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2fee6e]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac13662]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55a09b213139]
      ??:0(ha_initialize_handlerton(st_plugin_int*))[0x55a09af380de]
      ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain*, char const*, st_plugin_int*, st_mysql_sys_var*))[0x55a09ad4db5a]
      ??:0(plugin_init(int*, char**, int))[0x55a09ad4ec47]
      ??:0(unireg_abort)[0x55a09ac7a457]
      ??:0(mysqld_main(int, char**))[0x55a09ac800e0]
      csu/libc-start.c:308(__libc_start_main)[0x7f9b79c14d0a]
      ??:0(_start)[0x55a09ac7446a]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /home/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        0                    unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             1030947              1030947              processes 
      Max open files            60000                60000                files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       1030947              1030947              signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: core
      

      Attachments

        1. crash.log (996 kB)
        2. crash2.log (738 kB)
        3. crash3.log (1.30 MB)
        4. hopeenoughdata.txt (40 kB)
        5. log.txt (142 kB)
        6. mariadb_bt_all_threads_workload_slowed_down.txt (69 kB)
        7. mariadb_bt_all_threads.txt (19 kB)
        8. semaphore without crash.txt (713 kB)
        9. showenginecensored.txt (28 kB)
        10. show status.txt (62 kB)
        11. stackfreeze (209 kB)
        12. stackfreeze2 (442 kB)
        13. stackfreeze3 (442 kB)
        14. stillstuckmariadb.txt (22 kB)
        15. stillstuckmariadb3 (17 kB)


          Activity

            marko Marko Mäkelä added a comment:

            martina342, thank you. Meanwhile, I think I have nailed down MDEV-26450.

            I think that in 10.5 and 10.6, it should be possible to avoid the buf_flush_list_space() loop in trx_purge_truncate_history() and simply let undo tablespace truncation discard any pages that had not been written before the file was shrunk, thanks to MDEV-15528. We will handle that as part of this ticket, not as part of the correctness issue, which is MDEV-26450.


            marko Marko Mäkelä added a comment:

            I implemented a fix that avoids writing any pages of the being-truncated undo tablespace. It does not depend on MDEV-15528, but rather on MDEV-25113 and MDEV-15058. A simple 5-minute Sysbench oltp_update_index run (8 tables × 100,000 rows, on RAM disk) shows the following differences in average throughput, 95th-percentile latency and maximum latency between innodb_undo_tablespaces=0 (baseline) and innodb_undo_tablespaces=2 with innodb_undo_log_truncate=ON (truncate):

            connections | baseline/tps | baseline 95%/ms | baseline max/ms | truncate/tps | truncate 95%/ms | truncate max/ms
            10          |     95285.41 |            0.19 |           77.87 |     97687.20 |            0.18 |           38.54
            20          |    157405.77 |            0.18 |          195.21 |    159989.50 |            0.19 |          407.60
            40          |    164189.49 |            0.42 |          336.01 |    166069.04 |            0.70 |          333.03
            80          |    163296.38 |            1.32 |          773.72 |    161948.80 |            2.07 |          716.47
            160         |    146380.27 |            5.00 |          161.18 |    143285.16 |            5.88 |          136.79

            Note: each test run used the same 10.5-based server code, changing only the two configuration parameters. Latency increases starting at 40 concurrent connections. However, based on the server error log, the last undo log truncation happened 14 minutes before the end of the test, which must have been while running at 40 concurrent connections; at the start (10 connections), the undo log was being truncated several times per second. So, the latency increase ought to be due to using 2 dedicated undo tablespaces, not due to undo tablespace truncation itself. The maximum latency is random by nature, but the average throughput and the 95th-percentile latency should be reasonably stable.
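            For reference, a minimal sketch of the two server configurations being compared, assuming standard MariaDB settings; the exact my.cnf and Sysbench invocation are not given in this ticket, and the innodb_max_undo_log_size value below is only illustrative:

                -- Baseline run: innodb_undo_tablespaces=0 in the server configuration
                -- (this variable is not dynamic; changing it requires a restart).
                -- Truncate run: innodb_undo_tablespaces=2 in the configuration, plus:
                SET GLOBAL innodb_undo_log_truncate = ON;
                -- An undo tablespace becomes a truncation candidate once it grows past
                -- innodb_max_undo_log_size; 16 MiB here is an arbitrary example value.
                SET GLOBAL innodb_max_undo_log_size = 16 * 1024 * 1024;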


            marko Marko Mäkelä added a comment:

            It occurred to me that it is better not to hold any mutex while truncating the file; it suffices to hold only the tablespace latch. That should reduce latency by blocking only page allocation operations in the being-truncated tablespace. During the truncation, the undo tablespace should not be accessed at all.


            marko Marko Mäkelä added a comment:

            I conducted another test on the 10.6-based branch, this time using NVMe storage instead of a RAM disk, with a 50 GiB redo log and a 50 GiB buffer pool. Yesterday's test on the 10.5-based branch also used, I think, a 50 GiB buffer pool.

            connections | throughput/tps | 95% latency/ms | max latency/ms
            10          |      105850.83 |           0.12 |          19.65
            20          |      161065.65 |           0.17 |          23.45
            40          |      172238.07 |           0.44 |          38.84
            80          |      169323.66 |           0.86 |         108.88

            I did not test at 160 connections. The latency and throughput are much better now, even though the test ran on real storage instead of a RAM disk. Some of the improvement should be explained by the trick of truncating the file while holding only the fil_space_t::latch.

            Admittedly, the storage is fast (peak write speed 2300 MB/s; this benchmark wrote only about 100 MB/s, and the maximum write rate I have seen in my tests is around 800 MB/s). I suppose that it was writing out almost no data pages. During the benchmark, the LSN only grew to 42,006,626,705, not even enough to fill the 50 GiB log, so there should have been no need to create a log checkpoint.

            The last undo truncation took place almost 12 minutes before the end of the benchmark, that is, while we were serving 20 concurrent connections. I think that to have more chances of undo tablespace truncation on a write-heavy system, innodb_undo_tablespaces needs to be larger than 2.
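            As an aside, a hedged way to keep an eye on purge and undo truncation activity on a running server, using only standard SHOW statements (the truncation timestamps mentioned above come from the server error log, which is not reproduced here):

                -- Current undo-related settings:
                SHOW GLOBAL VARIABLES LIKE 'innodb_undo%';
                -- "History list length" in the TRANSACTIONS section shows how far purge
                -- is lagging; an undo tablespace can only be truncated after purge has
                -- processed the undo records stored in it.
                SHOW ENGINE INNODB STATUS\G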


            mleich Matthias Leich added a comment:

            origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b (2021-09-23T11:21:51+03:00), which should contain the fixes for MDEV-26450 and MDEV-26445, behaved well in RQG testing.


            People

              Assignee: Marko Mäkelä (marko)
              Reporter: martina342
              Votes: 0
              Watchers: 7

