MariaDB Server
MDEV-26445: innodb_undo_log_truncate is unnecessarily slow

Details

    Description

      Even after version 10.5.9, which should have fixed this, the server still crashes for me:

      --Thread 139751148128000 has waited at btr0pcur.cc line 510 for 435.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221
      a writer (thread id 139751159285504) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file btr0pcur.cc line 257
      2021-08-20  7:33:31 0 [Note] InnoDB: A semaphore wait:
      --Thread 139751139219200 has waited at btr0pcur.cc line 510 for 254.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221
      a writer (thread id 139751159285504) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file btr0pcur.cc line 257
      2021-08-20  7:33:31 0 [Note] InnoDB: A semaphore wait:
      --Thread 139751134918400 has waited at btr0pcur.cc line 510 for 511.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f3e1c0ca778 created in file buf0buf.cc line 1221
      a writer (thread id 139751159285504) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: 0
      Last time write locked in file btr0pcur.cc line 257
      InnoDB: Pending reads 0, writes 0
      2021-08-20  7:33:31 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
      210820  7:33:31 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.5.11-MariaDB-1
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=346
      max_threads=402
      thread_count=330
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016043 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      Can't start addr2line
      /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x555b4482793e]
      /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x555b44333145]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7f40b1317140]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f40b0e60ce1]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f40b0e4a537]
      /usr/sbin/mariadbd(+0x62fefd)[0x555b44017efd]
      /usr/sbin/mariadbd(+0x626b02)[0x555b4400eb02]
      /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x3b)[0x555b447c91fb]
      /usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x555b447ca592]
      /usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x555b447c8fbf]
      /lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f40b11feed0]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f40b130bea7]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f40b0f22def]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /home/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        0                    unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             1030947              1030947              processes 
      Max open files            60000                60000                files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       1030947              1030947              signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: core
      

      Then, after this crash, the undo file is corrupted and MariaDB cannot start:

      2021-08-20 10:07:23 0 [ERROR] [FATAL] InnoDB: Trying to read 16384 bytes at 884310016 outside the bounds of the file: .//undo002
      210820 10:07:23 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.5.11-MariaDB-1
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=402
      thread_count=0
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1016043 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      ??:0(my_print_stacktrace)[0x55a09b42993e]
      ??:0(handle_fatal_signal)[0x55a09af35145]
      sigaction.c:0(__restore_rt)[0x7f9b7a0e0140]
      linux/raise.c:51(__GI_raise)[0x7f9b79c29ce1]
      stdlib/abort.c:81(__GI_abort)[0x7f9b79c13537]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac19efd]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d0f2]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac2d137]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34b247]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b34c494]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b33adb0]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b302398]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2f873a]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55a09b2fee6e]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55a09ac13662]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55a09b213139]
      ??:0(ha_initialize_handlerton(st_plugin_int*))[0x55a09af380de]
      ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain*, char const*, st_plugin_int*, st_mysql_sys_var*))[0x55a09ad4db5a]
      ??:0(plugin_init(int*, char**, int))[0x55a09ad4ec47]
      ??:0(unireg_abort)[0x55a09ac7a457]
      ??:0(mysqld_main(int, char**))[0x55a09ac800e0]
      csu/libc-start.c:308(__libc_start_main)[0x7f9b79c14d0a]
      ??:0(_start)[0x55a09ac7446a]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /home/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        0                    unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             1030947              1030947              processes 
      Max open files            60000                60000                files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       1030947              1030947              signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: core
      

      Attachments

        1. crash.log (996 kB)
        2. crash2.log (738 kB)
        3. crash3.log (1.30 MB)
        4. hopeenoughdata.txt (40 kB)
        5. log.txt (142 kB)
        6. mariadb_bt_all_threads_workload_slowed_down.txt (69 kB)
        7. mariadb_bt_all_threads.txt (19 kB)
        8. semaphore without crash.txt (713 kB)
        9. showenginecensored.txt (28 kB)
        10. show status.txt (62 kB)
        11. stackfreeze (209 kB)
        12. stackfreeze2 (442 kB)
        13. stackfreeze3 (442 kB)
        14. stillstuckmariadb.txt (22 kB)
        15. stillstuckmariadb3 (17 kB)


          Activity

            marko Marko Mäkelä added a comment:

            martina342, thank you. Meanwhile, I think I have nailed down MDEV-26450.

            I think that in 10.5 and 10.6, it should be possible to avoid the buf_flush_list_space() loop in trx_purge_truncate_history() and simply let undo tablespace truncation discard any pages that had not been written before the file was shrunk, thanks to MDEV-15528. We will handle that as part of this ticket, not as part of the correctness issue, which is MDEV-26450.


            marko Marko Mäkelä added a comment:

            I implemented a fix that avoids writing any pages of the being-truncated undo tablespace. It does not depend on MDEV-15528, but rather on MDEV-25113 and MDEV-15058. A simple 5-minute Sysbench oltp_update_index run (8 tables × 100,000 rows, on RAM disk) shows the following differences in average throughput, 95th-percentile latency and maximum latency between innodb_undo_tablespaces=0 (baseline) and innodb_undo_tablespaces=2 with innodb_undo_log_truncate=ON (truncate):

            connections | baseline/tps | baseline 95%/ms | baseline max/ms | truncate/tps | truncate 95%/ms | truncate max/ms
            10          |     95285.41 |            0.19 |           77.87 |     97687.20 |            0.18 |           38.54
            20          |    157405.77 |            0.18 |          195.21 |    159989.50 |            0.19 |          407.60
            40          |    164189.49 |            0.42 |          336.01 |    166069.04 |            0.70 |          333.03
            80          |    163296.38 |            1.32 |          773.72 |    161948.80 |            2.07 |          716.47
            160         |    146380.27 |            5.00 |          161.18 |    143285.16 |            5.88 |          136.79

            Note: each test run used the same 10.5-based server code, changing only the two configuration parameters. Latency increases starting at 40 concurrent connections. However, based on the server error log, the last undo log truncation happened 14 minutes before the end of the test, which must have been while running at 40 concurrent connections; at the start (10 connections), the undo log was being truncated several times per second. So, the latency increase ought to be due to using 2 dedicated undo tablespaces, not due to undo tablespace truncation itself. The maximum latency is random by nature, but the average throughput and the 95th-percentile latency should be reasonably stable.
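            For reference, a minimal sketch of the two server configurations being compared, assuming standard MariaDB settings; the exact my.cnf and Sysbench invocation are not given in this ticket, and the innodb_max_undo_log_size value below is only illustrative:

                -- Baseline run: innodb_undo_tablespaces=0 in the server configuration
                -- (this variable is not dynamic; changing it requires a restart).
                -- Truncate run: innodb_undo_tablespaces=2 in the configuration, plus:
                SET GLOBAL innodb_undo_log_truncate = ON;
                -- An undo tablespace becomes a truncation candidate once it grows past
                -- innodb_max_undo_log_size; 16 MiB here is an arbitrary example value.
                SET GLOBAL innodb_max_undo_log_size = 16 * 1024 * 1024;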


            marko Marko Mäkelä added a comment:

            It occurred to me that it is better not to hold any mutex while truncating the file; it suffices to hold only the tablespace latch. That should reduce latency by blocking only page allocation operations in the being-truncated tablespace. During the truncation, the undo tablespace should not be accessed at all.


            marko Marko Mäkelä added a comment:

            I conducted another test on the 10.6-based branch, this time using NVMe storage instead of a RAM disk, with a 50 GiB redo log and a 50 GiB buffer pool. Yesterday's test on the 10.5-based branch also used, I think, a 50 GiB buffer pool.

            connections | throughput/tps | 95% latency/ms | max latency/ms
            10          |      105850.83 |           0.12 |          19.65
            20          |      161065.65 |           0.17 |          23.45
            40          |      172238.07 |           0.44 |          38.84
            80          |      169323.66 |           0.86 |         108.88

            I did not test at 160 connections. The latency and throughput are much better now, even though the test ran on real storage instead of a RAM disk. Some of the improvement should be explained by the trick of truncating the file while holding only the fil_space_t::latch.

            Admittedly, the storage is fast (peak write speed 2300 MB/s; this benchmark wrote only about 100 MB/s, and the maximum write rate I have seen in my tests is around 800 MB/s). I suppose that it was writing out almost no data pages. During the benchmark, the LSN only grew to 42,006,626,705, not even enough to fill the 50 GiB log, so there should have been no need to create a log checkpoint.

            The last undo truncation took place almost 12 minutes before the end of the benchmark, that is, while we were serving 20 concurrent connections. I think that to have more chances of undo tablespace truncation on a write-heavy system, innodb_undo_tablespaces needs to be larger than 2.
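            As an aside, a hedged way to keep an eye on purge and undo truncation activity on a running server, using only standard SHOW statements (the truncation timestamps mentioned above come from the server error log, which is not reproduced here):

                -- Current undo-related settings:
                SHOW GLOBAL VARIABLES LIKE 'innodb_undo%';
                -- "History list length" in the TRANSACTIONS section shows how far purge
                -- is lagging; an undo tablespace can only be truncated after purge has
                -- processed the undo records stored in it.
                SHOW ENGINE INNODB STATUS\G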


            mleich Matthias Leich added a comment:

            origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b (2021-09-23T11:21:51+03:00), which should contain the fixes for MDEV-26450 and MDEV-26445, behaved well in RQG testing.


            People

              Assignee: Marko Mäkelä (marko)
              Reporter: martina342
              Votes: 0
              Watchers: 7

