Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.5.14, 10.6.6, 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL)
-
Debian GNU/Linux Sid, libstdc++-dev Version: 12.2.0-14
Description
While testing innodb.recovery_memory related to MDEV-31350 and MDEV-31353, I got a surprise crash near the end of my test run of a RelWithDebInfo executable:
10.9 44c9008ba65686abf1c82c9166255a8c52d61f74 |
innodb.recovery_memory 'innodb,release' w76 [ 100 fail ]
|
Test ended at 2023-05-26 12:43:08
|
|
CURRENT_TEST: innodb.recovery_memory
|
|
|
Server [mysqld.1 - pid: 1375448, winpid: 1375448, exit: 256] failed during test run
|
Server log from this test:
|
----------SERVER LOG START-----------
|
$ /dev/shm/10.9/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/10.9/mysql-test/var/76/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --disable-gdb --loose-sequence --innodb_buffer_pool_size=1073741824 --innodb_buffer_pool_size=5242880 --core-file --loose-debug-sync-timeout=300
|
2023-05-26 12:42:51 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32222)
|
2023-05-26 12:42:51 0 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000)
|
2023-05-26 12:42:51 0 [Note] Starting MariaDB 10.9.7-MariaDB-log source revision 44c9008ba65686abf1c82c9166255a8c52d61f74 as process 1375542
|
2023-05-26 12:42:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
|
2023-05-26 12:42:51 0 [Note] InnoDB: Using transactional memory
|
2023-05-26 12:42:51 0 [Note] InnoDB: Number of transaction pools: 1
|
2023-05-26 12:42:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
|
2023-05-26 12:42:51 0 [Note] InnoDB: Using liburing
|
2023-05-26 12:42:51 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
|
2023-05-26 12:42:51 0 [Note] InnoDB: Completed initialization of buffer pool
|
2023-05-26 12:42:51 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
|
2023-05-26 12:42:51 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=691948272
|
2023-05-26 12:42:51 0 [Note] InnoDB: Multi-batch recovery needed at LSN 692430929
|
2023-05-26 12:42:51 0 [Note] InnoDB: End of log at LSN=694016960
|
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 692776501/694016960; 356 pages
|
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693253964/694016960; 317 pages
|
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693734818/694016960; 268 pages
|
2023-05-26 12:42:52 0 [Note] InnoDB: To recover: 9 pages
|
230526 12:42:52 [ERROR] mysqld got signal 11 ;
|
...
|
#5 buf_page_t::oldest_modification (this=0x0) at /mariadb/11/storage/innobase/include/buf0buf.h:749
|
#6 log_sort_flush_list()::$_1::operator()(buf_page_t const*, buf_page_t const*) const (lhs=0x0, rhs=<optimized out>, this=<optimized out>) at /mariadb/11/storage/innobase/log/log0recv.cc:3802
|
#7 __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>::operator()<buf_page_t**, buf_page_t**>(buf_page_t**, buf_page_t**) (__it1=<optimized out>, __it2=0x5578776b7ca0, this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/predefined_ops.h:158
|
#8 std::__unguarded_partition<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=<optimized out>, __last=0x5578776b7fe8, __pivot=<optimized out>, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1867
|
#9 std::__unguarded_partition_pivot<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1889
|
#10 std::__introsort_loop<buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=__first@entry=0x5578776b7ca0, __last=0x5578776b7ff0, __depth_limit=11, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1921
|
#11 0x0000557875a531ce in std::__sort<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1937
|
#12 std::sort<buf_page_t**, log_sort_flush_list()::$_1>(buf_page_t**, buf_page_t**, log_sort_flush_list()::$_1) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4853
|
The crash occurs in the following code, which was added in MDEV-27022:
static void log_sort_flush_list() |
{
|
mysql_mutex_lock(&buf_pool.flush_list_mutex);
|
|
const size_t size= UT_LIST_GET_LEN(buf_pool.flush_list); |
std::unique_ptr<buf_page_t *[]> list(new buf_page_t *[size]); |
|
size_t idx= 0; |
for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p; |
p= UT_LIST_GET_NEXT(list, p))
|
list.get()[idx++]= p;
|
|
std::sort(list.get(), list.get() + size,
|
[](const buf_page_t *lhs, const buf_page_t *rhs) { |
return rhs->oldest_modification() < lhs->oldest_modification(); |
});
|
Based on the core dump, it looks like the comparison function may have been invoked on the first out-of-bounds element list.get() + size.
The size is equal to buf_pool.flush_list.count, and all elements in the array list.get() are nonnull. The address immediately after the last element (the second argument passed to std::sort()) contains a null pointer.
To me, this looks like a possible error in the implementation of std::sort() in this version of libstdc+. The executable had been compiled with clang-15, but the option -stdlib=libc+ was not used.
I think that more investigation will be needed on this. There is also other similar use of std::sort() in InnoDB.
Attachments
Issue Links
- relates to
-
MDEV-25113 Reduce effect of parallel background flush on select workload
-
- Closed
-
-
MDEV-31791 Crash recovery in the test innodb.recovery_memory occasionally fails
-
- Closed
-
-
MDEV-35225 Bogus debug assertion failures in innodb.innodb-32k-crash
-
- Closed
-
-
MDEV-27022 Buffer pool is being flushed during recovery
-
- Closed
-
-
MDEV-31350 test innodb.recovery_memory failed on '21 failed attempts to flush a page'
-
- Closed
-
-
MDEV-31353 InnoDB recovery hangs after reporting corruption
-
- Closed
-
-
MDEV-32029 Assertion failures in log_sort_flush_list upon crash recovery
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link |
This issue relates to |
Link |
This issue relates to |
issue.field.resolutiondate | 2023-07-28 10:20:43.0 | 2023-07-28 10:20:43.158 |
Fix Version/s | 10.6.15 [ 29013 ] | |
Fix Version/s | 10.9.8 [ 29015 ] | |
Fix Version/s | 10.10.6 [ 29017 ] | |
Fix Version/s | 10.11.5 [ 29019 ] | |
Fix Version/s | 11.0.3 [ 28920 ] | |
Fix Version/s | 11.1.2 [ 28921 ] | |
Fix Version/s | 11.2.1 [ 29034 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 11.0 [ 28320 ] | |
Fix Version/s | 11.1 [ 28549 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link |
This issue relates to |
Link |
This issue relates to |
One year ago, mleich had reproduced this exactly once, actually even with rr replay, when testing a deleted development branch of
MDEV-13542. There were many failures with that branch because it was tested with some fault injection (pretending that pages are corrupted when they are not), and the files were removed in order to save storage space. Only a log file was left. I will copy some data from there:# 2022-05-27T17:48:04 [3595936] | [rr 3681439 22656]2022-05-27 17:47:26 0 [Note] InnoDB: To recover: 345 pages from log
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 23039]=================================================================
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 23046][rr 3681439 23050]==3681439==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x621000053938 at pc 0x56178acbabf2 bp 0x7fffb454d2e0 sp 0x7fffb454d2d0
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 23053][rr 3681439 23055]READ of size 8 at 0x621000053938 thread T0
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28861] #0 0x56178acbabf1 in operator()<buf_page_t**, buf_page_t**> /usr/include/c++/9/bits/predefined_ops.h:143
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28869] #1 0x56178acbaab5 in __unguarded_partition<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > > /usr/include/c++/9/bits/stl_algo.h:1901
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28871] #2 0x56178acba092 in __unguarded_partition_pivot<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > > /usr/include/c++/9/bits/stl_algo.h:1922
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28873] #3 0x56178acb932d in __introsort_loop<buf_page_t**, long int, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > > /usr/include/c++/9/bits/stl_algo.h:1952
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28879] #4 0x56178acb7497 in __sort<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > > /usr/include/c++/9/bits/stl_algo.h:1967
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28885] #5 0x56178acb6047 in sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > /usr/include/c++/9/bits/stl_algo.h:4899
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28887] #6 0x56178aca9887 in log_sort_flush_list /data/Server/bb-10.6-MDEV-13542N/storage/innobase/log/log0recv.cc:3092
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28889] #7 0x56178acac10e in recv_sys_t::apply(bool) /data/Server/bb-10.6-MDEV-13542N/storage/innobase/log/log0recv.cc:3285
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28895] #8 0x56178af2c41d in srv_start(bool) /data/Server/bb-10.6-MDEV-13542N/storage/innobase/srv/srv0start.cc:1486
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28926] #9 0x56178aac124e in innodb_init /data/Server/bb-10.6-MDEV-13542N/storage/innobase/handler/ha_innodb.cc:4316
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28944] #10 0x56178a1c8cde in ha_initialize_handlerton(st_plugin_int*) /data/Server/bb-10.6-MDEV-13542N/sql/handler.cc:659
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28954] #11 0x561789a254be in plugin_initialize /data/Server/bb-10.6-MDEV-13542N/sql/sql_plugin.cc:1465
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28956] #12 0x561789a27223 in plugin_init(int*, char**, int) /data/Server/bb-10.6-MDEV-13542N/sql/sql_plugin.cc:1758
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28968] #13 0x56178972fc05 in init_server_components /data/Server/bb-10.6-MDEV-13542N/sql/mysqld.cc:5090
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28970] #14 0x5617897315bc in mysqld_main(int, char**) /data/Server/bb-10.6-MDEV-13542N/sql/mysqld.cc:5707
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28972] #15 0x56178971ffbc in main /data/Server/bb-10.6-MDEV-13542N/sql/main.cc:34
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28974] #16 0x7f80dde7f0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28976] #17 0x56178971fedd in _start (/data/Server_bin/bb-10.6-MDEV-13542N_asan/bin/mariadbd+0x13ffedd)
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28978]
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28988]0x621000053938 is located 0 bytes to the right of 4152-byte region [0x621000052900,0x621000053938)
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 28994]allocated by thread T0 here:
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29000] #0 0x7f80deb89b47 in operator new[](unsigned long) (/lib/x86_64-linux-gnu/libasan.so.5+0x10fb47)
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29002] #1 0x56178aca9749 in log_sort_flush_list /data/Server/bb-10.6-MDEV-13542N/storage/innobase/log/log0recv.cc:3085
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29004] #2 0x56178acac10e in recv_sys_t::apply(bool) /data/Server/bb-10.6-MDEV-13542N/storage/innobase/log/log0recv.cc:3285
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29006] #3 0x56178af2c41d in srv_start(bool) /data/Server/bb-10.6-MDEV-13542N/storage/innobase/srv/srv0start.cc:1486
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29008] #4 0x56178aac124e in innodb_init /data/Server/bb-10.6-MDEV-13542N/storage/innobase/handler/ha_innodb.cc:4316
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29010] #5 0x56178a1c8cde in ha_initialize_handlerton(st_plugin_int*) /data/Server/bb-10.6-MDEV-13542N/sql/handler.cc:659
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29012] #6 0x561789a254be in plugin_initialize /data/Server/bb-10.6-MDEV-13542N/sql/sql_plugin.cc:1465
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29014] #7 0x561789a27223 in plugin_init(int*, char**, int) /data/Server/bb-10.6-MDEV-13542N/sql/sql_plugin.cc:1758
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29016] #8 0x56178972fc05 in init_server_components /data/Server/bb-10.6-MDEV-13542N/sql/mysqld.cc:5090
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29018] #9 0x5617897315bc in mysqld_main(int, char**) /data/Server/bb-10.6-MDEV-13542N/sql/mysqld.cc:5707
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29020] #10 0x56178971ffbc in main /data/Server/bb-10.6-MDEV-13542N/sql/main.cc:34
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29022] #11 0x7f80dde7f0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29024]
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29034]SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/include/c++/9/bits/predefined_ops.h:143 in operator()<buf_page_t**, buf_page_t**>
# 2022-05-27T17:48:04 [3595936] | [rr 3681439 29468]Shadow bytes around the buggy address:
# 2022-05-27T17:48:04 [3595936] | 0x0c42800026d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# 2022-05-27T17:48:04 [3595936] | 0x0c42800026e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# 2022-05-27T17:48:04 [3595936] | 0x0c42800026f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# 2022-05-27T17:48:04 [3595936] | 0x0c4280002700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# 2022-05-27T17:48:04 [3595936] | 0x0c4280002710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# 2022-05-27T17:48:04 [3595936] | =>0x0c4280002720: 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa fa fa
It looks like exactly my SIGSEGV, with the AddressSanitizer shadow byte hinting that the element past the end of the buffer is being accessed. According to the above output, this happened with the C++ run-time library of GCC 9 already.