[MDEV-31354] SIGSEGV in log_sort_flush_list() in InnoDB crash recovery - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.14, 10.6.6, 10.5(EOL), 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL)
Fix Version/s: 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1
Component/s: Storage Engine - InnoDB
Labels:
- crash
- recovery
Environment:
Debian GNU/Linux Sid, libstdc++-dev Version: 12.2.0-14

Description

While testing innodb.recovery_memory related to ~~MDEV-31350~~ and ~~MDEV-31353~~, I got a surprise crash near the end of my test run of a RelWithDebInfo executable:

10.9 44c9008ba65686abf1c82c9166255a8c52d61f74
innodb.recovery_memory 'innodb,release' w76 [ 100 fail ]
Test ended at 2023-05-26 12:43:08

CURRENT_TEST: innodb.recovery_memory


Server [mysqld.1 - pid: 1375448, winpid: 1375448, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
$ /dev/shm/10.9/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/10.9/mysql-test/var/76/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --disable-gdb --loose-sequence --innodb_buffer_pool_size=1073741824 --innodb_buffer_pool_size=5242880 --core-file --loose-debug-sync-timeout=300
2023-05-26 12:42:51 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32222)
2023-05-26 12:42:51 0 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000)
2023-05-26 12:42:51 0 [Note] Starting MariaDB 10.9.7-MariaDB-log source revision 44c9008ba65686abf1c82c9166255a8c52d61f74 as process 1375542
2023-05-26 12:42:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
2023-05-26 12:42:51 0 [Note] InnoDB: Using transactional memory
2023-05-26 12:42:51 0 [Note] InnoDB: Number of transaction pools: 1
2023-05-26 12:42:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2023-05-26 12:42:51 0 [Note] InnoDB: Using liburing
2023-05-26 12:42:51 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
2023-05-26 12:42:51 0 [Note] InnoDB: Completed initialization of buffer pool
2023-05-26 12:42:51 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
2023-05-26 12:42:51 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=691948272
2023-05-26 12:42:51 0 [Note] InnoDB: Multi-batch recovery needed at LSN 692430929
2023-05-26 12:42:51 0 [Note] InnoDB: End of log at LSN=694016960
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 692776501/694016960; 356 pages
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693253964/694016960; 317 pages
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693734818/694016960; 268 pages
2023-05-26 12:42:52 0 [Note] InnoDB: To recover: 9 pages
230526 12:42:52 [ERROR] mysqld got signal 11 ;
...
#5 buf_page_t::oldest_modification (this=0x0) at /mariadb/11/storage/innobase/include/buf0buf.h:749
#6 log_sort_flush_list()::$_1::operator()(buf_page_t const, buf_page_t const) const (lhs=0x0, rhs=<optimized out>, this=<optimized out>) at /mariadb/11/storage/innobase/log/log0recv.cc:3802
#7 __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>::operator()<buf_page_t, buf_page_t>(buf_page_t, buf_page_t) (__it1=<optimized out>, __it2=0x5578776b7ca0, this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/predefined_ops.h:158
#8 std::__unguarded_partition<buf_page_t, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t, buf_page_t, buf_page_t, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=<optimized out>, __last=0x5578776b7fe8, __pivot=<optimized out>, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1867
#9 std::__unguarded_partition_pivot<buf_page_t, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1889
#10 std::__introsort_loop<buf_page_t, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t, buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=__first@entry=0x5578776b7ca0, __last=0x5578776b7ff0, __depth_limit=11, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1921
#11 0x0000557875a531ce in std::__sort<buf_page_t, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1937
#12 std::sort<buf_page_t, log_sort_flush_list()::$_1>(buf_page_t, buf_page_t**, log_sort_flush_list()::$_1) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4853

The crash occurs in the following code, which was added in ~~MDEV-27022~~:

static void log_sort_flush_list()

  mysql_mutex_lock(&buf_pool.flush_list_mutex);

  const size_t size= UT_LIST_GET_LEN(buf_pool.flush_list);

  std::unique_ptr<buf_page_t *[]> list(new buf_page_t *[size]);

  size_t idx= 0;

  for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p;

       p= UT_LIST_GET_NEXT(list, p))

    list.get()[idx++]= p;

  std::sort(list.get(), list.get() + size,

            [](const buf_page_t *lhs, const buf_page_t *rhs) {

              return rhs->oldest_modification() < lhs->oldest_modification();

});

Based on the core dump, it looks like the comparison function may have been invoked on the first out-of-bounds element list.get() + size.

The size is equal to buf_pool.flush_list.count, and all elements in the array list.get() are nonnull. The address immediately after the last element (the second argument passed to std::sort()) contains a null pointer.

To me, this looks like a possible error in the implementation of std::sort() in this version of libstdc+. The executable had been compiled with clang-15, but the option -stdlib=libc+ was not used.

I think that more investigation will be needed on this. There is also other similar use of std::sort() in InnoDB.

Attachments

Issue Links

relates to

MDEV-25113 Reduce effect of parallel background flush on select workload

Closed

MDEV-31791 Crash recovery in the test innodb.recovery_memory occasionally fails

Closed

MDEV-35225 Bogus debug assertion failures in innodb.innodb-32k-crash

Closed

MDEV-27022 Buffer pool is being flushed during recovery

Closed

MDEV-31350 test innodb.recovery_memory failed on '21 failed attempts to flush a page'

Closed

MDEV-31353 InnoDB recovery hangs after reporting corruption

Closed

MDEV-32029 Assertion failures in log_sort_flush_list upon crash recovery

Closed

(2 relates to)

SIGSEGV in log_sort_flush_list() in InnoDB crash recovery

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration