Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.5, 10.6, 10.11, 11.1(EOL), 10.5.14, 10.6.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 11.0(EOL)
-
Debian GNU/Linux Sid, libstdc++-dev Version: 12.2.0-14
Description
While testing innodb.recovery_memory related to MDEV-31350 and MDEV-31353, I got a surprise crash near the end of my test run of a RelWithDebInfo executable:
10.9 44c9008ba65686abf1c82c9166255a8c52d61f74 |
innodb.recovery_memory 'innodb,release' w76 [ 100 fail ]
|
Test ended at 2023-05-26 12:43:08
|
|
CURRENT_TEST: innodb.recovery_memory
|
|
|
Server [mysqld.1 - pid: 1375448, winpid: 1375448, exit: 256] failed during test run
|
Server log from this test:
|
----------SERVER LOG START-----------
|
$ /dev/shm/10.9/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/10.9/mysql-test/var/76/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --disable-gdb --loose-sequence --innodb_buffer_pool_size=1073741824 --innodb_buffer_pool_size=5242880 --core-file --loose-debug-sync-timeout=300
|
2023-05-26 12:42:51 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32222)
|
2023-05-26 12:42:51 0 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 (was 151) table_cache: 421 (was 2000)
|
2023-05-26 12:42:51 0 [Note] Starting MariaDB 10.9.7-MariaDB-log source revision 44c9008ba65686abf1c82c9166255a8c52d61f74 as process 1375542
|
2023-05-26 12:42:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
|
2023-05-26 12:42:51 0 [Note] InnoDB: Using transactional memory
|
2023-05-26 12:42:51 0 [Note] InnoDB: Number of transaction pools: 1
|
2023-05-26 12:42:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
|
2023-05-26 12:42:51 0 [Note] InnoDB: Using liburing
|
2023-05-26 12:42:51 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
|
2023-05-26 12:42:51 0 [Note] InnoDB: Completed initialization of buffer pool
|
2023-05-26 12:42:51 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
|
2023-05-26 12:42:51 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=691948272
|
2023-05-26 12:42:51 0 [Note] InnoDB: Multi-batch recovery needed at LSN 692430929
|
2023-05-26 12:42:51 0 [Note] InnoDB: End of log at LSN=694016960
|
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 692776501/694016960; 356 pages
|
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693253964/694016960; 317 pages
|
2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693734818/694016960; 268 pages
|
2023-05-26 12:42:52 0 [Note] InnoDB: To recover: 9 pages
|
230526 12:42:52 [ERROR] mysqld got signal 11 ;
|
...
|
#5 buf_page_t::oldest_modification (this=0x0) at /mariadb/11/storage/innobase/include/buf0buf.h:749
|
#6 log_sort_flush_list()::$_1::operator()(buf_page_t const*, buf_page_t const*) const (lhs=0x0, rhs=<optimized out>, this=<optimized out>) at /mariadb/11/storage/innobase/log/log0recv.cc:3802
|
#7 __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>::operator()<buf_page_t**, buf_page_t**>(buf_page_t**, buf_page_t**) (__it1=<optimized out>, __it2=0x5578776b7ca0, this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/predefined_ops.h:158
|
#8 std::__unguarded_partition<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=<optimized out>, __last=0x5578776b7fe8, __pivot=<optimized out>, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1867
|
#9 std::__unguarded_partition_pivot<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1889
|
#10 std::__introsort_loop<buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=__first@entry=0x5578776b7ca0, __last=0x5578776b7ff0, __depth_limit=11, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1921
|
#11 0x0000557875a531ce in std::__sort<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1937
|
#12 std::sort<buf_page_t**, log_sort_flush_list()::$_1>(buf_page_t**, buf_page_t**, log_sort_flush_list()::$_1) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4853
|
The crash occurs in the following code, which was added in MDEV-27022:
static void log_sort_flush_list() |
{
|
mysql_mutex_lock(&buf_pool.flush_list_mutex);
|
|
const size_t size= UT_LIST_GET_LEN(buf_pool.flush_list); |
std::unique_ptr<buf_page_t *[]> list(new buf_page_t *[size]); |
|
size_t idx= 0; |
for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p; |
p= UT_LIST_GET_NEXT(list, p))
|
list.get()[idx++]= p;
|
|
std::sort(list.get(), list.get() + size,
|
[](const buf_page_t *lhs, const buf_page_t *rhs) { |
return rhs->oldest_modification() < lhs->oldest_modification(); |
});
|
Based on the core dump, it looks like the comparison function may have been invoked on the first out-of-bounds element list.get() + size.
The size is equal to buf_pool.flush_list.count, and all elements in the array list.get() are nonnull. The address immediately after the last element (the second argument passed to std::sort()) contains a null pointer.
To me, this looks like a possible error in the implementation of std::sort() in this version of libstdc+. The executable had been compiled with clang-15, but the option -stdlib=libc+ was not used.
I think that more investigation will be needed on this. There is also other similar use of std::sort() in InnoDB.
Attachments
Issue Links
- relates to
-
MDEV-25113 Reduce effect of parallel background flush on select workload
- Closed
-
MDEV-31791 Crash recovery in the test innodb.recovery_memory occasionally fails
- Closed
-
MDEV-27022 Buffer pool is being flushed during recovery
- Closed
-
MDEV-31350 test innodb.recovery_memory failed on '21 failed attempts to flush a page'
- Closed
-
MDEV-31353 InnoDB recovery hangs after reporting corruption
- Closed
-
MDEV-32029 Assertion failures in log_sort_flush_list upon crash recovery
- Closed