Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31354

SIGSEGV in log_sort_flush_list() in InnoDB crash recovery

    XMLWordPrintable

Details

    Description

      While testing innodb.recovery_memory related to MDEV-31350 and MDEV-31353, I got a surprise crash near the end of my test run of a RelWithDebInfo executable:

      10.9 44c9008ba65686abf1c82c9166255a8c52d61f74

      innodb.recovery_memory 'innodb,release'  w76 [ 100 fail ]
              Test ended at 2023-05-26 12:43:08
       
      CURRENT_TEST: innodb.recovery_memory
       
       
      Server [mysqld.1 - pid: 1375448, winpid: 1375448, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      $ /dev/shm/10.9/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/10.9/mysql-test/var/76/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --disable-gdb --loose-sequence --innodb_buffer_pool_size=1073741824 --innodb_buffer_pool_size=5242880 --core-file --loose-debug-sync-timeout=300
      2023-05-26 12:42:51 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32222)
      2023-05-26 12:42:51 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      2023-05-26 12:42:51 0 [Note] Starting MariaDB 10.9.7-MariaDB-log source revision 44c9008ba65686abf1c82c9166255a8c52d61f74 as process 1375542
      2023-05-26 12:42:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
      2023-05-26 12:42:51 0 [Note] InnoDB: Using transactional memory
      2023-05-26 12:42:51 0 [Note] InnoDB: Number of transaction pools: 1
      2023-05-26 12:42:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2023-05-26 12:42:51 0 [Note] InnoDB: Using liburing
      2023-05-26 12:42:51 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
      2023-05-26 12:42:51 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-05-26 12:42:51 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
      2023-05-26 12:42:51 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=691948272
      2023-05-26 12:42:51 0 [Note] InnoDB: Multi-batch recovery needed at LSN 692430929
      2023-05-26 12:42:51 0 [Note] InnoDB: End of log at LSN=694016960
      2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 692776501/694016960; 356 pages
      2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693253964/694016960; 317 pages
      2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693734818/694016960; 268 pages
      2023-05-26 12:42:52 0 [Note] InnoDB: To recover: 9 pages
      230526 12:42:52 [ERROR] mysqld got signal 11 ;
      ...
      #5  buf_page_t::oldest_modification (this=0x0) at /mariadb/11/storage/innobase/include/buf0buf.h:749
      #6  log_sort_flush_list()::$_1::operator()(buf_page_t const*, buf_page_t const*) const (lhs=0x0, rhs=<optimized out>, this=<optimized out>) at /mariadb/11/storage/innobase/log/log0recv.cc:3802
      #7  __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>::operator()<buf_page_t**, buf_page_t**>(buf_page_t**, buf_page_t**) (__it1=<optimized out>, __it2=0x5578776b7ca0, this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/predefined_ops.h:158
      #8  std::__unguarded_partition<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=<optimized out>, __last=0x5578776b7fe8, __pivot=<optimized out>, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1867
      #9  std::__unguarded_partition_pivot<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1889
      #10 std::__introsort_loop<buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=__first@entry=0x5578776b7ca0, __last=0x5578776b7ff0, __depth_limit=11, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1921
      #11 0x0000557875a531ce in std::__sort<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1937
      #12 std::sort<buf_page_t**, log_sort_flush_list()::$_1>(buf_page_t**, buf_page_t**, log_sort_flush_list()::$_1) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4853
      

      The crash occurs in the following code, which was added in MDEV-27022:

      static void log_sort_flush_list()
      {
        mysql_mutex_lock(&buf_pool.flush_list_mutex);
       
        const size_t size= UT_LIST_GET_LEN(buf_pool.flush_list);
        std::unique_ptr<buf_page_t *[]> list(new buf_page_t *[size]);
       
        size_t idx= 0;
        for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p;
             p= UT_LIST_GET_NEXT(list, p))
          list.get()[idx++]= p;
       
        std::sort(list.get(), list.get() + size,
                  [](const buf_page_t *lhs, const buf_page_t *rhs) {
                    return rhs->oldest_modification() < lhs->oldest_modification();
                  });
      

      Based on the core dump, it looks like the comparison function may have been invoked on the first out-of-bounds element list.get() + size.

      The size is equal to buf_pool.flush_list.count, and all elements in the array list.get() are nonnull. The address immediately after the last element (the second argument passed to std::sort()) contains a null pointer.

      To me, this looks like a possible error in the implementation of std::sort() in this version of libstdc+. The executable had been compiled with clang-15, but the option -stdlib=libc+ was not used.

      I think that more investigation will be needed on this. There is also other similar use of std::sort() in InnoDB.

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.