Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31354

SIGSEGV in log_sort_flush_list() in InnoDB crash recovery

Details

    Description

      While testing innodb.recovery_memory related to MDEV-31350 and MDEV-31353, I got a surprise crash near the end of my test run of a RelWithDebInfo executable:

      10.9 44c9008ba65686abf1c82c9166255a8c52d61f74

      innodb.recovery_memory 'innodb,release'  w76 [ 100 fail ]
              Test ended at 2023-05-26 12:43:08
       
      CURRENT_TEST: innodb.recovery_memory
       
       
      Server [mysqld.1 - pid: 1375448, winpid: 1375448, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      $ /dev/shm/10.9/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/10.9/mysql-test/var/76/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --disable-gdb --loose-sequence --innodb_buffer_pool_size=1073741824 --innodb_buffer_pool_size=5242880 --core-file --loose-debug-sync-timeout=300
      2023-05-26 12:42:51 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32222)
      2023-05-26 12:42:51 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      2023-05-26 12:42:51 0 [Note] Starting MariaDB 10.9.7-MariaDB-log source revision 44c9008ba65686abf1c82c9166255a8c52d61f74 as process 1375542
      2023-05-26 12:42:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.13
      2023-05-26 12:42:51 0 [Note] InnoDB: Using transactional memory
      2023-05-26 12:42:51 0 [Note] InnoDB: Number of transaction pools: 1
      2023-05-26 12:42:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2023-05-26 12:42:51 0 [Note] InnoDB: Using liburing
      2023-05-26 12:42:51 0 [Note] InnoDB: Initializing buffer pool, total size = 5.000MiB, chunk size = 1.000MiB
      2023-05-26 12:42:51 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-05-26 12:42:51 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
      2023-05-26 12:42:51 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=691948272
      2023-05-26 12:42:51 0 [Note] InnoDB: Multi-batch recovery needed at LSN 692430929
      2023-05-26 12:42:51 0 [Note] InnoDB: End of log at LSN=694016960
      2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 692776501/694016960; 356 pages
      2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693253964/694016960; 317 pages
      2023-05-26 12:42:51 0 [Note] InnoDB: To recover: LSN 693734818/694016960; 268 pages
      2023-05-26 12:42:52 0 [Note] InnoDB: To recover: 9 pages
      230526 12:42:52 [ERROR] mysqld got signal 11 ;
      ...
      #5  buf_page_t::oldest_modification (this=0x0) at /mariadb/11/storage/innobase/include/buf0buf.h:749
      #6  log_sort_flush_list()::$_1::operator()(buf_page_t const*, buf_page_t const*) const (lhs=0x0, rhs=<optimized out>, this=<optimized out>) at /mariadb/11/storage/innobase/log/log0recv.cc:3802
      #7  __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>::operator()<buf_page_t**, buf_page_t**>(buf_page_t**, buf_page_t**) (__it1=<optimized out>, __it2=0x5578776b7ca0, this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/predefined_ops.h:158
      #8  std::__unguarded_partition<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=<optimized out>, __last=0x5578776b7fe8, __pivot=<optimized out>, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1867
      #9  std::__unguarded_partition_pivot<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1889
      #10 std::__introsort_loop<buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, long, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=__first@entry=0x5578776b7ca0, __last=0x5578776b7ff0, __depth_limit=11, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1921
      #11 0x0000557875a531ce in std::__sort<buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1> >(buf_page_t**, buf_page_t**, __gnu_cxx::__ops::_Iter_comp_iter<log_sort_flush_list()::$_1>) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:1937
      #12 std::sort<buf_page_t**, log_sort_flush_list()::$_1>(buf_page_t**, buf_page_t**, log_sort_flush_list()::$_1) (__first=0x5578776b7ca0, __last=0x5578776b7ff0, __comp=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4853
      

      The crash occurs in the following code, which was added in MDEV-27022:

      static void log_sort_flush_list()
      {
        mysql_mutex_lock(&buf_pool.flush_list_mutex);
       
        const size_t size= UT_LIST_GET_LEN(buf_pool.flush_list);
        std::unique_ptr<buf_page_t *[]> list(new buf_page_t *[size]);
       
        size_t idx= 0;
        for (buf_page_t *p= UT_LIST_GET_FIRST(buf_pool.flush_list); p;
             p= UT_LIST_GET_NEXT(list, p))
          list.get()[idx++]= p;
       
        std::sort(list.get(), list.get() + size,
                  [](const buf_page_t *lhs, const buf_page_t *rhs) {
                    return rhs->oldest_modification() < lhs->oldest_modification();
                  });
      

      Based on the core dump, it looks like the comparison function may have been invoked on the first out-of-bounds element list.get() + size.

      The size is equal to buf_pool.flush_list.count, and all elements in the array list.get() are nonnull. The address immediately after the last element (the second argument passed to std::sort()) contains a null pointer.

      To me, this looks like a possible error in the implementation of std::sort() in this version of libstdc+. The executable had been compiled with clang-15, but the option -stdlib=libc+ was not used.

      I think that more investigation will be needed on this. There is also other similar use of std::sort() in InnoDB.

      Attachments

        Issue Links

          Activity

            Another failure with the above buggy patch:

            CURRENT_TEST: innodb.recovery_memory
            mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
            …
            2023-07-27 12:34:36 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=311491401
            2023-07-27 12:34:36 0 [Note] InnoDB: End of log at LSN=315636465
            2023-07-27 12:34:36 0 [Note] InnoDB: To recover: 279 pages
            2023-07-27 12:34:36 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=239, page number=110]
            2023-07-27 12:34:36 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
            

            Hopefully this revised wait loop does the trick and ensures that no page writes are in progress while we sort the buf_pool.flush_list:

              for (;;)
              {
                os_aio_wait_until_no_pending_writes(false);
                mysql_mutex_lock(&buf_pool.flush_list_mutex);
                if (buf_pool.flush_list_active())
                  my_cond_wait(&buf_pool.done_flush_list,
                               &buf_pool.flush_list_mutex.m_mutex);
                else if (!os_aio_pending_writes())
                  break;
                mysql_mutex_unlock(&buf_pool.flush_list_mutex);
              }
            

            marko Marko Mäkelä added a comment - Another failure with the above buggy patch: CURRENT_TEST: innodb.recovery_memory mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB' … 2023-07-27 12:34:36 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=311491401 2023-07-27 12:34:36 0 [Note] InnoDB: End of log at LSN=315636465 2023-07-27 12:34:36 0 [Note] InnoDB: To recover: 279 pages 2023-07-27 12:34:36 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=239, page number=110] 2023-07-27 12:34:36 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. Hopefully this revised wait loop does the trick and ensures that no page writes are in progress while we sort the buf_pool.flush_list : for (;;) { os_aio_wait_until_no_pending_writes( false ); mysql_mutex_lock(&buf_pool.flush_list_mutex); if (buf_pool.flush_list_active()) my_cond_wait(&buf_pool.done_flush_list, &buf_pool.flush_list_mutex.m_mutex); else if (!os_aio_pending_writes()) break ; mysql_mutex_unlock(&buf_pool.flush_list_mutex); }

            Unfortunately, also with the revised patch I got a failure:

            innodb.recovery_memory 'innodb,release'  w59 [ 53 fail ]
                    Test ended at 2023-07-27 12:52:00
             
            CURRENT_TEST: innodb.recovery_memory
            mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
            …
            2023-07-27 12:51:59 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=280364554
            2023-07-27 12:51:59 0 [Note] InnoDB: End of log at LSN=284055215
            2023-07-27 12:51:59 0 [Note] InnoDB: To recover: 198 pages
            2023-07-27 12:51:59 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=218, page number=113]
            2023-07-27 12:51:59 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
            2023-07-27 12:51:59 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
            2023-07-27 12:51:59 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=218, page number=113]
            2023-07-27 12:51:59 0 [Note] sorting 194,194
            

            The kill+restart before this failure processed much fewer pages in the last batch:

            2023-07-27 12:51:55 0 [Note] sorting 11,11
            

            This means that the test is reproducing two independent bugs. I started one more campaign, hoping to produce an rr replay trace:

            while ./mtr --rr=-h --parallel=60 innodb.recovery_memory{,,,,,,,,,}{,,,,,}; do :; done
            

            marko Marko Mäkelä added a comment - Unfortunately, also with the revised patch I got a failure: innodb.recovery_memory 'innodb,release' w59 [ 53 fail ] Test ended at 2023-07-27 12:52:00   CURRENT_TEST: innodb.recovery_memory mysqltest: At line 50: query 'SHOW CREATE TABLE t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB' … 2023-07-27 12:51:59 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=280364554 2023-07-27 12:51:59 0 [Note] InnoDB: End of log at LSN=284055215 2023-07-27 12:51:59 0 [Note] InnoDB: To recover: 198 pages 2023-07-27 12:51:59 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=218, page number=113] 2023-07-27 12:51:59 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. 2023-07-27 12:51:59 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages. 2023-07-27 12:51:59 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=218, page number=113] 2023-07-27 12:51:59 0 [Note] sorting 194,194 The kill+restart before this failure processed much fewer pages in the last batch: 2023-07-27 12:51:55 0 [Note] sorting 11,11 This means that the test is reproducing two independent bugs. I started one more campaign, hoping to produce an rr replay trace: while . /mtr --rr=-h --parallel=60 innodb.recovery_memory{,,,,,,,,,}{,,,,,}; do :; done

            I failed to reproduce this with rr so far. I got the idea to test with simulated asynchronous I/O:

            ./mtr --parallel=100 --repeat=100 --mysqld=--innodb-use-native-aio=0 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,}
            

            This did fail as well, so a bug like MDEV-29610 should not play a role.

            innodb.recovery_memory 'innodb,release'  w87 [ 62 fail ]
                    Test ended at 2023-07-27 17:32:06
             
            CURRENT_TEST: innodb.recovery_memory
            mysqltest: At line 42: query 'CREATE TABLE t1(f1 INT NOT NULL)ENGINE=InnoDB' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
            …
            2023-07-27 17:32:05 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=324014657
            2023-07-27 17:32:05 0 [Note] InnoDB: Multi-batch recovery needed at LSN 324529178
            2023-07-27 17:32:05 0 [Note] InnoDB: End of log at LSN=325255352
            2023-07-27 17:32:05 0 [Note] InnoDB: To recover: LSN 324637270/325255352; 287 pages
            2023-07-27 17:32:06 0 [Note] InnoDB: To recover: LSN 325135744/325255352; 383 pages
            2023-07-27 17:32:06 0 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2398, page number=729743360], should be [page id: space=1, page number=556]
            

            marko Marko Mäkelä added a comment - I failed to reproduce this with rr so far. I got the idea to test with simulated asynchronous I/O: . /mtr --parallel=100 --repeat=100 --mysqld=--innodb-use-native-aio=0 innodb.recovery_memory{,,,,,,,,,}{,,,,,,,,,} This did fail as well, so a bug like MDEV-29610 should not play a role. innodb.recovery_memory 'innodb,release' w87 [ 62 fail ] Test ended at 2023-07-27 17:32:06   CURRENT_TEST: innodb.recovery_memory mysqltest: At line 42: query 'CREATE TABLE t1(f1 INT NOT NULL)ENGINE=InnoDB' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB' … 2023-07-27 17:32:05 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=324014657 2023-07-27 17:32:05 0 [Note] InnoDB: Multi-batch recovery needed at LSN 324529178 2023-07-27 17:32:05 0 [Note] InnoDB: End of log at LSN=325255352 2023-07-27 17:32:05 0 [Note] InnoDB: To recover: LSN 324637270/325255352; 287 pages 2023-07-27 17:32:06 0 [Note] InnoDB: To recover: LSN 325135744/325255352; 383 pages 2023-07-27 17:32:06 0 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=2398, page number=729743360], should be [page id: space=1, page number=556]

            I pushed a fix for the SIGSEGV issue (likely due to a data race with std::sort). The remaining failures will be addressed in MDEV-31791.

            marko Marko Mäkelä added a comment - I pushed a fix for the SIGSEGV issue (likely due to a data race with std::sort ). The remaining failures will be addressed in MDEV-31791 .

            mleich was able to reproduce this in a branch where the fix had not been merged yet:

            ssh pluto
            rr replay /data/results/1690831257/TBR-2024/1/rr/latest-trace/
            

            break std::sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> >
            continue
            break buf_page_write_complete
            ignore 2 1000
            continue
            info breakpoints
            

            Thread 1 hit Breakpoint 1, std::sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > (__last=0x5581033f8720, __first=0x5581033f8670, __comp=...)
                at /usr/include/c++/9/bits/stl_algo.h:4899
            4899	      std::__sort(__first, __last, __gnu_cxx::__ops::__iter_comp_iter(__comp));
            (rr) break buf_page_write_complete
            Breakpoint 2 at 0x5581000b83d1: buf_page_write_complete. (2 locations)
            (rr) ign 2 10000
            Will ignore next 10000 crossings of breakpoint 2.
            (rr) continue
            Continuing.
             
            Thread 1 received signal SIGSEGV, Segmentation fault.
            0x00005581006f1b82 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_seq_cst, this=0xd1) at /usr/include/c++/9/bits/atomic_base.h:413
            413	      load(memory_order __m = memory_order_seq_cst) const noexcept
            (rr) info breakpoints
            Num     Type           Disp Enb Address            What
            1       breakpoint     keep y   0x00005581006f4a55 in std::sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > at /usr/include/c++/9/bits/stl_algo.h:1962
            	breakpoint already hit 1 time
            2       breakpoint     keep y   <MULTIPLE>         
            	breakpoint already hit 22 times
            	ignore next 9978 hits
            2.1                         y   0x00005581000b83d1 in buf_page_write_complete(IORequest const&) at /data/Server/bb-11.2-MDEV-14795_1E/storage/innobase/buf/buf0flu.cc:337
            2.2                         y   0x00005581007cf270 in buf_page_write_complete(IORequest const&) at /data/Server/bb-11.2-MDEV-14795_1E/storage/innobase/buf/buf0flu.cc:321
            

            I observed one page for which oldest_modification_ was modified from 56630890 to 1 while the std::sort() was in progress.

            marko Marko Mäkelä added a comment - mleich was able to reproduce this in a branch where the fix had not been merged yet: ssh pluto rr replay /data/results/1690831257/TBR-2024/1/rr/latest-trace/ break std::sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > continue break buf_page_write_complete ignore 2 1000 continue info breakpoints Thread 1 hit Breakpoint 1, std::sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > (__last=0x5581033f8720, __first=0x5581033f8670, __comp=...) at /usr/include/c++/9/bits/stl_algo.h:4899 4899 std::__sort(__first, __last, __gnu_cxx::__ops::__iter_comp_iter(__comp)); (rr) break buf_page_write_complete Breakpoint 2 at 0x5581000b83d1: buf_page_write_complete. (2 locations) (rr) ign 2 10000 Will ignore next 10000 crossings of breakpoint 2. (rr) continue Continuing.   Thread 1 received signal SIGSEGV, Segmentation fault. 0x00005581006f1b82 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_seq_cst, this=0xd1) at /usr/include/c++/9/bits/atomic_base.h:413 413 load(memory_order __m = memory_order_seq_cst) const noexcept (rr) info breakpoints Num Type Disp Enb Address What 1 breakpoint keep y 0x00005581006f4a55 in std::sort<buf_page_t**, log_sort_flush_list()::<lambda(const buf_page_t*, const buf_page_t*)> > at /usr/include/c++/9/bits/stl_algo.h:1962 breakpoint already hit 1 time 2 breakpoint keep y <MULTIPLE> breakpoint already hit 22 times ignore next 9978 hits 2.1 y 0x00005581000b83d1 in buf_page_write_complete(IORequest const&) at /data/Server/bb-11.2-MDEV-14795_1E/storage/innobase/buf/buf0flu.cc:337 2.2 y 0x00005581007cf270 in buf_page_write_complete(IORequest const&) at /data/Server/bb-11.2-MDEV-14795_1E/storage/innobase/buf/buf0flu.cc:321 I observed one page for which oldest_modification_ was modified from 56630890 to 1 while the std::sort() was in progress.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.