MariaDB Server / MDEV-31817

SIGSEGV after btr_page_get_father_block() returns nullptr on corrupted data

Details

    Description

      #!/bin/bash
      # Start server with --max_connections=10000
      # Set variables and ensure ramloc is a ramdisk or tmpfs (i.e. /dev/shm)
       
      user="root"
      socket="./socket.sock"
      db="test"
      client="./bin/mariadb"
      errorlog="./log/master.err"
      ramloc="/dev/shm"
      threads=2000   # Number of concurrent threads
      queries=100    # Number of t1/t2 INSERTs per thread/per test round
      rounds=999999  # Number of max test rounds
       
      # Setup
      ${client} -u ${user} -S ${socket} -D ${db} -e "
      DROP TABLE IF EXISTS t1;
      DROP TABLE IF EXISTS t2;
      CREATE TABLE t1 (c1 INT NOT NULL AUTO_INCREMENT, c2 INT NOT NULL, PRIMARY KEY (c1), UNIQUE KEY u1 (c1,c2)) ENGINE=InnoDB AUTO_INCREMENT=1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4; 
      CREATE TABLE t2 (c1 DATETIME NOT NULL, c2 DOUBLE NOT NULL, t1_c1 INT NOT NULL, PRIMARY KEY (t1_c1,c1)) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
      "
       
      insert_rows(){
        SQL=
        for ((i=0;i<${queries};i++)); do
          SQL="${SQL}INSERT INTO t1 (c2) VALUES (0); INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()));"
        done
        ${client} -u ${user} -S ${socket} -D ${db} -e "${SQL}"
        rm -f ${ramloc}/${prefix}_md_proc_${1}  # Thread done
      }
       
      abort(){ jobs -p | xargs -P100 kill >/dev/null 2>&1; rm -Rf ${ramloc}/${prefix}_md_proc_*; exit 1; }
      trap abort SIGINT
       
      count=0
      prefix="$(echo "${RANDOM}${RANDOM}${RANDOM}" | cut -b1-5)"
      rm -f ${ramloc}/${prefix}_md_proc_*
      for ((round=0;round<${rounds};round++)); do  # Separate counter: the inner loop reuses i
        for ((i=0;i<${threads};i++)); do
          if [ ! -r ${ramloc}/${prefix}_md_proc_${i} ]; then  # Thread idle
            touch ${ramloc}/${prefix}_md_proc_${i}  # Thread busy
            insert_rows ${i} &
            if [ $[ ${count} % 1000 ] -eq 0 ]; then
              ${client} -u ${user} -S ${socket} -D ${db} -e "TRUNCATE TABLE t1;"
            fi
            count=$[ ${count} + 1 ]
            if [ $[ ${count} % 100 ] -eq 0 ]; then  # Limit disk I/O, check once every new 100 threads
              echo "Count: ${count}" | tee lastcount.log
              TAIL="$(tail -n10 ${errorlog} | tr -d '\n')"
              if [[ "${TAIL}" == *"ERROR"* ]]; then
                echo '*** Error found:'
                grep -i 'ERROR' ${errorlog}
                abort
              elif [[ "${TAIL}" == *"down complete"* ]]; then
                echo '*** Server shutdown'
                abort
              elif ! ${client}-admin ping -u ${user} -S ${socket} > /dev/null 2>&1; then
                echo '*** Server gone (killed/crashed)'
                abort
              fi
            fi
          fi
        done
      done
      

      Leads to:

      10.6 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

      Version: '10.6.15-MariaDB'  socket: '/test/bb-10.6-primary-corruption_MD260723-mariadb-10.6.15-linux-x86_64-opt/socket.sock'  port: 11190  MariaDB Server
      2023-07-31 11:55:08 1508 [Note] InnoDB: Number of pools: 2
      2023-07-31 20:53:10 997643 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      

      10.6 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

      Core was generated by `/test/bb-10.6-primary-corruption_MD260723-mariadb-10.6.15-linux-x86_64-opt/bin/'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x00005608ae1df729 in rec_offs_n_fields (offsets=<optimized out>)
          at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h:605
      [Current thread is 1 (Thread 0x14ded65de640 (LWP 3098102))]
      (gdb) bt
      #0  0x00005608ae1df729 in rec_offs_n_fields (offsets=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h:605
      #1  rec_offs_data_size (offsets=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.inl:968
      #2  btr_node_ptr_set_child_page_no (mtr=<optimized out>, page_no=<optimized out>, offsets=<optimized out>, rec=<optimized out>, block=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0btr.cc:732
      #3  btr_attach_half_pages (mtr=0x14ded65db730, direction=112, new_block=0x14e2f4081140, split_rec=0x14e2fa4d320d "", block=0x14e2f40f7f20, index=0x14e0ec0b9ba8, flags=0) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0btr.cc:2547
      #4  btr_page_split_and_insert (flags=flags@entry=0, cursor=cursor@entry=0x14ded65daee0, offsets=offsets@entry=0x14ded65dae70, heap=heap@entry=0x14ded65dae68, tuple=tuple@entry=0x14e04407edb8, n_ext=0, mtr=0x14ded65db730, err=0x14ded65dad80) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0btr.cc:3120
      #5  0x00005608ae1f48c2 in btr_cur_pessimistic_insert (flags=flags@entry=0, cursor=0x14ded65daee0, offsets=offsets@entry=0x14ded65dae70, heap=0x14ded65dae68, entry=entry@entry=0x14e04407edb8, rec=0x14ded65daf70, big_rec=0x14ded65db2b0, n_ext=<optimized out>, thr=0x14e044151670, mtr=0x14ded65db730) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0cur.cc:2689
      #6  0x00005608ae1762cf in row_ins_sec_index_entry_low (flags=<optimized out>, mode=<optimized out>, index=0x14e0ec0b9ba8, offsets_heap=<optimized out>, heap=<optimized out>, entry=0x14e04407edb8, trx_id=<optimized out>, thr=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3143
      #7  0x00005608ae179055 in row_ins_sec_index_entry (index=0x14e0ec0b9ba8, entry=0x14e04407edb8, thr=0x14e044151670, check_foreign=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3333
      #8  0x00005608ae17939a in row_ins_index_entry (thr=0x14e044151670, entry=<optimized out>, index=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3367
      #9  row_ins_index_entry_step (thr=0x14e044151670, node=0x14e044151448) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3533
      #10 row_ins (thr=0x14e044151670, node=0x14e044151448) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3658
      #11 row_ins_step (thr=thr@entry=0x14e044151670) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3787
      #12 0x00005608ae1873ef in row_insert_for_mysql (mysql_rec=<optimized out>, prebuilt=0x14e044150f58, ins_mode=ROW_INS_NORMAL) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0mysql.cc:1314
      #13 0x00005608ae0e29aa in ha_innobase::write_row (this=0x14e0441846b0, record=0x14e0e40a7488 "\377d4") at /test/bb-10.6-primary-corruption_opt/storage/innobase/handler/ha_innodb.cc:7917
      #14 0x00005608ade32930 in handler::ha_write_row (this=0x14e0441846b0, buf=0x14e0e40a7488 "\377d4") at /test/bb-10.6-primary-corruption_opt/sql/handler.cc:7629
      #15 0x00005608adbb5c42 in write_record (thd=thd@entry=0x14e1c808f968, table=table@entry=0x14e0e416e558, info=info@entry=0x14ded65dc440, sink=sink@entry=0x0) at /test/bb-10.6-primary-corruption_opt/sql/sql_insert.cc:2157
      #16 0x00005608adbbce8f in mysql_insert (thd=thd@entry=0x14e1c808f968, table_list=<optimized out>, fields=@0x14e1c8094860: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e1c809fd40, last = 0x14e1c809fd40, elements = 1}, <No data fields>}, values_list=@0x14e1c80948a8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e1c80a02b0, last = 0x14e1c80a02b0, elements = 1}, <No data fields>}, update_fields=@0x14e1c8094890: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x5608aec9c050 <end_of_list>, last = 0x14e1c8094890, elements = 0}, <No data fields>}, update_values=@0x14e1c8094878: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x5608aec9c050 <end_of_list>, last = 0x14e1c8094878, elements = 0}, <No data fields>}, duplic=<optimized out>, ignore=<optimized out>, result=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_insert.cc:1129
      #17 0x00005608adbf16a1 in mysql_execute_command (thd=0x14e1c808f968, is_called_from_prepared_stmt=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:4570
      #18 0x00005608adbf5dd4 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x14e1c808f968) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:8041
      #19 mysql_parse (thd=0x14e1c808f968, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:7963
      #20 0x00005608adbf8422 in dispatch_command (command=COM_QUERY, thd=0x14e1c808f968, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:1993
      #21 0x00005608adbf9ca0 in do_command (thd=0x14e1c808f968, blocking=blocking@entry=true) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:1409
      #22 0x00005608adcff827 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5608b0d7d3f8, put_in_cache=put_in_cache@entry=true) at /test/bb-10.6-primary-corruption_opt/sql/sql_connect.cc:1416
      #23 0x00005608adcffafd in handle_one_connection (arg=0x5608b0d7d3f8) at /test/bb-10.6-primary-corruption_opt/sql/sql_connect.cc:1318
      #24 0x000014e309694b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #25 0x000014e309726a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      

          Activity

            Can you please try to reproduce a crash earlier with the following patch:

            diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
            index 98b76c34ff5..a9f3c1eb769 100644
            --- a/storage/innobase/row/row0mysql.cc
            +++ b/storage/innobase/row/row0mysql.cc
            @@ -706,7 +706,7 @@ row_mysql_handle_errors(
             	case DB_TABLE_CORRUPT:
             	case DB_CORRUPTION:
             	case DB_PAGE_CORRUPTED:
            -		ib::error() << "We detected index corruption in an InnoDB type"
            +		ib::fatal() << "We detected index corruption in an InnoDB type"
             			" table. You have to dump + drop + reimport the"
             			" table or, in a case of widespread corruption,"
             			" dump all InnoDB tables and recreate the whole"
            

            (Comment above added by marko, Marko Mäkelä.)
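The later backtrace in this report shows `ib::fatal::~fatal` calling `abort()`, which is why switching `ib::error()` to `ib::fatal()` stops the server at the first corruption report. The following is only a minimal sketch of that destructor-driven pattern. The class names `log_line`, `error_line` and `fatal_line` are hypothetical and are not the InnoDB `ib::` classes, and the stop action is injectable here so the behaviour can be observed without killing the process.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stream-style logger: the message is emitted when the
// temporary object is destroyed at the end of the full expression.
class log_line
{
public:
  explicit log_line(std::function<void(const std::string&)> on_flush)
    : on_flush_(std::move(on_flush)) {}
  ~log_line() { on_flush_(buf_.str()); }

  template <typename T>
  log_line &operator<<(const T &value) { buf_ << value; return *this; }

private:
  std::ostringstream buf_;
  std::function<void(const std::string&)> on_flush_;
};

// Error-style line: report the message and let the caller continue.
struct error_line : log_line
{
  explicit error_line(std::string *sink)
    : log_line([sink](const std::string &m) { *sink = "[ERROR] " + m; }) {}
};

// Fatal-style line: report the message, then run a stop action.
// A real server would pass std::abort as the stop action.
struct fatal_line : log_line
{
  fatal_line(std::string *sink, std::function<void()> stop)
    : log_line([sink, stop](const std::string &m) {
        *sink = "[FATAL] " + m;
        stop();
      }) {}
};
```

With an error-style line, execution continues into code that may later dereference a bad pointer. With a fatal-style line, the process stops at the point where the corruption is first reported, which tends to give a more useful core dump.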
            Roel Roel Van de Paar added a comment (edited)

            Core + mariadbd + ldd files (200M). Please note that the upload contains an extra 'myarchive.tar.xz' file with a (likely partial) core; it should be ignored. Note that this core predates the patch above.
            https://drive.google.com/file/d/11ej-NeHxvgh_h6pTntxRD2bt11UAGYGM/view?usp=drive_link


            There is no usable high-level debug information, so I have to resort to checking this at the machine instruction level:

            10.6 b102872ad50cce5959ad95369740766d14e9e48c

            (gdb) frame 0
            #0  0x00005608ae1df729 in rec_offs_n_fields (offsets=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h:605
            605	in /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h
            (gdb) disassemble
            …
               0x00005608ae1df716 <+8198>:	call   0x5608ae1d9500 <btr_page_get_father_block(rec_offs*, mem_heap_t*, mtr_t*, btr_cur_t*)>
               0x00005608ae1df71b <+8203>:	mov    -0xa8(%rbp),%r9
               0x00005608ae1df722 <+8210>:	mov    -0x138(%rbp),%rcx
            => 0x00005608ae1df729 <+8217>:	movzwl 0x2(%rax),%edx
               0x00005608ae1df72d <+8221>:	mov    -0xb8(%rbp),%rsi
               0x00005608ae1df734 <+8228>:	movbe  (%rcx),%edi
               0x00005608ae1df738 <+8232>:	mov    (%rcx),%ecx
               0x00005608ae1df73a <+8234>:	movzwl 0x4(%rax,%rdx,2),%edx
               0x00005608ae1df73f <+8239>:	and    $0x3fff,%edx
               0x00005608ae1df745 <+8245>:	cmpq   $0x0,0x38(%r9)
               0x00005608ae1df74a <+8250>:	jne    0x5608ae1e0039 <_Z25btr_page_split_and_insertmP9btr_cur_tPPtPP16mem_block_info_tPK8dtuple_tmP5mtr_tP7dberr_t+10537>
            …
            

            At the time of the crash, the rax register (the return value from btr_page_get_father_block()) is 0. Most of the x86-64-v2 instructions quoted above that follow the call correspond to the following statement in rec_offs_data_size():

            	size = get_value(rec_offs_base(offsets)[rec_offs_n_fields(offsets)]);
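To make the mapping between the statement and the faulting instruction concrete, here is a simplified model of the offsets array. It is a sketch only: the header size of 2 elements and the `0x3fff` mask are inferred from the quoted disassembly (`0x2(%rax)`, `0x4(%rax,%rdx,2)`, `and $0x3fff`), not copied from `rem0rec.h`, and `data_size_checked` is a hypothetical guarded variant that does not exist in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model: offsets are 16-bit elements, as the movzwl loads suggest.
using offs_t = std::uint16_t;
constexpr std::size_t HEADER_SIZE = 2;  // base starts at byte offset 4
constexpr offs_t VALUE_MASK = 0x3fff;   // matches `and $0x3fff,%edx`

// Mirrors `movzwl 0x2(%rax),%edx`: the field count is the second element.
// With offsets == nullptr this reads address 2, which is the SIGSEGV.
inline std::size_t n_fields(const offs_t *offsets) { return offsets[1]; }

// Mirrors `movzwl 0x4(%rax,%rdx,2),%edx` followed by the mask.
inline std::size_t data_size(const offs_t *offsets)
{
  return offsets[HEADER_SIZE + n_fields(offsets)] & VALUE_MASK;
}

// Hypothetical guarded variant: report failure instead of reading
// through a null pointer.
inline bool data_size_checked(const offs_t *offsets, std::size_t *size)
{
  if (!offsets)
    return false;
  *size = data_size(offsets);
  return true;
}
```

In this model the first memory access after the call is the read of `offsets[1]`, so a null return value faults at exactly the instruction marked with `=>` in the listing.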
            

            It looks like we are missing the necessary handling for errors that were returned for corrupted data by btr_page_get_father_block(). Please try the following patch:

            diff --git a/storage/innobase/btr/btr0btr.cc b/storage/innobase/btr/btr0btr.cc
            index ee2f8d00857..24848186d67 100644
            --- a/storage/innobase/btr/btr0btr.cc
            +++ b/storage/innobase/btr/btr0btr.cc
            @@ -844,14 +844,14 @@ static rec_offs *btr_page_get_parent(rec_offs *offsets, mem_heap_t *heap,
                     if (page_cur_search_with_match(tuple, PAGE_CUR_LE, &up_match,
                                                    &low_match, &cursor->page_cur,
                                                    nullptr))
            -          return nullptr;
            +          ut_a(1 < 0);
                     offsets= rec_get_offsets(cursor->page_cur.rec, index, offsets, 0,
                                              ULINT_UNDEFINED, &heap);
                     p= btr_node_ptr_get_child_page_no(cursor->page_cur.rec, offsets);
                     if (p != page_no)
                     {
                       if (btr_page_get_level(block->page.frame) == level)
            -            return nullptr;
            +            ut_a(2 < 0);
                       i= 0; // MDEV-29835 FIXME: require all pages to be latched in order!
                       continue;
                     }
            @@ -867,6 +867,7 @@ static rec_offs *btr_page_get_parent(rec_offs *offsets, mem_heap_t *heap,
                     return offsets;
                   }
             
            +  ut_a(3 < 0);
               return nullptr;
             }
             
            @@ -886,8 +887,7 @@ btr_page_get_father_block(
             {
               rec_t *rec=
                 page_rec_get_next(page_get_infimum_rec(cursor->block()->page.frame));
            -  if (UNIV_UNLIKELY(!rec))
            -    return nullptr;
            +  ut_a(rec);
               cursor->page_cur.rec= rec;
               return btr_page_get_parent(offsets, heap, cursor, mtr);
             }
            

            Because the test involves ROW_FORMAT=COMPRESSED, I think that these corruptions are somehow related to MDEV-31574. To test that better, I would suggest enabling the page_zip_validate() checks with the following, and filing separate bugs for any failures:

            cmake -DWITH_INNODB_EXTRA_DEBUG=ON .
            

            We must implement proper checking of btr_page_get_father_block() return values, to avoid crashes on corruptions that were detected at that level.
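The kind of check being asked for can be sketched as follows. This is not the actual fix: `err_t`, `find_parent` and `attach_half_pages_sketch` are hypothetical stand-ins, and the only point is that the caller tests the returned pointer and passes a corruption error upwards instead of dereferencing it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical error codes for this sketch.
enum class err_t { SUCCESS, CORRUPTION };

using offs_t = std::uint16_t;

// Stand-in for a parent lookup that returns nullptr when the tree
// is found to be corrupted.
inline const offs_t *find_parent(const offs_t *parent_offsets, bool corrupted)
{
  return corrupted ? nullptr : parent_offsets;
}

// The caller checks the result before using it and reports the failure
// to its own caller rather than crashing.
inline err_t attach_half_pages_sketch(const offs_t *parent_offsets,
                                      bool corrupted, unsigned *fields)
{
  const offs_t *offsets = find_parent(parent_offsets, corrupted);
  if (!offsets)
    return err_t::CORRUPTION;
  *fields = offsets[1];  // safe: offsets is known to be non-null here
  return err_t::SUCCESS;
}
```

Every function between the lookup and the SQL layer would need to forward the error in the same way, so that the statement fails with a corruption error rather than the server receiving SIGSEGV.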

            (Comment above added by marko, Marko Mäkelä.)

            Outcome of the test run from before the last comment, i.e. with only this patch applied:

            -		ib::error() << "We detected index corruption in an InnoDB type"
            +		ib::fatal() << "We detected index corruption in an InnoDB type"
            

            Everything else was identical to the original setup. The branch was bb-10.6-primary-corruption with the patch applied.

            10.6.15 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

            Version: '10.6.15-MariaDB'  socket: '/test/bb-10.6-primary-corruption-MD010823-mariadb-10.6.15-linux-x86_64-opt/socket.sock'  port: 10126  MariaDB Server
            2023-08-01 18:50:52 1585 [Note] InnoDB: Number of pools: 2
            2023-08-01 21:07:19 257654 [ERROR] [FATAL] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
            2023-08-01 21:07:19 0x14e1db152640  InnoDB: Assertion failure in file /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/page/page0zip.cc line 4211
            InnoDB: Failing assertion: slot_rec
            InnoDB: We intentionally generate a memory trap.
            InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            InnoDB: If you get repeated assertion failures or crashes, even
            InnoDB: immediately after the mariadbd startup, there may be
            InnoDB: corruption in the InnoDB tablespace. Please refer to
            InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
            InnoDB: about forcing recovery.
            

            10.6.15 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

            Core was generated by `/test/bb-10.6-primary-corruption-MD010823-mariadb-10.6.15-linux-x86_64-opt/bin/'.
            Program terminated with signal SIGABRT, Aborted.
            #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22960232478272)
                at ./nptl/pthread_kill.c:44
            [Current thread is 1 (Thread 0x14e1d8803640 (LWP 882075))]
            (gdb) bt
            #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22960232478272) at ./nptl/pthread_kill.c:44
            #1  __pthread_kill_internal (signo=6, threadid=22960232478272) at ./nptl/pthread_kill.c:78
            #2  __GI___pthread_kill (threadid=22960232478272, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
            #3  0x000014e609242476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
            #4  0x000014e6092287f3 in __GI_abort () at ./stdlib/abort.c:79
            #5  0x000056298746f5c3 in ib::fatal::~fatal (this=this@entry=0x14e1d8800dc0, __in_chrg=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/ut/ut0ut.cc:515
            #6  0x0000562987461a0d in row_mysql_handle_errors (new_err=0x14e1d8800fec, trx=0x14e60898f280, thr=<optimized out>, savept=0x14e1d8800ff0) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/row/row0mysql.cc:709
            #7  0x0000562987b43796 in row_insert_for_mysql (mysql_rec=<optimized out>, prebuilt=0x14e2a423db88, ins_mode=ROW_INS_NORMAL) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/row/row0mysql.cc:1325
            #8  0x0000562987a9ea4a in ha_innobase::write_row (this=0x14e2a411ccb0, record=0x14e2a41a4718 "\377\254\005") at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/handler/ha_innodb.cc:7917
            #9  0x00005629877ee930 in handler::ha_write_row (this=0x14e2a411ccb0, buf=0x14e2a41a4718 "\377\254\005") at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/handler.cc:7629
            #10 0x0000562987571c42 in write_record (thd=thd@entry=0x14e2a4093488, table=table@entry=0x14e2a4211f28, info=info@entry=0x14e1d8801440, sink=sink@entry=0x0) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_insert.cc:2157
            #11 0x0000562987578e8f in mysql_insert (thd=thd@entry=0x14e2a4093488, table_list=<optimized out>, fields=@0x14e2a4098380: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e2a40a24f0, last = 0x14e2a40a24f0, elements = 1}, <No data fields>}, values_list=@0x14e2a40983c8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e2a40a2a60, last = 0x14e2a40a2a60, elements = 1}, <No data fields>}, update_fields=@0x14e2a40983b0: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x562988659050 <end_of_list>, last = 0x14e2a40983b0, elements = 0}, <No data fields>}, update_values=@0x14e2a4098398: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x562988659050 <end_of_list>, last = 0x14e2a4098398, elements = 0}, <No data fields>}, duplic=<optimized out>, ignore=<optimized out>, result=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_insert.cc:1129
            #12 0x00005629875ad6a1 in mysql_execute_command (thd=0x14e2a4093488, is_called_from_prepared_stmt=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:4570
            #13 0x00005629875b1dd4 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x14e2a4093488) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:8041
            #14 mysql_parse (thd=0x14e2a4093488, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:7963
            #15 0x00005629875b4422 in dispatch_command (command=COM_QUERY, thd=0x14e2a4093488, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:1993
            #16 0x00005629875b5ca0 in do_command (thd=0x14e2a4093488, blocking=blocking@entry=true) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:1409
            #17 0x00005629876bb827 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5629898aa588, put_in_cache=put_in_cache@entry=true) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_connect.cc:1416
            #18 0x00005629876bbafd in handle_one_connection (arg=0x5629898aa588) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_connect.cc:1318
            #19 0x000014e609294b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
            #20 0x000014e609326a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
            

            Core + mariadbd + ldd files (126M):
            https://drive.google.com/file/d/1408q1RHxXWqKyeFs7ELi-n7vTFjiNuhI/view?usp=sharing

            (Comment above added by Roel, Roel Van de Paar.)

            Issue is rr-elusive (verified)

            (Comment above added by Roel, Roel Van de Paar.)

            Roel, to find out why btr_page_get_father_block() would return nullptr, it would have been very helpful if you had also applied the second patch. Without such a run, I can only ensure that a crash will be avoided when nullptr is returned.

            (Comment above added by marko, Marko Mäkelä.)

            Roel, please test this commit that adds some log messages to better identify the root cause of the corruption. You may want to add abort(); calls after each added message, or set breakpoints, so that we will get a little more information straight from the debugger.

            (Comment above added by marko, Marko Mäkelä.)
            Roel Roel Van de Paar added a comment (edited)

            After discussing with marko, I retested using 963a88be2e89100260e09e46e90e2b3271444899 in 10.6-MDEV-32371. Outcome:
            Opt: Count: 1590600, no failures
            Dbg: Count: 154200, no failures


            People

              marko Marko Mäkelä
              Roel Roel Van de Paar