[MDEV-31817] SIGSEGV after btr_page_get_father_block() returns nullptr on corrupted data Created: 2023-08-01  Updated: 2023-12-07  Resolved: 2023-11-30

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, multi-thread, rr-elusive, sporadic

Issue Links:
Relates
relates to MDEV-32174 ROW_FORMAT=COMPRESSED table corruptio... Confirmed

 Description   

#!/bin/bash
# Start server with --max_connections=10000
# Set variables and ensure ramloc is a ramdisk or tmpfs (i.e. /dev/shm)
 
user="root"
socket="./socket.sock"
db="test"
client="./bin/mariadb"
errorlog="./log/master.err"
ramloc="/dev/shm"
threads=2000   # Number of concurrent threads
queries=100    # Number of t1/t2 INSERTs per thread/per test round
rounds=999999  # Number of max test rounds
 
# Setup
${client} -u ${user} -S ${socket} -D ${db} -e "
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t1 (c1 INT NOT NULL AUTO_INCREMENT, c2 INT NOT NULL, PRIMARY KEY (c1), UNIQUE KEY u1 (c1,c2)) ENGINE=InnoDB AUTO_INCREMENT=1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4; 
CREATE TABLE t2 (c1 DATETIME NOT NULL, c2 DOUBLE NOT NULL, t1_c1 INT NOT NULL, PRIMARY KEY (t1_c1,c1)) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
"
 
insert_rows(){
  SQL=
  for ((i=0;i<${queries};i++)); do
    SQL="${SQL}INSERT INTO t1 (c2) VALUES (0); INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()));"
  done
  ${client} -u ${user} -S ${socket} -D ${db} -e "${SQL}"
  rm -f ${ramloc}/${prefix}_md_proc_${1}  # Thread done
}
 
abort(){ jobs -p | xargs -P100 kill >/dev/null 2>&1; rm -Rf ${ramloc}/${prefix}_md_proc_*; exit 1; }
trap abort SIGINT
 
count=0
prefix="$(echo "${RANDOM}${RANDOM}${RANDOM}" | cut -b1-5)"
rm -f ${ramloc}/${prefix}_md_proc_*
for ((r=0;r<${rounds};r++)); do  # Separate round counter: the inner loop reuses i
  for ((i=0;i<${threads};i++)); do
    if [ ! -r ${ramloc}/${prefix}_md_proc_${i} ]; then  # Thread idle
      touch ${ramloc}/${prefix}_md_proc_${i}  # Thread busy
      insert_rows ${i} &
      if [ $(( count % 1000 )) -eq 0 ]; then
        ${client} -u ${user} -S ${socket} -D ${db} -e "TRUNCATE TABLE t1;"
      fi
      count=$(( count + 1 ))
      if [ $(( count % 100 )) -eq 0 ]; then  # Limit disk I/O, check once every new 100 threads
        echo "Count: ${count}" | tee lastcount.log
        TAIL="$(tail -n10 ${errorlog} | tr -d '\n')"
        if [[ "${TAIL}" == *"ERROR"* ]]; then
          echo '*** Error found:'
          grep -i 'ERROR' ${errorlog}
          abort
        elif [[ "${TAIL}" == *"down complete"* ]]; then
          echo '*** Server shutdown'
          abort
        elif ! ${client}-admin ping -u ${user} -S ${socket} > /dev/null 2>&1; then
          echo '*** Server gone (killed/crashed)'
          abort
        fi
      fi
    fi
  done
done

Leads to:

10.6 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

Version: '10.6.15-MariaDB'  socket: '/test/bb-10.6-primary-corruption_MD260723-mariadb-10.6.15-linux-x86_64-opt/socket.sock'  port: 11190  MariaDB Server
2023-07-31 11:55:08 1508 [Note] InnoDB: Number of pools: 2
2023-07-31 20:53:10 997643 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

10.6 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

Core was generated by `/test/bb-10.6-primary-corruption_MD260723-mariadb-10.6.15-linux-x86_64-opt/bin/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005608ae1df729 in rec_offs_n_fields (offsets=<optimized out>)
    at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h:605
[Current thread is 1 (Thread 0x14ded65de640 (LWP 3098102))]
(gdb) bt
#0  0x00005608ae1df729 in rec_offs_n_fields (offsets=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h:605
#1  rec_offs_data_size (offsets=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.inl:968
#2  btr_node_ptr_set_child_page_no (mtr=<optimized out>, page_no=<optimized out>, offsets=<optimized out>, rec=<optimized out>, block=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0btr.cc:732
#3  btr_attach_half_pages (mtr=0x14ded65db730, direction=112, new_block=0x14e2f4081140, split_rec=0x14e2fa4d320d "", block=0x14e2f40f7f20, index=0x14e0ec0b9ba8, flags=0) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0btr.cc:2547
#4  btr_page_split_and_insert (flags=flags@entry=0, cursor=cursor@entry=0x14ded65daee0, offsets=offsets@entry=0x14ded65dae70, heap=heap@entry=0x14ded65dae68, tuple=tuple@entry=0x14e04407edb8, n_ext=0, mtr=0x14ded65db730, err=0x14ded65dad80) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0btr.cc:3120
#5  0x00005608ae1f48c2 in btr_cur_pessimistic_insert (flags=flags@entry=0, cursor=0x14ded65daee0, offsets=offsets@entry=0x14ded65dae70, heap=0x14ded65dae68, entry=entry@entry=0x14e04407edb8, rec=0x14ded65daf70, big_rec=0x14ded65db2b0, n_ext=<optimized out>, thr=0x14e044151670, mtr=0x14ded65db730) at /test/bb-10.6-primary-corruption_opt/storage/innobase/btr/btr0cur.cc:2689
#6  0x00005608ae1762cf in row_ins_sec_index_entry_low (flags=<optimized out>, mode=<optimized out>, index=0x14e0ec0b9ba8, offsets_heap=<optimized out>, heap=<optimized out>, entry=0x14e04407edb8, trx_id=<optimized out>, thr=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3143
#7  0x00005608ae179055 in row_ins_sec_index_entry (index=0x14e0ec0b9ba8, entry=0x14e04407edb8, thr=0x14e044151670, check_foreign=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3333
#8  0x00005608ae17939a in row_ins_index_entry (thr=0x14e044151670, entry=<optimized out>, index=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3367
#9  row_ins_index_entry_step (thr=0x14e044151670, node=0x14e044151448) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3533
#10 row_ins (thr=0x14e044151670, node=0x14e044151448) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3658
#11 row_ins_step (thr=thr@entry=0x14e044151670) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0ins.cc:3787
#12 0x00005608ae1873ef in row_insert_for_mysql (mysql_rec=<optimized out>, prebuilt=0x14e044150f58, ins_mode=ROW_INS_NORMAL) at /test/bb-10.6-primary-corruption_opt/storage/innobase/row/row0mysql.cc:1314
#13 0x00005608ae0e29aa in ha_innobase::write_row (this=0x14e0441846b0, record=0x14e0e40a7488 "\377d4") at /test/bb-10.6-primary-corruption_opt/storage/innobase/handler/ha_innodb.cc:7917
#14 0x00005608ade32930 in handler::ha_write_row (this=0x14e0441846b0, buf=0x14e0e40a7488 "\377d4") at /test/bb-10.6-primary-corruption_opt/sql/handler.cc:7629
#15 0x00005608adbb5c42 in write_record (thd=thd@entry=0x14e1c808f968, table=table@entry=0x14e0e416e558, info=info@entry=0x14ded65dc440, sink=sink@entry=0x0) at /test/bb-10.6-primary-corruption_opt/sql/sql_insert.cc:2157
#16 0x00005608adbbce8f in mysql_insert (thd=thd@entry=0x14e1c808f968, table_list=<optimized out>, fields=@0x14e1c8094860: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e1c809fd40, last = 0x14e1c809fd40, elements = 1}, <No data fields>}, values_list=@0x14e1c80948a8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e1c80a02b0, last = 0x14e1c80a02b0, elements = 1}, <No data fields>}, update_fields=@0x14e1c8094890: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x5608aec9c050 <end_of_list>, last = 0x14e1c8094890, elements = 0}, <No data fields>}, update_values=@0x14e1c8094878: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x5608aec9c050 <end_of_list>, last = 0x14e1c8094878, elements = 0}, <No data fields>}, duplic=<optimized out>, ignore=<optimized out>, result=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_insert.cc:1129
#17 0x00005608adbf16a1 in mysql_execute_command (thd=0x14e1c808f968, is_called_from_prepared_stmt=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:4570
#18 0x00005608adbf5dd4 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x14e1c808f968) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:8041
#19 mysql_parse (thd=0x14e1c808f968, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:7963
#20 0x00005608adbf8422 in dispatch_command (command=COM_QUERY, thd=0x14e1c808f968, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:1993
#21 0x00005608adbf9ca0 in do_command (thd=0x14e1c808f968, blocking=blocking@entry=true) at /test/bb-10.6-primary-corruption_opt/sql/sql_parse.cc:1409
#22 0x00005608adcff827 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5608b0d7d3f8, put_in_cache=put_in_cache@entry=true) at /test/bb-10.6-primary-corruption_opt/sql/sql_connect.cc:1416
#23 0x00005608adcffafd in handle_one_connection (arg=0x5608b0d7d3f8) at /test/bb-10.6-primary-corruption_opt/sql/sql_connect.cc:1318
#24 0x000014e309694b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#25 0x000014e309726a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81



 Comments   
Comment by Marko Mäkelä [ 2023-08-01 ]

Can you please try to reproduce this with the following patch, so that the server crashes earlier, as soon as the corruption is reported:

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 98b76c34ff5..a9f3c1eb769 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -706,7 +706,7 @@ row_mysql_handle_errors(
 	case DB_TABLE_CORRUPT:
 	case DB_CORRUPTION:
 	case DB_PAGE_CORRUPTED:
-		ib::error() << "We detected index corruption in an InnoDB type"
+		ib::fatal() << "We detected index corruption in an InnoDB type"
 			" table. You have to dump + drop + reimport the"
 			" table or, in a case of widespread corruption,"
 			" dump all InnoDB tables and recreate the whole"

Comment by Roel Van de Paar [ 2023-08-01 ]

Core + mariadbd + ldd files (200M). Please note that the upload also contains an extra 'myarchive.tar.xz' file with a (likely partial) core; it should be ignored. Note: this run was made before the above patch was applied.
https://drive.google.com/file/d/11ej-NeHxvgh_h6pTntxRD2bt11UAGYGM/view?usp=drive_link

Comment by Marko Mäkelä [ 2023-08-01 ]

There is no usable high-level debug information, so I have to resort to checking this at the machine instruction level:

10.6 b102872ad50cce5959ad95369740766d14e9e48c

(gdb) frame 0
#0  0x00005608ae1df729 in rec_offs_n_fields (offsets=<optimized out>) at /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h:605
605	in /test/bb-10.6-primary-corruption_opt/storage/innobase/include/rem0rec.h
(gdb) disassemble
   0x00005608ae1df716 <+8198>:	call   0x5608ae1d9500 <btr_page_get_father_block(rec_offs*, mem_heap_t*, mtr_t*, btr_cur_t*)>
   0x00005608ae1df71b <+8203>:	mov    -0xa8(%rbp),%r9
   0x00005608ae1df722 <+8210>:	mov    -0x138(%rbp),%rcx
=> 0x00005608ae1df729 <+8217>:	movzwl 0x2(%rax),%edx
   0x00005608ae1df72d <+8221>:	mov    -0xb8(%rbp),%rsi
   0x00005608ae1df734 <+8228>:	movbe  (%rcx),%edi
   0x00005608ae1df738 <+8232>:	mov    (%rcx),%ecx
   0x00005608ae1df73a <+8234>:	movzwl 0x4(%rax,%rdx,2),%edx
   0x00005608ae1df73f <+8239>:	and    $0x3fff,%edx
   0x00005608ae1df745 <+8245>:	cmpq   $0x0,0x38(%r9)
   0x00005608ae1df74a <+8250>:	jne    0x5608ae1e0039 <_Z25btr_page_split_and_insertmP9btr_cur_tPPtPP16mem_block_info_tPK8dtuple_tmP5mtr_tP7dberr_t+10537>

At the time of the crash, the rax register (which holds the return value of btr_page_get_father_block()) is 0. Most of the quoted x86-64-v2 instructions that follow the call correspond to the following statement in rec_offs_data_size():

	size = get_value(rec_offs_base(offsets)[rec_offs_n_fields(offsets)]);
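To make the mapping from the disassembly to this statement concrete, here is a simplified re-implementation. It assumes, based only on the quoted instructions, that rec_offs is a 16-bit unsigned type, that in this non-debug build the offsets array starts with a 2-element header whose second element is the field count, and that get_value() masks off the top two flag bits. The helper names are hypothetical, not InnoDB's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace offs_sketch {  // hypothetical, simplified model of the offsets array

using rec_offs = std::uint16_t;          // assumption: 16-bit elements ("movzwl")
constexpr std::size_t kHeaderSize = 2;   // "0x4(%rax,%rdx,2)" = byte (2 + n) * 2
constexpr rec_offs kDataMask = 0x3fff;   // "and $0x3fff": top 2 bits are flags

// rec_offs_n_fields(): "movzwl 0x2(%rax)" reads element 1 (byte offset 2).
inline rec_offs offs_n_fields(const rec_offs *offsets) { return offsets[1]; }

// rec_offs_base(): per-field entries start right after the header.
inline const rec_offs *offs_base(const rec_offs *offsets) {
  return offsets + kHeaderSize;
}

// Same shape as get_value(rec_offs_base(offsets)[rec_offs_n_fields(offsets)]).
// If offsets == nullptr (rax == 0), the first load in offs_n_fields() reads
// address 0x2, which is the SIGSEGV reported in frame #0.
inline rec_offs offs_data_size(const rec_offs *offsets) {
  return static_cast<rec_offs>(offs_base(offsets)[offs_n_fields(offsets)] &
                               kDataMask);
}

} // namespace offs_sketch
```

Because the field count is loaded first and every later load depends on it, there is no way for this code to survive a null `offsets`; the check has to happen in the caller.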

It looks like we are missing the necessary handling of the nullptr that btr_page_get_father_block() returns for corrupted data. Please try the following patch, which turns each nullptr return into a distinct assertion failure:

diff --git a/storage/innobase/btr/btr0btr.cc b/storage/innobase/btr/btr0btr.cc
index ee2f8d00857..24848186d67 100644
--- a/storage/innobase/btr/btr0btr.cc
+++ b/storage/innobase/btr/btr0btr.cc
@@ -844,14 +844,14 @@ static rec_offs *btr_page_get_parent(rec_offs *offsets, mem_heap_t *heap,
         if (page_cur_search_with_match(tuple, PAGE_CUR_LE, &up_match,
                                        &low_match, &cursor->page_cur,
                                        nullptr))
-          return nullptr;
+          ut_a(1 < 0);
         offsets= rec_get_offsets(cursor->page_cur.rec, index, offsets, 0,
                                  ULINT_UNDEFINED, &heap);
         p= btr_node_ptr_get_child_page_no(cursor->page_cur.rec, offsets);
         if (p != page_no)
         {
           if (btr_page_get_level(block->page.frame) == level)
-            return nullptr;
+            ut_a(2 < 0);
           i= 0; // MDEV-29835 FIXME: require all pages to be latched in order!
           continue;
         }
@@ -867,6 +867,7 @@ static rec_offs *btr_page_get_parent(rec_offs *offsets, mem_heap_t *heap,
         return offsets;
       }
 
+  ut_a(3 < 0);
   return nullptr;
 }
 
@@ -886,8 +887,7 @@ btr_page_get_father_block(
 {
   rec_t *rec=
     page_rec_get_next(page_get_infimum_rec(cursor->block()->page.frame));
-  if (UNIV_UNLIKELY(!rec))
-    return nullptr;
+  ut_a(rec);
   cursor->page_cur.rec= rec;
   return btr_page_get_parent(offsets, heap, cursor, mtr);
 }

Because the test involves ROW_FORMAT=COMPRESSED, I think that these corruptions are somehow related to MDEV-31574. To test this more thoroughly, I would suggest enabling the page_zip_validate() checks with the following option, and filing separate bugs for any failures they report:

cmake -DWITH_INNODB_EXTRA_DEBUG=ON .

We must implement proper checking of btr_page_get_father_block() return values, to avoid crashes on corruptions that were detected at that level.
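A minimal sketch of the kind of checking meant here, assuming a heavily simplified model: all types and functions below are hypothetical stand-ins named after the frames of the first backtrace, not the actual fix. The caller tests the pointer returned by the father lookup and propagates DB_CORRUPTION up the call chain instead of passing nullptr on to code that dereferences it.

```cpp
#include <cassert>
#include <cstdint>

namespace split_sketch {  // hypothetical, heavily simplified model

enum dberr_t { DB_SUCCESS = 0, DB_CORRUPTION };
using rec_offs = std::uint16_t;

// Stand-in for btr_page_get_father_block(): returns nullptr when the node
// pointer to the child page cannot be found, e.g. in a corrupted index.
inline rec_offs *get_father(rec_offs *buf, bool corrupted) {
  return corrupted ? nullptr : buf;
}

// Stand-in for btr_node_ptr_set_child_page_no(): reads offsets
// unconditionally, so it would fault if offsets were nullptr.
inline void set_child_page_no(const rec_offs *offsets, std::uint32_t page_no,
                              std::uint32_t *node_ptr) {
  if (offsets[1] != 0)
    *node_ptr = page_no;
}

// Stand-in for btr_attach_half_pages(): check before use.
inline dberr_t attach_half_pages(bool corrupted, std::uint32_t new_page_no,
                                 std::uint32_t *node_ptr) {
  rec_offs buf[4] = {4, 1, 0, 0};  // pretend offsets of the parent node pointer
  rec_offs *offsets = get_father(buf, corrupted);
  if (!offsets)
    return DB_CORRUPTION;          // report instead of dereferencing nullptr
  set_child_page_no(offsets, new_page_no, node_ptr);
  return DB_SUCCESS;
}

// Stand-in for btr_page_split_and_insert(): propagate the error upwards.
inline dberr_t split_and_insert(bool corrupted, std::uint32_t *node_ptr) {
  if (dberr_t err = attach_half_pages(corrupted, 7, node_ptr))
    return err;
  return DB_SUCCESS;
}

} // namespace split_sketch
```

With this pattern the INSERT fails with a corruption error (the DB_CORRUPTION case handled in row_mysql_handle_errors()) rather than crashing the server.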

Comment by Roel Van de Paar [ 2023-08-03 ]

Outcome of the test run that was started before the last comment, that is, with only this patch applied:

-		ib::error() << "We detected index corruption in an InnoDB type"
+		ib::fatal() << "We detected index corruption in an InnoDB type"

All other settings were identical to the original setup. The branch was bb-10.6-primary-corruption with the patch applied.

10.6.15 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

Version: '10.6.15-MariaDB'  socket: '/test/bb-10.6-primary-corruption-MD010823-mariadb-10.6.15-linux-x86_64-opt/socket.sock'  port: 10126  MariaDB Server
2023-08-01 18:50:52 1585 [Note] InnoDB: Number of pools: 2
2023-08-01 21:07:19 257654 [ERROR] [FATAL] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-08-01 21:07:19 0x14e1db152640  InnoDB: Assertion failure in file /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/page/page0zip.cc line 4211
InnoDB: Failing assertion: slot_rec
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.

10.6.15 b102872ad50cce5959ad95369740766d14e9e48c (Optimized)

Core was generated by `/test/bb-10.6-primary-corruption-MD010823-mariadb-10.6.15-linux-x86_64-opt/bin/'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22960232478272)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x14e1d8803640 (LWP 882075))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22960232478272) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=22960232478272) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=22960232478272, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x000014e609242476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000014e6092287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000056298746f5c3 in ib::fatal::~fatal (this=this@entry=0x14e1d8800dc0, __in_chrg=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/ut/ut0ut.cc:515
#6  0x0000562987461a0d in row_mysql_handle_errors (new_err=0x14e1d8800fec, trx=0x14e60898f280, thr=<optimized out>, savept=0x14e1d8800ff0) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/row/row0mysql.cc:709
#7  0x0000562987b43796 in row_insert_for_mysql (mysql_rec=<optimized out>, prebuilt=0x14e2a423db88, ins_mode=ROW_INS_NORMAL) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/row/row0mysql.cc:1325
#8  0x0000562987a9ea4a in ha_innobase::write_row (this=0x14e2a411ccb0, record=0x14e2a41a4718 "\377\254\005") at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/storage/innobase/handler/ha_innodb.cc:7917
#9  0x00005629877ee930 in handler::ha_write_row (this=0x14e2a411ccb0, buf=0x14e2a41a4718 "\377\254\005") at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/handler.cc:7629
#10 0x0000562987571c42 in write_record (thd=thd@entry=0x14e2a4093488, table=table@entry=0x14e2a4211f28, info=info@entry=0x14e1d8801440, sink=sink@entry=0x0) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_insert.cc:2157
#11 0x0000562987578e8f in mysql_insert (thd=thd@entry=0x14e2a4093488, table_list=<optimized out>, fields=@0x14e2a4098380: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e2a40a24f0, last = 0x14e2a40a24f0, elements = 1}, <No data fields>}, values_list=@0x14e2a40983c8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14e2a40a2a60, last = 0x14e2a40a2a60, elements = 1}, <No data fields>}, update_fields=@0x14e2a40983b0: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x562988659050 <end_of_list>, last = 0x14e2a40983b0, elements = 0}, <No data fields>}, update_values=@0x14e2a4098398: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x562988659050 <end_of_list>, last = 0x14e2a4098398, elements = 0}, <No data fields>}, duplic=<optimized out>, ignore=<optimized out>, result=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_insert.cc:1129
#12 0x00005629875ad6a1 in mysql_execute_command (thd=0x14e2a4093488, is_called_from_prepared_stmt=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:4570
#13 0x00005629875b1dd4 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x14e2a4093488) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:8041
#14 mysql_parse (thd=0x14e2a4093488, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:7963
#15 0x00005629875b4422 in dispatch_command (command=COM_QUERY, thd=0x14e2a4093488, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:1993
#16 0x00005629875b5ca0 in do_command (thd=0x14e2a4093488, blocking=blocking@entry=true) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_parse.cc:1409
#17 0x00005629876bb827 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5629898aa588, put_in_cache=put_in_cache@entry=true) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_connect.cc:1416
#18 0x00005629876bbafd in handle_one_connection (arg=0x5629898aa588) at /test/bb-10.6-primary-corruption_MOD_Patch_Marko_opt/sql/sql_connect.cc:1318
#19 0x000014e609294b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#20 0x000014e609326a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Core + mariadbd + ldd files (126M):
https://drive.google.com/file/d/1408q1RHxXWqKyeFs7ELi-n7vTFjiNuhI/view?usp=sharing

Comment by Roel Van de Paar [ 2023-08-03 ]

The issue is rr-elusive (verified).

Comment by Marko Mäkelä [ 2023-11-23 ]

Roel, to find out why btr_page_get_father_block() would return nullptr, it would have been very helpful if you had also applied the second patch. Without such a run, I can only make sure that the crash is avoided when nullptr is returned.

Comment by Marko Mäkelä [ 2023-11-23 ]

Roel, please test this commit, which adds some log messages to better identify the root cause of the corruption. You may want to add abort(); calls after each added message, or set breakpoints, so that we get a little more information straight from the debugger.
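One way to follow that suggestion, sketched under stated assumptions: everything below (`corruption_reporter`, `find_parent`, the message strings) is hypothetical and not part of the actual commit. Each nullptr-returning path reports a distinct message, so the log shows which path was taken; setting `abort_on_report` stops the process at the first report, so the debugger or core shows the state at the failure site.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

namespace diag_sketch {  // hypothetical diagnostic helper, not InnoDB code

struct corruption_reporter {
  bool abort_on_report = false;        // like adding abort(); after each message
  std::vector<std::string> messages;   // stands in for the server error log

  void report(const char *site) {
    messages.emplace_back(site);
    if (abort_on_report)
      std::abort();                    // stop right at the failure site
  }
};

// Stand-in for a parent lookup with several distinct failure paths,
// similar in spirit to the ut_a(1 < 0), ut_a(2 < 0), ut_a(3 < 0) probes.
inline const int *find_parent(corruption_reporter &r, int failure_path,
                              const int *found) {
  switch (failure_path) {
  case 1: r.report("search for the node pointer failed"); return nullptr;
  case 2: r.report("node pointer mismatch at the same level"); return nullptr;
  case 3: r.report("parent page not found"); return nullptr;
  default: return found;               // normal case: no message
  }
}

} // namespace diag_sketch
```

In a debugger, a breakpoint on `report()` gives the same effect as `abort_on_report` without terminating the process.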

Comment by Roel Van de Paar [ 2023-11-30 ]

Discussed with Marko and retested using commit 963a88be2e89100260e09e46e90e2b3271444899 of the 10.6-MDEV-32371 branch. Outcome:
Opt: Count: 1590600, no failures
Dbg: Count: 154200, no failures

Generated at Thu Feb 08 10:26:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.