[MDEV-30100] Assertion `purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no()' - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.16, 11.2.0, 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2
Component/s: Storage Engine - InnoDB
Labels:
- rr-profile
Environment:
centos 7.6

Description

The process exits during mysqld execution, and an error occurs after restarting.

lead to:

InnoDB: Failing assertion: purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no()

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/

InnoDB: about forcing recovery.

220914 18:23:30 [ERROR] mysqld got signal 6 ;

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

data.tar.xz
927 kB
2023-08-03 08:30

Issue Links

causes

MDEV-32103 InnoDB ALTER TABLE is not crash safe

Closed

duplicates

MDEV-30261 mysqld got signal 6 ;

Closed

relates to

MDEV-29401 InnoDB history list length increased in 10.6 compared to 10.5 for the same load

Closed

Activity

Ascending order - Click to sort in descending order

View 17 older comments

Marko Mäkelä added a comment - 2023-08-17 12:29

I tried but failed to find the exact root cause of this.

break ibuf0ibuf.cc:3509

cond 1 index==0x12222402ac38 && *(char*)entry.fields[0].data=='2'

break ibuf_merge_or_delete_for_page

cond 2 page_id.m_id==0x50000000e

continue

The first breakpoint is being hit exactly once in the entire trace, for a different primary key, col1=0x81c (2076) and not col1=0x11a. I think that in order to find and fix the cause of this corruption, we might need some test case simplification.

Marko Mäkelä added a comment - 2023-08-17 12:29 I tried but failed to find the exact root cause of this. break ibuf0ibuf.cc:3509 cond 1 index==0x12222402ac38 && *(char*)entry.fields[0].data=='2' break ibuf_merge_or_delete_for_page cond 2 page_id.m_id==0x50000000e continue The first breakpoint is being hit exactly once in the entire trace, for a different primary key, col1=0x81c (2076) and not col1=0x11a. I think that in order to find and fix the cause of this corruption, we might need some test case simplification.

Marko Mäkelä added a comment - 2023-08-18 09:29

I just spent 4 hours analyzing the root cause of another failure:

# 2023-08-17T13:18:56 [33159] ERROR: checkDatabaseIntegrity  server[1]:  Query ->ALTER TABLE test . t5 FORCE<- failed with 1062 : Duplicate entry '1900-01-01' for key 'col_date'

Yes, there was a duplicate value 0x8ed821 for that unique column, with the PRIMARY KEY values pk=0x875 and pk=0x876. It took me some effort to find the original INSERT statement. At that point, the unique key did exist, but the constraint was violated because innodb_change_buffering was enabled and unique_checks=0 was in effect. The problematic INSERT only inserted that single row. In the clustered index page at that time, we got:

#1  page_cur_insert_rec_low (cur=cur@entry=0x79f71d2ada20, rec=rec@entry=0x1b2d141183be "\200", offsets=<optimized out>, mtr=mtr@entry=0x79f71d2adfc0)

    at /data/Server/bb-10.6-MDEV-30100/storage/innobase/page/page0cur.cc:1569

1569	    mach_write_to_2(cur->rec - REC_NEXT,

(rr) p/x *rec@30

$44 = {0x80, 0x0, 0x8, 0x76, 0x0, 0x0, 0x0, 0x0, 0x80, 0x12, 0x92, 0x0, 0x0, 0x4, 0x66, 0xc, 0x10, 0x8e, 0xd8, 0x21, 0x0, 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

(rr) p/x *cur.rec@30

$46 = {0x80, 0x0, 0x8, 0x75, 0x0, 0x0, 0x0, 0x0, 0x80, 0x4, 0x89, 0x0, 0x0, 0x4, 0x65, 0xa, 0x88, 0x8e, 0xd8, 0x21, 0x0, 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}

That is, pk=0x875 was inserted by DB_TRX_ID=0x8004, and the current transaction is inserting pk=0x876,DB_TRX_ID=0x8012. In both records right after the 7-byte DB_ROLL_PTR, we see the duplicate col_date value 0x8ed821. This is still OK; the duplicate should be detected when we attempt to insert that key to the unique index. After that point, this last insert should be rolled back. However, the duplicate key was not noticed due to the following:

Thread 29 hit Breakpoint 19, ibuf_insert_low (mode=BTR_MODIFY_PREV, op=IBUF_OP_INSERT, no_counter=1, entry=0x1b2d140aaee8, entry_size=12, index=0x1b2d1409e898, page_id={m_id = 38654705818}, zip_size=0,

    thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3111

3111	{

5: x/i $pc

=> 0x555a1593cd90 <ibuf_insert_low(btr_latch_mode, ibuf_op_t, ulint, dtuple_t const*, ulint, dict_index_t*, page_id_t, ulint, que_thr_t*)>:	push   %rbp

(rr) p index.name

$54 = {m_name = 0x1b2d1409ea28 "col_date"}

(rr) fin

Run till exit from #0  ibuf_insert_low (mode=BTR_MODIFY_PREV, op=IBUF_OP_INSERT, no_counter=1, entry=0x1b2d140aaee8, entry_size=12, index=0x1b2d1409e898, page_id={m_id = 38654705818}, zip_size=0,

    thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3111

ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0, thr=<optimized out>)

    at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512

3512		if (err == DB_FAIL) {

5: x/i $pc

=> 0x555a1593ead5 <ibuf_insert(ibuf_op_t, dtuple_t const*, dict_index_t*, page_id_t, unsigned long, que_thr_t*)+533>:	add    $0x20,%rsp

Value returned is $55 = DB_SUCCESS

(rr)

Run till exit from #0  ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0,

    thr=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512

0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114,

    mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744

744	  dict_index_t *index() const { return page_cur.index; }

5: x/i $pc

=> 0x555a15fc0f8a <btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)+2026>:	mov    -0x538(%rbp),%r10

Value returned is $56 = true

(rr) bt

#0  0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114,

    mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744

#1  0x0000555a15f4ac05 in row_ins_sec_index_entry_low (flags=0, mode=BTR_MODIFY_LEAF, index=0x1b2d1409e898, offsets_heap=<optimized out>, heap=0x1b2d14256430, entry=0x1b2d140aaee8, trx_id=0, thr=0x1b2d1411f890)

    at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3015

#2  0x0000555a15f4e312 in row_ins_sec_index_entry (index=0x1b2d1409e898, entry=0x1b2d140aaee8, thr=0x1b2d1411f890, check_foreign=<optimized out>)

    at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3322

#3  0x0000555a15f4e7ec in row_ins_index_entry (thr=0x1b2d1411f890, entry=<optimized out>, index=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3370

#4  row_ins_index_entry_step (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3536

#5  row_ins (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3661

#6  row_ins_step (thr=thr@entry=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3790

#7  0x0000555a15f5b227 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x1b2d140ac2c0 "\377", prebuilt=0x1b2d1412ed98, ins_mode=ROW_INS_NORMAL)

    at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0mysql.cc:1315

#8  0x0000555a15eca90e in ha_innobase::write_row (this=0x1b2d140a8490, record=0x1b2d140ac2c0 "\377") at /data/Server/bb-10.6-MDEV-30100/storage/innobase/handler/ha_innodb.cc:7905

This statement

INSERT IGNORE INTO `t5` ( `pk`, `col_char` ) VALUES ( NULL, 'l' ) /* E_R Thread1 QNO 63149 CON_ID 14 */;

was successfully autocommitted with that 1 invalid row inserted, because unique_checks=0 allowed the buffered insert into the unique index col_date to happen. Had it not been possible to use the change buffer, the uniqueness violation would have been noticed.

Marko Mäkelä added a comment - 2023-08-18 09:29 I just spent 4 hours analyzing the root cause of another failure: # 2023-08-17T13:18:56 [33159] ERROR: checkDatabaseIntegrity server[1]: Query ->ALTER TABLE test . t5 FORCE<- failed with 1062 : Duplicate entry '1900-01-01' for key 'col_date' Yes, there was a duplicate value 0x8ed821 for that unique column, with the PRIMARY KEY values pk=0x875 and pk=0x876. It took me some effort to find the original INSERT statement. At that point, the unique key did exist, but the constraint was violated because innodb_change_buffering was enabled and unique_checks=0 was in effect. The problematic INSERT only inserted that single row. In the clustered index page at that time, we got: #1 page_cur_insert_rec_low (cur=cur@entry=0x79f71d2ada20, rec=rec@entry=0x1b2d141183be "\200", offsets=<optimized out>, mtr=mtr@entry=0x79f71d2adfc0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/page/page0cur.cc:1569 1569 mach_write_to_2(cur->rec - REC_NEXT, (rr) p/x *rec@30 $44 = {0x80, 0x0, 0x8, 0x76, 0x0, 0x0, 0x0, 0x0, 0x80, 0x12, 0x92, 0x0, 0x0, 0x4, 0x66, 0xc, 0x10, 0x8e, 0xd8, 0x21, 0x0, 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} (rr) p/x *cur.rec@30 $46 = {0x80, 0x0, 0x8, 0x75, 0x0, 0x0, 0x0, 0x0, 0x80, 0x4, 0x89, 0x0, 0x0, 0x4, 0x65, 0xa, 0x88, 0x8e, 0xd8, 0x21, 0x0, 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} That is, pk=0x875 was inserted by DB_TRX_ID=0x8004, and the current transaction is inserting pk=0x876,DB_TRX_ID=0x8012. In both records right after the 7-byte DB_ROLL_PTR, we see the duplicate col_date value 0x8ed821. This is still OK; the duplicate should be detected when we attempt to insert that key to the unique index. After that point, this last insert should be rolled back. However, the duplicate key was not noticed due to the following: Thread 29 hit Breakpoint 19, ibuf_insert_low (mode=BTR_MODIFY_PREV, op=IBUF_OP_INSERT, no_counter=1, entry=0x1b2d140aaee8, entry_size=12, index=0x1b2d1409e898, page_id={m_id = 38654705818}, zip_size=0, thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3111 3111 { 5: x/i $pc => 0x555a1593cd90 <ibuf_insert_low(btr_latch_mode, ibuf_op_t, ulint, dtuple_t const*, ulint, dict_index_t*, page_id_t, ulint, que_thr_t*)>: push %rbp (rr) p index.name $54 = {m_name = 0x1b2d1409ea28 "col_date"} (rr) fin Run till exit from #0 ibuf_insert_low (mode=BTR_MODIFY_PREV, op=IBUF_OP_INSERT, no_counter=1, entry=0x1b2d140aaee8, entry_size=12, index=0x1b2d1409e898, page_id={m_id = 38654705818}, zip_size=0, thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3111 ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0, thr=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512 3512 if (err == DB_FAIL) { 5: x/i $pc => 0x555a1593ead5 <ibuf_insert(ibuf_op_t, dtuple_t const*, dict_index_t*, page_id_t, unsigned long, que_thr_t*)+533>: add $0x20,%rsp Value returned is $55 = DB_SUCCESS (rr) Run till exit from #0 ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0, thr=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512 0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114, mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744 744 dict_index_t *index() const { return page_cur.index; } 5: x/i $pc => 0x555a15fc0f8a <btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)+2026>: mov -0x538(%rbp),%r10 Value returned is $56 = true (rr) bt #0 0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114, mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744 #1 0x0000555a15f4ac05 in row_ins_sec_index_entry_low (flags=0, mode=BTR_MODIFY_LEAF, index=0x1b2d1409e898, offsets_heap=<optimized out>, heap=0x1b2d14256430, entry=0x1b2d140aaee8, trx_id=0, thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3015 #2 0x0000555a15f4e312 in row_ins_sec_index_entry (index=0x1b2d1409e898, entry=0x1b2d140aaee8, thr=0x1b2d1411f890, check_foreign=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3322 #3 0x0000555a15f4e7ec in row_ins_index_entry (thr=0x1b2d1411f890, entry=<optimized out>, index=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3370 #4 row_ins_index_entry_step (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3536 #5 row_ins (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3661 #6 row_ins_step (thr=thr@entry=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3790 #7 0x0000555a15f5b227 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x1b2d140ac2c0 "\377", prebuilt=0x1b2d1412ed98, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0mysql.cc:1315 #8 0x0000555a15eca90e in ha_innobase::write_row (this=0x1b2d140a8490, record=0x1b2d140ac2c0 "\377") at /data/Server/bb-10.6-MDEV-30100/storage/innobase/handler/ha_innodb.cc:7905 This statement INSERT IGNORE INTO `t5` ( `pk`, `col_char` ) VALUES ( NULL , 'l' ) /* E_R Thread1 QNO 63149 CON_ID 14 */ ; was successfully autocommitted with that 1 invalid row inserted, because unique_checks=0 allowed the buffered insert into the unique index col_date to happen. Had it not been possible to use the change buffer, the uniqueness violation would have been noticed.

Matthias Leich added a comment - 2023-08-23 17:12

origin/bb-10.6-~~MDEV-30100~~ 562850823ba74e15ae638205aeacee9f678de393 2023-08-17T11:14:02+03:00
performed well in the last round of RQG testing. No new problems.

Matthias Leich added a comment - 2023-08-23 17:12 origin/bb-10.6- MDEV-30100 562850823ba74e15ae638205aeacee9f678de393 2023-08-17T11:14:02+03:00 performed well in the last round of RQG testing. No new problems.

Vladislav Lesin added a comment - 2023-08-25 10:08

The code looks correct to me. Thanks to marko for explaining some details.

Vladislav Lesin added a comment - 2023-08-25 10:08 The code looks correct to me. Thanks to marko for explaining some details.

Marko Mäkelä added a comment - 2023-08-25 11:18

Thanks again to mleich for testing this, and vlad.lesin for the review. I added some comments to trx_t::commit_empty() to explain the logic.

Marko Mäkelä added a comment - 2023-08-25 11:18 Thanks again to mleich for testing this, and vlad.lesin for the review. I added some comments to trx_t::commit_empty() to explain the logic.

MariaDB Server

Assertion `purge_sys.tail.trx_no <= purge_sys.rseg->last_trx_no()'

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration