The first breakpoint is being hit exactly once in the entire trace, for a different primary key, col1=0x81c (2076) and not col1=0x11a. I think that in order to find and fix the cause of this corruption, we might need some test case simplification.
Marko Mäkelä
added a comment - I tried but failed to find the exact root cause of this.
break ibuf0ibuf.cc:3509
cond 1 index==0x12222402ac38 && *(char*)entry.fields[0].data=='2'
break ibuf_merge_or_delete_for_page
cond 2 page_id.m_id==0x50000000e
continue
The first breakpoint is being hit exactly once in the entire trace, for a different primary key, col1=0x81c (2076) and not col1=0x11a. I think that in order to find and fix the cause of this corruption, we might need some test case simplification.
I just spent 4 hours analyzing the root cause of another failure:
# 2023-08-17T13:18:56 [33159] ERROR: checkDatabaseIntegrity server[1]: Query ->ALTER TABLE test . t5 FORCE<- failed with 1062 : Duplicate entry '1900-01-01' for key 'col_date'
Yes, there was a duplicate value 0x8ed821 for that unique column, with the PRIMARY KEY values pk=0x875 and pk=0x876. It took me some effort to find the original INSERT statement. At that point, the unique key did exist, but the constraint was violated because innodb_change_buffering was enabled and unique_checks=0 was in effect. The problematic INSERT only inserted that single row. In the clustered index page at that time, we got:
That is, pk=0x875 was inserted by DB_TRX_ID=0x8004, and the current transaction is inserting pk=0x876,DB_TRX_ID=0x8012. In both records right after the 7-byte DB_ROLL_PTR, we see the duplicate col_date value 0x8ed821. This is still OK; the duplicate should be detected when we attempt to insert that key to the unique index. After that point, this last insert should be rolled back. However, the duplicate key was not noticed due to the following:
Run till exit from #0 ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0,
thr=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512
0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114,
mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3015
#2 0x0000555a15f4e312 in row_ins_sec_index_entry (index=0x1b2d1409e898, entry=0x1b2d140aaee8, thr=0x1b2d1411f890, check_foreign=<optimized out>)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3322
#3 0x0000555a15f4e7ec in row_ins_index_entry (thr=0x1b2d1411f890, entry=<optimized out>, index=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3370
#4 row_ins_index_entry_step (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3536
#5 row_ins (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3661
#6 row_ins_step (thr=thr@entry=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3790
#7 0x0000555a15f5b227 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x1b2d140ac2c0 "\377", prebuilt=0x1b2d1412ed98, ins_mode=ROW_INS_NORMAL)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0mysql.cc:1315
#8 0x0000555a15eca90e in ha_innobase::write_row (this=0x1b2d140a8490, record=0x1b2d140ac2c0 "\377") at /data/Server/bb-10.6-MDEV-30100/storage/innobase/handler/ha_innodb.cc:7905
was successfully autocommitted with that 1 invalid row inserted, because unique_checks=0 allowed the buffered insert into the unique index col_date to happen. Had it not been possible to use the change buffer, the uniqueness violation would have been noticed.
Marko Mäkelä
added a comment - I just spent 4 hours analyzing the root cause of another failure:
# 2023-08-17T13:18:56 [33159] ERROR: checkDatabaseIntegrity server[1]: Query ->ALTER TABLE test . t5 FORCE<- failed with 1062 : Duplicate entry '1900-01-01' for key 'col_date'
Yes, there was a duplicate value 0x8ed821 for that unique column, with the PRIMARY KEY values pk=0x875 and pk=0x876. It took me some effort to find the original INSERT statement. At that point, the unique key did exist, but the constraint was violated because innodb_change_buffering was enabled and unique_checks=0 was in effect. The problematic INSERT only inserted that single row. In the clustered index page at that time, we got:
#1 page_cur_insert_rec_low (cur=cur@entry=0x79f71d2ada20, rec=rec@entry=0x1b2d141183be "\200", offsets=<optimized out>, mtr=mtr@entry=0x79f71d2adfc0)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/page/page0cur.cc:1569
1569 mach_write_to_2(cur->rec - REC_NEXT,
(rr) p/x *rec@30
$44 = {0x80, 0x0, 0x8, 0x76, 0x0, 0x0, 0x0, 0x0, 0x80, 0x12, 0x92, 0x0, 0x0, 0x4, 0x66, 0xc, 0x10, 0x8e, 0xd8, 0x21, 0x0, 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
(rr) p/x *cur.rec@30
$46 = {0x80, 0x0, 0x8, 0x75, 0x0, 0x0, 0x0, 0x0, 0x80, 0x4, 0x89, 0x0, 0x0, 0x4, 0x65, 0xa, 0x88, 0x8e, 0xd8, 0x21, 0x0, 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
That is, pk=0x875 was inserted by DB_TRX_ID=0x8004, and the current transaction is inserting pk=0x876,DB_TRX_ID=0x8012. In both records right after the 7-byte DB_ROLL_PTR, we see the duplicate col_date value 0x8ed821. This is still OK; the duplicate should be detected when we attempt to insert that key to the unique index. After that point, this last insert should be rolled back. However, the duplicate key was not noticed due to the following:
Thread 29 hit Breakpoint 19, ibuf_insert_low (mode=BTR_MODIFY_PREV, op=IBUF_OP_INSERT, no_counter=1, entry=0x1b2d140aaee8, entry_size=12, index=0x1b2d1409e898, page_id={m_id = 38654705818}, zip_size=0,
thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3111
3111 {
5: x/i $pc
=> 0x555a1593cd90 <ibuf_insert_low(btr_latch_mode, ibuf_op_t, ulint, dtuple_t const*, ulint, dict_index_t*, page_id_t, ulint, que_thr_t*)>: push %rbp
(rr) p index.name
$54 = {m_name = 0x1b2d1409ea28 "col_date"}
(rr) fin
Run till exit from #0 ibuf_insert_low (mode=BTR_MODIFY_PREV, op=IBUF_OP_INSERT, no_counter=1, entry=0x1b2d140aaee8, entry_size=12, index=0x1b2d1409e898, page_id={m_id = 38654705818}, zip_size=0,
thr=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3111
ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0, thr=<optimized out>)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512
3512 if (err == DB_FAIL) {
5: x/i $pc
=> 0x555a1593ead5 <ibuf_insert(ibuf_op_t, dtuple_t const*, dict_index_t*, page_id_t, unsigned long, que_thr_t*)+533>: add $0x20,%rsp
Value returned is $55 = DB_SUCCESS
(rr)
Run till exit from #0 ibuf_insert (op=op@entry=IBUF_OP_INSERT, entry=entry@entry=0x1b2d140aaee8, index=0x1b2d1409e898, page_id=page_id@entry={m_id = 38654705818}, zip_size=zip_size@entry=0,
thr=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/ibuf/ibuf0ibuf.cc:3512
0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114,
mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744
744 dict_index_t *index() const { return page_cur.index; }
5: x/i $pc
=> 0x555a15fc0f8a <btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)+2026>: mov -0x538(%rbp),%r10
Value returned is $56 = true
(rr) bt
#0 0x0000555a15fc0f8a in btr_cur_t::search_leaf (this=this@entry=0x79f71d2ad760, tuple=tuple@entry=0x1b2d140aaee8, mode=mode@entry=PAGE_CUR_LE, latch_mode=BTR_MODIFY_LEAF, latch_mode@entry=2114,
mtr=mtr@entry=0x79f71d2adfb0) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/include/btr0cur.h:744
#1 0x0000555a15f4ac05 in row_ins_sec_index_entry_low (flags=0, mode=BTR_MODIFY_LEAF, index=0x1b2d1409e898, offsets_heap=<optimized out>, heap=0x1b2d14256430, entry=0x1b2d140aaee8, trx_id=0, thr=0x1b2d1411f890)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3015
#2 0x0000555a15f4e312 in row_ins_sec_index_entry (index=0x1b2d1409e898, entry=0x1b2d140aaee8, thr=0x1b2d1411f890, check_foreign=<optimized out>)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3322
#3 0x0000555a15f4e7ec in row_ins_index_entry (thr=0x1b2d1411f890, entry=<optimized out>, index=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3370
#4 row_ins_index_entry_step (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3536
#5 row_ins (thr=0x1b2d1411f890, node=<optimized out>) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3661
#6 row_ins_step (thr=thr@entry=0x1b2d1411f890) at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0ins.cc:3790
#7 0x0000555a15f5b227 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x1b2d140ac2c0 "\377", prebuilt=0x1b2d1412ed98, ins_mode=ROW_INS_NORMAL)
at /data/Server/bb-10.6-MDEV-30100/storage/innobase/row/row0mysql.cc:1315
#8 0x0000555a15eca90e in ha_innobase::write_row (this=0x1b2d140a8490, record=0x1b2d140ac2c0 "\377") at /data/Server/bb-10.6-MDEV-30100/storage/innobase/handler/ha_innodb.cc:7905
This statement
INSERT IGNORE INTO `t5` ( `pk`, `col_char` ) VALUES ( NULL , 'l' ) /* E_R Thread1 QNO 63149 CON_ID 14 */ ;
was successfully autocommitted with that 1 invalid row inserted, because unique_checks=0 allowed the buffered insert into the unique index col_date to happen. Had it not been possible to use the change buffer, the uniqueness violation would have been noticed.
origin/bb-10.6-MDEV-30100 562850823ba74e15ae638205aeacee9f678de393 2023-08-17T11:14:02+03:00
performed well in the last round of RQG testing. No new problems.
Matthias Leich
added a comment - origin/bb-10.6- MDEV-30100 562850823ba74e15ae638205aeacee9f678de393 2023-08-17T11:14:02+03:00
performed well in the last round of RQG testing. No new problems.
Thanks again to mleich for testing this, and vlad.lesin for the review. I added some comments to trx_t::commit_empty() to explain the logic.
Marko Mäkelä
added a comment - Thanks again to mleich for testing this, and vlad.lesin for the review. I added some comments to trx_t::commit_empty() to explain the logic.
People
Marko Mäkelä
dongjian
Votes:
1Vote for this issue
Watchers:
8Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
I tried but failed to find the exact root cause of this.
break ibuf0ibuf.cc:3509
cond 1 index==0x12222402ac38 && *(char*)entry.fields[0].data=='2'
break ibuf_merge_or_delete_for_page
cond 2 page_id.m_id==0x50000000e
continue
The first breakpoint is being hit exactly once in the entire trace, for a different primary key, col1=0x81c (2076) and not col1=0x11a. I think that in order to find and fix the cause of this corruption, we might need some test case simplification.