|
In the trace that I analyzed, both the clustered index and the secondary index page consist of a single page. There are 7 records in the clustered index (PRIMARY KEY(col1)) and 4 in the secondary index.
Secondary index:(col1,col_string)=
(1,'1111111111'),
(2,'2222222222'),
(332,'3333333333'),
(7,'1111111111').
None of the records are delete-marked. The first record was inserted last in the record heap; the others were in order. The PAGE_FREE list is empty. All this would suggest that the index was created afterwards and not at table creation.
Clustered index: (col1,DB_TRX_ID,DB_ROLL_PTR,col2,col_int,col_string,col_text)=
(1,0,1<<55,1,1,'1111111111',blob),
(2,0,1<<55,2,2,'2222222222',blob),
(8,0,1<<55,7,8,'8888888888',blob),
(40,0,1<<55,39,40,'4444444444',blob),
(332,0,1<<55,332,324,'3333333333',blob),
(374,0,1<<55,1,373,'3333333333',blob),
(1962,0,1<<55,1,324,'1111111111',blob).
Again, no records are delete-marked, and the PAGE_FREE list is empty.
There are not only missing records in the secondary index, but also an extra record col1=7, even though all history has been purged from the clustered index (DB_TRX_ID=0 was reset by MDEV-12288). Extra records in the secondary index are only allowed when they correspond to an old version of the clustered index records. Such records would also be delete-marked in the secondary index, unless the DELETE operation is in progress.
In other words, this corruption could have manifested itself as MDEV-9663/MDEV-14643 messages as well. Unfortunately, the EXTENDED attribute of CHECK TABLE is being ignored by InnoDB. It would be good if it actually did what is requested: check that there is a 1:1 correspondence between the secondary index and clustered index records.
Watchpoints on PAGE_N_RECS on the two pages will let us know what exactly happened:
- PRIMARY KEY(col1)=8 is inserted
- ROLLBACK will empty the table
- PRIMARY KEY(col1)=1486 is inserted
- ROLLBACK will empty the table
- PRIMARY KEY(col1)=332 is inserted
- PRIMARY KEY(col1)=1962 is inserted
- PRIMARY KEY(col1)=374 is inserted
- PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is inserted in a new transaction
- PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is deleted by ROLLBACK
- At event 51548, PRIMARY KEY(col1,DB_TRX_ID)=(0,0x28) is inserted
- PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is deleted by ROLLBACK
- …
- At event 51578, COMMIT invokes MDL_context::release_transactional_locks() for transaction 0x28 even though it is still active!
- At event 55952, ha_innobase::prepare_inplace_alter_table() is invoked, even though transaction 0x28 is still active with 11 undo log records
- At event 56196, the secondary index page is initialized with 3 records.
I think that we have seen enough. Here is the stack trace for what I think is the smoking gun:
|
10.5 c89366866bca2df46b0592719a1f6b6dabf470cba
|
Thread 1 hit Breakpoint 6, MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
|
3043 {
|
(rr) when
|
Current event: 51578
|
(rr) bt
|
#0 MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
|
#1 0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
|
#2 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
|
#3 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT /* E_R Thread2 QNO 99 CON_ID 15 */ ",
|
packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
|
…
|
(rr) up
|
#1 0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
|
5585 thd->mdl_context.release_transactional_locks();
|
(rr) list
|
5580 lex->tx_chain != TVL_NO));
|
5581 bool tx_release= (lex->tx_release == TVL_YES ||
|
5582 (thd->variables.completion_type == 2 &&
|
5583 lex->tx_release != TVL_NO));
|
5584 bool commit_failed= trans_commit(thd);
|
5585 thd->mdl_context.release_transactional_locks();
|
5586 if (commit_failed)
|
5587 {
|
5588 WSREP_DEBUG("COMMIT failed, MDL released: %lld",
|
5589 (longlong) thd->thread_id);
|
(rr) i lo
|
tx_chain = false
|
tx_release = false
|
commit_failed = true
|
…
|
This looks like the MDEV-15532/MDEV-22076 scenario: COMMIT is refused for an XA transaction, but yet we wrongly release the MDL that should keep protecting the transaction.
mleich, even though the table has a nice small number of records, I do not think that it is useful to test the combination of XA+DDL+DML before those bugs have been addressed.
Please try to reproduce this kind of corruption without XA, or DML with XA (but no concurrent DDL). I do not think that there are many users of XA yet, because it was practically broken before MDEV-742 was addressed in 10.5.
|
|
mleich, I am sorry for blaming you. We do actually have XA transactions, but they are internal XA transactions for the binlog. Here are a couple examples from the same rr replay trace:
Thread 3 hit Breakpoint 1, innobase_xa_prepare (hton=0x615000001e98, thd=0x62b0000bd218, prepare_trx=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:17190
|
17190 {
|
(rr) bt
|
#0 innobase_xa_prepare (hton=0x615000001e98, thd=0x62b0000bd218, prepare_trx=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:17190
|
#1 0x000055c67cd50c8b in prepare_or_error (ht=ht@entry=0x615000001e98, thd=thd@entry=0x62b0000bd218, all=all@entry=false) at /home/mleich/10.5/sql/handler.cc:1364
|
#2 0x000055c67cd54297 in ha_commit_trans (thd=thd@entry=0x62b0000bd218, all=all@entry=false) at /home/mleich/10.5/sql/handler.cc:1742
|
#3 0x000055c67c94e010 in trans_commit_stmt (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:472
|
#4 0x000055c67c48fee7 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:6011
|
#5 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
|
#6 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218,
|
packet=packet@entry=0x629000ae1219 " INSERT INTO t1 (col1,col2, col_int, col_string, col_text) VALUES ( 8, 8, 8 - 1, REPEAT(SUBSTR(CAST( 8 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 8 AS CHAR),1,1), @fill_amount) ) /* E_R Thread2 QNO 36 CO"..., packet_length=packet_length@entry=211, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
|
#7 0x000055c67c4676ab in do_command (thd=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:1347
|
#8 0x000055c67c907775 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x60800000cf38, put_in_cache=put_in_cache@entry=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
|
#9 0x000055c67c908a19 in handle_one_connection (arg=arg@entry=0x60800000cf38) at /home/mleich/10.5/sql/sql_connect.cc:1313
|
#10 0x000055c67d682162 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
|
#11 0x00007ffd78caf6db in start_thread (arg=0x443e7feb0300) at pthread_create.c:463
|
#12 0x00001f030f433a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
(rr) when
|
Current event: 48560
|
(rr) continue
|
…
|
Thread 14 hit Breakpoint 5, trans_commit (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) when
|
Current event: 51578
|
(rr) bt
|
#0 trans_commit (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:256
|
#1 0x000055c67c48c8af in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5584
|
#2 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
|
#3 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT /* E_R Thread2 QNO 99 CON_ID 15 */ ",
|
packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
|
#4 0x000055c67c4676ab in do_command (thd=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:1347
|
#5 0x000055c67c907775 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x60800000cf38, put_in_cache=put_in_cache@entry=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
|
#6 0x000055c67c908a19 in handle_one_connection (arg=arg@entry=0x60800000cf38) at /home/mleich/10.5/sql/sql_connect.cc:1313
|
#7 0x000055c67d682162 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
|
#8 0x00007ffd78caf6db in start_thread (arg=0x443e7feb0300) at pthread_create.c:463
|
#9 0x00001f030f433a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
The trans_commit() fails because trans_check() fails due to thd->transaction->xid_state.is_explicit_XA().
I find it hard to follow the control flow. Watchpoints are not useful, because the flags seem to be "disabled by default". For example, trans->no_2pc was never set, and I would expect the same to hold for the thd->transaction->xid_state.xid_cache_element.
If this corruption is not repeatable with --skip-log-bin, that would explain why mikhail.izioumtchenko@booking.com never repeated this corruption for MySQL when we were employed by Oracle.
|
|
Thanks to sujatha.sivakumar, I verified with one more breakpoint that the failure looks like a duplicate of the MDEV-15532 scenario (explicit use of XA START, with wrong COMMIT. Sorry for the false alarm.
mleich, please try again without XA.
Thread 2 hit Breakpoint 1, xid_cache_insert (thd=thd@entry=0x62b0000cb218,
|
xid_state=0x62b0000ce838, xid=0x62b0000d2350)
|
at /home/mleich/10.5/sql/xa.cc:286
|
286 {
|
(rr) when
|
Current event: 49759
|
(rr) bt
|
#0 xid_cache_insert (thd=thd@entry=0x62b0000cb218, xid_state=0x62b0000ce838,
|
xid=0x62b0000d2350) at /home/mleich/10.5/sql/xa.cc:286
|
#1 0x000055c67cb683d0 in trans_xa_start (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/xa.cc:453
|
#2 0x000055c67c48e1fa in mysql_execute_command (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/sql_parse.cc:5793
|
#3 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000cb218,
|
rawbuf=<optimized out>, length=<optimized out>,
|
parser_state=parser_state@entry=0x64a36dd61ff0,
|
is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false)
|
at /home/mleich/10.5/sql/sql_parse.cc:7993
|
#4 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY,
|
thd=thd@entry=0x62b0000cb218,
|
packet=packet@entry=0x629000ac3219 "XA START 'xid96' /* E_R Thread1 QNO 40 CON_ID 14 */ ", packet_length=packet_length@entry=53,
|
…
|
(rr) break trans_commit
|
Breakpoint 2 at 0x55c67c94c6b7: file /home/mleich/10.5/sql/transaction.cc, line 256.
|
(rr) c
|
Continuing.
|
[Switching to Thread 26146.38140]
|
|
Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) when
|
Current event: 49854
|
(rr) c
|
Continuing.
|
[Switching to Thread 26146.39212]
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) when
|
Current event: 49896
|
(rr) bt
|
#0 trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
#1 0x000055c67c48c8af in mysql_execute_command (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/sql_parse.cc:5584
|
#2 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000cb218,
|
rawbuf=<optimized out>, length=<optimized out>,
|
parser_state=parser_state@entry=0x64a36dd61ff0,
|
is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false)
|
at /home/mleich/10.5/sql/sql_parse.cc:7993
|
#3 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY,
|
thd=thd@entry=0x62b0000cb218,
|
packet=packet@entry=0x629000ac3219 " COMMIT /* E_R Thread1 QNO 44 CON_ID 14 */ ", packet_length=packet_length@entry=44,
|
is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false)
|
at /home/mleich/10.5/sql/sql_parse.cc:1866
|
…
|
(rr) n
|
258 DBUG_ENTER("trans_commit");
|
(rr)
|
260 if (trans_check(thd))
|
(rr)
|
261 DBUG_RETURN(TRUE);
|
According to this, the thread first executed XA START and then the same thread attempted to execute COMMIT instead of an XA statement. This is exactly the MDEV-15532 scenario.
|
|
I posted an earlier XA START;COMMIT pair than the one that involves the transaction 0x28. Here is the correct pair that explains why the COMMIT fails:
Thread 14 hit Breakpoint 1, xid_cache_insert (thd=thd@entry=0x62b0000bd218,
|
xid_state=0x62b0000c0838, xid=0x62b0000c4350)
|
at /home/mleich/10.5/sql/xa.cc:286
|
286 {
|
(rr) when
|
Current event: 50027
|
(rr) bt
|
#0 xid_cache_insert (thd=thd@entry=0x62b0000bd218, xid_state=0x62b0000c0838,
|
xid=0x62b0000c4350) at /home/mleich/10.5/sql/xa.cc:286
|
#1 0x000055c67cb683d0 in trans_xa_start (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/xa.cc:453
|
#2 0x000055c67c48e1fa in mysql_execute_command (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/sql_parse.cc:5793
|
#3 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218,
|
rawbuf=<optimized out>, length=<optimized out>,
|
parser_state=parser_state@entry=0x443e7feadff0,
|
is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false)
|
at /home/mleich/10.5/sql/sql_parse.cc:7993
|
#4 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY,
|
thd=thd@entry=0x62b0000bd218,
|
packet=packet@entry=0x629000ae1219 "XA START 'xid164' /* E_R Thread2 QNO 62 CON_ID 15 */ ", packet_length=packet_length@entry=54,
|
is_com_multi=is_com_multi@entry=false,
|
is_next_command=is_next_command@entry=false)
|
at /home/mleich/10.5/sql/sql_parse.cc:1866
|
…
|
(rr) break trans_commit
|
Breakpoint 2 at 0x55c67c94c6b7: file /home/mleich/10.5/sql/transaction.cc, line 256.
|
(rr) c
|
Continuing.
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) when
|
Current event: 50104
|
(rr) c
|
Continuing.
|
[Switching to Thread 26146.39212]
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) when
|
Current event: 50421
|
(rr) c
|
Continuing.
|
Continuing.
|
[Switching to Thread 26146.38140]
|
|
Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) when
|
Current event: 50475
|
(rr) c
|
Continuing.
|
[Switching to Thread 26146.39212]
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
(rr) command 2
|
Type commands for breakpoint(s) 2, one per line.
|
End with a line saying just "end".
|
>when
|
>end
|
(rr) c
|
Continuing.
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 50613
|
(rr) c
|
Continuing.
|
[Switching to Thread 26146.38140]
|
|
Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51061
|
(rr) c
|
Continuing.
|
[Switching to Thread 26146.39212]
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51084
|
(rr)
|
Continuing.
|
[Switching to Thread 26146.38140]
|
|
Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51248
|
(rr)
|
Continuing.
|
[Switching to Thread 26146.39212]
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51299
|
(rr)
|
Continuing.
|
[Switching to Thread 26146.38140]
|
|
Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51463
|
(rr)
|
Continuing.
|
[Switching to Thread 26146.39212]
|
|
Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51516
|
(rr)
|
Continuing.
|
[Switching to Thread 26146.38140]
|
|
Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
|
at /home/mleich/10.5/sql/transaction.cc:256
|
256 {
|
Current event: 51578
|
If you search for Thread 14 in the above, you can see that the XA START 'xid164' was followed by multiple incorrect COMMIT. The invalid MDL release should have occurred already at event 50475, and not 51578, like I claimed earlier. Either way, the ha_innobase::prepare_inplace_alter_table() is able to proceed at event 55952 even though the conflicting transaction is still active inside InnoDB.
|
|
Sorry for confusion
|
The rr trace offered belongs to some different RQG run than the run mentioned on top (002082.log).
|
Both runs show "Warning InnoDB: Index 'Marvão_idx3' contains .... entries, should be ..." but
|
- in the first run (002082.log) the grammar does not generate XA commands
|
- in the run with rr trace XA commands are generated.
|
I will make new rr traces with the first grammar.
|
|
|
|
For several weeks, we had lost access to the environment where this bug could be reproduced. We not only need to reproduce this extremely rare scenario, but we also need to reproduce it while the server is running under rr record, so that the execution trace can be analyzed. It turns out that https://rr-project.org/ does not work in most virtual machine environments.
|
|
I analyzed another case of this on 10.5 9ef36faa614528b66e0a6ba58a4c40b246658b83. The primary key index contains 32 records, while the secondary index contains 33 records. Unfortunately, the secondary index is defined on a virtual column, and therefore this corruption should be a duplicate of one of the bugs related to indexed virtual columns (MDEV-5800).
The contents of the secondary index (col_int_g,col1) is as follows. None of the records are delete-marked:
bebebebe,0004
|
bebebebe,0006
|
bebebebe,0007
|
bebebebe,0011
|
bebebebe,001f
|
bebebebe,0023
|
bebebebe,0032
|
bebebebe,003a
|
bebebebe,0092
|
bebebebe,00b3
|
bebebebe,00d0
|
bebebebe,01c7
|
bebebebe,0384
|
bebebebe,03c8
|
bebebebe,0746
|
bebebebe,077f
|
bebebebe,082c
|
bebebebe,0940
|
bebebebe,098c
|
bebebebe,0c77
|
bebebebe,0d53
|
bebebebe,0de2
|
bebebebe,0e06
|
0008,0008
|
0033,0033
|
003e,003e
|
00f7,00f8
|
0183,0183
|
05c1,05c1
|
0b9f,0b9f
|
0ba9,0ba9
|
0e05,0e06
|
0fa8,0fa9
|
The 23 values with the negative INT numbers 0xbebebebe (0x3ebebebe in the InnoDB table) are somewhat suspicious. InnoDB used to use the constant 0xbe in the file mem0dbg.cc, but that code was removed in MySQL 5.7.4 and MariaDB 10.2.2.
The table was created and altered to reach the following final definition:
CREATE TABLE t1 (
|
col1 int(11) NOT NULL,
|
col2 int(11) DEFAULT NULL,
|
col_int int(11) DEFAULT NULL,
|
col_string char(19) DEFAULT NULL,
|
col_varchar varchar(500) DEFAULT NULL,
|
col_text text DEFAULT NULL,
|
col_int_g int(11) GENERATED ALWAYS AS (col_int) VIRTUAL,
|
col_string_g char(13) GENERATED ALWAYS AS (substr(col_string,4,13)) VIRTUAL,
|
col_text_g text GENERATED ALWAYS AS (substr(col_text,1,499)) VIRTUAL,
|
col2_copy int(11) DEFAULT NULL,
|
PRIMARY KEY (col1),
|
UNIQUE KEY marvão_uidx2 (col_int),
|
UNIQUE KEY uidx3 (col_int,col2),
|
UNIQUE KEY uidx2 (col_int),
|
UNIQUE KEY uidx1 (col_text(9)),
|
KEY idx1 (col_text(9),col1),
|
KEY idx3 (col1),
|
KEY MArvão_idx3 (col_int_g),
|
KEY idx2 (col_text(9),col_int)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
|
So, we should have col_int_g=col1, which is the primary key. But in the primary key index the smallest value of col1 is 4, not -0x3ebebebe. After the 23 first tuples of the secondary index we also have a few other records where the two columns differ (only by 1).
I think that we need a test case that does not include indexed virtual columns.
nikitamalyavin, which bug on virtual columns would you think that this case is duplicating?
|
|
The following rr trace shows a case where
|
CHECK TABLE `test`.`table100_innodb_key_pk_parts_2_int_autoinc` EXTENDED harvests
|
'test.table100_innodb_key_pk_parts_2_int_autoinc', 'check', 'error', 'Partition p0 returned error'
|
'test.table100_innodb_key_pk_parts_2_int_autoinc', 'check', 'Warning', 'InnoDB: Index \'col_varchar_255_ucs2_key\' contains 79 entries, should be 80.'
|
'test.table100_innodb_key_pk_parts_2_int_autoinc', 'check', 'error', 'Corrupt'
|
Workflow:
|
1. Start DB server and create a few tables.
|
Generated columns or foreign keys are not involved. But one table is partitioned.
|
innodb-buffer-pool-size=5M is exceptional small.
|
2. One session runs DML only
|
3. At some point of time (2. is ongoing) the server process gets killed with SIGKILL
|
4. Restart and check the consistency of table via SELECTs and CHECK TABLE extended
|
|
HEAD, origin/bb-10.5-MDEV-23456 20a8a5ba50eae3d0ae92da202f5fada89409d072 2020-08-28T22:54:26+05:30
|
compiled with ASAN and "Og".
|
|
RQG call
|
perl rqg.pl \
|
--grammar=conf/engines/many_indexes.yy \
|
--gendata=conf/engines/many_indexes.zz \
|
--reporters=CrashRecovery1 \
|
--mysqld=--innodb_use_native_aio=1 \
|
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
|
--mysqld=--loose-idle_write_transaction_timeout=0 \
|
--mysqld=--loose-idle_transaction_timeout=0 \
|
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
|
--mysqld=--connect_timeout=60 \
|
--mysqld=--interactive_timeout=28800 \
|
--mysqld=--slave_net_timeout=60 \
|
--mysqld=--net_read_timeout=30 \
|
--mysqld=--net_write_timeout=60 \
|
--mysqld=--loose-table_lock_wait_timeout=50 \
|
--mysqld=--wait_timeout=28800 \
|
--mysqld=--lock-wait-timeout=86400 \
|
--mysqld=--innodb-lock-wait-timeout=50 \
|
--no-mask \
|
--queries=10000000 \
|
--seed=random \
|
--reporters=Backtrace \
|
--reporters=ErrorLog \
|
--reporters=Deadlock1 \
|
--validators=None \
|
--mysqld=--log_output=none \
|
--mysqld=--log-bin \
|
--mysqld=--log_bin_trust_function_creators=1 \
|
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
|
--engine=InnoDB \
|
--restart_timeout=120 \
|
--duration=300 \
|
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
|
--mysqld=--loose-innodb-sync-debug \
|
--mysqld=--innodb_stats_persistent=off \
|
--mysqld=--loose-max-statement-time=30 \
|
--threads=1 \
|
--mysqld=--innodb_page_size=4K \
|
--mysqld=--innodb-buffer-pool-size=5M \
|
--duration=300 \
|
--no_mask \
|
--workdir=<local settings> \
|
--vardir=<local settings> \
|
--mtr-build-thread=<local settings> \
|
--basedir1=<local settings> \
|
--script_debug=_nix_ \
|
--rr=Server \
|
--rr_options=--chaos
|
|
sdp:/RQG/storage/1598869940/tmp/dev/shm/vardir/1598869940/28/1/rr
|
_RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1
|
|
RQG log is /RQG/storage/1598869940/000094.log
|
|
|
|
|
I filed MDEV-23632 for the 0xbebebebe problem, which I was able to repeat on 10.3 as well. I made a mistake in my earlier comment: col_int_g is an alias for col_int, not col1.
|
|
The trace with ASAN includes crash recovery. As far as I can tell, the table probably may have been corrupted before recovery. In the clustered index, CHECK TABLE accessed 80 records. I dumped the values of the PRIMARY KEY column (an INT, ranging from 0x19 to 0xc0) and the indexed UCS2 column. I repeated the same exercise for the 79 visited records of the secondary index.
It is possible that I made a mistake when dumping the indexed column in the primary key index, because it looks like I would always get a mismatch for the column value. Unfortunately, InnoDB does not implement CHECK TABLE…EXTENDED that would cross-check the keys in each index. It only counts the records.
Because the trace started with a crash recovery, it is possible that the table was corrupted already before recovery. rr replay of the trace before the server was killed could not be replayed; diversion was reported.
Would it be possible to repeat this with fewer columns and indexes, or without crash recovery? Theoretically, one secondary index could be enough. And we should not need many non-indexed columns in the table.
|
|
marko, can't remember the task i am immediately aware of, that could duplicate this one. The value can be left broken in the case of computation failure, which is covered by MDEV-20618 fix. I'd check this one first (especially since you have same-looking values)
|
|
The RQG simplifier is running.
I expect to have some heavy simplified replay test soon.
|
|
Also the newest rr replay trace is flagging corruption in an index on a UCS2 column, apparently without involving server restart, which is good news. mleich says that with UTF-8, the bug would take longer to reproduce.
It is a little challenging to debug the trace, because both the primary key and the secondary key consist of multiple pages even though the number of rows is relatively small. There probably are unnecessary columns and indexes (5 secondary indexes in addition to the failing one). But, I should be able to analyze the available trace. It will just take some extra effort, because I cannot comfortably look at a small number of page dumps but will have to define some breakpoints and auto-display expressions to dump the interesting data.
|
|
The uploaded MDEV-22924.tgz contains some MTR + shellscript based brute force replay test
|
- ./mysqltest_background.sh
|
- suite/innodb/t/MDEV-22924.test
|
- suite/innodb/t/MDEV-22924-master.opt
|
A replay attempt could look like
|
./mtr --mem -rr --rr-arg="--chaos" MDEV-22924
|
You might need
|
- several replay attempts or
|
- to raise the value assigned to $MY_RUNTIME in suite/innodb/t/MDEV-22924.test.
|
Some warnings regarding the content of suite/innodb/t/MDEV-22924.test:
|
I am rather sure that certain statements (there is one with wrong syntax) or some
|
components of the table (column or a key) are most probably not required for replay.
|
But when removing other statements or components usually the runtime required
|
till replay raised serious.
|
|
|
|
I am debugging an rr replay trace that was generated with MDEV-22924.tgz . I observed that while CHECK TABLE is accessing the PRIMARY KEY index, an UPDATE is accessing a secondary index of the same table. Due to this, my standard practice of saving the page dumps at the end of the run did not produce good results. Instead, I ended up setting a breakpoint in buf_page_get_gen() to extract the pages as they were when CHECK TABLE accessed them.
The execution uses innodb_page_size=4k. The clustered index comprises 3 leaf pages, and the secondary index comprises 1 page:
|
PRIMARY KEY(pk)
|
(pk,DB_TRX_ID,DB_ROLL_PTR,col_char_255_utf8_key,col_longtext_ucs2_key,…)=
|
page 12:
|
(1,0x7d9,update,'moderate','',…),
|
(2,0x7d9,update,'suppose','',…),
|
(3,0x7d9,update,'mfrjdutx'…,'',…),
|
page 10:
|
(4,0x7d9,update,'rjdutxv','',…),
|
(5,0x7d9,update,'utxvbchboy'…,'',…),
|
page 11:
|
(6,0x7d9,update,'i','',…)
|
|
KEY(col_longtext_ucs2_key)
|
(col_longtext_ucs2_key,pk)=
|
page 5:
|
('',1),('',2),('',3),('',4),('',5),('',6),
|
delete-marked(repeat(0x744,242),1),
|
delete-marked(repeat(0x744,242),2),
|
… (15 records total)
|
The problem is that CHECK TABLE is only counting 5 records in the secondary index, instead of the expected number 6.
It is entirely normal that the secondary index contains extra records; this is how MVCC works until we implement MDEV-17598. Next, I will check in the execution trace each access of those 6 first secondary index records to determine which record is not being counted, and why.
|
|
All records in the clustered index page carry DB_TRX_ID=2009=0x7d9. Also the PAGE_MAX_TRX_ID in the secondary index is at that value. The next available transaction identifier (trx_sys.m_max_trx_id) is 2012. The read view of the CHECK TABLE excludes transaction 2009.
Transaction 2009 modified all 6 rows of the table by executing the statement
UPDATE table100_innodb_int_autoinc SET col_longtext_ucs2_key = REPEAT( 0xA, 0 ) WHERE col_char_255_utf8_key BETWEEN REPEAT( 0xF5B, 224 ) AND REPEAT( 0xB2, 140 );
|
The transaction was started at the event time (when) 4387662 and committed at 4257184. At the time the transaction started, the clustered index leaf pages were 13,14,10,11.
The CHECK TABLE had started executing on the primary key at when 4255465. At the time the UPDATE transaction is committing, it was executing inside btr_validate_index(), trying to access our secondary index page 5.
It looks like we must ignore the 6 first records in the secondary index page (which must be excluded from the read view) and concentrate on the delete-marked ones, to see which record is missing and why. Here are all 15 secondary index records that were present when CHECK TABLE counted the records:
|
KEY(col_longtext_ucs2_key)
|
(col_longtext_ucs2_key,pk)=
|
page 5:
|
('',1),('',2),('',3),('',4),('',5),('',6),
|
delete-marked(repeat(0x744,242),1),
|
delete-marked(repeat(0x744,242),2),
|
delete-marked(repeat(0x744,242),3),
|
delete-marked(repeat(0x744,242),4),
|
delete-marked(repeat(0x744,242),5),
|
delete-marked(repeat(0x744,242),6),
|
delete-marked(repeat(0xf76,85),1),
|
delete-marked(repeat(0xf76,85),3),
|
delete-marked(repeat(0xf76,85),6),
|
I noticed that also purge is running concurrently with the CHECK TABLE. That should explain why there exist only 3 keys with the value 0xf76.
I hope to complete the analysis of this trace next week. It is rather challenging due to the amount of concurrency, including page splits and merges in the clustered index.
|
|
Transaction 2009 updated the clustered index records to col_longtest_ucs2_key='' at the following event times, with the following preceding contents of the record:
| when |
pk |
DB_TRX_ID |
col_longtest_ucs2_key |
| 4243126 |
6 |
0x7d5 |
repeat(0xf76,85) |
| 4246790 |
3 |
0x7d5 |
repeat(0xf76,85) |
| 4247019 |
1 |
0x7d5 |
repeat(0xf76,85) |
| 4251855 |
4 |
0x7d2 |
repeat(0x744,242) |
| 4253252 |
2 |
0x7d2 |
repeat(0x744,242) |
| 4253481 |
5 |
0x7d2 |
repeat(0x744,242) |
This suggests that none of the records that were visible in the secondary index page when CHECK TABLE was executed could actually be purged.
The UPDATE transaction was holding an S-latch on the secondary index page (5) from event time 4254001 to 4256397.
At event time 4254061, a purge thread starts waiting for an X-latch on the secondary index page.
At event time 4255465, the failing CHECK TABLE starts accessing the clustered index.
Between event time 4256397 and 4256882 the purge thread was holding the X-latch on the secondary index page.
Between event time 4256918 and 4257017, the same purge thread holds the secondary index X-latch again.
Between event time 4257031 and 4257048, the UPDATE transaction 2009 holds the X-latch on the secondary index page, so that it can perform the first half of the last update, to delete-mark the record (repeat(0x744,242),5).
Between event time 4257072 and 4257117, the UPDATE transaction 2009 holds the X-latch on the secondary index page, so that it can perform the second half of the last update, to insert the record ('',5).
At event time 4257186, the UPDATE transaction 2009 commits.
Between event time 4257264 and 4257322, the failing CHECK TABLE holds an SX-latch on the secondary index page.
At event time 4257336, the failing CHECK TABLE acquires an S-latch on the secondary index page. This is the first time it starts counting records in the secondary index.
At event time 4257868, a purge thread requests an exclusive latch on the secondary index page, while our CHECK TABLE is holding an S-latch.
At event time 4258247, the failing CHECK TABLE releases the S-latch on the secondary index page.
At event time 4258372, purge releases the exclusive page latch.
Between event time 4258406 and 4258406, the same purge thread holds an exclusive page latch.
Between event time 4259276 and 4259435, the same purge thread holds an exclusive page latch.
Between event time 4259469 and 4259542, the same purge thread holds an exclusive page latch.
Between event time 4259576 and 4259649, the same purge thread holds an exclusive page latch.
Between event time 4259683 and 4260375, the same purge thread holds an exclusive page latch.
Between event time 4260409 and 4260489, the same purge thread holds an exclusive page latch.
Between event time 4260523 and 4260596, the same purge thread holds an exclusive page latch.
Between event time 4260630 and 4260677, the same purge thread holds an exclusive page latch.
Between event time 4260711 and 4260758, the same purge thread holds an exclusive page latch.
Between event time 4260792 and 4260847, the same purge thread holds an exclusive page latch.
Between event time 4260881 and 4260928, the same purge thread holds an exclusive page latch.
Between event time 4260962 and 4261009, the same purge thread holds an exclusive page latch.
At event time 4261016, our failing CHECK TABLE acquires an S-latch on the secondary index page.
At event time 4261047, an UPDATE transaction 2014 starts to wait for an exclusive page latch.
At event time 4261383, our failing CHECK TABLE releases the S-latch.
At event time 4261431, the UPDATE transaction 2014 releases the exclusive page latch.
At event time 4261445, our failing CHECK TABLE acquires an S-latch on the secondary index page.
At event time 4261457, our failing CHECK TABLE acquires an S-latch on the secondary index page.
Between event time 4261451 and 4261472, the UPDATE transaction 2014 holds the exclusive page latch.
Between event time 4261586 and 4261635, a different purge thread holds an exclusive page latch.
…
At event time 4263975, we report the count mismatch.
Because there is so much concurrent activity, it seems infeasible to track down all changes to the page between the commit of the transaction 2009 and the CHECK TABLE. Next, I will determine which ‘raw’ records the CHECK TABLE is observing in the secondary index, and which of them are being miscounted.
|
|
The following record is unexpectedly determined as not visible:
Thread 4 hit Breakpoint 15, row_search_mvcc (buf=buf@entry=0x621003b4d518 '\276' <repeats 200 times>..., mode=<optimized out>, mode@entry=PAGE_CUR_G, prebuilt=prebuilt@entry=0x62100013b1a0,
|
match_mode=match_mode@entry=0, direction=direction@entry=0) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:5253
|
5253 ut_ad(prebuilt->select_lock_type == LOCK_NONE
|
9: /x *prebuilt.pcur.btr_cur.page_cur.rec@4 = {0x7, 0x44, 0x7, 0x44}
|
11: /x prebuilt.pcur.btr_cur.page_cur.rec[480]@8 = {0x7, 0x44, 0x7, 0x44, 0x80, 0x0, 0x0, 0x2}
|
(rr) when
|
Current event: 4262501
|
The clustered index page 12 was looked up at event time 4262479, and it does contain the key (pk=2). It seems that incorrect caching at Row_sel_get_clust_rec_for_mysql::operator() might be to blame for the mismatch.
At event time 4261371, Row_sel_get_clust_rec_for_mysql::cached_clust_rec had been updated:
Thread 4 hit Hardware watchpoint 17: -location cached_clust_rec
|
|
Old value = (const rec_t *) 0x5acc21d84083 <incomplete sequence \354\200>
|
New value = (const rec_t *) 0x5acc21a18083 "\200"
|
Row_sel_get_clust_rec_for_mysql::operator() (this=this@entry=0x27ed5a27a540, prebuilt=prebuilt@entry=0x62100013b1a0, sec_index=sec_index@entry=0x61700006b7a0, rec=<optimized out>, thr=thr@entry=0x62100013b980,
|
out_rec=out_rec@entry=0x27ed5a27a040, offsets=<optimized out>, offset_heap=<optimized out>, vrow=<optimized out>, mtr=<optimized out>) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:3418
|
3418 cached_old_vers = old_vers;
|
(rr) when
|
Current event: 4261371
|
(rr) p/x *clust_rec@20
|
$88 = {0x80, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd9, 0x4c, 0x0, 0x0, 0x6, 0x31, 0x8, 0xc6, 0x6d, 0x6f, 0x64}
|
Note: (pk,DB_TRX_ID)=(1,2009), not (2,2009).
A little later, we get to check whether the secondary index record with pk=2 should exist in our read view:
(rr)
|
Row_sel_get_clust_rec_for_mysql::operator() (this=this@entry=0x27ed5a27a540, prebuilt=prebuilt@entry=0x62100013b1a0, sec_index=sec_index@entry=0x61700006b7a0, rec=<optimized out>, thr=thr@entry=0x62100013b980,
|
out_rec=out_rec@entry=0x27ed5a27a040, offsets=<optimized out>, offset_heap=<optimized out>, vrow=<optimized out>, mtr=<optimized out>) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:3258
|
3258 clust_rec = btr_pcur_get_rec(prebuilt->clust_pcur);
|
(rr)
|
3260 prebuilt->clust_pcur->trx_if_known = trx;
|
(rr) p clust_rec
|
$91 = (const rec_t *) 0x5acc21a18083 "\200"
|
(rr) when
|
Current event: 4262489
|
(rr) p/x *clust_rec@20
|
$92 = {0x80, 0x0, 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd9, 0x4c, 0x0, 0x0, 0x6, 0x32, 0x6, 0xce, 0x73, 0x75, 0x70}
|
Note: clust_rec is at the same pointer value, but it now points to a different record, (pk,DB_TRX_ID)=(2,2009), presumably because there has been a lot of purge and UPDATE activity that caused the records to move around.
A little later, we hit the bug:
3405 if (clust_rec != cached_clust_rec) {
|
(rr)
|
3420 err = DB_SUCCESS;
|
(rr)
|
3421 old_vers = cached_old_vers;
|
(rr) p/x *cached_old_vers@20
|
$93 = {0x80, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd5, 0x49, 0x0, 0x0, 0x6, 0x30, 0x0, 0x38, 0x6d, 0x6f, 0x64}
|
(rr) when
|
Current event: 4262489
|
So, we wrongly determine the old version of the record pk=2 to be something with pk=1, and subsequently determine that the secondary index record does not exist in the read view.
This incorrect logic, which was introduced in MDEV-20301, would likely have been caught much earlier if we had had a debug assertion to check that the PRIMARY KEY values are equal.
The bug seems to affect MVCC reads of secondary index records only. The MDEV-20301 ‘optimization’ was not applied to checks for implicit locks.
|
|
We might try to rescue the MDEV-20301 logic by checking that the FIL_PAGE_LSN of the page is unchanged. (Note that before MDEV-21132 in 10.5.0, the latest value of FIL_PAGE_LSN could be stored in newest_lsn, and hence that field must be checked as well.) We should also add a debug assertion that the (pk,DB_TRX_ID,DB_ROLL_PTR) match between clust_rec and cached_clust_rec, and that the (pk) matches with cached_old_vers.
|
|
This bug is blocking the testing effort for catching other corruption bugs. I will try to address this myself as soon as possible.
|