MariaDB Server
MDEV-22924

Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.

Details

    Description

      Index corruption found during RQG testing.
      A CHECK TABLE t2 EXTENDED harvested
      test.t2 check Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.
      and the server error log contains
      [rr 76197 241886]2020-06-16 18:31:54 33 [ERROR] InnoDB: Index `Marvão_idx3` of table `test`.`t2` is corrupted
      [rr 76197 241890]2020-06-16 18:31:54 33 [Note] InnoDB: Load corrupted index `Marvão_idx3` of table `test`.`t2`
       
      Work flow:
      1. Start the server and generate some initial data.
      2. Several concurrent sessions run a DDL/DML mix
           and sometimes CHECK TABLE EXTENDED (a schematic example follows below).
      Please note that critical events like killing the server process followed by restart
      with crash recovery are not involved.
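
      A schematic example of that statement mix (purely illustrative; the actual statements
      come from the RQG grammar listed below, and the table, column, and index names here
      are only assumed to match the test tables):

      -- one session runs DDL ...
      ALTER TABLE t2 ADD INDEX `Marvão_idx3` (col_string);
      -- ... while other sessions run DML ...
      UPDATE t2 SET col_string = REPEAT('3',10) WHERE col1 = 2;
      INSERT INTO t2 (col1, col_string) VALUES (374, '3333333333');
      -- ... and occasionally the consistency check that produced the warning above:
      CHECK TABLE t2 EXTENDED;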
       
      I have an "rr" trace for the tree
      origin/bb-10.5-marko 8cad9bc9805ff058e8493d92d4915ca437d16b08
      which is based on the current 10.5 but contains some non-final changes by Marko.
      According to Marko, the failure cannot be caused by his changes.
       
      My test machinery has also seen the same problem on
      10.3.24 origin/10.3 294ac1fbab306e97dad3588a00e3527f2056e7e4 2020-05-19T15:27:22+00:00
      but the corresponding details are lost,
      so it is unknown whether that happened during concurrent load or after a crash with recovery.
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      origin/experimental 611a31e38a3bc157392c0748c95991b6a248bb3b 2020-06-09T15:31:46+02:00
       
      perl rqg.pl \
      --grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
      --gendata=conf/mariadb/table_stress.zz \
      --gendata_sql=conf/mariadb/table_stress.sql \
      --mysqld=--innodb_use_native_aio=1 \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-max-statement-time=30 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --mysqld=--innodb_stats_persistent=off \
      --threads=33 \
      --mysqld=--innodb_page_size=4K \
      --mysqld=--innodb-buffer-pool-size=256M \
      --duration=300 \
      --no_mask \
      ... certain local settings ...
       
      If you look into the attached RQG protocol 002082.log and the RQG call mentioned there, please be aware that "--reporters=CrashRecovery1" only tells
      that running a work flow invoking kill+restart was planned.
      The data corruption was hit before the RQG reporter "CrashRecovery1"
      would have killed the server process.
      

      Activity

            mleich Matthias Leich created issue -

            marko Marko Mäkelä added a comment -

            In the trace that I analyzed, both the clustered index and the secondary index consist of a single page each. There are 7 records in the clustered index (PRIMARY KEY(col1)) and 4 in the secondary index.
            Secondary index:(col1,col_string)=
            (1,'1111111111'),
            (2,'2222222222'),
            (332,'3333333333'),
            (7,'1111111111').
            None of the records are delete-marked. The first record was inserted last in the record heap; the others were in order. The PAGE_FREE list is empty. All this would suggest that the index was created afterwards and not at table creation.

            Clustered index: (col1,DB_TRX_ID,DB_ROLL_PTR,col2,col_int,col_string,col_text)=
            (1,0,1<<55,1,1,'1111111111',blob),
            (2,0,1<<55,2,2,'2222222222',blob),
            (8,0,1<<55,7,8,'8888888888',blob),
            (40,0,1<<55,39,40,'4444444444',blob),
            (332,0,1<<55,332,324,'3333333333',blob),
            (374,0,1<<55,1,373,'3333333333',blob),
            (1962,0,1<<55,1,324,'1111111111',blob).
            Again, no records are delete-marked, and the PAGE_FREE list is empty.
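
            To make the record dumps easier to read, here is a hypothetical table definition that would be consistent with the columns listed above. The real definition comes from the RQG grammar and is not shown in this issue; in particular, the column order of `Marvão_idx3` is only a guess based on the field order in the dump:

            -- Hypothetical reconstruction, for illustration only.
            CREATE TABLE t2 (
              col1 INT NOT NULL,
              col2 INT,
              col_int INT,
              col_string CHAR(10),
              col_text TEXT,
              PRIMARY KEY (col1),
              KEY `Marvão_idx3` (col1, col_string)  -- key order assumed from the record dump
            ) ENGINE=InnoDB;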

            There are not only missing records in the secondary index, but also an extra record col1=7, even though all history has been purged from the clustered index (DB_TRX_ID=0 was reset by MDEV-12288). Extra records in the secondary index are only allowed when they correspond to an old version of the clustered index records. Such records would also be delete-marked in the secondary index, unless the DELETE operation is in progress.

            In other words, this corruption could have manifested itself as MDEV-9663/MDEV-14643 messages as well. Unfortunately, the EXTENDED attribute of CHECK TABLE is being ignored by InnoDB. It would be good if it actually did what is requested: check that there is a 1:1 correspondence between the secondary index and clustered index records.
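
            As a rough SQL-level approximation of that missing check (table and index names taken from this report; CHECK TABLE compares the entry counts internally):

            -- Counting through the clustered index and through the secondary index
            -- should give identical results on a healthy table; on the corrupted
            -- table they differ, which is what the CHECK TABLE warning reports.
            SELECT COUNT(*) FROM t2 FORCE INDEX (PRIMARY);
            SELECT COUNT(*) FROM t2 FORCE INDEX (`Marvão_idx3`);
            -- A real EXTENDED check would additionally verify the 1:1 record
            -- correspondence, not just the counts.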

            Watchpoints on PAGE_N_RECS on the two pages will let us know what exactly happened:

            1. PRIMARY KEY(col1)=8 is inserted
            2. ROLLBACK will empty the table
            3. PRIMARY KEY(col1)=1486 is inserted
            4. ROLLBACK will empty the table
            5. PRIMARY KEY(col1)=332 is inserted
            6. PRIMARY KEY(col1)=1962 is inserted
            7. PRIMARY KEY(col1)=374 is inserted
            8. PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is inserted in a new transaction
            9. PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is deleted by ROLLBACK
            10. At event 51548, PRIMARY KEY(col1,DB_TRX_ID)=(0,0x28) is inserted
            11. PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is deleted by ROLLBACK
            12. At event 51578, COMMIT invokes MDL_context::release_transactional_locks() for transaction 0x28 even though it is still active!
            13. At event 55952, ha_innobase::prepare_inplace_alter_table() is invoked, even though transaction 0x28 is still active with 11 undo log records
            14. At event 56196, the secondary index page is initialized with 3 records.

            I think that we have seen enough. Here is the stack trace for what I think is the smoking gun:

            10.5 c89366866bca2df46b0592719a1f6b6dabf470cba

            Thread 1 hit Breakpoint 6, MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
            3043	{
            (rr) when
            Current event: 51578
            (rr) bt
            #0  MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
            #1  0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
            #2  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
            #3  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT  /* E_R Thread2 QNO 99 CON_ID 15 */ ", 
                packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
            (rr) up
            #1  0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
            5585	    thd->mdl_context.release_transactional_locks();
            (rr) list
            5580	                     lex->tx_chain != TVL_NO));
            5581	    bool tx_release= (lex->tx_release == TVL_YES ||
            5582	                      (thd->variables.completion_type == 2 &&
            5583	                       lex->tx_release != TVL_NO));
            5584	    bool commit_failed= trans_commit(thd);
            5585	    thd->mdl_context.release_transactional_locks();
            5586	    if (commit_failed)
            5587	    {
            5588	      WSREP_DEBUG("COMMIT failed, MDL released: %lld",
            5589	                  (longlong) thd->thread_id);
            (rr) i lo
            tx_chain = false
            tx_release = false
            commit_failed = true

            This looks like the MDEV-15532/MDEV-22076 scenario: COMMIT is refused for an XA transaction, yet we wrongly release the MDL that should keep protecting the transaction.
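
            A minimal two-session sketch of that pattern (statement texts and names are hypothetical; only the sequence matters):

            -- connection 1: explicit XA transaction with some DML
            XA START 'xid164';
            INSERT INTO t2 (col1, col_string) VALUES (7, '1111111111');
            COMMIT;   -- refused (XAER_RMFAIL) because an explicit XA transaction is active,
                      -- but the transactional MDL is wrongly released (MDEV-15532/MDEV-22076)

            -- connection 2: DDL that the released MDL should still have blocked
            ALTER TABLE t2 ADD INDEX `Marvão_idx3` (col1, col_string), ALGORITHM=INPLACE;
            -- proceeds even though connection 1's changes are still uncommitted in InnoDB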

            mleich, even though the table has a nice small number of records, I do not think that it is useful to test the combination of XA+DDL+DML before those bugs have been addressed.

            Please try to reproduce this kind of corruption without XA, or DML with XA (but no concurrent DDL). I do not think that there are many users of XA yet, because it was practically broken before MDEV-742 was addressed in 10.5.

            marko Marko Mäkelä added a comment -

            mleich, I am sorry for blaming you. We do actually have XA transactions, but they are internal XA transactions for the binlog. Here are a couple examples from the same rr replay trace:

            Thread 3 hit Breakpoint 1, innobase_xa_prepare (hton=0x615000001e98, thd=0x62b0000bd218, prepare_trx=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:17190
            17190	{
            (rr) bt
            #0  innobase_xa_prepare (hton=0x615000001e98, thd=0x62b0000bd218, prepare_trx=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:17190
            #1  0x000055c67cd50c8b in prepare_or_error (ht=ht@entry=0x615000001e98, thd=thd@entry=0x62b0000bd218, all=all@entry=false) at /home/mleich/10.5/sql/handler.cc:1364
            #2  0x000055c67cd54297 in ha_commit_trans (thd=thd@entry=0x62b0000bd218, all=all@entry=false) at /home/mleich/10.5/sql/handler.cc:1742
            #3  0x000055c67c94e010 in trans_commit_stmt (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:472
            #4  0x000055c67c48fee7 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:6011
            #5  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
            #6  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, 
                packet=packet@entry=0x629000ae1219 " INSERT INTO t1 (col1,col2, col_int, col_string, col_text) VALUES ( 8, 8, 8 - 1, REPEAT(SUBSTR(CAST( 8 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 8 AS CHAR),1,1), @fill_amount) ) /* E_R Thread2 QNO 36 CO"..., packet_length=packet_length@entry=211, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
            #7  0x000055c67c4676ab in do_command (thd=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:1347
            #8  0x000055c67c907775 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x60800000cf38, put_in_cache=put_in_cache@entry=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
            #9  0x000055c67c908a19 in handle_one_connection (arg=arg@entry=0x60800000cf38) at /home/mleich/10.5/sql/sql_connect.cc:1313
            #10 0x000055c67d682162 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
            #11 0x00007ffd78caf6db in start_thread (arg=0x443e7feb0300) at pthread_create.c:463
            #12 0x00001f030f433a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr) when
            Current event: 48560
            (rr) continue
            Thread 14 hit Breakpoint 5, trans_commit (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 51578
            (rr) bt
            #0  trans_commit (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:256
            #1  0x000055c67c48c8af in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5584
            #2  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
            #3  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT  /* E_R Thread2 QNO 99 CON_ID 15 */ ", 
                packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
            #4  0x000055c67c4676ab in do_command (thd=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:1347
            #5  0x000055c67c907775 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x60800000cf38, put_in_cache=put_in_cache@entry=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
            #6  0x000055c67c908a19 in handle_one_connection (arg=arg@entry=0x60800000cf38) at /home/mleich/10.5/sql/sql_connect.cc:1313
            #7  0x000055c67d682162 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
            #8  0x00007ffd78caf6db in start_thread (arg=0x443e7feb0300) at pthread_create.c:463
            #9  0x00001f030f433a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            The trans_commit() fails because trans_check() fails due to thd->transaction->xid_state.is_explicit_XA().
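
            In SQL terms, the failing check corresponds to the following expected behaviour (assuming standard MariaDB XA semantics; the exact error text may vary by version):

            XA START 'xid96';
            INSERT INTO t1 (col1) VALUES (1);
            COMMIT;
            -- expected to fail with ER_XAER_RMFAIL ("The command cannot be executed when
            -- global transaction is in the ACTIVE state"), which is exactly why
            -- trans_check() makes trans_commit() return TRUE here.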

            I find it hard to follow the control flow. Watchpoints are not useful, because the flags seem to be "disabled by default". For example, trans->no_2pc was never set, and I would expect the same to hold for the thd->transaction->xid_state.xid_cache_element.

            If this corruption is not repeatable with --skip-log-bin, that would explain why mikhail.izioumtchenko@booking.com never repeated this corruption for MySQL when we were employed by Oracle.

            marko Marko Mäkelä added a comment -

            Thanks to sujatha.sivakumar, I verified with one more breakpoint that the failure looks like a duplicate of the MDEV-15532 scenario (explicit use of XA START, with a wrong COMMIT). Sorry for the false alarm.
            mleich, please try again without XA.

            Thread 2 hit Breakpoint 1, xid_cache_insert (thd=thd@entry=0x62b0000cb218, 
                xid_state=0x62b0000ce838, xid=0x62b0000d2350)
                at /home/mleich/10.5/sql/xa.cc:286
            286	{
            (rr) when
            Current event: 49759
            (rr) bt
            #0  xid_cache_insert (thd=thd@entry=0x62b0000cb218, xid_state=0x62b0000ce838, 
                xid=0x62b0000d2350) at /home/mleich/10.5/sql/xa.cc:286
            #1  0x000055c67cb683d0 in trans_xa_start (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/xa.cc:453
            #2  0x000055c67c48e1fa in mysql_execute_command (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/sql_parse.cc:5793
            #3  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000cb218, 
                rawbuf=<optimized out>, length=<optimized out>, 
                parser_state=parser_state@entry=0x64a36dd61ff0, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:7993
            #4  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, 
                thd=thd@entry=0x62b0000cb218, 
                packet=packet@entry=0x629000ac3219 "XA START 'xid96'  /* E_R Thread1 QNO 40 CON_ID 14 */ ", packet_length=packet_length@entry=53, 
            (rr) break trans_commit
            Breakpoint 2 at 0x55c67c94c6b7: file /home/mleich/10.5/sql/transaction.cc, line 256.
            (rr) c
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 49854
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 49896
            (rr) bt
            #0  trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            #1  0x000055c67c48c8af in mysql_execute_command (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/sql_parse.cc:5584
            #2  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000cb218, 
                rawbuf=<optimized out>, length=<optimized out>, 
                parser_state=parser_state@entry=0x64a36dd61ff0, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:7993
            #3  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, 
                thd=thd@entry=0x62b0000cb218, 
                packet=packet@entry=0x629000ac3219 " COMMIT  /* E_R Thread1 QNO 44 CON_ID 14 */ ", packet_length=packet_length@entry=44, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:1866
            (rr) n
            258	  DBUG_ENTER("trans_commit");
            (rr) 
            260	  if (trans_check(thd))
            (rr) 
            261	    DBUG_RETURN(TRUE);
            

            According to this, the thread first executed XA START and then the same thread attempted to execute COMMIT instead of an XA statement. This is exactly the MDEV-15532 scenario.
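
            For comparison, the XA-conformant way for that thread to finish its transaction would have been the standard MariaDB XA statement sequence, shown here only to contrast with the plain COMMIT above:

            XA START 'xid96';
            -- ... DML ...
            XA END 'xid96';
            XA PREPARE 'xid96';
            XA COMMIT 'xid96';
            -- or, skipping the separate prepare step:
            -- XA COMMIT 'xid96' ONE PHASE   (issued directly after XA END)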

            marko Marko Mäkelä added a comment - edited

            In my previous comment I posted an earlier XA START;COMMIT pair than the one that involves transaction 0x28. Here is the correct pair, which explains why the COMMIT fails:

            Thread 14 hit Breakpoint 1, xid_cache_insert (thd=thd@entry=0x62b0000bd218, 
                xid_state=0x62b0000c0838, xid=0x62b0000c4350)
                at /home/mleich/10.5/sql/xa.cc:286
            286	{
            (rr) when
            Current event: 50027
            (rr) bt
            #0  xid_cache_insert (thd=thd@entry=0x62b0000bd218, xid_state=0x62b0000c0838, 
                xid=0x62b0000c4350) at /home/mleich/10.5/sql/xa.cc:286
            #1  0x000055c67cb683d0 in trans_xa_start (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/xa.cc:453
            #2  0x000055c67c48e1fa in mysql_execute_command (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/sql_parse.cc:5793
            #3  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, 
                rawbuf=<optimized out>, length=<optimized out>, 
                parser_state=parser_state@entry=0x443e7feadff0, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:7993
            #4  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, 
                thd=thd@entry=0x62b0000bd218, 
                packet=packet@entry=0x629000ae1219 "XA START 'xid164'  /* E_R Thread2 QNO 62 CON_ID 15 */ ", packet_length=packet_length@entry=54, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:1866
            (rr) break trans_commit
            Breakpoint 2 at 0x55c67c94c6b7: file /home/mleich/10.5/sql/transaction.cc, line 256.
            (rr) c
            Continuing.
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 50104
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 50421
            (rr) c
            Continuing.
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 50475
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) command 2
            Type commands for breakpoint(s) 2, one per line.
            End with a line saying just "end".
            >when
            >end
            (rr) c
            Continuing.
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 50613
            (rr) c
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51061
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51084
            (rr) 
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51248
            (rr) 
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51299
            (rr) 
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51463
            (rr) 
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51516
            (rr) 
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51578
            

            If you search for Thread 14 in the above, you can see that the XA START 'xid164' was followed by multiple incorrect COMMIT statements. The invalid MDL release should have occurred already at event 50475, not at event 51578 as I claimed earlier. Either way, ha_innobase::prepare_inplace_alter_table() is able to proceed at event 55952 even though the conflicting transaction is still active inside InnoDB.


            mleich Matthias Leich added a comment -

            Sorry for the confusion.
            The rr trace offered belongs to a different RQG run than the run mentioned on top (002082.log).
            Both runs show "Warning InnoDB: Index 'Marvão_idx3' contains .... entries, should be ..." but
            - in the first run (002082.log) the grammar does not generate XA commands
            - in the run with the rr trace, XA commands are generated.
            I will make new rr traces with the first grammar.
            
            

            marko Marko Mäkelä added a comment -

            For several weeks, we had lost access to the environment where this bug could be reproduced. We not only need to reproduce this extremely rare scenario, but we also need to reproduce it while the server is running under rr record, so that the execution trace can be analyzed. It turns out that https://rr-project.org/ does not work in most virtual machine environments.

            mleich Matthias Leich made changes -
            Labels need_feedback need_rr rr-profile-analyzed need_feedback rr-profile rr-profile-analyzed

            I analyzed another case of this on 10.5 9ef36faa614528b66e0a6ba58a4c40b246658b83. The primary key index contains 32 records, while the secondary index contains 33 records. Unfortunately, the secondary index is defined on a virtual column, and therefore this corruption should be a duplicate of one of the bugs related to indexed virtual columns (MDEV-5800).

            The contents of the secondary index (col_int_g,col1) are as follows. None of the records are delete-marked:

            bebebebe,0004
            bebebebe,0006
            bebebebe,0007
            bebebebe,0011
            bebebebe,001f
            bebebebe,0023
            bebebebe,0032
            bebebebe,003a
            bebebebe,0092
            bebebebe,00b3
            bebebebe,00d0
            bebebebe,01c7
            bebebebe,0384
            bebebebe,03c8
            bebebebe,0746
            bebebebe,077f
            bebebebe,082c
            bebebebe,0940
            bebebebe,098c
            bebebebe,0c77
            bebebebe,0d53
            bebebebe,0de2
            bebebebe,0e06
            0008,0008
            0033,0033
            003e,003e
            00f7,00f8
            0183,0183
            05c1,05c1
            0b9f,0b9f
            0ba9,0ba9
            0e05,0e06
            0fa8,0fa9
            

            The 23 values with the negative INT numbers 0xbebebebe (0x3ebebebe in the InnoDB table) are somewhat suspicious. InnoDB used to use the constant 0xbe in the file mem0dbg.cc, but that code was removed in MySQL 5.7.4 and MariaDB 10.2.2.
            The table was created and altered to reach the following final definition:

            CREATE TABLE t1 (
              col1 int(11) NOT NULL,
              col2 int(11) DEFAULT NULL,
              col_int int(11) DEFAULT NULL,
              col_string char(19) DEFAULT NULL,
              col_varchar varchar(500) DEFAULT NULL,
              col_text text DEFAULT NULL,
              col_int_g int(11) GENERATED ALWAYS AS (col_int) VIRTUAL,
              col_string_g char(13) GENERATED ALWAYS AS (substr(col_string,4,13)) VIRTUAL,
              col_text_g text GENERATED ALWAYS AS (substr(col_text,1,499)) VIRTUAL,
              col2_copy int(11) DEFAULT NULL,
              PRIMARY KEY (col1),
              UNIQUE KEY marvão_uidx2 (col_int),
              UNIQUE KEY uidx3 (col_int,col2),
              UNIQUE KEY uidx2 (col_int),
              UNIQUE KEY uidx1 (col_text(9)),
              KEY idx1 (col_text(9),col1),
              KEY idx3 (col1),
              KEY MArvão_idx3 (col_int_g),
              KEY idx2 (col_text(9),col_int)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
            

            So, we should have col_int_g=col1, which is the primary key. But in the primary key index the smallest value of col1 is 4, not -0x3ebebebe. After the 23 first tuples of the secondary index we also have a few other records where the two columns differ (only by 1).

            I think that we need a test case that does not include indexed virtual columns.

            nikitamalyavin, which bug on virtual columns would you think that this case is duplicating?

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Labels need_feedback rr-profile rr-profile-analyzed need_feedback need_rr rr-profile rr-profile-analyzed

            The following rr trace shows a case where 
            CHECK TABLE `test`.`table100_innodb_key_pk_parts_2_int_autoinc` EXTENDED harvests
                    'test.table100_innodb_key_pk_parts_2_int_autoinc',  'check', 'error',         'Partition p0 returned error'
                    'test.table100_innodb_key_pk_parts_2_int_autoinc',  'check', 'Warning',  'InnoDB: Index \'col_varchar_255_ucs2_key\' contains 79 entries, should be 80.'
                    'test.table100_innodb_key_pk_parts_2_int_autoinc',  'check', 'error',         'Corrupt'
            Workflow:
            1. Start the DB server and create a few tables.
                 Generated columns or foreign keys are not involved, but one table is partitioned.
                 innodb-buffer-pool-size=5M is exceptionally small.
            2. One session runs DML only.
            3. At some point while 2. is ongoing, the server process gets killed with SIGKILL.
            4. Restart and check the consistency of the tables via SELECTs and CHECK TABLE ... EXTENDED (see the sketch below).
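
            A minimal sketch of the checks meant in step 4, using the table name from the
            CHECK TABLE output above; the exact SELECTs issued by RQG are an assumption here:

            CHECK TABLE test.table100_innodb_key_pk_parts_2_int_autoinc EXTENDED;
            -- hypothetical consistency probe: a full table scan should not error out either
            SELECT COUNT(*) FROM test.table100_innodb_key_pk_parts_2_int_autoinc;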
             
            HEAD, origin/bb-10.5-MDEV-23456 20a8a5ba50eae3d0ae92da202f5fada89409d072 2020-08-28T22:54:26+05:30
            compiled with ASAN and "Og".
             
            RQG call
            perl rqg.pl \
            --grammar=conf/engines/many_indexes.yy \
            --gendata=conf/engines/many_indexes.zz \
            --reporters=CrashRecovery1 \
            --mysqld=--innodb_use_native_aio=1 \
            --mysqld=--innodb_lock_schedule_algorithm=fcfs \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--connect_timeout=60 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--net_read_timeout=30 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--lock-wait-timeout=86400 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --no-mask \
            --queries=10000000 \
            --seed=random \
            --reporters=Backtrace \
            --reporters=ErrorLog \
            --reporters=Deadlock1 \
            --validators=None \
            --mysqld=--log_output=none \
            --mysqld=--log-bin \
            --mysqld=--log_bin_trust_function_creators=1 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --engine=InnoDB \
            --restart_timeout=120 \
            --duration=300 \
            --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            --mysqld=--loose-innodb-sync-debug \
            --mysqld=--innodb_stats_persistent=off \
            --mysqld=--loose-max-statement-time=30 \
            --threads=1 \
            --mysqld=--innodb_page_size=4K \
            --mysqld=--innodb-buffer-pool-size=5M \
            --duration=300 \
            --no_mask \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=<local settings> \
            --basedir1=<local settings> \
            --script_debug=_nix_ \
            --rr=Server \
            --rr_options=--chaos
             
            sdp:/RQG/storage/1598869940/tmp/dev/shm/vardir/1598869940/28/1/rr
            _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1
             
            RQG log is /RQG/storage/1598869940/000094.log
             
            
            

            mleich Matthias Leich added a comment -
            mleich Matthias Leich made changes -
            Labels need_feedback need_rr rr-profile rr-profile-analyzed need_feedback rr-profile rr-profile-analyzed
            marko Marko Mäkelä made changes -

            I filed MDEV-23632 for the 0xbebebebe problem, which I was able to repeat on 10.3 as well. I made a mistake in my earlier comment: col_int_g is an alias for col_int, not col1.

            marko Marko Mäkelä added a comment -
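
            A minimal SQL sketch of the cross-check implied above, assuming the t1 definition
            quoted earlier (col_int_g is GENERATED ALWAYS AS (col_int)); whether the optimizer
            actually serves this from the corrupted secondary index is an assumption:

            -- rows whose virtual-column index value disagrees with its base column,
            -- e.g. the 0xbebebebe entries, would indicate a corrupted index entry
            SELECT col1, col_int, col_int_g
            FROM t1 FORCE INDEX (`MArvão_idx3`)
            WHERE NOT (col_int_g <=> col_int);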
            marko Marko Mäkelä made changes -
            Labels need_feedback rr-profile rr-profile-analyzed need_feedback rr-profile

            The trace with ASAN includes crash recovery. As far as I can tell, the table may already have been corrupted before recovery. In the clustered index, CHECK TABLE accessed 80 records. I dumped the values of the PRIMARY KEY column (an INT, ranging from 0x19 to 0xc0) and the indexed UCS2 column. I repeated the same exercise for the 79 visited records of the secondary index.

            It is possible that I made a mistake when dumping the indexed column in the primary key index, because it looks like I would always get a mismatch for the column value. Unfortunately, InnoDB does not implement CHECK TABLE…EXTENDED that would cross-check the keys in each index. It only counts the records.

            Because the trace started with crash recovery, it is possible that the table was already corrupted before recovery. The part of the trace before the server was killed could not be replayed with rr replay; a diversion was reported.

            Would it be possible to repeat this with fewer columns and indexes, or without crash recovery? Theoretically, one secondary index could be enough. And we should not need many non-indexed columns in the table.

            marko Marko Mäkelä added a comment -

            marko, I can't immediately recall a task that this one could be duplicating. The value can be left broken in the case of a computation failure, which is covered by the MDEV-20618 fix. I'd check that one first (especially since you have similar-looking values).

            nikitamalyavin Nikita Malyavin added a comment -

            The RQG simplifier is running.
            I expect to have a heavily simplified replay test soon.

            mleich Matthias Leich added a comment -

            Also the newest rr replay trace is flagging corruption in an index on a UCS2 column, apparently without involving server restart, which is good news. mleich says that with UTF-8, the bug would take longer to reproduce.

            It is a little challenging to debug the trace, because both the primary key and the secondary key consist of multiple pages even though the number of rows is relatively small. There probably are unnecessary columns and indexes (5 secondary indexes in addition to the failing one). But, I should be able to analyze the available trace. It will just take some extra effort, because I cannot comfortably look at a small number of page dumps but will have to define some breakpoints and auto-display expressions to dump the interesting data.

            marko Marko Mäkelä added a comment -
            mleich Matthias Leich made changes -
            Attachment MDEV-22924.tgz [ 53737 ]

            The uploaded MDEV-22924.tgz contains an MTR + shell-script based brute-force replay test:
            - ./mysqltest_background.sh
            - suite/innodb/t/MDEV-22924.test
            - suite/innodb/t/MDEV-22924-master.opt
            A replay attempt could look like
            ./mtr --mem -rr --rr-arg="--chaos" MDEV-22924
            You might need
            - several replay attempts or
            - to raise the value assigned to $MY_RUNTIME in suite/innodb/t/MDEV-22924.test.
            Some warnings regarding the content of suite/innodb/t/MDEV-22924.test:
            I am rather sure that certain statements (there is one with wrong syntax) or some
            components of the table (a column or a key) are most probably not required for replay.
            But removing other statements or components usually increased the runtime required
            until replay considerably.
            
            

            mleich Matthias Leich added a comment -
            mleich Matthias Leich made changes -
            Attachment MDEV-22924.tgz [ 53737 ]
            mleich Matthias Leich made changes -
            Attachment MDEV-22924.tgz [ 53742 ]

            I am debugging an rr replay trace that was generated with MDEV-22924.tgz. I observed that while CHECK TABLE is accessing the PRIMARY KEY index, an UPDATE is accessing a secondary index of the same table. Due to this, my standard practice of saving the page dumps at the end of the run did not produce good results. Instead, I ended up setting a breakpoint in buf_page_get_gen() to extract the pages as they were when CHECK TABLE accessed them.

            The execution uses innodb_page_size=4k. The clustered index comprises 3 leaf pages, and the secondary index comprises 1 page:

            PRIMARY KEY(pk)

            (pk,DB_TRX_ID,DB_ROLL_PTR,col_char_255_utf8_key,col_longtext_ucs2_key,…)=
            page 12:
            (1,0x7d9,update,'moderate','',…),
            (2,0x7d9,update,'suppose','',…),
            (3,0x7d9,update,'mfrjdutx'…,'',…),
            page 10:
            (4,0x7d9,update,'rjdutxv','',…),
            (5,0x7d9,update,'utxvbchboy'…,'',…),
            page 11:
            (6,0x7d9,update,'i','',…)
            

            KEY(col_longtext_ucs2_key)

            (col_longtext_ucs2_key,pk)=
            page 5:
            ('',1),('',2),('',3),('',4),('',5),('',6),
            delete-marked(repeat(0x744,242),1),
            delete-marked(repeat(0x744,242),2),
            … (15 records total)
            

            The problem is that CHECK TABLE is only counting 5 records in the secondary index, instead of the expected number 6.
            It is entirely normal that the secondary index contains extra records; this is how MVCC works until we implement MDEV-17598. Next, I will check in the execution trace each access of those 6 first secondary index records to determine which record is not being counted, and why.

            marko Marko Mäkelä added a comment -
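
            As a minimal illustration of that MVCC behaviour (hypothetical table, not taken
            from the trace):

            CREATE TABLE t (pk INT PRIMARY KEY, c VARCHAR(10), KEY(c)) ENGINE=InnoDB;
            INSERT INTO t VALUES (1,'old');
            -- Updating an indexed column does not modify the secondary index entry in place:
            -- InnoDB delete-marks ('old',1) and inserts ('new',1). Until purge removes the
            -- delete-marked entry, KEY(c) holds more entries than the clustered index, and a
            -- consistent read has to resolve each entry against the clustered index version.
            UPDATE t SET c = 'new' WHERE pk = 1;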

            All records in the clustered index page carry DB_TRX_ID=2009=0x7d9. Also the PAGE_MAX_TRX_ID in the secondary index is at that value. The next available transaction identifier (trx_sys.m_max_trx_id) is 2012. The read view of the CHECK TABLE excludes transaction 2009.

            Transaction 2009 modified all 6 rows of the table by executing the statement

            UPDATE table100_innodb_int_autoinc SET col_longtext_ucs2_key = REPEAT( 0xA, 0 ) WHERE col_char_255_utf8_key BETWEEN REPEAT( 0xF5B, 224 ) AND REPEAT( 0xB2, 140 );
            

            The transaction was started at the event time (when) 4387662 and committed at 4257184. At the time the transaction started, the clustered index leaf pages were 13,14,10,11.

            The CHECK TABLE had started executing on the primary key at when 4255465. At the time the UPDATE transaction is committing, it was executing inside btr_validate_index(), trying to access our secondary index page 5.

            It looks like we must ignore the 6 first records in the secondary index page (which must be excluded from the read view) and concentrate on the delete-marked ones, to see which record is missing and why. Here are all 15 secondary index records that were present when CHECK TABLE counted the records:

            KEY(col_longtext_ucs2_key)

            (col_longtext_ucs2_key,pk)=
            page 5:
            ('',1),('',2),('',3),('',4),('',5),('',6),
            delete-marked(repeat(0x744,242),1),
            delete-marked(repeat(0x744,242),2),
            delete-marked(repeat(0x744,242),3),
            delete-marked(repeat(0x744,242),4),
            delete-marked(repeat(0x744,242),5),
            delete-marked(repeat(0x744,242),6),
            delete-marked(repeat(0xf76,85),1),
            delete-marked(repeat(0xf76,85),3),
            delete-marked(repeat(0xf76,85),6),
            

            I noticed that also purge is running concurrently with the CHECK TABLE. That should explain why there exist only 3 keys with the value 0xf76.

            I hope to complete the analysis of this trace next week. It is rather challenging due to the amount of concurrency, including page splits and merges in the clustered index.

            marko Marko Mäkelä added a comment -
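
            A two-session sketch (hypothetical table t(pk,c)) of the read-view rule that matters
            here: a snapshot taken before transaction 2009 committed must not see its changes,
            so CHECK TABLE has to count the old index entries instead:

            -- session 1
            START TRANSACTION WITH CONSISTENT SNAPSHOT;
            SELECT pk, c FROM t;   -- establishes the read view
            -- session 2 (playing the role of transaction 2009)
            UPDATE t SET c = '' WHERE pk BETWEEN 1 AND 6;
            COMMIT;
            -- session 1 still sees the pre-UPDATE values; the new ('',pk) index entries are
            -- outside its read view, just as they are for the failing CHECK TABLE.
            SELECT pk, c FROM t;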

            Transaction 2009 updated the clustered index records to col_longtext_ucs2_key='' at the following event times, with the following preceding contents of the record:

            when pk DB_TRX_ID col_longtext_ucs2_key
            4243126 6 0x7d5 repeat(0xf76,85)
            4246790 3 0x7d5 repeat(0xf76,85)
            4247019 1 0x7d5 repeat(0xf76,85)
            4251855 4 0x7d2 repeat(0x744,242)
            4253252 2 0x7d2 repeat(0x744,242)
            4253481 5 0x7d2 repeat(0x744,242)

            This suggests that none of the records that were visible in the secondary index page when CHECK TABLE was executed could actually be purged.

            The UPDATE transaction was holding an S-latch on the secondary index page (5) from event time 4254001 to 4256397.
            At event time 4254061, a purge thread starts waiting for an X-latch on the secondary index page.
            At event time 4255465, the failing CHECK TABLE starts accessing the clustered index.
            Between event time 4256397 and 4256882 the purge thread was holding the X-latch on the secondary index page.
            Between event time 4256918 and 4257017, the same purge thread holds the secondary index X-latch again.
            Between event time 4257031 and 4257048, the UPDATE transaction 2009 holds the X-latch on the secondary index page, so that it can perform the first half of the last update, to delete-mark the record (repeat(0x744,242),5).
            Between event time 4257072 and 4257117, the UPDATE transaction 2009 holds the X-latch on the secondary index page, so that it can perform the second half of the last update, to insert the record ('',5).
            At event time 4257186, the UPDATE transaction 2009 commits.
            Between event time 4257264 and 4257322, the failing CHECK TABLE holds an SX-latch on the secondary index page.
            At event time 4257336, the failing CHECK TABLE acquires an S-latch on the secondary index page. This is the first time it starts counting records in the secondary index.
            At event time 4257868, a purge thread requests an exclusive latch on the secondary index page, while our CHECK TABLE is holding an S-latch.
            At event time 4258247, the failing CHECK TABLE releases the S-latch on the secondary index page.
            At event time 4258372, purge releases the exclusive page latch.
            Between event time 4258406 and 4258406, the same purge thread holds an exclusive page latch.
            Between event time 4259276 and 4259435, the same purge thread holds an exclusive page latch.
            Between event time 4259469 and 4259542, the same purge thread holds an exclusive page latch.
            Between event time 4259576 and 4259649, the same purge thread holds an exclusive page latch.
            Between event time 4259683 and 4260375, the same purge thread holds an exclusive page latch.
            Between event time 4260409 and 4260489, the same purge thread holds an exclusive page latch.
            Between event time 4260523 and 4260596, the same purge thread holds an exclusive page latch.
            Between event time 4260630 and 4260677, the same purge thread holds an exclusive page latch.
            Between event time 4260711 and 4260758, the same purge thread holds an exclusive page latch.
            Between event time 4260792 and 4260847, the same purge thread holds an exclusive page latch.
            Between event time 4260881 and 4260928, the same purge thread holds an exclusive page latch.
            Between event time 4260962 and 4261009, the same purge thread holds an exclusive page latch.
            At event time 4261016, our failing CHECK TABLE acquires an S-latch on the secondary index page.
            At event time 4261047, an UPDATE transaction 2014 starts to wait for an exclusive page latch.
            At event time 4261383, our failing CHECK TABLE releases the S-latch.
            At event time 4261431, the UPDATE transaction 2014 releases the exclusive page latch.
            At event time 4261445, our failing CHECK TABLE acquires an S-latch on the secondary index page.
            At event time 4261457, our failing CHECK TABLE acquires an S-latch on the secondary index page.
            Between event time 4261451 and 4261472, the UPDATE transaction 2014 holds the exclusive page latch.
            Between event time 4261586 and 4261635, a different purge thread holds an exclusive page latch.

            At event time 4263975, we report the count mismatch.

            Because there is so much concurrent activity, it seems infeasible to track down all changes to the page between the commit of the transaction 2009 and the CHECK TABLE. Next, I will determine which ‘raw’ records the CHECK TABLE is observing in the secondary index, and which of them are being miscounted.

            marko Marko Mäkelä added a comment -

            The following record is unexpectedly determined as not visible:

            Thread 4 hit Breakpoint 15, row_search_mvcc (buf=buf@entry=0x621003b4d518 '\276' <repeats 200 times>..., mode=<optimized out>, mode@entry=PAGE_CUR_G, prebuilt=prebuilt@entry=0x62100013b1a0, 
                match_mode=match_mode@entry=0, direction=direction@entry=0) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:5253
            5253					ut_ad(prebuilt->select_lock_type == LOCK_NONE
            9: /x *prebuilt.pcur.btr_cur.page_cur.rec@4 = {0x7, 0x44, 0x7, 0x44}
            11: /x prebuilt.pcur.btr_cur.page_cur.rec[480]@8 = {0x7, 0x44, 0x7, 0x44, 0x80, 0x0, 0x0, 0x2}
            (rr) when
            Current event: 4262501
            

            The clustered index page 12 was looked up at event time 4262479, and it does contain the key (pk=2). It seems that incorrect caching at Row_sel_get_clust_rec_for_mysql::operator() might be to blame for the mismatch.

            At event time 4261371, Row_sel_get_clust_rec_for_mysql::cached_clust_rec had been updated:

            Thread 4 hit Hardware watchpoint 17: -location cached_clust_rec
             
            Old value = (const rec_t *) 0x5acc21d84083 <incomplete sequence \354\200>
            New value = (const rec_t *) 0x5acc21a18083 "\200"
            Row_sel_get_clust_rec_for_mysql::operator() (this=this@entry=0x27ed5a27a540, prebuilt=prebuilt@entry=0x62100013b1a0, sec_index=sec_index@entry=0x61700006b7a0, rec=<optimized out>, thr=thr@entry=0x62100013b980, 
                out_rec=out_rec@entry=0x27ed5a27a040, offsets=<optimized out>, offset_heap=<optimized out>, vrow=<optimized out>, mtr=<optimized out>) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:3418
            3418					cached_old_vers = old_vers;
            (rr) when
            Current event: 4261371
            (rr) p/x *clust_rec@20
            $88 = {0x80, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd9, 0x4c, 0x0, 0x0, 0x6, 0x31, 0x8, 0xc6, 0x6d, 0x6f, 0x64}
            

            Note: (pk,DB_TRX_ID)=(1,2009), not (2,2009).
            A little later, we get to check whether the secondary index record with pk=2 should exist in our read view:

            (rr) 
            Row_sel_get_clust_rec_for_mysql::operator() (this=this@entry=0x27ed5a27a540, prebuilt=prebuilt@entry=0x62100013b1a0, sec_index=sec_index@entry=0x61700006b7a0, rec=<optimized out>, thr=thr@entry=0x62100013b980, 
                out_rec=out_rec@entry=0x27ed5a27a040, offsets=<optimized out>, offset_heap=<optimized out>, vrow=<optimized out>, mtr=<optimized out>) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:3258
            3258		clust_rec = btr_pcur_get_rec(prebuilt->clust_pcur);
            (rr) 
            3260		prebuilt->clust_pcur->trx_if_known = trx;
            (rr) p clust_rec
            $91 = (const rec_t *) 0x5acc21a18083 "\200"
            (rr) when
            Current event: 4262489
            (rr) p/x *clust_rec@20
            $92 = {0x80, 0x0, 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd9, 0x4c, 0x0, 0x0, 0x6, 0x32, 0x6, 0xce, 0x73, 0x75, 0x70}
            

            Note: clust_rec is at the same pointer value, but it now points to a different record, (pk,DB_TRX_ID)=(2,2009), presumably because there has been a lot of purge and UPDATE activity that caused the records to move around.
            A little later, we hit the bug:

            3405				if (clust_rec != cached_clust_rec) {
            (rr) 
            3420					err = DB_SUCCESS;
            (rr) 
            3421					old_vers = cached_old_vers;
            (rr) p/x *cached_old_vers@20
            $93 = {0x80, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd5, 0x49, 0x0, 0x0, 0x6, 0x30, 0x0, 0x38, 0x6d, 0x6f, 0x64}
            (rr) when
            Current event: 4262489
            

            So, we wrongly determine the old version of the record pk=2 to be something with pk=1, and subsequently determine that the secondary index record does not exist in the read view.

            This incorrect logic, which was introduced in MDEV-20301, would likely have been caught much earlier if we had had a debug assertion to check that the PRIMARY KEY values are equal.

            The bug seems to affect MVCC reads of secondary index records only. The MDEV-20301 ‘optimization’ was not applied to checks for implicit locks.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5.0 [ 23709 ]
            Affects Version/s 10.4.8 [ 23721 ]
            Affects Version/s 10.3.18 [ 23719 ]
            Affects Version/s 10.2.27 [ 23717 ]
            Affects Version/s 10.5.4 [ 24264 ]
            Assignee Marko Mäkelä [ marko ] Aleksey Midenkov [ midenok ]
            Labels need_feedback rr-profile affects-tests corruption not-10.1 regression rr-profile-analyzed
            Priority Critical [ 2 ] Blocker [ 1 ]

            We might try to rescue the MDEV-20301 logic by checking that the FIL_PAGE_LSN of the page is unchanged. (Note that before MDEV-21132 in 10.5.0, the latest value of FIL_PAGE_LSN could be stored in newest_lsn, and hence that field must be checked as well.) We should also add a debug assertion that the (pk,DB_TRX_ID,DB_ROLL_PTR) match between clust_rec and cached_clust_rec, and that the (pk) matches with cached_old_vers.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            This bug is blocking the testing effort for catching other corruption bugs. I will try to address this myself as soon as possible.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee Aleksey Midenkov [ midenok ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2020-09-07 17:03:05.0 2020-09-07 17:03:05.837
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.34 [ 24505 ]
            Fix Version/s 10.3.25 [ 24506 ]
            Fix Version/s 10.4.15 [ 24507 ]
            Fix Version/s 10.5.6 [ 24508 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5.7 [ 25019 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5.6 [ 24508 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.16 [ 25020 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.15 [ 24507 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.26 [ 25021 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.25 [ 24506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2.35 [ 25022 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2.34 [ 24505 ]
            manjot Manjot Singh (Inactive) made changes -
            Assignee Marko Mäkelä [ marko ] Manjot Singh [ manjot ]
            marko Marko Mäkelä made changes -
            Assignee Manjot Singh [ manjot ] Marko Mäkelä [ marko ]
            manjot Manjot Singh (Inactive) made changes -
            Assignee Marko Mäkelä [ marko ] Manjot Singh [ manjot ]
            manjot Manjot Singh (Inactive) made changes -
            Assignee Manjot Singh [ manjot ] Marko Mäkelä [ marko ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 110142 ] MariaDB v4 [ 157976 ]

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes: 1
              Watchers: 6
