MariaDB Server
MDEV-22924

Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.

Details

    Description

      Index corruption found during RQG testing.
      A CHECK TABLE t2 EXTENDED harvested
      test.t2 check Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.
      and the server error log contains
      [rr 76197 241886]2020-06-16 18:31:54 33 [ERROR] InnoDB: Index `Marvão_idx3` of table `test`.`t2` is corrupted
      [rr 76197 241890]2020-06-16 18:31:54 33 [Note] InnoDB: Load corrupted index `Marvão_idx3` of table `test`.`t2`
       
      Work flow:
      1. Start the server and generate some initial data.
      2. Several concurrent sessions run a DDL/DML mix
           and sometimes CHECK TABLE EXTENDED (a schematic example follows below).
      Please note that critical events like killing the server process followed by restart
      with crash recovery are not involved.
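
      A schematic example of that statement mix (purely illustrative; the actual statements
      come from the RQG grammar listed below, and the table, column, and index names here
      are only assumed to match the test tables):

      -- one session runs DDL ...
      ALTER TABLE t2 ADD INDEX `Marvão_idx3` (col_string);
      -- ... while other sessions run DML ...
      UPDATE t2 SET col_string = REPEAT('3',10) WHERE col1 = 2;
      INSERT INTO t2 (col1, col_string) VALUES (374, '3333333333');
      -- ... and occasionally the consistency check that produced the warning above:
      CHECK TABLE t2 EXTENDED;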
       
      I have an "rr" trace for the tree
      origin/bb-10.5-marko 8cad9bc9805ff058e8493d92d4915ca437d16b08
      which is based on the current 10.5 but contains some non-final changes by Marko.
      According to Marko, the failure cannot be caused by his changes.
       
      My test machinery has also seen the same problem on
      10.3.24 origin/10.3 294ac1fbab306e97dad3588a00e3527f2056e7e4 2020-05-19T15:27:22+00:00
      but the corresponding details are lost,
      so it is unknown whether that happened during concurrent load or after a crash with recovery.
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      origin/experimental 611a31e38a3bc157392c0748c95991b6a248bb3b 2020-06-09T15:31:46+02:00
       
      perl rqg.pl \
      --grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
      --gendata=conf/mariadb/table_stress.zz \
      --gendata_sql=conf/mariadb/table_stress.sql \
      --mysqld=--innodb_use_native_aio=1 \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-max-statement-time=30 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --mysqld=--innodb_stats_persistent=off \
      --threads=33 \
      --mysqld=--innodb_page_size=4K \
      --mysqld=--innodb-buffer-pool-size=256M \
      --duration=300 \
      --no_mask \
      ... certain local settings ...
       
      If you look into the attached RQG protocol 002082.log and the RQG call mentioned there, please be aware that "--reporters=CrashRecovery1" only tells
      that running a work flow invoking kill+restart was planned.
      The data corruption was hit before the RQG reporter "CrashRecovery1"
      would have killed the server process.
      

      Activity

            mleich Matthias Leich created issue -

            marko Marko Mäkelä added a comment -

            In the trace that I analyzed, both the clustered index and the secondary index consist of a single page each. There are 7 records in the clustered index (PRIMARY KEY(col1)) and 4 in the secondary index.
            Secondary index:(col1,col_string)=
            (1,'1111111111'),
            (2,'2222222222'),
            (332,'3333333333'),
            (7,'1111111111').
            None of the records are delete-marked. The first record was inserted last in the record heap; the others were in order. The PAGE_FREE list is empty. All this would suggest that the index was created afterwards and not at table creation.

            Clustered index: (col1,DB_TRX_ID,DB_ROLL_PTR,col2,col_int,col_string,col_text)=
            (1,0,1<<55,1,1,'1111111111',blob),
            (2,0,1<<55,2,2,'2222222222',blob),
            (8,0,1<<55,7,8,'8888888888',blob),
            (40,0,1<<55,39,40,'4444444444',blob),
            (332,0,1<<55,332,324,'3333333333',blob),
            (374,0,1<<55,1,373,'3333333333',blob),
            (1962,0,1<<55,1,324,'1111111111',blob).
            Again, no records are delete-marked, and the PAGE_FREE list is empty.
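
            To make the record dumps easier to read, here is a hypothetical table definition that would be consistent with the columns listed above. The real definition comes from the RQG grammar and is not shown in this issue; in particular, the column order of `Marvão_idx3` is only a guess based on the field order in the dump:

            -- Hypothetical reconstruction, for illustration only.
            CREATE TABLE t2 (
              col1 INT NOT NULL,
              col2 INT,
              col_int INT,
              col_string CHAR(10),
              col_text TEXT,
              PRIMARY KEY (col1),
              KEY `Marvão_idx3` (col1, col_string)  -- key order assumed from the record dump
            ) ENGINE=InnoDB;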

            There are not only missing records in the secondary index, but also an extra record col1=7, even though all history has been purged from the clustered index (DB_TRX_ID=0 was reset by MDEV-12288). Extra records in the secondary index are only allowed when they correspond to an old version of the clustered index records. Such records would also be delete-marked in the secondary index, unless the DELETE operation is in progress.

            In other words, this corruption could have manifested itself as MDEV-9663/MDEV-14643 messages as well. Unfortunately, the EXTENDED attribute of CHECK TABLE is being ignored by InnoDB. It would be good if it actually did what is requested: check that there is a 1:1 correspondence between the secondary index and clustered index records.
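
            As a rough SQL-level approximation of that missing check (table and index names taken from this report; CHECK TABLE compares the entry counts internally):

            -- Counting through the clustered index and through the secondary index
            -- should give identical results on a healthy table; on the corrupted
            -- table they differ, which is what the CHECK TABLE warning reports.
            SELECT COUNT(*) FROM t2 FORCE INDEX (PRIMARY);
            SELECT COUNT(*) FROM t2 FORCE INDEX (`Marvão_idx3`);
            -- A real EXTENDED check would additionally verify the 1:1 record
            -- correspondence, not just the counts.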

            Watchpoints on PAGE_N_RECS on the two pages will let us know what exactly happened:

            1. PRIMARY KEY(col1)=8 is inserted
            2. ROLLBACK will empty the table
            3. PRIMARY KEY(col1)=1486 is inserted
            4. ROLLBACK will empty the table
            5. PRIMARY KEY(col1)=332 is inserted
            6. PRIMARY KEY(col1)=1962 is inserted
            7. PRIMARY KEY(col1)=374 is inserted
            8. PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is inserted in a new transaction
            9. PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is deleted by ROLLBACK
            10. At event 51548, PRIMARY KEY(col1,DB_TRX_ID)=(0,0x28) is inserted
            11. PRIMARY KEY(col1,DB_TRX_ID)=(0,0x2e) is deleted by ROLLBACK
            12. At event 51578, COMMIT invokes MDL_context::release_transactional_locks() for transaction 0x28 even though it is still active!
            13. At event 55952, ha_innobase::prepare_inplace_alter_table() is invoked, even though transaction 0x28 is still active with 11 undo log records
            14. At event 56196, the secondary index page is initialized with 3 records.

            I think that we have seen enough. Here is the stack trace for what I think is the smoking gun:

            10.5 c89366866bca2df46b0592719a1f6b6dabf470cba

            Thread 1 hit Breakpoint 6, MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
            3043	{
            (rr) when
            Current event: 51578
            (rr) bt
            #0  MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
            #1  0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
            #2  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
            #3  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT  /* E_R Thread2 QNO 99 CON_ID 15 */ ", 
                packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
            (rr) up
            #1  0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
            5585	    thd->mdl_context.release_transactional_locks();
            (rr) list
            5580	                     lex->tx_chain != TVL_NO));
            5581	    bool tx_release= (lex->tx_release == TVL_YES ||
            5582	                      (thd->variables.completion_type == 2 &&
            5583	                       lex->tx_release != TVL_NO));
            5584	    bool commit_failed= trans_commit(thd);
            5585	    thd->mdl_context.release_transactional_locks();
            5586	    if (commit_failed)
            5587	    {
            5588	      WSREP_DEBUG("COMMIT failed, MDL released: %lld",
            5589	                  (longlong) thd->thread_id);
            (rr) i lo
            tx_chain = false
            tx_release = false
            commit_failed = true

            This looks like the MDEV-15532/MDEV-22076 scenario: COMMIT is refused for an XA transaction, yet we wrongly release the MDL that should keep protecting the transaction.
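
            A minimal two-session sketch of that pattern (statement texts and names are hypothetical; only the sequence matters):

            -- connection 1: explicit XA transaction with some DML
            XA START 'xid164';
            INSERT INTO t2 (col1, col_string) VALUES (7, '1111111111');
            COMMIT;   -- refused (XAER_RMFAIL) because an explicit XA transaction is active,
                      -- but the transactional MDL is wrongly released (MDEV-15532/MDEV-22076)

            -- connection 2: DDL that the released MDL should still have blocked
            ALTER TABLE t2 ADD INDEX `Marvão_idx3` (col1, col_string), ALGORITHM=INPLACE;
            -- proceeds even though connection 1's changes are still uncommitted in InnoDB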

            mleich, even though the table has a nice small number of records, I do not think that it is useful to test the combination of XA+DDL+DML before those bugs have been addressed.

            Please try to reproduce this kind of corruption without XA, or DML with XA (but no concurrent DDL). I do not think that there are many users of XA yet, because it was practically broken before MDEV-742 was addressed in 10.5.

            marko Marko Mäkelä added a comment -

            mleich, I am sorry for blaming you. We do actually have XA transactions, but they are internal XA transactions for the binlog. Here are a couple examples from the same rr replay trace:

            Thread 3 hit Breakpoint 1, innobase_xa_prepare (hton=0x615000001e98, thd=0x62b0000bd218, prepare_trx=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:17190
            17190	{
            (rr) bt
            #0  innobase_xa_prepare (hton=0x615000001e98, thd=0x62b0000bd218, prepare_trx=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:17190
            #1  0x000055c67cd50c8b in prepare_or_error (ht=ht@entry=0x615000001e98, thd=thd@entry=0x62b0000bd218, all=all@entry=false) at /home/mleich/10.5/sql/handler.cc:1364
            #2  0x000055c67cd54297 in ha_commit_trans (thd=thd@entry=0x62b0000bd218, all=all@entry=false) at /home/mleich/10.5/sql/handler.cc:1742
            #3  0x000055c67c94e010 in trans_commit_stmt (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:472
            #4  0x000055c67c48fee7 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:6011
            #5  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
            #6  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, 
                packet=packet@entry=0x629000ae1219 " INSERT INTO t1 (col1,col2, col_int, col_string, col_text) VALUES ( 8, 8, 8 - 1, REPEAT(SUBSTR(CAST( 8 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 8 AS CHAR),1,1), @fill_amount) ) /* E_R Thread2 QNO 36 CO"..., packet_length=packet_length@entry=211, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
            #7  0x000055c67c4676ab in do_command (thd=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:1347
            #8  0x000055c67c907775 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x60800000cf38, put_in_cache=put_in_cache@entry=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
            #9  0x000055c67c908a19 in handle_one_connection (arg=arg@entry=0x60800000cf38) at /home/mleich/10.5/sql/sql_connect.cc:1313
            #10 0x000055c67d682162 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
            #11 0x00007ffd78caf6db in start_thread (arg=0x443e7feb0300) at pthread_create.c:463
            #12 0x00001f030f433a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr) when
            Current event: 48560
            (rr) continue
            Thread 14 hit Breakpoint 5, trans_commit (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 51578
            (rr) bt
            #0  trans_commit (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/transaction.cc:256
            #1  0x000055c67c48c8af in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5584
            #2  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
            #3  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT  /* E_R Thread2 QNO 99 CON_ID 15 */ ", 
                packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
            #4  0x000055c67c4676ab in do_command (thd=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:1347
            #5  0x000055c67c907775 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x60800000cf38, put_in_cache=put_in_cache@entry=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
            #6  0x000055c67c908a19 in handle_one_connection (arg=arg@entry=0x60800000cf38) at /home/mleich/10.5/sql/sql_connect.cc:1313
            #7  0x000055c67d682162 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
            #8  0x00007ffd78caf6db in start_thread (arg=0x443e7feb0300) at pthread_create.c:463
            #9  0x00001f030f433a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            The trans_commit() fails because trans_check() fails due to thd->transaction->xid_state.is_explicit_XA().
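
            In SQL terms, the failing check corresponds to the following expected behaviour (assuming standard MariaDB XA semantics; the exact error text may vary by version):

            XA START 'xid96';
            INSERT INTO t1 (col1) VALUES (1);
            COMMIT;
            -- expected to fail with ER_XAER_RMFAIL ("The command cannot be executed when
            -- global transaction is in the ACTIVE state"), which is exactly why
            -- trans_check() makes trans_commit() return TRUE here.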

            I find it hard to follow the control flow. Watchpoints are not useful, because the flags seem to be "disabled by default". For example, trans->no_2pc was never set, and I would expect the same to hold for the thd->transaction->xid_state.xid_cache_element.

            If this corruption is not repeatable with --skip-log-bin, that would explain why mikhail.izioumtchenko@booking.com never repeated this corruption for MySQL when we were employed by Oracle.

            marko Marko Mäkelä added a comment -

            Thanks to sujatha.sivakumar, I verified with one more breakpoint that the failure looks like a duplicate of the MDEV-15532 scenario (explicit use of XA START, with a wrong COMMIT). Sorry for the false alarm.
            mleich, please try again without XA.

            Thread 2 hit Breakpoint 1, xid_cache_insert (thd=thd@entry=0x62b0000cb218, 
                xid_state=0x62b0000ce838, xid=0x62b0000d2350)
                at /home/mleich/10.5/sql/xa.cc:286
            286	{
            (rr) when
            Current event: 49759
            (rr) bt
            #0  xid_cache_insert (thd=thd@entry=0x62b0000cb218, xid_state=0x62b0000ce838, 
                xid=0x62b0000d2350) at /home/mleich/10.5/sql/xa.cc:286
            #1  0x000055c67cb683d0 in trans_xa_start (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/xa.cc:453
            #2  0x000055c67c48e1fa in mysql_execute_command (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/sql_parse.cc:5793
            #3  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000cb218, 
                rawbuf=<optimized out>, length=<optimized out>, 
                parser_state=parser_state@entry=0x64a36dd61ff0, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:7993
            #4  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, 
                thd=thd@entry=0x62b0000cb218, 
                packet=packet@entry=0x629000ac3219 "XA START 'xid96'  /* E_R Thread1 QNO 40 CON_ID 14 */ ", packet_length=packet_length@entry=53, 
            (rr) break trans_commit
            Breakpoint 2 at 0x55c67c94c6b7: file /home/mleich/10.5/sql/transaction.cc, line 256.
            (rr) c
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 49854
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 49896
            (rr) bt
            #0  trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            #1  0x000055c67c48c8af in mysql_execute_command (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/sql_parse.cc:5584
            #2  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000cb218, 
                rawbuf=<optimized out>, length=<optimized out>, 
                parser_state=parser_state@entry=0x64a36dd61ff0, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:7993
            #3  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, 
                thd=thd@entry=0x62b0000cb218, 
                packet=packet@entry=0x629000ac3219 " COMMIT  /* E_R Thread1 QNO 44 CON_ID 14 */ ", packet_length=packet_length@entry=44, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:1866
            (rr) n
            258	  DBUG_ENTER("trans_commit");
            (rr) 
            260	  if (trans_check(thd))
            (rr) 
            261	    DBUG_RETURN(TRUE);
            

            According to this, the thread first executed XA START and then the same thread attempted to execute COMMIT instead of an XA statement. This is exactly the MDEV-15532 scenario.
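
            For comparison, the XA-conformant way for that thread to finish its transaction would have been the standard MariaDB XA statement sequence, shown here only to contrast with the plain COMMIT above:

            XA START 'xid96';
            -- ... DML ...
            XA END 'xid96';
            XA PREPARE 'xid96';
            XA COMMIT 'xid96';
            -- or, skipping the separate prepare step:
            -- XA COMMIT 'xid96' ONE PHASE   (issued directly after XA END)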

            marko Marko Mäkelä added a comment - edited

            In my previous comment I posted an earlier XA START;COMMIT pair than the one that involves transaction 0x28. Here is the correct pair, which explains why the COMMIT fails:

            Thread 14 hit Breakpoint 1, xid_cache_insert (thd=thd@entry=0x62b0000bd218, 
                xid_state=0x62b0000c0838, xid=0x62b0000c4350)
                at /home/mleich/10.5/sql/xa.cc:286
            286	{
            (rr) when
            Current event: 50027
            (rr) bt
            #0  xid_cache_insert (thd=thd@entry=0x62b0000bd218, xid_state=0x62b0000c0838, 
                xid=0x62b0000c4350) at /home/mleich/10.5/sql/xa.cc:286
            #1  0x000055c67cb683d0 in trans_xa_start (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/xa.cc:453
            #2  0x000055c67c48e1fa in mysql_execute_command (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/sql_parse.cc:5793
            #3  0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, 
                rawbuf=<optimized out>, length=<optimized out>, 
                parser_state=parser_state@entry=0x443e7feadff0, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:7993
            #4  0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, 
                thd=thd@entry=0x62b0000bd218, 
                packet=packet@entry=0x629000ae1219 "XA START 'xid164'  /* E_R Thread2 QNO 62 CON_ID 15 */ ", packet_length=packet_length@entry=54, 
                is_com_multi=is_com_multi@entry=false, 
                is_next_command=is_next_command@entry=false)
                at /home/mleich/10.5/sql/sql_parse.cc:1866
            (rr) break trans_commit
            Breakpoint 2 at 0x55c67c94c6b7: file /home/mleich/10.5/sql/transaction.cc, line 256.
            (rr) c
            Continuing.
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 50104
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 50421
            (rr) c
            Continuing.
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) when
            Current event: 50475
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            (rr) command 2
            Type commands for breakpoint(s) 2, one per line.
            End with a line saying just "end".
            >when
            >end
            (rr) c
            Continuing.
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 50613
            (rr) c
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51061
            (rr) c
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51084
            (rr) 
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51248
            (rr) 
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51299
            (rr) 
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51463
            (rr) 
            Continuing.
            [Switching to Thread 26146.39212]
             
            Thread 2 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000cb218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51516
            (rr) 
            Continuing.
            [Switching to Thread 26146.38140]
             
            Thread 14 hit Breakpoint 2, trans_commit (thd=thd@entry=0x62b0000bd218)
                at /home/mleich/10.5/sql/transaction.cc:256
            256	{
            Current event: 51578
            

            If you search for Thread 14 in the above, you can see that the XA START 'xid164' was followed by multiple incorrect COMMIT statements. The invalid MDL release should have occurred already at event 50475, not at event 51578 as I claimed earlier. Either way, ha_innobase::prepare_inplace_alter_table() is able to proceed at event 55952 even though the conflicting transaction is still active inside InnoDB.


            mleich Matthias Leich added a comment -

            Sorry for the confusion.
            The rr trace offered belongs to a different RQG run than the run mentioned on top (002082.log).
            Both runs show "Warning InnoDB: Index 'Marvão_idx3' contains .... entries, should be ..." but
            - in the first run (002082.log) the grammar does not generate XA commands
            - in the run with the rr trace, XA commands are generated.
            I will make new rr traces with the first grammar.
            
            

            marko Marko Mäkelä added a comment -

            For several weeks, we had lost access to the environment where this bug could be reproduced. We not only need to reproduce this extremely rare scenario, but we also need to reproduce it while the server is running under rr record, so that the execution trace can be analyzed. It turns out that https://rr-project.org/ does not work in most virtual machine environments.

            mleich Matthias Leich made changes -
            Labels need_feedback need_rr rr-profile-analyzed need_feedback rr-profile rr-profile-analyzed

            I analyzed another case of this on 10.5 9ef36faa614528b66e0a6ba58a4c40b246658b83. The primary key index contains 32 records, while the secondary index contains 33 records. Unfortunately, the secondary index is defined on a virtual column, and therefore this corruption should be a duplicate of one of the bugs related to indexed virtual columns (MDEV-5800).

            The contents of the secondary index (col_int_g,col1) are as follows. None of the records are delete-marked:

            bebebebe,0004
            bebebebe,0006
            bebebebe,0007
            bebebebe,0011
            bebebebe,001f
            bebebebe,0023
            bebebebe,0032
            bebebebe,003a
            bebebebe,0092
            bebebebe,00b3
            bebebebe,00d0
            bebebebe,01c7
            bebebebe,0384
            bebebebe,03c8
            bebebebe,0746
            bebebebe,077f
            bebebebe,082c
            bebebebe,0940
            bebebebe,098c
            bebebebe,0c77
            bebebebe,0d53
            bebebebe,0de2
            bebebebe,0e06
            0008,0008
            0033,0033
            003e,003e
            00f7,00f8
            0183,0183
            05c1,05c1
            0b9f,0b9f
            0ba9,0ba9
            0e05,0e06
            0fa8,0fa9
            

            The 23 values with the negative INT numbers 0xbebebebe (0x3ebebebe in the InnoDB table) are somewhat suspicious. InnoDB used to use the constant 0xbe in the file mem0dbg.cc, but that code was removed in MySQL 5.7.4 and MariaDB 10.2.2.
            The table was created and altered to reach the following final definition:

            CREATE TABLE t1 (
              col1 int(11) NOT NULL,
              col2 int(11) DEFAULT NULL,
              col_int int(11) DEFAULT NULL,
              col_string char(19) DEFAULT NULL,
              col_varchar varchar(500) DEFAULT NULL,
              col_text text DEFAULT NULL,
              col_int_g int(11) GENERATED ALWAYS AS (col_int) VIRTUAL,
              col_string_g char(13) GENERATED ALWAYS AS (substr(col_string,4,13)) VIRTUAL,
              col_text_g text GENERATED ALWAYS AS (substr(col_text,1,499)) VIRTUAL,
              col2_copy int(11) DEFAULT NULL,
              PRIMARY KEY (col1),
              UNIQUE KEY marvão_uidx2 (col_int),
              UNIQUE KEY uidx3 (col_int,col2),
              UNIQUE KEY uidx2 (col_int),
              UNIQUE KEY uidx1 (col_text(9)),
              KEY idx1 (col_text(9),col1),
              KEY idx3 (col1),
              KEY MArvão_idx3 (col_int_g),
              KEY idx2 (col_text(9),col_int)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
            

            So, we should have col_int_g=col1, which is the primary key. But in the primary key index the smallest value of col1 is 4, not -0x3ebebebe. After the 23 first tuples of the secondary index we also have a few other records where the two columns differ (only by 1).

            I think that we need a test case that does not include indexed virtual columns.

            nikitamalyavin, which bug on virtual columns would you think that this case is duplicating?

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Labels need_feedback rr-profile rr-profile-analyzed need_feedback need_rr rr-profile rr-profile-analyzed

            The following rr trace shows a case where 
            CHECK TABLE `test`.`table100_innodb_key_pk_parts_2_int_autoinc` EXTENDED harvests
                    'test.table100_innodb_key_pk_parts_2_int_autoinc',  'check', 'error',         'Partition p0 returned error'
                    'test.table100_innodb_key_pk_parts_2_int_autoinc',  'check', 'Warning',  'InnoDB: Index \'col_varchar_255_ucs2_key\' contains 79 entries, should be 80.'
                    'test.table100_innodb_key_pk_parts_2_int_autoinc',  'check', 'error',         'Corrupt'
            Workflow:
            1. Start the DB server and create a few tables.
                 Generated columns or foreign keys are not involved, but one table is partitioned.
                 innodb-buffer-pool-size=5M is exceptionally small.
            2. One session runs DML only.
            3. At some point while 2. is ongoing, the server process gets killed with SIGKILL.
            4. Restart and check the consistency of the tables via SELECTs and CHECK TABLE ... EXTENDED (see the sketch below).
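
            A minimal sketch of the checks meant in step 4, using the table name from the
            CHECK TABLE output above; the exact SELECTs issued by RQG are an assumption here:

            CHECK TABLE test.table100_innodb_key_pk_parts_2_int_autoinc EXTENDED;
            -- hypothetical consistency probe: a full table scan should not error out either
            SELECT COUNT(*) FROM test.table100_innodb_key_pk_parts_2_int_autoinc;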
             
            HEAD, origin/bb-10.5-MDEV-23456 20a8a5ba50eae3d0ae92da202f5fada89409d072 2020-08-28T22:54:26+05:30
            compiled with ASAN and "Og".
             
            RQG call
            perl rqg.pl \
            --grammar=conf/engines/many_indexes.yy \
            --gendata=conf/engines/many_indexes.zz \
            --reporters=CrashRecovery1 \
            --mysqld=--innodb_use_native_aio=1 \
            --mysqld=--innodb_lock_schedule_algorithm=fcfs \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--connect_timeout=60 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--net_read_timeout=30 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--lock-wait-timeout=86400 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --no-mask \
            --queries=10000000 \
            --seed=random \
            --reporters=Backtrace \
            --reporters=ErrorLog \
            --reporters=Deadlock1 \
            --validators=None \
            --mysqld=--log_output=none \
            --mysqld=--log-bin \
            --mysqld=--log_bin_trust_function_creators=1 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --engine=InnoDB \
            --restart_timeout=120 \
            --duration=300 \
            --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
            --mysqld=--loose-innodb-sync-debug \
            --mysqld=--innodb_stats_persistent=off \
            --mysqld=--loose-max-statement-time=30 \
            --threads=1 \
            --mysqld=--innodb_page_size=4K \
            --mysqld=--innodb-buffer-pool-size=5M \
            --duration=300 \
            --no_mask \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=<local settings> \
            --basedir1=<local settings> \
            --script_debug=_nix_ \
            --rr=Server \
            --rr_options=--chaos
             
            sdp:/RQG/storage/1598869940/tmp/dev/shm/vardir/1598869940/28/1/rr
            _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1
             
            RQG log is /RQG/storage/1598869940/000094.log
             
            
            

            mleich Matthias Leich added a comment -
            mleich Matthias Leich made changes -
            Labels need_feedback need_rr rr-profile rr-profile-analyzed need_feedback rr-profile rr-profile-analyzed
            marko Marko Mäkelä made changes -

            I filed MDEV-23632 for the 0xbebebebe problem, which I was able to repeat on 10.3 as well. I made a mistake in my earlier comment: col_int_g is an alias for col_int, not col1.

            marko Marko Mäkelä added a comment -
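
            A minimal SQL sketch of the cross-check implied above, assuming the t1 definition
            quoted earlier (col_int_g is GENERATED ALWAYS AS (col_int)); whether the optimizer
            actually serves this from the corrupted secondary index is an assumption:

            -- rows whose virtual-column index value disagrees with its base column,
            -- e.g. the 0xbebebebe entries, would indicate a corrupted index entry
            SELECT col1, col_int, col_int_g
            FROM t1 FORCE INDEX (`MArvão_idx3`)
            WHERE NOT (col_int_g <=> col_int);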
            marko Marko Mäkelä made changes -
            Labels need_feedback rr-profile rr-profile-analyzed need_feedback rr-profile

            The trace with ASAN includes crash recovery. As far as I can tell, the table may already have been corrupted before recovery. In the clustered index, CHECK TABLE accessed 80 records. I dumped the values of the PRIMARY KEY column (an INT, ranging from 0x19 to 0xc0) and the indexed UCS2 column. I repeated the same exercise for the 79 visited records of the secondary index.

            It is possible that I made a mistake when dumping the indexed column in the primary key index, because it looks like I would always get a mismatch for the column value. Unfortunately, InnoDB does not implement CHECK TABLE…EXTENDED that would cross-check the keys in each index. It only counts the records.

            Because the trace started with crash recovery, it is possible that the table was already corrupted before recovery. The part of the trace before the server was killed could not be replayed with rr replay; a diversion was reported.

            Would it be possible to repeat this with fewer columns and indexes, or without crash recovery? Theoretically, one secondary index could be enough. And we should not need many non-indexed columns in the table.

            marko Marko Mäkelä added a comment -

            marko, I can't immediately recall a task that this one could be duplicating. The value can be left broken in the case of a computation failure, which is covered by the MDEV-20618 fix. I'd check that one first (especially since you have similar-looking values).

            nikitamalyavin Nikita Malyavin added a comment -

            The RQG simplifier is running.
            I expect to have a heavily simplified replay test soon.

            mleich Matthias Leich added a comment -

            Also the newest rr replay trace is flagging corruption in an index on a UCS2 column, apparently without involving server restart, which is good news. mleich says that with UTF-8, the bug would take longer to reproduce.

            It is a little challenging to debug the trace, because both the primary key and the secondary key consist of multiple pages even though the number of rows is relatively small. There probably are unnecessary columns and indexes (5 secondary indexes in addition to the failing one). But, I should be able to analyze the available trace. It will just take some extra effort, because I cannot comfortably look at a small number of page dumps but will have to define some breakpoints and auto-display expressions to dump the interesting data.

            marko Marko Mäkelä added a comment -
            mleich Matthias Leich made changes -
            Attachment MDEV-22924.tgz [ 53737 ]

            The uploaded MDEV-22924.tgz contains an MTR + shell-script based brute-force replay test:
            - ./mysqltest_background.sh
            - suite/innodb/t/MDEV-22924.test
            - suite/innodb/t/MDEV-22924-master.opt
            A replay attempt could look like
            ./mtr --mem -rr --rr-arg="--chaos" MDEV-22924
            You might need
            - several replay attempts or
            - to raise the value assigned to $MY_RUNTIME in suite/innodb/t/MDEV-22924.test.
            Some warnings regarding the content of suite/innodb/t/MDEV-22924.test:
            I am rather sure that certain statements (there is one with wrong syntax) or some
            components of the table (a column or a key) are most probably not required for replay.
            But removing other statements or components usually increased the runtime required
            until replay considerably.
            
            

            mleich Matthias Leich added a comment -
            mleich Matthias Leich made changes -
            Attachment MDEV-22924.tgz [ 53737 ]
            mleich Matthias Leich made changes -
            Attachment MDEV-22924.tgz [ 53742 ]

            I am debugging an rr replay trace that was generated with MDEV-22924.tgz. I observed that while CHECK TABLE is accessing the PRIMARY KEY index, an UPDATE is accessing a secondary index of the same table. Due to this, my standard practice of saving the page dumps at the end of the run did not produce good results. Instead, I ended up setting a breakpoint in buf_page_get_gen() to extract the pages as they were when CHECK TABLE accessed them.

            The execution uses innodb_page_size=4k. The clustered index comprises 3 leaf pages, and the secondary index comprises 1 page:

            PRIMARY KEY(pk)

            (pk,DB_TRX_ID,DB_ROLL_PTR,col_char_255_utf8_key,col_longtext_ucs2_key,…)=
            page 12:
            (1,0x7d9,update,'moderate','',…),
            (2,0x7d9,update,'suppose','',…),
            (3,0x7d9,update,'mfrjdutx'…,'',…),
            page 10:
            (4,0x7d9,update,'rjdutxv','',…),
            (5,0x7d9,update,'utxvbchboy'…,'',…),
            page 11:
            (6,0x7d9,update,'i','',…)
            

            KEY(col_longtext_ucs2_key)

            (col_longtext_ucs2_key,pk)=
            page 5:
            ('',1),('',2),('',3),('',4),('',5),('',6),
            delete-marked(repeat(0x744,242),1),
            delete-marked(repeat(0x744,242),2),
            … (15 records total)
            

            The problem is that CHECK TABLE is only counting 5 records in the secondary index, instead of the expected number 6.
            It is entirely normal that the secondary index contains extra records; this is how MVCC works until we implement MDEV-17598. Next, I will check in the execution trace each access of those 6 first secondary index records to determine which record is not being counted, and why.

            marko Marko Mäkelä added a comment -
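
            As a minimal illustration of that MVCC behaviour (hypothetical table, not taken
            from the trace):

            CREATE TABLE t (pk INT PRIMARY KEY, c VARCHAR(10), KEY(c)) ENGINE=InnoDB;
            INSERT INTO t VALUES (1,'old');
            -- Updating an indexed column does not modify the secondary index entry in place:
            -- InnoDB delete-marks ('old',1) and inserts ('new',1). Until purge removes the
            -- delete-marked entry, KEY(c) holds more entries than the clustered index, and a
            -- consistent read has to resolve each entry against the clustered index version.
            UPDATE t SET c = 'new' WHERE pk = 1;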

            All records in the clustered index page carry DB_TRX_ID=2009=0x7d9. Also the PAGE_MAX_TRX_ID in the secondary index is at that value. The next available transaction identifier (trx_sys.m_max_trx_id) is 2012. The read view of the CHECK TABLE excludes transaction 2009.

            Transaction 2009 modified all 6 rows of the table by executing the statement

            UPDATE table100_innodb_int_autoinc SET col_longtext_ucs2_key = REPEAT( 0xA, 0 ) WHERE col_char_255_utf8_key BETWEEN REPEAT( 0xF5B, 224 ) AND REPEAT( 0xB2, 140 );
            

            The transaction was started at the event time (when) 4387662 and committed at 4257184. At the time the transaction started, the clustered index leaf pages were 13,14,10,11.

            The CHECK TABLE had started executing on the primary key at when 4255465. At the time the UPDATE transaction is committing, it was executing inside btr_validate_index(), trying to access our secondary index page 5.

            It looks like we must ignore the 6 first records in the secondary index page (which must be excluded from the read view) and concentrate on the delete-marked ones, to see which record is missing and why. Here are all 15 secondary index records that were present when CHECK TABLE counted the records:

            KEY(col_longtext_ucs2_key)

            (col_longtext_ucs2_key,pk)=
            page 5:
            ('',1),('',2),('',3),('',4),('',5),('',6),
            delete-marked(repeat(0x744,242),1),
            delete-marked(repeat(0x744,242),2),
            delete-marked(repeat(0x744,242),3),
            delete-marked(repeat(0x744,242),4),
            delete-marked(repeat(0x744,242),5),
            delete-marked(repeat(0x744,242),6),
            delete-marked(repeat(0xf76,85),1),
            delete-marked(repeat(0xf76,85),3),
            delete-marked(repeat(0xf76,85),6),
            

            I noticed that also purge is running concurrently with the CHECK TABLE. That should explain why there exist only 3 keys with the value 0xf76.

            I hope to complete the analysis of this trace next week. It is rather challenging due to the amount of concurrency, including page splits and merges in the clustered index.

            marko Marko Mäkelä added a comment -
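
            A two-session sketch (hypothetical table t(pk,c)) of the read-view rule that matters
            here: a snapshot taken before transaction 2009 committed must not see its changes,
            so CHECK TABLE has to count the old index entries instead:

            -- session 1
            START TRANSACTION WITH CONSISTENT SNAPSHOT;
            SELECT pk, c FROM t;   -- establishes the read view
            -- session 2 (playing the role of transaction 2009)
            UPDATE t SET c = '' WHERE pk BETWEEN 1 AND 6;
            COMMIT;
            -- session 1 still sees the pre-UPDATE values; the new ('',pk) index entries are
            -- outside its read view, just as they are for the failing CHECK TABLE.
            SELECT pk, c FROM t;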

            Transaction 2009 updated the clustered index records to col_longtext_ucs2_key='' at the following event times, with the following preceding contents of the record:

            when pk DB_TRX_ID col_longtext_ucs2_key
            4243126 6 0x7d5 repeat(0xf76,85)
            4246790 3 0x7d5 repeat(0xf76,85)
            4247019 1 0x7d5 repeat(0xf76,85)
            4251855 4 0x7d2 repeat(0x744,242)
            4253252 2 0x7d2 repeat(0x744,242)
            4253481 5 0x7d2 repeat(0x744,242)

            This suggests that none of the records that were visible in the secondary index page when CHECK TABLE was executed could actually be purged.

            The UPDATE transaction was holding an S-latch on the secondary index page (5) from event time 4254001 to 4256397.
            At event time 4254061, a purge thread starts waiting for an X-latch on the secondary index page.
            At event time 4255465, the failing CHECK TABLE starts accessing the clustered index.
            Between event time 4256397 and 4256882 the purge thread was holding the X-latch on the secondary index page.
            Between event time 4256918 and 4257017, the same purge thread holds the secondary index X-latch again.
            Between event time 4257031 and 4257048, the UPDATE transaction 2009 holds the X-latch on the secondary index page, so that it can perform the first half of the last update, to delete-mark the record (repeat(0x744,242),5).
            Between event time 4257072 and 4257117, the UPDATE transaction 2009 holds the X-latch on the secondary index page, so that it can perform the second half of the last update, to insert the record ('',5).
            At event time 4257186, the UPDATE transaction 2009 commits.
            Between event time 4257264 and 4257322, the failing CHECK TABLE holds an SX-latch on the secondary index page.
            At event time 4257336, the failing CHECK TABLE acquires an S-latch on the secondary index page. This is the first time it starts counting records in the secondary index.
            At event time 4257868, a purge thread requests an exclusive latch on the secondary index page, while our CHECK TABLE is holding an S-latch.
            At event time 4258247, the failing CHECK TABLE releases the S-latch on the secondary index page.
            At event time 4258372, purge releases the exclusive page latch.
            Between event time 4258406 and 4258406, the same purge thread holds an exclusive page latch.
            Between event time 4259276 and 4259435, the same purge thread holds an exclusive page latch.
            Between event time 4259469 and 4259542, the same purge thread holds an exclusive page latch.
            Between event time 4259576 and 4259649, the same purge thread holds an exclusive page latch.
            Between event time 4259683 and 4260375, the same purge thread holds an exclusive page latch.
            Between event time 4260409 and 4260489, the same purge thread holds an exclusive page latch.
            Between event time 4260523 and 4260596, the same purge thread holds an exclusive page latch.
            Between event time 4260630 and 4260677, the same purge thread holds an exclusive page latch.
            Between event time 4260711 and 4260758, the same purge thread holds an exclusive page latch.
            Between event time 4260792 and 4260847, the same purge thread holds an exclusive page latch.
            Between event time 4260881 and 4260928, the same purge thread holds an exclusive page latch.
            Between event time 4260962 and 4261009, the same purge thread holds an exclusive page latch.
            At event time 4261016, our failing CHECK TABLE acquires an S-latch on the secondary index page.
            At event time 4261047, an UPDATE transaction 2014 starts to wait for an exclusive page latch.
            At event time 4261383, our failing CHECK TABLE releases the S-latch.
            At event time 4261431, the UPDATE transaction 2014 releases the exclusive page latch.
            At event time 4261445, our failing CHECK TABLE acquires an S-latch on the secondary index page.
            At event time 4261457, our failing CHECK TABLE acquires an S-latch on the secondary index page.
            Between event time 4261451 and 4261472, the UPDATE transaction 2014 holds the exclusive page latch.
            Between event time 4261586 and 4261635, a different purge thread holds an exclusive page latch.

            At event time 4263975, we report the count mismatch.

            Because there is so much concurrent activity, it seems infeasible to track down all changes to the page between the commit of the transaction 2009 and the CHECK TABLE. Next, I will determine which ‘raw’ records the CHECK TABLE is observing in the secondary index, and which of them are being miscounted.

            marko Marko Mäkelä added a comment -

            The following record is unexpectedly determined as not visible:

            Thread 4 hit Breakpoint 15, row_search_mvcc (buf=buf@entry=0x621003b4d518 '\276' <repeats 200 times>..., mode=<optimized out>, mode@entry=PAGE_CUR_G, prebuilt=prebuilt@entry=0x62100013b1a0, 
                match_mode=match_mode@entry=0, direction=direction@entry=0) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:5253
            5253					ut_ad(prebuilt->select_lock_type == LOCK_NONE
            9: /x *prebuilt.pcur.btr_cur.page_cur.rec@4 = {0x7, 0x44, 0x7, 0x44}
            11: /x prebuilt.pcur.btr_cur.page_cur.rec[480]@8 = {0x7, 0x44, 0x7, 0x44, 0x80, 0x0, 0x0, 0x2}
            (rr) when
            Current event: 4262501
            

            The clustered index page 12 was looked up at event time 4262479, and it does contain the key (pk=2). It seems that incorrect caching at Row_sel_get_clust_rec_for_mysql::operator() might be to blame for the mismatch.

            At event time 4261371, Row_sel_get_clust_rec_for_mysql::cached_clust_rec had been updated:

            Thread 4 hit Hardware watchpoint 17: -location cached_clust_rec
             
            Old value = (const rec_t *) 0x5acc21d84083 <incomplete sequence \354\200>
            New value = (const rec_t *) 0x5acc21a18083 "\200"
            Row_sel_get_clust_rec_for_mysql::operator() (this=this@entry=0x27ed5a27a540, prebuilt=prebuilt@entry=0x62100013b1a0, sec_index=sec_index@entry=0x61700006b7a0, rec=<optimized out>, thr=thr@entry=0x62100013b980, 
                out_rec=out_rec@entry=0x27ed5a27a040, offsets=<optimized out>, offset_heap=<optimized out>, vrow=<optimized out>, mtr=<optimized out>) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:3418
            3418					cached_old_vers = old_vers;
            (rr) when
            Current event: 4261371
            (rr) p/x *clust_rec@20
            $88 = {0x80, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd9, 0x4c, 0x0, 0x0, 0x6, 0x31, 0x8, 0xc6, 0x6d, 0x6f, 0x64}
            

            Note: (pk,DB_TRX_ID)=(1,2009), not (2,2009).
            A little later, we get to check whether the secondary index record with pk=2 should exist in our read view:

            (rr) 
            Row_sel_get_clust_rec_for_mysql::operator() (this=this@entry=0x27ed5a27a540, prebuilt=prebuilt@entry=0x62100013b1a0, sec_index=sec_index@entry=0x61700006b7a0, rec=<optimized out>, thr=thr@entry=0x62100013b980, 
                out_rec=out_rec@entry=0x27ed5a27a040, offsets=<optimized out>, offset_heap=<optimized out>, vrow=<optimized out>, mtr=<optimized out>) at /home/mleich/Server/10.5A/storage/innobase/row/row0sel.cc:3258
            3258		clust_rec = btr_pcur_get_rec(prebuilt->clust_pcur);
            (rr) 
            3260		prebuilt->clust_pcur->trx_if_known = trx;
            (rr) p clust_rec
            $91 = (const rec_t *) 0x5acc21a18083 "\200"
            (rr) when
            Current event: 4262489
            (rr) p/x *clust_rec@20
            $92 = {0x80, 0x0, 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd9, 0x4c, 0x0, 0x0, 0x6, 0x32, 0x6, 0xce, 0x73, 0x75, 0x70}
            

            Note: clust_rec is at the same pointer value, but it now points to a different record, (pk,DB_TRX_ID)=(2,2009), presumably because there has been a lot of purge and UPDATE activity that caused the records to move around.
            A little later, we hit the bug:

            3405				if (clust_rec != cached_clust_rec) {
            (rr) 
            3420					err = DB_SUCCESS;
            (rr) 
            3421					old_vers = cached_old_vers;
            (rr) p/x *cached_old_vers@20
            $93 = {0x80, 0x0, 0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x7, 0xd5, 0x49, 0x0, 0x0, 0x6, 0x30, 0x0, 0x38, 0x6d, 0x6f, 0x64}
            (rr) when
            Current event: 4262489
            

            So, we wrongly determine the old version of the record pk=2 to be something with pk=1, and subsequently determine that the secondary index record does not exist in the read view.

            This incorrect logic, which was introduced in MDEV-20301, would likely have been caught much earlier if we had had a debug assertion to check that the PRIMARY KEY values are equal.

            The bug seems to affect MVCC reads of secondary index records only. The MDEV-20301 ‘optimization’ was not applied to checks for implicit locks.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5.0 [ 23709 ]
            Affects Version/s 10.4.8 [ 23721 ]
            Affects Version/s 10.3.18 [ 23719 ]
            Affects Version/s 10.2.27 [ 23717 ]
            Affects Version/s 10.5.4 [ 24264 ]
            Assignee Marko Mäkelä [ marko ] Aleksey Midenkov [ midenok ]
            Labels need_feedback rr-profile affects-tests corruption not-10.1 regression rr-profile-analyzed
            Priority Critical [ 2 ] Blocker [ 1 ]

            We might try to rescue the MDEV-20301 logic by checking that the FIL_PAGE_LSN of the page is unchanged. (Note that before MDEV-21132 in 10.5.0, the latest value of FIL_PAGE_LSN could be stored in newest_lsn, and hence that field must be checked as well.) We should also add a debug assertion that the (pk,DB_TRX_ID,DB_ROLL_PTR) match between clust_rec and cached_clust_rec, and that the (pk) matches with cached_old_vers.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            This bug is blocking the testing effort for catching other corruption bugs. I will try to address this myself as soon as possible.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee Aleksey Midenkov [ midenok ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2020-09-07 17:03:05.0 2020-09-07 17:03:05.837
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.34 [ 24505 ]
            Fix Version/s 10.3.25 [ 24506 ]
            Fix Version/s 10.4.15 [ 24507 ]
            Fix Version/s 10.5.6 [ 24508 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5.7 [ 25019 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.5.6 [ 24508 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.16 [ 25020 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.15 [ 24507 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.26 [ 25021 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.25 [ 24506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2.35 [ 25022 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2.34 [ 24505 ]
            manjot Manjot Singh (Inactive) made changes -
            Assignee Marko Mäkelä [ marko ] Manjot Singh [ manjot ]
            marko Marko Mäkelä made changes -
            Assignee Manjot Singh [ manjot ] Marko Mäkelä [ marko ]
            manjot Manjot Singh (Inactive) made changes -
            Assignee Marko Mäkelä [ marko ] Manjot Singh [ manjot ]
            manjot Manjot Singh (Inactive) made changes -
            Assignee Manjot Singh [ manjot ] Marko Mäkelä [ marko ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 110142 ] MariaDB v4 [ 157976 ]

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes: 1
              Watchers: 6
