Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 10.2.27, 10.3.18, 10.4.8, 10.5.0
Description
Index corruption found during RQG testing.

A CHECK TABLE t2 EXTENDED harvested:

test.t2 check Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.

And the server error log contains:

[rr 76197 241886]2020-06-16 18:31:54 33 [ERROR] InnoDB: Index `Marvão_idx3` of table `test`.`t2` is corrupted
[rr 76197 241890]2020-06-16 18:31:54 33 [Note] InnoDB: Load corrupted index `Marvão_idx3` of table `test`.`t2`
Work flow:
1. Start the server and generate some initial data.
2. Several concurrent sessions run a DDL/DML mix, and sometimes CHECK TABLE EXTENDED; a sketch of one such session follows below.

Please note that critical events like killing the server process followed by a restart with crash recovery are not involved.
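A minimal sketch of what one such session might issue; the statement text is illustrative only, the real statements come from the RQG grammar listed below:

{noformat}
-- One of 33 concurrent sessions (--threads=33); names follow this report,
-- the statements themselves are made up for illustration.
ALTER TABLE t2 ADD INDEX `Marvão_idx3` (col_string);       -- DDL
UPDATE t2 SET col_string = '3333333333' WHERE col1 = 332;  -- DML
DELETE FROM t2 WHERE col1 = 40;                            -- DML
CHECK TABLE t2 EXTENDED;                                   -- occasional consistency check
{noformat}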
I have some "rr" trace for the tree
origin/bb-10.5-marko 8cad9bc9805ff058e8493d92d4915ca437d16b08
which is based on the current 10.5 but contains some non-final changes of Marko's. Per Marko, the failure cannot be caused by his changes.
My test machinery has seen the same problem also on
10.3.24 origin/10.3 294ac1fbab306e97dad3588a00e3527f2056e7e4 2020-05-19T15:27:22+00:00
but the corresponding details are lost, so it is unknown whether that occurrence happened during concurrent load or after a crash with recovery.
RQG
git clone https://github.com/mleich1/rqg --branch experimental RQG
origin/experimental 611a31e38a3bc157392c0748c95991b6a248bb3b 2020-06-09T15:31:46+02:00
perl rqg.pl \
--grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-max-statement-time=30 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=120 \
--duration=300 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--innodb_stats_persistent=off \
--threads=33 \
--mysqld=--innodb_page_size=4K \
--mysqld=--innodb-buffer-pool-size=256M \
--duration=300 \
--no_mask \
... certain local settings ...
When looking into the attached RQG protocol 002082.log and the RQG call mentioned there, please be aware that "--reporters=CrashRecovery1" only tells that a work flow invoking kill+restart was planned. The data corruption was hit before the RQG reporter "CrashRecovery1" would kill the server process.
Attachments
Issue Links
- blocks
  - MDEV-14643 InnoDB: Failing assertion: !cursor->index->is_committed() (Closed)
  - MDEV-23192 Crash in row_search_mvcc() probably related to secondary index corruption (Closed)
  - MDEV-23563 InnoDB: Failing assertion: !cursor->index->is_committed() in row0ins.cc (Closed)
- is caused by
  - MDEV-20301 InnoDB's MVCC has O(N^2) behaviors (Closed)
- relates to
  - MDEV-23632 ALTER TABLE...ADD KEY creates corrupted index on virtual column (Closed)
  - MDEV-15532 XA: Assertion `!log->same_pk' failed in row_log_table_apply_delete (Closed)
  - MDEV-22076 InnoDB: Failing assertion: !cursor->index->is_committed() or Assertion `0' in row_purge_remove_sec_if_poss_leaf after DML/DDL with XA (Closed)
  - MDEV-22373 Unable to find a record to delete-mark ends up crashing mysqld process after upgrading from 10.1.43 to 10.4 (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Description | (initial description text) | (description text with the two server error log lines added) |
Assignee | Matthias Leich [ mleich ] | Marko Mäkelä [ marko ] |
Summary | Draft: Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37. | Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37. |
Labels | rr-profile | need_rr |
Comment | [ A comment with security level 'Developers' was removed. ] |
Labels | need_rr | rr-profile |
Labels | rr-profile | rr-profile-analyzed |
Labels | rr-profile-analyzed | need_rr rr-profile-analyzed |
Link | This issue relates to |
Link | This issue relates to |
Component/s | Replication [ 10100 ] | |
Labels | need_rr rr-profile-analyzed | rr-profile |
Link | This issue relates to |
Component/s | Replication [ 10100 ] | |
Labels | rr-profile | need_rr rr-profile-analyzed |
Link | This issue relates to |
Link | This issue blocks MENT-751 [ MENT-751 ] |
Link | This issue blocks |
Link | This issue blocks |
Link | This issue blocks MDEV-22725 [ MDEV-22725 ] |
Link | This issue relates to |
Link | This issue blocks |
Priority | Major [ 3 ] | Critical [ 2 ] |
Labels | need_rr rr-profile-analyzed | need_feedback need_rr rr-profile-analyzed |
Link | This issue is blocked by TODO-2490 [ TODO-2490 ] |
Description | (description text) | (description text with a duplicated "that running" phrase fixed) |
Labels | need_feedback need_rr rr-profile-analyzed | need_feedback rr-profile rr-profile-analyzed |
Labels | need_feedback rr-profile rr-profile-analyzed | need_feedback need_rr rr-profile rr-profile-analyzed |
Labels | need_feedback need_rr rr-profile rr-profile-analyzed | need_feedback rr-profile rr-profile-analyzed |
Link | This issue relates to |
Labels | need_feedback rr-profile rr-profile-analyzed | need_feedback rr-profile |
Attachment | MDEV-22924.tgz [ 53737 ] |
Attachment | |
Attachment | MDEV-22924.tgz [ 53742 ] |
Link | This issue is caused by |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.5.0 [ 23709 ] | |
Affects Version/s | 10.4.8 [ 23721 ] | |
Affects Version/s | 10.3.18 [ 23719 ] | |
Affects Version/s | 10.2.27 [ 23717 ] | |
Affects Version/s | 10.5.4 [ 24264 ] | |
Assignee | Marko Mäkelä [ marko ] | Aleksey Midenkov [ midenok ] |
Labels | need_feedback rr-profile | affects-tests corruption not-10.1 regression rr-profile-analyzed |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Assignee | Aleksey Midenkov [ midenok ] | Marko Mäkelä [ marko ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Resolution Date | 2020-09-07 17:03:05.0 | 2020-09-07 17:03:05.837 |
Fix Version/s | 10.2.34 [ 24505 ] | |
Fix Version/s | 10.3.25 [ 24506 ] | |
Fix Version/s | 10.4.15 [ 24507 ] | |
Fix Version/s | 10.5.6 [ 24508 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Fix Version/s | 10.5.7 [ 25019 ] |
Fix Version/s | 10.5.6 [ 24508 ] |
Fix Version/s | 10.4.16 [ 25020 ] |
Fix Version/s | 10.4.15 [ 24507 ] |
Fix Version/s | 10.3.26 [ 25021 ] |
Fix Version/s | 10.3.25 [ 24506 ] |
Fix Version/s | 10.2.35 [ 25022 ] |
Fix Version/s | 10.2.34 [ 24505 ] |
Assignee | Marko Mäkelä [ marko ] | Manjot Singh [ manjot ] |
Assignee | Manjot Singh [ manjot ] | Marko Mäkelä [ marko ] |
Assignee | Marko Mäkelä [ marko ] | Manjot Singh [ manjot ] |
Assignee | Manjot Singh [ manjot ] | Marko Mäkelä [ marko ] |
Workflow | MariaDB v3 [ 110142 ] | MariaDB v4 [ 157976 ] |
In the trace that I analyzed, both the clustered index and the secondary index consist of a single page. There are 7 records in the clustered index (PRIMARY KEY(col1)) and 4 in the secondary index.
Secondary index: (col1, col_string) =
(1,'1111111111'),
(2,'2222222222'),
(332,'3333333333'),
(7,'1111111111').
None of the records are delete-marked. The first record was inserted last in the record heap; the others were in order. The PAGE_FREE list is empty. All this would suggest that the index was created afterwards and not at table creation.
Clustered index: (col1,DB_TRX_ID,DB_ROLL_PTR,col2,col_int,col_string,col_text)=
(1,0,1<<55,1,1,'1111111111',blob),
(2,0,1<<55,2,2,'2222222222',blob),
(8,0,1<<55,7,8,'8888888888',blob),
(40,0,1<<55,39,40,'4444444444',blob),
(332,0,1<<55,332,324,'3333333333',blob),
(374,0,1<<55,1,373,'3333333333',blob),
(1962,0,1<<55,1,324,'1111111111',blob).
Again, no records are delete-marked, and the PAGE_FREE list is empty.
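For orientation, a table definition consistent with the record layout above might look as follows; the column types and the indexed columns are assumptions, since the real table is generated by the RQG grammar:

{noformat}
-- Hypothetical reconstruction, not the actual RQG-generated schema.
CREATE TABLE t2 (
  col1 INT NOT NULL,
  col2 INT,
  col_int INT,
  col_string VARCHAR(10),
  col_text TEXT,
  PRIMARY KEY (col1),
  KEY `Marvão_idx3` (col1, col_string)
) ENGINE=InnoDB;
{noformat}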
There are not only missing records in the secondary index, but also an extra record col1=7, even though all history has been purged from the clustered index (DB_TRX_ID=0 was reset by MDEV-12288). Extra records in the secondary index are only allowed when they correspond to an old version of the clustered index records. Such records would also be delete-marked in the secondary index, unless the DELETE operation is in progress. In other words, this corruption could have manifested itself as MDEV-9663/MDEV-14643 messages as well.

Unfortunately, the EXTENDED attribute of CHECK TABLE is being ignored by InnoDB. It would be good if it actually did what is requested: check that there is a 1:1 correspondence between the secondary index and the clustered index records; a manual approximation from SQL is sketched below. Watchpoints on PAGE_N_RECS on the two pages will let us know what exactly happened.
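A minimal cross-check sketch run from SQL rather than inside InnoDB (index and column names come from this report; FORCE INDEX merely steers the scans and is an approximation, not a guarantee):

{noformat}
-- Count rows via the clustered index:
SELECT COUNT(*) FROM test.t2 FORCE INDEX (PRIMARY);
-- Count entries via the suspect secondary index; a differing result means
-- missing or extra secondary index records (here: 36 vs. 37):
SELECT COUNT(col1) FROM test.t2 FORCE INDEX (`Marvão_idx3`);
{noformat}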
I think that we have seen enough. Here is the stack trace for what I think is the smoking gun:
10.5 c89366866bca2df46b0592719a1f6b6dabf470cba
Thread 1 hit Breakpoint 6, MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
3043 {
(rr) when
Current event: 51578
(rr) bt
#0 MDL_context::release_transactional_locks (this=this@entry=0x62b0000bd338) at /home/mleich/10.5/sql/mdl.cc:3043
#1 0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
#2 0x000055c67c4a73cb in mysql_parse (thd=thd@entry=0x62b0000bd218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x443e7feadff0, is_com_multi=is_com_multi@entry=false,
is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:7993
#3 0x000055c67c46c85a in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000bd218, packet=packet@entry=0x629000ae1219 " COMMIT /* E_R Thread2 QNO 99 CON_ID 15 */ ",
packet_length=packet_length@entry=44, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/10.5/sql/sql_parse.cc:1866
…
(rr) up
#1 0x000055c67c48c8c8 in mysql_execute_command (thd=thd@entry=0x62b0000bd218) at /home/mleich/10.5/sql/sql_parse.cc:5585
5585 thd->mdl_context.release_transactional_locks();
(rr) list
5580 lex->tx_chain != TVL_NO));
5581 bool tx_release= (lex->tx_release == TVL_YES ||
5582 (thd->variables.completion_type == 2 &&
5583 lex->tx_release != TVL_NO));
5584 bool commit_failed= trans_commit(thd);
5585 thd->mdl_context.release_transactional_locks();
5586 if (commit_failed)
5587 {
5588 WSREP_DEBUG("COMMIT failed, MDL released: %lld",
5589 (longlong) thd->thread_id);
(rr) i lo
tx_chain = false
tx_release = false
commit_failed = true
…
This looks like the MDEV-15532/MDEV-22076 scenario: COMMIT is refused for an XA transaction, yet we wrongly release the MDL that should keep protecting the transaction.

mleich, even though the table has a nice, small number of records, I do not think that it is useful to test the combination of XA+DDL+DML before those bugs have been addressed. Please try to reproduce this kind of corruption without XA, or with DML under XA (but no concurrent DDL). I do not think that there are many users of XA yet, because it was practically broken before MDEV-742 was addressed in 10.5.
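A minimal illustration of the refused-COMMIT pattern described above; the statements are assumptions, the actual workload comes from the RQG grammar:

{noformat}
XA START 'trx1';
UPDATE t2 SET col_int = col_int + 1 WHERE col1 = 1;
-- A plain COMMIT is rejected with ER_XAER_RMFAIL while the XA transaction
-- is still ACTIVE; per the analysis above, the server nevertheless released
-- the transactional MDL, letting concurrent DDL proceed on t2.
COMMIT;
{noformat}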