Index corruption found during RQG testing
|
Some CHECK TABLE t2 EXTENDED harvested
|
test.t2 check Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.
|
And the server error log contains a
|
[rr 76197 241886]2020-06-16 18:31:54 33 [ERROR] InnoDB: Index `Marvão_idx3` of table `test`.`t2` is corrupted
|
[rr 76197 241890]2020-06-16 18:31:54 33 [Note] InnoDB: Load corrupted index `Marvão_idx3` of table `test`.`t2`
|
|
Work flow:
|
1. Start the server and generate some initial data
|
2. Several concurrent sessions runs a DDL/DML mix
|
and sometimes CHECK TABLE EXTENDED
|
Please note critical events like killing the server process followed by Restart
|
with CrashRecovery are not involved..
|
|
I have some "rr" trace for the tree
|
origin/bb-10.5-marko 8cad9bc9805ff058e8493d92d4915ca437d16b08
|
which is based on actual 10.5 but contains some non final changes of Marko.
|
Per Marko the failure cannot be caused by his changes.
|
|
My test machinery has seen the same problem also on
|
10.3.24 origin/10.3 294ac1fbab306e97dad3588a00e3527f2056e7e4 2020-05-19T15:27:22+00:00
|
but the corresponding details are lost.
|
And so its unknown if that was during concurrent load or after crash with recovery.
|
|
RQG
|
git clone https://github.com/mleich1/rqg --branch experimental RQG
|
origin/experimental 611a31e38a3bc157392c0748c95991b6a248bb3b 2020-06-09T15:31:46+02:00
|
|
perl rqg.pl \
|
--grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
|
--gendata=conf/mariadb/table_stress.zz \
|
--gendata_sql=conf/mariadb/table_stress.sql \
|
--mysqld=--innodb_use_native_aio=1 \
|
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
|
--mysqld=--loose-idle_write_transaction_timeout=0 \
|
--mysqld=--loose-idle_transaction_timeout=0 \
|
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
|
--mysqld=--connect_timeout=60 \
|
--mysqld=--interactive_timeout=28800 \
|
--mysqld=--slave_net_timeout=60 \
|
--mysqld=--net_read_timeout=30 \
|
--mysqld=--net_write_timeout=60 \
|
--mysqld=--loose-table_lock_wait_timeout=50 \
|
--mysqld=--wait_timeout=28800 \
|
--mysqld=--lock-wait-timeout=86400 \
|
--mysqld=--innodb-lock-wait-timeout=50 \
|
--no-mask \
|
--queries=10000000 \
|
--seed=random \
|
--reporters=Backtrace \
|
--reporters=ErrorLog \
|
--reporters=Deadlock1 \
|
--validators=None \
|
--mysqld=--log_output=none \
|
--mysqld=--log-bin \
|
--mysqld=--log_bin_trust_function_creators=1 \
|
--mysqld=--loose-max-statement-time=30 \
|
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
|
--engine=InnoDB \
|
--restart_timeout=120 \
|
--duration=300 \
|
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
|
--mysqld=--innodb_stats_persistent=off \
|
--threads=33 \
|
--mysqld=--innodb_page_size=4K \
|
--mysqld=--innodb-buffer-pool-size=256M \
|
--duration=300 \
|
--no_mask \
|
... certain local settings ...
|
|
If looking into the attached RQG protocol 002082.log and the RQG call mentioned there please be aware that the "--reporters=CrashRecovery1" just tells
|
that running a work flow invoking kill+restart was planned.
|
But the data corruption was hit before the RQG reporter "CrashRecovery1"
|
would kill the server process.
|