[MDEV-21217] innodb_force_recovery=2 may wrongly abort the rollback of recovered transactions Created: 2019-12-04  Updated: 2020-07-30  Resolved: 2020-06-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.34, 10.1.30, 10.2.12, 10.3.3, 10.4.0, 10.5.0
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Blocks
blocks MDEV-23339 innodb_force_recovery=2 may still abo... Closed
Problem/Incident
is caused by MDEV-12352 InnoDB shutdown should not be blocked... Closed
Relates
relates to MDEV-22889 InnoDB occasionally breaks the isolat... Open

 Description   

The assertion failure was first observed during RQG testing on
origin/10.5 8ed646f0712b8f459439dfcdc3cf4ae45d79ec95 2019-12-02T13:35:54+03:00
The remains of that failing test (archive with core file etc.) had to be deleted later.
 
The server error log contained
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x00005632ef370321 in my_write_core (sig=6) at mysys/stacktrace.c:518
#2  0x00005632eea7659f in handle_fatal_signal (sig=6) at sql/signal_handler.cc:343
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#5  0x00007f8ef32fc37a in __GI_abort () at abort.c:89
#6  0x00007f8ef32f2b47 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x5632ef83c7d4 "trx->lock.table_cached == 0", file=file@entry=0x5632ef83c3a0 "storage/innobase/trx/trx0trx.cc", 
    line=line@entry=381, function=function@entry=0x5632ef83e8d0 <trx_create()::__PRETTY_FUNCTION__> "trx_t* trx_create()") at assert.c:92
#7  0x00007f8ef32f2bf2 in __GI___assert_fail (assertion=0x5632ef83c7d4 "trx->lock.table_cached == 0", file=0x5632ef83c3a0 "storage/innobase/trx/trx0trx.cc", line=381, 
    function=0x5632ef83e8d0 <trx_create()::__PRETTY_FUNCTION__> "trx_t* trx_create()") at assert.c:101
#8  0x00005632ef0f14bc in trx_create () at storage/innobase/trx/trx0trx.cc:381
#9  0x00005632eee729f2 in innobase_trx_allocate (thd=0x7f8eb8001bd8) at storage/innobase/handler/ha_innodb.cc:2526
#10 0x00005632eee72ad1 in check_trx_exists (thd=0x7f8eb8001bd8) at storage/innobase/handler/ha_innodb.cc:2551
#11 0x00005632eee9e303 in ha_innobase::update_thd (this=0x7f8eb800f3c0, thd=0x7f8eb8001bd8) at storage/innobase/handler/ha_innodb.cc:2767
#12 0x00005632eee8d0dc in ha_innobase::info_low (this=0x7f8eb800f3c0, flag=282, is_analyze=false) at storage/innobase/handler/ha_innodb.cc:14493
#13 0x00005632eee8dd1d in ha_innobase::info (this=0x7f8eb800f3c0, flag=282) at storage/innobase/handler/ha_innodb.cc:14806
#14 0x00005632eee7a5ae in ha_innobase::open (this=0x7f8eb800f3c0, name=0x7f8eb800bda8 "./mysql/gtid_slave_pos") at storage/innobase/handler/ha_innodb.cc:6117
#15 0x00005632eea7f3f8 in handler::ha_open (this=0x7f8eb800f3c0, table_arg=0x7f8eb800e1c8, name=0x7f8eb800bda8 "./mysql/gtid_slave_pos", mode=2, test_if_locked=18, mem_root=0x0, partitions_to_open=0x0)
    at sql/handler.cc:2736
#16 0x00005632ee8722a9 in open_table_from_share (thd=0x7f8eb8001bd8, share=0x7f8eb800b810, alias=0x7f8eec21e3a8, db_stat=33, prgflag=8, ha_open_flags=18, outparam=0x7f8eb800e1c8, is_create_table=false, 
    partitions_to_open=0x0) at sql/table.cc:4094
#17 0x00005632ee68dcc4 in open_table (thd=0x7f8eb8001bd8, table_list=0x7f8eec21e360, ot_ctx=0x7f8eec21e190) at sql/sql_base.cc:2074
#18 0x00005632ee69175d in open_and_process_table (thd=0x7f8eb8001bd8, tables=0x7f8eec21e360, counter=0x7f8eec21e224, flags=0, prelocking_strategy=0x7f8eec21e2a8, has_prelocking_list=false, ot_ctx=0x7f8eec21e190)
    at sql/sql_base.cc:3837
#19 0x00005632ee6929f7 in open_tables (thd=0x7f8eb8001bd8, options=..., start=0x7f8eec21e208, counter=0x7f8eec21e224, flags=0, prelocking_strategy=0x7f8eec21e2a8) at sql/sql_base.cc:4311
#20 0x00005632ee694b74 in open_and_lock_tables (thd=0x7f8eb8001bd8, options=..., tables=0x7f8eec21e360, derived=false, flags=0, prelocking_strategy=0x7f8eec21e2a8) at sql/sql_base.cc:5204
#21 0x00005632ee64ea15 in open_and_lock_tables (thd=0x7f8eb8001bd8, tables=0x7f8eec21e360, derived=false, flags=0) at sql/sql_base.h:506
#22 0x00005632ee8bc86d in scan_one_gtid_slave_pos_table (thd=0x7f8eb8001bd8, hash=0x7f8eec21ee10, array=0x7f8eec21edb0, tablename=0x7f8eb800a6e0, out_hton=0x7f8eec21ea70) at sql/rpl_rli.cc:1531
#23 0x00005632ee8bd633 in load_gtid_state_cb (thd=0x7f8eb8001bd8, table_name=0x7f8eb800a6e0, arg=0x7f8eec21edd0) at sql/rpl_rli.cc:1806
#24 0x00005632ee8bd12a in scan_all_gtid_slave_pos_table (thd=0x7f8eb8001bd8, cb=0x5632ee8bd5f1 <load_gtid_state_cb(THD*, LEX_CSTRING*, void*)>, cb_data=0x7f8eec21edd0) at sql/rpl_rli.cc:1675
#25 0x00005632ee8bd7f1 in rpl_load_gtid_slave_state (thd=0x7f8eb8001bd8) at sql/rpl_rli.cc:1843
#26 0x00005632eebba326 in binlog_background_thread (arg=0x0) at sql/log.cc:9971
#27 0x00007f8ef413f6da in start_thread (arg=0x7f8eec21f700) at pthread_create.c:456
#28 0x00007f8ef33cdd7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
 
What follows are the results of another replay on
origin/10.5 d2c593c2a66549abe67c4e73dbefeb876fa5ed9d 2020-06-12T16:22:03+03:00  build with ASAN
which identifies itself as 10.5.4.
...
2020-06-12 16:35:45 0 [Note] Crash recovery finished.
...
2020-06-12 16:35:50 0 [Note] /home/mleich/Server_bin/10.5_asan//bin/mysqld: ready for connections.
2020-06-12T16:36:58 [67872] | Version: '10.5.4-MariaDB-debug-log'  socket: ...
2020-06-12T16:36:58 [67872] | mysqld: /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:376: trx_t* trx_create(): Assertion `trx->lock.table_cached == 0' failed.
....    Running under "rr" together with ASAN etc. meant that I did not get the usual backtrace in the server error log.
 
Backtrace based on rr trace:
-------------------------------------------
Thread 2 (Thread 59357.95740):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fda70738801 in __GI_abort () at abort.c:79
#2  0x00007fda7072839a in __assert_fail_base (fmt=0x7fda708af7d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55db86822240 "trx->lock.table_cached == 0", 
    file=file@entry=0x55db86821a00 "/home/mleich/10.5/storage/innobase/trx/trx0trx.cc", line=line@entry=376, function=function@entry=0x55db86827220 <trx_create()::__PRETTY_FUNCTION__> "trx_t* trx_create()") at assert.c:92
#3  0x00007fda70728412 in __GI___assert_fail (assertion=0x55db86822240 "trx->lock.table_cached == 0", file=0x55db86821a00 "/home/mleich/10.5/storage/innobase/trx/trx0trx.cc", line=376, 
    function=0x55db86827220 <trx_create()::__PRETTY_FUNCTION__> "trx_t* trx_create()") at assert.c:101
#4  0x000055db8523576a in trx_create () at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:376
#5  0x000055db84cf2b02 in innobase_trx_allocate (thd=0x62b00007e218) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:2465
#6  0x000055db84cf2cea in check_trx_exists (thd=0x62b00007e218) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:2490
#7  0x000055db84d5aa1f in ha_innobase::update_thd (this=0x61d0007e90b8, thd=0x62b00007e218) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:2706
#8  0x000055db84d32ede in ha_innobase::info_low (this=0x61d0007e90b8, flag=282, is_analyze=false) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:14395
#9  0x000055db84d34c73 in ha_innobase::info (this=0x61d0007e90b8, flag=282) at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:14707
#10 0x000055db84d041fd in ha_innobase::open (this=0x61d0007e90b8, name=0x61b000080250 "./test/E") at /home/mleich/10.5/storage/innobase/handler/ha_innodb.cc:6032
#11 0x000055db84220eee in handler::ha_open (this=0x61d0007e90b8, table_arg=0x620000040098, name=0x61b000080250 "./test/E", mode=2, test_if_locked=18, mem_root=0x0, partitions_to_open=0x0) at /home/mleich/10.5/sql/handler.cc:2906
#12 0x000055db83d862c7 in open_table_from_share (thd=0x62b00007e218, share=0x61b00007fcb8, alias=0x629000401280, db_stat=33, prgflag=8, ha_open_flags=18, outparam=0x620000040098, is_create_table=false, partitions_to_open=0x0)
    at /home/mleich/10.5/sql/table.cc:4185
#13 0x000055db838ad314 in open_table (thd=0x62b00007e218, table_list=0x629000401238, ot_ctx=0x373f36466140) at /home/mleich/10.5/sql/sql_base.cc:1981
#14 0x000055db838b6f22 in open_and_process_table (thd=0x62b00007e218, tables=0x629000401238, counter=0x373f36466260, flags=1090, prelocking_strategy=0x373f364662a0, has_prelocking_list=false, ot_ctx=0x373f36466140)
    at /home/mleich/10.5/sql/sql_base.cc:3780
#15 0x000055db838b9aca in open_tables (thd=0x62b00007e218, options=..., start=0x373f36466230, counter=0x373f36466260, flags=1090, prelocking_strategy=0x373f364662a0) at /home/mleich/10.5/sql/sql_base.cc:4252
#16 0x000055db838a2eae in open_tables (thd=0x62b00007e218, tables=0x373f36466230, counter=0x373f36466260, flags=1090, prelocking_strategy=0x373f364662a0) at /home/mleich/10.5/sql/sql_base.h:264
#17 0x000055db838bf6de in open_normal_and_derived_tables (thd=0x62b00007e218, tables=0x629000401238, flags=1090, dt_phases=35) at /home/mleich/10.5/sql/sql_base.cc:5221
#18 0x000055db838bfc5a in open_tables_only_view_structure (thd=0x62b00007e218, table_list=0x629000401238, can_deadlock=false) at /home/mleich/10.5/sql/sql_base.cc:5272
#19 0x000055db83c0c0c2 in fill_schema_table_by_open (thd=0x62b00007e218, mem_root=0x62b000083ce0, is_show_fields_or_keys=true, table=0x624000488138, schema_table=0x55db87399260 <schema_tables+1920>, orig_db_name=0x62b000085800, 
    orig_table_name=0x62b000085810, open_tables_state_backup=0x373f36468510, can_deadlock=false) at /home/mleich/10.5/sql/sql_show.cc:4600
#20 0x000055db83c0ebd4 in get_all_tables (thd=0x62b00007e218, tables=0x62b000087238, cond=0x0) at /home/mleich/10.5/sql/sql_show.cc:5125
#21 0x000055db83c41bc3 in get_schema_tables_result (join=0x62b0000883a0, executed_place=PROCESSED_BY_JOIN_EXEC) at /home/mleich/10.5/sql/sql_show.cc:8673
#22 0x000055db83b143b9 in JOIN::exec_inner (this=0x62b0000883a0) at /home/mleich/10.5/sql/sql_select.cc:4401
#23 0x000055db83b12268 in JOIN::exec (this=0x62b0000883a0) at /home/mleich/10.5/sql/sql_select.cc:4225
#24 0x000055db83b16322 in mysql_select (thd=0x62b00007e218, tables=0x62b000087238, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2684619520, result=0x62b000088370, unit=0x62b000082230, 
    select_lex=0x62b000082a30) at /home/mleich/10.5/sql/sql_select.cc:4649
#25 0x000055db83ae84d3 in handle_select (thd=0x62b00007e218, lex=0x62b000082168, result=0x62b000088370, setup_tables_done_option=0) at /home/mleich/10.5/sql/sql_select.cc:417
#26 0x000055db83a58ca5 in execute_sqlcom_select (thd=0x62b00007e218, all_tables=0x62b000087238) at /home/mleich/10.5/sql/sql_parse.cc:6208
#27 0x000055db83a47fd4 in mysql_execute_command (thd=0x62b00007e218) at /home/mleich/10.5/sql/sql_parse.cc:3939
#28 0x000055db83a63df9 in mysql_parse (thd=0x62b00007e218, rawbuf=0x62b000085238 "SHOW KEYS FROM `test`.`E`", length=25, parser_state=0x373f3646a8f0, is_com_multi=false, is_next_command=false) at /home/mleich/10.5/sql/sql_parse.cc:7992
#29 0x000055db83a3a94d in dispatch_command (command=COM_QUERY, thd=0x62b00007e218, packet=0x6290003ed219 "SHOW KEYS FROM `test`.`E`", packet_length=25, is_com_multi=false, is_next_command=false) at /home/mleich/10.5/sql/sql_parse.cc:1874
#30 0x000055db83a371cb in do_command (thd=0x62b00007e218) at /home/mleich/10.5/sql/sql_parse.cc:1355
#31 0x000055db83e64708 in do_handle_one_connection (connect=0x6080045091b8, put_in_cache=true) at /home/mleich/10.5/sql/sql_connect.cc:1411
#32 0x000055db83e64066 in handle_one_connection (arg=0x6080045091b8) at /home/mleich/10.5/sql/sql_connect.cc:1313
#33 0x000055db84b1cac1 in pfs_spawn_thread (arg=0x615000008f18) at /home/mleich/10.5/storage/perfschema/pfs.cc:2201
#34 0x0000104016d8d6db in start_thread (arg=0x373f3646c300) at pthread_create.c:463
#35 0x00007fda7081988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
Workflow of the RQG test:
0. Bootstrap
1. Start the server with innodb_force_recovery=2
2. Create and fill all tables which get used later.
3. Several concurrent sessions run excessive DML on these tables.
     ~ 280s after the start of step 3, the server process gets killed (SIGKILL).
4. The attempt to restart that server with innodb_force_recovery=2 succeeded.
5. Then come checks of the integrity of the data content, during which a
     SHOW KEYS FROM `test`.`E` triggered the assertion.
 
RQG
git clone https://github.com/mleich1/rqg --branch experimental RQG
origin/experimental 611a31e38a3bc157392c0748c95991b6a248bb3b 2020-06-09T15:31:46+02:00
 
perl rqg.pl \                        
--grammar=conf/mariadb/innodb_compression_encryption.yy \
--gendata=conf/mariadb/innodb_compression_encryption.zz \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=conf/mariadb/encryption_keys.txt \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--duration=300 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--reporters=CrashRecovery1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-max-statement-time=30 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=120 \
--threads=2 \
--mysqld=--innodb_force_recovery=2 \
--mysqld=--innodb_page_size=8K \
--mysqld=--innodb-buffer-pool-size=256M \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--basedir2=<local settings> \
--script_debug=_nix_ \
--rr=Server \
--rr_options=--chaos
 
Certain options are not required for replay.
 



 Comments   
Comment by Marko Mäkelä [ 2020-06-13 ]

We resurrected table locks for the recovered transaction:

10.5 d2c593c2a66549abe67c4e73dbefeb876fa5ed9d

#0  0x000055db84eaa5ae in lock_table_create (table=0x619000028120, 
    type_mode=1, trx=0x2edc0254fb90, c_lock=0x0)
    at /home/mleich/10.5/storage/innobase/lock/lock0lock.cc:3485
#1  0x000055db84eadc4d in lock_table_ix_resurrect (table=0x619000028120, 
    trx=0x2edc0254fb90)
    at /home/mleich/10.5/storage/innobase/lock/lock0lock.cc:3942
#2  0x000055db85238b88 in trx_resurrect_table_locks (trx=0x2edc0254fb90, 
    undo=0x61200000b458)
    at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:644
#3  0x000055db852399c2 in trx_resurrect (undo=0x61200000b458, 
    rseg=0x61300000a4d8, start_time=1591979736, start_time_micro=289221474758, 
    rows_to_undo=0x7ffc53b2fa30, is_old_insert=false)
    at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:725
#4  0x000055db8523a0c0 in trx_lists_init_at_db_start ()
    at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:786
(rr) when
Current event: 108171

Later, the transaction was rolled back, but we did not clear {{trx->lock.table_cached}} before freeing the transaction:

10.5 d2c593c2a66549abe67c4e73dbefeb876fa5ed9d

Thread 62 hit Breakpoint 5, trx_free (trx=@0x46c635b407c8: 0x2edc0254fb90)
    at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:394
(rr) bt
#0  trx_free (trx=@0x46c635b407c8: 0x2edc0254fb90)
    at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:394
#1  0x000055db852379c2 in trx_free_at_shutdown (trx=0x2edc0254fb90)
    at /home/mleich/10.5/storage/innobase/trx/trx0trx.cc:552
#2  0x000055db8521df14 in trx_rollback_recovered (all=true)
    at /home/mleich/10.5/storage/innobase/trx/trx0roll.cc:809
#3  0x000055db8521e3ae in trx_rollback_all_recovered ()
    at /home/mleich/10.5/storage/innobase/trx/trx0roll.cc:838
#4  0x0000104016d8d6db in start_thread (arg=0x46c635b41700)
    at pthread_create.c:463
#5  0x00007fda7081988f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 115082

A little later, the transaction object is reused in trx_create(), and the assertion fires.
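The reuse mechanism can be sketched with a toy model (the struct and helper names below are illustrative stand-ins, not InnoDB's actual trx pool code): a trx that is freed without the normal cleanup keeps a stale table_cached count, so the sanity check performed by the next trx_create()-style allocation fails.

```cpp
#include <cassert>

// Illustrative stand-in for trx_t; not InnoDB's real definition.
struct Trx {
    unsigned table_cached = 0;  // count of table locks cached on this trx
};

// The sanity check that trx_create() applies to a recycled object
// (the assertion that fired in the backtraces above).
bool passes_create_check(const Trx& trx) {
    return trx.table_cached == 0;
}

// Normal path: trx_init(this) in trx_t::commit_in_memory() resets the field.
// In the bug, this cleanup never ran because the rollback was (wrongly)
// treated as interrupted.
void trx_cleanup(Trx& trx) {
    trx.table_cached = 0;
}
```

A freed-but-uncleaned trx fails the check; one that went through the normal commit path passes.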

It appears that normally the field would be cleared by the call trx_init(this) in trx_t::commit_in_memory(). But here we are incorrectly marking the rollback as interrupted:

	if (UNIV_UNLIKELY(trx_get_dict_operation(trx) == TRX_DICT_OP_NONE
			  && !srv_undo_sources
			  && !srv_is_being_started)
	    && (srv_fast_shutdown == 3 || trx == trx_roll_crash_recv_trx)) {
		/* Shutdown has been initiated. */
		trx->error_state = DB_INTERRUPTED;
		return NULL;
	}

Shutdown had not been initiated, but we had failed to set srv_undo_sources when the value of innodb_force_recovery is 2, 3, or 4.
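The misfire can be modeled with a small sketch (the function and its bool parameters are illustrative; the real predicate reads global variables and also checks trx_get_dict_operation()): with innodb_force_recovery=2, srv_undo_sources is never set, so once startup completes the crash-recovery rollback takes the early exit even though no shutdown was initiated.

```cpp
#include <cassert>

// Hedged model of the early-exit predicate quoted above; names and the
// parameter-passing style are illustrative, not InnoDB's actual code.
bool rollback_marked_interrupted(bool srv_undo_sources,
                                 bool srv_is_being_started,
                                 bool is_crash_recv_trx,
                                 int srv_fast_shutdown) {
    return !srv_undo_sources && !srv_is_being_started
           && (srv_fast_shutdown == 3 || is_crash_recv_trx);
}
```

With a normal startup the flag is true and the rollback proceeds; under force recovery the untouched flag makes the same predicate fire spuriously.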
mleich, can you please try the following patch?

diff --git a/storage/innobase/srv/srv0start.cc b/storage/innobase/srv/srv0start.cc
index 3fe78bc1909..4f68af25bcf 100644
--- a/storage/innobase/srv/srv0start.cc
+++ b/storage/innobase/srv/srv0start.cc
@@ -1875,6 +1875,7 @@ dberr_t srv_start(bool create_new_db)
 
 		if (srv_force_recovery < SRV_FORCE_NO_TRX_UNDO
 		    && !srv_read_only_mode) {
+			srv_undo_sources = true;
 			/* Drop partially created indexes. */
 			row_merge_drop_temp_indexes();
 			/* Drop garbage tables. */
@@ -1915,8 +1916,9 @@ dberr_t srv_start(bool create_new_db)
 		ut_ad(srv_force_recovery >= SRV_FORCE_NO_UNDO_LOG_SCAN
 		      || !purge_sys.enabled());
 
+		srv_undo_sources = !high_level_read_only;
+
 		if (srv_force_recovery < SRV_FORCE_NO_BACKGROUND) {
-			srv_undo_sources = true;
 			/* Create the dict stats gathering task */
 			dict_stats_start();
 			/* Create the thread that will optimize the

Comment by Marko Mäkelä [ 2020-06-13 ]

Sorry, my patch cannot work as is. srv_undo_sources is being used as a predicate that tells whether the additional threads or tasks exist.
Here is a less intrusive patch that I think should work in this scenario:

diff --git a/storage/innobase/row/row0undo.cc b/storage/innobase/row/row0undo.cc
index 398e31a5ae5..c4b0af997f5 100644
--- a/storage/innobase/row/row0undo.cc
+++ b/storage/innobase/row/row0undo.cc
@@ -477,6 +477,7 @@ row_undo_step(
 	ut_ad(que_node_get_type(node) == QUE_NODE_UNDO);
 
 	if (UNIV_UNLIKELY(trx_get_dict_operation(trx) == TRX_DICT_OP_NONE
+			  && trx != trx_roll_crash_recv_trx
 			  && !srv_undo_sources
 			  && !srv_is_being_started)
 	    && (srv_fast_shutdown == 3 || trx == trx_roll_crash_recv_trx)) {

I will try to come up with a test case. Basically, we should start up the server with innodb_force_recovery=2 and expect the rollback to complete. I hope that this will be observable with a READ UNCOMMITTED transaction.

Comment by Marko Mäkelä [ 2020-06-13 ]

I think that we must update the predicate to check for srv_shutdown_state==SRV_SHUTDOWN_INITIATED that was introduced in MDEV-22827. That should be the least intrusive fix.
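Under that assumption, the corrected predicate might look roughly like this (the enum values and function are an illustrative sketch, not the actual patch; MDEV-22827 introduced srv_shutdown_state in the real code):

```cpp
#include <cassert>

// Illustrative shutdown-state model; the real enum lives in InnoDB.
enum srv_shutdown_t { SRV_SHUTDOWN_NONE, SRV_SHUTDOWN_INITIATED };

// Sketch of the fixed check: abort the rollback only once shutdown
// has actually been initiated, rather than inferring it from
// srv_undo_sources, which stays false under innodb_force_recovery>=2.
bool rollback_marked_interrupted_fixed(srv_shutdown_t shutdown_state,
                                       bool is_crash_recv_trx,
                                       int srv_fast_shutdown) {
    return shutdown_state == SRV_SHUTDOWN_INITIATED
           && (srv_fast_shutdown == 3 || is_crash_recv_trx);
}
```

During a force-recovery startup the state is still SRV_SHUTDOWN_NONE, so the recovered transaction rolls back to completion.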

I am able to reproduce the problem with the following test case, which is not entirely deterministic, because it depends on the relative timing of trx_rollback_all_recovered and the srv_is_being_started=false assignment in srv_start():

--source include/have_innodb.inc
--source include/not_embedded.inc
--source include/have_sequence.inc
 
FLUSH TABLES;
 
connect (con0,localhost,root);
CREATE TABLE t0 (a INT PRIMARY KEY) ENGINE=InnoDB;
BEGIN;
INSERT INTO t0 SELECT * FROM seq_1_to_10000;
 
connection default;
# Persist the above incomplete transaction.
SET GLOBAL innodb_flush_log_at_trx_commit=1;
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
 
--let $restart_parameters= --innodb-force-recovery=2
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
--let $restart_parameters=
--let $shutdown_timeout=
 
disconnect con0;
connection default;
SELECT * FROM t0 LOCK IN SHARE MODE;
DROP TABLE t0,t1;

Without the fix, the SELECT would hang, because the rollback was aborted and the locks were not released.

Comment by Marko Mäkelä [ 2020-06-13 ]

I believe that this logic was broken in MDEV-12352. The fix will depend on MDEV-22769.

Comment by Marko Mäkelä [ 2020-06-13 ]

Pushed to 10.2 and merged to 10.3 (with conflict resolution).

Generated at Thu Feb 08 09:05:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.