[MDEV-25559] Auto-create: infinite loop after interrupted lock wait Created: 2021-04-28  Updated: 2021-12-17  Resolved: 2021-05-02

Status: Closed
Project: MariaDB Server
Component/s: Locking, Partitioning, Versioned Tables
Affects Version/s: N/A
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Aleksey Midenkov
Resolution: Fixed Votes: 0
Labels: None
Environment:

bb-10.6-midenok-MDEV-17554


Issue Links:
Relates
relates to MDEV-17554 Auto-create history partitions for sy... Closed

 Description   

The scenario is an extension of the timed deadlock reported as MDEV-25547. While in MDEV-25547 the problem is a delay upon DML, possibly very lengthy but still finite, here it develops into a permanent hang: at least I have never seen it end, and nothing in the stack trace suggests that it is time-limited.

I tried to minimize the scenario, but I am still not completely sure that everything that remains is necessary. Also, while it currently fails every time for me, it is probably still non-deterministic by nature. I expect it should be easy enough to synchronize properly after the initial analysis, but I couldn't do it blindly.

The scenario involves 3 connections which compete to auto-create a partition by running DML on the table. In this case the trigger is crossing the time interval boundary, although the same applies to limit partitions. When the connections hit a deadlock similar to the one described in MDEV-25547, it gets resolved by killing the queries in two of the three deadlocked connections and finishing their transactions. In the test case the query killing is done by means of max_statement_time; it has the same effect as KILL QUERY.
The remaining connection is meant to finish its own query without any obstacles.

However, it doesn't happen. The remaining connection hangs (or rather loops, at 100% CPU) forever.

I expect this to be quite a realistic scenario, hence the high severity. As mentioned in MDEV-25547, lock_wait_timeout, which determines the waiting time in these deadlocks, is usually set quite long on real-life instances, so the deadlocks will probably often end up being resolved by killing the queries.

The test case is for reproducing purposes only; don't put it into the regression suite, create a synchronized one instead.

--source include/have_partition.inc
--source include/have_metadata_lock_info.inc
--source include/have_innodb.inc
 
call mtr.add_suppression("is out of INTERVAL, need more HISTORY partitions");
 
--connect(con1,localhost,root,,)
set max_statement_time= 2;
--connect(con2,localhost,root,,)
set lock_wait_timeout= 5;
--connect(con3,localhost,root,,)
set max_statement_time= 2;
 
--connection con1
 
SET @@timestamp = unix_timestamp('2021-04-27 00:00:01');
create table t (pk int primary key, a int) engine=InnoDB with system versioning partition by system_time interval 1 hour auto;
insert into t values (1,0),(2,0),(3,0);
 
--echo #
--echo # Jump 1 hour forward, start transaction and get partition p1 created
--echo #
begin;
SET @@timestamp = unix_timestamp('2021-04-27 01:00:01');
update t set a = 1 where pk = 1;
 
--connection con2
--echo #
--echo # Jump 10 more seconds forward, start transaction and get metadata lock on the table
--echo #
begin;
SET @@timestamp = unix_timestamp('2021-04-27 01:00:11');
update t set a = 2 where pk = 2;
 
--connection con1
--echo #
--echo # Jump another hour forward, try to create partition p2, get locked
--echo #
SET @@timestamp = unix_timestamp('2021-04-27 02:00:12');
--send
  update t set a = 1 where pk = 1;
 
--connection con2
--echo #
--echo # Jump another 10 seconds forward, also try to create partition p2 and get locked
--echo #
SET @@timestamp = unix_timestamp('2021-04-27 02:00:22');
--send
  update t set a = 2 where pk = 2;
 
--connection con3
--echo #
--echo # Jump another 10 seconds forward, start transaction, also try to create partition p2 and get locked
--echo #
SET @@timestamp = unix_timestamp('2021-04-27 02:00:33');
begin;
--send
  update t set a = 3 where pk = 3;
 
--connection default
--echo #
--echo # If at this point we see con2's query "update t set a = 2 where pk = 2"
--echo # in the "closing tables" or "Opening tables" state,
--echo # then it will hang at the end. Otherwise it will probably go through
--echo #
--query_vertical SELECT ID, COMMAND, STATE, INFO FROM INFORMATION_SCHEMA.PROCESSLIST WHERE INFO = 'update t set a = 2 where pk = 2'
 
--echo #
--echo # Now we just wait till the other two connections interrupt the queries
--echo # due to max_statement_time. It has the same effect as running
--echo # KILL QUERY for them, only without an external action.
--echo # Then we commit transactions in them and disconnect, to be sure
--echo # they don't interfere anymore in any way
--echo # 
--connection con3
--error ER_STATEMENT_TIMEOUT
--reap
commit;
--disconnect con3
 
--connection con1
--error ER_STATEMENT_TIMEOUT
--reap
commit;
--disconnect con1
 
--connection default
--echo #
--echo # Now if we got the hang the processlist shows con2 still running the query,
--echo # everything else idle. Metadata lock info is for informational purposes only,
--echo # it shows the same single lock whether we got the hang or not
--echo #
show processlist;
select * from information_schema.metadata_lock_info;
 
--let $pidfile= `SELECT @@pid_file`
--echo #
--echo # If it hangs, it hangs here.
--echo # It doesn't have anything to deadlock with,
--echo # and even if it did, lock_wait_timeout is only 5 seconds.
--echo # The reproducer doesn't interrupt the hang,
--echo # so that it can be analyzed in peace.
--echo # The server binary: $MYSQLD
--echo # The server PID:
--cat_file $pidfile
--echo #
--connection con2
--reap
--disconnect con2
 
--connection default
drop table t;

The troubled connection is con2. At the end of the test case, when the other concurrent connections have exited, the output is this (starting from the processlist):

bb-10.6-midenok-MDEV-17554 b96b96f9

show processlist;
Id	User	Host	db	Command	Time	State	Info	Progress
4	root	localhost	test	Query	0	starting	show processlist	0.000
6	root	localhost	test	Query	2	closing tables	update t set a = 2 where pk = 2	0.000
select * from information_schema.metadata_lock_info;
THREAD_ID	LOCK_MODE	LOCK_DURATION	LOCK_TYPE	TABLE_SCHEMA	TABLE_NAME
6	MDL_SHARED_WRITE	NULL	Table metadata lock	test	t
#
# If it hangs, it hangs here.
# It doesn't have anything to deadlock with,
# and even if it did, lock_wait_timeout is only 5 seconds.
# The reproducer doesn't interrupt the hang,
# so that it can be analyzed in peace.

The thread loops somewhere in open_table, so the stack trace is different every time a snapshot is taken. Here are a couple of examples:

#0  0x0000556b9b959c81 in lf_hash_search_using_hash_value (hash=0x556b9d1c1120 <tdc_hash>, pins=0x556b9fd02f18, hashnr=1932676072, key=0x7f68ec0146d9, keylen=7) at /data/src/bb-10.6-midenok-MDEV-17554/mysys/lf_hash.cc:499
#1  0x0000556b9a6b0b95 in tdc_acquire_share (thd=0x7f68ec000db8, tl=0x7f68ec014260, flags=3, out_table=0x7f69242843e8) at /data/src/bb-10.6-midenok-MDEV-17554/sql/table_cache.cc:808
#2  0x0000556b9a08f438 in open_table (thd=0x7f68ec000db8, table_list=0x7f68ec014260, ot_ctx=0x7f69242847d0) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.cc:1918
#3  0x0000556b9a09871e in open_and_process_table (thd=0x7f68ec000db8, tables=0x7f68ec014260, counter=0x7f69242848e8, flags=0, prelocking_strategy=0x7f6924284858, has_prelocking_list=false, ot_ctx=0x7f69242847d0) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.cc:3970
#4  0x0000556b9a09ac73 in open_tables (thd=0x7f68ec000db8, options=..., start=0x7f69242848c0, counter=0x7f69242848e8, flags=0, prelocking_strategy=0x7f6924284858) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.cc:4453
#5  0x0000556b9a077f22 in open_tables (thd=0x7f68ec000db8, tables=0x7f69242848c0, counter=0x7f69242848e8, flags=0) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.h:479
#6  0x0000556b9a40bbdb in mysql_update (thd=0x7f68ec000db8, table_list=0x7f68ec014260, fields=..., values=..., conds=0x7f68ec014d40, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7f6924284f50, updated_return=0x7f6924285040) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_update.cc:409
#7  0x0000556b9a1e41c6 in mysql_execute_command (thd=0x7f68ec000db8) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:4403
#8  0x0000556b9a1fda04 in mysql_parse (thd=0x7f68ec000db8, rawbuf=0x7f68ec014180 "update t set a = 2 where pk = 2", length=31, parser_state=0x7f6924285490) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:8021
#9  0x0000556b9a1d3d51 in dispatch_command (command=COM_QUERY, thd=0x7f68ec000db8, packet=0x7f68ec00b899 "update t set a = 2 where pk = 2", packet_length=31, blocking=true) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:1898
#10 0x0000556b9a1d0e06 in do_command (thd=0x7f68ec000db8, blocking=true) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:1407
#11 0x0000556b9a4f947b in do_handle_one_connection (connect=0x556ba00acbc8, put_in_cache=true) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_connect.cc:1410
#12 0x0000556b9a4f8e3e in handle_one_connection (arg=0x556ba00acbc8) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_connect.cc:1312
#13 0x0000556b9aeb0310 in pfs_spawn_thread (arg=0x556ba00acca8) at /data/src/bb-10.6-midenok-MDEV-17554/storage/perfschema/pfs.cc:2201
#14 0x00007f692a5f7609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f692a1cb293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

#0  0x0000556b9b988bb4 in code_state () at /data/src/bb-10.6-midenok-MDEV-17554/dbug/dbug.c:379
#1  0x0000556b9b98c548 in _db_enter_ (_func_=0x556b9bd7b946 "ha_partition::loop_partitions", _file_=0x556b9bd77fa8 "/data/src/bb-10.6-midenok-MDEV-17554/sql/ha_partition.cc", _line_=9322, _stack_frame_=0x7f6924284210) at /data/src/bb-10.6-midenok-MDEV-17554/dbug/dbug.c:1133
#2  0x0000556b9ae96e3f in ha_partition::loop_partitions (this=0x7f68e802e8e0, callback=0x556b9ae9531d <extra_cb(handler*, void*)>, param=0x7f6924284254) at /data/src/bb-10.6-midenok-MDEV-17554/sql/ha_partition.cc:9322
#3  0x0000556b9ae95e37 in ha_partition::extra (this=0x7f68e802e8e0, operation=HA_EXTRA_IS_ATTACHED_CHILDREN) at /data/src/bb-10.6-midenok-MDEV-17554/sql/ha_partition.cc:9144
#4  0x0000556b9a6ad8ce in tc_acquire_table (thd=0x7f68ec000db8, element=0x7f68e8036f38) at /data/src/bb-10.6-midenok-MDEV-17554/sql/table_cache.cc:401
#5  0x0000556b9a6b1297 in tdc_acquire_share (thd=0x7f68ec000db8, tl=0x7f68ec014260, flags=3, out_table=0x7f69242843e8) at /data/src/bb-10.6-midenok-MDEV-17554/sql/table_cache.cc:862
#6  0x0000556b9a08f438 in open_table (thd=0x7f68ec000db8, table_list=0x7f68ec014260, ot_ctx=0x7f69242847d0) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.cc:1918
#7  0x0000556b9a09871e in open_and_process_table (thd=0x7f68ec000db8, tables=0x7f68ec014260, counter=0x7f69242848e8, flags=0, prelocking_strategy=0x7f6924284858, has_prelocking_list=false, ot_ctx=0x7f69242847d0) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.cc:3970
#8  0x0000556b9a09ac73 in open_tables (thd=0x7f68ec000db8, options=..., start=0x7f69242848c0, counter=0x7f69242848e8, flags=0, prelocking_strategy=0x7f6924284858) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.cc:4453
#9  0x0000556b9a077f22 in open_tables (thd=0x7f68ec000db8, tables=0x7f69242848c0, counter=0x7f69242848e8, flags=0) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_base.h:479
#10 0x0000556b9a40bbdb in mysql_update (thd=0x7f68ec000db8, table_list=0x7f68ec014260, fields=..., values=..., conds=0x7f68ec014d40, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7f6924284f50, updated_return=0x7f6924285040) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_update.cc:409
#11 0x0000556b9a1e41c6 in mysql_execute_command (thd=0x7f68ec000db8) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:4403
#12 0x0000556b9a1fda04 in mysql_parse (thd=0x7f68ec000db8, rawbuf=0x7f68ec014180 "update t set a = 2 where pk = 2", length=31, parser_state=0x7f6924285490) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:8021
#13 0x0000556b9a1d3d51 in dispatch_command (command=COM_QUERY, thd=0x7f68ec000db8, packet=0x7f68ec00b899 "update t set a = 2 where pk = 2", packet_length=31, blocking=true) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:1898
#14 0x0000556b9a1d0e06 in do_command (thd=0x7f68ec000db8, blocking=true) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_parse.cc:1407
#15 0x0000556b9a4f947b in do_handle_one_connection (connect=0x556ba00acbc8, put_in_cache=true) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_connect.cc:1410
#16 0x0000556b9a4f8e3e in handle_one_connection (arg=0x556ba00acbc8) at /data/src/bb-10.6-midenok-MDEV-17554/sql/sql_connect.cc:1312
#17 0x0000556b9aeb0310 in pfs_spawn_thread (arg=0x556ba00acca8) at /data/src/bb-10.6-midenok-MDEV-17554/storage/perfschema/pfs.cc:2201
#18 0x00007f692a5f7609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x00007f692a1cb293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



 Comments   
Comment by Aleksey Midenkov [ 2021-05-02 ]

This is not about synchronisation but about the error type of the failed lock. I made a shorter version of your script.

Comment by Aleksey Midenkov [ 2021-12-17 ]

Reproduce

--source include/have_partition.inc
--source include/have_innodb.inc
 
--echo #
--echo # MDEV-25559 Auto-create: infinite loop after interrupted lock wait
--echo #
set timestamp= unix_timestamp('2000-01-01 00:00:00');
create table t (pk int primary key, a int) engine innodb with system versioning
partition by system_time interval 1 hour auto;
insert into t values (1, 0);
begin;
update t set a= a + 1;
--connect (con1,localhost,root,,)
set max_statement_time= 1;
set timestamp= unix_timestamp('2000-01-01 01:00:00');
send update t set a= a + 2;
--connection default
set timestamp= unix_timestamp('2000-01-01 01:00:00');
send update t set a= a + 3;
--connection con1
--error ER_STATEMENT_TIMEOUT
reap;
--disconnect con1
--connection default
reap;
commit;
drop table t;

Result

Infinite loop.

Fix

There was a condition like this in Open_table_context::recover_from_failed_open():

    case OT_ADD_HISTORY_PARTITION:
      result= lock_table_names(m_thd, m_thd->lex->create_info, m_failed_table,
                               NULL, get_timeout(), 0);
      /*
         We are now under MDL_EXCLUSIVE mode. Other threads have no table share
         acquired: they are blocked either at open_table_get_mdl_lock() in
         open_table() or at lock_table_names() here.
      */
      if (result)
      {
        if (m_action == OT_ADD_HISTORY_PARTITION &&
            m_thd->get_stmt_da()->sql_errno() == ER_LOCK_WAIT_TIMEOUT)
        {
          // MDEV-23642 Locking timeout caused by auto-creation affects original DML
          TABLE_SHARE *share= tdc_acquire_share(m_thd, m_failed_table,
                                                GTS_TABLE, NULL);
          if (share)
          {
            share->vers_skip_auto_create= false;
            tdc_release_share(share);
          }
          m_thd->clear_error();
          vers_create_count= 0;
          result= false;
        }
        break;
      }

For an error code other than ER_LOCK_WAIT_TIMEOUT, such as ER_STATEMENT_TIMEOUT in the test case above, we could not get inside the branch and reset vers_skip_auto_create inside the share. Other threads that are also trying to auto-create will not exit the loop until this flag is false. The above code was replaced by

    case OT_ADD_HISTORY_PARTITION:
      if (!m_thd->locked_tables_mode)
        result= lock_table_names(m_thd, m_thd->lex->create_info, m_failed_table,
                                NULL, get_timeout(), 0);
      else
      {
        DBUG_ASSERT(!result);
        DBUG_ASSERT(m_action == OT_ADD_HISTORY_PARTITION);
      }
      /*
         We are now under MDL_EXCLUSIVE mode. Other threads have no table share
         acquired: they are blocked either at open_table_get_mdl_lock() in
         open_table() or at lock_table_names() here.
      */
      if (result)
      {
        if (m_action == OT_ADD_HISTORY_PARTITION)
        {
          TABLE_SHARE *share= tdc_acquire_share(m_thd, m_failed_table,
                                                GTS_TABLE, NULL);
          if (share)
          {
            share->vers_skip_auto_create= false;
            tdc_release_share(share);
          }
          if (m_thd->get_stmt_da()->sql_errno() == ER_LOCK_WAIT_TIMEOUT)
          {
            // MDEV-23642 Locking timeout caused by auto-creation affects original DML
            m_thd->clear_error();
            vers_create_count= 0;
            result= false;
          }
        }
        break;
      }

The new code resets vers_skip_auto_create (to false) unconditionally after successful share acquisition; only the error handling remains conditional on ER_LOCK_WAIT_TIMEOUT.
