[MDEV-19425] Deadlock after concurrent DDL/DML/KILL connection Created: 2019-05-09  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Locking
Affects Version/s: 10.2.24, 10.3.15, 10.4.5
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Vladislav Vaintroub
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File MDEV-19425.tgz    

 Description   

Problem found during RQG testing.
The output of the MTR based replay test
       Two connections run many loops containing the sequence
     DELETE FROM t0 WHERE col_int = 1 OR col_int IS NULL ; INSERT INTO t0 (col1,col2, col_int) VALUES (1,1,1) , (1,1,1) ; COMMIT ;
      DELETE FROM t0 WHERE col_int = 1 OR col_int IS NULL ; INSERT INTO t0 (col1,col2, col_int) VALUES (1,1,1) , (1,1,1) ; COMMIT ;
      ALTER TABLE t0 ADD COLUMN col_int_g_copy INTEGER GENERATED ALWAYS AS (col_int) VIRTUAL ; ALTER TABLE t0 DROP COLUMN col_int_g ; ALTER TABLE t0 CHANGE COLUMN col_int_g_copy col_int_g INTEGER GENERATED ALWAYS AS (col_int) VIRTUAL ;
   SET @kill_id =  CONNECTION_ID() ; KILL SOFT CONNECTION @kill_id ;
   
  and a third connection observes the processlist is after some first phase like
SHOW PROCESSLIST;
Id      User    Host    db      Command Time    State   Info    Progress
4       root    localhost       test    Query   0       init    SHOW PROCESSLIST        0.000
78      root    localhost       test    Query   106     Waiting for table metadata lock ALTER TABLE t0 CHANGE COLUMN col_int_g_copy col_int_g INTEGER GENERATED ALWAYS AS (col_int) VIRTUAL     0.000
79      root    localhost       test    Query   106     Waiting for table level lock    INSERT INTO t0 (col1,col2, col_int) VALUES (1,1,1) , (1,1,1)    0.000
...
SHOW PROCESSLIST;
Id      User    Host    db      Command Time    State   Info    Progress
4       root    localhost       test    Query   0       init    SHOW PROCESSLIST        0.000
78      root    localhost       test    Query   115     Waiting for table metadata lock ALTER TABLE t0 CHANGE COLUMN col_int_g_copy col_int_g INTEGER GENERATED ALWAYS AS (col_int) VIRTUAL     0.000
79      root    localhost       test    Query   115     Waiting for table level lock    INSERT INTO t0 (col1,col2, col_int) VALUES (1,1,1) , (1,1,1)    0.000
with two connections showing no changes except raising time.
 
10.2.24 commit 8f9c8579d0bfa12aa7966b773864d816d68be064        2019-05
10.3.15 commit 651a43e0a0361098c35164a128ca588a2989d8e1 2019-05-07
10.4.5 commit baadbe96019b205164167928d80e836ebbb6bcfe 2019-04-23



 Comments   
Comment by Matthias Leich [ 2019-05-09 ]

Content of the uploaded archive MDEV-19425.tgz
- mysqltest_background.sh  --  shellscript running the concurrent sessions
- t/ml_deadlock.test -- MTR based test which calls  mysqltest_background.sh
- prt  -- protocol of my MTR run
You will most probably need some Linux with GNU tools for running the replay test.
I apologize in advance for the case that 
- mysqltest_background.sh  behaves somehow unfortunate
- all that above is no locking problem within the server at all
In case MTR prints processlist content like
SHOW PROCESSLIST;
Id  User    Host    db  Command Time    State   Info    Progress
4   root    localhost   test    Query   0   init    SHOW PROCESSLIST    0.000
27  root    localhost   test    Query   2   Waiting for table metadata lock ALTER TABLE t0 CHANGE COLUMN col_int_g_copy col_int_g INTEGER GENERATED ALWAYS AS (col_int) VIRTUAL 0.000
28  root    localhost   test    Query   2   Waiting for table level lock    INSERT INTO t0 (col1,col2, col_int) VALUES (1,1,1) , (1,1,1)    0.000
with increasing time values for the two connections waiting for locks than you have replayed the problem.
 
Storage engines for t0 tested (just assignment of engine with no extra engine options):
1. MyISAM and also Aria --> The bad effect described here.
2. InnoDB --> No bad effect

Generated at Thu Feb 08 08:51:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.