[MDEV-13379] subsequent OPTIMIZEs on the same table may cause complete table deadlock Created: 2017-07-25  Updated: 2023-06-06  Resolved: 2023-06-06

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Andrii Nikitin (Inactive) Assignee: Ramesh Sivaraman
Resolution: Won't Fix Votes: 0
Labels: need_verification

Attachments: Text File gdb.txt    
Issue Links:
Relates
relates to MDEV-10245 Two concurrent OPTIMIZE on InnoDB tab... Closed
relates to MDEV-13316 Unresolved conflict between OPTIMIZE ... Closed

 Description   

I get complete table deadlock with simple test case like below:

1. setup default two-node galera cluster on the same host, e.g. on ports 3314 and 3315
2. Create table:
create table test.t select 1;
3. Start following concurrent load:

( for i1 in {1..100000} ; do mysql -h127.0.0.1 -P3314 -e "insert into t select $i1" test || : ; done ) &> /dev/null &
( for i1 in {1..100000} ; do mysql -h127.0.0.1 -P3315 -e "insert into t select $i1" test || : ; done ) &> /dev/null &
 
( for k1 in {1..1000}; do sleep 10; mysql -h127.0.0.1 -P3314 -e "optimize table t" test & wait ; done ) &>/dev/null &
( for k1 in {1..1000}; do sleep 10; mysql -h127.0.0.1 -P3315 -e "optimize table t" test & wait ; done ) &>/dev/null &
 
 
while true ; do
mysql -h127.0.0.1 -P3314 -e 'show processlist'
mysql -h127.0.0.1 -P3315 -e 'show processlist'
mysql -h127.0.0.1 -P3314 -e 'select count(*) from t' test
mysql -h127.0.0.1 -P3315 -e 'select count(*) from t' test
sleep 2
done

The cluster usually hangs up within 1 min, this is what observed:

(11:17 sql) > mysql -h 127.0.0.1 -P 3314 -uroot  -e 'show processlist'
+-----+-------------+-----------------+------+---------+------+-------------------------+--------------------------+----------+
| Id  | User        | Host            | db   | Command | Time | State                   | Info                     | Progress |
+-----+-------------+-----------------+------+---------+------+-------------------------+--------------------------+----------+
|   1 | system user |                 | test | Sleep   |  748 | Waiting for table flush | optimize table t         |    0.000 |
|   2 | system user |                 | NULL | Sleep   |  815 | wsrep aborter idle      | NULL                     |    0.000 |
| 518 | a           | localhost:37078 | test | Query   |  748 | query end               | insert into t select 487 |    0.000 |
| 519 | a           | localhost:37080 | test | Query   |  748 | Waiting for table flush | select count(*) from t   |    0.000 |
| 520 | a           | localhost:37122 | test | Query   |  738 | checking permissions    | optimize table t         |    0.000 |
| 526 | root        | localhost:38382 | NULL | Query   |    0 | init                    | show processlist         |    0.000 |
+-----+-------------+-----------------+------+---------+------+-------------------------+--------------------------+----------+
(11:17 sql) > mysql -h 127.0.0.1 -P 3315 -uroot -e 'show processlist'
+-----+-------------+-----------------+------+---------+------+----------------------+--------------------------+----------+
| Id  | User        | Host            | db   | Command | Time | State                | Info                     | Progress |
+-----+-------------+-----------------+------+---------+------+----------------------+--------------------------+----------+
|   1 | system user |                 | NULL | Sleep   |  799 | wsrep aborter idle   | NULL                     |    0.000 |
|   2 | system user |                 | NULL | Sleep   |  748 | committed 992        | NULL                     |    0.000 |
| 517 | a           | localhost:46558 | test | Query   |  748 | query end            | insert into t select 505 |    0.000 |
| 518 | a           | localhost:46562 | test | Query   |  738 | checking permissions | optimize table t         |    0.000 |
| 524 | root        | localhost:47822 | NULL | Query   |    0 | init                 | show processlist         |    0.000 |
+-----+-------------+-----------------+------+---------+------+----------------------+--------------------------+----------+

It is kid of strange to me that I cannot reproduce the problem with alter command below instead of OPTIMIZE:
alter table t engine=innodb, algorithm=copy



 Comments   
Comment by Elena Stepanova [ 2017-07-25 ]

See also MDEV-10245 , which is also about concurrent OPTIMIZE causing a deadlock. Stack traces are different, though.

Comment by Andrii Nikitin (Inactive) [ 2017-07-25 ]

smaller lock_wait_timeout isn't really helping as well: with lock_wait_timeout=30 client shows timeout error trying to read table, but other locked connections are not affected:

+ cluster1/sql.sh 'select count(*) from t'
m8 :944
m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
+ sleep 2
+ true
+ m8-10.1.25/sql.sh show processlist
1       system user             NULL    Sleep   128     committed 928   NULL    0.000
2       system user             NULL    Sleep   184     wsrep aborter idle      NULL    0.000
505     root    localhost       test    Query   127     query end       insert into t select 485        0.000
506     root    localhost       test    Query   117     checking permissions    optimize table t        0.000
513     root    localhost       test    Query   0       init    show processlist        0.000
+ m9-10.1.25/sql.sh show processlist
1       system user             test    Sleep   128     Waiting for table flush optimize table t        0.000
2       system user             NULL    Sleep   168     wsrep aborter idle      NULL    0.000
487     root    localhost       test    Query   128     query end       insert into t select 465        0.000
490     root    localhost       test    Query   118     checking permissions    optimize table t        0.000
499     root    localhost       test    Query   0       init    show processlist        0.000
+ cluster1/sql.sh 'select count(*) from t'
m8 :944
m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
 
+ sleep 2
+ true
+ m8-10.1.25/sql.sh show processlist
1       system user             NULL    Sleep   160     committed 928   NULL    0.000
2       system user             NULL    Sleep   216     wsrep aborter idle      NULL    0.000
505     root    localhost       test    Query   159     query end       insert into t select 485        0.000
506     root    localhost       test    Query   149     checking permissions    optimize table t        0.000
515     root    localhost       test    Query   0       init    show processlist        0.000
+ m9-10.1.25/sql.sh show processlist
1       system user             test    Sleep   160     Waiting for table flush optimize table t        0.000
2       system user             NULL    Sleep   200     wsrep aborter idle      NULL    0.000
487     root    localhost       test    Query   160     query end       insert into t select 465        0.000
490     root    localhost       test    Query   150     checking permissions    optimize table t        0.000
501     root    localhost       test    Query   0       init    show processlist        0.000
+ cluster1/sql.sh 'select count(*) from t'
m8 :944
m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
 
+ sleep 2
+ true
+ m8-10.1.25/sql.sh show processlist
1       system user             NULL    Sleep   192     committed 928   NULL    0.000
2       system user             NULL    Sleep   248     wsrep aborter idle      NULL    0.000
505     root    localhost       test    Query   191     query end       insert into t select 485        0.000
506     root    localhost       test    Query   181     checking permissions    optimize table t        0.000
517     root    localhost       test    Query   0       init    show processlist        0.000
+ m9-10.1.25/sql.sh show processlist
1       system user             test    Sleep   192     Waiting for table flush optimize table t        0.000
2       system user             NULL    Sleep   232     wsrep aborter idle      NULL    0.000
487     root    localhost       test    Query   192     query end       insert into t select 465        0.000
490     root    localhost       test    Query   182     checking permissions    optimize table t        0.000
503     root    localhost       test    Query   0       init    show processlist        0.000
+ cluster1/sql.sh 'select count(*) from t'
m8 :944
m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
 
+ sleep 2
+ true
+ m8-10.1.25/sql.sh show processlist
1       system user             NULL    Sleep   224     committed 928   NULL    0.000
2       system user             NULL    Sleep   280     wsrep aborter idle      NULL    0.000
505     root    localhost       test    Query   223     query end       insert into t select 485        0.000
506     root    localhost       test    Query   213     checking permissions    optimize table t        0.000
519     root    localhost       test    Query   0       init    show processlist        0.000
+ m9-10.1.25/sql.sh show processlist
1       system user             test    Sleep   224     Waiting for table flush optimize table t        0.000
2       system user             NULL    Sleep   264     wsrep aborter idle      NULL    0.000
487     root    localhost       test    Query   224     query end       insert into t select 465        0.000
490     root    localhost       test    Query   214     checking permissions    optimize table t        0.000
505     root    localhost       test    Query   0       init    show processlist        0.000

Comment by Jan Lindström [ 2023-06-06 ]

10.2 is EOL.

Generated at Thu Feb 08 08:05:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.