Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13379

subsequent OPTIMIZEs on the same table may cause complete table deadlock

Details

    Description

      I get complete table deadlock with simple test case like below:

      1. setup default two-node galera cluster on the same host, e.g. on ports 3314 and 3315
      2. Create table:
      create table test.t select 1;
      3. Start following concurrent load:

      ( for i1 in {1..100000} ; do mysql -h127.0.0.1 -P3314 -e "insert into t select $i1" test || : ; done ) &> /dev/null &
      ( for i1 in {1..100000} ; do mysql -h127.0.0.1 -P3315 -e "insert into t select $i1" test || : ; done ) &> /dev/null &
       
      ( for k1 in {1..1000}; do sleep 10; mysql -h127.0.0.1 -P3314 -e "optimize table t" test & wait ; done ) &>/dev/null &
      ( for k1 in {1..1000}; do sleep 10; mysql -h127.0.0.1 -P3315 -e "optimize table t" test & wait ; done ) &>/dev/null &
       
       
      while true ; do
      mysql -h127.0.0.1 -P3314 -e 'show processlist'
      mysql -h127.0.0.1 -P3315 -e 'show processlist'
      mysql -h127.0.0.1 -P3314 -e 'select count(*) from t' test
      mysql -h127.0.0.1 -P3315 -e 'select count(*) from t' test
      sleep 2
      done
      

      The cluster usually hangs up within 1 min, this is what observed:

      (11:17 sql) > mysql -h 127.0.0.1 -P 3314 -uroot  -e 'show processlist'
      +-----+-------------+-----------------+------+---------+------+-------------------------+--------------------------+----------+
      | Id  | User        | Host            | db   | Command | Time | State                   | Info                     | Progress |
      +-----+-------------+-----------------+------+---------+------+-------------------------+--------------------------+----------+
      |   1 | system user |                 | test | Sleep   |  748 | Waiting for table flush | optimize table t         |    0.000 |
      |   2 | system user |                 | NULL | Sleep   |  815 | wsrep aborter idle      | NULL                     |    0.000 |
      | 518 | a           | localhost:37078 | test | Query   |  748 | query end               | insert into t select 487 |    0.000 |
      | 519 | a           | localhost:37080 | test | Query   |  748 | Waiting for table flush | select count(*) from t   |    0.000 |
      | 520 | a           | localhost:37122 | test | Query   |  738 | checking permissions    | optimize table t         |    0.000 |
      | 526 | root        | localhost:38382 | NULL | Query   |    0 | init                    | show processlist         |    0.000 |
      +-----+-------------+-----------------+------+---------+------+-------------------------+--------------------------+----------+
      (11:17 sql) > mysql -h 127.0.0.1 -P 3315 -uroot -e 'show processlist'
      +-----+-------------+-----------------+------+---------+------+----------------------+--------------------------+----------+
      | Id  | User        | Host            | db   | Command | Time | State                | Info                     | Progress |
      +-----+-------------+-----------------+------+---------+------+----------------------+--------------------------+----------+
      |   1 | system user |                 | NULL | Sleep   |  799 | wsrep aborter idle   | NULL                     |    0.000 |
      |   2 | system user |                 | NULL | Sleep   |  748 | committed 992        | NULL                     |    0.000 |
      | 517 | a           | localhost:46558 | test | Query   |  748 | query end            | insert into t select 505 |    0.000 |
      | 518 | a           | localhost:46562 | test | Query   |  738 | checking permissions | optimize table t         |    0.000 |
      | 524 | root        | localhost:47822 | NULL | Query   |    0 | init                 | show processlist         |    0.000 |
      +-----+-------------+-----------------+------+---------+------+----------------------+--------------------------+----------+
      

      It is kid of strange to me that I cannot reproduce the problem with alter command below instead of OPTIMIZE:
      alter table t engine=innodb, algorithm=copy

      Attachments

        Issue Links

          Activity

            See also MDEV-10245 , which is also about concurrent OPTIMIZE causing a deadlock. Stack traces are different, though.

            elenst Elena Stepanova added a comment - See also MDEV-10245 , which is also about concurrent OPTIMIZE causing a deadlock. Stack traces are different, though.
            anikitin Andrii Nikitin (Inactive) added a comment - - edited

            smaller lock_wait_timeout isn't really helping as well: with lock_wait_timeout=30 client shows timeout error trying to read table, but other locked connections are not affected:

            + cluster1/sql.sh 'select count(*) from t'
            m8 :944
            m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
            + sleep 2
            + true
            + m8-10.1.25/sql.sh show processlist
            1       system user             NULL    Sleep   128     committed 928   NULL    0.000
            2       system user             NULL    Sleep   184     wsrep aborter idle      NULL    0.000
            505     root    localhost       test    Query   127     query end       insert into t select 485        0.000
            506     root    localhost       test    Query   117     checking permissions    optimize table t        0.000
            513     root    localhost       test    Query   0       init    show processlist        0.000
            + m9-10.1.25/sql.sh show processlist
            1       system user             test    Sleep   128     Waiting for table flush optimize table t        0.000
            2       system user             NULL    Sleep   168     wsrep aborter idle      NULL    0.000
            487     root    localhost       test    Query   128     query end       insert into t select 465        0.000
            490     root    localhost       test    Query   118     checking permissions    optimize table t        0.000
            499     root    localhost       test    Query   0       init    show processlist        0.000
            + cluster1/sql.sh 'select count(*) from t'
            m8 :944
            m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
             
            + sleep 2
            + true
            + m8-10.1.25/sql.sh show processlist
            1       system user             NULL    Sleep   160     committed 928   NULL    0.000
            2       system user             NULL    Sleep   216     wsrep aborter idle      NULL    0.000
            505     root    localhost       test    Query   159     query end       insert into t select 485        0.000
            506     root    localhost       test    Query   149     checking permissions    optimize table t        0.000
            515     root    localhost       test    Query   0       init    show processlist        0.000
            + m9-10.1.25/sql.sh show processlist
            1       system user             test    Sleep   160     Waiting for table flush optimize table t        0.000
            2       system user             NULL    Sleep   200     wsrep aborter idle      NULL    0.000
            487     root    localhost       test    Query   160     query end       insert into t select 465        0.000
            490     root    localhost       test    Query   150     checking permissions    optimize table t        0.000
            501     root    localhost       test    Query   0       init    show processlist        0.000
            + cluster1/sql.sh 'select count(*) from t'
            m8 :944
            m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
             
            + sleep 2
            + true
            + m8-10.1.25/sql.sh show processlist
            1       system user             NULL    Sleep   192     committed 928   NULL    0.000
            2       system user             NULL    Sleep   248     wsrep aborter idle      NULL    0.000
            505     root    localhost       test    Query   191     query end       insert into t select 485        0.000
            506     root    localhost       test    Query   181     checking permissions    optimize table t        0.000
            517     root    localhost       test    Query   0       init    show processlist        0.000
            + m9-10.1.25/sql.sh show processlist
            1       system user             test    Sleep   192     Waiting for table flush optimize table t        0.000
            2       system user             NULL    Sleep   232     wsrep aborter idle      NULL    0.000
            487     root    localhost       test    Query   192     query end       insert into t select 465        0.000
            490     root    localhost       test    Query   182     checking permissions    optimize table t        0.000
            503     root    localhost       test    Query   0       init    show processlist        0.000
            + cluster1/sql.sh 'select count(*) from t'
            m8 :944
            m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
             
            + sleep 2
            + true
            + m8-10.1.25/sql.sh show processlist
            1       system user             NULL    Sleep   224     committed 928   NULL    0.000
            2       system user             NULL    Sleep   280     wsrep aborter idle      NULL    0.000
            505     root    localhost       test    Query   223     query end       insert into t select 485        0.000
            506     root    localhost       test    Query   213     checking permissions    optimize table t        0.000
            519     root    localhost       test    Query   0       init    show processlist        0.000
            + m9-10.1.25/sql.sh show processlist
            1       system user             test    Sleep   224     Waiting for table flush optimize table t        0.000
            2       system user             NULL    Sleep   264     wsrep aborter idle      NULL    0.000
            487     root    localhost       test    Query   224     query end       insert into t select 465        0.000
            490     root    localhost       test    Query   214     checking permissions    optimize table t        0.000
            505     root    localhost       test    Query   0       init    show processlist        0.000
            

            anikitin Andrii Nikitin (Inactive) added a comment - - edited smaller lock_wait_timeout isn't really helping as well: with lock_wait_timeout=30 client shows timeout error trying to read table, but other locked connections are not affected: + cluster1/sql.sh 'select count(*) from t' m8 :944 m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction + sleep 2 + true + m8-10.1.25/sql.sh show processlist 1 system user NULL Sleep 128 committed 928 NULL 0.000 2 system user NULL Sleep 184 wsrep aborter idle NULL 0.000 505 root localhost test Query 127 query end insert into t select 485 0.000 506 root localhost test Query 117 checking permissions optimize table t 0.000 513 root localhost test Query 0 init show processlist 0.000 + m9-10.1.25/sql.sh show processlist 1 system user test Sleep 128 Waiting for table flush optimize table t 0.000 2 system user NULL Sleep 168 wsrep aborter idle NULL 0.000 487 root localhost test Query 128 query end insert into t select 465 0.000 490 root localhost test Query 118 checking permissions optimize table t 0.000 499 root localhost test Query 0 init show processlist 0.000 + cluster1/sql.sh 'select count(*) from t' m8 :944 m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction   + sleep 2 + true + m8-10.1.25/sql.sh show processlist 1 system user NULL Sleep 160 committed 928 NULL 0.000 2 system user NULL Sleep 216 wsrep aborter idle NULL 0.000 505 root localhost test Query 159 query end insert into t select 485 0.000 506 root localhost test Query 149 checking permissions optimize table t 0.000 515 root localhost test Query 0 init show processlist 0.000 + m9-10.1.25/sql.sh show processlist 1 system user test Sleep 160 Waiting for table flush optimize table t 0.000 2 system user NULL Sleep 200 wsrep aborter idle NULL 0.000 487 root localhost test Query 160 query end insert into t select 465 0.000 490 root localhost test Query 150 checking permissions optimize table t 0.000 501 root localhost test Query 0 init show processlist 0.000 + cluster1/sql.sh 'select count(*) from t' m8 :944 m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction   + sleep 2 + true + m8-10.1.25/sql.sh show processlist 1 system user NULL Sleep 192 committed 928 NULL 0.000 2 system user NULL Sleep 248 wsrep aborter idle NULL 0.000 505 root localhost test Query 191 query end insert into t select 485 0.000 506 root localhost test Query 181 checking permissions optimize table t 0.000 517 root localhost test Query 0 init show processlist 0.000 + m9-10.1.25/sql.sh show processlist 1 system user test Sleep 192 Waiting for table flush optimize table t 0.000 2 system user NULL Sleep 232 wsrep aborter idle NULL 0.000 487 root localhost test Query 192 query end insert into t select 465 0.000 490 root localhost test Query 182 checking permissions optimize table t 0.000 503 root localhost test Query 0 init show processlist 0.000 + cluster1/sql.sh 'select count(*) from t' m8 :944 m9 :ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction   + sleep 2 + true + m8-10.1.25/sql.sh show processlist 1 system user NULL Sleep 224 committed 928 NULL 0.000 2 system user NULL Sleep 280 wsrep aborter idle NULL 0.000 505 root localhost test Query 223 query end insert into t select 485 0.000 506 root localhost test Query 213 checking permissions optimize table t 0.000 519 root localhost test Query 0 init show processlist 0.000 + m9-10.1.25/sql.sh show processlist 1 system user test Sleep 224 Waiting for table flush optimize table t 0.000 2 system user NULL Sleep 264 wsrep aborter idle NULL 0.000 487 root localhost test Query 224 query end insert into t select 465 0.000 490 root localhost test Query 214 checking permissions optimize table t 0.000 505 root localhost test Query 0 init show processlist 0.000

            10.2 is EOL.

            janlindstrom Jan Lindström added a comment - 10.2 is EOL.

            People

              ramesh Ramesh Sivaraman
              anikitin Andrii Nikitin (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.