Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28949

Deadlock between online alter and DML

    XMLWordPrintable

Details

    Description

      The first test case uses a debug sync point alter_table_online_downgraded; any other sync point after it and up to the exclusive lock at the end can be used instead.
      It fails already upon first update, with ER_LOCK_WAIT_TIMEOUT, which probably shouldn't be happening. But if the test is run with --force --force, it goes past the error and runs another UPDATE. After that the ALTER is allowed to proceed from the sync point, but it gets locked up with UPDATE. The deadlock is timed, both DML and DDL obey lock_wait_timeout.

      --source include/have_debug_sync.inc
       
      create table t (a int) engine=MyISAM;
      insert into t values (1),(2);
       
      --connect (con_dml,localhost,root,,test)
      --send set debug_sync= 'now wait_for waiting'
       
      --connect (con_alter,localhost,root,,test)
      set debug_sync= 'alter_table_online_downgraded signal waiting wait_for goforit';
      --send alter table t force, algorithm=copy, lock=none
       
      --connection con_dml
      --reap
      insert into t values (3);
        set statement lock_wait_timeout=0 for 
      update t set a = 4;
       
      #
      # Run with --force --force to go past this point
      #
       
      --send update t set a = 5
       
      --connection default
      set debug_sync= 'now signal goforit';
       
      --connection con_dml
      --reap
       
      # Cleanup
      drop table t;
      set debug_sync= reset;
      

      The output (with --force --force) looks like this:

      bb-10.10-MDEV-16329 cb1f08bd1

      insert into t values (3);
      set statement lock_wait_timeout=0 for 
      update t set a = 4;
      mysqltest: At line 16: query 'set statement lock_wait_timeout=0 for 
      update t set a = 4' failed: ER_LOCK_WAIT_TIMEOUT (1205): Lock wait timeout exceeded; try restarting transaction
      update t set a = 5;
      connection default;
      set debug_sync= 'now signal goforit';
      connection con_dml;
      

      (stuck at this point)

      The processlist is

      |  4 | root | localhost       | test | Sleep   |   56 |                                 | NULL                                           |    0.000 |
      |  5 | root | localhost       | test | Query   |   56 | Waiting for table level lock    | update t set a = 5                             |    0.000 |
      |  6 | root | localhost       | test | Query   |   56 | Waiting for table metadata lock | alter table t force, algorithm=copy, lock=none |   75.000 |
      |  7 | root | localhost:45942 | NULL | Query   |    0 | starting                        | show processlist                               |    0.000 |
      

      The second test case is a non-deterministic one without sync points (to demonstrate that instrumentation is not the culprit).

      create table t (a int) engine=MyISAM;
      insert into t values (1),(2),(3),(4),(5);
       
      --connect (con1,localhost,root,,)
      --let $run=100
      while ($run)
      {
        --echo  # $run alter(s) left
        --send
          alter table t force, algorithm=copy, lock=none;
        --connection default
          update t set a = 0;
          update t set a = 0;
        --connection con1
        --reap
        --dec $run
      }
       
      # Cleanup
      drop table t;
      

      It fails for me immediately, but it can vary on different machines and builds. If it doesn't fail, add more update attempts (e.g. make it an inner cycle) or populate the table with more values, to make ALTER longer.

      The output is something like

      # 100 alter(s) left
      alter table t force, algorithm=copy, lock=none;
      connection default;
      update t set a = 0;
      update t set a = 0;
      

      (stuck at this point)

      Stack traces from the two threads from the non-deterministic variant:

      bb-10.10-MDEV-16329 cb1f08bd1

      Thread 7 (Thread 0x7fa994357700 (LWP 2440397) "mariadbd"):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7fa99434d8f0, clockid=-1808476848, expected=0, futex_word=0x7fa97c000fe4) at ../sysdeps/nptl/futex-internal.h:323
      #1  __pthread_cond_wait_common (abstime=0x7fa99434d8f0, clockid=-1808476848, mutex=0x7fa97c000f30, cond=0x7fa97c000fb8) at pthread_cond_wait.c:520
      #2  __pthread_cond_timedwait (cond=0x7fa97c000fb8, mutex=0x7fa97c000f30, abstime=0x7fa99434d8f0) at pthread_cond_wait.c:656
      #3  0x00005556bdf80863 in safe_cond_timedwait (cond=0x7fa97c000fb8, mp=0x7fa97c000f08, abstime=0x7fa99434d8f0, file=0x5556be5ca610 "/data/src/preview-10.10-online-alter-gcov/mysys/my_thr_init.c", line=609) at /data/src/preview-10.10-online-alter-gcov/mysys/thr_mutex.c:548
      #4  0x00005556bdf7566f in psi_cond_timedwait (that=0x7fa97c000fb8, mutex=0x7fa97c000f08, abstime=0x7fa99434d8f0, file=0x5556be1b3b98 "/data/src/preview-10.10-online-alter-gcov/sql/mdl.cc", line=1195) at /data/src/preview-10.10-online-alter-gcov/mysys/my_thr_init.c:609
      #5  0x00005556bcc564ad in inline_mysql_cond_timedwait (that=0x7fa97c000fb8, mutex=0x7fa97c000f08, abstime=0x7fa99434d8f0, src_file=0x5556be1b3b98 "/data/src/preview-10.10-online-alter-gcov/sql/mdl.cc", src_line=1195) at /data/src/preview-10.10-online-alter-gcov/include/mysql/psi/mysql_thread.h:1086
      #6  0x00005556bcc59289 in MDL_wait::timed_wait (this=0x7fa97c000f08, owner=0x7fa97c000e98, abs_timeout=0x7fa99434d8f0, set_status_on_timeout=false, wait_state_name=0x5556beded990 <MDL_key::m_namespace_to_wait_state_name+48>) at /data/src/preview-10.10-online-alter-gcov/sql/mdl.cc:1195
      #7  0x00005556bcc5d6af in MDL_context::acquire_lock (this=0x7fa97c000f08, mdl_request=0x7fa99434d9d0, lock_wait_timeout=86400) at /data/src/preview-10.10-online-alter-gcov/sql/mdl.cc:2379
      #8  0x00005556bcc5ea07 in MDL_context::upgrade_shared_lock (this=0x7fa97c000f08, mdl_ticket=0x7fa97c0083d0, new_type=MDL_EXCLUSIVE, lock_wait_timeout=86400) at /data/src/preview-10.10-online-alter-gcov/sql/mdl.cc:2586
      #9  0x00005556bcb04464 in copy_data_between_tables (thd=0x7fa97c000db8, from=0x7fa97c047768, to=0x7fa97c020568, create=..., ignore=false, order_num=0, order=0x0, copied=0x7fa9943530f0, deleted=0x7fa9943530f8, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x7fa994354610, online=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_table.cc:11860
      #10 0x00005556bcafd23f in mysql_alter_table (thd=0x7fa97c000db8, new_db=0x7fa97c005a78, new_name=0x7fa97c005e88, create_info=0x7fa994355420, table_list=0x7fa97c0140e0, alter_info=0x7fa994355330, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/src/preview-10.10-online-alter-gcov/sql/sql_table.cc:10897
      #11 0x00005556bcc510d3 in Sql_cmd_alter_table::execute (this=0x7fa97c0147e0, thd=0x7fa97c000db8) at /data/src/preview-10.10-online-alter-gcov/sql/sql_alter.cc:552
      #12 0x00005556bc8f02a7 in mysql_execute_command (thd=0x7fa97c000db8, is_called_from_prepared_stmt=false) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:5996
      #13 0x00005556bc8fbc10 in mysql_parse (thd=0x7fa97c000db8, rawbuf=0x7fa97c013fe0 "alter table t force, algorithm=copy, lock=none", length=46, parser_state=0x7fa994356500) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:8036
      #14 0x00005556bc8d2f11 in dispatch_command (command=COM_QUERY, thd=0x7fa97c000db8, packet=0x7fa97c00b9e9 "", packet_length=46, blocking=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:1894
      #15 0x00005556bc8d00ea in do_command (thd=0x7fa97c000db8, blocking=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:1407
      #16 0x00005556bcc3bffd in do_handle_one_connection (connect=0x5556c1bc1168, put_in_cache=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_connect.cc:1418
      #17 0x00005556bcc3b804 in handle_one_connection (arg=0x5556c1bc1168) at /data/src/preview-10.10-online-alter-gcov/sql/sql_connect.cc:1312
      #18 0x00005556bd549b95 in pfs_spawn_thread (arg=0x5556c1bc1248) at /data/src/preview-10.10-online-alter-gcov/storage/perfschema/pfs.cc:2201
      #19 0x00007fa9996c0ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #20 0x00007fa9992bfdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
       
      Thread 6 (Thread 0x7fa9943a2700 (LWP 2440394) "mariadbd"):
      #0  futex_abstimed_wait_cancelable (private=0, abstime=0x7fa9943a05f0, clockid=-1808137392, expected=0, futex_word=0x7fa978000b90) at ../sysdeps/nptl/futex-internal.h:323
      #1  __pthread_cond_wait_common (abstime=0x7fa9943a05f0, clockid=-1808137392, mutex=0x7fa97c01fc60, cond=0x7fa978000b68) at pthread_cond_wait.c:520
      #2  __pthread_cond_timedwait (cond=0x7fa978000b68, mutex=0x7fa97c01fc60, abstime=0x7fa9943a05f0) at pthread_cond_wait.c:656
      #3  0x00005556bdf80863 in safe_cond_timedwait (cond=0x7fa978000b68, mp=0x7fa97c01fc38, abstime=0x7fa9943a05f0, file=0x5556be5ca610 "/data/src/preview-10.10-online-alter-gcov/mysys/my_thr_init.c", line=609) at /data/src/preview-10.10-online-alter-gcov/mysys/thr_mutex.c:548
      #4  0x00005556bdf7566f in psi_cond_timedwait (that=0x7fa978000b68, mutex=0x7fa97c01fc38, abstime=0x7fa9943a05f0, file=0x5556be5caf90 "/data/src/preview-10.10-online-alter-gcov/mysys/thr_lock.c", line=558) at /data/src/preview-10.10-online-alter-gcov/mysys/my_thr_init.c:609
      #5  0x00005556bdf7ab76 in inline_mysql_cond_timedwait (that=0x7fa978000b68, mutex=0x7fa97c01fc38, abstime=0x7fa9943a05f0, src_file=0x5556be5caf90 "/data/src/preview-10.10-online-alter-gcov/mysys/thr_lock.c", src_line=558) at /data/src/preview-10.10-online-alter-gcov/include/mysql/psi/mysql_thread.h:1086
      #6  0x00005556bdf7b578 in wait_for_lock (wait=0x7fa97c01fd08, data=0x7fa978203808, in_wait_list=0 '\000', lock_wait_timeout=86400) at /data/src/preview-10.10-online-alter-gcov/mysys/thr_lock.c:558
      #7  0x00005556bdf7c3d9 in thr_lock (data=0x7fa978203808, owner=0x7fa978002b38, lock_wait_timeout=86400) at /data/src/preview-10.10-online-alter-gcov/mysys/thr_lock.c:890
      #8  0x00005556bdf7d2f4 in thr_multi_lock (data=0x7fa978015fc8, count=1, owner=0x7fa978002b38, lock_wait_timeout=86400) at /data/src/preview-10.10-online-alter-gcov/mysys/thr_lock.c:1171
      #9  0x00005556bd29509d in mysql_lock_tables (thd=0x7fa978000db8, sql_lock=0x7fa978015fa0, flags=0) at /data/src/preview-10.10-online-alter-gcov/sql/lock.cc:349
      #10 0x00005556bd294c6d in mysql_lock_tables (thd=0x7fa978000db8, tables=0x7fa978015f98, count=1, flags=0) at /data/src/preview-10.10-online-alter-gcov/sql/lock.cc:301
      #11 0x00005556bc799763 in lock_tables (thd=0x7fa978000db8, tables=0x7fa978015530, count=1, flags=0) at /data/src/preview-10.10-online-alter-gcov/sql/sql_base.cc:5796
      #12 0x00005556bcb518b7 in mysql_update (thd=0x7fa978000db8, table_list=0x7fa978015530, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7fa9943a0f50, updated_return=0x7fa9943a1040) at /data/src/preview-10.10-online-alter-gcov/sql/sql_update.cc:442
      #13 0x00005556bc8e2c23 in mysql_execute_command (thd=0x7fa978000db8, is_called_from_prepared_stmt=false) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:4405
      #14 0x00005556bc8fbc10 in mysql_parse (thd=0x7fa978000db8, rawbuf=0x7fa978015460 "update t set a = 0", length=18, parser_state=0x7fa9943a1500) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:8036
      #15 0x00005556bc8d2f11 in dispatch_command (command=COM_QUERY, thd=0x7fa978000db8, packet=0x7fa97800b9e9 "update t set a = 0", packet_length=18, blocking=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:1894
      #16 0x00005556bc8d00ea in do_command (thd=0x7fa978000db8, blocking=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_parse.cc:1407
      #17 0x00005556bcc3bffd in do_handle_one_connection (connect=0x5556c1bc1088, put_in_cache=true) at /data/src/preview-10.10-online-alter-gcov/sql/sql_connect.cc:1418
      #18 0x00005556bcc3b804 in handle_one_connection (arg=0x5556c1bc3a68) at /data/src/preview-10.10-online-alter-gcov/sql/sql_connect.cc:1312
      #19 0x00005556bd549b95 in pfs_spawn_thread (arg=0x5556c1bc0bf8) at /data/src/preview-10.10-online-alter-gcov/storage/perfschema/pfs.cc:2201
      #20 0x00007fa9996c0ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #21 0x00007fa9992bfdef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Attachments

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.