Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-37573

Unhandled deadlock (hang) between DML and online alter table

    XMLWordPrintable

Details

    • Can result in hang or crash

    Description

      Originally from MDEV-34134, but it seems this is a general problem of online alter table, not specific to replication.

      The below test case reproduces a server hang between DML (delete) and online alter table (REPAIR). The hang occurs because online alter first scans the table, then tries to upgrade its metadata lock to exclusive. The scanned rows seem to be locked, so a concurrent DML can be blocked on the scanned rows, holding a shared metadata lock.

      This results in a hang, because DDL is waiting on MDL held by DML, and DML is waiting on InnoDB row lock held by DDL. The deadlock is not detected because there is no common deadlock detector covering both MDL and InnoDB row locks.

      Stack traces showing this below. I'm not sure if I'm missing something here? If not, this seems a fundamental design flaw in online alter table, such DDL will not only not be online, it will completely hang the two operations depending on the lock_wait_timeout values configured by the user.

      Test case:

      --source include/have_innodb.inc
      --source include/have_sequence.inc
      --source include/have_debug_sync.inc
       
      CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t1 SELECT seq FROM seq_1_to_10;
       
      --connect(con1,localhost,root,,)
      SET debug_sync= 'alter_table_online_before_lock SIGNAL before_upgrade WAIT_FOR cont2';
      send REPAIR TABLE t1;
       
      --connection default
      SET debug_sync= 'now WAIT_FOR before_upgrade';
      BEGIN;
      send DELETE FROM t1 WHERE a=3;
       
      --connect(con2,localhost,root,,)
      # ToDo anything better we can do here than a sleep? Perhaps waiting for processlist state
      --sleep 0.1
      SET debug_sync= 'now SIGNAL cont2';
      --sleep 0.1
       
      --connection default
      reap;
      COMMIT;
       
      --connection con1
      reap;
       
      SELECT COUNT(*) FROM t1;
       
      DROP TABLE t1;
      

      Stack traces of the DML and DDL when they are stuck waiting for each other:

      Thread 10 (Thread 0x7f5d8294a6c0 (LWP 1795466)):
      #7  0x000055cfde5a762c in safe_cond_timedwait (cond=0x7f5d83197c50, mp=0x55cfdfb5e700 <lock_sys+192>, abstime=0x7f5d82947d10, file=0x55cfde8dd9e2 "/kvm/src/my/dev/mariadb3/storage/innobase/lock/lock0lock.cc", line=2382) at /kvm/src/my/dev/mariadb3/mysys/thr_mutex.c:543
      #8  0x000055cfde2987f1 in lock_wait (thr=thr@entry=0x7f5d58272ae8) at /kvm/src/my/dev/mariadb3/storage/innobase/lock/lock0lock.cc:2381
      #9  0x000055cfde37b9b6 in row_mysql_handle_errors (new_err=new_err@entry=0x7f5d82947fa4, trx=trx@entry=0x7f5d83197bc0, thr=thr@entry=0x7f5d58272ae8, savept=savept@entry=0x0) at /kvm/src/my/dev/mariadb3/storage/innobase/row/row0mysql.cc:698
      #10 0x000055cfde3b0511 in row_search_mvcc (buf=buf@entry=0x7f5d5806fa80 "\377", mode=<optimized out>, prebuilt=0x7f5d58272540, match_mode=1, direction=direction@entry=0) at /kvm/src/my/dev/mariadb3/storage/innobase/row/row0sel.cc:5872
      #11 0x000055cfde23128e in ha_innobase::index_read (this=0x7f5d5806f228, buf=0x7f5d5806fa80 "\377", key_ptr=0x7f5d58074dd0 "\003", key_len=4, find_flag=HA_READ_KEY_EXACT) at /kvm/src/my/dev/mariadb3/storage/innobase/handler/ha_innodb.cc:9035
      #12 0x000055cfddf404d4 in handler::ha_index_read_map (this=this@entry=0x7f5d5806f228, buf=0x7f5d5806fa80 "\377", key=0x7f5d58074dd0 "\003", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /kvm/src/my/dev/mariadb3/sql/handler.cc:3875
      #13 0x000055cfddf492ed in handler::read_range_first (this=0x7f5d5806f228, start_key=0x7f5d5806f388, end_key=0x7f5d5806f3a8, eq_range_arg=true, sorted=<optimized out>) at /kvm/src/my/dev/mariadb3/sql/handler.cc:7206
      #14 0x000055cfdddf229a in handler::multi_range_read_next (this=0x7f5d5806f228, range_info=range_info@entry=0x7f5d82948a40) at /kvm/src/my/dev/mariadb3/sql/multi_range_read.cc:590
      #15 0x000055cfdddf23fc in Mrr_simple_index_reader::get_next (this=0x7f5d5806f900, range_info=0x7f5d82948a40) at /kvm/src/my/dev/mariadb3/sql/multi_range_read.cc:627
      #16 0x000055cfdddf434f in DsMrr_impl::dsmrr_next (this=0x7f5d5806f7b0, range_info=0x7f5d82948a40) at /kvm/src/my/dev/mariadb3/sql/multi_range_read.cc:1731
      #17 0x000055cfddb412fa in QUICK_RANGE_SELECT::get_next (this=0x7f5d58072010) at /kvm/src/my/dev/mariadb3/sql/opt_range.cc:13509
      #18 0x000055cfddb5e091 in rr_quick (info=0x7f5d82948ba0) at /kvm/src/my/dev/mariadb3/sql/records.cc:398
      #19 0x000055cfddbf9d5a in READ_RECORD::read_record (this=0x7f5d82948ba0) at /kvm/src/my/dev/mariadb3/sql/records.h:77
      #20 Sql_cmd_delete::delete_from_single_table (this=this@entry=0x7f5d58017930, thd=thd@entry=0x7f5d58000d58) at /kvm/src/my/dev/mariadb3/sql/sql_delete.cc:861
      #21 0x000055cfddbfcecc in Sql_cmd_delete::execute_inner (this=0x7f5d58017930, thd=0x7f5d58000d58) at /kvm/src/my/dev/mariadb3/sql/sql_delete.cc:2083
      #22 0x000055cfddcdd0b6 in Sql_cmd_dml::execute (this=0x7f5d58017930, thd=0x7f5d58000d58) at /kvm/src/my/dev/mariadb3/sql/sql_select.cc:34525
      #23 0x000055cfddc48581 in mysql_execute_command (thd=thd@entry=0x7f5d58000d58, is_called_from_prepared_stmt=false) at /kvm/src/my/dev/mariadb3/sql/sql_parse.cc:4424
      #24 0x000055cfddc43f3f in mysql_parse (thd=thd@entry=0x7f5d58000d58, rawbuf=<optimized out>, length=24, parser_state=parser_state@entry=0x7f5d82949360) at /kvm/src/my/dev/mariadb3/sql/sql_parse.cc:7905
      #25 0x000055cfddc4211d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f5d58000d58, packet=packet@entry=0x7f5d58248319 "DELETE FROM t1 WHERE a=3", packet_length=packet_length@entry=24, blocking=true) at /kvm/src/my/dev/mariadb3/sql/sql_parse.cc:1903
       
      Thread 4 (Thread 0x7f5d828ff6c0 (LWP 1795469)):
      #9  0x000055cfdddc0e30 in inline_mysql_cond_timedwait (that=0x7f5d4c000f60, mutex=0x7f5d4c000eb0, abstime=0x7f5d828f9268, src_line=1211, src_file=<optimized out>) at /kvm/src/my/dev/mariadb3/include/mysql/psi/mysql_thread.h:1086
      #10 MDL_wait::timed_wait (this=this@entry=0x7f5d4c000eb0, owner=0x7f5d4c000e40, abs_timeout=abs_timeout@entry=0x7f5d828f9268, set_status_on_timeout=false, wait_state_name=<optimized out>) at /kvm/src/my/dev/mariadb3/sql/mdl.cc:1210
      #11 0x000055cfdddc2a1a in MDL_context::acquire_lock (this=this@entry=0x7f5d4c000eb0, mdl_request=mdl_request@entry=0x7f5d828f9340, lock_wait_timeout=lock_wait_timeout@entry=86400) at /kvm/src/my/dev/mariadb3/sql/mdl.cc:2441
      #12 0x000055cfdddc3447 in MDL_context::upgrade_shared_lock (this=0x7f5d4c000eb0, mdl_ticket=0x7f5d4c01f7f0, new_type=MDL_SHARED_NO_WRITE, lock_wait_timeout=<error reading variable: Value cannot be represented as integer of 8 bytes.>) at /kvm/src/my/dev/mariadb3/sql/mdl.cc:2649
      #13 0x000055cfddd33d24 in copy_data_between_tables (thd=thd@entry=0x7f5d4c000d58, from=0x7f5d4c0231a8, to=to@entry=0x7f5d4c027b28, ignore=true, order_num=<optimized out>, order=<optimized out>, copied=<optimized out>, deleted=<optimized out>, alter_info=<optimized out>, alter_ctx=<optimized out>, online=true, start_alter_id=<optimized out>) at /kvm/src/my/dev/mariadb3/sql/sql_table.cc:12999
      #14 0x000055cfddd2bf5c in mysql_alter_table (thd=thd@entry=0x7f5d4c000d58, new_db=<optimized out>, new_name=<optimized out>, create_info=<optimized out>, table_list=0x7f5d4c015468, recreate_info=<optimized out>, recreate_info@entry=0x7f5d828fd590, alter_info=<optimized out>, order_num=<optimized out>, order=<optimized out>, ignore=<optimized out>, if_exists=<optimized out>) at /kvm/src/my/dev/mariadb3/sql/sql_table.cc:11921
      #15 0x000055cfddd354dc in mysql_recreate_table (thd=thd@entry=0x7f5d4c000d58, table_list=table_list@entry=0x7f5d4c015468, recreate_info=recreate_info@entry=0x7f5d828fd590, table_copy=true) at /kvm/src/my/dev/mariadb3/sql/sql_table.cc:13138
      #16 0x000055cfdddc9d7a in admin_recreate_table (thd=thd@entry=0x7f5d4c000d58, table_list=table_list@entry=0x7f5d4c015468, recreate_info=recreate_info@entry=0x7f5d828fd590, table_copy=true) at /kvm/src/my/dev/mariadb3/sql/sql_admin.cc:77
      #17 0x000055cfdddc82e6 in mysql_admin_table (thd=thd@entry=0x7f5d4c000d58, tables=tables@entry=0x7f5d4c015468, check_opt=check_opt@entry=0x7f5d4c0065c0, operator_name=0x55cfdf072518 <msg_repair>, lock_type=lock_type@entry=TL_WRITE, org_open_for_modify=true, no_errors_from_open=<optimized out>, extra_open_options=<optimized out>, prepare_func=<optimized out>, operator_func=(int (handler::*)(handler * const, THD *, HA_CHECK_OPT *)) 0x55cfddf44e00 <handler::ha_repair(THD*, st_ha_check_opt*)>, view_operator_func=<optimized out>, is_cmd_replicated=<optimized out>) at /kvm/src/my/dev/mariadb3/sql/sql_admin.cc:1124
      #18 0x000055cfdddc90de in Sql_cmd_repair_table::execute (this=<optimized out>, thd=0x7f5d4c000d58) at /kvm/src/my/dev/mariadb3/sql/sql_admin.cc:1701
      #19 0x000055cfddc48473 in mysql_execute_command (thd=thd@entry=0x7f5d4c000d58, is_called_from_prepared_stmt=false) at /kvm/src/my/dev/mariadb3/sql/sql_parse.cc:5882
      #20 0x000055cfddc43f3f in mysql_parse (thd=thd@entry=0x7f5d4c000d58, rawbuf=<optimized out>, length=15, parser_state=parser_state@entry=0x7f5d828fe360) at /kvm/src/my/dev/mariadb3/sql/sql_parse.cc:7905
      #21 0x000055cfddc4211d in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f5d4c000d58, packet=packet@entry=0x7f5d4c00acb9 "", packet_length=packet_length@entry=15, blocking=true) at /kvm/src/my/dev/mariadb3/sql/sql_parse.cc:1903
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              knielsen Kristian Nielsen
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.