Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4182

Galera: safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_wsrep_thd', deadlock on DDL

    XMLWordPrintable

Details

    Description

      It might well be some other Galera limitation, but I haven't found which particular one it could be. The flow is mainly HANDLER flow, and also some DML and ALTER TABLE.

      I execute it on two nodes in a 3-node cluster, 1 thread per node. Soon after start, the following message appears in the server error log(s):

      safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'LOCK_wsrep_thd'
      Mutex currently locked (in reverse order):
      LOCK_wsrep_thd                    maria-5.5-galera/sql/sql_class.cc  line 858
      LOCK_thd_data                     maria-5.5-galera/sql/sql_base.cc  line 9348
      rwlock->lock                      maria-5.5-galera/mysys/thr_rwlock.c  line 392

      Either immediately or soon afterwards, both nodes stop, at least one on ALTER table, and another one on ALTER or on a DML statement.

      ALTER from a client on one of nodes is shown as Killed, but doesn't disappear from the processlist. ALTER from a replication thread is waiting for a lock:

      +----+-------------+-----------------+------+---------+------+---------------------------------+----------------------------------------------------------------------------------------------+----------+
      | Id | User        | Host            | db   | Command | Time | State                           | Info                                                                                         | Progress |
      +----+-------------+-----------------+------+---------+------+---------------------------------+----------------------------------------------------------------------------------------------+----------+
      |  1 | system user |                 | test | Sleep   |  484 | Waiting for table metadata lock | ALTER TABLE `table1000_innodb_int_autoinc` DROP COLUMN filler                                |    0.000 |
      |  2 | system user |                 | NULL | Sleep   |  566 | wsrep aborter idle              | NULL                                                                                         |    0.000 |
      |  6 | root        | localhost:43415 | test | Killed  |  484 | checking permissions            | ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler ' |    0.000 |
      | 11 | root        | localhost:43428 | test | Query   |    0 | sleeping                        | show full processlist                                                                        |    0.000 |
      +----+-------------+-----------------+------+---------+------+---------------------------------+----------------------------------------------------------------------------------------------+----------+

      Another node is also waiting for something:

      +----+-------------+-----------------+------+---------+------+--------------------+--------------------------------------------------------------------+----------+
      | Id | User        | Host            | db   | Command | Time | State              | Info                                                               | Progress |
      +----+-------------+-----------------+------+---------+------+--------------------+--------------------------------------------------------------------+----------+
      |  1 | system user |                 | NULL | Sleep   |  681 | wsrep aborter idle | NULL                                                               |    0.000 |
      |  2 | system user |                 | NULL | Sleep   |  557 | committed 91       | NULL                                                               |    0.000 |
      | 10 | root        | localhost:41929 | test | Query   |  543 | query end          | UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0 |    0.000 |
      | 18 | root        | localhost:41947 | test | Query   |    0 | sleeping           | show full processlist                                              |    0.000 |
      +----+-------------+-----------------+------+---------+------+--------------------+--------------------------------------------------------------------+----------+

      Selected threads:

      Thread 2 (Thread 0x7f8abc061700 (LWP 20603)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
      #1  0x00007f8ac7c2f7de in _gcs_sm_enqueue_common (cond=0x7f8abc05e980, sm=0x7f8ac9c0e010) at gcs/src/gcs_sm.h:160
      #2  gcs_sm_enter (scheduled=<optimized out>, cond=0x7f8abc05e980, sm=0x7f8ac9c0e010) at gcs/src/gcs_sm.h:238
      #3  gcs_repl (conn=0x3c29490, act=0x7f8abc05ecf0, scheduled=<optimized out>) at gcs/src/gcs.c:1490
      #4  0x00007f8ac7c81b1c in repl (scheduled=true, act=..., this=0x3c23a00) at galera/src/gcs.hpp:106
      #5  galera::ReplicatorSMM::replicate (this=0x3c23510, trx=0x7f8a6c01c800) at galera/src/replicator_smm.cpp:552
      #6  0x00007f8ac7c993ff in galera_pre_commit (gh=<optimized out>, conn_id=<optimized out>, trx_handle=<optimized out>, rbr_data=0x7f8a6c1ce620, rbr_data_len=11293, flags=1, global_seqno=0x504a1a8) at galera/src/wsrep_provider.cpp:388
      #7  0x0000000000771833 in wsrep_run_wsrep_commit (thd=0x5047dc0, hton=0x3c49160, all=false) at maria-5.5-galera/sql/wsrep_hton.cc:338
      #8  0x0000000000770685 in wsrep_prepare (hton=0x3c49160, thd=0x5047dc0, all=false) at maria-5.5-galera/sql/wsrep_hton.cc:119
      #9  0x00000000007e7988 in ha_commit_trans (thd=0x5047dc0, all=false) at maria-5.5-galera/sql/handler.cc:1294
      #10 0x0000000000730221 in trans_commit_stmt (thd=0x5047dc0) at maria-5.5-galera/sql/transaction.cc:325
      #11 0x0000000000621031 in mysql_execute_command (thd=0x5047dc0) at maria-5.5-galera/sql/sql_parse.cc:4903
      #12 0x0000000000624da9 in mysql_parse (thd=0x5047dc0, rawbuf=0x7f8a6c00e078 "UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0", length=66, parser_state=0x7f8abc060550) at maria-5.5-galera/sql/sql_parse.cc:6304
      #13 0x0000000000623e70 in wsrep_mysql_parse (thd=0x5047dc0, rawbuf=0x7f8a6c00e078 "UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0", length=66, parser_state=0x7f8abc060550) at maria-5.5-galera/sql/sql_parse.cc:6069
      #14 0x0000000000616335 in dispatch_command (command=COM_QUERY, thd=0x5047dc0, packet=0x4fed231 "UPDATE IGNORE `table1000_innodb_int_autoinc` SET `col_int_key` = 0 ", packet_length=67) at maria-5.5-galera/sql/sql_parse.cc:1245
      #15 0x0000000000615114 in do_command (thd=0x5047dc0) at maria-5.5-galera/sql/sql_parse.cc:891
      #16 0x000000000071ddd7 in do_handle_one_connection (thd_arg=0x5047dc0) at maria-5.5-galera/sql/sql_connect.cc:1291
      #17 0x000000000071d7af in handle_one_connection (arg=0x5047dc0) at maria-5.5-galera/sql/sql_connect.cc:1199
      #18 0x00007f8ac92cce9a in start_thread (arg=0x7f8abc061700) at pthread_create.c:308
      #19 0x00007f8ac89fdcbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #20 0x0000000000000000 in ?? ()

      Thread 21 (Thread 0x7f37490d6700 (LWP 20387)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
      #1  0x0000000000cfcb90 in safe_cond_timedwait (cond=0x7f3740000c88, mp=0x7f3740000be0, abstime=0x7f37490d03c0, file=0xd9a3c8 "maria-5.5-galera/sql/mdl.cc", line=1202) at maria-5.5-galera/mysys/thr_mutex.c:547
      #2  0x0000000000725fc1 in inline_mysql_cond_timedwait (that=0x7f3740000c88, mutex=0x7f3740000be0, abstime=0x7f37490d03c0, src_file=0xd9a3c8 "maria-5.5-galera/sql/mdl.cc", src_line=1202) at maria-5.5-galera/include/mysql/psi/mysql_thread.h:1018
      #3  0x0000000000727095 in MDL_wait::timed_wait (this=0x7f3740000be0, thd=0x7f3740000af0, abs_timeout=0x7f37490d03c0, set_status_on_timeout=false, wait_state_name=0xd9a2e8 "Waiting for table metadata lock") at maria-5.5-galera/sql/mdl.cc:1202
      #4  0x0000000000728bc0 in MDL_context::acquire_lock (this=0x7f3740000be0, mdl_request=0x7f37490d0480, lock_wait_timeout=31536000) at maria-5.5-galera/sql/mdl.cc:2209
      #5  0x000000000072929a in MDL_context::upgrade_shared_lock_to_exclusive (this=0x7f3740000be0, mdl_ticket=0x7f3740085840, lock_wait_timeout=31536000) at maria-5.5-galera/sql/mdl.cc:2389
      #6  0x00000000005b7f1a in wait_while_table_is_used (thd=0x7f3740000af0, table=0x7f37400435a0, function=HA_EXTRA_PREPARE_FOR_RENAME) at maria-5.5-galera/sql/sql_base.cc:2347
      #7  0x00000000006bf67f in mysql_alter_table (thd=0x7f3740000af0, new_db=0x7f374000ced0 "test", new_name=0x7f374000c868 "table1000_innodb_int_autoinc", create_info=0x7f37490d1f90, table_list=0x7f374000c908, alter_info=0x7f37490d2090, order_num=0, order=0x0, ignore=false, require_online=false) at maria-5.5-galera/sql/sql_table.cc:7008
      #8  0x000000000095e20e in Alter_table_statement::execute (this=0x7f374000cf00, thd=0x7f3740000af0) at maria-5.5-galera/sql/sql_alter.cc:122
      #9  0x0000000000620c93 in mysql_execute_command (thd=0x7f3740000af0) at maria-5.5-galera/sql/sql_parse.cc:4844
      #10 0x0000000000624da9 in mysql_parse (thd=0x7f3740000af0, rawbuf=0x7f374004ec39 "ALTER TABLE `table1000_innodb_int_autoinc` DROP COLUMN filler", length=61, parser_state=0x7f37490d2e80) at maria-5.5-galera/sql/sql_parse.cc:6304
      #11 0x00000000008c4b01 in Query_log_event::do_apply_event (this=0x7f3740044be0, rli=0x7f3740010aa0, query_arg=0x7f374004ec39 "ALTER TABLE `table1000_innodb_int_autoinc` DROP COLUMN filler", q_len_arg=61) at maria-5.5-galera/sql/log_event.cc:3792
      #12 0x00000000008c404f in Query_log_event::do_apply_event (this=0x7f3740044be0, rli=0x7f3740010aa0) at maria-5.5-galera/sql/log_event.cc:3531
      #13 0x0000000000593196 in Log_event::apply_event (this=0x7f3740044be0, rli=0x7f3740010aa0) at maria-5.5-galera/sql/log_event.h:1230
      #14 0x000000000062948d in wsrep_apply_rbr (thd=0x7f3740000af0, rbr_buf=0x7f374001f170 "\350\\\"Q\002", buf_len=0) at maria-5.5-galera/sql/sql_parse.cc:8131
      #15 0x0000000000629a79 in wsrep_apply_cb (ctx=0x7f3740000af0, buf=0x7f374001f170, buf_len=129, global_seqno=90) at maria-5.5-galera/sql/sql_parse.cc:8210
      #16 0x00007f375431eabf in apply_wscoll (trx=..., apply_cb=0x6299cb <wsrep_apply_cb(void*, void const*, unsigned long, long)>, recv_ctx=0x7f3740000af0) at galera/src/replicator_smm.cpp:37
      #17 apply_trx_ws (recv_ctx=0x7f3740000af0, apply_cb=0x6299cb <wsrep_apply_cb(void*, void const*, unsigned long, long)>, commit_cb=0x629cd6 <wsrep_commit_cb(void*, long, bool)>, trx=...) at galera/src/replicator_smm.cpp:81
      #18 0x00007f3754324030 in galera::ReplicatorSMM::apply_trx (this=<optimized out>, recv_ctx=0x7f3740000af0, trx=0x7f3740022ff0) at galera/src/replicator_smm.cpp:470
      #19 0x00007f37543247c0 in galera::ReplicatorSMM::process_trx (this=0x367f540, recv_ctx=0x7f3740000af0, trx=0x7f3740022ff0) at galera/src/replicator_smm.cpp:1056
      #20 0x00007f3754301d34 in galera::GcsActionSource::dispatch (this=0x367fac8, recv_ctx=0x7f3740000af0, act=...) at galera/src/gcs_action_source.cpp:110
      #21 0x00007f37543029b3 in galera::GcsActionSource::process (this=0x367fac8, recv_ctx=0x7f3740000af0) at galera/src/gcs_action_source.cpp:168
      #22 0x00007f375432637d in async_recv (recv_ctx=0x7f3740000af0, this=0x367f540) at galera/src/replicator_smm.cpp:375
      #23 galera::ReplicatorSMM::async_recv (this=0x367f540, recv_ctx=0x7f3740000af0) at galera/src/replicator_smm.cpp:356
      #24 0x00007f3754339cae in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at galera/src/wsrep_provider.cpp:201
      #25 0x0000000000629ff0 in wsrep_replication_process (thd=0x7f3740000af0) at maria-5.5-galera/sql/sql_parse.cc:8363
      #26 0x00000000005693ff in start_wsrep_THD (arg=0x629f6d) at maria-5.5-galera/sql/mysqld.cc:4832
      #27 0x00007f375596ee9a in start_thread (arg=0x7f37490d6700) at pthread_create.c:308
      #28 0x00007f375509fcbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #29 0x0000000000000000 in ?? ()
       
      Thread 2 (Thread 0x7f374810e700 (LWP 20641)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
      #1  0x00007f3754320985 in wait (cond=..., this=<optimized out>) at galerautils/src/gu_lock.hpp:56
      #2  galera::Monitor<galera::ReplicatorSMM::CommitOrder>::enter (this=0x367ffd8, obj=...) at galera/src/monitor.hpp:126
      #3  0x00007f3754322584 in galera::ReplicatorSMM::to_isolation_begin (this=0x367f540, trx=0x7f3708263200) at galera/src/replicator_smm.cpp:961
      #4  0x00007f375433c627 in galera_to_execute_start (gh=<optimized out>, conn_id=6, key=<optimized out>, key_len=1, query=0x7f370825f090, query_len=160, global_seqno=0x4b0c958) at galera/src/wsrep_provider.cpp:579
      #5  0x0000000000775c25 in wsrep_TOI_begin (thd=0x4b0a570, db_=0x0, table_=0x0, table_list=0x7f37082f8328) at maria-5.5-galera/sql/wsrep_mysqld.cc:1091
      #6  0x0000000000776a0a in wsrep_to_isolation_begin (thd=0x4b0a570, db_=0x0, table_=0x0, table_list=0x7f37082f8328) at maria-5.5-galera/sql/wsrep_mysqld.cc:1227
      #7  0x000000000095e0f0 in Alter_table_statement::execute (this=0x7f37082f8ab8, thd=0x4b0a570) at maria-5.5-galera/sql/sql_alter.cc:110
      #8  0x0000000000620c93 in mysql_execute_command (thd=0x4b0a570) at maria-5.5-galera/sql/sql_parse.cc:4844
      #9  0x0000000000624da9 in mysql_parse (thd=0x4b0a570, rawbuf=0x7f37082f8138 "ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler '", length=92, parser_state=0x7f374810d550) at maria-5.5-galera/sql/sql_parse.cc:6304
      #10 0x0000000000623e70 in wsrep_mysql_parse (thd=0x4b0a570, rawbuf=0x7f37082f8138 "ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler '", length=92, parser_state=0x7f374810d550) at maria-5.5-galera/sql/sql_parse.cc:6069
      #11 0x0000000000616335 in dispatch_command (command=COM_QUERY, thd=0x4b0a570, packet=0x4b0e251 "ALTER TABLE `table1000_innodb_int_autoinc` ADD COLUMN filler VARCHAR(255) DEFAULT ' filler '", packet_length=92) at maria-5.5-galera/sql/sql_parse.cc:1245
      #12 0x0000000000615114 in do_command (thd=0x4b0a570) at maria-5.5-galera/sql/sql_parse.cc:891
      #13 0x000000000071ddd7 in do_handle_one_connection (thd_arg=0x4b0a570) at maria-5.5-galera/sql/sql_connect.cc:1291
      #14 0x000000000071d7af in handle_one_connection (arg=0x4b0a570) at maria-5.5-galera/sql/sql_connect.cc:1199
      #15 0x00007f375596ee9a in start_thread (arg=0x7f374810e700) at pthread_create.c:308
      #16 0x00007f375509fcbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #17 0x0000000000000000 in ?? ()
       

      Server command lines:

      mysqld --no-defaults --basedir=/home/elenst/maria-5.5-galera --lc-messages-dir=/home/elenst/maria-5.5-galera/sql/share/ --core --datadir=/home/elenst/maria-5.5-galera/data1 --tmpdir=/home/elenst/maria-5.5-galera/data1/tmp --port=8306 --socket=/home/elenst/maria-5.5-galera/data1/tmp/node1.sock --wsrep-provider=/home/elenst/galera/libgalera_smm.so --wsrep-cluster-address=gcomm:// --binlog-format=row --wsrep-sst-method=rsync --log-error=/home/elenst/maria-5.5-galera/data1/log.err --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --general-log=1
       
      mysqld --no-defaults --basedir=/home/elenst/maria-5.5-galera --lc-messages-dir=/home/elenst/maria-5.5-galera/sql/share/ --core --datadir=/home/elenst/maria-5.5-galera/data2 --tmpdir=/home/elenst/maria-5.5-galera/data2/tmp --port=8307 --socket=/home/elenst/maria-5.5-galera/data2/tmp/node2.sock --wsrep-provider=/home/elenst/galera/libgalera_smm.so --wsrep-cluster-address=gcomm://127.0.0.1:4567?gmcast.listen_addr=tcp://127.0.0.1:4566 --binlog-format=row --wsrep-sst-method=rsync --log-error=/home/elenst/maria-5.5-galera/data2/log.err --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --general-log=1

      To reproduce using RQG:

      # On one node:
      perl ./gentest.pl --gendata=conf/engines/handler.zz --threads=1 --queries=100M --duration=600 --dsn=dbi:mysql:host=127.0.0.1:port=8306:user=root:database=test --grammar=conf/engines/handler.yy --engine=innodb
       
      # On another node, after the first node creates the table and starts the flow:
      perl ./gentest.pl  --threads=1 --queries=100M --duration=600 --dsn=dbi:mysql:host=127.0.0.1:port=8307:user=root:database=test --grammar=conf/engines/handler.yy

      revision-id: seppo.jaakola@codership.com-20130216222240-syypg8kc355qcpsp
      revno: 3380
      branch-nick: maria-5.5-galera

      Attachments

        Activity

          People

            seppo Seppo Jaakola
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.