MariaDB Server / MDEV-10653

SHOW SLAVE STATUS Can Deadlock an Errored Slave


Details

    • 5.5.55

    Description

      Note: this issue was formerly titled "rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include".

      A parallel replication worker thread can deadlock with another
      connection running SHOW SLAVE STATUS. If the worker thread is
      killed while in do_gco_wait(), it already holds
      LOCK_parallel_entry and, during error reporting, tries to take
      err_lock. SHOW SLAVE STATUS takes the same two locks in the
      reverse order: first err_lock, then LOCK_parallel_entry. If each
      thread acquires its first lock before the other acquires its
      second, neither can proceed and both wait forever.
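
The lock-order inversion described above can be reproduced in miniature. The following is a minimal Python sketch, not MariaDB code: the two `threading.Lock` objects are stand-ins for LOCK_parallel_entry and err_lock, the thread names `worker` and `show_status` are hypothetical, and the timeout exists only so the demonstration terminates (the server has no such timeout, so the stall would be a hang). The second function shows one generic remedy, acquiring both locks in the same order everywhere; it is an assumption for illustration and not necessarily the fix applied for this issue.

```python
import threading

TIMEOUT = 0.2  # demo only: how long a thread waits for its second lock


def inverted_order_attempt():
    """Each thread takes its first lock, then tries the lock the other holds.

    Returns a dict saying whether each thread obtained its second lock.
    """
    lock_parallel_entry = threading.Lock()  # stand-in for LOCK_parallel_entry
    err_lock = threading.Lock()             # stand-in for err_lock
    holding_first = threading.Barrier(2)    # both threads hold their first lock
    attempt_done = threading.Barrier(2)     # both threads have finished trying
    got_second = {}

    def run(name, first, second):
        with first:
            holding_first.wait()            # force the problematic interleaving
            acquired = second.acquire(timeout=TIMEOUT)
            got_second[name] = acquired
            if acquired:
                second.release()
            attempt_done.wait()             # keep holding `first` until both tried

    threads = [
        # Killed worker: holds LOCK_parallel_entry, then wants err_lock.
        threading.Thread(target=run, args=("worker", lock_parallel_entry, err_lock)),
        # SHOW SLAVE STATUS: holds err_lock, then wants LOCK_parallel_entry.
        threading.Thread(target=run, args=("show_status", err_lock, lock_parallel_entry)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


def consistent_order_attempt():
    """Both threads take LOCK_parallel_entry first, then err_lock."""
    lock_parallel_entry = threading.Lock()
    err_lock = threading.Lock()
    finished = []

    def run(name):
        with lock_parallel_entry:
            with err_lock:
                finished.append(name)

    threads = [threading.Thread(target=run, args=(n,)) for n in ("worker", "show_status")]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return sorted(finished)


if __name__ == "__main__":
    print("inverted order:", inverted_order_attempt())
    print("consistent order:", consistent_order_attempt())
```

In the inverted case neither thread obtains its second lock, which matches the symptom in the test: the slave never reaches the state that wait_for_slave_param.inc is waiting for. With a single agreed lock order, the two threads simply serialize.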

      This led to test failures in rpl.rpl_parallel and binlog_encryption.rpl_parallel.

      http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/9240/steps/test/logs/stdio

      rpl.rpl_parallel 'mix,xtradb'            w2 [ fail ]
              Test ended at 2016-08-23 22:06:29
       
      CURRENT_TEST: rpl.rpl_parallel
      mysqltest: In included file "./include/wait_for_slave_param.inc": 
      included from ./include/wait_for_slave_sql_to_start.inc at line 32:
      included from ./include/wait_for_slave_to_start.inc at line 27:
      included from ./include/start_slave.inc at line 35:
      included from /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/suite/rpl/t/rpl_parallel.test at line 355:
      At line 115: Timeout in include/wait_for_slave_param.inc
       
      The result from queries just before the failure was:
      < snip >
      'group_commit_waiting_for_prior SIGNAL slave_queued3',
      ''))
      master-bin.000002	1417	Xid	1	1444	COMMIT /* xid=370 */
      master-bin.000002	1444	Gtid	1	1482	BEGIN GTID 0-1-15
      master-bin.000002	1482	Query	1	1571	use `test`; INSERT INTO t2 VALUES (20)
      master-bin.000002	1571	Query	1	1660	use `test`; INSERT INTO t1 VALUES (20)
      master-bin.000002	1660	Query	1	1749	use `test`; INSERT INTO t2 VALUES (21)
      master-bin.000002	1749	Query	1	1842	use `test`; INSERT INTO t3 VALUES (20, 20)
      master-bin.000002	1842	Xid	1	1869	COMMIT /* xid=433 */
      master-bin.000002	1869	Gtid	1	1907	BEGIN GTID 0-1-16
      master-bin.000002	1907	Query	1	1999	use `test`; INSERT INTO t3 VALUES(21, 21)
      master-bin.000002	1999	Xid	1	2026	COMMIT /* xid=438 */
      master-bin.000002	2026	Gtid	1	2064	BEGIN GTID 0-1-17
      master-bin.000002	2064	Query	1	2156	use `test`; INSERT INTO t3 VALUES(22, 22)
      master-bin.000002	2156	Xid	1	2183	COMMIT /* xid=439 */
       
      **** SHOW RELAYLOG EVENTS on server_1 ****
      relaylog_name = 'No such row'
      SHOW RELAYLOG EVENTS IN 'No such row';
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
       
      More results from queries before failure can be found in /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/2/log/rpl_parallel.log
      

      The failure is infrequent, but it recurs regularly.

            People

              bnestere Brandon Nesterenko
              elenst Elena Stepanova
              Votes: 1
              Watchers: 6
