  1. MariaDB Server
  2. MDEV-9246

Slave hangs with parallel replication in MariaDB 10.1

Details

    Description

      A MariaDB 10.1.9 slave was running for over 24 hours with no change in Relay_Master_Log_File:

      MariaDB [db]> show slave status \G
      *************************** 1. row ***************************
      Slave_IO_State: Waiting for master to send event
      Master_Host: master
      Master_User: replication
      Master_Port: 3306
      Connect_Retry: 60
      Master_Log_File: mysql-bin.000157
      Read_Master_Log_Pos: 161873700
      Relay_Log_File: relay_log.000143
      Relay_Log_Pos: 654
      Relay_Master_Log_File: mysql-bin.000027
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Replicate_Do_DB: 
      Replicate_Ignore_DB: 
      Replicate_Do_Table: 
      Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
      Last_Errno: 0
      Last_Error: 
      Skip_Counter: 0
      Exec_Master_Log_Pos: 366
      Relay_Log_Space: 703475834482
      Until_Condition: None
      Until_Log_File: 
      Until_Log_Pos: 0
      Master_SSL_Allowed: No
      Master_SSL_CA_File: 
      Master_SSL_CA_Path: 
      Master_SSL_Cert: 
      Master_SSL_Cipher: 
      Master_SSL_Key: 
      Seconds_Behind_Master: 115303
      Master_SSL_Verify_Server_Cert: No
      Last_IO_Errno: 0
      Last_IO_Error: 
      Last_SQL_Errno: 0
      Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
      Master_Server_Id: 1
      Master_SSL_Crl: 
      Master_SSL_Crlpath: 
      Using_Gtid: No
      Gtid_IO_Pos: 
      Replicate_Do_Domain_Ids: 
      Replicate_Ignore_Domain_Ids: 
      Parallel_Mode: optimistic
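
How far behind the slave is can be read off the status output above with a small script. The following is only a sketch: the helper names (`parse_vertical_status`, `binlog_index`, `summarize`) are hypothetical, and it assumes binary log names end in a numeric suffix, as `mysql-bin.000157` does here.

```python
def parse_vertical_status(text):
    # Parse vertical (\G) SHOW SLAVE STATUS output into a dict of field -> value.
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def binlog_index(name):
    # Assumption: the file name ends in ".<number>", e.g. mysql-bin.000157 -> 157.
    return int(name.rsplit(".", 1)[1])


def summarize(status):
    return {
        # Gap between the file the IO thread is reading and the file being applied.
        "files_behind": binlog_index(status["Master_Log_File"])
        - binlog_index(status["Relay_Master_Log_File"]),
        "hours_behind": int(status["Seconds_Behind_Master"]) / 3600,
        "threads_report_running": status["Slave_IO_Running"] == "Yes"
        and status["Slave_SQL_Running"] == "Yes",
    }


# Values copied from the status output in this report.
sample = """\
Master_Log_File: mysql-bin.000157
Relay_Master_Log_File: mysql-bin.000027
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 115303
"""
summary = summarize(parse_vertical_status(sample))
print(summary)
```

With the values from this report, the SQL side is 130 binary log files and roughly 32 hours behind, even though both threads report `Yes`.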

      It was assumed that the slave was stuck, so the user executed STOP SLAVE. This also hung. SHOW FULL PROCESSLIST output was obtained at that time:

      MariaDB [db]> show full processlist;
      +-----+-----------------+--------------+------+-------------+-------+-----------------------------------------------------------------------+--------------------------------+----------+
      | Id  | User            | Host         | db   | Command     | Time  | State                                                                 | Info                           | Progress |
      +-----+-----------------+--------------+------+-------------+-------+-----------------------------------------------------------------------+--------------------------------+----------+
      |   2 | event_scheduler | localhost    | NULL | Daemon      | 99968 | Waiting on empty queue                                                | NULL                           |    0.000 |
      |   4 | system user     |              | NULL | Connect     | 81773 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |   5 | system user     |              | NULL | Connect     | 99968 | Waiting for master to send event                                      | NULL                           |    0.000 |
      |   6 | system user     |              | NULL | Connect     |     0 | Write_rows_log_event::write_row(-1)                                   | NULL                           |    0.000 |
      |   7 | system user     |              | NULL | Connect     | 80392 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |   8 | system user     |              | NULL | Connect     | 82490 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |   9 | system user     |              | NULL | Connect     | 82490 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  10 | system user     |              | NULL | Connect     | 82490 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  11 | system user     |              | NULL | Connect     | 81774 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  12 | system user     |              | NULL | Connect     | 82490 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  13 | system user     |              | NULL | Connect     | 81779 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  14 | system user     |              | NULL | Connect     | 82489 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  15 | system user     |              | NULL | Connect     | 81774 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  16 | system user     |              | NULL | Connect     | 81773 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  17 | system user     |              | NULL | Connect     | 44823 | Reading event from the relay log                                      | NULL                           |    0.000 |
      |  19 | replication     | master:47237 | NULL | Binlog Dump | 99910 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                           |    0.000 |
      |  94 | root            | master:50113 | NULL | Query       |  2652 | Filling schema table                                                  | SHOW /*innotop*/ GLOBAL STATUS |    0.000 |
      |  95 | root            | localhost    | db   | Killed      |  2378 | init                                                                  | show slave status              |    0.000 |
      |  96 | root            | localhost    | db   | Query       |  2730 | Killing slave                                                         | stop slave                     |    0.000 |
      |  97 | root            | localhost    | NULL | Killed      |  2421 | Filling schema table                                                  | show status                    |    0.000 |
      | 100 | root            | localhost    | NULL | Killed      |  2351 | Filling schema table                                                  | show global status             |    0.000 |
      | 103 | root            | localhost    | NULL | Killed      |  1670 | Filling schema table                                                  | show status                    |    0.000 |
      | 106 | root            | localhost    | db   | Killed      |    86 | Filling schema table                                                  | show global variables          |    0.000 |
      | 111 | root            | localhost    | db   | Query       |     0 | init                                                                  | show full processlist          |    0.000 |
      +-----+-----------------+--------------+------+-------------+-------+-----------------------------------------------------------------------+--------------------------------+----------+

      We thought that the one active worker thread might be holding things up, so we killed it by executing `KILL 6`, but the SQL thread automatically restarted it.
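
The replication threads that are not idle workers can also be picked out of the processlist mechanically. This is a minimal sketch, not a tool used in this report: `parse_processlist` and `busy_replication_threads` are hypothetical helpers, the input is assumed to be captured with one table row per line, and cell values are assumed not to contain `|`.

```python
from collections import Counter

IDLE_STATE = "Waiting for work from SQL thread"


def parse_processlist(text):
    # Parse mysql-client table output (one row per line) into a list of dicts.
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blanks
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def busy_replication_threads(rows):
    # Replication threads run under the "system user" account.
    return [
        row for row in rows
        if row["User"] == "system user" and row["State"] != IDLE_STATE
    ]


# A few rows taken from the first processlist in this report.
sample = """\
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 4 | system user | | NULL | Connect | 81773 | Waiting for work from SQL thread | NULL | 0.000 |
| 5 | system user | | NULL | Connect | 99968 | Waiting for master to send event | NULL | 0.000 |
| 6 | system user | | NULL | Connect | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 17 | system user | | NULL | Connect | 44823 | Reading event from the relay log | NULL | 0.000 |
| 96 | root | localhost | db | Query | 2730 | Killing slave | stop slave | 0.000 |
"""
rows = parse_processlist(sample)
busy = busy_replication_threads(rows)
state_counts = Counter(row["State"] for row in rows if row["User"] == "system user")
for row in busy:
    print(row["Id"], row["Time"], row["State"])
```

On these sample rows the filter leaves threads 5, 6 and 17. Of those, only thread 6 is in a row-applying state (`Write_rows_log_event::write_row(-1)`).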

      We then killed the replication threads one by one to see if that would free up any mutexes. Eventually all replication-related threads had been killed, except for the IO thread (which had automatically restarted after being killed). The user then executed STOP SLAVE again, which hung again:

      MariaDB [(none)]> show processlist;
      +-----+-----------------+--------------+------+-------------+--------+-----------------------------------------------------------------------+--------------------------------+----------+
      | Id  | User            | Host         | db   | Command     | Time   | State                                                                 | Info                           | Progress |
      +-----+-----------------+--------------+------+-------------+--------+-----------------------------------------------------------------------+--------------------------------+----------+
      |   2 | event_scheduler | localhost    | NULL | Daemon      | 106027 | Waiting on empty queue                                                | NULL                           |    0.000 |
      |   4 | system user     |              | NULL | Killed      |  87833 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |   5 | system user     |              | NULL | Killed      | 106027 | Waiting for slave mutex on exit                                       | NULL                           |    0.000 |
      |   6 | system user     |              | NULL | Killed      |      0 | Write_rows_log_event::write_row(-1)                                   | NULL                           |    0.000 |
      |   7 | system user     |              | NULL | Killed      |  86452 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |   8 | system user     |              | NULL | Killed      |  88549 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |   9 | system user     |              | NULL | Killed      |  88549 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  10 | system user     |              | NULL | Killed      |  88549 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  11 | system user     |              | NULL | Killed      |  87833 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  12 | system user     |              | NULL | Killed      |  88549 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  13 | system user     |              | NULL | Killed      |  87838 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  14 | system user     |              | NULL | Killed      |  88549 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  15 | system user     |              | NULL | Killed      |  87833 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  16 | system user     |              | NULL | Killed      |  87833 | Waiting for work from SQL thread                                      | NULL                           |    0.000 |
      |  17 | system user     |              | NULL | Killed      |  50883 | Reading event from the relay log                                      | NULL                           |    0.000 |
      |  94 | root            | master:50113 | NULL | Query       |   8711 | Filling schema table                                                  | SHOW /*innotop*/ GLOBAL STATUS |    0.000 |
      |  95 | root            | localhost    | db   | Killed      |   8437 | init                                                                  | show slave status              |    0.000 |
      |  96 | root            | localhost    | db   | Killed      |   8789 | Killing slave                                                         | stop slave                     |    0.000 |
      |  97 | root            | localhost    | NULL | Killed      |   8480 | Filling schema table                                                  | show status                    |    0.000 |
      | 100 | root            | localhost    | NULL | Killed      |   8411 | Filling schema table                                                  | show global status             |    0.000 |
      | 103 | root            | localhost    | NULL | Killed      |   7729 | Filling schema table                                                  | show status                    |    0.000 |
      | 106 | root            | localhost    | db   | Killed      |   6145 | Filling schema table                                                  | show global variables          |    0.000 |
      | 111 | root            | localhost    | db   | Killed      |   6027 | Filling schema table                                                  | show global status             |    0.000 |
      | 115 | root            | localhost    | NULL | Query       |      0 | init                                                                  | show processlist               |    0.000 |
      | 116 | replication     | master:55614 | NULL | Binlog Dump |    772 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                           |    0.000 |
      | 117 | root            | localhost    | NULL | Query       |     64 | init                                                                  | stop slave                     |    0.000 |
      +-----+-----------------+--------------+------+-------------+--------+-----------------------------------------------------------------------+--------------------------------+----------+

          People

            Unassigned Unassigned
            GeoffMontee Geoff Montee (Inactive)