Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7326

Server deadlock in connection with parallel replication

Details

    Description

      A user testing parallel replication is seeing the server hang.
      From debugging, it appears that the following happens.

      One batch of group-committed transactions from the master all reach their
      commit phase, say T1 and T2.

      This causes the transactions in the following batch, say T3, T4, and T5, to
      wake up and start running.

      At this point, T2 (say) gets a deadlock or other temporary error, and needs to
      be rolled back and retried. This involves unmark_start_commit(), decrementing
      the count of transactions that have already reached their commit step.

      Now when T3, T4, and T5 reach their commit step, they do mark_start_commit().
      But a following T6 is not woken up, because T2 has not yet done
      mark_start_commit().

      Then when T2 has been successfully retried and done mark_start_commit(), the
      bug is that T6 is not woken up. The wakeup is lost. This is because T2 only
      considers the batch with T3-T5 for wakeup, not following batches.

      It seems fairly certain that this is the scenario that the user
      experienced. It is however unknown at this point how it is possible for T2 to
      get a deadlock error, after it has run all its containing queries and has
      started the commit step. (The idea is that if there was a deadlock with T1,
      then T1 will be blocked from proceeding to mark_start_commit() until T2 has
      done rollback; and T2 does unmark_start_commit() before its rollback).

      The fix should be to make sure that this case, of T2 retrying after T3-T5 have
      started running, is handled correctly: When T2 completes its retry, all
      following and possibly waiting transactions should be considered, so the
      wakeup is not lost.

      The user-visible effect in this hang is that at least one replication worker
      threads are stuck in state "Waiting for prior transaction to start commit
      before starting next transaction", and all other threads are stuck in this
      state or the state "Waiting for prior transaction to commit". (As seen in SHOW
      PROCESSLIST). Killing the worker threads will stop replication, and it can
      then be re-started successfully.

      Attachments

        Issue Links

          Activity

            knielsen Kristian Nielsen added a comment - Testcase and patches: http://lists.askmonty.org/pipermail/commits/2014-December/007204.html http://lists.askmonty.org/pipermail/commits/2014-December/007205.html
            danblack Daniel Black added a comment -

            fyi have seen this in production on 10.0.15. Have binlogs, innodb engine status and processlist if this would aid verification. Waiting on client permission to release binlogs.

             
            Id	User	Host	db	Command	Time	State	Info	Progress
            3	system user		NULL	Connect	972115	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            4	system user		NULL	Connect	972170	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            5	system user		NULL	Connect	972130	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            6	system user		NULL	Connect	972118	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            7	system user		NULL	Connect	972115	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            8	system user		NULL	Connect	972170	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            9	system user		NULL	Connect	972170	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            10	system user		NULL	Connect	972115	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000
            11	system user		NULL	Connect	1299835	Waiting for master to send event	NULL	0.000
            12	system user		NULL	Connect	984669	Waiting for room in worker thread event queue	NULL	0.000
            8763	replication	192.168.179.103:50232	NULL	Binlog Dump	1292423	Master has sent all binlog to slave; waiting for binlog to be updated	NULL	0.000
            22621	backup	localhost	opalauctions_com	Query	164	Writing to net	SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions`	0.000
            22637	root	localhost	NULL	Query	0	init	show processlist	0.000
                              Master_Port: 3306
                            Connect_Retry: 60  
                          Master_Log_File: mysql-bin.006188
                      Read_Master_Log_Pos: 88198862  
                           Relay_Log_File: mysqld-relay-bin.001138
                            Relay_Log_Pos: 98608625  
                    Relay_Master_Log_File: mysql-bin.006103
                         Slave_IO_Running: Yes  
                        Slave_SQL_Running: Yes
                          Replicate_Do_DB:   
                      Replicate_Ignore_DB: 
                       Replicate_Do_Table:   
                   Replicate_Ignore_Table: 
                  Replicate_Wild_Do_Table:   
              Replicate_Wild_Ignore_Table: 
                               Last_Errno: 0  
                               Last_Error: 
                             Skip_Counter: 0
                      Exec_Master_Log_Pos: 98608338
                          Relay_Log_Space: 8518426389
                          Until_Condition: None
                           Until_Log_File: 
                            Until_Log_Pos: 0
                       Master_SSL_Allowed: No
                       Master_SSL_CA_File: 
                       Master_SSL_CA_Path: 
                          Master_SSL_Cert: 
                        Master_SSL_Cipher: 
                           Master_SSL_Key: 
                    Seconds_Behind_Master: 972186
            Master_SSL_Verify_Server_Cert: No
                            Last_IO_Errno: 0
                            Last_IO_Error: 
                           Last_SQL_Errno: 0
                           Last_SQL_Error: 
              Replicate_Ignore_Server_Ids: 
                         Master_Server_Id: 12022
                           Master_SSL_Crl: 
                       Master_SSL_Crlpath: 
                               Using_Gtid: No
                              Gtid_IO_Pos: 
             
            =====================================
            2014-12-30 01:38:19 7f33f4f24700 INNODB MONITOR OUTPUT
            =====================================
            Per second averages calculated from the last 54 seconds
            -----------------
            BACKGROUND THREAD
            -----------------
            srv_master_thread loops: 344551 srv_active, 0 srv_shutdown, 953736 srv_idle
            srv_master_thread log flush and writes: 1298179
            ----------
            SEMAPHORES
            ----------
            OS WAIT ARRAY INFO: reservation count 204986
            OS WAIT ARRAY INFO: signal count 340511
            Mutex spin waits 1282140, rounds 7709447, OS waits 71441
            RW-shared spins 285715, rounds 4307335, OS waits 94460
            RW-excl spins 244659, rounds 2516718, OS waits 6129
            Spin rounds per wait: 6.01 mutex, 15.08 RW-shared, 10.29 RW-excl
            --------
            FILE I/O
            --------
            I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
            I/O thread 1 state: waiting for completed aio requests (log thread)
            I/O thread 2 state: waiting for completed aio requests (read thread)
            I/O thread 3 state: waiting for completed aio requests (read thread)
            I/O thread 4 state: waiting for completed aio requests (read thread)
            I/O thread 5 state: waiting for completed aio requests (read thread)
            I/O thread 6 state: waiting for completed aio requests (write thread)
            I/O thread 7 state: waiting for completed aio requests (write thread)
            I/O thread 8 state: waiting for completed aio requests (write thread)
            I/O thread 9 state: waiting for completed aio requests (write thread)
            Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
             ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
            Pending flushes (fsync) log: 0; buffer pool: 0
            43921260 OS file reads, 10843564 OS file writes, 9913825 OS fsyncs
            107.26 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
            -------------------------------------
            INSERT BUFFER AND ADAPTIVE HASH INDEX
            -------------------------------------
            Ibuf: size 1, free list len 12, seg size 14, 14210 merges
            merged operations:
             insert 21673, delete mark 16195, delete 1206
            discarded operations:
             insert 0, delete mark 0, delete 0
            0.00 hash searches/s, 0.00 non-hash searches/s
            ---
            LOG
            ---
            Log sequence number 897195392457
            Log flushed up to   897195392457
            Pages flushed up to 897195392457
            Last checkpoint at  897195392457
            Max checkpoint age    277874566
            Checkpoint age target 269190986
            Modified age          0
            Checkpoint age        0
            0 pending log writes, 0 pending chkp writes
            8884128 log i/o's done, 0.00 log i/o's/second
            ----------------------
            BUFFER POOL AND MEMORY
            ...
            --------------
            ROW OPERATIONS
            --------------
            0 queries inside InnoDB, 0 queries in queue
            1 read views open inside InnoDB
            1 RW transactions active inside InnoDB
            0 RO transactions active inside InnoDB
            1 out of 1000 descriptors used
            ---OLDEST VIEW---
            Normal read view
            Read view low limit trx n:o 1163197449
            Read view up limit trx id 1163197449
            Read view low limit trx id 1163197449
            Read view individually stored trx ids:
            -----------------
            Main thread process no. 2027, id 139861419865856, state: sleeping
            Number of rows inserted 1219224, updated 9708367, deleted 1175520, read 4898178761
            0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 616.60 reads/s
            Number of system rows inserted 8524700, updated 0, deleted 8524373, read 8524734
            0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
             
            TRANSACTIONS
            ------------
            Trx id counter 1163197455
            Purge done for trx's n:o < 1163195205 undo n:o < 0 state: running but idle
            History list length 2464
            LIST OF TRANSACTIONS FOR EACH SESSION:
            ---TRANSACTION 0, not started
            MySQL thread id 22637, OS thread handle 0x7f33f4f24700, query id 18691069 localhost root init
            show engine innodb status
            ---TRANSACTION 1163195193, not started
            MySQL thread id 3, OS thread handle 0x7f364bd08700, query id 17613862 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163193194, not started
            MySQL thread id 4, OS thread handle 0x7f364bcbf700, query id 17611878 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163195196, not started
            MySQL thread id 7, OS thread handle 0x7f364bc76700, query id 17613866 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163195070, not started
            MySQL thread id 6, OS thread handle 0x7f364bc2d700, query id 17613738 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163194627, not started
            MySQL thread id 5, OS thread handle 0x7f3649dc5700, query id 17613298 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163193193, not started
            MySQL thread id 8, OS thread handle 0x7f3649d7c700, query id 17611876 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163193192, not started
            MySQL thread id 9, OS thread handle 0x7f3649d33700, query id 17611874 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1163195197, not started
            MySQL thread id 10, OS thread handle 0x7f3649cea700, query id 17613868 Waiting for prior transaction to start commit before starting next transaction
            ---TRANSACTION 1146477619, not started
            MySQL thread id 11, OS thread handle 0x7f3649ca1700, query id 0 Waiting for master to send event
            ---TRANSACTION 1146040835, not started
            MySQL thread id 1, OS thread handle 0x7f364bd9a700, query id 0 Waiting for background binlog tasks
            ---TRANSACTION 1163197448, ACTIVE 2372 sec
            mysql tables in use 1, locked 0
            MySQL thread id 22621, OS thread handle 0x7f3402bf4700, query id 18691034 localhost backup Sending data
            SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions`
            Trx read view will not see trx with id >= 1163197449, sees < 1163197449
            ----------------------------
            END OF INNODB MONITOR OUTPUT
            ============================

            danblack Daniel Black added a comment - fyi have seen this in production on 10.0.15. Have binlogs, innodb engine status and processlist if this would aid verification. Waiting on client permission to release binlogs.   Id User Host db Command Time State Info Progress 3 system user NULL Connect 972115 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 4 system user NULL Connect 972170 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 5 system user NULL Connect 972130 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 6 system user NULL Connect 972118 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 7 system user NULL Connect 972115 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 8 system user NULL Connect 972170 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 9 system user NULL Connect 972170 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 10 system user NULL Connect 972115 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 11 system user NULL Connect 1299835 Waiting for master to send event NULL 0.000 12 system user NULL Connect 984669 Waiting for room in worker thread event queue NULL 0.000 8763 replication 192.168.179.103:50232 NULL Binlog Dump 1292423 Master has sent all binlog to slave; waiting for binlog to be updated NULL 0.000 22621 backup localhost opalauctions_com Query 164 Writing to net SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions` 0.000 22637 root localhost NULL Query 0 init show processlist 0.000 Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.006188 Read_Master_Log_Pos: 88198862 Relay_Log_File: mysqld-relay-bin.001138 Relay_Log_Pos: 98608625 Relay_Master_Log_File: mysql-bin.006103 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 98608338 Relay_Log_Space: 8518426389 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 972186 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 12022 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: No Gtid_IO_Pos:   ===================================== 2014-12-30 01:38:19 7f33f4f24700 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 54 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 344551 srv_active, 0 srv_shutdown, 953736 srv_idle srv_master_thread log flush and writes: 1298179 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 204986 OS WAIT ARRAY INFO: signal count 340511 Mutex spin waits 1282140, rounds 7709447, OS waits 71441 RW-shared spins 285715, rounds 4307335, OS waits 94460 RW-excl spins 244659, rounds 2516718, OS waits 6129 Spin rounds per wait: 6.01 mutex, 15.08 RW-shared, 10.29 RW-excl -------- FILE I/O -------- I/O thread 0 state: waiting for completed aio requests (insert buffer thread) I/O thread 1 state: waiting for completed aio requests (log thread) I/O thread 2 state: waiting for completed aio requests (read thread) I/O thread 3 state: waiting for completed aio requests (read thread) I/O thread 4 state: waiting for completed aio requests (read thread) I/O thread 5 state: waiting for completed aio requests (read thread) I/O thread 6 state: waiting for completed aio requests (write thread) I/O thread 7 state: waiting for completed aio requests (write thread) I/O thread 8 state: waiting for completed aio requests (write thread) I/O thread 9 state: waiting for completed aio requests (write thread) Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] , ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 43921260 OS file reads, 10843564 OS file writes, 9913825 OS fsyncs 107.26 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 12, seg size 14, 14210 merges merged operations: insert 21673, delete mark 16195, delete 1206 discarded operations: insert 0, delete mark 0, delete 0 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 897195392457 Log flushed up to 897195392457 Pages flushed up to 897195392457 Last checkpoint at 897195392457 Max checkpoint age 277874566 Checkpoint age target 269190986 Modified age 0 Checkpoint age 0 0 pending log writes, 0 pending chkp writes 8884128 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ... -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue 1 read views open inside InnoDB 1 RW transactions active inside InnoDB 0 RO transactions active inside InnoDB 1 out of 1000 descriptors used ---OLDEST VIEW--- Normal read view Read view low limit trx n:o 1163197449 Read view up limit trx id 1163197449 Read view low limit trx id 1163197449 Read view individually stored trx ids: ----------------- Main thread process no. 2027, id 139861419865856, state: sleeping Number of rows inserted 1219224, updated 9708367, deleted 1175520, read 4898178761 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 616.60 reads/s Number of system rows inserted 8524700, updated 0, deleted 8524373, read 8524734 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s   TRANSACTIONS ------------ Trx id counter 1163197455 Purge done for trx's n:o < 1163195205 undo n:o < 0 state: running but idle History list length 2464 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0, not started MySQL thread id 22637, OS thread handle 0x7f33f4f24700, query id 18691069 localhost root init show engine innodb status ---TRANSACTION 1163195193, not started MySQL thread id 3, OS thread handle 0x7f364bd08700, query id 17613862 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163193194, not started MySQL thread id 4, OS thread handle 0x7f364bcbf700, query id 17611878 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163195196, not started MySQL thread id 7, OS thread handle 0x7f364bc76700, query id 17613866 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163195070, not started MySQL thread id 6, OS thread handle 0x7f364bc2d700, query id 17613738 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163194627, not started MySQL thread id 5, OS thread handle 0x7f3649dc5700, query id 17613298 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163193193, not started MySQL thread id 8, OS thread handle 0x7f3649d7c700, query id 17611876 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163193192, not started MySQL thread id 9, OS thread handle 0x7f3649d33700, query id 17611874 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163195197, not started MySQL thread id 10, OS thread handle 0x7f3649cea700, query id 17613868 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1146477619, not started MySQL thread id 11, OS thread handle 0x7f3649ca1700, query id 0 Waiting for master to send event ---TRANSACTION 1146040835, not started MySQL thread id 1, OS thread handle 0x7f364bd9a700, query id 0 Waiting for background binlog tasks ---TRANSACTION 1163197448, ACTIVE 2372 sec mysql tables in use 1, locked 0 MySQL thread id 22621, OS thread handle 0x7f3402bf4700, query id 18691034 localhost backup Sending data SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions` Trx read view will not see trx with id >= 1163197449, sees < 1163197449 ---------------------------- END OF INNODB MONITOR OUTPUT ============================
            danblack Daniel Black added a comment -

             stop  slave;
             
            MariaDB [(none)]> show processlist;
            +-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+
            | Id    | User             | Host                  | db    | Command     | Time    | State                                                                          | Info               | Progress |
            +-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+
            |     3 | system user      |                       | NULL  | Connect     |  977011 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |     4 | system user      |                       | NULL  | Connect     |  977066 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |     5 | system user      |                       | NULL  | Connect     |  977026 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |     6 | system user      |                       | NULL  | Connect     |  977014 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |     7 | system user      |                       | NULL  | Connect     |  977011 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |     8 | system user      |                       | NULL  | Connect     |  977066 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |     9 | system user      |                       | NULL  | Connect     |  977066 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |    10 | system user      |                       | NULL  | Connect     |  977011 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |
            |    11 | system user      |                       | NULL  | Connect     | 1304731 | Waiting for master to send event                                               | NULL               |    0.000 |
            |    12 | system user      |                       | NULL  | Connect     |  989565 | Waiting for room in worker thread event queue                                  | NULL               |    0.000 |
            |  8763 | replication      | 192.168.179.103:50232 | NULL  | Binlog Dump | 1297319 | Master has sent all binlog to slave; waiting for binlog to be updated          | NULL               |    0.000 |
            | 22652 | root             | localhost             | NULL  | Sleep       |    1342 |                                                                                | NULL               |    0.000 |
            | 22658 | root             | localhost             | NULL  | Query       |     146 | Killing slave                                                                  | stop  slave        |    0.000 |
            | 22659 | root             | localhost             | NULL  | Killed      |      31 | init                                                                           | show slave status  |    0.000 |
            | 22660 | debian-sys-maint | localhost             | mysql | Query       |      86 | Filling schema table                                                           | SHOW GLOBAL STATUS |    0.000 |
            | 22661 | debian-sys-maint | localhost             | mysql | Query       |      30 | Filling schema table                                                           | SHOW GLOBAL STATUS |    0.000 |
            | 22664 | root             | localhost             | NULL  | Killed      |      26 | init                                                                           | show slave status  |    0.000 |
            | 22667 | root             | localhost             | NULL  | Query       |       0 | init                                                                           | show processlist   |    0.000 |
            +-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+
             
             
            kill 3
             
            |     3 | system user      |                       | NULL             | Killed      |  977491 | closing tables                                                                 | NULL                                                     |    0.000 |
            |     4 | system user      |                       | NULL             | Connect     |  977546 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |     5 | system user      |                       | NULL             | Connect     |  977506 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |     6 | system user      |                       | NULL             | Connect     |  977494 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |     7 | system user      |                       | NULL             | Connect     |  977491 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |     8 | system user      |                       | NULL             | Connect     |  977546 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |     9 | system user      |                       | NULL             | Connect     |  977546 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |    10 | system user      |                       | NULL             | Connect     |  977491 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |
            |    11 | system user      |                       | NULL             | Connect     | 1305211 | Waiting for master to send event                                               | NULL                                                     |    0.000 |
            |    12 | system user      |                       | NULL             | Connect     |  990045 | Waiting for room in worker thread event queue                                  | NULL                                                     |    0.000 |
            |  8763 | replication      | 192.168.179.103:50232 | NULL             | Binlog Dump | 1297799 | Master has sent all binlog to slave; waiting for binlog to be updated          | NULL                                                     |    0.000 |
            | 22658 | root             | localhost             | NULL             | Query       |     626 | Killing slave                                                                  | stop  slave         
             
             
             
            kill user 'debian-sys-maint'
             
            | 22669 | debian-sys-maint | localhost             | mysql            | Killed      |     485 | Filling schema table                                                           | SHOW GLOBAL STATUS                                |    0.000 |
            | 22670 | debian-sys-maint | localhost             | mysql            | Killed      |     427 | Filling schema table                                                           | SHOW GLOBAL STATUS  
             
             
            daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230  3:03:03 [ERROR] Slave SQL: Connection was killed, Gtid 0-12022-203533062, Internal MariaDB error code: 1927
            daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230  3:03:03 [Warning] Slave: Connection was killed Error_code: 1927
            daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230  3:03:03 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.006103' position 98608338
             
             
            kill 11
             
            |    11 | system user      |                       | NULL             | Killed      | 1305918 | Waiting for slave mutex on exit                                                | NULL                
             
            kill 12
             
            |    12 | system user      |                       | NULL             | Connect     |  990812 | Reading event from the relay log 
             
            kill 4,5,6,7,8,9,10
             
            |     3 | system user |                       | NULL             | Connect     |  978378 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |     4 | system user |                       | NULL             | Connect     |  978433 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |     5 | system user |                       | NULL             | Connect     |  978393 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |     6 | system user |                       | NULL             | Connect     |  978378 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |     7 | system user |                       | NULL             | Killed      |  978378 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |     8 | system user |                       | NULL             | Killed      |  978433 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |     9 | system user |                       | NULL             | Killed      |  978433 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
            |    10 | system user |                       | NULL             | Killed      |  978378 | Waiting for work from SQL thread                                      | NULL                              
                            |    0.000 |
             
             
                           Slave_IO_State: 
                              Master_Host: 192.168.179.103
                              Master_User: replication
                              Master_Port: 3306
                            Connect_Retry: 60
                          Master_Log_File: mysql-bin.006189
                      Read_Master_Log_Pos: 42196512
                           Relay_Log_File: mysqld-relay-bin.001138
                            Relay_Log_Pos: 98608625
                    Relay_Master_Log_File: mysql-bin.006103
                         Slave_IO_Running: No
                        Slave_SQL_Running: No
                          Replicate_Do_DB: 
                      Replicate_Ignore_DB: 
                       Replicate_Do_Table: 
                   Replicate_Ignore_Table: 
                  Replicate_Wild_Do_Table: 
              Replicate_Wild_Ignore_Table: 
                               Last_Errno: 1927
                               Last_Error: Error 'Connection was killed' on query. Default database: 'opalauctions_com'. Query: 'UPDATE `sessions` SET `data` = 'redire,,,,,rk\";', `date_updated` = '2014-12-18 19:32:57' WHERE `id` =  'uvXXXXXXXX5uo9vcj1''
             
             
            set global slave_parallel_threads=30; start slave;
             
             
            master settings:
            | binlog_commit_wait_count                | 20        |
            | binlog_commit_wait_usec                 | 1000      |
             
             

            obviously slave parallel_threads was 8 and with binlog_commit_wait_count at 20 Is this putting it into a bad code path?

            danblack Daniel Black added a comment - stop slave;   MariaDB [(none)]> show processlist; +-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+ | 3 | system user | | NULL | Connect | 977011 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 4 | system user | | NULL | Connect | 977066 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 5 | system user | | NULL | Connect | 977026 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 6 | system user | | NULL | Connect | 977014 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 7 | system user | | NULL | Connect | 977011 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 8 | system user | | NULL | Connect | 977066 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 9 | system user | | NULL | Connect | 977066 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 10 | system user | | NULL | Connect | 977011 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 11 | system user | | NULL | Connect | 1304731 | Waiting for master to send event | NULL | 0.000 | | 12 | system user | | NULL | Connect | 989565 | Waiting for room in worker thread event queue | NULL | 0.000 | | 8763 | replication | 192.168.179.103:50232 | NULL | Binlog Dump | 1297319 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0.000 | | 22652 | root | localhost | NULL | Sleep | 1342 | | NULL | 0.000 | | 22658 | root | localhost | NULL | Query | 146 | Killing slave | stop slave | 0.000 | | 22659 | root | localhost | NULL | Killed | 31 | init | show slave status | 0.000 | | 22660 | debian-sys-maint | localhost | mysql | Query | 86 | Filling schema table | SHOW GLOBAL STATUS | 0.000 | | 22661 | debian-sys-maint | localhost | mysql | Query | 30 | Filling schema table | SHOW GLOBAL STATUS | 0.000 | | 22664 | root | localhost | NULL | Killed | 26 | init | show slave status | 0.000 | | 22667 | root | localhost | NULL | Query | 0 | init | show processlist | 0.000 | +-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+     kill 3   | 3 | system user | | NULL | Killed | 977491 | closing tables | NULL | 0.000 | | 4 | system user | | NULL | Connect | 977546 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 5 | system user | | NULL | Connect | 977506 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 6 | system user | | NULL | Connect | 977494 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 7 | system user | | NULL | Connect | 977491 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 8 | system user | | NULL | Connect | 977546 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 9 | system user | | NULL | Connect | 977546 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 10 | system user | | NULL | Connect | 977491 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 | | 11 | system user | | NULL | Connect | 1305211 | Waiting for master to send event | NULL | 0.000 | | 12 | system user | | NULL | Connect | 990045 | Waiting for room in worker thread event queue | NULL | 0.000 | | 8763 | replication | 192.168.179.103:50232 | NULL | Binlog Dump | 1297799 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL | 0.000 | | 22658 | root | localhost | NULL | Query | 626 | Killing slave | stop slave       kill user 'debian-sys-maint'   | 22669 | debian-sys-maint | localhost | mysql | Killed | 485 | Filling schema table | SHOW GLOBAL STATUS | 0.000 | | 22670 | debian-sys-maint | localhost | mysql | Killed | 427 | Filling schema table | SHOW GLOBAL STATUS     daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230 3:03:03 [ERROR] Slave SQL: Connection was killed, Gtid 0-12022-203533062, Internal MariaDB error code: 1927 daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230 3:03:03 [Warning] Slave: Connection was killed Error_code: 1927 daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230 3:03:03 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.006103' position 98608338     kill 11   | 11 | system user | | NULL | Killed | 1305918 | Waiting for slave mutex on exit | NULL   kill 12   | 12 | system user | | NULL | Connect | 990812 | Reading event from the relay log   kill 4,5,6,7,8,9,10   | 3 | system user | | NULL | Connect | 978378 | Waiting for work from SQL thread | NULL | 0.000 | | 4 | system user | | NULL | Connect | 978433 | Waiting for work from SQL thread | NULL | 0.000 | | 5 | system user | | NULL | Connect | 978393 | Waiting for work from SQL thread | NULL | 0.000 | | 6 | system user | | NULL | Connect | 978378 | Waiting for work from SQL thread | NULL | 0.000 | | 7 | system user | | NULL | Killed | 978378 | Waiting for work from SQL thread | NULL | 0.000 | | 8 | system user | | NULL | Killed | 978433 | Waiting for work from SQL thread | NULL | 0.000 | | 9 | system user | | NULL | Killed | 978433 | Waiting for work from SQL thread | NULL | 0.000 | | 10 | system user | | NULL | Killed | 978378 | Waiting for work from SQL thread | NULL | 0.000 |     Slave_IO_State: Master_Host: 192.168.179.103 Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.006189 Read_Master_Log_Pos: 42196512 Relay_Log_File: mysqld-relay-bin.001138 Relay_Log_Pos: 98608625 Relay_Master_Log_File: mysql-bin.006103 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1927 Last_Error: Error 'Connection was killed' on query. Default database: 'opalauctions_com'. Query: 'UPDATE `sessions` SET `data` = 'redire,,,,,rk\";', `date_updated` = '2014-12-18 19:32:57' WHERE `id` = 'uvXXXXXXXX5uo9vcj1''     set global slave_parallel_threads=30; start slave;     master settings: | binlog_commit_wait_count | 20 | | binlog_commit_wait_usec | 1000 |     obviously slave parallel_threads was 8 and with binlog_commit_wait_count at 20 Is this putting it into a bad code path?

            > obviously slave parallel_threads was 8 and with binlog_commit_wait_count at
            > 20 Is this putting it into a bad code path?

            The patch linked in an earlier comment should fix the lockup. It is related
            somehow to a transaction deadlock that is detected and resolved automatically,
            the exact cause of the deadlock is not known at the moment. So there should
            not be any "putting into bad code path" or similar error on the part of the
            user here.

            That being said, with slave_parallel_threads=8 and
            binlog_commit_wait_count=20, it will not actually be possible to get 20
            transactions in one group commit, unless there is other activity going on at
            the same time (eg. simultaneous non-replication updates on the same box). If
            there is only replication activity, then the binlog_commit_wait_usec timeout
            will always occur, so binlog_commit_wait_count is effectively infinite. Which
            might or might not be what was intended.

            knielsen Kristian Nielsen added a comment - > obviously slave parallel_threads was 8 and with binlog_commit_wait_count at > 20 Is this putting it into a bad code path? The patch linked in an earlier comment should fix the lockup. It is related somehow to a transaction deadlock that is detected and resolved automatically, the exact cause of the deadlock is not known at the moment. So there should not be any "putting into bad code path" or similar error on the part of the user here. That being said, with slave_parallel_threads=8 and binlog_commit_wait_count=20, it will not actually be possible to get 20 transactions in one group commit, unless there is other activity going on at the same time (eg. simultaneous non-replication updates on the same box). If there is only replication activity, then the binlog_commit_wait_usec timeout will always occur, so binlog_commit_wait_count is effectively infinite. Which might or might not be what was intended.
            danblack Daniel Black added a comment -

            > The patch linked in an earlier comment should fix the lockup. It is related
            somehow to a transaction deadlock that is detected and resolved automatically,
            the exact cause of the deadlock is not known at the moment.

            Good to know.

            Yes having 20/8 as settings was a mistake. I was looking for a setting combination to avoid this bug however there doesn't appear to be one.

            Sorry to say i've added MDEV-7395 / MDEV-7396 . Hope they are easy.

            danblack Daniel Black added a comment - > The patch linked in an earlier comment should fix the lockup. It is related somehow to a transaction deadlock that is detected and resolved automatically, the exact cause of the deadlock is not known at the moment. Good to know. Yes having 20/8 as settings was a mistake. I was looking for a setting combination to avoid this bug however there doesn't appear to be one. Sorry to say i've added MDEV-7395 / MDEV-7396 . Hope they are easy.
            knielsen Kristian Nielsen added a comment - Follow-up patch: http://lists.askmonty.org/pipermail/commits/2015-January/007227.html

            Fix pushed to 10.0.16.

            knielsen Kristian Nielsen added a comment - Fix pushed to 10.0.16.

            People

              knielsen Kristian Nielsen
              knielsen Kristian Nielsen
              Votes:
              5 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.