[MDEV-7326] Server deadlock in connection with parallel replication - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.15
Fix Version/s: 10.0.16
Component/s: Replication
Labels:
- parallelslave
- replication

Description

A user testing parallel replication is seeing the server hang.
From debugging, it appears that the following happens.

One batch of group-committed transactions from the master all reach their
commit phase, say T1 and T2.

This causes the transactions in the following batch, say T3, T4, and T5, to
wake up and start running.

At this point, T2 (say) gets a deadlock or other temporary error, and needs to
be rolled back and retried. This involves unmark_start_commit(), decrementing
the count of transactions that have already reached their commit step.

Now when T3, T4, and T5 reach their commit step, they do mark_start_commit().
But a following T6 is not woken up, because T2 has not yet done
mark_start_commit().

Then when T2 has been successfully retried and done mark_start_commit(), the
bug is that T6 is not woken up. The wakeup is lost. This is because T2 only
considers the batch with T3-T5 for wakeup, not following batches.

It seems fairly certain that this is the scenario that the user
experienced. It is however unknown at this point how it is possible for T2 to
get a deadlock error, after it has run all its containing queries and has
started the commit step. (The idea is that if there was a deadlock with T1,
then T1 will be blocked from proceeding to mark_start_commit() until T2 has
done rollback; and T2 does unmark_start_commit() before its rollback).

The fix should be to make sure that this case, of T2 retrying after T3-T5 have
started running, is handled correctly: When T2 completes its retry, all
following and possibly waiting transactions should be considered, so the
wakeup is not lost.

The user-visible effect in this hang is that at least one replication worker
threads are stuck in state "Waiting for prior transaction to start commit
before starting next transaction", and all other threads are stuck in this
state or the state "Waiting for prior transaction to commit". (As seen in SHOW
PROCESSLIST). Killing the worker threads will stop replication, and it can
then be re-started successfully.

Attachments

Issue Links

relates to

MDEV-7458 Deadlock in parallel replication can allow following transaction to start replicating too early

Closed

Activity

Ascending order - Click to sort in descending order

Kristian Nielsen added a comment - 2014-12-21 13:42

Testcase and patches:

http://lists.askmonty.org/pipermail/commits/2014-December/007204.html
http://lists.askmonty.org/pipermail/commits/2014-December/007205.html

Kristian Nielsen added a comment - 2014-12-21 13:42 Testcase and patches: http://lists.askmonty.org/pipermail/commits/2014-December/007204.html http://lists.askmonty.org/pipermail/commits/2014-December/007205.html

Daniel Black added a comment - 2014-12-30 04:53

fyi have seen this in production on 10.0.15. Have binlogs, innodb engine status and processlist if this would aid verification. Waiting on client permission to release binlogs.

Id	User	Host	db	Command	Time	State	Info	Progress

3	system user		NULL	Connect	972115	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

4	system user		NULL	Connect	972170	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

5	system user		NULL	Connect	972130	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

6	system user		NULL	Connect	972118	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

7	system user		NULL	Connect	972115	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

8	system user		NULL	Connect	972170	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

9	system user		NULL	Connect	972170	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

10	system user		NULL	Connect	972115	Waiting for prior transaction to start commit before starting next transaction	NULL	0.000

11	system user		NULL	Connect	1299835	Waiting for master to send event	NULL	0.000

12	system user		NULL	Connect	984669	Waiting for room in worker thread event queue	NULL	0.000

8763	replication	192.168.179.103:50232	NULL	Binlog Dump	1292423	Master has sent all binlog to slave; waiting for binlog to be updated	NULL	0.000

22621	backup	localhost	opalauctions_com	Query	164	Writing to net	SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions`	0.000

22637	root	localhost	NULL	Query	0	init	show processlist	0.000

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.006188

          Read_Master_Log_Pos: 88198862

               Relay_Log_File: mysqld-relay-bin.001138

                Relay_Log_Pos: 98608625

        Relay_Master_Log_File: mysql-bin.006103

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 98608338

              Relay_Log_Space: 8518426389

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 972186

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 12022

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: No

                  Gtid_IO_Pos:

=====================================

2014-12-30 01:38:19 7f33f4f24700 INNODB MONITOR OUTPUT

=====================================

Per second averages calculated from the last 54 seconds

-----------------

BACKGROUND THREAD

-----------------

srv_master_thread loops: 344551 srv_active, 0 srv_shutdown, 953736 srv_idle

srv_master_thread log flush and writes: 1298179

----------

SEMAPHORES

----------

OS WAIT ARRAY INFO: reservation count 204986

OS WAIT ARRAY INFO: signal count 340511

Mutex spin waits 1282140, rounds 7709447, OS waits 71441

RW-shared spins 285715, rounds 4307335, OS waits 94460

RW-excl spins 244659, rounds 2516718, OS waits 6129

Spin rounds per wait: 6.01 mutex, 15.08 RW-shared, 10.29 RW-excl

--------

FILE I/O

--------

I/O thread 0 state: waiting for completed aio requests (insert buffer thread)

I/O thread 1 state: waiting for completed aio requests (log thread)

I/O thread 2 state: waiting for completed aio requests (read thread)

I/O thread 3 state: waiting for completed aio requests (read thread)

I/O thread 4 state: waiting for completed aio requests (read thread)

I/O thread 5 state: waiting for completed aio requests (read thread)

I/O thread 6 state: waiting for completed aio requests (write thread)

I/O thread 7 state: waiting for completed aio requests (write thread)

I/O thread 8 state: waiting for completed aio requests (write thread)

I/O thread 9 state: waiting for completed aio requests (write thread)

Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,

 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0

Pending flushes (fsync) log: 0; buffer pool: 0

43921260 OS file reads, 10843564 OS file writes, 9913825 OS fsyncs

107.26 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s

-------------------------------------

INSERT BUFFER AND ADAPTIVE HASH INDEX

-------------------------------------

Ibuf: size 1, free list len 12, seg size 14, 14210 merges

merged operations:

 insert 21673, delete mark 16195, delete 1206

discarded operations:

 insert 0, delete mark 0, delete 0

0.00 hash searches/s, 0.00 non-hash searches/s

---

LOG

---

Log sequence number 897195392457

Log flushed up to   897195392457

Pages flushed up to 897195392457

Last checkpoint at  897195392457

Max checkpoint age    277874566

Checkpoint age target 269190986

Modified age          0

Checkpoint age        0

0 pending log writes, 0 pending chkp writes

8884128 log i/o's done, 0.00 log i/o's/second

----------------------

BUFFER POOL AND MEMORY

...

--------------

ROW OPERATIONS

--------------

0 queries inside InnoDB, 0 queries in queue

1 read views open inside InnoDB

1 RW transactions active inside InnoDB

0 RO transactions active inside InnoDB

1 out of 1000 descriptors used

---OLDEST VIEW---

Normal read view

Read view low limit trx n:o 1163197449

Read view up limit trx id 1163197449

Read view low limit trx id 1163197449

Read view individually stored trx ids:

-----------------

Main thread process no. 2027, id 139861419865856, state: sleeping

Number of rows inserted 1219224, updated 9708367, deleted 1175520, read 4898178761

0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 616.60 reads/s

Number of system rows inserted 8524700, updated 0, deleted 8524373, read 8524734

0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

TRANSACTIONS

------------

Trx id counter 1163197455

Purge done for trx's n:o < 1163195205 undo n:o < 0 state: running but idle

History list length 2464

LIST OF TRANSACTIONS FOR EACH SESSION:

---TRANSACTION 0, not started

MySQL thread id 22637, OS thread handle 0x7f33f4f24700, query id 18691069 localhost root init

show engine innodb status

---TRANSACTION 1163195193, not started

MySQL thread id 3, OS thread handle 0x7f364bd08700, query id 17613862 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163193194, not started

MySQL thread id 4, OS thread handle 0x7f364bcbf700, query id 17611878 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163195196, not started

MySQL thread id 7, OS thread handle 0x7f364bc76700, query id 17613866 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163195070, not started

MySQL thread id 6, OS thread handle 0x7f364bc2d700, query id 17613738 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163194627, not started

MySQL thread id 5, OS thread handle 0x7f3649dc5700, query id 17613298 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163193193, not started

MySQL thread id 8, OS thread handle 0x7f3649d7c700, query id 17611876 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163193192, not started

MySQL thread id 9, OS thread handle 0x7f3649d33700, query id 17611874 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1163195197, not started

MySQL thread id 10, OS thread handle 0x7f3649cea700, query id 17613868 Waiting for prior transaction to start commit before starting next transaction

---TRANSACTION 1146477619, not started

MySQL thread id 11, OS thread handle 0x7f3649ca1700, query id 0 Waiting for master to send event

---TRANSACTION 1146040835, not started

MySQL thread id 1, OS thread handle 0x7f364bd9a700, query id 0 Waiting for background binlog tasks

---TRANSACTION 1163197448, ACTIVE 2372 sec

mysql tables in use 1, locked 0

MySQL thread id 22621, OS thread handle 0x7f3402bf4700, query id 18691034 localhost backup Sending data

SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions`

Trx read view will not see trx with id >= 1163197449, sees < 1163197449

----------------------------

END OF INNODB MONITOR OUTPUT

============================

Daniel Black added a comment - 2014-12-30 04:53 fyi have seen this in production on 10.0.15. Have binlogs, innodb engine status and processlist if this would aid verification. Waiting on client permission to release binlogs. Id User Host db Command Time State Info Progress 3 system user NULL Connect 972115 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 4 system user NULL Connect 972170 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 5 system user NULL Connect 972130 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 6 system user NULL Connect 972118 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 7 system user NULL Connect 972115 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 8 system user NULL Connect 972170 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 9 system user NULL Connect 972170 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 10 system user NULL Connect 972115 Waiting for prior transaction to start commit before starting next transaction NULL 0.000 11 system user NULL Connect 1299835 Waiting for master to send event NULL 0.000 12 system user NULL Connect 984669 Waiting for room in worker thread event queue NULL 0.000 8763 replication 192.168.179.103:50232 NULL Binlog Dump 1292423 Master has sent all binlog to slave; waiting for binlog to be updated NULL 0.000 22621 backup localhost opalauctions_com Query 164 Writing to net SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions` 0.000 22637 root localhost NULL Query 0 init show processlist 0.000 Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.006188 Read_Master_Log_Pos: 88198862 Relay_Log_File: mysqld-relay-bin.001138 Relay_Log_Pos: 98608625 Relay_Master_Log_File: mysql-bin.006103 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 98608338 Relay_Log_Space: 8518426389 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 972186 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 12022 Master_SSL_Crl: Master_SSL_Crlpath: Using_Gtid: No Gtid_IO_Pos: ===================================== 2014-12-30 01:38:19 7f33f4f24700 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 54 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 344551 srv_active, 0 srv_shutdown, 953736 srv_idle srv_master_thread log flush and writes: 1298179 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 204986 OS WAIT ARRAY INFO: signal count 340511 Mutex spin waits 1282140, rounds 7709447, OS waits 71441 RW-shared spins 285715, rounds 4307335, OS waits 94460 RW-excl spins 244659, rounds 2516718, OS waits 6129 Spin rounds per wait: 6.01 mutex, 15.08 RW-shared, 10.29 RW-excl -------- FILE I/O -------- I/O thread 0 state: waiting for completed aio requests (insert buffer thread) I/O thread 1 state: waiting for completed aio requests (log thread) I/O thread 2 state: waiting for completed aio requests (read thread) I/O thread 3 state: waiting for completed aio requests (read thread) I/O thread 4 state: waiting for completed aio requests (read thread) I/O thread 5 state: waiting for completed aio requests (read thread) I/O thread 6 state: waiting for completed aio requests (write thread) I/O thread 7 state: waiting for completed aio requests (write thread) I/O thread 8 state: waiting for completed aio requests (write thread) I/O thread 9 state: waiting for completed aio requests (write thread) Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] , ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 43921260 OS file reads, 10843564 OS file writes, 9913825 OS fsyncs 107.26 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 12, seg size 14, 14210 merges merged operations: insert 21673, delete mark 16195, delete 1206 discarded operations: insert 0, delete mark 0, delete 0 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 897195392457 Log flushed up to 897195392457 Pages flushed up to 897195392457 Last checkpoint at 897195392457 Max checkpoint age 277874566 Checkpoint age target 269190986 Modified age 0 Checkpoint age 0 0 pending log writes, 0 pending chkp writes 8884128 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ... -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue 1 read views open inside InnoDB 1 RW transactions active inside InnoDB 0 RO transactions active inside InnoDB 1 out of 1000 descriptors used ---OLDEST VIEW--- Normal read view Read view low limit trx n:o 1163197449 Read view up limit trx id 1163197449 Read view low limit trx id 1163197449 Read view individually stored trx ids: ----------------- Main thread process no. 2027, id 139861419865856, state: sleeping Number of rows inserted 1219224, updated 9708367, deleted 1175520, read 4898178761 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 616.60 reads/s Number of system rows inserted 8524700, updated 0, deleted 8524373, read 8524734 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s TRANSACTIONS ------------ Trx id counter 1163197455 Purge done for trx's n:o < 1163195205 undo n:o < 0 state: running but idle History list length 2464 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0, not started MySQL thread id 22637, OS thread handle 0x7f33f4f24700, query id 18691069 localhost root init show engine innodb status ---TRANSACTION 1163195193, not started MySQL thread id 3, OS thread handle 0x7f364bd08700, query id 17613862 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163193194, not started MySQL thread id 4, OS thread handle 0x7f364bcbf700, query id 17611878 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163195196, not started MySQL thread id 7, OS thread handle 0x7f364bc76700, query id 17613866 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163195070, not started MySQL thread id 6, OS thread handle 0x7f364bc2d700, query id 17613738 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163194627, not started MySQL thread id 5, OS thread handle 0x7f3649dc5700, query id 17613298 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163193193, not started MySQL thread id 8, OS thread handle 0x7f3649d7c700, query id 17611876 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163193192, not started MySQL thread id 9, OS thread handle 0x7f3649d33700, query id 17611874 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1163195197, not started MySQL thread id 10, OS thread handle 0x7f3649cea700, query id 17613868 Waiting for prior transaction to start commit before starting next transaction ---TRANSACTION 1146477619, not started MySQL thread id 11, OS thread handle 0x7f3649ca1700, query id 0 Waiting for master to send event ---TRANSACTION 1146040835, not started MySQL thread id 1, OS thread handle 0x7f364bd9a700, query id 0 Waiting for background binlog tasks ---TRANSACTION 1163197448, ACTIVE 2372 sec mysql tables in use 1, locked 0 MySQL thread id 22621, OS thread handle 0x7f3402bf4700, query id 18691034 localhost backup Sending data SELECT /*!40001 SQL_NO_CACHE */ * FROM `auctions` Trx read view will not see trx with id >= 1163197449, sees < 1163197449 ---------------------------- END OF INNODB MONITOR OUTPUT ============================

Daniel Black added a comment - 2014-12-30 05:28

 stop  slave;

MariaDB [(none)]> show processlist;

+-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+

| Id    | User             | Host                  | db    | Command     | Time    | State                                                                          | Info               | Progress |

+-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+

|     3 | system user      |                       | NULL  | Connect     |  977011 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|     4 | system user      |                       | NULL  | Connect     |  977066 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|     5 | system user      |                       | NULL  | Connect     |  977026 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|     6 | system user      |                       | NULL  | Connect     |  977014 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|     7 | system user      |                       | NULL  | Connect     |  977011 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|     8 | system user      |                       | NULL  | Connect     |  977066 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|     9 | system user      |                       | NULL  | Connect     |  977066 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|    10 | system user      |                       | NULL  | Connect     |  977011 | Waiting for prior transaction to start commit before starting next transaction | NULL               |    0.000 |

|    11 | system user      |                       | NULL  | Connect     | 1304731 | Waiting for master to send event                                               | NULL               |    0.000 |

|    12 | system user      |                       | NULL  | Connect     |  989565 | Waiting for room in worker thread event queue                                  | NULL               |    0.000 |

|  8763 | replication      | 192.168.179.103:50232 | NULL  | Binlog Dump | 1297319 | Master has sent all binlog to slave; waiting for binlog to be updated          | NULL               |    0.000 |

| 22652 | root             | localhost             | NULL  | Sleep       |    1342 |                                                                                | NULL               |    0.000 |

| 22658 | root             | localhost             | NULL  | Query       |     146 | Killing slave                                                                  | stop  slave        |    0.000 |

| 22659 | root             | localhost             | NULL  | Killed      |      31 | init                                                                           | show slave status  |    0.000 |

| 22660 | debian-sys-maint | localhost             | mysql | Query       |      86 | Filling schema table                                                           | SHOW GLOBAL STATUS |    0.000 |

| 22661 | debian-sys-maint | localhost             | mysql | Query       |      30 | Filling schema table                                                           | SHOW GLOBAL STATUS |    0.000 |

| 22664 | root             | localhost             | NULL  | Killed      |      26 | init                                                                           | show slave status  |    0.000 |

| 22667 | root             | localhost             | NULL  | Query       |       0 | init                                                                           | show processlist   |    0.000 |

+-------+------------------+-----------------------+-------+-------------+---------+--------------------------------------------------------------------------------+--------------------+----------+

kill 3

|     3 | system user      |                       | NULL             | Killed      |  977491 | closing tables                                                                 | NULL                                                     |    0.000 |

|     4 | system user      |                       | NULL             | Connect     |  977546 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|     5 | system user      |                       | NULL             | Connect     |  977506 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|     6 | system user      |                       | NULL             | Connect     |  977494 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|     7 | system user      |                       | NULL             | Connect     |  977491 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|     8 | system user      |                       | NULL             | Connect     |  977546 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|     9 | system user      |                       | NULL             | Connect     |  977546 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|    10 | system user      |                       | NULL             | Connect     |  977491 | Waiting for prior transaction to start commit before starting next transaction | NULL                                                     |    0.000 |

|    11 | system user      |                       | NULL             | Connect     | 1305211 | Waiting for master to send event                                               | NULL                                                     |    0.000 |

|    12 | system user      |                       | NULL             | Connect     |  990045 | Waiting for room in worker thread event queue                                  | NULL                                                     |    0.000 |

|  8763 | replication      | 192.168.179.103:50232 | NULL             | Binlog Dump | 1297799 | Master has sent all binlog to slave; waiting for binlog to be updated          | NULL                                                     |    0.000 |

| 22658 | root             | localhost             | NULL             | Query       |     626 | Killing slave                                                                  | stop  slave

kill user 'debian-sys-maint'

| 22669 | debian-sys-maint | localhost             | mysql            | Killed      |     485 | Filling schema table                                                           | SHOW GLOBAL STATUS                                |    0.000 |

| 22670 | debian-sys-maint | localhost             | mysql            | Killed      |     427 | Filling schema table                                                           | SHOW GLOBAL STATUS

daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230  3:03:03 [ERROR] Slave SQL: Connection was killed, Gtid 0-12022-203533062, Internal MariaDB error code: 1927

daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230  3:03:03 [Warning] Slave: Connection was killed Error_code: 1927

daemon.log.1:Dec 30 03:03:03 media1 mysqld: 141230  3:03:03 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.006103' position 98608338

kill 11

|    11 | system user      |                       | NULL             | Killed      | 1305918 | Waiting for slave mutex on exit                                                | NULL

kill 12

|    12 | system user      |                       | NULL             | Connect     |  990812 | Reading event from the relay log

kill 4,5,6,7,8,9,10

|     3 | system user |                       | NULL             | Connect     |  978378 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|     4 | system user |                       | NULL             | Connect     |  978433 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|     5 | system user |                       | NULL             | Connect     |  978393 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|     6 | system user |                       | NULL             | Connect     |  978378 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|     7 | system user |                       | NULL             | Killed      |  978378 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|     8 | system user |                       | NULL             | Killed      |  978433 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|     9 | system user |                       | NULL             | Killed      |  978433 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

|    10 | system user |                       | NULL             | Killed      |  978378 | Waiting for work from SQL thread                                      | NULL

                |    0.000 |

               Slave_IO_State:

                  Master_Host: 192.168.179.103

                  Master_User: replication

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.006189

          Read_Master_Log_Pos: 42196512

               Relay_Log_File: mysqld-relay-bin.001138

                Relay_Log_Pos: 98608625

        Relay_Master_Log_File: mysql-bin.006103

             Slave_IO_Running: No

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 1927

                   Last_Error: Error 'Connection was killed' on query. Default database: 'opalauctions_com'. Query: 'UPDATE `sessions` SET `data` = 'redire,,,,,rk\";', `date_updated` = '2014-12-18 19:32:57' WHERE `id` =  'uvXXXXXXXX5uo9vcj1''

set global slave_parallel_threads=30; start slave;

master settings:

| binlog_commit_wait_count                | 20        |

| binlog_commit_wait_usec                 | 1000      |

obviously slave parallel_threads was 8 and with binlog_commit_wait_count at 20 Is this putting it into a bad code path?

Kristian Nielsen added a comment - 2014-12-30 15:33

> obviously slave parallel_threads was 8 and with binlog_commit_wait_count at
> 20 Is this putting it into a bad code path?

The patch linked in an earlier comment should fix the lockup. It is related
somehow to a transaction deadlock that is detected and resolved automatically,
the exact cause of the deadlock is not known at the moment. So there should
not be any "putting into bad code path" or similar error on the part of the
user here.

That being said, with slave_parallel_threads=8 and
binlog_commit_wait_count=20, it will not actually be possible to get 20
transactions in one group commit, unless there is other activity going on at
the same time (eg. simultaneous non-replication updates on the same box). If
there is only replication activity, then the binlog_commit_wait_usec timeout
will always occur, so binlog_commit_wait_count is effectively infinite. Which
might or might not be what was intended.

Kristian Nielsen added a comment - 2014-12-30 15:33 > obviously slave parallel_threads was 8 and with binlog_commit_wait_count at > 20 Is this putting it into a bad code path? The patch linked in an earlier comment should fix the lockup. It is related somehow to a transaction deadlock that is detected and resolved automatically, the exact cause of the deadlock is not known at the moment. So there should not be any "putting into bad code path" or similar error on the part of the user here. That being said, with slave_parallel_threads=8 and binlog_commit_wait_count=20, it will not actually be possible to get 20 transactions in one group commit, unless there is other activity going on at the same time (eg. simultaneous non-replication updates on the same box). If there is only replication activity, then the binlog_commit_wait_usec timeout will always occur, so binlog_commit_wait_count is effectively infinite. Which might or might not be what was intended.

Daniel Black added a comment - 2014-12-31 08:00

> The patch linked in an earlier comment should fix the lockup. It is related
somehow to a transaction deadlock that is detected and resolved automatically,
the exact cause of the deadlock is not known at the moment.

Good to know.

Yes having 20/8 as settings was a mistake. I was looking for a setting combination to avoid this bug however there doesn't appear to be one.

Sorry to say i've added ~~MDEV-7395~~ / ~~MDEV-7396~~ . Hope they are easy.

Daniel Black added a comment - 2014-12-31 08:00 > The patch linked in an earlier comment should fix the lockup. It is related somehow to a transaction deadlock that is detected and resolved automatically, the exact cause of the deadlock is not known at the moment. Good to know. Yes having 20/8 as settings was a mistake. I was looking for a setting combination to avoid this bug however there doesn't appear to be one. Sorry to say i've added MDEV-7395 / MDEV-7396 . Hope they are easy.

Kristian Nielsen added a comment - 2015-01-05 14:34

Follow-up patch:

http://lists.askmonty.org/pipermail/commits/2015-January/007227.html

Kristian Nielsen added a comment - 2015-01-05 14:34 Follow-up patch: http://lists.askmonty.org/pipermail/commits/2015-January/007227.html

Kristian Nielsen added a comment - 2015-01-12 09:41

Fix pushed to 10.0.16.

Kristian Nielsen added a comment - 2015-01-12 09:41 Fix pushed to 10.0.16.

People

Assignee:: Kristian Nielsen

Reporter:: Kristian Nielsen

Votes:: 5 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2014-12-15 15:56

Updated:: 2015-01-14 15:14

Resolved:: 2015-01-12 09:41

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server