[MDEV-8031] Parallel replication stops on "connection killed" error (probably incorrectly handled deadlock kill) Created: 2015-04-21  Updated: 2023-12-19  Resolved: 2015-04-23

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.18, 10.1.5

Type: Bug Priority: Critical
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: parallelslave, replication


 Description   

Parallel replication stopped like this:

150419 11:44:05 [ERROR] Slave SQL: Connection was killed, Gtid 0-187203009-1533130924, Internal MariaDB error code: 1927
150419 11:44:05 [Warning] Slave: Connection was killed Error_code: 1927
150419 11:44:05 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
150419 11:44:05 [Warning] Slave: Connection was killed Error_code: 1927
150419 11:44:05 [Warning] Slave: Connection was killed Error_code: 1927
150419 11:44:05 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000294' position 193637713

This is with conservative mode on 10.1.4. The bug may also apply to 10.0;
this is yet to be determined.

The problem is likely a rare race; so far it has occurred only once, for a
user on a highly loaded system.

The current theory is that a transaction is killed to resolve a normal
deadlock condition (error 1927). That error is converted to a deadlock error
(1213) so that the transaction is retried. Then more deadlock kills arrive,
which is normal (more 1927 errors). For some reason, those subsequent
deadlock kills are not converted into deadlock errors and a transaction
retry (this would be the bug).
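To make the suspected failure mode concrete, here is a minimal Python model of the conversion logic described above. It is a sketch under stated assumptions, not MariaDB's implementation: the functions `classify_errors`, `classify_buggy`, and `should_retry` and the `killed_for_retry` flag are hypothetical, and only the error codes 1213 (`ER_LOCK_DEADLOCK`) and 1927 (`ER_CONNECTION_KILLED`) come from the log. The model assumes the worker retries only when every error it sees maps to a deadlock error.

```python
# Hypothetical model of the theory above; names are illustrative and do
# not correspond to MariaDB's actual C++ code.
ER_LOCK_DEADLOCK = 1213      # "Deadlock found when trying to get lock"
ER_CONNECTION_KILLED = 1927  # "Connection was killed"


def classify_errors(errors, killed_for_retry):
    """Expected behaviour: while the transaction is marked as killed to
    resolve a deadlock, convert EVERY kill error into a deadlock error."""
    result = []
    for code in errors:
        if code == ER_CONNECTION_KILLED and killed_for_retry:
            result.append(ER_LOCK_DEADLOCK)
        else:
            result.append(code)
    return result


def classify_buggy(errors, killed_for_retry):
    """Suspected bug: only the first kill is converted; later deadlock
    kills leak through as plain 'Connection was killed' errors."""
    result = []
    converted_once = False
    for code in errors:
        if (code == ER_CONNECTION_KILLED and killed_for_retry
                and not converted_once):
            result.append(ER_LOCK_DEADLOCK)
            converted_once = True
        else:
            result.append(code)
    return result


def should_retry(classify, errors, killed_for_retry):
    """Assumed policy: retry only if all errors map to a deadlock error;
    otherwise the slave SQL thread stops with the remaining error."""
    converted = classify(errors, killed_for_retry)
    return all(code == ER_LOCK_DEADLOCK for code in converted)
```

With the error sequence from the warnings in the log, `[1927, 1213, 1927, 1927]`, the expected model converts every kill and retries, while the buggy model leaves the later 1927 errors in place and stops, which matches the observed symptom. A kill that was not issued for deadlock resolution (`killed_for_retry=False`) is left unconverted in both models, so it still stops the thread.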



 Comments   
Comment by Kristian Nielsen [ 2015-04-23 ]

http://lists.askmonty.org/pipermail/commits/2015-April/007781.html

Generated at Thu Feb 08 07:24:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.