Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.0.17
Description
I encountered 11 instances of the following in a binlog group commit along with one other transaction on a different table.
UPDATE variable SET value='a:0:{}'
|
WHERE ( (name = 'rules_event_whitelist') )
|
the slave settings where:
(replication not in gtid mode)
slave_transaction_retries=10 (default)
slave-parallel-threads=20
master had binlog_commit_wait_count=20
as such the replication was stopped and the error was in the log:
150701 2:01:33 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
|
150701 2:01:33 [ERROR] Slave SQL: Deadlock found when trying to get lock; try restarting transaction, Gtid 0-8-1270304033, Internal MariaDB error code: 1213
|
150701 2:01:33 [Warning] Slave: Connection was killed Error_code: 1927
|
150701 2:01:33 [Warning] Slave: Connection was killed Error_code: 1927
|
150701 2:01:33 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
|
150701 2:01:33 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.010502' position 38545619
|
a) Stopping replication after slave_transaction_retries has been achieved puts me in a much worse state than continuing to try, so is there any case where slave_transaction_retries is useful (apart from catching a coding error)?
b) is slave_transaction_retries = slave-parallel-threads sufficient to resolve this in the mean time? or do i need to set this to factorial(slave-parallel-threads) ?
I read though MDEV-7882 and it looked different. Does anything else post 10.0.17 release change the handling of this situation?
Attachments
Activity
As per IRC, the other important misunderstanding I had was slave_transaction_retries is the retries per transaction rather than the tries per group.
Nothing left here so may as well close.
reopen requested.
10 retries on 20 threads wasn't sufficient.
master binlog_format=MIXED
error log.
|
150918 16:30:29 [Note] Error reading relay log event: slave SQL thread was killed
|
150918 16:30:29 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-20076-622004315, Internal MariaDB error code: 1964
|
150918 16:30:29 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
150918 16:30:29 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'tulwaurt005-mysql-bin.001132' position 1034473944; GTID position '1-20076-244008755,0-20076-622004314,3-20076-143051'
|
150918 16:30:29 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
150918 16:30:29 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'tulwaurt005-mysql-bin.001132' position 1034473944; GTID position '1-20076-244008755,0-20076-622004314,3-20076-143051'
|
150918 16:30:29 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
|
150918 16:30:29 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'tulwaurt005-mysql-bin.001132' position 1034473944; GTID position '1-20076-244008755,0-20076-622004314,3-20076-143051'
|
|
*************************** 1. row ***************************
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: 172.16.194.76
|
Master_User: repl
|
Master_Port: 3310
|
Connect_Retry: 60
|
Master_Log_File: tulwaurt005-mysql-bin.001144
|
Read_Master_Log_Pos: 658256992
|
Relay_Log_File: tulwaurt008-relay-bin.000023
|
Relay_Log_Pos: 4
|
Relay_Master_Log_File: tulwaurt005-mysql-bin.001132
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: No
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 1964
|
Last_Error: Commit failed due to failure of an earlier commit on which this one depends
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 1034473944
|
Relay_Log_Space: 10499635818
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: NULL
|
Master_SSL_Verify_Server_Cert: No
|
|
log from sql_log_error plugin:
|
|
2015-09-18 16:22:17 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:22:17 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:22:17 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:22:17 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:22:17 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:22:17 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:23:30 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:23:30 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:24:26 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:24:26 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:24:26 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:24:26 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:25:05 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:25:05 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:25:29 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:25:29 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:25:30 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:25:30 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:26:26 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:26:26 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:26:26 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:26:26 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:26:26 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:26:26 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:28:21 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:28:21 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:28:54 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:28:54 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:28:54 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:28:54 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:28:54 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:28:54 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:38 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:38 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:38 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:38 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:29:57 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:30:05 [] @ [] ERROR 1927: Connection was killed : (null)
|
2015-09-18 16:30:05 [] @ [] ERROR 1213: Deadlock found when trying to get lock; try restarting transaction : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1927: Connection was killed : COMMIT
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
2015-09-18 16:30:29 [] @ [] ERROR 1964: Commit failed due to failure of an earlier commit on which this one depends : (null)
|
MariaDB [(none)]> show global variables like '%slave%';
|
+-------------------------------+----------------------------------------------------+
|
| Variable_name | Value |
|
+-------------------------------+----------------------------------------------------+
|
| gtid_slave_pos | 1-20076-244008755,0-20076-640910476,3-20076-143051 |
|
| init_slave | |
|
| log_slave_updates | OFF |
|
| slave_compressed_protocol | ON |
|
| slave_ddl_exec_mode | IDEMPOTENT |
|
| slave_domain_parallel_threads | 15 |
|
| slave_exec_mode | STRICT |
|
| slave_load_tmpdir | /u01tmp/mariadb-10/tmp |
|
| slave_max_allowed_packet | 1073741824 |
|
| slave_net_timeout | 3600 |
|
| slave_parallel_max_queued | 131072 |
|
| slave_parallel_threads | 20 |
|
| slave_skip_errors | ALL |
|
| slave_sql_verify_checksum | ON |
|
| slave_transaction_retries | 10 |
|
| slave_type_conversions | |
|
| sql_slave_skip_counter | 0 |
|
|
| innodb_version | 5.6.23-72.1 |
|
| version | 10.0.19-MariaDB-log |
|
There are not really any details here to allow to reproduce or guess at what
the problem could be...
From the timestamps in the error log, it looks like the standard lock wait
timeout of 50 seconds? Are you sure this is not just some conflicting
transaction on another multi-source connection or a manual transaction?
What are the contents of the binlog in the failing group commit, and the
groups around it?
no multisource here. Binlogs on slave (log-slave-updates) show no local transaction.
It does look like the same problem as
MDEV-7882to me.Suppose that T1 and T2 happen to conflict on the slave. Before 10.0.18, we
would deadlock kill T2 when T1 goes to wait. But it was possible for thread
scheduling to let T2 run ahead and once again get the lock before T1. This
causes a new deadlock kill and new race, eventually hitting the 10 retries
limit.
In 10.0.18, it is fixed so that T2 will not retry until T1 (and any other
prior transactions) have finished their commit, avoiding repeated conflicts
and retries.
Until an upgrade to 10.0.18 can be done, maybe a workaround could be to
increase max retries to a really large value, say 10000 or
something. Hopefully thread scheduling will eventually allow T1 to get the
lock before T2. Conflicts between transactions should hopefully be rare in
10.0 parallel replication anyway.