Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Not a Bug
-
10.2.12, 10.2.32, 10.3(EOL)
-
None
Description
There is an async replication setup between two Galera clusters of 3 nodes each. In some cases SQL thread on slave cluster stops with error 1047. Error log content is the following:
...
|
2020-03-26 18:06:27 139923264952064 [Note] Master 'to_master_0': Slave SQL thread initialized, starting replication in log 'binlog.000009' at position 140233980, relay log '/var/lib/mybinlog/relaylog-to_master_0.000012' position: 126225898; GTID position '100-100-5514,200-200-524506'
|
2020-03-26 18:06:55 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-26 18:09:26 139923535603456 [Note] InnoDB: *** Priority TRANSACTION:
|
TRANSACTION 11656002, ACTIVE 0 sec starting index read
|
mysql tables in use 1, locked 1
|
MySQL thread id 1, OS thread handle 139923535603456, query id 6370090 Update_rows_log_event::find_row(5817624)
|
UPDATE impu SET user_state=0 WHERE id=40382
|
2020-03-26 18:09:26 139923535603456 [Note] InnoDB: *** Victim TRANSACTION:
|
TRANSACTION 11655993, ACTIVE 0 sec
|
, undo log entries 3
|
MySQL thread id 66, OS thread handle 139923264952064, query id 6370086 Unlocking tables
|
2020-03-26 18:09:26 139923535603456 [Note] InnoDB: *** WAITING FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 74 page no 94 n bits 720 index PRIMARY of table `bf_lock_test`.`impu` trx id 11655993 lock_mode X locks rec but not gap
|
Record lock, heap no 531 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
|
0: len 4; hex 80009dbe; asc ;;
|
1: len 6; hex 000000b1db39; asc 9;;
|
2: len 7; hex 13000001d813c6; asc ;;
|
3: len 1; hex 80; asc ;;
|
|
2020-03-26 18:09:26 139923535603456 [Note] InnoDB: SQL1: UPDATE impu SET user_state=0 WHERE id=40382��|^d
|
2020-03-26 18:09:26 139923535603456 [Note] InnoDB: SQL2: NULL
|
2020-03-26 18:09:26 139923535603456 [Note] WSREP: cluster conflict due to high priority abort for threads:
|
2020-03-26 18:09:26 139923535603456 [Note] WSREP: Winning thread:
|
THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 5817624
|
SQL: UPDATE impu SET user_state=0 WHERE id=40382��|^d
|
2020-03-26 18:09:26 139923535603456 [Note] WSREP: Victim thread:
|
THD: 66, mode: local, state: committing, conflict: no conflict, seqno: -1
|
SQL: NULL
|
2020-03-26 18:09:26 139923264952064 [ERROR] Master 'to_master_0': Slave SQL: Node has dropped from cluster, Gtid 200-200-549803, Internal MariaDB error code: 1047
|
2020-03-26 18:09:26 139923264952064 [Note] Master 'to_master_0': Slave SQL thread exiting, replication stopped in log 'binlog.000009' at position 146989842; GTID position '100-100-5514,200-200-549802'
|
2020-03-26 18:09:26 139923264952064 [Note] Master 'to_master_0': WSREP: Slave error due to node temporarily non-primarySQL slave will continue
|
2020-03-26 18:09:26 139923264952064 [Note] Master 'to_master_0': WSREP: slave restart: 3
|
2020-03-26 18:09:26 139923264952064 [Note] Master 'to_master_0': WSREP: ready state reached
|
2020-03-26 18:09:26 139923264952064 [Note] Master 'to_master_0': Slave SQL thread initialized, starting replication in log 'binlog.000009' at position 146989842, relay log '/var/lib/mybinlog/relaylog-to_master_0.000012' position: 132322234; GTID position '100-100-5514,200-200-549802'
|
2020-03-26 18:11:38 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-26 18:13:53 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-26 18:23:45 139923535603456 [Note] InnoDB: *** Priority TRANSACTION:
|
TRANSACTION 13221641, ACTIVE 0 sec starting index read
|
mysql tables in use 1, locked 1
|
MySQL thread id 1, OS thread handle 139923535603456, query id 7230511 Update_rows_log_event::find_row(6600241)
|
UPDATE impu SET user_state=1 WHERE id=20997
|
2020-03-26 18:23:45 139923535603456 [Note] InnoDB: *** Victim TRANSACTION:
|
TRANSACTION 13221628, ACTIVE 0 sec
|
, undo log entries 3
|
MySQL thread id 67, OS thread handle 139923264952064, query id 7230504 Unlocking tables
|
2020-03-26 18:23:45 139923535603456 [Note] InnoDB: *** WAITING FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 74 page no 64 n bits 720 index PRIMARY of table `bf_lock_test`.`impu` trx id 13221628 lock_mode X locks rec but not gap
|
Record lock, heap no 586 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
|
0: len 4; hex 80005205; asc R ;;
|
1: len 6; hex 000000c9befc; asc ;;
|
2: len 7; hex 6c000002b40ef4; asc l ;;
|
3: len 1; hex 81; asc ;;
|
|
2020-03-26 18:23:45 139923535603456 [Note] InnoDB: SQL1: UPDATE impu SET user_state=1 WHERE id=20997!�|^d
|
2020-03-26 18:23:45 139923535603456 [Note] InnoDB: SQL2: NULL
|
2020-03-26 18:23:45 139923535603456 [Note] WSREP: cluster conflict due to high priority abort for threads:
|
2020-03-26 18:23:45 139923535603456 [Note] WSREP: Winning thread:
|
THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 6600241
|
SQL: UPDATE impu SET user_state=1 WHERE id=20997!�|^d
|
2020-03-26 18:23:45 139923535603456 [Note] WSREP: Victim thread:
|
THD: 67, mode: local, state: committing, conflict: no conflict, seqno: -1
|
SQL: NULL
|
2020-03-26 18:23:45 139923264952064 [ERROR] Master 'to_master_0': Slave SQL: Node has dropped from cluster, Gtid 200-200-626547, Internal MariaDB error code: 1047
|
2020-03-26 18:23:45 139923264952064 [Note] Master 'to_master_0': Slave SQL thread exiting, replication stopped in log 'binlog.000009' at position 167400561; GTID position '100-100-5514,200-200-626546'
|
2020-03-26 18:23:45 139923264952064 [Note] Master 'to_master_0': WSREP: Slave error due to node temporarily non-primarySQL slave will continue
|
2020-03-26 18:23:45 139923264952064 [Note] Master 'to_master_0': WSREP: slave restart: 3
|
2020-03-26 18:23:45 139923264952064 [Note] Master 'to_master_0': WSREP: ready state reached
|
2020-03-26 18:23:45 139923264952064 [Note] Master 'to_master_0': Slave SQL thread initialized, starting replication in log 'binlog.000009' at position 167400561, relay log '/var/lib/mybinlog/relaylog-to_master_0.000012' position: 150809766; GTID position '100-100-5514,200-200-626546'
|
...
|
So, normally SQL thread restarts to (maybe) hit a conflict on some other row (see different id values in messages above. But in some cases it stops and replication does not continue. Like this:
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** Priority TRANSACTION:
|
TRANSACTION 360061759, ACTIVE 0 sec starting index read
|
mysql tables in use 1, locked 1
|
MySQL thread id 1, OS thread handle 139923535603456, query id 195459563 Update_rows_log_event::find_row(179994284)
|
UPDATE impu SET user_state=0 WHERE id=46832
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** Victim TRANSACTION:
|
TRANSACTION 360061748, ACTIVE 0 sec
|
, undo log entries 3
|
MySQL thread id 1830, OS thread handle 139923258246912, query id 195459555 Unlocking tables
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** WAITING FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 74 page no 104 n bits 720 index PRIMARY of table `bf_lock_test`.`impu` trx id 360061748 lock_mode X locks rec but not gap
|
Record lock, heap no 501 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
|
0: len 4; hex 8000b6f0; asc ;;
|
1: len 6; hex 000015761b34; asc v 4;;
|
2: len 7; hex 7f0000028920e5; asc ;;
|
3: len 1; hex 80; asc ;;
|
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: SQL1: UPDATE impu SET user_state=0 WHERE id=46832K�^d
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: SQL2: NULL
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: cluster conflict due to high priority abort for threads:
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: Winning thread:
|
THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 179994284
|
SQL: UPDATE impu SET user_state=0 WHERE id=46832K�^d
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: Victim thread:
|
THD: 1830, mode: local, state: committing, conflict: no conflict, seqno: -1
|
SQL: NULL
|
2020-03-29 0:48:27 139923258246912 [ERROR] Master 'to_master_0': Slave SQL: Node has dropped from cluster, Gtid 200-200-15220900, Internal MariaDB error code: 1047
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': Slave SQL thread exiting, replication stopped in log 'binlog.000002' at position 353365130; GTID position '100-100-8368143,200-200-15220899'
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': WSREP: Slave error due to node temporarily non-primarySQL slave will continue
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': WSREP: slave restart: 3
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': WSREP: ready state reached
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': Slave SQL thread initialized, starting replication in log 'binlog.000002' at position 353365130, relay log '/var/lib/mybinlog/relaylog-to_master_0.000004' position: 313622044; GTID position '100-100-8368143,200-200-15220899'
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** Priority TRANSACTION:
|
TRANSACTION 360061781, ACTIVE 0 sec starting index read
|
mysql tables in use 1, locked 1
|
MySQL thread id 1, OS thread handle 139923535603456, query id 195459577 Update_rows_log_event::find_row(179994293)
|
UPDATE impu SET user_state=1 WHERE id=46841
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** Victim TRANSACTION:
|
TRANSACTION 360061773, ACTIVE 0 sec
|
, undo log entries 3
|
MySQL thread id 1838, OS thread handle 139923258246912, query id 195459574 Unlocking tables
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** WAITING FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 74 page no 104 n bits 720 index PRIMARY of table `bf_lock_test`.`impu` trx id 360061773 lock_mode X locks rec but not gap
|
Record lock, heap no 510 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
|
0: len 4; hex 8000b6f9; asc ;;
|
1: len 6; hex 000015761b4d; asc v M;;
|
2: len 7; hex 0e000001ce0547; asc G;;
|
3: len 1; hex 81; asc ;;
|
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: SQL1: UPDATE impu SET user_state=1 WHERE id=46841K�^d
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: SQL2: NULL
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: cluster conflict due to high priority abort for threads:
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: Winning thread:
|
THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 179994293
|
SQL: UPDATE impu SET user_state=1 WHERE id=46841K�^d
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: Victim thread:
|
THD: 1838, mode: local, state: committing, conflict: no conflict, seqno: -1
|
SQL: NULL
|
2020-03-29 0:48:27 139923258246912 [ERROR] Master 'to_master_0': Slave SQL: Node has dropped from cluster, Gtid 200-200-15220902, Internal MariaDB error code: 1047
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': Slave SQL thread exiting, replication stopped in log 'binlog.000002' at position 353365612; GTID position '100-100-8368143,200-200-15220901'
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': WSREP: Slave error due to node temporarily non-primarySQL slave will continue
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': WSREP: slave restart: 3
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': WSREP: ready state reached
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': Slave SQL thread initialized, starting replication in log 'binlog.000002' at position 353365612, relay log '/var/lib/mybinlog/relaylog-to_master_0.000004' position: 313622526; GTID position '100-100-8368143,200-200-15220901'
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** Priority TRANSACTION:
|
TRANSACTION 360061796, ACTIVE 0 sec starting index read
|
mysql tables in use 1, locked 1
|
MySQL thread id 1, OS thread handle 139923535603456, query id 195459587 Update_rows_log_event::find_row(179994299)
|
UPDATE impu SET user_state=0 WHERE id=46840
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** Victim TRANSACTION:
|
TRANSACTION 360061795, ACTIVE 0 sec starting index read
|
mysql tables in use 1, locked 1
|
, undo log entries 1
|
MySQL thread id 1839, OS thread handle 139923258246912, query id 195459586
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: *** WAITING FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 74 page no 104 n bits 720 index PRIMARY of table `bf_lock_test`.`impu` trx id 360061795 lock_mode X locks rec but not gap
|
Record lock, heap no 509 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
|
0: len 4; hex 8000b6f8; asc ;;
|
1: len 6; hex 000015761b4f; asc v O;;
|
2: len 7; hex 0f000001c80803; asc ;;
|
3: len 1; hex 81; asc ;;
|
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: SQL1: UPDATE impu SET user_state=0 WHERE id=46840K�^d
|
2020-03-29 0:48:27 139923535603456 [Note] InnoDB: SQL2: NULL
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: cluster conflict due to high priority abort for threads:
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: Winning thread:
|
THD: 1, mode: applier, state: executing, conflict: no conflict, seqno: 179994299
|
SQL: UPDATE impu SET user_state=0 WHERE id=46840K�^d
|
2020-03-29 0:48:27 139923535603456 [Note] WSREP: Victim thread:
|
THD: 1839, mode: local, state: executing, conflict: no conflict, seqno: -1
|
SQL: NULL
|
2020-03-29 0:48:27 139923258246912 [Note] Master 'to_master_0': Slave SQL thread exiting, replication stopped in log 'binlog.000002' at position 353366094; GTID position '100-100-8368143,200-200-15220903'
|
2020-03-29 0:51:31 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-29 0:51:31 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-29 0:55:08 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-29 1:06:04 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-29 1:08:24 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
2020-03-29 1:34:54 139923543996160 [Note] WSREP: Trying to continue unpaused monitor
|
So, async replication had to be monitored and restarted "manually". After that:
2020-03-29 13:38:01 139923259475712 [Note] Master 'to_master_0': WSREP: ready state reached
|
2020-03-29 13:38:01 139923259475712 [Note] Master 'to_master_0': Slave SQL thread initialized, starting replication in log 'binlog.000002' at position 353366094, relay log '/var/lib/mybinlog/relaylog-to_master_0.000004' position: 313623008; GTID position '100-100-8368143,200-200-15220903'
|
Question is: why with wsrep_restart_slave = 1 there are cases when slave restart does not happen automatically? It looks like a bug.