-
Type:
Bug
-
Status: Closed (View Workflow)
-
Priority:
Critical
-
Resolution: Fixed
-
Affects Version/s: 10.1, 10.2.16, 10.2.23
-
Component/s: Galera, Replication
-
Labels:None
When changes to MyISAM table are replicated from async MariaDB 10.1.x master to a node in MariaDB 10.2.x Galera cluster, in GTID-based replication setup, replication eventually breaks with the following slave status:
MariaDB [test]> show slave status\G
|
*************************** 1. row ***************************
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: ao756
|
Master_User: root
|
Master_Port: 3310
|
Connect_Retry: 60
|
Master_Log_File: ao756-bin.000004
|
Read_Master_Log_Pos: 59590494
|
Relay_Log_File: mysql-node1-relay-bin.000002
|
Relay_Log_Pos: 14872226
|
Relay_Master_Log_File: ao756-bin.000004
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: No
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 1942
|
Last_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 29774344
|
Relay_Log_Space: 44688691
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: NULL
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 1942
|
Last_SQL_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 1
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Using_Gtid: Slave_Pos
|
Gtid_IO_Pos: 4100-1-550731
|
Replicate_Do_Domain_Ids:
|
Replicate_Ignore_Domain_Ids:
|
Parallel_Mode: conservative
|
SQL_Delay: 0
|
SQL_Remaining_Delay: NULL
|
Slave_SQL_Running_State:
|
1 row in set (0,02 sec)
|
|
MariaDB [test]> show variables like 'gtid%';\
|
+------------------------+---------------+
|
| Variable_name | Value |
|
+------------------------+---------------+
|
| gtid_binlog_pos | 4100-1-380354 |
|
| gtid_binlog_state | 4100-1-380354 |
|
| gtid_current_pos | 4100-1-380353 |
|
| gtid_domain_id | 4100 |
|
| gtid_ignore_duplicates | OFF |
|
| gtid_seq_no | 0 |
|
| gtid_slave_pos | 4100-1-380353 |
|
| gtid_strict_mode | OFF |
|
+------------------------+---------------+
|
8 rows in set (0,01 sec)
|
|
MariaDB [test]> select * from mysql.gtid_slave_pos;
|
+-----------+--------+-----------+--------+
|
| domain_id | sub_id | server_id | seq_no |
|
+-----------+--------+-----------+--------+
|
| 4100 | 599107 | 1 | 380352 |
|
| 4100 | 599108 | 1 | 380353 |
|
+-----------+--------+-----------+--------+
|
2 rows in set (0,06 sec)
|
|
MariaDB [test]> show create table mysql.gtid_slavee_pos\G
|
ERROR 1146 (42S02): Table 'mysql.gtid_slavee_pos' doesn't exist
|
MariaDB [test]> show create table mysql.gtid_slave_pos\G
|
*************************** 1. row ***************************
|
Table: gtid_slave_pos
|
Create Table: CREATE TABLE `gtid_slave_pos` (
|
`domain_id` int(10) unsigned NOT NULL,
|
`sub_id` bigint(20) unsigned NOT NULL,
|
`server_id` int(10) unsigned NOT NULL,
|
`seq_no` bigint(20) unsigned NOT NULL,
|
PRIMARY KEY (`domain_id`,`sub_id`)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Replication slave GTID position'
|
1 row in set (0,05 sec)
|
|
MariaDB [test]> select version();
|
+---------------------+
|
| version() |
|
+---------------------+
|
| 10.2.23-MariaDB-log |
|
+---------------------+
|
1 row in set (0,06 sec)
|
In the error log we see:
...
|
2019-05-23 16:55:10 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 28452044, relay log './mysql-node1-relay-bin.000002' position: 13549926; GTID position '4100-1-372797'
|
2019-05-23 16:55:10 140069099460352 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.ttmyis; Duplicate entry '161766' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log ao756-bin.000004, end_log_pos 28452166, Gtid 4100-1-372798, Internal MariaDB error code: 1062
|
2019-05-23 16:55:10 140069099460352 [Warning] WSREP: SQL statement was ineffective thd: 61940 buf: 126
|
schema: test
|
QUERY: COMMIT
|
=> Skipping replication
|
2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'test'. Query: 'COMMIT', Gtid 4100-1-372800, Internal MariaDB error code: 1180
|
2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 4100-1-372800, Internal MariaDB error code: 1047
|
...
|
2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 29774344, relay log './mysql-node1-relay-bin.000002' position: 14872226; GTID position '4100-1-380353'
|
2019-05-23 16:55:54 140069099460352 [Warning] WSREP: BF applier failed to open_and_lock_tables: 0, fatal: 0 wsrep = (exec_mode: 0 conflict_state: 0 seqno: -1)
|
2019-05-23 16:55:54 140069099460352 [ERROR] Slave SQL: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0: , Gtid 4100-1-380354, Internal MariaDB error code: 1942
|
2019-05-23 16:55:54 140069099460352 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ao756-bin.000004' position 29774344; GTID position '4100-1-380353'
|
2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread exiting, replication stopped in log 'ao756-bin.000004' at position 29774344; GTID position '4100-1-380353'
|
...
|
To end up like this I've created a fresh MariaDB master started as follows:
bin/mysqld_safe --no-defaults --port=3310 --log-bin --gtid_domain_id=4100 --server_id=1 --binlog_format=row --sync_binlog=0 --innodb_flush_log_at_trx_commit=2 &
|
and set up also a local 2-node Galera cluster with nodes started as:
bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &
|
bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
|
while related configuration files are:
openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode1.cnf
|
[mysqld]
|
datadir=/home/openxs/galera/node1
|
port=3306
|
socket=/tmp/mysql-node1.sock
|
pid-file=/tmp/mysql-node1.pid
|
log-error=/tmp/mysql-node1.err
|
binlog_format=ROW
|
innodb_autoinc_lock_mode=2
|
|
wsrep_on=ON
|
wsrep_provider=/usr/lib/libgalera_smm.so
|
wsrep_cluster_name = singlebox
|
wsrep_node_name = node1
|
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no
|
|
log_bin
|
log_slave_updates=1
|
binlog_row_image=FULL
|
gtid_domain_id=4100
|
wsrep_gtid_domain_id=4100
|
wsrep_gtid_mode=ON
|
wsrep_restart_slave=1
|
server_id=2
|
slave-skip-errors=1062
|
innodb_flush_log_at_trx_commit=2
|
sync_binlog=0
|
openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode2.cnf
|
[mysqld]
|
datadir=/home/openxs/galera/node2
|
port=3307
|
socket=/tmp/mysql-node2.sock
|
pid-file=/tmp/mysql-node2.pid
|
log-error=/tmp/mysql-node2.err
|
binlog_format=ROW
|
innodb_autoinc_lock_mode=2
|
|
wsrep_on=ON
|
wsrep_provider=/usr/lib/libgalera_smm.so
|
wsrep_cluster_name = singlebox
|
wsrep_node_name = node2
|
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no
|
wsrep_provider_options = "base_port=5020;"
|
|
log_bin
|
log_slave_updates=1
|
binlog_row_image=FULL
|
gtid_domain_id=4100
|
wsrep_gtid_domain_id=4100
|
wsrep_gtid_mode=ON
|
wsrep_restart_slave=1
|
server_id=3
|
slave-skip-errors=1062
|
innodb_flush_log_at_trx_commit=2
|
sync_binlog=0
|
openxs@ao756:~/dbs/maria10.2$
|
The I've created two tables on master, in test database:
CREATE TABLE `ttinno` (
|
`id` int(11) NOT NULL AUTO_INCREMENT,
|
`c1` int(11) DEFAULT NULL,
|
PRIMARY KEY (`id`)
|
) ENGINE=InnoDB;
|
|
CREATE TABLE `ttmyis` (
|
`id` int(11) NOT NULL AUTO_INCREMENT,
|
`c1` int(11) DEFAULT NULL,
|
PRIMARY KEY (`id`)
|
) ENGINE=MyISAM;
|
|
insert into ttinno(c1) values (1), (2), (3);
|
insert into ttmyis(c1) values (1), (2), (3);
|
Then I pointed out slave on node1 of cluster to master:
set global gtid_slave_pos='4100-1-0';
|
change master to master_host='ao756', master_port=3310, master_user='root', master_use_gtid=slave_pos;
|
and started replication. After checking that initial stat is replicated to cluster I've executed the following mysqlslap commands pointing to async master:
bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='insert into ttmyis(c1) values(rand()*100); insert into ttinno(c1) values(rand()*100);' -uroot --host=127.0.0.1 --port=3310 test
|
bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttmyis set c1=100 where id = rand()*1000; insert into ttinno(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test
|
bin/mysqlslap --concurrency=4 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttinno set c1=100 where id = rand()*1000; insert into ttmyis(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test
|
Second run was eventually cancelled as it was too slow. When the last mysqlslap completed I've checked node1 and noted slave is broken there in thee way explained above.
- relates to
-
MDEV-16936 Master and Galera Cluster as slave
-
- Open
-
-
MDEV-18641 "Resource deadlock avoided during COMMIT" when a node leaves the group gracefully
-
- Closed
-
-
MDEV-21096 Galera cluster crashed after starting async replication from master standlaone server
-
- Closed
-