Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.2.16, 10.2.23, 10.1(EOL)
-
None
Description
When changes to MyISAM table are replicated from async MariaDB 10.1.x master to a node in MariaDB 10.2.x Galera cluster, in GTID-based replication setup, replication eventually breaks with the following slave status:
MariaDB [test]> show slave status\G
|
*************************** 1. row ***************************
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: ao756
|
Master_User: root
|
Master_Port: 3310
|
Connect_Retry: 60
|
Master_Log_File: ao756-bin.000004
|
Read_Master_Log_Pos: 59590494
|
Relay_Log_File: mysql-node1-relay-bin.000002
|
Relay_Log_Pos: 14872226
|
Relay_Master_Log_File: ao756-bin.000004
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: No
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 1942
|
Last_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 29774344
|
Relay_Log_Space: 44688691
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: NULL
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 1942
|
Last_SQL_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 1
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Using_Gtid: Slave_Pos
|
Gtid_IO_Pos: 4100-1-550731
|
Replicate_Do_Domain_Ids:
|
Replicate_Ignore_Domain_Ids:
|
Parallel_Mode: conservative
|
SQL_Delay: 0
|
SQL_Remaining_Delay: NULL
|
Slave_SQL_Running_State:
|
1 row in set (0,02 sec)
|
|
MariaDB [test]> show variables like 'gtid%';\
|
+------------------------+---------------+
|
| Variable_name | Value |
|
+------------------------+---------------+
|
| gtid_binlog_pos | 4100-1-380354 |
|
| gtid_binlog_state | 4100-1-380354 |
|
| gtid_current_pos | 4100-1-380353 |
|
| gtid_domain_id | 4100 |
|
| gtid_ignore_duplicates | OFF |
|
| gtid_seq_no | 0 |
|
| gtid_slave_pos | 4100-1-380353 |
|
| gtid_strict_mode | OFF |
|
+------------------------+---------------+
|
8 rows in set (0,01 sec)
|
|
MariaDB [test]> select * from mysql.gtid_slave_pos;
|
+-----------+--------+-----------+--------+
|
| domain_id | sub_id | server_id | seq_no |
|
+-----------+--------+-----------+--------+
|
| 4100 | 599107 | 1 | 380352 |
|
| 4100 | 599108 | 1 | 380353 |
|
+-----------+--------+-----------+--------+
|
2 rows in set (0,06 sec)
|
|
MariaDB [test]> show create table mysql.gtid_slavee_pos\G
|
ERROR 1146 (42S02): Table 'mysql.gtid_slavee_pos' doesn't exist
|
MariaDB [test]> show create table mysql.gtid_slave_pos\G
|
*************************** 1. row ***************************
|
Table: gtid_slave_pos
|
Create Table: CREATE TABLE `gtid_slave_pos` (
|
`domain_id` int(10) unsigned NOT NULL,
|
`sub_id` bigint(20) unsigned NOT NULL,
|
`server_id` int(10) unsigned NOT NULL,
|
`seq_no` bigint(20) unsigned NOT NULL,
|
PRIMARY KEY (`domain_id`,`sub_id`)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Replication slave GTID position'
|
1 row in set (0,05 sec)
|
|
MariaDB [test]> select version();
|
+---------------------+
|
| version() |
|
+---------------------+
|
| 10.2.23-MariaDB-log |
|
+---------------------+
|
1 row in set (0,06 sec)
|
In the error log we see:
...
|
2019-05-23 16:55:10 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 28452044, relay log './mysql-node1-relay-bin.000002' position: 13549926; GTID position '4100-1-372797'
|
2019-05-23 16:55:10 140069099460352 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.ttmyis; Duplicate entry '161766' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log ao756-bin.000004, end_log_pos 28452166, Gtid 4100-1-372798, Internal MariaDB error code: 1062
|
2019-05-23 16:55:10 140069099460352 [Warning] WSREP: SQL statement was ineffective thd: 61940 buf: 126
|
schema: test
|
QUERY: COMMIT
|
=> Skipping replication
|
2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'test'. Query: 'COMMIT', Gtid 4100-1-372800, Internal MariaDB error code: 1180
|
2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 4100-1-372800, Internal MariaDB error code: 1047
|
...
|
2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 29774344, relay log './mysql-node1-relay-bin.000002' position: 14872226; GTID position '4100-1-380353'
|
2019-05-23 16:55:54 140069099460352 [Warning] WSREP: BF applier failed to open_and_lock_tables: 0, fatal: 0 wsrep = (exec_mode: 0 conflict_state: 0 seqno: -1)
|
2019-05-23 16:55:54 140069099460352 [ERROR] Slave SQL: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0: , Gtid 4100-1-380354, Internal MariaDB error code: 1942
|
2019-05-23 16:55:54 140069099460352 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ao756-bin.000004' position 29774344; GTID position '4100-1-380353'
|
2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread exiting, replication stopped in log 'ao756-bin.000004' at position 29774344; GTID position '4100-1-380353'
|
...
|
To end up like this I've created a fresh MariaDB master started as follows:
bin/mysqld_safe --no-defaults --port=3310 --log-bin --gtid_domain_id=4100 --server_id=1 --binlog_format=row --sync_binlog=0 --innodb_flush_log_at_trx_commit=2 &
|
and set up also a local 2-node Galera cluster with nodes started as:
bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &
|
bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
|
while related configuration files are:
openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode1.cnf
|
[mysqld]
|
datadir=/home/openxs/galera/node1
|
port=3306
|
socket=/tmp/mysql-node1.sock
|
pid-file=/tmp/mysql-node1.pid
|
log-error=/tmp/mysql-node1.err
|
binlog_format=ROW
|
innodb_autoinc_lock_mode=2
|
|
wsrep_on=ON
|
wsrep_provider=/usr/lib/libgalera_smm.so
|
wsrep_cluster_name = singlebox
|
wsrep_node_name = node1
|
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no
|
|
log_bin
|
log_slave_updates=1
|
binlog_row_image=FULL
|
gtid_domain_id=4100
|
wsrep_gtid_domain_id=4100
|
wsrep_gtid_mode=ON
|
wsrep_restart_slave=1
|
server_id=2
|
slave-skip-errors=1062
|
innodb_flush_log_at_trx_commit=2
|
sync_binlog=0
|
openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode2.cnf
|
[mysqld]
|
datadir=/home/openxs/galera/node2
|
port=3307
|
socket=/tmp/mysql-node2.sock
|
pid-file=/tmp/mysql-node2.pid
|
log-error=/tmp/mysql-node2.err
|
binlog_format=ROW
|
innodb_autoinc_lock_mode=2
|
|
wsrep_on=ON
|
wsrep_provider=/usr/lib/libgalera_smm.so
|
wsrep_cluster_name = singlebox
|
wsrep_node_name = node2
|
wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no
|
wsrep_provider_options = "base_port=5020;"
|
|
log_bin
|
log_slave_updates=1
|
binlog_row_image=FULL
|
gtid_domain_id=4100
|
wsrep_gtid_domain_id=4100
|
wsrep_gtid_mode=ON
|
wsrep_restart_slave=1
|
server_id=3
|
slave-skip-errors=1062
|
innodb_flush_log_at_trx_commit=2
|
sync_binlog=0
|
openxs@ao756:~/dbs/maria10.2$
|
The I've created two tables on master, in test database:
CREATE TABLE `ttinno` (
|
`id` int(11) NOT NULL AUTO_INCREMENT,
|
`c1` int(11) DEFAULT NULL,
|
PRIMARY KEY (`id`)
|
) ENGINE=InnoDB;
|
|
CREATE TABLE `ttmyis` (
|
`id` int(11) NOT NULL AUTO_INCREMENT,
|
`c1` int(11) DEFAULT NULL,
|
PRIMARY KEY (`id`)
|
) ENGINE=MyISAM;
|
|
insert into ttinno(c1) values (1), (2), (3);
|
insert into ttmyis(c1) values (1), (2), (3);
|
Then I pointed out slave on node1 of cluster to master:
set global gtid_slave_pos='4100-1-0';
|
change master to master_host='ao756', master_port=3310, master_user='root', master_use_gtid=slave_pos;
|
and started replication. After checking that initial stat is replicated to cluster I've executed the following mysqlslap commands pointing to async master:
bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='insert into ttmyis(c1) values(rand()*100); insert into ttinno(c1) values(rand()*100);' -uroot --host=127.0.0.1 --port=3310 test
|
bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttmyis set c1=100 where id = rand()*1000; insert into ttinno(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test
|
bin/mysqlslap --concurrency=4 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttinno set c1=100 where id = rand()*1000; insert into ttmyis(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test
|
Second run was eventually cancelled as it was too slow. When the last mysqlslap completed I've checked node1 and noted slave is broken there in thee way explained above.
Attachments
Issue Links
- relates to
-
MDEV-16936 Master and Galera Cluster as slave
- Closed
-
MDEV-18641 "Resource deadlock avoided during COMMIT" when a node leaves the group gracefully
- Closed
-
MDEV-21096 Galera cluster crashed after starting async replication from master standlaone server
- Closed