Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6.2, 10.7(EOL), 10.8(EOL), 10.9(EOL)
-
None
Description
- Test config
- Primary(master) : 192.168.254.24
- Primary(master) : 192.168.254.25
- MaxScale : 192.168.254.26
- test table schema and query.
- max client sessions : 5000 concurrent user
- workload generator and test project file : Jmeter [^sysbench_insert_only_10.6.jmx]
MariaDB [test]> show create table au_test\G
|
*************************** 1. row *************************** |
Table: au_test
|
Create Table: CREATE TABLE `au_test` (
|
`id` bigint(20) NOT NULL AUTO_INCREMENT, |
`b` varchar(10) NOT NULL COMMENT '@@hostname', |
`c` datetime DEFAULT NULL,
|
`d` datetime DEFAULT NULL,
|
`e` bigint(20) DEFAULT NULL, |
`f` varchar(500) DEFAULT NULL, |
`g` varchar(100) DEFAULT NULL, |
PRIMARY KEY (`id`)
|
) ENGINE=InnoDB AUTO_INCREMENT=457948 DEFAULT CHARSET=latin1; |
|
insert into au_test (b,c,d,e,f,g) values (@@hostname,now(), now(), last_insert_id(), (select variable_value from information_schema.session_status where variable_name = 'Rpl_semi_sync_slave_status'), cast(@@gtid_binlog_state as varchar(100))); |
Once master joins as a master once again after `kill -9 $(pidof mariadbd)` against master node, slave status changes to `running` instead of 'slave, running'
Here is the output of `maxctrl list status`:
|
{"timestamp":"2022-05-03 14:26:04.907","command":"maxctrl list servers","status":true,"output":" |
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
|
│ Server │ Address │ Port │ Connections │ State │ GTID │
|
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ primary │ 192.168.254.24 │ 3306 │ 4195 │ Master, Running │ 0-1-7533546 │ |
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ replica │ 192.168.254.25 │ 3306 │ 4072 │ Slave, Running │ 0-2-7496224 │ |
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
|
|
_this is the time when master server was forcefully killed:
|
----------------------------------------------------------------
|
[root@primary my.cnf.d]# date |
Tue May 3 14:26:04 KST 2022 |
[root@primary my.cnf.d]# kill -9 $(pidof mariadbd) |
----------------------------------------------------------------_
|
|
{"timestamp":"2022-05-03 14:26:07.547","command":"maxctrl list servers","status":true,"output":" |
┌─────────┬────────────────┬──────┬─────────────┬────────────────┬─────────────┐
|
│ Server │ Address │ Port │ Connections │ State │ GTID │
|
├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
|
│ primary │ 192.168.254.24 │ 3306 │ 0 │ Down │ 0-1-7533546 │ |
├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
|
│ replica │ 192.168.254.25 │ 3306 │ 4303 │ Slave, Running │ 0-2-7496224 │ |
└─────────┴────────────────┴──────┴─────────────┴────────────────┴─────────────┘
|
|
|
{"timestamp":"2022-05-03 14:26:10.969","command":"maxctrl list servers","status":true,"output":" |
┌─────────┬────────────────┬──────┬─────────────┬────────────────┬─────────────┐
|
│ Server │ Address │ Port │ Connections │ State │ GTID │
|
├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
|
│ primary │ 192.168.254.24 │ 3306 │ 0 │ Down │ 0-1-7533546 │ |
├─────────┼────────────────┼──────┼─────────────┼────────────────┼─────────────┤
|
│ replica │ 192.168.254.25 │ 3306 │ 4344 │ Slave, Running │ 0-2-7496224 │ |
└─────────┴────────────────┴──────┴─────────────┴────────────────┴─────────────┘
|
|
|
{"timestamp":"2022-05-03 14:26:13.606","command":"maxctrl list servers","status":true,"output":" => master was successfully rejoin as master again. |
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
|
│ Server │ Address │ Port │ Connections │ State │ GTID │
|
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ primary │ 192.168.254.24 │ 3306 │ 2275 │ Master, Running │ 0-1-7534828 │ |
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ replica │ 192.168.254.25 │ 3306 │ 4344 │ Slave, Running │ 0-2-7496224 │ |
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
|
|
|
*{"timestamp":"2022-05-03 14:26:15.677","command":"maxctrl list servers","status":true,"output":" => At this point Slave got changed into Running state due to Duplicate key error. |
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
|
│ Server │ Address │ Port │ Connections │ State │ GTID │
|
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ primary │ 192.168.254.24 │ 3306 │ 3069 │ Master, Running │ 0-1-7534828 │ |
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ replica │ 192.168.254.25 │ 3306 │ 0 │ Running │ 0-1-7496290 │ |
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
|
*
|
|
{"timestamp":"2022-05-03 14:26:26.720","command":"maxctrl list servers","status":true,"output":" |
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬─────────────┐
|
│ Server │ Address │ Port │ Connections │ State │ GTID │
|
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ primary │ 192.168.254.24 │ 3306 │ 2927 │ Master, Running │ 0-1-7534828 │ |
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼─────────────┤
|
│ replica │ 192.168.254.25 │ 3306 │ 0 │ Running │ 0-1-7496290 │ |
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴─────────────┘
|
I've run this test for quite some time, but still hitting this error quite a lot.
-slave server error log
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 816 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1577 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 643 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 981 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 812 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1398 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 803 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 857 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 833 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 894 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1788 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 846 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 879 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1118 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 490 |
2022-05-03 14:25:50 0 [Note] Rolled back transaction with xid 1330 |
2022-05-03 14:25:50 0 [Note] Successfully truncated binlog file:./mariadb-bin.000007 from previous file size 1358 to pos:360 to remove transactions starting from GTID 0-2-7496224 |
2022-05-03 14:25:50 0 [Note] Server socket created on IP: '0.0.0.0'. |
2022-05-03 14:25:50 0 [Warning] 'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode. |
2022-05-03 14:25:50 0 [Note] Reading of all Master_info entries succeeded |
2022-05-03 14:25:50 0 [Note] Added new Master_info '' to hash table |
2022-05-03 14:25:50 0 [Note] /usr/sbin/mariadbd: ready for connections. |
Version: '10.6.7-3-MariaDB-enterprise-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Enterprise Server |
2022-05-03 14:25:53 7 [Note] Master connection name: '' Master_info_file: 'master.info' Relay_info_file: 'relay-log.info' |
2022-05-03 14:25:53 7 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='192.168.254.24', master_port='3306', master_log_file='', master_log_pos='4'. |
2022-05-03 14:25:53 7 [Note] Previous Using_Gtid=No. New Using_Gtid=Current_Pos |
2022-05-03 14:25:53 10 [Note] Slave I/O thread: Start semi-sync replication to master 'repl@192.168.254.24:3306' in log '' at position 4 |
2022-05-03 14:25:53 11 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mariadb-relay-bin.000001' position: 4; GTID position '0-2-7496223' |
2022-05-03 14:25:53 10 [Note] Slave I/O thread: connected to master 'repl@192.168.254.24:3306',replication starts at GTID position '0-2-7496223' |
2022-05-03 14:26:04 10 [ERROR] Error reading packet from server: Lost connection to server during query (server_errno=2013) |
2022-05-03 14:26:04 10 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mariadb-bin.000021' at position 41641390; GTID position '0-1-7534857' |
2022-05-03 14:26:04 10 [ERROR] Slave I/O: error reconnecting to master 'repl@192.168.254.24:3306' - retry-time: 60 maximum-retries: 100000 message: Can't connect to server on '192.168.254.24' (111 "Connection refused"), Internal MariaDB error code: 2003 |
2022-05-03 14:26:13 11 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000009, pos: 904), semi-sync up to file , position 0. |
2022-05-03 14:26:13 11 [Note] Semi-sync replication switched OFF. |
2022-05-03 14:26:13 11 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.au_test; Duplicate entry '417427' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mariadb-bin.000021, end_log_pos 22320629, Gtid 0-1-7496291, Internal MariaDB error code: 1062 |
2022-05-03 14:26:13 11 [Warning] Slave: Duplicate entry '417427' for key 'PRIMARY' Error_code: 1062 |
2022-05-03 14:26:13 11 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000021' position 22320161; GTID position '0-1-7496290' |
2022-05-03 14:26:13 11 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000021' at position 22320161; GTID position '0-1-7496290', master: 192.168.254.24:3306 |
2022-05-03 14:27:04 10 [Note] Slave: connected to master 'repl@192.168.254.24:3306',replication resumed in log 'mariadb-bin.000021' at position 41641390 |
2022-05-03 14:31:14 2854 [Warning] Aborted connection 2854 to db: 'test' user: 'allen' host: '192.168.254.26' (Got timeout reading communication packets) |
2022-05-03 14:31:14 2876 [Warning] Aborted connection 2876 to db: 'test' user: 'allen' host: '192.168.254.26' (Got timeout reading communication packets) |
nothing much in master's error log.
2022-05-03 14:26:10 0 [Note] Server socket created on IP: '0.0.0.0'. |
2022-05-03 14:26:10 0 [Warning] 'proxies_priv' entry '@% root@node1' ignored in --skip-name-resolve mode. |
2022-05-03 14:26:10 0 [Note] Reading of all Master_info entries succeeded |
2022-05-03 14:26:10 0 [Note] Added new Master_info '' to hash table |
2022-05-03 14:26:10 0 [Note] /usr/sbin/mariadbd: ready for connections. |
Version: '10.6.7-3-MariaDB-enterprise-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Enterprise Server |
2022-05-03 14:26:14 802 [Note] InnoDB: Number of pools: 2 |
2022-05-03 14:26:21 1851 [Note] InnoDB: Number of pools: 3 |
2022-05-03 14:26:32 354 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000022, pos: 859), semi-sync up to file , position 0. |
2022-05-03 14:26:32 354 [Note] Semi-sync replication switched OFF. |
2022-05-03 14:27:04 1991 [Note] Start binlog_dump to slave_server(2), pos(mariadb-bin.000021, 41641390), using_gtid(1), gtid('0-1-7534857') |
2022-05-03 14:27:04 1991 [Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(./mariadb-bin.000022, 4) |
2022-05-03 14:27:04 1 [Note] Semi-sync replication switched ON with slave (server_id: 2) at (mariadb-bin.000022, 972296) |
rep_master_semi.7z - master(192.168.254.24) server's semi_sync status and show slave status
rep_slave_semi.7z - slave(192.168.254.25) server's semi_sync status and show slave status