[MDEV-19572] Async slave on Galera node stops due to 1942 Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0: - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.2.16, 10.2.23, 10.1(EOL)
Fix Version/s: 10.1.44, 10.2.30, 10.3.21, 10.4.11
Component/s: Galera, Replication
Labels:
None

Description

When changes to MyISAM table are replicated from async MariaDB 10.1.x master to a node in MariaDB 10.2.x Galera cluster, in GTID-based replication setup, replication eventually breaks with the following slave status:

MariaDB [test]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: ao756

                  Master_User: root

                  Master_Port: 3310

                Connect_Retry: 60

              Master_Log_File: ao756-bin.000004

          Read_Master_Log_Pos: 59590494

               Relay_Log_File: mysql-node1-relay-bin.000002

                Relay_Log_Pos: 14872226

        Relay_Master_Log_File: ao756-bin.000004

             Slave_IO_Running: Yes

            Slave_SQL_Running: No

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 1942

                   Last_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 29774344

              Relay_Log_Space: 44688691

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 1942

               Last_SQL_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

               Master_SSL_Crl:

           Master_SSL_Crlpath:

                   Using_Gtid: Slave_Pos

                  Gtid_IO_Pos: 4100-1-550731

      Replicate_Do_Domain_Ids:

  Replicate_Ignore_Domain_Ids:

                Parallel_Mode: conservative

                    SQL_Delay: 0

          SQL_Remaining_Delay: NULL

      Slave_SQL_Running_State:

1 row in set (0,02 sec)

MariaDB [test]> show variables like 'gtid%';\

+------------------------+---------------+

| Variable_name          | Value         |

+------------------------+---------------+

| gtid_binlog_pos        | 4100-1-380354 |

| gtid_binlog_state      | 4100-1-380354 |

| gtid_current_pos       | 4100-1-380353 |

| gtid_domain_id         | 4100          |

| gtid_ignore_duplicates | OFF           |

| gtid_seq_no            | 0             |

| gtid_slave_pos         | 4100-1-380353 |

| gtid_strict_mode       | OFF           |

+------------------------+---------------+

8 rows in set (0,01 sec)

MariaDB [test]> select * from mysql.gtid_slave_pos;

+-----------+--------+-----------+--------+

| domain_id | sub_id | server_id | seq_no |

+-----------+--------+-----------+--------+

|      4100 | 599107 |         1 | 380352 |

|      4100 | 599108 |         1 | 380353 |

+-----------+--------+-----------+--------+

2 rows in set (0,06 sec)

MariaDB [test]> show create table mysql.gtid_slavee_pos\G

ERROR 1146 (42S02): Table 'mysql.gtid_slavee_pos' doesn't exist

MariaDB [test]> show create table mysql.gtid_slave_pos\G

*************************** 1. row ***************************

       Table: gtid_slave_pos

Create Table: CREATE TABLE `gtid_slave_pos` (

  `domain_id` int(10) unsigned NOT NULL,

  `sub_id` bigint(20) unsigned NOT NULL,

  `server_id` int(10) unsigned NOT NULL,

  `seq_no` bigint(20) unsigned NOT NULL,

  PRIMARY KEY (`domain_id`,`sub_id`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Replication slave GTID position'

1 row in set (0,05 sec)

MariaDB [test]> select version();

+---------------------+

| version()           |

+---------------------+

| 10.2.23-MariaDB-log |

+---------------------+

1 row in set (0,06 sec)

In the error log we see:

...

2019-05-23 16:55:10 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 28452044, relay log './mysql-node1-relay-bin.000002' position: 13549926; GTID position '4100-1-372797'

2019-05-23 16:55:10 140069099460352 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.ttmyis; Duplicate entry '161766' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log ao756-bin.000004, end_log_pos 28452166, Gtid 4100-1-372798, Internal MariaDB error code: 1062

2019-05-23 16:55:10 140069099460352 [Warning] WSREP: SQL statement was ineffective thd: 61940 buf: 126

schema: test

QUERY: COMMIT

=> Skipping replication

2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'test'. Query: 'COMMIT', Gtid 4100-1-372800, Internal MariaDB error code: 1180

2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 4100-1-372800, Internal MariaDB error code: 1047

...

2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 29774344, relay log './mysql-node1-relay-bin.000002' position: 14872226; GTID position '4100-1-380353'

2019-05-23 16:55:54 140069099460352 [Warning] WSREP: BF applier failed to open_and_lock_tables: 0, fatal: 0 wsrep = (exec_mode: 0 conflict_state: 0 seqno: -1)

2019-05-23 16:55:54 140069099460352 [ERROR] Slave SQL: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0: , Gtid 4100-1-380354, Internal MariaDB error code: 1942

2019-05-23 16:55:54 140069099460352 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ao756-bin.000004' position 29774344; GTID position '4100-1-380353'

2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread exiting, replication stopped in log 'ao756-bin.000004' at position 29774344; GTID position '4100-1-380353'

...

To end up like this I've created a fresh MariaDB master started as follows:

bin/mysqld_safe --no-defaults --port=3310 --log-bin --gtid_domain_id=4100 --server_id=1 --binlog_format=row --sync_binlog=0 --innodb_flush_log_at_trx_commit=2 &

and set up also a local 2-node Galera cluster with nodes started as:

bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &

bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &

while related configuration files are:

openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode1.cnf

[mysqld]

datadir=/home/openxs/galera/node1

port=3306

socket=/tmp/mysql-node1.sock

pid-file=/tmp/mysql-node1.pid

log-error=/tmp/mysql-node1.err

binlog_format=ROW

innodb_autoinc_lock_mode=2

wsrep_on=ON

wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_name = singlebox

wsrep_node_name = node1

wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no

log_bin

log_slave_updates=1

binlog_row_image=FULL

gtid_domain_id=4100

wsrep_gtid_domain_id=4100

wsrep_gtid_mode=ON

wsrep_restart_slave=1

server_id=2

slave-skip-errors=1062

innodb_flush_log_at_trx_commit=2

sync_binlog=0

openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode2.cnf

[mysqld]

datadir=/home/openxs/galera/node2

port=3307

socket=/tmp/mysql-node2.sock

pid-file=/tmp/mysql-node2.pid

log-error=/tmp/mysql-node2.err

binlog_format=ROW

innodb_autoinc_lock_mode=2

wsrep_on=ON

wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_name = singlebox

wsrep_node_name = node2

wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no

wsrep_provider_options = "base_port=5020;"

log_bin

log_slave_updates=1

binlog_row_image=FULL

gtid_domain_id=4100

wsrep_gtid_domain_id=4100

wsrep_gtid_mode=ON

wsrep_restart_slave=1

server_id=3

slave-skip-errors=1062

innodb_flush_log_at_trx_commit=2

sync_binlog=0

openxs@ao756:~/dbs/maria10.2$

The I've created two tables on master, in test database:

 CREATE TABLE `ttinno` (

  `id` int(11) NOT NULL AUTO_INCREMENT,

  `c1` int(11) DEFAULT NULL,

  PRIMARY KEY (`id`)

) ENGINE=InnoDB;

CREATE TABLE `ttmyis` (

  `id` int(11) NOT NULL AUTO_INCREMENT,

  `c1` int(11) DEFAULT NULL,

  PRIMARY KEY (`id`)

) ENGINE=MyISAM;

insert into ttinno(c1) values (1), (2), (3);

insert into ttmyis(c1) values (1), (2), (3);

Then I pointed out slave on node1 of cluster to master:

set global gtid_slave_pos='4100-1-0';

change master to master_host='ao756', master_port=3310, master_user='root', master_use_gtid=slave_pos;

and started replication. After checking that initial stat is replicated to cluster I've executed the following mysqlslap commands pointing to async master:

bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='insert into ttmyis(c1) values(rand()*100); insert into ttinno(c1) values(rand()*100);' -uroot --host=127.0.0.1 --port=3310 test

bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttmyis set c1=100 where id = rand()*1000; insert into ttinno(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test

bin/mysqlslap --concurrency=4 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttinno set c1=100 where id = rand()*1000; insert into ttmyis(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test

Second run was eventually cancelled as it was too slow. When the last mysqlslap completed I've checked node1 and noted slave is broken there in thee way explained above.

Attachments

Issue Links

relates to

MDEV-16936 Master and Galera Cluster as slave

Closed

MDEV-18641 "Resource deadlock avoided during COMMIT" when a node leaves the group gracefully

Closed

MDEV-21096 Galera cluster crashed after starting async replication from master standlaone server

Closed

Activity

Transition	Time In Source Status	Execution Times

Stepan Patryshev (Inactive) made transition - 2019-08-20 09:15

Open

In Progress

88d 18h 15m

Seppo Jaakola made transition - 2019-10-10 14:09

In Progress

Stalled

51d 4h 54m

Jan Lindström (Inactive) made transition - 2019-11-25 09:20

Stalled

In Progress

45d 19h 10m

Jan Lindström (Inactive) made transition - 2019-11-25 09:20

In Progress

Closed

27s

People

Assignee:: Jan Lindström (Inactive)

Reporter:: Valerii Kravchuk

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2019-05-23 14:59

Updated:: 2024-07-07 22:47

Resolved:: 2019-11-25 09:20

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration