[MDEV-17059] Replication stopped with error 12605 'This xid is already exist' Created: 2018-08-24  Updated: 2023-09-20

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.3
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Mattias Jonsson Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: None
Environment: Linux

 Description   

The replication SQL thread stopped with error 12605, 'This xid is already exist'.

There are no prepared XA transactions on the Spider head or its data nodes.

After restarting the SQL thread, replication continues, but the rows in the mysql.spider_xa and mysql.spider_xa_member tables still remain.

Should these rows be removed manually, or should Spider remove them automatically?

Slave status after the SQL thread stopped:

spider_head> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: spider_intermediate_mariadb_master
...
              Slave_IO_Running: Yes
             Slave_SQL_Running: No
...
                    Last_Errno: 12605
                    Last_Error: This xid is already exist
                  Skip_Counter: 0
...
                 Last_IO_Errno: 0
                 Last_IO_Error: 
                Last_SQL_Errno: 12605
                Last_SQL_Error: This xid is already exist
...
                    Using_Gtid: No
...
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: 
              Slave_DDL_Groups: 330705
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 157650443
1 row in set (0.00 sec)

Contents of the spider_xa tables after the SQL thread stopped (unchanged after restarting the SQL thread):

spider_head> select * from spider_xa\G
*************************** 1. row ***************************
   format_id: 1
gtrid_length: 5
bqual_length: 7
        data: 5195bb56c949                                                                                                                    
      status: NOT YET
1 row in set (0.00 sec)
 
spider_head> select * from spider_xa_failed_log\G
Empty set (0.00 sec)
 
spider_head> select * from spider_xa_member\G
*************************** 1. row ***************************
             format_id: 1
          gtrid_length: 5
          bqual_length: 7
                  data: 5195bb56c949                                                                                                                    
                scheme: mysql
                  host: datanode1.example.com
                  port: 3306
                socket: 
              username: spider_user
              password: ***************
                ssl_ca: NULL
            ssl_capath: NULL
              ssl_cert: NULL
            ssl_cipher: NULL
               ssl_key: NULL
ssl_verify_server_cert: 0
          default_file: NULL
         default_group: NULL
*************************** 2. row ***************************
             format_id: 1
          gtrid_length: 5
          bqual_length: 7
                  data: 5195bb56c949                                                                                                                    
                scheme: mysql
                  host: datanode2.example.com
                  port: 3306
                socket: 
              username: spider_user
              password: **************
                ssl_ca: NULL
            ssl_capath: NULL
              ssl_cert: NULL
            ssl_cipher: NULL
               ssl_key: NULL
ssl_verify_server_cert: 0
          default_file: NULL
         default_group: NULL
2 rows in set (0.00 sec)

From the error log on the Spider head:

20180822 11:20:48 [SEND SPIDER SQL] from 334171 to [datanode2.example.com] 7230589:  sql: xa end 0x3531393562,0x62353663393439,1
20180822 11:20:48 [ERROR SPIDER RESULT] to 334171: 1159 
20180822 11:20:48 [SEND SPIDER SQL] from 334171 to [datanode1.example.com] 8131371:  sql: xa end 0x3531393562,0x62353663393439,1
20180822 11:20:48 [ERROR SPIDER RESULT] to 334171: 1399 XAER_RMFAIL: The command cannot be executed when global transaction is in the  PREPARED state
20180822 11:20:48 [SEND SPIDER SQL] from 334171 to [datanode2.example.com] 7230589:  sql: xa end 0x3531393562,0x62353663393439,1
2018-08-22 11:20:48 334171 [Warning] WSREP: handlerton rollback failed, thd 334171 528536546 conf 0 SQL (null)
20180822 11:20:48 [SEND SPIDER SQL] from 8 to [datanode2.example.com] 2720391:  sql: show table status from `db` like 't1'
2018-08-22 11:24:54 334171 [ERROR] Slave SQL: This xid is already exist, Gtid 0-190237001-119268216, Internal MariaDB error code: 12605
2018-08-22 11:24:54 334171 [Warning] Slave: This xid is already exist Error_code: 12605
2018-08-22 11:24:54 334171 [Warning] Slave: Got error 12605 "Unknown error 12605" during COMMIT Error_code: 1180
2018-08-22 11:24:54 334171 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.001912' position 25922247
2018-08-22 11:24:54 334170 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-08-22 11:24:54 334170 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-08-22 11:24:54 334170 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.001912' position 25922247
2018-08-22 11:24:54 334172 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-08-22 11:24:54 334172 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-08-22 11:24:54 334172 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.001912' position 25922247
2018-08-22 11:24:54 334171 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-08-22 11:24:54 334171 [Warning] Slave: This xid is already exist Error_code: 12605
2018-08-22 11:24:54 334171 [Warning] Slave: Got error 12605 "Unknown error 12605" during COMMIT Error_code: 1180
2018-08-22 11:24:54 334171 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-08-22 11:24:54 334171 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.001912' position 25922247
2018-08-22 11:24:54 334169 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.001912' at position 25922247

On all data nodes (no prepared XA transactions):

datanode1 [(none)]> xa recover;
Empty set (0.00 sec)

Also, the error message should be changed to 'This xid already exist', removing the ' is'.



 Comments   
Comment by Mattias Jonsson [ 2018-08-24 ]

Spider XA-related settings:

show global variables like '%spider%xa%';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| spider_internal_xa          | ON    |
| spider_internal_xa_id_type  | 0     |
| spider_internal_xa_snapshot | 0     |
| spider_support_xa           | ON    |
| spider_xa_register_mode     | 1     |
+-----------------------------+-------+

Comment by Mattias Jonsson [ 2018-08-27 ]

Could it be that, at the time of a previous crash, the spider_xa and spider_xa_member tables held 'active' rows which, after the crash, were no longer active, were no longer tracked by the Spider engine, and so became orphaned?

Also, since the rows remained after this error and after restarting replication, I assume it can happen again. So the underlying issue is probably that the spider_xa and spider_xa_member tables can be left in an inconsistent state, especially after a crash.
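
One way to look for such an inconsistent state is to compare the rows in spider_xa with the output of `xa recover` collected from every data node. The sketch below works on rows that have already been fetched; it does not connect to any server. The function name `find_orphaned_xids` and the dictionary keys are hypothetical, chosen to mirror the columns shown in this report. Whether it is safe to delete a row reported this way is not established here.

```python
def find_orphaned_xids(spider_xa_rows, prepared_by_node):
    """Return spider_xa rows whose xid is not prepared on any data node.

    spider_xa_rows:   list of dicts with 'format_id' and 'data' keys.
    prepared_by_node: dict mapping a host name to its 'xa recover' rows,
                      each a dict with the same two keys.
    """
    prepared = set()
    for rows in prepared_by_node.values():
        for row in rows:
            # The data column is space padded, so strip before comparing.
            prepared.add((row["format_id"], row["data"].strip()))

    return [
        row
        for row in spider_xa_rows
        if (row["format_id"], row["data"].strip()) not in prepared
    ]


# Data taken from this report: one spider_xa row, nothing prepared on the nodes.
spider_xa_rows = [
    {"format_id": 1, "data": "5195bb56c949   ", "status": "NOT YET"},
]
prepared_by_node = {
    "datanode1.example.com": [],
    "datanode2.example.com": [],
}

orphans = find_orphaned_xids(spider_xa_rows, prepared_by_node)
print([row["data"].strip() for row in orphans])
# ['5195bb56c949']
```

With the values from this report, the single spider_xa row is reported as having no prepared counterpart on either data node, which matches the suspected orphaned state.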
