MariaDB Server / MDEV-31038

Parallel Replication Breaks if XA PREPARE Fails Updating Slave GTID State

Details

    Description

      Similar to MDEV-29642, but instead of a slave crashing: if a parallel slave executing an XA PREPARE fails the follow-up autocommit transaction due to a temporary error that can be retried (e.g. a deadlock when trying to lock mysql.gtid_slave_pos), then the retry runs into the same issue as MDEV-29642, i.e. an out-of-order GTID attempt (if GTID strict mode is enabled) or an "XID already exists" error (otherwise).

      This is because the XA PREPARE had already committed successfully to the binlog and storage engines; only the slave GTID state update failed. On retry, the transaction has already been seen, so we error out.
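
      For context, here is a minimal sketch of the kind of master workload involved (the table name t1 is made up for illustration). On the slave, the XA PREPARE is applied as its own event group, and the GTID is then recorded in mysql.gtid_slave_pos in a separate autocommit transaction, which is the step that fails here:

      -- Hypothetical master workload; t1 is an example InnoDB table.
      XA START 'xid1';
      INSERT INTO t1 VALUES (1);
      XA PREPARE 'xid1';  -- binlogged as its own event group; after applying it,
                          -- the slave updates mysql.gtid_slave_pos in a separate
                          -- autocommit transaction (the step that can deadlock)
      XA COMMIT 'xid1';   -- arrives later as an independent event group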

      Filed as a separate ticket (rather than extending the scope of MDEV-29642) because this is specific to the parallel slave, and may need a separate effort.

      Attachments

        Issue Links

          Activity

            bnestere Brandon Nesterenko added a comment -

            Hi Andrei,

            This is ready for review as PR-2610.
            Elkin Andrei Elkin added a comment -

            A review round suggested a couple of improvements.

            bnestere Brandon Nesterenko added a comment -

            Pushed into 10.5 as 31f09e3.
            vlad.lesin Vladislav Lesin added a comment - edited

            Guys, I think the fix does not actually fix the root cause. From the user's perspective it just replaces one error with another.

            The underlying problem can be approached from two sides:

            1. Replication side. The binlogging of XA PREPARE and the insert into gtid_slave_pos are not atomic. If the XA PREPARE is binlogged and the insert into gtid_slave_pos then fails for some reason, the binlog and gtid_slave_pos are left inconsistent. It would be great if the binlog write and the gtid_slave_pos update were part of the same transaction, which could then be committed or rolled back as a whole.

            2. InnoDB side.

            If the bug is caused by a transaction retry, and there was an error during the gtid_slave_pos insert, the error can be either a deadlock or a lock wait timeout (see the code around the slave_transaction_retry_errors array and the is_parallel_retry_error() call in Xid_apply_log_event::do_record_gtid() for details). Both errors can be caused by InnoDB locks.
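
            For reference, both error codes are in the default retry list, which can be verified at runtime (a quick sanity check, not part of any fix):

            -- 1205 = ER_LOCK_WAIT_TIMEOUT, 1213 = ER_LOCK_DEADLOCK; both appear
            -- in the default value, so either failure triggers a retry.
            SHOW GLOBAL VARIABLES LIKE 'slave_transaction_retry_errors';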

            But when a record is inserted into gtid_slave_pos, a persistent cursor is opened with the PAGE_CUR_LE search mode, see row_ins_clust_index_entry_low() for details. I.e. it tries to find the record less than or equal to the inserted GTID. During the search, btr_pcur_open_low() fills the cursor->up_match and cursor->low_match fields. Here are the comments about these fields:

            ulint   up_match;   /*!< if search mode was PAGE_CUR_LE,
                                     the number of matched fields to the
                                     first user record to the right of
                                     the cursor record after
                                     btr_cur_search_to_nth_level;
                                     ... */
            ulint   low_match;  /*!< if search mode was PAGE_CUR_LE,
                                     the number of matched fields to the
                                     first user record AT THE CURSOR or
                                     to the left of it after
                                     btr_cur_search_to_nth_level;
                                     ... */
            

            gtid_slave_pos has the following primary key:

            *************************** 1. row ***************************
                   Table: gtid_slave_pos
            Create Table: CREATE TABLE `gtid_slave_pos` (
              `domain_id` int(10) unsigned NOT NULL,
              `sub_id` bigint(20) unsigned NOT NULL,
              `server_id` int(10) unsigned NOT NULL,
              `seq_no` bigint(20) unsigned NOT NULL,
              PRIMARY KEY (`domain_id`,`sub_id`)
            ) ENGINE=InnoDB DEFAULT CHARSET=latin1 PAGE_CHECKSUM=1 TRANSACTIONAL=0 COMMENT='Replication slave GTID position'
            

            I.e. if we have the records (0, 1, ...), (0, 4, ...) in gtid_slave_pos and want to insert (0, 3, ...), both cursor->low_match and cursor->up_match will be equal to 1, as both (0, 1, ...) and (0, 4, ...) share the same first field with (0, 3, ...). If gtid_slave_pos had also contained a (0, 3, ...) record, cursor->low_match would equal 1 and cursor->up_match would equal 2, which makes the (cursor->up_match >= n_uniq || cursor->low_match >= n_uniq) condition in row_ins_clust_index_entry_low() true, causing a non-gap shared lock to be acquired on (0, 3, ...) in order to compare all fields (see row_ins_duplicate_error_in_clust()) and check the 'deleted' flag.

            Non-gap locks are converted to gap locks only on record deletion, page split/merge, or page discard; see lock_rec_inherit_to_gap() for details. So if (0, 3, ...) is delete-marked and purged (or the page is split/merged or discarded) after some thread checked it for duplicates, committed the mini-transaction, but has not released its locks yet, an insert of (0, 2, ...) could be blocked.
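
            To make the duplicate-check path concrete, here is a hedged two-session sketch against a toy table with the same primary-key shape (the table name is made up; this demonstrates the non-gap S-lock taken by the duplicate check, not the purge race itself):

            CREATE TABLE t_gtid (
              domain_id INT UNSIGNED NOT NULL,
              sub_id BIGINT UNSIGNED NOT NULL,
              PRIMARY KEY (domain_id, sub_id)
            ) ENGINE=InnoDB;
            INSERT INTO t_gtid VALUES (0,1),(0,4);

            -- Session 1: insert (0,3) and keep the transaction open, so the
            -- record exists in the index but is not yet committed:
            BEGIN;
            INSERT INTO t_gtid VALUES (0,3);

            -- Session 2: the duplicate check (row_ins_duplicate_error_in_clust())
            -- requests a shared, non-gap lock on the conflicting (0,3) record:
            INSERT INTO t_gtid VALUES (0,3);
            -- blocks; fails with ER_DUP_ENTRY if session 1 commits,
            -- succeeds if session 1 rolls back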

            In other words, a non-gap S-lock on a gtid_slave_pos record can happen only if the table contains a duplicate (even a delete-marked one), i.e. a record with the same (`domain_id`, `sub_id`) pair as the record being inserted. If I understand correctly (Elkin, correct me if I am wrong), this is impossible, as (`domain_id`, `sub_id`) is supposed to be unique.

            In the absence of a duplicate (even a delete-marked one), only insert-intention locks are acquired (they don't conflict with each other). Insert-intention locks are not inherited (see lock_rec_inherit_to_gap(), lock_rec_inherit_to_gap_if_gap_lock()). Insert-intention locks conflict only with gap locks. But for that conflict to happen, there must be duplicates in gtid_slave_pos, which is supposed to be impossible.

            I would add printing of transaction lock info (i.e. which locks the transaction holds, on which lock it deadlocked or timed out, and the locks of the conflicting transaction) to understand whether the insert failure was caused by locks, and by which locks exactly. It may also make sense to print the statement of the conflicting transaction, as, theoretically, the locks can be caused not only by slave workers but also by third-party queries.
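
            As a stopgap at the SQL level, live lock waits can be inspected with the INFORMATION_SCHEMA lock tables, for example along these lines (only live waits are visible; an already-resolved deadlock needs SHOW ENGINE INNODB STATUS or the server-side printing suggested above):

            SELECT r.trx_id    AS waiting_trx,
                   r.trx_query AS waiting_query,
                   b.trx_id    AS blocking_trx,
                   b.trx_query AS blocking_query,
                   l.lock_mode, l.lock_type, l.lock_table, l.lock_index
            FROM information_schema.innodb_lock_waits w
            JOIN information_schema.innodb_trx   r ON r.trx_id  = w.requesting_trx_id
            JOIN information_schema.innodb_trx   b ON b.trx_id  = w.blocking_trx_id
            JOIN information_schema.innodb_locks l ON l.lock_id = w.blocking_lock_id;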

            vlad.lesin Vladislav Lesin added a comment - edited

            Here is my attempt to cause the "out-of-order" error: rpl_parallel_optimistic.test.

            I launched it with 30 instances; it ran for about 8 hours and did not catch the bug.


            marko Marko Mäkelä added a comment -

            The rpl_parallel_optimistic.test produced an anomaly that I posted some analysis of in MDEV-28430. It could explain the mystery locking conflicts.

            bnestere Brandon Nesterenko added a comment -

            Closing the issue, as a patch has gone out in a release. Any further work on the corresponding crash-safety concern should be tracked in MDEV-21777.
            Elkin Andrei Elkin added a comment -

            Correcting the relation after discussion with vlad.lesin.

            People

              vlad.lesin Vladislav Lesin
              bnestere Brandon Nesterenko
              Votes: 2
              Watchers: 12

