[MDEV-30010] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
Created: 2022-11-14  Updated: 2023-10-27  Resolved: 2023-01-24
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.4.27, 10.6.11 |
| Fix Version/s: | 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Rob Schwyzer | Assignee: | Andrei Elkin |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Environment: | CentOS 7 running on 2x Intel Xeon Gold 6134 with 592GB RAM on a PCI-E 3.0 x8 NVMe SSD (https://mariadbcorp.atlassian.net/wiki/spaces/CustomerEngineers/pages/1518829569/snsvr01) |
| Attachments: |
| Issue Links: |
| Description |
Please review the comments for reproduction information. More to the point, this appears to be a direct reproduction of the linked issue. The primary uses binlog_format=mixed; the replica uses the key configurations below:

The table structure uses the following critical columns:

The main workload consists of patterns such as:

As expected with optimistic parallel replication, this kind of workload results in a large number of rollbacks on the replica (but, to be clear, the primary is fine throughout), such as:

Despite these deadlocks happening constantly, in my attempts the replica held out for over 24 hours before finally throwing the error below:

The resolved binary log from this repro is available via a link in the comments. The data left over on the replica after this crash is:

The same query on the primary returns:

SHOW GLOBAL VARIABLES output from the primary and the replica is also attached, while the full error log from the replica is in the comments.
| Comments |
| Comment by Vladislav Lesin [ 2022-11-21 ] |

Some update: I managed to reproduce and record it with rr on 10.6. What I see now in my debug session is that one transaction rolls back a DELETE, and another transaction gets a "duplicate key error" on an INSERT. My suggestion, based on what I saw in the debug session, is that there are 3 groups which can be executed in parallel but must be committed sequentially. The 2nd group conflicts with the 1st group and rolls back to let the 1st group finish its commit. The 3rd group must also be rolled back, but before rolling back it tries to execute an INSERT after the 2nd group has rolled back its DELETE, which causes the "duplicate key error". I am not 100% sure about this scenario, as I am not so familiar with the replication code, and, although Andrei is helping me to understand some details, it will take some more time to check this assumption.
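The suspected interleaving can be sketched with a toy model (my own illustration, not the server's code; the key value 42 and the row contents are made up): three groups execute in parallel but must commit in GTID order; when the 2nd group rolls back its DELETE so the 1st group can commit, the deleted row reappears, and the 3rd group's INSERT of the same key then fails on the replica only.

```python
# Toy model of optimistic parallel replication: work runs in parallel,
# commits must happen in GTID order.  Names and values are illustrative.

class DuplicateKeyError(Exception):
    pass

table = {42: "old row"}          # unique key 42 exists initially

# Group 2 executes DELETE of key 42 (not yet committed); the undo log
# remembers how to put the row back.
undo_log = [("insert-back", 42, table.pop(42))]

# Group 2 conflicts with group 1 and must roll back so group 1 can
# commit first; the rollback re-inserts the deleted row.
for op, key, row in reversed(undo_log):
    if op == "insert-back":
        table[key] = row

def insert(key, row):
    """INSERT with a unique-key check, like the replica applier sees it."""
    if key in table:
        raise DuplicateKeyError(f"Duplicate entry {key!r}")
    table[key] = row

# Group 3 now executes its INSERT.  On the master this ran after the
# DELETE had committed, so it succeeded; here the rolled-back DELETE has
# restored the row and the INSERT hits a duplicate key.
try:
    insert(42, "new row")
except DuplicateKeyError as e:
    print(e)   # prints "Duplicate entry 42"
```

The point of the sketch is only the ordering: the replica-side rollback re-creates state that never coexisted with the later INSERT on the master.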
| Comment by Vladislav Lesin [ 2022-11-29 ] |

There are two records in the secondary index with the same secondary index key but different primary key values: (sk, pk1) = r1 and (sk, pk2) = r2, where sk is a unique secondary index key and pk1 > pk2 (this is important). r1 is delete-marked. Let's denote queries as [UDI]\d+, where U = UPDATE, D = DELETE, I = INSERT, and the number at the end is the sequence number in the GTID. The master's binary log contains the following sequence of statements:

I.e. D366 deletes the record inserted by I355, which is why I371 can insert a record with the same secondary key without a "duplicate key error". The sequence of statement execution on the slave is the following: 1. U395 locks r1. When D366 restores its cursor position at step 4, it restores the position to the same record it was on before suspending, as the record still exists. But, as the record is delete-marked, row_search_mvcc() moves on to the next record; see the locks_ok_del_marked label in row_search_mvcc() for details.
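The skip can be modelled with a toy ordered secondary index (my own sketch, not InnoDB code; the sk/pk values are made up): the cursor is saved on the delete-marked record r1, a record with a smaller pk is committed in front of it while the reader sleeps, and on restore the scan steps forward past the delete-mark, so the new record is never visited.

```python
import bisect

# Toy secondary index: entries are (sk, pk, delete_marked), kept sorted
# by (sk, pk).  Values are illustrative.
index = [(10, 2, False),   # r2 = (sk, pk2)
         (10, 5, True)]    # r1 = (sk, pk1), delete-marked, pk1 > pk2

saved = (10, 5)            # D366's cursor was stored on r1 before suspending

# While D366 waits, another transaction commits a new record with the
# same sk but a smaller pk, which sorts *before* the saved position.
bisect.insort(index, (10, 3, False))

# Restore: the stored record still exists, so the cursor comes back to
# exactly the same (sk, pk) position ...
pos = next(i for i, (sk, pk, _) in enumerate(index) if (sk, pk) == saved)

# ... and because that record is delete-marked, the scan moves to the
# NEXT record (the locks_ok_del_marked behaviour), never visiting
# (10, 3), which lies before the restored position.
while pos < len(index) and index[pos][2]:
    pos += 1

visited = index[pos:]       # records the scan can still see
print(visited)              # prints "[]": (10, 3, False) was skipped
```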
There are several questions which are still open for me:
| Comment by Vladislav Lesin [ 2022-12-02 ] |

I have just attached MDEV-30010.test. There can be several ways to fix it:

1. For secondary indexes, restore the cursor position to the first record with the same key instead of the exact record it was set to before suspending.
2. Check the result of the DELETE on the slave, and if it did not return the same number of deleted rows, just restart it.
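Fix option 2 could look roughly like this (a hypothetical sketch; `apply_delete` and the expected row count stand in for the binlog event execution and the count the master logged, neither of which is shown in this issue):

```python
# Hypothetical sketch of fix option 2: re-run the DELETE event when the
# replica removed fewer rows than the master recorded for that event.
def apply_with_retry(apply_delete, expected_rows, max_retries=3):
    """apply_delete() runs the statement and returns its affected-row
    count; retry until it matches what the master logged."""
    for _ in range(max_retries):
        if apply_delete() == expected_rows:
            return True        # replica converged with the master
    return False               # give up, surface a replication error

# Demo: the first attempt hits the cursor-restore problem and deletes
# 0 rows; the retry sees the index consistently and deletes 1 row.
attempts = iter([0, 1])
print(apply_with_retry(lambda: next(attempts), expected_rows=1))  # prints "True"
```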
| Comment by Vladislav Lesin [ 2022-12-02 ] |

MDEV-30010.test I am afraid the general cause of the issue is the different transaction execution order on master and slave. Even if we fix this particular issue, I am quite sure there will be other issues caused by the same thing. The order of record locking is not defined in general. I would suggest simply not mixing statement-based replication with parallel execution on the slave, to avoid such issues.
| Comment by Vladislav Lesin [ 2022-12-02 ] |

Just a note for a possible future fix. Here is the logic of cursor position restoring used in our case:

The cursor restores the same position because block_when_stored.run_with_hint() returns true.
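The optimistic restore path can be sketched as follows (a simplification of my own in Python, not the actual btr_pcur code; the modify-clock comparison stands in for what run_with_hint() checks):

```python
# Simplified model of persistent-cursor restore: if the page looks
# unchanged since the position was stored, the cursor comes back to the
# very same record without a search; otherwise it re-seeks by key.
class Page:
    def __init__(self, records):
        self.records = records      # sorted list of keys on the page
        self.modify_clock = 0       # bumped on any page modification

class Cursor:
    def __init__(self, page):
        self.page = page

    def store(self, key):
        self.key = key
        self.rec_no = self.page.records.index(key)
        self.clock = self.page.modify_clock

    def restore(self):
        # run_with_hint() analogue: trust the stored position when the
        # page's modify clock is unchanged (optimistic path).
        if self.page.modify_clock == self.clock:
            return self.rec_no
        # Pessimistic path: re-position by the stored key.  Note that
        # this also lands on the same record while it still exists.
        return self.page.records.index(self.key)

page = Page([2, 5, 9])
cur = Cursor(page)
cur.store(5)
print(cur.restore())   # prints "1": optimistic path, same slot as stored
```

Either path returns the cursor to the saved record as long as that record still exists, which is why the delete-marked record is revisited and then skipped forward from.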
| Comment by Vladislav Lesin [ 2022-12-07 ] |

The general question raised during the analysis of this case was why the cursor position is restored to the same (sk, pk) pair in the secondary index in DELETE_366 after awakening, even after INSERT_355 had inserted and committed a new record before that pair. One more question was about the behaviour for primary keys: is it possible to create the same test for the primary index, and if not, then why? Let's try to answer these questions with code analysis.

Secondary index. Let's recall the test case MDEV-30010.test
| Comment by Vladislav Lesin [ 2022-12-08 ] |

Here is an illustration of the MDEV-30010.test scenario.
| Comment by Marko Mäkelä [ 2022-12-09 ] |

vlad.lesin, thank you! I think an important observation is that, due to the way MVCC and locking are implemented in InnoDB, there may be multiple secondary index records that point to the same primary key. In the clustered index (primary key index), there can be at most one record for each potential key; old versions are in a singly linked list of undo log records (pointed to by DB_ROLL_PTR; the validity of those pointers is based on the value of DB_TRX_ID in each record). We should also keep in mind that InnoDB does not have any concept of a "table row lock": InnoDB transactions only know "table locks" and "index record locks". When secondary indexes exist on a table, the type of locking conflict depends on the access path. A DELETE, UPDATE, or similar statement will start with a locking read. That read may use a secondary index, that is, lock a secondary index record before potentially locking the clustered index record. If the INSERT had added the secondary index record before the DELETE looked up the secondary key, the DELETE would still have waited for a lock created by the INSERT, but it would have been on the INSERTed record, not the record that the INSERT read-locked for the duplicate key check. As far as I understand, this problem affects UPDATE and possibly REPLACE as well, and it is not really limited to unique indexes. I see some possible solutions to this.
I would consider any InnoDB change a work-around. I am concerned that changing the logic in InnoDB could cause a serious performance regression for some types of workloads. The "restart the index scan" fix could potentially introduce an infinite loop under suitable circumstances (say, lots of concurrent page splits or merges on the index tree). It would also unnecessarily slow down execution on all servers, even when the affected mode of replication is not being used at all.
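The index-shape observation above (several secondary index entries may point at one primary key, while the clustered index holds at most one record per key) can be pictured with a toy model; this is my own illustration, not an InnoDB data structure, and the key values are made up:

```python
# Toy model: an UPDATE of an indexed column delete-marks the old
# secondary index entry and inserts a new one, so two secondary entries
# point back at the single clustered record for that primary key.
clustered = {7: {"sk": "b", "DB_TRX_ID": 101}}   # pk -> current row only
secondary = [("a", 7, True),                      # old entry, delete-marked
             ("b", 7, False)]                     # current entry

# Every secondary entry for pk=7 refers to the one clustered record;
# old *clustered* versions live in the undo log instead, reached via
# DB_ROLL_PTR rather than extra index records.
entries_for_pk7 = [e for e in secondary if e[1] == 7]
print(len(entries_for_pk7), len([k for k in clustered if k == 7]))  # prints "2 1"
```

This asymmetry is why the delete-marked-record scenario is specific to secondary indexes: in the clustered index there is no second record for the scan to step onto.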
| Comment by Vladislav Lesin [ 2022-12-09 ] |

marko, it seems that if we remove non-gap locks from secondary indexes, the "table row lock" concept will effectively be implemented. And MDEV-30010.test

One more solution would be to request next-key locks for secondary indexes. This would also cause a deadlock error for the DELETE, but the performance impact should be measured.
| Comment by Vladislav Lesin [ 2022-12-15 ] |

After long discussions it was decided that skipping the record is a bug, which was filed separately. I am fixing it.
| Comment by Andrei Elkin [ 2023-01-24 ] |

Closed with 'Done' to indicate that the commit pushed merely extends the test base.