[MDEV-31441] BLOB corruption on UPDATE of PRIMARY KEY with FOREIGN KEY
Created: 2023-06-09 | Updated: 2023-12-15 | Resolved: 2023-11-29
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.4, 10.5, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3, 10.6.15 |
| Fix Version/s: | 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Matthias Leich | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | rr-profile-analyzed |
| Description |
|
|
| Comments |
| Comment by Marko Mäkelä [ 2023-06-09 ] |
|
The assertion fails because, during the UPDATE of the PRIMARY KEY column col1, we notice that a BLOB is not "owned" by the record. In the trace that I analyzed, the primary key value col1=1 is being updated: UPDATE test.t_p SET col1 = 267. This is supposed to cause delete-marking of the old record 1 and an ownership transfer to the new record 267.
| Comment by Marko Mäkelä [ 2023-06-09 ] |
|
Before the UPDATE that leads to the assertion failure, the primary key index consists of a single page, comprising the following records:

The UPDATE would delete-mark the record 267 and set its transaction ID to 0xa5. It looks like transaction 0xa2 (executing an identical statement) had already done something suspicious: it inserted a disowned BLOB into the new record. Before that UPDATE, the contents of the index were as follows:

This bug does not necessarily have anything to do with FOREIGN KEY constraints. The constraint might merely be useful for achieving (un)fortunate timing with respect to the purge of transaction history. It would seem that frequent updates of the PRIMARY KEY could cause trouble with BLOBs.
| Comment by Marko Mäkelä [ 2023-07-06 ] |
|
I checked this in a little more detail, setting breakpoints on dispatch_command and row_upd_clust_step, and initially setting a watchpoint on PAGE_N_RECS ((char)0xca11b4c036@2). The assertion fails on the parent table t_p. We should be able to ignore any operations on the child table t_c. There are no secondary indexes on either table.
I think that it should be possible to convert this to a single-connection, single-table, single-row test case, something like this:
Timing or purge control might be important. Because we are updating the PRIMARY KEY here, multiple copies of the row will exist in the clustered index. In this trace, PAGE_N_RECS was incremented to 1 by the INSERT, to 2 by the first UPDATE, and to 3 by the UPDATE to 267.
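The record counts above follow from the fact that an UPDATE of the PRIMARY KEY delete-marks the old record and inserts a new one, so the page keeps gaining records until purge removes the delete-marked copies. The following is a minimal toy model of that bookkeeping, not InnoDB code: ToyRec, ToyPage, update_pk and toy_trace_n_recs are hypothetical names, the initial key value 0 is an assumption (the trace does not state it), and purge is assumed not to run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy record: only the key and the delete-mark matter here.
struct ToyRec {
  int pk;
  bool delete_marked;
};

// Hypothetical stand-in for one clustered index leaf page. Not InnoDB code.
struct ToyPage {
  std::vector<ToyRec> recs;

  // Plays the role of PAGE_N_RECS: delete-marked records still count.
  std::size_t n_recs() const { return recs.size(); }

  void insert(int pk) { recs.push_back(ToyRec{pk, false}); }

  // UPDATE of the key: delete-mark the live old record and insert a new one.
  // Returns false if no live record with old_pk exists.
  bool update_pk(int old_pk, int new_pk) {
    for (std::size_t i = 0; i < recs.size(); i++) {
      if (recs[i].pk == old_pk && !recs[i].delete_marked) {
        recs[i].delete_marked = true;
        insert(new_pk);
        return true;
      }
    }
    return false;
  }
};

// Replays the sequence from the trace. The initial key 0 is an assumption.
inline std::size_t toy_trace_n_recs() {
  ToyPage page;
  page.insert(0);                        // INSERT: 1 record
  bool ok = page.update_pk(0, 1);        // first UPDATE: 2 records
  assert(ok);
  ok = page.update_pk(1, 267);           // UPDATE to 267: 3 records
  assert(ok);
  (void)ok;
  return page.n_recs();
}
```

In this model only the record with key 267 is live at the end; the other two are delete-marked copies waiting for purge, which is why timing or purge control can matter for a test case.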
| Comment by Marko Mäkelä [ 2023-11-23 ] |
|
The flag that trips the debug assertion is set in btr_cur_disown_inherited_fields(). It is supposed to be reset on a rollback, according to a comment in its only caller:

Unfortunately, the rr replay trace does not work anymore, because the file /usr/lib/rr/librrpage.so differs from the rr record run. Is this reproducible if the table t_c is removed from the grammar?
| Comment by Matthias Leich [ 2023-11-23 ] |
|
pluto:/data/results/1700764387/TBR-1808-
There was no replay when removing the table t_c from the grammar.
| Comment by Marko Mäkelä [ 2023-11-24 ] |
|
It looks like the PRIMARY KEY(col1) is being updated back and forth quite a bit. The flag is set during the execution of the following statement, which seems to match exactly one row.
It is writing 2 undo log records, one for inserting the updated row and another for delete-marking the old row. After this, we get the assertion failure when executing a statement in another connection:
At the time of the first UPDATE, there is exactly one record in the table, with col1=1. There must be some reason why this is not repeatable by running the following test:
| Comment by Marko Mäkelä [ 2023-11-24 ] |
|
I can reproduce the failure with the following test case:
The issue is that the first UPDATE will copy the ‘disowned’ flag from the original record’s BLOB, so neither of the 2 records in the parent table (the delete-marked 1, or the normal 12) will ‘own’ the BLOB.
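The symptom described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the InnoDB implementation and not the actual patch: BlobRef, BlobRec, update_pk_buggy, update_pk_fixed and count_owners are hypothetical names, the BLOB id 7 in the usage note is arbitrary, and the order of operations in the buggy variant is an assumption chosen to make the symptom visible.

```cpp
#include <cassert>
#include <vector>

// Toy BLOB pointer: which BLOB it refers to and whether this copy is disowned.
struct BlobRef {
  int blob_id;
  bool disowned;
};

// Toy clustered index record with one externally stored column.
struct BlobRec {
  int pk;
  bool delete_marked;
  BlobRef blob;
};

// Assumed buggy order: the old record is disowned first, then its BLOB
// pointer is copied verbatim, so the new record inherits disowned == true.
inline BlobRec update_pk_buggy(BlobRec &old_rec, int new_pk) {
  assert(!old_rec.delete_marked);
  old_rec.blob.disowned = true;
  old_rec.delete_marked = true;
  return BlobRec{new_pk, false, old_rec.blob};
}

// Intended outcome: ownership moves to the new record.
inline BlobRec update_pk_fixed(BlobRec &old_rec, int new_pk) {
  assert(!old_rec.delete_marked);
  BlobRec new_rec{new_pk, false, old_rec.blob};
  new_rec.blob.disowned = false;  // the new record owns the BLOB
  old_rec.blob.disowned = true;   // the delete-marked record does not
  old_rec.delete_marked = true;
  return new_rec;
}

// Number of records that claim ownership of the given BLOB.
inline int count_owners(const std::vector<BlobRec> &recs, int blob_id) {
  int owners = 0;
  for (const BlobRec &r : recs) {
    if (r.blob.blob_id == blob_id && !r.blob.disowned) owners++;
  }
  return owners;
}
```

With a record of key 1 that owns BLOB 7, update_pk_buggy(rec, 12) leaves count_owners at 0 for the pair (delete-marked 1, normal 12), which mirrors the state described in the comment, while update_pk_fixed leaves exactly one owner, the record with key 12.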
| Comment by Marko Mäkelä [ 2023-11-27 ] |
|
mleich, please test the patch on both 10.4 and 10.6.
| Comment by Matthias Leich [ 2023-11-28 ] |
|
| Comment by Vladislav Lesin [ 2023-11-29 ] |
|
Looks good to me.