[MDEV-16370] row-based binlog events (updates by primary key) can not be applied multiple times to system versioned tables Created: 2018-06-01 Updated: 2021-07-24 Resolved: 2021-07-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Versioned Tables |
| Affects Version/s: | 10.3.7, 10.3.8 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Valerii Kravchuk | Assignee: | Sergei Golubchik |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | RBR, mysqlbinlog |
| Issue Links: |
|
| Description |
|
Normally one can re-apply row-based events from the binary log (at least single row updates of non-key column by the primary key) as many times as needed. Consider this simple scenario:
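The code blocks of the original report did not survive this export; a minimal reconstruction of such a scenario might look like the following (table and column names are assumptions, not the reporter's originals):

```sql
-- Assumed reconstruction; table/column names are hypothetical.
SET SESSION binlog_format = 'ROW';

CREATE TABLE t1 (
  id  INT PRIMARY KEY,
  val INT
);

INSERT INTO t1 VALUES (1, 1);

-- Two single-row updates of a non-key column, by primary key:
UPDATE t1 SET val = 2 WHERE id = 1;
UPDATE t1 SET val = 3 WHERE id = 1;
```

A binary log produced this way can then be replayed with something like `mysqlbinlog <binlog file> | mysql <db>`.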
As a result we get a binary log with two row-based updates:
Normally we can apply it to the table as many times as needed without errors, as long as initial state of data is consistent (row exists):
Now, let's try to re-create the table as a system versioned one:
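A sketch of what re-creating the table as system-versioned could look like (again, names assumed):

```sql
-- Assumed sketch: the same hypothetical table, now system-versioned.
DROP TABLE t1;
CREATE TABLE t1 (
  id  INT PRIMARY KEY,
  val INT
) WITH SYSTEM VERSIONING;

INSERT INTO t1 VALUES (1, 1);
```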
We can successfully apply binary log once and get the result:
But if we try again we fail with error:
Note two things above:
1. The error message is weird and misleading. We have just one row with primary key value 1, and we updated another column, not the primary key.
2. We were able to perform the first update, but not the second. So I'd say the data is corrupted.

Probably system-versioned tables cannot be treated this way, but then I'd appreciate a proper error message and a documented limitation. I also strongly suspect that the same problem may happen during replication, for example on retries by the SQL thread, or in other, not yet identified cases. At least, this test was created while working on a problem (the same error message for some of multiple single-row updates) while replicating row-based changes to a more complex table from an older 5.5.x MariaDB version into a similar system-versioned table on a MariaDB 10.3.7 slave. |
| Comments |
| Comment by Elena Stepanova [ 2018-06-01 ] |
|
Well, you are applying a log from an essentially different table, which has a different number of columns, different contents and a different key structure. I don't know how much obligation we have to make this work transparently. As for ER_DUP_ENTRY, sadly yes: you have a 2-component PK and you do update a part of it, although implicitly.
I can't argue that the behavior is confusing for a regular user (as many other things about versioning will be), but I don't see a viable alternative; any way we change it is going to be confusing. What should be done differently? Silently drop the previous history? Avoid creating new history while replaying the binlog? Use current timestamps instead of timestamps from the binary log? I think whichever we choose, there will still be the opinion that it's wrong. I'll leave it to serg to decide. |
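For context, the "2-component PK" refers to the fact that on a system-versioned table, unique keys (including the primary key) implicitly include the row_end column; a sketch with assumed names:

```sql
-- Hypothetical table; names are assumptions.
CREATE TABLE t1 (
  id  INT PRIMARY KEY,
  val INT
) WITH SYSTEM VERSIONING;
-- The effective primary key is (id, row_end): updating `val` moves the
-- old row version into history with row_end set to the update time.
-- Replaying the same binlog event at the same timestamp tries to create
-- a second history row with an identical (id, row_end) pair, which
-- raises ER_DUP_ENTRY even though only one *current* row with id = 1
-- exists.
```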
| Comment by Valerii Kravchuk [ 2018-06-01 ] |
|
If the decision is not to change the behavior, then please consider at least changing the error message so that the value of the "implicit" row_end part of the primary key is shown. That would give the user a hint about what exactly is wrong. |
| Comment by Sergei Golubchik [ 2018-06-01 ] |
|
Note, also, that what you're doing goes fundamentally against the concept of versioning. You cannot update a row from 1 to 2 twice at exactly the same point in time, and that is what you're trying to do. If you run the slave with --secure-timestamp=YES, then versioning on the slave will use the slave's system clock, not the master's event timestamp. Then you should be able to apply the row log as many times as you want, each time creating more historical rows with new values for row_start and row_end. |
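A sketch of the suggested configuration (assuming a MariaDB version where the secure_timestamp option is available):

```sql
-- In the slave's my.cnf (option name as suggested above):
--   [mysqld]
--   secure_timestamp = YES

-- The current setting can be inspected at runtime:
SELECT @@global.secure_timestamp;
```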
| Comment by Sergei Golubchik [ 2018-09-19 ] |
|
On second thought, let's try to fix system versioning on the slave to keep row events idempotent. |
| Comment by Aleksey Midenkov [ 2018-11-17 ] |
|
This should be fixed in 10.4 because it introduces an important change to the versioning algorithm: UPDATE and DELETE binlog events change the `row_start` of the historical record if timestamps are applied from the binlog (SECURE_TIMESTAMP is off) and the binlog timestamp is earlier than the `row_start` of the existing record. In this case the historical record's (row_start, row_end) gets the values (binlog timestamp - 1 second, binlog timestamp). |
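A worked illustration of that correction (all timestamps assumed, not taken from an actual server):

```sql
-- Existing current row on the slave:
--   row_start = '2018-11-17 12:00:00'
-- A binlog UPDATE event arrives with an earlier timestamp:
--   binlog timestamp = '2018-11-17 11:30:00'
--
-- Without the correction, the new history row would get
--   (row_start, row_end) = ('2018-11-17 12:00:00', '2018-11-17 11:30:00')
-- i.e. row_end < row_start, an invalid interval.
--
-- With the correction, the history row instead gets
--   (row_start, row_end) = ('2018-11-17 11:29:59', '2018-11-17 11:30:00')
-- i.e. (binlog timestamp - 1 second, binlog timestamp).
```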
| Comment by Aleksey Midenkov [ 2018-11-20 ] |
|
> What should be done differently? Silently drop the previous history? Avoid creating new history while replaying the binlog?

I prefer the latter because: 1) history from before any binlog is more important; 2) it's faster.

> Using current timestamps and not timestamps from the binary log? I think whichever we choose, there will still be opinion that it's wrong.

As serg suggested: timestamps from the binlog are still used, but row_start is corrected. I agree this is questionable, but it should work for the use case of slave initialization, and it is better than using current timestamps. For better security I would recommend a configuration variable, as this is a one-time operation. |
| Comment by Aleksey Midenkov [ 2019-04-29 ] |
|
The task is done with the help of the new option log_bin_send_microseconds. Along with the already mentioned drawbacks of the spoof counter, the major issue is that multiple slaves will get desynchronized history. This is specific to unversioned -> versioned replication. Also, the spoof counter is unstable on binlog replay via the mysql client, because the counter is affected by the RTC and there is no way to keep the sequence of BINLOG commands contiguous (without RTC interference). There is no mention of the mysqldump use case in the task. valerii, please clarify whether this task has anything to do with mysqldump. And if so, serg should describe how SET TIMESTAMP injection should work. |
| Comment by Sergei Golubchik [ 2020-10-02 ] |
|
So, the goal is to make applying a row-based binlog idempotent when applying unversioned events to a versioned table. Another possible approach would be to completely ignore all updates where now() is less than the row's row_start. It won't help inserts, though; inserts will still create duplicate rows, but it's the same without versioning, so that's OK. In general, there are many cases where an RBR binlog cannot be applied many times. As far as I understand, the goal is to be able to re-apply an RBR binlog in the unversioned->versioned mode in all cases where it's possible in the unversioned->unversioned mode. What are these cases? |
| Comment by Andrei Elkin [ 2020-10-02 ] |
|
serg: to complete with all cases where it's possible in the unversioned->unversioned mode, |
| Comment by Andrei Elkin [ 2020-10-02 ] |
|
serg, midenok: apologies if I'm shooting too wildly, but could the historical record be perceived as a constraint? Think of an FK constraint that can be controlled, as a parallel. |
| Comment by Sergei Golubchik [ 2020-10-08 ] |
|
As far as I can see, to solve the "non-idempotent update with UNIQUE" use case it is, indeed, enough to ignore duplicate key errors. Here "ignore" means: do not insert the row that causes a dup key error, and do not abort the statement either. It could be made safer by only ignoring them when it's the history column that is duplicated. In the case of an "idempotent pair of insert+delete", the duplicate key error will be caused by an update, so the above becomes "delete the row that causes a dup key error on update". Now, can this ignoring of errors break some other use cases? It generates corrupted, inconsistent history when the binlog is applied for the second time. I'm not sure it's a good idea. AS OF queries don't expect that (and shouldn't, really). |
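For illustration, the "idempotent pair of insert+delete" case mentioned above might look like this (a hypothetical sketch, names assumed):

```sql
-- On an unversioned table, this pair can be replayed any number of times:
INSERT INTO t1 VALUES (1, 1);
DELETE FROM t1 WHERE id = 1;
-- On a versioned table, the DELETE is executed as an update that sets
-- the row's row_end, turning it into a history row. Replaying the pair
-- with the same binlog timestamps then produces a second history row
-- with the same (id, row_end) pair, so the dup key error is raised by
-- that update, not by the INSERT.
```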