[MDEV-22351] InnoDB may report incorrect binlog position information after RESET MASTER Created: 2020-04-23 Updated: 2023-12-11 Resolved: 2023-11-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.3.28, 10.4.18, 10.5.9 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sujatha Sivakumar (Inactive) | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Description |
|
Warning Even though this bug is marked "fixed", the problem with wrong binlog position was not fixed until 10.4.32 RESET MASTER should be propagated to InnoDB to actually clear all information from the rseg header pages. Observation: Start a server with binary log option being enabled.
Shutdown and restart server Debug prints: From mysqld.1.err
Execute RESET MASTER on restarted server. // This will clear existing binary logs and a fresh 'master-bin.000001' is created.
Shutdown and restart. Observe the following output.
Shutdown and restart server. A new binary log is created.
shutdown and restart. Since 'TRX_RSEG' file and pos are > 'recovered_binlog_name/offset' the 'recovered_binlog_name/offset' get proper assignments.
Following debug prints were used to observe the behavior:
|
| Comments |
| Comment by Marko Mäkelä [ 2020-04-23 ] | |||||
|
sujatha.sivakumar, we need a handlerton call that will be invoked by RESET MASTER. Once we have that interface, InnoDB can implement it, to clear every copy of binlog information from the TRX_SYS page and the undo log header pages. | |||||
| Comment by Marko Mäkelä [ 2020-04-23 ] | |||||
|
The change is on bb-10.3-marko. | |||||
| Comment by Marko Mäkelä [ 2021-01-21 ] | |||||
|
serg, Elkin, even though a different solution was chosen for What mechanism does the binlog recovery use to determine which transactions need to be replayed to the storage engines? Is it based on the internal XA transactions? That is, will the binlog recovery only invoke XA COMMIT or XA ROLLBACK internally? I want to ensure that it is acceptable to remove this output:
Sample output:
This is covered by the test innodb.group_commit_binlog_pos_no_optimize_thread and mariabackup.binlog (without actually validating that the position matches the expectations). There is also something similar in RocksDB, but I do not know if it is affected by this RESET MASTER bug. Because I do not like corruption bugs, so I would either prefer to fix this bug or remove the functionality as ‘not longer needed’ in the upcoming 10.6 release. | |||||
| Comment by Marko Mäkelä [ 2021-01-22 ] | |||||
|
When it comes to fixing the bug in 10.3, 10.4, 10.5, I have an alternative idea that I did not prototype yet. On recovery, we could try to gather the binlog position from the page that carries the largest FIL_PAGE_LSN (or newest_modification). In that way, even if RESET MASTER and some subsequent commands caused the binlog position to go backwards, we should get the latest version of the position. Currently, trx_rseg_mem_restore() is ignoring the LSN and taking the greatest available binlog file name or log position, even if it occurred on an older page. | |||||
| Comment by Marko Mäkelä [ 2021-01-22 ] | |||||
|
sujatha.sivakumar, can you please test whether the bb-10.3- | |||||
| Comment by Andrei Elkin [ 2021-01-22 ] | |||||
|
marko, serg: to trx_sys.recovered_binlog, MDEV-1895 may need actually it, or gtid-format equivalent 'cos uncertainty | |||||
| Comment by Sujatha Sivakumar (Inactive) [ 2021-01-25 ] | |||||
|
Hello Marko, Thank you for fixing this issue. | |||||
| Comment by Marko Mäkelä [ 2021-01-25 ] | |||||
|
Thank you for verifying my fix, sujatha.sivakumar! I’d still like to remove the storage of the binlog information from InnoDB in an upcoming major release (such as 10.6), but that is a separate discussion. | |||||
| Comment by Marko Mäkelä [ 2022-05-12 ] | |||||
|
Apparently, the main reason to have InnoDB store the binlog position is backup. That could be sorted out in MDEV-21611. | |||||
| Comment by Kristian Nielsen [ 2023-09-13 ] | |||||
|
I think the fix for this bug is incorrect, and introduces a regression. Reverting this fix makes the testcase pass. | |||||
| Comment by Marko Mäkelä [ 2023-11-01 ] | |||||
|
knielsen, I think that we should consider reverting to something that resembles my initial fix. It would not involve any file format changes. If I remember correctly, this simple solution was rejected by serg back then. | |||||
| Comment by Kristian Nielsen [ 2023-11-03 ] | |||||
|
I implemented a proof-of-concept patch that avoids the need to strcmp() binlog file names, using a version number instead associated with each binlog position entry: https://github.com/MariaDB/server/commits/knielsen_mdev22351 The disadvantage is that it requires extending the format of the rseg page header, and requires an extra 8 byte redo log write at every commit. As Marko wrote, for this reason we for now prefer to keep the current strcmp() method (after reverting the original patch for The proof-of-concept patch includes a test case for the RESET MASTER problem that can be used with an alternative fix. | |||||
| Comment by Kristian Nielsen [ 2023-11-03 ] | |||||
|
Re-opening as the original fix causes regression and will be reverted. | |||||
| Comment by Sergei Golubchik [ 2023-11-03 ] | |||||
|
knielsen, it'll be very difficult to see from Jira what's happening with the code. Jira isn't good at recording complex histories like "fixed in versions X1, Y1, Z1, reverted in Y2, Z2, fixed in Z3". Better create a new issue for that and keep this one with FixVersion showing where the fix fix was pushed. | |||||
| Comment by Marko Mäkelä [ 2023-11-03 ] | |||||
|
knielsen, I see that you reverted the pushed fix. Please file a new MDEV for fixing this correctly and link it from here. | |||||
| Comment by Kristian Nielsen [ 2023-11-03 ] | |||||
|
MDEV-32673 filed for the new issue. | |||||
| Comment by Kristian Nielsen [ 2023-11-03 ] | |||||
|
I'm though not sure what is the benefit of marking this bug "fixed" when it was never fixed? There was a patch pushed, but it did not fix the problem. Before the original patch, the binlog position can be wrong when CHANGE MASTER is done, but is correct when no CHANGE MASTER is done. After the original patch was pushed, the binlog position can be wrong both when CHANGE MASTER is used and when CHANGE MASTER is not used. So the problem was never fixed It's not an issue for me whatever status is set for this | |||||
| Comment by Marko Mäkelä [ 2023-11-07 ] | |||||
|
I think that we must fix MDEV-32673 in order to avoid reintroducing the originally reported bug, which was that InnoDB could report a wrong recovered binlog position after RESET MASTER was executed (or log_bin_basename was changed). Yes, it was right to revert the initially implemented |