[MDEV-22351] InnoDB may report incorrect binlog position information after RESET MASTER Created: 2020-04-23  Updated: 2023-12-11  Resolved: 2023-11-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4, 10.5
Fix Version/s: 10.3.28, 10.4.18, 10.5.9

Type: Bug Priority: Major
Reporter: Sujatha Sivakumar (Inactive) Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-18959 Engine transaction recovery through p... Stalled
Problem/Incident
is caused by MDEV-15158 On commit, do not write to the TRX_SY... Closed
Relates
relates to MDEV-21611 mariabackup should update binlog posi... Open
relates to MDEV-26322 Last binlog file and position are "em... Closed
relates to MDEV-32673 Handle InnoDB binlog position recover... Stalled

 Description   
Warning

Even though this bug is marked "fixed", the problem with wrong binlog position was not fixed until 10.4.32

RESET MASTER should be propagated to InnoDB to actually clear all information from the rseg header pages.

Observation:

Start a server with binary log option being enabled.
Create a simple table and insert couple of rows.

MariaDB [test]> show binlog events;
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                                 |
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
| master-bin.000001 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.23-MariaDB-debug-log, Binlog ver: 4 |
| master-bin.000001 | 256 | Gtid_list         |         1 |         285 | []                                                   |
| master-bin.000001 | 285 | Binlog_checkpoint |         1 |         329 | master-bin.000001                                    |
| master-bin.000001 | 329 | Gtid              |         1 |         371 | GTID 0-1-1                                           |
| master-bin.000001 | 371 | Query             |         1 |         481 | use `test`; create table t ( f int ) engine=innodb   |
| master-bin.000001 | 481 | Gtid              |         1 |         523 | BEGIN GTID 0-1-2                                     |
| master-bin.000001 | 523 | Query             |         1 |         620 | use `test`; insert into t values (10)                |
| master-bin.000001 | 620 | Xid               |         1 |         651 | COMMIT /* xid=7 */                                   |
| master-bin.000001 | 651 | Gtid              |         1 |         693 | BEGIN GTID 0-1-3                                     |
| master-bin.000001 | 693 | Query             |         1 |         790 | use `test`; insert into t values (20)                |
| master-bin.000001 | 790 | Xid               |         1 |         821 | COMMIT /* xid=8 */                                   |
| master-bin.000001 | 821 | Stop              |         1 |         844 |                                                      |
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+

Shutdown and restart server

Debug prints: From mysqld.1.err

****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:651
----recovered_binlog_name:
----recovered_binlog_offset:0
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:651
----recovered_binlog_name:./master-bin.000001
----recovered_binlog_offset:651
****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:821
----recovered_binlog_name:./master-bin.000001
----recovered_binlog_offset:651
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:821
----recovered_binlog_name:./master-bin.000001
----recovered_binlog_offset:821

Execute RESET MASTER on restarted server. // This will clear existing binary logs and a fresh 'master-bin.000001' is created.

MariaDB [test]> RESET MASTER;
Query OK, 0 rows affected (0.002 sec)
 
MariaDB [test]> create table t2( f int) engine=innodb;
Query OK, 0 rows affected (0.006 sec)
 
MariaDB [test]> insert into t2 values (30);
Query OK, 1 row affected (0.003 sec)
 
MariaDB [test]> show binlog events;
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                                 |
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
| master-bin.000001 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.23-MariaDB-debug-log, Binlog ver: 4 |
| master-bin.000001 | 256 | Gtid_list         |         1 |         285 | []                                                   |
| master-bin.000001 | 285 | Binlog_checkpoint |         1 |         329 | master-bin.000001                                    |
| master-bin.000001 | 329 | Gtid              |         1 |         371 | GTID 0-1-1                                           |
| master-bin.000001 | 371 | Query             |         1 |         480 | use `test`; create table t2( f int) engine=innodb    |
| master-bin.000001 | 480 | Gtid              |         1 |         522 | BEGIN GTID 0-1-2                                     |
| master-bin.000001 | 522 | Query             |         1 |         620 | use `test`; insert into t2 values (30)               |
| master-bin.000001 | 620 | Xid               |         1 |         651 | COMMIT /* xid=9 */                                   |
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+

Shutdown and restart. Observe the following output.
'recovered_binlog_name' and 'position' are still referring to old 'master-bin.000001' and '821'.

****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:651
----recovered_binlog_name:
----recovered_binlog_offset:0
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:651
----recovered_binlog_name:./master-bin.000001
----recovered_binlog_offset:651
****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:821
----recovered_binlog_name:./master-bin.000001
----recovered_binlog_offset:651
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:821
----recovered_binlog_name:./master-bin.000001
----recovered_binlog_offset:821

Shutdown and restart server.

A new binary log is created.
Now do a DML. With this latest binlog file name and pos are 'master-bin.000002' and pos=514

MariaDB [test]> show binlog events in 'master-bin.000002';
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                                 |
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
| master-bin.000002 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.23-MariaDB-debug-log, Binlog ver: 4 |
| master-bin.000002 | 256 | Gtid_list         |         1 |         299 | [0-1-2]                                              |
| master-bin.000002 | 299 | Binlog_checkpoint |         1 |         343 | master-bin.000002                                    |
| master-bin.000002 | 343 | Gtid              |         1 |         385 | BEGIN GTID 0-1-3                                     |
| master-bin.000002 | 385 | Query             |         1 |         483 | use `test`; insert into t2 VALUES (40)               |
| master-bin.000002 | 483 | Xid               |         1 |         514 | COMMIT /* xid=5 */                                   |
+-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+

shutdown and restart. Since 'TRX_RSEG' file and pos are > 'recovered_binlog_name/offset' the 'recovered_binlog_name/offset' get proper assignments.

****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000002
----TRX_RSEG: binlog_offset:514
----recovered_binlog_name:
----recovered_binlog_offset:0
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000002
----TRX_RSEG: binlog_offset:514
----recovered_binlog_name:./master-bin.000002
----recovered_binlog_offset:514
****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:651
----recovered_binlog_name:./master-bin.000002
----recovered_binlog_offset:514
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:651
----recovered_binlog_name:./master-bin.000002
----recovered_binlog_offset:514
****** BEFORE *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:821
----recovered_binlog_name:./master-bin.000002
----recovered_binlog_offset:514
****** AFTER *********
----TRX_RSEG: binlog name:./master-bin.000001
----TRX_RSEG: binlog_offset:821
----recovered_binlog_name:./master-bin.000002
----recovered_binlog_offset:514

Following debug prints were used to observe the behavior:

diff --git a/storage/innobase/trx/trx0rseg.cc b/storage/innobase/trx/trx0rseg.cc
index 46fb2680371..842718ce81d 100644
--- a/storage/innobase/trx/trx0rseg.cc
+++ b/storage/innobase/trx/trx0rseg.cc
@@ -453,6 +453,14 @@ trx_rseg_mem_restore(trx_rseg_t* rseg, trx_id_t& max_trx_id, mtr_t* mtr)
         if (rseg_header[TRX_RSEG_BINLOG_NAME]) {
             const char* binlog_name = reinterpret_cast<const char*>
                 (rseg_header) + TRX_RSEG_BINLOG_NAME;
+            fprintf(stderr,"****** BEFORE *********\n");
+            fprintf(stderr,"----TRX_RSEG: binlog name:%s\n", binlog_name);
+            fprintf(stderr,"----TRX_RSEG: binlog_offset:%lu\n",
+                 mach_read_from_8(rseg_header + TRX_RSEG_BINLOG_OFFSET));
+            fprintf(stderr,"----recovered_binlog_name:%s\n",
+                trx_sys.recovered_binlog_filename);
+            fprintf(stderr,"----recovered_binlog_offset:%lu\n",
+                trx_sys.recovered_binlog_offset);
             compile_time_assert(TRX_RSEG_BINLOG_NAME_LEN == sizeof
                         trx_sys.recovered_binlog_filename);
 
@@ -478,6 +486,14 @@ trx_rseg_mem_restore(trx_rseg_t* rseg, trx_id_t& max_trx_id, mtr_t* mtr)
                         = binlog_offset;
                 }
             }
+            fprintf(stderr,"****** AFTER *********\n");
+            fprintf(stderr,"----TRX_RSEG: binlog name:%s\n", binlog_name);
+            fprintf(stderr,"----TRX_RSEG: binlog_offset:%lu\n",
+                 mach_read_from_8(rseg_header + TRX_RSEG_BINLOG_OFFSET));
+            fprintf(stderr,"----recovered_binlog_name:%s\n",
+                trx_sys.recovered_binlog_filename);
+            fprintf(stderr,"----recovered_binlog_offset:%lu\n",
+                trx_sys.recovered_binlog_offset);
 



 Comments   
Comment by Marko Mäkelä [ 2020-04-23 ]

sujatha.sivakumar, we need a handlerton call that will be invoked by RESET MASTER. Once we have that interface, InnoDB can implement it, to clear every copy of binlog information from the TRX_SYS page and the undo log header pages.

Comment by Marko Mäkelä [ 2020-04-23 ]

The change is on bb-10.3-marko.

Comment by Marko Mäkelä [ 2021-01-21 ]

serg, Elkin, even though a different solution was chosen for MDEV-21117, this bug means that essentially, trx_sys_print_mysql_binlog_offset() may be displaying garbage information. Is it acceptable to remove that functionality altogether in 10.6?

What mechanism does the binlog recovery use to determine which transactions need to be replayed to the storage engines? Is it based on the internal XA transactions? That is, will the binlog recovery only invoke XA COMMIT or XA ROLLBACK internally? I want to ensure that it is acceptable to remove this output:

	ib::info() << "Last binlog file '"
		<< trx_sys.recovered_binlog_filename
		<< "', position "
		<< trx_sys.recovered_binlog_offset;

Sample output:

2021-01-21 18:20:50 0 [Note] InnoDB: Last binlog file './master-bin.000001', position 986

This is covered by the test innodb.group_commit_binlog_pos_no_optimize_thread and mariabackup.binlog (without actually validating that the position matches the expectations).

There is also something similar in RocksDB, but I do not know if it is affected by this RESET MASTER bug.

Because I do not like corruption bugs, so I would either prefer to fix this bug or remove the functionality as ‘not longer needed’ in the upcoming 10.6 release.

Comment by Marko Mäkelä [ 2021-01-22 ]

When it comes to fixing the bug in 10.3, 10.4, 10.5, I have an alternative idea that I did not prototype yet. On recovery, we could try to gather the binlog position from the page that carries the largest FIL_PAGE_LSN (or newest_modification). In that way, even if RESET MASTER and some subsequent commands caused the binlog position to go backwards, we should get the latest version of the position. Currently, trx_rseg_mem_restore() is ignoring the LSN and taking the greatest available binlog file name or log position, even if it occurred on an older page.

Comment by Marko Mäkelä [ 2021-01-22 ]

sujatha.sivakumar, can you please test whether the bb-10.3-MDEV-22351 branch would fix the problem? I am not familiar with replication, and you only included the relevant steps of how to reproduce, not the complete steps.

Comment by Andrei Elkin [ 2021-01-22 ]

marko, serg: to trx_sys.recovered_binlog, MDEV-1895 may need actually it, or gtid-format equivalent 'cos uncertainty
about the binlog starting position for recovery.
In contrast in innodb-flush-log-at-trx-commit =1 (MDEV-21117 context), a not found at recovery as prepared trx is supposed to have been committed (so we can go scanning until a suitable position is met).

Comment by Sujatha Sivakumar (Inactive) [ 2021-01-25 ]

Hello Marko,

Thank you for fixing this issue.
I verified manually and observed that RESET MASTER specific binlog file name and position
change are reflected in 'recovered_binlog_name/offset'.

Comment by Marko Mäkelä [ 2021-01-25 ]

Thank you for verifying my fix, sujatha.sivakumar!

I’d still like to remove the storage of the binlog information from InnoDB in an upcoming major release (such as 10.6), but that is a separate discussion.

Comment by Marko Mäkelä [ 2022-05-12 ]

Apparently, the main reason to have InnoDB store the binlog position is backup. That could be sorted out in MDEV-21611.

Comment by Kristian Nielsen [ 2023-09-13 ]

I think the fix for this bug is incorrect, and introduces a regression.
I haven't had time to analyse this to the end. But I attached a test case to MDEV-29581 that shows the wrong position being reported (nonteterministic, needs to be run N times to trigger the bug).
What I see when the test fails, is that the largest FIL_PAGE_LSN seen by the code and used to choose the right page to get the information from, is from a modification that was done during the backup prepare phase. This is not the last modification done at the point where the backup was taken, so we get the wrong data.

Reverting this fix makes the testcase pass.

Comment by Marko Mäkelä [ 2023-11-01 ]

knielsen, I think that we should consider reverting to something that resembles my initial fix. It would not involve any file format changes. If I remember correctly, this simple solution was rejected by serg back then.

Comment by Kristian Nielsen [ 2023-11-03 ]

I implemented a proof-of-concept patch that avoids the need to strcmp() binlog file names, using a version number instead associated with each binlog position entry:

https://github.com/MariaDB/server/commits/knielsen_mdev22351

The disadvantage is that it requires extending the format of the rseg page header, and requires an extra 8 byte redo log write at every commit. As Marko wrote, for this reason we for now prefer to keep the current strcmp() method (after reverting the original patch for MDEV-22351 which causes the wrong position to be recovered).

The proof-of-concept patch includes a test case for the RESET MASTER problem that can be used with an alternative fix.

Comment by Kristian Nielsen [ 2023-11-03 ]

Re-opening as the original fix causes regression and will be reverted.

Comment by Sergei Golubchik [ 2023-11-03 ]

knielsen, it'll be very difficult to see from Jira what's happening with the code. Jira isn't good at recording complex histories like "fixed in versions X1, Y1, Z1, reverted in Y2, Z2, fixed in Z3".

Better create a new issue for that and keep this one with FixVersion showing where the fix fix was pushed.

Comment by Marko Mäkelä [ 2023-11-03 ]

knielsen, I see that you reverted the pushed fix. Please file a new MDEV for fixing this correctly and link it from here.

Comment by Kristian Nielsen [ 2023-11-03 ]

MDEV-32673 filed for the new issue.
Note to new readers: Even though this bug is marked "closed", the problem with wrong binlog position was not fixed until 10.4.32.

Comment by Kristian Nielsen [ 2023-11-03 ]

I'm though not sure what is the benefit of marking this bug "fixed" when it was never fixed?

There was a patch pushed, but it did not fix the problem.

Before the original patch, the binlog position can be wrong when CHANGE MASTER is done, but is correct when no CHANGE MASTER is done.

After the original patch was pushed, the binlog position can be wrong both when CHANGE MASTER is used and when CHANGE MASTER is not used.

So the problem was never fixed until 10.4.32 (update, sorry, the revert fixes the regression, but the original problem with CHANGE MASTER remains).

It's not an issue for me whatever status is set for this MDEV-22351. Just wanted to point out that the request to leave this bug "fixed" and file another one seems strange; in case there is a misunderstanding about the impact of this bug and the different patches related to it.

Comment by Marko Mäkelä [ 2023-11-07 ]

I think that we must fix MDEV-32673 in order to avoid reintroducing the originally reported bug, which was that InnoDB could report a wrong recovered binlog position after RESET MASTER was executed (or log_bin_basename was changed).

Yes, it was right to revert the initially implemented MDEV-22351 "fix", because it caused a more serious bug: InnoDB could report a wrong recovered binlog position when there was no RESET MASTER at all in the workload.

Generated at Thu Feb 08 09:14:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.