Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22351

InnoDB may report incorrect binlog position information after RESET MASTER

Details

    Description

      Warning

      Even though this bug is marked "fixed", the problem with wrong binlog position was not fixed until 10.4.32

      RESET MASTER should be propagated to InnoDB to actually clear all information from the rseg header pages.

      Observation:

      Start a server with binary log option being enabled.
      Create a simple table and insert couple of rows.

      MariaDB [test]> show binlog events;
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      | Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                                 |
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      | master-bin.000001 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.23-MariaDB-debug-log, Binlog ver: 4 |
      | master-bin.000001 | 256 | Gtid_list         |         1 |         285 | []                                                   |
      | master-bin.000001 | 285 | Binlog_checkpoint |         1 |         329 | master-bin.000001                                    |
      | master-bin.000001 | 329 | Gtid              |         1 |         371 | GTID 0-1-1                                           |
      | master-bin.000001 | 371 | Query             |         1 |         481 | use `test`; create table t ( f int ) engine=innodb   |
      | master-bin.000001 | 481 | Gtid              |         1 |         523 | BEGIN GTID 0-1-2                                     |
      | master-bin.000001 | 523 | Query             |         1 |         620 | use `test`; insert into t values (10)                |
      | master-bin.000001 | 620 | Xid               |         1 |         651 | COMMIT /* xid=7 */                                   |
      | master-bin.000001 | 651 | Gtid              |         1 |         693 | BEGIN GTID 0-1-3                                     |
      | master-bin.000001 | 693 | Query             |         1 |         790 | use `test`; insert into t values (20)                |
      | master-bin.000001 | 790 | Xid               |         1 |         821 | COMMIT /* xid=8 */                                   |
      | master-bin.000001 | 821 | Stop              |         1 |         844 |                                                      |
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      

      Shutdown and restart server

      Debug prints: From mysqld.1.err

      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:651
      ----recovered_binlog_name:
      ----recovered_binlog_offset:0
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:651
      ----recovered_binlog_name:./master-bin.000001
      ----recovered_binlog_offset:651
      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:821
      ----recovered_binlog_name:./master-bin.000001
      ----recovered_binlog_offset:651
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:821
      ----recovered_binlog_name:./master-bin.000001
      ----recovered_binlog_offset:821
      

      Execute RESET MASTER on restarted server. // This will clear existing binary logs and a fresh 'master-bin.000001' is created.

      MariaDB [test]> RESET MASTER;
      Query OK, 0 rows affected (0.002 sec)
       
      MariaDB [test]> create table t2( f int) engine=innodb;
      Query OK, 0 rows affected (0.006 sec)
       
      MariaDB [test]> insert into t2 values (30);
      Query OK, 1 row affected (0.003 sec)
       
      MariaDB [test]> show binlog events;
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      | Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                                 |
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      | master-bin.000001 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.23-MariaDB-debug-log, Binlog ver: 4 |
      | master-bin.000001 | 256 | Gtid_list         |         1 |         285 | []                                                   |
      | master-bin.000001 | 285 | Binlog_checkpoint |         1 |         329 | master-bin.000001                                    |
      | master-bin.000001 | 329 | Gtid              |         1 |         371 | GTID 0-1-1                                           |
      | master-bin.000001 | 371 | Query             |         1 |         480 | use `test`; create table t2( f int) engine=innodb    |
      | master-bin.000001 | 480 | Gtid              |         1 |         522 | BEGIN GTID 0-1-2                                     |
      | master-bin.000001 | 522 | Query             |         1 |         620 | use `test`; insert into t2 values (30)               |
      | master-bin.000001 | 620 | Xid               |         1 |         651 | COMMIT /* xid=9 */                                   |
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      

      Shutdown and restart. Observe the following output.
      'recovered_binlog_name' and 'position' are still referring to old 'master-bin.000001' and '821'.

      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:651
      ----recovered_binlog_name:
      ----recovered_binlog_offset:0
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:651
      ----recovered_binlog_name:./master-bin.000001
      ----recovered_binlog_offset:651
      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:821
      ----recovered_binlog_name:./master-bin.000001
      ----recovered_binlog_offset:651
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:821
      ----recovered_binlog_name:./master-bin.000001
      ----recovered_binlog_offset:821
      

      Shutdown and restart server.

      A new binary log is created.
      Now do a DML. With this latest binlog file name and pos are 'master-bin.000002' and pos=514

      MariaDB [test]> show binlog events in 'master-bin.000002';
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      | Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                                 |
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      | master-bin.000002 |   4 | Format_desc       |         1 |         256 | Server ver: 10.3.23-MariaDB-debug-log, Binlog ver: 4 |
      | master-bin.000002 | 256 | Gtid_list         |         1 |         299 | [0-1-2]                                              |
      | master-bin.000002 | 299 | Binlog_checkpoint |         1 |         343 | master-bin.000002                                    |
      | master-bin.000002 | 343 | Gtid              |         1 |         385 | BEGIN GTID 0-1-3                                     |
      | master-bin.000002 | 385 | Query             |         1 |         483 | use `test`; insert into t2 VALUES (40)               |
      | master-bin.000002 | 483 | Xid               |         1 |         514 | COMMIT /* xid=5 */                                   |
      +-------------------+-----+-------------------+-----------+-------------+------------------------------------------------------+
      

      shutdown and restart. Since 'TRX_RSEG' file and pos are > 'recovered_binlog_name/offset' the 'recovered_binlog_name/offset' get proper assignments.

      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000002
      ----TRX_RSEG: binlog_offset:514
      ----recovered_binlog_name:
      ----recovered_binlog_offset:0
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000002
      ----TRX_RSEG: binlog_offset:514
      ----recovered_binlog_name:./master-bin.000002
      ----recovered_binlog_offset:514
      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:651
      ----recovered_binlog_name:./master-bin.000002
      ----recovered_binlog_offset:514
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:651
      ----recovered_binlog_name:./master-bin.000002
      ----recovered_binlog_offset:514
      ****** BEFORE *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:821
      ----recovered_binlog_name:./master-bin.000002
      ----recovered_binlog_offset:514
      ****** AFTER *********
      ----TRX_RSEG: binlog name:./master-bin.000001
      ----TRX_RSEG: binlog_offset:821
      ----recovered_binlog_name:./master-bin.000002
      ----recovered_binlog_offset:514
      

      Following debug prints were used to observe the behavior:

      diff --git a/storage/innobase/trx/trx0rseg.cc b/storage/innobase/trx/trx0rseg.cc
      index 46fb2680371..842718ce81d 100644
      --- a/storage/innobase/trx/trx0rseg.cc
      +++ b/storage/innobase/trx/trx0rseg.cc
      @@ -453,6 +453,14 @@ trx_rseg_mem_restore(trx_rseg_t* rseg, trx_id_t& max_trx_id, mtr_t* mtr)
               if (rseg_header[TRX_RSEG_BINLOG_NAME]) {
                   const char* binlog_name = reinterpret_cast<const char*>
                       (rseg_header) + TRX_RSEG_BINLOG_NAME;
      +            fprintf(stderr,"****** BEFORE *********\n");
      +            fprintf(stderr,"----TRX_RSEG: binlog name:%s\n", binlog_name);
      +            fprintf(stderr,"----TRX_RSEG: binlog_offset:%lu\n",
      +                 mach_read_from_8(rseg_header + TRX_RSEG_BINLOG_OFFSET));
      +            fprintf(stderr,"----recovered_binlog_name:%s\n",
      +                trx_sys.recovered_binlog_filename);
      +            fprintf(stderr,"----recovered_binlog_offset:%lu\n",
      +                trx_sys.recovered_binlog_offset);
                   compile_time_assert(TRX_RSEG_BINLOG_NAME_LEN == sizeof
                               trx_sys.recovered_binlog_filename);
       
      @@ -478,6 +486,14 @@ trx_rseg_mem_restore(trx_rseg_t* rseg, trx_id_t& max_trx_id, mtr_t* mtr)
                               = binlog_offset;
                       }
                   }
      +            fprintf(stderr,"****** AFTER *********\n");
      +            fprintf(stderr,"----TRX_RSEG: binlog name:%s\n", binlog_name);
      +            fprintf(stderr,"----TRX_RSEG: binlog_offset:%lu\n",
      +                 mach_read_from_8(rseg_header + TRX_RSEG_BINLOG_OFFSET));
      +            fprintf(stderr,"----recovered_binlog_name:%s\n",
      +                trx_sys.recovered_binlog_filename);
      +            fprintf(stderr,"----recovered_binlog_offset:%lu\n",
      +                trx_sys.recovered_binlog_offset);
       

      Attachments

        Issue Links

          Activity

            knielsen, it'll be very difficult to see from Jira what's happening with the code. Jira isn't good at recording complex histories like "fixed in versions X1, Y1, Z1, reverted in Y2, Z2, fixed in Z3".

            Better create a new issue for that and keep this one with FixVersion showing where the fix fix was pushed.

            serg Sergei Golubchik added a comment - knielsen , it'll be very difficult to see from Jira what's happening with the code. Jira isn't good at recording complex histories like "fixed in versions X1, Y1, Z1, reverted in Y2, Z2, fixed in Z3". Better create a new issue for that and keep this one with FixVersion showing where the fix fix was pushed.

            knielsen, I see that you reverted the pushed fix. Please file a new MDEV for fixing this correctly and link it from here.

            marko Marko Mäkelä added a comment - knielsen , I see that you reverted the pushed fix. Please file a new MDEV for fixing this correctly and link it from here.

            MDEV-32673 filed for the new issue.
            Note to new readers: Even though this bug is marked "closed", the problem with wrong binlog position was not fixed until 10.4.32.

            knielsen Kristian Nielsen added a comment - MDEV-32673 filed for the new issue. Note to new readers: Even though this bug is marked "closed", the problem with wrong binlog position was not fixed until 10.4.32.
            knielsen Kristian Nielsen added a comment - - edited

            I'm though not sure what is the benefit of marking this bug "fixed" when it was never fixed?

            There was a patch pushed, but it did not fix the problem.

            Before the original patch, the binlog position can be wrong when CHANGE MASTER is done, but is correct when no CHANGE MASTER is done.

            After the original patch was pushed, the binlog position can be wrong both when CHANGE MASTER is used and when CHANGE MASTER is not used.

            So the problem was never fixed until 10.4.32 (update, sorry, the revert fixes the regression, but the original problem with CHANGE MASTER remains).

            It's not an issue for me whatever status is set for this MDEV-22351. Just wanted to point out that the request to leave this bug "fixed" and file another one seems strange; in case there is a misunderstanding about the impact of this bug and the different patches related to it.

            knielsen Kristian Nielsen added a comment - - edited I'm though not sure what is the benefit of marking this bug "fixed" when it was never fixed? There was a patch pushed, but it did not fix the problem. Before the original patch, the binlog position can be wrong when CHANGE MASTER is done, but is correct when no CHANGE MASTER is done. After the original patch was pushed, the binlog position can be wrong both when CHANGE MASTER is used and when CHANGE MASTER is not used. So the problem was never fixed until 10.4.32 (update, sorry, the revert fixes the regression, but the original problem with CHANGE MASTER remains). It's not an issue for me whatever status is set for this MDEV-22351 . Just wanted to point out that the request to leave this bug "fixed" and file another one seems strange; in case there is a misunderstanding about the impact of this bug and the different patches related to it.

            I think that we must fix MDEV-32673 in order to avoid reintroducing the originally reported bug, which was that InnoDB could report a wrong recovered binlog position after RESET MASTER was executed (or log_bin_basename was changed).

            Yes, it was right to revert the initially implemented MDEV-22351 "fix", because it caused a more serious bug: InnoDB could report a wrong recovered binlog position when there was no RESET MASTER at all in the workload.

            marko Marko Mäkelä added a comment - I think that we must fix MDEV-32673 in order to avoid reintroducing the originally reported bug, which was that InnoDB could report a wrong recovered binlog position after RESET MASTER was executed (or log_bin_basename was changed). Yes, it was right to revert the initially implemented MDEV-22351 "fix", because it caused a more serious bug: InnoDB could report a wrong recovered binlog position when there was no RESET MASTER at all in the workload.

            People

              marko Marko Mäkelä
              sujatha.sivakumar Sujatha Sivakumar (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.