
MDEV-27849: rpl.rpl_start_alter_7 (and 8, mysqlbinlog_2) fail in buildbot, [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos

Details

    Description

      https://buildbot.mariadb.org/#/builders/195/builds/4566/steps/7/logs/stdio
      https://buildbot.mariadb.org/#/builders/195/builds/4419/steps/7/logs/stdio

      Occurs for rpl.rpl_start_alter_7, rpl.rpl_start_alter_8, rpl.rpl_start_alter_mysqlbinlog_2, rpl.rpl_start_alter_4, rpl.rpl_start_alter_3, rpl.rpl_start_alter_6, rpl.rpl_start_alter_5

      rpl.rpl_start_alter_7 'innodb'           w1 [ fail ]
              Test ended at 2022-02-15 02:25:31
      CURRENT_TEST: rpl.rpl_start_alter_7
      mysqltest: In included file "./include/sync_with_master_gtid.inc": 
      included from /buildbot/amd64-ubuntu-1804-msan/build/mysql-test/suite/rpl/t/rpl_start_alter_7.test at line 83:
      At line 48: Failed to sync with master
      The result from queries just before the failure was:
      < snip >
      ERROR 23000: Duplicate entry '2' for key 'b'
      ERROR 23000: Duplicate entry '2' for key 'b'
      ERROR 23000: Duplicate entry '2' for key 'b'
      connection server_2;
      drop database s2;
      select @@gtid_binlog_pos;
      @@gtid_binlog_pos
      12-2-412
      connection server_3;
      start all slaves;
      Warnings:
      Note	1937	SLAVE 'm2' started
      Note	1937	SLAVE 'm1' started
      set default_master_connection = 'm1';
      include/wait_for_slave_to_start.inc
      set default_master_connection = 'm2';
      include/wait_for_slave_to_start.inc
      set default_master_connection = 'm1';
      include/sync_with_master_gtid.inc
      Timeout in master_gtid_wait('11-1-412', 120), current slave GTID position is: 11-1-291,12-2-412.
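
      For reference, include/sync_with_master_gtid.inc essentially waits on the built-in MASTER_GTID_WAIT() function on the replica. A minimal sketch of the check that times out above, using the values from this log (simplified; the real include file also propagates the master's GTID position and reports errors):

        # On the replica (server_3 in this test):
        # master_gtid_wait(gtid_pos, timeout_in_seconds) returns 0 once the
        # replica's GTID position covers gtid_pos, or -1 on timeout.
        SELECT master_gtid_wait('11-1-412', 120);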
      

      This was not checked against the original failures, but the replica error log in the multi-master setup shows:

      2022-11-24  4:29:00 31 [Note] Master 'm1': Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './mysqld-relay-bin-m1.000001' position: 4; GTID position '12-2-2'
      2022-11-24  4:29:02 24 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '11-605' for key 'PRIMARY', Gtid 11-1-281, Internal MariaDB error code: 1942
      2022-11-24  4:29:02 24 [ERROR] Slave (additional info): Duplicate entry '11-605' for key 'PRIMARY' Error_code: 1062
      2022-11-24  4:29:02 24 [Warning] Slave: Duplicate entry '11-605' for key 'PRIMARY' Error_code: 1062
      2022-11-24  4:29:02 24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 49198; GTID position '11-1-280,12-2-334'
      2022-11-24  4:29:02 31 [Note] Master 'm1': Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 49198; GTID position '11-1-280,12-2-334', master: 127.0.0.1:16040
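
      The duplicate key value '11-605' in the errors above is against mysql.gtid_slave_pos's composite primary key (domain_id, sub_id), i.e. domain_id=11, sub_id=605. For reference, the standard definition of the system table (columns abbreviated; details may vary across versions):

        SHOW CREATE TABLE mysql.gtid_slave_pos;
        # CREATE TABLE gtid_slave_pos (
        #   domain_id INT UNSIGNED NOT NULL,
        #   sub_id    BIGINT UNSIGNED NOT NULL,
        #   server_id INT UNSIGNED NOT NULL,
        #   seq_no    BIGINT UNSIGNED NOT NULL,
        #   PRIMARY KEY (domain_id, sub_id)
        # )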
      

      Attachments

        1. logs.tar.gz
          1.13 MB
          Angelique Sklavounos
        2. MDEV-27849.cnf
          0.7 kB
          Daniel Black
        3. MDEV-27849.test
          2 kB
          Daniel Black


          Activity

            elenst Elena Stepanova added a comment - - edited

            My git blame is as good as anyone's:

            commit e22c3810f059e4f6e3ec52f09d35486e0ff80fb6
            Author: Sergei Golubchik
            Date:   Thu Jun 5 09:04:43 2014 +0200
             
                MDEV-6243 mysql_install_db or mysql_upgrade fails when default_engine=archive
                
                don't use the default storage engine for mysql.gtid_slave_pos, prefer innodb.
                but alter it to myisam in mtr, because many tests run without innodb.
            

            And naturally MyISAM was later switched to Aria along with the system-table engine change, in one of Monty's commits.
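
            For context, the engine switch referred to above is a plain ALTER of the system table, roughly along these lines (a sketch; the exact statements in the mtr bootstrap scripts may differ):

              ALTER TABLE mysql.gtid_slave_pos ENGINE=myisam;
              # and after the later system-table engine change:
              ALTER TABLE mysql.gtid_slave_pos ENGINE=Aria;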

            knielsen Kristian Nielsen added a comment - - edited

            I guess mtr uses the same mysql_install_db template for all tests, regardless of restart options (I didn't check)? If so, we cannot have InnoDB tables in there. Maybe we could change it in include/have_innodb.inc.

            Anyway, I debugged the rpl_start_alter_6 failure. It turns out the replication deadlock kill+retry is caused by persistent statistics inside InnoDB, dict_stats_save(). This function creates an internal transaction that gets assigned the user-query's THD. The function then goes on to take table(?) locks on the dict tables. When these locks conflict, they cause parallel replication to deadlock kill the later transaction because it has the replication THD assigned. I think this is a redundant/false-alarm retry, since the dict system transaction is internal and should not be able to cause a deadlock.

            I'm wondering if it's correct to assign the query/replication THD to the separate InnoDB-internal trx that updates the dict tables. In this case, it causes unnecessary and unexpected transaction rollback and retry in parallel replication.

            On the other hand, maybe having a NULL trx->mysql_thd would cause problems in other places. It's not a fatal problem to get a spurious rollback+retry in parallel replication (it's handled on the upper layer), but it would still be preferable to avoid if possible. The user might have carefully arranged for no conflicts to be possible between replicated transactions, and then be surprised / experience problems when such conflicts occur from persistent stats updates.
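
            If such stats-induced conflicts ever need to be kept out of a test, one workaround sketch (not the fix that was applied here) is to disable InnoDB persistent statistics, either server-wide or per table:

              # Server-wide (affects all InnoDB tables):
              SET GLOBAL innodb_stats_persistent = OFF;
              # Or per table at creation time (t1 is a hypothetical example):
              CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=0;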


            marko Marko Mäkelä added a comment

            Starting with MDEV-16678, InnoDB pretty much requires a valid trx_t::mysql_thd. The assignment of THD is outside the control of InnoDB, other than that there is some preallocation of purge-related THD objects related to MDEV-16264 and MDEV-11024. But those cannot be used by dict_stats_save().

            knielsen Kristian Nielsen added a comment

            Thanks, Marko. I'll change the tests to use InnoDB for the mysql.gtid_slave_pos table.

            For now, I don't think we need to do anything else. Some transaction rollback+retry is expected in in-order parallel replication, and presumably conflicts due to dict stats updates will be rare. Let's just keep in mind that this can be a source of otherwise unexpected conflicts in parallel replication, in case we see it in other tests or user reports.

            knielsen Kristian Nielsen added a comment

            Fix pushed to 10.11. A number of the rpl_start_alter_*.test files now change the mysql.gtid_slave_pos table to use InnoDB.
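
            The per-test change described above amounts to something like the following mysqltest snippet (a sketch of the idea; the pushed patch may differ in detail):

              --disable_query_log
              ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
              --enable_query_log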

            People

              knielsen Kristian Nielsen
              angelique.sklavounos Angelique Sklavounos (Inactive)