Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19572

Async slave on Galera node stops due to 1942 Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:

    Details

    • Type: Bug
    • Status: In Progress (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.2.16, 10.2.23
    • Fix Version/s: 10.2
    • Component/s: Galera, Replication
    • Labels:
      None

      Description

      When changes to MyISAM table are replicated from async MariaDB 10.1.x master to a node in MariaDB 10.2.x Galera cluster, in GTID-based replication setup, replication eventually breaks with the following slave status:

      MariaDB [test]> show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: Waiting for master to send event
                        Master_Host: ao756
                        Master_User: root
                        Master_Port: 3310
                      Connect_Retry: 60
                    Master_Log_File: ao756-bin.000004
                Read_Master_Log_Pos: 59590494
                     Relay_Log_File: mysql-node1-relay-bin.000002
                      Relay_Log_Pos: 14872226
              Relay_Master_Log_File: ao756-bin.000004
                   Slave_IO_Running: Yes
                  Slave_SQL_Running: No
                    Replicate_Do_DB:
                Replicate_Ignore_DB:
                 Replicate_Do_Table:
             Replicate_Ignore_Table:
            Replicate_Wild_Do_Table:
        Replicate_Wild_Ignore_Table:
                         Last_Errno: 1942
                         Last_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 29774344
                    Relay_Log_Space: 44688691
                    Until_Condition: None
                     Until_Log_File:
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File:
                 Master_SSL_CA_Path:
                    Master_SSL_Cert:
                  Master_SSL_Cipher:
                     Master_SSL_Key:
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 0
                      Last_IO_Error:
                     Last_SQL_Errno: 1942
                     Last_SQL_Error: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
        Replicate_Ignore_Server_Ids:
                   Master_Server_Id: 1
                     Master_SSL_Crl:
                 Master_SSL_Crlpath:
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 4100-1-550731
            Replicate_Do_Domain_Ids:
        Replicate_Ignore_Domain_Ids:
                      Parallel_Mode: conservative
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State:
      1 row in set (0,02 sec)
       
      MariaDB [test]> show variables like 'gtid%';\
      +------------------------+---------------+
      | Variable_name          | Value         |
      +------------------------+---------------+
      | gtid_binlog_pos        | 4100-1-380354 |
      | gtid_binlog_state      | 4100-1-380354 |
      | gtid_current_pos       | 4100-1-380353 |
      | gtid_domain_id         | 4100          |
      | gtid_ignore_duplicates | OFF           |
      | gtid_seq_no            | 0             |
      | gtid_slave_pos         | 4100-1-380353 |
      | gtid_strict_mode       | OFF           |
      +------------------------+---------------+
      8 rows in set (0,01 sec)
       
      MariaDB [test]> select * from mysql.gtid_slave_pos;
      +-----------+--------+-----------+--------+
      | domain_id | sub_id | server_id | seq_no |
      +-----------+--------+-----------+--------+
      |      4100 | 599107 |         1 | 380352 |
      |      4100 | 599108 |         1 | 380353 |
      +-----------+--------+-----------+--------+
      2 rows in set (0,06 sec)
       
      MariaDB [test]> show create table mysql.gtid_slavee_pos\G
      ERROR 1146 (42S02): Table 'mysql.gtid_slavee_pos' doesn't exist
      MariaDB [test]> show create table mysql.gtid_slave_pos\G
      *************************** 1. row ***************************
             Table: gtid_slave_pos
      Create Table: CREATE TABLE `gtid_slave_pos` (
        `domain_id` int(10) unsigned NOT NULL,
        `sub_id` bigint(20) unsigned NOT NULL,
        `server_id` int(10) unsigned NOT NULL,
        `seq_no` bigint(20) unsigned NOT NULL,
        PRIMARY KEY (`domain_id`,`sub_id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Replication slave GTID position'
      1 row in set (0,05 sec)
       
      MariaDB [test]> select version();
      +---------------------+
      | version()           |
      +---------------------+
      | 10.2.23-MariaDB-log |
      +---------------------+
      1 row in set (0,06 sec)
      

      In the error log we see:

      ...
      2019-05-23 16:55:10 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 28452044, relay log './mysql-node1-relay-bin.000002' position: 13549926; GTID position '4100-1-372797'
      2019-05-23 16:55:10 140069099460352 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.ttmyis; Duplicate entry '161766' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log ao756-bin.000004, end_log_pos 28452166, Gtid 4100-1-372798, Internal MariaDB error code: 1062
      2019-05-23 16:55:10 140069099460352 [Warning] WSREP: SQL statement was ineffective thd: 61940 buf: 126
      schema: test
      QUERY: COMMIT
      => Skipping replication
      2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'test'. Query: 'COMMIT', Gtid 4100-1-372800, Internal MariaDB error code: 1180
      2019-05-23 16:55:10 140069099460352 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 4100-1-372800, Internal MariaDB error code: 1047
      ...
      2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread initialized, starting replication in log 'ao756-bin.000004' at position 29774344, relay log './mysql-node1-relay-bin.000002' position: 14872226; GTID position '4100-1-380353'
      2019-05-23 16:55:54 140069099460352 [Warning] WSREP: BF applier failed to open_and_lock_tables: 0, fatal: 0 wsrep = (exec_mode: 0 conflict_state: 0 seqno: -1)
      2019-05-23 16:55:54 140069099460352 [ERROR] Slave SQL: Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0: , Gtid 4100-1-380354, Internal MariaDB error code: 1942
      2019-05-23 16:55:54 140069099460352 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'ao756-bin.000004' position 29774344; GTID position '4100-1-380353'
      2019-05-23 16:55:54 140069099460352 [Note] Slave SQL thread exiting, replication stopped in log 'ao756-bin.000004' at position 29774344; GTID position '4100-1-380353'
      ...
      

      To end up like this I've created a fresh MariaDB master started as follows:

      bin/mysqld_safe --no-defaults --port=3310 --log-bin --gtid_domain_id=4100 --server_id=1 --binlog_format=row --sync_binlog=0 --innodb_flush_log_at_trx_commit=2 &
      

      and set up also a local 2-node Galera cluster with nodes started as:

      bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode1.cnf --wsrep-new-cluster &
      bin/mysqld_safe --defaults-file=/home/openxs/galera/mynode2.cnf &
      

      while related configuration files are:

      openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode1.cnf
      [mysqld]
      datadir=/home/openxs/galera/node1
      port=3306
      socket=/tmp/mysql-node1.sock
      pid-file=/tmp/mysql-node1.pid
      log-error=/tmp/mysql-node1.err
      binlog_format=ROW
      innodb_autoinc_lock_mode=2
       
      wsrep_on=ON
      wsrep_provider=/usr/lib/libgalera_smm.so
      wsrep_cluster_name = singlebox
      wsrep_node_name = node1
      wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no
       
      log_bin
      log_slave_updates=1
      binlog_row_image=FULL
      gtid_domain_id=4100
      wsrep_gtid_domain_id=4100
      wsrep_gtid_mode=ON
      wsrep_restart_slave=1
      server_id=2
      slave-skip-errors=1062
      innodb_flush_log_at_trx_commit=2
      sync_binlog=0
      openxs@ao756:~/dbs/maria10.2$ cat /home/openxs/galera/mynode2.cnf
      [mysqld]
      datadir=/home/openxs/galera/node2
      port=3307
      socket=/tmp/mysql-node2.sock
      pid-file=/tmp/mysql-node2.pid
      log-error=/tmp/mysql-node2.err
      binlog_format=ROW
      innodb_autoinc_lock_mode=2
       
      wsrep_on=ON
      wsrep_provider=/usr/lib/libgalera_smm.so
      wsrep_cluster_name = singlebox
      wsrep_node_name = node2
      wsrep_cluster_address=gcomm://127.0.0.1:4567,127.0.0.1:5020,127.0.0.1:6020?pc.wait_prim=no
      wsrep_provider_options = "base_port=5020;"
       
      log_bin
      log_slave_updates=1
      binlog_row_image=FULL
      gtid_domain_id=4100
      wsrep_gtid_domain_id=4100
      wsrep_gtid_mode=ON
      wsrep_restart_slave=1
      server_id=3
      slave-skip-errors=1062
      innodb_flush_log_at_trx_commit=2
      sync_binlog=0
      openxs@ao756:~/dbs/maria10.2$
      

      The I've created two tables on master, in test database:

       CREATE TABLE `ttinno` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `c1` int(11) DEFAULT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB;
       
      CREATE TABLE `ttmyis` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `c1` int(11) DEFAULT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=MyISAM;
       
      insert into ttinno(c1) values (1), (2), (3);
      insert into ttmyis(c1) values (1), (2), (3);
      

      Then I pointed out slave on node1 of cluster to master:

      set global gtid_slave_pos='4100-1-0';
      change master to master_host='ao756', master_port=3310, master_user='root', master_use_gtid=slave_pos;
      

      and started replication. After checking that initial stat is replicated to cluster I've executed the following mysqlslap commands pointing to async master:

      bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='insert into ttmyis(c1) values(rand()*100); insert into ttinno(c1) values(rand()*100);' -uroot --host=127.0.0.1 --port=3310 test
      bin/mysqlslap --concurrency=2 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttmyis set c1=100 where id = rand()*1000; insert into ttinno(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test
      bin/mysqlslap --concurrency=4 --create-schema=test --no-drop --number-of-queries=100000 --iterations=1 --query='start transaction; update ttinno set c1=100 where id = rand()*1000; insert into ttmyis(c1) values(rand()*100); commit;' -uroot --host=127.0.0.1 --port=3310 test
      

      Second run was eventually cancelled as it was too slow. When the last mysqlslap completed I've checked node1 and noted slave is broken there in thee way explained above.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                stepan.patryshev Stepan Patryshev
                Reporter:
                valerii Valerii Kravchuk
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated: