Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30010

Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

    XMLWordPrintable

Details

    Description

      Please review comments for reproduction information.

      More to the point, this appears to be a direct reproduction of MDEV-20628 on a MariaDB version patched against MDEV-20605.

      Primary using binlog_format=mixed. Replica using the below, key configurations-

      slave_parallel_mode=optimistic
      slave_parallel_threads=16
      slave_domain_parallel_threads=16
      slave_parallel_max_queued=2147483647

      Table structure uses the following, critical columns-

      CREATE TABLE test (
        id char(32) NOT NULL PRIMARY KEY,
        name VARCHAR(255) DEFAULT NULL UNIQUE KEY,
        touched datetime DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

      Main workload consists of patterns such as-

      INSERT INTO test (id, name, touched) VALUES('000000000000000000000000000000001','SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001', '2019-09-12 19:36:46');
      UPDATE test SET touched = '2019-09-12 19:36:46' WHERE test.name = 'SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001';
      DELETE FROM test WHERE test.name = 'SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001'

      As expected to optimistic parallel replication, this kind of workload results in tons of rollbacks on the replica (but to be clear, the primary is fine throughout), such as-

      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) TRANSACTION:
       
      TRANSACTION 1757073418, ACTIVE 0 sec inserting
      mysql tables in use 1, locked 1
      3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
      MySQL thread id 16, OS thread handle 139911793866496, query id 182677598 Update
      INSERT INTO sys_mutex (`sys_id`,`sys_updated_by`,`system_id`,`sys_created_on`,`name`,`sys_mod_count`,`sys_updated_on`,`sys_created_by`) VALUES('cdd75b0d22e849008aa321f01222ec46','replay.admin.29','available','2019-09-12 19:36:46','SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b',0,'2019-09-12 19:36:46','replay.admin.29')
      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
       
      RECORD LOCKS space id 26 page no 4 n bits 72 index name of table `master_4`.`sys_mutex` trx id 1757073418 lock mode S
      Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
       0: len 8; hex 73757072656d756d; asc supremum;;
       
      Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
       0: len 30; hex 53595354454d5f4c4f434b3a646973636f766572795f6465766963655f68; asc SYSTEM_LOCK:discovery_device_h; (total 69 bytes);
       1: len 30; hex 643039653233346535356137343363326266616264656532393166313561; asc d09e234e55a743c2bfabdee291f15a; (total 32 bytes);
       
      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
       
      RECORD LOCKS space id 26 page no 4 n bits 72 index name of table `master_4`.`sys_mutex` trx id 1757073418 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
       0: len 30; hex 53595354454d5f4c4f434b3a646973636f766572795f6465766963655f68; asc SYSTEM_LOCK:discovery_device_h; (total 69 bytes);
       1: len 30; hex 643039653233346535356137343363326266616264656532393166313561; asc d09e234e55a743c2bfabdee291f15a; (total 32 bytes);
       
      2022-11-11 22:29:39 16 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

      Despite tons of these deadlocks happening constantly, in my attempts the replica held out for over 24 hours before finally throwing the below error-

      2022-11-11 22:29:39 17 [ERROR] Slave SQL: Error 'Duplicate entry 'SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77...' for key 'name'' on query. Default database: 'master_6'. Query: 'INSERT INTO sys_mutex (`sys_id`,`sys_updated_by`,`system_id`,`sys_created_on`,`name`,`sys_mod_count`,`sys_updated_on`,`sys_created_by`) VALUES('a21dfb83830740b38da2db5e67d6c2c6','replay.admin.29','available','2019-09-12 19:36:46','SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b',0,'2019-09-12 19:36:46','replay.admin.29')', Gtid 0-1-445975955, Internal MariaDB error code: 1062
      2022-11-11 22:29:39 17 [Warning] Slave: Duplicate entry 'SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77...' for key 'name' Error_code: 1062
      2022-11-11 22:29:39 17 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'CS_10.000174' position 791116152; GTID position '0-1-445975954'
      2022-11-11 22:29:39 16 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 16 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 16 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'CS_10.000174' position 791116152; GTID position '0-1-445975954'
      2022-11-11 22:29:39 15 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 15 [Warning] Slave: Connection was killed Error_code: 1927
      2022-11-11 22:29:39 15 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2022-11-11 22:29:39 15 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

      Resolved binary log from this repro is available via link in comments.

      Data leftover on replica after this crash is-

      SELECT * FROM master_6.sys_mutex\G
      *************************** 1. row ***************************
                name: SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b
           system_id: app128033.iad105.service-now.com:ciscotesttemp2021
            acquired: 2019-09-12 19:36:46
             touched: 2019-09-12 19:36:46
              sys_id: 05259991e9ac49cb9b8860533c352800
      sys_updated_by: replay.admin.29
      sys_updated_on: 2019-09-12 19:36:46
      sys_created_by: replay.admin.29
      sys_created_on: 2019-09-12 19:36:46
       sys_mod_count: 0

      The same query on the primary returns-

      SELECT * FROM master_6.sys_mutex;
      Empty set (0.000 sec)

      SHOW GLOBAL VARIABLES from primary and replica also attached while the full error log from the replica is in the comments.

      Attachments

        1. MDEV-30010.test
          3 kB
          Vladislav Lesin
        2. MDEV-30010-pk.test
          1 kB
          Vladislav Lesin
        3. primary_global_variables.csv
          21 kB
          Rob Schwyzer
        4. replica_global_variables.csv
          21 kB
          Rob Schwyzer

        Issue Links

          Activity

            People

              Elkin Andrei Elkin
              rob.schwyzer@mariadb.com Rob Schwyzer
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.