Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30010

Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

    XMLWordPrintable

Details

    Description

      Please review comments for reproduction information.

      More to the point, this appears to be a direct reproduction of MDEV-20628 on a MariaDB version patched against MDEV-20605.

      Primary using binlog_format=mixed. Replica using the below, key configurations-

      slave_parallel_mode=optimistic
      slave_parallel_threads=16
      slave_domain_parallel_threads=16
      slave_parallel_max_queued=2147483647

      Table structure uses the following, critical columns-

      CREATE TABLE test (
        id char(32) NOT NULL PRIMARY KEY,
        name VARCHAR(255) DEFAULT NULL UNIQUE KEY,
        touched datetime DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

      Main workload consists of patterns such as-

      INSERT INTO test (id, name, touched) VALUES('000000000000000000000000000000001','SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001', '2019-09-12 19:36:46');
      UPDATE test SET touched = '2019-09-12 19:36:46' WHERE test.name = 'SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001';
      DELETE FROM test WHERE test.name = 'SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001'

      As expected to optimistic parallel replication, this kind of workload results in tons of rollbacks on the replica (but to be clear, the primary is fine throughout), such as-

      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) TRANSACTION:
       
      TRANSACTION 1757073418, ACTIVE 0 sec inserting
      mysql tables in use 1, locked 1
      3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
      MySQL thread id 16, OS thread handle 139911793866496, query id 182677598 Update
      INSERT INTO sys_mutex (`sys_id`,`sys_updated_by`,`system_id`,`sys_created_on`,`name`,`sys_mod_count`,`sys_updated_on`,`sys_created_by`) VALUES('cdd75b0d22e849008aa321f01222ec46','replay.admin.29','available','2019-09-12 19:36:46','SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b',0,'2019-09-12 19:36:46','replay.admin.29')
      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
       
      RECORD LOCKS space id 26 page no 4 n bits 72 index name of table `master_4`.`sys_mutex` trx id 1757073418 lock mode S
      Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
       0: len 8; hex 73757072656d756d; asc supremum;;
       
      Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
       0: len 30; hex 53595354454d5f4c4f434b3a646973636f766572795f6465766963655f68; asc SYSTEM_LOCK:discovery_device_h; (total 69 bytes);
       1: len 30; hex 643039653233346535356137343363326266616264656532393166313561; asc d09e234e55a743c2bfabdee291f15a; (total 32 bytes);
       
      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
       
      RECORD LOCKS space id 26 page no 4 n bits 72 index name of table `master_4`.`sys_mutex` trx id 1757073418 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
       0: len 30; hex 53595354454d5f4c4f434b3a646973636f766572795f6465766963655f68; asc SYSTEM_LOCK:discovery_device_h; (total 69 bytes);
       1: len 30; hex 643039653233346535356137343363326266616264656532393166313561; asc d09e234e55a743c2bfabdee291f15a; (total 32 bytes);
       
      2022-11-11 22:29:39 16 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

      Despite tons of these deadlocks happening constantly, in my attempts the replica held out for over 24 hours before finally throwing the below error-

      2022-11-11 22:29:39 17 [ERROR] Slave SQL: Error 'Duplicate entry 'SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77...' for key 'name'' on query. Default database: 'master_6'. Query: 'INSERT INTO sys_mutex (`sys_id`,`sys_updated_by`,`system_id`,`sys_created_on`,`name`,`sys_mod_count`,`sys_updated_on`,`sys_created_by`) VALUES('a21dfb83830740b38da2db5e67d6c2c6','replay.admin.29','available','2019-09-12 19:36:46','SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b',0,'2019-09-12 19:36:46','replay.admin.29')', Gtid 0-1-445975955, Internal MariaDB error code: 1062
      2022-11-11 22:29:39 17 [Warning] Slave: Duplicate entry 'SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77...' for key 'name' Error_code: 1062
      2022-11-11 22:29:39 17 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'CS_10.000174' position 791116152; GTID position '0-1-445975954'
      2022-11-11 22:29:39 16 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 16 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 16 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'CS_10.000174' position 791116152; GTID position '0-1-445975954'
      2022-11-11 22:29:39 15 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 15 [Warning] Slave: Connection was killed Error_code: 1927
      2022-11-11 22:29:39 15 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2022-11-11 22:29:39 15 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

      Resolved binary log from this repro is available via link in comments.

      Data leftover on replica after this crash is-

      SELECT * FROM master_6.sys_mutex\G
      *************************** 1. row ***************************
                name: SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b
           system_id: app128033.iad105.service-now.com:ciscotesttemp2021
            acquired: 2019-09-12 19:36:46
             touched: 2019-09-12 19:36:46
              sys_id: 05259991e9ac49cb9b8860533c352800
      sys_updated_by: replay.admin.29
      sys_updated_on: 2019-09-12 19:36:46
      sys_created_by: replay.admin.29
      sys_created_on: 2019-09-12 19:36:46
       sys_mod_count: 0

      The same query on the primary returns-

      SELECT * FROM master_6.sys_mutex;
      Empty set (0.000 sec)

      SHOW GLOBAL VARIABLES from primary and replica also attached while the full error log from the replica is in the comments.

      Attachments

        Issue Links

          Activity

            People

              Elkin Andrei Elkin
              rob.schwyzer@mariadb.com Rob Schwyzer
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.