Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30010

Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

Details

    Description

      Please review comments for reproduction information.

      More to the point, this appears to be a direct reproduction of MDEV-20628 on a MariaDB version patched against MDEV-20605.

      Primary using binlog_format=mixed. Replica using the below, key configurations-

      slave_parallel_mode=optimistic
      slave_parallel_threads=16
      slave_domain_parallel_threads=16
      slave_parallel_max_queued=2147483647

      Table structure uses the following, critical columns-

      CREATE TABLE test (
        id char(32) NOT NULL PRIMARY KEY,
        name VARCHAR(255) DEFAULT NULL UNIQUE KEY,
        touched datetime DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

      Main workload consists of patterns such as-

      INSERT INTO test (id, name, touched) VALUES('000000000000000000000000000000001','SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001', '2019-09-12 19:36:46');
      UPDATE test SET touched = '2019-09-12 19:36:46' WHERE test.name = 'SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001';
      DELETE FROM test WHERE test.name = 'SYSTEM_LOCK:discovery_device_history:00000000000000000000000000000001'

      As expected to optimistic parallel replication, this kind of workload results in tons of rollbacks on the replica (but to be clear, the primary is fine throughout), such as-

      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) TRANSACTION:
       
      TRANSACTION 1757073418, ACTIVE 0 sec inserting
      mysql tables in use 1, locked 1
      3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
      MySQL thread id 16, OS thread handle 139911793866496, query id 182677598 Update
      INSERT INTO sys_mutex (`sys_id`,`sys_updated_by`,`system_id`,`sys_created_on`,`name`,`sys_mod_count`,`sys_updated_on`,`sys_created_by`) VALUES('cdd75b0d22e849008aa321f01222ec46','replay.admin.29','available','2019-09-12 19:36:46','SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b',0,'2019-09-12 19:36:46','replay.admin.29')
      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
       
      RECORD LOCKS space id 26 page no 4 n bits 72 index name of table `master_4`.`sys_mutex` trx id 1757073418 lock mode S
      Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
       0: len 8; hex 73757072656d756d; asc supremum;;
       
      Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
       0: len 30; hex 53595354454d5f4c4f434b3a646973636f766572795f6465766963655f68; asc SYSTEM_LOCK:discovery_device_h; (total 69 bytes);
       1: len 30; hex 643039653233346535356137343363326266616264656532393166313561; asc d09e234e55a743c2bfabdee291f15a; (total 32 bytes);
       
      2022-11-11 22:29:39 16 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
       
      RECORD LOCKS space id 26 page no 4 n bits 72 index name of table `master_4`.`sys_mutex` trx id 1757073418 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
       0: len 30; hex 53595354454d5f4c4f434b3a646973636f766572795f6465766963655f68; asc SYSTEM_LOCK:discovery_device_h; (total 69 bytes);
       1: len 30; hex 643039653233346535356137343363326266616264656532393166313561; asc d09e234e55a743c2bfabdee291f15a; (total 32 bytes);
       
      2022-11-11 22:29:39 16 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

      Despite tons of these deadlocks happening constantly, in my attempts the replica held out for over 24 hours before finally throwing the below error-

      2022-11-11 22:29:39 17 [ERROR] Slave SQL: Error 'Duplicate entry 'SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77...' for key 'name'' on query. Default database: 'master_6'. Query: 'INSERT INTO sys_mutex (`sys_id`,`sys_updated_by`,`system_id`,`sys_created_on`,`name`,`sys_mod_count`,`sys_updated_on`,`sys_created_by`) VALUES('a21dfb83830740b38da2db5e67d6c2c6','replay.admin.29','available','2019-09-12 19:36:46','SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b',0,'2019-09-12 19:36:46','replay.admin.29')', Gtid 0-1-445975955, Internal MariaDB error code: 1062
      2022-11-11 22:29:39 17 [Warning] Slave: Duplicate entry 'SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77...' for key 'name' Error_code: 1062
      2022-11-11 22:29:39 17 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'CS_10.000174' position 791116152; GTID position '0-1-445975954'
      2022-11-11 22:29:39 16 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 16 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 16 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'CS_10.000174' position 791116152; GTID position '0-1-445975954'
      2022-11-11 22:29:39 15 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2022-11-11 22:29:39 15 [Warning] Slave: Connection was killed Error_code: 1927
      2022-11-11 22:29:39 15 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2022-11-11 22:29:39 15 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

      Resolved binary log from this repro is available via link in comments.

      Data leftover on replica after this crash is-

      SELECT * FROM master_6.sys_mutex\G
      *************************** 1. row ***************************
                name: SYSTEM_LOCK:discovery_device_history:ba1e3b9adb73b30834fa5e77dc96199b
           system_id: app128033.iad105.service-now.com:ciscotesttemp2021
            acquired: 2019-09-12 19:36:46
             touched: 2019-09-12 19:36:46
              sys_id: 05259991e9ac49cb9b8860533c352800
      sys_updated_by: replay.admin.29
      sys_updated_on: 2019-09-12 19:36:46
      sys_created_by: replay.admin.29
      sys_created_on: 2019-09-12 19:36:46
       sys_mod_count: 0

      The same query on the primary returns-

      SELECT * FROM master_6.sys_mutex;
      Empty set (0.000 sec)

      SHOW GLOBAL VARIABLES from primary and replica also attached while the full error log from the replica is in the comments.

      Attachments

        Issue Links

          Activity

            Here is some illustration to the MDEV-30010.test.

            vlad.lesin Vladislav Lesin added a comment - Here is some illustration to the MDEV-30010.test .
            marko Marko Mäkelä added a comment - - edited

            vlad.lesin, thank you! I think that an important observation is that due to the way how MVCC and locking is implemented in InnoDB, there may be multiple secondary index records that point to the same primary key. In the clustered index (primary key index), there can be at most one record for each potential key. Old versions are in a singly linked list of undo log records (pointed to by DB_ROLL_PTR; the validity of those pointers is based on the value of DB_TRX_ID in each record).

            We should also keep in mind that InnoDB does not have any concept of "table row lock". InnoDB transactions only know "table locks" and "index record locks". When there exist secondary indexes on a table, the type of locking conflicts depends on the access path. A DELETE or UPDATE or similar statement will start with a locking read. That read may use a secondary index, that is, lock a secondary index record before potentially locking the clustered index record.

            If the INSERT had added the secondary index record before the DELETE looked up the secondary key, also then the DELETE would have waited for a lock that was created by the INSERT, but it would have been on the INSERT’ed record, not the record that INSERT read-locked for the duplicate key check.

            As far as I understand, this problem affects UPDATE and possibly REPLACE as well, and it is not really limited to unique indexes. I see some possible solutions to this.

            1. If the affected mode of replication is active, and if DELETE or UPDATE or REPLACE could make use of a secondary index, require row-based replication events to be written.
            2. In InnoDB, after a lock wait on a secondary index record resumes, restart the search from the first matching key, instead of continuing from the current index position.
            3. In InnoDB, after a lock wait on a delete-marked secondary index record resumes, look up and lock the clustered index record and check if it matches the WHERE condition. Sorry, I realized that this would not work even in the scenario of MDEV-30010.test. We would not find the primary key of the INSERT.

            I would consider any InnoDB change as a work-around. I am concerned that changing the logic in InnoDB could cause a serious performance regression for some types of workloads. The "restart the index scan" fix could potentially introduce a bug where we get into an infinite loop if the circumstances are suitable (say, there are lots of concurrent page splits or merges on the index tree). It would unnecessarily slow down execution on all servers even when the affected mode of replication is not being used at all.

            marko Marko Mäkelä added a comment - - edited vlad.lesin , thank you! I think that an important observation is that due to the way how MVCC and locking is implemented in InnoDB, there may be multiple secondary index records that point to the same primary key. In the clustered index (primary key index), there can be at most one record for each potential key. Old versions are in a singly linked list of undo log records (pointed to by DB_ROLL_PTR; the validity of those pointers is based on the value of DB_TRX_ID in each record). We should also keep in mind that InnoDB does not have any concept of "table row lock". InnoDB transactions only know "table locks" and "index record locks". When there exist secondary indexes on a table, the type of locking conflicts depends on the access path. A DELETE or UPDATE or similar statement will start with a locking read. That read may use a secondary index, that is, lock a secondary index record before potentially locking the clustered index record. If the INSERT had added the secondary index record before the DELETE looked up the secondary key, also then the DELETE would have waited for a lock that was created by the INSERT , but it would have been on the INSERT ’ed record, not the record that INSERT read-locked for the duplicate key check. As far as I understand, this problem affects UPDATE and possibly REPLACE as well, and it is not really limited to unique indexes. I see some possible solutions to this. If the affected mode of replication is active, and if DELETE or UPDATE or REPLACE could make use of a secondary index, require row-based replication events to be written. In InnoDB, after a lock wait on a secondary index record resumes, restart the search from the first matching key, instead of continuing from the current index position. In InnoDB, after a lock wait on a delete-marked secondary index record resumes, look up and lock the clustered index record and check if it matches the WHERE condition. Sorry, I realized that this would not work even in the scenario of MDEV-30010.test . We would not find the primary key of the INSERT . I would consider any InnoDB change as a work-around. I am concerned that changing the logic in InnoDB could cause a serious performance regression for some types of workloads. The "restart the index scan" fix could potentially introduce a bug where we get into an infinite loop if the circumstances are suitable (say, there are lots of concurrent page splits or merges on the index tree). It would unnecessarily slow down execution on all servers even when the affected mode of replication is not being used at all.

            marko, it seems, if we remove not-gap locks from secondary indexes, "table row lock" concept will be implemented.

            And MDEV-30010.test should end up with deadlock error for the DELETE, like MDEV-30010-pk.test.

            One more solution would be in requesting next-key locks for secondary indexes. This would also cause deadlock error for the DELETE. But performance impact should be measured.

            vlad.lesin Vladislav Lesin added a comment - marko , it seems, if we remove not-gap locks from secondary indexes, "table row lock" concept will be implemented. And MDEV-30010.test should end up with deadlock error for the DELETE, like MDEV-30010-pk.test . One more solution would be in requesting next-key locks for secondary indexes. This would also cause deadlock error for the DELETE. But performance impact should be measured.
            vlad.lesin Vladislav Lesin added a comment - - edited

            After long discussions it was decided that skipping record is a bug, which was filed in MDEV-30225, and we can fix it quite trivially for RR. But for RC we would need to rescan range every time after transaction resuming to fix it, and such rescanning is quite expensive from performance perspective. That's why it was decided not to fix it for RC, and instead to flag DELETE/UPDATE/REPLACE unsafe for statement-based replication when a secondary index may be used, and the isolation level is READ (UN)COMMITTED.

            I am fixing MDEV-30225 and returning this issue to replication team.

            vlad.lesin Vladislav Lesin added a comment - - edited After long discussions it was decided that skipping record is a bug, which was filed in MDEV-30225 , and we can fix it quite trivially for RR. But for RC we would need to rescan range every time after transaction resuming to fix it, and such rescanning is quite expensive from performance perspective. That's why it was decided not to fix it for RC, and instead to flag DELETE/UPDATE/REPLACE unsafe for statement-based replication when a secondary index may be used, and the isolation level is READ (UN)COMMITTED. I am fixing MDEV-30225 and returning this issue to replication team.
            Elkin Andrei Elkin added a comment -

            Closed with 'Done' to indicate that a commit pushed merely extends the test base
            and clarifies on the documented binlog-related behavior.

            Elkin Andrei Elkin added a comment - Closed with 'Done' to indicate that a commit pushed merely extends the test base and clarifies on the documented binlog-related behavior.

            People

              Elkin Andrei Elkin
              rob.schwyzer@mariadb.com Rob Schwyzer
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.