Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30973

Enhance output of innodb deadlocks (SAMU-124, SAMU-131)

    XMLWordPrintable

Details

    Description

      Merge corresponding issues from Samurai branch:

      SAMU-131 GTID in deadlock output

      Currently, when innodb_print_all_deadlocks=on, it outputs a stream of data like-

      2023-03-05 21:06:31 139610138924800 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
      2023-03-05 21:06:31 139610138924800 [Note] InnoDB:
      *** (1) TRANSACTION:
      TRANSACTION 421484824000288, ACTIVE 0 sec starting index read
      mysql tables in use 2, locked 2
      LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
      MySQL thread id 15, OS thread handle 139610138617600, query id 2604440527
      2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
       
      RECORD LOCKS space id 250069 page no 9966 n bits 976 index document_number of table `REDACTED`.`REDACTED` trx id 421484824000288 lock mode S waiting
      Record lock, heap no 903 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
       0: len 4; hex 82f519af; asc ;;
       1: len 4; hex 80000d01; asc ;;
       
      2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (2)
       
      TRANSACTION: TRANSACTION 920212128464, ACTIVE 0 sec inserting
      mysql tables in use 1, locked 1
      3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 6
      MySQL thread id 14, OS thread handle 139610138924800, query id 2604440517 update
      INSERT INTO REDACTED (`REDACTED`, `REDACTED`, `REDACTED`, `REDACTED`) VALUES (REDACTED, REDACTED, 'REDACTED', REDACTED), (REDACTED, REDACTED, 'REDACTED', REDACTED), ...
      2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
       
      RECORD LOCKS space id 250069 page no 9966 n bits 976 index document_number of table `REDACTED`.`REDACTED` trx id 920212128464 lock_mode X locks rec but not gap
      Record lock, heap no 903 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
       0: len 4; hex 82f519af; asc ;;
       1: len 4; hex 80000d01; asc ;;
       
      2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (2)
       
      WAITING FOR THIS LOCK TO BE GRANTED
       
      RECORD LOCKS space id 250069 page no 9966 n bits 976 index document_number of table `REDACTED`.`REDACTED` trx id 920212128464 lock_mode X locks gap before rec insert intention waiting
      Record lock, heap no 903 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
       0: len 4; hex 82f519af; asc ;;
       1: len 4; hex 80000d01; asc ;;
       
      2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

      ...

      While this is helpful, it omits some key data that would make this a lot more helpful:

      1. Only one transaction’s query is logged
        1. We would like ALL involved queries to be logged so we know what was attempting to be done on all sides of the deadlock
      2. Lots of point-in-time reference data like transaction ID, thread ID, etc… but nothing like a full GTID or binary log position that would make it easier to identify the problematic binlog segment
        1. Not as important for stand-alone InnoDB nodes
        2. Very important for replicas- particularly replicas using optimistic parallel replication
      3. May also be worth looking into an option to expand physical record references into user-readable data
        1. Would obviously need to be a variable that is off by default
        2. Would greatly improve the ability to understand exactly what records in affected table(s) are deadlocking

      SAMU-124 logging of slave_transaction_retries including deadlock output

      Same, base insight as SAMU-123, but this would enable the ability to specify a unique log to write ROLLBACK and retry events to. Ideally this would include-

      Timestamp per event

      Each event should clarify what parallel thread ID it comes from (so all events from the same thread can be grep'd to see the serial stream of events for that thread)

      Current value of the SLAVE_TRANSACTION_RETRIES counter value for the parallel thread this event occurred for

      Depending on what triggered the event, the following should be logged-

      Deadlock detected- emit text stating deadlock detected, then clarify AT LEAST the GTIDs of the transactions involved in the deadlock (more ideal would be to also output the raw queries those GTIDs correspond to)

      Rollback performed- emit text clarifying that a rollback was performed successfully (include AT LEAST the GTID of the transaction that was rolled-back; more ideal would be to also output the raw query that GTID corresponds to)

      Rollback failed- emit text clarifying that a rollback failed (and why, if possible) (include AT LEAST the GTID of the transaction that was rolled-back; more ideal would be to also output the raw query that GTID corresponds to)

      Reapply succeeded- emit text clarifying that a reapply attempt succeeded (include AT LEAST the GTID of the transaction that was reapplied; more ideal would be to also output the raw query that GTID corresponds to)

      Reapply failed- emit text clarifying that a reapply failed (and why, if possible) (include AT LEAST the GTID of the transaction that was reapplied; more ideal would be to also output the raw query that GTID corresponds to)

      At least some of the above data should already be going to MariaDB’s error log, but pushing this to a dedicated log with high verbosity should let ServiceNow collect this log via splunk and improve the ability to parse and detect problems as well as to serialize events to reconstruct a 1205's specific events/root cause.

      Attachments

        Issue Links

          Activity

            People

              serg Sergei Golubchik
              rob.schwyzer@mariadb.com Rob Schwyzer
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.