[MDEV-30973] Enhance output of innodb deadlocks (SAMU-124, SAMU-131) Created: 2023-03-30  Updated: 2023-12-22

Status: Stalled
Project: MariaDB Server
Component/s: Replication, Server, Storage Engine - InnoDB
Fix Version/s: 11.5

Type: Task Priority: Major
Reporter: Rob Schwyzer Assignee: Aleksey Midenkov
Resolution: Unresolved Votes: 0
Labels: ServiceNow

Issue Links:
Relates
relates to MDEV-24738 Improve the InnoDB deadlock checker Closed
relates to MDEV-32576 InnoDB deadlock output does not alway... Open

 Description   

Merge corresponding issues from Samurai branch:

SAMU-131 GTID in deadlock output

Currently, when innodb_print_all_deadlocks=on, it outputs a stream of data like-

2023-03-05 21:06:31 139610138924800 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2023-03-05 21:06:31 139610138924800 [Note] InnoDB:
*** (1) TRANSACTION:
TRANSACTION 421484824000288, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 15, OS thread handle 139610138617600, query id 2604440527
2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
 
RECORD LOCKS space id 250069 page no 9966 n bits 976 index document_number of table `REDACTED`.`REDACTED` trx id 421484824000288 lock mode S waiting
Record lock, heap no 903 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 82f519af; asc ;;
 1: len 4; hex 80000d01; asc ;;
 
2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (2)
 
TRANSACTION: TRANSACTION 920212128464, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 6
MySQL thread id 14, OS thread handle 139610138924800, query id 2604440517 update
INSERT INTO REDACTED (`REDACTED`, `REDACTED`, `REDACTED`, `REDACTED`) VALUES (REDACTED, REDACTED, 'REDACTED', REDACTED), (REDACTED, REDACTED, 'REDACTED', REDACTED), ...
2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
 
RECORD LOCKS space id 250069 page no 9966 n bits 976 index document_number of table `REDACTED`.`REDACTED` trx id 920212128464 lock_mode X locks rec but not gap
Record lock, heap no 903 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 82f519af; asc ;;
 1: len 4; hex 80000d01; asc ;;
 
2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** (2)
 
WAITING FOR THIS LOCK TO BE GRANTED
 
RECORD LOCKS space id 250069 page no 9966 n bits 976 index document_number of table `REDACTED`.`REDACTED` trx id 920212128464 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 903 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 82f519af; asc ;;
 1: len 4; hex 80000d01; asc ;;
 
2023-03-05 21:06:31 139610138924800 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)

...

While this is helpful, it omits some key data that would make this a lot more helpful:

  1. Only one transaction’s query is logged
    1. We would like ALL involved queries to be logged so we know what was attempting to be done on all sides of the deadlock
  2. Lots of point-in-time reference data like transaction ID, thread ID, etc… but nothing like a full GTID or binary log position that would make it easier to identify the problematic binlog segment
    1. Not as important for stand-alone InnoDB nodes
    2. Very important for replicas- particularly replicas using optimistic parallel replication
  3. May also be worth looking into an option to expand physical record references into user-readable data
    1. Would obviously need to be a variable that is off by default
    2. Would greatly improve the ability to understand exactly what records in affected table(s) are deadlocking

SAMU-124 logging of slave_transaction_retries including deadlock output

Same, base insight as SAMU-123, but this would enable the ability to specify a unique log to write ROLLBACK and retry events to. Ideally this would include-

Timestamp per event

Each event should clarify what parallel thread ID it comes from (so all events from the same thread can be grep'd to see the serial stream of events for that thread)

Current value of the SLAVE_TRANSACTION_RETRIES counter value for the parallel thread this event occurred for

Depending on what triggered the event, the following should be logged-

Deadlock detected- emit text stating deadlock detected, then clarify AT LEAST the GTIDs of the transactions involved in the deadlock (more ideal would be to also output the raw queries those GTIDs correspond to)

Rollback performed- emit text clarifying that a rollback was performed successfully (include AT LEAST the GTID of the transaction that was rolled-back; more ideal would be to also output the raw query that GTID corresponds to)

Rollback failed- emit text clarifying that a rollback failed (and why, if possible) (include AT LEAST the GTID of the transaction that was rolled-back; more ideal would be to also output the raw query that GTID corresponds to)

Reapply succeeded- emit text clarifying that a reapply attempt succeeded (include AT LEAST the GTID of the transaction that was reapplied; more ideal would be to also output the raw query that GTID corresponds to)

Reapply failed- emit text clarifying that a reapply failed (and why, if possible) (include AT LEAST the GTID of the transaction that was reapplied; more ideal would be to also output the raw query that GTID corresponds to)

At least some of the above data should already be going to MariaDB’s error log, but pushing this to a dedicated log with high verbosity should let ServiceNow collect this log via splunk and improve the ability to parse and detect problems as well as to serialize events to reconstruct a 1205's specific events/root cause.



 Comments   
Comment by Aleksey Midenkov [ 2023-08-17 ]

This ticket implements the following requests:

As per SAMU-124 we need to output deadlock info into multiple destinations. Since marko in c68007d refactored DeadlockChecker into Deadlock namespace now this design does not allow printing into multiple streams. I propose to refactor back namespace into a class, so that second filehandle/stream could be stored for the output. Instead of stderr there should be array of streams where output of deadlock info goes. marko, please approve.

Comment by Marko Mäkelä [ 2023-08-17 ]

The InnoDB deadlock checker was rewritten in MDEV-24783 (MariaDB Server 10.6.0). The output format was changed as part of that as well. The old output would only show 2 transactions, while the new one can identify all transactions that are part of the waits-for loop.

Comment by Marko Mäkelä [ 2023-08-17 ]

Is there any change that is applicable to MariaDB Server 10.6 or later?

I suppose that the GTID is stored somewhere in THD, whose definition is hidden from InnoDB. The replication team should be able to help with that. I do not even know at which point of time the GTID is being assigned. It could be right before transaction commit and therefore unavailable most of the time.

I think that it may be useful to carefully read MDEV-13983, especially my comment about innobase_mysql_prepare_print_arbitrary_thd().

Comment by Aleksey Midenkov [ 2023-08-17 ]

marko Can you please be specific on the stated topic.

Comment by Marko Mäkelä [ 2023-08-18 ]

I found 2 change sets that have been applied to a MariaDB Server 10.4 based source code repository.

The change set labeled SAMU-131 would extend thd_get_error_context_description() in the case that THD::LOCK_thd_data could be acquired without any waiting. (That condition is needed in order to avoid a deadlock like MDEV-13983.) The GTID would only be displayed if the data member THD::rgi_slave has been set. While on the primary, the GTID might be assigned as the transaction is about to be committed, I can imagine that on replicas the GTID would be available right from the beginning.

It might be that this extension could break some third-party tools that could be parsing the output. Therefore, it might be advisable to introduce a configuration parameter for enabling the changed format. That said, I think that it is a reasonable change, to allow better understanding of locking conflicts on replicas. I think that serg needs to review this change, and consider it for inclusion in all supported major versions.

When it comes to SAMU-124, I do not think that I can review the change set in its current form, because it conflicts with the refactoring that was made in MDEV-24783. Could the desired effect be achieved by setting innodb_print_all_deadlocks=ON? Instead of being written to a separate file, the output would then be written to the server error log.

Comment by Aleksey Midenkov [ 2023-08-21 ]

marko did you read this:

As per SAMU-124 we need to output deadlock info into multiple destinations. Since Marko Mäkelä in c68007d refactored DeadlockChecker into Deadlock namespace now this design does not allow printing into multiple streams. I propose to refactor back namespace into a class, so that second filehandle/stream could be stored for the output. Instead of stderr there should be array of streams where output of deadlock info goes. Marko Mäkelä, please approve.

Do you agree/disagree with the proposed changes?

Comment by Sergei Golubchik [ 2023-08-21 ]

I don't think it's a meaningful idea to enable logging of something into an arbitrary number of streams.
deadlock information is written into the error log, why would one want it to be duplicated in the another log?
if someone wants to read and analyze only deadlocks with some tool, it's totally possible to read the error log and ignore non-deadlock lines

Comment by Aleksey Midenkov [ 2023-08-21 ]

serg maybe rob.schwyzer@mariadb.com will better explain this. From my POV it is a) hard to link transaction retry with the deadlock; b) filter-out deadlocks from error log requires complex script (I mean something more than simple grep).

serg Logging some aspect into separate file is a powerful feature against constant invention of grep expressions and running grep --line-buffered. I had such experience with another software and it definitely has meaning.

Comment by Sergei Golubchik [ 2023-08-22 ]

midenok, rob.schwyzer@mariadb.com, my comment was about not logging the same information into two log files.

If deadlocks are difficult to filter out from the error log, they could be printed in a form that makes grepping easy.

If extensive deadlock information clutters error log and makes it difficult to see anything else there — well, perhaps deadlocks should go into a separate log file indeed. Instead of the error log, not in addition to it.

Comment by Rob Schwyzer [ 2023-08-22 ]

serg, gotcha, and I agree. Again, my one recommendation would be to leave an option so they can still go into the error log if wanted, as many containerized setups in particular only expose one log file from a service like MariaDB. But I fully agree with the default mode of consumption being to put deadlocks into a separate log file. And I also agree with it being a binary choice- so if a separate log file is the default, then if a variable can switch that to push to the error log instead, it would only go to the error log- not the separate deadlock log. In short, I agree with the content only going to one log file rather than multiple.

Comment by Aleksey Midenkov [ 2023-08-24 ]

containerized setups in particular only expose one log file from a service like MariaDB

Good point. Another reason for deadlocks in error.log might be analysing them in relation to other logging that happens in InnoDB and that can be written only to stderr to my best knowledge.

Since now multi-destination output is even harder to implement I can stay with single stream option which defaults to stderr.

Comment by Aleksey Midenkov [ 2023-08-29 ]

I came to conclusion that coupling of retry message and deadlock message is too complex, requires some THD API extension (there is no TABLE at time of retry logging). SNOW version has it, but in CS this should be added by additional request.

Comment by Aleksey Midenkov [ 2023-09-01 ]

Updated bb-11.0-midenok on top of 11.3

Comment by Marko Mäkelä [ 2023-09-15 ]

I think that we need a branch where the regression test suite passes. The InnoDB changes are only a small part of this; there is more code outside InnoDB.

Comment by Aleksey Midenkov [ 2023-10-12 ]

Please review bb-11.0-midenok

Comment by Marko Mäkelä [ 2023-10-25 ]

Very many ./mtr --suite=encryption tests fail due to

mysqltest: Can't initialize MariaDB server

Generated at Thu Feb 08 10:20:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.