[MDEV-32049] Deadlock due to log_free_check(), involving trx_purge_truncate_rseg_history() and trx_undo_assign_low()

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.20, 10.6.13, 10.6.15
Fix Version/s: 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2, 11.3.0
Component/s: Storage Engine - InnoDB
Labels:
- crash
- deadlock
- innodb
- regression
- semaphore
Environment:
K8s, amd64, occurs both in official mariadb:10.6.15 and quay.io/mariadb-foundation/mariadb-debug:10.6

Description

Hi,
A week ago the master of our production database cluster (1 master, 4 replicas, MaxScale as proxy) started to deadlock approximately every 12 hours. We are still looking for the trigger, without any success. There is no obviously problematic query in PROCESSLIST, nothing.

Finally, today we were able to get a decent core dump using the quay.io/mariadb-foundation/mariadb-debug:10.6 image. The exact version is 10.6.16-MariaDB-1:10.6.16+maria~ubu2004-log, source revision 07494006dd0887ebfb31564a8fd4c59cf1b299e9; the exact image version is docker.io/library/mariadb@sha256:fcbe381e5fef20c7a2932b52a070f58987b770c651aedf705332e54d1dfd465f

SELECTs seem to be running OK; DML queries are blocked, some in "opening table" and some in "sending data".

I'm attaching the server log and the full backtrace. I also have the core dump, but it is 700 MB bzipped, so I am not attaching it; it is available.

Attachments

- mariadb_deadlock.log (2023-08-31 07:29, 6 kB, Cuchac)
- mariadbd_full_bt_all_threads.txt (2023-08-31 07:29, 1.34 MB, Cuchac)

Issue Links

is caused by

MDEV-30753 Possible corruption due to trx_purge_free_segment()

Closed

Activity

12 older comments are not shown.

Cuchac added a comment - 2023-09-03 17:39

Great work everybody! 2 days without deadlock, yay

Daniel Black added a comment - 2023-09-04 03:34 - edited

Thanks for the feedback. Great to have something clear and actionable.

As this is the first bug report using quay.io/mariadb-foundation/mariadb-debug, what were some of the good and bad points about it, and what would you (request me to) improve?

Cuchac added a comment - 2023-09-05 20:24

Hello,

it is great that such an image exists. I think we would not have been able to produce a working backtrace without it.

I had some issues that can be fixed by improving the documentation:
1) It was quite hard to find this image. I think https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ and/or https://mariadb.com/kb/en/enabling-core-dumps/ should mention the image in their containers section.

2) We failed to produce a core dump several times. We did not realize that kernel.core_pattern is not namespaced, and our k8s cluster provider had to change the value from the default Ubuntu value (apport) to a plain file name. It is quite weird that kernel.core_pattern is interpreted inside the container when a file path is specified, but in the root namespace when the "pipe" format is used (as is the default with Ubuntu's apport).
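
The pipe-versus-file distinction described in point 2 can be checked before waiting for the next hang. The following is a minimal sketch, not part of the original comment: the function names are hypothetical, and the only rule it relies on is the one documented in core(5), namely that a pattern whose first character is `|` is handed to a helper program instead of being written as a file.

```python
from pathlib import Path

# Standard procfs location of the host-wide, non-namespaced setting.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def core_pattern_kind(pattern: str) -> str:
    """Classify a kernel.core_pattern value as 'pipe' or 'file'.

    Per core(5), a leading '|' means the rest of the value is a program
    that the kernel runs and feeds the core dump on standard input.
    Anything else is treated as a file name template.
    """
    return "pipe" if pattern.startswith("|") else "file"


def read_core_pattern(path: Path = CORE_PATTERN_PATH) -> str:
    """Return the current pattern without its trailing newline (Linux only)."""
    return path.read_text().rstrip("\n")


def describe(pattern: str) -> str:
    """Produce a one-line, human-readable summary (hypothetical helper)."""
    if core_pattern_kind(pattern) == "pipe":
        return f"pipe handler: {pattern[1:].strip()}"
    return f"file template: {pattern}"
```

Calling `describe(read_core_pattern())` from inside the container reports which mode is active. A value such as `|/usr/share/apport/apport` (shown here without its arguments, purely as an illustration) reports a pipe handler, which is the situation the reporter had to ask the cluster provider to change.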

Daniel Black added a comment - 2023-09-22 07:48

cuchac Thanks for the feedback. Documented.

Marko Mäkelä added a comment - 2023-11-08 15:39

A high-level description of this bug is that InnoDB could hang when removing the history of committed transactions. As far as I can tell, this bug can occur independently of any configuration parameters. All that is needed is some transactions that modify non-temporary InnoDB tables, and sufficiently bad luck. A smaller setting of innodb_log_file_size could increase the probability of hitting this hang.
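
Since no configuration setting avoids the hang, the practical check is whether a server release already contains the fix. The following is a rough sketch that only encodes the "Fix Version/s" field of this ticket; the function names are hypothetical, and release series not listed in that field are reported as unknown rather than guessed.

```python
import re
from typing import Optional, Tuple

# First fixed release per series, copied from the ticket's "Fix Version/s" field.
FIRST_FIXED = {
    (10, 5): (10, 5, 23),
    (10, 6): (10, 6, 16),
    (10, 10): (10, 10, 7),
    (10, 11): (10, 11, 6),
    (11, 0): (11, 0, 4),
    (11, 1): (11, 1, 3),
    (11, 2): (11, 2, 2),
    (11, 3): (11, 3, 0),
}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '10.6.15-MariaDB'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a MariaDB version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def has_fix(version_text: str) -> Optional[bool]:
    """Return True or False for listed series, None if the series is not listed."""
    version = parse_version(version_text)
    first_fixed = FIRST_FIXED.get(version[:2])
    if first_fixed is None:
        return None  # the ticket says nothing about this series
    return version >= first_fixed
```

For example, `has_fix("10.6.15")` returns `False` and `has_fix("10.5.23")` returns `True`. The version number alone is not conclusive for development builds: the reporter's debug image already identified itself as 10.6.16 when it produced the core dump for this report.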

People

Assignee: Marko Mäkelä

Reporter: Cuchac

Votes: 1

Watchers: 6

Dates

Created: 2023-08-31 07:32

Updated: 2023-11-08 15:39

Resolved: 2023-08-31 11:00
