  MDEV-32049 (MariaDB Server)

Deadlock due to log_free_check(), involving trx_purge_truncate_rseg_history() and trx_undo_assign_low()

Details

    Description

      Hi,
      A week ago, our production database cluster (1 master, 4 replicas, MaxScale as proxy) started deadlocking on the master approximately every 12 hours. We are still looking for the trigger, so far without success: there is no obviously problematic query in PROCESSLIST, nothing.

      Finally, today we were able to get a usable core dump using the quay.io/mariadb-foundation/mariadb-debug:10.6 image. The exact version is 10.6.16-MariaDB-1:10.6.16+maria~ubu2004-log, source revision 07494006dd0887ebfb31564a8fd4c59cf1b299e9; the exact image is docker.io/library/mariadb@sha256:fcbe381e5fef20c7a2932b52a070f58987b770c651aedf705332e54d1dfd465f

      SELECTs seem to run OK, but DML queries are blocked, some in "opening table" and some in "sending data".

      I'm attaching the server log and the full backtrace. I also have the core dump, but it is 700 MB bzipped, so I'm not attaching it; it is available, though.


          Activity

            cuchac Cuchac added a comment -

            Great work everybody! 2 days without deadlock, yay

            danblack Daniel Black added a comment - edited

            Thanks for the feedback. Great to have something clear and actionable.

            As this is the first bug report using quay.io/mariadb-foundation/mariadb-debug, what were some of the good and bad points about it, and what would you (request me to) improve?

            cuchac Cuchac added a comment -

            Hello,

            it is great that such an image exists. I think we would not have been able to produce a working backtrace without it.

            I had some issues that could be fixed by improving the documentation:
            1) It was quite hard to find this image. I think https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ and/or https://mariadb.com/kb/en/enabling-core-dumps/ should mention the image in their containers section.

            2) We failed to produce a core dump several times. We did not realize that kernel.core_pattern is not namespaced, and our k8s cluster provider had to change its value from the Ubuntu default (apport) to a plain file name. It is quite weird that a file path in kernel.core_pattern is resolved inside the container, while a "pipe" handler (as in the Ubuntu apport default) runs in the root namespace.
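            A minimal sketch, assuming a Linux host, of how one might check which kind of kernel.core_pattern is in effect before trying to capture a core dump in a container. The function names are hypothetical; the only real interface used is the /proc/sys/kernel/core_pattern file.

```python
from pathlib import Path


def core_pattern_kind(pattern: str) -> str:
    """Classify a kernel.core_pattern value as "pipe" or "file"."""
    # A leading '|' tells the kernel to pipe the core to a helper program
    # (Ubuntu's default hands it to apport). The helper runs in the host's
    # root namespace, so the core does not end up inside the container.
    if pattern.strip().startswith("|"):
        return "pipe"
    # Anything else is a file name template (e.g. "core.%e.%p"), which the
    # kernel writes from the crashing process's point of view, i.e. inside
    # the container's filesystem.
    return "file"


def read_core_pattern(path: str = "/proc/sys/kernel/core_pattern") -> str:
    # kernel.core_pattern is a single host-wide setting, not per container,
    # so reading it inside a container normally shows the host's value.
    return Path(path).read_text()


if __name__ == "__main__":
    try:
        value = read_core_pattern()
    except OSError as exc:  # e.g. not running on Linux
        print(f"cannot read core_pattern: {exc}")
    else:
        print(f"{value.strip()!r} -> {core_pattern_kind(value)}")
```

            A "pipe" result means cores are handed to a host-side program rather than written into the container, which matches the failure described above; switching to a plain file name is what made the dump appear.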

            danblack Daniel Black added a comment -

            cuchac Thanks for the feedback. Documented.


            marko Marko Mäkelä added a comment -

            A high-level description of this bug: InnoDB could hang when removing the history of committed transactions. As far as I can tell, this bug can occur independently of any configuration parameters. All that is needed is some transactions that modify non-temporary InnoDB tables, and sufficiently bad luck. A smaller setting of innodb_log_file_size could increase the probability of hitting this hang.
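            To make the word "deadlock" in the title concrete, here is a minimal Python sketch of a wait-for graph and a cycle check over it. The edges in SCENARIO are a hypothetical illustration of how a thread that blocks in log_free_check() while holding a latch could close a cycle with trx_undo_assign_low(); this page does not say which latches were actually involved, so every edge is an assumption, not the real MDEV-32049 analysis.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for edges ("X": "Y" means X is blocked waiting on Y).
# NOT taken from the actual bug analysis; names are illustrative only.
SCENARIO = {
    "purge": "checkpoint",  # purge, in trx_purge_truncate_rseg_history(), blocks in log_free_check()
    "checkpoint": "flush",  # the checkpoint cannot advance until old dirty pages are written
    "flush": "dml",         # hypothetically, writing a page needs a latch held by a DML thread
    "dml": "purge",         # hypothetically, DML in trx_undo_assign_low() waits for a latch held by purge
}


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    Each node waits on at most one other node, so following edges from any
    start either reaches a node that waits on nothing (no deadlock on that
    path) or revisits a node, and the revisited part is a cycle.
    """
    for start in waits_for:
        path: List[str] = []
        seen: Dict[str, int] = {}  # node -> its index in path
        node: Optional[str] = start
        while node is not None and node not in seen:
            seen[node] = len(path)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[seen[node]:]
    return None


if __name__ == "__main__":
    print(find_cycle(SCENARIO))  # ['purge', 'checkpoint', 'flush', 'dml']
```

            Because every participant in the cycle waits on the next one, none can make progress, which is consistent with the symptom in the report: DML stalls while plain SELECTs, which need none of these waits, keep running.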

            People

              marko Marko Mäkelä
              cuchac Cuchac
              Votes: 1
              Watchers: 6
