  1. MariaDB Server
  2. MDEV-23303

deadlocks after upgrade from 10.4.12 to 10.5.4

Details

    Description

      A couple of weeks ago we upgraded our MariaDB + Galera cluster (3 nodes) to the latest stable version, 10.5.4. In our configuration, all writes go to one node only. After the upgrade we started seeing deadlocks (gap locking) on that node. Nothing changed on the database side except the upgrade, and nothing changed on the application side.

      Could you please advise, and let me know what additional information you need for further investigation?

      Thank you.

      Attachments

        1. galera.cnf
          5 kB
        2. galera_error_log_part.log
          274 kB
        3. galera_error.log_2node.gz
          4.17 MB
        4. galera_error.log_3node.gz
          4.88 MB
        5. screenshot-1.png
          1.23 MB

        Activity

          sun4ezzz Aleksandr Omelchuk added a comment:

          According to the documentation at https://mariadb.com/kb/en/innodb-lock-modes:

          Gap locks are disabled if the innodb_locks_unsafe_for_binlog system variable is set, or the isolation level is set to READ COMMITTED.

          We use the READ COMMITTED isolation level and, as expected, we do not see gap locking in 10.4. But something goes wrong in 10.5.
          Please take a look.
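One way to check whether the reported deadlocks really involve gap locks is to look at the lock descriptions in the LATEST DETECTED DEADLOCK section of the SHOW ENGINE INNODB STATUS output. The following is only a minimal sketch and not part of the original report: the function names `classify_lock_line` and `count_lock_kinds` and the sample lines are hypothetical, and the matched phrases assume the usual InnoDB wording ("locks gap before rec", "locks rec but not gap", "insert intention").

```python
from collections import Counter

# Hypothetical sample lines, shaped like the lock descriptions InnoDB
# prints in its status output; not taken from the attached logs.
SAMPLE = """\
RECORD LOCKS space id 5 page no 3 n bits 72 index PRIMARY of table `test`.`t` trx id 10 lock_mode X locks rec but not gap
RECORD LOCKS space id 5 page no 3 n bits 72 index PRIMARY of table `test`.`t` trx id 11 lock_mode X locks gap before rec insert intention waiting
TABLE LOCK table `test`.`t` trx id 10 lock mode IX
"""


def classify_lock_line(line):
    """Return the kind of record lock described by one status line.

    Returns None for lines that do not describe a record lock.
    """
    text = line.strip()
    if not text.startswith("RECORD LOCKS"):
        return None
    # Check insert intention first: such lines may also mention the gap.
    if "insert intention" in text:
        return "insert-intention"
    if "locks gap before rec" in text:
        return "gap"
    if "locks rec but not gap" in text:
        return "record-only"
    # No qualifier: assumed to be a next-key lock (record plus gap).
    return "next-key"


def count_lock_kinds(status_text):
    """Count record-lock kinds across all lines of a status dump."""
    counts = Counter()
    for line in status_text.splitlines():
        kind = classify_lock_line(line)
        if kind is not None:
            counts[kind] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_lock_kinds(SAMPLE))
```

If a dump taken on 10.5 under READ COMMITTED shows "gap" or "next-key" entries where 10.4 shows only "record-only" ones, that would support the claim above. Insert-intention and foreign-key related gap locks can appear even under READ COMMITTED, so such a result would be a hint rather than proof.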

          jplindst Jan Lindström (Inactive) added a comment:

          Can you reproduce the problematic deadlock in 10.5 easily? If so, can you share a repeatable test case?

          sun4ezzz Aleksandr Omelchuk added a comment:

          We have downgraded all our environments to 10.4, so currently we cannot reproduce the issue.

          jplindst Jan Lindström (Inactive) added a comment:

          Galera had a bug, https://jira.mariadb.org/browse/MDEV-23557, that could also have an effect here: the code used a persistent B-tree cursor after a mini-transaction commit, at which point the page contents could have changed (e.g. through a page split or merge). This would naturally be visible only with high concurrency and a lot of foreign key actions.

          jplindst Jan Lindström (Inactive) added a comment:

          I recommend using a more recent version of the server. If the problem is still reproducible, I would need a more detailed description of how to repeat it.

          People

            jplindst Jan Lindström (Inactive)
            sun4ezzz Aleksandr Omelchuk
            Votes: 0
            Watchers: 4
