Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25368

Galera cluster hangs on Freeing items

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.4.18, 10.5.9, 10.5.11
    • 10.4.22
    • Galera
    • None

    Description

      Hi All, I have a mariadb 10.4.18 galera cluster with 5 node. Randomly on one node hangs a request with "Freeing items" state and "Busy" command. It stops all the write queries on all nodes forever. I have to kill mariadb on the hanging node, because I can't stop it normal way. After kill, other nodes start working correctly. After restart full SST required. Error log is empty. Query cache is off. my.cnf attached.

      Process list:
      ----------------------------------------------------------
      ID: 19717900
      USER: xxx
      HOST: xxx:44464
      DB:
      COMMAND: Busy
      TIME: 28
      STATE: Freeing items
      INFO:
      TIME_MS: 28497.852
      STAGE: 0
      MAX_STAGE: 0
      PROGRESS: 0.000
      MEMORY_USED: 82232
      MAX_MEMORY_USED: 178320
      EXAMINED_ROWS: 1
      QUERY_ID: 1328408265
      INFO_BINARY:
      TID: 36964
      ----------------------------------------------------------
      ID: 23
      USER: system user
      HOST:
      DB:
      COMMAND: Busy
      TIME: 28
      STATE: committing
      INFO:
      TIME_MS: 28496.583
      STAGE: 0
      MAX_STAGE: 0
      PROGRESS: 0.000
      MEMORY_USED: 65136
      MAX_MEMORY_USED: 81584
      EXAMINED_ROWS: 0
      QUERY_ID: 1328408270
      INFO_BINARY:
      TID: 12877
      ----------------------------------------------------------

      Please help.

      Attachments

        1. Command counters 10.5.9 vs 10.5.6.png
          Command counters 10.5.9 vs 10.5.6.png
          171 kB
        2. gdb2.log
          2.19 MB
        3. my.cnf
          2 kB

        Issue Links

          Activity

            mattlf Matt Le Fevre added a comment -

            We have (anecdotally) heard from other users that have upgraded to later versions that either a similar issue exists, or a different issue was present that caused them to rollback to 10.5.6 again.

            It would be fantastic to hear definitively if this issue is actually resolved/the usual high stability has been restored, we have also been very nervous about upgrading from 10.5.6 since my original comment back in April 2021 :/

            mattlf Matt Le Fevre added a comment - We have (anecdotally) heard from other users that have upgraded to later versions that either a similar issue exists, or a different issue was present that caused them to rollback to 10.5.6 again. It would be fantastic to hear definitively if this issue is actually resolved/the usual high stability has been restored, we have also been very nervous about upgrading from 10.5.6 since my original comment back in April 2021 :/
            stephanvos Stephan Vos added a comment -

            This was supposedly fixed in 10.5.13: MDEV-25114
            Can anyone confirm that this is the case?
            It is not realistic being stuck on 10.5.6 or the 10.6 equivalent.

            stephanvos Stephan Vos added a comment - This was supposedly fixed in 10.5.13: MDEV-25114 Can anyone confirm that this is the case? It is not realistic being stuck on 10.5.6 or the 10.6 equivalent.

            I can confirm this was NOT the case on my side for 10.5.15

            Unfortunately I cannot contribute more to this, I do not have access anymore to my related case.

            Stedounet La Cancellera Yoann added a comment - I can confirm this was NOT the case on my side for 10.5.15 Unfortunately I cannot contribute more to this, I do not have access anymore to my related case.
            stephanvos Stephan Vos added a comment -

            Wondering if it could be related to this: https://rewardgateway.engineering/2021/12/11/a-galera-replication-race-condition/
            https://jira.percona.com/browse/PXC-2959

            From trawling all these cluster freeze issues seems more often than not FK's and DELETE or UPDATE involved.

            stephanvos Stephan Vos added a comment - Wondering if it could be related to this: https://rewardgateway.engineering/2021/12/11/a-galera-replication-race-condition/ https://jira.percona.com/browse/PXC-2959 From trawling all these cluster freeze issues seems more often than not FK's and DELETE or UPDATE involved.

            I had a similar lock, see MDEV-29512

            dupondje Jean-Louis Dupond added a comment - I had a similar lock, see MDEV-29512

            People

              seppo Seppo Jaakola
              koczanakos Kóczán Ákos
              Votes:
              17 Vote for this issue
              Watchers:
              28 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.