  1. MariaDB Server
  2. MDEV-29346

update_rows_log_event hung causing galera cluster failure

Details

    Description

      We run multiple Galera clusters in a multi-master setup, and we noticed that a "sleeping" system thread can hang the whole cluster.

      When this system thread hangs as shown in the screenshot, the whole Galera cluster comes to a standstill. Nothing can be written to the database.

      We have a log that prints "wsrep_last_committed"; it shows that wsrep_last_committed on one of the nodes is not advancing. Did the wsrep plugin in Galera hang?

      The h5 server is the one that is stuck. mysql.err contains no stack trace.

      2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h4: 
      2022-08-18 06:10:04,861 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "150", "wsrep_last_committed": "21383020", 
      2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h5: 
      2022-08-18 06:10:04,862 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "590", "wsrep_last_committed": "21382990", 
      2022-08-18 06:10:04,863 INFO galera_alert line:93 galerastats on node xxx-h6: 
      2022-08-18 06:10:04,863 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "204", "wsrep_last_committed": "21383020", 
      ....
      ....
      2022-08-18 06:30:04,996 INFO galera_alert line:93 galerastats on node xxx-h4: 
      2022-08-18 06:30:04,996 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "170", "wsrep_last_committed": "21383020",
      2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h5: 
      2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "643", "wsrep_last_committed": "21382990", 
      2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h6: 
      2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "228", "wsrep_last_committed": "21383020", 
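
The comparison made by eye in the log above can be sketched in Python. This is only an illustration, not part of our galera_alert script: the function name `find_stalled_nodes` and the dictionary layout are assumptions, and the values are copied from the 06:10 and 06:30 samples. A node is flagged when its wsrep_last_committed did not change between two samples and is behind the highest value in the cluster.

```python
from typing import Dict, List


def find_stalled_nodes(earlier: Dict[str, int], later: Dict[str, int]) -> List[str]:
    """Return nodes whose wsrep_last_committed did not advance between two
    samples and is lower than the highest value in the later sample."""
    if not later:
        return []
    cluster_max = max(later.values())
    return sorted(
        node
        for node, seqno in later.items()
        if node in earlier and seqno == earlier[node] and seqno < cluster_max
    )


# Values copied from the 06:10 and 06:30 samples in the log above.
sample_0610 = {"xxx-h4": 21383020, "xxx-h5": 21382990, "xxx-h6": 21383020}
sample_0630 = {"xxx-h4": 21383020, "xxx-h5": 21382990, "xxx-h6": 21383020}

print(find_stalled_nodes(sample_0610, sample_0630))  # prints ['xxx-h5']
```

With these samples only xxx-h5 is reported, because it is the only node sitting below the cluster-wide maximum; h4 and h6 are also not advancing, but they are blocked at the same position rather than lagging behind it.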
      

      The only way to "unbreak" it is to stop the hung node, kill mariadb, and start the mariadb service again.

          Activity

            khaiping.loh Khai Ping created issue -
            jplindst Jan Lindström (Inactive) made changes -
            Field Original Value New Value
            Assignee Jan Lindström [ jplindst ]
            elenst Elena Stepanova made changes -
            Fix Version/s 10.6 [ 24028 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]
            Roel Roel Van de Paar made changes -
            khaiping.loh Khai Ping made changes -
            Attachment mariadbd_full_bt_all_threads.txt [ 68208 ]
            danblack Daniel Black made changes -
            Assignee Jan Lindström [ jplindst ] Julius Goryavsky [ sysprg ]
            danblack Daniel Black made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            khaiping.loh Khai Ping made changes -
            sysprg Julius Goryavsky made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            JIraAutomate JiraAutomate made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            sysprg Julius Goryavsky made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            sysprg Julius Goryavsky made changes -
            sysprg Julius Goryavsky made changes -
            Assignee Julius Goryavsky [ sysprg ] Seppo Jaakola [ seppo ]
            janlindstrom Jan Lindström made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ JIRAUSER53125 ]
            janlindstrom Jan Lindström made changes -
            Status Stalled [ 10000 ] Needs Feedback [ 10501 ]
            khaiping.loh Khai Ping made changes -
            Attachment mariadb stacktrace.zip [ 73413 ]
            khaiping.loh Khai Ping made changes -
            Attachment mariadb stacktrace.zip [ 73413 ]
            janlindstrom Jan Lindström made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            janlindstrom Jan Lindström made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            janlindstrom Jan Lindström made changes -
            Fix Version/s 10.6.15 [ 29013 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            janlindstrom Jan Lindström made changes -
            janlindstrom Jan Lindström made changes -
            Fix Version/s 11.3.2 [ 29522 ]
            Fix Version/s 11.2.3 [ 29521 ]
            Fix Version/s 11.1.4 [ 29024 ]
            Fix Version/s 10.11.7 [ 29519 ]
            Fix Version/s 10.10.7 [ 29018 ]
            Fix Version/s 10.9.8 [ 29015 ]

            People

              janlindstrom Jan Lindström
              khaiping.loh Khai Ping
              Votes: 1
              Watchers: 9
