MariaDB Server / MDEV-29346

update_rows_log_event hung causing galera cluster failure

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.6.5
    • Fix Version/s: 10.6
    • Component/s: Galera
    • Labels:
      None
    • Environment:
      3 Node Galera Cluster

      Description

      We run multiple Galera clusters in a multi-master setup, and we noticed that a "sleeping" system thread can hang the whole cluster.

      When this system thread hangs, as shown in the screenshot, the whole Galera cluster comes to a standstill. Nothing can be written to the database.

      We have a log that prints "wsrep_last_committed". It shows that the wsrep_last_committed value of one node is not advancing. Did the wsrep plugin in Galera hang?

      The h5 server is the one that is stuck. Nothing in mysql.err shows a stack trace.

      2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h4: 
      2022-08-18 06:10:04,861 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "150", "wsrep_last_committed": "21383020", 
      2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h5: 
      2022-08-18 06:10:04,862 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "590", "wsrep_last_committed": "21382990", 
      2022-08-18 06:10:04,863 INFO galera_alert line:93 galerastats on node xxx-h6: 
      2022-08-18 06:10:04,863 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "204", "wsrep_last_committed": "21383020", 
      ....
      ....
      2022-08-18 06:30:04,996 INFO galera_alert line:93 galerastats on node xxx-h4: 
      2022-08-18 06:30:04,996 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "170", "wsrep_last_committed": "21383020",
      2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h5: 
      2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "643", "wsrep_last_committed": "21382990", 
      2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h6: 
      2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "228", "wsrep_last_committed": "21383020", 
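
For anyone wanting to automate the check described above, here is a minimal Python sketch that reads log lines in the format quoted in this report and flags a node whose wsrep_last_committed neither advances between samples nor matches the highest value seen in the cluster. The function names (parse_samples, stalled_nodes) and the regular expressions are hypothetical and assume the galera_alert log format stays exactly as shown. This is not part of MariaDB or Galera.

```python
import re
from collections import defaultdict

# Assumption: both patterns are derived only from the log excerpt in this report.
NODE_RE = re.compile(
    r"^(\S+ \S+) INFO galera_alert line:\d+ galerastats on node (\S+):"
)
SEQNO_RE = re.compile(r'"wsrep_last_committed": "(\d+)"')


def parse_samples(lines):
    """Return {node: [(timestamp, seqno), ...]} in the order the log reports them."""
    samples = defaultdict(list)
    pending = None  # (timestamp, node) taken from the latest "galerastats on node" line
    for raw in lines:
        line = raw.strip()
        header = NODE_RE.match(line)
        if header:
            pending = (header.group(1), header.group(2))
            continue
        seqno = SEQNO_RE.search(line)
        if seqno and pending:
            timestamp, node = pending
            samples[node].append((timestamp, int(seqno.group(1))))
            pending = None
    return dict(samples)


def stalled_nodes(samples):
    """Nodes whose seqno is unchanged from first to last sample and below the cluster maximum."""
    latest = {node: s[-1][1] for node, s in samples.items() if s}
    if not latest:
        return []
    cluster_max = max(latest.values())
    return sorted(
        node
        for node, s in samples.items()
        if len(s) >= 2 and s[0][1] == s[-1][1] and s[-1][1] < cluster_max
    )
```

Applied to the twelve log lines above, stalled_nodes(parse_samples(lines)) returns ['xxx-h5']. The h4 and h6 nodes are also unchanged over the 20 minutes, but they sit at the cluster maximum, so only the lagging node is reported.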
      

      The only way to "unbreak" the cluster is to stop the hung node, kill the mariadb process, and start the mariadb service again.

            People

            Assignee:
            jplindst Jan Lindström
            Reporter:
            khaiping.loh Khai Ping