MariaDB Server / MDEV-29346

update_rows_log_event hung causing galera cluster failure


Details

    Description

      We run multiple Galera clusters in a multi-master setup, and we noticed that a "sleeping" system thread can hang the whole cluster.

      When this system thread hangs, as shown in the screenshot, the whole Galera cluster comes to a standstill. Nothing can be written to the database.

      We have a log that prints "wsrep_last_committed". It shows that wsrep_last_committed on one of the nodes is not advancing. Did the wsrep plugin in Galera hang?

      The h5 server is the one that is stuck. Nothing in mysql.err shows a stack trace.

      2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h4: 
      2022-08-18 06:10:04,861 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "150", "wsrep_last_committed": "21383020", 
      2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h5: 
      2022-08-18 06:10:04,862 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "590", "wsrep_last_committed": "21382990", 
      2022-08-18 06:10:04,863 INFO galera_alert line:93 galerastats on node xxx-h6: 
      2022-08-18 06:10:04,863 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "204", "wsrep_last_committed": "21383020", 
      ....
      ....
      2022-08-18 06:30:04,996 INFO galera_alert line:93 galerastats on node xxx-h4: 
      2022-08-18 06:30:04,996 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "170", "wsrep_last_committed": "21383020",
      2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h5: 
      2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "643", "wsrep_last_committed": "21382990", 
      2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h6: 
      2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "228", "wsrep_last_committed": "21383020", 
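
      The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual galera_alert code: the function name `find_stalled_nodes` and the `(timestamp, node, wsrep_last_committed)` tuple layout are hypothetical, and the sample values are copied from the log lines above. A node is reported when its value did not advance between samples and is also behind the highest value seen in the cluster.

```python
from collections import defaultdict


def find_stalled_nodes(samples):
    """Return nodes whose wsrep_last_committed did not advance
    and is lower than the highest value seen in the cluster.

    `samples` is a list of (timestamp, node, seqno) tuples; this layout
    is an assumption made for the sketch, not a galera_alert format.
    """
    history = defaultdict(list)
    # Sorting by timestamp keeps each node's values in chronological order.
    for _timestamp, node, seqno in sorted(samples):
        history[node].append(seqno)

    latest = {node: seqnos[-1] for node, seqnos in history.items()}
    cluster_max = max(latest.values())

    stalled = []
    for node, seqnos in history.items():
        unchanged = len(seqnos) > 1 and seqnos[0] == seqnos[-1]
        if unchanged and latest[node] < cluster_max:
            stalled.append(node)
    return sorted(stalled)


# Values taken from the 06:10 and 06:30 log lines above.
samples = [
    ("2022-08-18 06:10:04", "xxx-h4", 21383020),
    ("2022-08-18 06:10:04", "xxx-h5", 21382990),
    ("2022-08-18 06:10:04", "xxx-h6", 21383020),
    ("2022-08-18 06:30:04", "xxx-h4", 21383020),
    ("2022-08-18 06:30:04", "xxx-h5", 21382990),
    ("2022-08-18 06:30:04", "xxx-h6", 21383020),
]

print(find_stalled_nodes(samples))  # ['xxx-h5']
```

      Note that h4 and h6 are also not advancing during this window, because the whole cluster is blocked. The sketch singles out h5 only because it is the node that is behind the others.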
      

      The only way to "unbreak" the cluster is to stop the hung node, kill mariadb, and start the mariadb service.

            People

              janlindstrom Jan Lindström
              khaiping.loh Khai Ping
              Votes: 1
              Watchers: 9
