MariaDB Server / MDEV-27130

Node of Galera Cluster with 5 members freezes after directing traffic to it

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.5.13
    • Fix Version/s: N/A
    • None
    • Environment: MariaDB 10.5.13, Galera provider 26.4.10, FreeBSD 12 and 13, ZFS storage, 1500 GB RAM, 64 or 96 cores

    Description

      Hello all,

      We updated our MariaDB Galera Cluster to 10.5.13 last week. Since then we have been facing the following issue every time we try to switch the "master" node.

      When we switch the traffic from one node to another while the service is under medium load (15,000-20,000 q/s), the new node freezes in a brutal way.

      The wsrep status still shows it as a normal member of the cluster (Synced, with 5 IPs listed), but the other members exclude it from the quorum.
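
      For reference, a minimal sketch of how a node's own view of the cluster can be checked from the shell (this assumes a local mysql client with sufficient privileges; the status variable names are the standard Galera ones):

          # Ask the node how it sees itself and the rest of the cluster
          mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
                    ('wsrep_ready','wsrep_local_state_comment','wsrep_cluster_status',
                     'wsrep_cluster_size','wsrep_incoming_addresses')"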

      The log fills up, in an infinite loop, with messages like the following:

      InnoDB: WSREP: BF lock wait long for trx:11255701331 query: INSERT INTO

      The same 7-10 unique INSERTs are repeated over and over in the log.
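
      A hedged sketch (not from the report itself) of how the transactions behind these repeated "BF lock wait long" messages could be inspected while the node is stuck, assuming the server still answers on its local socket:

          # List the running InnoDB transactions and what they are executing
          mysql -e "SELECT trx_id, trx_state, trx_started, LEFT(trx_query, 80) AS query_head
                    FROM information_schema.INNODB_TRX ORDER BY trx_started"

          # Full lock and semaphore picture straight from InnoDB
          mysql -e "SHOW ENGINE INNODB STATUS\G"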

      The whole cluster freezes until we shut down mysqld on the "bad" node with a regular service shutdown: /usr/local/etc/rc.d/mysql-server onestop.

      When we try to stop the node with the regular shutdown procedure, the node is excluded from the cluster and service operations continue as normal. However, mysqld keeps printing the queries noted above in an infinite loop. The only way to stop the mysqld process on the "bad" node is kill -9.
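
      A rough sketch of that shutdown sequence (the rc.d path is taken from this report; the pgrep/kill part only spells out the forced kill mentioned above, and the process name is assumed to be mysqld):

          # Regular service stop on the "bad" node
          /usr/local/etc/rc.d/mysql-server onestop

          # If mysqld survives and only keeps looping on the same INSERTs,
          # the only thing left is a hard kill
          pgrep mysqld && kill -9 $(pgrep mysqld)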

      The specific thing here is that the query (an INSERT) and the target table are always the same. We have a massive INSERT load on this table and one daemon that processes the data in the background. If we stop the daemon before switching nodes, there are no issues.

      Cheers
      Rumen

      Attachments

        Activity

          Hello there,

          We have the same issue on a 3-node MariaDB Galera Cluster running 10.4.24 on Debian 10.

          The SQL workload is fully load balanced between the 3 nodes; the average metrics during a workday are, on each node:

          • selects: ~4000/s
          • updates: ~200/s
          • inserts: ~80/s
          • deletes: < 1/s

          The issue mostly happens with 2 or 3 INSERT transactions on the same table (with different values). The table has about 43M rows, with 31 GB of data and 20 GB of indexes.

          When it happens, a graceful stop of the "bad" node does unblock the whole cluster, but it always takes a SIGKILL before the service can be restarted. Usually the restarted node syncs with IST; sometimes it needs an SST (even though the configured 1 GB gcache should cover about 3 hours of downtime...).
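
          As a side note (not part of the original comment), the gcache size that decides between IST and SST can be double-checked on a running node roughly like this:

              # gcache.size is embedded in the provider options string
              mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | tr ';' '\n' | grep gcache.size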

          We keep track of each incident with a full processlist in JSON format plus the mysqld log (with the "WSREP: BF lock wait long for" messages). How can we privately send you this information, which might be useful for troubleshooting the issue?
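
          A minimal sketch of how such an incident snapshot could be captured (plain \G output rather than JSON here; the file names and error-log path are illustrative, not from the comment):

              # Full processlist at the moment of the hang
              mysql -e "SELECT * FROM information_schema.PROCESSLIST\G" > processlist_$(date +%s).txt

              # Matching server-side messages (adjust the log path to the local setup)
              grep 'BF lock wait long' /var/log/mysql/error.log > bf_lock_wait_$(date +%s).txt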

          mathieucausero Mathieu CAUSERO added a comment

          Hi Mathieu CAUSERO,

          A backtrace of all threads, taken with gdb while the node hangs, would be a good start. I think it can be shared in public, as it does not contain any really sensitive information (please review it and check before attaching if in doubt).

          valerii Valerii Kravchuk added a comment
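
          A hedged sketch of how such a backtrace is typically collected (this assumes gdb and debug symbols are installed, and that the server process is named mysqld as in this report; on newer packages it may be mariadbd):

              # Attach to the hanging server and dump a backtrace of every thread
              gdb -p "$(pgrep -n mysqld)" -batch -ex "thread apply all bt full" > all_threads_backtrace.txt 2>&1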

          mathieucausero I have a few questions. Do you have a primary key on all of the affected tables? Do you have foreign keys between the affected tables? Do you use an auto-increment column as the primary key, and if you do, do you set a different offset for every node?

          jplindst Jan Lindström (Inactive) added a comment
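
          For the auto-increment part of the question, the relevant settings could be compared across nodes roughly like this (when wsrep_auto_increment_control is ON, the increment and offset are managed automatically):

              # Run on every node and compare the values
              mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN
                        ('auto_increment_increment','auto_increment_offset','wsrep_auto_increment_control')"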
          mathieucausero Mathieu CAUSERO added a comment - edited

          @Jan Lindström, to answer your questions:

          • primary key: the table mainly involved in these freezes has a composite primary key (for an unknown reason) -> PRIMARY KEY (`IDappels`,`identifiantProprietaire`), roughly as sketched below
          • no foreign keys on these tables
          • auto-increment is set on the primary key (in the example above, on IDappels)

          From what I see in our customer's database, every impacted table (i.e. every table that has had queries involved in a cluster deadlock) is built like this.

          Would that composite PK be an issue?

          To add a bit more context, the most impacted table holds 68M rows with 36 GB of data. It takes dozens of INSERTs per second from the 3 Galera nodes (all of them are used as writers). The servers are all bare-metal with 128 GB of RAM (the InnoDB buffer pool is larger than data + indexes), NVMe storage, and a 2 Gbit/s private network with MTU 9000.
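
          Purely for illustration, a schematic of a table with that key layout (the table name, column types, and database name are hypothetical; only the auto-increment first column, the composite primary key, and the absence of foreign keys come from the comment):

              # 'test' is a placeholder database name; only the key layout mirrors the comment
              mysql -D test -e "
                CREATE TABLE appels_sketch (
                  IDappels BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
                  identifiantProprietaire INT UNSIGNED NOT NULL,
                  payload VARCHAR(255),
                  PRIMARY KEY (IDappels, identifiantProprietaire)
                ) ENGINE=InnoDB"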


          There have been significant fixes to Galera conflict resolution and multi-master execution. I suggest trying a more recent version of MariaDB and of the Galera library.

          janlindstrom Jan Lindström added a comment
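
          If useful, the server and provider versions actually running on a node can be confirmed with something like:

              # MariaDB server version and the loaded Galera provider version
              mysql -e "SELECT VERSION(); SHOW GLOBAL STATUS LIKE 'wsrep_provider_version'"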

          People

            Assignee: Seppo Jaakola (seppo)
            Reporter: Rumen Palov (skarida)
            Votes: 5
            Watchers: 10

