MariaDB Server / MDEV-33509

Failed to apply write set with flags = (rollback | pa_unsafe)


Details

    Description

      Message from the customer:

      Description: Hello,
      One of our heavily loaded 3-node Galera clusters ran into an inconsistency issue: 2 of its 3 nodes became inconsistent.
       
      MariaDB CS 10.6.16, Galera provider 26.4.16.
       
      The nodes have the following names and roles:
       
      db3 - Application Master Node for DMLs
      db4 - Wsrep replicator for 99.99% of the queries, except for the aggregation DML, which is "offloaded" to it
      db5 - Wsrep replicator and Async Replication Master
       
      All application traffic goes to node db3.
       
      We use node db4 to offload data aggregation functions, which run DML on dedicated tables; nothing else modifies those Aggr tables.
       
      Today we started our standard procedure to perform a Live Alter on one of the Aggr tables. The Live Alter script was executed on db3 instead of db4. Ten minutes after the Live Alter started, node db4 became inconsistent.
       
      The Live Alter script continued to run on db3.
       
      The background Aggr functions then switched to db5 for their aggregation DML. Ten minutes later, db5 also became inconsistent.
       
      Live Alter is used to change the structure (DDL) of big tables without hanging the cluster. It works with triggers and INSERT ... SELECT.
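The customer's Live Alter script itself is not attached, so the following is only a minimal sketch of the generic trigger-based online schema change pattern the description refers to (shadow table, mirroring triggers, chunked INSERT ... SELECT backfill, atomic rename). The function name `live_alter_plan`, the `_new`/`_old` suffixes, the trigger names and the example table are all hypothetical. It assumes the listed columns exist in both layouts and that primary-key values are never updated.

```python
from typing import List, Tuple


def live_alter_plan(table: str, pk: str, columns: List[str], alter_clause: str,
                    chunks: List[Tuple[int, int]]) -> List[str]:
    """Return the SQL statements of a generic trigger-based online schema change.

    Hypothetical sketch, not the customer's script. Assumes:
    - `columns` are present in both the old and the new table layout;
    - primary-key values are never updated;
    - `chunks` are inclusive primary-key ranges used for the backfill.
    """
    new, old = f"{table}_new", f"{table}_old"
    cols = ", ".join(columns)
    new_vals = ", ".join(f"NEW.{c}" for c in columns)
    stmts = [
        # 1. Empty shadow copy of the table, then apply the DDL to the copy only.
        f"CREATE TABLE {new} LIKE {table}",
        f"ALTER TABLE {new} {alter_clause}",
    ]
    # 2. Triggers mirror concurrent DML on the original table into the copy.
    for event in ("INSERT", "UPDATE"):
        stmts.append(
            f"CREATE TRIGGER {table}_{event.lower()} AFTER {event} ON {table} "
            f"FOR EACH ROW REPLACE INTO {new} ({cols}) VALUES ({new_vals})"
        )
    stmts.append(
        f"CREATE TRIGGER {table}_delete AFTER DELETE ON {table} "
        f"FOR EACH ROW DELETE FROM {new} WHERE {pk} = OLD.{pk}"
    )
    # 3. Backfill existing rows chunk by chunk with INSERT ... SELECT;
    #    IGNORE skips rows that the triggers have already copied.
    for lo, hi in chunks:
        stmts.append(
            f"INSERT IGNORE INTO {new} ({cols}) SELECT {cols} FROM {table} "
            f"WHERE {pk} BETWEEN {lo} AND {hi}"
        )
    # 4. Swap the tables atomically; dropping the old table also drops its triggers.
    stmts.append(f"RENAME TABLE {table} TO {old}, {new} TO {table}")
    stmts.append(f"DROP TABLE {old}")
    return stmts


if __name__ == "__main__":
    for sql in live_alter_plan("aggr_t", "id", ["id", "total"],
                               "ADD COLUMN note VARCHAR(64)", [(1, 1000), (1001, 2000)]):
        print(sql + ";")
```

In this sketch, while the backfill runs on one node, every aggregation write to the original table fires a trigger that also writes to the shadow table. With the script on db3 and the aggregation DML on db4 (later db5), both streams target the same shadow table from different nodes.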

      Judging by the customer's logs, we are dealing with the following failure:

      From db4:
       
      2024-02-16 12:05:30 6 [ERROR] WSREP: Failed to apply write set: gtid: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662493750 server_id: 269fd913-9633-11ee-9629-87d0be11dc45 client_id: 18446744073709551615 trx_id: 48609383118 flags: 20 (rollback | pa_unsafe)
      ...
       
      From db5:
       
      2024-02-16 12:05:30 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      2024-02-16 12:05:30 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d
      2024-02-16 12:05:30 0 [Note] WSREP: STATE EXCHANGE: sent state msg: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d
      2024-02-16 12:05:30 0 [Note] WSREP: STATE EXCHANGE: got state msg: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d from 0 (fx112_db5)
      2024-02-16 12:05:30 0 [Note] WSREP: STATE EXCHANGE: got state msg: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d from 1 (fx112_db3)
      2024-02-16 12:05:30 0 [Note] WSREP: Quorum results:
      version = 6,
      component = PRIMARY,
      conf_id = 281,
      members = 2/2 (joined/total),
      act_id = 45662493751,
      last_appl. = 45662493731,
      protocols = 2/10/4 (gcs/repl/appl),
      vote policy= 0,
      group UUID = a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6
      2024-02-16 12:05:30 0 [Note] WSREP: Flow-control interval: [6000, 6000]
      2024-02-16 12:05:30 21 [Note] WSREP: ####### processing CC 45662493752, local, ordered
      2024-02-16 12:05:30 21 [Note] WSREP: ####### My UUID: 184a029b-9622-11ee-8c61-6f91e19335c1
      2024-02-16 12:05:30 21 [Note] WSREP: Skipping cert index reset
      2024-02-16 12:05:30 21 [Note] WSREP: REPL Protocols: 10 (5)
      2024-02-16 12:05:30 21 [Note] WSREP: ####### Adjusting cert position: 45662493751 -> 45662493752
      2024-02-16 12:05:30 0 [Note] WSREP: Service thread queue flushed.
      2024-02-16 12:05:30 21 [Note] WSREP: ================================================
      View:
      id: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662493752
      status: primary
      protocol_version: 4
      capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
      final: no
      own_index: 0
      members(2):
      0: 184a029b-9622-11ee-8c61-6f91e19335c1, fx112_db5
      1: b8c19c60-962e-11ee-a025-af00b051f68f, fx112_db3
      =================================================
      2024-02-16 12:05:30 21 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2024-02-16 12:05:30 21 [Note] WSREP: Lowest cert index boundary for CC from group: 45662493732
      2024-02-16 12:05:30 21 [Note] WSREP: Min available from gcache for CC from group: 45614618803
      2024-02-16 12:05:36 0 [Note] WSREP: cleaning up 269fd913-9629 (tcp://xxx.xxx.xxx.xxx:yyyy)
      2024-02-16 12:15:46 35 [ERROR] WSREP: Failed to apply write set: gtid: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662930080 server_id: 184a029b-9622-11ee-8c61-6f91e19335c1 client_id: 18446744073709551615 trx_id: 81448619022 flags: 20 (rollback | pa_unsafe)
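Both failures log the flags value 20 together with its decoded form "(rollback | pa_unsafe)". The following is a minimal sketch for extracting the seqno, origin server_id and flags from such lines when scanning larger logs. It assumes the bit values rollback = 1 << 2 and pa_unsafe = 1 << 4, taken to match wsrep-lib's `wsrep::provider::flag` (these are consistent with 20 = 4 + 16, but should be checked against the wsrep-lib sources for the server version in use). The helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed bit values (wsrep-lib wsrep::provider::flag); only the two bits
# seen in this report are listed, anything else is reported as leftover bits.
FLAG_BITS: Dict[str, int] = {"rollback": 1 << 2, "pa_unsafe": 1 << 4}

# Matches the "Failed to apply write set" lines quoted above.
APPLY_FAIL = re.compile(
    r"Failed to apply write set: gtid: (?P<uuid>[0-9a-f-]+):(?P<seqno>\d+) "
    r"server_id: (?P<server_id>[0-9a-f-]+) .*? flags: (?P<flags>\d+)"
)


def decode_flags(value: int) -> Tuple[List[str], int]:
    """Split a decimal flags value into known flag names plus any leftover bits."""
    names = [name for name, bit in FLAG_BITS.items() if value & bit]
    known = 0
    for bit in FLAG_BITS.values():
        known |= bit
    return names, value & ~known


def parse_apply_failure(line: str) -> Dict[str, object]:
    """Extract seqno, origin server_id and decoded flags from one error line."""
    m = APPLY_FAIL.search(line)
    if m is None:
        raise ValueError("not a 'Failed to apply write set' line")
    names, unknown = decode_flags(int(m["flags"]))
    return {
        "seqno": int(m["seqno"]),
        "origin": m["server_id"],
        "flags": names,
        "unknown_bits": unknown,
    }
```

Applied to the db5 line above, the origin server_id is 184a029b-9622-11ee-8c61-6f91e19335c1, the same UUID db5 reports in its own "My UUID" line, so comparing `origin` against each node's UUID shows which node the failing write set came from.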

      Attachments

        1. provide_to_mariadb_obfuscated.tar
          214 kB
          Valerii Kravchuk
        2. wsrep_variables.txt
          8 kB
          Valerii Kravchuk


          People

            sysprg Julius Goryavsky
            sysprg Julius Goryavsky
            Votes: 1
            Watchers: 7
