MariaDB Server / MDEV-34267

Since version 10.11.8, nodes keep hitting an inconsistency with HA_ERR_FOUND_DUPP_KEY


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11.8
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None
    • Environment: Ubuntu 22.04

    Description

      We have a 3-node cluster with a data set of about 230 GB across various databases. The cluster is usually kept reasonably up to date with apt, so we find ourselves on 10.11.8 / Galera 26.4.18 (ra96793fc), and we have had difficulty maintaining a Primary Component ever since the first node to run 10.11.8 was restarted. To rule out configuration inconsistencies, I cloned the master node and rejoined the clones to it. After catching up, things are fine for a few hours until we hit an inconsistent state and the "slave" nodes take themselves out of the cluster.
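Cloning the master is one way to rule out configuration drift. A lighter check is to diff the option files of two nodes while skipping keys that are expected to differ per node. The following is a minimal Python sketch, not a MariaDB tool: the set of node-specific keys is an assumption based on the node config quoted in this report, and it assumes option files simple enough for configparser (no `!include` or `!includedir` directives).

```python
import configparser

# Assumed node-specific keys, taken from the per-node lines in the config in
# this report; everything else is expected to match across nodes.
NODE_SPECIFIC = {"wsrep_node_address", "wsrep_node_name"}


def load(cnf_text: str) -> configparser.ConfigParser:
    """Parse a simple my.cnf-style text (bare flags allowed, no '%' handling)."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    return parser


def config_diff(a_text: str, b_text: str) -> list:
    """List (section, key, a_value, b_value) where two option files disagree."""
    a, b = load(a_text), load(b_text)
    diffs = []
    for section in sorted(set(a.sections()) | set(b.sections())):
        a_sec = dict(a[section]) if a.has_section(section) else {}
        b_sec = dict(b[section]) if b.has_section(section) else {}
        for key in sorted(set(a_sec) | set(b_sec)):
            if key in NODE_SPECIFIC:
                continue  # expected to differ between nodes
            if a_sec.get(key) != b_sec.get(key):
                diffs.append((section, key, a_sec.get(key), b_sec.get(key)))
    return diffs
```

For example, comparing a file that enables the `[sst]` compressor with one that does not yields a single `('sst', 'compressor', '"/usr/bin/pigz"', None)` entry; values keep their quotes as written in the file.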

      It looks as if the slave tried to apply the same replicated write twice. I checked the table and there are no duplicates. The table has a multi-column PK, in case that is relevant, but that is valid and supported.
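The suspected failure mode, the same row write being applied to a table that already holds it, can be reproduced in miniature. This sketch uses Python's sqlite3 as a stand-in for InnoDB, so the error text differs from MariaDB's 1062. The reduced schema is hypothetical: the composite primary key (branch_id, product_id, yearweek, day, reason_id) is inferred from the five-part duplicate key in the error log, not taken from the real table definition.

```python
import sqlite3

# Hypothetical reduced schema for wastereporting.product_waste. The composite
# primary key is an assumption inferred from the five-part duplicate key.
SCHEMA = """
CREATE TABLE product_waste (
    branch_id  INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    yearweek   INTEGER NOT NULL,
    day        INTEGER NOT NULL,
    value      INTEGER,
    units      INTEGER,
    reason_id  INTEGER NOT NULL,
    PRIMARY KEY (branch_id, product_id, yearweek, day, reason_id)
)
"""

# One row, in the column order of the INSERT in the log.
ROW = (226, 5423890, 202422, 3, 1, 2, 4)
INSERT = "INSERT INTO product_waste VALUES (?, ?, ?, ?, ?, ?, ?)"


def apply_twice(conn: sqlite3.Connection) -> str:
    """Apply ROW once, then replay it; report what the replay did."""
    conn.execute(INSERT, ROW)
    try:
        conn.execute(INSERT, ROW)
    except sqlite3.IntegrityError as exc:
        # sqlite's analogue of MariaDB error 1062 / HA_ERR_FOUND_DUPP_KEY.
        return f"duplicate: {exc}"
    return "applied"


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(apply_twice(conn))
print(conn.execute("SELECT COUNT(*) FROM product_waste").fetchone()[0])  # → 1
```

The table still holds exactly one row afterwards, which matches the observation that no duplicates are present even though the applier reported one.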

      At this stage I cannot rule out a problem with the table, the config, or the versions, but I can say that it only started with 10.11.8. We have another database with a very large data set on the same versions that does not appear to have the issue. The only difference is that it does not define SST compression; the config is otherwise identical.

      [sst]
      #compressor="/usr/bin/pigz"
      #decompressor="/usr/bin/pigz -d"

      I have disabled it on the problematic cluster to see whether compression is the cause of the problem. I have to wait for it to get back in sync and then see what happens. I ran CHECK TABLE, and I can dump the table or back it up with mariabackup without errors, so I think the table is fine. But yesterday it was a different table with the same error. The error is rare, but on a busy cluster we still trip over it quite often. On a test cluster I can't generate enough test data to reproduce the issue.
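To confirm which nodes still have SST compression active after such a change, one option is to read the `[sst]` section of each node's option file. A minimal Python sketch with configparser follows; it assumes an option file without `!include` directives, and it only reports what the file says, not what a running SST actually used.

```python
import configparser

# Config excerpt in the style of the [sst] section quoted above.
SAMPLE = """
[sst]
#compressor="/usr/bin/pigz"
#decompressor="/usr/bin/pigz -d"
"""


def sst_compression(cnf_text: str) -> dict:
    """Return the active compressor/decompressor settings, if any."""
    parser = configparser.ConfigParser(
        allow_no_value=True,          # my.cnf allows bare flags
        comment_prefixes=("#", ";"),  # commented-out lines are ignored
        strict=False,                 # tolerate repeated keys
        interpolation=None,           # don't treat '%' specially
    )
    parser.read_string(cnf_text)
    if not parser.has_section("sst"):
        return {}
    section = parser["sst"]
    return {
        key: section[key].strip('"')
        for key in ("compressor", "decompressor")
        if key in section
    }


print(sst_compression(SAMPLE))  # → {}
```

With the `#` removed from both lines, the same function returns `{'compressor': '/usr/bin/pigz', 'decompressor': '/usr/bin/pigz -d'}`.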

      Node config looks like this:

      {{[mysqld]
      wsrep_provider_options="pc.weight=2;gcache.size=1024M;gcache.recover=yes;evs.inactive_check_period=PT1S;evs.keepalive_period=PT3S;evs.suspect_timeout=P30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;evs.send_window=1024;evs.user_send_window=512;gcs.fc_limit=40;gcs.fc_factor=0.8;"
      wsrep_on=ON
      wsrep_cluster_name="ffwebc_cluster001"
      wsrep_cluster_address='gcomm://node1,node2,node3?pc.wait_prim=no'
      wsrep_provider=/usr/lib/galera/libgalera_smm.so
      wsrep_notify_cmd=/usr/local/sbin/wsrep-notify
      wsrep_sst_method=mariabackup
      wsrep_sst_auth='galera:galera'

      # this node
      wsrep_node_address="192.168.80.33"
      wsrep_node_name="node1"
      wsrep_slave_threads=8

      binlog_format=ROW
      default_storage_engine=InnoDB
      innodb_autoinc_lock_mode=2
      query_cache_size=0
      innodb_flush_log_at_trx_commit=0
      sync_binlog=0

      [sst]
      #compressor="/usr/bin/pigz"
      #decompressor="/usr/bin/pigz -d"}}
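The `wsrep_provider_options` string packs many settings into one value, which makes it easy to miss a typo when comparing nodes. This minimal Python sketch splits it into key/value pairs and checks the `evs.*` timeouts against a strict ISO 8601 duration pattern, in which time components must follow a `T`. It flags `evs.suspect_timeout=P30S`, the only duration here written without `T`. Galera's own parser may be more lenient and read it as 30 seconds; this report does not establish either way, so treat the result as a consistency hint, not a diagnosis. The set of duration keys is an assumption covering only the options in this config.

```python
import re

# wsrep_provider_options value from the node config above.
OPTIONS = (
    "pc.weight=2;gcache.size=1024M;gcache.recover=yes;"
    "evs.inactive_check_period=PT1S;evs.keepalive_period=PT3S;"
    "evs.suspect_timeout=P30S;evs.inactive_timeout=PT1M;"
    "evs.install_timeout=PT1M;evs.send_window=1024;"
    "evs.user_send_window=512;gcs.fc_limit=40;gcs.fc_factor=0.8;"
)

# Strict ISO 8601 duration: H/M/S time components must follow a 'T'.
ISO_DURATION = re.compile(
    r"^P(?!$)(\d+Y)?(\d+M)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$"
)

# Assumed: the options in this config whose values are durations.
DURATION_KEYS = {
    "evs.inactive_check_period", "evs.keepalive_period",
    "evs.suspect_timeout", "evs.inactive_timeout", "evs.install_timeout",
}


def parse_options(value: str) -> dict:
    """Split 'k1=v1;k2=v2;' into a dict, ignoring empty segments."""
    pairs = (item.split("=", 1) for item in value.split(";") if item.strip())
    return {k.strip(): v.strip() for k, v in pairs}


def non_iso_durations(opts: dict) -> list:
    """Return (key, value) for duration options failing the strict check."""
    return [(k, v) for k, v in opts.items()
            if k in DURATION_KEYS and not ISO_DURATION.match(v)]


opts = parse_options(OPTIONS)
print(non_iso_durations(opts))  # → [('evs.suspect_timeout', 'P30S')]
```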

      And the error looks like this:

      {{2024-05-30 0:55:07 2 [Warning] WSREP: Ignoring error 'Duplicate entry '226-5423890-202422-3-4' for key 'PRIMARY'' on query. Default database: 'sales_report'. Query: 'INSERT INTO wastereporting.product_waste (branch_id, product_id, yearweek, day, value, units, reason_id)
      SELECT branch.id, products.id, 202422, 3, 1, 2, reasons.id
      FROM sales_report.branch, sales_report.products, wastereporting.reasons
      WHERE branch.branch_code = '541'
      AND products.product_code = 'U589'
      AND reasons.name = 'Damaged On Delivery'', Error_code: 1062

      2024-05-30 0:55:39 7 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table wastereporting.product_waste; Duplicate entry '3-655848-202422-4-3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 12250620, Internal MariaDB error code: 1062

      2024-05-30 0:55:39 7 [Warning] WSREP: Event 53104 Write_rows_v1 apply failed: 121, seqno 1493016

      2024-05-30 0:55:39 0 [Note] WSREP: Member 1(node3) initiates vote on 74650a58-1cfb-11ef-91c9-970177d851aa:1493016,e79c311fecd71f20: Duplicate entry '3-655848-202422-4-3' for key 'PRIMARY', Error_code: 1062;
      2024-05-30 0:55:39 0 [Note] WSREP: Votes over 74650a58-1cfb-11ef-91c9-970177d851aa:1493016:
      0000000000000000: 1/2
      e79c311fecd71f20: 1/2
      Winner: 0000000000000000

      2024-05-30 0:55:39 7 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 74650a58-1cfb-11ef-91c9-970177d851aa:1493016 at ./galera/src/replicator_smm.cpp:process_apply_error():1370
      2024-05-30 0:55:39 7 [Note] WSREP: Closing send monitor...
      2024-05-30 0:55:39 7 [Note] WSREP: Closed send monitor.
      2024-05-30 0:55:39 7 [Note] WSREP: gcomm: terminating thread
      2024-05-30 0:55:39 7 [Note] WSREP: gcomm: joining thread
      2024-05-30 0:55:39 7 [Note] WSREP: gcomm: closing backend
      2024-05-30 0:55:40 7 [Note] WSREP: view(view_id(NON_PRIM,5d3bcc69-9846,12) memb {
              7464ab24-929a,0
      } joined {
      } left {
      } partitioned {
              5d3bcc69-9846,0
      })
      2024-05-30 0:55:40 7 [Note] WSREP: PC protocol downgrade 1 -> 0
      2024-05-30 0:55:40 7 [Note] WSREP: view((empty))
      2024-05-30 0:55:40 7 [Note] WSREP: gcomm: closed}}
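When the error recurs on different tables, it helps to pull the same few facts out of each occurrence: the table, the duplicate key, the Galera seqno, and the vote winner. The following is a minimal Python sketch of such a log scan. Its regular expressions are written against the sample lines above only and may not match other MariaDB or Galera versions. In this log the winning vote is all zeros, which appears to be the "no error" vote; that matches what happened here, since the node that failed to apply was the one that left the cluster.

```python
import re

# Excerpt of the log above (long lines trimmed after the handler error).
SAMPLE_LOG = """\
2024-05-30 0:55:39 7 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table wastereporting.product_waste; Duplicate entry '3-655848-202422-4-3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY
2024-05-30 0:55:39 7 [Warning] WSREP: Event 53104 Write_rows_v1 apply failed: 121, seqno 1493016
0000000000000000: 1/2
e79c311fecd71f20: 1/2
Winner: 0000000000000000
"""

APPLY_FAIL = re.compile(
    r"Could not execute (?P<event>\w+) event on table (?P<table>[\w.]+); "
    r"Duplicate entry '(?P<key>[^']*)' for key '(?P<index>[^']*)'"
)
SEQNO = re.compile(r"seqno (?P<seqno>\d+)")
WINNER = re.compile(r"Winner: (?P<winner>[0-9a-f]{16})")


def summarize(log_text: str) -> list:
    """Collect one dict per duplicate-key apply failure in the log."""
    events = []
    for line in log_text.splitlines():
        if m := APPLY_FAIL.search(line):
            events.append({**m.groupdict(), "seqno": None, "winner": None})
        elif events and (m := SEQNO.search(line)) and events[-1]["seqno"] is None:
            # First seqno after a failure belongs to that failure.
            events[-1]["seqno"] = int(m.group("seqno"))
        elif events and (m := WINNER.search(line)):
            events[-1]["winner"] = m.group("winner")
    return events


for ev in summarize(SAMPLE_LOG):
    print(ev["table"], ev["key"], ev["seqno"], ev["winner"])
# → wastereporting.product_waste 3-655848-202422-4-3 1493016 0000000000000000
```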


    People

      Assignee: Unassigned
      Reporter: James Cross (DrJaymz)
      Votes: 1
      Watchers: 3
