Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26032

"RecordSet checksum does not match" error during IST

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.11
    • Fix Version/s: None
    • Component/s: Galera SST
    • Labels:
      None
    • Environment:
      Both native installation on CentOS 8 and using official Docker image

      Description

      This issue happens during IST, but only happens when using "mariabackup" as the sst_method, so I think it's still SST related.

      We have a three node Galera cluster, all using MariaDB 10.5.11. We are using the official docker image (with host networking, one container per VM). I can also reproduce this issue when installing MariaDB 10.5.11 using the official YUM repo on CentOS 8.

      Our cluster is heavily used by multiple clients. None of them do any DDL queries (CREATE, ALTER), but they heavily use INSERTs and DELETEs which cumulate to multiple GB of data per hour.

      When using mariabackup as the `wsrep_sst_method`, the SST succeeds every time (although it takes a few hours, because we have over 300 GB of data, so that is expected), but the immediately following IST fails every time:

      2021-06-27 13:01:21 2 [Note] WSREP: State transfer required: 
      	Group state: 548cc9d7-d6bc-11eb-a54c-3aee935613df:1991660
      	Local state: 00000000-0000-0000-0000-000000000000:-1
      2021-06-27 13:01:21 2 [Note] WSREP: Server status change connected -> joiner
      2021-06-27 13:01:21 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2021-06-27 13:01:21 0 [Note] WSREP: Joiner monitor thread started to monitor
      2021-06-27 13:01:21 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '172.28.180.97' --datadir '/var/lib/mysql/' --parent '1''
      2021-06-27 13:01:21 0 [Note] WSREP: Member 1.0 (cluster1.onpremise.cluster.local) synced with group.
      WSREP_SST: [INFO] SSL configuration: CA='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20210627 13:01:21.962)
      WSREP_SST: [INFO] Logging all stderr of SST/mariabackup to syslog (20210627 13:01:22.053)
      2021-06-27 13:01:22 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 1991660, STRv: 3
      2021-06-27 13:01:22 2 [Note] WSREP: IST receiver addr using tcp://172.28.180.97:4568
      2021-06-27 13:01:22 2 [Note] WSREP: Prepared IST receiver for 0-1991660, listening at: tcp://172.28.180.97:4568
      2021-06-27 13:01:22 0 [Note] WSREP: Member 0.0 (cluster2.onpremise.cluster.local) requested state transfer from '*any*'. Selected 1.0 (cluster1.onpremise.cluster.local)(SYNCED) as donor.
      2021-06-27 13:01:22 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1991660)
      2021-06-27 13:01:22 2 [Note] WSREP: Requesting state transfer: success, donor: 1
      2021-06-27 13:01:22 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
      2021-06-27 13:01:22 2 [Note] WSREP: GCache history reset: 548cc9d7-d6bc-11eb-a54c-3aee935613df:0 -> 548cc9d7-d6bc-11eb-a54c-3aee935613df:1991660
      2021-06-27 14:02:29 0 [Note] WSREP: 1.0 (cluster1.onpremise.cluster.local): State transfer to 0.0 (cluster2.onpremise.cluster.local) complete.
      2021-06-27 14:02:29 0 [Note] WSREP: Member 1.0 (cluster1.onpremise.cluster.local) synced with group.
      2021-06-27 14:24:00 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
      2021-06-27 14:24:46 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
      2021-06-27 14:25:36 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000002 of size 134217728 bytes
      2021-06-27 14:26:11 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000003 of size 134217728 bytes
      2021-06-27 14:26:38 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000004 of size 134217728 bytes
      2021-06-27 14:26:52 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000005 of size 134217728 bytes
      2021-06-27 14:27:23 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000006 of size 134217728 bytes
      2021-06-27 14:27:44 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000007 of size 134217728 bytes
      2021-06-27 14:28:07 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000008 of size 134217728 bytes
      2021-06-27 14:28:24 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000009 of size 134217728 bytes
      2021-06-27 14:28:50 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000010 of size 134217728 bytes
      2021-06-27 14:29:08 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000011 of size 134217728 bytes
      2021-06-27 14:29:45 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000012 of size 134217728 bytes
      2021-06-27 14:30:30 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000013 of size 134217728 bytes
      2021-06-27 14:31:11 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000014 of size 134217728 bytes
      2021-06-27 14:31:27 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000015 of size 134217728 bytes
      2021-06-27 14:31:56 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000016 of size 134217728 bytes
      2021-06-27 14:32:01 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000017 of size 230902344 bytes
      2021-06-27 14:32:24 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000018 of size 134217728 bytes
      2021-06-27 14:32:36 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000019 of size 134217728 bytes
      2021-06-27 14:33:03 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000020 of size 134217728 bytes
      2021-06-27 14:33:30 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000021 of size 134217728 bytes
      2021-06-27 14:33:59 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000022 of size 134217728 bytes
      2021-06-27 14:34:39 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000023 of size 134217728 bytes
      2021-06-27 14:35:09 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000024 of size 134217728 bytes
      2021-06-27 14:37:45 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000025 of size 134217728 bytes
      2021-06-27 14:38:30 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000026 of size 134217728 bytes
      2021-06-27 14:39:14 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000027 of size 134217728 bytes
      2021-06-27 14:39:59 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000028 of size 134217728 bytes
      2021-06-27 14:40:38 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000029 of size 134217728 bytes
      2021-06-27 14:41:09 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000030 of size 134217728 bytes
      2021-06-27 14:41:33 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000031 of size 134217728 bytes
      2021-06-27 14:42:01 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000032 of size 134217728 bytes
      2021-06-27 14:42:26 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000033 of size 134217728 bytes
      2021-06-27 14:42:45 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000034 of size 134217728 bytes
      2021-06-27 14:43:21 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000035 of size 134217728 bytes
      2021-06-27 14:43:39 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000036 of size 134217728 bytes
      2021-06-27 14:44:17 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000037 of size 134217728 bytes
      2021-06-27 14:44:33 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000038 of size 134217728 bytes
      2021-06-27 14:45:15 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000039 of size 134217728 bytes
      2021-06-27 14:45:42 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000040 of size 134217728 bytes
      2021-06-27 14:46:07 3 [Note] WSREP: SST received
      2021-06-27 14:46:07 3 [Note] WSREP: Server status change joiner -> initializing
      2021-06-27 14:46:07 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2021-06-27 14:46:07 0 [Note] InnoDB: Uses event mutexes
      2021-06-27 14:46:07 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2021-06-27 14:46:07 0 [Note] InnoDB: Number of pools: 1
      2021-06-27 14:46:07 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2021-06-27 14:46:07 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
      2021-06-27 14:46:07 0 [Note] InnoDB: Using Linux native AIO
      2021-06-27 14:46:07 0 [Note] InnoDB: Initializing buffer pool, total size = 17179869184, chunk size = 134217728
      2021-06-27 14:46:08 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-06-27 14:46:08 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes
      2021-06-27 14:46:08 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      2021-06-27 14:46:08 0 [Note] InnoDB: New log file created, LSN=531019491640
      2021-06-27 14:46:08 0 [Note] InnoDB: 128 rollback segments are active.
      2021-06-27 14:46:08 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2021-06-27 14:46:08 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2021-06-27 14:46:08 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2021-06-27 14:46:08 0 [Note] InnoDB: 10.5.11 started; log sequence number 0; transaction id 6086012
      2021-06-27 14:46:08 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
      2021-06-27 14:46:08 0 [Note] Plugin 'FEEDBACK' is disabled.
      2021-06-27 14:46:08 0 [Note] Recovering after a crash using tc.log
      2021-06-27 14:46:08 0 [Note] Starting crash recovery...
      2021-06-27 14:46:08 0 [Note] Crash recovery finished.
      2021-06-27 14:46:08 0 [Note] Server socket created on IP: '::'.
      2021-06-27 14:46:08 0 [Warning] 'proxies_priv' entry '@% root@cluster2.onpremise.cluster.local' ignored in --skip-name-resolve mode.
      2021-06-27 14:46:08 0 [Note] WSREP: wsrep_init_schema_and_SR 0x0
      2021-06-27 14:46:08 0 [Note] WSREP: Server initialized
      2021-06-27 14:46:08 0 [Note] WSREP: Server status change initializing -> initialized
      2021-06-27 14:46:08 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2021-06-27 14:46:08 3 [Note] WSREP: Server status change initialized -> joined
      2021-06-27 14:46:08 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2021-06-27 14:46:08 3 [Note] WSREP: Recovered position from storage: 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
      2021-06-27 14:46:08 3 [Note] WSREP: Recovered view from SST:
        id: 548cc9d7-d6bc-11eb-a54c-3aee935613df:1991660
        status: primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(2):
      	0: 198269dc-d6b7-11eb-b0a8-6f003f372f53, cluster2.onpremise.redispat
      	1: ff7c3e7d-d6b7-11eb-a2f7-263bdfc4eb3d, cluster1.onpremise.redispat
      2021-06-27 14:46:08 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2021-06-27 14:46:08 6 [Note] WSREP: Recovered cluster id 548cc9d7-d6bc-11eb-a54c-3aee935613df
      2021-06-27 14:46:08 0 [Note] mysqld: ready for connections.
      Version: '10.5.11-MariaDB-1:10.5.11+maria~focal'  socket: '/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
      2021-06-27 14:46:08 3 [Note] WSREP: SST received: 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
      2021-06-27 14:46:08 3 [Note] WSREP: SST succeeded for position 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
      2021-06-27 14:46:08 0 [Note] WSREP: Joiner monitor thread ended with total time 6287 sec
      2021-06-27 14:46:08 2 [Note] WSREP: Installed new state from SST: 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
      2021-06-27 14:46:08 2 [Note] WSREP: Cert. index preload up to 2106067
      2021-06-27 14:46:08 0 [Note] WSREP: ####### IST applying starts with 2106068
      2021-06-27 14:46:08 0 [Note] WSREP: ####### IST current seqno initialized to 1991656
      2021-06-27 14:46:08 0 [Note] WSREP: Receiving IST...  0.0% (0/5 events) complete.
      2021-06-27 14:46:08 0 [Note] WSREP: IST preload starting at 1991656
      2021-06-27 14:46:08 0 [ERROR] WSREP: RecordSet checksum does not match:
      computed: 1c92287e 1e1e8b70 3abfa9d2 f1575df4
      found:    101cbb4f fdd8338f dc6bc09b 208caabf: 22 (Invalid argument)
      	 at /home/buildbot/buildbot/build/galerautils/src/gu_rset.cpp:checksum():551
      2021-06-27 14:46:08 0 [ERROR] WSREP: Writeset deserialization failed: Writeset checksum failed: 22 (Invalid argument)
      	 at /home/buildbot/buildbot/build/galera/src/write_set_ng.hpp:checksum_fin():864
      	 at /home/buildbot/buildbot/build/galera/src/trx_handle.hpp:unserialize():533
      WS flags:      0
      Trx proto:     5
      Trx source:    00000000-0000-0000-0000-000000000000
      Trx conn_id:   18446744073709551615
      Trx trx_id:    18446744073709551615
      Trx last_seen: -1
      

      This problem doesn't appear when using `wsrep_sst_method=rsync`, but this blocks the donor, which is really hurtful for us.

      Maybe this is related to this upstream issue, because the error message is the same: https://github.com/codership/galera/issues/385

      This is our server configuration:

      [mysqld]
      innodb_autoinc_lock_mode=2
      innodb_flush_log_at_trx_commit=0
      binlog_format=row
       
      wait_timeout=28800
      skip-log-bin
      log_warnings=1
      innodb_buffer_pool_size=16G
      innodb_log_file_size=1G
      join_buffer_size=4M
      max_connections=1000
      max_heap_table_size=256M
      query_cache_size=0
      query_cache_type=0
      thread_stack=1M
      tmp_table_size=256M
       
      [galera]
      wsrep_on=ON
      wsrep_provider=/usr/lib/libgalera_smm.so
      wsrep_cluster_name=galera
      wsrep_cluster_address=gcomm://172.28.180.96,172.28.180.97,172.28.180.98
      wsrep_node_name=cluster2.onpremise.cluster.local
      wsrep_node_address=172.28.180.97
      wsrep_sst_method=mariabackup
      wsrep_sst_auth=root:thepassword
      wsrep_log_conflicts=1
      wsrep_provider_options="evs.auto_evict=10;evs.inactive_check_period=PT1S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;evs.join_retrans_period=PT2S;evs.keepalive_period=PT3S;evs.send_window=128;evs.suspect_timeout=PT30S;evs.user_send_window=64;gcache.recover=yes;gcache.size=8G;gcs.fc_factor=0.8;gcs.fc_limit=60;gcs.max_packet_size=1048576;pc.announce_timeout=PT10S;pc.wait_prim_timeout=PT1M"
      

      Please tell me if there is anything I can provide to debug this issue.

        Attachments

          Activity

            People

            Assignee:
            Unassigned Unassigned
            Reporter:
            christianciach Christian Ciach
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:

                Git Integration