Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
10.5.11
-
None
-
Both native installation on CentOS 8 and using official Docker image
Description
This issue happens during IST, but only happens when using "mariabackup" as the sst_method, so I think it's still SST related.
We have a three node Galera cluster, all using MariaDB 10.5.11. We are using the official docker image (with host networking, one container per VM). I can also reproduce this issue when installing MariaDB 10.5.11 using the official YUM repo on CentOS 8.
Our cluster is heavily used by multiple clients. None of them do any DDL queries (CREATE, ALTER), but they heavily use INSERTs and DELETEs which cumulate to multiple GB of data per hour.
When using mariabackup as the `wsrep_sst_method`, the SST succeeds every time (although it takes a few hours, because we have over 300 GB of data, so that is expected), but the immediately following IST fails every time:
2021-06-27 13:01:21 2 [Note] WSREP: State transfer required:
|
Group state: 548cc9d7-d6bc-11eb-a54c-3aee935613df:1991660
|
Local state: 00000000-0000-0000-0000-000000000000:-1
|
2021-06-27 13:01:21 2 [Note] WSREP: Server status change connected -> joiner
|
2021-06-27 13:01:21 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2021-06-27 13:01:21 0 [Note] WSREP: Joiner monitor thread started to monitor
|
2021-06-27 13:01:21 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '172.28.180.97' --datadir '/var/lib/mysql/' --parent '1''
|
2021-06-27 13:01:21 0 [Note] WSREP: Member 1.0 (cluster1.onpremise.cluster.local) synced with group.
|
WSREP_SST: [INFO] SSL configuration: CA='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20210627 13:01:21.962)
|
WSREP_SST: [INFO] Logging all stderr of SST/mariabackup to syslog (20210627 13:01:22.053)
|
2021-06-27 13:01:22 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 1991660, STRv: 3
|
2021-06-27 13:01:22 2 [Note] WSREP: IST receiver addr using tcp://172.28.180.97:4568
|
2021-06-27 13:01:22 2 [Note] WSREP: Prepared IST receiver for 0-1991660, listening at: tcp://172.28.180.97:4568
|
2021-06-27 13:01:22 0 [Note] WSREP: Member 0.0 (cluster2.onpremise.cluster.local) requested state transfer from '*any*'. Selected 1.0 (cluster1.onpremise.cluster.local)(SYNCED) as donor.
|
2021-06-27 13:01:22 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1991660)
|
2021-06-27 13:01:22 2 [Note] WSREP: Requesting state transfer: success, donor: 1
|
2021-06-27 13:01:22 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
|
2021-06-27 13:01:22 2 [Note] WSREP: GCache history reset: 548cc9d7-d6bc-11eb-a54c-3aee935613df:0 -> 548cc9d7-d6bc-11eb-a54c-3aee935613df:1991660
|
2021-06-27 14:02:29 0 [Note] WSREP: 1.0 (cluster1.onpremise.cluster.local): State transfer to 0.0 (cluster2.onpremise.cluster.local) complete.
|
2021-06-27 14:02:29 0 [Note] WSREP: Member 1.0 (cluster1.onpremise.cluster.local) synced with group.
|
2021-06-27 14:24:00 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
|
2021-06-27 14:24:46 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
|
2021-06-27 14:25:36 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000002 of size 134217728 bytes
|
2021-06-27 14:26:11 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000003 of size 134217728 bytes
|
2021-06-27 14:26:38 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000004 of size 134217728 bytes
|
2021-06-27 14:26:52 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000005 of size 134217728 bytes
|
2021-06-27 14:27:23 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000006 of size 134217728 bytes
|
2021-06-27 14:27:44 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000007 of size 134217728 bytes
|
2021-06-27 14:28:07 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000008 of size 134217728 bytes
|
2021-06-27 14:28:24 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000009 of size 134217728 bytes
|
2021-06-27 14:28:50 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000010 of size 134217728 bytes
|
2021-06-27 14:29:08 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000011 of size 134217728 bytes
|
2021-06-27 14:29:45 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000012 of size 134217728 bytes
|
2021-06-27 14:30:30 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000013 of size 134217728 bytes
|
2021-06-27 14:31:11 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000014 of size 134217728 bytes
|
2021-06-27 14:31:27 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000015 of size 134217728 bytes
|
2021-06-27 14:31:56 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000016 of size 134217728 bytes
|
2021-06-27 14:32:01 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000017 of size 230902344 bytes
|
2021-06-27 14:32:24 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000018 of size 134217728 bytes
|
2021-06-27 14:32:36 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000019 of size 134217728 bytes
|
2021-06-27 14:33:03 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000020 of size 134217728 bytes
|
2021-06-27 14:33:30 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000021 of size 134217728 bytes
|
2021-06-27 14:33:59 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000022 of size 134217728 bytes
|
2021-06-27 14:34:39 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000023 of size 134217728 bytes
|
2021-06-27 14:35:09 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000024 of size 134217728 bytes
|
2021-06-27 14:37:45 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000025 of size 134217728 bytes
|
2021-06-27 14:38:30 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000026 of size 134217728 bytes
|
2021-06-27 14:39:14 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000027 of size 134217728 bytes
|
2021-06-27 14:39:59 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000028 of size 134217728 bytes
|
2021-06-27 14:40:38 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000029 of size 134217728 bytes
|
2021-06-27 14:41:09 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000030 of size 134217728 bytes
|
2021-06-27 14:41:33 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000031 of size 134217728 bytes
|
2021-06-27 14:42:01 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000032 of size 134217728 bytes
|
2021-06-27 14:42:26 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000033 of size 134217728 bytes
|
2021-06-27 14:42:45 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000034 of size 134217728 bytes
|
2021-06-27 14:43:21 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000035 of size 134217728 bytes
|
2021-06-27 14:43:39 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000036 of size 134217728 bytes
|
2021-06-27 14:44:17 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000037 of size 134217728 bytes
|
2021-06-27 14:44:33 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000038 of size 134217728 bytes
|
2021-06-27 14:45:15 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000039 of size 134217728 bytes
|
2021-06-27 14:45:42 0 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000040 of size 134217728 bytes
|
2021-06-27 14:46:07 3 [Note] WSREP: SST received
|
2021-06-27 14:46:07 3 [Note] WSREP: Server status change joiner -> initializing
|
2021-06-27 14:46:07 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2021-06-27 14:46:07 0 [Note] InnoDB: Uses event mutexes
|
2021-06-27 14:46:07 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
|
2021-06-27 14:46:07 0 [Note] InnoDB: Number of pools: 1
|
2021-06-27 14:46:07 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
|
2021-06-27 14:46:07 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
|
2021-06-27 14:46:07 0 [Note] InnoDB: Using Linux native AIO
|
2021-06-27 14:46:07 0 [Note] InnoDB: Initializing buffer pool, total size = 17179869184, chunk size = 134217728
|
2021-06-27 14:46:08 0 [Note] InnoDB: Completed initialization of buffer pool
|
2021-06-27 14:46:08 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes
|
2021-06-27 14:46:08 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
|
2021-06-27 14:46:08 0 [Note] InnoDB: New log file created, LSN=531019491640
|
2021-06-27 14:46:08 0 [Note] InnoDB: 128 rollback segments are active.
|
2021-06-27 14:46:08 0 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2021-06-27 14:46:08 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2021-06-27 14:46:08 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
|
2021-06-27 14:46:08 0 [Note] InnoDB: 10.5.11 started; log sequence number 0; transaction id 6086012
|
2021-06-27 14:46:08 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
|
2021-06-27 14:46:08 0 [Note] Plugin 'FEEDBACK' is disabled.
|
2021-06-27 14:46:08 0 [Note] Recovering after a crash using tc.log
|
2021-06-27 14:46:08 0 [Note] Starting crash recovery...
|
2021-06-27 14:46:08 0 [Note] Crash recovery finished.
|
2021-06-27 14:46:08 0 [Note] Server socket created on IP: '::'.
|
2021-06-27 14:46:08 0 [Warning] 'proxies_priv' entry '@% root@cluster2.onpremise.cluster.local' ignored in --skip-name-resolve mode.
|
2021-06-27 14:46:08 0 [Note] WSREP: wsrep_init_schema_and_SR 0x0
|
2021-06-27 14:46:08 0 [Note] WSREP: Server initialized
|
2021-06-27 14:46:08 0 [Note] WSREP: Server status change initializing -> initialized
|
2021-06-27 14:46:08 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2021-06-27 14:46:08 3 [Note] WSREP: Server status change initialized -> joined
|
2021-06-27 14:46:08 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2021-06-27 14:46:08 3 [Note] WSREP: Recovered position from storage: 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
|
2021-06-27 14:46:08 3 [Note] WSREP: Recovered view from SST:
|
id: 548cc9d7-d6bc-11eb-a54c-3aee935613df:1991660
|
status: primary
|
protocol_version: 4
|
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no
|
own_index: 0
|
members(2):
|
0: 198269dc-d6b7-11eb-b0a8-6f003f372f53, cluster2.onpremise.redispat
|
1: ff7c3e7d-d6b7-11eb-a2f7-263bdfc4eb3d, cluster1.onpremise.redispat
|
2021-06-27 14:46:08 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2021-06-27 14:46:08 6 [Note] WSREP: Recovered cluster id 548cc9d7-d6bc-11eb-a54c-3aee935613df
|
2021-06-27 14:46:08 0 [Note] mysqld: ready for connections.
|
Version: '10.5.11-MariaDB-1:10.5.11+maria~focal' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
|
2021-06-27 14:46:08 3 [Note] WSREP: SST received: 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
|
2021-06-27 14:46:08 3 [Note] WSREP: SST succeeded for position 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
|
2021-06-27 14:46:08 0 [Note] WSREP: Joiner monitor thread ended with total time 6287 sec
|
2021-06-27 14:46:08 2 [Note] WSREP: Installed new state from SST: 548cc9d7-d6bc-11eb-a54c-3aee935613df:2106067
|
2021-06-27 14:46:08 2 [Note] WSREP: Cert. index preload up to 2106067
|
2021-06-27 14:46:08 0 [Note] WSREP: ####### IST applying starts with 2106068
|
2021-06-27 14:46:08 0 [Note] WSREP: ####### IST current seqno initialized to 1991656
|
2021-06-27 14:46:08 0 [Note] WSREP: Receiving IST... 0.0% (0/5 events) complete.
|
2021-06-27 14:46:08 0 [Note] WSREP: IST preload starting at 1991656
|
2021-06-27 14:46:08 0 [ERROR] WSREP: RecordSet checksum does not match:
|
computed: 1c92287e 1e1e8b70 3abfa9d2 f1575df4
|
found: 101cbb4f fdd8338f dc6bc09b 208caabf: 22 (Invalid argument)
|
at /home/buildbot/buildbot/build/galerautils/src/gu_rset.cpp:checksum():551
|
2021-06-27 14:46:08 0 [ERROR] WSREP: Writeset deserialization failed: Writeset checksum failed: 22 (Invalid argument)
|
at /home/buildbot/buildbot/build/galera/src/write_set_ng.hpp:checksum_fin():864
|
at /home/buildbot/buildbot/build/galera/src/trx_handle.hpp:unserialize():533
|
WS flags: 0
|
Trx proto: 5
|
Trx source: 00000000-0000-0000-0000-000000000000
|
Trx conn_id: 18446744073709551615
|
Trx trx_id: 18446744073709551615
|
Trx last_seen: -1
|
This problem doesn't appear when using `wsrep_sst_method=rsync`, but this blocks the donor, which is really hurtful for us.
Maybe this is related to this upstream issue, because the error message is the same: https://github.com/codership/galera/issues/385
This is our server configuration:
[mysqld]
|
innodb_autoinc_lock_mode=2
|
innodb_flush_log_at_trx_commit=0
|
binlog_format=row
|
 |
wait_timeout=28800
|
skip-log-bin
|
log_warnings=1
|
innodb_buffer_pool_size=16G
|
innodb_log_file_size=1G
|
join_buffer_size=4M
|
max_connections=1000
|
max_heap_table_size=256M
|
query_cache_size=0
|
query_cache_type=0
|
thread_stack=1M
|
tmp_table_size=256M
|
 |
[galera]
|
wsrep_on=ON
|
wsrep_provider=/usr/lib/libgalera_smm.so
|
wsrep_cluster_name=galera
|
wsrep_cluster_address=gcomm://172.28.180.96,172.28.180.97,172.28.180.98
|
wsrep_node_name=cluster2.onpremise.cluster.local
|
wsrep_node_address=172.28.180.97
|
wsrep_sst_method=mariabackup
|
wsrep_sst_auth=root:thepassword
|
wsrep_log_conflicts=1
|
wsrep_provider_options="evs.auto_evict=10;evs.inactive_check_period=PT1S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;evs.join_retrans_period=PT2S;evs.keepalive_period=PT3S;evs.send_window=128;evs.suspect_timeout=PT30S;evs.user_send_window=64;gcache.recover=yes;gcache.size=8G;gcs.fc_factor=0.8;gcs.fc_limit=60;gcs.max_packet_size=1048576;pc.announce_timeout=PT10S;pc.wait_prim_timeout=PT1M"
|
Please tell me if there is anything I can provide to debug this issue.
Attachments
Issue Links
- duplicates
-
MDEV-33514 Getting "WSREP: RecordSet checksum does not match when" error when starting 2nd node in galera while the only one is running
- In Testing