Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.3.9, 10.3.11, 10.3.14, 10.3.15
-
None
-
CentOS 7, running on XenServer hypervisor.
Description
Servers went down due to failed query. I bootstrapped cluster and was unable to start the remaining nodes with the following error:
2018-10-12 20:12:15 0 [Warning] WSREP: SST position can't be set in past. Requested: 92172, Current: 3620686.
2018-10-12 20:12:15 0 [Warning] WSREP: Can't continue.
2018-10-12 20:12:15 0 [ERROR] Aborting
It says the current commit is 3620686, but it was actually 9953033. Not sure why this matters since the cluster was restarted and that number restarted.
Attachments
- screenshot-1.png
- 6 kB
Issue Links
- causes
-
MDEV-33540 mariabackup --prepare: [ERROR] InnoDB: Crash recovery is broken due to insufficient innodb_log_file_size
-
- Closed
-
- is caused by
-
MDEV-15158 On commit, do not write to the TRX_SYS page
-
- Closed
-
- relates to
-
MDEV-19193 The --wsrep-new-cluster option does not reset wsrep_cluster_state_uuid and wsrep_last_committed
-
- Closed
-
-
MDEV-18177 Galera test failure on galera_autoinc_sst_mariabackup
-
- Closed
-
Activity
Issue seems to be with the default value of wsrep_start_position, the default is suppose to be Default Value: 00000000-0000-0000-0000-000000000000:-1; however, it shows this.
FYI i do not have this defined in my config file.
Not sure where it got that position from, as i said my cluster was beyond that number.
Same here on 10.3.10+maria~stretch:
018-10-20 19:29:22 0 [Note] WSREP: SST complete, seqno: 38877
2018-10-20 19:29:22 0 [Note] InnoDB: Using Linux native AIO
2018-10-20 19:29:22 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-10-20 19:29:22 0 [Note] InnoDB: Uses event mutexes
2018-10-20 19:29:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2018-10-20 19:29:22 0 [Note] InnoDB: Number of pools: 1
2018-10-20 19:29:22 0 [Note] InnoDB: Using SSE2 crc32 instructions
2018-10-20 19:29:22 0 [Note] InnoDB: Initializing buffer pool, total size = 5G, instances = 5, chunk size = 128M
2018-10-20 19:29:22 0 [Note] InnoDB: Completed initialization of buffer pool
2018-10-20 19:29:22 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes
2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1073741824 bytes
2018-10-20 19:29:22 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2018-10-20 19:29:22 0 [Note] InnoDB: New log files created, LSN=376470769565
2018-10-20 19:29:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-10-20 19:29:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-10-20 19:29:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-10-20 19:29:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-10-20 19:29:23 0 [Note] InnoDB: Waiting for purge to start
2018-10-20 19:29:23 0 [Note] InnoDB: 10.3.10 started; log sequence number 376470769676; transaction id 588592630
2018-10-20 19:29:23 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-10-20 19:29:23 0 [Note] Plugin 'FEEDBACK' is disabled.
2018-10-20 19:29:23 0 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
2018-10-20 19:29:23 0 [Note] Starting crash recovery...
2018-10-20 19:29:23 0 [Note] Crash recovery finished.
2018-10-20 19:29:23 0 [Note] Server socket created on IP: '0.0.0.0'.
2018-10-20 19:29:23 0 [Note] WSREP: Signalling provider to continue.
*2018-10-20 19:29:23 0 [Warning] WSREP: SST position can't be set in past. Requested: 38877, Current: 56220548.
*2018-10-20 19:29:23 0 [Warning] WSREP: Can't continue.
2018-10-20 19:29:23 0 [ERROR] Aborting
i do not understand why the joiner starts a crash recovery after SST..
I tried:
- wresp-new-cluster and then restart joiner
- switch from xtrabackup to mariadbbackup
- delete galera.cache & grastate.dat on master then new joiner
- deleted /var/lib/mysql & /var/log/mysql on new joiner
My Galera cluster is down, only one node can be enabled.
Any idea ?
I will try to downgrade to previous 10.3.x next monday.
I finally found a solution as mentionned here: https://groups.google.com/forum/#!topic/codership-team/HVl5emCJEIw
I stopped the only one active node, edited grastate.dat, put 56220548 as seqno. Restarded the node, and then others nodes successfully synced (SST/Mariadbbackup).
I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully.
Sorry for my english
I used that same solution but in the grastate.dat i put the requested sequence number instead of current.
It looks like this warning:
2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current: 1103114475.
|
is printed here:
https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L311
The inconsistent sequence number that the warning is complaining about is read from the storage engine just before that:
https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L301
https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_xid.cc#L155
Since this inconsistent seqno is read from the storage engine, it's possible that this could be a Mariabackup or InnoDB problem.
Relevant log file:
2019-04-04 12:31:18 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
|
2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
|
2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <info@codership.com> loaded successfully.
|
2019-04-04 12:31:18 0 [Note] WSREP: CRC-32C: using hardware acceleration.
|
2019-04-04 12:31:18 0 [Note] WSREP: Found saved state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367, safe_to_bootstrap: 0
|
2019-04-04 12:31:18 0 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/data/; base_host = 10.1.1.3; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb_logs/gcache; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb_logs/gcache/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 1T; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;
|
2019-04-04 12:31:18 0 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367
|
2019-04-04 12:31:18 0 [Note] WSREP: Assign initial position for certification: 86665367, protocol version: -1
|
2019-04-04 12:31:18 0 [Note] WSREP: wsrep_sst_grab()
|
2019-04-04 12:31:18 0 [Note] WSREP: Start replication
|
2019-04-04 12:31:18 0 [Note] WSREP: Setting initial position to ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367
|
2019-04-04 12:31:18 0 [Note] WSREP: protonet asio version 0
|
2019-04-04 12:31:18 0 [Note] WSREP: Using CRC-32C for message checksums.
|
2019-04-04 12:31:18 0 [Note] WSREP: backend: asio
|
2019-04-04 12:31:18 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
|
2019-04-04 12:31:18 0 [Warning] WSREP: access file(/mariadb/data//gvwstate.dat) failed(No such file or directory)
|
2019-04-04 12:31:18 0 [Note] WSREP: restore pc from disk failed
|
2019-04-04 12:31:18 0 [Note] WSREP: GMCast version 0
|
2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
|
2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
|
2019-04-04 12:31:18 0 [Note] WSREP: EVS version 0
|
2019-04-04 12:31:18 0 [Note] WSREP: gcomm: connecting to group 'cluster', peer '10.1.1.2:,10.1.1.3:,10.1.1.4:'
|
2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to c843aa31 tcp://10.1.1.3:4567
|
2019-04-04 12:31:18 0 [Warning] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') address 'tcp://10.1.1.3:4567' points to own listening address, blacklisting
|
2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to d76cc0f1 tcp://10.1.1.4:4567
|
2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
|
2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to 65b30a89 tcp://10.1.1.2:4567
|
2019-04-04 12:31:19 0 [Note] WSREP: declaring 65b30a89 at tcp://10.1.1.2:4567 stable
|
2019-04-04 12:31:19 0 [Note] WSREP: declaring d76cc0f1 at tcp://10.1.1.4:4567 stable
|
2019-04-04 12:31:19 0 [Note] WSREP: Node 65b30a89 state prim
|
2019-04-04 12:31:19 0 [Note] WSREP: view(view_id(PRIM,65b30a89,17) memb {
|
65b30a89,0
|
c843aa31,0
|
d76cc0f1,0
|
} joined {
|
} left {
|
} partitioned {
|
})
|
2019-04-04 12:31:19 0 [Note] WSREP: save pc into disk
|
2019-04-04 12:31:19 0 [Note] WSREP: gcomm: connected
|
2019-04-04 12:31:19 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
|
2019-04-04 12:31:19 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
|
2019-04-04 12:31:19 0 [Note] WSREP: Opened channel 'cluster'
|
2019-04-04 12:31:19 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
|
2019-04-04 12:31:19 0 [Note] WSREP: Waiting for SST to complete.
|
2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
|
2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: sent state msg: c890995e-56c4-11e9-a721-6b928fdf9889
|
2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 0 (node30)
|
2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 2 (node32)
|
2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 1 (node31)
|
2019-04-04 12:31:19 0 [Note] WSREP: Quorum results:
|
version = 4,
|
component = PRIMARY,
|
conf_id = 16,
|
members = 2/3 (joined/total),
|
act_id = 86665368,
|
last_appl. = -1,
|
protocols = 0/9/3 (gcs/repl/appl),
|
group UUID = ce38d7fa-3dbc-11e9-947f-42b551dbbedc
|
2019-04-04 12:31:19 0 [Note] WSREP: Flow-control interval: [28, 28]
|
2019-04-04 12:31:19 0 [Note] WSREP: Trying to continue unpaused monitor
|
2019-04-04 12:31:19 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 86665368)
|
2019-04-04 12:31:19 2 [Note] WSREP: State transfer required:
|
Group state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368
|
Local state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367
|
2019-04-04 12:31:19 2 [Note] WSREP: New cluster view: global state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3
|
2019-04-04 12:31:19 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
|
2019-04-04 12:31:19 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.1.1.3' --datadir '/mariadb/data/' --parent '21590' --binlog '/binlogs/binlog' '''
|
WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.prepare.log to /tmp/sst_log_archive/mariabackup.prepare.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.831)
|
WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.move.log to /tmp/sst_log_archive/mariabackup.move.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.842)
|
2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection to peer c843aa31 with addr tcp://10.1.1.3:4567 timed out, no messages seen in PT3S
|
2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
WSREP_SST: [INFO] Streaming with xbstream (20190404 12:31:48.886)
|
WSREP_SST: [INFO] Using socat as streamer (20190404 12:31:48.892)
|
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20190404 12:31:48.949)
|
2019-04-04 12:31:49 2 [Note] WSREP: Prepared SST request: mariabackup|10.1.1.3:4444/xtrabackup_sst//1
|
2019-04-04 12:31:49 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2019-04-04 12:31:49 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
|
2019-04-04 12:31:49 2 [Note] WSREP: Assign initial position for certification: 86665368, protocol version: 4
|
2019-04-04 12:31:49 0 [Note] WSREP: Service thread queue flushed.
|
2019-04-04 12:31:49 2 [Note] WSREP: IST receiver addr using tcp://10.1.1.3:4568
|
2019-04-04 12:31:49 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.1.1.3:4568
|
2019-04-04 12:31:49 0 [Note] WSREP: Member 1.0 (node31) requested state transfer from '*any*'. Selected 0.0 (node30)(SYNCED) as donor.
|
2019-04-04 12:31:49 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 86665368)
|
2019-04-04 12:31:49 2 [Note] WSREP: Requesting state transfer: success, donor: 0
|
2019-04-04 12:31:49 2 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368
|
2019-04-04 12:31:50 0 [Note] WSREP: 0.0 (node30): State transfer to 1.0 (node31) complete.
|
2019-04-04 12:31:50 0 [Note] WSREP: Member 0.0 (node30) synced with group.
|
WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20190404 12:31:50.123)
|
WSREP_SST: [INFO] Galera co-ords from recovery: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 0 (20190404 12:31:50.129)
|
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20190404 12:31:50.133)
|
WSREP_SST: [INFO] Removing the sst_in_progress file (20190404 12:31:50.138)
|
2019-04-04 12:31:50 0 [Note] WSREP: SST complete, seqno: 86665367
|
2019-04-04 12:31:50 0 [Note] InnoDB: Using Linux native AIO
|
2019-04-04 12:31:50 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2019-04-04 12:31:50 0 [Note] InnoDB: Uses event mutexes
|
2019-04-04 12:31:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2019-04-04 12:31:50 0 [Note] InnoDB: Number of pools: 1
|
2019-04-04 12:31:50 0 [Note] InnoDB: Using SSE2 crc32 instructions
|
2019-04-04 12:31:50 0 [Note] InnoDB: Initializing buffer pool, total size = 128G, instances = 16, chunk size = 128M
|
2019-04-04 12:31:55 0 [Note] InnoDB: Completed initialization of buffer pool
|
2019-04-04 12:31:55 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
|
2019-04-04 12:32:55 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
|
2019-04-04 12:32:55 0 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2019-04-04 12:32:55 0 [Note] InnoDB: Setting file '/mariadb/InnoDB/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2019-04-04 12:32:55 0 [Note] InnoDB: File '/mariadb/InnoDB/ibtmp1' size is now 12 MB.
|
2019-04-04 12:32:55 0 [Note] InnoDB: Waiting for purge to start
|
2019-04-04 12:32:55 0 [Note] InnoDB: 10.3.14 started; log sequence number 51784170981115; transaction id 2265708494
|
2019-04-04 12:32:55 0 [Note] InnoDB: Loading buffer pool(s) from /mariadb/InnoDB/ib_buffer_pool
|
2019-04-04 12:32:55 0 [Note] Plugin 'FEEDBACK' is disabled.
|
2019-04-04 12:32:55 0 [Note] Server socket created on IP: '0.0.0.0'.
|
2019-04-04 12:32:56 0 [Note] InnoDB: Buffer pool(s) load completed at 190404 12:32:56
|
2019-04-04 12:32:56 0 [Note] WSREP: Signalling provider to continue.
|
2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current: 1103114475.
|
2019-04-04 12:32:56 0 [Warning] WSREP: Can't continue.
|
2019-04-04 12:32:56 0 [ERROR] Aborting
|
I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully.
I submitted a specific bug for that here: MDEV-19193
The upgrade was broken due to MDEV-15158, which was introduced in MariaDB 10.3.5.
Thanks sir, appreciate the input. I think that's sounds about correct, related to mariabackup, as I was also able to duplicate this even doing a manual sst.
I noticed the coords were wrong and usually off between 1 or 4 positions. Here's what I did to fix it:
list of errors with wrong coords:
grep "WSREP: SST position can't be set in past. Requested" /var/log/mysql/error.log
|
2021-02-02 21:30:06 140122607290560 [Warning] WSREP: SST position can't be set in past. Requested: 688039012, Current: 688039013.
|
2021-02-03 10:18:19 139797349947584 [Warning] WSREP: SST position can't be set in past. Requested: 688264699, Current: 688264701.
|
2021-02-03 11:08:51 140313816185024 [Warning] WSREP: SST position can't be set in past. Requested: 688297756, Current: 688297759.
|
2021-02-03 13:16:02 140193343183040 [Warning] WSREP: SST position can't be set in past. Requested: 688370258, Current: 688370259.
|
2021-02-03 14:26:35 140649485912256 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040.
|
2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040.
|
Fix:
1 remove all data from bad node
2 do streaming backup from good node to bad node:
mariabackup --backup --parallel=4 --tmpdir=/tmp --stream=mbstream --galera-info --datadir=/var/lib/mysql --user=$user --password=$pass | pigz -c --fast | nc -w 2 badNode 4444
|
4 prepare, change perms on bad node
5 get the info from here on bad node
cat /var/lib/mysql/xtrabackup_galera_info
|
bd34378f-525c-11e5-92d5-130282ff70e0:688408036
|
6 edit on bad node /var/lib/mysql/grastate.dat and add the info from the above command
# GALERA saved state
|
version: 2.1
|
uuid: bd34378f-525c-11e5-92d5-130282ff70e0
|
seqno: 688408036
|
safe_to_bootstrap: 0
|
7 change perms on that file so mysql owns it
8 start mariadb again (this will fail again with the same error)
2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040.
|
9 edit /var/lib/mysql/grastate.dat again, now replace the seqno with the seqno from the error message above
from 688408036 to 688408040 and start mariadb again, i.e.
# GALERA saved state
|
version: 2.1
|
uuid: bd34378f-525c-11e5-92d5-130282ff70e0
|
seqno: 688408040
|
safe_to_bootstrap: 0
|
Also getting this error after upgrading to 10.3.10:
[root@galeradbprod02 obissick@Ghw.com]# galera_new_cluster
/bin/galera_new_cluster: line 31: return: can only `return' from a function or sourced script