[MDEV-17458] Unable to Start Galera Node - Jira

Oren Bissick (Inactive) added a comment - 2018-10-15 15:57

Also getting this error after upgrading to 10.3.10:
[root@galeradbprod02 obissick@Ghw.com]# galera_new_cluster
/bin/galera_new_cluster: line 31: return: can only `return' from a function or sourced script

Oren Bissick (Inactive) added a comment - 2018-10-15 15:57 Also getting this error after upgrading to 10.3.10: [root@galeradbprod02 obissick@Ghw.com] # galera_new_cluster /bin/galera_new_cluster: line 31: return: can only `return' from a function or sourced script

Oren Bissick (Inactive) added a comment - 2018-10-16 01:35

Issue seems to be with the default value of wsrep_start_position, the default is suppose to be Default Value: 00000000-0000-0000-0000-000000000000:-1; however, it shows this.

FYI i do not have this defined in my config file.

Oren Bissick (Inactive) added a comment - 2018-10-16 01:35 Issue seems to be with the default value of wsrep_start_position, the default is suppose to be Default Value: 00000000-0000-0000-0000-000000000000:-1; however, it shows this. FYI i do not have this defined in my config file.

Oren Bissick (Inactive) added a comment - 2018-10-16 01:38

Not sure where it got that position from, as i said my cluster was beyond that number.

Oren Bissick (Inactive) added a comment - 2018-10-16 01:38 Not sure where it got that position from, as i said my cluster was beyond that number.

Louis Chanouha added a comment - 2018-10-20 17:33

Same here on 10.3.10+maria~stretch:
018-10-20 19:29:22 0 [Note] WSREP: SST complete, seqno: 38877
2018-10-20 19:29:22 0 [Note] InnoDB: Using Linux native AIO
2018-10-20 19:29:22 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-10-20 19:29:22 0 [Note] InnoDB: Uses event mutexes
2018-10-20 19:29:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2018-10-20 19:29:22 0 [Note] InnoDB: Number of pools: 1
2018-10-20 19:29:22 0 [Note] InnoDB: Using SSE2 crc32 instructions
2018-10-20 19:29:22 0 [Note] InnoDB: Initializing buffer pool, total size = 5G, instances = 5, chunk size = 128M
2018-10-20 19:29:22 0 [Note] InnoDB: Completed initialization of buffer pool
2018-10-20 19:29:22 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes
2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1073741824 bytes
2018-10-20 19:29:22 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2018-10-20 19:29:22 0 [Note] InnoDB: New log files created, LSN=376470769565
2018-10-20 19:29:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-10-20 19:29:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-10-20 19:29:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-10-20 19:29:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-10-20 19:29:23 0 [Note] InnoDB: Waiting for purge to start
2018-10-20 19:29:23 0 [Note] InnoDB: 10.3.10 started; log sequence number 376470769676; transaction id 588592630
2018-10-20 19:29:23 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-10-20 19:29:23 0 [Note] Plugin 'FEEDBACK' is disabled.
2018-10-20 19:29:23 0 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
2018-10-20 19:29:23 0 [Note] Starting crash recovery...
2018-10-20 19:29:23 0 [Note] Crash recovery finished.
2018-10-20 19:29:23 0 [Note] Server socket created on IP: '0.0.0.0'.
2018-10-20 19:29:23 0 [Note] WSREP: Signalling provider to continue.
*2018-10-20 19:29:23 0 [Warning] WSREP: SST position can't be set in past. Requested: 38877, Current: 56220548.
*2018-10-20 19:29:23 0 [Warning] WSREP: Can't continue.
2018-10-20 19:29:23 0 [ERROR] Aborting

i do not understand why the joiner starts a crash recovery after SST..

I tried:

wresp-new-cluster and then restart joiner
switch from xtrabackup to mariadbbackup
delete galera.cache & grastate.dat on master then new joiner
deleted /var/lib/mysql & /var/log/mysql on new joiner

My Galera cluster is down, only one node can be enabled.

Any idea ?
I will try to downgrade to previous 10.3.x next monday.

Louis Chanouha added a comment - 2018-10-20 17:33 Same here on 10.3.10+maria~stretch: 018-10-20 19:29:22 0 [Note] WSREP: SST complete, seqno: 38877 2018-10-20 19:29:22 0 [Note] InnoDB: Using Linux native AIO 2018-10-20 19:29:22 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-10-20 19:29:22 0 [Note] InnoDB: Uses event mutexes 2018-10-20 19:29:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.8 2018-10-20 19:29:22 0 [Note] InnoDB: Number of pools: 1 2018-10-20 19:29:22 0 [Note] InnoDB: Using SSE2 crc32 instructions 2018-10-20 19:29:22 0 [Note] InnoDB: Initializing buffer pool, total size = 5G, instances = 5, chunk size = 128M 2018-10-20 19:29:22 0 [Note] InnoDB: Completed initialization of buffer pool 2018-10-20 19:29:22 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes 2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1073741824 bytes 2018-10-20 19:29:22 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 2018-10-20 19:29:22 0 [Note] InnoDB: New log files created, LSN=376470769565 2018-10-20 19:29:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-10-20 19:29:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-10-20 19:29:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-10-20 19:29:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-10-20 19:29:23 0 [Note] InnoDB: Waiting for purge to start 2018-10-20 19:29:23 0 [Note] InnoDB: 10.3.10 started; log sequence number 376470769676; transaction id 588592630 2018-10-20 19:29:23 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-10-20 19:29:23 0 [Note] Plugin 'FEEDBACK' is disabled. 2018-10-20 19:29:23 0 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin 2018-10-20 19:29:23 0 [Note] Starting crash recovery... 2018-10-20 19:29:23 0 [Note] Crash recovery finished. 2018-10-20 19:29:23 0 [Note] Server socket created on IP: '0.0.0.0'. 2018-10-20 19:29:23 0 [Note] WSREP: Signalling provider to continue. *2018-10-20 19:29:23 0 [Warning] WSREP: SST position can't be set in past. Requested: 38877, Current: 56220548. *2018-10-20 19:29:23 0 [Warning] WSREP: Can't continue. 2018-10-20 19:29:23 0 [ERROR] Aborting i do not understand why the joiner starts a crash recovery after SST.. I tried: wresp-new-cluster and then restart joiner switch from xtrabackup to mariadbbackup delete galera.cache & grastate.dat on master then new joiner deleted /var/lib/mysql & /var/log/mysql on new joiner My Galera cluster is down, only one node can be enabled. Any idea ? I will try to downgrade to previous 10.3.x next monday.

Louis Chanouha added a comment - 2018-10-21 00:50

I finally found a solution as mentionned here: https://groups.google.com/forum/#!topic/codership-team/HVl5emCJEIw

I stopped the only one active node, edited grastate.dat, put 56220548 as seqno. Restarded the node, and then others nodes successfully synced (SST/Mariadbbackup).

I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully.

Sorry for my english

Louis Chanouha added a comment - 2018-10-21 00:50 I finally found a solution as mentionned here: https://groups.google.com/forum/#!topic/codership-team/HVl5emCJEIw I stopped the only one active node, edited grastate.dat, put 56220548 as seqno. Restarded the node, and then others nodes successfully synced (SST/Mariadbbackup). I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully. Sorry for my english

Oren Bissick (Inactive) added a comment - 2018-10-22 14:45

I used that same solution but in the grastate.dat i put the requested sequence number instead of current.

Oren Bissick (Inactive) added a comment - 2018-10-22 14:45 I used that same solution but in the grastate.dat i put the requested sequence number instead of current.

Geoff Montee (Inactive) added a comment - 2019-04-05 18:33 - edited

It looks like this warning:

2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current:  1103114475.

is printed here:

https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L311

The inconsistent sequence number that the warning is complaining about is read from the storage engine just before that:

https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L301

https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_xid.cc#L155

Since this inconsistent seqno is read from the storage engine, it's possible that this could be a Mariabackup or InnoDB problem.

Relevant log file:

2019-04-04 12:31:18 0 [Note] WSREP: Read nil XID from storage engines, skipping position init

2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'

2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <info@codership.com> loaded successfully.

2019-04-04 12:31:18 0 [Note] WSREP: CRC-32C: using hardware acceleration.

2019-04-04 12:31:18 0 [Note] WSREP: Found saved state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367, safe_to_bootstrap: 0

2019-04-04 12:31:18 0 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/data/; base_host = 10.1.1.3; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb_logs/gcache; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb_logs/gcache/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 1T; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;

2019-04-04 12:31:18 0 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367

2019-04-04 12:31:18 0 [Note] WSREP: Assign initial position for certification: 86665367, protocol version: -1

2019-04-04 12:31:18 0 [Note] WSREP: wsrep_sst_grab()

2019-04-04 12:31:18 0 [Note] WSREP: Start replication

2019-04-04 12:31:18 0 [Note] WSREP: Setting initial position to ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367

2019-04-04 12:31:18 0 [Note] WSREP: protonet asio version 0

2019-04-04 12:31:18 0 [Note] WSREP: Using CRC-32C for message checksums.

2019-04-04 12:31:18 0 [Note] WSREP: backend: asio

2019-04-04 12:31:18 0 [Note] WSREP: gcomm thread scheduling priority set to other:0

2019-04-04 12:31:18 0 [Warning] WSREP: access file(/mariadb/data//gvwstate.dat) failed(No such file or directory)

2019-04-04 12:31:18 0 [Note] WSREP: restore pc from disk failed

2019-04-04 12:31:18 0 [Note] WSREP: GMCast version 0

2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

2019-04-04 12:31:18 0 [Note] WSREP: EVS version 0

2019-04-04 12:31:18 0 [Note] WSREP: gcomm: connecting to group 'cluster', peer '10.1.1.2:,10.1.1.3:,10.1.1.4:'

2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to c843aa31 tcp://10.1.1.3:4567

2019-04-04 12:31:18 0 [Warning] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') address 'tcp://10.1.1.3:4567' points to own listening address, blacklisting

2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to d76cc0f1 tcp://10.1.1.4:4567

2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to 65b30a89 tcp://10.1.1.2:4567

2019-04-04 12:31:19 0 [Note] WSREP: declaring 65b30a89 at tcp://10.1.1.2:4567 stable

2019-04-04 12:31:19 0 [Note] WSREP: declaring d76cc0f1 at tcp://10.1.1.4:4567 stable

2019-04-04 12:31:19 0 [Note] WSREP: Node 65b30a89 state prim

2019-04-04 12:31:19 0 [Note] WSREP: view(view_id(PRIM,65b30a89,17) memb {

        65b30a89,0

        c843aa31,0

        d76cc0f1,0

} joined {

} left {

} partitioned {

})

2019-04-04 12:31:19 0 [Note] WSREP: save pc into disk

2019-04-04 12:31:19 0 [Note] WSREP: gcomm: connected

2019-04-04 12:31:19 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

2019-04-04 12:31:19 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

2019-04-04 12:31:19 0 [Note] WSREP: Opened channel 'cluster'

2019-04-04 12:31:19 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3

2019-04-04 12:31:19 0 [Note] WSREP: Waiting for SST to complete.

2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: sent state msg: c890995e-56c4-11e9-a721-6b928fdf9889

2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 0 (node30)

2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 2 (node32)

2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 1 (node31)

2019-04-04 12:31:19 0 [Note] WSREP: Quorum results:

        version    = 4,

        component  = PRIMARY,

        conf_id    = 16,

        members    = 2/3 (joined/total),

        act_id     = 86665368,

        last_appl. = -1,

        protocols  = 0/9/3 (gcs/repl/appl),

        group UUID = ce38d7fa-3dbc-11e9-947f-42b551dbbedc

2019-04-04 12:31:19 0 [Note] WSREP: Flow-control interval: [28, 28]

2019-04-04 12:31:19 0 [Note] WSREP: Trying to continue unpaused monitor

2019-04-04 12:31:19 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 86665368)

2019-04-04 12:31:19 2 [Note] WSREP: State transfer required:

        Group state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368

        Local state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367

2019-04-04 12:31:19 2 [Note] WSREP: New cluster view: global state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3

2019-04-04 12:31:19 2 [Warning] WSREP: Gap in state sequence. Need state transfer.

2019-04-04 12:31:19 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.1.1.3' --datadir '/mariadb/data/'   --parent '21590' --binlog '/binlogs/binlog'  '''

WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.prepare.log to /tmp/sst_log_archive/mariabackup.prepare.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.831)

WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.move.log to /tmp/sst_log_archive/mariabackup.move.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.842)

2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection to peer c843aa31 with addr tcp://10.1.1.3:4567 timed out, no messages seen in PT3S

2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting off

WSREP_SST: [INFO] Streaming with xbstream (20190404 12:31:48.886)

WSREP_SST: [INFO] Using socat as streamer (20190404 12:31:48.892)

WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20190404 12:31:48.949)

2019-04-04 12:31:49 2 [Note] WSREP: Prepared SST request: mariabackup|10.1.1.3:4444/xtrabackup_sst//1

2019-04-04 12:31:49 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2019-04-04 12:31:49 2 [Note] WSREP: REPL Protocols: 9 (4, 2)

2019-04-04 12:31:49 2 [Note] WSREP: Assign initial position for certification: 86665368, protocol version: 4

2019-04-04 12:31:49 0 [Note] WSREP: Service thread queue flushed.

2019-04-04 12:31:49 2 [Note] WSREP: IST receiver addr using tcp://10.1.1.3:4568

2019-04-04 12:31:49 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.1.1.3:4568

2019-04-04 12:31:49 0 [Note] WSREP: Member 1.0 (node31) requested state transfer from '*any*'. Selected 0.0 (node30)(SYNCED) as donor.

2019-04-04 12:31:49 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 86665368)

2019-04-04 12:31:49 2 [Note] WSREP: Requesting state transfer: success, donor: 0

2019-04-04 12:31:49 2 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368

2019-04-04 12:31:50 0 [Note] WSREP: 0.0 (node30): State transfer to 1.0 (node31) complete.

2019-04-04 12:31:50 0 [Note] WSREP: Member 0.0 (node30) synced with group.

WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20190404 12:31:50.123)

WSREP_SST: [INFO] Galera co-ords from recovery: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 0 (20190404 12:31:50.129)

WSREP_SST: [INFO] Total time on joiner: 0 seconds (20190404 12:31:50.133)

WSREP_SST: [INFO] Removing the sst_in_progress file (20190404 12:31:50.138)

2019-04-04 12:31:50 0 [Note] WSREP: SST complete, seqno: 86665367

2019-04-04 12:31:50 0 [Note] InnoDB: Using Linux native AIO

2019-04-04 12:31:50 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2019-04-04 12:31:50 0 [Note] InnoDB: Uses event mutexes

2019-04-04 12:31:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.7

2019-04-04 12:31:50 0 [Note] InnoDB: Number of pools: 1

2019-04-04 12:31:50 0 [Note] InnoDB: Using SSE2 crc32 instructions

2019-04-04 12:31:50 0 [Note] InnoDB: Initializing buffer pool, total size = 128G, instances = 16, chunk size = 128M

2019-04-04 12:31:55 0 [Note] InnoDB: Completed initialization of buffer pool

2019-04-04 12:31:55 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().

2019-04-04 12:32:55 0 [Note] InnoDB: 128 out of 128 rollback segments are active.

2019-04-04 12:32:55 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2019-04-04 12:32:55 0 [Note] InnoDB: Setting file '/mariadb/InnoDB/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2019-04-04 12:32:55 0 [Note] InnoDB: File '/mariadb/InnoDB/ibtmp1' size is now 12 MB.

2019-04-04 12:32:55 0 [Note] InnoDB: Waiting for purge to start

2019-04-04 12:32:55 0 [Note] InnoDB: 10.3.14 started; log sequence number 51784170981115; transaction id 2265708494

2019-04-04 12:32:55 0 [Note] InnoDB: Loading buffer pool(s) from /mariadb/InnoDB/ib_buffer_pool

2019-04-04 12:32:55 0 [Note] Plugin 'FEEDBACK' is disabled.

2019-04-04 12:32:55 0 [Note] Server socket created on IP: '0.0.0.0'.

2019-04-04 12:32:56 0 [Note] InnoDB: Buffer pool(s) load completed at 190404 12:32:56

2019-04-04 12:32:56 0 [Note] WSREP: Signalling provider to continue.

2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current:  1103114475.

2019-04-04 12:32:56 0 [Warning] WSREP: Can't continue.

2019-04-04 12:32:56 0 [ERROR] Aborting

Geoff Montee (Inactive) added a comment - 2019-04-05 18:33 - edited It looks like this warning: 2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current: 1103114475. is printed here: https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L311 The inconsistent sequence number that the warning is complaining about is read from the storage engine just before that: https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L301 https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_xid.cc#L155 Since this inconsistent seqno is read from the storage engine, it's possible that this could be a Mariabackup or InnoDB problem. Relevant log file: 2019-04-04 12:31:18 0 [Note] WSREP: Read nil XID from storage engines, skipping position init 2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so' 2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <info@codership.com> loaded successfully. 2019-04-04 12:31:18 0 [Note] WSREP: CRC-32C: using hardware acceleration. 2019-04-04 12:31:18 0 [Note] WSREP: Found saved state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367, safe_to_bootstrap: 0 2019-04-04 12:31:18 0 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/data/; base_host = 10.1.1.3; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb_logs/gcache; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb_logs/gcache/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 1T; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; 2019-04-04 12:31:18 0 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 2019-04-04 12:31:18 0 [Note] WSREP: Assign initial position for certification: 86665367, protocol version: -1 2019-04-04 12:31:18 0 [Note] WSREP: wsrep_sst_grab() 2019-04-04 12:31:18 0 [Note] WSREP: Start replication 2019-04-04 12:31:18 0 [Note] WSREP: Setting initial position to ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 2019-04-04 12:31:18 0 [Note] WSREP: protonet asio version 0 2019-04-04 12:31:18 0 [Note] WSREP: Using CRC-32C for message checksums. 2019-04-04 12:31:18 0 [Note] WSREP: backend: asio 2019-04-04 12:31:18 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 2019-04-04 12:31:18 0 [Warning] WSREP: access file(/mariadb/data//gvwstate.dat) failed(No such file or directory) 2019-04-04 12:31:18 0 [Note] WSREP: restore pc from disk failed 2019-04-04 12:31:18 0 [Note] WSREP: GMCast version 0 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 2019-04-04 12:31:18 0 [Note] WSREP: EVS version 0 2019-04-04 12:31:18 0 [Note] WSREP: gcomm: connecting to group 'cluster', peer '10.1.1.2:,10.1.1.3:,10.1.1.4:' 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to c843aa31 tcp://10.1.1.3:4567 2019-04-04 12:31:18 0 [Warning] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') address 'tcp://10.1.1.3:4567' points to own listening address, blacklisting 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to d76cc0f1 tcp://10.1.1.4:4567 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to 65b30a89 tcp://10.1.1.2:4567 2019-04-04 12:31:19 0 [Note] WSREP: declaring 65b30a89 at tcp://10.1.1.2:4567 stable 2019-04-04 12:31:19 0 [Note] WSREP: declaring d76cc0f1 at tcp://10.1.1.4:4567 stable 2019-04-04 12:31:19 0 [Note] WSREP: Node 65b30a89 state prim 2019-04-04 12:31:19 0 [Note] WSREP: view(view_id(PRIM,65b30a89,17) memb { 65b30a89,0 c843aa31,0 d76cc0f1,0 } joined { } left { } partitioned { }) 2019-04-04 12:31:19 0 [Note] WSREP: save pc into disk 2019-04-04 12:31:19 0 [Note] WSREP: gcomm: connected 2019-04-04 12:31:19 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636 2019-04-04 12:31:19 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0) 2019-04-04 12:31:19 0 [Note] WSREP: Opened channel 'cluster' 2019-04-04 12:31:19 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3 2019-04-04 12:31:19 0 [Note] WSREP: Waiting for SST to complete. 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: sent state msg: c890995e-56c4-11e9-a721-6b928fdf9889 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 0 (node30) 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 2 (node32) 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 1 (node31) 2019-04-04 12:31:19 0 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 16, members = 2/3 (joined/total), act_id = 86665368, last_appl. = -1, protocols = 0/9/3 (gcs/repl/appl), group UUID = ce38d7fa-3dbc-11e9-947f-42b551dbbedc 2019-04-04 12:31:19 0 [Note] WSREP: Flow-control interval: [28, 28] 2019-04-04 12:31:19 0 [Note] WSREP: Trying to continue unpaused monitor 2019-04-04 12:31:19 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 86665368) 2019-04-04 12:31:19 2 [Note] WSREP: State transfer required: Group state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368 Local state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 2019-04-04 12:31:19 2 [Note] WSREP: New cluster view: global state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3 2019-04-04 12:31:19 2 [Warning] WSREP: Gap in state sequence. Need state transfer. 2019-04-04 12:31:19 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.1.1.3' --datadir '/mariadb/data/' --parent '21590' --binlog '/binlogs/binlog' ''' WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.prepare.log to /tmp/sst_log_archive/mariabackup.prepare.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.831) WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.move.log to /tmp/sst_log_archive/mariabackup.move.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.842) 2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection to peer c843aa31 with addr tcp://10.1.1.3:4567 timed out, no messages seen in PT3S 2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting off WSREP_SST: [INFO] Streaming with xbstream (20190404 12:31:48.886) WSREP_SST: [INFO] Using socat as streamer (20190404 12:31:48.892) WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20190404 12:31:48.949) 2019-04-04 12:31:49 2 [Note] WSREP: Prepared SST request: mariabackup|10.1.1.3:4444/xtrabackup_sst//1 2019-04-04 12:31:49 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2019-04-04 12:31:49 2 [Note] WSREP: REPL Protocols: 9 (4, 2) 2019-04-04 12:31:49 2 [Note] WSREP: Assign initial position for certification: 86665368, protocol version: 4 2019-04-04 12:31:49 0 [Note] WSREP: Service thread queue flushed. 2019-04-04 12:31:49 2 [Note] WSREP: IST receiver addr using tcp://10.1.1.3:4568 2019-04-04 12:31:49 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.1.1.3:4568 2019-04-04 12:31:49 0 [Note] WSREP: Member 1.0 (node31) requested state transfer from '*any*'. Selected 0.0 (node30)(SYNCED) as donor. 2019-04-04 12:31:49 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 86665368) 2019-04-04 12:31:49 2 [Note] WSREP: Requesting state transfer: success, donor: 0 2019-04-04 12:31:49 2 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368 2019-04-04 12:31:50 0 [Note] WSREP: 0.0 (node30): State transfer to 1.0 (node31) complete. 2019-04-04 12:31:50 0 [Note] WSREP: Member 0.0 (node30) synced with group. WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20190404 12:31:50.123) WSREP_SST: [INFO] Galera co-ords from recovery: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 0 (20190404 12:31:50.129) WSREP_SST: [INFO] Total time on joiner: 0 seconds (20190404 12:31:50.133) WSREP_SST: [INFO] Removing the sst_in_progress file (20190404 12:31:50.138) 2019-04-04 12:31:50 0 [Note] WSREP: SST complete, seqno: 86665367 2019-04-04 12:31:50 0 [Note] InnoDB: Using Linux native AIO 2019-04-04 12:31:50 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2019-04-04 12:31:50 0 [Note] InnoDB: Uses event mutexes 2019-04-04 12:31:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.7 2019-04-04 12:31:50 0 [Note] InnoDB: Number of pools: 1 2019-04-04 12:31:50 0 [Note] InnoDB: Using SSE2 crc32 instructions 2019-04-04 12:31:50 0 [Note] InnoDB: Initializing buffer pool, total size = 128G, instances = 16, chunk size = 128M 2019-04-04 12:31:55 0 [Note] InnoDB: Completed initialization of buffer pool 2019-04-04 12:31:55 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2019-04-04 12:32:55 0 [Note] InnoDB: 128 out of 128 rollback segments are active. 2019-04-04 12:32:55 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2019-04-04 12:32:55 0 [Note] InnoDB: Setting file '/mariadb/InnoDB/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2019-04-04 12:32:55 0 [Note] InnoDB: File '/mariadb/InnoDB/ibtmp1' size is now 12 MB. 2019-04-04 12:32:55 0 [Note] InnoDB: Waiting for purge to start 2019-04-04 12:32:55 0 [Note] InnoDB: 10.3.14 started; log sequence number 51784170981115; transaction id 2265708494 2019-04-04 12:32:55 0 [Note] InnoDB: Loading buffer pool(s) from /mariadb/InnoDB/ib_buffer_pool 2019-04-04 12:32:55 0 [Note] Plugin 'FEEDBACK' is disabled. 2019-04-04 12:32:55 0 [Note] Server socket created on IP: '0.0.0.0'. 2019-04-04 12:32:56 0 [Note] InnoDB: Buffer pool(s) load completed at 190404 12:32:56 2019-04-04 12:32:56 0 [Note] WSREP: Signalling provider to continue. 2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current: 1103114475. 2019-04-04 12:32:56 0 [Warning] WSREP: Can't continue. 2019-04-04 12:32:56 0 [ERROR] Aborting

Geoff Montee (Inactive) added a comment - 2019-04-05 18:41

chanouha,

I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully.

I submitted a specific bug for that here: ~~MDEV-19193~~

Geoff Montee (Inactive) added a comment - 2019-04-05 18:41 chanouha , I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully. I submitted a specific bug for that here: MDEV-19193

Jan Lindström (Inactive) added a comment - 2019-05-21 07:03

https://github.com/MariaDB/server/pull/1305

Jan Lindström (Inactive) added a comment - 2019-05-21 07:03 https://github.com/MariaDB/server/pull/1305

Marko Mäkelä added a comment - 2019-05-21 07:49

Please request another review after addressing my comments.

Marko Mäkelä added a comment - 2019-05-21 07:49 Please request another review after addressing my comments.

Marko Mäkelä added a comment - 2019-05-21 12:15

The upgrade was broken due to ~~MDEV-15158~~, which was introduced in MariaDB 10.3.5.

Marko Mäkelä added a comment - 2019-05-21 12:15 The upgrade was broken due to MDEV-15158 , which was introduced in MariaDB 10.3.5.

Daniel Almeida (Inactive) added a comment - 2021-02-03 21:03

Thanks sir, appreciate the input. I think that's sounds about correct, related to mariabackup, as I was also able to duplicate this even doing a manual sst.
I noticed the coords were wrong and usually off between 1 or 4 positions. Here's what I did to fix it:

list of errors with wrong coords:

 grep "WSREP: SST position can't be set in past. Requested" /var/log/mysql/error.log

2021-02-02 21:30:06 140122607290560 [Warning] WSREP: SST position can't be set in past. Requested: 688039012, Current:  688039013.

2021-02-03 10:18:19 139797349947584 [Warning] WSREP: SST position can't be set in past. Requested: 688264699, Current:  688264701.

2021-02-03 11:08:51 140313816185024 [Warning] WSREP: SST position can't be set in past. Requested: 688297756, Current:  688297759.

2021-02-03 13:16:02 140193343183040 [Warning] WSREP: SST position can't be set in past. Requested: 688370258, Current:  688370259.

2021-02-03 14:26:35 140649485912256 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current:  688408040.

2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current:  688408040.

Fix:

1 remove all data from bad node
2 do streaming backup from good node to bad node:

mariabackup --backup --parallel=4 --tmpdir=/tmp --stream=mbstream --galera-info --datadir=/var/lib/mysql --user=$user --password=$pass | pigz -c --fast | nc -w 2 badNode 4444

4 prepare, change perms on bad node
5 get the info from here on bad node

cat /var/lib/mysql/xtrabackup_galera_info

bd34378f-525c-11e5-92d5-130282ff70e0:688408036

6 edit on bad node /var/lib/mysql/grastate.dat and add the info from the above command

# GALERA saved state

version: 2.1

uuid:    bd34378f-525c-11e5-92d5-130282ff70e0

seqno:   688408036

safe_to_bootstrap: 0

7 change perms on that file so mysql owns it
8 start mariadb again (this will fail again with the same error)

2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current:  688408040.

9 edit /var/lib/mysql/grastate.dat again, now replace the seqno with the seqno from the error message above
from 688408036 to 688408040 and start mariadb again, i.e.

# GALERA saved state

version: 2.1

uuid:    bd34378f-525c-11e5-92d5-130282ff70e0

seqno:   688408040

safe_to_bootstrap: 0

Daniel Almeida (Inactive) added a comment - 2021-02-03 21:03 Thanks sir, appreciate the input. I think that's sounds about correct, related to mariabackup, as I was also able to duplicate this even doing a manual sst. I noticed the coords were wrong and usually off between 1 or 4 positions. Here's what I did to fix it: list of errors with wrong coords: grep "WSREP: SST position can't be set in past. Requested" /var/log/mysql/error.log 2021-02-02 21:30:06 140122607290560 [Warning] WSREP: SST position can't be set in past. Requested: 688039012, Current: 688039013. 2021-02-03 10:18:19 139797349947584 [Warning] WSREP: SST position can't be set in past. Requested: 688264699, Current: 688264701. 2021-02-03 11:08:51 140313816185024 [Warning] WSREP: SST position can't be set in past. Requested: 688297756, Current: 688297759. 2021-02-03 13:16:02 140193343183040 [Warning] WSREP: SST position can't be set in past. Requested: 688370258, Current: 688370259. 2021-02-03 14:26:35 140649485912256 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040. 2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040. Fix: 1 remove all data from bad node 2 do streaming backup from good node to bad node: mariabackup --backup --parallel=4 --tmpdir=/tmp --stream=mbstream --galera-info --datadir=/var/lib/mysql --user=$user --password=$pass | pigz -c --fast | nc -w 2 badNode 4444 4 prepare, change perms on bad node 5 get the info from here on bad node cat /var/lib/mysql/xtrabackup_galera_info bd34378f-525c-11e5-92d5-130282ff70e0:688408036 6 edit on bad node /var/lib/mysql/grastate.dat and add the info from the above command # GALERA saved state version: 2.1 uuid: bd34378f-525c-11e5-92d5-130282ff70e0 seqno: 688408036 safe_to_bootstrap: 0 7 change perms on that file so mysql owns it 8 start mariadb again (this will fail again with the same error) 2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040. 9 edit /var/lib/mysql/grastate.dat again, now replace the seqno with the seqno from the error message above from 688408036 to 688408040 and start mariadb again, i.e. # GALERA saved state version: 2.1 uuid: bd34378f-525c-11e5-92d5-130282ff70e0 seqno: 688408040 safe_to_bootstrap: 0

MariaDB Server

Unable to Start Galera Node

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration