[MDEV-15453] IST failed during upgrade of version and SST failed too from 10.2.12 to 10.2.13 Created: 2018-03-02  Updated: 2018-09-13  Resolved: 2018-09-13

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.2.13
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Aurélien LEQUOY Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: galera
Environment:

one master , one galera cluster on slave with 10.2.12 under Debian 9.3. We try to upgrade the galera cluster 10.2.12 to 10.2.13


Issue Links:
Relates
relates to MDEV-12289 Keep 128 persistent rollback segments... Closed
relates to MDEV-15466 SST not working after upgrade from 10... Closed

 Description   

Hello I tried to upgrade one galera cluster behind on master.

they way : upgrade node by node and reconnect to the cluster.
(SST method : Xtrabackup)

1) upgraded the node 3 who is slave (no problem, no event on cluster)
2) we let replication, we disconect the node 2 we upgrade MariaDB from 10.2.12 to 10.2.13
(maybe one or 2 minutes)
impossible to rejoin the cluster IST failed, so SST and SST failed.
3) stop the slave on slave 1, remove node 3 from cluster, upgrade to 10.2.13 reconnect to cluster, everthing fine.

2018-03-02 10:47:43 140596706543808 [Note] WSREP: GCache history reset: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:0 -> 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467319538
2018-03-02 10:47:43 140596706543808 [Note] WSREP: Assign initial position for certification: 467319538, protocol version: -1
2018-03-02 10:47:43 140596706543808 [Note] WSREP: wsrep_sst_grab()
2018-03-02 10:47:43 140596706543808 [Note] WSREP: Start replication
2018-03-02 10:47:43 140596706543808 [Note] WSREP: Setting initial position to 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467319538
2018-03-02 10:47:43 140596706543808 [Note] WSREP: protonet asio version 0
2018-03-02 10:47:43 140596706543808 [Note] WSREP: Using CRC-32C for message checksums.
2018-03-02 10:47:43 140596706543808 [Note] WSREP: backend: asio
2018-03-02 10:47:43 140596706543808 [Note] WSREP: gcomm thread scheduling priority set to other:0
2018-03-02 10:47:43 140596706543808 [Warning] WSREP: access file(/RDC/mysql/data//gvwstate.dat) failed(No such file or directory)
2018-03-02 10:47:43 140596706543808 [Note] WSREP: restore pc from disk failed
2018-03-02 10:47:43 140596706543808 [Note] WSREP: GMCast version 0
2018-03-02 10:47:43 140596706543808 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2018-03-02 10:47:43 140596706543808 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2018-03-02 10:47:43 140596706543808 [Note] WSREP: EVS version 0
2018-03-02 10:47:43 140596706543808 [Note] WSREP: gcomm: connecting to group 'rdc-cluster-1', peer '10.241.5.221:,10.241.5.222:,10.241.5.220:'
2018-03-02 10:47:43 140596706543808 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') connection established to c144d827 tcp://10.241.5.221:4567
2018-03-02 10:47:43 140596706543808 [Warning] WSREP: (c144d827, 'tcp://0.0.0.0:4567') address 'tcp://10.241.5.221:4567' points to own listening address, blacklisting
2018-03-02 10:47:43 140596706543808 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') connection established to 40c1e704 tcp://10.241.5.220:4567
2018-03-02 10:47:43 140596706543808 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2018-03-02 10:47:43 140596706543808 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') connection established to 0a0faad9 tcp://10.241.5.222:4567
2018-03-02 10:47:44 140596706543808 [Note] WSREP: declaring 0a0faad9 at tcp://10.241.5.222:4567 stable
2018-03-02 10:47:44 140596706543808 [Note] WSREP: declaring 40c1e704 at tcp://10.241.5.220:4567 stable
2018-03-02 10:47:44 140596706543808 [Note] WSREP: Node 0a0faad9 state prim
2018-03-02 10:47:44 140596706543808 [Note] WSREP: view(view_id(PRIM,0a0faad9,23) memb {
        0a0faad9,0
        40c1e704,0
        c144d827,0
} joined {
} left {
} partitioned {
})
2018-03-02 10:47:44 140596706543808 [Note] WSREP: save pc into disk
2018-03-02 10:47:44 140596706543808 [Note] WSREP: gcomm: connected
2018-03-02 10:47:44 140596706543808 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2018-03-02 10:47:44 140596706543808 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2018-03-02 10:47:44 140596706543808 [Note] WSREP: Opened channel 'rdc-cluster-1'
2018-03-02 10:47:44 140596507354880 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-03-02 10:47:44 140596706543808 [Note] WSREP: Waiting for SST to complete.
2018-03-02 10:47:44 140596507354880 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-03-02 10:47:44 140596507354880 [Note] WSREP: STATE EXCHANGE: sent state msg: c191bd3a-1dfe-11e8-b54c-4f0a6cf28162
2018-03-02 10:47:44 140596507354880 [Note] WSREP: STATE EXCHANGE: got state msg: c191bd3a-1dfe-11e8-b54c-4f0a6cf28162 from 0 (hb01-galera-cl01-03)
2018-03-02 10:47:44 140596507354880 [Note] WSREP: STATE EXCHANGE: got state msg: c191bd3a-1dfe-11e8-b54c-4f0a6cf28162 from 1 (hb01-galera-cl01-01)
2018-03-02 10:47:44 140596507354880 [Note] WSREP: STATE EXCHANGE: got state msg: c191bd3a-1dfe-11e8-b54c-4f0a6cf28162 from 2 (hb01-galera-cl01-02)
2018-03-02 10:47:44 140596507354880 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 22,
        members    = 2/3 (joined/total),
        act_id     = 467382351,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 6b51589d-0b5f-11e8-94e6-82aa9d3be300
2018-03-02 10:47:44 140596507354880 [Note] WSREP: Flow-control interval: [28, 28]
2018-03-02 10:47:44 140596507354880 [Note] WSREP: Trying to continue unpaused monitor
2018-03-02 10:47:44 140596507354880 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 467382351)
2018-03-02 10:47:44 140596498659072 [Note] WSREP: State transfer required:
        Group state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467382351
        Local state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467319538
2018-03-02 10:47:44 140596498659072 [Note] WSREP: New cluster view: global state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467382351, view# 23: Primary, number of nodes: 3, my index: 2, protocol version 3
2018-03-02 10:47:44 140596498659072 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-03-02 10:47:44 140574950680320 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '21401' --binlog '/RDC/mysql/binlog/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20180302 10:47:44.863)
2018-03-02 10:47:46 140596515747584 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') connection to peer c144d827 with addr tcp://10.241.5.221:4567 timed out, no messages seen in PT3S
2018-03-02 10:47:47 140596515747584 [Note] WSREP: (c144d827, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-03-02 10:49:13 140574950680320 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '21401' --binlog '/RDC/mysql/binlog/mariadb-bin'
        Read: '(null)'
2018-03-02 10:49:13 140574950680320 [ERROR] WSREP: Process was aborted.
2018-03-02 10:49:13 140574950680320 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '21401' --binlog '/RDC/mysql/binlog/mariadb-bin' : 2 (No such file or directory)
2018-03-02 10:49:13 140596498659072 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-03-02 10:49:13 140596498659072 [ERROR] Aborting
 
Error in my_thread_global_end(): 1 threads didn't exit
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Read nil XID from storage engines, skipping position init
2018-03-02 11:06:51 139691983460544 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2018-03-02 11:06:51 139691983460544 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <info@codership.com> loaded successfully.
2018-03-02 11:06:51 139691983460544 [Note] WSREP: CRC-32C: using hardware acceleration.
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Found saved state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:-1, safe_to_bootstrap: 0
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Passing config to GCS: base_dir = /RDC/mysql/data/; base_host = 10.241.5.221; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /RDC/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /RDC/mysql/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 20G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc
2018-03-02 11:06:51 139691983460544 [Note] WSREP: GCache history reset: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:0 -> 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467319538
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Assign initial position for certification: 467319538, protocol version: -1
2018-03-02 11:06:51 139691983460544 [Note] WSREP: wsrep_sst_grab()
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Start replication
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Setting initial position to 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467319538
2018-03-02 11:06:51 139691983460544 [Note] WSREP: protonet asio version 0
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Using CRC-32C for message checksums.
2018-03-02 11:06:51 139691983460544 [Note] WSREP: backend: asio
2018-03-02 11:06:51 139691983460544 [Note] WSREP: gcomm thread scheduling priority set to other:0
2018-03-02 11:06:51 139691983460544 [Warning] WSREP: access file(/RDC/mysql/data//gvwstate.dat) failed(No such file or directory)
2018-03-02 11:06:51 139691983460544 [Note] WSREP: restore pc from disk failed
2018-03-02 11:06:51 139691983460544 [Note] WSREP: GMCast version 0
2018-03-02 11:06:51 139691983460544 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2018-03-02 11:06:51 139691983460544 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2018-03-02 11:06:51 139691983460544 [Note] WSREP: EVS version 0
2018-03-02 11:06:51 139691983460544 [Note] WSREP: gcomm: connecting to group 'rdc-cluster-1', peer '10.241.5.221:,10.241.5.222:,10.241.5.220:'
2018-03-02 11:06:51 139691983460544 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') connection established to 6d87b0f9 tcp://10.241.5.221:4567
2018-03-02 11:06:51 139691983460544 [Warning] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') address 'tcp://10.241.5.221:4567' points to own listening address, blacklisting
2018-03-02 11:06:51 139691983460544 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') connection established to 40c1e704 tcp://10.241.5.220:4567
2018-03-02 11:06:51 139691983460544 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2018-03-02 11:06:51 139691983460544 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') connection established to 0a0faad9 tcp://10.241.5.222:4567
2018-03-02 11:06:51 139691983460544 [Note] WSREP: declaring 0a0faad9 at tcp://10.241.5.222:4567 stable
2018-03-02 11:06:51 139691983460544 [Note] WSREP: declaring 40c1e704 at tcp://10.241.5.220:4567 stable
2018-03-02 11:06:51 139691983460544 [Note] WSREP: Node 0a0faad9 state prim
2018-03-02 11:06:51 139691983460544 [Note] WSREP: view(view_id(PRIM,0a0faad9,25) memb {
        0a0faad9,0
        40c1e704,0
        6d87b0f9,0
} joined {
} left {
} partitioned {
})
2018-03-02 11:06:51 139691983460544 [Note] WSREP: save pc into disk
2018-03-02 11:06:52 139691983460544 [Note] WSREP: gcomm: connected
2018-03-02 11:06:52 139691983460544 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2018-03-02 11:06:52 139691983460544 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2018-03-02 11:06:52 139691983460544 [Note] WSREP: Opened channel 'rdc-cluster-1'
2018-03-02 11:06:52 139670239299328 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-03-02 11:06:52 139670239299328 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-03-02 11:06:52 139691983460544 [Note] WSREP: Waiting for SST to complete.
2018-03-02 11:06:52 139670239299328 [Note] WSREP: STATE EXCHANGE: sent state msg: 6d8f82e8-1e01-11e8-b761-8af7ed4b3c00
2018-03-02 11:06:52 139670239299328 [Note] WSREP: STATE EXCHANGE: got state msg: 6d8f82e8-1e01-11e8-b761-8af7ed4b3c00 from 0 (hb01-galera-cl01-03)
2018-03-02 11:06:52 139670239299328 [Note] WSREP: STATE EXCHANGE: got state msg: 6d8f82e8-1e01-11e8-b761-8af7ed4b3c00 from 1 (hb01-galera-cl01-01)
2018-03-02 11:06:52 139670239299328 [Note] WSREP: STATE EXCHANGE: got state msg: 6d8f82e8-1e01-11e8-b761-8af7ed4b3c00 from 2 (hb01-galera-cl01-02)
2018-03-02 11:06:52 139670239299328 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 24,
        members    = 2/3 (joined/total),
        act_id     = 467641595,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 6b51589d-0b5f-11e8-94e6-82aa9d3be300
2018-03-02 11:06:52 139670239299328 [Note] WSREP: Flow-control interval: [28, 28]
2018-03-02 11:06:52 139670239299328 [Note] WSREP: Trying to continue unpaused monitor
2018-03-02 11:06:52 139670239299328 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 467641595)
2018-03-02 11:06:52 139691800753920 [Note] WSREP: State transfer required:
        Group state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467641595
        Local state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467319538
2018-03-02 11:06:52 139691800753920 [Note] WSREP: New cluster view: global state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467641595, view# 25: Primary, number of nodes: 3, my index: 2, protocol version 3
2018-03-02 11:06:52 139691800753920 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-03-02 11:06:52 139670230906624 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '24625' --binlog '/RDC/mysql/binlog/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20180302 11:06:52.380)
2018-03-02 11:06:54 139670247692032 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') connection to peer 6d87b0f9 with addr tcp://10.241.5.221:4567 timed out, no messages seen in PT3S
2018-03-02 11:06:55 139670247692032 [Note] WSREP: (6d87b0f9, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-03-02 11:08:21 139670230906624 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '24625' --binlog '/RDC/mysql/binlog/mariadb-bin'
        Read: '(null)'
2018-03-02 11:08:21 139670230906624 [ERROR] WSREP: Process was aborted.
2018-03-02 11:08:21 139670230906624 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '24625' --binlog '/RDC/mysql/binlog/mariadb-bin' : 2 (No such file or directory)
2018-03-02 11:08:21 139691800753920 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-03-02 11:08:21 139691800753920 [ERROR] Aborting
 
Error in my_thread_global_end(): 1 threads didn't exit
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Read nil XID from storage engines, skipping position init
2018-03-02 11:15:16 140552530852032 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2018-03-02 11:15:16 140552530852032 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <info@codership.com> loaded successfully.
2018-03-02 11:15:16 140552530852032 [Note] WSREP: CRC-32C: using hardware acceleration.
2018-03-02 11:15:16 140552530852032 [Warning] WSREP: Could not open state file for reading: '/RDC/mysql/data//grastate.dat'
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Passing config to GCS: base_dir = /RDC/mysql/data/; base_host = 10.241.5.221; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /RDC/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /RDC/mysql/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 20G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc
2018-03-02 11:15:16 140552530852032 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2018-03-02 11:15:16 140552530852032 [Note] WSREP: wsrep_sst_grab()
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Start replication
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2018-03-02 11:15:16 140552530852032 [Note] WSREP: protonet asio version 0
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Using CRC-32C for message checksums.
2018-03-02 11:15:16 140552530852032 [Note] WSREP: backend: asio
2018-03-02 11:15:16 140552530852032 [Note] WSREP: gcomm thread scheduling priority set to other:0
2018-03-02 11:15:16 140552530852032 [Warning] WSREP: access file(/RDC/mysql/data//gvwstate.dat) failed(No such file or directory)
2018-03-02 11:15:16 140552530852032 [Note] WSREP: restore pc from disk failed
2018-03-02 11:15:16 140552530852032 [Note] WSREP: GMCast version 0
2018-03-02 11:15:16 140552530852032 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2018-03-02 11:15:16 140552530852032 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2018-03-02 11:15:16 140552530852032 [Note] WSREP: EVS version 0
2018-03-02 11:15:16 140552530852032 [Note] WSREP: gcomm: connecting to group 'rdc-cluster-1', peer '10.241.5.221:,10.241.5.222:,10.241.5.220:'
2018-03-02 11:15:16 140552530852032 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') connection established to 9a6e409c tcp://10.241.5.221:4567
2018-03-02 11:15:16 140552530852032 [Warning] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') address 'tcp://10.241.5.221:4567' points to own listening address, blacklisting
2018-03-02 11:15:16 140552530852032 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') connection established to 40c1e704 tcp://10.241.5.220:4567
2018-03-02 11:15:16 140552530852032 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2018-03-02 11:15:16 140552530852032 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') connection established to 0a0faad9 tcp://10.241.5.222:4567
2018-03-02 11:15:16 140552530852032 [Note] WSREP: declaring 0a0faad9 at tcp://10.241.5.222:4567 stable
2018-03-02 11:15:16 140552530852032 [Note] WSREP: declaring 40c1e704 at tcp://10.241.5.220:4567 stable
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Node 0a0faad9 state prim
2018-03-02 11:15:16 140552530852032 [Note] WSREP: view(view_id(PRIM,0a0faad9,27) memb {
        0a0faad9,0
        40c1e704,0
        9a6e409c,0
} joined {
} left {
} partitioned {
})
2018-03-02 11:15:16 140552530852032 [Note] WSREP: save pc into disk
2018-03-02 11:15:16 140552530852032 [Note] WSREP: gcomm: connected
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Opened channel 'rdc-cluster-1'
2018-03-02 11:15:16 140530784655104 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-03-02 11:15:16 140552530852032 [Note] WSREP: Waiting for SST to complete.
2018-03-02 11:15:16 140530784655104 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-03-02 11:15:16 140530784655104 [Note] WSREP: STATE EXCHANGE: sent state msg: 9a6ee7ea-1e02-11e8-9075-071fb0408cc9
2018-03-02 11:15:16 140530784655104 [Note] WSREP: STATE EXCHANGE: got state msg: 9a6ee7ea-1e02-11e8-9075-071fb0408cc9 from 0 (hb01-galera-cl01-03)
2018-03-02 11:15:16 140530784655104 [Note] WSREP: STATE EXCHANGE: got state msg: 9a6ee7ea-1e02-11e8-9075-071fb0408cc9 from 1 (hb01-galera-cl01-01)
2018-03-02 11:15:16 140530784655104 [Note] WSREP: STATE EXCHANGE: got state msg: 9a6ee7ea-1e02-11e8-9075-071fb0408cc9 from 2 (hb01-galera-cl01-02)
2018-03-02 11:15:16 140530784655104 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 26,
        members    = 2/3 (joined/total),
        act_id     = 467915709,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 6b51589d-0b5f-11e8-94e6-82aa9d3be300
2018-03-02 11:15:16 140530784655104 [Note] WSREP: Flow-control interval: [28, 28]
2018-03-02 11:15:16 140530784655104 [Note] WSREP: Trying to continue unpaused monitor
2018-03-02 11:15:16 140530784655104 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 467915709)
2018-03-02 11:15:16 140552339752704 [Note] WSREP: State transfer required:
        Group state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467915709
        Local state: 00000000-0000-0000-0000-000000000000:-1
2018-03-02 11:15:16 140552339752704 [Note] WSREP: New cluster view: global state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:467915709, view# 27: Primary, number of nodes: 3, my index: 2, protocol version 3
2018-03-02 11:15:16 140552339752704 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-03-02 11:15:16 140530776262400 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '26402' --binlog '/RDC/mysql/binlog/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20180302 11:15:17.186)
2018-03-02 11:15:19 140530793047808 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') connection to peer 9a6e409c with addr tcp://10.241.5.221:4567 timed out, no messages seen in PT3S
2018-03-02 11:15:19 140530793047808 [Note] WSREP: (9a6e409c, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-03-02 11:16:46 140530776262400 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '26402' --binlog '/RDC/mysql/binlog/mariadb-bin'
        Read: '(null)'
2018-03-02 11:16:46 140530776262400 [ERROR] WSREP: Process was aborted.
2018-03-02 11:16:46 140530776262400 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '26402' --binlog '/RDC/mysql/binlog/mariadb-bin' : 2 (No such file or directory)
2018-03-02 11:16:46 140552339752704 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-03-02 11:16:46 140552339752704 [ERROR] Aborting
 
Error in my_thread_global_end(): 1 threads didn't exit
2018-03-02 11:45:51 140065529612480 [Note] WSREP: Read nil XID from storage engines, skipping position init
2018-03-02 11:45:51 140065529612480 [Note] WSREP: wsrep_load(): loading provider library 'none'
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Uses event mutexes
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Compressed tables use zlib 1.2.8
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Using Linux native AIO
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Number of pools: 1
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Using SSE2 crc32 instructions
2018-03-02 11:45:51 140065529612480 [Note] InnoDB: Initializing buffer pool, total size = 378G, instances = 8, chunk size = 128M
2018-03-02 11:46:02 140065529612480 [Note] InnoDB: Completed initialization of buffer pool
2018-03-02 11:46:02 139632422274816 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-03-02 11:46:02 140065529612480 [Note] InnoDB: Highest supported file format is Barracuda.
2018-03-02 11:46:04 140065529612480 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-03-02 11:46:04 140065529612480 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-03-02 11:46:04 140065529612480 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-03-02 11:46:04 140065529612480 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-03-02 11:46:04 140065529612480 [Note] InnoDB: Waiting for purge to start
2018-03-02 11:46:04 140065529612480 [Note] InnoDB: 5.7.21 started; log sequence number 1619267
2018-03-02 11:46:04 139641726428928 [Note] InnoDB: Loading buffer pool(s) from /RDC/mysql/data/ib_buffer_pool
2018-03-02 11:46:04 139641726428928 [Note] InnoDB: Buffer pool(s) load completed at 180302 11:46:04
2018-03-02 11:46:04 140065529612480 [Note] Plugin 'FEEDBACK' is disabled.
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Read nil XID from storage engines, skipping position init
2018-03-02 11:46:48 140111382202560 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2018-03-02 11:46:48 140111382202560 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <info@codership.com> loaded successfully.
2018-03-02 11:46:48 140111382202560 [Note] WSREP: CRC-32C: using hardware acceleration.
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Passing config to GCS: base_dir = /RDC/mysql/data/; base_host = 10.241.5.221; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /RDC/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /RDC/mysql/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 20G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc
2018-03-02 11:46:48 140111382202560 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2018-03-02 11:46:48 140111382202560 [Note] WSREP: wsrep_sst_grab()
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Start replication
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2018-03-02 11:46:48 140111382202560 [Note] WSREP: protonet asio version 0
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Using CRC-32C for message checksums.
2018-03-02 11:46:48 140111382202560 [Note] WSREP: backend: asio
2018-03-02 11:46:48 140111382202560 [Note] WSREP: gcomm thread scheduling priority set to other:0
2018-03-02 11:46:48 140111382202560 [Warning] WSREP: access file(/RDC/mysql/data//gvwstate.dat) failed(No such file or directory)
2018-03-02 11:46:48 140111382202560 [Note] WSREP: restore pc from disk failed
2018-03-02 11:46:48 140111382202560 [Note] WSREP: GMCast version 0
2018-03-02 11:46:48 140111382202560 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2018-03-02 11:46:48 140111382202560 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2018-03-02 11:46:48 140111382202560 [Note] WSREP: EVS version 0
2018-03-02 11:46:48 140111382202560 [Note] WSREP: gcomm: connecting to group 'rdc-cluster-1', peer '10.241.5.221:,10.241.5.222:,10.241.5.220:'
2018-03-02 11:46:48 140111382202560 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') connection established to 025fa3ee tcp://10.241.5.221:4567
2018-03-02 11:46:48 140111382202560 [Warning] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') address 'tcp://10.241.5.221:4567' points to own listening address, blacklisting
2018-03-02 11:46:48 140111382202560 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') connection established to 8fa6765f tcp://10.241.5.220:4567
2018-03-02 11:46:48 140111382202560 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2018-03-02 11:46:48 140111382202560 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') connection established to 0a0faad9 tcp://10.241.5.222:4567
2018-03-02 11:46:48 140111382202560 [Note] WSREP: declaring 0a0faad9 at tcp://10.241.5.222:4567 stable
2018-03-02 11:46:48 140111382202560 [Note] WSREP: declaring 8fa6765f at tcp://10.241.5.220:4567 stable
2018-03-02 11:46:48 140111382202560 [Note] WSREP: Node 0a0faad9 state prim
2018-03-02 11:46:48 140111382202560 [Note] WSREP: view(view_id(PRIM,025fa3ee,31) memb {
        025fa3ee,0
        0a0faad9,0
        8fa6765f,0
} joined {
} left {
} partitioned {
})
2018-03-02 11:46:48 140111382202560 [Note] WSREP: save pc into disk
2018-03-02 11:46:49 140111382202560 [Note] WSREP: gcomm: connected
2018-03-02 11:46:49 140111382202560 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2018-03-02 11:46:49 140111382202560 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2018-03-02 11:46:49 140111382202560 [Note] WSREP: Opened channel 'rdc-cluster-1'
2018-03-02 11:46:49 140089639511808 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2018-03-02 11:46:49 140111382202560 [Note] WSREP: Waiting for SST to complete.
2018-03-02 11:46:49 140089639511808 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 02ac5985-1e07-11e8-9509-d60e7fa9376f
2018-03-02 11:46:49 140089639511808 [Note] WSREP: STATE EXCHANGE: sent state msg: 02ac5985-1e07-11e8-9509-d60e7fa9376f
2018-03-02 11:46:49 140089639511808 [Note] WSREP: STATE EXCHANGE: got state msg: 02ac5985-1e07-11e8-9509-d60e7fa9376f from 0 (hb01-galera-cl01-02)
2018-03-02 11:46:49 140089639511808 [Note] WSREP: STATE EXCHANGE: got state msg: 02ac5985-1e07-11e8-9509-d60e7fa9376f from 1 (hb01-galera-cl01-03)
2018-03-02 11:46:49 140089639511808 [Note] WSREP: STATE EXCHANGE: got state msg: 02ac5985-1e07-11e8-9509-d60e7fa9376f from 2 (hb01-galera-cl01-01)
2018-03-02 11:46:49 140089639511808 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 30,
        members    = 2/3 (joined/total),
        act_id     = 469231120,
        last_appl. = -1,
        protocols  = 0/8/3 (gcs/repl/appl),
        group UUID = 6b51589d-0b5f-11e8-94e6-82aa9d3be300
2018-03-02 11:46:49 140089639511808 [Note] WSREP: Flow-control interval: [28, 28]
2018-03-02 11:46:49 140089639511808 [Note] WSREP: Trying to continue unpaused monitor
2018-03-02 11:46:49 140089639511808 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 469231120)
2018-03-02 11:46:49 140111229683456 [Note] WSREP: State transfer required:
        Group state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:469231120
        Local state: 00000000-0000-0000-0000-000000000000:-1
2018-03-02 11:46:49 140111229683456 [Note] WSREP: New cluster view: global state: 6b51589d-0b5f-11e8-94e6-82aa9d3be300:469231120, view# 31: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-03-02 11:46:49 140111229683456 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-03-02 11:46:49 140089631119104 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '31376' --binlog '/RDC/mysql/binlog/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20180302 11:46:49.566)
2018-03-02 11:46:52 140089647904512 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') connection to peer 025fa3ee with addr tcp://10.241.5.221:4567 timed out, no messages seen in PT3S
2018-03-02 11:46:52 140089647904512 [Note] WSREP: (025fa3ee, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-03-02 11:48:18 140089631119104 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '31376' --binlog '/RDC/mysql/binlog/mariadb-bin'
        Read: '(null)'
2018-03-02 11:48:18 140089631119104 [ERROR] WSREP: Process was aborted.
2018-03-02 11:48:18 140089631119104 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.241.5.221' --datadir '/RDC/mysql/data/'   --parent '31376' --binlog '/RDC/mysql/binlog/mariadb-bin' : 2 (No such file or directory)
2018-03-02 11:48:18 140111229683456 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-03-02 11:48:18 140111229683456 [ERROR] Aborting
 
Error in my_thread_global_end(): 1 threads didn't exit

on init (12.2.12) I load node 3 with slave, then node 1 & 2 with SST and it's worked perfectly



 Comments   
Comment by Elena Stepanova [ 2018-03-03 ]

Is it not the same problem as MDEV-15254?

Comment by Aurélien LEQUOY [ 2018-03-04 ]

i seem i saw request on donor, i check it too on monday

Comment by Aurélien LEQUOY [ 2018-03-04 ]

the problem here, I should take IST and not SST. there are 2 differents problems there.

and SST totally failed. even on new cluster maybe fixed with 15254.

Comment by Marko Mäkelä [ 2018-03-09 ]

XtraBackup 2.4 is not safe to use with MariaDB 10.2, and Mariabackup should be used instead.

The reason is that unlike MySQL 5.7, which XtraBackup 2.4 is based on, MariaDB 10.2 made an effort (MDEV-12289) to not break compatibility with older InnoDB versions. The fix of this compatibility issue makes MariaDB 10.2 slightly less compatible with MySQL 5.7 and XtraBackup 2.4.

Comment by Aurélien LEQUOY [ 2018-03-20 ]

https://jira.mariadb.org/browse/MDEV-15383?focusedCommentId=108624&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-108624

Comment by Aurélien LEQUOY [ 2018-03-22 ]

@Marko Mäkelä

Xtrabackup or Mariabackup : the problem is the same : https://jira.mariadb.org/browse/MDEV-15383?focusedCommentId=108624&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-108624

Comment by Jan Lindström (Inactive) [ 2018-07-26 ]

I could not repeat the problem, in my system 10.2.12 -> 10.2.13 upgrade worked fine. So, either my configuration was different or system was different. Can you provide configuration files, joiner and donor full logs and system information?

Generated at Thu Feb 08 08:21:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.