[MDEV-9710] MariaDB Galera Cluster on EC2 Created: 2016-03-10  Updated: 2017-11-05  Resolved: 2017-11-05

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Owas Assignee: Sachin Setiya (Inactive)
Resolution: Incomplete Votes: 0
Labels: galera
Environment:

AWS EC2 RHEL



 Description   

I am getting the error below when trying to have the second node join the cluster. The main master was started successfully. Any idea what's happening here? I am trying to deploy on EC2 to do a quick POC measuring performance against another RDS offering, and would appreciate any help.

160307 16:34:34 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '{IP Address Removed}' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix 'IP Address Removed' --parent '5910' ''
Read: '(null)'
160307 16:34:34 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '5910' '' : 2 (No such file or directory)
160307 16:34:34 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
160307 16:34:34 [ERROR] Aborting

Thanks,
Onam
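In the log above, `2 (No such file or directory)` is the error code reported for the wsrep_sst_rsync joiner process. On Linux, 2 is errno ENOENT, which suggests the joiner could not find a program it needed. Below is a minimal sketch for checking that on the joiner node. It assumes the rsync SST needs `wsrep_sst_rsync`, `rsync`, `lsof` and `which` on PATH; that tool list is an assumption for illustration, not something stated in this report.

```python
import errno
import os
import shutil
from typing import Callable, Iterable, List, Optional

# Assumption: binaries the rsync SST joiner needs on PATH. Adjust for your
# MariaDB/Galera version; this list is illustrative, not authoritative.
SST_RSYNC_TOOLS = ("wsrep_sst_rsync", "rsync", "lsof", "which")


def missing_tools(
    tools: Iterable[str] = SST_RSYNC_TOOLS,
    lookup: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that `lookup` (default: shutil.which) cannot resolve."""
    return [tool for tool in tools if lookup(tool) is None]


if __name__ == "__main__":
    # The joiner log reports error code 2; on Linux that is ENOENT.
    print(f"error code {errno.ENOENT} = {os.strerror(errno.ENOENT)}")
    missing = missing_tools()
    if missing:
        print("missing from PATH:", ", ".join(missing))
    else:
        print("all assumed SST tools found on PATH")
```

Run it as the mysql user with the same environment mysqld_safe uses. A tool that is visible from a root shell is not necessarily on the server process's PATH.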



 Comments   
Comment by Sergei Golubchik [ 2016-03-11 ]

What version of MariaDB is it?

Comment by Nirbhay Choubey (Inactive) [ 2016-03-11 ]

owaisnmalik Can you also share the full error log from the donor node, as well as the configuration?

Comment by Owas [ 2016-03-15 ]

The error log is below:
[root@ip-172-21-83-203 mysql]# tail -f ip-172-21-83-203. -n 100
160311 14:59:07 [Note] WSREP: wsrep_load(): Galera 25.3.14(r3560) by Codership Oy <info@codership.com> loaded successfully.
160311 14:59:07 [Note] WSREP: CRC-32C: using hardware acceleration.
160311 14:59:07 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
160311 14:59:07 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.21.83.203; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false
160311 14:59:07 [Note] WSREP: Service thread queue flushed.
160311 14:59:07 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
160311 14:59:07 [Note] WSREP: wsrep_sst_grab()
160311 14:59:07 [Note] WSREP: Start replication
160311 14:59:07 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
160311 14:59:07 [Note] WSREP: protonet asio version 0
160311 14:59:07 [Note] WSREP: Using CRC-32C for message checksums.
160311 14:59:07 [Note] WSREP: backend: asio
160311 14:59:07 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
160311 14:59:07 [Note] WSREP: restore pc from disk failed
160311 14:59:07 [Note] WSREP: GMCast version 0
160311 14:59:07 [Note] WSREP: (b6d57439, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
160311 14:59:07 [Note] WSREP: (b6d57439, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
160311 14:59:07 [Note] WSREP: EVS version 0
160311 14:59:07 [Note] WSREP: gcomm: connecting to group 'galera_cluster', peer '172.21.69.153:,172.21.83.203:,172.21.85.132:'
160311 14:59:07 [Warning] WSREP: (b6d57439, 'tcp://0.0.0.0:4567') address 'tcp://172.21.83.203:4567' points to own listening address, blacklisting
160311 14:59:07 [Note] WSREP: (b6d57439, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
160311 14:59:08 [Note] WSREP: declaring 100fea4e at tcp://172.21.69.153:4567 stable
160311 14:59:08 [Note] WSREP: Node 100fea4e state prim
160311 14:59:08 [Note] WSREP: view(view_id(PRIM,100fea4e,24) memb {
	100fea4e,0
	b6d57439,0
} joined {
} left {
} partitioned {
})
160311 14:59:08 [Note] WSREP: save pc into disk
160311 14:59:08 [Note] WSREP: discarding pending addr without UUID: tcp://172.21.85.132:4567
160311 14:59:08 [Note] WSREP: discarding pending addr proto entry 0x7f88504ac680
160311 14:59:08 [Note] WSREP: gcomm: connected
160311 14:59:08 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
160311 14:59:08 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
160311 14:59:08 [Note] WSREP: Opened channel 'galera_cluster'
160311 14:59:08 [Note] WSREP: Waiting for SST to complete.
160311 14:59:08 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
160311 14:59:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
160311 14:59:08 [Note] WSREP: STATE EXCHANGE: sent state msg: b7226365-e7c3-11e5-9b9c-afd796eece92
160311 14:59:08 [Note] WSREP: STATE EXCHANGE: got state msg: b7226365-e7c3-11e5-9b9c-afd796eece92 from 0 (GaleraMaster1)
160311 14:59:08 [Note] WSREP: STATE EXCHANGE: got state msg: b7226365-e7c3-11e5-9b9c-afd796eece92 from 1 (GaleraMaster2)
160311 14:59:08 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 23,
members = 1/2 (joined/total),
act_id = 0,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = 1e10a069-e189-11e5-b08f-be76e36244d3
160311 14:59:08 [Note] WSREP: Flow-control interval: [23, 23]
160311 14:59:08 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
160311 14:59:08 [Note] WSREP: State transfer required:
Group state: 1e10a069-e189-11e5-b08f-be76e36244d3:0
Local state: 00000000-0000-0000-0000-000000000000:-1
160311 14:59:08 [Note] WSREP: New cluster view: global state: 1e10a069-e189-11e5-b08f-be76e36244d3:0, view# 24: Primary, number of nodes: 2, my index: 1, protocol version 3
160311 14:59:08 [Warning] WSREP: Gap in state sequence. Need state transfer.
160311 14:59:08 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.21.83.203' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '25897' '' '
160311 14:59:08 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '172.21.83.203' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '25897' ''
Read: '(null)'
160311 14:59:08 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.21.83.203' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '25897' '' : 2 (No such file or directory)
160311 14:59:08 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
160311 14:59:08 [ERROR] Aborting

160311 14:59:10 [Note] WSREP: Closing send monitor...
160311 14:59:10 [Note] WSREP: Closed send monitor.
160311 14:59:10 [Note] WSREP: gcomm: terminating thread
160311 14:59:10 [Note] WSREP: gcomm: joining thread
160311 14:59:10 [Note] WSREP: gcomm: closing backend
160311 14:59:11 [Note] WSREP: (b6d57439, 'tcp://0.0.0.0:4567') turning message relay requesting off
160311 14:59:11 [Note] WSREP: view(view_id(NON_PRIM,100fea4e,24) memb {
	b6d57439,0
} joined {
} left {
} partitioned {
	100fea4e,0
})
160311 14:59:11 [Note] WSREP: view((empty))
160311 14:59:11 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
160311 14:59:11 [Note] WSREP: gcomm: closed
160311 14:59:11 [Note] WSREP: Flow-control interval: [16, 16]
160311 14:59:11 [Note] WSREP: Received NON-PRIMARY.
160311 14:59:11 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 0)
160311 14:59:11 [Note] WSREP: Received self-leave message.
160311 14:59:11 [Note] WSREP: Flow-control interval: [0, 0]
160311 14:59:11 [Note] WSREP: Received SELF-LEAVE. Closing connection.
160311 14:59:11 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
160311 14:59:11 [Note] WSREP: RECV thread exiting 0: Success
160311 14:59:11 [Note] WSREP: recv_thread() joined.
160311 14:59:11 [Note] WSREP: Closing replication queue.
160311 14:59:11 [Note] WSREP: Closing slave action queue.
160311 14:59:11 [Note] WSREP: Service disconnected.
160311 14:59:11 [Note] WSREP: rollbacker thread exiting
160311 14:59:12 [Note] WSREP: Some threads may fail to exit.
160311 14:59:12 [Note] /usr/sbin/mysqld: Shutdown complete

Error in my_thread_global_end(): 1 threads didn't exit
160311 14:59:17 mysqld_safe mysqld from pid file /var/lib/mysql/ip-172-21-83-203..pid ended
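The `State transfer required` lines explain why an SST was attempted at all. The joiner's local state is the all-zero UUID with seqno -1 (the same `Found saved state` value logged at startup), so it has no history to resume from. The following is a simplified sketch of that comparison. It assumes only that a UUID mismatch forces a full snapshot transfer; Galera's real IST/SST choice also depends on what the donor still holds in gcache, which is not modelled here.

```python
import re
from typing import NamedTuple

# Matches Galera state IDs such as "1e10a069-e189-11e5-b08f-be76e36244d3:0".
GTID_RE = re.compile(
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}):(-?\d+)"
)


class Gtid(NamedTuple):
    uuid: str
    seqno: int


def parse_gtid(text: str) -> Gtid:
    """Extract the first UUID:seqno pair from a log line."""
    match = GTID_RE.search(text)
    if match is None:
        raise ValueError(f"no state ID in {text!r}")
    return Gtid(match.group(1), int(match.group(2)))


def transfer_kind(group: Gtid, local: Gtid) -> str:
    """Simplified rule (assumption): different UUID -> full SST;
    same UUID but behind -> incremental (IST) candidate; otherwise none."""
    if local.uuid != group.uuid:
        return "SST"
    if local.seqno < group.seqno:
        return "IST candidate"
    return "none"


if __name__ == "__main__":
    group = parse_gtid("Group state: 1e10a069-e189-11e5-b08f-be76e36244d3:0")
    local = parse_gtid("Local state: 00000000-0000-0000-0000-000000000000:-1")
    print(transfer_kind(group, local))  # SST
```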

The configuration can be seen below.

Master 1:
sudo cat >> /etc/my.cnf.d/server.cnf << EOF

binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://172.21.69.153,172.21.83.203,172.21.85.132"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='172.21.69.153'
wsrep_node_name='GaleraMaster1'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:{Removed Password From Here}
EOF

Master 2:
sudo cat >> /etc/my.cnf.d/server.cnf << EOF

binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://172.21.69.153,172.21.83.203,172.21.85.132"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='172.21.83.203'
wsrep_node_name='GaleraMaster2'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:{Removed Password From Here}

EOF

Master 3:
sudo cat >> /etc/my.cnf.d/server.cnf << EOF

binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://172.21.69.153,172.21.83.203,172.21.85.132"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='172.21.85.132'
wsrep_node_name='GaleraMaster3'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:{Removed Password From Here}

EOF
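The three fragments above are meant to differ only in `wsrep_node_address` and `wsrep_node_name`. Below is a small sketch that parses `key=value` fragments like these (paste only the lines between the heredoc markers) and flags cluster-wide settings that disagree, duplicate node addresses, or a node address missing from `wsrep_cluster_address`. The choice of which keys must match (`SHARED_KEYS`) and the helper names are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Optional

# Assumption: settings that every node of one cluster should share.
SHARED_KEYS = (
    "wsrep_cluster_name",
    "wsrep_cluster_address",
    "wsrep_sst_method",
    "wsrep_provider",
    "binlog_format",
)


def parse_fragment(text: str) -> Dict[str, str]:
    """Parse my.cnf-style key=value lines; bare flags map to ''."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes, as in wsrep_cluster_name='x'.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def gcomm_peers(address: str) -> List[str]:
    """Split 'gcomm://a,b,c' into ['a', 'b', 'c'] (assumes no ports, as above)."""
    prefix = "gcomm://"
    if address.startswith(prefix):
        address = address[len(prefix):]
    return [peer.strip() for peer in address.split(",") if peer.strip()]


def check_nodes(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return human-readable problems found across per-node settings."""
    problems: List[str] = []
    for key in SHARED_KEYS:
        values = {name: cfg.get(key) for name, cfg in nodes.items()}
        if len(set(values.values())) > 1:
            problems.append(f"{key} differs: {values}")
    seen: Dict[str, str] = {}
    for name, cfg in nodes.items():
        addr: Optional[str] = cfg.get("wsrep_node_address")
        if addr is None:
            problems.append(f"{name}: wsrep_node_address not set")
            continue
        if addr in seen:
            problems.append(f"{name} and {seen[addr]} share address {addr}")
        else:
            seen[addr] = name
        if addr not in gcomm_peers(cfg.get("wsrep_cluster_address", "")):
            problems.append(f"{name}: {addr} not in wsrep_cluster_address")
    return problems


if __name__ == "__main__":
    template = (
        'wsrep_cluster_address="gcomm://172.21.69.153,172.21.83.203,172.21.85.132"\n'
        "wsrep_cluster_name='galera_cluster'\n"
        "wsrep_node_address='{addr}'\n"
        "wsrep_sst_method=rsync\n"
    )
    nodes = {
        name: parse_fragment(template.format(addr=addr))
        for name, addr in [
            ("GaleraMaster1", "172.21.69.153"),
            ("GaleraMaster2", "172.21.83.203"),
            ("GaleraMaster3", "172.21.85.132"),
        ]
    }
    print(check_nodes(nodes) or "no problems found")
```

For the three configurations in this report, the sketch finds no inconsistency. That points away from a config mismatch and toward the joiner's environment.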

Comment by Nirbhay Choubey (Inactive) [ 2016-03-15 ]

owaisnmalik The logs you shared are from the joiner node. Could you also share the donor node's logs?

Generated at Thu Feb 08 07:36:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.