[MDEV-21002] Galera Cluster Node During IST Goes from "Synced" to "Joining: receiving State transfer" (stuck, requires kill -9) Created: 2019-11-07  Updated: 2023-03-20  Resolved: 2020-02-05

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.8, 10.4.9, 10.4.10, 10.4.11
Fix Version/s: 10.4.13, 10.5.2

Type: Bug Priority: Critical
Reporter: Justin Y Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 6
Labels: None
Environment:

RHEL 7 on x64 VM. MariaDB from MariaDB repo via Artifactory (not RHEL repo). No docker. 5-node cluster.


Attachments: Text File mariadb.txt     File my.cnf     PNG File screenshot-1.png    
Issue Links:
Relates
relates to MDEV-21008 Node Stuck in joining State Closed

 Description   

Summary: Galera appears to have difficulty switching wsrep_local_state_comment from "Joining: receiving State transfer" back to "Synced" after a network slowdown (and the subsequent IST); mysqld then becomes unstable (unable to stop gracefully).

Solution that works most of the time: kill -9 the process, delete the entire datastore on the affected node, and re-join it to the cluster.

Background: We built a new cluster from scratch using a fresh install of 10.4.8 (and 10.4.9), importing data and grants fresh from SQL (no carry-over of any data files). During some brief network outages, a random node will switch its wsrep_local_state_comment from "Synced" to "Joining: receiving State transfer" and stay there. There are no errors in the logs from either the donor or the affected node, and it looks like the sync completed successfully (no SST rsync processes or any other evidence that a transfer is still in motion). Debug logging was unhelpful. The node cannot be stopped gracefully; only kill -9 works.

Troubleshooting: Mixing and matching MariaDB 10.4.8 (older) and Galera 26.4.3 (newer) seemed to reduce the frequency of it happening, but it still happens. SST doesn't seem to have an issue.

Problem trigger: This situation is provoked by temporary network loss and is generally reproducible by using iptables to block cluster replication for a short period of time and then allowing the node to re-sync via IST.
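As a sketch of that reproduction (my illustration, not from the report: the peer address 192.0.2.11 is hypothetical, and 4567 is the default Galera group-communication port; run as root on one node):

```shell
# Temporarily cut this node off from one peer's replication traffic.
iptables -A INPUT  -s 192.0.2.11 -p tcp --dport 4567 -j DROP
iptables -A OUTPUT -d 192.0.2.11 -p tcp --dport 4567 -j DROP

# Wait long enough for the node to drop from the cluster, but short enough
# that the missed write-sets still fit in the donor's gcache (so IST, not SST).
sleep 60

# Restore connectivity; the node should now rejoin and request an IST.
iptables -D INPUT  -s 192.0.2.11 -p tcp --dport 4567 -j DROP
iptables -D OUTPUT -d 192.0.2.11 -p tcp --dport 4567 -j DROP
```

Afterwards, watch wsrep_local_state_comment on the blocked node to see whether it returns to "Synced".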



 Comments   
Comment by Oren Bissick (Inactive) [ 2019-11-08 ]

Also having this issue.

Comment by Justin Y [ 2019-11-08 ]

@Oren Bissick - can you tell me which versions of MariaDB and Galera you are using? We're considering downgrading to the 10.3.x / 26.3.x series.

Comment by Oren Bissick (Inactive) [ 2019-11-08 ]

I'm using 10.4.8 and Galera 26.4.2.

Comment by Ibai Valencia [ 2019-11-15 ]

We are experiencing the same thing with MariaDB 10.4.8 and Galera 26.4.2. It happens every now and then, at least once a day. The only way to recover is to kill the node and SST it.

I am running a huge platform on MariaDB+Galera 24/7 and this is very frustrating, as we cannot automate the fix.

Maybe this is connected to this post from @Oren Bissick:
https://jira.mariadb.org/browse/MDEV-21008
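While the recovery itself cannot yet be automated, detection at least can be. Below is a minimal watchdog sketch (my illustration, not part of this report); in production the status string would come from something like `mysql -NBe "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"`, but here the classifier is demonstrated on canned values so the sketch is self-contained:

```shell
#!/bin/sh
# Classify a wsrep_local_state_comment value as stuck or healthy.
# The stuck state has been reported with varying capitalization
# ("Joining: receiving State Transfer" / "joining: receiving state transfer"),
# so normalize to lower case before matching.
check_state() {
    state=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    case "$state" in
        "joining: receiving state transfer"*) echo "STUCK" ;;
        *) echo "OK" ;;
    esac
}

check_state "Joining: receiving State Transfer"   # STUCK
check_state "Synced"                              # OK
```

A cron job could feed this the live status value and alert (or, with care, trigger the kill-and-SST recovery) when a node stays STUCK for too long.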

Comment by Justin Y [ 2019-11-15 ]

Linked to MDEV-21008 per Ibai Valencia.

Comment by Daniel Bray [ 2019-12-09 ]

I can throw a "me too" into this bucket. CentOS 7 with the latest patches, MariaDB 10.4.10, and Galera 26.4.3 (r4535). Everything had been running fine for days, then we performed a normal patch and reboot. Upon reboot, everything starts up correctly, and we see a normal sync in the log files:
2019-12-09 12:45:25 1 [Note] WSREP: Requesting state transfer: success, donor: 2
2019-12-09 12:45:26 0 [Note] WSREP: 2.0 (dbserver-02): State transfer to 1.0 (dbserver-03) complete.
2019-12-09 12:45:27 0 [Note] WSREP: Server status change initializing -> initialized
2019-12-09 12:45:27 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-12-09 12:45:27 3 [Note] WSREP: Server status change initialized -> joined

We can check the other servers and they all show synced with a wsrep_cluster_size of 3. But this third server is stuck with: wsrep_local_state_comment | Joining: receiving State Transfer

Comment by Miroslav Slugen [ 2019-12-19 ]

We had the same problem with versions 10.4.10 and 10.4.11; it helped us to define wsrep_node_incoming_address and wsrep_sst_receive_address in cluster.cnf.
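For reference, a sketch of that workaround (the address below is a placeholder; each node would use its own routable IP, and 4444 is the default rsync SST port):

```ini
# cluster.cnf (per node; hypothetical address)
[galera]
wsrep_node_incoming_address=192.0.2.10:3306
wsrep_sst_receive_address=192.0.2.10:4444
```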

Comment by Miroslav Slugen [ 2019-12-31 ]

After a lot of testing, the real reason it was not working was innodb_flush_log_at_trx_commit = 0. When we set this to something else (currently 2), IST works without any problems. We currently have 6 Galera cluster servers; maybe it will also work with 2.

Comment by Stephan Vos [ 2020-01-02 ]

We have the same issue on a 3-node cluster.
For example, IST starts being applied from the GCache on node 1 after a short outage but never finishes, and node 2 gets stuck in the joining state.
Luckily our database is still quite small, so we can fix this with an SST, but it is going to grow quite soon and SST will no longer be practical.
We also run with innodb_flush_log_at_trx_commit = 0 on all 3 nodes; changing node 2 to innodb_flush_log_at_trx_commit = 2 before attempting IST again made no difference.
Should all nodes perhaps be set to innodb_flush_log_at_trx_commit = 2, or is this not a viable workaround?
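For anyone trying the same workaround, this is the relevant setting (a sketch, not a verified fix; 2 flushes the redo log roughly once per second instead of at every commit, trading durability for speed). The variable is dynamic, so it can also be changed at runtime with SET GLOBAL for testing.

```ini
# my.cnf, on every node if testing the workaround uniformly
[mysqld]
innodb_flush_log_at_trx_commit = 2
```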

Comment by Justin Y [ 2020-01-10 ]

Original poster here: we just tried switching to "innodb_flush_log_at_trx_commit = 2" and this did not address the issue. Still awaiting feedback from @JanLindstrom. Still running MariaDB 10.4.8 and Galera 4-26.4.3.

Comment by Jan Lindström (Inactive) [ 2020-01-13 ]

The default setting of 1 is required for full ACID compliance: logs are written and flushed to disk at each transaction commit. This is the recommended setting for Galera as well.

Comment by Jan Lindström (Inactive) [ 2020-01-13 ]

Can someone affected, with a correct configuration, provide a full unedited error log and the configuration?

Comment by Brandon Patterson [ 2020-01-13 ]

Attached are the logs from when I attempted an IST with innodb_flush_log_at_trx_commit set to 2: mariadb.txt
The configuration: my.cnf

I am using iptables to temporarily block one instance from communicating with the rest of the cluster. After restoring network connectivity, I expect the database to reconnect to the cluster and receive an IST. The logs indicate an IST is sent, but then the database gets stuck in a "Joining: receiving State Transfer" status. The database will not stop nicely, so you have to kill -9 it. I'm only able to rejoin the instance if I entirely clear the data directory and allow it to receive an SST.
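The recovery described here amounts to forcing a full SST; roughly (destructive, and the datadir path below is an assumption; adjust to your installation):

```shell
# Kill the stuck mysqld (it will not stop gracefully).
kill -9 "$(pidof mysqld)"

# Wipe the data directory so the node cannot attempt IST on restart.
# /var/lib/mysql is the default datadir; adjust if yours differs.
rm -rf /var/lib/mysql/*

# Restart; the node joins as a new member and receives a full SST.
systemctl start mariadb
```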

Comment by Steve White [ 2020-01-24 ]

I have a cluster with heavy usage where I can consistently reproduce this issue. From what I can see, when a pure IST completes, the applier threads get stuck and just sleep forever; however, this does not occur for the IST that runs at the end of an SST.
If you require any further details around this, or if you want to experiment with different configurations, let me know.

Comment by W1zzardTPU [ 2020-01-25 ]

Just upgraded to 10.4 yesterday and am now seeing this bug.

Affects 10.4.11 too. Exact same problem as the others describe, 100% repeatable. The only solution is a full SST.

Comment by Cagdas Özgür [ 2020-01-27 ]

I have the same issue with MariaDB 10.4.11 and Galera 26.4.3.
The node gets stuck in the joining state during IST.
The incremental state transfer stalls and local_recv_queue increases continuously.
Even a restart does not resolve the issue.
A later restart eventually works because Galera falls back to an SST, which succeeds.

Here is some Information on the Stuck Node:

wsrep_patch_version: "wsrep_26.22",
version: "10.4.11-MariaDB-1:10.4.11+maria~bionic-log",
wsrep_provider_version:	"26.4.3(r4535)",
cert_deps_distance: 0,
cluster_conf_id: 18446744073709552000,
cluster_size: 1,
cluster_status: "Primary",
connected: "ON",
flow_control_sent: 151491,
hostname: "mariadb-1.galera.openshift.svc.cluster.local",
local_cached_downto: 2,
local_recv_queue: 2241401,
local_recv_queue_avg: 1120700,
local_send_queue_avg: 0,
ready: "OFF",
seq_no: 2004,
status: "joining: receiving state transfer",
uuid: "02580b98-3ed9-11ea-b65f-cb25c42b48c5"

Comment by Matt Jones [ 2020-01-31 ]

I'm also seeing this on a new Galera cluster I'm setting up:

OS: Ubuntu 18.04
Kernel: 4.15.0-1054-aws
MariaDB: 10.4.12-MariaDB-1:10.4.12+maria~bionic
Galera: Galera 26.4.3(r4535)

 
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
query_cache_size=0
bind-address=0.0.0.0
 
[galera]
wsrep_on=ON
 
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=5G"
 
wsrep_cluster_name="mycluster"
wsrep_cluster_address="gcomm://[IP1],[IP2],[IP3],[IP4]"
 
wsrep_sst_method=rsync
 
wsrep_node_name="ES1_AWS"
wsrep_node_address="[IP1]"

All firewall ports in the range 1-65000 are open between all servers for the purposes of debugging this.

 
Jan 31 16:09:40 es1 systemd[1]: Starting MariaDB 10.4.12 database server...
Jan 31 16:09:43 es1 mysqld[16641]: WSREP: Recovered position 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] /usr/sbin/mysqld (mysqld 10.4.12-MariaDB-1:10.4.12+maria~bionic) starting as process 16868 ...
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:43 es1 mysqld[16868]: wsrep loader: [INFO] wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Jan 31 16:09:43 es1 mysqld[16868]: wsrep loader: [INFO] wsrep_load(): Galera 26.4.3(r4535) by Codership Oy <info@codership.com> loaded successfully.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: CRC-32C: using hardware acceleration.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galera/src/saved_state.cpp:SavedState():116: Found saved state: 5da44a66-c997-11e9-84af-8a80ff105689:-1, safe_to_bootstrap: 0
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcache/src/gcache_rb_store.cpp:open_preamble():632: GCache DEBUG: opened preamble:
Jan 31 16:09:43 es1 mysqld[16868]: Version: 2
Jan 31 16:09:43 es1 mysqld[16868]: UUID: 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:43 es1 mysqld[16868]: Seqno: -1 - -1
Jan 31 16:09:43 es1 mysqld[16868]: Offset: -1
Jan 31 16:09:43 es1 mysqld[16868]: Synced: 0
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcache/src/gcache_rb_store.cpp:open_preamble():643: Recovering GCache ring buffer: version: 2, UUID: 5da44a66-c997-11e9-84af-8a80ff105689, offset: -1
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galerautils/src/gu_progress.hpp:report():34: GCache::RingBuffer initial scan...  0.0% (         0/5368709144 bytes) complete.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galerautils/src/gu_progress.hpp:report():34: GCache::RingBuffer initial scan...100.0% (5368709144/5368709144 bytes) complete.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcache/src/gcache_rb_store.cpp:recover():945: Recovering GCache ring buffer: found gapless sequence 826172848-826172848
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcache/src/gcache_rb_store.cpp:recover():950: Recovering GCache ring buffer: discarding seqnos 826149024-826172841
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galerautils/src/gu_progress.hpp:report():34: GCache::RingBuffer unused buffers scan...  0.0% (         0/3363837520 bytes) complete.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galerautils/src/gu_progress.hpp:report():34: GCache::RingBuffer unused buffers scan...100.0% (3363837520/3363837520 bytes) complete.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcache/src/gcache_rb_store.cpp:recover():1053: GCache DEBUG: RingBuffer::recover(): found 24404/24410 locked buffers
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcache/src/gcache_rb_store.cpp:recover():1055: GCache DEBUG: RingBuffer::recover(): used space: 3363837520/5368709120
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galera/src/galera_gcs.hpp:Gcs():92: Passing config to GCS: base_dir = /media/storage/mysql/; base_host = 172.x.x.x; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = yes; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /media/storage/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 5G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galera/src/galera_service_thd.cpp:thd_func():39: Service thread queue flushed.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galera/src/certification.cpp:assign_initial_position():1052: ####### Assign initial position for certification: 5da44a66-c997-11e9-84af-8a80ff105689:826172775, protocol version: -1
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: Start replication
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: Connecting with bootstrap option: 0
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: galera/src/replicator_smm.cpp:connect():323: Setting GCS initial position to 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/protonet.cpp:create():72: protonet asio version 0
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/datagram.cpp:checksum_type():26: Using CRC-32C for message checksums.
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcs/src/gcs_gcomm.cpp:GCommConn():197: backend: asio
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcs/src/gcs_gcomm.cpp:connect():245: gcomm thread scheduling priority set to other:0
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Warning] WSREP: gcomm/src/view.cpp:read_file():323: access file(/media/storage/mysql//gvwstate.dat) failed(No such file or directory)
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/pc.cpp:PC():260: restore pc from disk failed
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:GMCast():123: GMCast version 0
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:GMCast():216: (17dcac4f, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:GMCast():217: (17dcac4f, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/evs_proto.cpp:Proto():199: EVS version 1
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcs/src/gcs_gcomm.cpp:connect():275: gcomm: connecting to group 'mycluster', peer '172.x.x.x:,172.x.x.x:,172.x.x.x:,172.x.x.x:'
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast_proto.cpp:validate_handshake_uuid():135: (17dcac4f, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://172.x.x.x:4567
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:handle_established():665: (17dcac4f, 'tcp://0.0.0.0:4567') connection established to fefc48c4 tcp://172.x.x.x:4567
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:check_liveness():1230: (17dcac4f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:handle_established():665: (17dcac4f, 'tcp://0.0.0.0:4567') connection established to ce18e18b tcp://172.x.x.x:4567
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:handle_established():665: (17dcac4f, 'tcp://0.0.0.0:4567') connection established to cada4b4c tcp://172.x.x.x:4567
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/evs_proto.cpp:shift_to():2849: EVS version upgrade 0 -> 1
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:handle_stable_view():1682: declaring cada4b4c at tcp://172.x.x.x:4567 stable
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:handle_stable_view():1682: declaring ce18e18b at tcp://172.x.x.x:4567 stable
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/gmcast.cpp:handle_stable_view():1682: declaring fefc48c4 at tcp://172.x.x.x:4567 stable
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/pc_proto.cpp:handle_reg():601: PC protocol upgrade 0 -> 1
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/pc_proto.cpp:is_prim():838: Node cada4b4c state prim
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/pc_proto.cpp:deliver_view():251: view(view_id(PRIM,17dcac4f,4286) memb {
Jan 31 16:09:43 es1 mysqld[16868]: #01117dcac4f,0
Jan 31 16:09:43 es1 mysqld[16868]: #011cada4b4c,0
Jan 31 16:09:43 es1 mysqld[16868]: #011ce18e18b,0
Jan 31 16:09:43 es1 mysqld[16868]: #011fefc48c4,0
Jan 31 16:09:43 es1 mysqld[16868]: } joined {
Jan 31 16:09:43 es1 mysqld[16868]: } left {
Jan 31 16:09:43 es1 mysqld[16868]: } partitioned {
Jan 31 16:09:43 es1 mysqld[16868]: })
Jan 31 16:09:43 es1 mysqld[16868]: 2020-01-31 16:09:43 0 [Note] WSREP: gcomm/src/pc.cpp:handle_up():30: save pc into disk
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_gcomm.cpp:connect():285: gcomm: connected
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_core.cpp:gcs_core_set_pkt_size():1374: Changing maximum packet size to 64500, resulting msg size: 32636
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs.cpp:gcs_shift_state():663: Shifting CLOSED -> OPEN (TO: 0)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs.cpp:gcs_open():1595: Opened channel 'mycluster'
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_comp_msg():534: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_core.cpp:core_handle_comp_msg():834: STATE_EXCHANGE: sent state UUID: 18292ef8-4444-11ea-9477-ef1314828532
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 1 [Note] WSREP: Starting rollbacker thread 1
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: Starting applier thread 2
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_core.cpp:core_handle_uuid_msg():929: STATE EXCHANGE: sent state msg: 18292ef8-4444-11ea-9477-ef1314828532
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():722: STATE EXCHANGE: got state msg: 18292ef8-4444-11ea-9477-ef1314828532 from 0 (ES1_AWS)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:group_print_state_debug():702:
Jan 31 16:09:44 es1 mysqld[16868]: #011Version      : 6
Jan 31 16:09:44 es1 mysqld[16868]: #011Flags        : 0x1
Jan 31 16:09:44 es1 mysqld[16868]: #011Protocols    : 2 / 10 / 4
Jan 31 16:09:44 es1 mysqld[16868]: #011State        : NON-PRIMARY
Jan 31 16:09:44 es1 mysqld[16868]: #011Desync count : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim state   : NON-PRIMARY
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim UUID    : 00000000-0000-0000-0000-000000000000
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim  seqno  : -1
Jan 31 16:09:44 es1 mysqld[16868]: #011First seqno  : -1
Jan 31 16:09:44 es1 mysqld[16868]: #011Last  seqno  : 826172775
Jan 31 16:09:44 es1 mysqld[16868]: #011Commit cut   : 826172775
Jan 31 16:09:44 es1 mysqld[16868]: #011Last vote    : -1.0
Jan 31 16:09:44 es1 mysqld[16868]: #011Vote policy  : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim JOINED  : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011State UUID   : 18292ef8-4444-11ea-9477-ef1314828532
Jan 31 16:09:44 es1 mysqld[16868]: #011Group UUID   : 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:44 es1 mysqld[16868]: #011Name         : 'ES1_AWS'
Jan 31 16:09:44 es1 mysqld[16868]: #011Incoming addr: 'AUTO'
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():722: STATE EXCHANGE: got state msg: 18292ef8-4444-11ea-9477-ef1314828532 from 1 (ES4_AWS)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:group_print_state_debug():702:
Jan 31 16:09:44 es1 mysqld[16868]: #011Version      : 6
Jan 31 16:09:44 es1 mysqld[16868]: #011Flags        : 0x2
Jan 31 16:09:44 es1 mysqld[16868]: #011Protocols    : 2 / 10 / 4
Jan 31 16:09:44 es1 mysqld[16868]: #011State        : SYNCED
Jan 31 16:09:44 es1 mysqld[16868]: #011Desync count : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim state   : SYNCED
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim UUID    : 57fd2bf2-4443-11ea-99e8-338c93f60d4a
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim  seqno  : 4162
Jan 31 16:09:44 es1 mysqld[16868]: #011First seqno  : 826330871
Jan 31 16:09:44 es1 mysqld[16868]: #011Last  seqno  : 826330973
Jan 31 16:09:44 es1 mysqld[16868]: #011Commit cut   : 826330870
Jan 31 16:09:44 es1 mysqld[16868]: #011Last vote    : -1.0
Jan 31 16:09:44 es1 mysqld[16868]: #011Vote policy  : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim JOINED  : 3
Jan 31 16:09:44 es1 mysqld[16868]: #011State UUID   : 18292ef8-4444-11ea-9477-ef1314828532
Jan 31 16:09:44 es1 mysqld[16868]: #011Group UUID   : 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:44 es1 mysqld[16868]: #011Name         : 'ES4_AWS'
Jan 31 16:09:44 es1 mysqld[16868]: #011Incoming addr: 'AUTO'
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():722: STATE EXCHANGE: got state msg: 18292ef8-4444-11ea-9477-ef1314828532 from 2 (ES3_AWS)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:group_print_state_debug():702:
Jan 31 16:09:44 es1 mysqld[16868]: #011Version      : 6
Jan 31 16:09:44 es1 mysqld[16868]: #011Flags        : 0x2
Jan 31 16:09:44 es1 mysqld[16868]: #011Protocols    : 2 / 10 / 4
Jan 31 16:09:44 es1 mysqld[16868]: #011State        : SYNCED
Jan 31 16:09:44 es1 mysqld[16868]: #011Desync count : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim state   : SYNCED
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim UUID    : 57fd2bf2-4443-11ea-99e8-338c93f60d4a
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim  seqno  : 4162
Jan 31 16:09:44 es1 mysqld[16868]: #011First seqno  : 826033913
Jan 31 16:09:44 es1 mysqld[16868]: #011Last  seqno  : 826330973
Jan 31 16:09:44 es1 mysqld[16868]: #011Commit cut   : 826330870
Jan 31 16:09:44 es1 mysqld[16868]: #011Last vote    : -1.0
Jan 31 16:09:44 es1 mysqld[16868]: #011Vote policy  : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim JOINED  : 3
Jan 31 16:09:44 es1 mysqld[16868]: #011State UUID   : 18292ef8-4444-11ea-9477-ef1314828532
Jan 31 16:09:44 es1 mysqld[16868]: #011Group UUID   : 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:44 es1 mysqld[16868]: #011Name         : 'ES3_AWS'
Jan 31 16:09:44 es1 mysqld[16868]: #011Incoming addr: 'AUTO'
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_state_msg():722: STATE EXCHANGE: got state msg: 18292ef8-4444-11ea-9477-ef1314828532 from 3 (ES2_AWS)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:group_print_state_debug():702:
Jan 31 16:09:44 es1 mysqld[16868]: #011Version      : 6
Jan 31 16:09:44 es1 mysqld[16868]: #011Flags        : 0x2
Jan 31 16:09:44 es1 mysqld[16868]: #011Protocols    : 2 / 10 / 4
Jan 31 16:09:44 es1 mysqld[16868]: #011State        : SYNCED
Jan 31 16:09:44 es1 mysqld[16868]: #011Desync count : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim state   : SYNCED
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim UUID    : 57fd2bf2-4443-11ea-99e8-338c93f60d4a
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim  seqno  : 4162
Jan 31 16:09:44 es1 mysqld[16868]: #011First seqno  : 826033913
Jan 31 16:09:44 es1 mysqld[16868]: #011Last  seqno  : 826330973
Jan 31 16:09:44 es1 mysqld[16868]: #011Commit cut   : 826330870
Jan 31 16:09:44 es1 mysqld[16868]: #011Last vote    : -1.0
Jan 31 16:09:44 es1 mysqld[16868]: #011Vote policy  : 0
Jan 31 16:09:44 es1 mysqld[16868]: #011Prim JOINED  : 3
Jan 31 16:09:44 es1 mysqld[16868]: #011State UUID   : 18292ef8-4444-11ea-9477-ef1314828532
Jan 31 16:09:44 es1 mysqld[16868]: #011Group UUID   : 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:44 es1 mysqld[16868]: #011Name         : 'ES2_AWS'
Jan 31 16:09:44 es1 mysqld[16868]: #011Incoming addr: 'AUTO'
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:group_post_state_exchange():492: Quorum results:
Jan 31 16:09:44 es1 mysqld[16868]: #011version    = 6,
Jan 31 16:09:44 es1 mysqld[16868]: #011component  = PRIMARY,
Jan 31 16:09:44 es1 mysqld[16868]: #011conf_id    = 4162,
Jan 31 16:09:44 es1 mysqld[16868]: #011members    = 3/4 (joined/total),
Jan 31 16:09:44 es1 mysqld[16868]: #011act_id     = 826330973,
Jan 31 16:09:44 es1 mysqld[16868]: #011last_appl. = 826330870,
Jan 31 16:09:44 es1 mysqld[16868]: #011protocols  = 2/10/4 (gcs/repl/appl),
Jan 31 16:09:44 es1 mysqld[16868]: #011vote policy= 0,
Jan 31 16:09:44 es1 mysqld[16868]: #011group UUID = 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs.cpp:_set_fc_limits():864: Flow-control interval: [32, 32]
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs.cpp:gcs_shift_state():663: Shifting OPEN -> PRIMARY (TO: 826330974)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2448: ####### processing CC 826330974, local, ordered
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_smm.cpp:establish_protocol_versions():2347: REPL Protocols: 10 (5, 3)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2540: ####### My UUID: 17dcac4f-4444-11ea-b078-f22376c9eaa5
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: Server ES1_AWS connected to cluster at position 5da44a66-c997-11e9-84af-8a80ff105689:826330974 with ID 17dcac4f-4444-11ea-b078-f22376c9eaa5
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: Server status change disconnected -> connected
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2585: State transfer required:
Jan 31 16:09:44 es1 mysqld[16868]: #011Group state: 5da44a66-c997-11e9-84af-8a80ff105689:826330974
Jan 31 16:09:44 es1 mysqld[16868]: #011Local state: 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: Server status change connected -> joiner
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: Joiner monitor thread started to monitor
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.x.x.x' --datadir '/media/storage/mysql/' --parent '16868' --mysqld-args --wsrep_start_position=5da44a66-c997-11e9-84af-8a80ff105689:826172775'
Jan 31 16:09:44 es1 rsyncd[16937]: rsyncd version 3.1.2 starting, listening on port 4444
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: Prepared SST request: rsync|172.x.x.x:4444/rsync_sst
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2664: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: galera/src/galera_service_thd.cpp:thd_func():39: Service thread queue flushed.
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/certification.cpp:assign_initial_position():1052: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_str.cpp:prepare_for_IST():581: ####### IST uuid:5da44a66-c997-11e9-84af-8a80ff105689 f: 826172776, l: 826330974, STRv: 3
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/ist.cpp:IST_determine_recv_addr():229: IST receiver addr using tcp://172.x.x.x:4568
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/ist.cpp:prepare():355: Prepared IST receiver for 826172776-826330974, listening at: tcp://172.x.x.x:4568
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:group_select_donor():1721: Member 0.0 (ES1_AWS) requested state transfer from '*any*'. Selected 3.0 (ES2_AWS)(SYNCED) as donor.
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs.cpp:gcs_shift_state():663: Shifting PRIMARY -> JOINER (TO: 826330974)
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 2 [Note] WSREP: galera/src/replicator_str.cpp:send_state_request():757: Requesting state transfer: success, donor: 3
Jan 31 16:09:44 es1 rsyncd[16958]: connect from ip-172-x-x-x.eu-west-1.compute.internal (172.x.x.x)
Jan 31 16:09:44 es1 rsyncd[16958]: rsync to rsync_sst/ from ip-172-x-x-x.eu-west-1.compute.internal (172.x.x.x)
Jan 31 16:09:44 es1 rsyncd[16958]: receiving file list
Jan 31 16:09:44 es1 rsyncd[16958]: sent 48 bytes  received 193 bytes  total size 49
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1205: 3.0 (ES2_AWS): State transfer to 0.0 (ES1_AWS) complete.
Jan 31 16:09:44 es1 mysqld[16868]: 2020-01-31 16:09:44 0 [Note] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_sync_msg():1252: Member 3.0 (ES2_AWS) synced with group.
Jan 31 16:09:45 es1 mysqld[16868]: WSREP_SST: [INFO] Joiner cleanup. rsync PID: 16937 (20200131 16:09:45.333)
Jan 31 16:09:45 es1 rsyncd[16937]: sent 0 bytes  received 0 bytes  total size 0
Jan 31 16:09:45 es1 mysqld[16868]: WSREP_SST: [INFO] Joiner cleanup done. (20200131 16:09:45.840)
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 3 [Note] WSREP: SST received
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 3 [Note] WSREP: Server status change joiner -> initializing
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Using Linux native AIO
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Uses event mutexes
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Number of pools: 1
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Using SSE2 crc32 instructions
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: Completed initialization of buffer pool
Jan 31 16:09:45 es1 mysqld[16868]: 2020-01-31 16:09:45 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: Creating shared tablespace for temporary tables
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: 10.4.12 started; log sequence number 499333110193; transaction id 1210753653
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: Loading buffer pool(s) from /media/storage/mysql/ib_buffer_pool
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] InnoDB: Cannot open '/media/storage/mysql/ib_buffer_pool' for reading: No such file or directory
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Plugin 'FEEDBACK' is disabled.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Server socket created on IP: '0.0.0.0'.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/db'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/proxies_priv'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/roles_mapping'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/time_zone_leap_second'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/time_zone_name'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/time_zone'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/time_zone_transition_type'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/time_zone_transition'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/tables_priv'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/columns_priv'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/procs_priv'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/func'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Zerofilling moved table:  './mysql/event'
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: wsrep_init_schema_and_SR (nil)
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: Server initialized
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: Server status change initializing -> initialized
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 3 [Note] WSREP: Server status change initialized -> joined
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 3 [Note] WSREP: Recovered position from storage: 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Reading of all Master_info entries succeeded
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] Added new Master_info '' to hash table
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] /usr/sbin/mysqld: ready for connections.
Jan 31 16:09:46 es1 mysqld[16868]: Version: '10.4.12-MariaDB-1:10.4.12+maria~bionic'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 3 [Note] WSREP: Recovered view from SST:
Jan 31 16:09:46 es1 mysqld[16868]:   id: 5da44a66-c997-11e9-84af-8a80ff105689:775085383
Jan 31 16:09:46 es1 mysqld[16868]:   status: primary
Jan 31 16:09:46 es1 mysqld[16868]:   protocol_version: 4
Jan 31 16:09:46 es1 mysqld[16868]:   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
Jan 31 16:09:46 es1 mysqld[16868]:   final: no
Jan 31 16:09:46 es1 mysqld[16868]:   own_index: -1
Jan 31 16:09:46 es1 mysqld[16868]:   members(4):
Jan 31 16:09:46 es1 mysqld[16868]: 	0: cada4b4c-3dde-11ea-b534-6f59e27348a3, ES4_AWS
Jan 31 16:09:46 es1 mysqld[16868]: 	1: ce18e18b-3ddd-11ea-9c2b-8a43433a36d1, ES3_AWS
Jan 31 16:09:46 es1 mysqld[16868]: 	2: de357997-3dda-11ea-ab67-164667f0f648, ES1_AWS
Jan 31 16:09:46 es1 mysqld[16868]: 	3: fefc48c4-3ddb-11ea-a3d6-4219f67aa88e, ES2_AWS
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 12 [Note] WSREP: Recovered cluster id 5da44a66-c997-11e9-84af-8a80ff105689
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 3 [Note] WSREP: galera/src/replicator_str.cpp:sst_received():63: SST received: 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: Joiner monitor thread ended with total time 2 sec
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 2 [Note] WSREP: galera/src/replicator_str.cpp:request_state_transfer():891: Installed new state from SST: 5da44a66-c997-11e9-84af-8a80ff105689:826172775
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 2 [Note] WSREP: galera/src/replicator_str.cpp:request_state_transfer():941: Receiving IST: 158199 writesets, seqnos 826172776-826330974
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/ist.cpp:run():424: ####### IST applying starts with 826172776
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/ist.cpp:run():468: ####### IST current seqno initialized to 826172776
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galerautils/src/gu_progress.hpp:report():34: Receiving IST...  0.0% (     0/158199 events) complete.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/galera_service_thd.cpp:thd_func():39: Service thread queue flushed.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/certification.cpp:assign_initial_position():1052: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:826172775, protocol version: 5
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2448: ####### processing CC 826172811, from IST, ordered
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:establish_protocol_versions():2347: REPL Protocols: 10 (5, 3)
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2618: ####### ST not required
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2672: Skipping cert index reset
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/certification.cpp:adjust_position():1078: ####### Adjusting cert position: 826172775 -> 826172811
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/galera_service_thd.cpp:thd_func():39: Service thread queue flushed.
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2702: ####### Setting monitor position to 826172811
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:record_cc_seqnos():2356: Lowest cert indnex boundary for CC from group: 826172811
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:record_cc_seqnos():2358: Min available from gcache for CC from group: 826172776
Jan 31 16:09:46 es1 mysqld[16868]: 2020-01-31 16:09:46 0 [Note] WSREP: galera/src/replicator_smm.cpp:process_conf_change():2448: ####### processing CC 826172989, from IST, ordered
Jan 31 16:09:46 es1 systemd[1]: Started MariaDB 10.4.12 database server.
Jan 31 16:09:46 es1 /etc/mysql/debian-start[17020]: Upgrading MySQL tables if necessary.
Jan 31 16:09:46 es1 /etc/mysql/debian-start[17024]: Looking for 'mysql' as: /usr/bin/mysql
Jan 31 16:09:46 es1 /etc/mysql/debian-start[17024]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck
Jan 31 16:09:46 es1 /etc/mysql/debian-start[17024]: This installation of MariaDB is already upgraded to 10.4.12-MariaDB, use --force if you still need to run mysql_upgrade
Jan 31 16:09:46 es1 /etc/mysql/debian-start[17034]: Checking for insecure root accounts.
Jan 31 16:09:46 es1 mysqld[17018]: ERROR 1047 (08S01) at line 1: WSREP has not yet prepared node for application use
Jan 31 16:09:47 es1 mysqld[16868]: 2020-01-31 16:09:47 0 [Note] WSREP: gcomm/src/gmcast.cpp:check_liveness():1271: (17dcac4f, 'tcp://0.0.0.0:4567') turning message relay requesting off
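The log above reports "Receiving IST: 158199 writesets, seqnos 826172776-826330974". As a quick sanity check when reading such logs, the writeset count should equal the inclusive seqno range. A minimal sketch (not an official tool; the regex is just matched against the message format shown above):

```python
import re

# Parse a "Receiving IST: N writesets, seqnos FIRST-LAST" log line and
# check that the writeset count matches the inclusive seqno range.
IST_RE = re.compile(
    r"Receiving IST: (?P<count>\d+) writesets, "
    r"seqnos (?P<first>\d+)-(?P<last>\d+)"
)

def check_ist_range(line):
    """Return (count, first, last, consistent) or None if no match."""
    m = IST_RE.search(line)
    if m is None:
        return None
    count = int(m.group("count"))
    first = int(m.group("first"))
    last = int(m.group("last"))
    # Seqno ranges in these messages are inclusive on both ends.
    return count, first, last, count == last - first + 1

line = ("2020-01-31 16:09:46 2 [Note] WSREP: Receiving IST: "
        "158199 writesets, seqnos 826172776-826330974")
print(check_ist_range(line))  # (158199, 826172776, 826330974, True)
```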

Comment by Levieux stéphane [ 2020-02-04 ]

It seems we have the same problem:
We have a 3-node cluster, and today the cluster view indicated that one node had difficulty communicating with the other nodes (for just a few seconds) but quickly tried to rejoin the cluster.
The log then shows ": Prepared IST receiver for 353616809-353616859, listening at: tcp://10.253.253.94:4568"
and then nothing more happens. Normally the next log line should indicate the donor chosen.
We were not at the office; after 30 minutes we had to stop this node (the mysqld process), and after a restart it performed an SST successfully and rejoined the cluster (the gcache had surely been exceeded after a 30-minute delay).
It's not the first time; since 10.4.x we have had more cluster problems than before.

NB: Each node of our cluster restarts once a week, and after a restart it performs an IST without any problem, so perhaps the problem occurs "only" while the mariadb process is already running.

Here is the full event log from this node:

2020-02-04 12:20:48 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.253.253.93:4567 tcp://10.253.253.95:4567
2020-02-04 12:20:49 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') reconnecting to 9da1d772 (tcp://10.253.253.95:4567), attempt 0
2020-02-04 12:20:49 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') connection established to 9da1d772 tcp://10.253.253.95:4567
2020-02-04 12:20:49 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') reconnecting to 773f3d4c (tcp://10.253.253.93:4567), attempt 0
2020-02-04 12:20:49 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') connection established to 773f3d4c tcp://10.253.253.93:4567
2020-02-04 12:20:49 190926 [Warning] Aborted connection 190926 to db: 'unconnected' user: 'unauthenticated' host: '10.253.253.80' (This connection closed normally without authentication)
2020-02-04 12:20:50 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') reconnecting to 9da1d772 (tcp://10.253.253.95:4567), attempt 0
2020-02-04 12:20:50 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') connection established to 9da1d772 tcp://10.253.253.95:4567
2020-02-04 12:20:50 190927 [Warning] Aborted connection 190927 to db: 'unconnected' user: 'unauthenticated' host: '10.253.253.81' (This connection closed normally without authentication)
2020-02-04 12:20:51 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') reconnecting to 773f3d4c (tcp://10.253.253.93:4567), attempt 0
2020-02-04 12:20:51 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') connection established to 773f3d4c tcp://10.253.253.93:4567
2020-02-04 12:20:51 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') connection established to 9da1d772 tcp://10.253.253.95:4567
2020-02-04 12:20:52 0 [Note] WSREP: evs::proto(e3c4e6be, OPERATIONAL, view_id(REG,773f3d4c,5)) suspecting node: 9da1d772
2020-02-04 12:20:52 0 [Note] WSREP: evs::proto(e3c4e6be, OPERATIONAL, view_id(REG,773f3d4c,5)) suspected node without join message, declaring inactive
2020-02-04 12:20:53 0 [Note] WSREP: evs::proto(e3c4e6be, GATHER, view_id(REG,773f3d4c,5)) suspecting node: 773f3d4c
2020-02-04 12:20:53 0 [Note] WSREP: evs::proto(e3c4e6be, GATHER, view_id(REG,773f3d4c,5)) suspected node without join message, declaring inactive
2020-02-04 12:20:53 0 [Note] WSREP: view(view_id(NON_PRIM,773f3d4c,5) memb {
	e3c4e6be,0
} joined {
} left {
} partitioned {
	773f3d4c,0
	9da1d772,0
})
2020-02-04 12:20:53 0 [Note] WSREP: view(view_id(NON_PRIM,e3c4e6be,6) memb {
	e3c4e6be,0
} joined {
} left {
} partitioned {
	773f3d4c,0
	9da1d772,0
})
2020-02-04 12:20:53 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-02-04 12:20:53 0 [Note] WSREP: Flow-control interval: [16, 16]
2020-02-04 12:20:53 0 [Note] WSREP: Received NON-PRIMARY.
2020-02-04 12:20:53 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 353616808)
2020-02-04 12:20:53 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-02-04 12:20:53 11 [Note] WSREP: ####### processing CC -1, local, ordered
2020-02-04 12:20:53 11 [Note] WSREP: ####### My UUID: e3c4e6be-4668-11ea-87e9-3afb72a56194
2020-02-04 12:20:53 11 [Note] WSREP: ####### ST not required
2020-02-04 12:20:53 11 [Note] WSREP: ================================================
View:
id: 05f55365-1970-11e9-89d3-d7b7715e6b83:-1
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(1):
0: e3c4e6be-4668-11ea-87e9-3afb72a56194, node5
=================================================
2020-02-04 12:20:53 11 [Note] WSREP: Non-primary view
2020-02-04 12:20:53 11 [Note] WSREP: Server status change synced -> connected
2020-02-04 12:20:53 11 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-02-04 12:20:53 11 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-02-04 12:20:53 0 [Note] WSREP: Flow-control interval: [16, 16]
2020-02-04 12:20:53 0 [Note] WSREP: Received NON-PRIMARY.
2020-02-04 12:20:53 22 [Note] WSREP: ####### processing CC -1, local, ordered
2020-02-04 12:20:53 22 [Note] WSREP: ####### My UUID: e3c4e6be-4668-11ea-87e9-3afb72a56194
2020-02-04 12:20:53 22 [Note] WSREP: ####### ST not required
2020-02-04 12:20:53 22 [Note] WSREP: ================================================
View:
id: 05f55365-1970-11e9-89d3-d7b7715e6b83:-1
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
final: no
own_index: 0
members(1):
0: e3c4e6be-4668-11ea-87e9-3afb72a56194, node5
=================================================
2020-02-04 12:20:53 22 [Note] WSREP: Non-primary view
2020-02-04 12:20:53 22 [Note] WSREP: Server status change connected -> connected
2020-02-04 12:20:53 22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-02-04 12:20:53 22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-02-04 12:20:53 0 [Note] WSREP: declaring 773f3d4c at tcp://10.253.253.93:4567 stable
2020-02-04 12:20:53 0 [Note] WSREP: declaring 9da1d772 at tcp://10.253.253.95:4567 stable
2020-02-04 12:20:53 0 [Note] WSREP: Node 773f3d4c state prim
2020-02-04 12:20:53 0 [Note] WSREP: view(view_id(PRIM,773f3d4c,7) memb {
	773f3d4c,0
	9da1d772,0
	e3c4e6be,0
} joined {
} left {
} partitioned {
})
2020-02-04 12:20:53 0 [Note] WSREP: save pc into disk
2020-02-04 12:20:53 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2020-02-04 12:20:53 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2020-02-04 12:20:53 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 68394f57-4740-11ea-b789-8fe3ad62108d
2020-02-04 12:20:53 0 [Note] WSREP: STATE EXCHANGE: got state msg: 68394f57-4740-11ea-b789-8fe3ad62108d from 0 (node4)
2020-02-04 12:20:53 0 [Note] WSREP: STATE EXCHANGE: got state msg: 68394f57-4740-11ea-b789-8fe3ad62108d from 1 (node6)
2020-02-04 12:20:53 0 [Note] WSREP: STATE EXCHANGE: got state msg: 68394f57-4740-11ea-b789-8fe3ad62108d from 2 (node5)
2020-02-04 12:20:53 0 [Note] WSREP: Quorum results:
version = 6,
component = PRIMARY,
conf_id = 6,
members = 2/3 (joined/total),
act_id = 353616858,
last_appl. = 353616682,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = 05f55365-1970-11e9-89d3-d7b7715e6b83
2020-02-04 12:20:53 0 [Note] WSREP: Flow-control interval: [28, 28]
2020-02-04 12:20:53 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 353616859)
2020-02-04 12:20:53 22 [Note] WSREP: ####### processing CC 353616859, local, ordered
2020-02-04 12:20:53 22 [Note] WSREP: REPL Protocols: 10 (5, 3)
2020-02-04 12:20:53 22 [Note] WSREP: ####### My UUID: e3c4e6be-4668-11ea-87e9-3afb72a56194
2020-02-04 12:20:53 22 [Note] WSREP: State transfer required:
Group state: 05f55365-1970-11e9-89d3-d7b7715e6b83:353616859
Local state: 05f55365-1970-11e9-89d3-d7b7715e6b83:353616808
2020-02-04 12:20:53 22 [Note] WSREP: Server status change connected -> joiner
2020-02-04 12:20:53 22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-02-04 12:20:53 22 [Note] WSREP: You have configured 'mariabackup' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
2020-02-04 12:20:53 22 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2020-02-04 12:20:54 0 [Note] WSREP: Service thread queue flushed.
2020-02-04 12:20:54 22 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
2020-02-04 12:20:54 22 [Note] WSREP: ####### IST uuid:05f55365-1970-11e9-89d3-d7b7715e6b83 f: 353616809, l: 353616859, STRv: 3
2020-02-04 12:20:54 22 [Note] WSREP: IST receiver addr using tcp://10.253.253.94:4568
2020-02-04 12:20:54 22 [Note] WSREP: Prepared IST receiver for 353616809-353616859, listening at: tcp://10.253.253.94:4568
2020-02-04 12:20:54 0 [Note] WSREP: (e3c4e6be, 'tcp://0.0.0.0:4567') turning message relay requesting off
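The telltale symptom in the log above is that "Prepared IST receiver for 353616809-353616859" is never followed by a "Requesting state transfer" (donor selection) line, unlike the healthy IST earlier in this report. A hypothetical watchdog sketch that detects this pattern in a log tail (not an official tool; the match strings are taken from the logs in this issue):

```python
def ist_looks_stuck(log_lines):
    """Return True if the most recent 'Prepared IST receiver' line was
    never followed by a 'Requesting state transfer' line."""
    prepared_at = None
    for i, line in enumerate(log_lines):
        if "Prepared IST receiver for" in line:
            prepared_at = i
        elif prepared_at is not None and "Requesting state transfer" in line:
            prepared_at = None  # this IST proceeded normally
    return prepared_at is not None

healthy = [
    "WSREP: Prepared IST receiver for 826172776-826330974, ...",
    "WSREP: Requesting state transfer: success, donor: 3",
]
stuck = [
    "WSREP: Prepared IST receiver for 353616809-353616859, ...",
    "WSREP: turning message relay requesting off",
]
print(ist_looks_stuck(healthy), ist_looks_stuck(stuck))  # False True
```

In practice one would combine a check like this with polling `SHOW STATUS LIKE 'wsrep_local_state_comment'` and alert if the node stays in "Joining: receiving State transfer" past a timeout.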

Comment by Justin Y [ 2020-02-05 ]

Raised to Critical and removed the 'needs_feedback' tag in the hope of getting the attention of the MariaDB developers.

Comment by Jan Lindström (Inactive) [ 2020-02-05 ]

Fixed in Galera library 25.3.29.

Comment by Stephan Vos [ 2020-02-05 ]

Is there a release date for this fix version?
We are doing a big migration on the 9th of February and would like to upgrade as soon as possible: we have already run into this issue with our initial production deployment, and after the next migration an SST would no longer be feasible due to the resulting size of our production database.

Comment by Brandon Patterson [ 2020-02-05 ]

Jan, will there be an updated 26.4.x Galera release for this bug as well?

Comment by Jan Lindström (Inactive) [ 2020-02-05 ]

Yes, 26.4.4 Galera will be released.

Comment by Stephan Vos [ 2020-02-06 ]

Jan, we are running MariaDB 10.4.7; is there a new version available with the fix?

Comment by Jan Lindström (Inactive) [ 2020-02-06 ]

In theory you should be able to just upgrade the Galera library to the new version and it should work with 10.4.7 as well, but I have not personally tested it.

Comment by Stephan Vos [ 2020-02-14 ]

Is there a planned release date for 10.4.13?

Comment by Nilesh [ 2020-03-17 ]

Observed the same issue. However, I decided to wait and observe: the receive queue kept increasing, and I had to kill the process; SST was the only option for syncing the node back.

Comment by Daan van Gorkum [ 2020-04-09 ]

Will Galera 26.4.4 be released for 10.4? It's not in the repo.
https://mariadb.com/kb/en/meta/galera-versions/ mentions it is only used for 10.5.

We hit this problem today and it was difficult to understand what was happening at first.

Comment by Jan Lindström (Inactive) [ 2020-04-09 ]

Yes, it will be when the next 10.4 is released. If you really need it sooner, you can of course use the library from 10.5, as they are exactly the same.

Comment by Justin Y [ 2020-04-29 ]

We've been hit pretty hard two times in the last month. Do we have a release date for 10.4.13?

And, Jan: what library do we copy, and from what exact release of 10.5, to match with 10.4.12 without risking data corruption or worse performance? We need precise instructions if we're going to hack this to stay afloat.

I've used MySQL and MariaDB since 1999, and this is the worst stability-crippling experience I've ever had. I think it is worth including in the release notes: https://mariadb.com/kb/en/mariadb-10413-release-notes/

Thanks!

Comment by Rasmus Johansson (Inactive) [ 2020-05-13 ]

Hi jyusb! 10.4.13 comes with Galera library 26.4.4, which includes the fix for MDEV-21002. The fix is in the Galera library.

I verified that it's in the download from https://mariadb.com/downloads/#mariadb_platform-mariadb_server

If you use downloads.mariadb.org, it's also there first in the list: https://downloads.mariadb.org/mariadb/10.4.13/

We have a discussion ongoing about including release notes from 3rd party components.

Comment by Daniel Bartholomew [ 2020-05-14 ]

I've updated the MariaDB 10.4.13 release notes.

Comment by MikaH [ 2020-05-14 ]

Justin, I share your pain. We have had serious issues with 10.4 and were forced to go back to 10.3 (10.3.22). MariaDB needs to put more effort into testing, including performance testing.

Sorry, but I had to comment...

Comment by Ranjan Ghosh [ 2020-05-20 ]

Sorry, I also have to comment, because I think this bug illustrates a deficiency of the current release model: 10.4.13 has contained the fix for quite some time now, but binary packages became available only very recently (apart from the somewhat eerie workaround of mixing in packages from 10.5, which I was reluctant to do on a production server). This means we had to live for months with a bug that brought the whole cluster down and caused me (and, it seems, many others) a lot of gray hairs.

Perhaps you might want to reevaluate how you ship fixes in cases like this. I have the feeling (I might be wrong) that if this weren't a cluster/Galera problem but a bug that also affected ordinary DB users, it would have justified an emergency fix. Clustering is important and shouldn't be put on the back burner. Perhaps it would be advisable to release some kind of 10.4.12.1 in such urgent cases. Just my 2 cents; sorry for the spam.

Comment by Imre Jonk [ 2022-06-01 ]

I experienced this issue with Galera 25.3.32 and MariaDB 10.3: https://bugzilla.redhat.com/show_bug.cgi?id=2072442

Generated at Thu Feb 08 09:03:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.