Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.6.9
-
None
Description
On a 3-nodes cluster, 2 nodes got completely stuck. Anything trying to connect would also get stuck. The third node had 200+ (or 500?) queries running, but stuck as well.
Stopping those 2 servers got stuck to, had to kill the processes manually. Logs shows the server being disconnected from the cluster. Cluster size went from 3 to 1 as it should, but mariadb processes where stuck on those 2 servers until manually killed.
Starting first node back after a minute got it to sync using an IST, the other node restarted after 5 minutes synced using an SST. As if nothing happened!?
Digging into logs of the 3 servers shows nothing, no output at all during the whole period they where stuck.
Anything I could do to track this down if it happens again?
Server 1 log (stuck):
2022-08-25 0:29:16 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000191 |
2022-08-25 0:29:32 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000192 of size 134217696 bytes |
2022-08-25 0:29:39 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000192 |
2022-08-25 0:30:38 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000193 of size 134217696 bytes |
2022-08-25 0:30:47 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000194 of size 134217696 bytes |
2022-08-25 0:30:56 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000193 |
2022-08-25 0:30:57 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000194 |
2022-08-25 0:32:23 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000195 of size 134217696 bytes |
2022-08-25 1:03:21 0 [Note] WSREP: declaring 19edde19-9435 at tcp://xxxx:4567 stable |
2022-08-25 1:03:21 0 [Note] WSREP: forgetting e26a09b5-b02e (tcp://xxxx:4567) |
2022-08-25 1:03:21 0 [Note] WSREP: Node 19edde19-9435 state prim |
2022-08-25 1:03:21 0 [Note] WSREP: view(view_id(PRIM,19edde19-9435,63) memb { |
2022-08-25 1:03:21 0 [Note] WSREP: save pc into disk |
2022-08-25 1:03:21 0 [Note] WSREP: forgetting e26a09b5-b02e (tcp://xxxx:4567) |
2022-08-25 1:03:21 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2 |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: sent state msg: f2ffea9d-2400-11ed-9136-ef240a6d8bee |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: got state msg: f2ffea9d-2400-11ed-9136-ef240a6d8bee from 0 (ovh7.1check.com) |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: got state msg: f2ffea9d-2400-11ed-9136-ef240a6d8bee from 1 (ovh1.1check.com) |
2022-08-25 1:03:21 0 [Note] WSREP: Quorum results: |
2022-08-25 1:03:21 0 [Note] WSREP: Flow-control interval: [23, 23] |
2022-08-25 1:03:26 0 [Note] WSREP: cleaning up e26a09b5-b02e (tcp://xxxx:4567) |
2022-08-25 1:06:00 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown |
2022-08-25 1:06:00 0 [Note] WSREP: Shutdown replication |
2022-08-25 1:06:00 0 [Note] WSREP: Server status change synced -> disconnecting |
2022-08-25 1:06:00 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2022-08-25 1:06:00 0 [Note] WSREP: Closing send monitor... |
2022-08-25 1:06:00 0 [Note] WSREP: Closed send monitor. |
2022-08-25 1:06:00 0 [Note] WSREP: gcomm: terminating thread |
2022-08-25 1:06:00 0 [Note] WSREP: gcomm: joining thread |
2022-08-25 1:06:00 0 [Note] WSREP: gcomm: closing backend |
2022-08-25 1:06:01 0 [Note] WSREP: view(view_id(NON_PRIM,19edde19-9435,63) memb { |
2022-08-25 1:06:01 0 [Note] WSREP: PC protocol downgrade 1 -> 0 |
2022-08-25 1:06:01 0 [Note] WSREP: view((empty)) |
2022-08-25 1:06:01 0 [Note] WSREP: gcomm: closed |
2022-08-25 1:06:01 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 |
2022-08-25 1:06:01 0 [Note] WSREP: Flow-control interval: [16, 16] |
2022-08-25 1:06:01 0 [Note] WSREP: Received NON-PRIMARY. |
2022-08-25 1:06:01 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 70213440) |
2022-08-25 1:06:01 0 [Note] WSREP: New SELF-LEAVE. |
2022-08-25 1:06:01 0 [Note] WSREP: Flow-control interval: [0, 0] |
2022-08-25 1:06:01 0 [Note] WSREP: Received SELF-LEAVE. Closing connection. |
2022-08-25 1:06:01 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 70213440) |
2022-08-25 1:06:01 0 [Note] WSREP: RECV thread exiting 0: Success |
2022-08-25 1:06:01 0 [Note] WSREP: recv_thread() joined. |
2022-08-25 1:06:01 0 [Note] WSREP: Closing replication queue. |
2022-08-25 1:06:01 0 [Note] WSREP: Closing slave action queue. |
2022-08-25 1:07:18 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 2c96698a-0fdf-11ed-90d6-7ecc8fa70984:70213386 |
2022-08-25 1:07:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so' |
Server 2 log (stuck):
2022-08-25 0:29:32 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000192 of size 134217696 bytes |
2022-08-25 0:29:39 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000192 |
2022-08-25 0:30:38 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000193 of size 134217696 bytes |
2022-08-25 0:30:47 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000194 of size 134217696 bytes |
2022-08-25 0:30:56 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000193 |
2022-08-25 0:30:57 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000194 |
2022-08-25 0:32:23 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000195 of size 134217696 bytes |
2022-08-25 1:03:21 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown |
2022-08-25 1:03:21 0 [Note] WSREP: Shutdown replication |
2022-08-25 1:03:21 0 [Note] WSREP: Server status change synced -> disconnecting |
2022-08-25 1:03:21 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2022-08-25 1:03:21 0 [Note] WSREP: Closing send monitor... |
2022-08-25 1:03:21 0 [Note] WSREP: Closed send monitor. |
2022-08-25 1:03:21 0 [Note] WSREP: gcomm: terminating thread |
2022-08-25 1:03:21 0 [Note] WSREP: gcomm: joining thread |
2022-08-25 1:03:21 0 [Note] WSREP: gcomm: closing backend |
2022-08-25 1:03:21 0 [Note] WSREP: view(view_id(NON_PRIM,19edde19-9435,62) memb { |
2022-08-25 1:03:21 0 [Note] WSREP: PC protocol downgrade 1 -> 0 |
2022-08-25 1:03:21 0 [Note] WSREP: view((empty)) |
2022-08-25 1:03:21 0 [Note] WSREP: gcomm: closed |
2022-08-25 1:03:21 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 |
2022-08-25 1:03:21 0 [Note] WSREP: Flow-control interval: [16, 16] |
2022-08-25 1:03:21 0 [Note] WSREP: Received NON-PRIMARY. |
2022-08-25 1:03:21 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 70213433) |
2022-08-25 1:03:21 0 [Note] WSREP: New SELF-LEAVE. |
2022-08-25 1:03:21 0 [Note] WSREP: Flow-control interval: [0, 0] |
2022-08-25 1:03:21 0 [Note] WSREP: Received SELF-LEAVE. Closing connection. |
2022-08-25 1:03:21 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 70213433) |
2022-08-25 1:03:21 0 [Note] WSREP: RECV thread exiting 0: Success |
2022-08-25 1:03:21 0 [Note] WSREP: recv_thread() joined. |
2022-08-25 1:03:21 0 [Note] WSREP: Closing replication queue. |
2022-08-25 1:03:21 0 [Note] WSREP: Closing slave action queue. |
2022-08-25 1:08:05 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 2c96698a-0fdf-11ed-90d6-7ecc8fa70984:70213385 |
2022-08-25 1:08:05 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so' |
2022-08-25 1:08:05 0 [Note] WSREP: wsrep_load(): Galera 26.4.12(r1eac5b64) by Codership Oy <info@codership.com> loaded successfully. |
2022-08-25 1:08:05 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration. |
Server 3 log (still alive):
2022-08-25 0:30:56 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000389 |
2022-08-25 0:30:57 0 [Note] WSREP: Deleted page /var/lib/mysql/galera.page.000390 |
2022-08-25 0:32:23 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000391 of size 134217696 bytes |
2022-08-25 1:03:21 0 [Note] WSREP: declaring 3e14de76-94d4 at tcp://xxxx:4567 stable |
2022-08-25 1:03:21 0 [Note] WSREP: Deferred close timer started for socket with remote endpoint: tcp://xxxx:44208 |
2022-08-25 1:03:21 0 [Note] WSREP: forgetting e26a09b5-b02e (tcp://xxxx:4567) |
2022-08-25 1:03:21 0 [Note] WSREP: Deferred close timer handle_wait Operation aborted. for 0x7fd768845210 |
2022-08-25 1:03:21 0 [Note] WSREP: Deferred close timer destruct |
2022-08-25 1:03:21 0 [Note] WSREP: Node 19edde19-9435 state prim |
2022-08-25 1:03:21 0 [Note] WSREP: view(view_id(PRIM,19edde19-9435,63) memb { |
2022-08-25 1:03:21 0 [Note] WSREP: save pc into disk |
2022-08-25 1:03:21 0 [Note] WSREP: forgetting e26a09b5-b02e (tcp://xxxx:4567) |
2022-08-25 1:03:21 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2 |
2022-08-25 1:03:21 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: f2ffea9d-2400-11ed-9136-ef240a6d8bee |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: sent state msg: f2ffea9d-2400-11ed-9136-ef240a6d8bee |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: got state msg: f2ffea9d-2400-11ed-9136-ef240a6d8bee from 0 (ovh7.1check.com) |
2022-08-25 1:03:21 0 [Note] WSREP: STATE EXCHANGE: got state msg: f2ffea9d-2400-11ed-9136-ef240a6d8bee from 1 (ovh1.1check.com) |
2022-08-25 1:03:21 0 [Note] WSREP: Quorum results: |
2022-08-25 1:03:21 0 [Note] WSREP: Flow-control interval: [23, 23] |
2022-08-25 1:03:21 17 [Note] WSREP: ####### processing CC 70213434, local, ordered |
2022-08-25 1:03:21 17 [Note] WSREP: ####### My UUID: 19edde19-2139-11ed-9435-e69b7d8e37e9 |
2022-08-25 1:03:21 17 [Note] WSREP: Skipping cert index reset |
2022-08-25 1:03:21 17 [Note] WSREP: REPL Protocols: 10 (5) |
2022-08-25 1:03:21 17 [Note] WSREP: ####### Adjusting cert position: 70213433 -> 70213434 |
2022-08-25 1:03:21 0 [Note] WSREP: Service thread queue flushed. |
2022-08-25 1:03:21 17 [Note] WSREP: ================================================ |
2022-08-25 1:03:21 17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2022-08-25 1:03:21 17 [Note] WSREP: Lowest cert index boundary for CC from group: 70213347 |
2022-08-25 1:03:21 17 [Note] WSREP: Min available from gcache for CC from group: 70213347 |
2022-08-25 1:03:26 0 [Note] WSREP: cleaning up e26a09b5-b02e (tcp://xxxx:4567) |
2022-08-25 1:06:01 0 [Note] WSREP: Deferred close timer started for socket with remote endpoint: tcp://xxxx:36660 |
2022-08-25 1:06:01 0 [Note] WSREP: forgetting 3e14de76-94d4 (tcp://xxxx:4567) |
2022-08-25 1:06:01 0 [Note] WSREP: Node 19edde19-9435 state prim |
2022-08-25 1:06:01 0 [Note] WSREP: view(view_id(PRIM,19edde19-9435,64) memb { |
2022-08-25 1:06:01 0 [Note] WSREP: save pc into disk |
2022-08-25 1:06:01 0 [Note] WSREP: forgetting 3e14de76-94d4 (tcp://xxxx:4567) |
2022-08-25 1:06:01 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1 |
2022-08-25 1:06:01 0 [Note] WSREP: Deferred close timer handle_wait Operation aborted. for 0x7fd7344f9160 |
2022-08-25 1:06:01 0 [Note] WSREP: Deferred close timer destruct |
2022-08-25 1:06:01 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 523fb653-2401-11ed-8253-82473e2187ea |
2022-08-25 1:06:01 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 523fb653-2401-11ed-8253-82473e2187ea |
2022-08-25 1:06:01 0 [Note] WSREP: STATE EXCHANGE: got state msg: 523fb653-2401-11ed-8253-82473e2187ea from 0 (ovh7.1check.com) |
2022-08-25 1:06:01 0 [Note] WSREP: Quorum results: |
2022-08-25 1:06:01 0 [Note] WSREP: Flow-control interval: [16, 16] |
2022-08-25 1:06:01 17 [Note] WSREP: ####### processing CC 70213441, local, ordered |
2022-08-25 1:06:01 17 [Note] WSREP: ####### My UUID: 19edde19-2139-11ed-9435-e69b7d8e37e9 |
2022-08-25 1:06:01 17 [Note] WSREP: Skipping cert index reset |
2022-08-25 1:06:01 17 [Note] WSREP: REPL Protocols: 10 (5) |
2022-08-25 1:06:01 17 [Note] WSREP: ####### Adjusting cert position: 70213440 -> 70213441 |
2022-08-25 1:06:01 0 [Note] WSREP: Service thread queue flushed. |
2022-08-25 1:06:01 17 [Note] WSREP: ================================================ |
2022-08-25 1:06:01 17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2022-08-25 1:06:01 17 [Note] WSREP: Lowest cert index boundary for CC from group: 70213347 |
2022-08-25 1:06:01 17 [Note] WSREP: Min available from gcache for CC from group: 70213347 |
2022-08-25 1:06:06 0 [Note] WSREP: cleaning up 3e14de76-94d4 (tcp://xxxx:4567) |
2022-08-25 1:06:17 16935047 [Warning] Aborted connection 16935047 to db: '_1check_mail_export' user: 'bo_1check' host: '127.0.0.1' (Got an error reading communication packets) |
2022-08-25 1:06:17 16935044 [Warning] Aborted connection 16935044 to db: '_1check_mail_export' user: 'bo_1check' host: '127.0.0.1' (Got an error reading communication packets) |
2022-08-25 1:07:12 0 [Note] WSREP: Created page /var/lib/mysql/galera.page.000392 of size 134217696 bytes |
2022-08-25 1:07:18 0 [Note] WSREP: (19edde19-9435, 'tcp://0.0.0.0:4567') connection established to 807a59e3-896d tcp://xxxx:4567 |
2022-08-25 1:07:18 0 [Note] WSREP: (19edde19-9435, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: |
2022-08-25 1:07:19 0 [Note] WSREP: declaring 807a59e3-896d at tcp://xxxx:4567 stable |
2022-08-25 1:07:19 0 [Note] WSREP: Node 19edde19-9435 state prim |
2022-08-25 1:07:19 0 [Note] WSREP: view(view_id(PRIM,19edde19-9435,65) memb { |
2022-08-25 1:07:19 0 [Note] WSREP: save pc into disk |
2022-08-25 1:07:19 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2 |
2022-08-25 1:07:19 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 80c6eee7-2401-11ed-a085-a2c1fb1fa522 |
2022-08-25 1:07:19 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 80c6eee7-2401-11ed-a085-a2c1fb1fa522 |
2022-08-25 1:07:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: 80c6eee7-2401-11ed-a085-a2c1fb1fa522 from 0 (ovh7.1check.com) |
2022-08-25 1:07:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: 80c6eee7-2401-11ed-a085-a2c1fb1fa522 from 1 (ovh1.1check.com) |
2022-08-25 1:07:19 0 [Note] WSREP: Quorum results: |
2022-08-25 1:07:19 0 [Note] WSREP: Flow-control interval: [23, 23] |
2022-08-25 1:07:19 17 [Note] WSREP: ####### processing CC 70224411, local, ordered |
2022-08-25 1:07:19 17 [Note] WSREP: ####### My UUID: 19edde19-2139-11ed-9435-e69b7d8e37e9 |
2022-08-25 1:07:19 17 [Note] WSREP: Skipping cert index reset |
2022-08-25 1:07:19 17 [Note] WSREP: REPL Protocols: 10 (5) |
2022-08-25 1:07:19 17 [Note] WSREP: ####### Adjusting cert position: 70224410 -> 70224411 |
2022-08-25 1:07:19 0 [Note] WSREP: Service thread queue flushed. |
2022-08-25 1:07:19 17 [Note] WSREP: ================================================ |
2022-08-25 1:07:19 17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2022-08-25 1:07:19 17 [Note] WSREP: Lowest cert index boundary for CC from group: 70213347 |
2022-08-25 1:07:19 17 [Note] WSREP: Min available from gcache for CC from group: 70213347 |