Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.5, 10.6, 10.2(EOL), 10.3(EOL), 10.4(EOL)
- None
Description
This bug was originally seen in the galera_nbo_sst_slave mtr test for 10.6, but it is relevant for all versions. It can lead to intermittent crashes of SST via rsync on very fast server restarts, when a new SST process (for example, one started by a new server) overlaps the old SST process from the previous (already terminated) server. This overlap can result in the new rsync being killed instead of the old rsync, or in the pid file of the new rsync being removed, either of which then leads to problems.
For example:
2021-06-09 3:28:56 0 [Warning] WSREP: 0.0 (panda): State transfer to 1.0 (panda) failed: -11 (Resource temporarily unavailable)
2021-06-09 3:28:56 0 [ERROR] WSREP: /home/panda/galera-es-4.x/gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1205: Will never receive state. Need to abort.
2021-06-09 3:28:56 0 [Note] WSREP: gcomm: terminating thread
2021-06-09 3:28:56 0 [Note] WSREP: gcomm: joining thread
2021-06-09 3:28:56 0 [Note] WSREP: gcomm: closing backend
2021-06-09 3:28:56 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 24 bytes
2021-06-09 3:28:56 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
2021-06-09 3:28:57 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2021-06-09 3:28:57 0 [Note] WSREP: view((empty))
2021-06-09 3:28:57 0 [Note] WSREP: gcomm: closed
2021-06-09 3:28:57 0 [Note] WSREP: /home/panda/maria-10.6/build/sql/mariadbd: Terminated.
2021-06-09 3:28:58 0 [Warning] WSREP: option --wsrep-causal-reads is deprecated
2021-06-09 3:28:58 0 [Note] /home/panda/maria-10.6/build/sql/mariadbd (mysqld 10.6.1-1-MariaDB-debug-log) starting as process 410627 ...
.................
.................
2021-06-09 3:28:58 0 [Note] WSREP: save pc into disk
WSREP_SST: [ERROR] Parent mysqld process (PID: 410497) terminated unexpectedly. (20210609 03:28:58.800)
/home/panda/maria-10.6/build/scripts/wsrep_sst_rsync: line 681: kill: (-410497) - No such process
WSREP_SST: [INFO] Joiner cleanup: rsync PID=0, stunnel PID=410592 (20210609 03:28:58.803)
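To illustrate the overlap described above, here is a minimal Python sketch of one way a cleanup step could avoid killing the wrong rsync: the pid file records which server process owns the rsync, and cleanup only targets an rsync left behind by a different server. This is not the actual fix and not how wsrep_sst_rsync (a shell script) is written. The two-field pid file format and the names `write_pid_file`, `read_pid_file` and `stale_rsync_to_clean` are assumptions made for illustration, and a POSIX system is assumed.

```python
import os
import tempfile


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def write_pid_file(path: str, rsync_pid: int, owner_pid: int) -> None:
    """Store the rsync PID with the PID of the owning server (hypothetical format)."""
    with open(path, "w") as f:
        f.write(f"{rsync_pid} {owner_pid}\n")


def read_pid_file(path: str):
    """Return (rsync_pid, owner_pid), or None if the file is missing or malformed."""
    try:
        with open(path) as f:
            rsync_pid, owner_pid = f.read().split()
        return int(rsync_pid), int(owner_pid)
    except (OSError, ValueError):
        return None


def stale_rsync_to_clean(path: str, my_server_pid: int):
    """Return the rsync PID that is safe to terminate, or None.

    Only an rsync recorded by a different (previous) server counts as
    stale; an entry owned by the current server is left untouched.
    """
    entry = read_pid_file(path)
    if entry is None:
        return None
    rsync_pid, owner_pid = entry
    if owner_pid == my_server_pid:
        return None  # belongs to the current SST, do not kill it
    if not process_alive(rsync_pid):
        return None  # already gone, nothing to kill
    return rsync_pid


if __name__ == "__main__":
    pid_file = os.path.join(tempfile.mkdtemp(), "rsync_sst.pid")
    # Pretend the previous server (PID 410497) left an rsync behind;
    # our own PID stands in for that rsync so the liveness check passes.
    write_pid_file(pid_file, os.getpid(), 410497)
    print(stale_rsync_to_clean(pid_file, 410627))  # new server: stale entry found
    print(stale_rsync_to_clean(pid_file, 410497))  # same owner: nothing to clean
```

The sketch does not protect against PID reuse, and the check followed by a later kill is itself not atomic, so a real script would need additional safeguards.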
Issue Links
- relates to: MDEV-24097 galera_3nodes suite tests in MTR sporadically fails: Failed to start mysqld or mysql_shutdown failed (Closed)