Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.6.8
Fix Version/s: None
Environment: CentOS 7 x86
Description
My environment is a virtual machine environment with poor machine performance. MariaDB Galera is deployed on three nodes as containers. After all the containers crashed, a randomly selected node was restarted, and occasionally all the files in the mysql directory on all three nodes were gone. I suspect SST, or something else. Have you encountered similar issues in other versions?
Attachments
- mysqld0.log (4.18 MB)
- mysqld1.log (3.76 MB)
- mysqld2.log (1.66 MB)
- screenshot-1.png (11 kB)
- screenshot-2.png (124 kB)
Activity
mariadb-galera-0.log, mariadb-galera-1.log, mariadb-galera-2.log, mysqld0.log, mysqld1.log, mysqld2.log
After the three-node system was restarted, the data on the database nodes was gone. The files above are the corresponding logs.
We use Helm to deploy bitnami/mariadb-galera to the k8s cluster. The k8s version is 1.23.9 and the containerd version is 1.6.26.
I don't know the current status of the Bitnami chart.
There is also https://github.com/mariadb-operator/mariadb-operator, which manages Galera too and is actively maintained.
/bin/pt-galera-log-explainer list --all --since '2024-12-05T10:30:10Z' ~/Downloads/mysqld[012].log

identifier          | mariadb-galera-0                              | mariadb-galera-1                              | mariadb-galera-2
current path        | /home/dan/Downloads/mysqld0.log               | /home/dan/Downloads/mysqld1.log               | /home/dan/Downloads/mysqld2.log
last known ip       | 192.168.241.113                               | 192.168.235.34                                | 192.168.59.252
last known name     | mariadb-galera-0                              | mariadb-galera-1                              | mariadb-galera-2

2024-12-05 10:30:10 | mariadb-galera-2 suspected to be down         | mariadb-galera-2 suspected to be down         |
2024-12-05 10:30:11 | mariadb-galera-1 joined                       | mariadb-galera-0 joined                       |
2024-12-05 10:30:11 |                                               | mariadb-galera-2 left                         |
2024-12-05 10:30:11 | PRIMARY(n=2)                                  |                                               |
2024-12-05 10:30:11 | mariadb-galera-2 left                         | PRIMARY(n=2)                                  |
2024-12-05 10:30:31 |                                               |                                               | inactive check more than 1.5s (25.7908s)
                    |                                               |                                               | inactive check more than 1.5s (1.51535s)
2024-12-05 10:30:33 |                                               |                                               | mariadb-galera-0 suspected to be down
2024-12-05 10:30:34 |                                               |                                               | NON-PRIMARY(n=1)
2024-12-05 10:30:34 |                                               |                                               | SYNCED -> OPEN
2024-12-05 10:30:34 |                                               |                                               | NON-PRIMARY(n=1)
                    | mariadb-galera-1 joined                       | mariadb-galera-0 joined                       |
2024-12-05 10:30:36 | mariadb-galera-2 joined                       | mariadb-galera-2 joined                       |
                    |                                               | PRIMARY(n=3)                                  |
2024-12-05 10:30:36 |                                               |                                               | mariadb-galera-0 joined
2024-12-05 10:30:36 |                                               |                                               | mariadb-galera-1 joined
2024-12-05 10:30:36 |                                               |                                               | PRIMARY(n=3)
2024-12-05 10:30:37 |                                               |                                               | OPEN -> PRIMARY
2024-12-05 10:30:37 | PRIMARY(n=3)                                  |                                               |
                    |                                               |                                               | will receive IST(seqno:2082)
                    |                                               |                                               | mariadb-galera-0 will resync local node
                    |                                               |                                               | PRIMARY -> JOINER
                    |                                               |                                               | got SST from mariadb-galera-0
2024-12-05 10:30:38 | local node will resync mariadb-galera-2       | mariadb-galera-0 will resync mariadb-galera-2 |
2024-12-05 10:30:38 | SYNCED -> DONOR                               | mariadb-galera-0 synced mariadb-galera-2      |
2024-12-05 10:30:38 | IST to mariadb-galera-2(seqno:2082)           |                                               |
2024-12-05 10:30:38 | finished sending IST to mariadb-galera-2      |                                               |
2024-12-05 10:30:38 | DESYNCED -> JOINED                            |                                               |
2024-12-05 10:30:38 | JOINED -> SYNCED                              |                                               |
2024-12-05 10:30:47 |                                               |                                               | received shutdown
2024-12-05 10:30:49 | mariadb-galera-1 suspected to be down         |                                               | mariadb-galera-1 suspected to be down
2024-12-05 10:30:50 | NON-PRIMARY(n=1)                              |                                               | NON-PRIMARY(n=1)
2024-12-05 10:30:50 | SYNCED -> OPEN                                |                                               | JOINER -> OPEN
2024-12-05 10:30:50 |                                               |                                               | OPEN -> CLOSED
2024-12-05 10:30:50 | mariadb-galera-2 left                         |                                               |
2024-12-05 10:30:50 | NON-PRIMARY(n=1)                              |                                               |
2024-12-05 10:30:53 |                                               |                                               | IST received(seqno:2082)
2024-12-05 10:30:54 |                                               |                                               | shutdown complete
2024-12-05 10:31:14 | received shutdown                             |                                               |
2024-12-05 10:31:14 | OPEN -> CLOSED                                |                                               |
2024-12-05 10:31:16 | shutdown complete                             |                                               |
2024-12-05 10:31:32 |                                               | inactive check more than 1.5s (47.837s)       |
2024-12-05 10:31:34 |                                               | NON-PRIMARY(n=1)                              |
2024-12-05 10:31:34 |                                               | SYNCED -> OPEN                                |
2024-12-05 10:31:34 |                                               | NON-PRIMARY(n=1)                              |
2024-12-05 10:32:07 |                                               | received shutdown                             |
2024-12-05 10:32:08 |                                               | OPEN -> CLOSED                                |
2024-12-05 10:32:11 |                                               | shutdown complete                             |
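The explainer output interleaves each node's events by timestamp. For anyone who wants to script a similar per-node view, here is a minimal Python sketch of that merging step. It is not how pt-galera-log-explainer is implemented; the "WSREP: Shifting A -> B" line format it matches is an assumption (wording can vary between Galera/MariaDB versions), and the sample lines are hypothetical, not copied from the attached logs.

```python
import re
from datetime import datetime

# Assumed line format (hypothetical example):
#   2024-12-05 10:30:38 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 2082)
SHIFT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*WSREP: Shifting "
    r"(?P<src>[A-Z]+) -> (?P<dst>[A-Z]+)"
)


def state_changes(node, lines):
    """Yield (timestamp, node, "SRC -> DST") for every state-change line."""
    for line in lines:
        m = SHIFT_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
            yield ts, node, f"{m['src']} -> {m['dst']}"


def merged_timeline(logs):
    """logs maps node name -> iterable of log lines; returns events sorted by time."""
    events = []
    for node, lines in logs.items():
        events.extend(state_changes(node, lines))
    # sorted() is stable: events with equal timestamps keep their insertion order.
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the attached logs.
    logs = {
        "mariadb-galera-0": [
            "2024-12-05 10:30:38 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 2082)",
            "2024-12-05 10:30:38 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2082)",
        ],
        "mariadb-galera-2": [
            "2024-12-05 10:30:37 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2082)",
        ],
    }
    for ts, node, change in merged_timeline(logs):
        print(ts.time(), node, change)
```

Lines that do not match the pattern are simply skipped, so whole log files can be passed in unfiltered.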
So galera-2 timed out:
2024-12-05 10:30:30 0 [Note] WSREP: (9e537da6-895d, 'tcp://0.0.0.0:4567') connection to peer 8e07881d-841b with addr tcp://192.168.235.34:4567 timed out, no messages seen in PT3S, socket stats: rtt: 14643 rttvar: 17547 rto: 215000 lost: 0 last_data_recv: 21621 cwnd: 8 last_queued_since: 1519241031 last_delivered_since: 24133431088 send_queue_length: 7 send_queue_bytes: 588 segment: 0 messages: 7
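That timeout message packs the peer, its address, the silence period (PT3S is an ISO 8601 duration, i.e. 3 seconds) and the socket stats into one line. A minimal sketch that pulls those fields apart when scanning many logs, assuming the exact message format quoted above; the stat values are returned as raw integers because the log does not state their units.

```python
import re

# Matches the "connection to peer ... timed out" message quoted above.
# Only durations written in seconds (e.g. PT3S) are handled.
TIMEOUT_RE = re.compile(
    r"connection to peer (?P<peer>\S+) with addr (?P<addr>\S+) timed out, "
    r"no messages seen in PT(?P<secs>\d+(?:\.\d+)?)S, socket stats: (?P<stats>.*)$"
)
# The socket stats are a flat list of "name: integer" pairs.
STAT_RE = re.compile(r"(\w+): (\d+)")


def parse_timeout(line):
    """Return the peer, address, silence period and socket stats, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return {
        "peer": m["peer"],
        "addr": m["addr"],
        "silence_seconds": float(m["secs"]),
        # Values are kept as raw integers; their units are not stated in the log.
        "stats": {name: int(value) for name, value in STAT_RE.findall(m["stats"])},
    }


if __name__ == "__main__":
    line = (
        "2024-12-05 10:30:30 0 [Note] WSREP: (9e537da6-895d, 'tcp://0.0.0.0:4567') "
        "connection to peer 8e07881d-841b with addr tcp://192.168.235.34:4567 timed out, "
        "no messages seen in PT3S, socket stats: rtt: 14643 rttvar: 17547 rto: 215000 "
        "lost: 0 last_data_recv: 21621 cwnd: 8 last_queued_since: 1519241031 "
        "last_delivered_since: 24133431088 send_queue_length: 7 send_queue_bytes: 588 "
        "segment: 0 messages: 7"
    )
    info = parse_timeout(line)
    print(info["addr"], info["silence_seconds"], info["stats"]["send_queue_length"])
```

Comparing the address against the "last known ip" row of the explainer output shows which node the timeout refers to.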
It did an SST recovery from galera-0; everything seemed successful, and then it was immediately shut down (by Bitnami? Hard to say, as there are no logs after the startup at 07:55).
Frequent errors:
Slave SQL: Error 'Can't create table `idatafusion`.`agent` (errno: 121 "Duplicate key on write or update")' on query. Default database: 'idatafusion'. Query: 'ALTER TABLE `agent` ADD CONSTRAINT `fk_agent_version_info` FOREIGN KEY (`version_id`) REFERENCES `version`(`id`)', Internal MariaDB error code: 1005
I'm unsure whether this was meant to replicate, but the next log message says it was ignored.
I can't see anything about data disappearing, only the container being shut down.
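One common cause of errno 121 on ADD CONSTRAINT ... FOREIGN KEY is a constraint name that already exists in the same database, since InnoDB requires foreign key names to be unique per database. Assuming that is the cause here (the logs do not confirm it), a minimal sketch that lists which tables already use a given constraint name, given (table, constraint) rows from a query such as SELECT TABLE_NAME, CONSTRAINT_NAME FROM information_schema.TABLE_CONSTRAINTS WHERE CONSTRAINT_SCHEMA = 'idatafusion' AND CONSTRAINT_TYPE = 'FOREIGN KEY'. The function name and the sample rows are hypothetical.

```python
def tables_using_constraint(rows, constraint_name):
    """Return the sorted, de-duplicated tables that already define constraint_name.

    rows: (table_name, constraint_name) pairs for a single schema, e.g. the
    result of the information_schema query described above.
    """
    return sorted({table for table, name in rows if name == constraint_name})


if __name__ == "__main__":
    # Hypothetical rows; real ones would come from information_schema.
    rows = [("agent", "fk_agent_version_info"), ("task", "fk_task_agent")]
    taken_by = tables_using_constraint(rows, "fk_agent_version_info")
    if taken_by:
        print("fk_agent_version_info already exists on:", ", ".join(taken_by))
```

If the only match is the agent table itself, the ALTER was most likely applied twice rather than clashing with another table.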
You have Helm configured to make /bitnami/mariadb/data a persistent volume, right?
Yes, /bitnami/mariadb/data is a persistent volume.
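One way to double-check this from inside a pod is to find which mount point the data directory sits on: if the nearest mount point above /bitnami/mariadb/data is /, the files are typically on the container's own filesystem rather than on a separately mounted volume. A minimal sketch, assuming Python is available wherever it is run; mount_point_of is a hypothetical helper name.

```python
import os


def mount_point_of(path):
    """Walk up from path until reaching a directory that is a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the top without finding a mount point
            break
        path = parent
    return path


if __name__ == "__main__":
    data_dir = "/bitnami/mariadb/data"  # path from this issue; adjust if mounted elsewhere
    if os.path.isdir(data_dir):
        mount = mount_point_of(data_dir)
        print(f"{data_dir} is on mount point {mount}")
        if mount == "/":
            print("warning: data directory is not on a separately mounted volume")
    else:
        print(f"{data_dir} does not exist here")
```

Walking upwards matters because the volume may be mounted on a parent such as /bitnami/mariadb, in which case os.path.ismount on the data directory itself would return False even though the data is persistent.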
The files for mariadb-galera-1 after the reboot are as follows:
The files for mariadb-galera-2 after the reboot are as follows:
The files for mariadb-galera-0 are lost.
I looked at which node was selected to start first: mariadb-galera-1 was chosen as the master node.
I don't know if any of this helps. Thanks for the answer.
The file listing for mariadb-galera-0 is blank.
The "not a database .sst" message could be the result of the SST not finishing donating to that node. There were insufficient logs to say whether this was definitely the case, and not enough to suggest what may have occurred, sorry.
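A quick way to check each node's data directory for this state is to look for a leftover .sst subdirectory and for grastate.dat. A minimal sketch, assuming (as suggested above) that a remaining .sst directory means an SST did not finish; neither check proves what happened, and sst_leftovers is a hypothetical helper name.

```python
import os
import tempfile


def sst_leftovers(datadir):
    """Report SST-related markers found at the top level of datadir."""
    entries = set(os.listdir(datadir))
    return {
        # Assumed marker of an unfinished SST, as suggested in this issue.
        "has_sst_dir": os.path.isdir(os.path.join(datadir, ".sst")),
        # Galera's saved-state file; missing if the directory was emptied
        # or never initialized.
        "has_grastate": "grastate.dat" in entries,
        "entry_count": len(entries),
    }


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics an interrupted SST.
    with tempfile.TemporaryDirectory() as datadir:
        os.mkdir(os.path.join(datadir, ".sst"))
        print(sst_leftovers(datadir))
```

Running it against each node's data directory and attaching the printed result as text would also make the file listings searchable.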
I recommend upgrading to a later 10.6 version, as there have been a considerable number of fixes since 10.6.8, and bootstrapping from your most up-to-date Galera node.
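Picking the node to bootstrap from usually comes down to each node's grastate.dat: prefer the node with safe_to_bootstrap: 1, otherwise the one with the highest seqno. A seqno of -1 means the position was not saved (typical after a crash) and has to be recovered first, for example with mysqld --wsrep-recover. A minimal sketch of that decision, with hypothetical node contents; it only reads grastate.dat text and does not start anything.

```python
def parse_grastate(text):
    """Parse the "key: value" lines of grastate.dat, skipping comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(grastates):
    """grastates maps node name -> grastate.dat text.

    Returns the node to bootstrap, or None when that cannot be decided from
    grastate.dat alone.
    """
    parsed = {node: parse_grastate(text) for node, text in grastates.items()}
    safe = [node for node, s in parsed.items() if s.get("safe_to_bootstrap") == "1"]
    if len(safe) == 1:
        return safe[0]
    seqnos = {node: int(s.get("seqno", "-1")) for node, s in parsed.items()}
    # -1 means the position was not saved (e.g. after a crash); recover it
    # first, e.g. with mysqld --wsrep-recover, before comparing nodes.
    if not seqnos or any(v < 0 for v in seqnos.values()):
        return None
    return max(seqnos, key=seqnos.get)


if __name__ == "__main__":
    # Hypothetical grastate.dat contents for two nodes.
    node0 = "# GALERA saved state\nversion: 2.1\nseqno: 2082\nsafe_to_bootstrap: 0\n"
    node1 = "# GALERA saved state\nversion: 2.1\nseqno: 2081\nsafe_to_bootstrap: 0\n"
    print(pick_bootstrap_node({"mariadb-galera-0": node0, "mariadb-galera-1": node1}))
```

Returning None instead of guessing is deliberate: a node whose seqno is -1 may actually be the most advanced one, so comparing only the known values could bootstrap from a stale node.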
I'm going to close this as incomplete for now, but if there's further information, especially on a later version, this can be examined.
Some container logs would provide some searchable material of what might have gone wrong. Can you attach these as text files (not images)?
Look for similar issues in the release notes from 10.6.8 onwards:
https://mariadb.com/kb/en/release-notes-mariadb-106-series/
Also note that we will no longer be providing CentOS 7 releases; see https://mariadb.com/kb/en/mariadb-platform-deprecation-policy/