[MDEV-32162] debian-start attempts to use DB before WSREP/Galera complete Created: 2023-09-13 Updated: 2024-01-08 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Platform Debian |
| Affects Version/s: | 10.6.15 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Brendon Abbott | Assignee: | Tuukka Pasanen |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Ubuntu Jammy. Standard community packages |
||
| Description |
|
I am upgrading our platform from 10.4 to 10.6, using Galera. Looking at how the server is started, the systemd service unit runs several pre-start steps (to handle Galera variables), launches the main mariadbd process, and then performs post-start operations, i.e. runs debian-start. I think mariadbd is forking its process and therefore returning control to systemd before WSREP is ready to accept queries. Meanwhile, debian-start is already running and attempting to ensure that the root accounts are secure. I am using `galera_new_cluster` to bring up the first node of the cluster; I have not yet joined a second node to see whether the issue also occurs there. I have only started looking at 10.6.15, so I am unsure in which version this problem first appeared. I can't recall having the issue in 10.4. The MariaDB log entries are kept separate from syslog.
|
| Comments |
| Comment by Tuukka Pasanen [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have a question before starting to work on this: do you have a Galera cluster configured? It does not change much, as Galera seems to be started too early, but it is useful background information. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tuukka Pasanen [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Needs feedback on whether the user has a Galera cluster configured, as it appears. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brendon Abbott [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Tuukka, yes, I do have Galera configured. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tuukka Pasanen [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you for the quick and clear response. I assume this is the line that fails: https://github.com/MariaDB/server/blob/1fc2843eeece8500e20bd4f0a814996fc25b3e01/debian/additions/debian-start.inc.sh#L75 as the last log entry comes from just above it. Can you check whether that is correct? The file is located at /usr/share/mysql/debian-start.inc.sh | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brendon Abbott [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi. Yes, it is. I tweaked the script with `set -x` to be doubly sure.
I am currently testing this on a VirtualBox VM on my development machine, which may be under high load. We are further along with our R&D now, so I am also getting access to a test Galera cluster on AWS to see whether it behaves differently. I thought this might be useful in case the problem is related to memory or CPU load. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tuukka Pasanen [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If you add a small delay above the SQL statement, such as `sleep 5` (not a solution, but it helps to debug the behavior), does Galera have time to start correctly? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brendon Abbott [ 2023-11-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, I went one better and made a crude loop that queries the local Galera state, sleeping for 1 second between attempts. It always loops 9 times (I chose 5 at first, but that wasn't enough). Similarly, the logs in the original description suggest that startup took about 3 seconds, if we can assume that `WSREP: Synchronized with group, ready for connections` marks the point at which the node is available. How does an IST (Incremental State Transfer) come into this? If that also happens within this window, I guess the delay is going to be highly variable.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
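The crude wait loop described in the comment above can be sketched with a deadline instead of a hand-tuned iteration count, which may matter if an IST makes the startup time vary. This is a minimal Python sketch, not the actual fix in debian-start.inc.sh: `wait_for_galera`, `galera_is_ready` and the `read_status` callable are hypothetical, and treating `wsrep_ready = ON` together with `wsrep_local_state_comment = Synced` as "ready for queries" is an assumption. In the real shell script, `read_status` would correspond to something like `SHOW GLOBAL STATUS LIKE 'wsrep_%'` run through the mysql client.

```python
import time
from typing import Callable, Dict


def galera_is_ready(status: Dict[str, str]) -> bool:
    """Assumed readiness test: wsrep_ready is ON and the node reports Synced."""
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state_comment") == "Synced"
    )


def wait_for_galera(
    read_status: Callable[[], Dict[str, str]],
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll read_status() until the node looks ready or `timeout` seconds pass.

    Returns True if the node became ready, False on timeout. A deadline,
    rather than a fixed loop count, bounds the wait, so a slower start
    (for example one that includes an IST) needs no re-tuning.
    """
    deadline = clock() + timeout
    while True:
        if galera_is_ready(read_status()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated node that becomes ready on the 4th poll (no real server needed).
states = iter([
    {"wsrep_ready": "OFF", "wsrep_local_state_comment": "Joining"},
    {"wsrep_ready": "OFF", "wsrep_local_state_comment": "Joined"},
    {"wsrep_ready": "OFF", "wsrep_local_state_comment": "Joined"},
    {"wsrep_ready": "ON", "wsrep_local_state_comment": "Synced"},
])
print(wait_for_galera(lambda: next(states), timeout=10, interval=0.01))  # True
```

The `clock` and `sleep` parameters are injectable only so the loop can be exercised without a live server or real waiting; a timeout of a few tens of seconds is an arbitrary placeholder.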
| Comment by Tuukka Pasanen [ 2023-11-07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I need to learn a bit more about Galera, as I'm not very familiar with it. This could be one solution; I'm just wondering what happens when Galera is not available. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brendon Abbott [ 2023-11-07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, that is possibly one solution. However, I imagine there are more systems without Galera configured than with it. There is also the question of what changed in the server process since 10.4 to break this. I am fairly sure this used not to be a problem, although it may have been occurring but logged less severely. Hopefully someone else from MariaDB can fill you in on the Galera details, as determining how ready the server is to handle requests becomes quite complex. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
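To address the concern about servers without Galera, the wait could be made conditional so that a plain server never pays for it. A minimal Python sketch, assuming the `wsrep_on` system variable is the signal (absent or `OFF` is treated as "no Galera"); `galera_enabled`, `ready_for_post_start_checks` and both callables are hypothetical names, and `wait_until_ready` stands for any bounded readiness poll.

```python
from typing import Callable, Dict


def galera_enabled(variables: Dict[str, str]) -> bool:
    """Assumption: a missing or OFF wsrep_on variable means Galera is not in use."""
    return variables.get("wsrep_on", "OFF").upper() == "ON"


def ready_for_post_start_checks(
    read_variables: Callable[[], Dict[str, str]],
    wait_until_ready: Callable[[], bool],
) -> bool:
    """Return True when the post-start SQL checks (e.g. root account checks) may run."""
    if not galera_enabled(read_variables()):
        # No Galera: assume the server is usable once systemd reports it started.
        return True
    # Galera: only proceed once the (hypothetical) readiness poll succeeds.
    return wait_until_ready()


# Usage with stand-in callables (no server involved):
print(ready_for_post_start_checks(lambda: {}, lambda: False))  # True, no wait needed
print(ready_for_post_start_checks(lambda: {"wsrep_on": "ON"}, lambda: True))  # True after waiting
```

Keying the decision on `wsrep_on` keeps the common non-Galera path identical to today's behavior; whether that variable is the right signal for every deployment is an open assumption.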
| Comment by Tuukka Pasanen [ 2023-11-07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, some source reading is needed to find out why it prints that error and which commit introduced the change. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tuukka Pasanen [ 2023-11-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
OK, I've done a bit of homework on this (not enough, though), and as I haven't set up a Galera cluster, could you provide what the following returns:
show (before and after it is in a working state) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brendon Abbott [ 2024-01-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The requested information is shown in the bash output in a previous comment. |