[MDEV-20115] 10.4 crash after upgrade from 10.3 ( /usr/sbin/mysqld: Thread 11 (user : '') did not exit) Created: 2019-07-22 Updated: 2022-03-17 Resolved: 2022-03-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Server |
| Affects Version/s: | 10.4.6 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Luke Alexander | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Incomplete | Votes: | 8 |
| Labels: | None | ||
| Environment: |
Ubuntu 16.04 4.15.0-55-generic #60~16.04.2-Ubuntu |
||
| Description |
|
We have just upgraded from 10.3 to 10.4 on our slave database server to try to move away from a SEGV(11) issue we were seeing on our master server running 10.3 (no stack trace available for that issue). The upgrade seemed to go OK, but I needed to reboot the server after a kernel update. Since the reboot, the mariadb instance on the slave will not start (the master is still running 10.3). The error log shows:
|
| Comments |
| Comment by Luke Alexander [ 2019-07-22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It seems that something wrote this zero-byte file: `/var/lib/mysql/debian-10.4.flag`. On removing that file, the server was able to start up again! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Luke Alexander [ 2019-07-23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Actually the crash we had on 10.3 appears to have come back, all we have in syslog is:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-08-01 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi. I see you have two issues: a crash and something from Debian packaging. I'm not an expert in the latter, but it seems to be related to binary format compatibility: https://github.com/MariaDB/server/blob/b428b09997d172f29fc201b9ab05c160ef4cbc39/debian/po/templates.pot#L30 Now about the crash. Why don't you have a stack trace? Did MariaDB print one for you? I can't help without additional info. One more thing: bugs are fixed in the oldest affected version first, and the fix is then merged into the more recent branches. That means that to get a fix you can just upgrade to a newer minor version of the server. Of course, upgrading to a new major version could help too, but that could mean the bug disappeared by accident. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Luke Alexander [ 2019-08-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
That is what I find strange: there is no crash dump, no core file, nothing in the logs as to why mariadb crashes, just the SEGV message in syslog. If you are able to advise me how to enable crash dump reporting, that would be great. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Luke Alexander [ 2019-08-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Also of note about the start/restart issue, the server appears to restart normally - then just when you would expect connections to resume it dies with the errors as below:
The process is still running (and consuming resources):
But the mysql.sock file has disappeared:
The pid file exists:
The process has many threads in operation still:
Sometimes if I kill the process with a hammer (`kill -9`) it will then restart OK, other times I have seen it restart on its own, and other times I will spend a long time repeating the same operation until eventually it starts responding and the socket file comes back... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Luke Alexander [ 2019-08-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Could this be related to the SEGV 11 we are experiencing? https://jira.mariadb.org/browse/MDEV-20108 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-08-08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Could you try adding `--core-file` to your daemon? Maybe this could help us get more info about the crash. Another option is to run `gdb -p $YOUR_PID` and then, from gdb, run `thread apply all bt`. This is for the case when the daemon hangs. As of | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
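The gdb step suggested above can be run non-interactively. A minimal sketch, assuming gdb is installed and the caller is allowed to attach to the server process (usually root); the function name `dump_all_threads` and the default output file name are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Sketch: save backtraces of every thread of a running (possibly hung) mysqld.
# Assumptions: gdb is installed; the caller may attach to the PID.
# dump_all_threads and mysqld_backtrace.txt are illustrative names.

dump_all_threads() {
    local pid="$1"
    local out="${2:-mysqld_backtrace.txt}"
    # -batch makes gdb run the -ex commands and then exit, detaching again.
    gdb -p "$pid" -batch -ex 'thread apply all bt' > "$out" 2>&1
}
```

It could be called as `dump_all_threads "$(pidof mysqld)"` while the daemon is stuck. Without the debug symbols installed, many frames will probably show up as `??`.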
| Comment by Luke Alexander [ 2019-08-12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've attached a gdb file to this ticket, I don't know if that will help debug this or not... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by acsfer [ 2019-08-13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Similar bug here (after upgrade from 10.3 and symptoms) https://jira.mariadb.org/browse/MDEV-20319 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by acsfer [ 2019-08-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In my case, if I issue a `STOP SLAVE;` before `systemctl stop|restart mariadb`, I have no problems (I only hit this exact issue, with the same log output and the process still running, when the server needs to be stopped or restarted). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Luke Alexander [ 2019-08-21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We have some more data on this one, for some reason stack-trace was not enabled, it is now, this is from syslog:
This is the trace in mysql error log
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-08-29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
lukealexander I think in Ubuntu you also have to install a package with debug symbols. I don't know exactly what it's called. That's probably the reason for the ?? in your gdb stack traces: it's hard to say anything from them. Probably you have | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-08-29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The last crash is something unrelated. Could you disassemble vio_read() like this: `gdb -batch -ex 'file mysqld' -ex 'disassemble vio_read'`? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chris [ 2019-09-13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I also experience this same issue on Debian Buster using 10.4.7-MariaDB-1:10.4.7+maria~buster. As per previous notes, if I stop the slave I can stop or restart the server; otherwise it will get stuck waiting. As a workaround I drop in a systemd configuration file with an ExecStop action to stop the slave. If you want to do this for Debian (and probably Ubuntu):
1. Create the /etc/systemd/system/mariadb.service.d directory if it does not already exist
3. Reload systemd daemon: systemctl daemon-reload | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
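The drop-in file itself is not shown in the steps above. A rough sketch of what such a drop-in might look like follows; it is not the commenter's original file. The file name `stop-slave.conf`, the helper name `write_stop_slave_dropin`, and the assumption that root can run `/usr/bin/mysql` through the local socket without a password are all illustrative:

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in that stops replication before shutdown.
# Assumptions (not from the original comment): the file name stop-slave.conf,
# the client path /usr/bin/mysql, and passwordless local root access.

write_stop_slave_dropin() {
    local dir="${1:-/etc/systemd/system/mariadb.service.d}"
    mkdir -p "$dir"
    # systemd runs ExecStop first, then sends its normal stop signal to mysqld.
    cat > "$dir/stop-slave.conf" <<'EOF'
[Service]
ExecStop=/usr/bin/mysql -e "STOP SLAVE"
EOF
}
```

After running `write_stop_slave_dropin` as root, `systemctl daemon-reload` makes systemd pick up the new file.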
| Comment by Stefan Reger [ 2019-10-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have a similar problem with Debian Buster running 10.4.8+maria~buster with parallel replication enabled: On "systemctl stop mariadb" my log shows: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by acsfer [ 2019-11-13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In the same vein: when trying to update the server from APT, the server can't be stopped (same reasons), so timeouts occur and the only solution is `kill -9 PID`
So;
Otherwise, it crashes. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Rishøj [ 2019-12-12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is still an issue (with 10.4.10). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ranjan Ghosh [ 2019-12-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hm. Sth. is seriously wrong here IMO. APT upgrade also never worked for me. Perhaps any connection to #15748? I just did an "apt upgrade" on the first node of a 2-node cluster and the process got stuck:
and with journalctl I get this:
and that's about it. No new log entries for 10 minutes now. Nothing. I will kill mysqld now, but I also wanted to chime in because it's not really a great upgrading experience. This should be easier. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ranjan Ghosh [ 2019-12-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
BTW: In my case it wasn't even a 10.3 => 10.4 upgrade but only a 10.4.10 to 10.4.11... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-12-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It sounds like the server process may have been hung. If you are able to reproduce it, then it might help to get a gdb back trace of all threads when the process is hung. See here: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ranjan Ghosh [ 2019-12-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
@GeoffMontee Thanks for the info. Currently I cannot reproduce it, because it was a live server where I obviously don't want to force that situation again. I'm glad that it's running now. I simply did an "apt upgrade". Before it was 10.4.10, now 10.4.11. APT tried to stop the server process and then got stuck. (On an admittedly somewhat off-topic side note, I still find it somewhat disconcerting how "crashy" Galera is in general on restarts. If our 2-node cluster is running and you don't touch it: great. But every other time I try to restart one of the nodes for some reason, all hell breaks loose. More often than not, I find that I can only really get out of the situation if I wipe /var/lib/mysql on one of the nodes and let it resync everything. If the wise MariaDB folks are reading this: please, please consider adding more tests for these kinds of restarts. It's terribly buggy IMHO.) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Rishøj [ 2020-01-29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
While similar to | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeremy Haozhe Luo [ 2020-07-19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I found here from I have a systemd stop/restart mariadb.service hang issue with a newly installed MariaDB 10.5.4 on CentOS Linux release 7.6.1810 (Core), not an upgrade case. I tried editing the systemd mariadb.service, and then it works fine. This is what I did:
then add one line in [Service] section:
The value is my mariadb instance's pid file path; you should put yours there.
Once hung already after stop/restart, I try
can do the same thing. Then systemd stop/restart mariadb.service works for me. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2020-07-20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
jeremy.l, for minor systemd service file changes I recommend either `systemctl edit mariadb.service` to add a particular line or a drop-in like gbe0 did above. This way more substantial changes in upgrades don't get ignored and your desired effect still applies. The systemd service file is written not to SIGKILL (aka SendSIGKILL=NO) mariadb and to let it shut down smoothly, because the recovery can be just as time-expensive, so solutions around SIGKILL don't help in understanding the problem. More recent systemd versions (v242+) will not start a new mariadb instance while the previous one is shutting down (preventing the duplicate running instances seen in the Pidfiles were left out of mariadb packaged systemd files from the start because when systemd has a reference to the child process a pidfile isn't needed. Having said all that, identifying the cause of the hang/lack of completion during shutdown is an issue that still needs resolving. jeremy.l, if the hang is related to async/classical/master/replica/slave replication then please take a gdb backtrace and include it as an attachment along with the mariadb error log and `journalctl -n 40 -u mariadb.service`. If in doubt as to whether this is replication related, please create a new bug report. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
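The journal output and error log requested above can be gathered into one directory before attaching them. A minimal sketch; the function name `collect_shutdown_diagnostics`, the default directory `mariadb-diag`, and the output file name `journal.txt` are hypothetical, and the error log path has to be supplied by the caller because its location depends on the installation:

```shell
#!/usr/bin/env bash
# Sketch: collect the last journal lines for mariadb.service plus the error log.
# collect_shutdown_diagnostics, mariadb-diag and journal.txt are illustrative names.

collect_shutdown_diagnostics() {
    local outdir="${1:-mariadb-diag}"
    local errlog="${2:-}"   # path to the MariaDB error log, if known
    mkdir -p "$outdir"
    # Same journalctl invocation as requested in the comment above.
    journalctl -n 40 -u mariadb.service > "$outdir/journal.txt" 2>&1
    # Copy the error log only when a readable path was given.
    if [ -n "$errlog" ] && [ -r "$errlog" ]; then
        cp "$errlog" "$outdir/"
    fi
}
```

For example, `collect_shutdown_diagnostics /tmp/mdev-diag /path/to/error.log` leaves both files under `/tmp/mdev-diag`, ready to attach together with a gdb backtrace.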
| Comment by Sergey Dushenkov [ 2022-01-14 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi all, I wondered what had happened and what changes I had made to my system recently... And the ONLY change I made several days ago was a hostname change! So I just changed my hostname back to the old one and voilà, the issue is gone. Now comes the question: where should I look to update my config properly so that it matches the new hostname? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-02-21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I see that the Galera cluster has been mentioned at least in a few comments. MariaDB Server 10.3 used Galera library version 3, while 10.4 and later use Galera 4. Maybe something in the Galera upgrade is not working correctly. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2022-03-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
What is clear is that a rolling upgrade from 10.3 to 10.4 is not supported. You need to cleanly shut down all your nodes, do the upgrade, and restart the nodes. 10.4.6 is quite old, so I recommend using a more recent version of MariaDB Server and the Galera library. If these steps do not help, please send more information, e.g. full error logs. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Luke Alexander [ 2022-03-15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi, we upgraded a few times, and somewhere along the way this error disappeared/got fixed. We are now on 10.4.21, so from my point of view the original issue is resolved. Thanks |