[MDEV-9825] mariadb 10.1 crashing Created: 2016-03-29 Updated: 2020-10-20 Resolved: 2020-10-20 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 10.1.12 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Alex | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Hello, I have run into a few problems: 1. Master crashing 2. Backup slaves crashing. I have seen nothing in dmesg, logs, or anywhere else. I know I need to compile/install a debug version and tune my.cnf. But for now all systems run on 10.0 without any problem, and no crash has been seen so far (maybe this will happen some day and it relates to CentOS 7 or other settings I've missed, who knows). Thanks |
| Comments |
| Comment by Elena Stepanova [ 2016-03-30 ] |
|
ShivaS, With other signals, even with a non-debug and stripped version, there is always something in the logs, at least the note about receiving the signal; you just need to find it. |
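One way to look for the note about the received signal is to scan the error log for it. The snippet below is only a minimal sketch: it assumes the note has the usual "mysqld got signal N" shape, and the function names `find_signal_notes` and `scan_file` are hypothetical helpers, not part of any MariaDB tooling.

```python
import re

# Assumption: the server records a fatal signal in its error log with a line
# containing "got signal <number>", e.g. "... mysqld got signal 11 ;".
SIGNAL_RE = re.compile(r"got signal (\d+)")


def find_signal_notes(lines):
    """Return (line_number, signal_number, text) for each line noting a signal."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = SIGNAL_RE.search(line)
        if match:
            hits.append((lineno, int(match.group(1)), line.rstrip()))
    return hits


def scan_file(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return find_signal_notes(handle)
```

Calling `scan_file` on the server's error log (its location depends on the `log_error` setting) lists every line where a signal was noted, together with its line number.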
| Comment by Alex [ 2016-03-30 ] |
|
Hi Elena, About the masters: I have never seen any OOM with that setup. This is how top looks on a still-running 10.1 setup:
This server runs at 20qps and has never crashed (yet). And this is on a different machine with 10.0, which was previously upgraded to double its capacity (I thought it could help) and is running 10.0 because the upgrade didn't help and 10.1 kept crashing:
The above 10.0 machine runs at 16qps. The crash may happen under any load, whether it's 3k/sec or 20k/sec. The pattern is simple insert delay, and I am using thread-per-connection because pool-of-threads seems to be ineffective for quick queries (the server stops responding and goes to high LA and CPU usage). Even if I missed an OOM (which I believe I did not) or misconfigured it by overusing resources/settings, 10.0 is far more stable anyway, either as a master or as a slave. On the master, when the crash happens, restarts can be endless: the database starts and crashes over and over again until I manually stop it. All I see in messages is something like this:
I think that signal 11 isn't related to OOM. As for the slave, here you are right: it's OOM. However, it doesn't happen with 10.0, and even with 10.1 it's quite rare - could be twice a day or once every few days. Thanks, |
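To tell the two cases apart (an OOM kill versus a signal 11 crash), the kernel messages can be sorted by kind. This is a rough sketch under stated assumptions: the message shapes matched below ("Out of memory" / "oom-killer" for the OOM killer, "segfault at" for signal 11) are typical of Linux kernel logs but vary between kernel versions, and `classify_kernel_line` and `summarize` are hypothetical names.

```python
import re

# Assumed kernel message shapes (they differ between kernel versions):
#   "... Out of memory: Kill process 1234 (mysqld) ..."  -> OOM killer
#   "... mysqld invoked oom-killer ..."                  -> OOM killer
#   "... mysqld[1234]: segfault at ..."                  -> signal 11 (SIGSEGV)
OOM_RE = re.compile(r"Out of memory|oom-killer", re.IGNORECASE)
SEGV_RE = re.compile(r"segfault at")


def classify_kernel_line(line):
    """Return "oom", "segfault", or None for a single kernel log line."""
    if OOM_RE.search(line):
        return "oom"
    if SEGV_RE.search(line):
        return "segfault"
    return None


def summarize(lines):
    """Count OOM-killer and segfault messages in an iterable of log lines."""
    counts = {"oom": 0, "segfault": 0}
    for line in lines:
        kind = classify_kernel_line(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Feeding it the lines of the system log (for example /var/log/messages on CentOS) gives a quick count of how many restarts followed an OOM kill and how many followed a segfault.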
| Comment by Alex [ 2016-04-07 ] |
|
Hi Elena, So I guess the case can be closed. However, 10.1 crashes more frequently, and I couldn't find where it over-allocates memory compared to 10.0. Also, I noticed that under sudden loads on masters I can get a duplicated binary log entry (statement-based logging is used). The whole app level and the logs have been rechecked; it's just a double entry caused by MariaDB, and it happens in both 10.0 and 10.1. I'll give 10.1 another try on the master servers, but a bit later. There are too many problems with it (in my humble opinion), including a freeze on stop if multi-source replication wasn't stopped in advance, and parallel worker threads not displaying 'seconds behind master' info. |
| Comment by Elena Stepanova [ 2016-04-08 ] |
|
ShivaS, |
| Comment by Alex [ 2016-04-10 ] |
|
Hi Elena, However, I am here for anything you want me to test/check. I can't promise I'll be able to sacrifice production once again, but I'll do my best to help with everything I can. Thanks, |
| Comment by Elena Stepanova [ 2020-10-20 ] |
|
We've never been able to get even close to solving the mystery, closing as incomplete. |