[MXS-976] Crash in libqc_sqlite Created: 2016-11-16 Updated: 2016-11-18 Resolved: 2016-11-18 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | qc_sqlite |
| Affects Version/s: | 2.0.1 |
| Fix Version/s: | 2.0.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Mikko Mensonen | Assignee: | Johan Wikman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Debian 8 / VMWare |
||
| Issue Links: |
|
||||||||
| Description |
|
Hello, as originally reported on the google forum, one of our maxscale instances, fresh updated from 1.2.1 to 2.0.1 is experiencing a hard crash repeatedly:
The crash happens practically constantly (2-3 times a minute) when under load, can't reproduce it under testing conditions. |
| Comments |
| Comment by Johan Wikman [ 2016-11-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Deciphered stacktrace
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
mensonen Would you know what kind of statement is being handled when the crash occurs? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mikko Mensonen [ 2016-11-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Not with any certainty, no. For the short duration that I ran this in production, the server was running for a while with info and debug logging enabled, but looking at the queries that the log reported prior to each crash, I can't find anything in common. I can't really state with any certainty either whether there were queries outside of what was logged (apparently maxscale doesn't really log every single query?). Talking about about 700-800 queries per second at that particular time frame, all of them different in nature; a mixed set of selects, updates and inserts. Not sure if this is of any help, but I extracted some snippets from the logs just before each crash: Crash 1
Crash 2
Crash 3
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the log snippets, even though there were no smoking gun The QLA filter https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale-14/maxscale-query-log-all-filter can be used for logging all queries, although in a crash case like this, the logged query might not hit the disk before the process crashes. If I were to build a version of `qc_sqlite` that logs all queries to disk before parsing them, would you be interested in testing with that, so that we would get the precise statement that is being processed when the crash occurs? I understand if you are not, since it will cause a performance hit. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I might add that I'm actually pretty convinced this is not related to the precise statement. Namely, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If you have a core file that it is possible to share, please let me know. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mikko Mensonen [ 2016-11-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, I saw We do have one person using sequel pro, just like in that ticket, but they weren't trying to connect at the time of the crash. Just to be sure, I ran now an isolated test, letting them connect alone to the maxscale instance that crashed yesterday and run some queries that they usually do, but everything ran smoothly. The only other client tool we use is mysql workbench and I tested that too - no crashes. Also tried throwing similar traffic to it as at the time of the crash, but from a test bench (so considerably lower query rate than with production traffic), also no crashes. I'm willing to run a special build for a short while under production load, if you think that will reveal something (but as you said, probably not a single statement..). No core dump from that time, but I can try to produce one. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks. I'll continue investigating for a while before making any modifications, because I think I am on to something. Namely, the crash occurs when sqlite is parsing a select of its own, due to some other statement being parsed. That it does, for instance, at least, when a CREATE TABLE statement is parsed for the first time, for the current thread. I'll see if I could change that, so that all needed initialization would be performed at startup. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mikko Mensonen [ 2016-11-17 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A minor update; I have been running maxscale in terminal against production traffic for 45 minutes now, trying to produce a core dump, but no crash has taken place (yet). The only difference is that the average query rate is today considerably lower than it was yesterday (only ~100 / second, compared to yesterday's ~500 / second). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mikko Mensonen [ 2016-11-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I can now reliably reproduce the crash, simply by attempting to login onto the server with Sequel Pro, versions at least 1.0.2 or 1.1-rc1, while maxscale is otherwise idling with no traffic at all. Newest Sequel Pro version does not seem to produce a crash at all (1.1.2 tested too). Core dump is available if you still need it. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks, that's good to know. No, a core dump is not needed anymore as I've successfully reproduced this myself now. But thanks for the offer! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Johan Wikman [ 2016-11-18 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fixed by commit https://github.com/mariadb-corporation/MaxScale/commit/ecb6680e71f064e1f14c316558ab4f688fec9265 |