Under a couple of circumstances a SELECT statement makes the mysql to behave really weird, the query takes forever and can not be killed (it gets locked in status killed forever if you kill it). The only way to get out of these unhealthy situation is to kill -9 the mysqld process. The statement must be run several times simultaneously to reproduce the problem.
This is the simplest version of my query which made it happen:
To make it fail I just run it simultaneously, I use mysqlslap to achieve it:
mysqlslap --concurrency=11 --query=last-t.sql --no-drop -uusername --create-schema=testcrash --delimiter=";"
Where last-t.sql is the attached file containing the above mentioned query, and testcrash is the DB schema which can be restored using the attached dump file (tiny-tc.sql.gz).
Here are some elements I have already tested which could help to understand the causes:
- mysqlslap makes it fails with 11 as concurrency and, of course higher numbers. It doesn't with lower values. Most probably it depends on the hardware set, I tested it mainly in 36 cores envir.
- all the parts of the query must be present to reproduce it.
- one of the mandatory parts are the "text" columns included in the select list clause. If you just replace them with the id (integer) column of the same table it just stops failing.
- it fails also replacing the concat_group function with another aggregate function such as count(1).
- it doesn't fail if you force any other plan, for example by adding straight_join hint.
- The contents of t1 table, the largest table because of the html text column must be as large as they are in the dump. Running exactly the same mysqlslap test with lighter contents prevents the crash.
This is the processlist you get when the behaviour is already happening:
The last one (query 165) was killed but it gets hang on "killed" status for ever.