[MDEV-16390] SELECT with text columns lasts forever and can not be killed - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.1.20
Fix Version/s: N/A
Component/s: Data Manipulation - Subquery, Data types, Storage Engine - InnoDB
Labels:
- blob
- need_feedback
Environment:
Ubuntu 16.04

Description

Under a couple of circumstances a SELECT statement makes the mysql to behave really weird, the query takes forever and can not be killed (it gets locked in status killed forever if you kill it). The only way to get out of these unhealthy situation is to kill -9 the mysqld process. The statement must be run several times simultaneously to reproduce the problem.
This is the simplest version of my query which made it happen:

SELECT

  t2.`id`,

  t11_sf1.html,

  t11_sf13076.text,

  GROUP_CONCAT( CONCAT(IFNULL(t12_dataSetTableN31727_sf31612.`f2`,""), ",", IFNULL(t12_dataSetTableN31727_sf31613.`f2`,""), ",", IFNULL(t12_dataSetTableN31727_sf31931.`f2`,"")) SEPARATOR "\n") AS `AdditionalPostingLocation`

FROM t2

JOIN t4 AS t44305 ON (t44305.`jobId` = t2.`id` AND t44305.`schemaDataId` IN (SELECT id FROM t5  WHERE (schemaSpecId = 3)))

JOIN t6 AS t11 ON (t11.`schemaDataId` = t44305.`schemaDataId`)

LEFT JOIN t1 AS t11_sf1 ON (t11_sf1.`id` = t11.`htmlContentIdf1`)

LEFT JOIN t3 AS t11_sf13076 ON (t11_sf13076.`id` = t11.`multiLineTextIdf114`)

LEFT JOIN t11 AS t12 ON (t12.`schemaDataId` = t44305.`schemaDataId` AND t12.`schemaFieldId` = 31727)

LEFT JOIN t9 AS t12_dataSetTableN31727 ON (t12_dataSetTableN31727.`schemaDataId` = t12.`sdie`)

LEFT JOIN t7 AS t12_dataSetTableN31727_sf31612 ON (t12_dataSetTableN31727_sf31612.`schemaDataId` = t12_dataSetTableN31727.`schemaData322Idf1`)

LEFT JOIN t8 AS t12_dataSetTableN31727_sf31613 ON (t12_dataSetTableN31727_sf31613.`schemaDataId` = t12_dataSetTableN31727.`schemaData323Idf2`)

LEFT JOIN t10 AS t12_dataSetTableN31727_sf31931 ON (t12_dataSetTableN31727_sf31931.`schemaDataId` = t12_dataSetTableN31727.`schemaData326Idf4`)

LEFT JOIN t12 AS t11_sf22401 ON (t11_sf22401.`schemaDataId` = t44305.`schemaDataId` AND t11_sf22401.`schemaFieldId` = 22401)

LEFT JOIN t13 AS t11_sf22401_sf22401 ON (t11_sf22401_sf22401.`id` = t11_sf22401.`schemaFieldOptionId`)

WHERE

  t11_sf22401_sf22401.`id` = 1245

GROUP BY t2.`id`

ORDER BY t2.`id` LIMIT 0, 20

To make it fail I just run it simultaneously, I use mysqlslap to achieve it:
mysqlslap --concurrency=11 --query=last-t.sql --no-drop -uusername --create-schema=testcrash --delimiter=";"
Where last-t.sql is the attached file containing the above mentioned query, and testcrash is the DB schema which can be restored using the attached dump file (tiny-tc.sql.gz).

Here are some elements I have already tested which could help to understand the causes:

mysqlslap makes it fails with 11 as concurrency and, of course higher numbers. It doesn't with lower values. Most probably it depends on the hardware set, I tested it mainly in 36 cores envir.
all the parts of the query must be present to reproduce it.
one of the mandatory parts are the "text" columns included in the select list clause. If you just replace them with the id (integer) column of the same table it just stops failing.
it fails also replacing the concat_group function with another aggregate function such as count(1).
it doesn't fail if you force any other plan, for example by adding straight_join hint.
The contents of t1 table, the largest table because of the html text column must be as large as they are in the dump. Running exactly the same mysqlslap test with lighter contents prevents the crash.

This is the processlist you get when the behaviour is already happening:

MariaDB [testcrash]> show processlist;

+-----+-----------+-----------+-----------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+----------+

| Id  | User      | Host      | db              | Command | Time | State                | Info                                                                                                 | Progress |

+-----+-----------+-----------+-----------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+----------+

|   4 | mataquery | localhost | NULL            | Sleep   |    0 |                      | NULL                                                                                                 |    0.000 |

|   6 | root      | localhost | genesysloadtest | Sleep   |  603 |                      | NULL                                                                                                 |    0.000 |

|   9 | root      | localhost | testcrash       | Query   |    0 | init                 | show processlist                                                                                     |    0.000 |

| 155 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 156 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 157 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 158 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 159 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 160 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 161 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 162 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 163 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 164 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

| 165 | root      | localhost | testcrash       | Killed  |   33 | Copying to tmp table | SELECT

  t2.`id`,

  t11_sf1.html AS `Description`,

  t11_sf13076.text AS `JobLocation`,

  GROUP |    0.000 |

+-----+-----------+-----------+-----------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+----------+

14 rows in set (0.00 sec)

The last one (query 165) was killed but it gets hang on "killed" status for ever.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

analyze-query.json
8 kB
2018-06-11 14:47
last-t.sql
2 kB
2018-06-05 12:30
my-10-1.cnf
4 kB
2018-06-08 13:27
stack-trace.output
66 kB
2018-07-23 19:56
tiny-tc.sql.gz
2.32 MB
2018-06-05 12:30

Issue Links

is duplicated by

MDEV-15206 Server hangs with small aria_pagecache_buffer_size instead of producing an error

Open

SELECT with text columns lasts forever and can not be killed

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration