Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16390

SELECT with text columns lasts forever and can not be killed

    XMLWordPrintable

Details

    Description

      Under a couple of circumstances a SELECT statement makes the mysql to behave really weird, the query takes forever and can not be killed (it gets locked in status killed forever if you kill it). The only way to get out of these unhealthy situation is to kill -9 the mysqld process. The statement must be run several times simultaneously to reproduce the problem.
      This is the simplest version of my query which made it happen:

      SELECT 
        t2.`id`,  
        t11_sf1.html,
        t11_sf13076.text,
        GROUP_CONCAT( CONCAT(IFNULL(t12_dataSetTableN31727_sf31612.`f2`,""), ",", IFNULL(t12_dataSetTableN31727_sf31613.`f2`,""), ",", IFNULL(t12_dataSetTableN31727_sf31931.`f2`,"")) SEPARATOR "\n") AS `AdditionalPostingLocation` 
      FROM t2
      JOIN t4 AS t44305 ON (t44305.`jobId` = t2.`id` AND t44305.`schemaDataId` IN (SELECT id FROM t5  WHERE (schemaSpecId = 3))) 
      JOIN t6 AS t11 ON (t11.`schemaDataId` = t44305.`schemaDataId`) 
      LEFT JOIN t1 AS t11_sf1 ON (t11_sf1.`id` = t11.`htmlContentIdf1`) 
      LEFT JOIN t3 AS t11_sf13076 ON (t11_sf13076.`id` = t11.`multiLineTextIdf114`) 
      LEFT JOIN t11 AS t12 ON (t12.`schemaDataId` = t44305.`schemaDataId` AND t12.`schemaFieldId` = 31727) 
      LEFT JOIN t9 AS t12_dataSetTableN31727 ON (t12_dataSetTableN31727.`schemaDataId` = t12.`sdie`) 
      LEFT JOIN t7 AS t12_dataSetTableN31727_sf31612 ON (t12_dataSetTableN31727_sf31612.`schemaDataId` = t12_dataSetTableN31727.`schemaData322Idf1`) 
      LEFT JOIN t8 AS t12_dataSetTableN31727_sf31613 ON (t12_dataSetTableN31727_sf31613.`schemaDataId` = t12_dataSetTableN31727.`schemaData323Idf2`) 
      LEFT JOIN t10 AS t12_dataSetTableN31727_sf31931 ON (t12_dataSetTableN31727_sf31931.`schemaDataId` = t12_dataSetTableN31727.`schemaData326Idf4`) 
      LEFT JOIN t12 AS t11_sf22401 ON (t11_sf22401.`schemaDataId` = t44305.`schemaDataId` AND t11_sf22401.`schemaFieldId` = 22401) 
      LEFT JOIN t13 AS t11_sf22401_sf22401 ON (t11_sf22401_sf22401.`id` = t11_sf22401.`schemaFieldOptionId`) 
      WHERE 
        t11_sf22401_sf22401.`id` = 1245 
      GROUP BY t2.`id`  
      ORDER BY t2.`id` LIMIT 0, 20 
      ;
      

      To make it fail I just run it simultaneously, I use mysqlslap to achieve it:
      mysqlslap --concurrency=11 --query=last-t.sql --no-drop -uusername --create-schema=testcrash --delimiter=";"
      Where last-t.sql is the attached file containing the above mentioned query, and testcrash is the DB schema which can be restored using the attached dump file (tiny-tc.sql.gz).

      Here are some elements I have already tested which could help to understand the causes:

      • mysqlslap makes it fails with 11 as concurrency and, of course higher numbers. It doesn't with lower values. Most probably it depends on the hardware set, I tested it mainly in 36 cores envir.
      • all the parts of the query must be present to reproduce it.
      • one of the mandatory parts are the "text" columns included in the select list clause. If you just replace them with the id (integer) column of the same table it just stops failing.
      • it fails also replacing the concat_group function with another aggregate function such as count(1).
      • it doesn't fail if you force any other plan, for example by adding straight_join hint.
      • The contents of t1 table, the largest table because of the html text column must be as large as they are in the dump. Running exactly the same mysqlslap test with lighter contents prevents the crash.

      This is the processlist you get when the behaviour is already happening:

      MariaDB [testcrash]> show processlist;
      +-----+-----------+-----------+-----------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id  | User      | Host      | db              | Command | Time | State                | Info                                                                                                 | Progress |
      +-----+-----------+-----------+-----------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+----------+
      |   4 | mataquery | localhost | NULL            | Sleep   |    0 |                      | NULL                                                                                                 |    0.000 |
      |   6 | root      | localhost | genesysloadtest | Sleep   |  603 |                      | NULL                                                                                                 |    0.000 |
      |   9 | root      | localhost | testcrash       | Query   |    0 | init                 | show processlist                                                                                     |    0.000 |
      | 155 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 156 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 157 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 158 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 159 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 160 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 161 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 162 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 163 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 164 | root      | localhost | testcrash       | Query   |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      | 165 | root      | localhost | testcrash       | Killed  |   33 | Copying to tmp table | SELECT 
        t2.`id`,  
        t11_sf1.html AS `Description`, 
        t11_sf13076.text AS `JobLocation`, 
        GROUP |    0.000 |
      +-----+-----------+-----------+-----------------+---------+------+----------------------+------------------------------------------------------------------------------------------------------+----------+
      14 rows in set (0.00 sec)
      

      The last one (query 165) was killed but it gets hang on "killed" status for ever.

      Attachments

        1. analyze-query.json
          8 kB
        2. last-t.sql
          2 kB
        3. my-10-1.cnf
          4 kB
        4. stack-trace.output
          66 kB
        5. tiny-tc.sql.gz
          2.32 MB

        Issue Links

          Activity

            People

              Unassigned Unassigned
              gschulman Guillermo Schulman
              Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.