Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17970

Query state "removing duplicates" - DISTINCT + GROUP BY

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Not a Bug
    • 5.5, 10.0, 10.1, 10.2, 10.3
    • N/A
    • Optimizer
    • None
    • PROD

    Description

      We have identified problematic query, it takes average ~90 seconds to complete. If we do not use DISTINCT keyword then it runs fast.

      MariaDB [sbtest]> SELECT DISTINCT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time%' )  GROUP BY `inspection_id` ,`field_value` ,`field_num` ,`label` LIMIT 1;
      +--------------------+-------------+
      | inspection_id      | field_value |
      +--------------------+-------------+
      | ACE-P/201007081215 | 12:15       |
      +--------------------+-------------+
      1 row in set (1 min 26.36 sec)
       
      MariaDB [sbtest]> EXPLAIN SELECT DISTINCT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time%' )  GROUP BY `inspection_id` ,`field_value` ,`field_num` ,`label`;
      +------+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------------------------------------+
      | id   | select_type | table       | type  | possible_keys | key   | key_len | ref  | rows | Extra                                                  |
      +------+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------------------------------------+
      |    1 | SIMPLE      | inspections | range | label         | label | 62      | NULL | 3160 | Using index condition; Using temporary; Using filesort |
      +------+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------------------------------------+
      1 row in set (0.00 sec)
       
      MariaDB [sbtest]>  SELECT COUNT(*)   FROM `inspections`  WHERE (`label` LIKE 'start time%' ) ;
      +----------+
      | COUNT(*) |
      +----------+
      |     8648 |
      +----------+
      1 row in set (0.00 sec)
      

      While query is running, show processlist output shows query state 'removing duplicates'

      +------+------+--------------------------------+--------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id   | User | Host                           | db     | Command     | Time  | State                                                                 | Info                                                                                                 | Progress |
      +------+------+--------------------------------+--------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
      |    4 | repl | npedbmys001.xxxxx.xxx.uk:53411 | NULL   | Binlog Dump | 96575 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                                                                                                 |    0.000 |
      | 1101 | root | localhost                      | sbtest | Query       |     0 | init                                                                  | show processlist                                                                                     |    0.000 |
      | 1140 | root | localhost                      | sbtest | Query       |     6 | Removing duplicates                                                   | SELECT DISTINCT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time |    0.000 |
      +------+------+--------------------------------+--------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
      3 rows in set (0.00 sec)
      
      

      However, if we remove distinct key word then the same query completes under 1 second

      MariaDB [sbtest]> SELECT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time%' )  GROUP BY `inspection_id` ,`field_value` ,`field_num` ,`label` LIMIT 1;
      +--------------------+-------------+
      | inspection_id      | field_value |
      +--------------------+-------------+
      | ACE-P/201007081215 | 12:15       |
      +--------------------+-------------+
      1 row in set (0.10 sec)
      

      This issue is easy to reproduce:

      1) Create table with following structure:

      CREATE TABLE `foo` (
        `a` varchar(18) DEFAULT NULL,
        `b` text,
        `c` int(3) DEFAULT NULL,
        `d` varchar(60) DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
      ## Storage engine can be either INNODB or MYISAM ##
      

      2) Load data using attached file 'foo.sql', it would bring ~2k rows

      3) reproduce

      MariaDB [sbtest]> SELECT DISTINCT  `a` ,`b`  FROM `foo` GROUP BY `a` ,`b` ,`c` ,`d` limit 1;
      +--------------------+------+
      | a                  | b    |
      +--------------------+------+
      | ALD-A/200808181115 |      |
      +--------------------+------+
      1 row in set (0.68 sec)
       
      MariaDB [sbtest]> show table status like 'foo'\G
      *************************** 1. row ***************************
                 Name: foo
               Engine: InnoDB
              Version: 10
           Row_format: Compact
                 Rows: 2000
       Avg_row_length: 106
          Data_length: 212992
      Max_data_length: 0
         Index_length: 0
            Data_free: 0
       Auto_increment: NULL
          Create_time: 2018-12-11 13:36:30
          Update_time: NULL
           Check_time: NULL
            Collation: latin1_swedish_ci
             Checksum: NULL
       Create_options:
              Comment:
      1 row in set (0.00 sec)
      

      Attachments

        1. foo.sql
          125 kB
        2. schema.sql
          1 kB

        Activity

          People

            igor Igor Babaev
            aftabxkhan Aftab Khan
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.