[MDEV-17970] Query state "removing duplicates" - DISTINCT + GROUP BY Created: 2018-12-11  Updated: 2019-04-23  Resolved: 2019-04-15

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Aftab Khan Assignee: Igor Babaev
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

PROD


Attachments: File foo.sql     File schema.sql    

 Description   

We have identified problematic query, it takes average ~90 seconds to complete. If we do not use DISTINCT keyword then it runs fast.

MariaDB [sbtest]> SELECT DISTINCT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time%' )  GROUP BY `inspection_id` ,`field_value` ,`field_num` ,`label` LIMIT 1;
+--------------------+-------------+
| inspection_id      | field_value |
+--------------------+-------------+
| ACE-P/201007081215 | 12:15       |
+--------------------+-------------+
1 row in set (1 min 26.36 sec)
 
MariaDB [sbtest]> EXPLAIN SELECT DISTINCT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time%' )  GROUP BY `inspection_id` ,`field_value` ,`field_num` ,`label`;
+------+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------------------------------------+
| id   | select_type | table       | type  | possible_keys | key   | key_len | ref  | rows | Extra                                                  |
+------+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------------------------------------+
|    1 | SIMPLE      | inspections | range | label         | label | 62      | NULL | 3160 | Using index condition; Using temporary; Using filesort |
+------+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [sbtest]>  SELECT COUNT(*)   FROM `inspections`  WHERE (`label` LIKE 'start time%' ) ;
+----------+
| COUNT(*) |
+----------+
|     8648 |
+----------+
1 row in set (0.00 sec)

While query is running, show processlist output shows query state 'removing duplicates'

+------+------+--------------------------------+--------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id   | User | Host                           | db     | Command     | Time  | State                                                                 | Info                                                                                                 | Progress |
+------+------+--------------------------------+--------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
|    4 | repl | npedbmys001.xxxxx.xxx.uk:53411 | NULL   | Binlog Dump | 96575 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL                                                                                                 |    0.000 |
| 1101 | root | localhost                      | sbtest | Query       |     0 | init                                                                  | show processlist                                                                                     |    0.000 |
| 1140 | root | localhost                      | sbtest | Query       |     6 | Removing duplicates                                                   | SELECT DISTINCT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time |    0.000 |
+------+------+--------------------------------+--------+-------------+-------+-----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+----------+
3 rows in set (0.00 sec)

However, if we remove distinct key word then the same query completes under 1 second

MariaDB [sbtest]> SELECT `inspection_id` ,`field_value`   FROM `inspections`  WHERE (`label` LIKE 'start time%' )  GROUP BY `inspection_id` ,`field_value` ,`field_num` ,`label` LIMIT 1;
+--------------------+-------------+
| inspection_id      | field_value |
+--------------------+-------------+
| ACE-P/201007081215 | 12:15       |
+--------------------+-------------+
1 row in set (0.10 sec)

This issue is easy to reproduce:

1) Create table with following structure:

CREATE TABLE `foo` (
  `a` varchar(18) DEFAULT NULL,
  `b` text,
  `c` int(3) DEFAULT NULL,
  `d` varchar(60) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
## Storage engine can be either INNODB or MYISAM ##

2) Load data using attached file 'foo.sql', it would bring ~2k rows

3) reproduce

MariaDB [sbtest]> SELECT DISTINCT  `a` ,`b`  FROM `foo` GROUP BY `a` ,`b` ,`c` ,`d` limit 1;
+--------------------+------+
| a                  | b    |
+--------------------+------+
| ALD-A/200808181115 |      |
+--------------------+------+
1 row in set (0.68 sec)
 
MariaDB [sbtest]> show table status like 'foo'\G
*************************** 1. row ***************************
           Name: foo
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 2000
 Avg_row_length: 106
    Data_length: 212992
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2018-12-11 13:36:30
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)



 Comments   
Comment by Alice Sherepa [ 2018-12-11 ]

Thanks a lot for the report and test case. The difference is in "duplicate_removal" phase, and only after that there is LIMIT 1.

10.1.37

MariaDB [test]> analyze format=json SELECT  DISTINCT  `a` ,`b`  FROM `foo` GROUP BY `a` ,`b` ,`c` ,`d` limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ANALYZE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 358.62,
    "filesort": {
      "r_loops": 1,
      "r_total_time_ms": 0.1855,
      "r_limit": 1,
      "r_used_priority_queue": true,
      "r_output_rows": 2,
      "duplicate_removal": {
        "temporary_table": {
          "table": {
            "table_name": "foo",
            "access_type": "ALL",
            "r_loops": 1,
            "rows": 2000,
            "r_rows": 2000,
            "r_total_time_ms": 1.7921,
            "filtered": 100,
            "r_filtered": 100
          }
        }
      }
    }
  }
} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.36 sec)

MariaDB [test]> analyze format=json SELECT  `a` ,`b`  FROM `foo` GROUP BY `a` ,`b` ,`c` ,`d` limit 1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ANALYZE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 18.394,
    "filesort": {
      "r_loops": 1,
      "r_total_time_ms": 0.6903,
      "r_limit": 1,
      "r_used_priority_queue": true,
      "r_output_rows": 2,
      "temporary_table": {
        "table": {
          "table_name": "foo",
          "access_type": "ALL",
          "r_loops": 1,
          "rows": 2000,
          "r_rows": 2000,
          "r_total_time_ms": 1.8729,
          "filtered": 100,
          "r_filtered": 100
        }
      }
    }
  }
} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

Comment by Igor Babaev [ 2019-04-15 ]

One query is just with LIMIT, another with DISTINCT. How their performance can be compared? Apparently sorting for distinct takes a lot of time.

Generated at Thu Feb 08 08:40:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.