[MDEV-8077] Add order by clause make query to 100 times longer to execute Created: 2015-04-29  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.4, 10.5, 10.6, 10.11

Type: Bug Priority: Major
Reporter: Patrick Quentin Armitage Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 1
Labels: optimizer, order-by-optimization, subquery, verified
Environment:

Fedora 22


Attachments: File op.tar.bz2    

 Description   

The query
select * from fd_peal where fid in (select fid from bert where sid is null) ;
takes 0.7 seconds to execute.
The query
select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
(i.e. adding order by rung) takes between 70 and 90 seconds to execute.

The query returns 94 rows, the table fd_peal has 335154 rows, and the table bert has 335137 rows.

The execution plans for the queries are:

MariaDB [db]> explain extended select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+----------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows   | filtered | Extra          |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+----------------+
|    1 | PRIMARY      | fd_peal     | ALL    | NULL          | NULL         | NULL    | NULL | 311567 |   100.00 | Using filesort |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |      1 |   100.00 |                |
|    2 | MATERIALIZED | bert        | ALL    | NULL          | NULL         | NULL    | NULL | 357018 |   100.00 | Using where    |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+----------------+
 
MariaDB [db]> explain extended select * from fd_peal where fid in (select fid from bert where sid is null) ;
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+-------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows   | filtered | Extra       |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+-------------+
|    1 | PRIMARY      | fd_peal     | ALL    | NULL          | NULL         | NULL    | NULL | 311567 |   100.00 |             |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |      1 |   100.00 |             |
|    2 | MATERIALIZED | bert        | ALL    | NULL          | NULL         | NULL    | NULL | 357018 |   100.00 | Using where |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+-------------+

The tables are defined as follows:

show create table fd_peal;
| fd_peal | CREATE TABLE `fd_peal` (
  `fid` int(11) NOT NULL,
  `flag` enum('OK','BAD_TOWER','FALSE','WITHDRAWN','UNACCEPTABLE','BAD_METHOD','HOAX','DUPLICATE','BAD_DATE','','ERROR','NON-COMPLIANT') DEFAULT NULL,
  `tid` smallint(6) DEFAULT NULL,
  `rung` date DEFAULT NULL,
  `seq` int(11) DEFAULT NULL,
  `method` varchar(255) DEFAULT NULL,
  `comment` varchar(255) DEFAULT NULL,
  `changed` varchar(255) DEFAULT NULL,
  `camp` int(11) DEFAULT NULL,
  KEY `xtid` (`tid`),
  KEY `xrung` (`rung`),
  FULLTEXT KEY `method` (`method`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
 
MariaDB [db]> show create table bert;
| bert  | CREATE TABLE `bert` (
  `fid` int(11) NOT NULL,
  `flag` enum('OK','BAD_TOWER','FALSE','WITHDRAWN','UNACCEPTABLE','BAD_METHOD','HOAX','DUPLICATE','BAD_DATE','','ERROR','NON-COMPLIANT') DEFAULT NULL,
  `tid` smallint(6) DEFAULT NULL,
  `rung` date DEFAULT NULL,
  `seq` int(11) DEFAULT NULL,
  `method` varchar(255) DEFAULT NULL,
  `mid` smallint(6) DEFAULT NULL,
  `sid` smallint(3) DEFAULT NULL,
  `vc` tinyint(1) NOT NULL DEFAULT '0',
  `methods` smallint(4) NOT NULL DEFAULT '1',
  `variations` smallint(4) NOT NULL DEFAULT '0',
  `comment` varchar(255) DEFAULT NULL,
  `changed` varchar(255) DEFAULT NULL,
  `camp` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

I would have expected to order by clause to sort the rows returned by the query, whereas, given the amount of time the query is taking it is presumably doing the order by before the query.



 Comments   
Comment by Daniel Black [ 2015-04-29 ]

How does the following compare:

select fd_peal.* from fd_peal join bert on fd_peal.fid =bert.fid  where sid is null order by rung;

Having an index on fd_peal.fid and bert.(fid,sid,rung) would help a lot.

Can you include SHOW INDEXES FROM bert; SHOW INDEXES FROM fd_peal

Comment by Daniel Black [ 2015-04-29 ]

Also might be with trying

ALTER TABLE bert ADD KEY (sid,fid,rung);

too

Comment by Patrick Quentin Armitage [ 2015-04-29 ]

The issue I was attempting to report was that adding the "order by rung" to the query causes the query to take 100 times longer to execute.

The query without the order by clause executes in 0.7 seconds, and so indices clearly aren't needed to make the basic query execute in a reasonable time. Why does adding the order by clause, when one would expect it to only have to sort the 94 rows returned, make the query take 70 seconds instead of 0.7 seconds?

I reported this because it looked to me as though something strange was happening with the way the query was being executed, and it might benefit other users if this strange behaviour were resolved.

Comment by Patrick Quentin Armitage [ 2015-04-29 ]

Re query: select fd_peal.* from fd_peal join bert on fd_peal.fid =bert.fid where sid is null order by rung;

it takes 31 seconds to execute this whether the order by clause is included to not.

The output of the show indexes commands is:

MariaDB [db]> show indexes from fd_peal;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| fd_peal |          1 | xtid     |            1 | tid         | A         |       14162 |     NULL | NULL   | YES  | BTREE      |         |               |
| fd_peal |          1 | xrung    |            1 | rung        | A         |       77891 |     NULL | NULL   | YES  | BTREE      |         |               |
| fd_peal |          1 | method   |            1 | method      | NULL      |      311567 |     NULL | NULL   | YES  | FULLTEXT   |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.05 sec)
 
MariaDB [db]> show indexes from bert;
Empty set (0.00 sec)

I fully appreciate that adding indices would enable the query to execute more sensibly, but the point was that without the "order by rung" clause it executes in a reasonable time, but with the "order by rung" clause it doesn't

Comment by Daniel Black [ 2015-04-29 ]

I fully appreciate that adding indices would enable the query to execute more sensibly, but the point was that without the "order by rung" clause it executes in a reasonable time, but with the "order by rung" clause it doesn't

Its sorting the entire fd_peal table first hence the delay. Sorting on the result set is a better solution.

Can you include your show global variables output? Mainly interested to see what optimizer settings you have.

A optimizer feature not yet enabled by default is https://mariadb.com/kb/en/mariadb/histogram-based-statistics .

Is it possible to get the table extract fd_peal(fid,rung) and bert(fid,sid) to save us creating a test case

select fid,rung into outfile '/tmp/fd_pedal' from fd_pedal;
select fid,sid into outfile '/tmp/bert' from bert;

Comment by Patrick Quentin Armitage [ 2015-04-29 ]

Attachment op.tar.bz2 has been added that includes the output of "show global variabless" and also the two requested outfiles.

Comment by Elena Stepanova [ 2015-05-01 ]

It's easy enough to reproduce, I created a very artificial data set which only meets two conditions – 300,000 rows in each table and 100 rows intersection for the query – and got similar results. Well, maybe it's not 100x slower for me, but 10-60x, depending on luck, but still.
ALTER TABLE fd_peal ADD KEY (fid) and ALTER TABLE bert ADD KEY (sid,fid,rung) didn't help.

MariaDB [test]> show index in bert;
Empty set (0.00 sec)
 
MariaDB [test]> show index in fd_peal;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| fd_peal |          1 | xtid     |            1 | tid         | A         |      104140 |     NULL | NULL   | YES  | BTREE      |         |               |
| fd_peal |          1 | xrung    |            1 | rung        | A         |        6647 |     NULL | NULL   | YES  | BTREE      |         |               |
| fd_peal |          1 | method   |            1 | method      | NULL      |      312422 |     NULL | NULL   | YES  | FULLTEXT   |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

MariaDB [test]> explain extended select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+----------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows   | filtered | Extra          |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+----------------+
|    1 | PRIMARY      | fd_peal     | ALL    | NULL          | NULL         | NULL    | NULL | 312422 |   100.00 | Using filesort |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |      1 |   100.00 |                |
|    2 | MATERIALIZED | bert        | ALL    | NULL          | NULL         | NULL    | NULL | 315532 |   100.00 | Using where    |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+----------------+
3 rows in set, 1 warning (0.00 sec)
 
MariaDB [test]> explain extended select * from fd_peal where fid in (select fid from bert where sid is null);
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+-------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows   | filtered | Extra       |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+-------------+
|    1 | PRIMARY      | fd_peal     | ALL    | NULL          | NULL         | NULL    | NULL | 312422 |   100.00 |             |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |      1 |   100.00 |             |
|    2 | MATERIALIZED | bert        | ALL    | NULL          | NULL         | NULL    | NULL | 315532 |   100.00 | Using where |
+------+--------------+-------------+--------+---------------+--------------+---------+------+--------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)

MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
<result set>
100 rows in set (3.71 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null);
<result set>
100 rows in set (0.39 sec)
 
MariaDB [test]> flush tables;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
<result set>
100 rows in set (3.05 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null);
<result set>
100 rows in set (0.39 sec)

MariaDB [test]> analyze table bert, fd_peal;
+--------------+---------+----------+----------+
| Table        | Op      | Msg_type | Msg_text |
+--------------+---------+----------+----------+
| test.bert    | analyze | status   | OK       |
| test.fd_peal | analyze | status   | OK       |
+--------------+---------+----------+----------+
2 rows in set (0.26 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
<result set>
100 rows in set (5.50 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null);
<result set>
100 rows in set (0.38 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
<result set>
100 rows in set (3.12 sec)

MariaDB [test]> ALTER TABLE fd_peal ADD KEY (fid);
Query OK, 0 rows affected (0.93 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> ALTER TABLE bert ADD KEY (sid,fid,rung);
Query OK, 0 rows affected (1.46 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> analyze table bert, fd_peal;
+--------------+---------+----------+----------+
| Table        | Op      | Msg_type | Msg_text |
+--------------+---------+----------+----------+
| test.bert    | analyze | status   | OK       |
| test.fd_peal | analyze | status   | OK       |
+--------------+---------+----------+----------+
2 rows in set (0.06 sec)

MariaDB [test]> show index in bert;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| bert  |          1 | sid      |            1 | sid         | A         |      104233 |     NULL | NULL   | YES  | BTREE      |         |               |
| bert  |          1 | sid      |            2 | fid         | A         |      104233 |     NULL | NULL   |      | BTREE      |         |               |
| bert  |          1 | sid      |            3 | rung        | A         |      312700 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
 
MariaDB [test]> show index in fd_peal;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| fd_peal |          1 | xtid     |            1 | tid         | A         |       79979 |     NULL | NULL   | YES  | BTREE      |         |               |
| fd_peal |          1 | xrung    |            1 | rung        | A         |        6954 |     NULL | NULL   | YES  | BTREE      |         |               |
| fd_peal |          1 | fid      |            1 | fid         | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| fd_peal |          1 | method   |            1 | method      | NULL      |      319917 |     NULL | NULL   | YES  | FULLTEXT   |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

MariaDB [test]> explain extended select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
+------+--------------+-------------+--------+---------------+--------------+---------+-------+--------+----------+--------------------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref   | rows   | filtered | Extra                    |
+------+--------------+-------------+--------+---------------+--------------+---------+-------+--------+----------+--------------------------+
|    1 | PRIMARY      | fd_peal     | ALL    | fid           | NULL         | NULL    | NULL  | 319917 |   100.00 | Using filesort           |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func  |      1 |   100.00 |                          |
|    2 | MATERIALIZED | bert        | ref    | sid           | sid          | 3       | const | 156350 |   100.00 | Using where; Using index |
+------+--------------+-------------+--------+---------------+--------------+---------+-------+--------+----------+--------------------------+
3 rows in set, 1 warning (0.00 sec)
 
MariaDB [test]> explain extended select * from fd_peal where fid in (select fid from bert where sid is null);
+------+--------------+-------------+--------+---------------+--------------+---------+-------+--------+----------+--------------------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref   | rows   | filtered | Extra                    |
+------+--------------+-------------+--------+---------------+--------------+---------+-------+--------+----------+--------------------------+
|    1 | PRIMARY      | fd_peal     | ALL    | fid           | NULL         | NULL    | NULL  | 319917 |   100.00 |                          |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func  |      1 |   100.00 |                          |
|    2 | MATERIALIZED | bert        | ref    | sid           | sid          | 3       | const | 156350 |   100.00 | Using where; Using index |
+------+--------------+-------------+--------+---------------+--------------+---------+-------+--------+----------+--------------------------+
3 rows in set, 1 warning (0.00 sec)

MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null) order by rung;
<result set>
100 rows in set (3.48 sec)
 
MariaDB [test]> select * from fd_peal where fid in (select fid from bert where sid is null);
<result set>
100 rows in set (0.30 sec)

Comment by Sergei Petrunia [ 2015-06-16 ]

Confirm. I wasn't able to observe such a big difference (for me it's 2x - 5x) but the issue seems to be that the optimizer first sorts the table (sorting 300K rows), and then does the join (which leaves ~90 rows).

MySQL/MariaDB generally tries to produce the required ordering as soon as possible. I guess, the reasons for this are:
1. Typically, join operations have fanout > 1 (i.e. doing a join (or semi-join) operation with other table makes the dataset bigger, not smaller).
2. ORDER BY optimizer is closely related to ORDER BY ... LIMIT optimizer. When LIMIT n clause is present, producing results in the required ordering allows to finish query execution as soon as n rows are produced.

It is actually quite difficult to reliably detect the case where fanout of join operation is much less than 1. Even in this case:
" fid in (select fid from bert where sid is null)". What if all/most of fid values are the same? It is diffiuclt to prove that join operation will have fanout << 1.

Comment by Sergei Petrunia [ 2015-06-16 ]

I don't see a way to fix this soon. ORDER BY ... LIMIT optimization has other shortcomings that need to be fixed first.

Still, thanks for reporting this case, it is useful to know what kind of things are observed in production.

Comment by Elena Stepanova [ 2023-01-22 ]

I cannot reproduce a considerable difference in execution time anymore, but it looks like the logic which psergei described (first sort the big table, then apply the join condition) may still be the case, so I'm updating the affected/fix versions based on this assumption.

Generated at Thu Feb 08 07:24:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.