[MDEV-29865] Optimizer chooses wrong path for big results Created: 2022-10-24  Updated: 2022-10-26

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.6.9, 10.10.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oli Sennhauser Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: optimizer, sql_big_result, sql_small_result
Environment:

Linux, n.a.


Attachments: File test_dump.sql.gz    

 Description   

Optimizer chooses wrong way to execute GROUP BY for big results. The difference is dramatic.

The documentation states:

"The optimizer automatically knows if the result is too big, but you can force the optimizer to use a temporary table with SQL_SMALL_RESULT, or avoid the temporary table using SQL_BIG_RESULT."

SQL> SELECT SQL_BIG_RESULT data, id, ts, COUNT(*) FROM test GROUP BY data, id, ts;
SQL> SELECT SQL_SMALL_RESULT /* which is the default */ data, id, ts, COUNT(*) FROM test GROUP BY data, id, ts;

10.6.9

# Query_time: 1.503094  Lock_time: 0.000179  Rows_sent: 1048578  Rows_examined: 2097156
# Rows_affected: 0  Bytes_sent: 64202110
SELECT SQL_BIG_RESULT data, id, ts, COUNT(*) FROM test GROUP BY data, id, ts;
 
# Query_time: 14.653313  Lock_time: 0.000131  Rows_sent: 1048578  Rows_examined: 3145734
# Rows_affected: 0  Bytes_sent: 64202110
SELECT data, id, ts, COUNT(*) FROM test GROUP BY data, id, ts;

10.10.1

# Query_time: 1.535291  Lock_time: 0.000150  Rows_sent: 1048578  Rows_examined: 2097156
# Rows_affected: 0  Bytes_sent: 64202110
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 7  Priority_queue: No
select sql_big_result data, id, ts, count(*) from test group by data, id, ts;
 
# Query_time: 14.631351  Lock_time: 0.000150  Rows_sent: 1048578  Rows_examined: 3145734
# Rows_affected: 0  Bytes_sent: 64202110
# Tmp_tables: 2  Tmp_disk_tables: 1  Tmp_table_sizes: 106225664
# Full_scan: Yes  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: Yes
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 8  Priority_queue: No
select data, id, ts, count(*) from test group by data, id, ts;

It looks like the Optimizer is doing 50% more work in the default case...

Test set is attached.

For small results it seems OK:

SQL> select sql_small_result data, ts, count(*) from test group by data, ts;
12 rows in set (0.483 sec)
 
SQL> select sql_big_result data, ts, count(*) from test group by data, ts;
12 rows in set (1.159 sec)

I see some huge potential here for customers with huge GROUP BY...


Generated at Thu Feb 08 10:11:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.