|
Final evaluation results:
( see the the list of commits here https://github.com/MariaDB/server/compare/10.3...shagalla:10.3-mdev12172 )
Optimization that transforms IN predicate into IN subquery that uses TVC was made.
It works this way:
SELECT ... WHERE (expr1, ...) [NOT] IN (value_list) ...;
|
=>
|
SELECT ... WHERE (expr1, ...) [NOT] IN (SELECT * FROM (VALUES value_list) AS tvc_name) ...;
|
Optimization works when IN predicate is used in WHERE clause of the query or when it is ON expression of some join. It also works when IN predicate is in derived tables, CTEs and subqueries.
What's done:
1. global variable in_subquery_conversion_threshold that controls optimization working
Optimization works only when the total number of scalar values used in IN predicate is more than this variable
2. transformer Item_func_in::in_predicate_to_in_subs_transformer that transforms IN predicate in IN subselect
3. creation of TVC specified by values from IN predicate
4. creation of new statement
SELECT * FROM (VALUES value_list) AS tvc_name
|
5. new test file: opt_tvc.test
To clone this project from github you need to do:
git clone https://github.com/shagalla/server 10.3-mdev12172
|
cd 10.3-mdev12172
|
git branch 10.3-mdev12172
|
git checkout 10.3-mdev12172
|
maria-git/10.3-mdev12172> git pull https://github.com/shagalla/server 10.3-mdev12172
|
|
|
Thanks Igor for that MDEV that's one of super generic performance improvement i'v seen so far
|
|
Can i deactivate that "optimization", as it seems for me it degrades performance badly.
MariaDB 10.2.19:
# Time: 181120 15:39:39
|
# User@Host:
|
# Thread_id: 649859 Schema: leitsystem QC_hit: No
|
# Query_time: *3*.009499 Lock_time: 0.001029 Rows_sent: 25 Rows_examined: 529865
|
# Rows_affected: 0
|
# Full_scan: No Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
|
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: Yes
|
#
|
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
# explain: 1 SIMPLE zr_te_value range PRIMARY,ts PRIMARY 9 NULL 273462 264920.00 100.00 100.00 Using where; Using temporary; Using filesort
|
#
|
SET timestamp=1542724779;
|
SELECT CONCAT("9906643000004NEXT", LPAD(next_id_int, 16, "0")) AS next_id, ts - MOD(ts, 300), ROUND(AVG(value), 3), AVG(quality) FROM leitsystem.zr_te_value WHERE next_id_int IN ('0000000000006447', '0000000000005272', ... )
|
AND ts >= 1542668700 AND ts < 1542755400 AND zr_art = 12 GROUP BY next_id, ts DIV 300 ORDER BY ts
|
MariaDB 10.3.10:
# Time: 181120 15:43:02
|
# User@Host:
|
# Thread_id: 195 Schema: leitsystem QC_hit: No
|
# Query_time: *191*.434622 Lock_time: 0.001057 Rows_sent: 25 Rows_examined: 210523423
|
# Rows_affected: 0 Bytes_sent: 1778
|
# Tmp_tables: 3 Tmp_disk_tables: 0 Tmp_table_sizes: 70491296
|
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
|
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: Yes
|
#
|
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
# explain: 1 PRIMARY <derived3> ALL NULL NULL NULL NULL 1548 1548.00 100.00 100.00 Start temporary; Using temporary; Using filesort
|
# explain: 1 PRIMARY zr_te_value ref PRIMARY,ts PRIMARY 5 const,tvc_0.0000000000006447 218 135824.89 100.00 0.13 Using where; End temporary
|
# explain: 3 DERIVED NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL No tables used
|
#
|
use leitsystem;
|
SET timestamp=1542724982;
|
SELECT CONCAT("9906643000004NEXT", LPAD(next_id_int, 16, "0")) AS next_id, ts - MOD(ts, 300), ROUND(AVG(value), 3), AVG(quality) FROM leitsystem.zr_te_value WHERE next_id_int IN ('0000000000006447', '0000000000005272', ... )
|
AND ts >= 1542668700 AND ts < 1542755400 AND zr_art = 12 GROUP BY next_id, ts DIV 300 ORDER BY ts
|
|
|
Hi Frank,
What is the type of the next_id_int column?
|
|
Frank
If you don't need this conversion use
set @@in_predicate_conversion_threshold=1000000;
|
|
Can i suggest to experiment
create temporary table t(next_id_int , key idx(next_id_int) ) engine =memory;
insert t select ('0000000000006447', '0000000000005272', ... );
SELECT CONCAT("9906643000004NEXT", LPAD(a.next_id_int, 16, "0")) AS next_id, a.ts - MOD(a.ts, 300), ROUND(AVG(value), 3), AVG(quality) FROM leitsystem.zr_te_value a left join t ON a.next_id_int=t.next_id_int WHERE t.next_id_int IS NULL AND a.ts >= 1542668700 AND a.ts < 1542755400 AND a.zr_art = 12 GROUP BY a.next_id, a.ts DIV 300 ORDER BY a.ts;
it can help to check if the issue comes from icp shortcut, where filtering may take place after sending record or from full join cost ?
|
|
igor the column type is int. I told the application developers long ago they should give me int at this, but as it never made problems i did not put force on that.
Server version: 10.3.10-MariaDB-1:10.3.10+maria~bionic-log mariadb.org binary distribution
|
|
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
|
|
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
|
|
MariaDB [(none)]> set @@in_predicate_conversion_threshold=1000000;
|
ERROR 1193 (HY000): Unknown system variable 'in_predicate_conversion_threshold'
|
MariaDB [(none)]>
|
|
|
stephane@skysql.com:
I had to adapt it a little bit, but that was fast enough.
But it gives me totally different row count. The original query gives me about 400k rows, this only about 80k rows (That are the numbers that phpmyadmin shows me):
create temporary table t(next_id_int INT , key idx(next_id_int) ) engine =memory;
|
INSERT INTO t (next_id_int) VALUES (6447),(5272),...,(3104);
|
|
SELECT CONCAT("9906643000004NEXT", LPAD(a.next_id_int, 16, "0")) AS next_id, a.ts - MOD(a.ts, 300), ROUND(AVG(value), 3), AVG(quality)
|
FROM leitsystem.zr_te_value a left join t ON a.next_id_int=t.next_id_int
|
WHERE t.next_id_int IS NULL AND a.ts >= 1542668700 AND a.ts < 1542755400 AND a.zr_art = 12 GROUP BY a.next_id_int, a.ts DIV 300 ORDER BY a.ts;
|
# Time: 181121 9:48:07
|
# User@Host:
|
# Thread_id: 525 Schema: leitsystem QC_hit: No
|
# Query_time: 4.268082 Lock_time: 0.000173 Rows_sent: 25 Rows_examined: 1474207
|
# Rows_affected: 0 Bytes_sent: 1793
|
# Tmp_tables: 1 Tmp_disk_tables: 0 Tmp_table_sizes: 18162792
|
# Full_scan: No Full_join: No Tmp_table: Yes Tmp_table_on_disk: No
|
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: Yes
|
#
|
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
# explain: 1 SIMPLE a range PRIMARY,ts ts 4 NULL 2063432 981521.00 75.00 50.19 Using where; Using temporary; Using filesort
|
# explain: 1 SIMPLE t ref idx idx 5 leitsystem.a.next_id_int 2 0.84 100.00 100.00 Using where
|
#
|
use leitsystem;
|
SET timestamp=1542790087;
|
SELECT CONCAT("9906643000004NEXT", LPAD(a.next_id_int, 16, "0")) AS next_id, a.ts - MOD(a.ts, 300), ROUND(AVG(value), 3), AVG(quality)
|
FROM leitsystem.zr_te_value a left join t ON a.next_id_int=t.next_id_int
|
WHERE t.next_id_int IS NULL AND a.ts >= 1542668700 AND a.ts < 1542755400 AND a.zr_art = 12 GROUP BY a.next_id_int, a.ts DIV 300 ORDER BY a.ts LIMIT 0, 25;
|
|
|
Tx Frank
Who is correct that is the question, in the original query you get string IN ('0000xxxx') witch looks wrong at first place if data type of next_id_int is INTEGER. it looks like the optimization will build a temp table with the datatype of NOT IN list : VARCHAR where in your table you get INTEGER, forcing conversion from int to string and not equivalent from string to int.
1 <=> "1"
1 <=> "01"
"1" <> "01"
"1" <=> "1"
|
|
@Stephane:
The INT conversion should not be the problem, as for both tests i used a list of INTs.
|
|
Frank,
"The INT conversion should not be the problem, as for both tests i used a list of INTs."
This is exactly the cause of the problem.
Anyway you can report it as a bug.
|
|
igor: No, it is not:
# Time: 181122 9:20:42
|
# User@Host:
|
# Thread_id: 82 Schema: leitsystem QC_hit: No
|
# Query_time: 214.707173 Lock_time: 0.000781 Rows_sent: 25 Rows_examined: 211389323
|
# Rows_affected: 0 Bytes_sent: 1773
|
# Tmp_tables: 3 Tmp_disk_tables: 0 Tmp_table_sizes: 92843912
|
# Full_scan: Yes Full_join: Yes Tmp_table: Yes Tmp_table_on_disk: No
|
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: Yes
|
#
|
# explain: id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
|
# explain: 1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 1548 1548.00 100.00 100.00 Using temporary; Using filesort
|
# explain: 1 PRIMARY zr_te_value ref PRIMARY,ts PRIMARY 5 const,tvc_0._col_1 218 136287.92 100.00 0.20 Using where
|
# explain: 2 MATERIALIZED <derived3> ALL NULL NULL NULL NULL 1548 1548.00 100.00 100.00
|
# explain: 3 DERIVED NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL No tables used
|
#
|
use leitsystem;
|
SET timestamp=1542874842;
|
SELECT CONCAT("9906643000004NEXT", LPAD(next_id_int, 16, "0")) AS next_id, ts - MOD(ts, 300), ROUND(AVG(value), 3), AVG(quality) FROM leitsystem.zr_te_value WHERE next_id_in
|
t IN (6447,5272,...,3104)
|
AND ts >= 1542668700 AND ts < 1542755400 AND zr_art = 12 GROUP BY next_id, ts DIV 300 ORDER BY ts ASC;
|
|
|
Hi Frank !
Can you report it as a regular bug a close task does not look the good place ! you can link that task as a reference !
/svar
|
|
Should i open an additional bug for this one?
MariaDB [(none)]> set @@in_predicate_conversion_threshold=1000000;
ERROR 1193 (HY000): Unknown system variable 'in_predicate_conversion_threshold'
MariaDB [(none)]>
|
|
Apparently @@in_predicate_conversion_threshold is present in debug builds only.
|
|
Added a documentation stub: https://mariadb.com/kb/en/library/conversion-of-big-in-predicates-into-subqueries/
|
|
Related: MDEV-20871
@Frank: Did you ever create an additional bug for this one?
|
|
@Alexander: No i only created MDEV-17795, but there is MDEV-16871
|