[MCOL-686] Using BETWEEN together with date functions in WHERE clause 100x slower than InfiniDB Created: 2017-04-27  Updated: 2017-05-08  Resolved: 2017-05-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: MDB Plugin
Affects Version/s: 1.0.8
Fix Version/s: 1.1.0

Type: Bug Priority: Minor
Reporter: Geoff Cleaves Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

centos7


Issue Links:
Relates
relates to MCOL-704 backport to 1.0.9: MCOL-686 Using BET... Closed
Sprint: 2017-9

 Description   

Something strange is happening, perhaps partition trimming is getting confused. The problem does not exist in IDB 4.6.

/* this is SLOW (27s) because 
   it combines BETWEEN and date function
*/
SELECT 
 SUM(impressions)
FROM
 action_date_hour
WHERE
 action_date BETWEEN CURRENT_DATE() AND CURRENT_DATE()

/* this is fast (0.23s) because no BETWEEN */
SELECT 
 SUM(impressions)
FROM
 action_date_hour
WHERE
 action_date >= CURRENT_DATE() 
  AND action_date <= CURRENT_DATE()

/* this is fast, no function in where */
SET @startDate=current_date();
SET @endDate=current_date();
SELECT 
 SUM(impressions)
FROM
 action_date_hour
WHERE
 action_date BETWEEN @startDate AND @endDate

The three queries return the same results.



 Comments   
Comment by Geoff Cleaves [ 2017-04-27 ]

This can certainly be worked around but unfortunately we have many stored queries using the BETWEEN keyword.

Comment by Andrew Hutchings (Inactive) [ 2017-04-27 ]

I'm not 100% sure from a quick debug but it appears BETWEEN with a pushed down cache function is failing to evaluate the function before it hits PrimProc. Therefore blocking any extent elimination from occurring.

Comment by Pierre Coustilas [ 2017-04-27 ]

I test with my database,
It's a little slower when I use BETWEEN and CURDATE() :

 # mysql 11:06:59 > SELECT count(*) FROM difsn__odc__performance_nl ; 
+-----------+
| count(*)  |
+-----------+
| 103740282 |
+-----------+
1 row in set (0.48 sec)
 
 # mysql 11:11:24 > desc difsn__odc__performance_nl;
+----------------+-------------+------+-----+---------+-------+
| Field          | Type        | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| num_auto       | int(11)     | NO   |     | 0       |       |
| id_unique      | bigint(12)  | NO   |     | 0       |       |
| id_cmp         | int(11)     | NO   |     | 0       |       |
| recu_nl        | varchar(30) | NO   |     | NULL    |       |
| date_recu_nl   | date        | NO   |     | NULL    |       |
| ouvert_nl      | varchar(30) | NO   |     | NULL    |       |
| date_ouvert_nl | date        | NO   |     | NULL    |       |
| clic_nl        | varchar(30) | NO   |     | NULL    |       |
| date_clic_nl   | date        | NO   |     | NULL    |       |
+----------------+-------------+------+-----+---------+-------+
 
 
 # mysql 11:11:04 > SELECT count(*) FROM difsn__odc__performance_nl WHERE date_ouvert_nl >= '2017-02-01' AND date_ouvert_nl <= '2017-04-27'; 
+----------+
| count(*) |
+----------+
| 16639064 |
+----------+
1 row in set (0.88 sec)
 
 
 # mysql 11:12:06 > SELECT count(*) FROM difsn__odc__performance_nl WHERE date_ouvert_nl BETWEEN '2017-02-01' AND '2017-04-27'; 
+----------+
| count(*) |
+----------+
| 16639064 |
+----------+
1 row in set (0.90 sec)
 
 # mysql 11:14:01 > SELECT count(*) FROM difsn__odc__performance_nl WHERE date_ouvert_nl >= '2017-02-01' AND date_ouvert_nl <= CURDATE();
+----------+
| count(*) |
+----------+
| 16639064 |
+----------+
1 row in set (0.91 sec)
 
 # mysql 11:14:21 > SELECT count(*) FROM difsn__odc__performance_nl WHERE date_ouvert_nl BETWEEN '2017-02-01' AND CURDATE(); 
+----------+
| count(*) |
+----------+
| 16639064 |
+----------+
1 row in set (1.26 sec)
 
 
If I use variable it's OK :
 # mysql 11:14:41 > set @datejour=CURDATE();
Query OK, 0 rows affected (0.00 sec)
 
 # mysql 11:14:46 > SELECT count(*) FROM difsn__odc__performance_nl WHERE date_ouvert_nl BETWEEN '2017-02-01' AND @datejour; 
+----------+
| count(*) |
+----------+
| 16639064 |
+----------+
1 row in set (0.89 sec)
 
 
 

Comment by Andrew Hutchings (Inactive) [ 2017-04-27 ]

Found the cause. We are executing BETWEEN as a nondeterministic function rather than a predicate. I have a prototype of a patch ready I'm testing now

Comment by Andrew Hutchings (Inactive) [ 2017-04-27 ]

Changed from 'critical' to 'minor' as this is a performance regression since InfiniDB. This is not a security/crashing bug, erroneous behaviour or a behaviour regression since ColumnStore GA. This will not affect the fact that I'm already working on this but helps the rest of my team.

Comment by Geoff Cleaves [ 2017-04-27 ]

Thanks. When I originally filed the issue I thought that I was getting incorrect result but in fact only slow results.

Comment by Andrew Hutchings (Inactive) [ 2017-04-27 ]

Pull request for 1.1 only for now as there is a regression risk (although regression suite passes)

Comment by Daniel Lee (Inactive) [ 2017-05-04 ]

Build tested: Github source
[root@localhost mariadb-columnstore-server]# git show
commit 5a090c64bced6532578dd8910f15530fd37fce2c
Merge: 9efb0a7 b062156
Author: david hill <david.hill@mariadb.com>
Date: Thu Apr 27 16:13:15 2017 -0500
Merge pull request #45 from mariadb-corporation/FixPackageName
Change the package name to match engines format

Reproduced the issue in 1.0.8-1 using a 10g lineitem table with 59986052 rows:

MariaDB [tpch1]> select count(l_linenumber) from lineitem where l_shipdate between current_date() and current_date();
---------------------

count(l_linenumber)

---------------------

0

---------------------
1 row in set (0.72 sec)
MariaDB [tpch1]> select count(l_linenumber) from lineitem where l_shipdate between '2017-04-04' and '2017-04-04';
---------------------

count(l_linenumber)

---------------------

0

---------------------
1 row in set (0.01 sec)

In 1.1.0-1, the two queries took the same amount of time to execute
MariaDB [tpch1]> select count(l_linenumber) from lineitem where l_shipdate between current_date() and current_date();
---------------------

count(l_linenumber)

---------------------

0

---------------------
1 row in set (0.01 sec)
MariaDB [tpch1]> select count(l_linenumber) from lineitem where l_shipdate between '2017-04-04' and '2017-04-04';
---------------------

count(l_linenumber)

---------------------

0

---------------------
1 row in set (0.01 sec)

Generated at Thu Feb 08 02:23:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.