[MCOL-3663] Performance regression in LIKE & NOT LIKE Created: 2019-12-10  Updated: 2020-03-13  Resolved: 2020-02-12

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.4.1
Fix Version/s: 1.4.3

Type: Bug Priority: Blocker
Reporter: Andrew Hutchings (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File not like bug.png    
Issue Links:
Relates
relates to MCOL-3778 Replace glibc with google's re2 for r... Closed
Sprint: 2020-1, 2020-2

 Description   

LIKE and NOT LIKE perform significantly worse in 1.4 vs 1.2. The assumed guess is this falls back to non-select-handler based methods. See attached image for example.



 Comments   
Comment by Roman [ 2020-02-05 ]

I didn't find the regression out. Here is my methodology. I loaded orders with enough records in both develop-1.2 and 1.4 and run the query. The results differs from version to version but not so radically. FYI I conducted the test in the cloud env so this fact introduced some noise for sure. The results are below.
Another fact is that select count from orders converged in 8 seconds that is even less then select..like. I think that was a bug. calgettrace(0) info might had pointed to the origins of this timings but we lack this information.
1.2

MariaDB [test]> select count(*) from orders;
+-----------+
| count(*)  |
+-----------+
| 153000000 |
+-----------+
1 row in set (0.757 sec)
 
MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
 
+----------+
| count(*) |
+----------+
|  1638018 |
+----------+
1 row in set (48.223 sec)
 
MariaDB [test]> 
MariaDB [test]> 
MariaDB [test]> 
MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
+----------+
| count(*) |
+----------+
|  1638018 |
+----------+
1 row in set (33.524 sec)
 
MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
+----------+
| count(*) |
+----------+
|  1638018 |
+----------+
1 row in set (29.488 sec)

1.4

MariaDB [test]> select count(*) from orders;
+-----------+
| count(*)  |
+-----------+
| 153000000 |
+-----------+
1 row in set (1.211 sec)
 
MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
+----------+
| count(*) |
+----------+
|  1638018 |
+----------+
1 row in set (56.948 sec)
 
MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
+----------+
| count(*) |
+----------+
|  1638018 |
+----------+
1 row in set (37.778 sec)
 
MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
+----------+
| count(*) |
+----------+
|  1638018 |
+----------+
1 row in set (37.612 sec)

Comment by Gregory Dorman (Inactive) [ 2020-02-05 ]

dleeyh - please test on 1.2 prod, and then on 1.2 DEV (and 1.4 then). Roman did it on 1.2 dev, maybe it regressed between two 1.2 branches. If the same (or statistically close) - just close.

Comment by Daniel Lee (Inactive) [ 2020-02-06 ]

Builds tested: 1.2.5-1, 1.4.2-1 (Released)

Environment: VirtualBox VMS, 4 vCores, 40GB memory

1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run.

I will check the current develop branches next.

1.2.5-1

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 22
Server version: 10.3.16-MariaDB-log Columnstore 1.2.5-1

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [tpch10]> select count from orders;
-----------

count

-----------

150000000

-----------
1 row in set (2.172 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (26.337 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (17.063 sec)

1.4.2-1

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 13
Server version: 10.4.11-5-MariaDB-enterprise MariaDB Enterprise Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [tpch10]> select count from orders;
-----------

count

-----------

150000000

-----------
1 row in set (1.624 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (1 min 4.240 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (59.390 sec)

MariaDB [tpch10]>

Comment by Gregory Dorman (Inactive) [ 2020-02-06 ]

So, Roman - the evidence so far is that some massive perf regression was introduced into both 1.2.dev and 1.4.dev. Afraid we can't go out like that, sorry.

From Daniel:
1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run.
I will check the current develop branches next.

Comment by Daniel Lee (Inactive) [ 2020-02-06 ]

Build tested: 1.4.3-1, built from source

I had issues building 1.2.6-1 and 1.2.6-1 is not available from buildbot
Tested on 1.4.3-1 only.

1.4.3-1 is 60% slower than 1.4.2-1 for both disk and cached runs

I did make sure the test VM is the only VM existed in my test server.

server branch: 10.4-enterprise, commit: 6a8bf09d0dfd012d21fd7c8c542ad46215d1547e
engine branch: develop-1.4, commit: 5efa6a4dc52129be2de49fdfc23e44020401b86b

---------

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 9
Server version: 10.4.12-6-MariaDB Source distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [tpch10]> select count from orders;
-----------

count

-----------

150000000

-----------
1 row in set (1.638 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (1 min 42.036 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (1 min 31.760 sec)

MariaDB [tpch10]>

Comment by Daniel Lee (Inactive) [ 2020-02-06 ]

reopen per latest test result

Comment by David Hall (Inactive) [ 2020-02-10 ]

LIKE and NOT LIKE are implemented internally using regular expressions. In 1.2.6, while fixing other things, we changed from using posix regex to using boost regex. Who woulda thought boost would be 4 times slower? Switched pack to posix regex.

In C++x11, there's a std::regex, which may or may not be better to use.

Comment by David Hall (Inactive) [ 2020-02-10 ]

Google uses RE2 https://github.com/google/re2. We may want to benchmark that.

Comment by Roman [ 2020-02-10 ]

std::regex is as bad as boost:regex so we don't want it unless I missed some magic performance knob that I doubt.

We surely want to test re2 having this test in mind. Worth to note the test is outdated.

Comment by Daniel Lee (Inactive) [ 2020-02-12 ]

Build verified: 1.4.3-1 BB nightly

engine commit:
8588678

MariaDB [tpch10]> select count from orders;
-----------

count

-----------

150000000

-----------
1 row in set (1.690 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (26.867 sec)

MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
----------

count

----------

1610254

----------
1 row in set (16.448 sec)

Generated at Thu Feb 08 02:44:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.