[MCOL-3663] Performance regression in LIKE & NOT LIKE Created: 2019-12-10 Updated: 2020-03-13 Resolved: 2020-02-12 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.4.1 |
| Fix Version/s: | 1.4.3 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Andrew Hutchings (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Sprint: | 2020-1, 2020-2 | ||||||||
| Description |
|
LIKE and NOT LIKE perform significantly worse in 1.4 vs 1.2. The assumed guess is this falls back to non-select-handler based methods. See attached image for example. |
| Comments |
| Comment by Roman [ 2020-02-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I didn't find the regression out. Here is my methodology. I loaded orders with enough records in both develop-1.2 and 1.4 and run the query. The results differs from version to version but not so radically. FYI I conducted the test in the cloud env so this fact introduced some noise for sure. The results are below.
1.4
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Gregory Dorman (Inactive) [ 2020-02-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
dleeyh - please test on 1.2 prod, and then on 1.2 DEV (and 1.4 then). Roman did it on 1.2 dev, maybe it regressed between two 1.2 branches. If the same (or statistically close) - just close. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2020-02-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Builds tested: 1.2.5-1, 1.4.2-1 (Released) Environment: VirtualBox VMS, 4 vCores, 40GB memory 1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run. I will check the current develop branches next. 1.2.5-1 Welcome to the MariaDB monitor. Commands end with ; or \g. Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch10]> select count
-----------
----------- MariaDB [tpch10]> select count
----------
---------- MariaDB [tpch10]> select count
----------
---------- 1.4.2-1 Reading table information for completion of table and column names Welcome to the MariaDB monitor. Commands end with ; or \g. Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch10]> select count
-----------
----------- MariaDB [tpch10]> select count
----------
---------- MariaDB [tpch10]> select count
----------
---------- MariaDB [tpch10]> | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Gregory Dorman (Inactive) [ 2020-02-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So, Roman - the evidence so far is that some massive perf regression was introduced into both 1.2.dev and 1.4.dev. Afraid we can't go out like that, sorry. From Daniel: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2020-02-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Build tested: 1.4.3-1, built from source I had issues building 1.2.6-1 and 1.2.6-1 is not available from buildbot 1.4.3-1 is 60% slower than 1.4.2-1 for both disk and cached runs I did make sure the test VM is the only VM existed in my test server. server branch: 10.4-enterprise, commit: 6a8bf09d0dfd012d21fd7c8c542ad46215d1547e --------- Reading table information for completion of table and column names Welcome to the MariaDB monitor. Commands end with ; or \g. Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch10]> select count
-----------
----------- MariaDB [tpch10]> select count
----------
---------- MariaDB [tpch10]> select count
----------
---------- MariaDB [tpch10]> | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2020-02-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
reopen per latest test result | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2020-02-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
LIKE and NOT LIKE are implemented internally using regular expressions. In 1.2.6, while fixing other things, we changed from using posix regex to using boost regex. Who woulda thought boost would be 4 times slower? Switched pack to posix regex. In C++x11, there's a std::regex, which may or may not be better to use. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hall (Inactive) [ 2020-02-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Google uses RE2 https://github.com/google/re2. We may want to benchmark that. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roman [ 2020-02-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
std::regex is as bad as boost:regex so we don't want it unless I missed some magic performance knob that I doubt. We surely want to test re2 having this test in mind. Worth to note the test is outdated. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2020-02-12 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Build verified: 1.4.3-1 BB nightly engine commit: MariaDB [tpch10]> select count
-----------
----------- MariaDB [tpch10]> select count
----------
---------- MariaDB [tpch10]> select count
----------
---------- |