Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-3663

Performance regression in LIKE & NOT LIKE

Details

    • Bug
    • Status: Closed (View Workflow)
    • Blocker
    • Resolution: Fixed
    • 1.4.1
    • 1.4.3
    • None
    • None
    • 2020-1, 2020-2

    Description

      LIKE and NOT LIKE perform significantly worse in 1.4 vs 1.2. The assumed guess is this falls back to non-select-handler based methods. See attached image for example.

      Attachments

        Issue Links

          Activity

            drrtuy Roman added a comment -

            I didn't find the regression out. Here is my methodology. I loaded orders with enough records in both develop-1.2 and 1.4 and run the query. The results differs from version to version but not so radically. FYI I conducted the test in the cloud env so this fact introduced some noise for sure. The results are below.
            Another fact is that select count from orders converged in 8 seconds that is even less then select..like. I think that was a bug. calgettrace(0) info might had pointed to the origins of this timings but we lack this information.
            1.2

            MariaDB [test]> select count(*) from orders;
            +-----------+
            | count(*)  |
            +-----------+
            | 153000000 |
            +-----------+
            1 row in set (0.757 sec)
             
            MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
             
            +----------+
            | count(*) |
            +----------+
            |  1638018 |
            +----------+
            1 row in set (48.223 sec)
             
            MariaDB [test]> 
            MariaDB [test]> 
            MariaDB [test]> 
            MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
            +----------+
            | count(*) |
            +----------+
            |  1638018 |
            +----------+
            1 row in set (33.524 sec)
             
            MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
            +----------+
            | count(*) |
            +----------+
            |  1638018 |
            +----------+
            1 row in set (29.488 sec)
            

            1.4

            MariaDB [test]> select count(*) from orders;
            +-----------+
            | count(*)  |
            +-----------+
            | 153000000 |
            +-----------+
            1 row in set (1.211 sec)
             
            MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
            +----------+
            | count(*) |
            +----------+
            |  1638018 |
            +----------+
            1 row in set (56.948 sec)
             
            MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
            +----------+
            | count(*) |
            +----------+
            |  1638018 |
            +----------+
            1 row in set (37.778 sec)
             
            MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';
            +----------+
            | count(*) |
            +----------+
            |  1638018 |
            +----------+
            1 row in set (37.612 sec)
            

            drrtuy Roman added a comment - I didn't find the regression out. Here is my methodology. I loaded orders with enough records in both develop-1.2 and 1.4 and run the query. The results differs from version to version but not so radically. FYI I conducted the test in the cloud env so this fact introduced some noise for sure. The results are below. Another fact is that select count from orders converged in 8 seconds that is even less then select..like. I think that was a bug. calgettrace(0) info might had pointed to the origins of this timings but we lack this information. 1.2 MariaDB [test]> select count(*) from orders; +-----------+ | count(*) | +-----------+ | 153000000 | +-----------+ 1 row in set (0.757 sec)   MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%';   +----------+ | count(*) | +----------+ | 1638018 | +----------+ 1 row in set (48.223 sec)   MariaDB [test]> MariaDB [test]> MariaDB [test]> MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%'; +----------+ | count(*) | +----------+ | 1638018 | +----------+ 1 row in set (33.524 sec)   MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%'; +----------+ | count(*) | +----------+ | 1638018 | +----------+ 1 row in set (29.488 sec) 1.4 MariaDB [test]> select count(*) from orders; +-----------+ | count(*) | +-----------+ | 153000000 | +-----------+ 1 row in set (1.211 sec)   MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%'; +----------+ | count(*) | +----------+ | 1638018 | +----------+ 1 row in set (56.948 sec)   MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%'; +----------+ | count(*) | +----------+ | 1638018 | +----------+ 1 row in set (37.778 sec)   MariaDB [test]> select count(*) from orders o where o_comment like '%express%packages%'; +----------+ | count(*) | +----------+ | 1638018 | +----------+ 1 row in set (37.612 sec)

            dleeyh - please test on 1.2 prod, and then on 1.2 DEV (and 1.4 then). Roman did it on 1.2 dev, maybe it regressed between two 1.2 branches. If the same (or statistically close) - just close.

            gdorman Gregory Dorman (Inactive) added a comment - dleeyh - please test on 1.2 prod, and then on 1.2 DEV (and 1.4 then). Roman did it on 1.2 dev, maybe it regressed between two 1.2 branches. If the same (or statistically close) - just close.

            Builds tested: 1.2.5-1, 1.4.2-1 (Released)

            Environment: VirtualBox VMS, 4 vCores, 40GB memory

            1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run.

            I will check the current develop branches next.

            1.2.5-1

            Welcome to the MariaDB monitor. Commands end with ; or \g.
            Your MariaDB connection id is 22
            Server version: 10.3.16-MariaDB-log Columnstore 1.2.5-1

            Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

            MariaDB [tpch10]> select count from orders;
            -----------

            count

            -----------

            150000000

            -----------
            1 row in set (2.172 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (26.337 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (17.063 sec)

            1.4.2-1

            Reading table information for completion of table and column names
            You can turn off this feature to get a quicker startup with -A

            Welcome to the MariaDB monitor. Commands end with ; or \g.
            Your MariaDB connection id is 13
            Server version: 10.4.11-5-MariaDB-enterprise MariaDB Enterprise Server

            Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

            MariaDB [tpch10]> select count from orders;
            -----------

            count

            -----------

            150000000

            -----------
            1 row in set (1.624 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (1 min 4.240 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (59.390 sec)

            MariaDB [tpch10]>

            dleeyh Daniel Lee (Inactive) added a comment - Builds tested: 1.2.5-1, 1.4.2-1 (Released) Environment: VirtualBox VMS, 4 vCores, 40GB memory 1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run. I will check the current develop branches next. 1.2.5-1 Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 22 Server version: 10.3.16-MariaDB-log Columnstore 1.2.5-1 Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch10] > select count from orders; ----------- count ----------- 150000000 ----------- 1 row in set (2.172 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (26.337 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (17.063 sec) 1.4.2-1 Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 13 Server version: 10.4.11-5-MariaDB-enterprise MariaDB Enterprise Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch10] > select count from orders; ----------- count ----------- 150000000 ----------- 1 row in set (1.624 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (1 min 4.240 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (59.390 sec) MariaDB [tpch10] >

            So, Roman - the evidence so far is that some massive perf regression was introduced into both 1.2.dev and 1.4.dev. Afraid we can't go out like that, sorry.

            From Daniel:
            1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run.
            I will check the current develop branches next.

            gdorman Gregory Dorman (Inactive) added a comment - So, Roman - the evidence so far is that some massive perf regression was introduced into both 1.2.dev and 1.4.dev. Afraid we can't go out like that, sorry. From Daniel: 1.2.5-1 is 2.5x faster than 1.4.2-1 for disk run and 3.5x for cached run. I will check the current develop branches next.

            Build tested: 1.4.3-1, built from source

            I had issues building 1.2.6-1 and 1.2.6-1 is not available from buildbot
            Tested on 1.4.3-1 only.

            1.4.3-1 is 60% slower than 1.4.2-1 for both disk and cached runs

            I did make sure the test VM is the only VM existed in my test server.

            server branch: 10.4-enterprise, commit: 6a8bf09d0dfd012d21fd7c8c542ad46215d1547e
            engine branch: develop-1.4, commit: 5efa6a4dc52129be2de49fdfc23e44020401b86b

            ---------

            Reading table information for completion of table and column names
            You can turn off this feature to get a quicker startup with -A

            Welcome to the MariaDB monitor. Commands end with ; or \g.
            Your MariaDB connection id is 9
            Server version: 10.4.12-6-MariaDB Source distribution

            Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

            MariaDB [tpch10]> select count from orders;
            -----------

            count

            -----------

            150000000

            -----------
            1 row in set (1.638 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (1 min 42.036 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (1 min 31.760 sec)

            MariaDB [tpch10]>

            dleeyh Daniel Lee (Inactive) added a comment - Build tested: 1.4.3-1, built from source I had issues building 1.2.6-1 and 1.2.6-1 is not available from buildbot Tested on 1.4.3-1 only. 1.4.3-1 is 60% slower than 1.4.2-1 for both disk and cached runs I did make sure the test VM is the only VM existed in my test server. server branch: 10.4-enterprise, commit: 6a8bf09d0dfd012d21fd7c8c542ad46215d1547e engine branch: develop-1.4, commit: 5efa6a4dc52129be2de49fdfc23e44020401b86b --------- Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 9 Server version: 10.4.12-6-MariaDB Source distribution Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [tpch10] > select count from orders; ----------- count ----------- 150000000 ----------- 1 row in set (1.638 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (1 min 42.036 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (1 min 31.760 sec) MariaDB [tpch10] >

            reopen per latest test result

            dleeyh Daniel Lee (Inactive) added a comment - reopen per latest test result

            LIKE and NOT LIKE are implemented internally using regular expressions. In 1.2.6, while fixing other things, we changed from using posix regex to using boost regex. Who woulda thought boost would be 4 times slower? Switched pack to posix regex.

            In C++x11, there's a std::regex, which may or may not be better to use.

            David.Hall David Hall (Inactive) added a comment - LIKE and NOT LIKE are implemented internally using regular expressions. In 1.2.6, while fixing other things, we changed from using posix regex to using boost regex. Who woulda thought boost would be 4 times slower? Switched pack to posix regex. In C++x11, there's a std::regex, which may or may not be better to use.

            Google uses RE2 https://github.com/google/re2. We may want to benchmark that.

            David.Hall David Hall (Inactive) added a comment - Google uses RE2 https://github.com/google/re2 . We may want to benchmark that.
            drrtuy Roman added a comment - - edited

            std::regex is as bad as boost:regex so we don't want it unless I missed some magic performance knob that I doubt.

            We surely want to test re2 having this test in mind. Worth to note the test is outdated.

            drrtuy Roman added a comment - - edited std::regex is as bad as boost:regex so we don't want it unless I missed some magic performance knob that I doubt. We surely want to test re2 having this test in mind. Worth to note the test is outdated.

            Build verified: 1.4.3-1 BB nightly

            engine commit:
            8588678

            MariaDB [tpch10]> select count from orders;
            -----------

            count

            -----------

            150000000

            -----------
            1 row in set (1.690 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (26.867 sec)

            MariaDB [tpch10]> select count from orders o where o_comment like '%express%packages%';
            ----------

            count

            ----------

            1610254

            ----------
            1 row in set (16.448 sec)

            dleeyh Daniel Lee (Inactive) added a comment - Build verified: 1.4.3-1 BB nightly engine commit: 8588678 MariaDB [tpch10] > select count from orders; ----------- count ----------- 150000000 ----------- 1 row in set (1.690 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (26.867 sec) MariaDB [tpch10] > select count from orders o where o_comment like '%express%packages%'; ---------- count ---------- 1610254 ---------- 1 row in set (16.448 sec)

            People

              dleeyh Daniel Lee (Inactive)
              LinuxJedi Andrew Hutchings (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.