Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • None
    • 5.6.1
    • ExeMgr, PrimProc
    • None
    • 2021-2, 2021-3, 2021-4, 2021-5

    Description

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(3) CHARACTER SET utf8) ENGINE=ColumnStore;
      INSERT INTO t1 VALUES ('A'),('a');
      SELECT * FROM t1 WHERE a LIKE 'a';
      

      +------+
      | a    |
      +------+
      | a    |
      +------+
      

      Notice, it returns only one row ignoring the fact that the collations is utf8_general_ci, which is case insensitive.

      It should return two rows like MyISAM does:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(3) CHARACTER SET utf8) ENGINE=MyISAM;
      INSERT INTO t1 VALUES ('A'),('a');
      SELECT * FROM t1 WHERE a LIKE 'a';
      

      +------+
      | a    |
      +------+
      | A    |
      | a    |
      +------+
      

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            Hello dleeyh,

            When doing QA for this task can you please check if the change affects performance?

            In MariaDB Server, the code from the collation library behind the LIKE operator is about twice faster than RLIKE:

            MariaDB [test]> SELECT  BENCHMARK(10000000, 'aaa' LIKE 'aaa');
            +---------------------------------------+
            | BENCHMARK(10000000, 'aaa' LIKE 'aaa') |
            +---------------------------------------+
            |                                     0 |
            +---------------------------------------+
            1 row in set (0.405 sec)
            

            MariaDB [test]> SELECT  BENCHMARK(10000000, 'aaa' RLIKE '^aaa$');
            +------------------------------------------+
            | BENCHMARK(10000000, 'aaa' RLIKE '^aaa$') |
            +------------------------------------------+
            |                                        0 |
            +------------------------------------------+
            1 row in set (0.906 sec)
            

            As this change in ColumnStore replaces the regex based LIKE implementation to the MariaDB collation library implementation, I also expect near 2 times performance improvement. It would be nice to verify this.

            Thank you!

            bar Alexander Barkov added a comment - - edited Hello dleeyh , When doing QA for this task can you please check if the change affects performance? In MariaDB Server, the code from the collation library behind the LIKE operator is about twice faster than RLIKE: MariaDB [test]> SELECT BENCHMARK(10000000, 'aaa' LIKE 'aaa' ); + ---------------------------------------+ | BENCHMARK(10000000, 'aaa' LIKE 'aaa' ) | + ---------------------------------------+ | 0 | + ---------------------------------------+ 1 row in set (0.405 sec) MariaDB [test]> SELECT BENCHMARK(10000000, 'aaa' RLIKE '^aaa$' ); + ------------------------------------------+ | BENCHMARK(10000000, 'aaa' RLIKE '^aaa$' ) | + ------------------------------------------+ | 0 | + ------------------------------------------+ 1 row in set (0.906 sec) As this change in ColumnStore replaces the regex based LIKE implementation to the MariaDB collation library implementation, I also expect near 2 times performance improvement. It would be nice to verify this. Thank you!
            dleeyh Daniel Lee (Inactive) added a comment - - edited

            Build tested: 5.6.1 (Drone #1915)

            The issue has been fixed.

            MariaDB [mytest]> DROP TABLE IF EXISTS t1;
            Query OK, 0 rows affected, 1 warning (0.022 sec)
             
            MariaDB [mytest]> CREATE TABLE t1 (a CHAR(3) CHARACTER SET utf8) ENGINE=ColumnStore;
            Query OK, 0 rows affected (0.148 sec)
             
            MariaDB [mytest]> INSERT INTO t1 VALUES ('A'),('a');
            Query OK, 2 rows affected (0.237 sec)
            Records: 2  Duplicates: 0  Warnings: 0
             
            MariaDB [mytest]> SELECT * FROM t1 WHERE a LIKE 'a';
            +------+
            | a    |
            +------+
            | A    |
            | a    |
            +------+
            2 rows in set (0.054 sec)
            

            Still need to check performance

            dleeyh Daniel Lee (Inactive) added a comment - - edited Build tested: 5.6.1 (Drone #1915) The issue has been fixed. MariaDB [mytest]> DROP TABLE IF EXISTS t1; Query OK, 0 rows affected, 1 warning (0.022 sec)   MariaDB [mytest]> CREATE TABLE t1 (a CHAR(3) CHARACTER SET utf8) ENGINE=ColumnStore; Query OK, 0 rows affected (0.148 sec)   MariaDB [mytest]> INSERT INTO t1 VALUES ('A'),('a'); Query OK, 2 rows affected (0.237 sec) Records: 2 Duplicates: 0 Warnings: 0   MariaDB [mytest]> SELECT * FROM t1 WHERE a LIKE 'a'; +------+ | a | +------+ | A | | a | +------+ 2 rows in set (0.054 sec) Still need to check performance
            dleeyh Daniel Lee (Inactive) added a comment - - edited

            MariaDB [(none)]> SELECT  BENCHMARK(10000000, 'aaa' LIKE 'aaa');
            +---------------------------------------+
            | BENCHMARK(10000000, 'aaa' LIKE 'aaa') |
            +---------------------------------------+
            |                                     0 |
            +---------------------------------------+
            1 row in set (1.557 sec)
             
            MariaDB [(none)]> SELECT  BENCHMARK(10000000, 'aaa' RLIKE '^aaa$');
            +------------------------------------------+
            | BENCHMARK(10000000, 'aaa' RLIKE '^aaa$') |
            +------------------------------------------+
            |                                        0 |
            +------------------------------------------+
            1 row in set (2.940 sec)
            

            dleeyh Daniel Lee (Inactive) added a comment - - edited MariaDB [(none)]> SELECT BENCHMARK(10000000, 'aaa' LIKE 'aaa'); +---------------------------------------+ | BENCHMARK(10000000, 'aaa' LIKE 'aaa') | +---------------------------------------+ | 0 | +---------------------------------------+ 1 row in set (1.557 sec)   MariaDB [(none)]> SELECT BENCHMARK(10000000, 'aaa' RLIKE '^aaa$'); +------------------------------------------+ | BENCHMARK(10000000, 'aaa' RLIKE '^aaa$') | +------------------------------------------+ | 0 | +------------------------------------------+ 1 row in set (2.940 sec)
            bar Alexander Barkov added a comment - - edited

            Sorry, I must have not been precise enough about benchmarking.
            The goal is to test how the patch affects the LIKE operator performance in ColumnStore tables.
            I think it could be done as follows::

            • Create a ColumnStore table with a huge amount of data
            • Run a query involving LIKE using a binary without this patch (using any latest develop-5 binary), e.g:

              SELECT * FROM huge_table WHERE some_column LIKE '... some pattern ..';
              

              Or the query can use COUNT to reduce the overhead needed to send values over the network:

              SELECT COUNT(*) FROM huge_table WHERE some_column LIKE '... some pattern ..';
              

            • Run the same query using a binary with the patch for MCOL-4498
            • Measure the performance difference
            • Repeat the above procedure for TEXT, VARCHAR, long CHAR and short CHAR columns

            The tests that you pasted in the last comment with queries like this:

            SELECT BENCHMARK(10000000, 'aaa' LIKE 'aaa');
            

            do not involve ColumnStore at all. They are run on the server side completely and don't demonstrate how the patch affected LIKE performance in ColumnStore. I gave them only to show that in MariaDB, LIKE is faster than RLIKE. Therefore, in ColumnStore new LIKE implementation will hopefully be also faster than the old regex based one.

            bar Alexander Barkov added a comment - - edited Sorry, I must have not been precise enough about benchmarking. The goal is to test how the patch affects the LIKE operator performance in ColumnStore tables. I think it could be done as follows:: Create a ColumnStore table with a huge amount of data Run a query involving LIKE using a binary without this patch (using any latest develop-5 binary), e.g: SELECT * FROM huge_table WHERE some_column LIKE '... some pattern ..' ; Or the query can use COUNT to reduce the overhead needed to send values over the network: SELECT COUNT (*) FROM huge_table WHERE some_column LIKE '... some pattern ..' ; Run the same query using a binary with the patch for MCOL-4498 Measure the performance difference Repeat the above procedure for TEXT, VARCHAR, long CHAR and short CHAR columns The tests that you pasted in the last comment with queries like this: SELECT BENCHMARK(10000000, 'aaa' LIKE 'aaa' ); do not involve ColumnStore at all. They are run on the server side completely and don't demonstrate how the patch affected LIKE performance in ColumnStore. I gave them only to show that in MariaDB, LIKE is faster than RLIKE. Therefore, in ColumnStore new LIKE implementation will hopefully be also faster than the old regex based one.

            drrtuy, benchmarks showed a good performance improvement. Please continue with the review. Thanks.

            bar Alexander Barkov added a comment - drrtuy , benchmarks showed a good performance improvement. Please continue with the review. Thanks.
            drrtuy Roman added a comment -

            Closing b/c dleeyh had tested the patch.

            drrtuy Roman added a comment - Closing b/c dleeyh had tested the patch.

            People

              drrtuy Roman
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.