  1. MariaDB ColumnStore
  2. MCOL-4534

MariaDB collation library: improve comparison performance in 8bit nopad_bin collations

Details

    • Bug
    • Status: Stalled
    • Major
    • Resolution: Unresolved
    • 5.5.1
    • 23.10
    • MariaDB Server
    • None
    • 2021-2, 2021-3, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12

    Description

      2021-03-02 Update

      Before starting work on this issue, we probably need to implement MCOL-4568 first. The values passed to the new functions in the MariaDB collation library must be padded with spaces rather than zero bytes.

      2021-02-15 Update

      Note that the technique described below is only valid for xxx_nopad_bin collations (those having the NO PAD attribute).
      It would not be correct to apply the same improvement to xxx_bin (PAD SPACE) collations, because the handling of trailing spaces would change!

      Actual description

      After collation support was added to ColumnStore, the performance of the comparison operator degraded for short CHAR columns, even for _bin collations.
      This happened because:

      • Before ColumnStore became collation-aware, it compared short CHAR values as uint32 or uint64 numbers.
      • Since collations were added, ColumnStore delegates comparison to a call to CHARSET_INFO::strnncollsp(). The latter compares the data as strings (even for _bin collations), which is slower than comparing numbers.
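The difference between the two approaches can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the big-endian load is an assumption made here so that numeric order agrees with byte-wise order on any host. The issue does not say how ColumnStore actually lays out the bytes in memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of MariaDB or ColumnStore): load 4 bytes
   as a big-endian uint32, so that numeric order matches byte-wise order
   regardless of host endianness. */
static uint32_t demo_load_be32(const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
         ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* Numeric comparison of two 4-byte values: returns -1, 0 or 1. */
static int demo_cmp4_as_number(const unsigned char *a, const unsigned char *b)
{
  uint32_t x = demo_load_be32(a);
  uint32_t y = demo_load_be32(b);
  return (x > y) - (x < y);
}

/* Byte-wise string comparison of the same data, for reference:
   returns a negative, zero or positive value. */
static int demo_cmp4_as_string(const unsigned char *a, const unsigned char *b)
{
  return memcmp(a, b, 4);
}
```

Both functions agree on the sign of the result for any pair of 4-byte inputs. The numeric version needs one integer comparison once the values are loaded, while the string version goes through a general-purpose byte loop.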

      The attached patch reconstructs the old ColumnStore behavior inside the MariaDB collation library. It makes comparison measurably faster for CHAR(4), as the benchmarks below show.

      Benchmarking a 10.5 RelWithDebInfo build before and after applying the patch:

      Comparison performance for INT and latin1_swedish_ci (for reference)

      select benchmark(100000000,1111=1111);
      1 row in set (0.689 sec)
       
      SET NAMES latin1; select benchmark(100000000,'1111'='1111');
      1 row in set (0.975 sec)
      

      Comparison performance for BINARY, CHAR(4) latin1_bin, CHAR(4) latin1_nopad_bin

      For `clean 10.5` versus `10.5 with patch applied`

      SET NAMES binary; select benchmark(100000000,'1111'='1111');
      1 row in set (0.958 sec) -- clean
      1 row in set (0.839 sec) -- after the patch
       
      SET NAMES latin1 COLLATE latin1_bin; select benchmark(100000000,'1111'='1111');
      1 row in set (1.086 sec) -- before the patch
      1 row in set (0.812 sec) -- after the patch
       
      SET NAMES latin1 COLLATE latin1_nopad_bin; select benchmark(100000000,'1111'='1111');
      1 row in set (1.066 sec) -- before the patch
      1 row in set (0.852 sec) -- after the patch
      

      Note that comparison of CHAR(4) latin1_bin, relative to comparison of INT data, is:

      • 1.086÷0.689 = 1.58 times slower in the clean version
      • 0.812÷0.689 = 1.18 times slower in the patched version

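      The patch gives a good performance improvement: comparison time drops by about 25% for latin1_bin (1.086 sec to 0.812 sec), about 20% for latin1_nopad_bin (1.066 sec to 0.852 sec) and about 12% for binary (0.958 sec to 0.839 sec).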
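For anyone re-running the benchmark, the ratios above can be recomputed with a small sketch like this one. The function names are hypothetical; the only inputs are the timings quoted in this issue.

```c
/* How many times slower the time `t` is than the reference time `ref`. */
static double demo_slowdown(double t, double ref)
{
  return t / ref;
}

/* Fraction of time saved when going from `before` to `after`. */
static double demo_time_saved(double before, double after)
{
  return (before - after) / before;
}
```

With the latin1_bin timings, demo_slowdown(1.086, 0.689) is about 1.58, demo_slowdown(0.812, 0.689) is about 1.18, and demo_time_saved(1.086, 0.812) is about 0.25.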

      Let's add new methods into my_collation_handler_st, with the following tentative API:

        int     (*strnncollsp_4bytes)(CHARSET_INFO *,
                                      const uchar *a,
                                      const uchar *b);
        int     (*strnncollsp_8bytes)(CHARSET_INFO *,
                                      const uchar *a,
                                      const uchar *b);
      
      

      (and corresponding wrapper methods in CHARSET_INFO).

      So ColumnStore will be able to use these optimized comparison functions for short CHAR and VARCHAR data.

      ColumnStore stores short CHAR values in memory in numeric format, either in 4 bytes or in 8 bytes, depending on the width. So it will use:

      • strnncollsp_4bytes() for CHAR(1), CHAR(2), CHAR(3), CHAR(4)
      • strnncollsp_8bytes() for CHAR(5), CHAR(6), CHAR(7), CHAR(8)
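The width-based choice above might look roughly like the sketch below. Everything prefixed with demo_ or DEMO_ is hypothetical: DEMO_CHARSET_INFO is a stand-in for the real CHARSET_INFO, and the two comparators are one possible implementation for 8-bit nopad_bin collations, not the attached patch. Values are assumed to be already space-padded to 4 or 8 bytes, as required by the 2021-03-02 update.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned char uchar;
typedef struct demo_charset_info DEMO_CHARSET_INFO; /* stand-in, not CHARSET_INFO */

typedef int (*demo_cmp_fn)(const DEMO_CHARSET_INFO *,
                           const uchar *a, const uchar *b);

struct demo_charset_info
{
  demo_cmp_fn strnncollsp_4bytes; /* mirrors the tentative API above */
  demo_cmp_fn strnncollsp_8bytes;
};

/* Load n (<= 8) bytes as a big-endian integer, so that numeric order
   matches byte-wise order on any host. */
static uint64_t demo_load_be(const uchar *p, size_t n)
{
  uint64_t v = 0;
  for (size_t i = 0; i < n; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Binary comparison of two 4-byte values: returns -1, 0 or 1. */
static int demo_bin_cmp4(const DEMO_CHARSET_INFO *cs,
                         const uchar *a, const uchar *b)
{
  uint64_t x = demo_load_be(a, 4), y = demo_load_be(b, 4);
  (void) cs; /* a binary comparison needs no collation data */
  return (x > y) - (x < y);
}

/* Binary comparison of two 8-byte values: returns -1, 0 or 1. */
static int demo_bin_cmp8(const DEMO_CHARSET_INFO *cs,
                         const uchar *a, const uchar *b)
{
  uint64_t x = demo_load_be(a, 8), y = demo_load_be(b, 8);
  (void) cs;
  return (x > y) - (x < y);
}

/* Pick a comparator from the CHAR width: 1..4 use the 4-byte function,
   5..8 use the 8-byte function, anything else returns NULL so that the
   caller can fall back to the generic strnncollsp(). */
static demo_cmp_fn demo_pick_comparator(const DEMO_CHARSET_INFO *cs,
                                        unsigned width)
{
  if (width >= 1 && width <= 4)
    return cs->strnncollsp_4bytes;
  if (width >= 5 && width <= 8)
    return cs->strnncollsp_8bytes;
  return NULL;
}
```

A caller would resolve the comparator once per column, not once per row, so the width check stays out of the hot comparison loop.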


            People

              Assignee: Unassigned
              Reporter: Alexander Barkov (bar)