Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34352

Execution time for Block Nested Loop Hash (BNLH) join with charset utf8mb4 is 7-20 times slower than with latin1

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Duplicate
    • 10.5, 10.6, 10.11, 11.1(EOL), 11.2, 11.4, 11.6, 11.5(EOL)
    • N/A
    • Character Sets

    Description

      2024-06-21 Update

      This issue appeared to be a combination of two problems:

      • MDEV-34417 Wrong result set with utf8mb4_danish_ci and BNLH join
      • MDEV-34427 BNL-H has not optimal implementation for varchar type

      All further work will be done under terms of the above two tickets.

      The original report

      Testcase:
      1. Enable Block Nested Loop Hash (BNLH):

      set join_cache_level =3;
      

      2. Variables join_buffer_size and join_buffer_space_limit were used with default value (They can be changed, but the difference in execution time will still remain)

      +-------------------------+----------------------+
      | Variable_name           | Value                |
      +-------------------------+----------------------+
      | join_buffer_size        | 262144               |
      | join_buffer_space_limit | 2097152              |
      +-------------------------+----------------------+
      

      3. Create table with charset=latin1 or charset=utf8mb4:

      CREATE TABLE t1(col1 VARCHAR(500)) engine=INNODB character set latin1 collate latin1_swedish_ci;
      CREATE TABLE t2(col2 VARCHAR(500)) engine=INNODB character set latin1 collate latin1_swedish_ci;
      

      or

      CREATE TABLE t1(col1 VARCHAR(500)) engine=INNODB character set utf8mb4 collate utf8mb4_general_ci;
      CREATE TABLE t2(col2 VARCHAR(500)) engine=INNODB character set utf8mb4 collate utf8mb4_general_ci;
      

      4. Fill out the tables

      insert into t1 select  FLOOR(1 + (RAND() * 200000)) from seq_1_to_200000;
      insert into t2 select  FLOOR(1 + (RAND() * 20000)) from seq_1_to_200000;
      

      5. Execute the query:

      select max(A.col1) from t1 A, t2 B where A.col1=B.col2;
      

      Results for v. 11.6:
      Debug build:

      latin1 utf8mb4
      7 min 6,426 sec 54 min 2,303 sec

      Release build:

      latin1 utf8mb4
      17,266 sec 8 min 21,989 sec

      The results of statistics collected using perf are also attached to the task.

      Attachments

        Issue Links

          Activity

            People

              bar Alexander Barkov
              lstartseva Lena Startseva
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.