Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22303

Incorrect ordering with REGEXP_REPLACE and OFFSET/LIMIT

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.5.2, 10.5
    • 10.5.4
    • Optimizer
    • Running through Docker: 10.5.2-MariaDB-1:10.5.2+maria~bionic mariadb.org binary distribution

    Description

      Reproducing SQL test:

      CREATE DATABASE test DEFAULT CHARACTER SET utf8mb4;
      USE test;
      CREATE TABLE author (name VARCHAR(32) NOT NULL);
      INSERT INTO author (name) VALUES ('Charles Dickens'), ('Roald Dahl');
      SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC;
      SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1;
      SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
      

      (Based on a failing test in Django-MySQL: https://github.com/adamchainz/django-mysql/blob/2e4bb08488f737306f22b29ff217e2a33823d276/tests/testapp/test_functions.py#L634 ).

      Tested using latest Docker images for 10.4 and 10.5.

      On MariaDB 10.4, the LIMIT/OFFSET don't chage the order:

      $ mysql -h 127.0.0.1 -P 3306 -u root
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 8
      Server version: 10.4.12-MariaDB-1:10.4.12+maria~bionic mariadb.org binary distribution
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      root@127.0.0.1 [1]> CREATE DATABASE test DEFAULT CHARACTER SET utf8mb4;
      Query OK, 1 row affected (0.002 sec)
       
      root@127.0.0.1 [2]> USE test;
      Database changed
      root@127.0.0.1 [3]> CREATE TABLE author (name VARCHAR(32) NOT NULL);
      Query OK, 0 rows affected (0.012 sec)
       
      root@127.0.0.1 [4]> INSERT INTO author (name) VALUES ('Charles Dickens'), ('Roald Dahl');
      Query OK, 2 rows affected (0.006 sec)
      Records: 2  Duplicates: 0  Warnings: 0
       
      root@127.0.0.1 [5]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC;
      +-----------------+------------------+
      | name            | surname_first    |
      +-----------------+------------------+
      | Roald Dahl      | Dahl, Roald      |
      | Charles Dickens | Dickens, Charles |
      +-----------------+------------------+
      2 rows in set (0.002 sec)
       
      root@127.0.0.1 [6]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1;
      +------------+---------------+
      | name       | surname_first |
      +------------+---------------+
      | Roald Dahl | Dahl, Roald   |
      +------------+---------------+
      1 row in set (0.002 sec)
       
      root@127.0.0.1 [7]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
      +-----------------+------------------+
      | name            | surname_first    |
      +-----------------+------------------+
      | Charles Dickens | Dickens, Charles |
      +-----------------+------------------+
      1 row in set (0.003 sec)
      

      On MariaDB 10.5, using LIMIT/OFFSET reverses the order

      $ mysql -h 127.0.0.1 -P 3306 -u root
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 3
      Server version: 10.5.2-MariaDB-1:10.5.2+maria~bionic mariadb.org binary distribution
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      root@127.0.0.1 [1]> CREATE DATABASE test DEFAULT CHARACTER SET utf8mb4;
      Query OK, 1 row affected (0.002 sec)
       
      root@127.0.0.1 [2]> USE test;
      Database changed
      root@127.0.0.1 [3]> CREATE TABLE author (name VARCHAR(32) NOT NULL);
      Query OK, 0 rows affected (0.009 sec)
       
      root@127.0.0.1 [4]> INSERT INTO author (name) VALUES ('Charles Dickens'), ('Roald Dahl');
      Query OK, 2 rows affected (0.007 sec)
      Records: 2  Duplicates: 0  Warnings: 0
       
      root@127.0.0.1 [5]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC;
      +-----------------+------------------+
      | name            | surname_first    |
      +-----------------+------------------+
      | Roald Dahl      | Dahl, Roald      |
      | Charles Dickens | Dickens, Charles |
      +-----------------+------------------+
      2 rows in set (0.003 sec)
       
      root@127.0.0.1 [6]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1;
      +-----------------+------------------+
      | name            | surname_first    |
      +-----------------+------------------+
      | Charles Dickens | Dickens, Charles |
      +-----------------+------------------+
      1 row in set (0.002 sec)
       
      root@127.0.0.1 [7]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
      +------------+---------------+
      | name       | surname_first |
      +------------+---------------+
      | Roald Dahl | Dahl, Roald   |
      +------------+---------------+
      1 row in set (0.002 sec)
      

      Attachments

        Activity

          alice Alice Sherepa added a comment -

          Thanks for the report! I repeated as described on 10.5 27d9986c1b39dcb061dff9f5ade79

          alice Alice Sherepa added a comment - Thanks for the report! I repeated as described on 10.5 27d9986c1b39dcb061dff9f5ade79
          bar Alexander Barkov added a comment - - edited

          Repeatable by this script:

          CREATE OR REPLACE TABLE t1 (name VARCHAR(32) CHARACTER SET utf8mb4 NOT NULL);
          INSERT INTO t1 (name) VALUES ('Charles Dickens'), ('Roald Dahl');
          SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM t1 ORDER BY surname_first ASC;
          SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM t1 ORDER BY surname_first ASC LIMIT 1;
          SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM t1 ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
          

          bar Alexander Barkov added a comment - - edited Repeatable by this script: CREATE OR REPLACE TABLE t1 ( name VARCHAR (32) CHARACTER SET utf8mb4 NOT NULL ); INSERT INTO t1 ( name ) VALUES ( 'Charles Dickens' ), ( 'Roald Dahl' ); SELECT name , REGEXP_REPLACE( name , '^(.*) (.*)$' , '\\2, \\1' ) AS surname_first FROM t1 ORDER BY surname_first ASC ; SELECT name , REGEXP_REPLACE( name , '^(.*) (.*)$' , '\\2, \\1' ) AS surname_first FROM t1 ORDER BY surname_first ASC LIMIT 1; SELECT name , REGEXP_REPLACE( name , '^(.*) (.*)$' , '\\2, \\1' ) AS surname_first FROM t1 ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
          bar Alexander Barkov added a comment - - edited

          The problem happens because when my_charset_utf8mb4_general_ci is called:

          #0  my_strnxfrm_utf8mb4_general_ci (
              cs=0x20deba0 <my_charset_utf8mb4_general_ci>, 
              dst=0x7fffb0082fcc "Dahl, Roald", dstlen=512, nweights=4194304, 
              src=0x7fffb0082fcc "Dahl, Roald", srclen=11, flags=192)
              at /home/bar/maria-git/server.10.5/strings/strcoll.ic:418
          

          the value of "src" and "dst" is the same:

          (gdb) p	src
          $50 = (const uchar *) 0x7fffb0082fcc "Dahl, Roald"
          (gdb) p	dst
          $51 = (uchar *) 0x7fffb0082fcc "Dahl, Roald"
          

          It happens because Sort_param::tmp_buffer is NULL.

          bar Alexander Barkov added a comment - - edited The problem happens because when my_charset_utf8mb4_general_ci is called: #0 my_strnxfrm_utf8mb4_general_ci ( cs=0x20deba0 <my_charset_utf8mb4_general_ci>, dst=0x7fffb0082fcc "Dahl, Roald" , dstlen=512, nweights=4194304, src=0x7fffb0082fcc "Dahl, Roald" , srclen=11, flags=192) at /home/bar/maria-git/server.10.5/strings/strcoll.ic:418 the value of "src" and "dst" is the same: (gdb) p src $50 = (const uchar *) 0x7fffb0082fcc "Dahl, Roald" (gdb) p dst $51 = (uchar *) 0x7fffb0082fcc "Dahl, Roald" It happens because Sort_param::tmp_buffer is NULL.

          the branch is here
          https://github.com/MariaDB/server/commits/10.5-mdev22303
          have 2 commits, one for the fix and the other for the refactorting changes, after review i can squash them into one

          varun Varun Gupta (Inactive) added a comment - the branch is here https://github.com/MariaDB/server/commits/10.5-mdev22303 have 2 commits, one for the fix and the other for the refactorting changes, after review i can squash them into one

          Both patches look good for me. Ok to push. Thanks!

          bar Alexander Barkov added a comment - Both patches look good for me. Ok to push. Thanks!

          People

            varun Varun Gupta (Inactive)
            adamchainz Adam Johnson
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.