[MDEV-22303] Incorrect ordering with REGEXP_REPLACE and OFFSET/LIMIT Created: 2020-04-19  Updated: 2020-06-01  Resolved: 2020-06-01

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5.2, 10.5
Fix Version/s: 10.5.4

Type: Bug Priority: Major
Reporter: Adam Johnson Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: regression
Environment:

Running through Docker: 10.5.2-MariaDB-1:10.5.2+maria~bionic mariadb.org binary distribution



 Description   

Reproducing SQL test:

CREATE DATABASE test DEFAULT CHARACTER SET utf8mb4;
USE test;
CREATE TABLE author (name VARCHAR(32) NOT NULL);
INSERT INTO author (name) VALUES ('Charles Dickens'), ('Roald Dahl');
SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC;
SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1;
SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1 OFFSET 1;

(Based on a failing test in Django-MySQL: https://github.com/adamchainz/django-mysql/blob/2e4bb08488f737306f22b29ff217e2a33823d276/tests/testapp/test_functions.py#L634 ).

Tested using latest Docker images for 10.4 and 10.5.

On MariaDB 10.4, the LIMIT/OFFSET don't chage the order:

$ mysql -h 127.0.0.1 -P 3306 -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.4.12-MariaDB-1:10.4.12+maria~bionic mariadb.org binary distribution
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
root@127.0.0.1 [1]> CREATE DATABASE test DEFAULT CHARACTER SET utf8mb4;
Query OK, 1 row affected (0.002 sec)
 
root@127.0.0.1 [2]> USE test;
Database changed
root@127.0.0.1 [3]> CREATE TABLE author (name VARCHAR(32) NOT NULL);
Query OK, 0 rows affected (0.012 sec)
 
root@127.0.0.1 [4]> INSERT INTO author (name) VALUES ('Charles Dickens'), ('Roald Dahl');
Query OK, 2 rows affected (0.006 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
root@127.0.0.1 [5]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC;
+-----------------+------------------+
| name            | surname_first    |
+-----------------+------------------+
| Roald Dahl      | Dahl, Roald      |
| Charles Dickens | Dickens, Charles |
+-----------------+------------------+
2 rows in set (0.002 sec)
 
root@127.0.0.1 [6]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1;
+------------+---------------+
| name       | surname_first |
+------------+---------------+
| Roald Dahl | Dahl, Roald   |
+------------+---------------+
1 row in set (0.002 sec)
 
root@127.0.0.1 [7]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
+-----------------+------------------+
| name            | surname_first    |
+-----------------+------------------+
| Charles Dickens | Dickens, Charles |
+-----------------+------------------+
1 row in set (0.003 sec)

On MariaDB 10.5, using LIMIT/OFFSET reverses the order

$ mysql -h 127.0.0.1 -P 3306 -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.5.2-MariaDB-1:10.5.2+maria~bionic mariadb.org binary distribution
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
root@127.0.0.1 [1]> CREATE DATABASE test DEFAULT CHARACTER SET utf8mb4;
Query OK, 1 row affected (0.002 sec)
 
root@127.0.0.1 [2]> USE test;
Database changed
root@127.0.0.1 [3]> CREATE TABLE author (name VARCHAR(32) NOT NULL);
Query OK, 0 rows affected (0.009 sec)
 
root@127.0.0.1 [4]> INSERT INTO author (name) VALUES ('Charles Dickens'), ('Roald Dahl');
Query OK, 2 rows affected (0.007 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
root@127.0.0.1 [5]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC;
+-----------------+------------------+
| name            | surname_first    |
+-----------------+------------------+
| Roald Dahl      | Dahl, Roald      |
| Charles Dickens | Dickens, Charles |
+-----------------+------------------+
2 rows in set (0.003 sec)
 
root@127.0.0.1 [6]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1;
+-----------------+------------------+
| name            | surname_first    |
+-----------------+------------------+
| Charles Dickens | Dickens, Charles |
+-----------------+------------------+
1 row in set (0.002 sec)
 
root@127.0.0.1 [7]> SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM author ORDER BY surname_first ASC LIMIT 1 OFFSET 1;
+------------+---------------+
| name       | surname_first |
+------------+---------------+
| Roald Dahl | Dahl, Roald   |
+------------+---------------+
1 row in set (0.002 sec)



 Comments   
Comment by Alice Sherepa [ 2020-04-20 ]

Thanks for the report! I repeated as described on 10.5 27d9986c1b39dcb061dff9f5ade79

Comment by Alexander Barkov [ 2020-05-29 ]

Repeatable by this script:

CREATE OR REPLACE TABLE t1 (name VARCHAR(32) CHARACTER SET utf8mb4 NOT NULL);
INSERT INTO t1 (name) VALUES ('Charles Dickens'), ('Roald Dahl');
SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM t1 ORDER BY surname_first ASC;
SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM t1 ORDER BY surname_first ASC LIMIT 1;
SELECT name, REGEXP_REPLACE(name, '^(.*) (.*)$', '\\2, \\1') AS surname_first FROM t1 ORDER BY surname_first ASC LIMIT 1 OFFSET 1;

Comment by Alexander Barkov [ 2020-05-29 ]

The problem happens because when my_charset_utf8mb4_general_ci is called:

#0  my_strnxfrm_utf8mb4_general_ci (
    cs=0x20deba0 <my_charset_utf8mb4_general_ci>, 
    dst=0x7fffb0082fcc "Dahl, Roald", dstlen=512, nweights=4194304, 
    src=0x7fffb0082fcc "Dahl, Roald", srclen=11, flags=192)
    at /home/bar/maria-git/server.10.5/strings/strcoll.ic:418

the value of "src" and "dst" is the same:

(gdb) p	src
$50 = (const uchar *) 0x7fffb0082fcc "Dahl, Roald"
(gdb) p	dst
$51 = (uchar *) 0x7fffb0082fcc "Dahl, Roald"

It happens because Sort_param::tmp_buffer is NULL.

Comment by Varun Gupta (Inactive) [ 2020-05-30 ]

the branch is here
https://github.com/MariaDB/server/commits/10.5-mdev22303
have 2 commits, one for the fix and the other for the refactorting changes, after review i can squash them into one

Comment by Alexander Barkov [ 2020-05-30 ]

Both patches look good for me. Ok to push. Thanks!

Generated at Thu Feb 08 09:13:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.