[MDEV-25858] Query results are incorrect when indexes are added. Created: 2021-06-04  Updated: 2022-01-26  Resolved: 2021-07-15

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Subquery
Affects Version/s: 10.5.10, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4

Type: Bug Priority: Major
Reporter: Wataru Maeda Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.9, Docker


Attachments: File insert_data.sql     File select.sql     File setup.sql    
Issue Links:
Relates
relates to MDEV-27270 Wrong query plan with Range Checked f... Closed

 Description   

I am using MariaDB for several products.

When I inserted data in a table with indexes pre-added, I sometimes did not get the expected search results.

I tried this with Mariadb 10.5.10 on Docker container.

First, start the MariaDB container on Linux (CentOS 7.9).

docker run -p 3306:3306 --name test-mariadb -e MARIADB_ROOT_PASSWORD=12345678 -d mariadb:10.5.10

Next, I ran the following on my MacBook.

lobin@Nagisa relief-map % mysql -h192.168.59.109 -uroot -p12345678 < setup.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
lobin@Nagisa relief-map % mysql -h192.168.59.109 -uroot -p12345678 test1 < insert_data.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
lobin@Nagisa relief-map % mysql -h192.168.59.109 -uroot -p12345678 --table test1 < select.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+----------+---------------+------+-------------+-------------+---------------------+---------------------+---------------------+
| ID | Facility | prefecture_id | ID   | Facility_ID | damage_type | comment             | regist_date         | update_date         |
+----+----------+---------------+------+-------------+-------------+---------------------+---------------------+---------------------+
|  1 | having0  |            27 | NULL |        NULL |        NULL | NULL                | NULL                | NULL                |
|  2 | having1  |            27 |    1 |           2 |           4 | this is latest.     | 2019-03-05 00:00:00 | 2019-03-06 00:00:00 |
|  3 | having2  |            27 |    2 |           3 |           4 | this is not latest. | 2019-03-05 00:00:00 | 2019-03-06 00:00:00 |
+----+----------+---------------+------+-------------+-------------+---------------------+---------------------+---------------------+
lobin@Nagisa relief-map %

This is not the result I was expecting.

Next, I removed the index created at the end of `setup.sql` and ran `select.sql` again, and got the expected results.

lobin@Nagisa relief-map % mysql -h192.168.59.109 -uroot -p12345678 --table test1
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 42
Server version: 5.5.5-10.5.10-MariaDB-1:10.5.10+maria~focal mariadb.org binary distribution
 
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> DROP INDEX `for_latest_sort` ON `facility2_damages`
    -> ;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
mysql> ^DBye
lobin@Nagisa relief-map % mysql -h192.168.59.109 -uroot -p12345678 --table test1 < select.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+----------+---------------+------+-------------+-------------+------------------+---------------------+---------------------+
| ID | Facility | prefecture_id | ID   | Facility_ID | damage_type | comment          | regist_date         | update_date         |
+----+----------+---------------+------+-------------+-------------+------------------+---------------------+---------------------+
|  1 | having0  |            27 | NULL |        NULL |        NULL | NULL             | NULL                | NULL                |
|  2 | having1  |            27 |    1 |           2 |           4 | this is latest.  | 2019-03-05 00:00:00 | 2019-03-06 00:00:00 |
|  3 | having2  |            27 |    3 |           3 |           4 | this is latest2. | 2019-03-06 00:00:00 | 2019-03-05 00:00:00 |
+----+----------+---------------+------+-------------+-------------+------------------+---------------------+---------------------+
lobin@Nagisa relief-map %

After that, I re-added the same index, and the correct search results remained.
It seems that adding the index first and then inserting the data will result in incorrect search results.

I also tried with MariaDB 10.2.38 and the results were the same.

I think It might be related to the fact that I'm trying to reference the index in the correlation subquery.

I tried the same with MySQL 5.7.34 and got the correct results without reindexing.

# start a MySQL container On CentOS
docker run -p 3307:3306 --name test-mysql -e MYSQL_ROOT_PASSWORD=12345678 -d mysql:5.7.34

lobin@Nagisa relief-map % mysql -h192.168.59.109 -P3307 -uroot -p12345678 < setup.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
lobin@Nagisa relief-map % mysql -h192.168.59.109 -P3307 -uroot -p12345678 test1 < insert_data.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
lobin@Nagisa relief-map % mysql -h192.168.59.109 -P3307 -uroot -p12345678 --table  test1 < select.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+----------+---------------+------+-------------+-------------+------------------+---------------------+---------------------+
| ID | Facility | prefecture_id | ID   | Facility_ID | damage_type | comment          | regist_date         | update_date         |
+----+----------+---------------+------+-------------+-------------+------------------+---------------------+---------------------+
|  2 | having1  |            27 |    1 |           2 |           4 | this is latest.  | 2019-03-05 00:00:00 | 2019-03-06 00:00:00 |
|  3 | having2  |            27 |    3 |           3 |           4 | this is latest2. | 2019-03-06 00:00:00 | 2019-03-05 00:00:00 |
|  1 | having0  |            27 | NULL |        NULL |        NULL | NULL             | NULL                | NULL                |
+----+----------+---------------+------+-------------+-------------+------------------+---------------------+---------------------+
lobin@Nagisa relief-map %

thanx.



 Comments   
Comment by Alice Sherepa [ 2021-06-04 ]

Thanks for the report!
I repeated on 10.2-10.5:

--source include/have_innodb.inc
 
CREATE TABLE t1 (id int NOT NULL PRIMARY KEY) engine=innodb ;
insert into t1 values (1),(2),(3);
 
CREATE TABLE t2 (id int NOT NULL PRIMARY KEY, id2 int NOT NULL, d1 datetime, d2 timestamp NOT NULL,
 KEY id2 (id2)) engine=innodb ;
insert into t2 values (1,2,'2019-03-05 00:00:00','2019-03-06 00:00:00'),(2,3,'2019-03-05 00:00:00','2019-03-06 00:00:00'),(3,3,'2019-03-06 00:00:00','2019-03-05 00:00:00');
 
select t1.id,t2.id from t1 left join t2 
on t2.id2 = t1.id
    and (t2.id = (select dd.id from t2 dd 
    where dd.id2 = t1.id and d1 > '2019-02-06 00:00:00' 
    order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 ));
 
create index for_latest_sort on t2 (d1 desc, d2 desc, id desc);
 
select t1.id,t2.id from t1 left join t2 
on t2.id2 = t1.id
    and (t2.id = (select dd.id from t2 dd 
    where dd.id2 = t1.id and d1 > '2019-02-06 00:00:00' 
    order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 ));

MariaDB [test]> select t1.id,t2.id from t1 left join t2 
    -> on t2.id2 = t1.id
    -> and (t2.id = (select dd.id from t2 dd 
    -> where dd.id2 = t1.id and d1 > '2019-02-06 00:00:00' 
    -> order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 ));
+----+------+
| id | id   |
+----+------+
|  1 | NULL |
|  2 |    1 |
|  3 |    3 |
+----+------+
3 rows in set (0.001 sec)
 
MariaDB [test]> analyze select t1.id,t2.id from t1 left join t2  on t2.id2 = t1.id and (t2.id = (select dd.id from t2 dd  where dd.id2 = t1.id and d1 > '2019-02-06 00:00:00'  order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 ));
+------+--------------------+-------+--------+---------------+---------+---------+------+------+--------+----------+------------+-----------------------------+
| id   | select_type        | table | type   | possible_keys | key     | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                       |
+------+--------------------+-------+--------+---------------+---------+---------+------+------+--------+----------+------------+-----------------------------+
|    1 | PRIMARY            | t1    | index  | NULL          | PRIMARY | 4       | NULL | 3    | 3.00   |   100.00 |     100.00 | Using index                 |
|    1 | PRIMARY            | t2    | eq_ref | PRIMARY,id2   | PRIMARY | 4       | func | 1    | 0.67   |   100.00 |     100.00 | Using where                 |
|    2 | DEPENDENT SUBQUERY | dd    | ALL    | id2           | NULL    | NULL    | NULL | 3    | 0.86   |   100.00 |     100.00 | Using where; Using filesort |
+------+--------------------+-------+--------+---------------+---------+---------+------+------+--------+----------+------------+-----------------------------+
3 rows in set (0.001 sec)
 
 
MariaDB [test]> create index for_latest_sort on t2 (d1 desc, d2 desc, id desc);
Query OK, 0 rows affected (0.026 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select t1.id,t2.id from t1 left join t2 
    -> on t2.id2 = t1.id
    -> and (t2.id = (select dd.id from t2 dd 
    -> where dd.id2 = t1.id and d1 > '2019-02-06 00:00:00' 
    -> order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 ));
+----+------+
| id | id   |
+----+------+
|  1 | NULL |
|  2 |    1 |
|  3 |    2 |
+----+------+
3 rows in set (0.002 sec)
 
MariaDB [test]> analyze select t1.id,t2.id from t1 left join t2  on t2.id2 = t1.id and (t2.id = (select dd.id from t2 dd  where dd.id2 = t1.id and d1 > '2019-02-06 00:00:00'  order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 ));
+------+--------------------+-------+--------+---------------------+-----------------+---------+------+------+--------+----------+------------+------------------------------------------------+
| id   | select_type        | table | type   | possible_keys       | key             | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                                          |
+------+--------------------+-------+--------+---------------------+-----------------+---------+------+------+--------+----------+------------+------------------------------------------------+
|    1 | PRIMARY            | t1    | index  | NULL                | PRIMARY         | 4       | NULL | 3    | 3.00   |   100.00 |     100.00 | Using index                                    |
|    1 | PRIMARY            | t2    | eq_ref | PRIMARY,id2         | PRIMARY         | 4       | func | 1    | 0.67   |   100.00 |     100.00 | Using where                                    |
|    2 | DEPENDENT SUBQUERY | dd    | ALL    | id2,for_latest_sort | for_latest_sort | 6       | NULL | 3    | 0.86   |   100.00 |     100.00 | Range checked for each record (index map: 0x6) |
+------+--------------------+-------+--------+---------------------+-----------------+---------+------+------+--------+----------+------------+------------------------------------------------+
3 rows in set (0.002 sec)

Comment by Sergei Petrunia [ 2021-06-09 ]

The problem is here in the subquery's plan:

*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: dd
         type: ALL
possible_keys: id2,for_latest_sort
          key: for_latest_sort
      key_len: 6
          ref: NULL
         rows: 3
        Extra: Range checked for each record (index map: 0x6)
3 rows in set (0.00 sec)

How does one ensure that order by dd.d1 desc, dd.d2 desc, dd.id desc limit 1 is enforced?
It is possible to do with index for_latest_stort, but it is not done with index id2.

Comment by Sergei Petrunia [ 2021-06-09 ]

Yes, when debugging, I see this to happen:

The subquery is executed,
"Range checked for each record" picks index id2.
The first record returned by the quick select is saved as the subselect's output.
The record is

(2,3,'2019-03-05 00:00:00','2019-03-06 00:00:00'),

instead of

(3,3,'2019-03-06 00:00:00','2019-03-05 00:00:00');

and the wrong query output agrees with this.

Comment by Sergei Petrunia [ 2021-06-10 ]

Indexes in table t2:

  PRIMARY KEY (id),
  KEY id2 (id2),
  KEY for_latest_sort (d1,d2,id)

Subquery optimization goes like this:

Range optimizer produces:

  • sel->needed_reg= {2}

    // key id2

  • sel->quick_keys= {4}

    // key for_latest_sort

  • sel->quick = QUICK_RANGE_SELECT(index=2, for_latest_sort)

Based on the above, make_join_select() sets tab->use_quick=2 (do "Range checked for each record")

Then, test_if_skip_sort_order()

  • figures that the present quick select uses a suitable index
  • produces QUICK_SELECT_DESC to do reverse scans
  • doesn't reset tab->use_quick, though.
    • (which it could do, and would do in other scenarios, it has tab->use_quick=1 assignments in several places).

Then, query execution starts

  • Range-checked-for-each-record code deletes the quick select (the QUICK_SELECT_DESC), and creates a quick select on index id2.
  • That quick select produces wrong data
Comment by Sergei Petrunia [ 2021-06-10 ]

If I make test_if_skip_sort_order() to change tab->use_quick from 2 to 1 for this query, it doesn't help.

This is because make_join_readinfo() has already set tab->read_first_record to be join_init_quick_read_record. Changing tab->use_quick is too late here.

Comment by Sergei Petrunia [ 2021-06-10 ]

... what about already existing lines in test_if_skip_sort_order() that assign tab->use_quick=1 ? Do we have tests where they would actually change tab->use_quick from 2 to 1? The answer is no: I've added printout and ran the testsuite. Nothing was printed.

Comment by Sergei Petrunia [ 2021-06-10 ]

Note that test_if_skip_sort_order also does this:

  tab->read_first_record= join_init_read_record;

Relevant changesets:
https://github.com/MariaDB/server/commit/77fbeeab2464d5f035a7393eb11953604a4ac3f8
https://github.com/MariaDB/server/commit/8aa88db3c2f39cadeb8960f514d27cc7f071dcac

Comment by Sergei Petrunia [ 2021-06-11 ]

bb-10.2-mdev25858

Comment by Sergei Petrunia [ 2021-06-11 ]

http://lists.askmonty.org/pipermail/commits/2021-June/014645.html

Comment by Sergei Petrunia [ 2021-06-29 ]

Sanja, please review.

Comment by Oleksandr Byelkin [ 2021-07-14 ]

OK to push

Generated at Thu Feb 08 09:40:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.