Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34837

Bad index selection for sub-query on self with mariadb >= 11

Details

    • Bug
    • Status: Confirmed (View Workflow)
    • Major
    • Resolution: Unresolved
    • 11.0(EOL), 11.5(EOL), 11.6(EOL)
    • 11.4
    • Optimizer
    • None
    • Official docker image
      Host : Ubuntu 24.04 - amd64

    Description

      Starting with mariadb 11.0, some index are ignored when runing a sub query on the same table as the main query.

      Tester versions

      • 10.11.9 -> Ok
      • 11.0.6 -> Index selection issue
      • 11.5.2 -> Index selection issue
      • 11.6.1 (RC) -> Index selection issue

      Reproduction

      Data setup

      DROP TABLE IF EXISTS `tmp`;
      CREATE TABLE `tmp` (
          `id` INT NOT NULL AUTO_INCREMENT,
          `reference` varchar(64) DEFAULT NULL,
          `mainLine_fk` INT DEFAULT NULL,
          PRIMARY KEY (`id`),
          KEY `mainLineIndex` (`mainLine_fk`),
          CONSTRAINT `mainLineFk` FOREIGN KEY (`mainLine_fk`) REFERENCES `tmp` (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
       
      DROP PROCEDURE IF EXISTS insertProcedure;
      CREATE PROCEDURE insertProcedure()
      BEGIN
          DECLARE i int DEFAULT 1;
          TRUNCATE TABLE tmp;
          WHILE i <= 50000 DO
              INSERT INTO tmp (id, reference, mainLine_fk) VALUES (i, LEFT(UUID(), 8), null);
              SET i = i + 1;
          END WHILE;
      END;
       
      CALL insertProcedure();
       
      UPDATE tmp SET mainLine_fk = 4005 WHERE id > 60 AND id < 80;
      

      Query

      SELECT `root`.`id` AS id,
          (SELECT GROUP_CONCAT(subLines.`reference`)
          FROM `tmp` subLines
          WHERE `root`.`id` = subLines.`mainLine_fk`) AS subLineRefs
      FROM `tmp` `root`
      WHERE `root`.`id` > 4000 AND `root`.`id` < 4500;
      

      In this query, the `mainLineIndex` key is not used with mariadb 11.5.2:
      The query is slow (~1.5s)

      +------+--------------------+----------+-------+---------------+---------+---------+------+-------+-------------+
      | id   | select_type        | table    | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
      +------+--------------------+----------+-------+---------------+---------+---------+------+-------+-------------+
      |    1 | PRIMARY            | root     | range | PRIMARY       | PRIMARY | 4       | NULL | 499   | Using where |
      |    2 | DEPENDENT SUBQUERY | subLines | ALL   | mainLineIndex | NULL    | NULL    | NULL | 50436 | Using where |
      +------+--------------------+----------+-------+---------------+---------+---------+------+-------+-------------+
      

      With mariadb = 10.11.9, the `mainLineIndex` is used:
      The query is very fast (~20ms)

      +------+--------------------+----------+-------+---------------+---------------+---------+-------------+-------+--------------------------+
      | id   | select_type        | table    | type  | possible_keys | key           | key_len | ref         | rows  | Extra                    |
      +------+--------------------+----------+-------+---------------+---------------+---------+-------------+-------+--------------------------+
      |    1 | PRIMARY            | root     | range | PRIMARY       | PRIMARY       | 4       | NULL        | 499   | Using where; Using index |
      |    2 | DEPENDENT SUBQUERY | subLines | ref   | mainLineIndex | mainLineIndex | 5       | blg.root.id | 24871 |                          |
      +------+--------------------+----------+-------+---------------+---------------+---------+-------------+-------+--------------------------+
      

      Using `FORCE INDEX`, we can use the index with mariadb > 11, and get the fast performance (~20ms) :

      SELECT `root`.`id` AS id,
          (SELECT GROUP_CONCAT(subLines.`reference`)
          FROM `tmp` subLines
          FORCE INDEX (mainLineIndex)
          WHERE `root`.`id` = subLines.`mainLine_fk`) AS subLineRefs
      FROM `tmp` `root`
      WHERE `root`.`id` > 4000 AND `root`.`id` < 4500;
      

      +------+--------------------+----------+-------+---------------+---------------+---------+-------------+-------+-------------+
      | id   | select_type        | table    | type  | possible_keys | key           | key_len | ref         | rows  | Extra       |
      +------+--------------------+----------+-------+---------------+---------------+---------+-------------+-------+-------------+
      |    1 | PRIMARY            | root     | range | PRIMARY       | PRIMARY       | 4       | NULL        | 499   | Using where |
      |    2 | DEPENDENT SUBQUERY | subLines | ref   | mainLineIndex | mainLineIndex | 5       | blg.root.id | 25218 |             |
      +------+--------------------+----------+-------+---------------+---------------+---------+-------------+-------+-------------+
      

      `USE INDEX` has no effect.

      Attachments

        Activity

          alice Alice Sherepa added a comment -

          Thank you for the report!
          I repeated as described on 11.0-11.5.

          If the optimizer has the correct statistics, then query execution is fast:

           
          MariaDB [test]> analyze SELECT sql_no_cache root.id AS id,     (SELECT GROUP_CONCAT(subLines.reference)     FROM t1 subLines     WHERE root.id = subLines.mainLine_fk) AS subLineRefs FROM t1 root WHERE root.id > 4000 AND root.id < 4500;
          +------+--------------------+----------+-------+---------------+---------+---------+------+-------+----------+----------+------------+-------------+
          | id   | select_type        | table    | type  | possible_keys | key     | key_len | ref  | rows  | r_rows   | filtered | r_filtered | Extra       |
          +------+--------------------+----------+-------+---------------+---------+---------+------+-------+----------+----------+------------+-------------+
          |    1 | PRIMARY            | root     | range | PRIMARY       | PRIMARY | 4       | NULL | 499   | 499.00   |   100.00 |     100.00 | Using where |
          |    2 | DEPENDENT SUBQUERY | subLines | ALL   | mainLineIndex | NULL    | NULL    | NULL | 49958 | 50000.00 |    50.00 |       0.00 | Using where |
          +------+--------------------+----------+-------+---------------+---------+---------+------+-------+----------+----------+------------+-------------+
          2 rows in set (2,475 sec)
           
          MariaDB [test]> set use_stat_tables='preferably';
          Query OK, 0 rows affected (0,000 sec)
           
          MariaDB [test]> analyze table t1;
          +---------+---------+----------+-----------------------------------------+
          | Table   | Op      | Msg_type | Msg_text                                |
          +---------+---------+----------+-----------------------------------------+
          | test.t1 | analyze | status   | Engine-independent statistics collected |
          | test.t1 | analyze | status   | OK                                      |
          +---------+---------+----------+-----------------------------------------+
          2 rows in set (0,053 sec)
           
           
          MariaDB [test]> analyze SELECT sql_no_cache root.id AS id,     (SELECT GROUP_CONCAT(subLines.reference)     FROM t1 subLines     WHERE root.id = subLines.mainLine_fk) AS subLineRefs FROM t1 root WHERE root.id > 4000 AND root.id < 4500;
          +------+--------------------+----------+-------+---------------+---------------+---------+--------------+------+--------+----------+------------+-------------+
          | id   | select_type        | table    | type  | possible_keys | key           | key_len | ref          | rows | r_rows | filtered | r_filtered | Extra       |
          +------+--------------------+----------+-------+---------------+---------------+---------+--------------+------+--------+----------+------------+-------------+
          |    1 | PRIMARY            | root     | range | PRIMARY       | PRIMARY       | 4       | NULL         | 499  | 499.00 |   100.00 |     100.00 | Using where |
          |    2 | DEPENDENT SUBQUERY | subLines | ref   | mainLineIndex | mainLineIndex | 5       | test.root.id | 19   | 0.04   |   100.00 |     100.00 |             |
          +------+--------------------+----------+-------+---------------+---------------+---------+--------------+------+--------+----------+------------+-------------+
          2 rows in set (0,004 sec)
          
          

          mtr has use_stat_tables='preferably' as a default, so I did not set it in the test:

          --source include/have_innodb.inc
           
          CREATE TABLE t1 (
              id INT NOT NULL AUTO_INCREMENT,
              reference varchar(64),
              mainLine_fk INT ,
              PRIMARY KEY (id),
              KEY mainLineIndex (mainLine_fk),
              CONSTRAINT mainLineFk FOREIGN KEY (mainLine_fk) REFERENCES t1 (id)
          ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
           
           
          delimiter //;
           
          CREATE PROCEDURE insertProcedure()
          BEGIN
              DECLARE i int DEFAULT 1;
              TRUNCATE TABLE t1;
              WHILE i <= 50000 DO
                  INSERT INTO t1 (id, reference, mainLine_fk) VALUES (i, LEFT(UUID(), 8), null);
                  SET i = i + 1;
              END WHILE;
          END;//
           
          delimiter ;//
           
          CALL insertProcedure();
           
          UPDATE t1 SET mainLine_fk = 4005 WHERE id > 60 AND id < 80;
           
          #analyze table t1;
           
          explain extended  SELECT root.id AS id,
              (SELECT GROUP_CONCAT(subLines.reference)
              FROM t1 subLines
              WHERE root.id = subLines.mainLine_fk) AS subLineRefs
          FROM t1 root
          WHERE root.id > 4000 AND root.id < 4500;
           
           
          drop table t1;
          

          alice Alice Sherepa added a comment - Thank you for the report! I repeated as described on 11.0-11.5. If the optimizer has the correct statistics, then query execution is fast:   MariaDB [test]> analyze SELECT sql_no_cache root.id AS id, (SELECT GROUP_CONCAT(subLines.reference) FROM t1 subLines WHERE root.id = subLines.mainLine_fk) AS subLineRefs FROM t1 root WHERE root.id > 4000 AND root.id < 4500; +------+--------------------+----------+-------+---------------+---------+---------+------+-------+----------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+--------------------+----------+-------+---------------+---------+---------+------+-------+----------+----------+------------+-------------+ | 1 | PRIMARY | root | range | PRIMARY | PRIMARY | 4 | NULL | 499 | 499.00 | 100.00 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | subLines | ALL | mainLineIndex | NULL | NULL | NULL | 49958 | 50000.00 | 50.00 | 0.00 | Using where | +------+--------------------+----------+-------+---------------+---------+---------+------+-------+----------+----------+------------+-------------+ 2 rows in set (2,475 sec)   MariaDB [test]> set use_stat_tables='preferably'; Query OK, 0 rows affected (0,000 sec)   MariaDB [test]> analyze table t1; +---------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | +---------+---------+----------+-----------------------------------------+ | test.t1 | analyze | status | Engine-independent statistics collected | | test.t1 | analyze | status | OK | +---------+---------+----------+-----------------------------------------+ 2 rows in set (0,053 sec)     MariaDB [test]> analyze SELECT sql_no_cache root.id AS id, (SELECT GROUP_CONCAT(subLines.reference) FROM t1 subLines WHERE root.id = subLines.mainLine_fk) AS subLineRefs FROM t1 root WHERE root.id > 4000 AND root.id < 4500; +------+--------------------+----------+-------+---------------+---------------+---------+--------------+------+--------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+--------------------+----------+-------+---------------+---------------+---------+--------------+------+--------+----------+------------+-------------+ | 1 | PRIMARY | root | range | PRIMARY | PRIMARY | 4 | NULL | 499 | 499.00 | 100.00 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | subLines | ref | mainLineIndex | mainLineIndex | 5 | test.root.id | 19 | 0.04 | 100.00 | 100.00 | | +------+--------------------+----------+-------+---------------+---------------+---------+--------------+------+--------+----------+------------+-------------+ 2 rows in set (0,004 sec) mtr has use_stat_tables='preferably' as a default, so I did not set it in the test: --source include/have_innodb.inc CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT, reference varchar (64), mainLine_fk INT , PRIMARY KEY (id), KEY mainLineIndex (mainLine_fk), CONSTRAINT mainLineFk FOREIGN KEY (mainLine_fk) REFERENCES t1 (id) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE =utf8mb4_bin;   delimiter //;   CREATE PROCEDURE insertProcedure() BEGIN DECLARE i int DEFAULT 1; TRUNCATE TABLE t1; WHILE i <= 50000 DO INSERT INTO t1 (id, reference, mainLine_fk) VALUES (i, LEFT (UUID(), 8), null ); SET i = i + 1; END WHILE; END ;//   delimiter ;// CALL insertProcedure(); UPDATE t1 SET mainLine_fk = 4005 WHERE id > 60 AND id < 80;   #analyze table t1;   explain extended SELECT root.id AS id, ( SELECT GROUP_CONCAT(subLines.reference) FROM t1 subLines WHERE root.id = subLines.mainLine_fk) AS subLineRefs FROM t1 root WHERE root.id > 4000 AND root.id < 4500;     drop table t1;
          Congelli501 Colin GILLE added a comment -

          FYI, running analyse table didn't improve the query speed / execution schema on my side (with real data).

          Would you like an improved dataset that doesn't improve the query speed after an analyse table ?

          Congelli501 Colin GILLE added a comment - FYI, running analyse table didn't improve the query speed / execution schema on my side (with real data). Would you like an improved dataset that doesn't improve the query speed after an analyse table ?

          People

            psergei Sergei Petrunia
            Congelli501 Colin GILLE
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.