Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6991

GROUP_MIN_MAX optimization is erroneously applied in some cases

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0.14
    • Fix Version/s: 10.0.15
    • Component/s: Optimizer
    • Labels:
      None

      Description

      Problem 1:
      I create a table and populate it with data like this:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (id INT NOT NULL, a VARCHAR(20)) ENGINE=MyISAM;
      INSERT INTO t1 VALUES (1,'2001-01-01');
      INSERT INTO t1 VALUES (1,'2001-01-02');
      INSERT INTO t1 VALUES (1,'2001-01-03');
      INSERT INTO t1 VALUES (1,' 2001-01-04');
      INSERT INTO t1 VALUES (2,'2001-01-01');
      INSERT INTO t1 VALUES (2,'2001-01-02');
      INSERT INTO t1 VALUES (2,'2001-01-03');
      INSERT INTO t1 VALUES (2,' 2001-01-04');
      INSERT INTO t1 VALUES (3,'2001-01-01');
      INSERT INTO t1 VALUES (3,'2001-01-02');
      INSERT INTO t1 VALUES (3,'2001-01-03');
      INSERT INTO t1 VALUES (3,' 2001-01-04');
      INSERT INTO t1 VALUES (4,'2001-01-01');
      INSERT INTO t1 VALUES (4,'2001-01-02');
      INSERT INTO t1 VALUES (4,'2001-01-03');
      INSERT INTO t1 VALUES (4,' 2001-01-04');

      Notice, the values with ' 2001-01-04' in the column 'a' have leading spaces
      ('*space*2001-01-04'), while the other values have no spaces.

      Now I run this query:

      SELECT id,MIN(a),MAX(a) FROM t1 WHERE a>=DATE'2001-01-04' GROUP BY id;

      returns the following result:

      +----+-------------+-------------+
      | id | MIN(a)      | MAX(a)      |
      +----+-------------+-------------+
      |  1 |  2001-01-04 |  2001-01-04 |
      |  2 |  2001-01-04 |  2001-01-04 |
      |  3 |  2001-01-04 |  2001-01-04 |
      |  4 |  2001-01-04 |  2001-01-04 |
      +----+-------------+-------------+

      Looks fine so far. Notice, the column is VARCHAR and comparison is done as DATE.

      Now if I add an index and re-run the query:

      ALTER TABLE t1 ADD KEY(id,a);
      SELECT id,MIN(a),MAX(a) FROM t1 WHERE a>=DATE'2001-01-04' GROUP BY id;

      it returns empty set. Looks wrong. The result set should not change when a new index is added.

      Tracing the code reveals that GROUP_MIN_MAX optimization is applied after adding the index. This is wrong, because comparison is done according to DATE in this example. The index on a VARCHAR column should not be used to optimize DATE operations.

      Problem 2:

      I create a table with a ENUM column with values ordered in counter-alphabetic order:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (id INT NOT NULL, a ENUM('04','03','02','01')) ENGINE=MyISAM;
      INSERT INTO t1 VALUES (1,'01');
      INSERT INTO t1 VALUES (1,'02');
      INSERT INTO t1 VALUES (1,'03');
      INSERT INTO t1 VALUES (1,'04');
      INSERT INTO t1 VALUES (2,'01');
      INSERT INTO t1 VALUES (2,'02');
      INSERT INTO t1 VALUES (2,'03');
      INSERT INTO t1 VALUES (2,'04');
      INSERT INTO t1 VALUES (3,'01');
      INSERT INTO t1 VALUES (3,'02');
      INSERT INTO t1 VALUES (3,'03');
      INSERT INTO t1 VALUES (3,'04');
      INSERT INTO t1 VALUES (4,'01');
      INSERT INTO t1 VALUES (4,'02');
      INSERT INTO t1 VALUES (4,'03');
      INSERT INTO t1 VALUES (4,'04');
      SELECT id,MIN(a),MAX(a) FROM t1 WHERE a>='02' GROUP BY id;

      The query returns:

      +----+--------+--------+
      | id | MIN(a) | MAX(a) |
      +----+--------+--------+
      |  1 | 02     | 04     |
      |  2 | 02     | 04     |
      |  3 | 02     | 04     |
      |  4 | 02     | 04     |
      +----+--------+--------+

      Looks good so far.
      Now I add a key and rerun the query:

      ALTER TABLE t1 ADD KEY(id,a);
      SELECT id,MIN(a),MAX(a) FROM t1 WHERE a>='02' GROUP BY id;

      It returns empty set. This is wrong.

      Explain tells that only 9 out of 16 rows are examined, and tracing reveals that GROUP_MIN_MAX optimization is erroneously enabled to optimize this.

      mysql> EXPLAIN SELECT id,MIN(a),MAX(a) FROM t1 WHERE a>='03' GROUP BY id;
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------+
      | id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                 |
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------+
      |    1 | SIMPLE      | t1    | range | NULL          | id   | 6       | NULL |    9 | Using where; Using index for group-by |
      +------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------+
      1 row in set (0.00 sec)

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              bar Alexander Barkov
              Reporter:
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: