Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-37050

Lousy join order with highly skewed data distribution

    XMLWordPrintable

Details

    • Related to performance
    • Highly skewed data can cause epically bad join order.
    • Q3/2025 Maintenance

    Description

      Discovered while analyzing MDEV-36948

      (See also the comment starting with One can actually construct a join of base tables that shows a similar effect. )

      drop table if exists t1, t2;
      create table t1 (a int, b int, c int, d int);
      create table t2 (a int, b int, c int, d int);
      create index t1_ix1 on t1 (a, b, c);
      create index t2_ix1 on t2 (a, b);
       
      insert into t1 select x.seq, y.seq, z.seq, 1 from seq_1_to_100 x, seq_1_to_100 y, seq_1_to_100 z;
      insert into t1 select 100, 100, 101, seq from seq_1_to_10000;
      insert into t2 select a, b, c, max(d) from t1 group by a, b, c;
       
      explain select count(*) from t1 join t2 on t1.a = t2.a and t1.b = t2.b and ((t1.c = t2.c) or (t1.c is null and t2.c is null)) and t1.d < t2.d;
       
      explain select count(*) from t1 join ( select a, b, c, max(d) as max_d from t1 group by a, b, c ) t on t1.a = t.a and t1.b = t.b and t1.c = t.c and t1.d < max_d;
       
      explain select count(*) from t1 join t2 on t1.a = t2.a and t1.b = t2.b and ((t1.c = t2.c) or (t1.c is null and t2.c is null)) and t1.d < t2.d;
       
      select count(*) from t1 join ( select a, b, c, max(d) as max_d from t1 group by a, b, c ) t on t1.a = t.a and t1.b = t.b and t1.c = t.c and t1.d < max_d;
       
      analyze table t1, t2 persistent for all;
       
      select count(*) from t1 join ( select a, b, c, max(d) as max_d from t1 group by a, b, c ) t on t1.a = t.a and t1.b = t.b and t1.c = t.c and t1.d < max_d;
       
      drop table t1, t2;
      

      prior to the analyze, main branch server, debug build

      MariaDB [test]> explain select count(*) from t1 join t2 on t1.a = t2.a and t1.b = t2.b and ((t1.c = t2.c) or (t1.c is null and t2.c is null)) and t1.d < t2.d;
      +------+-------------+-------+------+---------------+--------+---------+---------------------+---------+-------------+
      | id   | select_type | table | type | possible_keys | key    | key_len | ref                 | rows    | Extra       |
      +------+-------------+-------+------+---------------+--------+---------+---------------------+---------+-------------+
      |    1 | SIMPLE      | t1    | ALL  | t1_ix1        | NULL   | NULL    | NULL                | 1010000 | Using where |
      |    1 | SIMPLE      | t2    | ref  | t2_ix1        | t2_ix1 | 10      | test.t1.a,test.t1.b | 1       | Using where |
      +------+-------------+-------+------+---------------+--------+---------+---------------------+---------+-------------+
      2 rows in set (0.000 sec)
       
      MariaDB [test]> select count(*) from t1 join t2 on t1.a = t2.a and t1.b = t2.b and ((t1.c = t2.c) or (t1.c is null and t2.c is null)) and t1.d < t2.d;
      +----------+
      | count(*) |
      +----------+
      |     9999 |
      +----------+
      1 row in set (58.803 sec)
      

      post analyze

      MariaDB [test]> explain select count(*) from t1 join t2 on t1.a = t2.a and t1.b = t2.b and ((t1.c = t2.c) or (t1.c is null and t2.c is null)) and t1.d < t2.d;
      +------+-------------+-------+-------------+---------------+--------+---------+-------------------------------+---------+------------------------------------+
      | id   | select_type | table | type        | possible_keys | key    | key_len | ref                           | rows    | Extra                              |
      +------+-------------+-------+-------------+---------------+--------+---------+-------------------------------+---------+------------------------------------+
      |    1 | SIMPLE      | t2    | ALL         | t2_ix1        | NULL   | NULL    | NULL                          | 1000001 | Using where                        |
      |    1 | SIMPLE      | t1    | ref_or_null | t1_ix1        | t1_ix1 | 15      | test.t2.a,test.t2.b,test.t2.c | 2       | Using index condition; Using where |
      +------+-------------+-------+-------------+---------------+--------+---------+-------------------------------+---------+------------------------------------+
      2 rows in set (0.001 sec)
       
      MariaDB [test]> select count(*) from t1 join t2 on t1.a = t2.a and t1.b = t2.b and ((t1.c = t2.c) or (t1.c is null and t2.c is null)) and t1.d < t2.d;
      +----------+
      | count(*) |
      +----------+
      |     9999 |
      +----------+
      1 row in set (2.613 sec)
      

      Attachments

        1. otsn.text
          12 kB
        2. otsy.text
          12 kB

        Issue Links

          Activity

            People

              Johnston Rex Johnston
              Johnston Rex Johnston
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.