Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30877

Wrong cardinality estimation for the derived table leads to slow plan with LATERAL DERIVED

    XMLWordPrintable

Details

    Description

      Consider the following slow query:

      MariaDB [(none)]> analyze
          -> SELECT
          ->     esd.esd_c1,
          ->     esd.esd_c2,
          ->     esd.esd_c3,
          ->     es.es_c1,
          ->     es.es_c2,
          ->     es.es_c3,
          ->     es.es_c4,
          ->     es.es_c5,
          ->     es.es_c6,
          ->     es.es_c7,
          ->     es.es_c8
          -> FROM
          ->     DB1.t1 esd
          ->         INNER JOIN
          ->     DB1.t2 es ON esd.t1Id = es.t1Id
          ->         INNER JOIN
          ->     (SELECT
          ->         esd.esd_c1, MAX(es.es_c1) AS last_set
          ->     FROM
          ->         DB1.t1 esd
          ->     INNER JOIN DB1.t2 es ON esd.t1Id = es.t1Id
          ->     GROUP BY esd.esd_c1) q ON esd.esd_c1 = q.esd_c1
          -> WHERE
          ->     es.es_c1 = q.last_set;
      +------+-----------------+------------+-------+---------------------------+-----------+---------+----------------------------------------------+-------+----------+----------+------------+--------------------------+
      | id   | select_type     | table      | type  | possible_keys             | key       | key_len | ref                                          | rows  | r_rows   | filtered | r_filtered | Extra                    |
      +------+-----------------+------------+-------+---------------------------+-----------+---------+----------------------------------------------+-------+----------+----------+------------+--------------------------+
      |    1 | PRIMARY         | esd        | index | PRIMARY,UNIQUEMSG,Index_3 | UNIQUEMSG | 137     | NULL                                         | 15197 | 15197.00 |   100.00 |     100.00 | Using where; Using index |
      |    1 | PRIMARY         | <derived2> | ref   | key0                      | key0      | 33      | DB1.esd.esd_c1                          | 2     | 0.98     |   100.00 |     100.00 | Using where              |
      |    1 | PRIMARY         | es         | ref   | PRIMARY                   | PRIMARY   | 9       | DB1.esd.t1Id,q.last_set | 157   | 21.26    |   100.00 |     100.00 |                          |
      |    2 | LATERAL DERIVED | esd        | ref   | PRIMARY,UNIQUEMSG,Index_3 | UNIQUEMSG | 33      | DB1.esd.esd_c1                          | 7     | 17.40    |   100.00 |     100.00 | Using index              |
      |    2 | LATERAL DERIVED | es         | ref   | PRIMARY                   | PRIMARY   | 4       | DB1.esd.t1Id            | 875   | 783.76   |   100.00 |     100.00 | Using index              |
      +------+-----------------+------------+-------+---------------------------+-----------+---------+----------------------------------------------+-------+----------+----------+------------+--------------------------+
      5 rows in set (1 min 12.157 sec)
      

      It uses LATERAL DERIVED optimization and in the optimizer trace (see attached) we can find the following cardinality estimations (with statistics up to date including histograms etc):

                  "rows_estimation": [
                    { 
                      "table": "esd",
                      "table_scan": {
                        "rows": 15197,
                        "cost": 97
                      }
                    },
                    {
                      "table": "es",
                      "table_scan": {
                        "rows": 12840562,
                        "cost": 103969
                      }
                    },
                    {
                      "table": "<derived2>",
                      "table_scan": {
                        "rows": 13304563,
                        "cost": 1.33e7
                      }
                    }
                  ]
      

      while real row counts are as follows:

      MariaDB [(none)]> select count(*) from DB1.t1 esd;
      +----------+
      | count(*) |
      +----------+
      | 15197 |
      +----------+
      1 row in set (0.032 sec)
       
      MariaDB [(none)]> select count(*) from DB1.t2 es;
      +----------+
      | count(*) |
      +----------+
      | 12846717 |
      +----------+
      1 row in set (13.979 sec)
       
      MariaDB [(none)]> select count(*) from (SELECT
      -> esd.symbol, MAX(es.es_c1) AS last_set
      -> FROM
      -> DB1.t1 esd
      -> INNER JOIN DB1.t2 es ON esd.t1Id = es.t1Id
      -> GROUP BY esd.esd_c1) q;
      +----------+
      | count(*) |
      +----------+
      | 1877 |
      +----------+
      1 row in set (5.507 sec)
      

      You can see above that estimated number of rows for each of the tables is precise or very close to reality, while for the derived table it's orders of magnitude wrong. What can be done to fix this?

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            valerii Valerii Kravchuk
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.