Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9533

Query with an aggregate values in ORDER BY clause takes more memory and time than it should

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 5.5, 10.0, 10.1
    • 10.1
    • None
    • None

    Description

      The data file is attached, it's three tables, 20, 100 and 1000 rows, named accordingly.

      Queries similar to the one below do not work well on MariaDB (5.5, 10.0, and 10.1) in terms of memory usage, comparing to MySQL, at least with default configuration of each. It's also slower on MariaDB.

      SELECT 
        COUNT( DISTINCT alias1.`col_date_key`, alias1.`col_date_nokey`, alias1.`col_datetime_nokey`, alias1.`col_int_key`, alias1.`col_int_key`, alias1.`col_date_key` ) AS cnt, 
        SUM(alias1.col_int_key + alias1.col_int_nokey) AS sm 
      FROM 
        ( SELECT DISTINCT SQ1_alias1.* FROM t20 AS SQ1_alias1, t20 AS SQ1_alias2 ) AS alias1, 
        ( SELECT SQ2_alias1.* FROM t100 AS SQ2_alias1 
                   LEFT OUTER JOIN 
                 ( t1000 AS SQ2_alias2 INNER JOIN t20 AS SQ2_alias3 ) 
                   ON (SQ2_alias3.`col_varchar_key` = SQ2_alias2.`col_varchar_key` ) ) AS alias2, 
        t100 AS alias3 
      WHERE 
        alias1.`col_varchar_key` IN ('f', 'z') OR alias1.`col_int_key` NOT BETWEEN 104 AND 196 
      ORDER BY sm, cnt;

      With MariaDB (e.g. 10.1.11) it works this way.
      Server starts with approximately 750 Mb/140 Mb (here and further, two numbers indicate VIRT/RES memory as shown by top, correspondingly).
      When the query starts running, a ~3.5 Gb on-disk temporary table is created first.
      After it is on disk, mysqld starts growing in memory, and grows up to 3.4 Gb / 3.3 Gb.
      After the query is finished, the numbers go down significantly.
      The query takes ~4.5 min on my machine.

      Plan on MariaDB 10.1.11

      +------+-------------+------------+-------+-----------------+-----------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
      | id   | select_type | table      | type  | possible_keys   | key             | key_len | ref                             | rows | filtered | Extra                                                     |
      +------+-------------+------------+-------+-----------------+-----------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+
      |    1 | PRIMARY     | SQ2_alias1 | index | NULL            | PRIMARY         | 4       | NULL                            |  100 |   100.00 | Using index; Using temporary                              |
      |    1 | PRIMARY     | SQ2_alias3 | range | col_varchar_key | col_varchar_key | 4       | NULL                            |   17 |   100.00 | Using where; Using index                                  |
      |    1 | PRIMARY     | SQ2_alias2 | ref   | col_varchar_key | col_varchar_key | 4       | test.SQ2_alias3.col_varchar_key |    8 |   100.00 | Using index                                               |
      |    1 | PRIMARY     | alias3     | index | NULL            | PRIMARY         | 4       | NULL                            |  100 |   100.00 | Using index; Using join buffer (flat, BNL join)           |
      |    1 | PRIMARY     | <derived2> | ALL   | NULL            | NULL            | NULL    | NULL                            |  400 |   100.00 | Using where; Using join buffer (incremental, BNL join)    |
      |    2 | DERIVED     | SQ1_alias1 | ALL   | NULL            | NULL            | NULL    | NULL                            |   20 |   100.00 | Using temporary                                           |
      |    2 | DERIVED     | SQ1_alias2 | index | NULL            | PRIMARY         | 4       | NULL                            |   20 |   100.00 | Using index; Distinct; Using join buffer (flat, BNL join) |
      +------+-------------+------------+-------+-----------------+-----------------+---------+---------------------------------+------+----------+-----------------------------------------------------------+

      If the query is executed again, the pattern repeats.
      I don't observe a memory leak as such (and valgrind does not find it either), but the memory is apparently overused.

      On MySQL 5.5, the behavior is different. It also creates the on-disk table of approximately the same size (a bit bigger), but there is no memory growth. The query takes ~2 min.

      Plan on MySQL 5.5.45

      +----+-------------+------------+-------+-----------------+-----------------+---------+---------------------------------+-------+----------+--------------------------------+
      | id | select_type | table      | type  | possible_keys   | key             | key_len | ref                             | rows  | filtered | Extra                          |
      +----+-------------+------------+-------+-----------------+-----------------+---------+---------------------------------+-------+----------+--------------------------------+
      |  1 | PRIMARY     | <derived2> | ALL   | NULL            | NULL            | NULL    | NULL                            |    20 |   100.00 | Using where; Using temporary   |
      |  1 | PRIMARY     | alias3     | index | NULL            | PRIMARY         | 4       | NULL                            |   100 |   100.00 | Using index; Using join buffer |
      |  1 | PRIMARY     | <derived3> | ALL   | NULL            | NULL            | NULL    | NULL                            | 57800 |   100.00 | Using join buffer              |
      |  3 | DERIVED     | SQ2_alias1 | ALL   | NULL            | NULL            | NULL    | NULL                            |   100 |   100.00 |                                |
      |  3 | DERIVED     | SQ2_alias3 | index | col_varchar_key | col_varchar_key | 9       | NULL                            |    20 |   100.00 | Using index                    |
      |  3 | DERIVED     | SQ2_alias2 | ref   | col_varchar_key | col_varchar_key | 4       | test.SQ2_alias3.col_varchar_key |     8 |   100.00 | Using index                    |
      |  2 | DERIVED     | SQ1_alias1 | ALL   | NULL            | NULL            | NULL    | NULL                            |    20 |   100.00 | Using temporary                |
      |  2 | DERIVED     | SQ1_alias2 | index | NULL            | PRIMARY         | 4       | NULL                            |    20 |   100.00 | Using index; Using join buffer |
      +----+-------------+------------+-------+-----------------+-----------------+---------+---------------------------------+-------+----------+--------------------------------+

      MySQL 5.7 is different from both. It starts with somewhat higher memory demands (1.5 Gb / 150 Mb in my case), but when it comes to the query execution, it manages to run it without either on-disk file or memory growth. The query takes ~1 min 50 sec.

      Plan on MySQL 5.7.10

      +----+-------------+------------+------------+-------+-----------------+-----------------+---------+---------------------------------+------+----------+----------------------------------------------------+
      | id | select_type | table      | partitions | type  | possible_keys   | key             | key_len | ref                             | rows | filtered | Extra                                              |
      +----+-------------+------------+------------+-------+-----------------+-----------------+---------+---------------------------------+------+----------+----------------------------------------------------+
      |  1 | PRIMARY     | SQ2_alias1 | NULL       | index | NULL            | PRIMARY         | 4       | NULL                            |  100 |   100.00 | Using index                                        |
      |  1 | PRIMARY     | SQ2_alias3 | NULL       | index | col_varchar_key | col_varchar_key | 9       | NULL                            |   20 |   100.00 | Using index                                        |
      |  1 | PRIMARY     | SQ2_alias2 | NULL       | ref   | col_varchar_key | col_varchar_key | 4       | test.SQ2_alias3.col_varchar_key |    8 |   100.00 | Using index                                        |
      |  1 | PRIMARY     | alias3     | NULL       | index | NULL            | PRIMARY         | 4       | NULL                            |  100 |   100.00 | Using index; Using join buffer (Block Nested Loop) |
      |  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL            | NULL            | NULL    | NULL                            |  400 |    91.11 | Using where; Using join buffer (Block Nested Loop) |
      |  2 | DERIVED     | SQ1_alias1 | NULL       | ALL   | NULL            | NULL            | NULL    | NULL                            |   20 |   100.00 | Using temporary                                    |
      |  2 | DERIVED     | SQ1_alias2 | NULL       | index | NULL            | PRIMARY         | 4       | NULL                            |   20 |   100.00 | Using index; Using join buffer (Block Nested Loop) |
      +----+-------------+------------+------------+-------+-----------------+-----------------+---------+---------------------------------+------+----------+----------------------------------------------------+

      The problem also scales very badly for MariaDB: if I add a small complication to the query, like below (a bit bigger join and more fields), it already takes ~18 Gb on disk and in memory, and I interrupted it after 40 min of running.
      With MySQL 5.5 it takes ~19 Gb on disk file, no memory growth, 9 min.

      SELECT 
        COUNT( DISTINCT alias1.`col_date_key`, alias1.`col_date_nokey`, alias1.`col_datetime_nokey`, alias1.`col_int_key`, alias1.`col_int_key`, alias1.`col_date_key` ) AS cnt, 
        SUM(alias1.col_int_key + alias1.col_int_nokey) AS sm,
        AVG(alias1.col_int_nokey) AS av,
        MAX(alias1.col_int_key) AS mx
      FROM 
        ( SELECT DISTINCT SQ1_alias1.* FROM t20 AS SQ1_alias1, t20 AS SQ1_alias2 ) AS alias1, 
        ( SELECT SQ2_alias1.* FROM t100 AS SQ2_alias1 
                   LEFT OUTER JOIN 
                 ( t1000 AS SQ2_alias2 INNER JOIN t20 AS SQ2_alias3 ) 
                   ON (SQ2_alias3.`col_varchar_key` = SQ2_alias2.`col_varchar_key` ) ) AS alias2, 
        t20 AS alias3,
        t20 AS alias4
      WHERE 
        alias1.`col_varchar_key` IN ('f', 'z') OR alias1.`col_int_key` NOT BETWEEN 104 AND 196 
      ORDER BY sm, cnt, mx, av;

      Since the tables are very small and joins are very moderate comparing to those in real life (the query eventually selects and groups less than 20 rows), I suppose it's quite realistic that people can hit OOM if they use similar queries.

      Attachments

        Issue Links

          Activity

            People

              monty Michael Widenius
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.