[MDEV-12556] Possible performance problem with recursive CTE and UNION involving TEXT/BLOB columns Created: 2017-04-22  Updated: 2018-07-31  Resolved: 2017-04-28

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - CTE
Affects Version/s: 10.2
Fix Version/s: 10.2.6

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: 10.2-ga

Issue Links:
Relates
relates to MDEV-16867 Recursive query leads to endless exec... Closed

 Description   

The CTE below is intentionally endless. It's controlled by max_recursive_iterations, which works just fine. But the execution time is radically different, depending on whether we use a VARCHAR column, or a TEXT column of the same length.

DROP TABLE IF EXISTS t;
CREATE TABLE t (c1 VARCHAR(255), c2 TINYTEXT);
INSERT INTO t VALUES ('a','a'),('b','b'),('c','c'),('d','d');
 
SET max_recursive_iterations = 8;
 
WITH RECURSIVE cte(f) AS (
  SELECT c2 FROM t
  UNION
  SELECT c2 FROM t, cte
) SELECT COUNT(*) FROM cte;
 
WITH RECURSIVE cte(f) AS (
  SELECT c1 FROM t
  UNION
  SELECT c1 FROM t, cte
) SELECT COUNT(*) FROM cte;

Result with TEXT column (15.23 sec)

MariaDB [test]> WITH RECURSIVE cte(f) AS (
    ->   SELECT c2 FROM t
    ->   UNION
    ->   SELECT c2 FROM t, cte
    -> ) SELECT COUNT(*) FROM cte;
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+
1 row in set (15.23 sec)

Result with VARCHAR column (0.01 sec)

MariaDB [test]> WITH RECURSIVE cte(f) AS (
    ->   SELECT c1 FROM t
    ->   UNION
    ->   SELECT c1 FROM t, cte
    -> ) SELECT COUNT(*) FROM cte;
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+
1 row in set (0.01 sec)

If I understand the idea, with UNION the temporary table should never grow above 4 rows, so there is no reason for such long execution time.

If I replace UNION with UNION ALL, execution time naturally grows, and it becomes approximately the same for VARCHAR and TEXT column. The result (final count) is the same, which shows that the number of iterations is the same too.

MariaDB [test]> WITH RECURSIVE cte(f) AS (
    ->   SELECT c2 FROM t
    ->   UNION ALL
    ->   SELECT c2 FROM t, cte
    -> ) SELECT COUNT(*) FROM cte;
+----------+
| COUNT(*) |
+----------+
|   349524 |
+----------+
1 row in set (21.04 sec)

MariaDB [test]> WITH RECURSIVE cte(f) AS (
    ->   SELECT c1 FROM t
    ->   UNION ALL
    ->   SELECT c1 FROM t, cte
    -> ) SELECT COUNT(*) FROM cte;
+----------+
| COUNT(*) |
+----------+
|   349524 |
+----------+
1 row in set (19.43 sec)



 Comments   
Comment by Igor Babaev [ 2017-04-26 ]

ANALYZE run for the above two queries shows why their performance is so different:

 MariaDB [test]> ANALYZE
    -> WITH RECURSIVE cte(f) AS (
    ->   SELECT c1 FROM t
    ->   UNION
    ->   SELECT c1 FROM t, cte
    -> ) SELECT COUNT(*) FROM cte;
+------+-----------------+------------+------+---------------+------+---------+------+------+--------+----------+------------+------------------------------------+
| id   | select_type     | table      | type | possible_keys | key  | key_len | ref  | rows | r_rows | filtered | r_filtered | Extra                              |
+------+-----------------+------------+------+---------------+------+---------+------+------+--------+----------+------------+------------------------------------+
|    1 | PRIMARY         | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    4 |   4.00 |   100.00 |     100.00 |                                    |
|    2 | SUBQUERY        | t          | ALL  | NULL          | NULL | NULL    | NULL |    4 |   4.00 |   100.00 |     100.00 |                                    |
|    3 | RECURSIVE UNION | t          | ALL  | NULL          | NULL | NULL    | NULL |    4 |   4.00 |   100.00 |     100.00 |                                    |
|    3 | RECURSIVE UNION | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    4 |   4.00 |   100.00 |     100.00 | Using join buffer (flat, BNL join) |
| NULL | UNION RESULT    | <union2,3> | ALL  | NULL          | NULL | NULL    | NULL | NULL |   0.00 |     NULL |       NULL |                                    |
+------+-----------------+------------+------+---------------+------+---------+------+------+--------+----------+------------+------------------------------------+
MariaDB [test]> ANALYZE
    -> WITH RECURSIVE cte(f) AS (
    ->   SELECT c2 FROM t
    ->   UNION
    ->   SELECT c2 FROM t, cte
    -> ) SELECT COUNT(*) FROM cte;
+------+-----------------+------------+------+---------------+------+---------+------+------+----------+----------+------------+------------------------------------+
| id   | select_type     | table      | type | possible_keys | key  | key_len | ref  | rows | r_rows   | filtered | r_filtered | Extra                              |
+------+-----------------+------------+------+---------------+------+---------+------+------+----------+----------+------------+------------------------------------+
|    1 | PRIMARY         | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    4 |     4.00 |   100.00 |     100.00 |                                    |
|    2 | SUBQUERY        | t          | ALL  | NULL          | NULL | NULL    | NULL |    4 |     4.00 |   100.00 |     100.00 |                                    |
|    3 | RECURSIVE UNION | t          | ALL  | NULL          | NULL | NULL    | NULL |    4 |     4.00 |   100.00 |     100.00 |                                    |
|    3 | RECURSIVE UNION | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |    4 | 10922.50 |   100.00 |     100.00 | Using join buffer (flat, BNL join) |
| NULL | UNION RESULT    | <union2,3> | ALL  | NULL          | NULL | NULL    | NULL | NULL |     0.00 |     NULL |       NULL |                                    |
+------+-----------------+------------+------+---------------+------+---------+------+------+----------+----------+------------+------------------------------------+

We see that when the second query is executed the temporary table <derived2> used to store the new rows for the next iteration contains far more than 4 rows that it should contain.

Comment by Igor Babaev [ 2017-04-26 ]

Actually the CTE tables in the queries from the reported test case should have stabilized after the first iteration.
It's really so for the query that uses the c1 column, and not so for the other query.

Comment by Igor Babaev [ 2017-04-26 ]

Here's what happens:
When the rows produced on the current iteration are sent to the temporary table T of the UNION type created for CTE the rows that were not there simultaneously are sent to the temporary table D that contains rows for the next iteration. The test whether a row was in T checks the return code of writing into T. If just a HEAP table is used for T then the return code is HA_ERR_FOUND_DUPP_KEY, but if an ARIA table is used for T then the return code is HA_ERR_FOUND_DUPP_UNIQUE.
The current implementation erroneously checks only for the first return code. So if an Aria
table is used for T then all rows produced by the current iteration go to D. For the first iteration
it's 4*4, for the second 4*4*4 and so on. Whether T has reached stabilization is detected by
checking whether D is empty, but it only grows. So as a result, we never stop the iterations
unless a limit for them is set.

Comment by Igor Babaev [ 2017-04-28 ]

The fix for this bug was pushed into the 10.2 tree.

Generated at Thu Feb 08 07:58:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.