[MDEV-23066] RAND() evaluated and filtered twice with subquery Created: 2020-07-01  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.4.8, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Christopher Granahan Assignee: Igor Babaev
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Windows 10


Issue Links:
Relates
relates to MDEV-5414 RAND() in a subselect : different beh... Closed

 Description   

When calling RAND() in a subquery and filtering on it in the outer query, it appears that a random number is being generated twice, with both values checked against the WHERE clause. This issue sounds similar to MDEV-5414, but that issue is marked as fixed in 10.0.8. MySQL (or at least recent versions of it) does not appear to exhibit this problem.

Example:

WITH RECURSIVE one_hundred (N) AS (
  SELECT 1
  UNION ALL
  SELECT N+1 FROM one_hundred WHERE N < 100
),
randomized (RANDOM_VALUE) AS (
  SELECT RAND()
  FROM one_hundred
)
SELECT COUNT(*)
FROM randomized
WHERE RANDOM_VALUE <= 0.1;

This query should return 10 on average, but instead returns 1 on average.

One workaround is to add a LIMIT clause to the inner query. The following code returns an average of 10 as expected:

WITH RECURSIVE one_hundred (N) AS (
  SELECT 1
  UNION ALL
  SELECT N+1 FROM one_hundred WHERE N < 100
),
randomized (RANDOM_VALUE) AS (
  SELECT RAND()
  FROM one_hundred
  LIMIT 999999999999
)
SELECT COUNT(*)
FROM randomized
WHERE RANDOM_VALUE <= 0.1;



 Comments   
Comment by Alice Sherepa [ 2020-07-08 ]

It is repeatable on 10.2-10.5 - when condition_pushdown_for_derived=on.

MariaDB [test]> SELECT count(a)
    -> FROM (SELECT RAND()*100 a FROM seq_1_to_1000) tbl
    -> WHERE a <= 10;
+----------+
| count(a) |
+----------+
|        9 |
+----------+
1 row in set (0.006 sec)
 
MariaDB [test]> explain extended
    -> SELECT count(a)
    -> FROM (SELECT RAND()*100 a FROM seq_1_to_1000) tbl
    -> WHERE a <= 10;
 
+------+-------------+---------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id   | select_type | table         | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+------+-------------+---------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
|    1 | PRIMARY     | <derived2>    | ALL   | NULL          | NULL    | NULL    | NULL | 1000 |   100.00 | Using where              |
|    2 | DERIVED     | seq_1_to_1000 | index | NULL          | PRIMARY | 8       | NULL | 1000 |   100.00 | Using where; Using index |
+------+-------------+---------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
2 rows in set, 1 warning (0.004 sec)
 
Note (Code 1003): /* select#1 */ select count(`tbl`.`a`) AS `count(a)` from (/* select#2 */ select rand() * 100 AS `a` from `test`.`seq_1_to_1000` where rand() * 100 <= 10) `tbl` where `tbl`.`a` <= 10
 
MariaDB [test]> set optimizer_switch='condition_pushdown_for_derived=off'; 
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> SELECT count(a) FROM (SELECT RAND()*100 a FROM seq_1_to_1000) tbl WHERE a <= 10;
+----------+
| count(a) |
+----------+
|      103 |
+----------+
1 row in set (0.012 sec)

Generated at Thu Feb 08 09:19:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.