[MDEV-21503] non determinist query result on first execution, myisam speed regression Created: 2020-01-16  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - MyISAM
Affects Version/s: 10.3.18, 10.3.21, 10.4.8
Fix Version/s: 10.4

Type: Bug Priority: Minor
Reporter: VAROQUI Stephane Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 2
Labels: None
Environment:

Debian and unbutu package mariadb depo zfs


Attachments: File test.rdt    

 Description   

How to reproduce :
The DDL:

CREATE TABLE `T` (
  `DATE_1` timestamp NOT NULL DEFAULT current_timestamp(),
  `ID` int(10) unsigned NOT NULL DEFAULT 0,
  `D` smallint(6) DEFAULT NULL,
  `C` smallint(6) DEFAULT NULL,
  `TIC` float DEFAULT NULL,
  `VALUE_1` float DEFAULT NULL,
  `VALUE_2` float DEFAULT NULL,
  `VALUE_3` float DEFAULT NULL,
  `VALUE_4` float DEFAULT NULL,
  `IS_PROCESSING_VALIDATED` tinyint(1) unsigned DEFAULT NULL,
  `DATE_2` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`DATE_1`,`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

MariaDB [test]> load data local infile '/root/test.rdt' into table  T;
Query OK, 34121 rows affected (0.522 sec)            
Records: 34121  Deleted: 0  Skipped: 0  Warnings: 0
 
MariaDB [test]>  SELECT COUNT(*)  AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', `date_1` + 0, `id`, `d`, `c`, `tic`, `value_1`, `value_2`, `value_3`, `value_4`, `is_processing_validated`, `date_2` + 0, CONCAT(ISNULL(`d`), ISNULL(`c`), ISNULL(`tic`), ISNULL(`value_1`), ISNULL(`value_2`), ISNULL(`value_3`), ISNULL(`value_4`), ISNULL(`is_processing_validated`)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 13:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`date_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <= '1854530'));
 
+-------+----------------------------------+
| cnt   | crc                              |
+-------+----------------------------------+
| 34121 | 0000000000000000e3edfd9f0148c286 |
+-------+----------------------------------+
1 row in set (11.230 sec)
 
MariaDB [test]> SELECT COUNT(*)  AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', `date_1` + 0, `id`, `d`, `c`, `tic`, `value_1`, `value_2`, `value_3`, `value_4`, `is_processing_validated`, `date_2` + 0, CONCAT(ISNULL(`d`), ISNULL(`c`), ISNULL(`tic`), ISNULL(`value_1`), ISNULL(`value_2`), ISNULL(`value_3`), ISNULL(`value_4`), ISNULL(`is_processing_validated`)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 13:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`date_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <= '1854530'));
+-------+----------------------------------+
| cnt   | crc                              |
+-------+----------------------------------+
| 34121 | 7e756f5a0de6135ee3edfd9f0148c286 |
+-------+----------------------------------+
1 row in set (12.281 sec)

The server encoding & collation: utfbm4 ;

Always the fist query have a bad CRC result filled with 0 on half bytes
12sec ro read 30K rows hooo god !

MariaDB [test]> alter table T engine innodb ;
Query OK, 34121 rows affected (0.217 sec)              
Records: 34121  Duplicates: 0  Warnings: 0
 
 
root@s18-fr-1:~# mysql -hdb1.tst-2x-mariadb104-proxysql.svc.cloud18 -uroot -pmariadb -P3306 --local-infile test 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 54792
Server version: 10.4.8-MariaDB-1:10.4.8+maria~bionic-log mariadb.org binary distribution
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> SELECT COUNT(*)  AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', `date_1` + 0, `id`, `d`, `c`, `tic`, `value_1`, `value_2`, `value_3`, `value_4`, `is_processing_validated`, `date_2` + 0, CONCAT(ISNULL(`d`), ISNULL(`c`), ISNULL(`tic`), ISNULL(`value_1`), ISNULL(`value_2`), ISNULL(`value_3`), ISNULL(`value_4`), ISNULL(`is_processing_validated`)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 13:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`date_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <= '1854530'));
+-------+----------------------------------+
| cnt   | crc                              |
+-------+----------------------------------+
| 34121 | 0000000000000000e3edfd9f0148c286 |
+-------+----------------------------------+
1 row in set (0.222 sec)
 
MariaDB [test]> SELECT COUNT(*)  AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', `date_1` + 0, `id`, `d`, `c`, `tic`, `value_1`, `value_2`, `value_3`, `value_4`, `is_processing_validated`, `date_2` + 0, CONCAT(ISNULL(`d`), ISNULL(`c`), ISNULL(`tic`), ISNULL(`value_1`), ISNULL(`value_2`), ISNULL(`value_3`), ISNULL(`value_4`), ISNULL(`is_processing_validated`)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 13:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`date_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <= '1854530'));
+-------+----------------------------------+
| cnt   | crc                              |
+-------+----------------------------------+
| 34121 | 7e756f5a0de6135ee3edfd9f0148c286 |
+-------+----------------------------------+
1 row in set (0.239 sec)

What we can see is that such read query do not only give wrong computation on first read but is also taking x20 more time on myisam compare with loading the data , the wrong computation is repeatable on innodb but query get magically fast. Once the client get the first wrong result the resultset is always correct later on .

For the performance issue on MyISAM it looks like a optimizer regression

MariaDB [test]> set optimizer_switch='mrr=off';
Query OK, 0 rows affected, 1 warning (0.000 sec)
set optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected, 1 warning (0.000 sec)

 
This simplified query is fast:
 
MariaDB [test]> SELECT COUNT(*)  AS cnt,   CONCAT_WS('#',  `date_1` + 0, `id`, `tic`)     FROM `test`.`T` ;
+-------+--------------------------------------------+
| cnt   | CONCAT_WS('#',  `date_1` + 0, `id`, `tic`) |
+-------+--------------------------------------------+
| 34121 | 20200108134316#1283507                     |
+-------+--------------------------------------------+
1 row in set (0.010 sec)

When adding the force index and the where clause it get slow :

SELECT COUNT(*)  AS cnt,   CONCAT_WS('#',  `date_1` + 0, `id`, `tic`)     FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 13:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`date_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <= '1854530'));
+-------+--------------------------------------------+
| cnt   | CONCAT_WS('#',  `date_1` + 0, `id`, `tic`) |
+-------+--------------------------------------------+
| 34121 | 20200108134316#1283507                     |
+-------+--------------------------------------------+
1 row in set (11.354 sec)

But without Force index or with force and no where it's fast

MariaDB [test]> SELECT COUNT(*)  AS cnt,   CONCAT_WS('#',  `date_1` + 0, `id`, `tic`)     FROM `test`.`T`  WHERE ((`date_1` > '2020-01-08 13:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`date_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <= '1854530'));
+-------+--------------------------------------------+
| cnt   | CONCAT_WS('#',  `date_1` + 0, `id`, `tic`) |
+-------+--------------------------------------------+
| 34121 | 20200108134316#1283507                     |
+-------+--------------------------------------------+
1 row in set (0.020 sec)
SELECT COUNT(*)  AS cnt,   CONCAT_WS('#',  `date_1` + 0, `id`, `tic`)     FROM `test`.`T` FORCE INDEX(`PRIMARY`)  ;
+-------+--------------------------------------------+
| cnt   | CONCAT_WS('#',  `date_1` + 0, `id`, `tic`) |
+-------+--------------------------------------------+
| 34121 | 20200108134316#1283507                     |
+-------+--------------------------------------------+
1 row in set (0.011 sec)



 Comments   
Comment by VAROQUI Stephane [ 2020-01-16 ]

The select query is coming from pt-table-ckesum and pt-table-sync , tools used by most DBA

Comment by VAROQUI Stephane [ 2020-01-17 ]

Aria not affected by slow reads, it as fast as innodb.

Comment by Alexandre Legrix [ 2020-03-17 ]

Hello
We have this bug too.
Where do you stand on this ?
Best
Alex

Comment by Valerii Kravchuk [ 2020-04-10 ]

I can easily confirm wrong crc calculation at the first run with random dummy data:

MariaDB [test]> \r
Connection id:    11
Current database: test
 
MariaDB [test]> select * from `T`;
+---------------------+----+------+------+------+---------+---------+---------+-
--------+-------------------------+---------------------+
| DATE_1              | ID | D    | C    | TIC  | VALUE_1 | VALUE_2 | VALUE_3 |
VALUE_4 | IS_PROCESSING_VALIDATED | DATE_2              |
+---------------------+----+------+------+------+---------+---------+---------+-
--------+-------------------------+---------------------+
| 2020-04-10 10:04:15 |  1 |    1 | NULL | NULL |    NULL |    NULL |    NULL |
   NULL |                    NULL | 0000-00-00 00:00:00 |
| 2020-04-10 10:04:15 |  2 |    2 | NULL | NULL |    NULL |    NULL |    NULL |
   NULL |                    NULL | 0000-00-00 00:00:00 |
| 2020-04-10 10:04:15 |  3 |    3 | NULL | NULL |    NULL |    NULL |    NULL |
   NULL |                    NULL | 0000-00-00 00:00:00 |
| 2020-01-08 13:43:28 |  1 |    1 | NULL | NULL |    NULL |    NULL |    NULL |
   NULL |                    NULL | 0000-00-00 00:00:00 |
| 2020-01-08 13:43:28 |  2 |    2 | NULL | NULL |    NULL |    NULL |    NULL |
   NULL |                    NULL | 0000-00-00 00:00:00 |
| 2020-01-08 13:43:28 |  4 |    5 | NULL | NULL |    NULL |    NULL |    NULL |
   NULL |                    NULL | 0000-00-00 00:00:00 |
+---------------------+----+------+------+------+---------+---------+---------+-
--------+-------------------------+---------------------+
6 rows in set (0.002 sec)
 
MariaDB [test]> SELECT COUNT(*)  AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR
(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPA
D(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', `date_1` + 0, `id`
, `d`, `c`, `tic`, `value_1`, `value_2`, `value_3`, `value_4`, `is_processing_va
lidated`, `date_2` + 0, CONCAT(ISNULL(`d`), ISNULL(`c`), ISNULL(`tic`), ISNULL(`
value_1`), ISNULL(`value_2`), ISNULL(`value_3`), ISNULL(`value_4`), ISNULL(`is_p
rocessing_validated`)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0
) AS crc FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 1
3:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`da
te_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <=
'1854530'));
+-----+----------------------------------+
| cnt | crc                              |
+-----+----------------------------------+
|   3 | 0000000000000000f6ccb3cb43027a4a |
+-----+----------------------------------+
1 row in set (0.004 sec)
 
MariaDB [test]> SELECT COUNT(*)  AS cnt, COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR
(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPA
D(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', `date_1` + 0, `id`
, `d`, `c`, `tic`, `value_1`, `value_2`, `value_3`, `value_4`, `is_processing_va
lidated`, `date_2` + 0, CONCAT(ISNULL(`d`), ISNULL(`c`), ISNULL(`tic`), ISNULL(`
value_1`), ISNULL(`value_2`), ISNULL(`value_3`), ISNULL(`value_4`), ISNULL(`is_p
rocessing_validated`)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0
) AS crc FROM `test`.`T` FORCE INDEX(`PRIMARY`) WHERE ((`date_1` > '2020-01-08 1
3:43:16') OR (`date_1` = '2020-01-08 13:43:16' AND `id` >= '1283507')) AND ((`da
te_1` < '2020-01-08 13:43:28') OR (`date_1` = '2020-01-08 13:43:28' AND `id` <=
'1854530'));
+-----+----------------------------------+
| cnt | crc                              |
+-----+----------------------------------+
|   3 | bfb8907e0c79ddf5f6ccb3cb43027a4a |
+-----+----------------------------------+
1 row in set (0.004 sec)

Comment by Alexander Barkov [ 2020-05-13 ]

Reproducible with this script:

CREATE OR REPLACE TABLE t1 (
  DATE_1 timestamp NOT NULL DEFAULT current_timestamp(),
  ID int(10) unsigned NOT NULL DEFAULT 0,
  D smallint(6) DEFAULT NULL,
  C smallint(6) DEFAULT NULL,
  TIC float DEFAULT NULL,
  VALUE_1 float DEFAULT NULL,
  VALUE_2 float DEFAULT NULL,
  VALUE_3 float DEFAULT NULL,
  VALUE_4 float DEFAULT NULL,
  IS_PROCESSING_VALIDATED tinyint(1) unsigned DEFAULT NULL,
  DATE_2 timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (DATE_1,ID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
INSERT INTO t1
(DATE_1, ID, D, C, TIC, VALUE_1, VALUE_2, VALUE_3,VALUE_4, IS_PROCESSING_VALIDATED, DATE_2)
VALUES
('2020-04-10 10:04:15', 1, 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '0000-00-00 00:00:00'),
('2020-04-10 10:04:15', 2, 2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '0000-00-00 00:00:00'),
('2020-04-10 10:04:15', 3, 3, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '0000-00-00 00:00:00'),
('2020-01-08 13:43:28', 1, 1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '0000-00-00 00:00:00'),
('2020-01-08 13:43:28', 2, 2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '0000-00-00 00:00:00'),
('2020-01-08 13:43:28', 4, 5, NULL, NULL, NULL, NULL, NULL, NULL, NULL, '0000-00-00 00:00:00');
 
-- SET @crc=CAST(NULL AS CHAR);
 
SELECT
  COUNT(*) AS cnt,
  COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', date_1 + 0, id, d, c, tic, value_1, value_2, value_3, value_4, is_processing_validated, date_2 + 0, CONCAT(ISNULL(d), ISNULL(c), ISNULL(tic), ISNULL(value_1), ISNULL(value_2), ISNULL(value_3), ISNULL(value_4), ISNULL(is_processing_validated)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc
FROM t1 FORCE INDEX(PRIMARY)
WHERE
  ((date_1 > '2020-01-08 13:43:16') OR (date_1 = '2020-01-08 13:43:16' AND id >= '1283507'))
AND
  ((date_1 < '2020-01-08 13:43:28') OR (date_1 = '2020-01-08 13:43:28' AND id <= '1854530'));
 
-- SET @crc=CAST(NULL AS CHAR);
 
SELECT
  COUNT(*) AS cnt,
  COALESCE(LOWER(CONCAT(LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc, 1, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'), LPAD(CONV(BIT_XOR(CAST(CONV(SUBSTRING(@crc := MD5(CONCAT_WS('#', date_1 + 0, id, d, c, tic, value_1, value_2, value_3, value_4, is_processing_validated, date_2 + 0, CONCAT(ISNULL(d), ISNULL(c), ISNULL(tic), ISNULL(value_1), ISNULL(value_2), ISNULL(value_3), ISNULL(value_4), ISNULL(is_processing_validated)))), 17, 16), 16, 10) AS UNSIGNED)), 10, 16), 16, '0'))), 0) AS crc
FROM t1 FORCE INDEX(PRIMARY)
WHERE
  ((date_1 > '2020-01-08 13:43:16') OR (date_1 = '2020-01-08 13:43:16' AND id >= '1283507'))
AND
  ((date_1 < '2020-01-08 13:43:28') OR (date_1 = '2020-01-08 13:43:28' AND id <= '1854530'));

The output is:

cnt	crc
3	0000000000000000f6ccb3cb43027a4a
cnt	crc
3	bfb8907e0c79ddf5f6ccb3cb43027a4a

Note, this is not a bug. Everything works as expected. The query result depends on the value of the @crc user variable.

When the first SELECT starts, the user variable @crc is set to NULL.
When the second SELECT starts, the user variable @crc is set to a non-NULL value.

Hence the different result.

If I uncomment this statement before both SELECTs:

SET @crc=CAST(NULL AS CHAR);

both SELECTs return the same result.

Comment by Alexander Barkov [ 2020-05-13 ]

Summary:

  • There are no problems with a wrong query set
  • I could not reproduce the problem with the query slow down.

I checked the following combinations:

  • MariaDB versions: 10.0, 10.3, 10.4, 10.5
  • Engine: MyISAM, InnoDB
  • Queries: with FORCE INDEX and without FORCE INDEX

Performance is about the same with all combinations.

I'm passing the issue to the optimizer team.

psergey, can you please investigate this problem further?

Thanks.

Comment by VAROQUI Stephane [ 2020-05-13 ]

I can not reproduce as well without the use of ZFS

Comment by VAROQUI Stephane [ 2020-05-13 ]

may related https://github.com/openzfs/zfs/issues/7573

Generated at Thu Feb 08 09:07:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.