[MDEV-18089] optimizer is wrong Created: 2018-12-27  Updated: 2022-05-12  Resolved: 2021-09-30

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.3.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Philip orleans Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Linux


Issue Links:
Duplicate
duplicates MDEV-26729 Optimizer does wrong cast from number... Closed

 Description   

I checked this with MS SQL and it behaves correctly.
I have a very large table with a single column: varchar, primary key.
These two statements take forever:
Set @strvbig:='9544447408';select 1 FROM large_table WHERE did = cast(@strvbig as char) limit 1;
Set @strvbig:=9544447408;select 1 FROM large_table WHERE did = cast(@strvbig as char) limit 1;
while
select 1 FROM large_table WHERE did = '9544447408' limit 1;
returns immediately.
The optimizer is executing the cast on the right side of the WHERE clause for each row, which is insane since there are no columns involved; it is a constant expression.
In MS SQL, this works perfectly:
declare @strvbig bigint=9544447408;
select 1 from large_table where did=cast (@strvbig as varchar);
This problem adds an unbelievable amount of inefficiency to MariaDB when writing stored procedures with statements like select ... from large_table where column = variable, if the variable is of a different datatype.
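The complaint is that cast(@strvbig as char) involves no columns, so it could be evaluated once ("hoisted" out of the scan) instead of once per row. The distinction can be sketched in Python; this is a conceptual illustration, not MariaDB internals:

```python
# Rows of a hypothetical varchar primary-key column.
rows = [str(i) for i in range(1000)] + ['9544447408']
needle = 9544447408  # a bigint-typed variable

# Per-row conversion: str(needle) is re-evaluated on every iteration,
# analogous to executing the cast once per row.
per_row = [r for r in rows if r == str(needle)]

# Hoisted: convert the constant once, then compare strings.
needle_str = str(needle)
hoisted = [r for r in rows if r == needle_str]

assert per_row == hoisted == ['9544447408']
```

Both loops return the same row; only the number of conversions differs. (As a later comment in this thread explains, MariaDB's documented comparison rules are why the server cannot actually perform this hoisting in general.)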



 Comments   
Comment by Elena Stepanova [ 2018-12-29 ]

I think these are the same implicit charset-conversion issues already discussed in another bug report, MDEV-14698. I'll leave it to alice to make sure of that.

Comment by Philip orleans [ 2018-12-29 ]

There is a further problem when using RocksDB. Take this query on a very large table. Please note that all fields are boolean, i.e. numeric, and all three fields are indexed.

 explain select count(*) from asterisk.npadata where reachable=1 and port_type=2 and tested=0;
+------+-------------+---------+-------------+------------------------------------------+------------------------------------------+---------+------+------------+------------------------------------------------------------------------+
| id   | select_type | table   | type        | possible_keys                            | key                                      | key_len | ref  | rows       | Extra                                                                  |
+------+-------------+---------+-------------+------------------------------------------+------------------------------------------+---------+------+------------+------------------------------------------------------------------------+
|    1 | SIMPLE      | npadata | index_merge | IDX_npadata_tested,IDX_npadata_reachable | IDX_npadata_tested,IDX_npadata_reachable | 2,2     | NULL | 1139455527 | Using intersect(IDX_npadata_tested,IDX_npadata_reachable); Using where |
+------+-------------+---------+-------------+------------------------------------------+------------------------------------------+---------+------+------------+------------------------------------------------------------------------+

I had to stop the query after 12 hours. It is basically useless for big data. This query in MS SQL might take 15 minutes at most. I could try InnoDB, but it would require several times the disk space of RocksDB. Should I try TokuDB, or is this an issue with the optimizer?
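The "Using intersect(IDX_npadata_tested,IDX_npadata_reachable)" step in the plan above means each index produces the set of row ids matching one condition, and the sets are intersected before table rows are read; the remaining condition is then checked per row ("Using where"). A minimal Python sketch of that idea, with made-up rows:

```python
# Hypothetical rows: (row_id, column values) mirroring the npadata query.
rows = [
    (1, dict(reachable=1, port_type=2, tested=0)),
    (2, dict(reachable=1, port_type=1, tested=0)),
    (3, dict(reachable=0, port_type=2, tested=0)),
    (4, dict(reachable=1, port_type=2, tested=1)),
]

# Each "index" yields the row ids satisfying one indexed condition.
idx_reachable = {rid for rid, r in rows if r["reachable"] == 1}
idx_tested = {rid for rid, r in rows if r["tested"] == 0}

# The intersect step: only rows present in both index results survive.
candidates = idx_reachable & idx_tested

# "Using where": the remaining condition is checked against candidate rows.
result = [rid for rid, r in rows if rid in candidates and r["port_type"] == 2]

assert candidates == {1, 2}
assert result == [1]
```

With a billion-row estimate per index, materializing and intersecting such sets is expensive, which is consistent with the runtime reported here.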

I want to emphasize that anybody can do a faster job on the same query using a CSV file and plain old AWK. It is trivial.

The entire data fits in memory:

+---------+---------+----------+----------+------------+
| ENGINE  | Data MB | Index MB | Total MB | Num Tables |
+---------+---------+----------+----------+------------+
| NULL    |    NULL |     NULL |     NULL |          4 |
| Aria    |     0.0 |      0.0 |      0.0 |          1 |
| CSV     |     0.0 |      0.0 |      0.0 |          3 |
| InnoDB  | 62981.1 |    609.4 |  63590.5 |         37 |
| MyISAM  |     0.8 |      0.1 |      0.9 |         25 |
| ROCKSDB | 14842.9 |  42893.4 |  57736.3 |          2 |
+---------+---------+----------+----------+------------+
6 rows in set (0.017 sec)
 
MariaDB [asterisk]> show variables like 'rocksdb%cache';
+-------------------------------------------------+-------+
| Variable_name                                   | Value |
+-------------------------------------------------+-------+
| rocksdb_no_block_cache                          | OFF   |
| rocksdb_pin_l0_filter_and_index_blocks_in_cache | ON    |
| rocksdb_skip_fill_cache                         | OFF   |
| rocksdb_use_clock_cache                         | OFF   |
+-------------------------------------------------+-------+
4 rows in set (0.002 sec)
 
MariaDB [asterisk]> show variables like 'rocksdb%cache%';
+-------------------------------------------------+-------------+
| Variable_name                                   | Value       |
+-------------------------------------------------+-------------+
| rocksdb_block_cache_size                        | 96636764160 |
| rocksdb_cache_index_and_filter_blocks           | ON          |
| rocksdb_force_compute_memtable_stats_cachetime  | 60000000    |
| rocksdb_no_block_cache                          | OFF         |
| rocksdb_persistent_cache_path                   |             |
| rocksdb_persistent_cache_size_mb                | 0           |
| rocksdb_pin_l0_filter_and_index_blocks_in_cache | ON          |
| rocksdb_sim_cache_size                          | 0           |
| rocksdb_skip_fill_cache                         | OFF         |
| rocksdb_table_cache_numshardbits                | 19          |
| rocksdb_use_clock_cache                         | OFF         |
+-------------------------------------------------+-------------+

Comment by Philip orleans [ 2019-01-22 ]

Take a table with a single varchar(10) field as primary key.
It has 700 million records.
If I do this, it returns immediately:
select * from table where field='9999999999';
but had I chosen to use a number,
select * from table where field=9999999999;
it runs for hours.
You would think that the optimizer would be smart enough to convert my bigint to a varchar only once and then do the search. It is not.
No commercial database has this issue.

Comment by Sergei Golubchik [ 2020-03-25 ]

consider this:

create table t1 (a varchar(100));
insert t1 values ('9'), ('0009'), ('     9');
select * from t1 where a=9;

As you can see, one cannot simply convert the bigint to a varchar once.
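The point can be checked mechanically: under MariaDB's numeric comparison rules, all three stored strings compare equal to the number 9, so converting 9 to the single string '9' once would miss two matching rows. A Python illustration of the comparison semantics (not server code):

```python
# The three rows from the example table t1.
rows = ['9', '0009', '     9']

# What the documented rules require: compare numerically,
# so every string whose numeric value is 9 matches.
numeric_matches = [r for r in rows if float(r) == 9]

# What a one-time bigint-to-varchar conversion would do:
# an exact string comparison against '9'.
string_matches = [r for r in rows if r == str(9)]

assert numeric_matches == ['9', '0009', '     9']  # all three rows
assert string_matches == ['9']                     # only one row
```

The two result sets differ, which is why the server cannot fold the conversion into a single string constant and still return the documented answer.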

Comment by Philip orleans [ 2020-03-25 ]

I disagree: you take the bigint and convert it to its varchar representation, which is '9' in this case, and retrieve any record that matches '9'.
The customer already decided to pull that record; he is not looking for ' 9' or '00009'. He knows what he is looking for; otherwise, he would have used a varchar variable. Please do not assume the customer doesn't know what he or she is doing. Also, as I mentioned, every other commercial database works that way.

Comment by Sergei Golubchik [ 2020-03-26 ]

This is a different issue.

My example shows that if the comparison rules are as documented here: https://mariadb.com/kb/en/type-conversion/#rules-for-conversion-on-comparison then the example above can neither use an index nor convert the bigint to a varchar only once.

You're saying that the comparison rules should be different. Perhaps, but they are very difficult to change: MariaDB has always worked like this, since 2010, and MySQL since 1998. That's a lot of history to break.

Besides, the SQL standard says that one cannot compare a number to a string at all, so any conforming database implementation should simply throw an error if you tried to do that.

Comment by Philip orleans [ 2020-03-26 ]

Then close the issue.

Generated at Thu Feb 08 08:41:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.