Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 10.3.11
    • Fix Version/s: N/A
    • Component/s: Optimizer
    • Labels: None
    • Environment: Linux

    Description

      I checked this with MS SQL and it behaves correctly.
      I have a very large table with a single varchar column, which is the primary key.
      These two statements take forever:
      SET @strvbig:='9544447408'; SELECT 1 FROM large_table WHERE did = CAST(@strvbig AS CHAR) LIMIT 1;
      SET @strvbig:=9544447408; SELECT 1 FROM large_table WHERE did = CAST(@strvbig AS CHAR) LIMIT 1;
      while
      SELECT 1 FROM large_table WHERE did = '9544447408' LIMIT 1;
      returns immediately.
      The optimizer is executing the cast on the right side of the WHERE clause for each row, which makes no sense: no columns are involved, so it is a constant expression.
      In MS SQL, this works perfectly:
      declare @strvbig bigint=9544447408;
      select 1 from large_table where did=cast(@strvbig as varchar);
      This problem adds an enormous amount of inefficiency to MariaDB stored procedures that contain statements like SELECT ... FROM large_table WHERE column = variable, whenever the variable is of a different datatype than the column.
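
      For reference, a sketch of the forms that avoid the per-row conversion (table and column names as in the report; whether the un-cast variable comparison can use the index may still depend on the connection vs. column character sets, cf. MDEV-14698):

      ```sql
      -- String literal: the predicate is a string constant of the column's
      -- type, so the primary-key lookup is used (this is the fast form above).
      SELECT 1 FROM large_table WHERE did = '9544447408' LIMIT 1;

      -- Assigning the value as a string avoids the number-to-string CAST in
      -- the WHERE clause; whether this form can use the index may still
      -- depend on implicit character set conversion (see MDEV-14698).
      SET @strvbig := '9544447408';
      SELECT 1 FROM large_table WHERE did = @strvbig LIMIT 1;
      ```

      These require a running MariaDB server with the table in question, so they are shown here only as a sketch.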

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment

            I think it's the same implicit charset conversion issue already discussed in another bug report, MDEV-14698. I'll leave it to alice to confirm that.
            philip_38 Philip orleans added a comment (edited)

            There is a further problem when using RocksDB. Take this query on a very large table. Please note that all three fields are boolean, i.e. numeric, and each of them is indexed.

             explain select count(*) from asterisk.npadata where reachable=1 and port_type=2 and tested=0;
            +------+-------------+---------+-------------+------------------------------------------+------------------------------------------+---------+------+------------+------------------------------------------------------------------------+
            | id   | select_type | table   | type        | possible_keys                            | key                                      | key_len | ref  | rows       | Extra                                                                  |
            +------+-------------+---------+-------------+------------------------------------------+------------------------------------------+---------+------+------------+------------------------------------------------------------------------+
            |    1 | SIMPLE      | npadata | index_merge | IDX_npadata_tested,IDX_npadata_reachable | IDX_npadata_tested,IDX_npadata_reachable | 2,2     | NULL | 1139455527 | Using intersect(IDX_npadata_tested,IDX_npadata_reachable); Using where |
            +------+-------------+---------+-------------+------------------------------------------+------------------------------------------+---------+------+------------+------------------------------------------------------------------------+
            

            I had to stop the query after 12 hours. It is basically useless for big data. This query in MS SQL might take 15 minutes at most. I could try InnoDB, but it would require several times the disk space of RocksDB. Should I try TokuDB, or is this an issue with the optimizer?
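
            An aside on the plan above: the EXPLAIN shows an index_merge intersect over two single-column indexes spanning roughly 1.1 billion rows, which is expensive. A composite index over all three predicate columns (the index name below is illustrative, not from the report) would normally let the optimizer do a single range scan instead:

            ```sql
            -- Hypothetical composite index covering all three equality
            -- predicates; the optimizer can then resolve the WHERE clause
            -- with one index range scan instead of intersecting two huge
            -- single-column indexes.
            ALTER TABLE asterisk.npadata
              ADD INDEX idx_npadata_reach_port_tested (reachable, port_type, tested);

            -- COUNT(*) can then typically be answered from the index alone:
            SELECT COUNT(*) FROM asterisk.npadata
            WHERE reachable = 1 AND port_type = 2 AND tested = 0;
            ```

            This is a sketch of a standard indexing technique, not something proposed in the ticket; building such an index on a table of this size is itself a long operation.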

            I want to emphasize that anybody can do a faster job on the same query using a CSV file and plain old AWK. It is trivial.

            The entire data fits in memory:

            +---------+---------+----------+----------+------------+
            | ENGINE  | Data MB | Index MB | Total MB | Num Tables |
            +---------+---------+----------+----------+------------+
            | NULL    |    NULL |     NULL |     NULL |          4 |
            | Aria    |     0.0 |      0.0 |      0.0 |          1 |
            | CSV     |     0.0 |      0.0 |      0.0 |          3 |
            | InnoDB  | 62981.1 |    609.4 |  63590.5 |         37 |
            | MyISAM  |     0.8 |      0.1 |      0.9 |         25 |
            | ROCKSDB | 14842.9 |  42893.4 |  57736.3 |          2 |
            +---------+---------+----------+----------+------------+
            6 rows in set (0.017 sec)
             
            MariaDB [asterisk]> show variables like 'rocksdb%cache';
            +-------------------------------------------------+-------+
            | Variable_name                                   | Value |
            +-------------------------------------------------+-------+
            | rocksdb_no_block_cache                          | OFF   |
            | rocksdb_pin_l0_filter_and_index_blocks_in_cache | ON    |
            | rocksdb_skip_fill_cache                         | OFF   |
            | rocksdb_use_clock_cache                         | OFF   |
            +-------------------------------------------------+-------+
            4 rows in set (0.002 sec)
             
            MariaDB [asterisk]> show variables like 'rocksdb%cache%';
            +-------------------------------------------------+-------------+
            | Variable_name                                   | Value       |
            +-------------------------------------------------+-------------+
            | rocksdb_block_cache_size                        | 96636764160 |
            | rocksdb_cache_index_and_filter_blocks           | ON          |
            | rocksdb_force_compute_memtable_stats_cachetime  | 60000000    |
            | rocksdb_no_block_cache                          | OFF         |
            | rocksdb_persistent_cache_path                   |             |
            | rocksdb_persistent_cache_size_mb                | 0           |
            | rocksdb_pin_l0_filter_and_index_blocks_in_cache | ON          |
            | rocksdb_sim_cache_size                          | 0           |
            | rocksdb_skip_fill_cache                         | OFF         |
            | rocksdb_table_cache_numshardbits                | 19          |
            | rocksdb_use_clock_cache                         | OFF         |
            +-------------------------------------------------+-------------+
            


            Take a table with a single field, varchar(10), primary key.
            It has 700 MM records.
            If I do this, it returns immediately:
            select * from table where field='9999999999';
            but had I chosen to use a number,
            select * from table where field=9999999999;
            it takes hours.
            You would think that the optimizer would be smart enough to convert my bigint to a varchar only once and then do the search. It is not.
            No commercial database has this issue.


            serg Sergei Golubchik added a comment

            Consider this:

            create table t1 (a varchar(100));
            insert t1 values ('9'), ('0009'), ('     9');
            select * from t1 where a=9;
            

            as you can see, one cannot just convert bigint to a varchar once.
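
            To spell the point out, a sketch extending the same example (expected results follow the documented MariaDB conversion-on-comparison rules):

            ```sql
            CREATE TABLE t1 (a varchar(100));
            INSERT INTO t1 VALUES ('9'), ('0009'), ('     9');

            -- Both sides are converted to numbers per row, so all three
            -- rows match:
            SELECT * FROM t1 WHERE a = 9;

            -- Converting 9 to the string '9' once and comparing as strings
            -- would match only the first row, i.e. a different result set:
            SELECT * FROM t1 WHERE a = '9';
            ```

            So rewriting the numeric comparison as a single string comparison is not a behavior-preserving optimization under these rules.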

            philip_38 Philip orleans added a comment (edited)

            I disagree: you take the bigint and convert it to its varchar representation, which is '9' in this case, and retrieve any record that matches '9'.
            The customer has already decided to pull that record; he is not looking for '     9' or '0009'. He knows what he is looking for; otherwise he would have used a varchar variable. Please do not assume the customer doesn't know what he or she is doing. Also, as I mentioned, every other commercial database works that way.

            serg Sergei Golubchik added a comment (edited)

            This is a different issue.

            My example shows that if the comparison rules are as documented here: https://mariadb.com/kb/en/type-conversion/#rules-for-conversion-on-comparison then the example above cannot use an index or convert the bigint to a varchar once.

            You're saying the comparison rules should be different. Perhaps, but they are very difficult to change: MariaDB has always worked like this, since 2010, and MySQL has always worked like this, since 1998. That's a lot of history to break.

            Besides, the SQL standard says that one cannot compare a number to a string at all, so any conforming database implementation should simply throw an error if you tried to do that.


            philip_38 Philip orleans added a comment

            Then close the issue.

            People

              Assignee: Unassigned
              Reporter: philip_38 Philip orleans
              Votes: 0
              Watchers: 3

