[MDEV-6299] Query optimizer (?) fails - full table scan for Integer PK select Created: 2014-06-03  Updated: 2014-06-05  Resolved: 2014-06-05

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.36
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Slawomir Pryczek Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Fedora linux


Attachments: Zip Archive targeting_data_testing.zip    

 Description   

When sending query - simple select by unique integer id that is primary key results in full table scan.

Table structure:

CREATE TABLE `targeting_data_testing` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `_hash` varchar(30) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
  `__last_used` date NOT NULL,
  PRIMARY KEY (`id`,`__last_used`),
  UNIQUE KEY `_hash` (`_hash`,`__last_used`) USING BTREE
) ENGINE=Aria AUTO_INCREMENT=116938292 DEFAULT CHARSET=utf8 PAGE_CHECKSUM=1 TRANSACTIONAL=0;

Sample table attached. Below is query plan for following queries

Full table scan:

SELECT * FROM `targeting_data_testing` WHERE id=('6E76XdQUK+es/GqzbfEPm4dbHpA') OR _hash=('6E76XdQUK+es/GqzbfEPm4dbHpA')
SELECT * FROM `targeting_data_testing` WHERE id=('6E76') OR _hash= ('6E76XdQUK+es/GqzbfEPm4dbHpA')

Using index:

SELECT * FROM `targeting_data_testing` WHERE id=('6E10'+0) OR _hash=('6E76XdQUK+es/GqzbfEPm4dbHpA')
SELECT * FROM `targeting_data_testing` WHERE id=6E10 OR _hash=('6E76XdQUK+es/GqzbfEPm4dbHpA')

So the server needs to scan the whole table if we SELECT by large xEy formatted number and we pass that number with quotes. Very serious and very hard to prevent bug (as it's very counter-intuitive that just adding quotes will make such a big difference, and only some base64 encoded strings will trigger the bug).

This lead to scanning 2 billion rows table on some occasions, on our server.

When querying only by id it was like 5 million rows scan VS "Impossible WHERE noticed after reading const tables" on that 2 billion row table.



 Comments   
Comment by Slawomir Pryczek [ 2014-06-04 ]

Interesting thing
"6E10" is having same effect (full table scan)
While "10000000" , 10000000 , 6E10 is scaning only 1 row / no rows...

So to optimizer "6E10" is neither 6E10 number nor "10000000", 10000000?

Comment by Sergei Golubchik [ 2014-06-05 ]

This is, in fact, intentional and not a bug. When you compare an integer to a string, they're both being casted and compared as floating-point numbers. That is, your query

SELECT * FROM targeting_data_testing WHERE id='6E76Xd...'

is internally executed as

SELECT * FROM targeting_data_testing WHERE CAST(id AS DOUBLE)=CAST('6E76Xd...' AS DOUBLE)

And if you write it like that, it becomes obvious that no index can help, if an indexed column is used as an argument for a function.

To be sure than an index will be considered, cast the value to the data type of your indexed column, for example, write

SELECT * FROM targeting_data_testing WHERE id=CAST('6E76Xd...' AS SIGNED)

Generated at Thu Feb 08 07:10:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.