[MDEV-21823] information_schema tables need optimization Created: 2020-02-27  Updated: 2020-05-27

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Affects Version/s: 10.5.1
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Philip orleans Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: None
Environment:

linux



 Description   

this statement fails in database information_schema
ALTER TABLE TABLES ADD UNIQUE INDEX UK_TABLES(TABLE_SCHEMA, TABLE_NAME);
Right now the statement below, which I use hundreds of times per second, is performing a full table scan
SELECT AUTO_INCREMENT - 1 FROM INFORMATION_SCHEMA.TABLES where table_schema ='x' and table_name='y')
I could add an index but it does not allow me any such privileges. It should allow me to improve the internal tables.
I use this statement because I need to know inside a stored procedure which is the highest ID in the table. To do full count or a max(ID) is slower than the statement above. Am I wrong?



 Comments   
Comment by Marko Mäkelä [ 2020-02-27 ]

The INFORMATION_SCHEMA tables are essentially hard-coded views. The data is generated programmatically via an interface that (to my understanding) only allows the whole table contents to be returned. There is a parameter that might allow some filtering, but that one is not used in INFORMATION_SCHEMA.INNODB_%, which I am familiar with. I am not aware of anything that would allow to push down any ranges. To my understanding, INFORMATION_SCHEMA.TABLES is returning the same information that you would get with SHOW TABLES, via the same internal interface (invoking virtual member functions of the handler class to get some data from the storage engines).

Regarding AUTO_INCREMENT, you should be aware that there are 2 or 3 different values, as described in MDEV-13094.

Have you considered CREATE SEQUENCE (MDEV-10139)? That is a cleaner design and based on an SQL standard. AUTO_INCREMENT is always tied to a single table, more or less.

Comment by Sergei Golubchik [ 2020-03-02 ]

This statement

SELECT AUTO_INCREMENT - 1 FROM INFORMATION_SCHEMA.TABLES where table_schema ='x' and table_name='y'

should only open one table `x`.`y` and not touch anything else, there is no full table scan there,

Comment by Philip orleans [ 2020-03-02 ]

The optimizer says that "select max(id) from table" is "optimized away" when "id" is the primary key. But, strange enough,
"select rand()* max(id) from table" is not optimized away, it uses an index and it is 10 times slower.
The optimizer should be smarter. I need need to do in three steps what should take only one and be optimized:
"select field from table where id=floor(rand()*max(id))"
Maybe we could design a SQL extension for this generic case, when we need a random row from a table with a unique index on that field. It has to be very fast.
So far, it is very slow unless done in three steps
select max(id) from table into var1
set var2=floor(rand()*var1)
select field from table where id=var2

Comment by Sergei Golubchik [ 2020-03-02 ]

you can do in two steps, like

set var=(select max(id) from table)*rand();
select * from table where id=var;

To do it in one step you can try creating a stored function like

create function deterministic_rand() returns double deterministic return rand();

And may be it'll help, if used like

select * from table where id=deterministic_rand()*(select max(id) from table);

Generated at Thu Feb 08 09:10:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.