[MDEV-15415] there is no combination of innodb variables that would force update table_rows count Created: 2018-02-24  Updated: 2018-10-05  Resolved: 2018-10-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.13
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Philip orleans Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

centos 7



 Description   

even if I set

+--------------------------------------+--------------+
| Variable_name                        | Value        |
+--------------------------------------+--------------+
| innodb_stats_auto_recalc             | ON           |
| innodb_stats_include_delete_marked   | OFF          |
| innodb_stats_method                  | nulls_equal  |
| innodb_stats_modified_counter        | 0            |
| innodb_stats_on_metadata             | OFF          |
| innodb_stats_persistent              | ON           |
| innodb_stats_persistent_sample_pages | 999999999999 |
| innodb_stats_sample_pages            | 999999999999 |
| innodb_stats_traditional             | ON           |
| innodb_stats_transient_sample_pages  | 999999999999 |
| innodb_status_output                 | OFF          |
| innodb_status_output_locks           | OFF          |
+--------------------------------------+--------------+

nothing at all forces analyze table to accurate update the field table_rows in information_schema.tables, and there is no known way to user-update that field, which is certainly a solution.

I need that field to be 100% accurate because I use that number to retrieve a random row on a large table, and there is no fastest way than doing RAND()*table_rows. The tables never change. Now I do "analyze table" after loading the data from scratch once a month.



 Comments   
Comment by Elena Stepanova [ 2018-03-05 ]

How large is the large table, and how far off is the count?
Could you please provide the output of

select * from information_schema.INNODB_SYS_TABLES WHERE NAME = '<database/table>';
select * from information_schema.INNODB_SYS_TABLESPACES WHERE NAME = '<database/table>';
select * from information_schema.INNODB_TABLESPACES_ENCRYPTION WHERE NAME = '<database/table>';

(replace '<database/table>' with actual names).

Please also attach your complete cnf file, maybe something else in there affects the outcome.

On a separate note, if tables never change, you don't actually need INFORMATION_SCHEMA for row counts, you can store them right when you load the tables. Moreover, you probably don't need InnoDB either, and MyISAM could give you a simple and more accurate row count.

Comment by Philip orleans [ 2018-03-06 ]

call TableInfo('sip404');                                      
+----------+---------------+------+--------+-------+-------------+------------+---------------+------------+
| TABLE_ID | NAME          | FLAG | N_COLS | SPACE | FILE_FORMAT | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE |
+----------+---------------+------+--------+-------+-------------+------------+---------------+------------+
|     2538 | sip404/sip404 |   33 |      5 |  2235 | Barracuda   | Dynamic    |             0 | Single     |
+----------+---------------+------+--------+-------+-------------+------------+---------------+------------+
1 row in set (0.00 sec)
 
+-------+---------------+------+-------------+------------+-----------+---------------+------------+---------------+-----------+----------------+
| SPACE | NAME          | FLAG | FILE_FORMAT | ROW_FORMAT | PAGE_SIZE | ZIP_PAGE_SIZE | SPACE_TYPE | FS_BLOCK_SIZE | FILE_SIZE | ALLOCATED_SIZE |
+-------+---------------+------+-------------+------------+-----------+---------------+------------+---------------+-----------+----------------+
|  2235 | sip404/sip404 |   33 | Barracuda   | Dynamic    |     16384 |         16384 | Single     |          4096 | 482344960 |      482344960 |
+-------+---------------+------+-------------+------------+-----------+---------------+------------+---------------+-----------+----------------+
1 row in set (0.00 sec)

In reality there are 425 tables in tha database. This is the largest table with 9MM rows. The data is randomized prior to inserting it and the ID field is the primary key, an auto-increment number. Then each time I read the table to extract a random row, I multiply RAND()*table_rows. There is no faster way to retrive a true random record.
The database software should do what the documentation says it does, and in this case it does not. All I say is: make it comply with the documentation. If there are variables that control the behavior when we type "analyze table", and we use a large number, more than existing pages, it should generate a true row count, because it is supposed to read 100% of the pages. The difference is off by about 5% to 20%. You may reproduce the issue in your lab.

Comment by Elena Stepanova [ 2018-10-04 ]

Could you please give an example of the table structure, and also provide the complete config file (or show global variables)? I can't reproduce it, I'm getting the exact number of rows with your stat options:

MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
|  9000010 |
+----------+
1 row in set (2.07 sec)
 
MariaDB [test]> select table_rows from information_schema.tables where table_name = 't1';
+------------+
| table_rows |
+------------+
|    9000010 |
+------------+
1 row in set (0.00 sec)
 
MariaDB [test]> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.70 sec)
 
MariaDB [test]> select table_rows from information_schema.tables where table_name = 't1';
+------------+
| table_rows |
+------------+
|    9000010 |
+------------+
1 row in set (0.01 sec)
 
MariaDB [test]> insert into t1 values (1,1),(2,2);
Query OK, 2 rows affected (0.27 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select table_rows from information_schema.tables where table_name = 't1';
+------------+
| table_rows |
+------------+
|    9000012 |
+------------+
1 row in set (0.07 sec)
 
MariaDB [test]> analyze table t1;
+---------+---------+----------+----------+
| Table   | Op      | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status   | OK       |
+---------+---------+----------+----------+
1 row in set (0.54 sec)
 
MariaDB [test]> select table_rows from information_schema.tables where table_name = 't1';
+------------+
| table_rows |
+------------+
|    9000012 |
+------------+
1 row in set (0.01 sec)

etc.

Comment by Philip orleans [ 2018-10-05 ]

Please close the case, I cannot reproduce it myself.
Server version: 10.2.18-MariaDB MariaDB Server

Generated at Thu Feb 08 08:21:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.