[MDEV-14007] Automatic statistics recalculation for MyISAM and Aria Created: 2017-10-05  Updated: 2019-05-16

Status: Open
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - Aria, Storage Engine - MyISAM
Fix Version/s: None

Type: Task Priority: Major
Reporter: Richard Stracke Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Cardinality for non-unique index will not updated, until one row was inserted and
analyze table was executed.

Create Statement.

CREATE TABLE TAB1 (
CAMPO1 INT NOT NULL AUTO_INCREMENT,
CAMPO2 INT NOT NULL DEFAULT '0',
INDEX IND_TAB1 (CAMPO2),
PRIMARY KEY (CAMPO1)
) ENGINE=Aria;

SHOW INDEX FROM TAB1;
show cardinality for PK = 0  and for Non-unique = NULL.

MariaDB [test]> show index from TAB1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TAB1  |          0 | PRIMARY  |            1 | CAMPO1      | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| TAB1  |          1 | IND_TAB1 |            1 | CAMPO2      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I would expect cardinality a not NULL like for Index CAMP01.

An Analyze does not help;

ANALYZE TABLE TAB1;

MariaDB [test]> ANALYZE TABLE TAB1;
+-----------+---------+----------+-----------------------------+
| Table     | Op      | Msg_type | Msg_text                    |
+-----------+---------+----------+-----------------------------+
| test.TAB1 | analyze | status   | Table is already up to date |
+-----------+---------+----------+-----------------------------+

Message "Table is already up to to date" is wrong.
CAMP02 cardinality is mssing.

Output of show index from TAB1; is still the same.

MariaDB [test]> show index from TAB1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TAB1  |          0 | PRIMARY  |            1 | CAMPO1      | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| TAB1  |          1 | IND_TAB1 |            1 | CAMPO2      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

If I insert two values.

INSERT INTO TAB1 (CAMPO1) VALUES (1);
INSERT INTO TAB1 (CAMPO1) VALUES (2);

MariaDB [test]> show index from TAB1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TAB1  |          0 | PRIMARY  |            1 | CAMPO1      | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| TAB1  |          1 | IND_TAB1 |            1 | CAMPO2      | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The cardinality of the PK CAMP01 was updated, the cardinality of CAMP02 is still NULL.
The Expected cardinality updated value for CAMP02 instead NULL.
Cardinality CAMP01 was updated, CAMP02 not.

Now a analyze table can help.

MariaDB [test]> analyze table TAB1;
+-----------+---------+----------+----------+e
| Table     | Op      | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.TAB1 | analyze | status   | OK       |
+-----------+---------+----------+----------+
 
MariaDB [test]> show index FROM TAB1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TAB1  |          0 | PRIMARY  |            1 | CAMPO1      | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| TAB1  |          1 | IND_TAB1 |            1 | CAMPO2      | A         |           1 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Now cardinality is as expected.

Adding an additional value.

MariaDB [test]> INSERT INTO TAB1 (CAMPO1,CAMPO2) VALUES (3,4);
Query OK, 1 row affected (0.00 sec)
 
MariaDB [test]> show index FROM TAB1;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| TAB1  |          0 | PRIMARY  |            1 | CAMPO1      | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| TAB1  |          1 | IND_TAB1 |            1 | CAMPO2      | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

Now cardinality will be updated as expected. Both indices cardinality was updated.

Expected behaviour would be, that cardinality from non-unique index was updated in the same way like unique keys.

Richard



 Comments   
Comment by Elena Stepanova [ 2017-10-06 ]

Richard,

Could you please clarify examples in the description – which exactly part you consider a bug, what is the desired result, and why you expect INSERTs on one table (TAB1) to have any effect on another table (TAB2)?

Comment by Richard Stracke [ 2017-10-10 ]

Hello Elena,

I made the tests with MyISAM and Aria to be sure, I mixed up the tablenames.
Only 1 table is involved in the reproducing steps.

I update the whole report, I hope now it is more clearly.

Richard

Comment by Elena Stepanova [ 2017-10-10 ]

Richard,

First of all, I think there is a general misunderstanding here. MyISAM or Aria generally do not collect statistics upon populating a table, one must run ANALYZE for that. It is not a bug, it is an implementation specifics. It is certainly documented in MySQL manual, e.g. here https://dev.mysql.com/doc/refman/5.7/en/show-index.html ; I can't say whether it's documented in MariaDB KB, if it isn't, it should be added.

Also, there are parts of the description that seem confusing. Let's go through the s one by one.

  • you created an empty table
  • SHOW INDEX shows NULL cardinality for the non-unique index CAMP02, while you expect it to be 1.
    => statistics are not collected automatically, so there shouldn't be anything there, NULL is the expected value
  • you ran ANALYZE, the cardinality for CAMP02 is still NULL, while you expect it to be 1.
    => Why would it show 1? As documented, cardinality is based on N/S, where N is the number of rows in the table and S is the average value group size. The table is empty, so NULL seems to be a logical result; with some stretch, one can expect 0, but in any case it won't be 1.
  • you added two rows, (1,0) and (2,0)
  • SHOW INDEX shows cardinality NULL, while you expect it to be 2.
    => given that it was NULL before and statistics are not collected automatically, NULL is the expected value.
  • you ran ANALYZE, now you see CAMP02 cardinality 2 and consider it the expected value
    => This is strange. You have two "0" values – two rows, one group of size 2, cardinality should be 1 now, and I see 1, I'm not sure how you got 2. If you can reproduce it, it might be a bug.
  • you inserted another row, so now you have (1,0),(2,0),(3,4)
    => SHOW INDEX shows cardinality 3 for CAMP02, you consider it the expected value
    => This is also strange. While statistics are not promised to be recollected upon INSERT, it's not forbidden, I believe it can happen under certain circumstances; but in any case, you have 3 rows, 2 groups with average size 1.5, so technically it should be 2. I'm actually getting 1 with MariaDB, even after ANALYZE, which seems to be a bug fixed in MySQL 5.7. MySQL 5.7 returns 2. I couldn't find any version which would return 3.

So, we have three separate directions here:

  • a request for automatic recalculation of MyISAM/Aria statistics upon DML, similar to the upstream https://bugs.mysql.com/bug.php?id=34467;
  • some possible minor issues in recalculation
  • documentation request

Which do you want to pursue? If you want automatic recalculation, I can convert this bug report into a feature request. If you want the suspected issues to be addressed, please paste the complete totally unabridged output from the MySQL client, and also attach your cnf file(s) (but keep in mind that after fixing results might be even more different from your expectations than they are now). If you want the current behavior to be documented, I will re-categorize it.

Comment by Richard Stracke [ 2017-10-24 ]

It is not clear documented.

Quote from
https://dev.mysql.com/doc/refman/5.7/en/show-index.html

"Cardinality
An estimate of the number of unique values in the index. To update this number, run ANALYZE TABLE or (for MyISAM tables) myisamchk -a. "
This statement is engine independent.

I assume the this has historical reasons.
In earlier times there was no automatic statistics.
Later innodb_stats_auto_update a etc. was introduced as an innodb feature,
so nobody comes in mind to document the opposite behaviour of MyISAM.

What is irritating is the inconsistent behavior.

I changed expected cardinality amount to updated or not.
All other will be just distracting.
The main focus is on automatic recalculation of MyISAM/Aria statistic, what is answer of your question regarding the main focus.

For "minor issues in recalculation" I would create a new MDEV with more accurat details.

Regards,

Richard

Comment by Elena Stepanova [ 2017-10-30 ]

The main focus is on automatic recalculation of MyISAM/Aria statistic, what is answer of your question regarding the main focus.

Richard, do I understand it correctly that you want it to be a feature request for automatic statistics recalculation for the above-mentioned engines?

Comment by Richard Stracke [ 2017-11-02 ]

>> The main focus is on automatic recalculation of MyISAM/Aria statistic, what is answer of your >>question regarding the main focus.
>Richard Stracke, do I understand it correctly that you want it to be a feature request for >automatic statistics recalculation for the above-mentioned engines

Yes

Comment by Michael Widenius [ 2019-03-12 ]

Statistics and how they they are calculated are engine dependent. Depending on the design of the engine, it can provide more or less accurate statistics without running an ANALYZE TABLE.
Providing correct statistics for index and other things for all situations is a very expensive operation for some engines, like MyISAM, so there is no foundation for doing this automatically.

The basic usage case of when to update index statistics:

  • Run your application.
  • When things goes slow, do an analyze table once on the most critical tables, which should help fixing things. One usually never have to do it again on the same table.

I agree with Elena's first comment that for an empty table, the default cardinality is probably right, or at least not wrong.

Generated at Thu Feb 08 08:10:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.