[MDEV-17605] Statistics for InnoDB table is wrong if persistent statistics is used Created: 2018-11-02  Updated: 2020-08-25  Resolved: 2019-04-23

Status: Closed
Project: MariaDB Server
Component/s: Information Schema
Affects Version/s: 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.2.24, 10.1.39, 10.3.15, 10.4.5

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: upstream

Issue Links:
Relates
relates to MDEV-18188 Maintain persistent COUNT(*) in InnoDB Stalled
relates to MDEV-19334 bool is_eits_usable(Field*): Assertio... Closed
relates to MDEV-19561 Cardinality error in InnoDB slave sta... Closed

 Description   

When persistent statistics is used for InnoDB it still updates cardinality with values which are incorrect. Basically, upstream bugs:

https://bugs.mysql.com/bug.php?id=80178
https://jira.percona.com/browse/PS-1693

apply to MariaDB as well. This is a result from the test case on MariaDB 10.3.7, for example:

...
MariaDB [test]> select count(distinct customer_id), count(distinct sale_time), c
ount(distinct product_id) from sale;
+-----------------------------+---------------------------+----------------------------+
| count(distinct customer_id) | count(distinct sale_time) | count(distinct product_id) |
+-----------------------------+---------------------------+----------------------------+
|                           3 |                         5 |     6 |
+-----------------------------+---------------------------+----------------------------+
1 row in set (0.005 sec)
 
MariaDB [test]> select count(*), count(distinct customer_id), count(distinct cus
tomer_id, sale_time), count(distinct product_id) from sale;
+----------+-----------------------------+----------------------------------------+----------------------------+
| count(*) | count(distinct customer_id) | count(distinct customer_id, sale_time) | count(distinct product_id) |
+----------+-----------------------------+----------------------------------------+----------------------------+
|      694 |                           3 |                                     15 |                          6 |
+----------+-----------------------------+--------------------------------------
--+----------------------------+
1 row in set (0.005 sec)
 
MariaDB [test]> analyze table sale;
+-----------+---------+----------+----------+
| Table     | Op      | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.sale | analyze | status   | OK       |
+-----------+---------+----------+----------+
1 row in set (0.133 sec)
 
MariaDB [test]> show indexes from sale;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| sale  |          0 | PRIMARY     |            1 | id          | A         |      694 |     NULL | NULL   |      | BTREE      |         |               |
| sale  |          1 | customer_id |            1 | customer_id | A         |        6 |     NULL | NULL   |      | BTREE      |         |               |
| sale  |          1 | customer_id |            2 | sale_time   | A         |       30 |     NULL | NULL   |      | BTREE      |         |               |
| sale  |          1 | product_id  |            1 | product_id  | A         |       12 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.007 sec)
 
MariaDB [test]> show create table sale\G
*************************** 1. row ***************************
       Table: sale
Create Table: CREATE TABLE `sale` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  `sale_time` datetime NOT NULL,
  `sale_value` decimal(10,2) NOT NULL,
  `filler` varchar(250) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `customer_id` (`customer_id`,`sale_time`),
  KEY `product_id` (`product_id`)
) ENGINE=InnoDB AUTO_INCREMENT=887 DEFAULT CHARSET=latin1 STATS_AUTO_RECALC=0 ST
ATS_SAMPLE_PAGES=1000
1 row in set (0.016 sec)



 Comments   
Comment by Elena Stepanova [ 2018-11-02 ]

Set to 'Confirmed' because both MySQL and Percona have already confirmed it, I don't expect MariaDB to have any different behavior in regard to InnoDB persistent statistics.

Comment by Valerii Kravchuk [ 2018-11-02 ]

To show that correct Cardinality is possible, this is what we get for MyISAM:

...
MariaDB [test]> alter table sale engine=MyISAM;
Query OK, 694 rows affected (1.161 sec)
Records: 694  Duplicates: 0  Warnings: 0
 
MariaDB [test]> analyze table sale;
+-----------+---------+----------+-----------------------------+
| Table     | Op      | Msg_type | Msg_text                    |
+-----------+---------+----------+-----------------------------+
| test.sale | analyze | status   | Table is already up to date |
+-----------+---------+----------+-----------------------------+
1 row in set (0.004 sec)
 
MariaDB [test]> show indexes from sale;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-------------+--------------+-------------+-----------+--
----------+----------+--------+------+------------+---------+---------------+
| sale  |          0 | PRIMARY     |            1 | id          | A         |      694 |     NULL | NULL   |      | BTREE      |         |               |
| sale  |          1 | customer_id |            1 | customer_id | A         |        3 |     NULL | NULL   |      | BTREE      |         |               |
| sale  |          1 | customer_id |            2 | sale_time   | A         |       15 |     NULL | NULL   |      | BTREE      |         |               |
| sale  |          1 | product_id  |            1 | product_id  | A         |        5 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.006 sec)
 
MariaDB [test]> select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;
+----------+-----------------------------+--------------------------------------
--+----------------------------+
| count(*) | count(distinct customer_id) | count(distinct customer_id, sale_time) | count(distinct product_id) |
+----------+-----------------------------+----------------------------------------+----------------------------+
|      694 |                           3 |                                     15 |                          6 |
+----------+-----------------------------+----------------------------------------+----------------------------+
1 row in set (0.006 sec)

Comment by Marko Mäkelä [ 2019-01-09 ]

This appears to have been done on purpose, many years ago:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 7aa5781cba1..75d72eec466 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -14062,13 +14062,6 @@ ha_innobase::info_low(
 					innodb_rec_per_key(index, j,
 							   stats.records));
 
-				/* Since MySQL seems to favor table scans
-				too much over index searches, we pretend
-				index selectivity is 2 times better than
-				our estimate: */
-
-				rec_per_key_int = rec_per_key_int / 2;
-
 				if (rec_per_key_int == 0) {
 					rec_per_key_int = 1;
 				}

It looks like this code was added in a merge a long time ago, in MySQL 4.0.2:

commit 4abd402b87def3f292f77b4091493675d9a37589
Merge: 28670e4757f 1efcc3e35ea
Author: monty@hundin.mysql.fi <>
Date:   Mon Feb 11 13:48:59 2002 +0200
 
    merge with 3.23.48

Inside the merge, I do not see this, even though the comment suggests that it was not written by Monty, but by Heikki Tuuri, the author of InnoDB. Heikki did change some tweaks to statistics slightly earlier, in MySQL 3.23.48:

commit 43439d0c131b7869b581ac8f8e2a3056721f887b
Author: heikki@donna.mysql.fi <>
Date:   Wed Jan 30 16:29:15 2002 +0200
 
    btr0cur.c, ha_innobase.cc:
      Fine-tune optimization parameters; in small tables the estimates are accurate; in big tables let us not be too optimistic about index selectivity

MariaDB 10.0.11 added a compensation for these inflated figures in one place:

commit 9b79feba56d8b0f955447afddbc4ab8b1d4d33c1
Author: Igor Babaev <igor@askmonty.org>
Date:   Thu May 17 16:54:26 2012 -0700
 
    Fixed the bug that caused displaying incorrect values in
    the column cardinality of the table information_schema.statistics.

I think that we should consider removing both tweaks.

Comment by Igor Babaev [ 2019-04-22 ]

The reported problem with SHOW INDEXES can be reproduced starting from 10.0.

Comment by Igor Babaev [ 2019-04-23 ]

A fix for this bug was pushed into 10.1

Generated at Thu Feb 08 08:37:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.