[MDEV-21472] ALTER TABLE ... ANALYZE PARTITION ... with EITS reads and locks all rows in the table Created: 2020-01-14  Updated: 2024-01-30  Resolved: 2020-07-29

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Partitioning
Affects Version/s: 10.1.34, 10.2.31
Fix Version/s: 10.1.46, 10.2.33, 10.3.24, 10.4.14, 10.5.5

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: eits


 Description   

When EITS is enabled, ALTER TABLE ... ANALYZE PARTITION reads (and locks) all rows in the InnoDB table, even for a tiny partition. As a result, for huge tables it may work for hours and days and does not server the purpose. This does NOT happen without engine-independent persistent statistics enabled.

Consider this table:

MariaDB [test]> show create table tp\G
*************************** 1. row ***************************
       Table: tp
Create Table: CREATE TABLE `tp` (
  `id` int(11) DEFAULT NULL,
  `c1` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
 PARTITION BY RANGE (`id`)
(PARTITION `p0` VALUES LESS THAN (10) ENGINE = InnoDB,
 PARTITION `p1` VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
1 row in set (0,00 sec)
 
MariaDB [test]> show table status like 'tp'\G
*************************** 1. row ***************************
           Name: tp
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 9731209
 Avg_row_length: 32
    Data_length: 318619648
Max_data_length: 0
   Index_length: 0
      Data_free: 5242880
 Auto_increment: NULL
    Create_time: 2020-01-10 17:45:50
    Update_time: 2020-01-14 10:39:57
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: partitioned
        Comment:
1 row in set (0,02 sec)

This is how long it takes to get statistics for the partition with just 9 rows:

MariaDB [test]> set session use_stat_tables='preferably';
Query OK, 0 rows affected (0,00 sec)
 
MariaDB [test]> alter table tp analyze partition p0;
+---------+---------+----------+-----------------------------------------+
| Table   | Op      | Msg_type | Msg_text                                |
+---------+---------+----------+-----------------------------------------+
| test.tp | analyze | status   | Engine-independent statistics collected |
| test.tp | analyze | status   | OK                                      |
+---------+---------+----------+-----------------------------------------+
2 rows in set (17,33 sec)
 
MariaDB [test]> select count(*) from tp;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (5,89 sec)
 
MariaDB [test]> select count(*) from tp partition(p0);
+----------+
| count(*) |
+----------+
|        9 |
+----------+
1 row in set (0,00 sec)

In the INNODB STATUS output taken when this query runs we see:

...
---TRANSACTION 421521529651464, ACTIVE 10 sec fetching rows
mysql tables in use 2, locked 2
12335 lock struct(s), heap size 2105552, 6374971 row lock(s)
MySQL thread id 10, OS thread handle 140046871947008, query id 59 localhost root
alter table tp analyze partition p0
...

So, the statement that should access single partition with just 9 rows reads and locks all rows in the table. As a result it is totally useless, One can just run ANALYZE for the entire table with the same "costs".



 Comments   
Comment by Sergei Petrunia [ 2020-07-25 ]

So how about this solution for versions that are currently GA:

EITS data cannot be collected on a per-partition basis.
Let's then make {{ALTER TABLE t ANALYZE PARTITION ... }} syntax to never collect EITS statistics (regardless of the use_stat_tables setting).

This looks consistent to me, and will solve the customer' complaint.

The patch to implement this is a one-liner:

diff --git a/sql/sql_admin.cc b/sql/sql_admin.cc
index 4afaff58223..7aeedbaa8a2 100644
--- a/sql/sql_admin.cc
+++ b/sql/sql_admin.cc
@@ -729,6 +729,7 @@ static bool mysql_admin_table(THD* thd, TABLE_LIST* tables,
       }
       collect_eis=
         (table->table->s->table_category == TABLE_CATEGORY_USER &&
+        !(lex->alter_info.flags &= Alter_info::ALTER_ADMIN_PARTITION) &&
          (get_use_stat_tables_mode(thd) > NEVER ||
           lex->with_persistent_for_clause));

For subsequent versions, we could look at collecting statistics for each partition. When that is implemented, we will make ANALYZE PARTITION to collect EITS stats if use_stat_tables setting requires it.

valerii igor what are your thoughts?

Comment by Valerii Kravchuk [ 2020-07-26 ]

I think the idea makes sense. If EITS is needed for the entire table there is still a way to get it, but the attempt of per partition collecftion will not cause a lot of unexpected row reads. Maybe add a warning or note that EITS is not really collected in frames of this change.

Comment by Sergei Petrunia [ 2020-07-29 ]

... also ran it through on the Optimizer Call yesterday. No objections.

https://github.com/MariaDB/server/commit/7e9ffc69ecd1bddf731391b01e6be221efb6f8ef

Comment by Michael Widenius [ 2024-01-29 ]

This issue was fixed in 10.2 but merging the code to 10.3 disabled the change.
Will be pushed to 10.4 shortly

Generated at Thu Feb 08 09:07:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.