[MDEV-9118] ANALYZE TABLE for Engine independent status fetches blob/text columns without use Created: 2015-11-12  Updated: 2016-01-22  Resolved: 2016-01-22

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.1.11

Type: Task Priority: Major
Reporter: Daniel Black Assignee: Oleksandr Byelkin
Resolution: Fixed Votes: 0
Labels: eits

Issue Links:
Relates
relates to MDEV-9119 Document that engine independent stat... Closed
relates to MDEV-7383 [PATCH] engine-independent-stats colu... Closed
Sprint: 10.1.10, 10.1.11

 Description   

From MDEV-7383, column stats of bob/text columns aren't implemented. They are however still retrieved during the process of ANALYZE TABLE. Removing the retrieval of blob columns during ANALYZE TABLE could potentially save a lot of IO on essentially an IO bound task.



 Comments   
Comment by Oleksandr Byelkin [ 2015-12-16 ]

I am not sure if BLOB should be ignored if it is explicitly mentioned in the fields list. Also I am not sure if warning should be issued.

revision-id: c9e76fec68476b7987cc6ce0a0a74d9d492854e0 (mariadb-10.1.9-20-gc9e76fe)
parent(s): 953d5680a3c050273a8f29253f7386984679f92b
committer: Oleksandr Byelkin
timestamp: 2015-12-16 19:33:41 +0100
message:

MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use

Do not include BLOB fields by default.

Comment by Sergei Petrunia [ 2015-12-16 ]

I think: if the blob column was explicitly mentioned, but we are not collecting statistics, a warning should be issued.

Now, to the question of what to do when ANALYZE explicitly mentions a blob column:
Looking at the patch:

@@ -1276,7 +1275,6 @@ test	t1	c	aaaa	dddddddd	0.1250	6.6571	7.0000	0	NULL	NULL
 test	t1	d	1989-03-12	1999-07-23	0.1500	3.0000	8.5000	0	NULL	NULL
 test	t1	e	0.01	0.112	0.2250	8.0000	6.2000	0	NULL	NULL
 test	t1	f	1	5	0.2000	1.0000	6.4000	0	NULL	NULL
-test	t1	b	NULL	NULL	0.2000	17.1250	NULL	NULL	NULL	NULL

column names are:

db_name	table_name	column_name	min_value	max_value	nulls_ratio	avg_length	avg_frequency	hist_size	hist_type	histogram

One can see that min_value and max_value are not saved. nulls_ratio and avg_length are saved.

Comment by Sergei Petrunia [ 2015-12-16 ]

Trying to check whether nulls_ratio can be used...

Comment by Sergei Petrunia [ 2015-12-17 ]

Yes, it can be:

create table t1 (pk int primary key, a varchar(32), b blob); 
... # fill the table with data, 40% rows have a=b=NULL, other 60% are non-null values.

MariaDB [test]> explain extended select * from t1 where a is null;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    40.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

MariaDB [test]> explain extended select * from t1 where b is null;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    40.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

MariaDB [test]> explain extended select * from t1 where b is not null;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    60.00 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

Debugging an ANALYZE TABLE t1 PERSISTENT FOR ALL, I see that column b doesn't have a Unique object. So, does it make sense to collect some stats for blobs after all?

Comment by Sergei Petrunia [ 2015-12-17 ]

.. but if that is enabled, one can also try producing selectivities for other columns, as well.

sanja, I think that for now the code should never collect stats for a blob column. if the column is specified explicitly, emit a warning "Engine-independent statistics are not collected for column %s".

Comment by Oleksandr Byelkin [ 2015-12-18 ]

revision-id: aa67fa247f9758b8361a79fb560ad44ef02397d3 (mariadb-10.1.9-20-gaa67fa2)
parent(s): 953d5680a3c050273a8f29253f7386984679f92b
committer: Oleksandr Byelkin
timestamp: 2015-12-18 11:26:20 +0100
message:

MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use

Do not include BLOB fields by default.

Comment by Oleksandr Byelkin [ 2015-12-18 ]

revision-id: 59fcd7ff2315d007045eb987da5f21abbea6f6f1 (mariadb-10.1.9-20-g59fcd7f)
parent(s): 953d5680a3c050273a8f29253f7386984679f92b
committer: Oleksandr Byelkin
timestamp: 2015-12-18 12:23:45 +0100
message:

MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use

Do not include BLOB fields by default.

Comment by Sergei Petrunia [ 2015-12-22 ]

Review feedback sent over email.

Generated at Thu Feb 08 07:32:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.