Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9118

ANALYZE TABLE for Engine independent status fetches blob/text columns without use

Details

    • 10.1.10, 10.1.11

    Description

      From MDEV-7383, column stats of bob/text columns aren't implemented. They are however still retrieved during the process of ANALYZE TABLE. Removing the retrieval of blob columns during ANALYZE TABLE could potentially save a lot of IO on essentially an IO bound task.

      Attachments

        Issue Links

          Activity

            I am not sure if BLOB should be ignored if it is explicitly mentioned in the fields list. Also I am not sure if warning should be issued.

            revision-id: c9e76fec68476b7987cc6ce0a0a74d9d492854e0 (mariadb-10.1.9-20-gc9e76fe)
            parent(s): 953d5680a3c050273a8f29253f7386984679f92b
            committer: Oleksandr Byelkin
            timestamp: 2015-12-16 19:33:41 +0100
            message:

            MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use

            Do not include BLOB fields by default.

            sanja Oleksandr Byelkin added a comment - I am not sure if BLOB should be ignored if it is explicitly mentioned in the fields list. Also I am not sure if warning should be issued. revision-id: c9e76fec68476b7987cc6ce0a0a74d9d492854e0 (mariadb-10.1.9-20-gc9e76fe) parent(s): 953d5680a3c050273a8f29253f7386984679f92b committer: Oleksandr Byelkin timestamp: 2015-12-16 19:33:41 +0100 message: MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use Do not include BLOB fields by default. —

            I think: if the blob column was explicitly mentioned, but we are not collecting statistics, a warning should be issued.

            Now, to the question of what to do when ANALYZE explicitly mentions a blob column:
            Looking at the patch:

            @@ -1276,7 +1275,6 @@ test	t1	c	aaaa	dddddddd	0.1250	6.6571	7.0000	0	NULL	NULL
             test	t1	d	1989-03-12	1999-07-23	0.1500	3.0000	8.5000	0	NULL	NULL
             test	t1	e	0.01	0.112	0.2250	8.0000	6.2000	0	NULL	NULL
             test	t1	f	1	5	0.2000	1.0000	6.4000	0	NULL	NULL
            -test	t1	b	NULL	NULL	0.2000	17.1250	NULL	NULL	NULL	NULL

            column names are:

            db_name	table_name	column_name	min_value	max_value	nulls_ratio	avg_length	avg_frequency	hist_size	hist_type	histogram

            One can see that min_value and max_value are not saved. nulls_ratio and avg_length are saved.

            psergei Sergei Petrunia added a comment - I think: if the blob column was explicitly mentioned, but we are not collecting statistics, a warning should be issued. Now, to the question of what to do when ANALYZE explicitly mentions a blob column: Looking at the patch: @@ -1276,7 +1275,6 @@ test t1 c aaaa dddddddd 0.1250 6.6571 7.0000 0 NULL NULL test t1 d 1989-03-12 1999-07-23 0.1500 3.0000 8.5000 0 NULL NULL test t1 e 0.01 0.112 0.2250 8.0000 6.2000 0 NULL NULL test t1 f 1 5 0.2000 1.0000 6.4000 0 NULL NULL -test t1 b NULL NULL 0.2000 17.1250 NULL NULL NULL NULL column names are: db_name table_name column_name min_value max_value nulls_ratio avg_length avg_frequency hist_size hist_type histogram One can see that min_value and max_value are not saved. nulls_ratio and avg_length are saved.

            Trying to check whether nulls_ratio can be used...

            psergei Sergei Petrunia added a comment - Trying to check whether nulls_ratio can be used...

            Yes, it can be:

            create table t1 (pk int primary key, a varchar(32), b blob); 
            ... # fill the table with data, 40% rows have a=b=NULL, other 60% are non-null values.

            MariaDB [test]> explain extended select * from t1 where a is null;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    40.00 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

            MariaDB [test]> explain extended select * from t1 where b is null;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    40.00 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

            MariaDB [test]> explain extended select * from t1 where b is not null;
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+
            |    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |    60.00 | Using where |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+

            Debugging an ANALYZE TABLE t1 PERSISTENT FOR ALL, I see that column b doesn't have a Unique object. So, does it make sense to collect some stats for blobs after all?

            psergei Sergei Petrunia added a comment - Yes, it can be: create table t1 (pk int primary key, a varchar(32), b blob); ... # fill the table with data, 40% rows have a=b=NULL, other 60% are non-null values. MariaDB [test]> explain extended select * from t1 where a is null; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | 40.00 | Using where | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ MariaDB [test]> explain extended select * from t1 where b is null; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | 40.00 | Using where | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ MariaDB [test]> explain extended select * from t1 where b is not null; +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 1000 | 60.00 | Using where | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------+ Debugging an ANALYZE TABLE t1 PERSISTENT FOR ALL , I see that column b doesn't have a Unique object. So, does it make sense to collect some stats for blobs after all?

            .. but if that is enabled, one can also try producing selectivities for other columns, as well.

            sanja, I think that for now the code should never collect stats for a blob column. if the column is specified explicitly, emit a warning "Engine-independent statistics are not collected for column %s".

            psergei Sergei Petrunia added a comment - .. but if that is enabled, one can also try producing selectivities for other columns, as well. sanja , I think that for now the code should never collect stats for a blob column. if the column is specified explicitly, emit a warning "Engine-independent statistics are not collected for column %s".

            revision-id: aa67fa247f9758b8361a79fb560ad44ef02397d3 (mariadb-10.1.9-20-gaa67fa2)
            parent(s): 953d5680a3c050273a8f29253f7386984679f92b
            committer: Oleksandr Byelkin
            timestamp: 2015-12-18 11:26:20 +0100
            message:

            MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use

            Do not include BLOB fields by default.

            sanja Oleksandr Byelkin added a comment - revision-id: aa67fa247f9758b8361a79fb560ad44ef02397d3 (mariadb-10.1.9-20-gaa67fa2) parent(s): 953d5680a3c050273a8f29253f7386984679f92b committer: Oleksandr Byelkin timestamp: 2015-12-18 11:26:20 +0100 message: MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use Do not include BLOB fields by default. —

            revision-id: 59fcd7ff2315d007045eb987da5f21abbea6f6f1 (mariadb-10.1.9-20-g59fcd7f)
            parent(s): 953d5680a3c050273a8f29253f7386984679f92b
            committer: Oleksandr Byelkin
            timestamp: 2015-12-18 12:23:45 +0100
            message:

            MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use

            Do not include BLOB fields by default.

            sanja Oleksandr Byelkin added a comment - revision-id: 59fcd7ff2315d007045eb987da5f21abbea6f6f1 (mariadb-10.1.9-20-g59fcd7f) parent(s): 953d5680a3c050273a8f29253f7386984679f92b committer: Oleksandr Byelkin timestamp: 2015-12-18 12:23:45 +0100 message: MDEV-9118 ANALYZE TABLE for Engine independent status fetchs blob/text columns without use Do not include BLOB fields by default. —

            Review feedback sent over email.

            psergei Sergei Petrunia added a comment - Review feedback sent over email.

            People

              sanja Oleksandr Byelkin
              danblack Daniel Black
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.