[MDEV-8271] Min_value from PROCEDURE ANALYSE() is wrong Created: 2015-06-05  Updated: 2022-09-12  Resolved: 2022-09-12

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 5.1.67, 5.2.14, 5.3.12, 5.5, 10.0, 10.1
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Takuya Aoki (Inactive) Assignee: Unassigned
Resolution: Won't Fix Votes: 1
Labels: upstream
Environment:

CentOS release 6.5 (X86_64)


Attachments: File table256.csv     File table257.csv     File test256.sql     File test257.sql    

 Description   

The Min_value returned from PROCEDURE ANALYSE() is false.
This problem is also in MySQL but is not reported anywhere yet.

For column `T_buy_1y` which has values 0 and 1,
Optimal_fieldtype is correctly calculated ENUM('0','1') NOT NULL while the Min_value is 1.

I created a test case with CSV data to import.

[root@kc0022 win]# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.0.17-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> USE test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [test]> DROP TABLE IF EXISTS table256;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> CREATE TABLE table256 (
    ->   N_O_data_id int(10) unsigned NOT NULL default '0',
    ->   E_OA_recency int(11) default NULL,
    ->   T_buy_1y int(11) default NULL,
    ->   T_buy_6m int(11) default NULL,
    ->   T_buy_2m int(11) default NULL,
    ->   PRIMARY KEY  (N_O_data_id)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=cp932;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test]> LOAD DATA INFILE '/win/table256.csv' INTO TABLE table256 FIELDS TERMINATED BY ',' ENCLOSED BY '"';
Query OK, 300000 rows affected (2.10 sec)
Records: 300000  Deleted: 0  Skipped: 0  Warnings: 0
 
MariaDB [test]> SELECT T_buy_1y, COUNT(*) FROM table256 GROUP BY T_buy_1y;
+----------+----------+
| T_buy_1y | COUNT(*) |
+----------+----------+
|        0 |   220579 |
|        1 |    79421 |
+----------+----------+
2 rows in set (0.14 sec)
 
MariaDB [test]> SELECT T_buy_1y FROM table256 PROCEDURE ANALYSE();
+------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| Field_name             | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype      |
+------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| test.table256.T_buy_1y | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412 | ENUM('0','1') NOT NULL |
+------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
1 row in set (0.06 sec)

In addition:

It seems to happen when the first data is not 0.
Please look at the below test.
Row `T_buy_2m` returns the correct Min_value because the first value read is a 0.

Altering the table to change the first value of row `T_buy_2m` effects the results of PROCEDURE ANAYSE().

[root@kc0022 ~]# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 20
Server version: 10.0.17-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> USE test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [test]> DROP TABLE IF EXISTS table257;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [test]> CREATE TABLE table257 (
    ->   N_O_data_id int(10) unsigned NOT NULL default '0',
    ->   E_OA_recency int(11) default NULL,
    ->   T_buy_1y int(11) default NULL,
    ->   T_buy_6m int(11) default NULL,
    ->   T_buy_2m int(11) default NULL,
    ->   PRIMARY KEY  (N_O_data_id)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=cp932;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test]> LOAD DATA INFILE '/win/table257.csv' INTO TABLE table257 FIELDS TERMINATED BY ',' ENCLOSED BY '"';
Query OK, 300000 rows affected (1.46 sec)
Records: 300000  Deleted: 0  Skipped: 0  Warnings: 0
 
MariaDB [test]> SELECT T_buy_1y, COUNT(*) FROM table257 GROUP BY T_buy_1y;
+----------+----------+
| T_buy_1y | COUNT(*) |
+----------+----------+
|        0 |   220579 |
|        1 |    79421 |
+----------+----------+
2 rows in set (0.14 sec)
 
MariaDB [test]> SELECT T_buy_6m, COUNT(*) FROM table257 GROUP BY T_buy_6m;
+----------+----------+
| T_buy_6m | COUNT(*) |
+----------+----------+
|        0 |   236346 |
|        1 |    63654 |
+----------+----------+
2 rows in set (0.13 sec)
 
MariaDB [test]> SELECT T_buy_2m, COUNT(*) FROM table257 GROUP BY T_buy_2m;
+----------+----------+
| T_buy_2m | COUNT(*) |
+----------+----------+
|        0 |   275832 |
|        1 |    24168 |
+----------+----------+
2 rows in set (0.13 sec)
 
MariaDB [test]> SELECT * FROM table257 PROCEDURE ANALYSE();
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
| Field_name                 | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std         | Optimal_fieldtype              |
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
| test.table257.N_O_data_id  | 1         | 300000    |          1 |          6 |                0 |     0 | 150000.5000             | 173337.8797 | MEDIUMINT(6) UNSIGNED NOT NULL |
| test.table257.E_OA_recency | 1         | 1839      |          1 |          4 |              807 |     0 | 936.2592                | 639.5488    | SMALLINT(4) UNSIGNED NOT NULL  |
| test.table257.T_buy_1y     | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412      | ENUM('0','1') NOT NULL         |
| test.table257.T_buy_6m     | 1         | 1         |          1 |          1 |           236346 |     0 | 0.2122                  | 0.4089      | ENUM('0','1') NOT NULL         |
| test.table257.T_buy_2m     | 0         | 1         |          1 |          1 |           275832 |     0 | 0.0806                  | 0.2722      | ENUM('0','1') NOT NULL         |
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
5 rows in set (0.15 sec)
 
MariaDB [test]> ALTER TABLE table257 ORDER BY T_buy_2m DESC;
Query OK, 300000 rows affected (0.90 sec)
Records: 300000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> SELECT * FROM table257 PROCEDURE ANALYSE();
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
| Field_name                 | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std         | Optimal_fieldtype              |
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
| test.table257.N_O_data_id  | 1         | 300000    |          1 |          6 |                0 |     0 | 150000.5000             | 173337.8797 | MEDIUMINT(6) UNSIGNED NOT NULL |
| test.table257.E_OA_recency | 1         | 1839      |          1 |          4 |              807 |     0 | 936.2592                | 639.5488    | SMALLINT(4) UNSIGNED NOT NULL  |
| test.table257.T_buy_1y     | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412      | ENUM('0','1') NOT NULL         |
| test.table257.T_buy_6m     | 1         | 1         |          1 |          1 |           236346 |     0 | 0.2122                  | 0.4089      | ENUM('0','1') NOT NULL         |
| test.table257.T_buy_2m     | 1         | 1         |          1 |          1 |           275832 |     0 | 0.0806                  | 0.2722      | ENUM('0','1') NOT NULL         |
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
5 rows in set (0.16 sec)
 
MariaDB [test]> ALTER TABLE table257 ORDER BY T_buy_2m ASC;
Query OK, 300000 rows affected (0.88 sec)
Records: 300000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> SELECT * FROM table257 PROCEDURE ANALYSE();
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
| Field_name                 | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std         | Optimal_fieldtype              |
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
| test.table257.N_O_data_id  | 1         | 300000    |          1 |          6 |                0 |     0 | 150000.5000             | 173337.8797 | MEDIUMINT(6) UNSIGNED NOT NULL |
| test.table257.E_OA_recency | 1         | 1839      |          1 |          4 |              807 |     0 | 936.2592                | 639.5488    | SMALLINT(4) UNSIGNED NOT NULL  |
| test.table257.T_buy_1y     | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412      | ENUM('0','1') NOT NULL         |
| test.table257.T_buy_6m     | 1         | 1         |          1 |          1 |           236346 |     0 | 0.2122                  | 0.4089      | ENUM('0','1') NOT NULL         |
| test.table257.T_buy_2m     | 0         | 1         |          1 |          1 |           275832 |     0 | 0.0806                  | 0.2722      | ENUM('0','1') NOT NULL         |
+----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
5 rows in set (0.15 sec)



 Comments   
Comment by Elena Stepanova [ 2015-06-05 ]

I think it might be intentional, because zeros are counted separately in the Empties_or_zeros column; but I haven't found anything in either MySQL manual or MariaDB KB to confirm that, so I will go for an expert opinion.

serg,
Can you confirm (or deny) that it's intentional to ignore 0 as Min_value or Max_value in PROCEDURE ANALYSE output?

Comment by Takuya Aoki (Inactive) [ 2015-06-08 ]

I don't think it's intentional, because there are cases when the Min_value is correctly calculated as 0.

By the way, this error seems to occur when the column contains 0 with other values (can be more than one value).
And it does not occur at every row that satisfies this condition so I don't know the specific cause.

Comment by Sergei Golubchik [ 2015-06-08 ]

This is intentional. The code looks like

  if (num == 0)
    empty++;
...
  else if (num != 0)
  {
    sum += num;
    sum_sqr += num * num;
    if (length < min_length)
      min_length = length;
    if (length > max_length)
      max_length = length;
    if (compare_longlong(&num, &min_arg) < 0)
      min_arg = num;
    if (compare_longlong(&num, &max_arg) > 0)
      max_arg = num;
  }

That is it only updates min/max value and min/max length if the value is not zero.

Comment by Takuya Aoki (Inactive) [ 2015-06-08 ]

Sorry I am not a programmer so I don't understand.
Why are there cases when Min_value takes 0?
It is not so much a problem if 0 is always ignored.

Comment by Sergei Golubchik [ 2015-06-08 ]

Could you show an example where Min_value is 0?
Perhaps it happens only when all values in the column are zeros?

Comment by Takuya Aoki (Inactive) [ 2015-06-09 ]

I added a test case.
It seems to be a bug, depending on the value of the first line the Min_value goes wrong.

Comment by Elena Stepanova [ 2015-06-09 ]

I agree, I don't think there is a good excuse for this.
Here is a simpler test case which demonstrates the same problem:

drop table if exists t1;
 
create table t1 (i int);
select * from t1 procedure analyse();
insert into t1 values (1);
select * from t1 procedure analyse();
insert into t1 values (0);
select * from t1 procedure analyse();
drop table t1;
 
create table t1 (i int);
select * from t1 procedure analyse();
insert into t1 values (0);
select * from t1 procedure analyse();
insert into t1 values (1);
select * from t1 procedure analyse();
drop table t1;

Actual output

MariaDB [test]> create table t1 (i int);
Query OK, 0 rows affected (0.87 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std  | Optimal_fieldtype |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
| test.t1.i  | NULL      | NULL      |          0 |          0 |                0 |     0 | 0.0                     | 0.0  | CHAR(0) NOT NULL  |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> insert into t1 values (1);
Query OK, 1 row affected (0.10 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+--------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype  |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+--------------------+
| test.t1.i  | 1         | 1         |          1 |          1 |                0 |     0 | 1.0000                  | 0.0000 | ENUM('1') NOT NULL |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+--------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> insert into t1 values (0);
Query OK, 1 row affected (0.02 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype      |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| test.t1.i  | 1         | 1         |          1 |          1 |                1 |     0 | 0.5000                  | 0.5000 | ENUM('0','1') NOT NULL |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
1 row in set (0.01 sec)

MariaDB [test]> create table t1 (i int);
Query OK, 0 rows affected (0.78 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std  | Optimal_fieldtype |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
| test.t1.i  | NULL      | NULL      |          0 |          0 |                0 |     0 | 0.0                     | 0.0  | CHAR(0) NOT NULL  |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> insert into t1 values (0);
Query OK, 1 row affected (0.04 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+--------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype  |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+--------------------+
| test.t1.i  | 0         | 0         |          1 |          1 |                1 |     0 | 0.0000                  | 0.0000 | ENUM('0') NOT NULL |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+--------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> insert into t1 values (1);
Query OK, 1 row affected (0.04 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype      |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| test.t1.i  | 0         | 1         |          1 |          1 |                1 |     0 | 0.5000                  | 0.5000 | ENUM('0','1') NOT NULL |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
1 row in set (0.00 sec)

While not taking into account 0 always looks like a very questionable design decision, here we see that zeros are ignored in some cases but not another.

takuya,
As you said before, it's the same with MySQL (as far as I can see, all of 5.1-5.7 at least). Are you planning to report it at bugs.mysql.com?

For a note, another curious inconsistency:

MariaDB [test]> drop table t1;
Query OK, 0 rows affected (0.26 sec)
 
MariaDB [test]> create table t1 (i int) engine=InnoDB;
Query OK, 0 rows affected (0.71 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std  | Optimal_fieldtype |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
| test.t1.i  | NULL      | NULL      |          0 |          0 |                0 |     0 | 0.0                     | 0.0  | CHAR(0) NOT NULL  |
+------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+-------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> drop table t1;
Query OK, 0 rows affected (0.41 sec)
 
MariaDB [test]> create table t1 (i int) engine=MyISAM;
Query OK, 0 rows affected (0.15 sec)
 
MariaDB [test]> select * from t1 procedure analyse();
Empty set (0.00 sec)

That is, for an empty InnoDB table it returns rows, while for MyISAM it's an empty set.

Comment by Takuya Aoki (Inactive) [ 2015-06-10 ]

I didn't know about the second case where the table is empty.
I don't plan to report it to bugs.mysql.com since I use MariaDB now.

Comment by Elena Stepanova [ 2015-06-10 ]

Okay, I filed one on your behalf (http://bugs.mysql.com/bug.php?id=77299).

Comment by Takuya Aoki (Inactive) [ 2015-06-11 ]

Thank you, will it be fixed in MySQL beforehand?

Comment by Elena Stepanova [ 2015-06-11 ]

As the normal routine goes, we will wait for a while to see if it gets fixed in the upstream version. If it does, we will merge or backport the fix. If it does not, we might consider fixing it directly in MariaDB.

Comment by Sergei Golubchik [ 2022-09-12 ]

10.0 was EOLed in March 2019

Generated at Thu Feb 08 07:25:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.