Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8271

Min_value from PROCEDURE ANALYSE() is wrong

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Won't Fix
    • 5.1.67, 5.2.14, 5.3.12, 5.5(EOL), 10.0(EOL), 10.1(EOL)
    • N/A
    • OTHER
    •  CentOS release 6.5 (X86_64)

    Description

      The Min_value returned from PROCEDURE ANALYSE() is false.
      This problem is also in MySQL but is not reported anywhere yet.

      For column `T_buy_1y` which has values 0 and 1,
      Optimal_fieldtype is correctly calculated ENUM('0','1') NOT NULL while the Min_value is 1.

      I created a test case with CSV data to import.

      [root@kc0022 win]# mysql -u root
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 10
      Server version: 10.0.17-MariaDB-log MariaDB Server
       
      Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [(none)]> USE test;
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Database changed
      MariaDB [test]> DROP TABLE IF EXISTS table256;
      Query OK, 0 rows affected (0.00 sec)
       
      MariaDB [test]> CREATE TABLE table256 (
          ->   N_O_data_id int(10) unsigned NOT NULL default '0',
          ->   E_OA_recency int(11) default NULL,
          ->   T_buy_1y int(11) default NULL,
          ->   T_buy_6m int(11) default NULL,
          ->   T_buy_2m int(11) default NULL,
          ->   PRIMARY KEY  (N_O_data_id)
          -> ) ENGINE=MyISAM DEFAULT CHARSET=cp932;
      Query OK, 0 rows affected (0.04 sec)
       
      MariaDB [test]> LOAD DATA INFILE '/win/table256.csv' INTO TABLE table256 FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      Query OK, 300000 rows affected (2.10 sec)
      Records: 300000  Deleted: 0  Skipped: 0  Warnings: 0
       
      MariaDB [test]> SELECT T_buy_1y, COUNT(*) FROM table256 GROUP BY T_buy_1y;
      +----------+----------+
      | T_buy_1y | COUNT(*) |
      +----------+----------+
      |        0 |   220579 |
      |        1 |    79421 |
      +----------+----------+
      2 rows in set (0.14 sec)
       
      MariaDB [test]> SELECT T_buy_1y FROM table256 PROCEDURE ANALYSE();
      +------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
      | Field_name             | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype      |
      +------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
      | test.table256.T_buy_1y | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412 | ENUM('0','1') NOT NULL |
      +------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
      1 row in set (0.06 sec)

      In addition:

      It seems to happen when the first data is not 0.
      Please look at the below test.
      Row `T_buy_2m` returns the correct Min_value because the first value read is a 0.

      Altering the table to change the first value of row `T_buy_2m` effects the results of PROCEDURE ANAYSE().

      [root@kc0022 ~]# mysql -u root
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 20
      Server version: 10.0.17-MariaDB-log MariaDB Server
       
      Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [(none)]> USE test;
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Database changed
      MariaDB [test]> DROP TABLE IF EXISTS table257;
      Query OK, 0 rows affected (0.01 sec)
       
      MariaDB [test]> CREATE TABLE table257 (
          ->   N_O_data_id int(10) unsigned NOT NULL default '0',
          ->   E_OA_recency int(11) default NULL,
          ->   T_buy_1y int(11) default NULL,
          ->   T_buy_6m int(11) default NULL,
          ->   T_buy_2m int(11) default NULL,
          ->   PRIMARY KEY  (N_O_data_id)
          -> ) ENGINE=MyISAM DEFAULT CHARSET=cp932;
      Query OK, 0 rows affected (0.04 sec)
       
      MariaDB [test]> LOAD DATA INFILE '/win/table257.csv' INTO TABLE table257 FIELDS TERMINATED BY ',' ENCLOSED BY '"';
      Query OK, 300000 rows affected (1.46 sec)
      Records: 300000  Deleted: 0  Skipped: 0  Warnings: 0
       
      MariaDB [test]> SELECT T_buy_1y, COUNT(*) FROM table257 GROUP BY T_buy_1y;
      +----------+----------+
      | T_buy_1y | COUNT(*) |
      +----------+----------+
      |        0 |   220579 |
      |        1 |    79421 |
      +----------+----------+
      2 rows in set (0.14 sec)
       
      MariaDB [test]> SELECT T_buy_6m, COUNT(*) FROM table257 GROUP BY T_buy_6m;
      +----------+----------+
      | T_buy_6m | COUNT(*) |
      +----------+----------+
      |        0 |   236346 |
      |        1 |    63654 |
      +----------+----------+
      2 rows in set (0.13 sec)
       
      MariaDB [test]> SELECT T_buy_2m, COUNT(*) FROM table257 GROUP BY T_buy_2m;
      +----------+----------+
      | T_buy_2m | COUNT(*) |
      +----------+----------+
      |        0 |   275832 |
      |        1 |    24168 |
      +----------+----------+
      2 rows in set (0.13 sec)
       
      MariaDB [test]> SELECT * FROM table257 PROCEDURE ANALYSE();
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      | Field_name                 | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std         | Optimal_fieldtype              |
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      | test.table257.N_O_data_id  | 1         | 300000    |          1 |          6 |                0 |     0 | 150000.5000             | 173337.8797 | MEDIUMINT(6) UNSIGNED NOT NULL |
      | test.table257.E_OA_recency | 1         | 1839      |          1 |          4 |              807 |     0 | 936.2592                | 639.5488    | SMALLINT(4) UNSIGNED NOT NULL  |
      | test.table257.T_buy_1y     | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412      | ENUM('0','1') NOT NULL         |
      | test.table257.T_buy_6m     | 1         | 1         |          1 |          1 |           236346 |     0 | 0.2122                  | 0.4089      | ENUM('0','1') NOT NULL         |
      | test.table257.T_buy_2m     | 0         | 1         |          1 |          1 |           275832 |     0 | 0.0806                  | 0.2722      | ENUM('0','1') NOT NULL         |
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      5 rows in set (0.15 sec)
       
      MariaDB [test]> ALTER TABLE table257 ORDER BY T_buy_2m DESC;
      Query OK, 300000 rows affected (0.90 sec)
      Records: 300000  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> SELECT * FROM table257 PROCEDURE ANALYSE();
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      | Field_name                 | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std         | Optimal_fieldtype              |
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      | test.table257.N_O_data_id  | 1         | 300000    |          1 |          6 |                0 |     0 | 150000.5000             | 173337.8797 | MEDIUMINT(6) UNSIGNED NOT NULL |
      | test.table257.E_OA_recency | 1         | 1839      |          1 |          4 |              807 |     0 | 936.2592                | 639.5488    | SMALLINT(4) UNSIGNED NOT NULL  |
      | test.table257.T_buy_1y     | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412      | ENUM('0','1') NOT NULL         |
      | test.table257.T_buy_6m     | 1         | 1         |          1 |          1 |           236346 |     0 | 0.2122                  | 0.4089      | ENUM('0','1') NOT NULL         |
      | test.table257.T_buy_2m     | 1         | 1         |          1 |          1 |           275832 |     0 | 0.0806                  | 0.2722      | ENUM('0','1') NOT NULL         |
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      5 rows in set (0.16 sec)
       
      MariaDB [test]> ALTER TABLE table257 ORDER BY T_buy_2m ASC;
      Query OK, 300000 rows affected (0.88 sec)
      Records: 300000  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> SELECT * FROM table257 PROCEDURE ANALYSE();
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      | Field_name                 | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std         | Optimal_fieldtype              |
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      | test.table257.N_O_data_id  | 1         | 300000    |          1 |          6 |                0 |     0 | 150000.5000             | 173337.8797 | MEDIUMINT(6) UNSIGNED NOT NULL |
      | test.table257.E_OA_recency | 1         | 1839      |          1 |          4 |              807 |     0 | 936.2592                | 639.5488    | SMALLINT(4) UNSIGNED NOT NULL  |
      | test.table257.T_buy_1y     | 1         | 1         |          1 |          1 |           220579 |     0 | 0.2647                  | 0.4412      | ENUM('0','1') NOT NULL         |
      | test.table257.T_buy_6m     | 1         | 1         |          1 |          1 |           236346 |     0 | 0.2122                  | 0.4089      | ENUM('0','1') NOT NULL         |
      | test.table257.T_buy_2m     | 0         | 1         |          1 |          1 |           275832 |     0 | 0.0806                  | 0.2722      | ENUM('0','1') NOT NULL         |
      +----------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+-------------+--------------------------------+
      5 rows in set (0.15 sec)

      Attachments

        1. test257.sql
          0.8 kB
        2. test256.sql
          0.5 kB
        3. table257.csv
          7.72 MB
        4. table256.csv
          7.72 MB

        Activity

          People

            Unassigned Unassigned
            takuya Takuya Aoki (Inactive)
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.