Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6181

EITS could eat all tmpdir space and hang

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.10
    • 10.0.13
    • None
    • None

    Description

      Hi,

      While testing the EIST feature, I tried to analyze a 24GB table with use_stat_tables='preferably'. It ended up with a full tmpdir without any warnings from MariaDB. (and a stuck ANALYZE TABLE command).

      I would have expected MariaDB to respect the tmp_table_size value while trying to compute the statistics ?

      Thanks and regards,
      Jocelyn Fournier

      Attachments

        Issue Links

          Activity

            jocel1 jocelyn fournier created issue -
            jocel1 jocelyn fournier made changes -
            Field Original Value New Value
            Summary EIST could eat all tmpdir size EIST could eat all tmpdir space
            jocel1 jocelyn fournier made changes -
            Description Hi,

            While testing the EIST feature, I tried to analyze a 24GB table with use_stat_tables='preferably'. It ended up with a full tmpdir without any warnings from MariaDB. (and a stuck ANALYZE TABLE command).

            I would have expect MariaDB to respect the tmp_table_size value while trying to compute the statistics ?

            Thanks and regards,
               Jocelyn Fournier
            Hi,

            While testing the EIST feature, I tried to analyze a 24GB table with use_stat_tables='preferably'. It ended up with a full tmpdir without any warnings from MariaDB. (and a stuck ANALYZE TABLE command).

            I would have expected MariaDB to respect the tmp_table_size value while trying to compute the statistics ?

            Thanks and regards,
               Jocelyn Fournier

            So far I don't see that ANALYZE TABLE with enabled statistics creates temporary tables, according to status, it creates a temporary file:

            +-------------------------+-------+
            | Variable_name           | Value |
            +-------------------------+-------+
            | Created_tmp_disk_tables | 0     |
            | Created_tmp_files       | 1     |
            | Created_tmp_tables      | 0     |
            +-------------------------+-------+

            But it's with relatively tables, I will try a bigger one, maybe it makes the difference.

            However, even if it does create a table in tmpdir, how does it mean that it ignored tmp_table_size?

            tmp_table_size
            The maximum size of internal in-memory temporary tables. (The actual limit is determined as the minimum of tmp_table_size and max_heap_table_size.) If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk MyISAM table.

            So, on the contrary, it would mean it respects tmp_table_size. Could you please clarify your point?

            elenst Elena Stepanova added a comment - So far I don't see that ANALYZE TABLE with enabled statistics creates temporary tables, according to status, it creates a temporary file: +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_disk_tables | 0 | | Created_tmp_files | 1 | | Created_tmp_tables | 0 | +-------------------------+-------+ But it's with relatively tables, I will try a bigger one, maybe it makes the difference. However, even if it does create a table in tmpdir, how does it mean that it ignored tmp_table_size? tmp_table_size The maximum size of internal in-memory temporary tables. (The actual limit is determined as the minimum of tmp_table_size and max_heap_table_size.) If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk MyISAM table. So, on the contrary, it would mean it respects tmp_table_size. Could you please clarify your point?
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ]

            Hi Elena,

            My bug report seems to be incomplete/misleading indeed. EIST does create a tmp files, not a tmp table. But I was not expecting this tmp file to be created in the tmpdir, but rather in the same directory than the original table (not sure if I should be considered as an online DDL / fast index creation hence using the tmpdir if a full scan is needed by EIST).

            Thanks,
            Jocelyn

            jocel1 jocelyn fournier added a comment - Hi Elena, My bug report seems to be incomplete/misleading indeed. EIST does create a tmp files, not a tmp table. But I was not expecting this tmp file to be created in the tmpdir, but rather in the same directory than the original table (not sure if I should be considered as an online DDL / fast index creation hence using the tmpdir if a full scan is needed by EIST). Thanks, Jocelyn

            Okay, so there are three questions/issues here as I understand.

            1) the files are created in tmpdir rather than in datadir – it's normal, because these are really just temporary files, not tables. Tables in some circumstances can be created in the datadir, but files are created in tmpdir.

            2) the operation requires a lot of space – yes, with the current implementation it can indeed be so. Although I didn't get that much consumption in my local experiments – my table is much smaller, – psergey explained that, depending on the number of columns and the data in these columns (how many unique values there are), the temporary files can grow big. While it might be hypothetically possible to use algorithms with less precision and hence less consumption, they might lack the determinism which is the characteristic of EITS.

            3) when all space is used, the operation hangs. I think it's a bug, it needs to be looked into. I'm going to modify the summary of this report to "EITS could eat all tmpdir space and hang". The bug here would be that the operation freezes instead of retreating with an error and freeing the space. In fact, it might affect some other operations, not just EITS collection.

            Also, you never mentioned – how much space do you have for tmpdir?

            elenst Elena Stepanova added a comment - Okay, so there are three questions/issues here as I understand. 1) the files are created in tmpdir rather than in datadir – it's normal, because these are really just temporary files , not tables . Tables in some circumstances can be created in the datadir, but files are created in tmpdir. 2) the operation requires a lot of space – yes, with the current implementation it can indeed be so. Although I didn't get that much consumption in my local experiments – my table is much smaller, – psergey explained that, depending on the number of columns and the data in these columns (how many unique values there are), the temporary files can grow big. While it might be hypothetically possible to use algorithms with less precision and hence less consumption, they might lack the determinism which is the characteristic of EITS. 3) when all space is used, the operation hangs. I think it's a bug, it needs to be looked into. I'm going to modify the summary of this report to "EITS could eat all tmpdir space and hang". The bug here would be that the operation freezes instead of retreating with an error and freeing the space. In fact, it might affect some other operations, not just EITS collection. Also, you never mentioned – how much space do you have for tmpdir?
            jocel1 jocelyn fournier added a comment - - edited

            Hi Elena,

            For the 24GB table, the whole tmpdir (13G) was completely filled by EITS.
            And the table isn't really complex in my case :

            CREATE TABLE `url_list` (
              `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
              `url` varchar(2048) NOT NULL DEFAULT '',
              `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
              `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
              `id_license` int(10) unsigned NOT NULL DEFAULT '0',
              `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
              `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
              `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
              `script` varchar(2048) NOT NULL DEFAULT '',
              `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0',
              PRIMARY KEY (`id`),
              KEY `date` (`date`)
            ) ENGINE=TokuDB AUTO_INCREMENT=116935390 DEFAULT CHARSET=latin1 `compression`='tokudb_small' 

            Jocelyn

            jocel1 jocelyn fournier added a comment - - edited Hi Elena, For the 24GB table, the whole tmpdir (13G) was completely filled by EITS. And the table isn't really complex in my case : CREATE TABLE `url_list` ( `id` int (10) unsigned NOT NULL AUTO_INCREMENT, `url` varchar (2048) NOT NULL DEFAULT '' , ` date ` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP , `id_license_domain` int (10) unsigned NOT NULL DEFAULT '0' , `id_license` int (10) unsigned NOT NULL DEFAULT '0' , `dom` mediumint(8) unsigned NOT NULL DEFAULT '0' , `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0' , `gen` mediumint(8) unsigned NOT NULL DEFAULT '0' , `script` varchar (2048) NOT NULL DEFAULT '' , `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0' , PRIMARY KEY (`id`), KEY ` date ` (` date `) ) ENGINE=TokuDB AUTO_INCREMENT=116935390 DEFAULT CHARSET=latin1 `compression`= 'tokudb_small' Jocelyn

            Right, as discussed with psergey (again), it's indeed possible, maybe even more so since the actual table is compressed, while the tempfile storage format is inefficient.
            As a workaround, you might try to analyze only a few columns at once:
            ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...)

            So, I'm going to convert the bug into a "don't hang on full tmpdir" bug.
            We should also document the high tmpdir consumption on EITS collection.

            elenst Elena Stepanova added a comment - Right, as discussed with psergey (again), it's indeed possible, maybe even more so since the actual table is compressed, while the tempfile storage format is inefficient. As a workaround, you might try to analyze only a few columns at once: ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) So, I'm going to convert the bug into a "don't hang on full tmpdir" bug. We should also document the high tmpdir consumption on EITS collection.

            As discussed above, some operations, including EITS collection, can require a lot of tmpdir space by creating temporary files, and after using up all of it, they hang, apparently waiting for the space to be freed.
            If the operation(s) never finish, the space will be never freed, and thus they can block themselves and future actions.

            I suggest to consider throwing an error when the space cannot be allocated – either immediately, or after some waiting timeout, or upon some smart algorithm detecting whether there is a reason to wait. Immediate retreat should probably be simplest, and it's still better than hanging.

            elenst Elena Stepanova added a comment - As discussed above, some operations, including EITS collection, can require a lot of tmpdir space by creating temporary files, and after using up all of it, they hang, apparently waiting for the space to be freed. If the operation(s) never finish, the space will be never freed, and thus they can block themselves and future actions. I suggest to consider throwing an error when the space cannot be allocated – either immediately, or after some waiting timeout, or upon some smart algorithm detecting whether there is a reason to wait. Immediate retreat should probably be simplest, and it's still better than hanging.
            elenst Elena Stepanova made changes -
            Fix Version/s 10.0.12 [ 15201 ]
            Assignee Elena Stepanova [ elenst ] Sergei Golubchik [ serg ]
            Summary EIST could eat all tmpdir space EITS could eat all tmpdir space and hang
            serg Sergei Golubchik made changes -
            Workflow defaullt [ 39827 ] MariaDB v2 [ 43062 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.13 [ 16000 ]
            Fix Version/s 10.0.12 [ 15201 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.0.13 [ 16300 ]
            Fix Version/s 10.0 [ 16000 ]

            I cannot repeat it. By looking at the code I see that EITS code (using Unique, using open_cached_file(), using init_io_cache()) does not set MY_WAIT_IF_FULL flag. Also, I ran a simple ANALYZE TABLE ... PERSISTENT FOR ALL statement and manually, in the debugger, emulated disk full condition (write fails, nothing gets written, error=ENOSPC). I got a failure immediately, no retries (and I could see that in the debugger too).

            serg Sergei Golubchik added a comment - I cannot repeat it. By looking at the code I see that EITS code (using Unique , using open_cached_file() , using init_io_cache() ) does not set MY_WAIT_IF_FULL flag. Also, I ran a simple ANALYZE TABLE ... PERSISTENT FOR ALL statement and manually, in the debugger, emulated disk full condition (write fails, nothing gets written, error=ENOSPC). I got a failure immediately, no retries (and I could see that in the debugger too).
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Elena Stepanova [ elenst ]

            I tried it now, and yes, for me it also ends eventually.
            But I can understand why it looks like it hangs forever.

            I started server with use_stat_tables=PREFERABLY optimizer_use_condition_selectivity=3, created the table described above

            CREATE TABLE `url_list` (
              `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
              `url` varchar(2048) NOT NULL DEFAULT '',
              `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
              `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
              `id_license` int(10) unsigned NOT NULL DEFAULT '0',
              `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
              `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
              `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
              `script` varchar(2048) NOT NULL DEFAULT '',
              `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0',
              PRIMARY KEY (`id`),
              KEY `date` (`date`)
            ) ENGINE=TokuDB AUTO_INCREMENT=116935390 DEFAULT CHARSET=latin1 `compression`='tokudb_small' 

            populated it with X rows and ran ANALYZE TABLE; repeated this with tmpdir which had enough space and more, and with a small tmpdir which quickly got filled up during ANALYZE, for X in (200K, 1M, 5M) rows.
            Below are comparative execution times for these 6 experiments.

              big tmpdir small tmpdir
            200,000 rows 21.66 sec 17.74 sec
            1,000,000 rows 120.26 sec 770.00 sec
            5,000,000 rows 661.69 sec 3802.09 sec

            Every time ANALYZE on the big tmpdir ended with

            +---------------+---------+----------+-----------------------------------------+
            | Table         | Op      | Msg_type | Msg_text                                |
            +---------------+---------+----------+-----------------------------------------+
            | test.url_list | analyze | status   | Engine-independent statistics collected |
            | test.url_list | analyze | status   | OK                                      |
            +---------------+---------+----------+-----------------------------------------+

            and on the small tmpdir with this (same 64 errors regardless the number of rows):

            +---------------+---------+----------+----------------------------------------------------------------------------------------------+
            | Table         | Op      | Msg_type | Msg_text                                                                                     |
            +---------------+---------+----------+----------------------------------------------------------------------------------------------+
            | test.url_list | analyze | status   | Engine-independent statistics collected                                                      |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | Error    | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device") |
            | test.url_list | analyze | status   | OK                                                                                           |
            +---------------+---------+----------+----------------------------------------------------------------------------------------------+

            So, with 5M rows ANALYZE took over an hour, and this table is only about 100 MB. If the dynamics holds, and if Jocelyn's table has billions rows, it could take seemingly forever.

            elenst Elena Stepanova added a comment - I tried it now, and yes, for me it also ends eventually. But I can understand why it looks like it hangs forever. I started server with use_stat_tables=PREFERABLY optimizer_use_condition_selectivity=3 , created the table described above CREATE TABLE `url_list` ( `id` int (10) unsigned NOT NULL AUTO_INCREMENT, `url` varchar (2048) NOT NULL DEFAULT '' , ` date ` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP , `id_license_domain` int (10) unsigned NOT NULL DEFAULT '0' , `id_license` int (10) unsigned NOT NULL DEFAULT '0' , `dom` mediumint(8) unsigned NOT NULL DEFAULT '0' , `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0' , `gen` mediumint(8) unsigned NOT NULL DEFAULT '0' , `script` varchar (2048) NOT NULL DEFAULT '' , `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0' , PRIMARY KEY (`id`), KEY ` date ` (` date `) ) ENGINE=TokuDB AUTO_INCREMENT=116935390 DEFAULT CHARSET=latin1 `compression`= 'tokudb_small' populated it with X rows and ran ANALYZE TABLE ; repeated this with tmpdir which had enough space and more, and with a small tmpdir which quickly got filled up during ANALYZE, for X in (200K, 1M, 5M) rows. Below are comparative execution times for these 6 experiments.   big tmpdir small tmpdir 200,000 rows 21.66 sec 17.74 sec 1,000,000 rows 120.26 sec 770.00 sec 5,000,000 rows 661.69 sec 3802.09 sec Every time ANALYZE on the big tmpdir ended with + ---------------+---------+----------+-----------------------------------------+ | Table | Op | Msg_type | Msg_text | + ---------------+---------+----------+-----------------------------------------+ | test.url_list | analyze | status | Engine-independent statistics collected | | test.url_list | analyze | status | OK | + ---------------+---------+----------+-----------------------------------------+ and on the small tmpdir with this (same 64 errors regardless the number of rows): + ---------------+---------+----------+----------------------------------------------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | + ---------------+---------+----------+----------------------------------------------------------------------------------------------+ | test.url_list | analyze | status | Engine-independent statistics collected | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYYU1mLB' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | Error | Error writing file '/home/elenst/small/tmp/MYzWJcgk' (Errcode: 28 "No space left on device" ) | | test.url_list | analyze | status | OK | + ---------------+---------+----------+----------------------------------------------------------------------------------------------+ So, with 5M rows ANALYZE took over an hour, and this table is only about 100 MB. If the dynamics holds, and if Jocelyn's table has billions rows, it could take seemingly forever.
            serg Sergei Golubchik made changes -
            Assignee Elena Stepanova [ elenst ] Sergei Golubchik [ serg ]

            I've created a MyISAM table with the same definition. Loaded 2M rows, the table was ~4G. Set use_stat_tables=PREFERABLY optimizer_use_condition_selectivity=3 just as you did. Ran ANALYZE TABLE. It failed with lots of Error writing file '/tmp/MYvtxG5n' (Errcode: 28 "No space left on device") errors, but in just 9 minutes. I'd say, it's still "cannot repeat".

            serg Sergei Golubchik added a comment - I've created a MyISAM table with the same definition. Loaded 2M rows, the table was ~4G. Set use_stat_tables=PREFERABLY optimizer_use_condition_selectivity=3 just as you did. Ran ANALYZE TABLE . It failed with lots of Error writing file '/tmp/MYvtxG5n' (Errcode: 28 "No space left on device") errors, but in just 9 minutes. I'd say, it's still "cannot repeat".
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Elena Stepanova [ elenst ]

            Hi Sergei,

            Have you tried the same with a tokudb table ? Perhaps it's an engine specific bug ?

            Jocelyn

            jocel1 jocelyn fournier added a comment - Hi Sergei, Have you tried the same with a tokudb table ? Perhaps it's an engine specific bug ? Jocelyn

            @ serg:
            I'm running the same test (same machine, same server, same data) on MyISAM now, started with 5M rows / small tmpdir, it's been 5388 seconds and counting. Even assuming that my machine and disks are much slower than yours, I suppose it's too long. I will leave it overnight to see if it ever ends, and then will run a few more combinations to see if the dynamics is the same as it was above with the TokuDB table.

            Maybe the ratio between the tmpdir size and the table size is important, or the absolute size of the tmpdir is? My "small datadir" is 200 Mb total; and the MyISAM table size with 5M rows is ~320M which is obviously very different from yours (I use small character values).

            It's also worth mentioning that I'm running my tests on a debug build.

            elenst Elena Stepanova added a comment - @ serg : I'm running the same test (same machine, same server, same data) on MyISAM now, started with 5M rows / small tmpdir, it's been 5388 seconds and counting. Even assuming that my machine and disks are much slower than yours, I suppose it's too long. I will leave it overnight to see if it ever ends, and then will run a few more combinations to see if the dynamics is the same as it was above with the TokuDB table. Maybe the ratio between the tmpdir size and the table size is important, or the absolute size of the tmpdir is? My "small datadir" is 200 Mb total; and the MyISAM table size with 5M rows is ~320M which is obviously very different from yours (I use small character values). It's also worth mentioning that I'm running my tests on a debug build.
            elenst Elena Stepanova added a comment - - edited

            So, with MyISAM I got similar results. The machine is not benchmark-tuned, so not only is it slow, but results can deviate a lot, on different reasons. But the trend still holds:

              big tmpdir small tmpdir
            200,000 rows 17.06 sec 9.70 sec
            1,000,000 rows 78.35 sec 723.99 sec
            5,000,000 rows 901.04 sec 6610.53 sec

            Same 64 errors in each case on the small tmpdir.

            elenst Elena Stepanova added a comment - - edited So, with MyISAM I got similar results. The machine is not benchmark-tuned, so not only is it slow, but results can deviate a lot, on different reasons. But the trend still holds:   big tmpdir small tmpdir 200,000 rows 17.06 sec 9.70 sec 1,000,000 rows 78.35 sec 723.99 sec 5,000,000 rows 901.04 sec 6610.53 sec Same 64 errors in each case on the small tmpdir.
            elenst Elena Stepanova added a comment - - edited

            Based on IRC discussion, I ran the tests from a different angle.
            The table is of the same structure as above, but MyISAM, to rule out TokuDB specifics.
            All runs below are on 2,000,000 rows dataset.
            The rows are inserted in two different ways.
            INSERT1:
            insert t1 select seq, repeat(seq, 1000/length(seq)), date'2001-02-03' + interval seq second, seq % 20, seq, seq % 1000, 60 + seq % 25, seq % 100, repeat(seq, 1000/length(seq)), seq % 1000 from seq_1_to_2000000
            INSERT2:
            insert into t3 select seq, concat('http://my.url',seq), now(), seq, seq, seq, seq, seq, concat('script',seq), seq from seq_1_to_2000000

            So, while INSERTs are similar, they produce essentially different data layouts.

            The tests were run on tmpdirs with 3 different free space volume: 16 Gb, 8 Gb, 300 Mb.
            On 16 Gb, in both cases ANALYZE passes. At most it takes ~53% of tmpdir space, so it's slightly above 8 Gb.
            On 8 Gb and 300 Mb, it fails with the same infamous 64 disk space errors.
            Execution time:

              16 Gb 8 Gb 300 Mb
            INSERT1 332.07 sec 206.35 sec 920.29 sec
            INSERT2 314.79 sec 172.25 sec 2187.84 sec

            I believe the difference on 8 Gb can be ignored, it might well be a random fluctuation. The difference on 300 Mb is noticeable though. – both the horizontal and vertical difference. It is not random, I double-checked.

            elenst Elena Stepanova added a comment - - edited Based on IRC discussion, I ran the tests from a different angle. The table is of the same structure as above, but MyISAM, to rule out TokuDB specifics. All runs below are on 2,000,000 rows dataset. The rows are inserted in two different ways. INSERT1: insert t1 select seq, repeat(seq, 1000/length(seq)), date'2001-02-03' + interval seq second, seq % 20, seq, seq % 1000, 60 + seq % 25, seq % 100, repeat(seq, 1000/length(seq)), seq % 1000 from seq_1_to_2000000 INSERT2: insert into t3 select seq, concat('http://my.url',seq), now(), seq, seq, seq, seq, seq, concat('script',seq), seq from seq_1_to_2000000 So, while INSERTs are similar, they produce essentially different data layouts. The tests were run on tmpdirs with 3 different free space volume: 16 Gb, 8 Gb, 300 Mb. On 16 Gb, in both cases ANALYZE passes. At most it takes ~53% of tmpdir space, so it's slightly above 8 Gb. On 8 Gb and 300 Mb, it fails with the same infamous 64 disk space errors. Execution time:   16 Gb 8 Gb 300 Mb INSERT1 332.07 sec 206.35 sec 920.29 sec INSERT2 314.79 sec 172.25 sec 2187.84 sec I believe the difference on 8 Gb can be ignored, it might well be a random fluctuation. The difference on 300 Mb is noticeable though. – both the horizontal and vertical difference. It is not random, I double-checked.
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Sergei Golubchik [ serg ]

            Motivated by CSC#7897, I've tried to see how much space can ANALYZE use. (So far my understanding was that max. temp dir usage would be rougly equal to size of the table w/o indexes).

            Let's try this:

            create table t1 (pk int primary key, col1 varchar(100)) charset=utf8;
            insert into t1 select A.a+B.a*1000, concat('val-', A.a+B.a*1000) from one_k A, one_k B;

            The table has 1M rows. Rows in the VARCHAR(100) column occupy less than
            300 bytes (which is typical).

            The table on disk:

             -rw-rw---- 1 psergey psergey  44M Aug  1 23:09 t1.ibd

            Now,

            analyze table t1 persistent for all;

              Breakpoint 4, unique_write_to_file_with_count (key=0x7ffebf2ce0a0 "\005", count=1, unique=0x7ffebf047ef0) at /home/psergey/dev2/10.0/sql/uniques.cc:53
            (gdb) print key
              $7 = (uchar *) 0x7ffebf2ce0a0 "\005"
            (gdb)  
            (gdb) p unique->size
              $8 = 302

            It's writing 300 bytes, the unpacked length.

            In total, we get:

              Breakpoint 3, Unique::flush (this=0x7ffebf047ef0) at /home/psergey/dev2/10.0/sql/uniques.cc:379
              $46 = 281,734,200
              Breakpoint 3, Unique::flush (this=0x7ffebf047ef0) at /home/psergey/dev2/10.0/sql/uniques.cc:379
              $48 = 297,386,100
            ...
              Breakpoint 3, Unique::flush (this=0x7ffebf047a30) at /home/psergey/dev2/10.0/sql/uniques.cc:379
              $47 = 4,194,304

            300MBit = 286 M, i.e. ANALYZE may require much more space than is occupied by the table (44M)

            psergei Sergei Petrunia added a comment - Motivated by CSC#7897, I've tried to see how much space can ANALYZE use. (So far my understanding was that max. temp dir usage would be rougly equal to size of the table w/o indexes). Let's try this: create table t1 (pk int primary key, col1 varchar(100)) charset=utf8; insert into t1 select A.a+B.a*1000, concat('val-', A.a+B.a*1000) from one_k A, one_k B; The table has 1M rows. Rows in the VARCHAR(100) column occupy less than 300 bytes (which is typical). The table on disk: -rw-rw---- 1 psergey psergey 44M Aug 1 23:09 t1.ibd Now, analyze table t1 persistent for all; Breakpoint 4, unique_write_to_file_with_count (key=0x7ffebf2ce0a0 "\005", count=1, unique=0x7ffebf047ef0) at /home/psergey/dev2/10.0/sql/uniques.cc:53 (gdb) print key $7 = (uchar *) 0x7ffebf2ce0a0 "\005" (gdb) (gdb) p unique->size $8 = 302 It's writing 300 bytes, the unpacked length. In total, we get: Breakpoint 3, Unique::flush (this=0x7ffebf047ef0) at /home/psergey/dev2/10.0/sql/uniques.cc:379 $46 = 281,734,200 Breakpoint 3, Unique::flush (this=0x7ffebf047ef0) at /home/psergey/dev2/10.0/sql/uniques.cc:379 $48 = 297,386,100 ... Breakpoint 3, Unique::flush (this=0x7ffebf047a30) at /home/psergey/dev2/10.0/sql/uniques.cc:379 $47 = 4,194,304 300MBit = 286 M, i.e. ANALYZE may require much more space than is occupied by the table (44M)
            psergei Sergei Petrunia added a comment - - edited

            If we can't quickly fix ANALYZE to take less diskspace, we could have it refuse to run if there is a risk of running out of disk space. The thing is, current way ANALYZE uses diskspace is inefficient, but it is very predictable.

            POSIX.1-2001 provides a function to check free disk space, statvfs. TokuDB uses is. It seems, my_XXXX portability layer doesn't have a function to check for free disk space.

            psergei Sergei Petrunia added a comment - - edited If we can't quickly fix ANALYZE to take less diskspace, we could have it refuse to run if there is a risk of running out of disk space. The thing is, current way ANALYZE uses diskspace is inefficient, but it is very predictable. POSIX.1-2001 provides a function to check free disk space, statvfs. TokuDB uses is. It seems, my_XXXX portability layer doesn't have a function to check for free disk space.
            serg Sergei Golubchik made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            serg Sergei Golubchik added a comment - - edited

            The "EITS could eat all tmpdir space" is not fixed, the "hang" is.

            serg Sergei Golubchik added a comment - - edited The "EITS could eat all tmpdir space" is not fixed, the "hang" is.
            serg Sergei Golubchik made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            psergei Sergei Petrunia made changes -
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 43062 ] MariaDB v3 [ 64843 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 64843 ] MariaDB v4 [ 147846 ]
            monty Michael Widenius made changes -
            monty Michael Widenius made changes -

            People

              serg Sergei Golubchik
              jocel1 jocelyn fournier
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.