Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9459

Truncate table causes innodb stalls

Details

    Description

      Feature request:

      Backport the MySQL 5.8 fix for upstream bug https://bugs.mysql.com/bug.php?id=68184 for MariaDB 10.2

      This performance issue seems more pronounced with Galera Cluster feature enabled using TOI but also affects standalone instances (admittedly I have seen this in the 10.0 series but have not tested 10.1 for this problem directly)

      Attachments

        Issue Links

          Activity

            mg MG created issue -

            axel, jplindst, do you know if 10.1 is affected by this issue?

            elenst Elena Stepanova added a comment - axel , jplindst , do you know if 10.1 is affected by this issue?

            Yes it is, but MySQL 5.8.0 is not yet released so back port is not yet possible. Furthermore, as one can see from developer comments there is no known working solution for 5.6 based MySQL.

            jplindst Jan Lindström (Inactive) added a comment - Yes it is, but MySQL 5.8.0 is not yet released so back port is not yet possible. Furthermore, as one can see from developer comments there is no known working solution for 5.6 based MySQL.

            Apparently, despite the comment by Vasil Dimov, the change didn't make it to 5.7 after all, and it will be impossible to backport the fix to 5.7-based MySQL, too?

            Anyway, I'm setting it to confirmed according to Jan's comment above, and for now I'll set the fix version to 10.2 so it stays on the radar. If 5.8 InnoDB won't make it to 10.2, which most likely will be the case, then the fix version will be updated to a later release.

            elenst Elena Stepanova added a comment - Apparently, despite the comment by Vasil Dimov, the change didn't make it to 5.7 after all, and it will be impossible to backport the fix to 5.7-based MySQL, too? Anyway, I'm setting it to confirmed according to Jan's comment above, and for now I'll set the fix version to 10.2 so it stays on the radar. If 5.8 InnoDB won't make it to 10.2, which most likely will be the case, then the fix version will be updated to a later release.
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Fix Version/s 10.2 [ 14601 ]
            elenst Elena Stepanova made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            mg MG added a comment -

            Upstream 8.0.0-dmr has changed fil_prepare_for_truncate and fil_truncate_tablespace in ./storage/innobase/fil/fil0fil.cc I believe to address this

            mg MG added a comment - Upstream 8.0.0-dmr has changed fil_prepare_for_truncate and fil_truncate_tablespace in ./storage/innobase/fil/fil0fil.cc I believe to address this

            There are two recent changes to TRUNCATE TABLE in MySQL.
            In MySQL 5.7, http://dev.mysql.com/worklog/task/?id=6501 implemented crash-safe TRUNCATE for single-table tablespaces in such a way that the *.ibd file is re-created with the same tablespace ID. For shared tablespaces (the InnoDB system tablespace or tablespaces created with CREATE TABLESPACE, which was introduced in MySQL 5.7), the files will not shrink and the logic will be somewhat different.
            MySQL 8.0.0 includes WL#6795 as part of the WL#6378 umbrella (Global Data Dictionary). WL#6795 removes the low-level TRUNCATE code from InnoDB and implements TRUNCATE as a simple combination of DROP TABLE and CREATE TABLE. This would rely on a future worklog WL#7016 to implement atomic, crash-safe DDL. In MySQL 8.0.0, fil_truncate_tablespace() has nothing to do with TRUNCATE TABLE. It is only used for undo tablespace truncation: http://dev.mysql.com/worklog/task/?id=6965

            When it comes to stalls, InnoDB fixed stalls for DROP TABLE quite some time ago. The WL#6501 TRUNCATE TABLE in MySQL 5.7 is holding dict_sys->mutex and dict_operation_lock while performing time-consuming tasks, such as dropping any adaptive hash index entries (pointers from buf_block_t::index to to-be-dropped dict_index_t).

            marko Marko Mäkelä added a comment - There are two recent changes to TRUNCATE TABLE in MySQL. In MySQL 5.7, http://dev.mysql.com/worklog/task/?id=6501 implemented crash-safe TRUNCATE for single-table tablespaces in such a way that the *.ibd file is re-created with the same tablespace ID. For shared tablespaces (the InnoDB system tablespace or tablespaces created with CREATE TABLESPACE, which was introduced in MySQL 5.7), the files will not shrink and the logic will be somewhat different. MySQL 8.0.0 includes WL#6795 as part of the WL#6378 umbrella (Global Data Dictionary). WL#6795 removes the low-level TRUNCATE code from InnoDB and implements TRUNCATE as a simple combination of DROP TABLE and CREATE TABLE. This would rely on a future worklog WL#7016 to implement atomic, crash-safe DDL. In MySQL 8.0.0, fil_truncate_tablespace() has nothing to do with TRUNCATE TABLE. It is only used for undo tablespace truncation: http://dev.mysql.com/worklog/task/?id=6965 When it comes to stalls, InnoDB fixed stalls for DROP TABLE quite some time ago. The WL#6501 TRUNCATE TABLE in MySQL 5.7 is holding dict_sys->mutex and dict_operation_lock while performing time-consuming tasks, such as dropping any adaptive hash index entries (pointers from buf_block_t::index to to-be-dropped dict_index_t).
            elenst Elena Stepanova made changes -
            Labels upstream-fixed 10.2-ga upstream-fixed
            ratzpo Rasmus Johansson (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels 10.2-ga upstream-fixed upstream-fixed
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.2 [ 14601 ]

            While we definitely would like to have atomic truncate with ho stalls, I do not see that feasible on 10.2 with current deadlines. We have plans for own own global DD, thus moving this to 10.4 for now.

            jplindst Jan Lindström (Inactive) added a comment - While we definitely would like to have atomic truncate with ho stalls, I do not see that feasible on 10.2 with current deadlines. We have plans for own own global DD, thus moving this to 10.4 for now.
            marko Marko Mäkelä made changes -

            As I noted in MDEV-13328, I believe that it is feasible to fix some of this even while keeping the current InnoDB data dictionary.

            It is possible that some bottleneck related to redo log checkpoint will remain until the crash recovery of TRUNCATE operations has been rewritten to rely on an undo log record. Such a change would necessarily be part of MDEV-515.

            marko Marko Mäkelä added a comment - As I noted in MDEV-13328 , I believe that it is feasible to fix some of this even while keeping the current InnoDB data dictionary. It is possible that some bottleneck related to redo log checkpoint will remain until the crash recovery of TRUNCATE operations has been rewritten to rely on an undo log record. Such a change would necessarily be part of MDEV-515 .
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - XtraDB [ 10135 ]
            marko Marko Mäkelä made changes -
            Affects Version/s 10.3 [ 22126 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]

            I now realize that the original request was filed against MySQL 5.5.

            The original request was that in TRUNCATE, InnoDB should release the file space while not holding any lock. To do that, we must change the undo or redo logging, so that we can ensure that no orphan files will be left behind in case the server is killed after the TRUNCATE transaction internally commits.

            A similar problem exists in DROP TABLE. It would be better to release all locks and only after that start deleting the file. This would improve the performance for dropping large or heavily fragmented tables. We might be able to fix this in MariaDB 10.3.

            I believe that in MySQL 5.7 (and in MariaDB 10.2.2 and 10.3), the performance of TRUNCATE degraded in some scenarios. Because the crash recovery of TRUNCATE uses separate log files, a redo log checkpoint will be part of the TRUNCATE execution. This could force all dirty pages to be written out from the buffer pool to the data files. I am afraid that we cannot fix this problem before MariaDB 10.4.

            marko Marko Mäkelä added a comment - I now realize that the original request was filed against MySQL 5.5. The original request was that in TRUNCATE, InnoDB should release the file space while not holding any lock. To do that, we must change the undo or redo logging, so that we can ensure that no orphan files will be left behind in case the server is killed after the TRUNCATE transaction internally commits. A similar problem exists in DROP TABLE. It would be better to release all locks and only after that start deleting the file. This would improve the performance for dropping large or heavily fragmented tables. We might be able to fix this in MariaDB 10.3. I believe that in MySQL 5.7 (and in MariaDB 10.2.2 and 10.3), the performance of TRUNCATE degraded in some scenarios. Because the crash recovery of TRUNCATE uses separate log files, a redo log checkpoint will be part of the TRUNCATE execution. This could force all dirty pages to be written out from the buffer pool to the data files. I am afraid that we cannot fix this problem before MariaDB 10.4.

            The WL#7016 that I mentioned in my earlier comment (and in some MySQL 8.0 née 5.8 commit messages) will likely remain private forever. Its contents was copied to another number, WL#9536 which has been slightly modified.

            At the time I fixed WL#6795 in the development version of MySQL (called 5.8 at that time), the crash recovery part was not yet implemented. MySQL 8.0.3 was the first version to implement it.
            In MySQL 8.0.3, the TRUNCATE indeed is implemented as rename, drop, and create. The question is: Does the "drop" part really fulfill the request that was stated in MySQL Bug #68184?
            Let us look at innobase_basic_ddl::delete_impl() which implements the "drop" part of TRUNCATE in MySQL 8.0.3:

            	error = row_drop_table_for_mysql(
            		norm_name, trx, sqlcom, true, handler);
            

            The caller does not appear to hold any global locks at this point. This operation does not look completely transactional or crash-safe, because row_drop_table_for_mysql_in_background() was not removed. But, the file deletion in fact does seem to be executed in the innobase_post_ddl() function, apparently after all locks have been released in Sql_cmd_truncate_table::truncate_table(). This corresponds to what I designed in WL#7016 and what was copied to WL#9536.

            MariaDB cannot port the change from MySQL 8.0, because the solution depends on the Global Data Dictionary, which MariaDB will not implement in that form.
            I think that MariaDB should eventually implement transactional logging and recovery for .frm files and DDL operations and remove the InnoDB internal data dictionary, and make InnoDB data files self-consistent so that InnoDB can find all relevant metadata in the .frm and .ibd files.
            In MariaDB, TRUNCATE is an operation that does not modify .frm files. So, the TRUNCATE logging and recovery would have to be done completely inside InnoDB.

            marko Marko Mäkelä added a comment - The WL#7016 that I mentioned in my earlier comment (and in some MySQL 8.0 née 5.8 commit messages) will likely remain private forever. Its contents was copied to another number, WL#9536 which has been slightly modified. At the time I fixed WL#6795 in the development version of MySQL (called 5.8 at that time), the crash recovery part was not yet implemented. MySQL 8.0.3 was the first version to implement it. In MySQL 8.0.3, the TRUNCATE indeed is implemented as rename, drop, and create. The question is: Does the "drop" part really fulfill the request that was stated in MySQL Bug #68184 ? Let us look at innobase_basic_ddl::delete_impl() which implements the "drop" part of TRUNCATE in MySQL 8.0.3: error = row_drop_table_for_mysql( norm_name, trx, sqlcom, true, handler); The caller does not appear to hold any global locks at this point. This operation does not look completely transactional or crash-safe, because row_drop_table_for_mysql_in_background() was not removed. But, the file deletion in fact does seem to be executed in the innobase_post_ddl() function, apparently after all locks have been released in Sql_cmd_truncate_table::truncate_table(). This corresponds to what I designed in WL#7016 and what was copied to WL#9536. MariaDB cannot port the change from MySQL 8.0, because the solution depends on the Global Data Dictionary, which MariaDB will not implement in that form. I think that MariaDB should eventually implement transactional logging and recovery for .frm files and DDL operations and remove the InnoDB internal data dictionary, and make InnoDB data files self-consistent so that InnoDB can find all relevant metadata in the .frm and .ibd files. In MariaDB, TRUNCATE is an operation that does not modify .frm files. So, the TRUNCATE logging and recovery would have to be done completely inside InnoDB.
            serg Sergei Golubchik made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Environment Large innodb_buffer_pool_size and innodb_file_per_table=1
            Issue Type Bug [ 1 ] Task [ 3 ]
            valerii Valerii Kravchuk made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            marko Marko Mäkelä made changes -

            This was fixed as part of MDEV-13564.

            marko Marko Mäkelä added a comment - This was fixed as part of MDEV-13564 .
            marko Marko Mäkelä made changes -
            Fix Version/s 10.3.10 [ 23140 ]
            Fix Version/s 10.4 [ 22408 ]
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]
            Resolution Fixed [ 1 ]
            Status Confirmed [ 10101 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.19 [ 23207 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            MDEV-13564 rewrote TRUNCATE TABLE in MariaDB 10.2.19 and 10.3.10 as a combination of RENAME, CREATE, DROP.

            A remaining major cause of stalls would be addressed in MDEV-8069, where we should delete files of dropped tables after releasing all locks.

            marko Marko Mäkelä added a comment - MDEV-13564 rewrote TRUNCATE TABLE in MariaDB 10.2.19 and 10.3.10 as a combination of RENAME, CREATE, DROP. A remaining major cause of stalls would be addressed in MDEV-8069 , where we should delete files of dropped tables after releasing all locks.
            GeoffMontee Geoff Montee (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels upstream-fixed ServiceNow upstream-fixed
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow upstream-fixed 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z upstream-fixed
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z upstream-fixed upstream-fixed
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 73739 ] MariaDB v4 [ 132775 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 157739 150004 165058
            Zendesk active tickets 201658

            People

              marko Marko Mäkelä
              mg MG
              Votes:
              4 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.