Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9459

Truncate table causes innodb stalls

Details

    Description

      Feature request:

      Backport the MySQL 5.8 fix for upstream bug https://bugs.mysql.com/bug.php?id=68184 for MariaDB 10.2

      This performance issue seems more pronounced with Galera Cluster feature enabled using TOI but also affects standalone instances (admittedly I have seen this in the 10.0 series but have not tested 10.1 for this problem directly)

      Attachments

        Issue Links

          Activity

            As I noted in MDEV-13328, I believe that it is feasible to fix some of this even while keeping the current InnoDB data dictionary.

            It is possible that some bottleneck related to redo log checkpoint will remain until the crash recovery of TRUNCATE operations has been rewritten to rely on an undo log record. Such a change would necessarily be part of MDEV-515.

            marko Marko Mäkelä added a comment - As I noted in MDEV-13328 , I believe that it is feasible to fix some of this even while keeping the current InnoDB data dictionary. It is possible that some bottleneck related to redo log checkpoint will remain until the crash recovery of TRUNCATE operations has been rewritten to rely on an undo log record. Such a change would necessarily be part of MDEV-515 .

            I now realize that the original request was filed against MySQL 5.5.

            The original request was that in TRUNCATE, InnoDB should release the file space while not holding any lock. To do that, we must change the undo or redo logging, so that we can ensure that no orphan files will be left behind in case the server is killed after the TRUNCATE transaction internally commits.

            A similar problem exists in DROP TABLE. It would be better to release all locks and only after that start deleting the file. This would improve the performance for dropping large or heavily fragmented tables. We might be able to fix this in MariaDB 10.3.

            I believe that in MySQL 5.7 (and in MariaDB 10.2.2 and 10.3), the performance of TRUNCATE degraded in some scenarios. Because the crash recovery of TRUNCATE uses separate log files, a redo log checkpoint will be part of the TRUNCATE execution. This could force all dirty pages to be written out from the buffer pool to the data files. I am afraid that we cannot fix this problem before MariaDB 10.4.

            marko Marko Mäkelä added a comment - I now realize that the original request was filed against MySQL 5.5. The original request was that in TRUNCATE, InnoDB should release the file space while not holding any lock. To do that, we must change the undo or redo logging, so that we can ensure that no orphan files will be left behind in case the server is killed after the TRUNCATE transaction internally commits. A similar problem exists in DROP TABLE. It would be better to release all locks and only after that start deleting the file. This would improve the performance for dropping large or heavily fragmented tables. We might be able to fix this in MariaDB 10.3. I believe that in MySQL 5.7 (and in MariaDB 10.2.2 and 10.3), the performance of TRUNCATE degraded in some scenarios. Because the crash recovery of TRUNCATE uses separate log files, a redo log checkpoint will be part of the TRUNCATE execution. This could force all dirty pages to be written out from the buffer pool to the data files. I am afraid that we cannot fix this problem before MariaDB 10.4.

            The WL#7016 that I mentioned in my earlier comment (and in some MySQL 8.0 née 5.8 commit messages) will likely remain private forever. Its contents was copied to another number, WL#9536 which has been slightly modified.

            At the time I fixed WL#6795 in the development version of MySQL (called 5.8 at that time), the crash recovery part was not yet implemented. MySQL 8.0.3 was the first version to implement it.
            In MySQL 8.0.3, the TRUNCATE indeed is implemented as rename, drop, and create. The question is: Does the "drop" part really fulfill the request that was stated in MySQL Bug #68184?
            Let us look at innobase_basic_ddl::delete_impl() which implements the "drop" part of TRUNCATE in MySQL 8.0.3:

            	error = row_drop_table_for_mysql(
            		norm_name, trx, sqlcom, true, handler);
            

            The caller does not appear to hold any global locks at this point. This operation does not look completely transactional or crash-safe, because row_drop_table_for_mysql_in_background() was not removed. But, the file deletion in fact does seem to be executed in the innobase_post_ddl() function, apparently after all locks have been released in Sql_cmd_truncate_table::truncate_table(). This corresponds to what I designed in WL#7016 and what was copied to WL#9536.

            MariaDB cannot port the change from MySQL 8.0, because the solution depends on the Global Data Dictionary, which MariaDB will not implement in that form.
            I think that MariaDB should eventually implement transactional logging and recovery for .frm files and DDL operations and remove the InnoDB internal data dictionary, and make InnoDB data files self-consistent so that InnoDB can find all relevant metadata in the .frm and .ibd files.
            In MariaDB, TRUNCATE is an operation that does not modify .frm files. So, the TRUNCATE logging and recovery would have to be done completely inside InnoDB.

            marko Marko Mäkelä added a comment - The WL#7016 that I mentioned in my earlier comment (and in some MySQL 8.0 née 5.8 commit messages) will likely remain private forever. Its contents was copied to another number, WL#9536 which has been slightly modified. At the time I fixed WL#6795 in the development version of MySQL (called 5.8 at that time), the crash recovery part was not yet implemented. MySQL 8.0.3 was the first version to implement it. In MySQL 8.0.3, the TRUNCATE indeed is implemented as rename, drop, and create. The question is: Does the "drop" part really fulfill the request that was stated in MySQL Bug #68184 ? Let us look at innobase_basic_ddl::delete_impl() which implements the "drop" part of TRUNCATE in MySQL 8.0.3: error = row_drop_table_for_mysql( norm_name, trx, sqlcom, true, handler); The caller does not appear to hold any global locks at this point. This operation does not look completely transactional or crash-safe, because row_drop_table_for_mysql_in_background() was not removed. But, the file deletion in fact does seem to be executed in the innobase_post_ddl() function, apparently after all locks have been released in Sql_cmd_truncate_table::truncate_table(). This corresponds to what I designed in WL#7016 and what was copied to WL#9536. MariaDB cannot port the change from MySQL 8.0, because the solution depends on the Global Data Dictionary, which MariaDB will not implement in that form. I think that MariaDB should eventually implement transactional logging and recovery for .frm files and DDL operations and remove the InnoDB internal data dictionary, and make InnoDB data files self-consistent so that InnoDB can find all relevant metadata in the .frm and .ibd files. In MariaDB, TRUNCATE is an operation that does not modify .frm files. So, the TRUNCATE logging and recovery would have to be done completely inside InnoDB.

            This was fixed as part of MDEV-13564.

            marko Marko Mäkelä added a comment - This was fixed as part of MDEV-13564 .

            MDEV-13564 rewrote TRUNCATE TABLE in MariaDB 10.2.19 and 10.3.10 as a combination of RENAME, CREATE, DROP.

            A remaining major cause of stalls would be addressed in MDEV-8069, where we should delete files of dropped tables after releasing all locks.

            marko Marko Mäkelä added a comment - MDEV-13564 rewrote TRUNCATE TABLE in MariaDB 10.2.19 and 10.3.10 as a combination of RENAME, CREATE, DROP. A remaining major cause of stalls would be addressed in MDEV-8069 , where we should delete files of dropped tables after releasing all locks.

            People

              marko Marko Mäkelä
              mg MG
              Votes:
              4 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.