Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34988

InnoDB locks dict_sys.latch for a long time during ALTER TABLE

Details

    Description

      Our MariaDB servers with many thousands of clients sometimes encounter slow ALTER TABLE queries (up to 25 seconds) with adjacent read-only queries also being just as slow; all read-only queries apparently wait for the one ALTER TABLE to finish the commit.
      Using two BPF scripts (https://github.com/iovisor/bcc/pull/5108 and https://github.com/iovisor/bcc/pull/5112), I pinpointed this to contention on dict_sys.latch; a common backtrace looks like this:

        syscall
        unlock_and_close_files(std::vector<pfs_os_file_t, std::allocator<pfs_os_file_t> > const&, trx_t*)
        ha_innobase::commit_inplace_alter_table(TABLE*, Alter_inplace_info*, bool)
        mysql_inplace_alter_table(THD*, TABLE_LIST*, TABLE*, TABLE*, Alter_inplace_info*, MDL_request*, st_ddl_log_state*, TRIGGER_RENAME_PARAM*, Alter_table_ctx*, bool&, unsigned long long&, bool) [clone .constprop.0]
        mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, Table_specification_st*, TABLE_LIST*, Recreate_info*, Alter_info*, unsigned int, st_order*, bool, bool)
        Sql_cmd_alter_table::execute(THD*)
        mysql_execute_command(THD*, bool)
        mysql_parse(THD*, char*, unsigned int, Parser_state*)
        dispatch_command(enum_server_command, THD*, char*, unsigned int, bool)
        do_command(THD*, bool)
      

      The method ha_innobase::commit_inplace_alter_table indeed does most of its I/O while having dict_sys.latch locked exclusively. I believe that one should not do any I/O while holding such a (global) lock, one that is commonly locked for all queries, including read-only queries.
      This is a major scalability issue for us, because it is easy for (unprivileged) MariaDB users to lock up the whole daemon (including all databases and all catalogs) and it could be used to DoS the MariaDB server.

      Attachments

        Issue Links

          Activity

            It is correct that the last phase of ha_innobase::commit_inplace_alter_table(commit=true) is being covered by an exclusive dict_sys.latch.

            The function unlock_and_close_files() is part of the MDEV-8069 logic. DDL operations may need to delete some files. Invoking the unlink() system call while holding an exclusive latch would be a bad idea if this would lead to deleting a huge file. Therefore, we retain an open handle to the file, and we close these file handles right after we release the dict_sys.latch.

            Since you have mentioned OPTIMIZE TABLE related to this, I have to mention MDEV-15641, where dict_sys.latch was called dict_operation_lock (and accompanied by dict_sys.mutex). In a table-rebuilding operation such as OPTIMIZE TABLE or ALTER TABLE…FORCE or anything that is listed red or blue in MDEV-11424, we may have a design problem that some log from concurrent DML operations may accumulate while the DDL operation is waiting for the DML operations to finish. Due to this, the final call to row_log_table_apply() that we are invoking while holding exclusive dict_sys.latch could take an excessive amount of time. It is possible that this was fixed in MDEV-15250 by making each DML operation apply their own changes to the being-rebuilt table during transaction commit.

            marko Marko Mäkelä added a comment - It is correct that the last phase of ha_innobase::commit_inplace_alter_table(commit=true) is being covered by an exclusive dict_sys.latch . The function unlock_and_close_files() is part of the MDEV-8069 logic. DDL operations may need to delete some files. Invoking the unlink() system call while holding an exclusive latch would be a bad idea if this would lead to deleting a huge file. Therefore, we retain an open handle to the file, and we close these file handles right after we release the dict_sys.latch . Since you have mentioned OPTIMIZE TABLE related to this, I have to mention MDEV-15641 , where dict_sys.latch was called dict_operation_lock (and accompanied by dict_sys.mutex ). In a table-rebuilding operation such as OPTIMIZE TABLE or ALTER TABLE…FORCE or anything that is listed red or blue in MDEV-11424 , we may have a design problem that some log from concurrent DML operations may accumulate while the DDL operation is waiting for the DML operations to finish. Due to this, the final call to row_log_table_apply() that we are invoking while holding exclusive dict_sys.latch could take an excessive amount of time. It is possible that this was fixed in MDEV-15250 by making each DML operation apply their own changes to the being-rebuilt table during transaction commit.

            Could you please provide some profiling data that shows what ha_innobase::commit_inplace_alter_table() is spending its time on while holding exclusive dict_sys.latch? One way to achieve this would be to run perf record -g -p $(pgrep mariadbd) -t (thread_id_of_OPTIMIZE_TABLE) and repeatedly execute the OPTIMIZE TABLE statement from the same connection. If this is related to the function row_log_table_apply(), then there should also be some concurrent writes from other connections to the table that is being optimized.

            I don’t think that it is possible to remove the use of exclusive dict_sys.latch from DDL operations altogether, but it could be possible to reduce the duration of the critical section once we know the worst bottlenecks.

            marko Marko Mäkelä added a comment - Could you please provide some profiling data that shows what ha_innobase::commit_inplace_alter_table() is spending its time on while holding exclusive dict_sys.latch ? One way to achieve this would be to run perf record -g -p $(pgrep mariadbd) -t (thread_id_of_OPTIMIZE_TABLE) and repeatedly execute the OPTIMIZE TABLE statement from the same connection. If this is related to the function row_log_table_apply() , then there should also be some concurrent writes from other connections to the table that is being optimized. I don’t think that it is possible to remove the use of exclusive dict_sys.latch from DDL operations altogether, but it could be possible to reduce the duration of the critical section once we know the worst bottlenecks.

            > Could you please provide some profiling data that shows what ha_innobase::commit_inplace_alter_table() is spending its time on while holding exclusive dict_sys.latch? One way to achieve this would be to run perf record -g -p $(pgrep mariadbd) -t (thread_id_of_OPTIMIZE_TABLE) and repeatedly execute the OPTIMIZE TABLE statement from the same connection.

            I wish I could do that easily, but the extremely slow queries cannot be reproduced (easily); they occur randomly, and when they do, the MariaDB process blocks all threads. This ticket isn't about those initial slow queries; it's about the global avalanche leaking to all threads that then follows.

            maxk Max Kellermann added a comment - > Could you please provide some profiling data that shows what ha_innobase::commit_inplace_alter_table() is spending its time on while holding exclusive dict_sys.latch? One way to achieve this would be to run perf record -g -p $(pgrep mariadbd) -t (thread_id_of_OPTIMIZE_TABLE) and repeatedly execute the OPTIMIZE TABLE statement from the same connection. I wish I could do that easily, but the extremely slow queries cannot be reproduced (easily); they occur randomly, and when they do, the MariaDB process blocks all threads. This ticket isn't about those initial slow queries; it's about the global avalanche leaking to all threads that then follows.

            MDEV-35154 describes an idea of a dict_table_t ‘stub’, which should allow the critical section of dict_sys.latch to be reduced. Ideally, no buffer pool operations should be allowed while the caller holds dict_sys.latch in any mode.

            marko Marko Mäkelä added a comment - MDEV-35154 describes an idea of a dict_table_t ‘stub’, which should allow the critical section of dict_sys.latch to be reduced. Ideally, no buffer pool operations should be allowed while the caller holds dict_sys.latch in any mode.

            People

              marko Marko Mäkelä
              maxk Max Kellermann
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.