[MDEV-34988] InnoDB locks dict_sys.latch for a long time during ALTER TABLE

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.11.8
Fix Version/s: 10.11
Component/s: Storage Engine - InnoDB
Labels:
- performance
Environment:
Debian Bookworm

Description

Our MariaDB servers, which have many thousands of clients, sometimes encounter slow ALTER TABLE queries (up to 25 seconds), and concurrent read-only queries become just as slow; apparently, all read-only queries wait for the one ALTER TABLE to finish its commit.
Using two BPF scripts (https://github.com/iovisor/bcc/pull/5108 and https://github.com/iovisor/bcc/pull/5112), I pinpointed this to contention on dict_sys.latch; a common backtrace looks like this:

  syscall
  unlock_and_close_files(std::vector<pfs_os_file_t, std::allocator<pfs_os_file_t> > const&, trx_t*)
  ha_innobase::commit_inplace_alter_table(TABLE*, Alter_inplace_info*, bool)
  mysql_inplace_alter_table(THD*, TABLE_LIST*, TABLE*, TABLE*, Alter_inplace_info*, MDL_request*, st_ddl_log_state*, TRIGGER_RENAME_PARAM*, Alter_table_ctx*, bool&, unsigned long long&, bool) [clone .constprop.0]
  mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, Table_specification_st*, TABLE_LIST*, Recreate_info*, Alter_info*, unsigned int, st_order*, bool, bool)
  Sql_cmd_alter_table::execute(THD*)
  mysql_execute_command(THD*, bool)
  mysql_parse(THD*, char*, unsigned int, Parser_state*)
  dispatch_command(enum_server_command, THD*, char*, unsigned int, bool)
  do_command(THD*, bool)
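
The symptom described above (read-only queries stalling behind one ALTER TABLE) can be reproduced in miniature with a reader-writer lock. The following is a minimal sketch, not InnoDB code: `global_latch` is a hypothetical stand-in for dict_sys.latch modelled as a `std::shared_mutex`, and the slow I/O inside the commit is simulated with a sleep. The names `ddl_commit`, `read_only_query_wait` and `reader_wait_behind_ddl` are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <thread>

using ms = std::chrono::milliseconds;

// Hypothetical stand-in for a global latch such as dict_sys.latch.
std::shared_mutex global_latch;

// Models a DDL commit that performs slow I/O (simulated with a sleep)
// while holding the latch in exclusive mode.
void ddl_commit(ms io_time, std::atomic<bool>& latched) {
  std::unique_lock<std::shared_mutex> x(global_latch);
  latched = true;                        // tell the reader the latch is now held
  std::this_thread::sleep_for(io_time);  // "I/O" inside the critical section
}                                        // exclusive latch released here

// Models a read-only query: it only needs the latch in shared mode,
// but it still has to wait for the exclusive holder. Returns the wait time.
ms read_only_query_wait() {
  const auto start = std::chrono::steady_clock::now();
  std::shared_lock<std::shared_mutex> s(global_latch);
  return std::chrono::duration_cast<ms>(std::chrono::steady_clock::now() - start);
}

// Starts the DDL, waits until it holds the latch, then times one reader.
ms reader_wait_behind_ddl(ms io_time) {
  assert(io_time.count() > 0);
  std::atomic<bool> latched{false};
  std::thread ddl(ddl_commit, io_time, std::ref(latched));
  while (!latched) std::this_thread::yield();
  const ms waited = read_only_query_wait();
  ddl.join();
  return waited;
}
```

For example, `reader_wait_behind_ddl(std::chrono::milliseconds(200))` should report a wait of roughly 200 ms, even though the reader never asks for exclusive access: every shared-mode acquirer pays for the full duration of the I/O done under the exclusive latch.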

The method ha_innobase::commit_inplace_alter_table indeed does most of its I/O while holding dict_sys.latch exclusively. I believe that no I/O should be done while holding such a global lock, especially one that is routinely acquired by all queries, including read-only ones.
This is a major scalability issue for us, because it is easy for (unprivileged) MariaDB users to lock up the whole daemon (all databases and all catalogs), which could be exploited for a denial-of-service attack on the MariaDB server.
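
The principle argued above, not doing I/O while holding a global latch, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how InnoDB implements or should implement the fix: `Dictionary`, `slow_close`, `drop_io_under_latch` and `drop_io_after_unlock` are hypothetical names, the "file handles" are plain integers, and the slow close()/unlink() is simulated with a 50 ms sleep. The first variant performs the I/O inside the exclusive critical section; the second only detaches the handles while holding the latch and does the I/O after releasing it. Both return how long the latch was held exclusively.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;
using ms = std::chrono::milliseconds;

// Hypothetical data dictionary guarded by a single global latch
// (a toy model, not InnoDB's dict_sys_t).
struct Dictionary {
  std::shared_mutex latch;
  std::unordered_map<std::string, int> open_files;  // table name -> file handle
};

// Stand-in for a slow close()/unlink() system call.
void slow_close(int fd) {
  assert(fd >= 0);
  std::this_thread::sleep_for(ms(50));
}

// Pattern 1: I/O inside the critical section. Returns how long the latch was held.
ms drop_io_under_latch(Dictionary& d, const std::vector<std::string>& names) {
  std::unique_lock<std::shared_mutex> x(d.latch);
  const auto locked_at = Clock::now();
  for (const auto& name : names) {
    auto it = d.open_files.find(name);
    if (it == d.open_files.end()) continue;
    slow_close(it->second);  // every reader waits for this
    d.open_files.erase(it);
  }
  return std::chrono::duration_cast<ms>(Clock::now() - locked_at);
}

// Pattern 2: detach handles under the latch, do the I/O after releasing it.
ms drop_io_after_unlock(Dictionary& d, const std::vector<std::string>& names) {
  std::vector<int> to_close;
  ms held{};
  {
    std::unique_lock<std::shared_mutex> x(d.latch);
    const auto locked_at = Clock::now();
    for (const auto& name : names) {
      auto it = d.open_files.find(name);
      if (it == d.open_files.end()) continue;
      to_close.push_back(it->second);  // take ownership of the handle
      d.open_files.erase(it);          // metadata change stays atomic for readers
    }
    held = std::chrono::duration_cast<ms>(Clock::now() - locked_at);
  }  // latch released before any I/O
  for (int fd : to_close) slow_close(fd);
  return held;
}
```

With four tables, the first variant keeps the latch for at least 200 ms, while the second holds it only for the in-memory bookkeeping. The sketch assumes that nothing else can reach a handle once it has been removed from the shared map; a real engine would also have to consider, for example, a new file being created under the same name before the deferred close completes.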

Issue Links

blocks:
- MDEV-34986 Use RAII classes to manage locks (Stalled)

is blocked by:
- MDEV-35154 dict_sys_t::load_table() is holding exclusive dict_sys.latch for unnecessarily long time (Confirmed)

relates to:
- MDEV-8069 DROP or rebuild of a large table may lock up InnoDB (Closed)
- MDEV-15641 InnoDB crash while committing table-rebuilding ALTER TABLE (Closed)
- MDEV-34999 ha_innobase::open() should not acquire dict_sys.latch twice (Open)
- MDEV-35436 dict_stats_fetch_from_ps() unnecessarily holds exclusive dict_sys.latch (Closed)

People

Assignee: Marko Mäkelä

Reporter: Max Kellermann

Votes: 1

Watchers: 5

Dates

Created: 2024-09-23 21:16

Updated: 2025-01-13 19:37

MariaDB Server