[MDEV-19291] server_audit plugin mutex causes stalls at very high concurrency Created: 2019-04-19  Updated: 2020-08-25  Resolved: 2019-06-15

Status: Closed
Project: MariaDB Server
Component/s: Locking, Plugin - Audit, Plugins
Affects Version/s: 10.1.38, 10.2.23, 10.3.14, 10.4.4
Fix Version/s: 10.2.26, 10.1.41, 10.3.17

Type: Bug
Priority: Major
Reporter: Geoff Montee (Inactive)
Assignee: Alexey Botchkov
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-18661 loading the audit plugin causes perfo... Closed
Problem/Incident
causes MDEV-19174 server_audit plugin locks mutex for e... Closed
Relates
relates to MDEV-20269 Create new "Auditing" thread state Open

Description

The server_audit plugin's aggressive use of its lock_operations mutex can cause stalls under very high concurrency:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/plugin/server_audit/server_audit.c#L544
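
To illustrate why a single plugin-wide mutex can hurt at high concurrency, here is a minimal sketch of one general mitigation: doing the per-event formatting before taking the lock, so the critical section covers only the append. This is not the code from server_audit.c and not necessarily the fix that was adopted for this issue. All names (log_lock, audit_coarse, audit_narrow, run_demo) are hypothetical, and the in-memory buffer merely stands in for the audit log file.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from server_audit.c. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char log_buf[1 << 16];          /* stands in for the audit log file */
static size_t log_len;
static unsigned long records_written;

/* Append one record. The caller must hold log_lock. */
static void append_locked(const char *line, size_t n)
{
  assert(n <= sizeof log_buf);
  if (log_len + n > sizeof log_buf)
    log_len = 0;                       /* crude stand-in for log rotation */
  memcpy(log_buf + log_len, line, n);
  log_len += n;
  records_written++;
}

/* Format one record into a caller-owned buffer; returns bytes stored. */
static size_t format_record(char *line, size_t cap, unsigned id,
                            const char *query)
{
  int n = snprintf(line, cap, "%u,QUERY,%s\n", id, query);
  if (n < 0)
    return 0;
  return (size_t)n < cap ? (size_t)n : cap - 1;
}

/* Coarse variant: formatting and appending both happen under the lock. */
void audit_coarse(unsigned id, const char *query)
{
  char line[256];
  pthread_mutex_lock(&log_lock);
  size_t n = format_record(line, sizeof line, id, query);
  append_locked(line, n);
  pthread_mutex_unlock(&log_lock);
}

/* Narrow variant: format on the thread's own stack, lock only to append. */
void audit_narrow(unsigned id, const char *query)
{
  char line[256];
  size_t n = format_record(line, sizeof line, id, query);
  pthread_mutex_lock(&log_lock);
  append_locked(line, n);
  pthread_mutex_unlock(&log_lock);
}

#define THREADS 8
#define EVENTS_PER_THREAD 1000

static void *worker(void *arg)
{
  unsigned id = (unsigned)(uintptr_t)arg;
  for (int i = 0; i < EVENTS_PER_THREAD; i++)
    audit_narrow(id, "SELECT 1");
  return NULL;
}

/* Run all workers to completion and return the number of records logged. */
unsigned long run_demo(void)
{
  pthread_t t[THREADS];
  log_len = 0;
  records_written = 0;
  for (uintptr_t i = 0; i < THREADS; i++)
    pthread_create(&t[i], NULL, worker, (void *)i);
  for (int i = 0; i < THREADS; i++)
    pthread_join(t[i], NULL);
  return records_written;
}
```

Calling run_demo() starts eight threads that each log 1000 events through audit_narrow(). Both variants produce the same records; the difference is only how long each thread holds log_lock, which is what matters when hundreds of threads are running at once.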

I've been told that the server_audit plugin's lock_operations mutex may not be the only bottleneck, and that an audit-related mutex in the server itself may also need refactoring. Judging by this comment in sql_audit.cc, that mutex is probably LOCK_plugin:

  /*
    Pre-acquire the newly inslalled audit plugin for events that
    may potentially occur further during INSTALL PLUGIN.
    When audit event is triggered, audit subsystem acquires interested
    plugins by walking through plugin list. Evidently plugin list
    iterator protects plugin list by acquiring LOCK_plugin, see
    plugin_foreach_with_mask().
    On the other hand [UN]INSTALL PLUGIN is acquiring LOCK_plugin
    rather for a long time.
    When audit event is triggered during [UN]INSTALL PLUGIN, plugin
    list iterator acquires the same lock (within the same thread)
    second time.
    This hack should be removed when LOCK_plugin is fixed so it
    protects only what it supposed to protect.
    See also mysql_install_plugin() and mysql_uninstall_plugin()
  */

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L277

It looks like the comment might be referring to the plugin_foreach calls in these locations:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L411

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L120
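
The quoted comment describes the plugin list being walked under a global lock whenever an audit event needs its interested plugins. As a rough sketch of why that matters and how it can be softened, the example below caches the acquired plugins per session so that the global lock is taken only the first time a session sees a given event class. This is an illustration under stated assumptions, not the sql_audit.cc implementation: registry_lock stands in for LOCK_plugin, and struct session, acquire_for_class() and notify() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_PLUGINS 4

/* Hypothetical plugin descriptor: a bitmask of event classes it wants. */
struct plugin { unsigned class_mask; };

/* registry_lock stands in for a global lock such as LOCK_plugin. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct plugin registry[MAX_PLUGINS] = { {0x1}, {0x2}, {0x3} };
static size_t registry_count = 3;
unsigned long registry_lock_acquisitions;   /* instrumentation only */

/* Per-session cache of plugins that have already been acquired. */
struct session {
  struct plugin *acquired[MAX_PLUGINS];
  size_t n_acquired;
  unsigned resolved_mask;   /* event classes already looked up */
};

void session_init(struct session *s)
{
  memset(s, 0, sizeof *s);
}

static int already_acquired(const struct session *s, const struct plugin *p)
{
  for (size_t i = 0; i < s->n_acquired; i++)
    if (s->acquired[i] == p)
      return 1;
  return 0;
}

/* Walk the global registry only if this class was not resolved before. */
void acquire_for_class(struct session *s, unsigned event_class)
{
  if ((s->resolved_mask & event_class) == event_class)
    return;                                  /* fast path: no global lock */

  pthread_mutex_lock(&registry_lock);
  registry_lock_acquisitions++;
  for (size_t i = 0; i < registry_count; i++) {
    struct plugin *p = &registry[i];
    if ((p->class_mask & event_class) && !already_acquired(s, p)) {
      assert(s->n_acquired < MAX_PLUGINS);
      s->acquired[s->n_acquired++] = p;
    }
  }
  pthread_mutex_unlock(&registry_lock);
  s->resolved_mask |= event_class;
}

/* Deliver an event; returns how many cached plugins were interested. */
size_t notify(struct session *s, unsigned event_class)
{
  size_t delivered = 0;
  acquire_for_class(s, event_class);
  for (size_t i = 0; i < s->n_acquired; i++)
    if (s->acquired[i]->class_mask & event_class)
      delivered++;
  return delivered;
}
```

After session_init(&s), repeated calls such as notify(&s, 0x1) take registry_lock only once for that class. The sketch deliberately ignores plugins being installed or uninstalled while sessions hold cached pointers; a real server would need reference counting or cache invalidation for that case.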

One user said that when they enable the audit plugin in an environment handling 30k-40k QPS, with Threads_connected above 1,400 and Threads_running above 500, most threads start to stall in the "init" or "freeing items" state.

Another user said that they see a lot of threads stall in the "System lock" state. This seems to be related to the call to the mysql_audit_external_lock() function in the handler::ha_external_lock() function:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/handler.cc#L6125

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.h#L285



Comments
Comment by Alexey Botchkov [ 2019-04-22 ]

I have refactored the server code so that LOCK_plugin is acquired much less often.
Now waiting for someone with a heavily loaded server to test this build.

Comment by Geoff Montee (Inactive) [ 2019-04-25 ]

Hi holyfoot,

The user who tested your refactored server code said that your changes significantly improved performance under load, so it sounds very promising.

Generated at Thu Feb 08 08:50:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.