[MDEV-19291] server_audit plugin mutex causes stalls at very high concurrency Created: 2019-04-19 Updated: 2020-08-25 Resolved: 2019-06-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Plugin - Audit, Plugins |
| Affects Version/s: | 10.1.38, 10.2.23, 10.3.14, 10.4.4 |
| Fix Version/s: | 10.2.26, 10.1.41, 10.3.17 |
| Type: | Bug | Priority: | Major |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Alexey Botchkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Description |
|
The server_audit plugin's aggressive use of its lock_operations mutex can cause stalls under very high concurrency:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/plugin/server_audit/server_audit.c#L544

I've been told that the server_audit plugin's lock_operations mutex may not be the only bottleneck. We may also need to refactor an audit-related mutex in the server itself. That might be referring to LOCK_plugin, based on this comment in sql_audit.cc:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L277

The comment might be referring to the plugin_foreach calls in these locations:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L411
https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L120

One user said that when they enable the audit plugin in environments with 30k-40k QPS, Threads_connected at 1400+, and Threads_running at 500+, most threads start to stall in the init/freeing items state.

Another user said that they see a lot of threads stall in the "System lock" state. This seems to be related to the call to the mysql_audit_external_lock() function in the handler::ha_external_lock() function:

https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/handler.cc#L6125
https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.h#L285 |
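As a rough illustration of why a single plugin-wide mutex can become a bottleneck, here is a minimal C sketch using POSIX threads. It is not the server_audit code, and it is not the fix that was applied: log_lock, log_event_coarse, log_event_narrow, and run_workers are hypothetical names made up for this example. The coarse variant formats the record while holding the lock, so every thread queues behind the formatting work; the narrow variant formats first and takes the lock only to update shared state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 8
#define EVENTS_PER_THREAD 1000

/* Hypothetical stand-in for a plugin-wide mutex such as lock_operations. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long events_logged = 0; /* guarded by log_lock */
static size_t bytes_logged = 0;         /* guarded by log_lock */

typedef void (*log_fn)(int thread_id, int seq);

/* Coarse variant: the record is formatted while the lock is held,
   so all callers serialize on the formatting work as well. */
static void log_event_coarse(int thread_id, int seq)
{
    char buf[128];
    pthread_mutex_lock(&log_lock);
    int len = snprintf(buf, sizeof buf, "thread=%d,seq=%d,QUERY", thread_id, seq);
    bytes_logged += (size_t)len;
    events_logged++;
    pthread_mutex_unlock(&log_lock);
}

/* Narrow variant: the record is formatted on the caller's own stack first;
   the lock covers only the update of the shared counters. */
static void log_event_narrow(int thread_id, int seq)
{
    char buf[128];
    int len = snprintf(buf, sizeof buf, "thread=%d,seq=%d,QUERY", thread_id, seq);
    pthread_mutex_lock(&log_lock);
    bytes_logged += (size_t)len;
    events_logged++;
    pthread_mutex_unlock(&log_lock);
}

struct worker_arg {
    int id;
    log_fn fn;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = 0; i < EVENTS_PER_THREAD; i++)
        arg->fn(arg->id, i);
    return NULL;
}

/* Runs N_THREADS workers that each log EVENTS_PER_THREAD events through fn,
   and returns the total number of events recorded. */
static unsigned long run_workers(log_fn fn)
{
    pthread_t threads[N_THREADS];
    struct worker_arg args[N_THREADS];

    events_logged = 0;
    bytes_logged = 0;
    for (int i = 0; i < N_THREADS; i++) {
        args[i].id = i;
        args[i].fn = fn;
        int rc = pthread_create(&threads[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc; /* keeps the build warning-free when NDEBUG is defined */
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
    return events_logged;
}
```

Both variants record the same number of events; they differ only in how long each thread holds the lock. Calling run_workers(log_event_coarse) and run_workers(log_event_narrow) from a small main() and timing the two calls is one way to observe the effect, though the gap will depend heavily on thread count and hardware.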
| Comments |
| Comment by Alexey Botchkov [ 2019-04-22 ] |
|
I actually refactored the server code so that LOCK_plugin is locked much less often. |
| Comment by Geoff Montee (Inactive) [ 2019-04-25 ] |
|
Hi holyfoot, The user who tested your refactored server code said that your changes significantly improved performance under load, so it sounds very promising. |