MariaDB Server / MDEV-19291

server_audit plugin mutex causes stalls at very high concurrency

Details

    Description

      The server_audit plugin's aggressive use of its lock_operations mutex can cause stalls under very high concurrency:

      https://github.com/MariaDB/server/blob/mariadb-10.3.7/plugin/server_audit/server_audit.c#L544
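
      To illustrate the contention pattern described above, here is a minimal sketch, not the plugin's actual code. It assumes every session thread passes through one process-wide mutex for every audit event. The names audit_notify, session_thread, run_sessions and events_logged are hypothetical; only the name lock_operations is taken from the report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the plugin-wide lock named in the report. */
static pthread_mutex_t lock_operations = PTHREAD_MUTEX_INITIALIZER;
static unsigned long events_logged = 0;

/* Every session thread funnels through the same mutex for every event,
   so the critical section below runs strictly one thread at a time. */
static void audit_notify(void)
{
  pthread_mutex_lock(&lock_operations);
  events_logged++;   /* stands in for filtering, formatting and writing a record */
  pthread_mutex_unlock(&lock_operations);
}

struct worker_arg { unsigned long n_events; };

/* Models one client session that raises n_events audit events. */
static void *session_thread(void *p)
{
  const struct worker_arg *arg = p;
  for (unsigned long i = 0; i < arg->n_events; i++)
    audit_notify();
  return NULL;
}

/* Runs up to 64 sessions concurrently and returns the number of events logged. */
unsigned long run_sessions(int n_threads, unsigned long n_events)
{
  pthread_t threads[64];
  struct worker_arg arg = { n_events };

  if (n_threads > 64)
    n_threads = 64;
  events_logged = 0;
  for (int i = 0; i < n_threads; i++)
  {
    int rc = pthread_create(&threads[i], NULL, session_thread, &arg);
    assert(rc == 0);
    (void) rc;
  }
  for (int i = 0; i < n_threads; i++)
    pthread_join(threads[i], NULL);
  return events_logged;
}
```

      In this model the work inside the lock cannot overlap between threads, so adding sessions adds waiters rather than throughput. How much work the real plugin does while holding the lock is not shown here.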

      I've been told that the server_audit plugin's lock_operations mutex may not be the only bottleneck, and that we may also need to refactor an audit-related mutex in the server itself. That appears to refer to LOCK_plugin, based on this comment in sql_audit.cc:

        /*
          Pre-acquire the newly inslalled audit plugin for events that
          may potentially occur further during INSTALL PLUGIN.
          When audit event is triggered, audit subsystem acquires interested
          plugins by walking through plugin list. Evidently plugin list
          iterator protects plugin list by acquiring LOCK_plugin, see
          plugin_foreach_with_mask().
          On the other hand [UN]INSTALL PLUGIN is acquiring LOCK_plugin
          rather for a long time.
          When audit event is triggered during [UN]INSTALL PLUGIN, plugin
          list iterator acquires the same lock (within the same thread)
          second time.
          This hack should be removed when LOCK_plugin is fixed so it
          protects only what it supposed to protect.
          See also mysql_install_plugin() and mysql_uninstall_plugin()
        */
      

      https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L277
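
      The situation the comment describes, one thread taking the same lock a second time, can be sketched as follows. This is an illustration under stated assumptions, not the server's code: lock_plugin is a plain pthread mutex standing in for LOCK_plugin, and lock_plugin_init, audit_event and install_plugin are hypothetical names. An error-checking mutex is used so the second acquisition reports EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for LOCK_plugin. */
static pthread_mutex_t lock_plugin;

/* Creates the mutex as error-checking so a relock by the owner is reported. */
int lock_plugin_init(void)
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0)
    return rc;
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (rc == 0)
    rc = pthread_mutex_init(&lock_plugin, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}

/* Models an audit event. Unless the plugin was pre-acquired, the event
   path walks the plugin list, which requires taking the lock. */
int audit_event(int pre_acquired)
{
  int rc;
  if (pre_acquired)
    return 0;                 /* reference already held: no list walk, no lock */
  rc = pthread_mutex_lock(&lock_plugin);
  if (rc != 0)
    return rc;                /* EDEADLK when this thread already owns the lock */
  pthread_mutex_unlock(&lock_plugin);
  return 0;
}

/* Models INSTALL PLUGIN: holds the lock while an audit event fires. */
int install_plugin(int pre_acquire)
{
  int rc;
  pthread_mutex_lock(&lock_plugin);
  rc = audit_event(pre_acquire);
  pthread_mutex_unlock(&lock_plugin);
  return rc;
}
```

      After lock_plugin_init() succeeds, install_plugin(0) returns EDEADLK because the event path tries to relock from the owning thread, while install_plugin(1) returns 0 because the pre-acquired reference lets the event path skip the lock.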

      The comment appears to refer to the plugin_foreach calls at these locations:

      https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L411

      https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.cc#L120

      One user said that when they enable the audit plugin in environments running 30k-40k QPS, with Threads_connected above 1400 and Threads_running above 500, most threads start to stall in the "init" or "freeing items" state.

      Another user said that they see a lot of threads stall in the "System lock" state. This seems to be related to the call to the mysql_audit_external_lock() function in the handler::ha_external_lock() function:

      https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/handler.cc#L6125

      https://github.com/MariaDB/server/blob/mariadb-10.3.7/sql/sql_audit.h#L285

            People

              holyfoot Alexey Botchkov
              GeoffMontee Geoff Montee (Inactive)