Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34680

Asynchronous and Buffered Logging for Audit Plugin

    XMLWordPrintable

Details

    Description

      Introduction

      The server_audit_plugin employs synchronous, per-event logging, causing performance bottlenecks. Individual file writes for each log entry result in significant I/O overhead, especially in large database environments.

      Detailed Description (Solution)

      A dedicated logging thread is implemented to reduce the main audit plugin thread's wait time from synchronous write operations.The main audit plugin thread now pushes messages into a thread-safe queue, which efficiently manages concurrent access. Upon each push, it checks if the queue has reached its capacity. If the queue is full, the main thread signals the logger thread to flush and process the messages. This approach ensures efficient logging by reducing the performance hit from frequent thread creation and maintaining smooth, high-throughput operations in write-heavy applications. To further guarantee security, we introduced an additional system variable that allows users to set a log buffer time. This enhancement enables the audit plugin to periodically poll the buffer queue. If the queue is not empty, it will flush the messages at specified intervals. This approach ensures that during high throughput periods, the buffer size will be exceeded and flushed accordingly. Conversely, during low throughput periods, users are guaranteed to receive their logs after a certain period. This dual-variable system provides a more adaptive and efficient logging mechanism, accommodating fluctuations in database activity while maintaining optimal performance and security.

      Summary of Changes

      Buffered Logging:

      • Implement an in-memory buffer to reduce disk I/O by batching log writes.

      Asynchronous Logging:

      • Delegate logging to a dedicated logger thread for independent log handling,
        minimizing wait times for main audit threads

      Dynamic Buffer Management:

      • Add log_buffer_size and log_buffer_time system variables for user-customizable
        flushing based on data volume or time, adapting to varying workloads.

      Confirm Performance Improvement:

      • An average of 15% speed improvements confirmed in benchmark tests, demonstrating the
        effectiveness of these logging enhancements in high-load environments.

      Additional Information

      System Variables Added

      server_audit_log_buffer_size

      Description: When server_audit_log_buffer_size is set, it determines the maximum number of log messages
      the buffer can hold. If the buffer reaches its limit, it flushes the logs and writes them to the file. The default value is 1, which means logs are directly written without buffering. Adjusting this value allows for batch logging, improving performance by reducing the overhead of writing each log message individually.

      Commandline: --server-audit-log-buffer-size=value

      Scope: Global

      Dynamic: Yes

      Data Type: numeric

      Default Value: 1

      Range: 1 to 9999

      server_audit_log_buffer_time

      Description: When server_audit_log_buffer_time is set, it determines how often, in milliseconds, the queue of log messages flushes itself. The default value is 1000 ms (1 second). This longer interval is due to the initial server_audit_log_buffer_size being 1, meaning logs are directly written without significant buffering. Adjusting this value allows for fine-tuning how frequently the logs are written, balancing between real-time logging and performance optimization.

      Commandline: --server-audit-log-buffer-time=value

      Scope: Global

      Dynamic: Yes

      Data Type: numeric

      Default Value: 1000

      Range: 1000 to 3600000 (1 hour)

      Results

      Below are the benchmarking results comparing three implementations: the original buffer size only, the default audit plugin, and the combination of buffer size and buffer time.

      For these tests, the buffer size was set to 100, and the buffer time was set to 1000 ms (1 second).

      Benchmarking with 64 Threads: Performance Improvements Compared to Default

      Implementation QPS TPS Avg Latency (ms) Total Transactions QPS Improvement vs TPS Improvement vs Avg Latency Improve Total Transactions Improve
      Synchronous Direct Logging (Default) 8182.0 1363.7 187.4 82011 0.0% 0.0% 0.0% 0.0%
      Asynchronous Logging with Buffer Size 9535.9 1589.3 160.8 95561 16.5% 16.5% 14.2% 16.5%
      Asynchronous Logging with Buffer Size and Buffer Time 10263.9 1710.6 149.6 102790 25.4% 25.4% 20.2% 25.3%

      Benchmarking with 256 Threads: Performance Improvements Compared to Default

      Implementation QPS TPS Avg Latency (ms) Total Transactions QPS Improvement vs TPS Improvement vs Avg Latency Improve Total Transactions Improve
      Synchronous Direct Logging (Default) 2501.8 417.0 153.4 25116 0.0% 0.0% 0.0% 0.0%
      Asynchronous Logging with Buffer Size 3225.4 537.6 118.8 32330 28.9% 28.9% 22.6% 28.7%
      Asynchronous Logging with Buffer Size and Buffer Time 3239.9 540.0 118.2 32554 29.5% 29.5% 23.0% 29.6%

      Update After Feedback

      After receiving feedback from Upstream, I was advised against adding additional threads since MariaDB already has built-in timer threads that can handle asynchronous time-based logging. Consequently, size-based logging will now operate synchronously. Utilizing the thr_time class, I have successfully implemented this approach. Below are the updated benchmarking results, comparing this implementation to the default audit plugin.

      Below are the results for New Implementation set to Default Configurations. The `log_buffer_size` is set to 1 here to mimic instant logging.

      64 Threads

      Metric Default New % Improvement
      transactions_per_sec 466.9 453.6 -2.8
      queries_per_sec 2801.3 2821.6 0.7
      total_events 28089 27271 -2.9
      avg_latency 136.8 141 3.1
      execution_time_avg 60 60.1 0.1
      total_queries 168534 163626 -2.9
      transactions 28089 27271 -2.9

      256 Threads

      Metric Default New % Improvement
      transactions_per_sec 1510 1508.4 -0.6
      queries_per_sec 9060.2 9004.8 -0.6
      total_events 91002 90448 -0.6
      avg_latency 169.1 170 0.6
      execution_time_avg 60.1 60.1 -0
      total_queries 546012 542688 -0.6
      transactions 91002 90448 -0.6

      512 Threads

      Metric Default New % Improvement
      transactions_per_sec 1819.1 1853.4 1.9
      queries_per_sec 10914.4 11120.6 1.9
      total_events 112787 116268 3.1
      avg_latency 274 266.7 -2.7
      execution_time_avg 60.4 60.6 0.3
      total_queries 676725 697615 3.1
      transactions 112787 116268 3.1

      Below are the results for New Implementation set to Buffered Configurations. The `log_buffer_size` is set to 1000 and `log_buffer_time` was set to 20 seconds here to mimic buffered logging.

      64 Threads

      Metric Default New % Improvement
      transactions_per_sec 466.9 602.4 29
      queries_per_sec 2801.3 3614.1 29
      total_events 28089 36244 29
      avg_latency 136.8 106.1 -22.5
      execution_time_avg 60 60.1 0.1
      total_queries 168534 217464 29
      transactions 28089 36244 29

      256 Threads

      Metric Default New % Improvement
      transactions_per_sec 1510 1729.5 14.5
      queries_per_sec 9060.2 10377.2 14.5
      total_events 91002 103978 14.3
      avg_latency 169.1 147.8 -12.5
      execution_time_avg 60.1 60 -0.1
      total_queries 546012 623868 14.3
      transactions 91002 103978 14.3

      512 Threads

      Metric Default New % Improvement
      transactions_per_sec 1819.1 2071.3 13.9
      queries_per_sec 10914.4 12427.9 13.9
      total_events 112787 129747 15
      avg_latency 274 238.9 -12.8
      execution_time_avg 60.4 60.6 0.3
      total_queries 676725 778484 15
      transactions 112787 129747 15

      We observe that instant logging with this implementation performs almost as efficiently as the default implementation. However, significant performance improvements are achieved when using the buffer.

      Note: We were unable to benchmark time-based buffered logging because improvements or regressions can only be detected with high throughput, which can only test size-based buffering.

      Attachments

        Activity

          People

            holyfoot Alexey Botchkov
            hashim_s hashiim
            Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.