Details
-
New Feature
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
None
-
Q2/2025 Development, Q3/2025 Maintenance
Description
Introduction
The server_audit_plugin employs synchronous, per-event logging, causing performance bottlenecks. Individual file writes for each log entry result in significant I/O overhead, especially in large database environments.
Detailed Description (Solution)
A dedicated logging thread is implemented to reduce the main audit plugin thread's wait time from synchronous write operations.The main audit plugin thread now pushes messages into a thread-safe queue, which efficiently manages concurrent access. Upon each push, it checks if the queue has reached its capacity. If the queue is full, the main thread signals the logger thread to flush and process the messages. This approach ensures efficient logging by reducing the performance hit from frequent thread creation and maintaining smooth, high-throughput operations in write-heavy applications. To further guarantee security, we introduced an additional system variable that allows users to set a log buffer time. This enhancement enables the audit plugin to periodically poll the buffer queue. If the queue is not empty, it will flush the messages at specified intervals. This approach ensures that during high throughput periods, the buffer size will be exceeded and flushed accordingly. Conversely, during low throughput periods, users are guaranteed to receive their logs after a certain period. This dual-variable system provides a more adaptive and efficient logging mechanism, accommodating fluctuations in database activity while maintaining optimal performance and security.
Summary of Changes
Buffered Logging:
- Implement an in-memory buffer to reduce disk I/O by batching log writes.
Asynchronous Logging:
- Delegate logging to a dedicated logger thread for independent log handling,
minimizing wait times for main audit threads
Dynamic Buffer Management:
- Add log_buffer_size and log_buffer_time system variables for user-customizable
flushing based on data volume or time, adapting to varying workloads.
Confirm Performance Improvement:
- An average of 15% speed improvements confirmed in benchmark tests, demonstrating the
effectiveness of these logging enhancements in high-load environments.
Additional Information
System Variables Added
************
Note: this section is outdated. Ref here and MDEV-36956 for the actual implemented variables.
************
server_audit_log_buffer_size
Description: When server_audit_log_buffer_size is set, it determines the maximum number of log messages
the buffer can hold. If the buffer reaches its limit, it flushes the logs and writes them to the file. The default value is 1, which means logs are directly written without buffering. Adjusting this value allows for batch logging, improving performance by reducing the overhead of writing each log message individually.
Commandline: --server-audit-log-buffer-size=value
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 1
Range: 1 to 9999
server_audit_log_buffer_time
Description: When server_audit_log_buffer_time is set, it determines how often, in milliseconds, the queue of log messages flushes itself. The default value is 1000 ms (1 second). This longer interval is due to the initial server_audit_log_buffer_size being 1, meaning logs are directly written without significant buffering. Adjusting this value allows for fine-tuning how frequently the logs are written, balancing between real-time logging and performance optimization.
Commandline: --server-audit-log-buffer-time=value
Scope: Global
Dynamic: Yes
Data Type: numeric
Default Value: 1000
Range: 1000 to 3600000 (1 hour)
Results
Below are the benchmarking results comparing three implementations: the original buffer size only, the default audit plugin, and the combination of buffer size and buffer time.
For these tests, the buffer size was set to 100, and the buffer time was set to 1000 ms (1 second).
Benchmarking with 64 Threads: Performance Improvements Compared to Default
Implementation | QPS | TPS | Avg Latency (ms) | Total Transactions | QPS Improvement vs | TPS Improvement vs | Avg Latency Improve | Total Transactions Improve |
Synchronous Direct Logging (Default) | 8182.0 | 1363.7 | 187.4 | 82011 | 0.0% | 0.0% | 0.0% | 0.0% |
Asynchronous Logging with Buffer Size | 9535.9 | 1589.3 | 160.8 | 95561 | 16.5% | 16.5% | 14.2% | 16.5% |
Asynchronous Logging with Buffer Size and Buffer Time | 10263.9 | 1710.6 | 149.6 | 102790 | 25.4% | 25.4% | 20.2% | 25.3% |
Benchmarking with 256 Threads: Performance Improvements Compared to Default
Implementation | QPS | TPS | Avg Latency (ms) | Total Transactions | QPS Improvement vs | TPS Improvement vs | Avg Latency Improve | Total Transactions Improve |
Synchronous Direct Logging (Default) | 2501.8 | 417.0 | 153.4 | 25116 | 0.0% | 0.0% | 0.0% | 0.0% |
Asynchronous Logging with Buffer Size | 3225.4 | 537.6 | 118.8 | 32330 | 28.9% | 28.9% | 22.6% | 28.7% |
Asynchronous Logging with Buffer Size and Buffer Time | 3239.9 | 540.0 | 118.2 | 32554 | 29.5% | 29.5% | 23.0% | 29.6% |
Update After Feedback
After receiving feedback from Upstream, I was advised against adding additional threads since MariaDB already has built-in timer threads that can handle asynchronous time-based logging. Consequently, size-based logging will now operate synchronously. Utilizing the thr_time class, I have successfully implemented this approach. Below are the updated benchmarking results, comparing this implementation to the default audit plugin.
Below are the results for New Implementation set to Default Configurations. The `log_buffer_size` is set to 1 here to mimic instant logging.
64 Threads
Metric | Default | New | % Improvement |
transactions_per_sec | 466.9 | 453.6 | -2.8 |
queries_per_sec | 2801.3 | 2821.6 | 0.7 |
total_events | 28089 | 27271 | -2.9 |
avg_latency | 136.8 | 141 | 3.1 |
execution_time_avg | 60 | 60.1 | 0.1 |
total_queries | 168534 | 163626 | -2.9 |
transactions | 28089 | 27271 | -2.9 |
256 Threads
Metric | Default | New | % Improvement |
transactions_per_sec | 1510 | 1508.4 | -0.6 |
queries_per_sec | 9060.2 | 9004.8 | -0.6 |
total_events | 91002 | 90448 | -0.6 |
avg_latency | 169.1 | 170 | 0.6 |
execution_time_avg | 60.1 | 60.1 | -0 |
total_queries | 546012 | 542688 | -0.6 |
transactions | 91002 | 90448 | -0.6 |
512 Threads
Metric | Default | New | % Improvement |
transactions_per_sec | 1819.1 | 1853.4 | 1.9 |
queries_per_sec | 10914.4 | 11120.6 | 1.9 |
total_events | 112787 | 116268 | 3.1 |
avg_latency | 274 | 266.7 | -2.7 |
execution_time_avg | 60.4 | 60.6 | 0.3 |
total_queries | 676725 | 697615 | 3.1 |
transactions | 112787 | 116268 | 3.1 |
Below are the results for New Implementation set to Buffered Configurations. The `log_buffer_size` is set to 1000 and `log_buffer_time` was set to 20 seconds here to mimic buffered logging.
64 Threads
Metric | Default | New | % Improvement |
transactions_per_sec | 466.9 | 602.4 | 29 |
queries_per_sec | 2801.3 | 3614.1 | 29 |
total_events | 28089 | 36244 | 29 |
avg_latency | 136.8 | 106.1 | -22.5 |
execution_time_avg | 60 | 60.1 | 0.1 |
total_queries | 168534 | 217464 | 29 |
transactions | 28089 | 36244 | 29 |
256 Threads
Metric | Default | New | % Improvement |
transactions_per_sec | 1510 | 1729.5 | 14.5 |
queries_per_sec | 9060.2 | 10377.2 | 14.5 |
total_events | 91002 | 103978 | 14.3 |
avg_latency | 169.1 | 147.8 | -12.5 |
execution_time_avg | 60.1 | 60 | -0.1 |
total_queries | 546012 | 623868 | 14.3 |
transactions | 91002 | 103978 | 14.3 |
512 Threads
Metric | Default | New | % Improvement |
transactions_per_sec | 1819.1 | 2071.3 | 13.9 |
queries_per_sec | 10914.4 | 12427.9 | 13.9 |
total_events | 112787 | 129747 | 15 |
avg_latency | 274 | 238.9 | -12.8 |
execution_time_avg | 60.4 | 60.6 | 0.3 |
total_queries | 676725 | 778484 | 15 |
transactions | 112787 | 129747 | 15 |
We observe that instant logging with this implementation performs almost as efficiently as the default implementation. However, significant performance improvements are achieved when using the buffer.
Note: We were unable to benchmark time-based buffered logging because improvements or regressions can only be detected with high throughput, which can only test size-based buffering.
Attachments
Issue Links
- blocks
-
MDEV-33763 Buffered error logging for Galera
-
- In Review
-
- causes
-
MDEV-36955 plugins.server_audit_buffering fails sporadically, server_audit_file_buffer_size setting not working
-
- Closed
-
-
MDEV-37089 Temporary multi-thread hangs observed when using Server Audit Log Buffering, incosistent server_audit_file_buffer_size increase operation
-
- Closed
-
-
MDEV-37122 Audit log mutex locking issue upon changing server_audit_file_buffer_size, UBSAN null-pointer-use in logger_sync
-
- Closed
-
-
MDEV-37158 The server_audit_file_buffer_size variable does not show correct values when set, and operates differently from other variables when undersized
-
- Closed
-
-
MDEV-37159 Incorrect server_audit_file_buffer_size operation according to specification
-
- Closed
-
-
MDEV-37160 When >=2 clients are in use, the server_audit_file_buffer_size setting is not honored
-
- Closed
-
-
MDEV-37162 Audit log buffering works correctly with long strings when using 8192 (8k) bytes, but not with 8193
-
- Closed
-
-
MDEV-37163 plugins.server_audit_buffering MTR test fails in various ways in patch v3, v4 and v5
-
- Closed
-
-
MDEV-37251 On the second Audit log flush/write, the number of written bytes is incorrect
-
- Closed
-
-
MDEV-37253 Server audit buffering does not honor the server_audit_file_rotate_size setting
-
- Closed
-
-
MDEV-37393 server_audit_file_buffer_size max value to be increased
-
- Open
-
- includes
-
MDEV-36956 Document two new audit log variables
-
- In Review
-
-
MDEV-37161 Plugin version needs increasing, small error log \n issue
-
- Closed
-
-
MDEV-37355 Performance testing of final Audit log buffering patch
-
- Closed
-
- is blocked by
-
MDEV-36955 plugins.server_audit_buffering fails sporadically, server_audit_file_buffer_size setting not working
-
- Closed
-
- relates to
-
MDEV-37501 plugins.server_audit Fails in 12.1+
-
- Closed
-