XML

Word

Printable

Details

Type: New Feature
Status: Stalled (View Workflow)
Priority: Critical
Resolution: Unresolved
Fix Version/s: 11.9
Component/s: Plugin - Audit
Labels:

Description

Introduction

The server_audit_plugin employs synchronous, per-event logging, causing performance bottlenecks. Individual file writes for each log entry result in significant I/O overhead, especially in large database environments.

Detailed Description (Solution)

A dedicated logging thread is implemented to reduce the main audit plugin thread's wait time from synchronous write operations.The main audit plugin thread now pushes messages into a thread-safe queue, which efficiently manages concurrent access. Upon each push, it checks if the queue has reached its capacity. If the queue is full, the main thread signals the logger thread to flush and process the messages. This approach ensures efficient logging by reducing the performance hit from frequent thread creation and maintaining smooth, high-throughput operations in write-heavy applications. To further guarantee security, we introduced an additional system variable that allows users to set a log buffer time. This enhancement enables the audit plugin to periodically poll the buffer queue. If the queue is not empty, it will flush the messages at specified intervals. This approach ensures that during high throughput periods, the buffer size will be exceeded and flushed accordingly. Conversely, during low throughput periods, users are guaranteed to receive their logs after a certain period. This dual-variable system provides a more adaptive and efficient logging mechanism, accommodating fluctuations in database activity while maintaining optimal performance and security.

Summary of Changes

Buffered Logging:

Implement an in-memory buffer to reduce disk I/O by batching log writes.

Asynchronous Logging:

Delegate logging to a dedicated logger thread for independent log handling,
minimizing wait times for main audit threads

Dynamic Buffer Management:

Add log_buffer_size and log_buffer_time system variables for user-customizable
flushing based on data volume or time, adapting to varying workloads.

Confirm Performance Improvement:

An average of 15% speed improvements confirmed in benchmark tests, demonstrating the
effectiveness of these logging enhancements in high-load environments.

Additional Information

System Variables Added

server_audit_log_buffer_size

Description: When server_audit_log_buffer_size is set, it determines the maximum number of log messages
the buffer can hold. If the buffer reaches its limit, it flushes the logs and writes them to the file. The default value is 1, which means logs are directly written without buffering. Adjusting this value allows for batch logging, improving performance by reducing the overhead of writing each log message individually.

Commandline: --server-audit-log-buffer-size=value

Scope: Global

Dynamic: Yes

Data Type: numeric

Default Value: 1

Range: 1 to 9999

server_audit_log_buffer_time

Description: When server_audit_log_buffer_time is set, it determines how often, in milliseconds, the queue of log messages flushes itself. The default value is 1000 ms (1 second). This longer interval is due to the initial server_audit_log_buffer_size being 1, meaning logs are directly written without significant buffering. Adjusting this value allows for fine-tuning how frequently the logs are written, balancing between real-time logging and performance optimization.

Commandline: --server-audit-log-buffer-time=value

Scope: Global

Dynamic: Yes

Data Type: numeric

Default Value: 1000

Range: 1000 to 3600000 (1 hour)

Results

Below are the benchmarking results comparing three implementations: the original buffer size only, the default audit plugin, and the combination of buffer size and buffer time.

For these tests, the buffer size was set to 100, and the buffer time was set to 1000 ms (1 second).

Benchmarking with 64 Threads: Performance Improvements Compared to Default

Implementation	QPS	TPS	Avg Latency (ms)	Total Transactions	QPS Improvement vs	TPS Improvement vs	Avg Latency Improve	Total Transactions Improve
Synchronous Direct Logging (Default)	8182.0	1363.7	187.4	82011	0.0%	0.0%	0.0%	0.0%
Asynchronous Logging with Buffer Size	9535.9	1589.3	160.8	95561	16.5%	16.5%	14.2%	16.5%
Asynchronous Logging with Buffer Size and Buffer Time	10263.9	1710.6	149.6	102790	25.4%	25.4%	20.2%	25.3%

Benchmarking with 256 Threads: Performance Improvements Compared to Default

Implementation	QPS	TPS	Avg Latency (ms)	Total Transactions	QPS Improvement vs	TPS Improvement vs	Avg Latency Improve	Total Transactions Improve
Synchronous Direct Logging (Default)	2501.8	417.0	153.4	25116	0.0%	0.0%	0.0%	0.0%
Asynchronous Logging with Buffer Size	3225.4	537.6	118.8	32330	28.9%	28.9%	22.6%	28.7%
Asynchronous Logging with Buffer Size and Buffer Time	3239.9	540.0	118.2	32554	29.5%	29.5%	23.0%	29.6%

Update After Feedback

After receiving feedback from Upstream, I was advised against adding additional threads since MariaDB already has built-in timer threads that can handle asynchronous time-based logging. Consequently, size-based logging will now operate synchronously. Utilizing the thr_time class, I have successfully implemented this approach. Below are the updated benchmarking results, comparing this implementation to the default audit plugin.

Below are the results for New Implementation set to Default Configurations. The `log_buffer_size` is set to 1 here to mimic instant logging.

64 Threads

Metric	Default	New	% Improvement
transactions_per_sec	466.9	453.6	-2.8
queries_per_sec	2801.3	2821.6	0.7
total_events	28089	27271	-2.9
avg_latency	136.8	141	3.1
execution_time_avg	60	60.1	0.1
total_queries	168534	163626	-2.9
transactions	28089	27271	-2.9

256 Threads

Metric	Default	New	% Improvement
transactions_per_sec	1510	1508.4	-0.6
queries_per_sec	9060.2	9004.8	-0.6
total_events	91002	90448	-0.6
avg_latency	169.1	170	0.6
execution_time_avg	60.1	60.1	-0
total_queries	546012	542688	-0.6
transactions	91002	90448	-0.6

512 Threads

Metric	Default	New	% Improvement
transactions_per_sec	1819.1	1853.4	1.9
queries_per_sec	10914.4	11120.6	1.9
total_events	112787	116268	3.1
avg_latency	274	266.7	-2.7
execution_time_avg	60.4	60.6	0.3
total_queries	676725	697615	3.1
transactions	112787	116268	3.1

Below are the results for New Implementation set to Buffered Configurations. The `log_buffer_size` is set to 1000 and `log_buffer_time` was set to 20 seconds here to mimic buffered logging.

64 Threads

Metric	Default	New	% Improvement
transactions_per_sec	466.9	602.4	29
queries_per_sec	2801.3	3614.1	29
total_events	28089	36244	29
avg_latency	136.8	106.1	-22.5
execution_time_avg	60	60.1	0.1
total_queries	168534	217464	29
transactions	28089	36244	29

256 Threads

Metric	Default	New	% Improvement
transactions_per_sec	1510	1729.5	14.5
queries_per_sec	9060.2	10377.2	14.5
total_events	91002	103978	14.3
avg_latency	169.1	147.8	-12.5
execution_time_avg	60.1	60	-0.1
total_queries	546012	623868	14.3
transactions	91002	103978	14.3

512 Threads

Metric	Default	New	% Improvement
transactions_per_sec	1819.1	2071.3	13.9
queries_per_sec	10914.4	12427.9	13.9
total_events	112787	129747	15
avg_latency	274	238.9	-12.8
execution_time_avg	60.4	60.6	0.3
total_queries	676725	778484	15
transactions	112787	129747	15

We observe that instant logging with this implementation performs almost as efficiently as the default implementation. However, significant performance improvements are achieved when using the buffer.

Note: We were unable to benchmark time-based buffered logging because improvements or regressions can only be detected with high throughput, which can only test size-based buffering.

Attachments

Activity

People

Assignee:: Alexey Botchkov

Reporter:: hashiim

Votes:: 1 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2024-07-31 17:34

Updated:: Yesterday 16:22

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.