[MXS-3986] Binlog compression and archiving Created: 2022-02-06  Updated: 2023-11-20  Resolved: 2023-11-20

Status: Closed
Project: MariaDB MaxScale
Component/s: binlogrouter
Affects Version/s: 22.08
Fix Version/s: 24.02.0

Type: New Feature Priority: Major
Reporter: Pon Suresh Pandian (Inactive) Assignee: Niclas Antti
Resolution: Fixed Votes: 0
Labels: Maxscale
Environment:

All Linux Environments


Sprint: MXS-SPRINT-186, MXS-SPRINT-187, MXS-SPRINT-188, MXS-SPRINT-189, MXS-SPRINT-190, MXS-SPRINT-191, MXS-SPRINT-192, MXS-SPRINT-193, MXS-SPRINT-194

 Description   

This Jira was originally "Stream binary logs to s3 bucket".
It has been implemented with two new features: compression and archiving.
See https://jira.mariadb.org/browse/MXS-4867.

Original Description:

Hi Team,

We need to stream the binary logs directly to an S3 bucket without storing them locally.

This is a new feature request; please let me know your thoughts.

I have tested this in my test environment as follows:

Step 1 :

Created a new S3 bucket and granted full access to it.

Step 2 :

Installed MaxScale 6 and configured the binlog router.

Step 3 :

Then mounted that S3 bucket locally on my MaxScale server.

Example :

[root@centos14 /]# df -h | grep bucket
s3fs                     256T     0  256T   0% /mariadb-s3-bucket
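For reference, a mount like the one above is typically set up with s3fs-fuse. The fstab entry below is an illustrative sketch: the bucket name and mount point are taken from the example above, while the mount options (`_netdev`, `allow_other`, `use_cache`) are assumptions to adjust for your environment, not settings from this report.

```
# /etc/fstab — illustrative s3fs-fuse entry; options are assumptions
mariadb-s3-bucket /mariadb-s3-bucket fuse.s3fs _netdev,allow_other,use_cache=/tmp 0 0
```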

Step 4 :

Added this mount path to my MaxScale config.

Example :

[replication-service]
type=service
router=binlogrouter
cluster=MariaDB-Monitor
select_master=true
expire_log_minimum_files=10000
user=xxx
password=xxxx
datadir=/mariadb-s3-bucket/binlogs/

Step 5:

Started the MaxScale service. The binlogs are now stored in the S3 bucket.

[root@centos14 binlogs]# pwd
/mariadb-s3-bucket/binlogs
[root@centos14 binlogs]# 
[root@centos14 binlogs]# ls -lrth
total 4.0K
-rw-r--r--. 1 maxscale maxscale   12 Jan 23 08:03 requested_rpl_state
-rw-r--r--. 1 maxscale maxscale 1.8K Jan 23 08:03 centos11-bin.000001
-rw-r--r--. 1 maxscale maxscale   48 Jan 23 08:03 binlog.index
-rw-r--r--. 1 maxscale maxscale   12 Jan 23 08:08 rpl_state
-rw-r--r--. 1 maxscale maxscale  258 Jan 23 11:32 master-info.json

Problem statement 1 :

Streaming is much slower than writing to local storage. I loaded data using sysbench, but streaming it to the S3 bucket takes considerably longer. The transfer speed is about 4 MB/s and the upload is single-threaded.

Is it possible to stream the binlogs without mounting the bucket locally? Alternatively, how can we speed up this process with the binlog router?
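The single-threaded bottleneck described above is usually worked around by splitting the data into parts and transferring them in parallel, which is what S3 multipart upload enables. The sketch below only demonstrates that split-and-parallelize pattern; `upload_part` is a hypothetical stand-in for a real S3 call (e.g. boto3's `upload_part`), not anything the binlogrouter provides.

```python
# Sketch: why parallel multipart transfer beats a single-threaded stream.
# upload_part() is a placeholder for a real S3 multipart-upload call;
# here it just returns an ETag-like digest for each part.
from concurrent.futures import ThreadPoolExecutor
import hashlib
import os

CHUNK = 5 * 1024 * 1024  # S3 multipart parts must be >= 5 MiB (except the last)

def split_parts(data: bytes, chunk: int = CHUNK) -> list[bytes]:
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]

def upload_part(part: bytes) -> str:
    # Placeholder for the network call.
    return hashlib.md5(part).hexdigest()

def parallel_upload(data: bytes, workers: int = 4):
    parts = split_parts(data)
    # Each part can be "uploaded" concurrently instead of serially.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        etags = list(pool.map(upload_part, parts))
    return parts, etags

data = os.urandom(12 * 1024 * 1024)  # 12 MiB of test data -> 3 parts
parts, etags = parallel_upload(data)
assert b"".join(parts) == data       # parts reassemble to the original
print(len(parts), len(etags))        # → 3 3
```

With a real client the parts would go over separate connections, so throughput scales with the worker count rather than being capped by one stream.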

Problem statement 2 :

If MaxScale crashes unexpectedly, it removes the existing binlogs from the S3 bucket.



 Comments   
Comment by markus makela [ 2023-07-07 ]

An additional improvement would be to compress the files before uploading them. This could be done as a pre-purge step in the binlogrouter, so that older files are compressed instead of removed outright. That way more data can be kept available, at the cost of slower access to it.

One way to have different tiers of storage would be:

  1. uncompressed binlogs: this is what we have now
  2. compressed binlogs: could be as simple as a .gz suffix to detect what is compressed and what isn't
  3. compressed binlogs in S3: the ultimate storage where archived binlogs end up going
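A pre-purge compression step like the one described could look roughly like the sketch below. The function name, directory layout, and retention parameter are illustrative assumptions, not MaxScale's actual implementation; it only shows the idea of aging tier-1 (plain) files into tier-2 (`.gz`) files.

```python
# Sketch of a pre-purge compression step: gzip binlogs older than the
# newest N instead of deleting them. Names and layout are illustrative,
# not MaxScale internals.
import gzip
import os
import shutil

def compress_old_binlogs(binlog_dir: str, keep_uncompressed: int = 2) -> list[str]:
    # Binlog files follow a name.NNNNNN pattern; skip already-compressed ones.
    logs = sorted(f for f in os.listdir(binlog_dir)
                  if ".000" in f and not f.endswith(".gz"))
    compressed = []
    for name in logs[:-keep_uncompressed] if keep_uncompressed else logs:
        src = os.path.join(binlog_dir, name)
        dst = src + ".gz"   # the .gz suffix marks a tier-2 (compressed) file
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        os.remove(src)      # replace the plain file with its .gz version
        compressed.append(dst)
    return compressed
```

Running this before the normal purge keeps the most recent files directly readable while older ones stay available in compressed form, ready for a further archiving step to S3.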
Comment by Niclas Antti [ 2023-11-20 ]

Documentation has its own Jira:
https://jira.mariadb.org/browse/MXS-4867

Generated at Thu Feb 08 04:25:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.