MariaDB Server / MDEV-9345

Replication to enable filtering on master

    Description

      Problem:

      Currently, there are two points where replication filters can be applied:
      1. Filtering events when writing to the binary log. This approach can break point-in-time recovery, as data that has been applied on the server will not be in the binary log.
      2. Filtering events on the slave, either by the SQL thread (for replicate_* filters) or by the IO thread (for filters set with CHANGE MASTER TO). This approach requires all events to be transmitted to the slave and written to the relay log in order to check whether or not an event should be replicated. This has a few disadvantages:

      • If replication filters are configured to discard sensitive data, the events containing that data are still sent over the network to the slave.
      • Increased network traffic.
      • Increased slave footprint (extra memory and storage space for events that will never be executed).
      • Each slave maintains its own replication filter configuration, which increases the risk of data divergence between slaves whenever the configuration files are modified.

      The above limitations are paraphrased from oli, thanks for a great summary!

      Solution:

      Add another layer at which replication can filter events: in the binlog dump threads, before events are sent to a slave. These filters would be configured in the master's configuration files and would apply to all binlog dump threads. This would address all of the limitations above:
      1. Point-in-time recovery would keep working, because the underlying binary logs would still contain all events and would therefore remain consistent with the data state of the server.
      2. Filtered events would no longer be sent over the network, resulting in 1) better containment of sensitive data, 2) reduced network bandwidth, and 3) a reduced slave footprint.
      3. There would be less risk of data divergence between slaves that should share the same replication filtering rules.
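A minimal sketch of where such a filter would sit, written in Python for readability (the server itself is C++). `Event`, `dump_events`, and `should_send` are hypothetical names: the real dump thread works on raw binary events, and a production version would also have to deal with details such as transaction boundaries and GTID/position tracking, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Event:
    # Hypothetical, simplified stand-in for a binlog event; real events are
    # binary records read from the binlog file.
    type_name: str
    db: Optional[str]
    table: Optional[str]


def dump_events(binlog: Iterable[Event],
                should_send: Callable[[Event], bool]) -> Iterator[Event]:
    """Hypothetical master-side filter inside a binlog dump loop.

    The binlog is only read, never rewritten, so it still holds every event
    for point-in-time recovery; rejected events are simply not sent."""
    for ev in binlog:
        if should_send(ev):
            yield ev  # in the server this would be a network write to the slave


if __name__ == "__main__":
    binlog = [
        Event("Write_rows", "shop", "orders"),
        Event("Write_rows", "hr", "salaries"),  # sensitive: stays on the master
    ]
    sent = list(dump_events(binlog, lambda ev: ev.db != "hr"))
    print([ev.table for ev in sent])
    print(len(binlog))
```

Because the predicate is evaluated on the master, every slave connected to it receives the same filtered stream instead of each slave applying its own rules.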

      The new options would mimic the existing replication filter options, with the prefix changed from replicate_ to binlog_dump_. For example, replicate_do_table would get a master-side counterpart named binlog_dump_do_table.
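A minimal Python sketch of how the proposed binlog_dump_*_table options could be evaluated if they mimic the replicate_*_table options. The class `BinlogDumpFilter` and its method `table_ok` are hypothetical. The evaluation order (exact do, exact ignore, wildcard do, wildcard ignore, then a default that depends on whether any "do" rule exists) is modeled on the documented rules for the replicate_*_table options, simplified to a single table and to case-sensitive name matching.

```python
import re
from dataclasses import dataclass, field


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an SQL LIKE-style pattern (% and _) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")          # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")           # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


@dataclass
class BinlogDumpFilter:
    # Hypothetical counterparts of the proposed binlog_dump_* options.
    do_table: set = field(default_factory=set)           # binlog_dump_do_table
    ignore_table: set = field(default_factory=set)       # binlog_dump_ignore_table
    wild_do_table: list = field(default_factory=list)    # binlog_dump_wild_do_table
    wild_ignore_table: list = field(default_factory=list)  # binlog_dump_wild_ignore_table

    def table_ok(self, db: str, table: str) -> bool:
        """Return True if events for db.table should be sent to slaves."""
        name = f"{db}.{table}"
        if not (self.do_table or self.ignore_table
                or self.wild_do_table or self.wild_ignore_table):
            return True                 # no table rules at all: send everything
        if name in self.do_table:
            return True
        if name in self.ignore_table:
            return False
        if any(like_to_regex(p).fullmatch(name) for p in self.wild_do_table):
            return True
        if any(like_to_regex(p).fullmatch(name) for p in self.wild_ignore_table):
            return False
        # The table matched no rule: send it only if no "do" rules exist.
        return not (self.do_table or self.wild_do_table)
```

For example, `BinlogDumpFilter(do_table={"shop.orders"}, wild_ignore_table=["hr.%"])` would send `shop.orders`, withhold `hr.salaries`, and also withhold `shop.customers`, because once any "do" rule exists, unmatched tables are not sent.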

      A couple tips from knielsen:

      The dump thread code IIRC is very convoluted, so adding something like this might be good to combine with some refactoring.
      One challenge may be how much information the dump thread has available to filter on. The dump thread currently does not construct the Log_event subclasses in memory, so this task may require it to optionally do so. The dump thread probably cannot parse SQL to find the accessed tables, but some things can be done.
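As an illustration of what "some things can be done" might mean: with row-based logging, row events are preceded by a Table_map event that names the database and table, so a dump thread could filter on those names by reading a few bytes rather than building a full Log_event object. Statement-based Query events carry only the default database, so table-level filtering of them would still need SQL parsing. This Python sketch assumes the binlog v4 event layout (19-byte common header, 8-byte Table_map post-header) and ignores checksums and other details; all function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional, Tuple

COMMON_HEADER_LEN = 19          # binlog v4 common event header
TABLE_MAP_EVENT = 19            # type code of Table_map_log_event
TABLE_MAP_POST_HEADER_LEN = 8   # 6-byte table id + 2-byte flags (assumed layout)
HEADER_FMT = "<IBIIIH"          # timestamp, type, server_id, length, next_pos, flags


class EventHeader(NamedTuple):
    timestamp: int
    type_code: int
    server_id: int
    event_length: int
    next_position: int
    flags: int


def parse_header(raw: bytes) -> EventHeader:
    """Decode the little-endian 19-byte common header."""
    return EventHeader(*struct.unpack_from(HEADER_FMT, raw, 0))


def table_map_names(raw: bytes) -> Optional[Tuple[str, str]]:
    """Return (db, table) for a Table_map event, else None, reading only
    the bytes needed instead of constructing a full event object."""
    if parse_header(raw).type_code != TABLE_MAP_EVENT:
        return None
    pos = COMMON_HEADER_LEN + TABLE_MAP_POST_HEADER_LEN
    db_len = raw[pos]                       # 1-byte length prefix
    db = raw[pos + 1:pos + 1 + db_len].decode("utf-8")
    pos += 1 + db_len + 1                   # skip length, name, NUL terminator
    tbl_len = raw[pos]
    table = raw[pos + 1:pos + 1 + tbl_len].decode("utf-8")
    return db, table


def build_table_map_event(db: str, table: str, table_id: int = 1) -> bytes:
    """Build a synthetic Table_map event for testing; a real event would
    continue with column metadata after the table name."""
    post_header = table_id.to_bytes(6, "little") + b"\x00\x00"  # table id + flags
    body = (bytes([len(db)]) + db.encode("utf-8") + b"\x00"
            + bytes([len(table)]) + table.encode("utf-8") + b"\x00")
    length = COMMON_HEADER_LEN + len(post_header) + len(body)
    header = struct.pack(HEADER_FMT, 0, TABLE_MAP_EVENT, 1, length, 0, 0)
    return header + post_header + body
```

A filter such as the one sketched for the binlog_dump_* options could then be applied to the (db, table) pair, with the dump thread remembering the decision for the row events that follow the Table_map event.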

            People

              Assignee: Unassigned
              Reporter: VAROQUI Stephane (stephane@skysql.com)
              Votes: 2
              Watchers: 5
