[MXS-701] Add binlog filtering to MaxScale Created: 2016-04-28 Updated: 2018-09-26 Resolved: 2018-09-26 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | binlogrouter |
| Affects Version/s: | None |
| Fix Version/s: | 2.3.0 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Kolbe Kegel (Inactive) | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | None | ||
| Sprint: | 2017-45, 2017-46, MXS-SPRINT-67 |
| Description |
|
There are many use cases where a user wants to replicate only some subset of a master's objects. In some cases, only a single table should be replicated. In other cases, a single database should be replicated. In other cases, perhaps a single table should be excluded.

Today, this can be done using the replicate-* options on the slave. However, those options affect only the slave SQL thread. The slave IO thread still fetches the entire binlog from the master and writes the whole thing to disk in the form of relay logs; the SQL thread then skips over potentially many gigabytes of log entries to apply the limited subset that matches the configured replicate-* entries. This means the master is tied up transferring enormous amounts of unnecessary data over the network, all the while waiting for the slave IO thread to write that data to disk.

An alternative is to use the binlog-* options on the master to prevent it from writing certain log entries to disk. This is unacceptable: it means the binlog is no longer the authoritative record, with the result that the binlog cannot be used for roll-forward recovery. It also precludes serving separate sets of slaves, each with different requirements.

Another solution could use the MaxScale binlog router. If MaxScale were located close to the master, to minimize network bottlenecks, it could write binlogs locally and provide filtering to clients. The binlog on the master would still be authoritative, MaxScale would act as an async (or semi-sync) DR copy of the binlog, and the CPU load of processing the per-slave filters would fall on the MaxScale node rather than on the master.

To facilitate this use case, there are several different implementation possibilities:

1) Separate listeners, each with its own filters. Slaves that need only a specific subset of log entries would connect to the listener that was configured to serve only that subset.

This feature would add enormous value to MaxScale and to the MaxScale binlog router. |
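As a sketch of possibility 1), a MaxScale configuration could expose two binlog services, each with its own filter, so a slave picks its subset simply by choosing which port it connects to. This is only an illustration: the service, filter, and listener names are invented, and the binlogfilter module with a `match` parameter reflects what MaxScale 2.3 eventually shipped; exact parameter names and the binlogrouter's required settings (server_id, etc.) should be checked against the released documentation.

```
# Hypothetical maxscale.cnf fragment (names are illustrative)

[SalesOnlyFilter]
type=filter
module=binlogfilter
match=/sales[.]/

[SalesBinlogService]
type=service
router=binlogrouter
filters=SalesOnlyFilter

[SalesBinlogListener]
type=listener
service=SalesBinlogService
protocol=MariaDBClient
port=8306

[FullBinlogService]
type=service
router=binlogrouter

[FullBinlogListener]
type=listener
service=FullBinlogService
protocol=MariaDBClient
port=8307
```

A slave needing only the filtered subset would point its CHANGE MASTER at port 8306, while a full DR replica would use port 8307.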
| Comments |
| Comment by markus makela [ 2017-08-22 ] |
|
This should be possible to implement as a MaxScale filter in 2.2 now that the binlogrouter routes the replies to the slaves via the normal routing mechanisms. |
| Comment by Massimiliano Pinto (Inactive) [ 2017-11-21 ] |
|
Next additions: 16MB payloads and DDLs are in 'develop' |
| Comment by Massimiliano Pinto (Inactive) [ 2017-11-21 ] |
|
Current implementation is with ROW based replication: when TABLE_MAP event is seen the db/table are checked against configuration In case of match TABLE_MAP event and all next events, i.e. Write_rows_v1 etc are "replaced" by a RAND_EVENT with:
This keeps the packet sequence but SAVES the bandwidth by not sending real data but only an empty rand_event: 19 bytes. When XID_EVENT is seen (commit) the filtering ends and this event is altered with next_pos = 0 Note: all packets belonging to a large event, i.e 34 MBytes are replaced by empty rand_events Example of replaced event /Unable to render embedded object: File (50521 SET skip_replication=1//) not found./; 700101 1:00:00 server id 0 end_log_pos 0 CRC32 0x354f3a99 Rand SET @@RAND_SEED1=894384793, @@RAND_SEED2=51539607552/!/; Commit example: 171113 18:53:21 server id 10124 end_log_pos 0 CRC32 0xa93380fa Xid = 11967 |