[MDEV-12954] Parallel replication does not account for multiple databases Created: 2017-05-30 Updated: 2017-07-07 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Kai | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
In a situation where there are multiple databases on a single MariaDB server, a slave connected to the master will not apply statements from separate databases in parallel. Normally, a transaction will only affect a single database; even if it affects a large number of tables, all of those tables will still be in the same DB. Therefore, it seems like a singularly obvious optimization to parallel replication for updates to separate databases to be applied simultaneously when possible (with, if necessary, safeguards to ensure that transactions which do affect multiple databases block subsequent transactions to any of the affected databases so as not to apply updates out of order). |
| Comments |
| Comment by Elena Stepanova [ 2017-06-04 ] | |||
|
Could you please provide an example that makes you think that parallel replication is not applied to transactions executed in separate databases? | |||
| Comment by Kai [ 2017-06-05 ] | |||
|
In my case, I set up a master with several databases, with a slave that replicates from it. Even with extremely heavy work on the master across several databases, parallel replication enabled on the slave, and binlog_commit_wait_count set to a non-zero value, I always see one thread on the slave applying updates and all of the others waiting for a prior transaction to commit. If there's a better way to prove it either way, I'd love to know. Not only that, but per the documentation, parallel replication is applied for separate domain IDs; since all of the transactions seem to share a domain ID, it follows they would be applied serially. | |||
| Comment by Elena Stepanova [ 2017-06-25 ] | |||
|
If you are using in-order replication, then yes, while the transactions are still executed in parallel (whenever possible), the commits happen in the order of appearance. If you want a more aggressive approach, you need to configure out-of-order replication. The domain ID is a dynamic variable which can be set on a session level, if you are sure that certain connections are operating only withing certain databases, make them set different domain IDs. You can also explore the optimistic parallel replication. I will convert it into a feature request and assign to Elkin to see if he finds a realistic path for improvement here. | |||
| Comment by Andrei Elkin [ 2017-07-06 ] | |||
|
Elena, your direction is pretty much correct. https://mariadb.com/kb/en/mariadb/parallel-replication/#out-of-order-parallel-replication has some examples how to use @@session.gtid_domain_id. Yet when the user can't do it we shall consider possibilities to automate such setting which deems quite feasible. Consider the server is configured with something like user specific mapping db:s (or even subset of db:s) into
The framework should also provide a notion of the default domain id in cases
can't be mapped. &rei. | |||
| Comment by Daniel Black [ 2017-07-06 ] | |||
|
https://openquery.com.au/blog/optimising-multithreaded-replication as an example too | |||
| Comment by Andrei Elkin [ 2017-07-06 ] | |||
|
Thanks, Daniel. > The premise of this implementation is that each database username uniquely identifies an application It's a nice idea! Indeed, the mapping can use various keys, like the username. &rei. | |||
| Comment by Kai [ 2017-07-06 ] | |||
|
Oh, that's a great idea. I'll just give every DB session SUPER privileges which can't POSSIBLY have any negative consequences. I guess there's the option of wrapping all of this into a stored procedure, but there IS no "order of appearance" between multiple databases. They are separate domains, and in many cases, the queries are happening concurrently in the master. Unless there is a global lock across all databases, the log simply is going to recording which statements complete first, which isn't even fully reflective of the order in which they were issued! If a user wants fully serial replication, they can turn parallel replication off. If they turn parallel replication on, then the threads should consume statements whenever it can be determined that there is no mutual conflict or order-of-execution issues with applying them. | |||
| Comment by Daniel Black [ 2017-07-06 ] | |||
|
Look closer, each session doesn't have SUPER privs, they are only used temporary in the procedure due to it being "SECURITY DEFINER". Anyway, just an example, nothing more. | |||
| Comment by Kai [ 2017-07-07 ] | |||
|
Yeah, that's why I said there's the option of storing it in the procedure; I completely spaced for a second that that was possible. (You don't have to bring them into this just because you forgot refrigerators existed for a second there.) It still seems cumbersome, to say the least, but at least it is an option. |