[MXS-4411] Loading a SQL dump sends queries to the replica, breaking it (GTID under strict mode) Created: 2022-11-22 Updated: 2023-02-08 Resolved: 2023-02-08 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | N/A |
| Affects Version/s: | 6.4.3 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Assen Totin (Inactive) | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
I've been chasing a weird thing (bug?) which happens from time to time (once every 3-6 months) and looks like this:
An attempt was made to binlog GTID 0-11-170271865 which would create an out-of-order sequence number with existing GTID 0-12-170271865, and gtid strict mode is enabled

The binlog on the master and the relay log are identical, as expected; the binlog on the replica shows a GTID created with the local server ID (but no data), and then replication stops. This has happened enough times to rule out any accidental cause, Mercury retrograde etc. Both the master and the slave are firewalled off, so there is no way any connection could have been made directly to them - so this must have come from MaxScale. This happens on both loaded and test systems, so load does not seem to play any role.

The MaxScale config is pretty straightforward, with a simple read-write split. It does have causal reads tracking now set to "global", but the same issue was present before this option existed; all in all, this has happened on all MaxScale mainlines from 2.3 to 6. We do have "use_sql_variables_in=master" but I don't see how this would cause the observed effect.

The most frustrating thing is that this happens rarely, and by far not every data load ends like this; however, it only ever happens on dump loads, so it must be somehow related.

I'm attaching the relevant parts (stripped of the repetitive INSERT statements) of the master log, relay log and slave log from a breakage that happened a few days ago, plus the header of a dump from HeidiSQL which caused a similar breakage a few months ago (it then broke literally after the first CREATE TABLE statement, on the USE statement shown - but this time it broke halfway through the dump, which I'm still waiting to receive).

I understand these are hardly enough and I cannot give a way to reliably reproduce this, but maybe somebody has seen or heard of something similar? Does this ring a bell? |
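For readers unfamiliar with the error text, here is a rough Python sketch of the ordering rule it describes. It assumes the usual MariaDB GTID layout of domain ID, server ID and sequence number separated by dashes, and it models strict mode as "a new GTID in a domain must have a higher sequence number than the existing one". The names `Gtid`, `parse_gtid` and `violates_strict_mode` are made up for illustration; this is not how the server implements the check.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """Hypothetical container for a MariaDB-style GTID."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Split 'domain-server-sequence' into its three integer parts."""
    domain, server, seq = (int(part) for part in text.strip().split("-"))
    return Gtid(domain, server, seq)


def violates_strict_mode(new: Gtid, existing: Gtid) -> bool:
    """Simplified model (an assumption): within one domain, the new
    sequence number must be strictly greater than the existing one."""
    return new.domain_id == existing.domain_id and new.seq_no <= existing.seq_no


# The two GTIDs quoted in the error message above.
existing = parse_gtid("0-12-170271865")
attempted = parse_gtid("0-11-170271865")
conflict = violates_strict_mode(attempted, existing)
```

Under this simplified model the two GTIDs conflict because they share domain 0 and the same sequence number while differing only in server ID, which would fit a local write on the replica colliding with an event replicated from the master.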
| Comments |
| Comment by markus makela [ 2022-11-25 ] |
|
Is it possible for you to try and reproduce this with log_info enabled for MaxScale? Without MaxScale logs and with no way to consistently reproduce it, it's quite hard to say what might be happening. The only way MaxScale would send a write to a slave server would be if a bug like |
| Comment by Johan Wikman [ 2022-11-25 ] |
|
|
| Comment by markus makela [ 2022-12-01 ] |
|
In theory this could also be caused by |
| Comment by markus makela [ 2023-02-08 ] |
|
Closing this as Incomplete since there's a possibility that this was caused by |