[MDEV-8354] out-of-order error with --gtid-ignore-duplicates and row-based replication
Created: 2015-06-22 | Updated: 2015-08-01 | Resolved: 2015-06-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.0.20 |
| Fix Version/s: | 10.0.21, 10.1.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Matt Neth | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | gtid |
| Environment: | CentOS 7.1.1503 64bit |
| Description |
|
I'm trying to set up multiple masters with multi-source replication slaves, but I'm getting the following error even with gtid-ignore-duplicates ON:
If I do just a single insert/update query, it seems to work just fine. Under any sort of load (even mysqlslap), replication stops for one of the slave connections on my slave with the error above.
Here's my setup: A and B are masters. C is the slave. Replication is currently as follows (all are using master_use_gtid=current_pos):
The MariaDB-server package is installed from the repo:
Server A config:
Server B config:
Server C config:
Do a single insert/update/create and it'll work just fine. No error.
Now Server C's slave connection to Server B will stop (Slave_SQL_Running is No) with the following error (the GTID position isn't the same every time):
Server C's gtid_slave_pos is, however, 1-1-111, since the transactions were applied through the replication connection to Server A.
If I simply do a {{reset master;}} on Server C and then {{start slave 'serverB';}}, the error goes away. Running mysqlslap again causes the error again as well.
I spoke to knielsen in the #maria IRC chat about this last week and he seemed to be puzzled by the issue as well. |
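The checks and the workaround described above can be sketched as the statements below, run on Server C. This is only a sketch under assumptions: the connection name 'serverB' comes from the report, while the choice of variables to inspect is an assumption about what is useful to look at and is not part of the original report. Note that RESET MASTER deletes all binary logs on the server it is run on, so it is not something to run casually on a production server.

```sql
-- Sketch only; run on Server C (the multi-source slave).

-- Find the connection that stopped: look for Slave_SQL_Running = No
-- and the text in Last_SQL_Error.
SHOW ALL SLAVES STATUS;

-- Compare the position applied by the slave threads with the position
-- recorded in Server C's own binary log.
SELECT @@GLOBAL.gtid_slave_pos,
       @@GLOBAL.gtid_binlog_pos,
       @@GLOBAL.gtid_current_pos;

-- Confirm that the option from the report is actually in effect.
SELECT @@GLOBAL.gtid_ignore_duplicates;

-- Workaround from the report. RESET MASTER removes all binary logs on this
-- server and clears its binlog GTID state.
RESET MASTER;
START SLAVE 'serverB';
```

As the report notes, this only clears the error until the next burst of load, so it is a way to get replication running again and not a fix.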
| Comments |
| Comment by Kristian Nielsen [ 2015-06-23 ] |
|
Is that the exact mysqlslap command you used? For me, mysqlslap -u root -p --auto-generate-sql does not do anything, and I do not get the error:
Average number of seconds to run all queries: 0.011 seconds
The "number of queries" is 0, so it seems that it doesn't run anything. I will try to fiddle with the mysqlslap command-line arguments, but it would be useful to know the exact command line that the reporter used. |
| Comment by Kristian Nielsen [ 2015-06-23 ] |
|
Ok, I could reproduce with this mysqlslap command line:
bld/client/mysqlslap --socket=bld/data1/mysql.sock -uroot --auto-generate-sql --auto-generate-sql-execute-number=10000 --concurrency=10 |
| Comment by Matt Neth [ 2015-06-23 ] |
|
Yeah, that mysqlslap command is all that I need to run to create the error. gtid_binlog_pos on the master goes up by 109 transactions every time I run it, so it is doing the queries. Mysqlslap is just how I trigger the error in my test environment. My main environment has enough natural traffic to cause this issue too (I hit it when I was originally trying to add a second master), so it's not just mysqlslap that causes it. |
| Comment by Kristian Nielsen [ 2015-06-23 ] |
|
Investigation shows that rpl_global_gtid_slave_state.release_domain_owner()
release_domain_owner() is called from rpl_group_info::cleanup_context().
So it seems that the release_domain_owner() call is in the wrong place, but
More investigation is required to confirm this and find the correct fix, but |
| Comment by Kristian Nielsen [ 2015-06-23 ] |
|
I think this is the patch to fix the problem. I will try to create a
|
| Comment by Kristian Nielsen [ 2015-06-23 ] |
|
http://lists.askmonty.org/pipermail/commits/2015-June/008083.html |