[MDEV-10391] During async GTID replication Galera crashes after error writing to binlog Created: 2016-07-18 Updated: 2019-12-14 Resolved: 2019-12-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Replication |
| Affects Version/s: | 10.1.14 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Trivial |
| Reporter: | Ryan Lavelle | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | galera, replication | ||
| Environment: |
Ubuntu 16.04 Amazon m4.xlarge |
||
| Issue Links: |
|
| Description |
|
We were running three separate two-node Galera clusters, with peer-to-peer asynchronous master-master replication between the clusters. The second Galera node in each cluster acted as the master for each slave, in case of failure. Replication stopped with this error:

2016-07-18 17:16:33 140072587262720 [ERROR] Master 'va_2': mysqld: Error writing file 'binlog' (errno: 1950 "Unknown error 1950")

After a reboot, the node lost its slave settings because of SST. After we recreated the slave settings, the GTID position was intact, but replication would not start with MASTER_USE_GTID=current_pos. It failed with:

Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 1-104-68680, which is not in the master's binlog. Since the master's binlog contains GTIDs with higher sequence numbers, it probably means that the slave has diverged due to executing extra erroneous transactions'

Because the slave died mid-replication during a load test being run from another node, I don't think it could have diverged. I also tried advancing GTID_SLAVE_POS in increments of 10 transactions, but received the same error message each time and could not get replication to resume.

Additionally, all clusters were using the same gtid_domain_id with a unique server_id per cluster, and every node within a cluster shared that cluster's server_id to avoid duplicated replication. Many tests ran fine until we hit the binlog write error and the corresponding WSREP error.

Full log:

2016-07-18 17:16:33 140072587262720 [ERROR] Master 'va_2': mysqld: Error writing file 'binlog' (errno: 1950 "Unknown error 1950")
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.1.14-MariaDB-1~xenial
Thread pointer: 0x0x7f62f10ba008
Trying to get some variables. |
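The 1236 error, and the failed attempts to bump GTID_SLAVE_POS, can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not MariaDB's actual code: it assumes the master only accepts a start GTID that appears in its binlog exactly (same domain_id, server_id and seq_no), and that in a shared domain the binlog can hold GTIDs written by more than one server_id. The class `Gtid`, the helper `check_start_position` and the sample binlog are all hypothetical.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    """Hypothetical model of a MariaDB GTID: domain_id-server_id-seq_no."""
    domain_id: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        # "1-104-68680" -> Gtid(domain_id=1, server_id=104, seq_no=68680)
        domain, server, seq = (int(part) for part in text.split("-"))
        return cls(domain, server, seq)


def check_start_position(requested: Gtid, master_binlog: List[Gtid]) -> str:
    """Simplified model (not MariaDB's real code) of the master-side check.

    Assumes the requested GTID must appear in the binlog exactly: same
    domain_id, server_id and seq_no. A matching seq_no alone is not enough.
    """
    if requested in master_binlog:
        return "ok"
    if any(
        g.domain_id == requested.domain_id and g.seq_no > requested.seq_no
        for g in master_binlog
    ):
        return "error 1236: not in binlog, higher seq_no present (diverged?)"
    return "error 1236: not in binlog"


# Hypothetical master binlog: one shared domain (1) in which, after 68679,
# every transaction was written by a different server_id (105).
binlog = [Gtid.parse("1-104-68679")] + [Gtid(1, 105, n) for n in range(68680, 68701)]
print(check_start_position(Gtid.parse("1-104-68679"), binlog))  # ok
print(check_start_position(Gtid.parse("1-104-68680"), binlog))  # (diverged?) error
print(check_start_position(Gtid.parse("1-104-68690"), binlog))  # (diverged?) error
```

Under this model, advancing only the sequence number cannot help when server_id 104 never wrote that seq_no in the shared domain. That is consistent with, but does not prove, what was observed above.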
| Comments |
| Comment by Ryan Lavelle [ 2016-07-18 ] |
|
The issue seemed to go away when using a separate GTID domain ID for each cluster.
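One possible reason separate domain IDs helped: MariaDB records gtid_slave_pos as at most one GTID per replication domain. The sketch below is a simplified, hypothetical model of that bookkeeping (the names `Gtid`, `apply_events` and `format_pos` are illustrative, not server APIs). It assumes events from every master update the same shared position, as with several replication connections on one server.

```python
from typing import Dict, Iterable, NamedTuple


class Gtid(NamedTuple):
    """Hypothetical model of a MariaDB GTID: domain_id-server_id-seq_no."""
    domain_id: int
    server_id: int
    seq_no: int


def apply_events(events: Iterable[Gtid]) -> Dict[int, Gtid]:
    """Simplified model of gtid_slave_pos: keep only the last GTID per domain."""
    slave_pos: Dict[int, Gtid] = {}
    for gtid in events:
        # A later event in the same domain replaces the earlier entry,
        # regardless of which master (server_id) it came from.
        slave_pos[gtid.domain_id] = gtid
    return slave_pos


def format_pos(slave_pos: Dict[int, Gtid]) -> str:
    """Render the position in the comma-separated D-S-N form, sorted by domain."""
    return ",".join(
        f"{g.domain_id}-{g.server_id}-{g.seq_no}"
        for _, g in sorted(slave_pos.items())
    )


# Hypothetical event streams from two masters (server_ids 104 and 105).
shared = apply_events([Gtid(1, 104, 100), Gtid(1, 105, 101)])
separate = apply_events([Gtid(1, 104, 100), Gtid(2, 105, 50)])
print(format_pos(shared))    # 1-105-101: the 104 stream's position is gone
print(format_pos(separate))  # 1-104-100,2-105-50
```

With one shared domain, the last event from either master overwrites the position for both streams, so reconnecting to a given master may request a GTID that master never wrote. With one domain per cluster, each stream keeps its own entry.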
| Comment by Geoff Montee (Inactive) [ 2018-02-20 ] |
|
Did you have encrypt-tmp-files set? The error and backtrace look similar to