[MDEV-7395] change master to Relay_Log_File/log_pos breaks Relay_Master_Log_File:Exec_Master_Log_Pos Created: 2014-12-30 Updated: 2022-09-12 Resolved: 2022-09-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Description |
|
After thinking I was well on the way to recovering a slave:
I concluded it was 100 bytes from the end of the file, so I repositioned to the next file:
MariaDB [(none)]> change master to relay_log_file='mysqld-relay-bin.001145',RELAY_LOG_POS=0; start slave
Soon (<2 mins) thereafter I hit a duplicate key error on an insert, which I suspected was related, and quickly cleared it:
set sql_log_bin=0; REPLACE INTO seller_counters (seller_id, category_id, counter) VALUES (36239, 321, 1); set global sql_slave_skip_counter=1; start slave
Soon (<2 mins) after that, show slave status reported another error:
(sorry I didn't record the full output at the time) So from position:
until, a very short time later, after a change master to moved the relay_log_file/pos to the next file (a change of about 100 bytes), the new position is:
The relay log files are 100M, though, probably due to a logrotate not using shared scripts, the following pattern occurs about every 3 hours:
The master log files are stricter about 100M and only differ on what appears to be a daily logrotate. The relay log file has gone from 001144 to 001147; in the same time Relay_Master_Log_File has gone from 006105 (19th December) to mysql-bin.006189 (30 December, which is the current binlog on the master).
A few hours later I decided to capture this properly, without gaps. Taking a stopped slave position and ignoring this error (I'll file a new bug for it):
change master to Relay_Log_File='mysqld-relay-bin.001150',relay_log_pos=94785831; show slave status;
so changing Relay_Log_Pos: 94785831 changed
so changing the relay_log_pos back 300 bytes made the exec_master_log_pos jump forward 1.5M. It looks like this grabs the current master position when the relay_log_pos/file is changed. |
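The before/after comparison described above could be checked mechanically. Below is a minimal sketch, not part of the original report: it assumes the vertical-format text of `SHOW SLAVE STATUS` has been saved once before and once after the `change master to`, and the helper names (`parse_slave_status`, `changed_fields`, `master_position_moved`) are hypothetical.

```python
import re

# Coordinate fields from SHOW SLAVE STATUS that the report compares.
FIELDS = ("Relay_Log_File", "Relay_Log_Pos",
          "Relay_Master_Log_File", "Exec_Master_Log_Pos")


def parse_slave_status(text):
    """Extract the tracked fields from vertical-format SHOW SLAVE STATUS output."""
    status = {}
    for line in text.splitlines():
        # Lines look like "   Relay_Log_Pos: 12345"; row separators do not match.
        match = re.match(r"\s*(\w+):\s*(.*)$", line)
        if match and match.group(1) in FIELDS:
            status[match.group(1)] = match.group(2).strip()
    return status


def changed_fields(before, after):
    """Return {field: (old, new)} for every tracked field whose value differs."""
    return {name: (before.get(name), after.get(name))
            for name in FIELDS
            if before.get(name) != after.get(name)}


def master_position_moved(before, after):
    """True if the master-side coordinates changed between the two captures."""
    diff = changed_fields(before, after)
    return "Relay_Master_Log_File" in diff or "Exec_Master_Log_Pos" in diff
```

With both threads stopped, a `True` result from `master_position_moved` after changing only the relay log coordinates would correspond to the symptom reported here.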
| Comments |
| Comment by Kristian Nielsen [ 2015-01-06 ] |
|
This sounds a lot like But if 10.0.16 (when it is released) does not fix it, then please feel free to re-open this bug with updated info. |
| Comment by Daniel Black [ 2015-01-08 ] |
Doesn't to me. The Relay_Master_log_file:Exec_master_log_pos became corrupted in the absence of a running sql or io thread. |
| Comment by Daniel Black [ 2015-01-08 ] |
|
also in the second example the position changed and not the file. |
| Comment by Arjen Lentz [ 2015-01-12 ] |
|
Kristian - this doesn't appear to be fixed (as per Daniel testing) and thus this would not be a duplicate of another fixed issue. |
| Comment by Kristian Nielsen [ 2015-01-12 ] |
|
> Kristian - this doesn't appear to be fixed (as per Daniel testing) and thus
Arjen, so what do you expect me to do with this issue?
|
| Comment by Daniel Black [ 2015-01-12 ] |
|
upstream bug with test case thanks to elenst |
| Comment by Arjen Lentz [ 2015-01-13 ] |
|
Hi Kristian,
Well, if it's not fixed then it can't be closed. That doesn't mean I know exactly where we should go next, but as a baseline it seems like the right direction. That way others would also be able to add additional information.
Thanks |
| Comment by Daniel Black [ 2015-01-13 ] |
|
I think the upstream bug is an exact duplicate. The cause of the corrupted position is hopefully fixed as per the other bugs. I see mapping any relay log file/pos changes back to the master log file/pos as quite a hard problem, and I assume there is no easy way. Invalidating the display of the relay_master_log_file:exec_master_log_pos until some IO/SQL thread convergence occurs may be a solution. It could also just be documented as: don't rely on the relay_master_log_file:pos after changing the local relay file/pos. I have to admit that this might be the first time I've actually changed relay_log_file/pos, and I only tried it because another error had corrupted the positions and expired master binlogs made a resync there impossible. So I doubt it's something commonly done. |
| Comment by Sergei Golubchik [ 2022-09-12 ] |
|
10.0 was EOLed in March 2019 |