[MDEV-7395] change master to Relay_Log_File/log_pos breaks Relay_Master_Log_File:Exec_Master_Log_Pos Created: 2014-12-30  Updated: 2022-09-12  Resolved: 2022-09-12

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None


 Description   

After thinking I was well on the way to recovering a slave:

                  Master_Port: 3306
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 50082470  
               Relay_Log_File: mysqld-relay-bin.001144
                Relay_Log_Pos: 104857605  
        Relay_Master_Log_File: mysql-bin.006105
             Slave_IO_Running: Yes  
            Slave_SQL_Running: Yes
              Replicate_Do_DB:   
          Replicate_Ignore_DB: 
           Replicate_Do_Table:   
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table:   
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0  
                   Last_Error: 
                 Skip_Counter: 0  
          Exec_Master_Log_Pos: 9258445
              Relay_Log_Space: 8480309601  
              Until_Condition: None
               Until_Log_File:   
                Until_Log_Pos: 0
           Master_SSL_Allowed: No  
           Master_SSL_CA_File: 
           Master_SSL_CA_Path:   
              Master_SSL_Cert: 
            Master_SSL_Cipher:   
               Master_SSL_Key: 
        Seconds_Behind_Master: 963719  
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0  
                Last_IO_Error: 
               Last_SQL_Errno: 0  
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids:   
             Master_Server_Id: 12022
               Master_SSL_Crl:   
           Master_SSL_Crlpath: 
                   Using_Gtid: No  
                  Gtid_IO_Pos: 
1 row in set (0.00 sec)
(interval < 30 sec)
 
stop slave;
set global sync_binlog=0, global innodb_flush_log_at_trx_commit=0
start slave;
 
                 Master_Host: 192.168.179.103
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 54032040
               Relay_Log_File: mysqld-relay-bin.001144
                Relay_Log_Pos: 104857605
        Relay_Master_Log_File: mysql-bin.006105
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594  
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0 
          Exec_Master_Log_Pos: 61589175
              Relay_Log_Space: 8484259756
              Until_Condition: None
               Until_Log_File:   
                Until_Log_Pos: 0
           Master_SSL_Allowed: No  
           Master_SSL_CA_File: 
           Master_SSL_CA_Path:   
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:   
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 12022
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
 
Looking at the relay file:
ls -al mysqld-relay-bin.001144
-rw-rw---- 1 mysql mysql 104857705 Dec 19 03:23 mysqld-relay-bin.001144
 
 mysqlbinlog --start-position 104857605 mysqld-relay-bin.001144 | more
ERROR: /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
Error in Log_event::read_log_event(): 'Event too big', data_len: 1163019592, event_type: 48
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
ERROR: DELIMITER /*!*/;
Could not read entry at offset 104857605: Error in log format or read error.# at 4
 
#141215  0:30:57 server id 12025  end_log_pos 248       Start: binlog v 4, server v 10.0.15-MariaDB-1~wheezy-log created 141215  0:30:57
BINLOG '
wSuOVA/5LgAA9AAAAPgAAAAAAAQAMTAuMC4xNS1NYXJpYURCLTF+d2...
AAAAAAAAAAAEEwQAj+72WQ==
'/*!*/;
# at 291
#141218 23:29:46 server id 12022  end_log_pos 248       Start: binlog v 4, server v 10.0.15-MariaDB-1~wheezy-log created 141218 23:29:46
BINLOG '
amOTVA/2LgAA9AAAAPgAAAAAAAQAM.....
AAAAAAAAAAAEEwQA8pOnBg==
'/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

I concluded its 100 bytes from the end of the file, going to reposition to the the next file.

MariaDB [(none)]> change master to relay_log_file='mysqld-relay-bin.001145',RELAY_LOG_POS=0;

start slave

I soon (<2 mins) there after found a duplicate key error on an insert which I suspected was related and quickly clear it. set sql_log_bin=0; REPLACE INTO seller_counters (seller_id, category_id, counter) VALUES (36239, 321, 1); set global sql_slave_skip_counter=1; start slave

The show slave status soon (<2 mins) after found another error:

Relay_Log_File: mysqld-relay-bin.001147
                Relay_Log_Pos: 7901913
        Relay_Master_Log_File: mysql-bin.006189
             Slave_IO_Running: Yes
            Slave_SQL_Running: No

(sorry I didn't record the full output at the time)

So from position:

               Relay_Log_File: mysqld-relay-bin.001144
                Relay_Log_Pos: 104857605
        Relay_Master_Log_File: mysql-bin.006105
          Exec_Master_Log_Pos: 9258445

until a very short time later after a change master to changed the relay_log_file/pos to the next file and with a 100 bytes changed the new position is:

Relay_Log_File: mysqld-relay-bin.001147
                Relay_Log_Pos: 7901913
        Relay_Master_Log_File: mysql-bin.006189

The relay log files are 100M though though probably a logrotate not using shared scripts the following pattern occurs about 3 hourly

-rw-rw----  1 mysql  mysql  104857912 Dec 21 00:33 mysqld-relay-bin.001187
-rw-rw----  1 mysql  mysql        569 Dec 21 00:33 mysqld-relay-bin.001188
-rw-rw----  1 mysql  mysql        341 Dec 21 00:33 mysqld-relay-bin.001189

The master log files are more strict on 100M an only differ on on what appears to be a daily logrotate.

The relay log file has gone from 001144 to 001147 in the same time Relay_Master_Log_File has gone from 006105 (19th December) to mysql-bin.006189 (30 December, which is the current binlog on the master).

binlog_format                           | STATEMENT

A few hours later I decided to capture this right without gaps.

Take a stopped slave position and ignoring this error (i'll put a new bug for it).

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.179.103
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 96248700
               Relay_Log_File: mysqld-relay-bin.001150
                Relay_Log_Pos: 94786159
        Relay_Master_Log_File: mysql-bin.006189
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbin
log' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you
 want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 94785872
              Relay_Log_Space: 8316760144

change master to Relay_Log_File='mysqld-relay-bin.001150',relay_log_pos=94785831; show slave status;

               Slave_IO_State: 
                  Master_Host: 192.168.179.103
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 96248700
               Relay_Log_File: mysqld-relay-bin.001150
                Relay_Log_Pos: 94785831
        Relay_Master_Log_File: mysql-bin.006189
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 96248700
              Relay_Log_Space: 8316760144
              Until_Condition: None

so changing Relay_Log_Pos: 94785831
to Relay_Log_Pos: 94786159

changed

          Exec_Master_Log_Pos: 94785872
          Exec_Master_Log_Pos: 96248700

so changing the relay_log_pos back 300 bytes made the exec_master_log_pos jump forward 1.5M.

Looks like this is grabbing a current master position when then relay_log_pos/file is changed.



 Comments   
Comment by Kristian Nielsen [ 2015-01-06 ]

This sounds a lot like MDEV-7237, which is fixed in 10.0.16.

But if 10.0.16 (when it is released) does not fix it, then please feel free to re-open this bug with updated info.

Comment by Daniel Black [ 2015-01-08 ]

This sounds a lot like MDEV-7237, which is fixed in 10.0.16.

Doesn't to me. The Relay_Master_log_file:Exec_master_log_pos became corrupted in the absence of a running sql or io thread.

Comment by Daniel Black [ 2015-01-08 ]

also in the second example the position changed and not the file.

Comment by Arjen Lentz [ 2015-01-12 ]

Kristian - this doesn't appear to be fixed (as per Daniel testing) and thus this would not be a duplicate of another fixed issue.

Comment by Kristian Nielsen [ 2015-01-12 ]

> Kristian - this doesn't appear to be fixed (as per Daniel testing) and thus
> this would not be a duplicate of another fixed issue.

Arjen, so what do you expect me to do with this issue?

  • Kristian.
Comment by Daniel Black [ 2015-01-12 ]

upstream bug with test case thanks to elenst

Comment by Arjen Lentz [ 2015-01-13 ]

Hi Kristian

Well, if it's not fixed then it can't be closed - doesn't mean I know exactly where we should go next, but as a baseline it seems like the right direction. That way others would also be able to add additional information.
My thoughts at this stage:

  • Does the upstream bug provide any lead on where to look?
  • Also Daniel's earlier comment on that particular variable getting corrupted might be something that can be tracked in the source, as I presume it doesn't get written to in that many places.
  • If it's corruption in a file (can't discern that for sure from Dan's comment), there might be a logic error when writing that file (double linefeed, that kind of nonsense) in certain cases.

Thanks
Arjen.

Comment by Daniel Black [ 2015-01-13 ]

I think the upstream bug is an exact duplicate.

The cause of the corrupted position is hopefully fixed as per the other bugs.

I see trying to map any relay log file/pos changes back to the master log files/pos as quite a hard problem I assume if there is no easy way.

Invalidating the display of the relay_master_log_file:exec_master_log_pos until some IO/SQL thread convergence occurs may be a solution.

Could just documented as don't rely on the relay_master_log_file:pos after changing the local relay file/pos.

I have to admit that this might be the first time I've actually changed relay_log_file/pos, and i only tried to do it because of another error corrupting the positions, and expired master binlogs making a resync there impossible. So I doubt its something commonly done.

Comment by Sergei Golubchik [ 2022-09-12 ]

10.0 was EOLed in March 2019

Generated at Thu Feb 08 07:19:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.