[MDEV-7396] parallel replication - incorrect binlog offset calculation after deadlock Created: 2014-12-30  Updated: 2015-01-08  Resolved: 2015-01-06

Status: Closed
Project: MariaDB Server
Component/s: Replication
Fix Version/s: 10.0.16

Type: Task Priority: Major
Reporter: Daniel Black Assignee: Kristian Nielsen
Resolution: Duplicate Votes: 0
Labels: parallelslave, replication


 Description   

Given the show slave status:

              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 70373742
               Relay_Log_File: mysqld-relay-bin.001150
                Relay_Log_Pos: 94786159
        Relay_Master_Log_File: mysql-bin.006189
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1213
                   Last_Error: Deadlock found when trying to get lock; try restarting transaction
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 94785872
              Relay_Log_Space: 8290885186
 
start slave
 ls -al mysqld-relay-bin.001150
-rw-rw---- 1 mysql mysql 104600980 Dec 19 09:50 mysqld-relay-bin.001150
 
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 70715093
               Relay_Log_File: mysqld-relay-bin.001150
                Relay_Log_Pos: 94786159
        Relay_Master_Log_File: mysql-bin.006189
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 94785872
              Relay_Log_Space: 8291226537
 
mysqlbinlog --base64-output=DECODE-ROWS --start-position 94786159 mysqld-relay-bin.001150 | more
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 842218041, event_type: 45
ERROR: Could not read entry at offset 94786159: Error in log format or read error.
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
 
Obviously 94786159 isn't a start position

Looking for what real positions exist in the binlog:

mysqlbinlog mysqld-relay-bin.001150 | grep "^# at 9478[0-9][0-9][0-9][0-9]" | tee /tmp/pos.txt
 
at 94785793
at 94785831
at 94786200
at 94786227
at 94786265

so 94786159 is a wrong position number. looking at prev transaction pos 94785831

mysqlbinlog --start-position 94785831 mysqld-relay-bin.001150 | more
/*!50530 SET @SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @session.max_insert_delayed_threads=0*/;
/*!50003 SET OLD_COMPLETION_TYPE=@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
at 4
#141219 6:25:05 server id 12025 end_log_pos 248 Start: binlog v 4, server v 10.0.15-MariaDB-1~wheezy-log created 141219 6:25:05
BINLOG '
wcSTVA/5LgAA9AAAAPgAAAAAAAQAMTAuMC4xNS1NYXJpYURCLTF+d2hlZXp5LWxvZwAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA3AAEGggAAAAICAgCAAAACgoKAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAEEwQApm6oDg==
'/*!*/;
at 248
#141219 6:25:02 server id 12022 end_log_pos 0 Start: binlog v 4, server v 10.0.15-MariaDB-1~wheezy-log created 141219 6:25:02
BINLOG '
vsSTVA/2LgAA9AAAAAAAAAAgAAQAMTAuMC4xNS1NYXJpYURCLTF+d2hlZXp5LWxvZwAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA3AAEGggAAAAICAgCAAAACgoKAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAEEwQAhzJdFw==
'/*!*/;
at 94785831
#141219 9:32:45 server id 12022 end_log_pos 95042978 Query thread_id=16552077 exec_time=0 error_code=0
use `gemrock_auctions`/*!*/;
SET TIMESTAMP=1418981565/*!*/;
SET @session.pseudo_thread_id=16552077/*!*/;
SET @session.foreign_key_checks=1, @session.sql_auto_is_null=0, @session.unique_checks=1, @session.autocommit=1/*!*/;
SET @session.sql_mode=1073741824/*!*/;
SET @session.auto_increment_increment=2, @session.auto_increment_offset=2/*!*/;
/*!\C utf8 //!*/;
SET @session.character_set_client=33,@session.collation_connection=33,@session.collation_server=192/*!*/;
SET @session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
INSERT INTO `opalauctions_com`.`sessions` (`id`, `ip_address`, `user_agent`, `data`, `date_added`, `date_updated`) VALUES ('571ald9j87ruqumntllvciddh1', '178.154.206.250', 'Mozilla/5.0 (compatible;
YandexBot/3.0; +http://yandex.com/bots)', '', '2014-12-19 09:32:45', '2014-12-19 09:32:45')
/*!*/;
at 94786200
#141219 9:32:45 server id 12022 end_log_pos 95043005 Xid = 171836529
COMMIT/*!*/;
at 94786227
MariaDB [(none)]> select * from opalauctions_com.`sessions` where id='571ald9j87ruqumntllvciddh1';
Empty set (0.02 sec)

So lets start at this position.

change master to Relay_Log_File='mysqld-relay-bin.001150',relay_log_pos=94785831; start slave;
 
               Slave_IO_State: 
                  Master_Host: 192.168.179.103
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 96248700
               Relay_Log_File: mysqld-relay-bin.001150
                Relay_Log_Pos: 94785831

Hit MDEV-7395 but otherwise replication continued.

MariaDB [(none)]> set global slave_parallel_threads=0; start slave;
 
(yes I'd had enough of hitting bugs)
...
 
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.179.103
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.006189
          Read_Master_Log_Pos: 100171990
               Relay_Log_File: mysqld-relay-bin.001152
                Relay_Log_Pos: 4098842
        Relay_Master_Log_File: mysql-bin.006108
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 4098555
              Relay_Log_Space: 8121038882
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 932657
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 12022
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 



 Comments   
Comment by Daniel Black [ 2014-12-30 ]

10.0.15 x86_64 mean to be bug not task. bah...

Comment by Kristian Nielsen [ 2015-01-06 ]

Like MDEV-7395, this looks like a likely duplicate of MDEV-7237. So closing as
a duplicate, when 10.0.16 is released and if it does not fix things, feel free
to re-open with additional info.

Note that the effect of MDEV-7395 is to get the incorrect relay log
file_name. So adjusting the position to a nearby event still gets a
completely wrong position, off by an entire relay log file. So follow-up
errors are likely to result from this.

(I am curious how you got the Last SQL error to be a deadlock error in the
first place. Such errors are supposed to be retried automatically. Did you
run with slave_transaction_retries=0? Or did a transaction really get
deadlocked 10 times in a row (the default value of slave_transaction_retries),
and then give up?)

Comment by Daniel Black [ 2015-01-08 ]

> Note that the effect of MDEV-7395 is to get the incorrect relay log
file_name. So adjusting the position to a nearby event still gets a
completely wrong position, off by an entire relay log file. So follow-up
errors are likely to result from this

Ok. Was planning a table check at some stage anyway.

> I am curious how you got the Last SQL error to be a deadlock error in the
first place. Such errors are supposed to be retried automatically. Did you
run with slave_transaction_retries=0?

no.

> Or did a transaction really get
deadlocked 10 times in a row (the default value of slave_transaction_retries),
and then give up?

Must of. Its still set to the default of 10. As I assume there is no current status variable to tell me I've added this to MDEV-7340 max consecutive parallel deadlocks.

Getting some reviews and perhaps commits on patches like MDEV-7202, MDEV-7198, MDEV-6981 might make me more confident to add other patches to include basic status monitoring capabilities of what is really a significant improvement in mariadb functionality.

Generated at Thu Feb 08 07:19:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.