[MDEV-5703] [PATCH] Slave disconnects and fails to reconnect on Error_code: 1159 Created: 2014-02-19  Updated: 2014-03-06  Resolved: 2014-03-06

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.35
Fix Version/s: 5.5.37, 10.0.9

Type: Bug
Priority: Major
Reporter: Tomas Matejicek
Assignee: Kristian Nielsen
Resolution: Fixed
Votes: 0
Labels: replication
Environment: Linux (slackware)

 Description   

While replicating, the slave server randomly prints this error and disconnects from the master:

[ERROR] Slave I/O: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: , Error_code: 1159
[Note] Slave I/O thread exiting, read up to log 'mysql-bin.xxxxxx', position xxxxxx

Error code 1159 is in fact ER_NET_READ_INTERRUPTED ("Got timeout reading communication packets").

Executing STOP SLAVE; START SLAVE; on the slave server resumes the replication without any problem. The slave server should reconnect automatically though, which doesn't happen.

I believe the issue is in mariadb-sources/sql/slave.cc

There is a function called is_network_error(), which checks whether the given error is network-related. It is missing a check for ER_NET_READ_INTERRUPTED. The patch is trivial:

--- sql/slave.cc	2013-07-17 09:51:31.000000000 -0500
+++ sql/slave.cc	2014-02-19 02:06:55.591593796 -0600
@@ -1215,6 +1215,7 @@ bool is_network_error(uint errorno)
       errorno == ER_CON_COUNT_ERROR ||
       errorno == ER_CONNECTION_KILLED ||
       errorno == ER_NEW_ABORTING_CONNECTION ||
+      errorno == ER_NET_READ_INTERRUPTED ||
       errorno == ER_SERVER_SHUTDOWN)
     return TRUE;

With this change, MariaDB recognizes the timeout as a network-related error and tries to reconnect automatically.
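
For readers who want to see the effect of the one-line change in isolation, here is a minimal standalone sketch. It is not the MariaDB source: it models only the five conditions visible in the hunk above (the real function may test more), the names is_network_error_sketch, IoThreadAction and on_io_error are hypothetical, and the numeric error values are hard-coded so the example compiles on its own. Only 1159 is confirmed by this report; the other values are assumptions and in real code come from mysqld_error.h.

```cpp
// Standalone illustration only; not the actual sql/slave.cc code.
// Error numbers are hard-coded so the sketch compiles by itself.
// 1159 is taken from the report; the other values are assumptions
// (real server code gets all of them from mysqld_error.h).
constexpr unsigned int ER_CON_COUNT_ERROR         = 1040;
constexpr unsigned int ER_SERVER_SHUTDOWN         = 1053;
constexpr unsigned int ER_NET_READ_INTERRUPTED    = 1159;
constexpr unsigned int ER_NEW_ABORTING_CONNECTION = 1184;
constexpr unsigned int ER_CONNECTION_KILLED       = 1927;

// Hypothetical stand-in for is_network_error(): true when the error
// should be treated as a transient network problem.
constexpr bool is_network_error_sketch(unsigned int errorno)
{
  return errorno == ER_CON_COUNT_ERROR ||
         errorno == ER_CONNECTION_KILLED ||
         errorno == ER_NEW_ABORTING_CONNECTION ||
         errorno == ER_NET_READ_INTERRUPTED ||  // the check added by the patch
         errorno == ER_SERVER_SHUTDOWN;
}

// Hypothetical model of the decision the I/O thread makes after a failed
// query to the master: retry the connection or stop with a fatal error.
enum class IoThreadAction { Reconnect, Stop };

constexpr IoThreadAction on_io_error(unsigned int errorno)
{
  return is_network_error_sketch(errorno) ? IoThreadAction::Reconnect
                                          : IoThreadAction::Stop;
}
```

In this model, on_io_error(1159) yields IoThreadAction::Reconnect. Without the added line it would yield IoThreadAction::Stop, which corresponds to the behaviour described above, where replication only resumed after a manual STOP SLAVE; START SLAVE;.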



 Comments   
Comment by Elena Stepanova [ 2014-02-20 ]

Hi Kristian,

Could you please take a look at the suggested patch to see if it's valid (and maybe push it if it is)?

Comment by Kristian Nielsen [ 2014-03-04 ]

Pushed to 10.0-base (will be later merged to 10.0)

Comment by Kristian Nielsen [ 2014-03-04 ]

And btw, thanks a lot for the report and patch, Tomas Matejicek!

Comment by Tomas Matejicek [ 2014-03-04 ]

You are welcome. May I ask you why the fix is not added to MariaDB 5.5.* like 5.5.36 or so?
Thank you

Tomas M

Comment by Kristian Nielsen [ 2014-03-04 ]

> May I ask you why the fix is not added to MariaDB 5.5.* like 5.5.36 or so?

No particular reason. I've now pushed to 5.5 as well.

- Kristian.

Comment by Laurynas Biveinis [ 2014-03-05 ]

This is also https://bugs.launchpad.net/percona-server/+bug/1268729 aka http://bugs.mysql.com/bug.php?id=71374. There is also a related bug https://bugs.launchpad.net/percona-server/+bug/1268735 aka http://bugs.mysql.com/bug.php?id=71375.

Comment by Ives Stoddard [ 2014-03-05 ]

will this patch also make its way into the 10.0.9 release? i was about to start with 10.0.8, for the new multi-source replication until 10.0.10 GA is available.

Comment by Sergei Golubchik [ 2014-03-05 ]

most probably — yes, I've just merged it into 10.0.
