Details
Bug
Status: Closed
Major
Resolution: Fixed
5.5.35
None
Linux (slackware)
Description
While replicating, the slave server intermittently logs this error and disconnects from the master:
[ERROR] Slave I/O: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: , Error_code: 1159
[Note] Slave I/O thread exiting, read up to log 'mysql-bin.xxxxxx', position xxxxxx
Error code 1159 is ER_NET_READ_INTERRUPTED: "Got timeout reading communication packets".
Executing STOP SLAVE; START SLAVE; on the slave server resumes replication without any problem. However, the slave server should reconnect automatically after this kind of error, and it does not.
I believe the issue is in mariadb-sources/sql/slave.cc.
The function is_network_error() checks whether a given error code is network-related, but it is missing a check for ER_NET_READ_INTERRUPTED. The patch is trivial:
--- sql/slave.cc	2013-07-17 09:51:31.000000000 -0500
+++ sql/slave.cc	2014-02-19 02:06:55.591593796 -0600
@@ -1215,6 +1215,7 @@ bool is_network_error(uint errorno)
       errorno == ER_CON_COUNT_ERROR ||
       errorno == ER_CONNECTION_KILLED ||
       errorno == ER_NEW_ABORTING_CONNECTION ||
+      errorno == ER_NET_READ_INTERRUPTED ||
       errorno == ER_SERVER_SHUTDOWN)
     return TRUE;
With this change, MariaDB treats the error as network-related and tries to reconnect automatically instead of stopping the slave I/O thread.
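
To illustrate the effect of the patch outside the server tree, here is a minimal standalone sketch of the check. It is not the actual slave.cc code: it only covers the five conditions visible in the diff hunk (the real function may test more codes), and apart from 1159, which comes from the log above, the numeric values are assumptions that should be verified against the server's error headers. The helper check_examples() is hypothetical.

```cpp
#include <cassert>

// Error codes. 1159 is taken from the log in this report; the other
// values are assumptions for illustration and should be verified
// against the server's generated error header.
enum : unsigned {
  ER_CON_COUNT_ERROR         = 1040,  // assumed value
  ER_SERVER_SHUTDOWN         = 1053,  // assumed value
  ER_NET_READ_INTERRUPTED    = 1159,  // from the report
  ER_NEW_ABORTING_CONNECTION = 1184,  // assumed value
  ER_CONNECTION_KILLED       = 1927   // assumed value
};

// Sketch of the patched check, limited to the conditions shown in the
// diff. Returns true when the error should be treated as a network
// problem, i.e. one that warrants a reconnect attempt.
bool is_network_error(unsigned errorno) {
  return errorno == ER_CON_COUNT_ERROR ||
         errorno == ER_CONNECTION_KILLED ||
         errorno == ER_NEW_ABORTING_CONNECTION ||
         errorno == ER_NET_READ_INTERRUPTED ||  // the line added by the patch
         errorno == ER_SERVER_SHUTDOWN;
}

// Hypothetical self-check: the timeout from the report is now classified
// as a network error, while an unrelated code (1064, a syntax error) is not.
void check_examples() {
  assert(is_network_error(1159));
  assert(!is_network_error(1064));
}
```

Without the added line, is_network_error(1159) would return false in this sketch, which matches the reported behavior: the read timeout is treated as fatal and the I/O thread exits rather than reconnecting.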