[MDEV-6188] master_retry_count (ignored if disconnect happens on SET master_heartbeat_period) Created: 2014-04-30 Updated: 2014-10-11 Resolved: 2014-06-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 5.5.38, 10.0.10 |
| Fix Version/s: | 5.5.39, 10.0.13 |
| Type: | Bug | Priority: | Critical |
| Reporter: | BELUGABEHR | Assignee: | Kristian Nielsen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | replication, slave |
| Description |
|
Hello, The documentation for the "master_retry_count" variable marks it as deprecated: https://mariadb.com/kb/en/replication-and-binary-log-server-system-variables/#master_retry_count However, there is no indication on how to go about enabling this feature. It appears that there may be a larger issue here: http://serverfault.com/questions/522207/mariadb-replication-not-auto-reconnecting I'm using 10.0.10 Thanks! |
| Comments |
| Comment by Elena Stepanova [ 2014-04-30 ] |
|
Hi. Thanks for noticing. I've updated the page: in MariaDB as of 10.0.10, the option is not deprecated. Regarding the question at serverfault.com (unrelated to the documentation issue): master_retry_count works most of the time (errors while reading packets, connecting to the master, etc.). But in some special cases, for example when the slave executes a query on the master, such as SET master_heartbeat_period, and the query fails for whatever reason, the slave treats it as a fatal error and gives up. From all I can see, the bug is present in MariaDB, including the current 10.0, and in MySQL up to 5.5. It was fixed in MySQL 5.6: the error code returned upon query execution is checked against the list of "network failures", and if it is one of those, the connection retry happens as usual. Meanwhile, if you are concerned about this issue, configuring MASTER_HEARTBEAT_PERIOD = 0 should make it go away, although of course disabling heartbeats can have other consequences: if the slave has idle periods longer than slave_net_timeout, it will increase the number of reconnects. It's probably not a big deal when the connection is that poor, but it should be taken into account anyway. |
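To make the retry decision described in the comment above concrete, here is a minimal Python sketch. It is not the server's code: the function names `is_network_error` and `on_master_query_failure` and the exact contents of `NETWORK_ERRORS` are assumptions made for illustration. The numeric values are the standard MySQL/MariaDB client (CR_*) and server (ER_*) error numbers. The sketch also ignores any special meaning the server may give to particular master_retry_count values.

```python
# Illustrative sketch only; the real check lives in the server's
# replication code and its list of network errors may differ.

# Standard client-side (CR_*) and server-side (ER_*) error numbers.
CR_CONNECTION_ERROR = 2002
CR_CONN_HOST_ERROR = 2003
CR_SERVER_GONE_ERROR = 2006
CR_SERVER_LOST = 2013
ER_CON_COUNT_ERROR = 1040
ER_SERVER_SHUTDOWN = 1053

# ASSUMPTION: which errors count as "network failures".
NETWORK_ERRORS = frozenset({
    CR_CONNECTION_ERROR,
    CR_CONN_HOST_ERROR,
    CR_SERVER_GONE_ERROR,
    CR_SERVER_LOST,
    ER_CON_COUNT_ERROR,
    ER_SERVER_SHUTDOWN,
})


def is_network_error(errno):
    """Return True if errno is in the (assumed) network-failure list."""
    return errno in NETWORK_ERRORS


def on_master_query_failure(errno, retries_done, master_retry_count):
    """Decide what the slave does when a query on the master fails.

    Before the fix, any failure here was treated as fatal. With the
    fix, a network error leads to a reconnect attempt as long as the
    retry budget is not exhausted.
    """
    if not is_network_error(errno):
        return "give_up"   # genuine query error: still fatal
    if retries_done >= master_retry_count:
        return "give_up"   # retry budget used up
    return "retry"         # transient network failure: reconnect
```

Under these assumptions, a lost connection (2013) during SET master_heartbeat_period leads to a reconnect, while an error outside the list still stops the slave. That is also why a network error missing from the list would still look like this bug.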
| Comment by Elena Stepanova [ 2014-04-30 ] |
|
Here is the patch in MySQL 5.6 tree where (I think) the problem was fixed:
It sounds unrelated, but it contains, among other things, this diff (it's not the full hunk, just the part that seems most relevant):
|
| Comment by BELUGABEHR [ 2014-05-01 ] |
|
Thanks for the great insight. How do I interrogate this variable on my system? It is not listed in "SHOW SLAVE STATUS". |
| Comment by Elena Stepanova [ 2014-05-01 ] |
|
If you mean master-retry-count, currently you can't. There was an upstream bug report about it, http://bugs.mysql.com/bug.php?id=44486 . Since this bugfix was a separate commit, hopefully it will make it into the 10.0 tree along with other 5.6 bugfixes as part of the MDEV-5242 activity. I'll add a note to the latter task, just in case (although it won't make it into the 10.0.11 release, which is due any day; it will happen later). |
| Comment by BELUGABEHR [ 2014-05-01 ] |
|
Many thanks! Not sure how you want to handle this ticket, but I will monitor for the changes on MDEV-5242. |
| Comment by Kristian Nielsen [ 2014-06-17 ] |
|
Pushed into 5.5.39 |
| Comment by Anton Avramov [ 2014-10-02 ] |
|
This bug states it was fixed in 10.0.13; however, I keep experiencing it. It seems that this is one of those network issues that is not in the list. |