[MDEV-5165] Duplicate MDEV-4452 Created: 2013-10-21 Updated: 2014-10-17 Resolved: 2014-10-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 10.0.4, 5.5.33a |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | VAROQUI Stephane | Assignee: | Elena Stepanova |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Debian Squeeze |
||
| Issue Links: |
|
||||||||
| Description |
|
Facing the same issue.

On MYSQL1:

Replication "ccmstats_lucifer":
Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard03'. Query: 'replace into `ccmstats_shard03`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(3363428860,'2013-10-19 22:27:41','/download/start/descargar-14103-driver-de-video-de-lenovo-ibm-thinkpad-t30','','es.kioskea.net','http://static.ak.facebook.com/connect/xd_arbiter.php?version=27',9175071638627673410)'

Replication "ccmstats_mysql1":
Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard07'. Query: 'replace into `ccmstats_shard07`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1947240397,'2013-10-19 09:33:46','/sites/details/1089563.jjwxc.net','','www.commentcamarche.net','http://www.quanneiren.com/seo/?page=1068&url=1089563.jjwxc.net',2180121150729504318)'

On LUCIFER:

Replication "ccmstats_gertrude":
Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard13'. Query: 'replace into `ccmstats_shard13`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1323859475,'2013-10-19 11:47:46','/forum/affich-1573832-pourquoi-mon-timer-ne-s-execute-pas','','codes-sources.commentcamarche.net','http://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fcodes-sources.commentcamarche.net%2Fforum%2Faffich-1573832-pourquoi-mon-timer-ne-s-execute-pas&ei=NVViUoq0PMa90QXl9ICACQ&usg=AFQjCNH10nmC3K-H9cMsSbl1rj7V7M_V7Q&bv',2736937289062304772)'

Replication "ccmstats_lucifer":
Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard10'. Query: 'replace into `ccmstats_shard10`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(2903347022,'2013-10-19 09:27:37','/news/12118-firefox-la-nuova-versione-nel-play-store','','it.kioskea.net','http://184.84.222.35/news/12118-Firefox%2C+la+nuova+versione+nel+Play+Store',9684943911008351985)'

On GERTRUDE:

Error 'Got timeout reading communication packets' on query. Default database: 'ccmstats_shard24'. Query: 'replace into `ccmstats_shard24`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(3192578384,'2013-10-19 04:04:09','/download/start/descargar-16307-driver-de-audio-de-placa-base-pcchips-p27g','','es.kioskea.net','http://static.ak.facebook.com/connect/xd_arbiter.php?version=27',6488389282267718615)'

The master is 5.5.33a.

CREATE TABLE `domaine_federated` ( ) ENGINE=federated CONNECTION='PUMA/domaine'
CREATE TABLE `url_federated` (

The slaves have a BEFORE INSERT trigger defined like this:

CREATE TABLE `ccmreferers` (
DECLARE l_idUrl INT unsigned DEFAULT 0;

delimiter //
CREATE DEFINER=`root`@`%` FUNCTION `GetIdUrl`(l_strUrl varchar(255)) RETURNS int(11)

delimiter //
CREATE DEFINER=`root`@`%` FUNCTION `GetIdDomaine`(l_strDomaine char(50)) RETURNS int(11)
SELECT IdDomaine into l_IdDomaine FROM domaine_federated where Domaine = l_strDomaine;
END IF; |
| Comments |
| Comment by VAROQUI Stephane [ 2013-10-23 ] | ||||||||||
|
Elena, note that all slaves point to the same remote table and that they break randomly, at no predefined time. We have set up a cron job that selects from the federated table in order to produce other activity on the remote table. That cron job also gets the same error. Note as well that the time at which each slave breaks is very random. Simply restarting the replication works, so replaying the same query at a later time is fine. | ||||||||||
| Comment by VAROQUI Stephane [ 2013-10-23 ] | ||||||||||
|
The server that holds the physical table is in reality MariaDB 5.5.31.
Uptime = 9077854
Variables:
thread_cache_size = 64
net.core.somaxconn = 4096 | ||||||||||
| Comment by Elena Stepanova [ 2013-10-23 ] | ||||||||||
|
Hi Stephane, Do I understand correctly from your comment above that the problem is sporadic and not reliably reproducible with the provided structures and queries? | ||||||||||
| Comment by VAROQUI Stephane [ 2013-10-23 ] | ||||||||||
|
Elena,

Yes, it is a sporadic issue, and I don't really know what the policy is for a duplicate issue that does not provide any solution in the end. I am coming to you more for help with a methodology for finding the cause.

The error clearly states that it could be a network issue on the remote server, but the applications using that same server do not suffer from the same issue. The server that holds the federated table is a master, so it is constantly in use, and every error from the application is logged to syslog. We only have one error per day, happening every day at the same time. The client is investigating this, but its timing does not match our sporadic issue:

codes-sources.commentcamarche.net web18 2013-10-19 04:12:00 /profile/user/Bul3:/var/www/vhosts/www.commentcamarche.net/include/ccmfunctions.php3:737 - Accès à la base

To come back to potential network issues, I have had the client go back to a more conservative TCP setting:

From this setup

to this setup

And we will see...

On the master, looking at the query monitoring, we do not see any queries showing up with unexpected response times. The only suboptimal status is:

Aborted_connects | 157 |

If we assume the issue is not on the server that holds the table but rather on the servers that use the federated table, what we observe is a deadlock on the slave and, about 30 minutes later, the replication is broken:

Oct 19 11:21:13 lucifer mysqld: 131019 11:21:13 [ERROR] Master 'ccmstats_gertrude': Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'ccmstats_shard10'. Query: 'replace into `ccmstats_shard10`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1315980540,'2013-10-19 11:21:13','/forum/affich-4000625-prime-noel-condition-d-attribution','','droit-finances.commentcamarche.net','https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CFwQFjAE&url=http%3A%2F%2Fdroit-finances.commentcamarche.net%2Fforum%2Faffich-4000625-prime-noel-condition-d-attribution&ei=KE1iUsfIHIeu0QXt5YGYBg&usg=AFQjCNGauJn6T9vDHk4LSLUQOI43iD_NwQ&s',5321267248370821353)', Internal MariaDB error code: 1213

Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [ERROR] Master 'ccmstats_gertrude': Slave SQL: Error 'Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED' on query. Default database: 'ccmstats_shard13'. Query: 'replace into `ccmstats_shard13`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1323859475,'2013-10-19 11:47:46','/forum/affich-1573832-pourquoi-mon-timer-ne-s-execute-pas','','codes-sources.commentcamarche.net','http://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fcodes-sources.commentcamarche.net%2Fforum%2Faffich-1573832-pourquoi-mon-timer-ne-s-execute-pas&ei=NVViUoq0PMa90QXl9ICACQ&usg=AFQjCNH10nmC3K-H9cMsSbl1rj7V7M_V7Q&bv',2736937289062304772)', Internal MariaDB error code: 1296

I am asking the client to check whether a replication break predictably follows a deadlock.

Now, in any case: looking at the FederatedX code, you can see that federated has the auto-reconnect flag on. The question is what happens if the reconnect fails: does it retry? And what happens if the query is running inside a replication thread? Is there any way to make this stable? In our case I really do not suspect a network issue.

Thanks | ||||||||||
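
The check mentioned above (does a replication break predictably follow a deadlock?) could be run against syslog with something like the following. This is only a minimal sketch: it assumes the slave error lines keep the format of the two lines quoted in this comment, the function names and the one-hour window are made up for illustration, and the sample queries are abbreviated to '...'.

```python
import re
from datetime import datetime, timedelta

DEADLOCK = 1213  # error code taken from the deadlock line quoted above

# Matches the mysqld part of a syslog line such as:
#   mysqld: 131019 11:21:13 [ERROR] Master 'name': Slave SQL: ... code: 1213
LINE_RE = re.compile(
    r"mysqld: (?P<ts>\d{6} \d{2}:\d{2}:\d{2}) \[ERROR\] "
    r"Master '(?P<master>[^']+)': Slave SQL: .*"
    r"Internal MariaDB error code: (?P<code>\d+)"
)

def parse_slave_errors(lines):
    """Return (timestamp, master connection name, error code) per error line."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%y%m%d %H:%M:%S")
            events.append((ts, m.group("master"), int(m.group("code"))))
    return events

def breaks_after_deadlock(events, window=timedelta(hours=1)):
    """Pair each non-deadlock error with the latest earlier deadlock on the
    same master connection, if it happened within the given window."""
    pairs = []
    last_deadlock = {}  # master connection name -> time of latest deadlock
    for ts, master, code in sorted(events):
        if code == DEADLOCK:
            last_deadlock[master] = ts
        elif master in last_deadlock and ts - last_deadlock[master] <= window:
            pairs.append((master, code, ts - last_deadlock[master]))
    return pairs

# The two lines from this comment, with the queries abbreviated.
SAMPLE = [
    "Oct 19 11:21:13 lucifer mysqld: 131019 11:21:13 [ERROR] Master "
    "'ccmstats_gertrude': Slave SQL: Error 'Deadlock found when trying to get "
    "lock; try restarting transaction' on query. Default database: "
    "'ccmstats_shard10'. Query: '...', Internal MariaDB error code: 1213",
    "Oct 19 11:47:46 lucifer mysqld: 131019 11:47:46 [ERROR] Master "
    "'ccmstats_gertrude': Slave SQL: Error 'Got error 10000 'Error on remote "
    "system: 2006: MySQL server has gone away' from FEDERATED' on query. "
    "Default database: 'ccmstats_shard13'. Query: '...', "
    "Internal MariaDB error code: 1296",
]

if __name__ == "__main__":
    for master, code, delay in breaks_after_deadlock(parse_slave_errors(SAMPLE)):
        print(master, code, delay)
```

Run over a longer period of logs, a consistently short delay between a 1213 and the following break on the same connection would support the deadlock theory; breaks with no preceding deadlock would argue against it.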
| Comment by VAROQUI Stephane [ 2013-10-24 ] | ||||||||||
|
More input found in various error logs. We also sporadically get the following error on some MariaDB 10 slaves that have the federated tables:

mysqldump: Couldn't execute 'FLUSH /*!40101 LOCAL */ TABLES': Got an error writing communication packets (1160)

It could be related to this bug reported on MySQL and to the same issue we are facing. | ||||||||||
| Comment by VAROQUI Stephane [ 2013-10-30 ] | ||||||||||
|
We have replaced FederatedX with the Spider engine and the issue has shown up again, but replication now stops with different error messages.

One server has:
Last_SQL_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'ccmstats_shard12'. Query: 'replace into `ccmstats_shard12`.`ccmreferers

And another one has:
Last_SQL_Error: Error 'Remote MySQL server has gone away' on query. Default database: 'ccmstats_shard05'. Query: 'replace into `ccmstats_shard05`.`ccmreferers

Another server is fine and its replication has never stopped; the difference is that this server has no load on it. So this indicates that the issue is not on the remote server but is rather caused by activity on the server that holds the linked table.

As a more generic way to fix this: we have slave-skip-errors; can we have slave-retry-errors? | ||||||||||
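
To make the slave-retry-errors request above concrete, here is a minimal sketch of the intended semantics: retry the failing event a bounded number of times when its error code is in a configured list, and stop as usual otherwise. Everything in it is hypothetical (the `ReplicationError` class, `apply_with_retry`, the retry count); it is not how the server's SQL thread is implemented. The codes used are 1213 and 1296 from the log lines earlier in this ticket, plus 1205, the code for 'Lock wait timeout exceeded'.

```python
class ReplicationError(Exception):
    """Hypothetical stand-in for an error raised while applying an event."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code

def apply_with_retry(apply_event, retry_codes, max_retries=3):
    """Call apply_event(); on an error whose code is in retry_codes, try again
    up to max_retries times. Any other error, or running out of retries,
    re-raises so that replication would stop as it does today."""
    attempts = 0
    while True:
        try:
            return apply_event()
        except ReplicationError as err:
            if err.code not in retry_codes or attempts >= max_retries:
                raise
            attempts += 1

if __name__ == "__main__":
    calls = []

    def flaky_event():
        # Fails twice with the FEDERATED wrapper code, then succeeds.
        calls.append(1)
        if len(calls) < 3:
            raise ReplicationError(1296, "MySQL server has gone away")
        return "applied"

    print(apply_with_retry(flaky_event, retry_codes={1205, 1213, 1296}))
    print(len(calls), "attempts")
```

A retry of this kind only makes sense when the failed attempt left nothing behind, which is worth keeping in mind for statements that touch a remote table through a trigger.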
| Comment by VAROQUI Stephane [ 2013-12-19 ] | ||||||||||
|
Fixed: that was an issue in the trigger code. | ||||||||||
| Comment by Elena Stepanova [ 2014-10-17 ] | ||||||||||
|
I added a test case to |