[MDEV-4452] Problem with FederatedX between two local MariaDB servers Created: 2013-04-29  Updated: 2022-10-31  Resolved: 2022-10-31

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Federated
Affects Version/s: 5.5.30, 5.5.40, 10.0.14
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Hank Eskin (Inactive) Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 1
Labels: federated
Environment:

CentOS 5.0 64-bit on VMWare ESxi 5


Issue Links:
Duplicate
duplicates MDEV-11276 Mulltiple Insert into Federated Table... Closed
duplicates MDEV-17651 FederatedX Table use FLUSH TABLE to s... Closed
is duplicated by MDEV-5165 Duplicate MDEV-4452 Closed

 Description   

I spoke with Sergei at Percona about this issue. I have two MariaDB 5.5.30 (on CentOS 5) servers set up as master-slave (although that should matter, as this is not a replication issue).

Here is my setup. On the slave, I have a federated table linking back to the master. Also on the slave, I have a batch reporting process that inserts records into a local (slave) table which matches the table definition of the table on the master. When the process is complete, I do a bulk insert like this:

insert into <federated_table> select * from <local_table>;

The records are then loaded into the server table, which then flow back to the slave (as well as other slaves) through normal replication.

The problem is that one time in about 20 (about 5% of the time), I receive the following error on the slave:

Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED

Of course, the master is running fine and never hangs or "goes away". When this error occurs, I have my slave reporting code immediately retry the insert query, and it usually works without failure the second time. The two servers are right next to each other and operate on an internal network (10.0.0.x) with just one HP gigabit switch between them. They are both VMWare VMs running in VMWare ESXi 5.

The master server is a development server, so it is not under any load or contention.

I'd like to try to find out some more detailed error logging on the slave when this happens with Federated tables or what could be causing this issue.

Many thanks.



 Comments   
Comment by Hank Eskin (Inactive) [ 2013-04-29 ]

Just to be clear, there are four tables:

Master: results
Slave: results (the slave's replicated version of "results" on the master)
Slave: results_local (a slave local table used for reporting output)
Slave results_fed (federated table linking to table "results" on the master)

The reporting process populates "results_local" on the slave. The problematic insert statement on the slave is:

insert into results_fed select * from results_local;

Hope this clears things up.

Comment by Elena Stepanova [ 2013-05-01 ]

Please ignore my previous question if you received it (deleted now), the summary of the report already answers it.

Comment by Hank Eskin (Inactive) [ 2013-08-16 ]

I would like to add that along the same lines, I am also experiencing another similar problem with FEDERATEDX. I have code that deletes all records for a primary key in a federated table (between the same two servers mentioned above), and then inserts new records replacing those deleted records.

Again, about 5% of the time, I will get an error "Can't write; duplicate key in table <table>' when I know for sure that there are absolutely no duplicate records in the existing table or the source records. The source records are coming from another table with the same primary key, so there can not be duplicate records. If I try the same insert again, it usually works. So something on the original table side is triggering this error, but the Federated engine can't interpret and transmit the error correctly to the federated table database.

Just to eliminate the delete, in my code I now delete the primary key records on the "master" table directly (not through FEDERATED), and then insert into the federated table and the same problem occurs. So the problem is definitely on the federated table insert.

Comment by Elena Stepanova [ 2014-10-17 ]

Eureka (18 months too late).

Since we are talking about some regular analytic/reporting activity, it is probably scheduled and isn't very frequent. Then, it might be that the batch reporting process which does local inserts runs at such time when it finishes close to the moment when the existing federated connection is about to timeout. Sometimes (like, in 5% cases) the process takes a bit longer and the federated connection actually times out. And then we hit the problem – when the remote connection disappeared, the next INSERT ends with either "server has gone away" or with "error reading communication packets", see the test case below.

Test case

source suite/federated/have_federatedx.inc;
 
create table t_remote (a int primary key, b varchar(8));
 
eval create table t_local (a int primary key, b varchar(8))
  engine=federated
  connection='mysql://root@127.0.0.1:$MASTER_MYPORT/test/t_remote';
 
set @wait_timeout.saved = @@global.wait_timeout;
set global wait_timeout = 1;
set wait_timeout = 28800;
 
echo # This one should work;
insert into t_local values (1,'foo');
 
sleep 2;
--error ER_GET_ERRMSG,ER_NET_READ_ERROR
insert into t_local values (2,'bar');
echo # Got error $mysql_errno ($mysql_errname)
echo # This should work again;
insert into t_local values (2,'bar');
 
drop table t_remote, t_local;
 
set global wait_timeout = @wait_timeout.saved;

Comment by Aleksey Midenkov [ 2018-05-03 ]

Another use case with UPDATE:

create or replace table n1 (x int);
create or replace table nf engine=FEDERATED connection='mysql://root@127.0.0.1:$MASTER_MYPORT/test/n1';
insert into n1 values (1);
update nf, n1 set nf.x= 2;
ERROR 1159 (08S01): Got timeout reading communication packets

Comment by Su, Jun-Ming [ 2018-11-12 ]

I read this issue and I only got error Got an error reading communication packets, no MySQL connection flew away message in error log or application log, and once I flush table for foreign tables first, all error will be gone. Please consider this issue MDEV-17651 is different with this issue make duplicate.

If there is something help to solve it, please let me know and I will provide as full as possible.

Comment by Sergei Golubchik [ 2022-10-31 ]

Assuming the "eureka" comment correctly describes the reason of this old issue, it's not a bug. With sufficiently slow queries FederatedX connection can timeout, and to fix that one needs to increase the timeout accordingly.

Generated at Thu Feb 08 06:56:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.