[MDEV-17651] FederatedX Table use FLUSH TABLE to solve the problem "Got an error reading communication packets" Created: 2018-11-09  Updated: 2018-11-12  Resolved: 2018-11-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Federated
Affects Version/s: 10.1.28
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Su, Jun-Ming Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Windows Server 2012 R2


Issue Links:
Duplicate
is duplicated by MDEV-4452 Problem with FederatedX between two l... Closed

 Description   

The following is the scenario for this problem.

Server A <-> Server B

Server A uses the FederatedX plugin to connect to a table on Server B (called DB.TableA).

Because processes sometimes fail with the error "Got an error reading communication packets", I set up an event that queries a table on Server B (called DB.TableB) every hour. The errors still occur, both for this event and for the original failing processes.

I checked the network and moved to a better network environment, but the errors still occurred.

I raised max_allowed_packet to 1G on both Server A and Server B and raised the timeouts (net_read_timeout and net_write_timeout, to 300 and 600 seconds), but the errors still occurred.

The failed processes run for only between 1 second and 1 minute, and what they do is write data from Server A to Server B.

A few months later, the number of processes failing with this error is increasing, so I need to find a solution soon.
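For reference, the settings described above could be applied at runtime roughly as follows. This is only a sketch: it assumes that 300 seconds applies to net_read_timeout and 600 seconds to net_write_timeout, and that the same values are also written to the server configuration file so they survive a restart.

```sql
-- Run on both Server A and Server B (requires the SUPER privilege).
-- Assumption: 300 s is net_read_timeout and 600 s is net_write_timeout.
SET GLOBAL max_allowed_packet = 1073741824;  -- 1G
SET GLOBAL net_read_timeout   = 300;
SET GLOBAL net_write_timeout  = 600;

-- Verify the global values. SET GLOBAL only affects connections
-- opened after the change; existing sessions keep their old values.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('max_allowed_packet', 'net_read_timeout', 'net_write_timeout');
```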

I then found this issue, which says that FLUSH TABLE ServerB.DB.TableA/B needs to be run periodically to solve this problem. Since I started running FLUSH TABLE every hour, there have been no errors (neither for the event that queries Server B's DB.TableB nor for the processes that write to Server B's DB.TableA).

Is it still necessary to run FLUSH TABLE periodically, or could the FederatedX code be changed to do FLUSH TABLE automatically?
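An hourly flush like the one described could be scheduled with an event along these lines. This is a sketch under assumptions: the event name flush_federated_tables is hypothetical, the FederatedX tables on Server A are assumed to be named DB.TableA and DB.TableB locally, and it assumes FLUSH TABLES is accepted in an event body on the version in use.

```sql
-- Run on Server A, where the FederatedX tables are defined.
-- Assumption: the local FederatedX tables are DB.TableA and DB.TableB;
-- flush_federated_tables is a hypothetical event name.
SET GLOBAL event_scheduler = ON;

CREATE EVENT DB.flush_federated_tables
  ON SCHEDULE EVERY 1 HOUR
  DO FLUSH TABLES DB.TableA, DB.TableB;
```

Flushing closes the open FederatedX table handles, so the next statement that uses these tables opens them again.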



 Comments   
Comment by Elena Stepanova [ 2018-11-11 ]

You could try to increase wait_timeout value and see if it helps.
See a comment about it here: Comment to MDEV-4452.

Comment by Su, Jun-Ming [ 2018-11-11 ]

Thanks for your response.

wait_timeout on both Server A and Server B is set to 28800 seconds (the default value, 8 hours), and the hourly event still got errors.

Comment by Elena Stepanova [ 2018-11-11 ]

If I understand your setup correctly, wait_timeout matters on Server B, the one where the non-Federated table is located.
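A minimal sketch of how this could be checked on Server B; the value 86400 is an arbitrary illustration, not a recommendation from this thread.

```sql
-- Run on Server B, the server that holds the real (non-Federated) table.
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';

-- Example only: 86400 (24 hours) is an arbitrary illustrative value.
-- The new global value applies to connections opened after the change.
SET GLOBAL wait_timeout = 86400;

-- Aborted_clients counts connections that ended without a proper close,
-- so comparing it before and after a failure can help correlate errors.
SHOW GLOBAL STATUS LIKE 'Aborted_clients';
```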

Comment by Su, Jun-Ming [ 2018-11-12 ]

Yes, you are right: Server B is the one where the non-Federated table is located. I have edited my comment accordingly, thanks.

Comment by Elena Stepanova [ 2018-11-12 ]

I suppose there can be other reasons for disconnect, like network problems or what not. Anyway, whatever the cause of the lost connection is, the problem seems pretty much the same as MDEV-4452 – after it happens, the next statement ends with ER_NET_READ_ERROR or ER_GET_ERRMSG, and only after that the connection gets re-established.

Thus, we'll continue tracking it within the scope of MDEV-4452.

Comment by Su, Jun-Ming [ 2018-11-12 ]

I read the issue you posted earlier, but I only get the error "Got an error reading communication packets"; there is no "MySQL connection flew away" message in the error log or the application log, and once I flush the foreign tables first, all the errors go away. Please consider that this issue is different from the issue it has been marked as a duplicate of.

If there is anything that would help to solve it, please let me know and I will provide as much information as possible.

Comment by Elena Stepanova [ 2018-11-12 ]

MDEV-4452 says that it is one error OR another, not both at once, so your getting only one of them doesn't contradict anything.
And sure, if you can provide a reliably reproducible test case in MTR format for your variation of the failure, and it is different from MDEV-4452, we will reconsider it.

Generated at Thu Feb 08 08:38:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.