[MDEV-8313] Got an error writing communication packets - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5.41
Fix Version/s: 5.5.47, 10.0.23, 10.1.10
Component/s: Admin statements, Storage Engine - Federated
Labels:
- federatedx
Environment:
CentOS 7 stock install

Description

From time to time, on various systems and in various processes (in this case even the backup), we get this error. An example from mysqldump:

Dumping MySQL database adwords ..
.. dump failed! mysqldump: Couldn't execute 'show table status like 'Nextag_Products_Bids_Working'': Got an error writing communication packets (1160)

This happened right in the middle of dumping the database. None of these failures make any sense: the database tables are local and communication is via socket. Often the error occurs near the beginning of a program: it has opened the database, maybe read something like the date from MySQL, and then immediately does something that fails with this error. This is the second time this week for the mysqldump error. No message is logged in the MariaDB error log. Here is the my.cnf file:

[mysqld]
datadir=/home/mysql
socket=/var/lib/mysql/mysql.sock
tmpdir=/home/mysqltemp
user=mysql
log-bin=/var/lib/mysqllogs/binlog
expire_logs_days = 3
sync_binlog=0
server-id = 108
collation-server=latin1_general_cs
group_concat_max_len = 2M
symbolic-links=0
wait_timeout = 14400
connect_timeout = 50
max_heap_table_size = 256M
tmp_table_size = 256M
max_allowed_packet = 64M
max_connect_errors = 50
innodb_stats_sample_pages=12
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_buffer_pool_size= 8G

# MyISAM tuning
key_buffer_size=512M
myisam_sort_buffer_size = 64M
join_buffer_size = 512K
bulk_insert_buffer_size = 512M
read_rnd_buffer_size = 1M
innodb_flush_method = O_DIRECT
default-storage-engine = Innodb
net_read_timeout = 600
net_write_timeout = 600
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
# myisam_use_mmap

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

!includedir /etc/my.cnf.d

There are no settings in my.cnf.d.

Not much is going on while the backup runs; the machine is nearly idle.

Issue Links

links to

Bug #66184 - mysqldump: Got an error writing communication packets, Error_code: 1160

Activity

Elena Stepanova added a comment - 2015-06-17 14:17

Thanks for the info. I agree, 4 days might be a lucky coincidence, let's wait.

Steve Fatula added a comment - 2015-06-24 08:51

Still no recurrence, so the workaround, based on understanding what the bug actually was, has likely worked. This would more or less support the theory proposed upstream. Thanks for re-posting the upstream cases so I could read them carefully and come up with a workaround.

Elena Stepanova added a comment - 2015-06-24 23:01

Thanks for the info. Do you remember in which upstream report you found the theory about table_open_cache, so I could link this report to the proper upstream entry? I found a couple (https://bugs.mysql.com/bug.php?id=61790 and http://bugs.mysql.com/bug.php?id=51196), but they seem to relate to federated tables only.

Steve Fatula added a comment - 2015-06-25 01:46

The reference I found to the table cache was here: https://bugs.mysql.com/bug.php?id=61790

That one is marked as a duplicate, but the chain should be there somewhere. Specifically, see the comment from Alexey Kopytov on July 20 2011 at 15:29.

The problem is that it appears to be a duplicate of https://bugs.mysql.com/bug.php?id=51196, which is closed. But that report only mentions 5.0 and 5.1 and was fixed in 2012. Clearly, it is not fixed in the MariaDB version, at least. But 61790 was the tip-off I needed, as it made sense based on what I was seeing.

Elena Stepanova added a comment - 2015-08-02 19:41 - edited

I've been trying to reproduce the problem as it's described in Alexey Kopytov's comment of 20 Jul 2011, but something important must be missing there; and sadly the fix came without a test case.

The way it's put there, it should be very easy to repeat:

open a federated table;
wait until its connection to the remote server times out;
open enough tables to get the federated one evicted from the cache;
observe the error.
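
The four steps above can be pictured with a small toy model. This is only a sketch of the theory from the upstream comment, not server code: Session, CachedTable, TableCache and reproduce_steps are hypothetical names that do not exist in MariaDB or MySQL, the cache is a plain FIFO of fixed capacity, and the timed-out remote connection is reduced to a boolean flag set when the table is opened.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Toy model only: none of these types exist in MariaDB or MySQL.
struct Session {
    std::string error;  // last error reported to this client; empty means none
};

struct CachedTable {
    std::string name;
    bool remote_link_dead;  // stands in for "remote connection has timed out"
};

class TableCache {
public:
    explicit TableCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Open a table on behalf of `session`. If the cache is full, the oldest
    // entry is evicted and closed first. Any error raised by that close is
    // recorded on the session that triggered the eviction, even though that
    // session never touched the evicted table.
    void open(Session& session, const std::string& name,
              bool remote_link_dead = false) {
        if (tables_.size() == capacity_) {
            CachedTable victim = tables_.front();
            tables_.pop_front();
            close_table(session, victim);
        }
        tables_.push_back(CachedTable{name, remote_link_dead});
    }

private:
    // Closing a table whose remote link is dead fails, and the failure is
    // attributed to whichever session happens to be running.
    static void close_table(Session& session, const CachedTable& table) {
        if (table.remote_link_dead) {
            session.error = "Got an error writing communication packets";
        }
    }

    std::size_t capacity_;
    std::list<CachedTable> tables_;
};

// Walks through the four steps and returns the error seen by a client that
// only ever opened local tables.
std::string reproduce_steps() {
    TableCache cache(2);
    Session federated_user;
    Session dump_client;
    cache.open(federated_user, "fed_t1", true);  // steps 1-2: federated table, link already dead
    cache.open(dump_client, "local_a");
    cache.open(dump_client, "local_b");          // step 3: cache is full, fed_t1 is evicted
    return dump_client.error;                    // step 4: error lands on the unrelated client
}
```

In this model, calling reproduce_steps() returns the 1160 message text for a client that never used the federated table, which is the symptom described for mysqldump. Whether the real server behaves this way is exactly what the steps are meant to test.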

The fix was adding this to ha_federated::close:

=== modified file 'storage/federated/ha_federated.cc'
--- storage/federated/ha_federated.cc	2011-06-30 15:37:13 +0000
+++ storage/federated/ha_federated.cc	2011-12-23 14:52:44 +0000
@@ -1651,6 +1651,16 @@
   mysql_close(mysql);
   mysql= NULL;
+  /*
+    mysql_close() might return an error if a remote server's gone
+    for some reason. If that happens while removing a table from
+    the table cache, the error will be propagated to a client even
+    if the original query was not issued against the FEDERATED table.
+    So, don't propagate errors from mysql_close().
+  */
+  if (table->in_use)
+    table->in_use->clear_error();
+
   DBUG_RETURN(free_share(share));
 }

I can do the first three points, I get to ha_federated::close, but I don't get through this code, because the table is not in use – the flag gets unset as soon as the statement involving the federated table was finished. So, the table must be in active use at the moment? But if it's in use, then how can the connection expire, and if it's in active use, how can it be evicted from the cache?
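
The observation about the guard can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the server's code: Thd and FedTable are hypothetical stand-ins for the real session and table structures, and the sketch simply assumes that a failed remote close raises an error on the currently running session.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins; these are not MariaDB's THD or TABLE types.
struct Thd {
    std::string error;  // empty means no error is pending
    void clear_error() { error.clear(); }
};

struct FedTable {
    Thd* in_use;  // null once the statement that used the table has finished
};

// Models the close path from the patch: the remote close fails and raises an
// error on the session that is currently running (an assumption of this
// sketch), then the guarded cleanup runs. Returns the error still pending on
// `current` afterwards.
std::string close_with_guard(Thd& current, FedTable& table) {
    current.error = "Got an error writing communication packets";  // simulated failure
    if (table.in_use)
        table.in_use->clear_error();
    return current.error;
}

// Usage example covering both cases.
void demo() {
    Thd client;
    FedTable idle{nullptr};
    assert(!close_with_guard(client, idle).empty());  // guard skipped, error stays
    FedTable busy{&client};
    assert(close_with_guard(client, busy).empty());   // guard runs, error cleared
}
```

In this model the added lines only have an effect when in_use is still set at close time, which matches the difficulty described above: if the flag is already unset when the table is evicted, the guarded branch is never reached.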

So, all in all, I was not able to reproduce it as the comment describes, even on a pre-fix version (e.g. MySQL 5.1.61).

However, assuming the theory and the fix were correct, apparently it only made it to Federated, but not to FederatedX, while our release packages contain FederatedX. So I suppose if the fix is okay for an expert eye, it needs to be incorporated into FederatedX as well. Assigning to serg to take a look.
