[MDEV-8313] Got an error writing communication packets Created: 2015-06-12  Updated: 2015-12-08  Resolved: 2015-12-08

Status: Closed
Project: MariaDB Server
Component/s: Admin statements, Storage Engine - Federated
Affects Version/s: 5.5.41
Fix Version/s: 5.5.47, 10.0.23, 10.1.10

Type: Bug Priority: Major
Reporter: Steve Fatula Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: federatedx
Environment:

Centos 7 stock install



 Description   

From time to time, in various systems, processes, and in this case even backup, we get the error. An example is from mysqldump: '

Dumping MySQL database adwords ..
.. dump failed! mysqldump: Couldn't execute 'show table status like 'Nextag_Products_Bids_Working'': Got an error writing communication packets (1160)

Right in the middle of dumping the database. None of them make any sense, the database tables are local, communication is via socket. Many times, it's near the beginning of a program, so, it's opened the database, maybe read something like the date from MySQL, and, then immediately does something, which fails with this error. This is the second time this week for the mysqldump error. No message is logged in the mariadb error log. here is the my.cnf file:

[mysqld]
datadir=/home/mysql
socket=/var/lib/mysql/mysql.sock
tmpdir=/home/mysqltemp
user=mysql
log-bin=/var/lib/mysqllogs/binlog
expire_logs_days = 3
sync_binlog=0
server-id = 108
collation-server=latin1_general_cs
group_concat_max_len = 2M
symbolic-links=0
wait_timeout = 14400
connect_timeout = 50
max_heap_table_size = 256M
tmp_table_size = 256M
max_allowed_packet = 64M
max_connect_errors = 50
innodb_stats_sample_pages=12
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_buffer_pool_size= 8G
# MyISAM tuning
key_buffer_size=512M
myisam_sort_buffer_size = 64M
join_buffer_size = 512K
bulk_insert_buffer_size = 512M
read_rnd_buffer_size = 1M
#
innodb_flush_method = O_DIRECT
default-storage-engine = Innodb
net_read_timeout = 600
net_write_timeout = 600
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
# myisam_use_mmap
 
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
 
!includedir /etc/my.cnf.d

No settings are in my.cnf.d

Nothing much is going on during the time the backup runs, machine very idle.



 Comments   
Comment by Elena Stepanova [ 2015-06-13 ]

sfatula,

Thanks for the report.

Do you happen to have FEDERATED tables in your database?
Some upstream issues which describe very similar observations. This one is probably most interesting:
http://bugs.mysql.com/bug.php?id=66184

Even though it's in 'Verified' status, in fact it is only verified in the part that relates to using FEDERATED tables, while other affected users reported that they observe it even though they do not use the engine. So, if you use FEDERATED, it will probably be easier to explain; if you don't, it might require digging deep.

Comment by Steve Fatula [ 2015-06-13 ]

No federated tables in the database. There is a single federated table in one database, but, not this one. All federated tables (all 1) are in their own database which is not backed up. This is confirmed via select* from information_schema.tables where ENGINE = "FEDERATED". I did see the other issue related to federated. Not sure if they use sockets or not. But, it may apply as the claim is backing up a database with no federated table fails even though some other database has a federated table. Is that the way you read that bug report? Sounds like it's not solved though. And, the current status is they claim it only affects backup of information_schema, whereas, mine is certainly not that database. How to proceed?

Comment by Steve Fatula [ 2015-06-13 ]

A similar case mentions the table cache. This might make some sense, so, I increased table_open_cache from the default of 64 to far more than the table count to see if maybe this will avoid the error since no one is on really when the backup is running.

Comment by Elena Stepanova [ 2015-06-13 ]

Not sure if they use sockets or not.

The initial reporter uses sockets, the description says it: "mysqldump is configured to use SOCKETS." Further discussion goes in different directions, it's hard to say who uses what.

But, it may apply as the claim is backing up a database with no federated table fails even though some other database has a federated table. Is that the way you read that bug report?

Unless I missed something, I would sum up the report this way:

  • people see this error when they do a backup, even when the database to backup is not information_schema and does not contain FEDERATED tables; but nobody said they do not have FEDERATED tables at all;
  • the only part that the verification team was able to reproduce so far was getting the error while running backup on information_schema, and there are some FEDERATED tables in some databases.

Sounds like it's not solved though.

Right, it does not look like it's solved. Bug status in MySQL's public bug base might not always be updated timely, but given that it's a vague issue which allegedly involves FEDERATED, I highly doubt it was fixed.

And, the current status is they claim it only affects backup of information_schema, whereas, mine is certainly not that database. How to proceed?

As far as I remember, mysqldump does a lot of weird stuff behind the scenes, so I will not be overly surprised if in some circumstances it is affected by tables in a different schema. We will need to look deeper into it.

How to proceed?

If possible, it could be helpful to enable general log until the next time the problem occurs, and also increase the warning level to >=2 (so that the error log contains aborted connections). Also, if you can set up a periodic monitor for SHOW FULL PROCESSLIST, it might be useful. I can't promise that when we have this data, we will certainly figure out what goes wrong, but matching these three sources might give us some clues.
Of course, you should be careful with general log in a busy environment, since it can cause performance degradation and require a lot of disk space, and only enable it if it's safe enough in your setup.

Also, please paste your exact mysqldump command line.

Comment by Steve Fatula [ 2015-06-17 ]

Ok, just so you know I am not ignoring your message. Thus far, based on another upstream case which mentioned the true cause to be a bug in open tables, I had increased the table_open_cache setting such that no table should need to be closed during a backup on this system. And, 4 days in a row, perfect backup. So, monitoring it at this time. If the problem does not re-occur, I don't wish to pursue it further as eventually, it sounds like they will fix it. But, 4 days, while unusual lately, is not confirmed yet in my mind.

Comment by Elena Stepanova [ 2015-06-17 ]

Thanks for the info. I agree, 4 days might be a lucky coincidence, lets wait.

Comment by Steve Fatula [ 2015-06-24 ]

So, still no re-occurrence. So, the workaround based on understand what the bug actually was has likely worked. This would sort of prove the theory they proposed in upstream. Thanks for re-posting the upstream cases so I could carefully read them and come up with a workaround.

Comment by Elena Stepanova [ 2015-06-24 ]

Thanks for the info. Do you remember in which upstream report you found the theory about table_open_cache, so I could link this report to the proper upstream entry? I found a couple (https://bugs.mysql.com/bug.php?id=61790 and http://bugs.mysql.com/bug.php?id=51196), but they seem to relate to federated tables only.

Comment by Steve Fatula [ 2015-06-25 ]

The reference I found to the table cache was here: https://bugs.mysql.com/bug.php?id=61790

Now, that one is marked as duplicate, but, the chain should be there somewhere. Specifically, comment from Alexey Kopytov on July 20 2011 at 15:29

The problem is it's a duplicate (it appears to me) of https://bugs.mysql.com/bug.php?id=51196, which is closed. But that only mentions 5.0 and 5.1 and was fixed in 2012. Clearly, it's not fixed at least in Mariadb version. But the 61790 was the tipoff I needed as it made sense based on what I was seeing.

Comment by Elena Stepanova [ 2015-08-02 ]

I've been trying to reproduce the problem as it's described in Alexey Kopytov's comment of 20 Jul 2011, but something important must be missing there; and sadly the fix came without a test case.

The way it's put there, it should be very easy to repeat:

  • open a federated table;
  • wait until its connection to the remote server times out;
  • open enough tables to get the federated one be evicted from the cache;
  • observe the error.

The fix was adding this to ha_federated::close:

=== modified file 'storage/federated/ha_federated.cc'
--- storage/federated/ha_federated.cc	2011-06-30 15:37:13 +0000
+++ storage/federated/ha_federated.cc	2011-12-23 14:52:44 +0000
@@ -1651,6 +1651,16 @@
   mysql_close(mysql);
   mysql= NULL;
 
+  /*
+    mysql_close() might return an error if a remote server's gone
+    for some reason. If that happens while removing a table from
+    the table cache, the error will be propagated to a client even
+    if the original query was not issued against the FEDERATED table.
+    So, don't propagate errors from mysql_close().
+  */
+  if (table->in_use)
+    table->in_use->clear_error();
+
   DBUG_RETURN(free_share(share));
 }
 

I can do the first three points, I get to ha_federated::close, but I don't get through this code, because the table is not in use – the flag gets unset as soon as the statement involving the federated table was finished. So, the table must be in active use at the moment? But if it's in use, then how can the connection expire, and if it's in active use, how can it be evicted from the cache?

So, all in all, I was not able to reproduce it as the comment describes, even on a pre-fix version (e.g. MySQL 5.1.61).

However, assuming the theory and the fix were correct, apparently it only made it to Federated, but not to FederatedX, while our release packages contain FederatedX. So I suppose if the fix is okay for an expert eye, it needs to be incorporated into FederatedX as well. Assigning to serg to take a look.

Generated at Thu Feb 08 07:26:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.