Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
5.5.41
-
Centos 7 stock install
Description
From time to time, in various systems and processes, and in this case even during a backup, we get error 1160 ("Got an error writing communication packets"). An example from mysqldump:
Dumping MySQL database adwords ..
.. dump failed! mysqldump: Couldn't execute 'show table status like 'Nextag_Products_Bids_Working'': Got an error writing communication packets (1160)
This happened right in the middle of dumping the database. None of these failures make sense: the database tables are local and communication is via a socket. Often the error occurs near the beginning of a program: it has opened the database, maybe read something like the date from MySQL, and then the very next operation fails with this error. This is the second time this week that mysqldump has failed this way. No message is logged in the MariaDB error log. Here is the my.cnf file:
[mysqld]
datadir=/home/mysql
socket=/var/lib/mysql/mysql.sock
tmpdir=/home/mysqltemp
user=mysql
log-bin=/var/lib/mysqllogs/binlog
expire_logs_days = 3
sync_binlog=0
server-id = 108
collation-server=latin1_general_cs
group_concat_max_len = 2M
symbolic-links=0
wait_timeout = 14400
connect_timeout = 50
max_heap_table_size = 256M
tmp_table_size = 256M
max_allowed_packet = 64M
max_connect_errors = 50
innodb_stats_sample_pages=12
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_buffer_pool_size= 8G
# MyISAM tuning
key_buffer_size=512M
myisam_sort_buffer_size = 64M
join_buffer_size = 512K
bulk_insert_buffer_size = 512M
read_rnd_buffer_size = 1M
#
innodb_flush_method = O_DIRECT
default-storage-engine = Innodb
net_read_timeout = 600
net_write_timeout = 600
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
# myisam_use_mmap

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

!includedir /etc/my.cnf.d
There are no settings in my.cnf.d.
Not much is going on while the backup runs; the machine is very idle.
I've been trying to reproduce the problem as it's described in Alexey Kopytov's comment of 20 Jul 2011, but something important must be missing there; and sadly the fix came without a test case.
The way it's put there, it should be very easy to repeat:
The fix was adding this to ha_federated::close:
=== modified file 'storage/federated/ha_federated.cc'
--- storage/federated/ha_federated.cc 2011-06-30 15:37:13 +0000
+++ storage/federated/ha_federated.cc 2011-12-23 14:52:44 +0000
@@ -1651,6 +1651,16 @@
   mysql_close(mysql);
   mysql= NULL;
+  /*
+    mysql_close() might return an error if a remote server's gone
+    for some reason. If that happens while removing a table from
+    the table cache, the error will be propagated to a client even
+    if the original query was not issued against the FEDERATED table.
+    So, don't propagate errors from mysql_close().
+  */
+  if (table->in_use)
+    table->in_use->clear_error();
+
   DBUG_RETURN(free_share(share));
 }
I can do the first three points and I get to ha_federated::close, but I don't get through this code, because the table is not in use: the flag gets unset as soon as the statement involving the federated table has finished. So must the table be in active use at that moment? But if it's in use, how can the connection expire, and how can the table be evicted from the cache?
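To make the question concrete, here is a toy model of the close path, written as a sketch and not taken from the server source. Session and Table are hypothetical stand-ins for the server's THD and TABLE objects, and close_remote stands in for mysql_close() failing on a dead remote link. The assumption, labeled in the comments, is that the error lands on whichever session happens to be running when the table is closed.

```cpp
#include <string>

// Hypothetical stand-in for the per-connection session object (THD in the server).
struct Session {
  bool has_error = false;
  std::string message;
  void set_error(const std::string &msg) { has_error = true; message = msg; }
  void clear_error() { has_error = false; message.clear(); }
};

// Hypothetical stand-in for an open table.
// in_use is null when no statement is currently using the table.
struct Table {
  Session *in_use = nullptr;
};

// Models mysql_close() on a dead remote connection. Assumption: the error is
// recorded on the session that is running right now, even if its own
// statement never touched the federated table.
void close_remote(Session &current, bool remote_gone) {
  if (remote_gone)
    current.set_error("Got an error writing communication packets");
}

// Models the handler close path, with or without the guard from the diff.
void close_table(Table &table, Session &current, bool remote_gone, bool with_fix) {
  close_remote(current, remote_gone);
  // Same shape as the patch: the error is cleared only if in_use is set.
  if (with_fix && table.in_use)
    table.in_use->clear_error();
}
```

In this model, the unpatched path leaves the error on the running session, the patched path clears it when in_use points at that session, and the patched path changes nothing when in_use is null. The last case is the one I keep hitting.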
So, all in all, I was not able to reproduce it as the comment describes, even on a pre-fix version (e.g. MySQL 5.1.61).
However, assuming the theory and the fix were correct, the fix apparently only made it into Federated and not into FederatedX, while our release packages contain FederatedX. So I suppose that if the fix looks okay to an expert eye, it needs to be incorporated into FederatedX as well. Assigning to serg to take a look.