  1. MariaDB Server
  2. MDEV-8313

Got an error writing communication packets

Details

    Description

      From time to time, in various systems and processes, and in this case even a backup, we get this error. An example from mysqldump:

      Dumping MySQL database adwords ..
      .. dump failed! mysqldump: Couldn't execute 'show table status like 'Nextag_Products_Bids_Working'': Got an error writing communication packets (1160)

      This happened right in the middle of dumping the database. None of these failures make any sense: the database tables are local and communication is via socket. Many times it happens near the beginning of a program, which has opened the database, maybe read something like the date from MySQL, and then immediately does something that fails with this error. This is the second time this week for the mysqldump error. No message is logged in the MariaDB error log. Here is the my.cnf file:

      [mysqld]
      datadir=/home/mysql
      socket=/var/lib/mysql/mysql.sock
      tmpdir=/home/mysqltemp
      user=mysql
      log-bin=/var/lib/mysqllogs/binlog
      expire_logs_days = 3
      sync_binlog=0
      server-id = 108
      collation-server=latin1_general_cs
      group_concat_max_len = 2M
      symbolic-links=0
      wait_timeout = 14400
      connect_timeout = 50
      max_heap_table_size = 256M
      tmp_table_size = 256M
      max_allowed_packet = 64M
      max_connect_errors = 50
      innodb_stats_sample_pages=12
      innodb_file_per_table = 1
      innodb_flush_log_at_trx_commit = 0
      innodb_log_buffer_size = 8M
      innodb_log_file_size = 512M
      innodb_buffer_pool_size= 8G
      # MyISAM tuning
      key_buffer_size=512M
      myisam_sort_buffer_size = 64M
      join_buffer_size = 512K
      bulk_insert_buffer_size = 512M
      read_rnd_buffer_size = 1M
      #
      innodb_flush_method = O_DIRECT
      default-storage-engine = Innodb
      net_read_timeout = 600
      net_write_timeout = 600
      log-error=/var/log/mariadb/mariadb.log
      pid-file=/var/run/mariadb/mariadb.pid
      # myisam_use_mmap
       
      [mysqld_safe]
      log-error=/var/log/mariadb/mariadb.log
      pid-file=/var/run/mariadb/mariadb.pid
       
      !includedir /etc/my.cnf.d

      No settings are in my.cnf.d

      Nothing much is going on while the backup runs; the machine is very idle.

        Activity

          elenst Elena Stepanova added a comment - Thanks for the info. I agree, 4 days might be a lucky coincidence, let's wait.
          sfatula Steve Fatula added a comment -

          So, still no recurrence. The workaround, based on understanding what the bug actually was, has likely worked. This would more or less confirm the theory proposed upstream. Thanks for re-posting the upstream cases so I could carefully read them and come up with a workaround.


          elenst Elena Stepanova added a comment - Thanks for the info. Do you remember in which upstream report you found the theory about table_open_cache, so I could link this report to the proper upstream entry? I found a couple ( https://bugs.mysql.com/bug.php?id=61790 and http://bugs.mysql.com/bug.php?id=51196 ), but they seem to relate to federated tables only.
          sfatula Steve Fatula added a comment -

          The reference I found to the table cache was here: https://bugs.mysql.com/bug.php?id=61790

          That one is marked as a duplicate, but the chain should be there somewhere. Specifically, see the comment from Alexey Kopytov on July 20 2011 at 15:29.

          The problem is that it's a duplicate (it appears to me) of https://bugs.mysql.com/bug.php?id=51196, which is closed. But that only mentions 5.0 and 5.1 and was fixed in 2012. Clearly, it's not fixed, at least in the MariaDB version. But 61790 was the tip-off I needed, as it made sense based on what I was seeing.

          elenst Elena Stepanova added a comment - - edited

          I've been trying to reproduce the problem as it's described in Alexey Kopytov's comment of 20 Jul 2011, but something important must be missing there; and sadly the fix came without a test case.

          The way it's put there, it should be very easy to repeat:

          • open a federated table;
          • wait until its connection to the remote server times out;
          • open enough tables to get the federated one evicted from the cache;
          • observe the error.

          The fix was adding this to ha_federated::close:

          === modified file 'storage/federated/ha_federated.cc'
          --- storage/federated/ha_federated.cc	2011-06-30 15:37:13 +0000
          +++ storage/federated/ha_federated.cc	2011-12-23 14:52:44 +0000
          @@ -1651,6 +1651,16 @@
             mysql_close(mysql);
             mysql= NULL;
           
          +  /*
          +    mysql_close() might return an error if a remote server's gone
          +    for some reason. If that happens while removing a table from
          +    the table cache, the error will be propagated to a client even
          +    if the original query was not issued against the FEDERATED table.
          +    So, don't propagate errors from mysql_close().
          +  */
          +  if (table->in_use)
          +    table->in_use->clear_error();
          +
             DBUG_RETURN(free_share(share));
           }
           

          I can do the first three points, I get to ha_federated::close, but I don't get through this code, because the table is not in use – the flag gets unset as soon as the statement involving the federated table was finished. So, the table must be in active use at the moment? But if it's in use, then how can the connection expire, and if it's in active use, how can it be evicted from the cache?
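To make the question above concrete, here is a toy model of the mechanism as the upstream comment and the patch describe it. It is only a sketch under stated assumptions, not server code: `Session`, `Table`, `close_table`, and `statement_fails` are hypothetical names, and the model simply assumes that a failed close records its error on the session whose statement triggered the eviction.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a client session (hypothetical, not the server's THD).
struct Session {
  std::string error;               // empty means "no error pending"
  void clear_error() { error.clear(); }
};

// Toy stand-in for an open federated table instance (hypothetical).
struct Table {
  Session *in_use = nullptr;       // session currently using the table, if any
  bool remote_gone = false;        // simulates a timed-out remote connection
};

// Models closing the table on behalf of `current`, the session whose
// unrelated statement caused the cache eviction. ASSUMPTION: a failed
// close records its error on `current`.
void close_table(Table &t, Session &current, bool with_fix) {
  if (t.remote_gone)
    current.error = "Got an error writing communication packets";
  // Mirrors the guard in the patch: the cleanup runs only if in_use is set.
  if (with_fix && t.in_use)
    t.in_use->clear_error();
}

// Runs one scenario and reports whether the unrelated statement
// ends up with a pending error.
bool statement_fails(bool with_fix, bool table_in_use) {
  Session s;
  Table t;
  t.remote_gone = true;
  t.in_use = table_in_use ? &s : nullptr;
  close_table(t, s, with_fix);
  return !s.error.empty();
}
```

In this model, `statement_fails(false, true)` is true (no fix, the error leaks), `statement_fails(true, true)` is false (the guard runs and clears it), and `statement_fails(true, false)` is true again: with `in_use` unset the guarded cleanup is skipped, which is the case observed in the reproduction attempt.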

          So, all in all, I was not able to reproduce it as the comment describes, even on a pre-fix version (e.g. MySQL 5.1.61).

          However, assuming the theory and the fix were correct, apparently it only made it to Federated, but not to FederatedX, while our release packages contain FederatedX. So I suppose if the fix is okay for an expert eye, it needs to be incorporated into FederatedX as well. Assigning to serg to take a look.


          People

            serg Sergei Golubchik
            sfatula Steve Fatula