Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4452

Problem with FederatedX between two local MariaDB servers

Details

    Description

      I spoke with Sergei at Percona about this issue. I have two MariaDB 5.5.30 (on CentOS 5) servers set up as master-slave (although that should matter, as this is not a replication issue).

      Here is my setup. On the slave, I have a federated table linking back to the master. Also on the slave, I have a batch reporting process that inserts records into a local (slave) table which matches the table definition of the table on the master. When the process is complete, I do a bulk insert like this:

      insert into <federated_table> select * from <local_table>;

      The records are then loaded into the server table, which then flow back to the slave (as well as other slaves) through normal replication.

      The problem is that one time in about 20 (about 5% of the time), I receive the following error on the slave:

      Got error 10000 'Error on remote system: 2006: MySQL server has gone away' from FEDERATED

      Of course, the master is running fine and never hangs or "goes away". When this error occurs, I have my slave reporting code immediately retry the insert query, and it usually works without failure the second time. The two servers are right next to each other and operate on an internal network (10.0.0.x) with just one HP gigabit switch between them. They are both VMWare VMs running in VMWare ESXi 5.

      The master server is a development server, so it is not under any load or contention.

      I'd like to try to find out some more detailed error logging on the slave when this happens with Federated tables or what could be causing this issue.

      Many thanks.

      Attachments

        Issue Links

          Activity

            heskin Hank Eskin (Inactive) added a comment - - edited

            Just to be clear, there are four tables:

            Master: results
            Slave: results (the slave's replicated version of "results" on the master)
            Slave: results_local (a slave local table used for reporting output)
            Slave results_fed (federated table linking to table "results" on the master)

            The reporting process populates "results_local" on the slave. The problematic insert statement on the slave is:

            insert into results_fed select * from results_local;

            Hope this clears things up.

            heskin Hank Eskin (Inactive) added a comment - - edited Just to be clear, there are four tables: Master: results Slave: results (the slave's replicated version of "results" on the master) Slave: results_local (a slave local table used for reporting output) Slave results_fed (federated table linking to table "results" on the master) The reporting process populates "results_local" on the slave. The problematic insert statement on the slave is: insert into results_fed select * from results_local; Hope this clears things up.

            Please ignore my previous question if you received it (deleted now), the summary of the report already answers it.

            elenst Elena Stepanova added a comment - Please ignore my previous question if you received it (deleted now), the summary of the report already answers it.

            I would like to add that along the same lines, I am also experiencing another similar problem with FEDERATEDX. I have code that deletes all records for a primary key in a federated table (between the same two servers mentioned above), and then inserts new records replacing those deleted records.

            Again, about 5% of the time, I will get an error "Can't write; duplicate key in table <table>' when I know for sure that there are absolutely no duplicate records in the existing table or the source records. The source records are coming from another table with the same primary key, so there can not be duplicate records. If I try the same insert again, it usually works. So something on the original table side is triggering this error, but the Federated engine can't interpret and transmit the error correctly to the federated table database.

            Just to eliminate the delete, in my code I now delete the primary key records on the "master" table directly (not through FEDERATED), and then insert into the federated table and the same problem occurs. So the problem is definitely on the federated table insert.

            heskin Hank Eskin (Inactive) added a comment - I would like to add that along the same lines, I am also experiencing another similar problem with FEDERATEDX. I have code that deletes all records for a primary key in a federated table (between the same two servers mentioned above), and then inserts new records replacing those deleted records. Again, about 5% of the time, I will get an error "Can't write; duplicate key in table <table>' when I know for sure that there are absolutely no duplicate records in the existing table or the source records. The source records are coming from another table with the same primary key, so there can not be duplicate records. If I try the same insert again, it usually works. So something on the original table side is triggering this error, but the Federated engine can't interpret and transmit the error correctly to the federated table database. Just to eliminate the delete, in my code I now delete the primary key records on the "master" table directly (not through FEDERATED), and then insert into the federated table and the same problem occurs. So the problem is definitely on the federated table insert.

            Eureka (18 months too late).

            Since we are talking about some regular analytic/reporting activity, it is probably scheduled and isn't very frequent. Then, it might be that the batch reporting process which does local inserts runs at such time when it finishes close to the moment when the existing federated connection is about to timeout. Sometimes (like, in 5% cases) the process takes a bit longer and the federated connection actually times out. And then we hit the problem – when the remote connection disappeared, the next INSERT ends with either "server has gone away" or with "error reading communication packets", see the test case below.

            Test case

            source suite/federated/have_federatedx.inc;
             
            create table t_remote (a int primary key, b varchar(8));
             
            eval create table t_local (a int primary key, b varchar(8))
              engine=federated
              connection='mysql://root@127.0.0.1:$MASTER_MYPORT/test/t_remote';
             
            set @wait_timeout.saved = @@global.wait_timeout;
            set global wait_timeout = 1;
            set wait_timeout = 28800;
             
            echo # This one should work;
            insert into t_local values (1,'foo');
             
            sleep 2;
            --error ER_GET_ERRMSG,ER_NET_READ_ERROR
            insert into t_local values (2,'bar');
            echo # Got error $mysql_errno ($mysql_errname)
            echo # This should work again;
            insert into t_local values (2,'bar');
             
            drop table t_remote, t_local;
             
            set global wait_timeout = @wait_timeout.saved;

            elenst Elena Stepanova added a comment - Eureka (18 months too late). Since we are talking about some regular analytic/reporting activity, it is probably scheduled and isn't very frequent. Then, it might be that the batch reporting process which does local inserts runs at such time when it finishes close to the moment when the existing federated connection is about to timeout. Sometimes (like, in 5% cases) the process takes a bit longer and the federated connection actually times out. And then we hit the problem – when the remote connection disappeared, the next INSERT ends with either "server has gone away" or with "error reading communication packets", see the test case below. Test case source suite/federated/have_federatedx.inc;   create table t_remote (a int primary key , b varchar (8));   eval create table t_local (a int primary key , b varchar (8)) engine=federated connection = 'mysql://root@127.0.0.1:$MASTER_MYPORT/test/t_remote' ;   set @wait_timeout.saved = @@ global .wait_timeout; set global wait_timeout = 1; set wait_timeout = 28800;   echo # This one should work ; insert into t_local values (1, 'foo' );   sleep 2; --error ER_GET_ERRMSG,ER_NET_READ_ERROR insert into t_local values (2, 'bar' ); echo # Got error $mysql_errno ($mysql_errname) echo # This should work again; insert into t_local values (2, 'bar' );   drop table t_remote, t_local;   set global wait_timeout = @wait_timeout.saved;
            midenok Aleksey Midenkov added a comment - - edited

            Another use case with UPDATE:

            create or replace table n1 (x int);
            create or replace table nf engine=FEDERATED connection='mysql://root@127.0.0.1:$MASTER_MYPORT/test/n1';
            insert into n1 values (1);
            update nf, n1 set nf.x= 2;
            ERROR 1159 (08S01): Got timeout reading communication packets
            

            midenok Aleksey Midenkov added a comment - - edited Another use case with UPDATE: create or replace table n1 (x int ); create or replace table nf engine=FEDERATED connection = 'mysql://root@127.0.0.1:$MASTER_MYPORT/test/n1' ; insert into n1 values (1); update nf, n1 set nf.x= 2; ERROR 1159 (08S01): Got timeout reading communication packets
            sujunmin Su, Jun-Ming added a comment -

            I read this issue and I only got error Got an error reading communication packets, no MySQL connection flew away message in error log or application log, and once I flush table for foreign tables first, all error will be gone. Please consider this issue MDEV-17651 is different with this issue make duplicate.

            If there is something help to solve it, please let me know and I will provide as full as possible.

            sujunmin Su, Jun-Ming added a comment - I read this issue and I only got error Got an error reading communication packets, no MySQL connection flew away message in error log or application log, and once I flush table for foreign tables first, all error will be gone. Please consider this issue MDEV-17651 is different with this issue make duplicate. If there is something help to solve it, please let me know and I will provide as full as possible.

            Assuming the "eureka" comment correctly describes the reason of this old issue, it's not a bug. With sufficiently slow queries FederatedX connection can timeout, and to fix that one needs to increase the timeout accordingly.

            serg Sergei Golubchik added a comment - Assuming the "eureka" comment correctly describes the reason of this old issue, it's not a bug. With sufficiently slow queries FederatedX connection can timeout, and to fix that one needs to increase the timeout accordingly.

            People

              serg Sergei Golubchik
              heskin Hank Eskin (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.