[MXS-1926] LOAD DATA LOCAL INFILE interrupted by slave shutdown Created: 2018-06-15  Updated: 2018-06-21  Resolved: 2018-06-20

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.2.9
Fix Version/s: 2.2.10

Type: Bug Priority: Critical
Reporter: Zdravelina Sokolovska (Inactive) Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.4; 1 MaxScale Node; 3 Master-Master Galera Nodes


Attachments: File info.log.tar.gz     File maxscale.cnf     File maxscale_reduced.cnf    

 Description   

Issue Summary

Executing a LOAD DATA LOCAL INFILE and shutting down a slave server causes the load to be aborted and the client connection to be closed.
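For context, the following is a minimal sketch (illustrative Python, not MaxScale code) of why a proxy is sensitive to interruptions during LOAD DATA LOCAL INFILE: after the server's LOCAL INFILE request, the client streams the file contents as raw protocol packets and terminates the stream with an empty packet. A proxy sitting in between must track this state so the file bytes are not parsed as SQL, and any unexpected packet injected mid-stream (such as an error from a dying backend) disrupts the load.

```python
def frame_local_infile(data: bytes, seq: int = 2,
                       max_payload: int = 0xFFFFFF - 1) -> list[bytes]:
    """Split file contents into MySQL protocol packets: a 3-byte little-endian
    payload length, a 1-byte sequence id, then the payload. A zero-length
    packet marks the end of the file stream."""
    packets = []
    for offset in range(0, len(data), max_payload):
        chunk = data[offset:offset + max_payload]
        packets.append(len(chunk).to_bytes(3, "little")
                       + bytes([seq & 0xFF]) + chunk)
        seq += 1
    # Empty packet tells the server the file stream is complete.
    packets.append((0).to_bytes(3, "little") + bytes([seq & 0xFF]))
    return packets
```

The sequence numbers and the 16 MiB payload limit follow the MySQL client/server protocol; the function itself is only a didactic model of the client side of the stream.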

Original description

Restarting one node of the cluster closes all connections established through MaxScale to the other nodes.

The node whose mysqld service was restarted is afterwards never reached when a client connects
through MaxScale.

Scenario I
Client connections were established to server1 and server2 through MaxScale.
After restarting mysqld on server2, clients connected to server1 also received
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
and all connections were closed.

Scenario II
Start a table data load through MaxScale.
Since the 'readwritesplit' MaxScale router is used, the data load started on server1, which is
considered the Master by maxctrl at the moment.
Restart mysqld on server3.
The data load to server1 was interrupted and the client connection was closed with
Error: 2013, Lost connection to MySQL server during query, when using table: fact_tick
The load became partial and invalid.

Restarting a slave node is expected not to break operations on the master node or on any other slave node.
In the second scenario, the restarted slave server is expected to resync with the Master via IST.

Scenario I

[root@t4w1 ~]#  maxctrl  list servers
┌─────────┬─────────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server  │ Address         │ Port │ Connections │ State                   │ GTID │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server1 │ 192.168.104.193 │ 3306 │ 151         │ Slave, Synced, Running  │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server2 │ 192.168.104.195 │ 3306 │ 151         │ Slave, Synced, Running  │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server3 │ 192.168.104.196 │ 3306 │ 151         │ Master, Synced, Running │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server4 │ 172.20.2.208    │ 3306 │ 0           │ Running                 │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server5 │ 172.20.2.206    │ 3306 │ 0           │ Running                 │      │
└─────────┴─────────────────┴──────┴─────────────┴─────────────────────────┴──────┘

[root@t4w5 ~]#  service mysql restart
Restarting mysql (via systemctl):                          [  OK  ]
[root@t4w5 ~]#

[root@t4w1 ~]#  maxctrl  list servers
┌─────────┬─────────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server  │ Address         │ Port │ Connections │ State                   │ GTID │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server1 │ 192.168.104.193 │ 3306 │ 0           │ Slave, Synced, Running  │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server2 │ 192.168.104.195 │ 3306 │ 0           │ Slave, Synced, Running  │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server3 │ 192.168.104.196 │ 3306 │ 0           │ Master, Synced, Running │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server4 │ 172.20.2.208    │ 3306 │ 0           │ Running                 │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server5 │ 172.20.2.206    │ 3306 │ 0           │ Running                 │      │
└─────────┴─────────────────┴──────┴─────────────┴─────────────────────────┴──────┘

Logs from MaxScale

2018-06-15 18:14:29   error  : Monitor was unable to connect to server [192.168.104.195]:3306 : "Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 107"
2018-06-15 18:14:29   notice : Server changed state: server2[192.168.104.195:3306]: slave_down. [Slave, Synced, Running] -> [Down]
2018-06-15 18:14:33   notice : Server changed state: server2[192.168.104.195:3306]: slave_up. [Down] -> [Slave, Synced, Running]
 

Scenario II
Start a table data load from the client through MaxScale

[root@t4w1 ~]#  maxctrl list servers
┌─────────┬─────────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server  │ Address         │ Port │ Connections │ State                   │ GTID │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server1 │ 192.168.104.193 │ 3306 │ 1           │ Master, Synced, Running │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server2 │ 192.168.104.195 │ 3306 │ 1           │ Slave, Synced, Running  │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server3 │ 192.168.104.196 │ 3306 │ 1           │ Slave, Synced, Running  │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server4 │ 172.20.2.208    │ 3306 │ 0           │ Running                 │      │
├─────────┼─────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ server5 │ 172.20.2.206    │ 3306 │ 0           │ Running                 │      │
└─────────┴─────────────────┴──────┴─────────────┴─────────────────────────┴──────┘
[root@t4w1 ~]#
 
 

The load started on server1, which is considered to be the Master at the moment
server1
MariaDB [(none)]> show processlist ;
+----+-------------+-----------------------+---------------+---------+------+--------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User        | Host                  | db            | Command | Time | State                    | Info                                                                                                 | Progress |
+----+-------------+-----------------------+---------------+---------+------+--------------------------+------------------------------------------------------------------------------------------------------+----------+
|  1 | system user |                       | NULL          | Sleep   | 1525 | WSREP aborter idle       | NULL                                                                                                 |    0.000 |
|  2 | system user |                       | NULL          | Sleep   |  199 | Committed 27502          | NULL                                                                                                 |    0.000 |
|  4 | system user |                       | NULL          | Daemon  | NULL | InnoDB purge coordinator | NULL                                                                                                 |    0.000 |
|  3 | system user |                       | NULL          | Daemon  | NULL | InnoDB purge worker      | NULL                                                                                                 |    0.000 |
|  5 | system user |                       | NULL          | Daemon  | NULL | InnoDB purge worker      | NULL                                                                                                 |    0.000 |
|  6 | system user |                       | NULL          | Daemon  | NULL | InnoDB purge worker      | NULL                                                                                                 |    0.000 |
|  7 | system user |                       | NULL          | Daemon  | NULL | InnoDB shutdown handler  | NULL                                                                                                 |    0.000 |
| 12 | root        | localhost             | NULL          | Query   |    0 | Init                     | show processlist                                                                                     |    0.000 |
| 13 | root        | 192.168.104.191:46590 | NULL          | Sleep   |    0 |                          | NULL                                                                                                 |    0.000 |
| 19 | root        | 192.168.104.191:46672 | boxoffice_mdb | Query   |   14 | Reading file             | LOAD DATA  LOCAL INFILE '/home/zdrav1/Pro/MDBEL22/MDBE-level-2/Auto_Pr_Check/insert-data-tables/boxo |    0.000 |
+----+-------------+-----------------------+---------------+---------+------+--------------------------+------------------------------------------------------------------------------------------------------+----------+
10 rows in set (0.000 sec)
 
MariaDB [(none)]>

Restart any slave server; in this case, server3 was restarted
 
[root@t4w6 ~]# service mysql restart
Restarting mysql (via systemctl):                          [  OK  ]
[root@t4w6 ~]#

The client connection was reset and the load failed
 
Loading MammothDB Table fact_tickets.csv
Mon Jun 18 17:06:57 EEST 2018
 
mysqlimport: Error: 2013, Lost connection to MySQL server during query, when using table: fact_tickets

Logs from MaxScale
2018-06-18 17:11:37   error  : Monitor was unable to connect to server [192.168.104.196]:3306 : "Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 107"
2018-06-18 17:11:37   notice : Server changed state: server3[192.168.104.196:3306]: slave_down. [Slave, Synced, Running] -> [Down]
2018-06-18 17:11:43   notice : Server changed state: server3[192.168.104.196:3306]: slave_up. [Down] -> [Slave, Synced, Running]
 



 Comments   
Comment by markus makela [ 2018-06-15 ]

Could you attach the MaxScale configuration you used to test this? Also please add the MaxScale version you used.

Comment by markus makela [ 2018-06-18 ]

Have you created listeners at runtime? The configuration seems to be missing the listener definitions. I'm assuming that the service used was the Galera Service.

Could you try and reproduce this with the info log enabled? You can do that by adding log_info=true under the [maxscale] section. This should show when and why the client sessions are being closed.
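For reference, the fragment of maxscale.cnf that enables the info log would look like the following (a minimal sketch; only `log_info` is taken from the comment above, any other parameters in the section are left untouched):

```ini
[maxscale]
log_info=true
```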

Comment by markus makela [ 2018-06-18 ]

I tried reproducing this by loading 1M CSV rows via MaxScale into MariaDB and killing one of the slaves mid-query. I happened to stumble upon MXS-1931 but that manifests itself as a hang rather than a closed connection. I am not confident that it will always manifest as a hang so if possible, please reproduce this with a debug version of 2.2.9.

Comment by Zdravelina Sokolovska (Inactive) [ 2018-06-19 ]

Listeners were added to the running config; client connections are established to the Galera Service servers
over the first listener on port 3307.
Interestingly, the problem is also reproduced with the 'readconnroute' MaxScale router, but then only
the connections to slave nodes are closed, while with the 'readwritesplit' router all connections were closed.

Comment by markus makela [ 2018-06-19 ]

Can you reduce the configuration so that only the parts that exhibit the problem are in it? This way we know exactly which part of the configuration is the problem. Right now there are two services and two sets of servers that appear to be unrelated.

Comment by Zdravelina Sokolovska (Inactive) [ 2018-06-19 ]

Reproduced with a reduced config consisting of one service.

Comment by markus makela [ 2018-06-19 ]

OK, the info log tells the full story: when the slave is being killed, it sends an unexpected error packet. As the default behavior for unexpected responses is to route them to the client, the client proceeds to close the data stream. The correct behavior is to ignore the error if it is of the type ER_CONNECTION_KILLED.
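The fix described above can be sketched as follows (a simplified Python model, not MaxScale's actual C++ code): when a backend slave dies mid-session it may send an unexpected ERR packet; if that packet's error code is ER_CONNECTION_KILLED, the proxy should swallow it rather than forward it to the client, which would otherwise abort the LOAD DATA stream.

```python
ER_CONNECTION_KILLED = 1927  # MariaDB error code for "Connection was killed"

def should_forward_to_client(payload: bytes) -> bool:
    """payload is a MySQL packet body with the 4-byte header stripped.
    ERR packets start with 0xFF followed by a 2-byte little-endian
    error code; everything else is forwarded unchanged."""
    if len(payload) >= 3 and payload[0] == 0xFF:
        code = int.from_bytes(payload[1:3], "little")
        # Suppress the expected side effect of killing a backend connection.
        return code != ER_CONNECTION_KILLED
    return True  # OK and result-set packets are always forwarded
```

The packet-inspection details mirror the MySQL wire format; the function name and the surrounding control flow are hypothetical, since the real filtering happens inside readwritesplit's response handling.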

Generated at Thu Feb 08 04:10:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.