[MDEV-13770] client/Server protocol violation caused by galera rerunning rolled back Stored Procedure call Created: 2017-09-08  Updated: 2017-10-03  Resolved: 2017-10-03

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Simon Lewis Assignee: Andrii Nikitin (Inactive)
Resolution: Duplicate Votes: 0
Labels: galera
Environment:

MariaDB 10.2.8 with Galera Cluster (3 nodes)
CentOS 7.3 for database server
Client running on Docker 17.06.1-ce
Client written in C# (.NET Core 2), communicating with the database using MySqlConnector
(https://github.com/mysql-net/MySqlConnector)


Issue Links:
Duplicate
duplicates MDEV-4237 Galera: Malformed packet, ER_SP_DOES_... Open

 Description   

We use MariaDB to store a work queue which is accessed by application servers using a stored procedure. The stored procedure updates a number of rows and then returns them to the calling server.

When load testing the system, we receive occasional "Protocol Violation" exceptions. This appears to occur when:

The stored procedure is called on one node of the Galera cluster.
This node starts returning the resultset to the client application.
Galera detects an inconsistency caused by an update on a second node and rolls the transaction back.
The node sends an "EOF, more results" to the client application, followed by a new resultset from the rerun stored procedure call.
The new resultset isn't preceded by column information, which causes the protocol exception.
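To make the reported sequence concrete, here is a minimal sketch of the packet order a client expects for a multi-resultset response in the classic text protocol (assuming CLIENT_DEPRECATE_EOF is not negotiated). The symbolic packet names and the `first_violation` helper are hypothetical and exist only for this illustration; they are not part of MySqlConnector or the server. Only the `SERVER_MORE_RESULTS_EXISTS` flag value comes from the protocol itself.

```python
from typing import List, Optional, Tuple

SERVER_MORE_RESULTS_EXISTS = 0x0008  # status flag defined by the MySQL protocol

# Hypothetical symbolic packet: (kind, value).
# value is the column count for "column_count", the status flags for
# "eof" and "ok", and unused (None) for "column_def" and "row".
Packet = Tuple[str, Optional[int]]


def first_violation(packets: List[Packet]) -> int:
    """Return the index of the first out-of-order packet, or -1 if none.

    This only checks ordering; it does not flag a stream that ends early.
    """
    state = "start"   # expecting a new resultset header or a final OK
    cols_left = 0
    for i, (kind, value) in enumerate(packets):
        more = bool((value or 0) & SERVER_MORE_RESULTS_EXISTS)
        if state == "start":
            if kind == "column_count":
                cols_left = value or 0
                state = "columns"
            elif kind == "ok":
                state = "start" if more else "done"
            else:
                return i  # e.g. a row with no column information before it
        elif state == "columns":
            if cols_left > 0 and kind == "column_def":
                cols_left -= 1
            elif cols_left == 0 and kind == "eof":
                state = "rows"
            else:
                return i
        elif state == "rows":
            if kind == "row":
                continue
            if kind != "eof":
                return i
            state = "start" if more else "done"
        else:
            return i  # nothing may follow the final packet
    return -1


# A well-formed CALL response: one resultset, then the final OK packet.
healthy = [("column_count", 1), ("column_def", None), ("eof", 0),
           ("row", None), ("eof", SERVER_MORE_RESULTS_EXISTS), ("ok", 0)]

# The sequence described in this report: "EOF, more results" followed
# directly by rows, with no column count or column definitions.
reported = [("column_count", 1), ("column_def", None), ("eof", 0),
            ("row", None), ("eof", SERVER_MORE_RESULTS_EXISTS),
            ("row", None), ("eof", 0)]
```

In this model, `first_violation(healthy)` finds nothing out of order, while `first_violation(reported)` points at the first row packet of the rerun resultset.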

1. This behaviour doesn't appear to conform to the MySQL client/server protocol. Is this correct? If it does conform, can you point me to documentation that describes the behaviour? In particular, how does a client know that the first rowset is incomplete and should be thrown away?

2. Should you be returning data to the client application that can still be rolled back?

We have captured a network trace of this issue and can share it with you, but not on a public forum. If you want it, please let me know where to send it.



 Comments   
Comment by Andrii Nikitin (Inactive) [ 2017-09-19 ]

The way I read the report and questions is that the mysql_fetch_row() API call may return "Deadlock found" in Galera, while the only defined error codes are CR_SERVER_LOST and CR_UNKNOWN_ERROR.
Please confirm whether my understanding is correct. I tend to agree that it is a valid claim, but I still need to put together a test case.

In any case, I doubt that this can easily be changed on the Galera side, so the most likely workaround will be to modify client programs or drivers to deal with such a violation.
These are only preliminary speculations, though.

Comment by Andrii Nikitin (Inactive) [ 2017-09-27 ]

(It is hard to comment for sure without a reproducible test case, but)
Is it possible for you to rerun the load test with the following additional setting on the nodes and confirm whether the problem still persists?
wsrep_retry_autocommit=0
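
One way to apply this setting, shown as a sketch only: add it to the server option file on every node and restart the node. The `[mysqld]` section is an assumption about how the nodes are configured; the file name and location vary by installation.

```ini
[mysqld]
# Do not retry autocommit statements that fail because of a cluster-wide conflict
wsrep_retry_autocommit=0
```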

Comment by Simon Lewis [ 2017-09-27 ]

Yes, sure. I will rerun it tomorrow and should have a response by the end of the day.

Comment by Andrii Nikitin (Inactive) [ 2017-10-03 ]

The problem is now verified and reported to Codership: https://github.com/codership/mysql-wsrep/issues/313
Thank you for your assistance. I will now close this report as a duplicate of MDEV-4237.
