  1. MariaDB Server
  2. MDEV-13770

Client/server protocol violation caused by Galera rerunning a rolled-back stored procedure call

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.2.8
    • Fix Version/s: N/A
    • Component/s: Galera

    Description

      We use MariaDB to store a work queue, which application servers access through a stored procedure. The stored procedure updates a number of rows and then returns them to the calling server.

      When load testing the system, we receive occasional "Protocol Violation" exceptions. This appears to occur when:

      The stored procedure is called on one node of the Galera cluster.
      This node starts returning the result set to the client application.
      Galera detects an inconsistency caused by an update on a second node and rolls the transaction back. It sends an "EOF, more results" to the client application, followed by a new result set from the rerun stored procedure call. The new result set isn't preceded by column information and so causes the protocol exception.
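
The sequence above can be illustrated with a toy model of a client-side reader. This is only a sketch under stated assumptions: the packet tuples (`column_count`, `column_def`, `eof`, `row`, `ok`) are hypothetical simplifications rather than the real wire format, and `read_result_sets` is an illustrative name, not a function of any driver. It shows why a reader that expects column information before each result set fails when rows follow an "EOF, more results" directly.

```python
class ProtocolViolation(Exception):
    """Raised when the simplified packet stream is out of order."""


def read_result_sets(packets):
    """Parse a simplified packet stream into a list of result sets.

    Each packet is a (kind, payload) tuple. These kinds are hypothetical
    stand-ins for the real protocol packets, used only for illustration.
    """
    stream = iter(packets)
    result_sets = []
    while True:
        kind, payload = next(stream)
        if kind == "ok":
            # Final status packet: no further result sets.
            return result_sets
        if kind != "column_count":
            raise ProtocolViolation(f"expected column_count, got {kind}")
        # Column definitions, one per column, then an EOF marker.
        for _ in range(payload):
            kind, _name = next(stream)
            if kind != "column_def":
                raise ProtocolViolation(f"expected column_def, got {kind}")
        kind, _ = next(stream)
        if kind != "eof":
            raise ProtocolViolation(f"expected eof, got {kind}")
        # Rows until the terminating EOF; its payload models "more results".
        rows = []
        while True:
            kind, payload = next(stream)
            if kind == "row":
                rows.append(payload)
            elif kind == "eof":
                more_results = payload
                break
            else:
                raise ProtocolViolation(f"expected row or eof, got {kind}")
        result_sets.append(rows)
        if not more_results:
            return result_sets


# A well-formed single result set.
WELL_FORMED = [
    ("column_count", 1), ("column_def", "id"), ("eof", False),
    ("row", (1,)), ("row", (2,)), ("eof", False),
]

# The sequence described in this report: rows, "EOF, more results",
# then rows from the rerun with no column information in front of them.
REPORTED = [
    ("column_count", 1), ("column_def", "id"), ("eof", False),
    ("row", (1,)), ("eof", True),
    ("row", (1,)), ("row", (2,)), ("eof", False),
]
```

In this model, `read_result_sets(WELL_FORMED)` parses cleanly, while `read_result_sets(REPORTED)` raises `ProtocolViolation` at the first row of the rerun. Nothing in the stream tells the reader that the first, partial result set should be discarded.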

      1. This behaviour doesn't appear to conform to the MySQL client/server protocol. Is this correct? If it does conform, can you point me to documentation that describes the behaviour? In particular, how does a client know that the first rowset is incomplete and should be thrown away?

      2. Should you be returning data to the client application that can still be rolled back?

      We have captured a network trace of this issue and can share it with you, but not on a public forum. If you want it, please let me know where to send it.

          Activity

            anikitin Andrii Nikitin (Inactive) added a comment -

            As I read the report and questions, the mysql_fetch_row() API call may return "Deadlock found" in Galera, while the only error codes defined for it are CR_SERVER_LOST and CR_UNKNOWN_ERROR.
            Please confirm whether my understanding is correct. I tend to agree that it is a valid claim, and I still need to put together a test case.

            In any case, I doubt that it can easily be changed on the Galera side, so the most likely workaround will be to modify client programs or drivers to deal with such a violation.
            These are only preliminary speculations though.

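
A client-side workaround of the kind mentioned above might look like the following sketch. It assumes the driver surfaces the conflict as an exception carrying the server error number 1213 ("Deadlock found"); `DatabaseError` and `call_with_retry` are hypothetical names for illustration, not part of any real driver. It also assumes that `operation` discards any partial results and obtains a fresh connection on each attempt, since a connection that has seen a protocol violation may not be reusable.

```python
import time

ER_LOCK_DEADLOCK = 1213  # server error number for "Deadlock found ..."


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an errno."""

    def __init__(self, errno, message=""):
        super().__init__(message)
        self.errno = errno


def call_with_retry(operation, max_attempts=3, backoff_seconds=0.0):
    """Run operation(), retrying only when it fails with a deadlock error.

    Any other error, or a deadlock on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DatabaseError as exc:
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Simple linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)
```

Here `operation` would wrap the stored procedure call and return its rows only after the whole result has been read, so that rows from an aborted attempt never reach the application.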
            anikitin Andrii Nikitin (Inactive) added a comment - edited

            (It is hard to say for sure without a reproducible test case, but)
            could you rerun the load test with the following additional setting on the nodes and confirm whether the problem still persists?
            wsrep_retry_autocommit=0

            simon1144 Simon Lewis added a comment -

            Yes, sure. I will rerun it tomorrow and should have a response by the end of the day.


            anikitin Andrii Nikitin (Inactive) added a comment -

            The problem is now verified and has been reported to Codership: https://github.com/codership/mysql-wsrep/issues/313
            Thank you for your assistance. I will now close this report as a duplicate of MDEV-4237.

            People

              Assignee: anikitin Andrii Nikitin (Inactive)
              Reporter: simon1144 Simon Lewis
              Votes: 0
              Watchers: 3
