[MDEV-10812] WSREP causes responses being sent to protocol commands that must not send a response Created: 2016-09-15  Updated: 2021-05-13  Resolved: 2017-01-31

Status: Closed
Project: MariaDB Server
Component/s: Server, wsrep
Affects Version/s: 10.1
Fix Version/s: 10.1.22

Type: Bug Priority: Critical
Reporter: Jaka Močnik Assignee: Sachin Setiya (Inactive)
Resolution: Fixed Votes: 1
Labels: contribution
Environment:

tested on GNU/Linux (Ubuntu 1x.04), but relevant for all environments


Sprint: 10.1.21

 Description   

when using wsrep (w/ galera) and issuing commands that can cause deadlocks, deadlock exception errors are sent in responses to commands such as close prepared statement () which, by spec, must not send a response.

this confuses protocol clients (tested with C library and golang native driver), causing anything from the client disconnection (C client) to mismatched responses (golang driver, which results in this error being interpreted as the response to next command).

we can send the test case sources if needed, the problem is simple to reproduce on a two node mariadb+galera cluster, with two simple clients issuing commands to different nodes.

the issue is relevant for all 10.x versions of mariadb server (tested w/ 10.0-galera and 10.2).

(further text refers to 10.0-galera sources on github, but the code does not change relevantly for newer 10.x versions)

the root cause is in sql_parse.cc, in the WSREP-related block at the very top of dispatch_command(), starting at https://github.com/MariaDB/server/blob/10.0-galera/sql/sql_parse.cc#L1245

when ABORTED conflict state is detected there, an error is set, and the code jumps to dispatch_end label. this results in emitting the error to the client.

that's all fine and dandy unless the command being dispatched is something that - according to the protocol spec - must not emit a response, such as COM_STMT_CLOSE. at least COM_QUIT is also a problem, but the client will break the connection after it, so it don't matter that much. processing COM_STMT_CLOSE would call da->disable_status(), preventing response emission. however, since the code jumps directly to dispatch_end, this is never called.

a fix that we contemplate and are testing right now is skipping the check for wsrep conflict state in case the dispatched_command is COM_STMT_CLOSE. this should be just fine, as all it does is deallocate the statement, and the deadlock will be reported to the next command that requires a response the next time dispatch_command() is called.

will send a pull request on github as soon as we're done testing, but educated opinion on the proposed fix would be very much appreciated, databases are not our core competence here.



 Comments   
Comment by Jaka Močnik [ 2016-09-15 ]

The proposed fix: https://github.com/bancek/mariadb-server/commit/dc62ddc9c8e9e6032eec13aee8069f20ab2a0521

Comment by Elena Stepanova [ 2016-09-19 ]

Thanks for the report and proposed patch.

Comment by Sachin Setiya (Inactive) [ 2016-10-28 ]

Hi jkmcnk could you please provide test case for this bug ?

Comment by Luka Zakrajsek [ 2016-11-02 ]

Hi

https://github.com/bancek/mariadb-mdev-10812

Comment by Nirbhay Choubey (Inactive) [ 2016-12-07 ]

http://lists.askmonty.org/pipermail/commits/2016-December/010180.html

Comment by Nirbhay Choubey (Inactive) [ 2016-12-07 ]

The included test passes even if the sql_parse.cc patch is not applied.

Comment by Sachin Setiya (Inactive) [ 2016-12-07 ]

It will pass because original sql_parse.cc does not have this DBUG_ASSERT

DBUG_ASSERT((command != COM_QUIT && command != COM_STMT_CLOSE) || thd->get_stmt_da()->is_disabled());

BTW, this http://lists.askmonty.org/pipermail/commits/2016-December/010180.html commit is old one,
I updated it. but that patch is not showing up in Archive.

Comment by Nirbhay Choubey (Inactive) [ 2017-01-03 ]

sachin.setiya.007 Is this the latest patch? http://lists.askmonty.org/pipermail/commits/2016-December/010274.html

Does the included test assert if we remove the following change?

-    if (thd->wsrep_conflict_state== ABORTED)
+    /* we let COM_QUIT and STMT_CLOSE execute even if wsrep aborted */
+    if (thd->wsrep_conflict_state== ABORTED &&
+        command != COM_STMT_CLOSE && command != COM_QUIT)

Comment by Sachin Setiya (Inactive) [ 2017-01-22 ]

Yes test assrets.
╰$ ./mtr galera.galera_prepared_statement
Logging: /home/abc/mdev-10812/server/mysql-test/mysql-test-run.pl galera.galera_prepared_statement
vardir: /home/abc/mdev-10812/build/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/abc/mdev-10812/build/mysql-test/var'...
Checking supported features...
MariaDB Version 10.1.22-MariaDB-debug

  • SSL connections supported
  • binaries are debug compiled
    Sphinx 'indexer' binary not found, sphinx suite will be skipped
    Collecting tests...
    Installing system database...

==============================================================================

TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
galera.galera_prepared_statement 'innodb_plugin' [ fail ] Found warnings/errors in server log file!
Test ended at 2017-01-22 17:33:53
line
mysqld: /home/abc/mdev-10812/server/sql/sql_parse.cc:1925: bool dispatch_command(enum_server_command, THD*, char*, uint): Assertion `(command != COM_QUIT && command != COM_STMT_CLOSE) || thd->get_stmt_da()->is_disabled()' failed.

This one is latest patch.
I have changed test case. Previous test case was not producing the deadlock As expected.
http://lists.askmonty.org/pipermail/commits/2017-January/010478.html

Comment by Sachin Setiya (Inactive) [ 2017-01-31 ]

http://lists.askmonty.org/pipermail/commits/2017-January/010586.html

Generated at Thu Feb 08 07:45:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.