when using wsrep (w/ galera) and issuing commands that can cause deadlocks, deadlock errors are sent in response to commands such as close prepared statement (COM_STMT_CLOSE), which, by the protocol spec, must not be answered with a response.
this confuses protocol clients (tested with the C library and the golang native driver), causing anything from a client disconnection (C client) to mismatched responses (golang driver, where the error is interpreted as the response to the next command).
we can send the test case sources if needed. the problem is simple to reproduce on a two-node mariadb+galera cluster, with two simple clients issuing commands to different nodes.
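to make the mismatch concrete, here is a minimal sketch of the client side. it is a hypothetical model, not code from the C library or the golang driver: `Wire`, `read_reply` and the packet strings are made-up names, and the only assumption is that a client reads exactly one packet per command that expects a reply.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical model of a client connection, not code from any real driver.
struct Wire {
    std::deque<std::string> inbound;  // packets the server has written, in order
};

// The client reads exactly one packet for each command that expects a reply.
std::string read_reply(Wire& w) {
    assert(!w.inbound.empty());
    std::string packet = w.inbound.front();
    w.inbound.pop_front();
    return packet;
}

// Returns the packet the client pairs with the query it sends right after
// closing a prepared statement.
std::string reply_paired_with_next_query(bool server_answers_close) {
    Wire w;
    // Client sends COM_STMT_CLOSE and, per the spec, does not read a reply.
    if (server_answers_close) {
        w.inbound.push_back("ERR deadlock");  // the stray packet
    }
    // Client sends a query; the server answers it normally.
    w.inbound.push_back("OK query");
    return read_reply(w);
}
```

with the stray packet present, the query is paired with the deadlock error and its real reply stays queued for the command after it, which matches what we see with the golang driver.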
the issue is relevant for all 10.x versions of mariadb server (tested w/ 10.0-galera and 10.2).
(further text refers to the 10.0-galera sources on github, but the relevant code is essentially the same in newer 10.x versions)
the root cause is in sql_parse.cc, in the WSREP-related block at the very top of dispatch_command(), starting at https://github.com/MariaDB/server/blob/10.0-galera/sql/sql_parse.cc#L1245
when the ABORTED conflict state is detected there, an error is set and the code jumps to the dispatch_end label. this results in the error being emitted to the client.
that's all fine and dandy unless the command being dispatched is something that - according to the protocol spec - must not emit a response, such as COM_STMT_CLOSE. COM_QUIT, at least, is affected as well, but the client breaks the connection right after it, so it doesn't matter that much. normal processing of COM_STMT_CLOSE would call da->disable_status(), preventing response emission. however, since the code jumps directly to dispatch_end, that call is never reached.
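the control flow described above can be sketched as a small standalone model. this is not the mariadb source: `Session`, `Command`, `dispatch_model` and the packet strings are hypothetical, `conflict_aborted` stands in for the wsrep ABORTED conflict state, and `status_disabled` stands in for the effect of da->disable_status().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical model; not MariaDB code.
enum class Command { Query, StmtClose };

struct Session {
    bool conflict_aborted = false;   // stand-in for the wsrep ABORTED conflict state
    std::vector<std::string> sent;   // packets written to the client
};

// Models the reported behaviour: the conflict check runs before the
// per-command handling and jumps straight to the end.
void dispatch_model(Session& s, Command cmd) {
    const std::size_t before = s.sent.size();
    bool status_disabled = false;    // stand-in for da->disable_status()
    std::string status = "OK";

    if (s.conflict_aborted) {
        s.conflict_aborted = false;
        status = "ERR deadlock";
        goto dispatch_end;           // skips the switch below
    }

    switch (cmd) {
    case Command::StmtClose:
        status_disabled = true;      // this command must not be answered
        break;
    case Command::Query:
        break;
    }

dispatch_end:
    if (!status_disabled) {
        s.sent.push_back(status);
    }
    assert(s.sent.size() - before <= 1);  // at most one status packet per command
}

// Usage: what the client receives for a close issued during a conflict.
std::vector<std::string> close_during_conflict() {
    Session s;
    s.conflict_aborted = true;
    dispatch_model(s, Command::StmtClose);
    return s.sent;
}
```

in this model a close issued without a conflict sends nothing, while a close issued during a conflict sends one error packet, because the jump bypasses the only place where the status is disabled.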
a fix that we are considering and testing right now is to skip the wsrep conflict state check when the dispatched command is COM_STMT_CLOSE. this should be just fine, as all COM_STMT_CLOSE does is deallocate the statement, and the deadlock will be reported the next time dispatch_command() is called for a command that requires a response.
we will send a pull request on github as soon as we're done testing, but an educated opinion on the proposed fix would be very much appreciated. databases are not our core competence here.
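a rough sketch of the proposed change, again as a hypothetical standalone model and not a patch against sql_parse.cc. `Session`, `Command` and `dispatch_with_fix` are made-up names, and the model assumes the conflict state stays set until a later command consumes it, which is what the proposal relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical model; not MariaDB code.
enum class Command { Query, StmtClose };

struct Session {
    bool conflict_aborted = false;   // stand-in for the wsrep ABORTED conflict state
    std::vector<std::string> sent;   // packets written to the client
};

// Models the proposed fix: the conflict check is skipped for the command
// that must not be answered, so the flag survives until the next command.
void dispatch_with_fix(Session& s, Command cmd) {
    const std::size_t before = s.sent.size();
    bool status_disabled = false;    // stand-in for da->disable_status()
    std::string status = "OK";

    if (cmd != Command::StmtClose && s.conflict_aborted) {
        s.conflict_aborted = false;
        status = "ERR deadlock";
        goto dispatch_end;
    }

    switch (cmd) {
    case Command::StmtClose:
        status_disabled = true;      // deallocate only, never answer
        break;
    case Command::Query:
        break;
    }

dispatch_end:
    if (!status_disabled) {
        s.sent.push_back(status);
    }
    assert(s.sent.size() - before <= 1);  // at most one status packet per command
}

// Usage: a close during a conflict, followed by an ordinary query.
std::vector<std::string> close_then_query_during_conflict() {
    Session s;
    s.conflict_aborted = true;
    dispatch_with_fix(s, Command::StmtClose);  // sends nothing
    dispatch_with_fix(s, Command::Query);      // receives the deferred error
    return s.sent;
}
```

in the model the close stays silent and the following query gets the deadlock error as its own reply, so the client's request/response pairing is preserved. whether deferring the error like this is safe in the real server is exactly the question we would like an opinion on.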