The router error handling (for example handleError in readconnroute.c, line 847ff) appears to be unsafe. When a backend "error" occurs, it is picked up by the protocol error handler in mysql_backend.c (gw_backend_hangup, line 1046ff). This calls the router error handler with various relevant data blocks. At present the protocol also calls dcb_close, passing the DCB of the backend that has the "error".
However, the router seems to need to do more work. There is a pointer in the router's session to the backend DCB, and if the DCB is to be closed, the pointer needs to be removed. Otherwise, the DCB gets closed and freed but is subsequently referred to from the link in the router session, which can cause a crash.
The provisional solution, implemented in branch MXS-329 (because the problem was exposed during testing of the crash described in MXS-329), is to remove the call to dcb_close from the protocol error handler. The protocol cannot deal with the router session, because the structure of a router session varies from router to router, so the error handler in the router needs to do this. The error handler in the router code now (at least temporarily) aborts if the passed DCB does not match the DCB pointer in the router session. Otherwise, the pointer in the router session is set to NULL and the DCB is closed.
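A minimal sketch of the router-side handling described above may make the ordering clearer. It is not the code in the branch: the DCB and ROUTER_SESSION types here are cut-down hypothetical stand-ins, and dcb_close_stub and handle_error_sketch are illustrative names, not MaxScale functions. The point is only that the link in the router session is cleared before the DCB is closed, so nothing can follow it to freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for a DCB; the real structure is much larger. */
typedef struct dcb
{
    bool closed;            /* set once the DCB has been closed */
} DCB;

/* Hypothetical router session holding the link to the backend DCB. */
typedef struct router_session
{
    DCB *backend_dcb;       /* must never be left pointing at a closed DCB */
} ROUTER_SESSION;

/* Stand-in for dcb_close: only marks the DCB closed, frees nothing. */
static void dcb_close_stub(DCB *dcb)
{
    dcb->closed = true;
}

/*
 * Sketch of the router error handler: abort if the failing DCB is not
 * the one this session links to, otherwise clear the link first and
 * only then close the DCB.
 */
static void handle_error_sketch(ROUTER_SESSION *rses, DCB *failed_dcb)
{
    if (rses->backend_dcb != failed_dcb)
    {
        abort();            /* mismatch: the (temporary) hard failure */
    }
    rses->backend_dcb = NULL;
    dcb_close_stub(failed_dcb);
}
```

With this arrangement the protocol error handler only reports the failing DCB to the router; it no longer closes anything itself.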
All line numbers refer to MXS-329 as of now.