[MXS-323] mysql_client readwritesplit handleError seems using wrong dcb and cause wrong behavior Created: 2015-08-20 Updated: 2015-12-02 Resolved: 2015-12-02 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core, readconnroute, readwritesplit, schemarouter |
| Affects Version/s: | None |
| Fix Version/s: | 1.3.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Leo Guo | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Ubuntu14 |
||
| Description |
|
I am doing some experimentation with Maxscale 1.3 develop branch and I ran into a situation which I think is a bug in readwritesplit.handleError. let me describe why I think it is a bug. In mysql_client.gw_read_client_event(DCB *dcb), it calls route_by_statement() to route a query stmt in its dcb->session. Let's say that route_by_statement() return a 'rc' that indicates routing failure due to backend failure. In my case, I somehow faked a backend failure. At mysql_client.c line 1176, it calls router->handleError(...dcb...), which is readwritesplit.handleError(). router->handleError( Note, 'dcb' is actually client_dcb that is passed to gw_read_client_event(dcb). However, in readwritesplit.handleError(), this dcb is thought to be a 'backend_dcb'. In my case, the error action is ERRACT_NEW_CONNECTION, and it is supposed to find a replacement of slave or fail because of master failover. At readwritesplit.c line 4851, the equality check of "rses->rses_master_ref->bref_dcb == backend_dcb" would not be true, and it would go to slave case. For my case, I used a single master server in the config, so it is already wrong to go to slave replacement. I think it should set *succp = false so that it would show error to the client. if (rses->rses_master_ref->bref_dcb == backend_dcb && Now, look at slave replacement. It calls handle_error_new_connection(). At readwritesplit.c line 4970, it checks for the backend_ref_t that points this "backend_dcb" and it would never find it because this dcb is a client_dcb. if ((bref = get_bref_from_dcb(myrses, backend_dcb)) == NULL) { succp = true; goto return_succp; }So it sets succp = true and return. Back in mysql_client.gw_read_client_event(), it seems error handling went fine. The correct action should be close the client_dcb and let client know. Instead, the client session hangs, while maxscale moves on. mysql> insert into test.mytest values (4,'test4',4); Well, I know I kind of faked the master backend error, but it actually is something that I want to test with error handling. I hope my description is clear. I am happy to discuss the situation. |
| Comments |
| Comment by markus makela [ 2015-08-21 ] |
|
I took a look at the code in mysql_client and it does seem odd that the handleError is called even though we should be closing the client DCB right away. We'll review the code and report on our findings. Thank you for noticing this! |
| Comment by Leo Guo [ 2015-08-22 ] |
|
Agree on closing client_dcb for master backend failure. What about slave backend? According to commend, it could try to re-connect to a different slave replica backend. Does it still make sense to re-create backend connection to a different slave? Thanks. |
| Comment by Dipti Joshi (Inactive) [ 2015-09-08 ] |
|
johan.wikman markus makela Can you discuss if this is really needed for 1.3.0 or can wait after 1.3.0 |
| Comment by Dipti Joshi (Inactive) [ 2015-09-09 ] |
|
johan.wikman Check if this is same as |
| Comment by Johan Wikman [ 2015-09-14 ] |
|
No, this is not the same as |
| Comment by markus makela [ 2015-12-02 ] |
|
All the routers now properly detect client DCBs in the handleError calls. The fix to the readwritesplit and schemarouter is in commit d6afe70c6fe34ef595d044155eb5898c993c4fd7 and for readconnroute in 1b04a0cf910fdfdd58f4bb5a49f5416b990d59cf. |