[MXS-1852] mysql_client_test test_bug17309863 failed Created: 2018-05-04  Updated: 2018-05-16  Resolved: 2018-05-16

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.2
Fix Version/s: 2.2.6

Type: Bug Priority: Major
Reporter: xiangzhong Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

linux + make debug + mysql 5.6



 Description   

we add debug log in kill_func

static bool kill_func(DCB *dcb, void *data)
{
    KillInfo* info = (KillInfo*)data;
 
    if (dcb->dcb_role == DCB_ROLE_BACKEND_HANDLER &&
        dcb->session->ses_id == info->target_id)
    {
        MySQLProtocol* proto = (MySQLProtocol*)dcb->protocol;
        MXS_DEBUG("dcb:%p find client threadid:%lu, with backend threadid:%lu", dcb, info->target_id, proto->thread_id);
        info->targets.push_back(std::make_pair(dcb->server, proto->thread_id));
    }
 
    return true;
}

there output:(6) (kill_func): dcb:0x1c25180 find client threadid:5, with backend threadid:0
now the backend dcb real proto->thread_id= 0, lead to not kill the connection and client can exec sql again.

i think there backend dcb proto->thread_id should be != 0.



 Comments   
Comment by markus makela [ 2018-05-04 ]

I think this could happen if the backend DCB has not yet been connected. To confirm this, the state of the DCB can be checked.

Comment by xiangzhong [ 2018-05-04 ]

i confirm the backend dcb have connected, i add debug log:

bool gw_read_backend_handshake(DCB *dcb, GWBUF *buffer)
{
    MySQLProtocol *proto = (MySQLProtocol *)dcb->protocol;
    bool rval = false;
    uint8_t *payload = GWBUF_DATA(buffer) + 4;
 
    if (gw_decode_mysql_server_handshake(proto, payload) >= 0)
    {
        MXS_DEBUG("backend dcb:%p with backend threadid:%lu", dcb, proto->thread_id);
        rval = true;
    }
 
    return rval;
}

there printf dcb is equal kill_func't dcb and thread_id != 0.

Comment by xiangzhong [ 2018-05-07 ]

sorry, this case is modified by us, because direct connect db is failed. diff like:

rc= mysql_query(lmysql, "SELECT 'bug17309863'");
myquery(rc); ===> myquery_r(rc);

mysql_close(lmysql);

Comment by markus makela [ 2018-05-07 ]

I believe the test fails due to the fact that it rapidly creates databases and uses them. The following log was generated with a four node master-slave cluster. It shows that the slaves have not yet replicated the database creation which causes an authentication failure:

2018-05-07 11:21:08   info   : (16) [readwritesplit] > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 25, type: QUERY_TYPE_READ, stmt: SELECT 'bug17309863' 
2018-05-07 11:21:08   info   : (16) [readwritesplit] Route query to slave       [127.0.0.1]:3001 <
2018-05-07 11:21:08   error  : (16) [mariadbbackend] Invalid authentication message from backend 'server2'. Error code: 1049, Msg : #42000Unknown database 'client_test_db'
2018-05-07 11:21:08   error  : [mariadbbackend] Unable to write to backend 'server2' due to authentication failure. Server in state RUNNING SLAVE.
2018-05-07 11:21:08   error  : (16) [mariadbbackend] Invalid authentication message from backend 'server3'. Error code: 1049, Msg : #42000Unknown database 'client_test_db'
2018-05-07 11:21:08   error  : (16) [mariadbbackend] Invalid authentication message from backend 'server4'. Error code: 1049, Msg : #42000Unknown database 'client_test_db'

With only one server, the test never fails.

Comment by xiangzhong [ 2018-05-08 ]

we have confirmed: because the backend connect is async,for this case when recv kill query maybe the backend not connect now.

Comment by markus makela [ 2018-05-08 ]

OK, in that case I think the DCB could just be closed.

Generated at Thu Feb 08 04:09:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.