[MXS-3282] Select query inside transactions are routed to slave with session_track_trx_state=true Created: 2020-11-04  Updated: 2020-11-05  Resolved: 2020-11-05

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.5.5
Fix Version/s: 2.5.6

Type: Bug Priority: Major
Reporter: Tim Westervoorde Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

MariaDB 10.3.25



 Description   

With MaxScale, inside a transaction the select statements get routed to slave causing uncommited rows not to show up.

MariaDB [db]> start transaction;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [db]> insert into audit set event = 'BLAAAP', details = 'This is a test', created = 0;
Query OK, 1 row affected (0.001 sec)
 
MariaDB [db]> select * from audit where event = 'BLAAAP';
Empty set (0.001 sec)
 
MariaDB [db]> select * from audit where event = 'BLAAAP' for update;
+---------+---------+--------+----------------+------------+---------+
| id      | user_id | event  | details        | ip_address | created |
+---------+---------+--------+----------------+------------+---------+
| 2051146 |    NULL | BLAAAP | This is a test | NULL       |       0 |
+---------+---------+--------+----------------+------------+---------+
1 row in set (0.001 sec)



 Comments   
Comment by markus makela [ 2020-11-04 ]

Can you enable log_info and share the log for that particular session? It should show where the queries are routed.

Comment by Tim Westervoorde [ 2020-11-04 ]

2020-11-04 16:09:46   info   : (7917463) (Read-Write-Service) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 22, type: QUERY_TYPE_BEGIN_TRX, stmt: start transaction
2020-11-04 16:09:46   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:09:46   info   : (7917463) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:09:46   info   : (7917463) (Read-Write-Service) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 84, type: QUERY_TYPE_WRITE, stmt: insert into audit set event = 'BLAAAP', details = 'This is a test', created = 0
2020-11-04 16:09:46   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:09:46   info   : (7917463) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:09:46   info   : (7917463) (Read-Write-Service) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 47, type: QUERY_TYPE_READ, stmt: select * from audit where event = 'BLAAAP'
2020-11-04 16:09:46   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to slave: db1 <
2020-11-04 16:09:46   info   : (7917463) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db1
2020-11-04 16:09:52   info   : (7917463) (Read-Write-Service) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 13, type: QUERY_TYPE_ROLLBACK, stmt: rollback
2020-11-04 16:09:52   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:09:52   info   : (7917463) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:09:52   info   : (7917463) (Read-Write-Service) > Autocommit: [enabled], trx is [open], cmd: (0x01) COM_QUIT, plen: 5, type: QUERY_TYPE_SESSION_WRITE, stmt:
2020-11-04 16:09:52   info   : (7917463) [readwritesplit] (Read-Write-Service) Session write, routing to all servers.
2020-11-04 16:09:52   info   : (7917463) (Read-Write-Service) Execute sescmd 2 on 'db1':
2020-11-04 16:09:52   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to slave: db1
2020-11-04 16:09:52   info   : (7917463) (Read-Write-Service) Execute sescmd 2 on 'db2':
2020-11-04 16:09:52   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to master: db2
2020-11-04 16:09:52   info   : (7917463) (Read-Write-Service) Execute sescmd 2 on 'db3':
2020-11-04 16:09:52   info   : (7917463) [readwritesplit] (Read-Write-Service) Route query to slave: db3
2020-11-04 16:09:52   info   : Stopped Read-Write-Service client session [7917463]

Comment by markus makela [ 2020-11-04 ]

OK, it definitely looks like the transaction tracking doesn't pick up on the start of the transaction. Can you attach the configuratio you use to the issue? Make sure to remove any passwords and usernames from it.

Comment by Tim Westervoorde [ 2020-11-04 ]

[maxscale]
threads=auto
local_address=10.10.x.x
query_classifier_cache_size=100M
 
[db1]
type=server
address=10.10.x.x
port=3307
protocol=MariaDBBackend
proxy_protocol=on
 
[db2]
type=server
address=10.10.x.x
port=3307
protocol=MariaDBBackend
proxy_protocol=on
 
[db3]
type=server
address=10.10.x.x
port=3307
protocol=MariaDBBackend
proxy_protocol=on
 
[Galera-Monitor]
type=monitor
module=galeramon
servers=db1,db2,db3
user=maxscale
password=xxx
monitor_interval=2000
available_when_donor=true
disable_master_failback=true
 
[Read-Write-Service]
type=service
router=readwritesplit
servers=db1,db2,db3
user=maxscale
password=xxxx
connection_keepalive=30
max_sescmd_history=50
disable_sescmd_history=false
master_accept_reads=false
master_failure_mode=fail_on_write
enable_root_user=true
master_reconnection=true
prune_sescmd_history=true
session_track_trx_state=true
causal_reads=true
causal_reads_timeout=2
 
[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=3306

I first suspected casual_reads but that doesn't change a thing.

Comment by markus makela [ 2020-11-04 ]

If you remove session_track_trx_state=true, does it work?

Comment by Tim Westervoorde [ 2020-11-04 ]

Yes, that works

2020-11-04 16:34:51   info   : (7917465) (Read-Write-Service) > Autocommit: [enabled], trx is [not open], cmd: (0x04) COM_FIELD_LIST, plen: 10, type: , stmt:
2020-11-04 16:34:51   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:34:51   info   : (7917465) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:34:52   info   : (7917465) (Read-Write-Service) > Autocommit: [enabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 22, type: QUERY_TYPE_BEGIN_TRX, stmt: start transaction
2020-11-04 16:34:52   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:34:52   info   : (7917465) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:34:52   info   : (7917465) (Read-Write-Service) > Autocommit: [enabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 84, type: QUERY_TYPE_WRITE, stmt: insert into audit set event = 'BLAAAP', details = 'This is a test', created = 0
2020-11-04 16:34:52   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:34:52   info   : (7917465) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:34:53   info   : (7917465) (Read-Write-Service) > Autocommit: [enabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 47, type: QUERY_TYPE_READ, stmt: select * from audit where event = 'BLAAAP'
2020-11-04 16:34:53   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:34:53   info   : (7917465) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:34:57   info   : (7917465) (Read-Write-Service) > Autocommit: [enabled], trx is [open], cmd: (0x03) COM_QUERY, plen: 13, type: QUERY_TYPE_ROLLBACK, stmt: rollback
2020-11-04 16:34:57   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to master: db2 <
2020-11-04 16:34:57   info   : (7917465) [readwritesplit] (Read-Write-Service) Reply complete, last reply from db2
2020-11-04 16:34:57   info   : (7917465) (Read-Write-Service) > Autocommit: [enabled], trx is [not open], cmd: (0x01) COM_QUIT, plen: 5, type: QUERY_TYPE_SESSION_WRITE, stmt:
2020-11-04 16:34:57   info   : (7917465) [readwritesplit] (Read-Write-Service) Session write, routing to all servers.
2020-11-04 16:34:57   info   : (7917465) (Read-Write-Service) Execute sescmd 2 on 'db1':
2020-11-04 16:34:57   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to slave: db1
2020-11-04 16:34:57   info   : (7917465) (Read-Write-Service) Execute sescmd 2 on 'db2':
2020-11-04 16:34:57   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to master: db2
2020-11-04 16:34:57   info   : (7917465) (Read-Write-Service) Execute sescmd 2 on 'db3':
2020-11-04 16:34:57   info   : (7917465) [readwritesplit] (Read-Write-Service) Route query to slave: db3
2020-11-04 16:34:57   info   : Stopped Read-Write-Service client session [7917465]

Comment by markus makela [ 2020-11-04 ]

OK, that's good: we have a workaround for the time being. I'll change the issue description to mention that the problem is with session_track_trx_state.

Comment by Tim Westervoorde [ 2020-11-04 ]

OK I did some debugging and I think I've found where things go wrong.

I started a transaction on the client

MariaDB [db]> start transaction;

Had a debugger attached to MaxScale with a breakpoint

Breakpoint 1, Session::parse_and_set_trx_state (this=this@entry=0x7fffa8014180, reply=...) at /home/timofey_turenko_mariadb_com/MaxScale/server/core/session.cc:1460
 
(gdb) p reply
$1 = (const maxscale::Reply &) @0x7fffa8018a08: {m_command = 3 '\003', m_reply_state = maxscale::DONE, m_error = {m_code = 0, m_sql_state = "", m_message = ""}, m_row_count = 0, m_size = 146,
  m_generated_id = 0, m_param_count = 0, m_num_warnings = 0, Python Exception <class 'gdb.error'> There is no member or method named _M_bbegin.:
m_is_ok = true, m_field_counts = std::vector of length 0, capacity 0, m_variables = std::unordered_map with 5 elements}
 
(gdb) p trx_state
$2 = "T_______"

Based on this the reply packet included transaction info, and T telling an explicit transaction was started

In the 2.3.x code, this gets checked as follows

       mysql_tx_state_t s = parse_trx_state(trx_state);
 
        if (s == TX_EMPTY)
        {
            session_set_trx_state(ses, SESSION_TRX_INACTIVE);
        }
        else if ((s & TX_EXPLICIT) || (s & TX_IMPLICIT))
        {
            session_set_trx_state(ses, SESSION_TRX_ACTIVE);
        }

With s being defined as

        case 'T':
            s |= TX_EXPLICIT;
            break;
 
        case 'I':
            s |= TX_IMPLICIT;
            break;
 
        case 'r':
            s |= TX_READ_UNSAFE;
            break;
 
        case 'R':
            s |= TX_READ_TRX;
            break;
 
        case 'w':
            s |= TX_WRITE_UNSAFE;
            break;
 
        case 'W':
            s |= TX_WRITE_TRX;
            break;
 
        case 's':
            s |= TX_STMT_UNSAFE;
            break;
 
        case 'S':
            s |= TX_RESULT_SET;
            break;
 
        case 'L':
            s |= TX_LOCKED_TABLES;
            break;

That all seems legit and worked like a charm in 2.3.x

In 2.5.5 however, this has been rewritten to server/core/session.cc

 auto trx_state = reply.get_variable("trx_state");
 
    if (!trx_state.empty())
    {
        if (trx_state.find_first_of("TI") == std::string::npos)
        {
            set_trx_state(SESSION_TRX_ACTIVE);
        }
        else if (trx_state.find_first_of("rRwWsSL") == std::string::npos)
        {
            set_trx_state(SESSION_TRX_INACTIVE);
        }
    }

That's were things break, find_first_of returns the position of the first character found, so I think

if (trx_state.find_first_of("TI") == std::string::npos)

Is the reverse of what we want. If neither T and I are set, we set our trx state to SESSION_TRX_ACTIVE.

This should be If either T or I is set, our trx_state is SESSION_TRX_ACTIVE

Does that makes sense to you as well?

Comment by markus makela [ 2020-11-05 ]

It looks like you solved the bug faster than us If you'd like to, you can create a pull request on GitHub for this so that your contributions will show in the git history. Otherwise, we can fix it now that the bug is obvious.

Comment by Tim Westervoorde [ 2020-11-05 ]

Sure, see https://github.com/mariadb-corporation/MaxScale/pull/212

Comment by markus makela [ 2020-11-05 ]

Thank you for your contribution!

Generated at Thu Feb 08 04:20:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.