[MXS-4817] maxscale crashes on maxsimd::generic::is_multi_stmt_imp Created: 2023-10-19  Updated: 2023-10-25  Resolved: 2023-10-25

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 23.08.1
Fix Version/s: 23.08.2

Type: Bug
Priority: Critical
Reporter: Maikel Punie
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None
Environment:

centos 7.9
Intel(R) Xeon(R) CPU W3565


Issue Links:
Problem/Incident
is caused by MXS-4821 Multi-statement detection works diffe... Closed

 Description   

2023-10-17 08:47:28 alert : (333766) (donald); MaxScale 23.08.1 received fatal signal 11. Commit ID: 18390c93c6c637a0e472bea061b5338565f53d6a, System name: Linux, Release string: CentOS Linux release 7.9.2009 (Core), Thread: Worker-02
2023-10-17 08:47:28 alert : (333766) (donald); Statement currently being classified: none/unknown
2023-10-17 08:47:28 alert : (333766) (donald); Session: 333766 Service: ReadWriteSVC
2023-10-17 08:47:28 notice : (333766) (donald); For a more detailed stacktrace, install GDB and add 'debug=gdb-stacktrace' under the [maxscale] section.
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxsimd::generic::is_multi_stmt_impl(std::basic_string_view<char, std::char_traits<char> >)): maxutils/maxsimd/src/generic_multistmt.cc:91
/usr/lib64/maxscale/libpp_sqlite.so (parse_query(maxscale::Parser::Helper const&, GWBUF const&, unsigned int)): server/modules/parser_plugin/pp_sqlite/pp_sqlite.cc:3838
/usr/lib64/maxscale/libpp_sqlite.so (PpSqliteInfo::get(maxscale::Parser::Helper const&, GWBUF const*, unsigned int)): server/modules/parser_plugin/pp_sqlite/pp_sqlite.cc:321
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxscale::CachingParser::get_type_mask(GWBUF const&) const): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (ServerEndpoint::routeQuery(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxscale::Backend::write(GWBUF&&, maxscale::Backend::response_type)): server/core/backend.cc:72
/usr/lib64/maxscale/libreadwritesplit.so (std::vector<Hint, std::allocator<Hint> >::~vector()): /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_vector.h:677
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::handle_got_target(GWBUF&&, maxscale::RWBackend*, bool)): ??:?
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::route_single_stmt(GWBUF&&, RWSplitSession::RoutingPlan const&)): ??:?
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::route_query(GWBUF&&)): ??:?
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::routeQuery(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (ServiceEndpoint::routeQuery(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (Session::routeQuery(GWBUF&&)): /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/std_function.h:687
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (MariaDBClientConnection::process_normal_packet(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (MariaDBClientConnection::process_normal_read()): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (MariaDBClientConnection::ready_for_reading(DCB*)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (DCB::process_events(unsigned int)): server/core/dcb.cc:1304
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (DCB::event_handler(unsigned int)): server/core/dcb.cc:1357
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxbase::Worker::deliver_events(maxbase::Pollable*, unsigned int, maxbase::Pollable::Context)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxbase::Worker::poll_waitevents()): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxbase::Worker::run(maxbase::Semaphore*)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (execute_native_thread_routine): thread48.o:?
/lib64/libpthread.so.0 (start_thread): pthread_create.c:?
/lib64/libc.so.6 (__clone): ??:?
MaxScale 23.08.1 received fatal signal 11. Commit ID: 18390c93c6c637a0e472bea061b5338565f53d6a, System name: Linux, Release string: CentOS Linux release 7.9.2009 (Core), Thread: Worker-02
Writing core dump.



 Comments   
Comment by markus makela [ 2023-10-19 ]

Just to confirm that this is a problem related to older CPUs, could you check whether the output of lscpu contains avx2? I'm assuming it doesn't, which is why it ends up here in the code.

Comment by Maikel Punie [ 2023-10-19 ]

[root@harvey ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU W3565 @ 3.20GHz
Stepping: 5
CPU MHz: 1733.000
CPU max MHz: 3068.0000
CPU min MHz: 1600.0000
BogoMIPS: 6133.26
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ssbd rsb_ctxsw ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida spec_ctrl intel_stibp flush_l1d

Comment by markus makela [ 2023-10-19 ]

Yes, it looks like it indeed does not have it.
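
For reference, the check amounts to looking for avx2 as a whole token in the Flags line of the lscpu output. A minimal sketch of that lookup; the helper name has_cpu_flag is hypothetical and not part of MaxScale:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not MaxScale code): returns true if `flag` appears as a
// whole, whitespace-separated token in the Flags line printed by lscpu.
// Matching whole tokens avoids treating "sse4_2" as a hit for "sse4".
bool has_cpu_flag(const std::string& flags_line, const std::string& flag)
{
    std::istringstream tokens(flags_line);
    std::string token;

    while (tokens >> token)
    {
        if (token == flag)
        {
            return true;
        }
    }

    return false;
}
```

Called with the Flags line posted above and "avx2", this returns false, which matches what can be seen by reading the list.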

Comment by markus makela [ 2023-10-19 ]

Are you able to identify which query caused this? If you have only seen this once and you aren't able to see which one caused it, you can configure MaxScale to keep the latest query string in memory by adding retain_last_statements=1 under the [maxscale] section. This should then log the current query if the crash happens again.
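
In the MaxScale configuration file that would look roughly like this (a sketch; any other settings already in the section stay as they are):

```ini
[maxscale]
retain_last_statements=1
```

MaxScale needs to be restarted for the change to take effect.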

Comment by Maikel Punie [ 2023-10-19 ]

I just restarted MaxScale with this parameter; we will need to wait until it happens again.

Comment by markus makela [ 2023-10-19 ]

How long have you had MaxScale 23.08 running on this server?

Comment by Maikel Punie [ 2023-10-19 ]

Since last week.
Before that we had an older version, and this was never seen.

Comment by Maikel Punie [ 2023-10-20 ]

Here is the log of the last couple

2023-10-19 22:41:23 notice : (1223722) Stmt 1(2023-10-19 22:41:02): SELECT id, GroupName, Attribute, Value, op FROM radgroupreply WHERE GroupName = '' ORDER BY id
2023-10-19 22:45:07 notice : (1245064) Stmt 1(2023-10-19 22:44:42): SELECT GroupName FROM usergroup WHERE UserName = '00:99:00:00:00:01::' ORDER BY priority
2023-10-19 22:50:29 notice : (1310967) Stmt 1(2023-10-19 22:50:03): SELECT GroupName FROM usergroup WHERE UserName = '00:99:00:00:00:01::' ORDER BY priority
2023-10-19 22:56:02 notice : (1384474) Stmt 1(2023-10-19 22:55:36): SELECT GroupName FROM usergroup WHERE UserName = '00:99:00:00:00:01::' ORDER BY priority
2023-10-19 23:01:38 notice : (1443917) Stmt 1(2023-10-19 23:01:13): SELECT GroupName FROM usergroup WHERE UserName = '00:99:00:00:00:01::' ORDER BY priority
2023-10-19 23:06:39 notice : (1509097) Stmt 1(2023-10-19 23:06:13): SELECT GroupName FROM usergroup WHERE UserName = '00:99:00:00:00:01::' ORDER BY priority
2023-10-19 23:12:02 notice : (1570038) Stmt 1(2023-10-19 23:11:36): SELECT GroupName FROM usergroup WHERE UserName = '00:99:00:00:00:01::' ORDER BY priority
2023-10-19 23:27:06 alert : (1778638) (minnie); MaxScale 23.08.1 received fatal signal 11. Commit ID: 18390c93c6c637a0e472bea061b5338565f53d6a, System name: Linux, Release string: CentOS Linux release 7.9.2009 (Core), Thread: Worker-01
2023-10-19 23:27:06 alert : (1778638) (minnie); Statement currently being classified: none/unknown
2023-10-19 23:27:06 alert : (1778638) (minnie); Session: 1778638 Service: ReadWriteSVC
2023-10-19 23:27:06 notice : (1778638) (minnie); For a more detailed stacktrace, install GDB and add 'debug=gdb-stacktrace' under the [maxscale] section.
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxsimd::generic::is_multi_stmt_impl(std::basic_string_view<char, std::char_traits<char> >)): maxutils/maxsimd/src/generic_multistmt.cc:91
/usr/lib64/maxscale/libpp_sqlite.so (parse_query(maxscale::Parser::Helper const&, GWBUF const&, unsigned int)): server/modules/parser_plugin/pp_sqlite/pp_sqlite.cc:3838
/usr/lib64/maxscale/libpp_sqlite.so (PpSqliteInfo::get(maxscale::Parser::Helper const&, GWBUF const*, unsigned int)): server/modules/parser_plugin/pp_sqlite/pp_sqlite.cc:321
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxscale::CachingParser::get_type_mask(GWBUF const&) const): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (ServerEndpoint::routeQuery(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxscale::Backend::write(GWBUF&&, maxscale::Backend::response_type)): server/core/backend.cc:72
/usr/lib64/maxscale/libreadwritesplit.so (std::vector<Hint, std::allocator<Hint> >::~vector()): /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_vector.h:677
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::handle_got_target(GWBUF&&, maxscale::RWBackend*, bool)): ??:?
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::route_single_stmt(GWBUF&&, RWSplitSession::RoutingPlan const&)): ??:?
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::route_query(GWBUF&&)): ??:?
/usr/lib64/maxscale/libreadwritesplit.so (RWSplitSession::routeQuery(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (ServiceEndpoint::routeQuery(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (Session::routeQuery(GWBUF&&)): /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/std_function.h:687
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (MariaDBClientConnection::process_normal_packet(GWBUF&&)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (MariaDBClientConnection::process_normal_read()): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (MariaDBClientConnection::ready_for_reading(DCB*)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (DCB::process_events(unsigned int)): server/core/dcb.cc:1304
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (DCB::event_handler(unsigned int)): server/core/dcb.cc:1357
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxbase::Worker::deliver_events(maxbase::Pollable*, unsigned int, maxbase::Pollable::Context)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxbase::Worker::poll_waitevents()): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (maxbase::Worker::run(maxbase::Semaphore*)): ??:?
/usr/lib64/maxscale/libmaxscale-common.so.1.0.0 (execute_native_thread_routine): thread48.o:?
/lib64/libpthread.so.0 (start_thread): pthread_create.c:?
/lib64/libc.so.6 (__clone): ??:?
MaxScale 23.08.1 received fatal signal 11. Commit ID: 18390c93c6c637a0e472bea061b5338565f53d6a, System name: Linux, Release string: CentOS Linux release 7.9.2009 (Core), Thread: Worker-01
Writing core dump.

Comment by markus makela [ 2023-10-20 ]

Managed to reproduce it with this SQL:

begin not atomic select 1; end /** hello */

There's a null check missing in the code that checks whether it has reached the end of a stored procedure.
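
To illustrate the class of bug, here is a minimal sketch. It is not the actual MaxScale code: the names skip_ws_and_comments and only_end_keyword_follows are hypothetical, and it assumes a helper that returns a null pointer once only whitespace and comments remain in the input.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper: returns a pointer to the first character that is neither
// whitespace nor inside a /* ... */ comment, or nullptr if the input runs out
// (including the case of an unterminated comment).
const char* skip_ws_and_comments(const char* it, const char* end)
{
    while (it != end)
    {
        if (std::isspace(static_cast<unsigned char>(*it)))
        {
            ++it;
        }
        else if (end - it >= 2 && it[0] == '/' && it[1] == '*')
        {
            it += 2;

            while (end - it >= 2 && !(it[0] == '*' && it[1] == '/'))
            {
                ++it;
            }

            if (end - it < 2)
            {
                return nullptr;     // Comment was never closed
            }

            it += 2;
        }
        else
        {
            return it;
        }
    }

    return nullptr;
}

// Hypothetical check: true if the text after a semicolon is just END,
// optionally followed by one more semicolon, whitespace and comments.
bool only_end_keyword_follows(std::string_view rest)
{
    const char* end = rest.data() + rest.size();
    const char* p = skip_ws_and_comments(rest.data(), end);

    if (p == nullptr || end - p < 3)
    {
        return false;
    }

    auto lower = [](char c) {
        return std::tolower(static_cast<unsigned char>(c));
    };

    if (lower(p[0]) != 'e' || lower(p[1]) != 'n' || lower(p[2]) != 'd')
    {
        return false;
    }

    p = skip_ws_and_comments(p + 3, end);

    // The null check in question: without it, the dereference below would
    // read through a null pointer when only a comment follows END.
    if (p == nullptr)
    {
        return true;
    }

    if (*p == ';')
    {
        return skip_ws_and_comments(p + 1, end) == nullptr;
    }

    return false;
}
```

In this sketch, the text after the semicolon in the SQL above is " end /** hello */". The second skip consumes the trailing comment and returns a null pointer, so dropping the null check leads straight to a dereference of that pointer.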
