[MCOL-5622] columnstore crash on big query Created: 2022-08-29  Updated: 2023-11-29

Status: Needs Feedback
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Michael Roosz Assignee: Roman
Resolution: Unresolved Votes: 0
Labels: None


 Description   

ColumnStore crashes on a large query (>65 KB) that contains many "CASE WHEN ... THEN ..." expressions.

The crash was first seen on 10.6.8. We then upgraded to 10.9.2, but this did not solve the issue.

Server version: 10.9.2-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=16
max_threads=153
thread_count=16
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 468006 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f710bc07b58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f733fec43d8 thread_stack 0x49000
??:0(my_print_stacktrace)[0x557e488a1f9e]
??:0(handle_fatal_signal)[0x557e4839b745]
??:0(__restore_rt)[0x7f73491d7ce0]
??:0(execplan::operator==(execplan::ParseTree const&, execplan::ParseTree const&))[0x7f733514d681]
??:0(cal_impl_if::buildCaseFunction(Item_func*, cal_impl_if::gp_walk_info&, bool&))[0x7f7336c8fc1f]
??:0(cal_impl_if::buildFunctionColumn(Item_func*, cal_impl_if::gp_walk_info&, bool&, bool))[0x7f7336c9074c]
??:0(cal_impl_if::getSelectPlan(cal_impl_if::gp_walk_info&, st_select_lex&, boost::shared_ptr<execplan::CalpontSelectExecutionPlan>&, bool, bool, std::vector<Item*, std::allocator<Item*> > const&))[0x7f7336c9eeef]
??:0(cal_impl_if::cs_get_select_plan(ha_columnstore_select_handler*, THD*, boost::shared_ptr<execplan::CalpontSelectExecutionPlan>&, cal_impl_if::gp_walk_info&))[0x7f7336ca29f6]
??:0(ha_mcs_impl_pushdown_init(mcs_handler_info*, TABLE*))[0x7f7336c562d8]
??:0(create_columnstore_select_handler(THD*, st_select_lex*))[0x7f7336c3d2ea]
??:0(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x557e481d9059]
??:0(handle_select(THD*, LEX*, select_result*, unsigned long))[0x557e481d984b]
??:0(LEX::mark_first_table_as_inserting())[0x557e48165879]
??:0(mysql_execute_command(THD*, bool))[0x557e4816dd12]
??:0(Prepared_statement::execute(String*, bool))[0x557e4818f29f]
??:0(Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*))[0x557e4818f55d]
??:0(Prepared_statement::execute_bulk_loop(String*, bool, unsigned char*, unsigned char*))[0x557e48190315]
??:0(mysqld_stmt_execute(THD*, char*, unsigned int))[0x557e481904bb]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x557e4816b0b0]
??:0(do_command(THD*, bool))[0x557e4816bebf]
??:0(do_handle_one_connection(CONNECT*, bool))[0x557e4827a487]
??:0(handle_one_connection)[0x557e4827a7cd]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x557e4859ee4d]



 Comments   
Comment by Michael Roosz [ 2022-09-14 ]

Okay, I did some debugging. The reason for the crash is this line in "ha_mcs_execplan.cpp":

if (!gwi.ptWorkStack.empty() && *gwi.ptWorkStack.top() == *sptp.get())

"sptp.get()" returns 0x0 (NULL), so dereferencing it in "*sptp.get()" makes the == check crash.

Comment by Michael Roosz [ 2022-09-14 ]

My naive fix would be:

before:

if (!gwi.ptWorkStack.empty() && *gwi.ptWorkStack.top() == *sptp.get())

after:

if (sptp && !gwi.ptWorkStack.empty() && *gwi.ptWorkStack.top() == *sptp.get())

This fixes the crash for me.

However, my query now fails with:

ERROR 1178 (42000) at line 2: The storage engine for the table doesn't support MCS-1001: Function 'replace' isn't supported.

But there are no more crashes.
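
For illustration, the null-guard pattern discussed above can be sketched in isolation. This is a minimal stand-alone sketch, not the plugin code: `Node` is a hypothetical stand-in for `execplan::ParseTree`, `topMatches` is a hypothetical helper, `std::shared_ptr` is used instead of `boost::shared_ptr` to keep the example self-contained, and the work stack is assumed to hold raw pointers. The extra check on the stack's top element is also an assumption added for safety.

```cpp
#include <memory>
#include <stack>

// Hypothetical stand-in for execplan::ParseTree; only the comparison matters here.
struct Node
{
  int value;
};

bool operator==(const Node& a, const Node& b)
{
  return a.value == b.value;
}

// Returns true only when the candidate pointer is non-null, the stack is
// non-empty, the top element is non-null, and the two nodes compare equal.
// Short-circuit evaluation of && guarantees that neither pointer is
// dereferenced unless all earlier checks have passed.
bool topMatches(const std::stack<Node*>& workStack, const std::shared_ptr<Node>& candidate)
{
  return candidate && !workStack.empty() && workStack.top() && *workStack.top() == *candidate;
}
```

With an empty `candidate`, `topMatches` returns false instead of dereferencing a null pointer. As the error above suggests, the guard only avoids the crash; it does not explain why the pointer is null in the first place.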

Comment by Roman [ 2023-11-29 ]

Hey Roosz, I would appreciate it if you shared the original query/DDL that crashes the plugin code so we can reproduce it locally because, as you correctly pointed out, the naive fix won't work.
JFYI, MDB 10.6-11.0 use the same MCS branch, so there won't be fixes in there for the issue. You can try 11.1.
You can check the MCS version with `show status like '%columnstore%';`
