Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
-
10.4.13, 10.4.14
Description
We are observing mariadb sever crashes with 10.4 (confirmed with 10.4.13 and 10.4.14) in a scenario where multiple clients are modifying rows from the same innodb tables. The same scenario not to happen with mariadb 10.3 (confirmed OK with 10.3.22).
On a debug build (built with -DCMAKE_BUILD_TYPE=Debug) , the error log contain:
mysqld: /srv/slapgrid/slappart4/srv/slapos/inst/slappart0/tmp/soft/5f89b8831c4d60d07cb75ad034cd3ec6/parts/mariadb__compile__/mariadb-10.4.14/storage/innobase/row/row0sel.cc:4480: dberr_t row_search_mvcc(byte*, page_cur_mode_t, row_prebuilt_t*, ulint, ulint): Assertion `prebuilt->sql_stat_start || trx->state == TRX_STATE_ACTIVE || (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)' failed.
|
201008 7:10:57 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.4.14-MariaDB-debug-log
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=40
|
max_threads=132
|
thread_count=47
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 421632 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x7f17cc000a88
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f18493eae88 thread_stack 0x49000
|
/srv/slapgrid/slappart4/srv/slapos/inst/slappart0/tmp/soft/5f89b8831c4d60d07cb75ad034cd3ec6/parts/mariadb/bin/mysqld(my_print_stacktrace+0x2b)[0x563fa496a232]
|
mysys/stacktrace.c:270(my_print_stacktrace)[0x563fa41bcc09]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f1864c080c0]
|
linux/raise.c:51(__GI_raise)[0x7f1863895fcf]
|
stdlib/abort.c:91(__GI_abort)[0x7f18638973fa]
|
assert/assert.c:92(__assert_fail_base)[0x7f186388ee37]
|
/lib/x86_64-linux-gnu/libc.so.6(+0x2bee2)[0x7f186388eee2]
|
/srv/slapgrid/slappart4/srv/slapos/inst/slappart0/tmp/soft/5f89b8831c4d60d07cb75ad034cd3ec6/parts/mariadb/bin/mysqld(+0xd56934)[0x563fa4595934]
|
include/sync0types.h:451(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x563fa43e8fa7]
|
:0(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x563fa41d4c2c]
|
sql/handler.h:3631(handler::index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function))[0x563fa41c5474]
|
sql/handler.cc:2919(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function))[0x563fa41cd459]
|
sql/handler.cc:5878(handler::read_range_first(st_key_range const*, st_key_range const*, bool, bool))[0x563fa40830cd]
|
sql/multi_range_read.cc:353(handler::multi_range_read_next(void**))[0x563fa4083158]
|
sql/multi_range_read.cc:390(Mrr_simple_index_reader::get_next(void**))[0x563fa4084f3a]
|
sql/multi_range_read.cc:1491(DsMrr_impl::dsmrr_next(void**))[0x563fa43d1370]
|
:0(ha_innobase::multi_range_read_next(void**))[0x563fa432606e]
|
sql/opt_range.cc:12224(QUICK_RANGE_SELECT::get_next())[0x563fa41bb099]
|
sql/filesort.cc:772(find_all_keys)[0x563fa3f60fee]
|
sql/sql_select.cc:23654(create_sort_index(THD*, JOIN*, st_join_table*, Filesort*))[0x563fa3f612fe]
|
sql/sql_select.cc:21400(st_join_table::sort_table())[0x563fa3f61432]
|
sql/sql_select.cc:21338(join_init_read_record(st_join_table*))[0x563fa3f49d10]
|
sql/sql_select.cc:20414(sub_select(JOIN*, st_join_table*, bool))[0x563fa3f7d39e]
|
sql/sql_select.cc:19955(do_select)[0x563fa3f7d7d5]
|
sql/sql_select.cc:4256(JOIN::exec())[0x563fa3f7bb92]
|
sql/sql_select.cc:4687(mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x563fa3f7beac]
|
sql/sql_select.cc:410(handle_select(THD*, LEX*, select_result*, unsigned long))[0x563fa3ef4a65]
|
sql/sql_parse.cc:6355(execute_sqlcom_select(THD*, TABLE_LIST*))[0x563fa3effe3b]
|
sql/sql_parse.cc:3889(mysql_execute_command(THD*))[0x563fa3f0a33e]
|
sql/sql_parse.cc:7896(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x563fa3f0cb89]
|
sql/sql_parse.cc:1834(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x563fa3f10504]
|
sql/sql_parse.cc:1352(do_command(THD*))[0x563fa403ca5d]
|
sql/sql_connect.cc:1412(do_handle_one_connection(CONNECT*))[0x563fa403cb74]
|
nptl/pthread_create.c:333(start_thread)[0x7f1864bfe494]
|
x86_64/clone.S:99(clone)[0x7f186394baff]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7f17cc0112e0): SELECT * FROM message WHERE processing_node IN (0, 2) AND path IN ('/erp5/portal_simulation/1/7/1/1/1/1/1','/erp5/portal_simulation/1/7/1/1/1/1','/erp5/portal_simulation/1/7/1/1/1','/erp5/portal_simulation/1/7/1/1','/erp5/portal_simulation/1/7/1','/erp5/portal_simulation/1/7','/erp5/portal_simulation/1') AND method_id = 'expand' AND group_method_id = '\0' ORDER BY path LIMIT 1 FOR UPDATE
|
Connection ID (thread ID): 47
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
|
|
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
|
information that should help you find out what is causing the crash.
|
Writing a core file...
|
Working directory at /srv/slapgrid/slappart4/srv/slapos/inst/slappart0/tmp/inst/crash3/srv/mariadb
|
Resource Limits:
|
Limit Soft Limit Hard Limit Units
|
Max cpu time unlimited unlimited seconds
|
Max file size unlimited unlimited bytes
|
Max data size unlimited unlimited bytes
|
Max stack size 8388608 unlimited bytes
|
Max core file size unlimited unlimited bytes
|
Max resident set unlimited unlimited bytes
|
Max processes 257485 257485 processes
|
Max open files 4096 4096 files
|
Max locked memory 65536 65536 bytes
|
Max address space unlimited unlimited bytes
|
Max file locks unlimited unlimited locks
|
Max pending signals 257485 257485 signals
|
Max msgqueue size 819200 819200 bytes
|
Max nice priority 0 0
|
Max realtime priority 0 0
|
Max realtime timeout unlimited unlimited us
|
Core pattern: core
|
on a non-debug build, it was something like InnoDB: Assertion failure in file /opt/slapgrid/shared/mariadb/4da287626e382c1b113859af7d7565a9_compile_/mariadb-10.4.13/storage/innobase/lock/lock0lock.cc line 6898.
We have collected a core dump and produced a full back trace of all threads, please see attached gdb.txt
I also attach the general log from this execution, from the creation of the database until the crash. In that scenario, we are using mariadb for a kind of "task queue" application, the typical queries that might be causing the crash are using something like this simplified pattern, executed by several clients:
-- pick a task to execute
|
SELECT uid FROM message WHERE processing_node=0 LIMIT 1 FOR UPDATE; |
-- assign to self
|
UPDATE message SET processing_node=(worker id) WHERE uid=(uid of the picked task) |
-- remove when done
|
DELETE FROM message WHERE uid=(uid of the picked task) |
We don't have a simple reproduction, but we have a script which compiles mariadb and all its dependencies, start an application creating random data and it usually crashes after less than one hour. I'd be happy to try a patch with this script if you'd like, describe how to run this script, give access to a virtual machine with an already set up environment, or anything else which could help.
Attachments
Issue Links
- relates to
-
MDEV-20666 Assertion failures prebuilt->sql_stat_start ... in row_search / index_read upon concurrent INSERT and CREATE .. SELECT
- Stalled
-
MDEV-23338 server crash on select for update
- Open
-
MDEV-26291 ER_FILSORT_ABORT triggers - crash query when "for update" in a select statement
- Closed
-
MDEV-10087 mysqld_update()/mysql_delete() continues execution even after subquery with JOIN gets error from storage engine
- Closed