[MDEV-18815] Server crashes on TRUNCATE TABLE Created: 2019-03-04  Updated: 2019-10-23  Resolved: 2019-04-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.13
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Konstantin Vasserman Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: innodb_adaptive_hash_index, need_feedback
Environment:

Ubuntu


Issue Links:
Blocks
is blocked by MDEV-20461 testgi Closed
Relates
relates to MDEV-20487 Set innodb_adaptive_hash_index=OFF by... Closed
relates to MDEV-19613 Got error 128 when reading "Table Nam... Open

 Description   

Another crash. 10.3 seems to be incredibly unstable. This is the third crash we are reporting. Here is the log:

Mar 2 19:23:23 litemasha mysqld[1246]: 190302 19:23:23 [ERROR] mysqld got signal 11 ;
Mar 2 19:23:23 litemasha mysqld[1246]: This could be because you hit a bug. It is also possible that this binary
Mar 2 19:23:23 litemasha mysqld[1246]: or one of the libraries it was linked against is corrupt, improperly built,
Mar 2 19:23:23 litemasha mysqld[1246]: or misconfigured. This error can also be caused by malfunctioning hardware.
Mar 2 19:23:23 litemasha mysqld[1246]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Mar 2 19:23:23 litemasha mysqld[1246]: We will try our best to scrape up some info that will hopefully help
Mar 2 19:23:23 litemasha mysqld[1246]: diagnose the problem, but since we have already crashed,
Mar 2 19:23:23 litemasha mysqld[1246]: something is definitely wrong and this may fail.
Mar 2 19:23:23 litemasha mysqld[1246]: Server version: 10.3.13-MariaDB-1:10.3.13+maria~xenial
Mar 2 19:23:23 litemasha mysqld[1246]: key_buffer_size=134217728
Mar 2 19:23:23 litemasha mysqld[1246]: read_buffer_size=2097152
Mar 2 19:23:23 litemasha mysqld[1246]: max_used_connections=23
Mar 2 19:23:23 litemasha mysqld[1246]: max_threads=202
Mar 2 19:23:23 litemasha mysqld[1246]: thread_count=21
Mar 2 19:23:23 litemasha mysqld[1246]: It is possible that mysqld could use up to
Mar 2 19:23:23 litemasha mysqld[1246]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1376643 K bytes of memory
Mar 2 19:23:23 litemasha mysqld[1246]: Hope that's ok; if not, decrease some variables in the equation.
Mar 2 19:23:23 litemasha mysqld[1246]: Thread pointer: 0x7f9f30269288
Mar 2 19:23:23 litemasha mysqld[1246]: Attempting backtrace. You can use the following information to find out
Mar 2 19:23:23 litemasha mysqld[1246]: where mysqld died. If you see no messages after this, something went
Mar 2 19:23:23 litemasha mysqld[1246]: terribly wrong...
Mar 2 19:23:23 litemasha mysqld[1246]: stack_bottom = 0x7f9f84c09cf8 thread_stack 0x49000
Mar 2 19:23:40 litemasha mysqld[1246]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x558792c5936e]
Mar 2 19:23:40 litemasha mysqld[1246]: /usr/sbin/mysqld(handle_fatal_signal+0x347)[0x558792708187]
Mar 2 19:23:40 litemasha mysqld[1246]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fa0b3d5c390]
Mar 2 19:23:40 litemasha mysqld[1246]: /usr/sbin/mysqld(+0xbb7c67)[0x558792b6dc67]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(+0xb18fa5)[0x558792acefa5]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(+0xb1ac0b)[0x558792ad0c0b]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(+0xb45f0d)[0x558792afbf0d]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(+0xa67635)[0x558792a1d635]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(+0x9af282)[0x558792965282]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(+0x9ab3db)[0x5587929613db]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_ZN22Sql_cmd_truncate_table16handler_truncateEP3THDP10TABLE_LISTb+0x108)[0x558792854a88]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_ZN22Sql_cmd_truncate_table14truncate_tableEP3THDP10TABLE_LIST+0x1d0)[0x5587928552f0]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_ZN22Sql_cmd_truncate_table7executeEP3THD+0x50)[0x5587928553a0]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x13d7)[0x558792527477]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x21a)[0x55879252eb8a]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1cc0)[0x5587925315b0]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_Z10do_commandP3THD+0x13c)[0x5587925320bc]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x252)[0x558792604042]
Mar 2 19:23:41 litemasha mysqld[1246]: /usr/sbin/mysqld(handle_one_connection+0x3d)[0x5587926041bd]
Mar 2 19:23:41 litemasha mysqld[1246]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fa0b3d526ba]
Mar 2 19:23:41 litemasha mysqld[1246]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa0b33fd41d]
Mar 2 19:23:41 litemasha mysqld[1246]: Trying to get some variables.
Mar 2 19:23:41 litemasha mysqld[1246]: Some pointers may be invalid and cause the dump to abort.
Mar 2 19:23:41 litemasha mysqld[1246]: Query (0x7f9f30888940): TRUNCATE TABLE `LiteworldCompany-Site5`.`ZipCodes`
Mar 2 19:23:41 litemasha mysqld[1246]: Connection ID (thread ID): 80684
Mar 2 19:23:41 litemasha mysqld[1246]: Status: NOT_KILLED
Mar 2 19:23:41 litemasha mysqld[1246]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
Mar 2 19:23:42 litemasha kernel: [355725.768030] do_general_protection: 9 callbacks suppressed
Mar 2 19:23:42 litemasha kernel: [355725.816626] traps: mysqld[7494] general protection ip:558792b6dc67 sp:7f9f84c06060 error:0 in mysqld[558791fb6000+1261000]
Mar 2 19:23:42 litemasha mysqld[1246]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Mar 2 19:23:42 litemasha mysqld[1246]: information that should help you find out what is causing the crash.
Mar 2 19:23:53 litemasha systemd[1]: mariadb.service: Main process exited, code=dumped, status=11/SEGV
Mar 2 19:23:53 litemasha systemd[1]: mariadb.service: Unit entered failed state.
Mar 2 19:23:53 litemasha systemd[1]: mariadb.service: Failed with result 'core-dump'.
Mar 2 19:23:58 litemasha systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Mar 2 19:23:58 litemasha systemd[1]: Stopped MariaDB 10.3.13 database server.



 Comments   
Comment by Elena Stepanova [ 2019-03-09 ]

Resolved stack trace from the description

stack_bottom = 0x7f9f84c09cf8 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x558792c5936e]
/usr/sbin/mysqld(handle_fatal_signal+0x347)[0x558792708187]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fa0b3d5c390]
/usr/sbin/mysqld(0xbb7c67 ha_remove_all_nodes_to_page(hash_table_t*, unsigned long, unsigned char const*) + 71)[0x558792b6dc67]
/usr/sbin/mysqld(0xb18fa5 btr_search_drop_page_hash_index(buf_block_t*) + 3765)[0x558792acefa5]
/usr/sbin/mysqld(0xb1ac0b btr_search_drop_page_hash_when_freed(page_id_t) + 331)[0x558792ad0c0b]
/usr/sbin/mysqld(0xb45f0d buf_LRU_drop_page_hash_for_tablespace(dict_table_t*) + 1053)[0x558792afbf0d]
/usr/sbin/mysqld(0xa67635 row_drop_table_for_mysql(char const*, trx_t*, enum_sql_command, bool, bool) + 709)[0x558792a1d635]
/usr/sbin/mysqld(0x9af282 ha_innobase::delete_table(char const*, enum_sql_command) + 402)[0x558792965282]
/usr/sbin/mysqld(0x9ab3db ha_innobase::truncate() [clone .part.278] [clone .constprop.291] + 1803)[0x5587929613db]
/usr/sbin/mysqld(Sql_cmd_truncate_table::handler_truncate(THD*, TABLE_LIST*, bool)+0x108)[0x558792854a88]
/usr/sbin/mysqld(Sql_cmd_truncate_table::truncate_table(THD*, TABLE_LIST*)+0x1d0)[0x5587928552f0]
/usr/sbin/mysqld(Sql_cmd_truncate_table::execute(THD*)+0x50)[0x5587928553a0]
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x13d7)[0x558792527477]
/usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool)+0x21a)[0x55879252eb8a]
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool)+0x1cc0)[0x5587925315b0]
/usr/sbin/mysqld(do_command(THD*)+0x13c)[0x5587925320bc]
/usr/sbin/mysqld(do_handle_one_connection(CONNECT*)+0x252)[0x558792604042]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x5587926041bd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fa0b3d526ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa0b33fd41d]

Comment by Marko Mäkelä [ 2019-03-21 ]

kvasserman, it looks like something has corrupted the InnoDB adaptive hash index. The crash occurs when InnoDB tries to remove the adaptive hash index entries for each buffer pool block of the table that contains adaptive hash index entries. The adaptive hash index is a very fragile data structure, because it is global for all InnoDB tables.

To work around this issue:

SET GLOBAL innodb_adaptive_hash_index=OFF;

This may reduce or improve performance. Because the adaptive hash index is expected to improve performance, I have not dared to remove it altogether.

To fix the actual bug, I would need something reproducible. Based on the nature of the adaptive hash index, I do not have high hopes. The adaptive hash index may be updated on any read or write access, including for internal accesses by the InnoDB purge subsystem. Maybe you could run a debug version of the server in a staging environment? Or a version that is built WITH_ASAN?

The crash appears to occur here:

void
ha_remove_all_nodes_to_page(
/*========================*/
        hash_table_t*   table,  /*!< in: hash table */
        ulint           fold,   /*!< in: fold value */
        const page_t*   page)   /*!< in: buffer page */
{
        ha_node_t*      node;
 
        ut_ad(table);
        ut_ad(table->magic_n == HASH_TABLE_MAGIC_N);
        hash_assert_can_modify(table, fold);
        ut_ad(btr_search_enabled);
 
        node = ha_chain_get_first(table, fold);
 
        while (node) {
                if (page_align(ha_node_get_data(node)) == page) {

Cleaned-up disassembly:

<+0>:     push   %rbp
<+1>:     xor    $0x62946a4f,%rsi
<+8>:     mov    %rsi,%rax
<+11>:    mov    %rsp,%rbp
<+14>:    push   %r14
<+16>:    push   %r13
<+18>:    push   %r12
<+20>:    push   %rbx
<+21>:    mov    %rdx,%rbx
<+24>:    xor    %edx,%edx
<+26>:    mov    %rsi,%r14
<+29>:    divq   0x8(%rdi)
<+33>:    mov    0x10(%rdi),%rax
<+37>:    mov    (%rax,%rdx,8),%rsi
<+41>:    test   %rsi,%rsi
<+44>:    je     <+89>
<+46>:    lea    0x12c8eab(%rip),%r12        # 0x1e80b00 <srv_page_size>
<+53>:    mov    %rdi,%r13
<+56>:    nopl   0x0(%rax,%rax,1)
<+64>:    mov    (%r12),%rax
<+68>:    neg    %rax
<+71>:    and    0x10(%rsi),%rax
<+75>:    cmp    %rax,%rbx
<+78>:    je     <+104>
<+80>:    mov    0x8(%rsi),%rsi
<+84>:    test   %rsi,%rsi
<+87>:    jne    <+64>
<+89>:    pop    %rbx
<+90>:    pop    %r12
<+92>:    pop    %r13
<+94>:    pop    %r14
<+96>:    pop    %rbp
<+97>:    retq
[snip code that jumps back to <+84>]

The crash occurs at offset +71 from the function, for what looks like ha_node_get_data(node) aka return node->data combined with the evaluation of page_align(). In a 64-bit non-debug build there should be 16 bytes before that field, and the crashing instruction is using that offset:

and    0x10(%rsi),%rax

Do you happen to resize the InnoDB buffer pool while the server is running? There have been some problems related to that. The feature was introduced in MariaDB 10.2 via MySQL 5.7.

Generated at Thu Feb 08 08:46:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.