[MDEV-15256] Crash in flush_cached_blocks - Jira

Details

Type: Bug
Status: Confirmed
Priority: Major
Resolution: Unresolved
Affects Version/s: 5.5, 10.1, 10.1.22, 10.1.30, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.4, 10.5, 10.6, 10.11
Component/s: Storage Engine - Aria
Labels: None
Environment: CentOS Linux release 7.4.1708

Description

We copied part of our environment to a virtual server, and since then we have experienced random crashes (four in a couple of weeks). After each crash the DBMS won't restart, because it crashes again during recovery. The latest crash, tonight, shows this stack trace:

stack_bottom = 0x0 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d96d72cd0e]
/usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55d96d24f925]
/lib64/libpthread.so.0(+0xf5e0)[0x7f759d8a15e0]
/usr/sbin/mysqld(+0x790bc7)[0x55d96d41ebc7]
/usr/sbin/mysqld(+0x791422)[0x55d96d41f422]
/usr/sbin/mysqld(+0x79568c)[0x55d96d42368c]
/usr/sbin/mysqld(+0x797108)[0x55d96d425108]
/lib64/libpthread.so.0(+0x7e25)[0x7f759d899e25]
/lib64/libc.so.6(clone+0x6d)[0x7f759bc3d34d]

Using 'addr2line' I deduced that the crash happens while doing a pagecache flush:

flush_cached_blocks .../mariadb-10.1.30/storage/maria/ma_pagecache.c:4431
flush_pagecache_blocks_int .../mariadb-10.1.30/storage/maria/ma_pagecache.c:4727
flush_pagecache_blocks_with_filter .../mariadb-10.1.30/storage/maria/ma_pagecache.c:4844
ma_checkpoint_background .../mariadb-10.1.30/storage/maria/ma_checkpoint.c:674
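The address-to-source step above can be reproduced roughly as follows. This is a minimal sketch, not the reporter's actual procedure: it assumes the binary is `/usr/sbin/mysqld`, that matching debug symbols are installed, and that the `+0x...` offsets printed in the trace are the values `addr2line` expects for this build. The helper names `unresolved_offsets` and `addr2line_command` are hypothetical.

```python
import re
import shlex

# Matches frames such as "/usr/sbin/mysqld(+0x790bc7)[0x55d96d41ebc7]":
# a binary path followed by a bare "+0x..." offset with no symbol name.
FRAME_RE = re.compile(
    r"^(?P<binary>/\S+?)\(\+(?P<offset>0x[0-9a-fA-F]+)\)\[0x[0-9a-fA-F]+\]$"
)


def unresolved_offsets(trace, binary="/usr/sbin/mysqld"):
    """Return the offsets of frames in `binary` that carry no symbol name."""
    offsets = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("binary") == binary:
            offsets.append(match.group("offset"))
    return offsets


def addr2line_command(offsets, binary="/usr/sbin/mysqld"):
    """Build an addr2line call: -f prints function names, -C demangles,
    -e selects the binary to read symbols from."""
    return ["addr2line", "-f", "-C", "-e", binary, *offsets]


# Frames copied from the first stack trace in this report.
TRACE = """\
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d96d72cd0e]
/usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55d96d24f925]
/lib64/libpthread.so.0(+0xf5e0)[0x7f759d8a15e0]
/usr/sbin/mysqld(+0x790bc7)[0x55d96d41ebc7]
/usr/sbin/mysqld(+0x791422)[0x55d96d41f422]
/usr/sbin/mysqld(+0x79568c)[0x55d96d42368c]
/usr/sbin/mysqld(+0x797108)[0x55d96d425108]
"""

if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    print(shlex.join(addr2line_command(unresolved_offsets(TRACE))))
```

The sketch only prints the command, so the offsets can be checked before anything is run against the binary.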

As said, when the dbms restarts, the recovery fails:

recovered pages: 0% 10% 20% 30% 40% 50%180208 21:55:00 [ERROR] mysqld got signal 11 ;

which happens here:

stack_bottom = 0x0 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5637695b7d0e]
/usr/sbin/mysqld(handle_fatal_signal+0x305)[0x5637690da925]
/lib64/libpthread.so.0(+0xf5e0)[0x7f041f2795e0]
/usr/sbin/mysqld(+0x799f6b)[0x5637692b2f6b]
/usr/sbin/mysqld(+0x7992f1)[0x5637692b22f1]
/usr/sbin/mysqld(+0x79df00)[0x5637692b6f00]
/usr/sbin/mysqld(+0x79e8ae)[0x5637692b78ae]
mysys/stacktrace.c:268(my_print_stacktrace)[0x5637692900bd]
maria/ma_recovery.c:2139(exec_REDO_LOGREC_CLR_END)[0x5637690dcbd4]
sql/handler.cc:521(ha_initialize_handlerton(st_plugin_int*))[0x563768f64730]
sql/sql_plugin.cc:1404(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x563768f6601a]
/usr/sbin/mysqld(+0x3a1768)[0x563768eba768]
sql/mysqld.cc:5133(init_server_components())[0x563768ebe210]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f041d53ec05]
/usr/sbin/mysqld(+0x398ced)[0x563768eb1ced]

The four mysqld addresses translate to:

exec_REDO_LOGREC_CLR_END .../mariadb-10.1.30/storage/maria/ma_recovery.c:2139
display_and_apply_record .../mariadb-10.1.30/storage/maria/ma_recovery.c:588
run_redo_phase .../mariadb-10.1.30/storage/maria/ma_recovery.c:2730
maria_apply_log .../mariadb-10.1.30/storage/maria/ma_recovery.c:350
maria_recovery_from_log .../mariadb-10.1.30/storage/maria/ma_recovery.c:242

We can get the DBMS running by removing the table on which the redo log replay crashes (as determined with aria_read_log) and moving the table back in after the restart.

Then everything runs fine, for a few days, for a week, even for two weeks and then it crashes again.
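The workaround described above (moving the affected table's files out of the data directory before the restart and back in afterwards) could be scripted along these lines. This is a sketch under assumptions, not the exact steps taken: it assumes the usual Aria on-disk layout of `<table>.MAD` (data) and `<table>.MAI` (index) plus a `<table>.frm` definition file, and the helper name `move_table_files` and all paths are hypothetical. It should only be run while the server is stopped.

```python
import shutil
from pathlib import Path

# Assumed file layout for one Aria table: data (.MAD), index (.MAI),
# and the table definition (.frm).
ARIA_SUFFIXES = (".MAD", ".MAI", ".frm")


def move_table_files(src_dir, dst_dir, table):
    """Move every file belonging to `table` from src_dir to dst_dir.

    Returns the names of the files that were actually moved.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for suffix in ARIA_SUFFIXES:
        source = src_dir / f"{table}{suffix}"
        if source.exists():
            shutil.move(str(source), str(dst_dir / source.name))
            moved.append(source.name)
    return moved


# Hypothetical usage, with the server stopped:
#   move_table_files("/var/lib/mysql/mydb", "/root/quarantine", "problem_table")
# and, to put the table back later:
#   move_table_files("/root/quarantine", "/var/lib/mysql/mydb", "problem_table")
```

Moving the files rather than deleting them keeps the table data available for later inspection.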

We haven't got the faintest idea of what goes wrong. I checked the open bugs and found some vague resemblances, but nothing that stood out. There is no OOM error or other malfunction visible on the system that can be linked to the problem. I tried to investigate (at least) the aria_log file, but could not find tools to work out what causes the crash during the restart. I know that the record which causes the crash is not the last record in the log file, so it seems the two crashes are not related.

We are at a bit of a dead end. We can enable logs (if someone tells us how), we can run a debug version, and we can even try to update to 10.2 (but there are some issues in the Perl DBD::mysql module at the moment that keep us from doing so). Bear in mind that updating from 10.1.22 to 10.1.30 did not solve the issue. Any help is appreciated.

Issue Links

relates to

MDEV-654 LP:1000495 - Assertion `share->now_transactional' failed in flush_log_for_bitmap on concurrent workload with Aria tables (Closed)

MDEV-4312 aria: background thread crashes in make_lock_and_pin (Closed)

MDEV-18947 my_pwrite / pagecache_fwrite: Syscall param pwrite64(buf) points to uninitialised byte(s) (Open)

MDEV-33577 Crash on get_rdlock (Open)

People

Assignee: Michael Widenius

Reporter: Frank Maas

Votes: 0

Watchers: 4

Dates

Created: 2018-02-08 23:34

Updated: 2024-03-04 10:29
