  1. MariaDB Server
  2. MDEV-15256

Crash in flush_cached_blocks


Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.5, 10.1, 10.1.22, 10.1.30, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0
    • Fix Version/s: 10.4, 10.5, 10.6, 10.11
    • Component/s: Storage Engine - Aria
    • Labels: None
    • Environment: CentOS Linux release 7.4.1708

    Description

      We have copied part of our environment to a virtual server, and since then we have experienced random crashes (four in a couple of weeks). Once a crash has happened, the DBMS won't restart, because it crashes again during recovery. The latest crash, tonight, shows this stack trace:

      stack_bottom = 0x0 thread_stack 0x48400
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d96d72cd0e]
      /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55d96d24f925]
      /lib64/libpthread.so.0(+0xf5e0)[0x7f759d8a15e0]
      /usr/sbin/mysqld(+0x790bc7)[0x55d96d41ebc7]
      /usr/sbin/mysqld(+0x791422)[0x55d96d41f422]
      /usr/sbin/mysqld(+0x79568c)[0x55d96d42368c]
      /usr/sbin/mysqld(+0x797108)[0x55d96d425108]
      /lib64/libpthread.so.0(+0x7e25)[0x7f759d899e25]
      /lib64/libc.so.6(clone+0x6d)[0x7f759bc3d34d]

      Using 'addr2line' I deduced that the crash happens while doing a pagecache flush:

      flush_cached_blocks .../mariadb-10.1.30/storage/maria/ma_pagecache.c:4431
      flush_pagecache_blocks_int .../mariadb-10.1.30/storage/maria/ma_pagecache.c:4727
      flush_pagecache_blocks_with_filter .../mariadb-10.1.30/storage/maria/ma_pagecache.c:4844
      ma_checkpoint_background .../mariadb-10.1.30/storage/maria/ma_checkpoint.c:674
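
      For anyone who wants to repeat the lookup, here is a minimal sketch of how the unresolved frames can be collected and handed to addr2line. The helper names (unresolved_offsets, addr2line_command) are hypothetical, and the sketch assumes the binary at /usr/sbin/mysqld is the exact build that crashed and that its debug symbols are available. The addr2line options used are -f (function names), -C (demangle), -i (inlined frames) and -e (executable).

      ```python
      import re

      # Matches frames that carry only a module offset, for example
      # "/usr/sbin/mysqld(+0x790bc7)[0x55d96d41ebc7]".
      FRAME_RE = re.compile(
          r"^\s*(?P<module>\S+?)\(\+(?P<offset>0x[0-9a-fA-F]+)\)\[0x[0-9a-fA-F]+\]\s*$"
      )


      def unresolved_offsets(trace, module="/usr/sbin/mysqld"):
          """Return the offsets of unresolved frames that belong to `module`."""
          offsets = []
          for line in trace.splitlines():
              match = FRAME_RE.match(line)
              if match and match.group("module") == module:
                  offsets.append(match.group("offset"))
          return offsets


      def addr2line_command(offsets, binary="/usr/sbin/mysqld"):
          """Build the addr2line argument list; the binary path is an assumption."""
          return ["addr2line", "-f", "-C", "-i", "-e", binary, *offsets]


      if __name__ == "__main__":
          sample = """
          /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x55d96d24f925]
          /lib64/libpthread.so.0(+0xf5e0)[0x7f759d8a15e0]
          /usr/sbin/mysqld(+0x790bc7)[0x55d96d41ebc7]
          /usr/sbin/mysqld(+0x791422)[0x55d96d41f422]
          """
          # Print the command instead of running it, so it can be reviewed first.
          print(" ".join(addr2line_command(unresolved_offsets(sample))))
      ```

      The offset inside the parentheses, not the absolute address in square brackets, is what gets passed to addr2line, since the absolute address depends on where the binary was loaded.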

      As said, when the dbms restarts, the recovery fails:

      recovered pages: 0% 10% 20% 30% 40% 50%180208 21:55:00 [ERROR] mysqld got signal 11 ;

      which happens here:

      stack_bottom = 0x0 thread_stack 0x48400
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5637695b7d0e]
      /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x5637690da925]
      /lib64/libpthread.so.0(+0xf5e0)[0x7f041f2795e0]
      /usr/sbin/mysqld(+0x799f6b)[0x5637692b2f6b]
      /usr/sbin/mysqld(+0x7992f1)[0x5637692b22f1]
      /usr/sbin/mysqld(+0x79df00)[0x5637692b6f00]
      /usr/sbin/mysqld(+0x79e8ae)[0x5637692b78ae]
      mysys/stacktrace.c:268(my_print_stacktrace)[0x5637692900bd]
      maria/ma_recovery.c:2139(exec_REDO_LOGREC_CLR_END)[0x5637690dcbd4]
      sql/handler.cc:521(ha_initialize_handlerton(st_plugin_int*))[0x563768f64730]
      sql/sql_plugin.cc:1404(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x563768f6601a]
      /usr/sbin/mysqld(+0x3a1768)[0x563768eba768]
      sql/mysqld.cc:5133(init_server_components())[0x563768ebe210]
      /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f041d53ec05]
      /usr/sbin/mysqld(+0x398ced)[0x563768eb1ced]

      The four mysqld addresses translate to:

      exec_REDO_LOGREC_CLR_END .../mariadb-10.1.30/storage/maria/ma_recovery.c:2139
      display_and_apply_record .../mariadb-10.1.30/storage/maria/ma_recovery.c:588
      run_redo_phase .../mariadb-10.1.30/storage/maria/ma_recovery.c:2730
      maria_apply_log .../mariadb-10.1.30/storage/maria/ma_recovery.c:350
      maria_recovery_from_log .../mariadb-10.1.30/storage/maria/ma_recovery.c:242

      We can get the DBMS running by removing the table on which the redo log replay crashes (as determined with aria_read_log) and moving the table back in after the restart.
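
      As an illustration of that workaround, here is a rough sketch of moving a table's Aria files out of the database directory and back again. It rests on assumptions not stated above: that "removing the table" means moving the Aria index (.MAI) and data (.MAD) files while leaving the .frm definition in place, and that the directory paths are supplied by the caller. The function names park_aria_table and restore_aria_table are hypothetical.

      ```python
      import shutil
      from pathlib import Path

      # Aria keeps a table's index in <table>.MAI and its rows in <table>.MAD.
      ARIA_EXTENSIONS = (".MAI", ".MAD")


      def park_aria_table(db_dir, table, parking_dir):
          """Move the table's Aria files from db_dir to parking_dir.

          Returns the new paths of the files that were actually moved.
          """
          db_dir, parking_dir = Path(db_dir), Path(parking_dir)
          parking_dir.mkdir(parents=True, exist_ok=True)
          moved = []
          for ext in ARIA_EXTENSIONS:
              src = db_dir / f"{table}{ext}"
              if src.exists():
                  dst = parking_dir / src.name
                  shutil.move(str(src), str(dst))
                  moved.append(dst)
          return moved


      def restore_aria_table(db_dir, table, parking_dir):
          """Move previously parked Aria files back into db_dir."""
          db_dir, parking_dir = Path(db_dir), Path(parking_dir)
          restored = []
          for ext in ARIA_EXTENSIONS:
              src = parking_dir / f"{table}{ext}"
              if src.exists():
                  dst = db_dir / src.name
                  shutil.move(str(src), str(dst))
                  restored.append(dst)
          return restored
      ```

      The sketch only moves files; stopping and starting the server, and taking a backup of the files first, are left to the operator.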

      Then everything runs fine for a few days, a week, or even two weeks, and then it crashes again.

      We haven't got the faintest idea of what goes wrong. I checked the open bugs and found a few with a vague resemblance, but nothing that stood out. There is no OOM error or other visible malfunction on the system that the problem can be traced to. I tried to investigate (at least) the aria_log file, but could not find tools that show what causes the crash during restart. I know that the record which causes the crash is not the last record in the log file, so it seems the two crashes are not related.

      We are at a bit of a dead end, and any help is appreciated. We can enable logs (if someone tells us how), we can run a debug version, and we can even try to upgrade to 10.2 (although some issues in the Perl DBD::mysql module currently keep us from doing so). Bear in mind that upgrading from 10.1.22 to 10.1.30 did not solve the issue.

            People

              Assignee: Michael Widenius (monty)
              Reporter: Frank Maas (Frank_VID)
