Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12052

Shutdown crash presumably due to master thread activity

Details

    • 5.5.55, 10.0.30

    Description

      https://internal.askmonty.org/buildbot/builders/p8-rhel6-bintar/builds/549/steps/test/logs/stdio

      Thread 1 (Thread 24994):
      #0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
      #1  0x00000000109eebb8 in .my_write_core ()
      #2  0x00000000103e3a3c in .handle_fatal_signal ()
      #3  <signal handler called>
      #4  0x00000fffa320f2f0 in .raise () from /opt/at7.0/lib64/power7/libc.so.6
      #5  0x00000fffa32112f4 in .abort () from /opt/at7.0/lib64/power7/libc.so.6
      #6  0x000000001079ef54 in .buf_flush_page_cleaner_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:2803
      #7  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #8  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
      

      innodb_fts.innodb_fts_stopword_charset 'xtradb' w2 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2017-02-09 07:47:28
      line
      Attempting backtrace. You can use the following information to find out
      ^ Found warnings in /home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/2/log/mysqld.1.err
      ok
       
       - saving '/home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/2/log/innodb_fts.innodb_fts_stopword_charset-xtradb/' to '/home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/log/innodb_fts.innodb_fts_stopword_charset-xtradb/'
       - found 'core.24981' (0/5)
       
      Trying 'dbx' to get a backtrace
       
      Trying 'gdb' to get a backtrace from coredump /home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/log/innodb_fts.innodb_fts_stopword_charset-xtradb/mysqld.1/data/core.24981
      Core generated by '/home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld'
      Output from gdb follows. The first stack trace is from the failing thread.
      The following stack traces are from all threads (so the failing one is
      duplicated).
      --------------------------
      [New Thread 24994]
      [New Thread 25000]
      [New Thread 24992]
      [New Thread 25032]
      [New Thread 24981]
      [New Thread 24982]
      [New Thread 24983]
      [New Thread 24984]
      [New Thread 24985]
      [New Thread 24987]
      [New Thread 24986]
       
      warning: Ignoring non-absolute filename: <linux-vdso64.so.1>
      Missing separate debuginfo for linux-vdso64.so.1
      Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/b4/c87d71a8818890c1b89fb25fa9a70233e32a89
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
      To enable execution of this file add
      	add-auto-load-safe-path /opt/at7.0/lib64/libthread_db-1.0.so
      line to your configuration file "/home/buildbot/.gdbinit".
      To completely disable this security protection add
      	set auto-load safe-path /
      line to your configuration file "/home/buildbot/.gdbinit".
      For more information about this security protection see the
      "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
      	info "(gdb)Auto-loading safe path"
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
       
      warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
       
      warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
      Core was generated by `/home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld --defaults-group-su'.
      Program terminated with signal 6, Aborted.
      #0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
      #0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
      #1  0x00000000109eebb8 in .my_write_core ()
      #2  0x00000000103e3a3c in .handle_fatal_signal ()
      #3  <signal handler called>
      #4  0x00000fffa320f2f0 in .raise () from /opt/at7.0/lib64/power7/libc.so.6
      #5  0x00000fffa32112f4 in .abort () from /opt/at7.0/lib64/power7/libc.so.6
      #6  0x000000001079ef54 in .buf_flush_page_cleaner_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:2803
      #7  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #8  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 11 (Thread 24986):
      #0  0x00000fffa3a67220 in .__libc_fsync () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000001068b8e4 in ._Z18os_file_flush_funci () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:2482
      #2  0x00000000107f04c8 in ._Z9fil_flushm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/include/os0file.ic:355
      #3  0x00000000107f13cc in ._Z21fil_flush_file_spacesm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:6014
      #4  0x0000000010790fdc in ._Z16buf_dblwr_updatePK10buf_page_t11buf_flush_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0dblwr.cc:655
      #5  0x0000000010797918 in ._Z24buf_flush_write_completeP10buf_page_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:734
      #6  0x000000001078b7c8 in ._Z20buf_page_io_completeP10buf_page_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0buf.cc:4639
      #7  0x00000000107ec38c in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5820
      #8  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
      #9  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #10 0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 10 (Thread 24987):
      #0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
      #1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
      #2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
      #3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
      #4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
      #5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 9 (Thread 24985):
      #0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
      #1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
      #2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
      #3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
      #4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
      #5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 8 (Thread 24984):
      #0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
      #1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
      #2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
      #3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
      #4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
      #5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 7 (Thread 24983):
      #0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
      #1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
      #2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
      #3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
      #4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
      #5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 6 (Thread 24982):
      #0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
      #1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
      #2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
      #3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
      #4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
      #5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 5 (Thread 24981):
      #0  0x00000fffa3a62130 in .pthread_cond_wait () from /opt/at7.0/lib64/power7/libpthread.so.0
      #1  0x000000001014e5c8 in ._Z11mysqld_mainiPPc () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/include/mysql/psi/mysql_thread.h:1154
      #2  0x0000000010129e58 in .main () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/main.cc:25
       
      Thread 4 (Thread 25032):
      #0  0x00000fffa32dbf44 in .__select () at ../sysdeps/unix/syscall-template.S:81
      #1  0x00000000106931c4 in ._Z15os_thread_sleepm ()
      #2  0x000000001066f9b0 in ._Z37logs_empty_and_mark_files_at_shutdownv () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/log/log0log.cc:3631
      #3  0x0000000010713b04 in ._Z27innobase_shutdown_for_mysqlv () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:3002
      #4  0x000000001060778c in ._ZL12innobase_endP10handlerton17ha_panic_function () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/handler/ha_innodb.cc:4188
      #5  0x00000000103e6db4 in ._Z22ha_finalize_handlertonP13st_plugin_int () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/handler.cc:440
      #6  0x000000001021ef2c in ._ZL12reap_pluginsv () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_plugin.cc:1176
      #7  0x0000000000000000 in ?? ()
       
      Thread 3 (Thread 24992):
      #0  0x00000fffa3a62130 in .pthread_cond_wait () from /opt/at7.0/lib64/power7/libpthread.so.0
      #1  0x00000000106921dc in ._Z17os_event_wait_lowP8os_eventl ()
      #2  0x000000001070c5d4 in .srv_master_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0srv.cc:1010
      #3  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #4  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 2 (Thread 25000):
      #0  0x00000fffa3a67f18 in .__libc_sigwait () from /opt/at7.0/lib64/power7/libpthread.so.0
      #1  0x00000000101404dc in .signal_hand () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld.cc:3278
      #2  0x00000000109916dc in .pfs_spawn_thread ()
      #3  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #4  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
       
      Thread 1 (Thread 24994):
      #0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
      #1  0x00000000109eebb8 in .my_write_core ()
      #2  0x00000000103e3a3c in .handle_fatal_signal ()
      #3  <signal handler called>
      #4  0x00000fffa320f2f0 in .raise () from /opt/at7.0/lib64/power7/libc.so.6
      #5  0x00000fffa32112f4 in .abort () from /opt/at7.0/lib64/power7/libc.so.6
      #6  0x000000001079ef54 in .buf_flush_page_cleaner_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:2803
      #7  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
      #8  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
      

      The test has not failed for 8 months, so it's suspicious.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment - - edited

            I was wondering if this is a duplicate of MDEV-12091 (which I fixed in 10.x). Normally, MySQL would rollback any active transactions on shutdown. (I had to jump through some hoops when writing the test innodb.read_only_recovery for MDEV-11985.)

            However, the InnoDB fulltext index subsystem is creating transactions on its own. Such transactions could remain active (incomplete) the first time the test is shutting down and restarting the server. Maybe on the 2nd restart that is performed by the test, the background rollback was still in progress for a transaction that was left incomplete on the 1st shutdown?

            The server error log for the 10.0 XtraDB crash and for a 10.2 InnoDB crash refute the above hypothesis. The only messages about recovered transactions are issued for other tests, such as innodb.xa_recovery and innodb.read_only_recovery. For this test, there is no message about a recovered transaction. Here are some relevant messages from the failing test:

            170209  7:47:06 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown
            …
            170209  7:47:09 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.
            …
            170209  7:47:24 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown
             
            170209  7:47:24 [Note] Event Scheduler: Purging the queue. 0 events
            170209  7:47:24 [Note] InnoDB: FTS optimize thread exiting.
            170209  7:47:24 [Note] InnoDB: Starting shutdown...
            170209  7:47:25 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
            2017-02-09 07:47:25 fff977ef160  InnoDB: Assertion failure in thread 17590432756064 in file buf0flu.cc line 2803
            InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0
            

            Because XtraDB is not active in 10.2, the bug affects plain InnoDB too.

            Note: The test is being skipped by Valgrind with no valid reason (other than that it takes very long to run on a debug binary). Also the not_crashrep.inc should not be there, because there is no crash injection taking place. A non-debug binary passes the test in 38 seconds with Valgrind and 4 seconds without. I will push the test case clean-up, hoping to see better failures in the future.

            marko Marko Mäkelä added a comment - - edited I was wondering if this is a duplicate of MDEV-12091 (which I fixed in 10.x). Normally, MySQL would rollback any active transactions on shutdown. (I had to jump through some hoops when writing the test innodb.read_only_recovery for MDEV-11985 .) However, the InnoDB fulltext index subsystem is creating transactions on its own. Such transactions could remain active (incomplete) the first time the test is shutting down and restarting the server. Maybe on the 2nd restart that is performed by the test, the background rollback was still in progress for a transaction that was left incomplete on the 1st shutdown? The server error log for the 10.0 XtraDB crash and for a 10.2 InnoDB crash refute the above hypothesis. The only messages about recovered transactions are issued for other tests, such as innodb.xa_recovery and innodb.read_only_recovery. For this test, there is no message about a recovered transaction. Here are some relevant messages from the failing test: 170209 7:47:06 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown … 170209 7:47:09 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB. … 170209 7:47:24 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown   170209 7:47:24 [Note] Event Scheduler: Purging the queue. 0 events 170209 7:47:24 [Note] InnoDB: FTS optimize thread exiting. 170209 7:47:24 [Note] InnoDB: Starting shutdown... 170209 7:47:25 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2017-02-09 07:47:25 fff977ef160 InnoDB: Assertion failure in thread 17590432756064 in file buf0flu.cc line 2803 InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0 Because XtraDB is not active in 10.2, the bug affects plain InnoDB too. Note: The test is being skipped by Valgrind with no valid reason (other than that it takes very long to run on a debug binary). Also the not_crashrep.inc should not be there, because there is no crash injection taking place. A non-debug binary passes the test in 38 seconds with Valgrind and 4 seconds without. I will push the test case clean-up, hoping to see better failures in the future.

            As pointed out in the follow-up discussion of my comment about an incorrect Xtrabackup change that caused the shutdown hang reported in MDEV-12773, it seems that this bug could be caused by the function srv_master_do_shutdown_tasks() that is being run by the master thread.

            The most likely problem should be the row_drop_tables_for_mysql_in_background(). Other tasks are only performed in a slow shutdown (innodb_fast_shutdown=0). It is possible that logs_empty_and_mark_files_at_shutdown() is also not properly waiting for the change buffer merge to complete, but it would also be waiting for purge to complete, and that could perhaps take longer than the change buffer merge.

            It might be cleanest to move these clean-up tasks from the master thread directly to logs_empty_and_mark_files_at_shutdown().

            marko Marko Mäkelä added a comment - As pointed out in the follow-up discussion of my comment about an incorrect Xtrabackup change that caused the shutdown hang reported in MDEV-12773 , it seems that this bug could be caused by the function srv_master_do_shutdown_tasks() that is being run by the master thread. The most likely problem should be the row_drop_tables_for_mysql_in_background(). Other tasks are only performed in a slow shutdown (innodb_fast_shutdown=0). It is possible that logs_empty_and_mark_files_at_shutdown() is also not properly waiting for the change buffer merge to complete, but it would also be waiting for purge to complete, and that could perhaps take longer than the change buffer merge. It might be cleanest to move these clean-up tasks from the master thread directly to logs_empty_and_mark_files_at_shutdown().

            In the failure log that I analyzed earlier, the srv_master_thread() appears to be waiting for a signal in srv_resume_thread(slot) here:

            	srv_main_thread_op_info = "waiting for server activity";
             
            	srv_resume_thread(slot);
             
            	if (srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS) {
            		os_thread_exit(NULL);
            	}
             
            	goto loop;
            

            I would assume that some moments before, it returned from the srv_master_do_shutdown_tasks() loop, which likely caused the problem. Note that srv_shutdown_state must be SRV_SHUTDOWN_FLUSH_PHASE here, if we look at the crashing thread:

            	ut_a(srv_shutdown_state == SRV_SHUTDOWN_FLUSH_PHASE);
            	for (ulint i = 0; i < srv_buf_pool_instances; i++) {
            		buf_pool_t* buf_pool = buf_pool_from_array(i);
            		ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == 0);
            	}
            

            The page-dirtying activity of srv_master_do_shutdown_tasks() is clearly violating the documented assumption:

            	SRV_SHUTDOWN_FLUSH_PHASE,/*!< At this phase the master and the
            				purge threads must have completed their
            				work. Once we enter this phase the
            				page_cleaner can clean up the buffer
            				pool and exit */
            

            All page-dirtying activity should have been completed before advancing from the first shutdown phase, SRV_SHUTDOWN_CLEANUP.

            It would probably be cleanest to move this activity to the logs_empty_and_mark_files_at_shutdown() loop, which was refactored in MDEV-11638 and revised in MDEV-12091.

            marko Marko Mäkelä added a comment - In the failure log that I analyzed earlier , the srv_master_thread() appears to be waiting for a signal in srv_resume_thread(slot) here: srv_main_thread_op_info = "waiting for server activity";   srv_resume_thread(slot);   if (srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS) { os_thread_exit(NULL); }   goto loop; I would assume that some moments before, it returned from the srv_master_do_shutdown_tasks() loop, which likely caused the problem. Note that srv_shutdown_state must be SRV_SHUTDOWN_FLUSH_PHASE here, if we look at the crashing thread: ut_a(srv_shutdown_state == SRV_SHUTDOWN_FLUSH_PHASE); for (ulint i = 0; i < srv_buf_pool_instances; i++) { buf_pool_t* buf_pool = buf_pool_from_array(i); ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == 0); } The page-dirtying activity of srv_master_do_shutdown_tasks() is clearly violating the documented assumption: SRV_SHUTDOWN_FLUSH_PHASE,/*!< At this phase the master and the purge threads must have completed their work. Once we enter this phase the page_cleaner can clean up the buffer pool and exit */ All page-dirtying activity should have been completed before advancing from the first shutdown phase, SRV_SHUTDOWN_CLEANUP. It would probably be cleanest to move this activity to the logs_empty_and_mark_files_at_shutdown() loop, which was refactored in MDEV-11638 and revised in MDEV-12091 .
            marko Marko Mäkelä added a comment - - edited

            bb-10.2-marko
            I was unable to repeat any problems on 10.0, but it could require good timing or the use of the change buffer. On 10.1 and 10.2, my test triggered a race condition in the btr_defragment_thread shutdown.
            bb-10.1-marko
            On 10.1, we were unnecessarily starting scrubbing and defragmenting in read-only mode.
            bb-10.0-marko
            The 10.0 patch is the simplest.

            marko Marko Mäkelä added a comment - - edited bb-10.2-marko I was unable to repeat any problems on 10.0, but it could require good timing or the use of the change buffer. On 10.1 and 10.2, my test triggered a race condition in the btr_defragment_thread shutdown. bb-10.1-marko On 10.1, we were unnecessarily starting scrubbing and defragmenting in read-only mode. bb-10.0-marko The 10.0 patch is the simplest.

            ok to push.

            jplindst Jan Lindström (Inactive) added a comment - ok to push.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.