[MDEV-12052] Shutdown crash presumably due to master thread activity Created: 2017-02-12  Updated: 2017-08-17  Resolved: 2017-05-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB, Tests
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.1.24, 10.0.32, 10.2.7

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
is blocked by MDEV-11638 Encryption causes race conditions in ... Closed
Duplicate
is duplicated by MDEV-12899 innodb.innodb_prefix_index_restart_se... Closed
Problem/Incident
causes MDEV-12994 innodb_fast_shutdown=0 skips change b... Closed
Relates
relates to MDEV-12091 Shutdown fails to wait for rollback o... Closed
relates to MDEV-10939 innodb.innodb_monitor failed with a t... Closed
Sprint: 5.5.55, 10.0.30

 Description   

https://internal.askmonty.org/buildbot/builders/p8-rhel6-bintar/builds/549/steps/test/logs/stdio

Thread 1 (Thread 24994):
#0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
#1  0x00000000109eebb8 in .my_write_core ()
#2  0x00000000103e3a3c in .handle_fatal_signal ()
#3  <signal handler called>
#4  0x00000fffa320f2f0 in .raise () from /opt/at7.0/lib64/power7/libc.so.6
#5  0x00000fffa32112f4 in .abort () from /opt/at7.0/lib64/power7/libc.so.6
#6  0x000000001079ef54 in .buf_flush_page_cleaner_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:2803
#7  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#8  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104

innodb_fts.innodb_fts_stopword_charset 'xtradb' w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2017-02-09 07:47:28
line
Attempting backtrace. You can use the following information to find out
^ Found warnings in /home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/2/log/mysqld.1.err
ok
 
 - saving '/home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/2/log/innodb_fts.innodb_fts_stopword_charset-xtradb/' to '/home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/log/innodb_fts.innodb_fts_stopword_charset-xtradb/'
 - found 'core.24981' (0/5)
 
Trying 'dbx' to get a backtrace
 
Trying 'gdb' to get a backtrace from coredump /home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/log/innodb_fts.innodb_fts_stopword_charset-xtradb/mysqld.1/data/core.24981
Core generated by '/home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New Thread 24994]
[New Thread 25000]
[New Thread 24992]
[New Thread 25032]
[New Thread 24981]
[New Thread 24982]
[New Thread 24983]
[New Thread 24984]
[New Thread 24985]
[New Thread 24987]
[New Thread 24986]
 
warning: Ignoring non-absolute filename: <linux-vdso64.so.1>
Missing separate debuginfo for linux-vdso64.so.1
Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/b4/c87d71a8818890c1b89fb25fa9a70233e32a89
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
To enable execution of this file add
	add-auto-load-safe-path /opt/at7.0/lib64/libthread_db-1.0.so
line to your configuration file "/home/buildbot/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/buildbot/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
 
warning: File "/opt/at7.0/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld --defaults-group-su'.
Program terminated with signal 6, Aborted.
#0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
#0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
#1  0x00000000109eebb8 in .my_write_core ()
#2  0x00000000103e3a3c in .handle_fatal_signal ()
#3  <signal handler called>
#4  0x00000fffa320f2f0 in .raise () from /opt/at7.0/lib64/power7/libc.so.6
#5  0x00000fffa32112f4 in .abort () from /opt/at7.0/lib64/power7/libc.so.6
#6  0x000000001079ef54 in .buf_flush_page_cleaner_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:2803
#7  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#8  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 11 (Thread 24986):
#0  0x00000fffa3a67220 in .__libc_fsync () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000001068b8e4 in ._Z18os_file_flush_funci () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:2482
#2  0x00000000107f04c8 in ._Z9fil_flushm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/include/os0file.ic:355
#3  0x00000000107f13cc in ._Z21fil_flush_file_spacesm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:6014
#4  0x0000000010790fdc in ._Z16buf_dblwr_updatePK10buf_page_t11buf_flush_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0dblwr.cc:655
#5  0x0000000010797918 in ._Z24buf_flush_write_completeP10buf_page_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:734
#6  0x000000001078b7c8 in ._Z20buf_page_io_completeP10buf_page_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0buf.cc:4639
#7  0x00000000107ec38c in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5820
#8  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
#9  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#10 0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 10 (Thread 24987):
#0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
#1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
#2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
#3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
#4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
#5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 9 (Thread 24985):
#0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
#1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
#2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
#3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
#4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
#5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 8 (Thread 24984):
#0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
#1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
#2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
#3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
#4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
#5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 7 (Thread 24983):
#0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
#1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
#2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
#3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
#4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
#5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 6 (Thread 24982):
#0  0x00000fffa3a10958 in ?? () from /lib64/libaio.so.1
#1  0x00000fffa3a109bc in .io_getevents () from /lib64/libaio.so.1
#2  0x000000001068f6c8 in ._Z19os_aio_linux_handlemPP10fil_node_tPPvPmS4_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:5179
#3  0x00000000107ec224 in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5784
#4  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532
#5  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#6  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 5 (Thread 24981):
#0  0x00000fffa3a62130 in .pthread_cond_wait () from /opt/at7.0/lib64/power7/libpthread.so.0
#1  0x000000001014e5c8 in ._Z11mysqld_mainiPPc () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/include/mysql/psi/mysql_thread.h:1154
#2  0x0000000010129e58 in .main () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/main.cc:25
 
Thread 4 (Thread 25032):
#0  0x00000fffa32dbf44 in .__select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00000000106931c4 in ._Z15os_thread_sleepm ()
#2  0x000000001066f9b0 in ._Z37logs_empty_and_mark_files_at_shutdownv () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/log/log0log.cc:3631
#3  0x0000000010713b04 in ._Z27innobase_shutdown_for_mysqlv () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:3002
#4  0x000000001060778c in ._ZL12innobase_endP10handlerton17ha_panic_function () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/handler/ha_innodb.cc:4188
#5  0x00000000103e6db4 in ._Z22ha_finalize_handlertonP13st_plugin_int () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/handler.cc:440
#6  0x000000001021ef2c in ._ZL12reap_pluginsv () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_plugin.cc:1176
#7  0x0000000000000000 in ?? ()
 
Thread 3 (Thread 24992):
#0  0x00000fffa3a62130 in .pthread_cond_wait () from /opt/at7.0/lib64/power7/libpthread.so.0
#1  0x00000000106921dc in ._Z17os_event_wait_lowP8os_eventl ()
#2  0x000000001070c5d4 in .srv_master_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0srv.cc:1010
#3  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#4  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 2 (Thread 25000):
#0  0x00000fffa3a67f18 in .__libc_sigwait () from /opt/at7.0/lib64/power7/libpthread.so.0
#1  0x00000000101404dc in .signal_hand () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld.cc:3278
#2  0x00000000109916dc in .pfs_spawn_thread ()
#3  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#4  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104
 
Thread 1 (Thread 24994):
#0  0x00000fffa3a63a40 in .pthread_kill () from /opt/at7.0/lib64/power7/libpthread.so.0
#1  0x00000000109eebb8 in .my_write_core ()
#2  0x00000000103e3a3c in .handle_fatal_signal ()
#3  <signal handler called>
#4  0x00000fffa320f2f0 in .raise () from /opt/at7.0/lib64/power7/libc.so.6
#5  0x00000fffa32112f4 in .abort () from /opt/at7.0/lib64/power7/libc.so.6
#6  0x000000001079ef54 in .buf_flush_page_cleaner_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:2803
#7  0x00000fffa3a5c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#8  0x00000fffa32e8454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104

The test has not failed for 8 months, so it's suspicious.



 Comments   
Comment by Marko Mäkelä [ 2017-02-16 ]

This is an assertion failure on shutdown. The server error log says:

Version: '10.0.30-MariaDB'  socket: '/home/buildbot/maria-slave/p8-rhel6-bintar/build/mysql-test/var/tmp/2/mysqld.1.sock'  port: 16060  MariaDB Server
2017-02-09 07:47:09 fff90dff160 InnoDB: FTS Optimize Removing table test/articles
2017-02-09 07:47:19 fff90dff160 InnoDB: FTS Optimize Removing table test/articles
170209  7:47:24 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown
 
170209  7:47:24 [Note] Event Scheduler: Purging the queue. 0 events
170209  7:47:24 [Note] InnoDB: FTS optimize thread exiting.
170209  7:47:24 [Note] InnoDB: Starting shutdown...
170209  7:47:25 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2017-02-09 07:47:25 fff977ef160  InnoDB: Assertion failure in thread 17590432756064 in file buf0flu.cc line 2803
InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0

This is a sanity check at the end of buf_flush_page_cleaner_thread. In the Description, one I/O handler thread still seems to be busy, which could explain why the assertion failed:

Thread 11 (Thread 24986):
#0  0x00000fffa3a67220 in .__libc_fsync () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000001068b8e4 in ._Z18os_file_flush_funci () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/os/os0file.cc:2482
#2  0x00000000107f04c8 in ._Z9fil_flushm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/include/os0file.ic:355
#3  0x00000000107f13cc in ._Z21fil_flush_file_spacesm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:6014
#4  0x0000000010790fdc in ._Z16buf_dblwr_updatePK10buf_page_t11buf_flush_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0dblwr.cc:655
#5  0x0000000010797918 in ._Z24buf_flush_write_completeP10buf_page_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0flu.cc:734
#6  0x000000001078b7c8 in ._Z20buf_page_io_completeP10buf_page_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/buf/buf0buf.cc:4639
#7  0x00000000107ec38c in ._Z12fil_aio_waitm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/fil/fil0fil.cc:5820
#8  0x000000001070f790 in .io_handler_thread () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/xtradb/srv/srv0start.cc:532

There have been no changes to any XtraDB files since commit 6011fb6daa5ac9e6cb97a23b877e4d2568c85d94 where this failed.

Of the recent changes, MDEV-11947 could affect the reproducibility rate, but it should not directly cause the problem.
I suspect some kind of a race condition or lost signal that causes buf_flush_wait_batch_end() or buf_flush_wait_LRU_batch_end() to return prematurely. An alternative explanation would be that the an InnoDB internal subsystem (fulltext indexes, persistent statistics) failed to exit early enough on shutdown, and kept modifying pages too late. No such thread is visible in the stack traces, but theoretically it could have terminated right after modifying the pages.

Comment by Marko Mäkelä [ 2017-02-16 ]

I am reviewing all use of os_event_set() and os_event_reset() in InnoDB. It seems to me that signals can be lost in many cases. MDEV-11947 could have made this particular crash more probably by increasing the probability of last-moment purge activity at shutdown, potentially causing a burst of page flushes near shutdown.

More speculation: Given that MDEV-11802 is reproducible even with the MDEV-11947 fix present, it is possible that the purge threads were stuck due to a lost signal. Then, shutdown would wake up the stuck purge threads, causing a burst of page flush activity. Due to the lost flushing signal, we would then hit the assertion failure.

The impact of this bug should be that instead of performing a clean shutdown, InnoDB is crashing, and a subsequent startup must perform recovery. This can take some time, depending on the age of the previous redo log checkpoint.

Comment by Marko Mäkelä [ 2017-02-16 ]

The os_event_set() and os_event_reset() for buf_pool_t::no_flush[] are protected by buf_pool_t::mutex.
That said, there could be an inconsistency between buf_pool_t::n_flush[], buf_pool_t::init_flush[] and buf_pool_t:flush_list. Maybe buf_flush_wait_LRU_batch_end() should check the length of flush_list too, or the arrays and lists should be protected by a common mutex. (There is a separate buf_pool_t::flush_list mutex, split from buf_pool_t::mutex, but buf_flush_wait_LRU_batch_end() is not holding it.)

Comment by Marko Mäkelä [ 2017-02-16 ]

In XtraDB there are more buf_pool mutexes than in InnoDB. So, I will have to review the XtraDB code separately.

Comment by Marko Mäkelä [ 2017-02-28 ]

This assertion is failing on 10.2 too, and I do not think I have fixed it yet. Two recent examples:
http://buildbot.askmonty.org/buildbot/builders/p8-xenial-bintar/builds/523
http://buildbot.askmonty.org/buildbot/builders/p8-xenial-bintar/builds/614
This probably means that the shutdown fixes in 10.2 (and some in 10.1) should not affect this bug. Theoretically, MDEV-12057 could have fixed it in 10.2, but I guess we will soon get a failure that includes that fix.

I compared 10.0 XtraDB against MySQL 5.6. There were only minor differences in buf_flush_page_cleaner_thread(). The shutdown logic is such that the thread running the shutdown procedure will change srv_shutdown_state from SRV_SHUTDOWN_NONE to SRV_SHUTDOWN_CLEANUP to SRV_SHUTDOWN_FLUSH_PHASE.

Apparently something is still dirtying pages after the state transition to SRV_SHUTDOWN_FLUSH_PHASE.
I should add some debug assertions to catch the rogue writes. Maybe also temporarily add a sleep to the page cleaner thread so that those rogue writers get a better chance to run during the shutdown.

Comment by Marko Mäkelä [ 2017-03-18 ]

I was wondering if this is a duplicate of MDEV-12091 (which I fixed in 10.x). Normally, MySQL would rollback any active transactions on shutdown. (I had to jump through some hoops when writing the test innodb.read_only_recovery for MDEV-11985.)

However, the InnoDB fulltext index subsystem is creating transactions on its own. Such transactions could remain active (incomplete) the first time the test is shutting down and restarting the server. Maybe on the 2nd restart that is performed by the test, the background rollback was still in progress for a transaction that was left incomplete on the 1st shutdown?

The server error log for the 10.0 XtraDB crash and for a 10.2 InnoDB crash refute the above hypothesis. The only messages about recovered transactions are issued for other tests, such as innodb.xa_recovery and innodb.read_only_recovery. For this test, there is no message about a recovered transaction. Here are some relevant messages from the failing test:

170209  7:47:06 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown
170209  7:47:09 [Note] InnoDB: innodb_empty_free_list_algorithm has been changed to legacy because of small buffer pool size. In order to use backoff, increase buffer pool at least up to 20MB.
170209  7:47:24 [Note] /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/mysqld: Normal shutdown
 
170209  7:47:24 [Note] Event Scheduler: Purging the queue. 0 events
170209  7:47:24 [Note] InnoDB: FTS optimize thread exiting.
170209  7:47:24 [Note] InnoDB: Starting shutdown...
170209  7:47:25 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2017-02-09 07:47:25 fff977ef160  InnoDB: Assertion failure in thread 17590432756064 in file buf0flu.cc line 2803
InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0

Because XtraDB is not active in 10.2, the bug affects plain InnoDB too.

Note: The test is being skipped by Valgrind with no valid reason (other than that it takes very long to run on a debug binary). Also the not_crashrep.inc should not be there, because there is no crash injection taking place. A non-debug binary passes the test in 38 seconds with Valgrind and 4 seconds without. I will push the test case clean-up, hoping to see better failures in the future.

Comment by Marko Mäkelä [ 2017-05-19 ]

As pointed out in the follow-up discussion of my comment about an incorrect Xtrabackup change that caused the shutdown hang reported in MDEV-12773, it seems that this bug could be caused by the function srv_master_do_shutdown_tasks() that is being run by the master thread.

The most likely problem should be the row_drop_tables_for_mysql_in_background(). Other tasks are only performed in a slow shutdown (innodb_fast_shutdown=0). It is possible that logs_empty_and_mark_files_at_shutdown() is also not properly waiting for the change buffer merge to complete, but it would also be waiting for purge to complete, and that could perhaps take longer than the change buffer merge.

It might be cleanest to move these clean-up tasks from the master thread directly to logs_empty_and_mark_files_at_shutdown().

Comment by Marko Mäkelä [ 2017-05-19 ]

In the failure log that I analyzed earlier, the srv_master_thread() appears to be waiting for a signal in srv_resume_thread(slot) here:

	srv_main_thread_op_info = "waiting for server activity";
 
	srv_resume_thread(slot);
 
	if (srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS) {
		os_thread_exit(NULL);
	}
 
	goto loop;

I would assume that some moments before, it returned from the srv_master_do_shutdown_tasks() loop, which likely caused the problem. Note that srv_shutdown_state must be SRV_SHUTDOWN_FLUSH_PHASE here, if we look at the crashing thread:

	ut_a(srv_shutdown_state == SRV_SHUTDOWN_FLUSH_PHASE);
	for (ulint i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t* buf_pool = buf_pool_from_array(i);
		ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == 0);
	}

The page-dirtying activity of srv_master_do_shutdown_tasks() is clearly violating the documented assumption:

	SRV_SHUTDOWN_FLUSH_PHASE,/*!< At this phase the master and the
				purge threads must have completed their
				work. Once we enter this phase the
				page_cleaner can clean up the buffer
				pool and exit */

All page-dirtying activity should have been completed before advancing from the first shutdown phase, SRV_SHUTDOWN_CLEANUP.

It would probably be cleanest to move this activity to the logs_empty_and_mark_files_at_shutdown() loop, which was refactored in MDEV-11638 and revised in MDEV-12091.

Comment by Marko Mäkelä [ 2017-05-23 ]

bb-10.2-marko
I was unable to repeat any problems on 10.0, but it could require good timing or the use of the change buffer. On 10.1 and 10.2, my test triggered a race condition in the btr_defragment_thread shutdown.
bb-10.1-marko
On 10.1, we were unnecessarily starting scrubbing and defragmenting in read-only mode.
bb-10.0-marko
The 10.0 patch is the simplest.

Comment by Jan Lindström (Inactive) [ 2017-05-26 ]

ok to push.

Generated at Thu Feb 08 07:54:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.