[MDEV-22190] After IMPORT: InnoDB: Record 126 is above rec heap top 120 Created: 2020-04-08  Updated: 2020-06-15  Resolved: 2020-06-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.2
Fix Version/s: 10.5.4

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile

Attachments: Text File MDEV-22190-gdb-bt-full-page_simple_validate_new.txt     Text File MDEV-22190-gdb-bt.txt    
Issue Links:
Relates
relates to MDEV-15528 Avoid writing freed InnoDB pages Closed
relates to MDEV-21347 innodb_log_optimize_ddl=OFF is not cr... Closed

 Description   

We started to get a new type of failures in a test on 10.5:

10.5 ccc06931c3c7be094f6dcddeb45589f06cf0c8af

CURRENT_TEST: encryption.innodb_encryption_discard_import
mysqltest: At line 154: query 'DROP TABLE t1, t2, t3' failed: 2013: Lost connection to MySQL server during query
Version: '10.5.3-MariaDB-debug-log'  socket: '/dev/shm/var/tmp/1/mysqld.1.sock'  port: 16000  Source distribution
2020-04-08  4:52:36 0 [ERROR] InnoDB: Record 126 is above rec heap top 120
2020-04-08  4:52:36 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 1586335936, calculated checksums for field1: crc32 3521116018, innodb 2228745977,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 15335816, calculated checksums for field2: crc32 3521116018, innodb 11584688, none 3735928559,  page LSN 0 15335816, low 4 bytes of LSN at page end 2171006708, page number (if stored to page already) 6, space id (if created with >= MySQL-4.1.1 and stored already) 8
2020-04-08  4:52:36 0 [Note] InnoDB: Page may be an index page where index id is 28
2020-04-08  4:52:36 0 [Note] InnoDB: Index 28 is `PRIMARY` in table `test`.`t1`
2020-04-08  4:52:36 0 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page [page id: space=8, page number=6] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
200408  4:52:36 [ERROR] mysqld got signal 6 ;

Similar to MDEV-21549, this might be an old bug that became more reproducible in 10.5 thanks to improved write performance.
The first failure of this kind on the cross-reference was for a bb-10.5-release that was based on 10.5 fad47df9957d754bec12d4d327c77ae04f71d903. The first failure directly on 10.5 was on dec14dcffe08c0284daac4f93cd750956b15ca47. All these failures occur on kvm-asan only, so it looks somewhat timing-sensitive.



 Comments   
Comment by Marko Mäkelä [ 2020-06-01 ]

I believe that MDEV-22027 could cause this. I do not remember seeing this failure after finishing MDEV-21724.

Comment by Daniel Black [ 2020-06-03 ]

Sorry Marko, wouldn't call success on this just yet.

10.5-d5e8b4d7f97c7a15be5d58b6f088c4c64787a385

 
encryption.innodb_encryption_discard_import 'innodb' w3 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2020-06-03 16:33:00
line
2020-06-03 16:32:45 0 [ERROR] InnoDB: Record 126 is above rec heap top 120
2020-06-03 16:32:45 0 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page [page id: space=8, page number=6] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
Attempting backtrace. You can use the following information to find out
2020-06-03 16:32:53 0 [ERROR] InnoDB: Record 126 is above rec heap top 120
2020-06-03 16:32:53 0 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page [page id: space=8, page number=6] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
Attempting backtrace. You can use the following information to find out
^ Found warnings in /home/dan/build-mariadb-server-10.5-debug-gccO0/mysql-test/var/3/log/mysqld.1.err
ok
 
 - found 'core' (0/5)
 
Trying 'dbx' to get a backtrace
 
Trying 'gdb' to get a backtrace from coredump /home/dan/build-mariadb-server-10.5-debug-gccO0/mysql-test/var/3/log/encryption.innodb_encryption_discard_import-innodb/mysqld.1/data/core
Core generated by '/home/dan/build-mariadb-server-10.5-debug-gccO0/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New LWP 52334]
[New LWP 52331]
[New LWP 52332]
[New LWP 52336]
[New LWP 52338]
[New LWP 52333]
[New LWP 52337]
[New LWP 52344]
[New LWP 52348]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `/home/dan/build-mariadb-server-10.5-debug-gccO0/sql/mariadbd --defaults-group-s'.
Program terminated with signal SIGABRT, Aborted.
#0  0x000074dad8cb4844 in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:55
55	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x74dad168f120 (LWP 52334))]
#0  0x000074dad8cb4844 in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:55
#1  0x000007965ec3e9fc in my_write_core (sig=6) at /home/dan/mariadb-server-10.5/mysys/stacktrace.c:518
#2  0x000007965df998bc in handle_fatal_signal (sig=6) at /home/dan/mariadb-server-10.5/sql/signal_handler.cc:330
#3  <signal handler called>
#4  0x000074dad866e98c in __libc_signal_restore_set (set=0x74dad168dd48) at ../sysdeps/unix/sysv/linux/nptl-signals.h:80
#5  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#6  0x000074dad8670be0 in __GI_abort () at abort.c:79
#7  0x000007965e926e44 in ib::fatal::~fatal (this=0x74dad168e0b0, __in_chrg=<optimized out>) at /home/dan/mariadb-server-10.5/storage/innobase/ut/ut0ut.cc:601
#8  0x000007965e9de0cc in buf_dblwr_assert_on_corrupt_block (block=0x74dad1aa6260) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0dblwr.cc:793
#9  0x000007965e9de2a4 in buf_dblwr_check_block (block=0x74dad1aa6260) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0dblwr.cc:852
#10 0x000007965e9de9e8 in buf_dblwr_flush_buffered_writes () at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0dblwr.cc:976
#11 0x000007965e9f2800 in buf_flush_end (flush_type=BUF_FLUSH_LIST) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:1975
#12 0x000007965e9f2a98 in buf_flush_do_batch (type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615, n=0x74dad168e438) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:2019
#13 0x000007965e9f5424 in pc_flush_slot () at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:2640
#14 0x000007965e9f69cc in buf_flush_page_cleaner () at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:3041
#15 0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#16 0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 9 (Thread 0x74dac0f9f120 (LWP 52348)):
#0  0x000074dad8cb2b94 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x74dac0f9e300, expected=0, futex_word=0x204) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x74dac0f9e300, mutex=0x7969da044f0, cond=0x7969da41140) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x7969da41140, mutex=0x7969da044f0, abstime=0x74dac0f9e300) at pthread_cond_wait.c:667
#3  0x000007965eb6ff34 in __gthread_cond_timedwait (__cond=0x7969da41140, __mutex=0x7969da044f0, __abs_timeout=0x74dac0f9e300) at /usr/include/powerpc64le-linux-gnu/c++/7/bits/gthr-default.h:871
#4  0x000007965eb77178 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da41140, __lock=@0x74dac0f9e470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac0f9e3c8: {__d = {__r = 1591166032477822425}}) at /usr/include/c++/7/condition_variable:178
#5  0x000007965eb754bc in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da41140, __lock=@0x74dac0f9e470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac0f9e3c8: {__d = {__r = 1591166032477822425}}) at /usr/include/c++/7/condition_variable:106
#6  0x000007965eb74104 in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x7969da41140, __lock=@0x74dac0f9e470: {_M_device = 0x7969da044f0, _M_owns = true}, __rtime=@0x7969da04518: {__r = 60000}) at /usr/include/c++/7/condition_variable:143
#7  0x000007965eb70204 in tpool::thread_pool_generic::wait_for_tasks (this=0x7969da043e0, lk=@0x74dac0f9e470: {_M_device = 0x7969da044f0, _M_owns = true}, thread_data=0x7969da41140) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:418
#8  0x000007965eb704b0 in tpool::thread_pool_generic::get_task (this=0x7969da043e0, thread_var=0x7969da41140, t=0x74dac0f9e4c8) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:471
#9  0x000007965eb70838 in tpool::thread_pool_generic::worker_main (this=0x7969da043e0, thread_var=0x7969da41140) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:516
#10 0x000007965eb76024 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x74dab0001048: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x74dab0001040: 0x7969da043e0) at /usr/include/c++/7/bits/invoke.h:73
#11 0x000007965eb74838 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x74dab0001048: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/7/bits/invoke.h:95
#12 0x000007965eb7bf20 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x74dab0001038) at /usr/include/c++/7/thread:234
#13 0x000007965eb7be50 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x74dab0001038) at /usr/include/c++/7/thread:243
#14 0x000007965eb7bdfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x74dab0001030) at /usr/include/c++/7/thread:186
#15 0x000074dad8ae3430 in ?? () from /usr/lib/powerpc64le-linux-gnu/libstdc++.so.6
#16 0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#17 0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 8 (Thread 0x74dad1888120 (LWP 52344)):
#0  0x000074dad8670228 in __GI___sigtimedwait (set=<optimized out>, info=0x74dad1887480, timeout=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0x000074dad86702ec in __GI___sigwaitinfo (set=<optimized out>, info=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwaitinfo.c:25
#2  0x000007965d985474 in my_sigwait (set=0x74dad18875c0, sig=0x74dad1887584, code=0x74dad1887588) at /home/dan/mariadb-server-10.5/include/my_pthread.h:196
#3  0x000007965d98d79c in signal_hand (arg=0x0) at /home/dan/mariadb-server-10.5/sql/mysqld.cc:2987
#4  0x000007965e4607a0 in pfs_spawn_thread (arg=0x7969e1b34a8) at /home/dan/mariadb-server-10.5/storage/perfschema/pfs.cc:2201
#5  0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#6  0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 7 (Thread 0x74dac37ef120 (LWP 52337)):
#0  0x000074dad8cb2b94 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x74dac37ee300, expected=0, futex_word=0x204) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x74dac37ee300, mutex=0x7969da044f0, cond=0x7969da41240) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x7969da41240, mutex=0x7969da044f0, abstime=0x74dac37ee300) at pthread_cond_wait.c:667
#3  0x000007965eb6ff34 in __gthread_cond_timedwait (__cond=0x7969da41240, __mutex=0x7969da044f0, __abs_timeout=0x74dac37ee300) at /usr/include/powerpc64le-linux-gnu/c++/7/bits/gthr-default.h:871
#4  0x000007965eb77178 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da41240, __lock=@0x74dac37ee470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac37ee3c8: {__d = {__r = 1591166033100910763}}) at /usr/include/c++/7/condition_variable:178
#5  0x000007965eb754bc in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da41240, __lock=@0x74dac37ee470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac37ee3c8: {__d = {__r = 1591166033100910763}}) at /usr/include/c++/7/condition_variable:106
#6  0x000007965eb74104 in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x7969da41240, __lock=@0x74dac37ee470: {_M_device = 0x7969da044f0, _M_owns = true}, __rtime=@0x7969da04518: {__r = 60000}) at /usr/include/c++/7/condition_variable:143
#7  0x000007965eb70204 in tpool::thread_pool_generic::wait_for_tasks (this=0x7969da043e0, lk=@0x74dac37ee470: {_M_device = 0x7969da044f0, _M_owns = true}, thread_data=0x7969da41240) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:418
#8  0x000007965eb704b0 in tpool::thread_pool_generic::get_task (this=0x7969da043e0, thread_var=0x7969da41240, t=0x74dac37ee4c8) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:471
#9  0x000007965eb70838 in tpool::thread_pool_generic::worker_main (this=0x7969da043e0, thread_var=0x7969da41240) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:516
#10 0x000007965eb76024 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x7969de499a8: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x7969de499a0: 0x7969da043e0) at /usr/include/c++/7/bits/invoke.h:73
#11 0x000007965eb74838 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x7969de499a8: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/7/bits/invoke.h:95
#12 0x000007965eb7bf20 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7969de49998) at /usr/include/c++/7/thread:234
#13 0x000007965eb7be50 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x7969de49998) at /usr/include/c++/7/thread:243
#14 0x000007965eb7bdfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x7969de49990) at /usr/include/c++/7/thread:186
#15 0x000074dad8ae3430 in ?? () from /usr/lib/powerpc64le-linux-gnu/libstdc++.so.6
#16 0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#17 0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 6 (Thread 0x74dad2eaf120 (LWP 52333)):
#0  0x000074dad8cb2b94 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x74dad2eae530, expected=0, futex_word=0x204) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x74dad2eae530, mutex=0x7966016a2f0 <LOCK_checkpoint+40>, cond=0x7966016a378 <COND_checkpoint>) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x7966016a378 <COND_checkpoint>, mutex=0x7966016a2f0 <LOCK_checkpoint+40>, abstime=0x74dad2eae530) at pthread_cond_wait.c:667
#3  0x000007965ec45e70 in safe_cond_timedwait (cond=0x7966016a378 <COND_checkpoint>, mp=0x7966016a2c8 <LOCK_checkpoint>, abstime=0x74dad2eae530, file=0x7965effd7c8 "/home/dan/mariadb-server-10.5/include/mysql/psi/mysql_thread.h", line=1257) at /home/dan/mariadb-server-10.5/mysys/thr_mutex.c:546
#4  0x000007965e38c2f0 in inline_mysql_cond_timedwait (that=0x7966016a378 <COND_checkpoint>, mutex=0x7966016a2c8 <LOCK_checkpoint>, abstime=0x74dad2eae530, src_file=0x7965effd808 "/home/dan/mariadb-server-10.5/storage/maria/ma_servicethread.c", src_line=116) at /home/dan/mariadb-server-10.5/include/mysql/psi/mysql_thread.h:1257
#5  0x000007965e38ca20 in my_service_thread_sleep (control=0x7965f75c420 <checkpoint_control>, sleep_time=29000000000) at /home/dan/mariadb-server-10.5/storage/maria/ma_servicethread.c:115
#6  0x000007965e37c25c in ma_checkpoint_background (arg=0x1e) at /home/dan/mariadb-server-10.5/storage/maria/ma_checkpoint.c:707
#7  0x000007965e4607a0 in pfs_spawn_thread (arg=0x7969d9e6a28) at /home/dan/mariadb-server-10.5/storage/perfschema/pfs.cc:2201
#8  0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#9  0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 5 (Thread 0x74dac2fdf120 (LWP 52338)):
#0  0x000074dad8cb2b94 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x74dac2fde300, expected=0, futex_word=0x204) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x74dac2fde300, mutex=0x7969da044f0, cond=0x7969da411c0) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x7969da411c0, mutex=0x7969da044f0, abstime=0x74dac2fde300) at pthread_cond_wait.c:667
#3  0x000007965eb6ff34 in __gthread_cond_timedwait (__cond=0x7969da411c0, __mutex=0x7969da044f0, __abs_timeout=0x74dac2fde300) at /usr/include/powerpc64le-linux-gnu/c++/7/bits/gthr-default.h:871
#4  0x000007965eb77178 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da411c0, __lock=@0x74dac2fde470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac2fde3c8: {__d = {__r = 1591166032488182077}}) at /usr/include/c++/7/condition_variable:178
#5  0x000007965eb754bc in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da411c0, __lock=@0x74dac2fde470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac2fde3c8: {__d = {__r = 1591166032488182077}}) at /usr/include/c++/7/condition_variable:106
#6  0x000007965eb74104 in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x7969da411c0, __lock=@0x74dac2fde470: {_M_device = 0x7969da044f0, _M_owns = true}, __rtime=@0x7969da04518: {__r = 60000}) at /usr/include/c++/7/condition_variable:143
#7  0x000007965eb70204 in tpool::thread_pool_generic::wait_for_tasks (this=0x7969da043e0, lk=@0x74dac2fde470: {_M_device = 0x7969da044f0, _M_owns = true}, thread_data=0x7969da411c0) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:418
#8  0x000007965eb704b0 in tpool::thread_pool_generic::get_task (this=0x7969da043e0, thread_var=0x7969da411c0, t=0x74dac2fde4c8) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:471
#9  0x000007965eb70838 in tpool::thread_pool_generic::worker_main (this=0x7969da043e0, thread_var=0x7969da411c0) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:516
#10 0x000007965eb76024 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x7969de49978: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x7969de49970: 0x7969da043e0) at /usr/include/c++/7/bits/invoke.h:73
#11 0x000007965eb74838 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x7969de49978: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/7/bits/invoke.h:95
#12 0x000007965eb7bf20 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7969de49968) at /usr/include/c++/7/thread:234
#13 0x000007965eb7be50 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x7969de49968) at /usr/include/c++/7/thread:243
#14 0x000007965eb7bdfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x7969de49960) at /usr/include/c++/7/thread:186
#15 0x000074dad8ae3430 in ?? () from /usr/lib/powerpc64le-linux-gnu/libstdc++.so.6
#16 0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#17 0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 4 (Thread 0x74dac3fff120 (LWP 52336)):
#0  0x000074dad8cb2b94 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x74dac3ffe300, expected=0, futex_word=0x204) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x74dac3ffe300, mutex=0x7969da044f0, cond=0x7969da412c0) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x7969da412c0, mutex=0x7969da044f0, abstime=0x74dac3ffe300) at pthread_cond_wait.c:667
#3  0x000007965eb6ff34 in __gthread_cond_timedwait (__cond=0x7969da412c0, __mutex=0x7969da044f0, __abs_timeout=0x74dac3ffe300) at /usr/include/powerpc64le-linux-gnu/c++/7/bits/gthr-default.h:871
#4  0x000007965eb77178 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da412c0, __lock=@0x74dac3ffe470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac3ffe3c8: {__d = {__r = 1591166032475521173}}) at /usr/include/c++/7/condition_variable:178
#5  0x000007965eb754bc in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7969da412c0, __lock=@0x74dac3ffe470: {_M_device = 0x7969da044f0, _M_owns = true}, __atime=@0x74dac3ffe3c8: {__d = {__r = 1591166032475521173}}) at /usr/include/c++/7/condition_variable:106
#6  0x000007965eb74104 in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x7969da412c0, __lock=@0x74dac3ffe470: {_M_device = 0x7969da044f0, _M_owns = true}, __rtime=@0x7969da04518: {__r = 60000}) at /usr/include/c++/7/condition_variable:143
#7  0x000007965eb70204 in tpool::thread_pool_generic::wait_for_tasks (this=0x7969da043e0, lk=@0x74dac3ffe470: {_M_device = 0x7969da044f0, _M_owns = true}, thread_data=0x7969da412c0) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:418
#8  0x000007965eb704b0 in tpool::thread_pool_generic::get_task (this=0x7969da043e0, thread_var=0x7969da412c0, t=0x74dac3ffe4c8) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:471
#9  0x000007965eb70838 in tpool::thread_pool_generic::worker_main (this=0x7969da043e0, thread_var=0x7969da412c0) at /home/dan/mariadb-server-10.5/tpool/tpool_generic.cc:516
#10 0x000007965eb76024 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x7969de49a08: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x7969de49a00: 0x7969da043e0) at /usr/include/c++/7/bits/invoke.h:73
#11 0x000007965eb74838 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x7969de49a08: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x7965eb707a4 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/7/bits/invoke.h:95
#12 0x000007965eb7bf20 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7969de499f8) at /usr/include/c++/7/thread:234
#13 0x000007965eb7be50 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x7969de499f8) at /usr/include/c++/7/thread:243
#14 0x000007965eb7bdfc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x7969de499f0) at /usr/include/c++/7/thread:186
#15 0x000074dad8ae3430 in ?? () from /usr/lib/powerpc64le-linux-gnu/libstdc++.so.6
#16 0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#17 0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 3 (Thread 0x74dad428f120 (LWP 52332)):
#0  0x000074dad8cb2b94 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x74dad428e630, expected=0, futex_word=0x204) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  __pthread_cond_wait_common (abstime=0x74dad428e630, mutex=0x7966023b5b8 <LOCK_timer+40>, cond=0x7966023b640 <COND_timer>) at pthread_cond_wait.c:539
#2  __pthread_cond_timedwait (cond=0x7966023b640 <COND_timer>, mutex=0x7966023b5b8 <LOCK_timer+40>, abstime=0x74dad428e630) at pthread_cond_wait.c:667
#3  0x000007965ec45e70 in safe_cond_timedwait (cond=0x7966023b640 <COND_timer>, mp=0x7966023b590 <LOCK_timer>, abstime=0x74dad428e630, file=0x7965f2075b8 "/home/dan/mariadb-server-10.5/include/mysql/psi/mysql_thread.h", line=1257) at /home/dan/mariadb-server-10.5/mysys/thr_mutex.c:546
#4  0x000007965ec47660 in inline_mysql_cond_timedwait (that=0x7966023b640 <COND_timer>, mutex=0x7966023b590 <LOCK_timer>, abstime=0x74dad428e630, src_file=0x7965f2075f8 "/home/dan/mariadb-server-10.5/mysys/thr_timer.c", src_line=321) at /home/dan/mariadb-server-10.5/include/mysql/psi/mysql_thread.h:1257
#5  0x000007965ec48818 in timer_handler (arg=0x0) at /home/dan/mariadb-server-10.5/mysys/thr_timer.c:321
#6  0x000007965e4607a0 in pfs_spawn_thread (arg=0x7969d932fd8) at /home/dan/mariadb-server-10.5/storage/perfschema/pfs.cc:2201
#7  0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#8  0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
Thread 2 (Thread 0x74dad9278c70 (LWP 52331)):
#0  0x000074dad8cb9768 in __GI___nanosleep (requested_time=0x204, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x000007965e70da38 in os_thread_sleep (tm=100000) at /home/dan/mariadb-server-10.5/storage/innobase/os/os0thread.cc:231
#2  0x000007965e6bc40c in logs_empty_and_mark_files_at_shutdown () at /home/dan/mariadb-server-10.5/storage/innobase/log/log0log.cc:1641
#3  0x000007965e88d640 in innodb_shutdown () at /home/dan/mariadb-server-10.5/storage/innobase/srv/srv0start.cc:2089
#4  0x000007965e59b3f4 in innobase_end () at /home/dan/mariadb-server-10.5/storage/innobase/handler/ha_innodb.cc:4089
#5  0x000007965df9e920 in ha_finalize_handlerton (plugin=0x7969d936310) at /home/dan/mariadb-server-10.5/sql/handler.cc:524
#6  0x000007965db6c1a4 in plugin_deinitialize (plugin=0x7969d936310, ref_check=true) at /home/dan/mariadb-server-10.5/sql/sql_plugin.cc:1262
#7  0x000007965db6c8c0 in reap_plugins () at /home/dan/mariadb-server-10.5/sql/sql_plugin.cc:1338
#8  0x000007965db6f4d8 in plugin_shutdown () at /home/dan/mariadb-server-10.5/sql/sql_plugin.cc:2019
#9  0x000007965d98aa68 in clean_up (print_message=true) at /home/dan/mariadb-server-10.5/sql/mysqld.cc:1984
#10 0x000007965d994c70 in mysqld_main (argc=172, argv=0x7969d7e3b60) at /home/dan/mariadb-server-10.5/sql/mysqld.cc:5647
#11 0x000007965d9852d4 in main (argc=31, argv=0x7ffffbfaff08) at /home/dan/mariadb-server-10.5/sql/main.cc:25
 
Thread 1 (Thread 0x74dad168f120 (LWP 52334)):
#0  0x000074dad8cb4844 in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:55
#1  0x000007965ec3e9fc in my_write_core (sig=6) at /home/dan/mariadb-server-10.5/mysys/stacktrace.c:518
#2  0x000007965df998bc in handle_fatal_signal (sig=6) at /home/dan/mariadb-server-10.5/sql/signal_handler.cc:330
#3  <signal handler called>
#4  0x000074dad866e98c in __libc_signal_restore_set (set=0x74dad168dd48) at ../sysdeps/unix/sysv/linux/nptl-signals.h:80
#5  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#6  0x000074dad8670be0 in __GI_abort () at abort.c:79
#7  0x000007965e926e44 in ib::fatal::~fatal (this=0x74dad168e0b0, __in_chrg=<optimized out>) at /home/dan/mariadb-server-10.5/storage/innobase/ut/ut0ut.cc:601
#8  0x000007965e9de0cc in buf_dblwr_assert_on_corrupt_block (block=0x74dad1aa6260) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0dblwr.cc:793
#9  0x000007965e9de2a4 in buf_dblwr_check_block (block=0x74dad1aa6260) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0dblwr.cc:852
#10 0x000007965e9de9e8 in buf_dblwr_flush_buffered_writes () at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0dblwr.cc:976
#11 0x000007965e9f2800 in buf_flush_end (flush_type=BUF_FLUSH_LIST) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:1975
#12 0x000007965e9f2a98 in buf_flush_do_batch (type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615, n=0x74dad168e438) at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:2019
#13 0x000007965e9f5424 in pc_flush_slot () at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:2640
#14 0x000007965e9f69cc in buf_flush_page_cleaner () at /home/dan/mariadb-server-10.5/storage/innobase/buf/buf0flu.cc:3041
#15 0x000074dad8ca885c in start_thread (arg=0x0) at pthread_create.c:463
#16 0x000074dad8779028 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
 
 - saving '/home/dan/build-mariadb-server-10.5-debug-gccO0/mysql-test/var/3/log/encryption.innodb_encryption_discard_import-innodb/' to '/home/dan/build-mariadb-server-10.5-debug-gccO0/mysql-test/var/log/encryption.innodb_encryption_discard_import-innodb/'

Comment by Daniel Black [ 2020-06-05 ]

seems to be a memory barrier related:

on a 2 node system with 80 threads per node, the linux kernel co-locates the threads such that the following will successfully run:

10.5-036920c7262dfaf1532f3ba77a013b7ec04d5f7c (bb-10.5-MDEV-15053-3) works ok

build-mariadb-server-10.5-debug-gccO0$ mysql-test/mtr --parallel=80 --force --max-test-fail=30  --repeat=4 --mem encryption.innodb_encryption_discard_import{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
 
with frequent, byt non-crashing
encryption.innodb_encryption_discard_import 'innodb' w68 [ 2 fail ]  Found warnings/errors in server log file!
        Test ended at 2020-06-05 16:57:26
line
2020-06-05 16:57:11 3 [Warning] InnoDB: Trying to delete tablespace 'test/#sql-ib27' but there are 1 pending operations on it.
^ Found warnings in /home/dan/build-mariadb-server-10.5-debug-gccO0/mysql-test/var/68/log/mysqld.1.err
ok

Push the --parallel=120 or beyond and quite a few instances of the assertion will almost instantly occur:

parallel=120

worker[118] Using MTR_BUILD_THREAD 411, with reserved ports 18220..18239
worker[107] Using MTR_BUILD_THREAD 413, with reserved ports 18260..18279
worker[120] Using MTR_BUILD_THREAD 415, with reserved ports 18300..18319
worker[106] Using MTR_BUILD_THREAD 416, with reserved ports 18320..18339
worker[105] Using MTR_BUILD_THREAD 417, with reserved ports 18340..18359
worker[114] Using MTR_BUILD_THREAD 414, with reserved ports 18280..18299
worker[108] Using MTR_BUILD_THREAD 418, with reserved ports 18360..18379
worker[117] Using MTR_BUILD_THREAD 419, with reserved ports 18380..18399
encryption.innodb_encryption_discard_import 'innodb' w26 [ pass ]  90701
encryption.innodb_encryption_discard_import 'innodb' w3 [ pass ]  91568
encryption.innodb_encryption_discard_import 'innodb' w111 [ pass ]  92694
encryption.innodb_encryption_discard_import 'innodb' w76 [ pass ]  93182
encryption.innodb_encryption_discard_import 'innodb' w21 [ pass ]  93436
encryption.innodb_encryption_discard_import 'innodb' w87 [ pass ]  93424
encryption.innodb_encryption_discard_import 'innodb' w73 [ pass ]  93722
encryption.innodb_encryption_discard_import 'innodb' w114 [ pass ]  93723
encryption.innodb_encryption_discard_import 'innodb' w40 [ pass ]  93673
encryption.innodb_encryption_discard_import 'innodb' w85 [ pass ]  94168
encryption.innodb_encryption_discard_import 'innodb' w31 [ pass ]  93777
encryption.innodb_encryption_discard_import 'innodb' w5 [ pass ]  94037
encryption.innodb_encryption_discard_import 'innodb' w60 [ pass ]  94459
encryption.innodb_encryption_discard_import 'innodb' w24 [ pass ]  94056
encryption.innodb_encryption_discard_import 'innodb' w74 [ pass ]  94880
encryption.innodb_encryption_discard_import 'innodb' w96 [ pass ]  94790
encryption.innodb_encryption_discard_import 'innodb' w81 [ pass ]  94545
encryption.innodb_encryption_discard_import 'innodb' w53 [ pass ]  94533
encryption.innodb_encryption_discard_import 'innodb' w14 [ pass ]  94979
encryption.innodb_encryption_discard_import 'innodb' w22 [ pass ]  95137
encryption.innodb_encryption_discard_import 'innodb' w83 [ pass ]  95067
encryption.innodb_encryption_discard_import 'innodb' w109 [ pass ]  95219
encryption.innodb_encryption_discard_import 'innodb' w99 [ pass ]  95144
encryption.innodb_encryption_discard_import 'innodb' w2 [ pass ]  95356
encryption.innodb_encryption_discard_import 'innodb' w36 [ pass ]  94990
encryption.innodb_encryption_discard_import 'innodb' w9 [ pass ]  95262
encryption.innodb_encryption_discard_import 'innodb' w27 [ pass ]  95053
encryption.innodb_encryption_discard_import 'innodb' w77 [ pass ]  95483
encryption.innodb_encryption_discard_import 'innodb' w102 [ pass ]  95680
encryption.innodb_encryption_discard_import 'innodb' w113 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2020-06-05 17:26:09
line
2020-06-05 17:26:01 0 [ERROR] InnoDB: Record 126 is above rec heap top 120
......

Comment by Marko Mäkelä [ 2020-06-13 ]

I am now looking at a rr replay trace of a failed encryption.innodb_encryption_discard_import run. In the page image, PAGE_HEAP_TOP, PAGE_N_HEAP and PAGE_N_RECS correspond to an empty ROW_FORMAT≠REDUNDANT page (120, 0x8002, 0), but the page is not empty. The infimum pseudo-record points to the payload start of the record that was allocated at byte 120:

od -t x1 -Ax -w14 -j 120 page.bin

Each record is 14 bytes; that is why I chose this format. Every record is also allocated sequentially; the REC_NEXT bytes are always 00 0e except for the very last one:

000078 00 10 00 11 00 0e 80 00 00 00 00 00 00 04
000086 00 00 00 19 00 0e 80 00 00 37 00 00 00 05
000804 00 00 04 61 00 0e 80 00 1d a6 00 00 00 aa
000812 00 00 04 69 f8 58 80 00 1d dd 00 00 00 ab
000820 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*

So, the PAGE_HEAP_TOP should actually be 0x820 and the number of records should be 140 = (0x820-120)/14.

As far as I understood, the crash that I am debugging occurred during the execution of the statement

ALTER TABLE t1 ENGINE InnoDB;

and now I have a call stack for the problematic page 8:6:

#0  PageBulk::finish (this=0x61a000012c98)
    at /home/mleich/bb-10.5-marko/storage/innobase/btr/btr0bulk.cc:504
#1  0x000055ea99985ff0 in BtrBulk::pageCommit (this=0x6070003b56a8, 
    page_bulk=0x61a000012c98, next_page_bulk=0x0, insert_father=false)
    at /home/mleich/bb-10.5-marko/storage/innobase/btr/btr0bulk.cc:908
#2  0x000055ea99987cf2 in BtrBulk::finish (this=0x6070003b56a8, err=DB_SUCCESS)
    at /home/mleich/bb-10.5-marko/storage/innobase/btr/btr0bulk.cc:1157

At this point, the page header looks different: the page payload ends at 0xa6e, which corresponds to the PAGE_HEAP_TOP.

The assertion will fail later for the same page number, but the page has been migrated to a different block address in the buffer pool.
I currently suspect that the page was incorrectly evicted (likely because we failed to keep it in the buf_pool.flush_list) and an earlier version reloaded. I should be able to confirm this shortly.

This is also somewhat related to MDEV-21347, where I would like to remove buf_block_t::skip_flush_check.

Comment by Marko Mäkelä [ 2020-06-13 ]

We seem to be evicting a dirty page.

Thread 30 hit Breakpoint 1, buf_flush_write_complete (bpage=0x7f457eaaa888, flush_type=IORequest::FLUSH_LIST, dblwr=true) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0flu.cc:529
529	  ut_ad(mutex_own(&buf_pool.mutex));
(rr) when
Current event: 974555
Thread 34 hit Hardware watchpoint 2: -location $4.frame[16]@8
 
Old value = "\000\000\000\000\000ı9"
New value = "\000\000\000\000\000ı\251"
mach_write_to_4 (b=0x7f457edc4014 "", n=12890537) at /home/mleich/bb-10.5-marko/storage/innobase/include/mach0data.ic:167
167	}
1: /x $4.frame[16]@8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0xc4, 0xb1, 0xa9}
Current event: 1184399
(rr) 
Continuing.
 
Thread 34 hit Hardware watchpoint 2: -location $4.frame[16]@8
 
Old value = "\000\000\000\000\000ı\251"
New value = "\000\000\000\000\000\000\261\251"
__memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:141
141	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
1: /x $4.frame[16]@8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xb1, 0xa9}
Current event: 1542264
(rr) bt
#0  __memset_avx2_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:141
#1  0x00007f458a8f1cdf in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.4
#2  0x000055ea99a6cb31 in buf_LRU_block_free_non_file_page (block=0x7f457eaaa888) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0lru.cc:1352
#3  0x000055ea99a651cb in buf_LRU_block_free_hashed_page (block=0x7f457eaaa888) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0lru.cc:150
#4  0x000055ea99a6c6e9 in buf_LRU_free_page (bpage=0x7f457eaaa888, zip=true) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0lru.cc:1327
#5  0x000055ea99a67027 in buf_LRU_free_from_common_LRU_list (scan_all=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0lru.cc:563
#6  0x000055ea99a67344 in buf_LRU_scan_and_free_block (scan_all=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0lru.cc:598
#7  0x000055ea99a68872 in buf_LRU_get_free_block (have_mutex=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0lru.cc:780
#8  0x000055ea99a74291 in buf_page_init_for_read (mode=132, page_id=..., zip_size=0, unzip=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0rea.cc:109

The penultimate watchpoint caught an update of FIL_PAGE_LSN to 0xc4b1a9, and the page was not written to the file system after that.

Comment by Marko Mäkelä [ 2020-06-13 ]

The page was removed from the flush list shortly before it was reused for something else:

#6  0x000055ea9995dbdb in btr_page_free (index=0x618000021520, block=0x7f457eaaa888, mtr=0x7f45692ac5a0, blob=false) at /home/mleich/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:746
#7  0x000055ea9998819e in BtrBulk::finish (this=0x6070003b56a8, err=DB_SUCCESS) at /home/mleich/bb-10.5-marko/storage/innobase/btr/btr0bulk.cc:1195
(rr) when
Current event: 1184391

Thanks to MDEV-15528, we will skip the write of the freed page to the data file. At this point, the contents of page 8:6 looks rather similar to the contents of page 8:3 (the root page of the primary key). The root page has no siblings at this point, and its PAGE_LEVEL is 1, that is, it is 1 level above the leaf level.

Up until now, all seems well and good. Page 8:6 is a garbage page, and nobody should have any business modifying it. An allowed subsequent course of action would be to allocate the page for something else and invoke buf_page_create() on it.

The actual problem is revealed later, when an encryption thread is about to perform a dummy write on the garbage page. Here it is reading the garbage page into the buffer pool:

(rr) when
Current event: 1705403
(rr) bt
#0  buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=12, file=0x55ea9b035ba0 "/home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc", line=1739, mtr=0x7f4579964740, 
    err=0x7f45799645d0, allow_ibuf_merge=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3003
#1  0x000055ea99a1ce5c in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=12, file=0x55ea9b035ba0 "/home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc", line=1739, 
    mtr=0x7f4579964740, err=0x7f45799645d0, allow_ibuf_merge=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3609
#2  0x000055ea99b6d510 in fil_crypt_get_page_throttle_func (state=0x7f4579964dc0, offset=6, mtr=0x7f4579964740, sleeptime_ms=0x7f4579964700, 
    file=0x55ea9b035ba0 "/home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc", line=1739) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1660
#3  0x000055ea99b6dc29 in fil_crypt_rotate_page (key_state=0x7f4579964d80, state=0x7f4579964dc0) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1737
#4  0x000055ea99b6e779 in fil_crypt_rotate_pages (key_state=0x7f4579964d80, state=0x7f4579964dc0) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1866
#5  0x000055ea99b6fb7b in fil_crypt_thread () at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:2060

The first such call will return nullptr, but a little later the garbage page does exist in the buffer pool, and we will be able to perform the dummy write, which will trip the check at page flushing time:

(rr) fin
Run till exit from #1  0x000055ea99a1ce5c in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=16, file=0x55ea9b035ba0 "/home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc", 
    line=1739, mtr=0x7f4579964740, err=0x7f45799645d0, allow_ibuf_merge=false) at /home/mleich/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3609
0x000055ea99b6d640 in fil_crypt_get_page_throttle_func (state=0x7f4579964dc0, offset=6, mtr=0x7f4579964740, sleeptime_ms=0x7f4579964700, 
    file=0x55ea9b035ba0 "/home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc", line=1739) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1674
1674		block = buf_page_get_gen(page_id, zip_size,
Value returned is $14 = (buf_block_t *) 0x7f457eab8940
(rr) bt
#0  0x000055ea99b6d640 in fil_crypt_get_page_throttle_func (state=0x7f4579964dc0, offset=6, mtr=0x7f4579964740, sleeptime_ms=0x7f4579964700, 
    file=0x55ea9b035ba0 "/home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc", line=1739) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1674
#1  0x000055ea99b6dc29 in fil_crypt_rotate_page (key_state=0x7f4579964d80, state=0x7f4579964dc0) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1737
#2  0x000055ea99b6e779 in fil_crypt_rotate_pages (key_state=0x7f4579964d80, state=0x7f4579964dc0) at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:1866
#3  0x000055ea99b6fb7b in fil_crypt_thread () at /home/mleich/bb-10.5-marko/storage/innobase/fil/fil0crypt.cc:2060
#4  0x00007f458957b6db in start_thread (arg=0x7f4579965700) at pthread_create.c:463
#5  0x00007f458876188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 1705413
(rr) c
Continuing.
 
Thread 14 received signal SIGABRT, Aborted.
[Switching to Thread 75971.76741]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) when
Current event: 1753207

I think that for now, I will work around this by setting the flag buf_block_t::skip_flush_check in the encryption thread. That flag will ultimately be removed and the BtrBulk code cleaned up in MDEV-21347, which I believe to only affect 10.2, 10.3, 10.4 but not 10.5.

Side note: I am unable to repeat this failure on my system. I tried CMAKE_BUILD_TYPE=Debug compiled both with -O2 and without any optimizations.

Comment by Daniel Black [ 2020-06-15 ]

Fix confirmed. A couple of (hopefully minor warnings), but most importantly no errors of this bug report. Thank you.

10.5-c9f5cb97af6d3ef853f84a19602efea715016734 fix confirmed

dan@fstn4-p1:~/build-mariadb-server-10.5-debug-gccO0$ mysql-test/mtr --parallel=160 --force --max-test-fail=30  --repeat=4 --mem encryption.innodb_encryption_discard_import{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
...
encryption.innodb_encryption_discard_import 'innodb' w54 [ 3 pass ]  92509
encryption.innodb_encryption_discard_import 'innodb' w31 [ 4 pass ]  90983
encryption.innodb_encryption_discard_import 'innodb' w97 [ 4 pass ]  90242
encryption.innodb_encryption_discard_import 'innodb' w130 [ 4 pass ]  89715
encryption.innodb_encryption_discard_import 'innodb' w2 [ 4 pass ]  93117
encryption.innodb_encryption_discard_import 'innodb' w58 [ 4 pass ]  90451
encryption.innodb_encryption_discard_import 'innodb' w124 [ 4 pass ]  89816
encryption.innodb_encryption_discard_import 'innodb' w87 [ 4 pass ]  89178
encryption.innodb_encryption_discard_import 'innodb' w57 [ 4 pass ]  91261
encryption.innodb_encryption_discard_import 'innodb' w12 [ 4 pass ]  90886
encryption.innodb_encryption_discard_import 'innodb' w42 [ 4 pass ]  89372
encryption.innodb_encryption_discard_import 'innodb' w82 [ 4 pass ]  92053
encryption.innodb_encryption_discard_import 'innodb' w75 [ 4 pass ]  91744
encryption.innodb_encryption_discard_import 'innodb' w3 [ 4 pass ]  90868
encryption.innodb_encryption_discard_import 'innodb' w45 [ 4 pass ]  91657
encryption.innodb_encryption_discard_import 'innodb' w96 [ 4 pass ]  88776
encryption.innodb_encryption_discard_import 'innodb' w13 [ 4 pass ]  90238
encryption.innodb_encryption_discard_import 'innodb' w7 [ 4 pass ]  91961
encryption.innodb_encryption_discard_import 'innodb' w114 [ 4 pass ]  92635
encryption.innodb_encryption_discard_import 'innodb' w30 [ 4 pass ]  91885
encryption.innodb_encryption_discard_import 'innodb' w62 [ 4 pass ]  92384
encryption.innodb_encryption_discard_import 'innodb' w160 [ 4 pass ]  92220
encryption.innodb_encryption_discard_import 'innodb' w131 [ 4 pass ]  93382
encryption.innodb_encryption_discard_import 'innodb' w117 [ 4 pass ]  91168
encryption.innodb_encryption_discard_import 'innodb' w71 [ 4 pass ]  91152
encryption.innodb_encryption_discard_import 'innodb' w74 [ 4 pass ]  92536
encryption.innodb_encryption_discard_import 'innodb' w64 [ 4 pass ]  91634
encryption.innodb_encryption_discard_import 'innodb' w123 [ 4 pass ]  89752
encryption.innodb_encryption_discard_import 'innodb' w39 [ 4 pass ]  91201
encryption.innodb_encryption_discard_import 'innodb' w147 [ 4 pass ]  91203
encryption.innodb_encryption_discard_import 'innodb' w135 [ 4 pass ]  91850
encryption.innodb_encryption_discard_import 'innodb' w106 [ 4 pass ]  91389
encryption.innodb_encryption_discard_import 'innodb' w126 [ 4 pass ]  89969
encryption.innodb_encryption_discard_import 'innodb' w99 [ 4 pass ]  90548
encryption.innodb_encryption_discard_import 'innodb' w141 [ 4 pass ]  93512
encryption.innodb_encryption_discard_import 'innodb' w90 [ 4 pass ]  90921
encryption.innodb_encryption_discard_import 'innodb' w1 [ 4 pass ]  90680
encryption.innodb_encryption_discard_import 'innodb' w46 [ 4 pass ]  89816
encryption.innodb_encryption_discard_import 'innodb' w118 [ 4 pass ]  91283
encryption.innodb_encryption_discard_import 'innodb' w134 [ 4 pass ]  90862
encryption.innodb_encryption_discard_import 'innodb' w70 [ 4 pass ]  90307
encryption.innodb_encryption_discard_import 'innodb' w23 [ 4 pass ]  92586
encryption.innodb_encryption_discard_import 'innodb' w102 [ 4 pass ]  92368
encryption.innodb_encryption_discard_import 'innodb' w63 [ 4 pass ]  93105
encryption.innodb_encryption_discard_import 'innodb' w48 [ 4 pass ]  90333
encryption.innodb_encryption_discard_import 'innodb' w36 [ 4 pass ]  89422
encryption.innodb_encryption_discard_import 'innodb' w111 [ 4 pass ]  90681
encryption.innodb_encryption_discard_import 'innodb' w29 [ 4 pass ]  91288
encryption.innodb_encryption_discard_import 'innodb' w43 [ 4 pass ]  91116
encryption.innodb_encryption_discard_import 'innodb' w34 [ 4 pass ]  92586
encryption.innodb_encryption_discard_import 'innodb' w155 [ 4 pass ]  90528
encryption.innodb_encryption_discard_import 'innodb' w133 [ 4 pass ]  91216
encryption.innodb_encryption_discard_import 'innodb' w128 [ 4 pass ]  91466
encryption.innodb_encryption_discard_import 'innodb' w119 [ 4 pass ]  90827
encryption.innodb_encryption_discard_import 'innodb' w136 [ 4 pass ]  90500
encryption.innodb_encryption_discard_import 'innodb' w79 [ 4 pass ]  90279
encryption.innodb_encryption_discard_import 'innodb' w146 [ 4 pass ]  93473
encryption.innodb_encryption_discard_import 'innodb' w101 [ 4 pass ]  93685
encryption.innodb_encryption_discard_import 'innodb' w59 [ 4 pass ]  89491
encryption.innodb_encryption_discard_import 'innodb' w100 [ 4 pass ]  92208
encryption.innodb_encryption_discard_import 'innodb' w121 [ 4 pass ]  91196
encryption.innodb_encryption_discard_import 'innodb' w61 [ 4 pass ]  90388
encryption.innodb_encryption_discard_import 'innodb' w25 [ 4 pass ]  91218
encryption.innodb_encryption_discard_import 'innodb' w14 [ 4 pass ]  89496
encryption.innodb_encryption_discard_import 'innodb' w151 [ 4 pass ]  90569
encryption.innodb_encryption_discard_import 'innodb' w103 [ 4 pass ]  91180
encryption.innodb_encryption_discard_import 'innodb' w44 [ 4 pass ]  89909
encryption.innodb_encryption_discard_import 'innodb' w78 [ 4 pass ]  91551
encryption.innodb_encryption_discard_import 'innodb' w159 [ 4 pass ]  91890
encryption.innodb_encryption_discard_import 'innodb' w113 [ 4 pass ]  90588
....
encryption.innodb_encryption_discard_import 'innodb' w4 [ 4 fail ]  Found warnings/errors in server log file!
        Test ended at 2020-06-15 19:55:31
line
2020-06-15 19:55:16 3 [Warning] InnoDB: Trying to delete tablespace 'test/#sql-ib27' but there are 1 pending operations on it.
^ Found warnings in /home/dan/build-mariadb-server-10.5-debug-gccO0/mysql-test/var/4/log/mysqld.1.err
ok
 
encryption.innodb_encryption_discard_import 'innodb' w87 [ 3 fail ]
        Test ended at 2020-06-15 19:53:51
 
CURRENT_TEST: encryption.innodb_encryption_discard_import
--- /home/dan/mariadb-server-10.5/mysql-test/suite/encryption/r/innodb_encryption_discard_import.result	2020-05-07 20:26:02.552751540 +1000
+++ /home/dan/mariadb-server-10.5/mysql-test/suite/encryption/r/innodb_encryption_discard_import.reject	2020-06-15 19:53:50.826222226 +1000
@@ -115,7 +115,7 @@
 # t1 yes on expecting NOT FOUND
 NOT FOUND /foobar/ in t1.ibd
 # t2 ... on expecting NOT FOUND
-NOT FOUND /temp/ in t2.ibd
+FOUND 1 /temp/ in t2.ibd
 # t3 ... on expecting NOT FOUND
 NOT FOUND /barfoo/ in t3.ibd
 # restart
 
 
Completed: Failed 28/1756 tests, 98.41% were successful.
 
Failing test(s): encryption.innodb_encryption_discard_import
 
The log files in var/log may give you some hint of what went wrong.
 
If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
 
Errors/warnings were found in logfiles during server shutdown after running the
following sequence(s) of tests:
    encryption.innodb_encryption_discard_import
mysql-test-run: *** ERROR: there were failing test cases

The above warning about pending operations only occured after the first 1292 tests succeeded.

Comment by Marko Mäkelä [ 2020-06-15 ]

danblack, thank you! I have observed both of those occasional failures myself. The warnings about pending operations on #sql-ib files could have been introduced by MDEV-15053.

Generated at Thu Feb 08 09:12:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.