[MDEV-26733] assert on shutdown lock->lock_word == X_LOCK_DECR in test Created: 2021-09-30  Updated: 2023-12-14

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Vladislav Vaintroub Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: not-10.6, regression-10.5

Issue Links:
Relates
relates to MDEV-17340 innodb.undo_truncate failed in buildb... Closed
relates to MDEV-20481 innodb.blob-crash crashes during reco... Closed
relates to MDEV-26450 Corruption due to innodb_undo_log_tru... Closed
relates to MDEV-33009 Server hangs for a long time with inn... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/27582/steps/test/logs/stdio shows an assertion

bb-10.5-mdev-26539 ad74dfc98a6d670802d6ca74476bf068e2779f5e

server!ut_dbg_assertion_failed [D:\winx64-debug\build\src\storage\innobase\ut\ut0dbg.cc @ 61]
char * expr = 0x00007ff9`cd4da078 "lock->lock_word == X_LOCK_DECR"
char * file = 0x00007ff9`cd4da128 "D:\winx64-debug\build\src\storage\innobase\sync\sync0rw.cc"
unsigned int line = 0xfe
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!rw_lock_free_func [D:\winx64-debug\build\src\storage\innobase\sync\sync0rw.cc @ 254]
struct rw_lock_t * lock = 0x00000219`e7bdfdf8
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!pfs_rw_lock_free_func [D:\winx64-debug\build\src\storage\innobase\include\sync0rw.ic @ 612]
struct rw_lock_t * lock = 0x00000219`e7bdfdf8
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_block_free_mutexes [D:\winx64-debug\build\src\storage\innobase\buf\buf0buf.cc @ 1377]
struct buf_block_t * block = 0x00000219`e9c412b0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_pool_t::close [D:\winx64-debug\build\src\storage\innobase\buf\buf0buf.cc @ 1539]
unsigned int64 i = 0x15f7
struct buf_block_t * block = 0x00000219`e9c412b0
struct buf_pool_t::chunk_t * chunk = 0x00000219`e4f1e6a8
class buf_pool_t * this = 0x00007ff9`cf087580
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!innodb_shutdown [D:\winx64-debug\build\src\storage\innobase\srv\srv0start.cc @ 2109]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

there are no other innodb threads at this moment



 Comments   
Comment by Marko Mäkelä [ 2021-10-01 ]

The assertion failure says that a buffer page latch is in reserved state while the buffer pool is being freed. Because no other InnoDB threads exist, this ought to have been caused by some earlier race condition or memory corruption. If the bug is in InnoDB, that

The 10.6 or later equivalent (if this were to occur there) would be the assertion failure !is_locked_or_waiting() in srw_mutex_impl::destroy(), or some assertion failure in sux_lock<ssux_lock_impl<true>>::free(). Also that call stack should include buf_block_free_mutexes().

The following in the log is misleading:

innodb.innodb-alter-tempfile '32k,innodb' w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2021-09-30 15:05:16
line
210930 15:04:49 [ERROR] mysqld got exception 0x80000003 ;
 
 
Attempting backtrace. You can use the following information to find out
 
 
^ Found warnings in D:/winx64-debug/build/mysql-test/var/2/log/mysqld.1.err

But, in the server error log, nothing failed during the execution of that test. There is also something very strange in that log file: some messages that were written by the client side, not by the server at all. Only the first message in the snippet below is from a server:

2021-09-30 15:04:52 0 [Note] D:/winx64-debug/build/sql//Debug/mariadbd.exe: Shutdown complete
 
CURRENT_TEST: innodb.innodb-alter-tempfile
$ D:/winx64-debug/build/sql//Debug/mariadbd.exe --defaults-group-suffix=.1 --defaults-file=D:/winx64-debug/build/mysql-test/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --innodb-page-size=32K --innodb-buffer-pool-size=24M --core-file --loose-debug-sync-timeout=300
innodb.undo_truncate '2,4k,innodb'       w1 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2021-09-30 15:04:54
line
210930 15:04:49 [ERROR] mysqld got exception 0x80000003 ;
 
Attempting backtrace. You can use the following information to find out
 
^ Found warnings in D:/winx64-debug/build/mysql-test/var/1/log/mysqld.1.err

That branch is only one commit ahead of 10.5 be803f037f98812410f24b67af61aa7857969dcf, and nothing in that commit should affect this.

Can you repeat this failure when running the test innodb.undo_truncate on Windows for a few hundred or thousand times?

Comment by Marko Mäkelä [ 2023-01-17 ]

My search did not find anything similar in main branches (10.x) since 2018-01-15:

10.2 0292cd0a27418fabe4660fa6f2a92a09

innodb.innodb-alter-tempfile '4k,innodb' w1 [ fail ]
        Test ended at 2018-01-15 23:32:19
 
CURRENT_TEST: innodb.innodb-alter-debug
mysqltest: At line 22: query 'ALTER TABLE t1ć DROP FOREIGN KEY t1c2, RENAME TO ②' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1114...
CURRENT_TEST: innodb.innodb-alter-tempfile
Version: '10.2.13-MariaDB-debug-log'  socket: ''  port: 16100  Source distribution
180115 23:31:48 [ERROR] mysqld got exception 0xc0000005 ;
mysqld.exe!pfs_rw_lock_x_lock_func()[sync0rw.ic:525]
mysqld.exe!btr_cur_update_in_place()[btr0cur.cc:3718]

The duplicate CURRENT_TEST line in the output looks strange, and the assertion is not exactly the same as the one that seems to have occurred on server shutdown.

In MDEV-24142 (10.6), the assertion in rw_lock_free_func() was replaced with debug assertions in srw_lock::destroy() and sux_lock::free(), which are being called by buf_pool_t::close() on shutdown. But, those assertions never seem to fail.

In 10.5, this assertion appears to fail in the test innodb.undo_truncate only, rather frequently.

I have some evidence that on Windows, the standard output of the mtr test harness may be corrupted. I will assume that this failure occurred during the execution of the test innodb.undo_truncate.

Comment by Elena Stepanova [ 2023-09-07 ]

Still fails
https://buildbot.mariadb.org/#/builders/534/builds/8970/steps/7/logs/stdio

10.5 53499cd1e

#2  0x000055807bc98eef in handle_fatal_signal (sig=6) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/signal_handler.cc:360
        curr_time = 1693306128
        tm = {tm_sec = 48, tm_min = 48, tm_hour = 10, tm_mday = 29, tm_mon = 7, tm_year = 123, tm_wday = 2, tm_yday = 240, tm_isdst = 0, tm_gmtoff = 0, tm_zone = 0x7f663a007b1b "UTC"}
        thd = 0x0
        print_invalid_query_pointer = false
#3  <signal handler called>
No symbol table info available.
#4  0x00007f6639ec4a7c in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#5  0x00007f6639e70476 in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#6  0x00007f6639e567f3 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#7  0x000055807c368dd4 in ut_dbg_assertion_failed (expr=0x55807ca60c78 "lock->lock_word == X_LOCK_DECR", file=0x55807ca60b50 "/home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/sync/sync0rw.cc", line=254) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/ut/ut0dbg.cc:60
No locals.
#8  0x000055807c30cb10 in rw_lock_free_func (lock=0x55807eaaab78) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/sync/sync0rw.cc:254
        __PRETTY_FUNCTION__ = "void rw_lock_free_func(rw_lock_t*)"
#9  0x000055807c3de460 in pfs_rw_lock_free_func (lock=0x55807eaaab78) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/include/sync0rw.inl:611
No locals.
#10 0x000055807c3e109c in buf_block_free_mutexes (block=0x7f662e0d9c28) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/buf/buf0buf.cc:1196
No locals.
#11 0x000055807c3e23e6 in buf_pool_t::close (this=0x55807d0b3340 <buf_pool>) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/buf/buf0buf.cc:1363
        i = 3157
        block = 0x7f662e0d9c28
        chunk = 0x55807e8d8028
        __PRETTY_FUNCTION__ = "void buf_pool_t::close()"
#12 0x000055807c3043b5 in innodb_shutdown () at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/srv/srv0start.cc:2098
        __PRETTY_FUNCTION__ = "void innodb_shutdown()"
#13 0x000055807c0aaa07 in innobase_end () at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/storage/innobase/handler/ha_innodb.cc:4137
        thd = 0x0
#14 0x000055807bc9c568 in ha_finalize_handlerton (plugin=0x55807e6ff030) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/handler.cc:580
        hton = 0x55807e7c98d8
        __PRETTY_FUNCTION__ = "int ha_finalize_handlerton(st_plugin_int*)"
#15 0x000055807b94f793 in plugin_deinitialize (plugin=0x55807e6ff030, ref_check=true) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/sql_plugin.cc:1265
        __PRETTY_FUNCTION__ = "void plugin_deinitialize(st_plugin_int*, bool)"
#16 0x000055807b94fc88 in reap_plugins () at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/sql_plugin.cc:1341
        count = 68
        plugin = 0x55807e6ff030
        reap = 0x7ffc08cc0220
        list = 0x7ffc08cc0140
        __PRETTY_FUNCTION__ = "void reap_plugins()"
#17 0x000055807b9522b8 in plugin_shutdown () at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/sql_plugin.cc:2093
        i = 68
        count = 68
        plugins = 0x7ffc08cc0370
        plugin = 0x55807e706aa0
        dl = 0x7ffc08cc0380
#18 0x000055807b7f16b5 in clean_up (print_message=true) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/mysqld.cc:1995
No locals.
#19 0x000055807b7f8db4 in mysqld_main (argc=169, argv=0x55807e5d5760) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/mysqld.cc:5733
        please_close_stdin = true
        ho_error = 0
        new_thread_stack_size = 299008
        user = 0x7f65fc22ae38 "\330\246\031\374e\177"
#20 0x000055807b7edbad in main (argc=28, argv=0x7ffc08cc05d8) at /home/buildbot/amd64-ubuntu-2204-debug-ps/build/sql/main.cc:25

bb-10.5-release 8852afe31

innodb.undo_truncate '2,4k,innodb'       w6 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2023-08-08 11:35:50
line
230808 11:35:46 [ERROR] mysqld got exception 0x80000003 ;
Attempting backtrace. You can use the following information to find out
^ Found warnings in D:/Buildbot/amd64-windows/build/mysql-test/var/6/log/mysqld.1.err
ok
 - found 'mariadbd.dmp' (0/5)
Trying 'cdb' to get a backtrace
OS debug symbols will be downloaded and stored in C:\symbols.
You can control the location of symbol cache with _NT_SYMBOL_PATH
environment variable. Please refer to Microsoft KB article
http://support.microsoft.com/kb/311503  for details about _NT_SYMBOL_PATH
-------------------------------------------------------------------------
Output from cdb follows. Faulting thread is printed twice,with and without function parameters
Search for STACK_TEXT to see the stack trace of 
the faulting thread. Callstacks of other threads are printed after it.
Microsoft (R) Windows Debugger Version 10.0.19041.685 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [D:\Buildbot\amd64-windows\build\mysql-test\var\6\log\innodb.undo_truncate-2,4k,innodb\mysqld.1\data\mariadbd.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available
Response                         Time (ms)     Location
OK                                             .
Response                         Time (ms)     Location
OK                                             .
Deferred                                       srv*C:\symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: .;srv*C:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: .
Windows 10 Version 17763 MP (64 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
17763.1.amd64fre.rs5_release.180914-1434
Machine Name:
Debug session time: Tue Aug  8 11:35:47.000 2023 (UTC + 2:00)
System Uptime: not available
Process Uptime: 0 days 0:00:43.000
.............................
This dump file has a breakpoint exception stored in it.
The stored exception information can be accessed via .ecxr.
For analysis of this file, run !analyze -v
ntdll!NtGetContextThread:
             ret
0:000> cdb: Reading initial command '!sym prompts off; !analyze -v; .ecxr; !for_each_frame dv /t;!uniqstack -p;q'
quiet mode - symbol prompts off
KEY_VALUES_STRING: 1
    Key  : Analysis.CPU.Sec
    Value: 0
    Key  : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on WIN-HI7PPJVFI2B
    Key  : Analysis.DebugData
    Value: CreateObject
    Key  : Analysis.DebugModel
    Value: CreateObject
    Key  : Analysis.Elapsed.Sec
    Value: 0
    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 198
    Key  : Analysis.System
    Value: CreateObject
    Key  : Timeline.Process.Start.DeltaSec
    Value: 43
CONTEXT:  (.ecxr)
rax=00007fff9b814100 rbx=0000000000000016 rcx=0000000000000016
rdx=00007fffd42382f0 rsi=00007fff9b814100 rdi=00000098f69eef01
rip=00007fff9b814105 rsp=00000098f69eeb10 rbp=00000098f69ef2b0
 r8=00000098f69ed518  r9=00000098f69eebc0 r10=0000000000000000
r11=00000098f69eea40 r12=0000000000000910 r13=0000000000000000
r14=0000000000000000 r15=00007fffd42db6f0
iopl=0         nv up ei pl nz ac pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212
server!my_sigabrt_handler:
             int     3
Resetting default scope
EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007fff9b814105 (server!my_sigabrt_handler)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000000
WRONG_SYMBOLS_TIMESTAMP: 7dd1a0eb
WRONG_SYMBOLS_SIZE: 1ee000
FAULTING_MODULE: 00007fffd7400000 ntdll
ADDITIONAL_DEBUG_TEXT:  
You can run '.symfix; .reload' to try to fix the symbol path and load symbols. ; Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]
STACK_TEXT:  
server!my_sigabrt_handler
ucrtbase!raise
ucrtbase!abort
server!ut_dbg_assertion_failed
server!rw_lock_free_func
server!pfs_rw_lock_free_func
server!buf_block_free_mutexes
server!buf_pool_t::close
server!innodb_shutdown
server!innobase_end
server!ha_finalize_handlerton
server!plugin_deinitialize
server!reap_plugins
server!plugin_shutdown
server!clean_up
server!win_main
server!mysql_service
server!mysqld_main
mariadbd!main
mariadbd!__scrt_common_main_seh
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
STACK_COMMAND:  .ecxr ; kb ; ** Pseudo Context ** Pseudo ** Value: 1ede2220320 ** ; kb
FAULTING_SOURCE_LINE:  D:\Buildbot\amd64-windows\build\mysys\my_thr_init.c
FAULTING_SOURCE_FILE:  D:\Buildbot\amd64-windows\build\mysys\my_thr_init.c
FAULTING_SOURCE_LINE_NUMBER:  465
FAULTING_SOURCE_CODE:  
   461: 
   462: #if (_MSC_VER >= 1400)
   463: static void my_sigabrt_handler(int sig)
   464: {
>  465:   __debugbreak();
   466: }
   467: #endif /*_MSC_VER >=1400 */
   468: 
   469: static void install_sigabrt_handler(void)
   470: {
BUGCHECK_CODE:  7dd1a0eb
EXCEPTION_CODE_STR:  7DD1A0EB
EXCEPTION_STR:  WRONG_SYMBOLS
PROCESS_NAME:  ntdll.wrong.symbols.dll
IMAGE_NAME:  ntdll.wrong.symbols.dll
MODULE_NAME: ntdll_wrong_symbols
SYMBOL_NAME:  ntdll_wrong_symbols!7DD1A0EB1EE000
FAILURE_BUCKET_ID:  WRONG_SYMBOLS_X64_17763.1.amd64fre.rs5_release.180914-1434_TIMESTAMP_361121-142339_7DD1A0EB_ntdll.wrong.symbols.dll!7DD1A0EB1EE000
OS_VERSION:  10.0.17763.1
BUILDLAB_STR:  rs5_release
OSPLATFORM_TYPE:  x64
OSNAME:  Windows 10
FAILURE_ID_HASH:  {7d129e52-0ab2-f1e4-4813-57f69169ff25}
Followup:     MachineOwner
---------
rax=00007fff9b814100 rbx=0000000000000016 rcx=0000000000000016
rdx=00007fffd42382f0 rsi=00007fff9b814100 rdi=00000098f69eef01
rip=00007fff9b814105 rsp=00000098f69eeb10 rbp=00000098f69ef2b0
 r8=00000098f69ed518  r9=00000098f69eebc0 r10=0000000000000000
r11=00000098f69eea40 r12=0000000000000910 r13=0000000000000000
r14=0000000000000000 r15=00007fffd42db6f0
iopl=0         nv up ei pl nz ac pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000212
server!my_sigabrt_handler:
             int     3
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!my_sigabrt_handler [D:\Buildbot\amd64-windows\build\mysys\my_thr_init.c @ 465]
int sig = 0n22
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ucrtbase!raise
Unable to enumerate locals, Win32 error 0n318
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ucrtbase!abort
Unable to enumerate locals, Win32 error 0n318
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!ut_dbg_assertion_failed [D:\Buildbot\amd64-windows\build\storage\innobase\ut\ut0dbg.cc @ 61]
char * expr = 0x00007fff`9d4bc588 "lock->lock_word == X_LOCK_DECR"
char * file = 0x00007fff`9d4bceb0 "D:\Buildbot\amd64-windows\build\storage\innobase\sync\sync0rw.cc"
unsigned int line = 0xfe
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!rw_lock_free_func [D:\Buildbot\amd64-windows\build\storage\innobase\sync\sync0rw.cc @ 254]
struct rw_lock_t * lock = 0x00000266`4d6a5368
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!pfs_rw_lock_free_func [D:\Buildbot\amd64-windows\build\storage\innobase\include\sync0rw.inl @ 612]
struct rw_lock_t * lock = 0x00000266`4d6a5368
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_block_free_mutexes [D:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0buf.cc @ 1197]
struct buf_block_t * block = 0x00000266`4bf42660
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_pool_t::close [D:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0buf.cc @ 1363]
unsigned int64 i = 0xb7a
struct buf_block_t * block = 0x00000266`4bf42660
struct buf_pool_t::chunk_t * chunk = 0x00000266`4718b3e8
class buf_pool_t * this = 0x00007fff`9f1b6780
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!innodb_shutdown [D:\Buildbot\amd64-windows\build\storage\innobase\srv\srv0start.cc @ 2099]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!innobase_end [D:\Buildbot\amd64-windows\build\storage\innobase\handler\ha_innodb.cc @ 4137]
class THD * thd = 0x00000000`00000000
struct handlerton * __formal = 0x00000266`49c49418
ha_panic_function __formal = HA_PANIC_CLOSE (0n0)
struct _db_stack_frame_ _db_stack_frame_ = struct _db_stack_frame_
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!ha_finalize_handlerton [D:\Buildbot\amd64-windows\build\sql\handler.cc @ 582]
struct st_plugin_int * plugin = 0x00000266`459aa430
struct handlerton * hton = 0x00000266`49c49418
struct _db_stack_frame_ _db_stack_frame_ = struct _db_stack_frame_
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!plugin_deinitialize [D:\Buildbot\amd64-windows\build\sql\sql_plugin.cc @ 1265]
struct st_plugin_int * plugin = 0x00000266`459aa430
bool ref_check = true
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!reap_plugins [D:\Buildbot\amd64-windows\build\sql\sql_plugin.cc @ 1341]
struct st_plugin_int ** list = 0x00000098`f69ef110
struct st_plugin_int ** reap = 0x00000098`f69ef1e8
unsigned int count = 0x42
struct st_plugin_int * plugin = 0x00000266`459aa430
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!plugin_shutdown [D:\Buildbot\amd64-windows\build\sql\sql_plugin.cc @ 2094]
unsigned int i = 0x42
unsigned int count = 0x42
struct _db_stack_frame_ _db_stack_frame_ = struct _db_stack_frame_
struct st_plugin_int ** plugins = 0xcccccccc`cccccccc
struct st_plugin_int * plugin = 0x00000266`49b2ec20
struct st_plugin_dl ** dl = 0xcccccccc`cccccccc
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!clean_up [D:\Buildbot\amd64-windows\build\sql\mysqld.cc @ 1996]
bool print_message = true
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!win_main [D:\Buildbot\amd64-windows\build\sql\mysqld.cc @ 5738]
int argc = 0n168
char ** argv = 0x00000266`459e1a60
char * user = 0x00000266`4dce9ab8 "--- memory read error at address 0x00000266`4dce9ab8 ---"
int ho_error = 0n0
unsigned int64 new_thread_stack_size = 0x49000
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!mysql_service [D:\Buildbot\amd64-windows\build\sql\mysqld.cc @ 5782]
void * p = 0x00000000`00000000
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!mysqld_main [D:\Buildbot\amd64-windows\build\sql\mysqld.cc @ 6000]
int argc = 0n28
char ** argv = 0x00000266`459b9940
char [512] file_path = char [512] "D:\Buildbot\amd64-windows\build\sql\Debug\mariadbd.exe"
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mariadbd!main [D:\Buildbot\amd64-windows\build\sql\main.cc @ 26]
int argc = 0n28
char ** argv = 0x00000266`459b9940
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
(Inline Function) --------`-------- mariadbd!invoke_main [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mariadbd!__scrt_common_main_seh [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
bool has_cctor = false
int main_result = <value unavailable>
<function> ** tls_init_callback = <value unavailable>
bool is_nested = <value unavailable>
<function> ** tls_dtor_callback = <value unavailable>
int main_result = <value unavailable>
__scrt_native_startup_state __scrt_current_native_startup_state = <value unavailable>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
kernel32!BaseThreadInitThunk
Unable to enumerate locals, Win32 error 0n318
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ntdll!RtlUserThreadStart
Unable to enumerate locals, Win32 error 0n318
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!my_sigabrt_handler [D:\Buildbot\amd64-windows\build\mysys\my_thr_init.c @ 465]
Processing 8 threads, please wait
.  0  Id: 1dec.4940 Suspend: 0 Teb: 00000098`f6a8b000 Unfrozen
      Priority: 0  Priority class: 32
ntdll!NtGetContextThread
0xffffffff`00000000
.  1  Id: 1dec.5700 Suspend: 0 Teb: 00000098`f6a8d000 Unfrozen
      Priority: 0  Priority class: 32
ntdll!NtWaitForWorkViaWorkerFactory
ntdll!RtlAcquireSRWLockExclusive
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
.  3  Id: 1dec.3c10 Suspend: 0 Teb: 00000098`f6a93000 Unfrozen
      Priority: 0  Priority class: 32
ntdll!ZwWaitForAlertByThreadId
ntdll!RtlSleepConditionVariableCS
KERNELBASE!SleepConditionVariableCS
server!pthread_cond_timedwait(struct _RTL_CONDITION_VARIABLE * cond = 0x00007fff`9e774b20, struct _RTL_CRITICAL_SECTION * mutex = 0x00007fff`9e774af0, struct timespec * abstime = 0x00000098`f6fffac8) [D:\Buildbot\amd64-windows\build\mysys\my_wincond.c @ 82]
server!inline_mysql_cond_timedwait(struct st_mysql_cond * that = 0x00007fff`9e774b20, struct st_mysql_mutex * mutex = 0x00007fff`9e774af0, struct timespec * abstime = 0x00000098`f6fffac8, char * src_file = 0x00007fff`9e40eb58 "--- memory read error at address 0x00007fff`9e40eb58 ---", unsigned int src_line = 0x141) [D:\Buildbot\amd64-windows\build\include\mysql\psi\mysql_thread.h @ 1259]
server!timer_handler(void * arg = 0x00000000`00000000) [D:\Buildbot\amd64-windows\build\mysys\thr_timer.c @ 322]
server!pfs_spawn_thread(void * arg = 0x00000266`49962578) [D:\Buildbot\amd64-windows\build\storage\perfschema\pfs.cc @ 2203]
server!pthread_start(void * p = 0x00000266`459c6be0) [D:\Buildbot\amd64-windows\build\mysys\my_winthread.c @ 61]
ucrtbase!o_exp
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
.  4  Id: 1dec.6590 Suspend: 0 Teb: 00000098`f6a95000 Unfrozen
      Priority: 0  Priority class: 32
ntdll!ZwWaitForAlertByThreadId
ntdll!RtlSleepConditionVariableCS
KERNELBASE!SleepConditionVariableCS
server!pthread_cond_timedwait(struct _RTL_CONDITION_VARIABLE * cond = 0x00007fff`9f11abb0, struct _RTL_CRITICAL_SECTION * mutex = 0x00007fff`9f11ab80, struct timespec * abstime = 0x00000098`f70ffc88) [D:\Buildbot\amd64-windows\build\mysys\my_wincond.c @ 82]
server!inline_mysql_cond_timedwait(struct st_mysql_cond * that = 0x00007fff`9f11abb0, struct st_mysql_mutex * mutex = 0x00007fff`9f11ab80, struct timespec * abstime = 0x00000098`f70ffc88, char * src_file = 0x00007fff`9e6c6650 "--- memory read error at address 0x00007fff`9e6c6650 ---", unsigned int src_line = 0x74) [D:\Buildbot\amd64-windows\build\include\mysql\psi\mysql_thread.h @ 1259]
server!my_service_thread_sleep(struct st_ma_service_thread_control * control = 0x00007fff`9e67f4c8, unsigned int64 sleep_time = 0x00000006`fc23ac00) [D:\Buildbot\amd64-windows\build\storage\maria\ma_servicethread.c @ 119]
server!ma_checkpoint_background(void * arg = 0x00000000`0000001e) [D:\Buildbot\amd64-windows\build\storage\maria\ma_checkpoint.c @ 725]
server!pfs_spawn_thread(void * arg = 0x00000266`49d32ba8) [D:\Buildbot\amd64-windows\build\storage\perfschema\pfs.cc @ 2203]
server!pthread_start(void * p = 0x00000266`459c6b00) [D:\Buildbot\amd64-windows\build\mysys\my_winthread.c @ 61]
ucrtbase!o_exp
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
Total threads: 8
Duplicate callstacks: 4 (windbg thread #s follow):
2, 5, 6, 7

Comment by Marko Mäkelä [ 2023-11-03 ]

Still fails:

10.5 ee773753778e741c1aff1449fcd41b775701a4db

innodb.undo_truncate '2,4k,innodb'       w24 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2023-10-30 18:50:15
line
231030 18:50:12 [ERROR] mysqld got exception 0x80000003 ;
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!ut_dbg_assertion_failed [D:\Buildbot\amd64-windows\build\storage\innobase\ut\ut0dbg.cc @ 61]
char * expr = 0x00007fff`81f17ef8 "lock->lock_word == X_LOCK_DECR"
char * file = 0x00007fff`81f18820 "D:\Buildbot\amd64-windows\build\storage\innobase\sync\sync0rw.cc"
unsigned int line = 0xfe
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!rw_lock_free_func [D:\Buildbot\amd64-windows\build\storage\innobase\sync\sync0rw.cc @ 254]
struct rw_lock_t * lock = 0x000002d8`bdf01658
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!pfs_rw_lock_free_func [D:\Buildbot\amd64-windows\build\storage\innobase\include\sync0rw.inl @ 612]
struct rw_lock_t * lock = 0x000002d8`bdf01658
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_block_free_mutexes [D:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0buf.cc @ 1197]
struct buf_block_t * block = 0x000002d8`bc7909a0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!buf_pool_t::close [D:\Buildbot\amd64-windows\build\storage\innobase\buf\buf0buf.cc @ 1363]
unsigned int64 i = 0xb8e
struct buf_block_t * block = 0x000002d8`bc7909a0
struct buf_pool_t::chunk_t * chunk = 0x000002d8`b79bb388
class buf_pool_t * this = 0x00007fff`83c0bc40
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
server!innodb_shutdown [D:\Buildbot\amd64-windows\build\storage\innobase\srv\srv0start.cc @ 2108]

I do not see stack traces of other threads that would execute inside InnoDB. I think that to analyze this, we would need a larger dump that includes a copy of the buffer pool, so that we would have some more clues on what might have failed to release the page latch.

Comment by Marko Mäkelä [ 2023-12-14 ]

Another debug run catches a forgotten page latch release:

10.5 fbd11d5f299a1e399ee5b46f141d6f56db0f7f58

innodb.undo_truncate '2,4k,innodb'       w3 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2023-10-13 01:08:30
line
Attempting backtrace. You can use the following information to find out
^ Found warnings in /dev/shm/var/3/log/mysqld.1.err
ok
 
 - found 'core.515' (0/0)
Core generated by '/usr/sbin/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New LWP 515]
[New LWP 518]
[New LWP 519]
[New LWP 542]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/var/3/my'.
Program terminated with signal 6, Aborted.
#0  0x00007f2fe284a9b1 in pthread_kill () from /lib64/libpthread.so.0
 
Thread 4 (Thread 0x7f2fe40b7700 (LWP 542)):
#0  0x00007f2fe1ca33fa in sigwaitinfo () from /lib64/libc.so.6
#1  0x00005621331cdc42 in my_sigwait (code=<synthetic pointer>, sig=0x7f2fe40b6a48, set=0x7f2fe40b6a50) at /usr/src/debug/MariaDB-/src_0/include/my_pthread.h:196
#2  signal_hand (arg=arg@entry=0x0) at /usr/src/debug/MariaDB-/src_0/sql/mysqld.cc:2999
#3  0x0000562133774ad0 in pfs_spawn_thread (arg=0x562139eacf88) at /usr/src/debug/MariaDB-/src_0/storage/perfschema/pfs.cc:2201
#4  0x00007f2fe2845e25 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f2fe1d6534d in clone () from /lib64/libc.so.6
 
Thread 3 (Thread 0x7f2fde9a7700 (LWP 519)):
#0  0x00007f2fe2849cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000562133c2fb6d in safe_cond_timedwait (cond=cond@entry=0x5621350eeea0 <COND_checkpoint>, mp=mp@entry=0x5621350eeee0 <LOCK_checkpoint>, abstime=abstime@entry=0x7f2fde9a6ab0, file=file@entry=0x562133ce6cb8 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.23/include/mysql/psi/mysql_thread.h", line=line@entry=1259) at /usr/src/debug/MariaDB-/src_0/mysys/thr_mutex.c:546
#2  0x00005621337099b7 in inline_mysql_cond_timedwait (src_line=116, src_file=0x562133eff188 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.23/storage/maria/ma_servicethread.c", abstime=0x7f2fde9a6ab0, mutex=0x5621350eeee0 <LOCK_checkpoint>, that=0x5621350eeea0 <COND_checkpoint>) at /usr/src/debug/MariaDB-/src_0/include/mysql/psi/mysql_thread.h:1259
#3  my_service_thread_sleep (control=control@entry=0x56213481bba0 <checkpoint_control>, sleep_time=sleep_time@entry=30000000000) at /usr/src/debug/MariaDB-/src_0/storage/maria/ma_servicethread.c:115
#4  0x0000562133701f16 in ma_checkpoint_background (arg=arg@entry=0x1e) at /usr/src/debug/MariaDB-/src_0/storage/maria/ma_checkpoint.c:725
#5  0x0000562133774ad0 in pfs_spawn_thread (arg=0x562135f17c08) at /usr/src/debug/MariaDB-/src_0/storage/perfschema/pfs.cc:2201
#6  0x00007f2fe2845e25 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f2fe1d6534d in clone () from /lib64/libc.so.6
 
Thread 2 (Thread 0x7f2fe41eb700 (LWP 518)):
#0  0x00007f2fe2849cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000562133c2fb6d in safe_cond_timedwait (cond=cond@entry=0x562135197c80 <COND_timer>, mp=mp@entry=0x562135197cc0 <LOCK_timer>, abstime=abstime@entry=0x7f2fe41eab20, file=file@entry=0x562133ce6cb8 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.23/include/mysql/psi/mysql_thread.h", line=line@entry=1259) at /usr/src/debug/MariaDB-/src_0/mysys/thr_mutex.c:546
#2  0x0000562133c301bd in inline_mysql_cond_timedwait (src_line=321, src_file=0x562134060e70 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.23/mysys/thr_timer.c", abstime=0x7f2fe41eab20, mutex=0x562135197cc0 <LOCK_timer>, that=0x562135197c80 <COND_timer>) at /usr/src/debug/MariaDB-/src_0/include/mysql/psi/mysql_thread.h:1259
#3  timer_handler (arg=arg@entry=0x0) at /usr/src/debug/MariaDB-/src_0/mysys/thr_timer.c:321
#4  0x0000562133774ad0 in pfs_spawn_thread (arg=0x562135f16d08) at /usr/src/debug/MariaDB-/src_0/storage/perfschema/pfs.cc:2201
#5  0x00007f2fe2845e25 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f2fe1d6534d in clone () from /lib64/libc.so.6
 
Thread 1 (Thread 0x7f2fe41eea80 (LWP 515)):
#0  0x00007f2fe284a9b1 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000562133c2ad64 in my_write_core (sig=6) at /usr/src/debug/MariaDB-/src_0/mysys/stacktrace.c:424
#2  0x000056213350cb6e in handle_fatal_signal (sig=6) at /usr/src/debug/MariaDB-/src_0/sql/signal_handler.cc:360
#3  <signal handler called>
#4  0x00007f2fe1ca21f7 in raise () from /lib64/libc.so.6
#5  0x00007f2fe1ca38e8 in abort () from /lib64/libc.so.6
#6  0x0000562133a35264 in ut_dbg_assertion_failed (expr=expr@entry=0x562133fe3820 "lock->lock_word == X_LOCK_DECR", file=file@entry=0x562133fe3758 "/home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.23/storage/innobase/sync/sync0rw.cc", line=line@entry=254) at /usr/src/debug/MariaDB-/src_0/storage/innobase/ut/ut0dbg.cc:60
#7  0x00005621339eaa2d in rw_lock_free_func (lock=lock@entry=0x56213ac6a318) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/sync0rw.cc:254
#8  0x0000562133a997b0 in pfs_rw_lock_free_func (lock=0x56213ac6a318) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sync0rw.inl:611
#9  buf_block_free_mutexes (block=block@entry=0x7f2fdc2e4138) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:1196
#10 0x0000562133aa677f in buf_pool_t::close (this=0x56213482b240 <buf_pool>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:1363
#11 0x00005621339de454 in innodb_shutdown () at /usr/src/debug/MariaDB-/src_0/storage/innobase/srv/srv0start.cc:2104
#12 0x000056213380d3c4 in innobase_end () at /usr/src/debug/MariaDB-/src_0/storage/innobase/handler/ha_innodb.cc:4137
#13 0x0000562133510571 in ha_finalize_handlerton (plugin=0x56213a1ac0b0) at /usr/src/debug/MariaDB-/src_0/sql/handler.cc:580
#14 0x00005621332bd81b in plugin_deinitialize (plugin=0x56213a1ac0b0, ref_check=ref_check@entry=true) at /usr/src/debug/MariaDB-/src_0/sql/sql_plugin.cc:1265
#15 0x00005621332bdef8 in reap_plugins () at /usr/src/debug/MariaDB-/src_0/sql/sql_plugin.cc:1341
#16 0x00005621332c0486 in plugin_shutdown () at /usr/src/debug/MariaDB-/src_0/sql/sql_plugin.cc:2093
#17 0x00005621331cd16f in clean_up (print_message=print_message@entry=true) at /usr/src/debug/MariaDB-/src_0/sql/mysqld.cc:1995
#18 0x00005621331d81cd in mysqld_main (argc=169, argv=0x562135f22ee0) at /usr/src/debug/MariaDB-/src_0/sql/mysqld.cc:5735
#19 0x00005621331cae2e in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/main.cc:25

In MDEV-33009, axel has reproduced hangs with innodb_undo_log_truncate=ON on 10.5 that can be explained by the fact that something forgets to release a page latch.

I see 3 ways to deal with this:

  1. Upgrade to 10.6 or later.
  2. Port MDEV-24142 to 10.5. (High risk and effort; why not just upgrade to 10.6?)
  3. Produce an rr replay trace of the failure (extremely unlikely to succeed) so that it can be analyzed.
Generated at Thu Feb 08 09:47:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.