[MDEV-27865] buf_page_read_complete assertions during during innodb_init Created: 2022-02-16  Updated: 2022-06-07  Resolved: 2022-06-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.13
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Fedora 35


Issue Links:
Duplicate
duplicates MDEV-13542 Crashing on a corrupted page is unhel... Closed

 Description   

From: faf

20 instances from last report 2022-02-15.

Two stack traces, similar at frame 11 (buf_page_get_low) and lower.

Example:

faf report 314639

Frame # 	Function 	Binary 	Source or offset 	Line
1 	
kill
	/lib64/libc.so.6 	0x4298b
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
2 	
handle_fatal_signal
	/usr/libexec/mariadbd 	0x98265d
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
3 	
	/lib64/libc.so.6 	0x42750
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
4 	
__pthread_kill_implementation
	/lib64/libc.so.6 	0x8f84c
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
5 	
raise
	/lib64/libc.so.6 	0x426a6
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
6 	
abort
	/lib64/libc.so.6 	0x2c7d3
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
7 	
ib_wqueue_add(ib_wqueue_t*, void*, mem_block_info_t*, bool) [clone .cold]
	/usr/libexec/mariadbd 	0x598826
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
8 	
buf_page_read_complete(buf_page_t*, fil_node_t const&) [clone .cold]
	/usr/libexec/mariadbd 	0x59e056
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
9 	
buf_read_page_low(dberr_t*, fil_space_t*, bool, unsigned long, page_id_t, unsigned long, bool) [clone .lto_priv.0]
	/usr/libexec/mariadbd 	0xd5e967
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
10 	
buf_read_page(page_id_t, unsigned long)
	/usr/libexec/mariadbd 	0xd5f4c8
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
11 	
buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*, bool)
	/usr/libexec/mariadbd 	0xd5f71f
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
12 	
dict_boot()
	/usr/libexec/mariadbd 	0x5a0910
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
13 	
srv_start(bool)
	/usr/libexec/mariadbd 	0x592326
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
14 	
innodb_init(void*) [clone .lto_priv.0]
	/usr/libexec/mariadbd 	0xc3b939
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
15 	
ha_initialize_handlerton(st_plugin_int*)
	/usr/libexec/mariadbd 	0x983902
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
16 	
plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool)
	/usr/libexec/mariadbd 	0x74a3fe
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
17 	
plugin_init(int*, char**, int)
	/usr/libexec/mariadbd 	0x74f334
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
18 	
init_server_components()
	/usr/libexec/mariadbd 	0x6050ab
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
19 	
mysqld_main(int, char**)
	/usr/libexec/mariadbd 	0x60a2fb
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-
20 	
__libc_start_call_main
	/lib64/libc.so.6 	0x2d560
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
21 	
__libc_start_main@@GLIBC_2.34
	/lib64/libc.so.6 	0x2d60c
Build id: 644dac2c66a6e0b32674f0ec71e7431bd0c06a63 	-
22 	
_start
	/usr/libexec/mariadbd 	0x5f5a35
Build id: 7246c3e7348acc8b2e3e369599d9a84f006bae17 	-

The assertion was added in 2d98b967e316 / MDEV-26865

Report Build ID MariaDB version Stating point
faf report 314639 7246c3e7348acc8b2e3e369599d9a84f006bae17 10.5.13 dict_boot
faf report 352736 7246c3e7348acc8b2e3e369599d9a84f006bae17 10.5.13 trx_rseg_array_init

BuildID to MariaDB version

$ DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/ debuginfod-find executable 7246c3e7348acc8b2e3e369599d9a84f006bae17
 
/home/dan/.cache/debuginfod_client/7246c3e7348acc8b2e3e369599d9a84f006bae17/executable
 
$ chmod a+x /home/dan/.cache/debuginfod_client/7246c3e7348acc8b2e3e369599d9a84f006bae17/executable
 
$ /home/dan/.cache/debuginfod_client/7246c3e7348acc8b2e3e369599d9a84f006bae17/executable --version
/home/dan/.cache/debuginfod_client/7246c3e7348acc8b2e3e369599d9a84f006bae17/executable  Ver 10.5.13-MariaDB for Linux on x86_64 (MariaDB Server)

I don't have access to original stack traces. Just copying from faf with a little information extraction.



 Comments   
Comment by Marko Mäkelä [ 2022-02-17 ]

Thank you, this is an interesting concept.

The stack traces look invalid. The only callers of ib_wqueue_add() in mariadb-10.5.13 are add_msg() in fts0opt.cc and trx_finalize_for_fts() on transaction commit. The buffer pool (such as buf_page_read_complete()) does not directly interact with the fts_optimize_wq or dict_table_t::fts->add_wq.

Can you try to determine the correct stack traces?

Comment by Daniel Black [ 2022-02-17 ]

Yep, I see. Taking the compiler can take identical looking bits of code and make them the same, sure, calling ib_wqueue_add wrong.

Looking at offsets as authentic:

buf_page_read_complete 0x59e056 from faf report 314639

 
   0x000000000059e03a <+391>:	call   0x56c36c <_ZN2ib6loggerC2Ev.lto_priv.0>
   0x000000000059e03f <+396>:	lea    0xb7804a(%rip),%rsi        # 0x1116090
   0x000000000059e046 <+403>:	mov    %r14,%rdi
   0x000000000059e049 <+406>:	call   0x56de26 <_ZN2ib6loggerlsILm9EEERS0_RAT__Kc>
   0x000000000059e04e <+411>:	mov    %r14,%rdi
   0x000000000059e051 <+414>:	call   0x5987d2 <_ZN2ib5fatalD2Ev>
   0x000000000059e056 <+419>:	mov    $0x43,%r14d
 
(gdb) p ((char *)0x1116090)
$12 = 0x1116090 "Aborting because of a corrupt database page."

So the following call in buf_page_read_complete after this error is buf_corrupt_page_release. Would this be alias of ib_wqueue_add? Doesn't look similar enough, but I don't have a compiler mind, but at least it has a cold attribute and contains an abort. clone seems to be a copy of the function under specific preconditions (why a compiler would do that to a tiny cold function - no idea).

So a close as random un-reproducable corruption?

Comment by Marko Mäkelä [ 2022-02-17 ]

Thank you. It could simply be that the database was corrupted for whatever reason, and we failed to start up. Another case of MDEV-13542.

We could try to fix the crash (just make InnoDB fail to start up), possibly as part of MDEV-13542, or as a special case for this particular call trace.

Comment by Marko Mäkelä [ 2022-05-16 ]

I think that this crash should be prevented in the MDEV-13542 branch that is currently under testing. But, the development branch is based on 10.6. Porting and testing the changes for earlier major versions could involve a lot of risk and effort.

Generated at Thu Feb 08 09:56:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.