[MDEV-28695] InnoDB: Database page corruption on disk or a failed read => mysqld got signal 11 - std::unique_lock<std::mutex>::unlock() Created: 2022-05-29  Updated: 2022-07-03  Resolved: 2022-07-03

Status: Closed
Project: MariaDB Server
Component/s: Locking, Platform Debian, Storage Engine - InnoDB
Affects Version/s: 10.5.15
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: A D Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: None
Environment:
  1. cat /etc/os-release
    PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
    ...
  2. uname -a
    Linux ... 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux
  3. mysqld --version
    mysqld Ver 10.5.15-MariaDB-0+deb11u1-log for debian-linux-gnu on x86_64 (Debian 11)


 Description   

I'm essentially facing two problems.
1)

Every few days I'm getting the following (db/table/col names manually obfuscated below):

2022-05-29  5:54:28 226 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './xxxxdb/bad_table.ibd' page [page id: space=386040, page number=1446430]. You may have to recover from a backup.
2022-05-29  5:54:28 226 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex ...
InnoDB: End of page dump
2022-05-29  5:54:28 226 [Note] InnoDB: Uncompressed page, stored checksum in field1 642196308, calculated checksums for field1: crc32 3586801619, innodb 1305855102,  page type 10 == BLOB.none 3735928559, stored checksum in field2 642196308, calculated checksums for field2: crc32 3586801619, innodb 1806045908, none 3735928559,  page LSN 3289 3616289816, low 4 bytes of LSN at page end 3616289816, page number (if stored to page already) 1446430, space id (if create with >= MySQL-4.1.1 and stored already) 386040
InnoDB: Page may be a BLOB page
2022-05-29  5:54:28 226 [Note] InnoDB:  You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
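As a sketch, the scan that the log message recommends can be run with either of the following (the database and table names mirror the obfuscated ones in the log above):

```shell
# Run the corruption scan the log message recommends.
# Database/table names mirror the obfuscated ones above.
mariadb -e "CHECK TABLE xxxxdb.bad_table EXTENDED;"

# Or scan every table in the schema in one pass:
mysqlcheck --check --extended xxxxdb
```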

I don't know if this is (i) a bug, (ii) failing memory, or (iii) a failing disk. It happens with a variety of tables, logging this error about 20 times per second for roughly 5 seconds, before...

Sometimes (I think the pattern probably has more to do with my application querying MariaDB than with MariaDB itself?) it then goes...

2022-05-27  0:03:55 353 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mysqld server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2022-05-27  0:03:55 353 [ERROR] mariadbd: Index for table 'bad_table' is corrupt; try to repair it

but other times it then hits the second, more serious, problem...

2)

220529  5:54:33 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
 
Server version: 10.5.15-MariaDB-0+deb11u1-log
key_buffer_size=402653184
read_buffer_size=2097152
max_used_connections=2
max_threads=153
thread_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1963808 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f9a5c000c58
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 0x7f9bdc054d78 thread_stack 0x30000
??:0(my_print_stacktrace)[0x55d6fdc5154e]
??:0(handle_fatal_signal)[0x55d6fd750f65]
sigaction.c:0(__restore_rt)[0x7f9be0732140]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdb455b4]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdb51a96]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaed4ea]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaed90c]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaf469d]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55d6fda37b42]
??:0(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function))[0x55d6fd756bd8]
??:0(cp_buffer_from_ref(THD*, TABLE*, st_table_ref*))[0x55d6fd5a8254]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x55d6fd59480e]
??:0(Item_bool_func2::remove_eq_conds(THD*, Item::cond_result*, bool))[0x55d6fd580fec]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x55d6fd5948a3]
??:0(JOIN::exec_inner())[0x55d6fd5bee38]
??:0(JOIN::exec())[0x55d6fd5bf295]
??:0(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55d6fd5bd116]
??:0(mysql_multi_update(THD*, TABLE_LIST*, List<Item>*, List<Item>*, Item*, unsigned long long, enum_duplicates, bool, st_select_lex_unit*, st_select_lex*, multi_update**))[0x55d6fd6109ed]
??:0(mysql_execute_command(THD*))[0x55d6fd55b0b4]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d6fd55c5db]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d6fd55ea5d]
??:0(do_command(THD*))[0x55d6fd5602de]
??:0(do_handle_one_connection(CONNECT*, bool))[0x55d6fd651fb2]
??:0(handle_one_connection)[0x55d6fd65222d]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55d6fd98e11b]
nptl/pthread_create.c:478(start_thread)[0x7f9be0726ea7]
x86_64/clone.S:97(__GI___clone)[0x7f9be033ddef]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f9a5c010470): update bad_table p, table2 n set p.n_col=n.n_col where p.colID=n.colID and n.col1=p.col1 and p.col='xx'
 
Connection ID (thread ID): 226
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /mnt/disk3/mysqldata
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             62978                62978                processes
Max open files            32768                32768                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       62978                62978                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
 
2022-05-29  5:54:38 0 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
2022-05-29  5:54:38 0 [Note] CONNECT: Version 1.07.0002 March 22, 2021
2022-05-29  5:54:38 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
2022-05-29  5:54:38 0 [Note] InnoDB: !!! innodb_force_recovery is set to 1 !!!
2022-05-29  5:54:38 0 [Note] InnoDB: Uses event mutexes
2022-05-29  5:54:38 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-05-29  5:54:38 0 [Note] InnoDB: Number of pools: 1
2022-05-29  5:54:38 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-05-29  5:54:39 0 [Note] InnoDB: Using Linux native AIO
2022-05-29  5:54:39 0 [Note] InnoDB: Initializing buffer pool, total size = 4294967296, chunk size = 134217728
2022-05-29  5:54:39 0 [Note] InnoDB: Completed initialization of buffer pool
2022-05-29  5:54:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14858023269921,14858023269921
2022-05-29  5:54:40 0 [Note] InnoDB: Starting final batch to recover 17480 pages from redo log.
2022-05-29  5:54:41 0 [Note] InnoDB: 128 rollback segments are active.
2022-05-29  5:54:41 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-05-29  5:54:41 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-05-29  5:54:41 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-05-29  5:54:41 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-05-29  5:54:41 0 [Note] InnoDB: 10.5.15 started; log sequence number 14858195549968; transaction id 5082853300
2022-05-29  5:54:41 0 [Note] InnoDB: Loading buffer pool(s) from /mnt/disk3/mysqldata/ib_buffer_pool
2022-05-29  5:54:41 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-05-29  5:54:41 0 [ERROR] mariadbd: Plugin 'CONNECT' already installed
2022-05-29  5:54:41 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-05-29  5:54:41 0 [Note] Reading of all Master_info entries succeeded
2022-05-29  5:54:41 0 [Note] Added new Master_info '' to hash table
2022-05-29  5:54:41 0 [Note] /usr/sbin/mariadbd: ready for connections.
Version: '10.5.15-MariaDB-0+deb11u1-log'  socket: '/run/mysqld/mysqld.sock'  port: 3306  Debian 11
2022-05-29  5:54:41 5 [Warning] ./sqlite3/xxxx.frm is inconsistent: engine typecode 44, engine name CONNECT (46)
2022-05-29  5:54:41 0 [Note] InnoDB: Buffer pool(s) load completed at 220529  5:54:41

Once MariaDB is running again, my application starts querying again and the process repeats. I usually stop my application and mysqlcheck/repair/restore the table before restarting it.
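The recovery cycle described above could be scripted roughly as follows. The service name and dump file are placeholders; note that InnoDB tables do not support REPAIR TABLE, so restoring the table from a backup is usually the step that actually fixes it:

```shell
# Sketch of the manual recovery cycle described above.
# "my-application" is a placeholder service name.
systemctl stop my-application

# Scan the table; InnoDB does not support REPAIR TABLE, so a
# corrupt table normally has to be restored from a dump/backup.
mysqlcheck --check --extended xxxxdb bad_table
mariadb xxxxdb < bad_table_backup.sql   # hypothetical dump file

systemctl start my-application
```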

The segfault is caused by a variety of different queries accessing the corrupt table(s) in different ways, but each time std::unique_lock<std::mutex>::unlock() appears to be the frame responsible for the segfault. It's particularly unfortunate when the resulting CHECK TABLE/OPTIMIZE TABLE causes a segfault too :-/ e.g.

Thread pointer: 0x7fa500003648
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa678090d78 thread_stack 0x30000
??:0(my_print_stacktrace)[0x55d8e417954e]
??:0(handle_fatal_signal)[0x55d8e3c78f65]
sigaction.c:0(__restore_rt)[0x7fa67c481140]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d8e3ffba71]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d8e3ffd13a]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55d8e3f77f15]
??:0(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool))[0x55d8e3b247cb]
??:0(mysql_recreate_table(THD*, TABLE_LIST*, bool))[0x55d8e3b25017]
??:0(MDL_ticket::~MDL_ticket())[0x55d8e3b84f3d]
??:0(MDL_ticket::~MDL_ticket())[0x55d8e3b86dcc]
??:0(Sql_cmd_optimize_table::execute(THD*))[0x55d8e3b87f0d]
??:0(mysql_execute_command(THD*))[0x55d8e3a80356]
??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d8e3a845db]
??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d8e3a86a5d]
??:0(do_command(THD*))[0x55d8e3a882de]
??:0(do_handle_one_connection(CONNECT*, bool))[0x55d8e3b79fb2]
??:0(handle_one_connection)[0x55d8e3b7a22d]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55d8e3eb611b]
nptl/pthread_create.c:478(start_thread)[0x7fa67c475ea7]
x86_64/clone.S:97(__GI___clone)[0x7fa67c08cdef]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fa500011cd0): OPTIMIZE TABLE `bad_table`
 
Connection ID (thread ID): 212
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

I don't suppose there's much that can easily be done to track down the first problem, but it would be great if the segfault could be fixed, so that at least MariaDB stays up and carries on handling queries against the non-corrupt tables.



 Comments   
Comment by Marko Mäkelä [ 2022-05-30 ]

Can you please try to produce a full stack trace with a debugger? The built-in stack trace report seems to produce garbage in this case.

There is a chance that this will be fixed in MDEV-13542 which I am currently working on, but I can’t say it for sure without seeing a correct stack trace.
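For reference, a full stack trace can typically be captured like this on Debian. The dbgsym package name is an assumption (it requires the Debian debug-symbol repository), and the resource-limit listing in the crash report above shows a soft core file size limit of 0, so core dumps must be enabled first:

```shell
# Raise the core size limit (the crash report shows a soft limit of 0),
# then after the next crash pull a full backtrace from the core file.
ulimit -c unlimited

# Debug symbols can be installed after the fact; the package name
# below is an assumption and needs the Debian debug repository enabled.
apt install gdb mariadb-server-core-10.5-dbgsym

gdb /usr/sbin/mariadbd /path/to/core \
    -ex "set pagination off" \
    -ex "thread apply all bt full" \
    -ex quit > full_backtrace.txt
```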

Comment by A D [ 2022-05-30 ]

I'll see what I can do. It depends on getting symbols/core files etc. set up (which they currently aren't) and then waiting for the first problem to happen again (fingers crossed it doesn't!). It may be a few days.

Is there also a bug to be fixed in the built-in stack trace report, given that it's not producing a usable stack trace?

Comment by Daniel Black [ 2022-06-01 ]

" ./sqlite3/xxxx.frm is inconsistent: engine typecode 44, engine name CONNECT (46)", wouldn't happen to the same badtable would it? 44=SEQUENCE.

If you have a core dump, the debug symbol packages can be installed after the crash. Do you have the recommended default of innodb_change_buffering=none?
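The setting Daniel asks about can be checked, and if needed changed, like this (innodb_change_buffering is dynamic, so it can also be set on a running server):

```shell
# Check the current change-buffering setting Daniel asks about.
mariadb -e "SELECT @@GLOBAL.innodb_change_buffering;"

# Set it to the recommended value (also persist it in my.cnf
# under [mysqld] as innodb_change_buffering=none).
mariadb -e "SET GLOBAL innodb_change_buffering=none;"
```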

Generated at Thu Feb 08 10:02:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.