[MDEV-16497] Server crashes on corrupted pages during DROP TABLE Created: 2018-06-15  Updated: 2018-06-18  Resolved: 2018-06-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.34
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jan Lindström (Inactive) Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: None

Attachments: File innodb-corrupted.test     File innodb-page-compression.inc    
Issue Links:
Duplicate
duplicates MDEV-13542 Crashing on a corrupted page is unhel... Closed
Relates
relates to MDEV-13103 InnoDB: Deal with page_compressed pag... Closed

 Description   

In the provided test we intentionally create several tables that use page compression and then corrupt the following:

  • compression method field
  • payload size
  • actual payload data

The expectation is naturally that the server will not crash. This expectation does not hold for all compression methods.
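The three corruption targets listed above can be illustrated with a minimal C++ sketch of header validation that returns an error instead of crashing. The header layout (offsets, field widths, the range of method ids) and every name below are hypothetical and chosen only for illustration; they are not the real InnoDB page_compressed format or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical header layout, for illustration only. These offsets are NOT
// the real InnoDB page_compressed format.
constexpr std::size_t kMethodOffset = 0;  // 1 byte: compression method id
constexpr std::size_t kSizeOffset = 1;    // 2 bytes, big-endian payload size
constexpr std::size_t kHeaderSize = 3;

// Hypothetical: method ids 0..6 are considered known.
constexpr std::uint8_t kMaxKnownMethod = 6;

enum class PageCheck { Ok, TooShort, UnknownMethod, BadPayloadSize };

// Validate the header fields and report a status code instead of asserting.
PageCheck check_compressed_page(const std::vector<std::uint8_t>& page) {
    if (page.size() < kHeaderSize) {
        return PageCheck::TooShort;
    }
    if (page[kMethodOffset] > kMaxKnownMethod) {
        return PageCheck::UnknownMethod;
    }
    const std::size_t payload =
        (static_cast<std::size_t>(page[kSizeOffset]) << 8) |
        page[kSizeOffset + 1];
    if (payload == 0 || payload > page.size() - kHeaderSize) {
        return PageCheck::BadPayloadSize;
    }
    return PageCheck::Ok;
}
```

A header check like this can only catch the first two kinds of corruption. Damage to the payload data itself passes these checks and would surface later, for example as a checksum mismatch or a decompression failure.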

2018-06-15 14:45:55 140214146956032 [ERROR] InnoDB: Unable to read tablespace 27 page no 3 into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. You can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.6/en/ for more details. Aborting...
2018-06-15 14:45:55 7f86266bfb00  InnoDB: Assertion failure in thread 140214146956032 in file ha_innodb.cc line 22025
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180615 14:45:55 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.34-MariaDB-debug
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63026 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f8619f95070
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f86266bf1a0 thread_stack 0x48400
/home/jan/mysql/10.1-bugs/sql/mysqld(my_print_stacktrace+0x38)[0x55968afcabec]
mysys/stacktrace.c:267(my_print_stacktrace)[0x55968a963fd1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11f50)[0x7f862635cf50]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f86235c7e7b]
linux/raise.c:51(__GI_raise)[0x7f86235c9231]
/home/jan/mysql/10.1-bugs/sql/mysqld(+0xa70592)[0x55968acc1592]
handler/ha_innodb.cc:22027(ib_logf(ib_log_level_t, char const*, ...))[0x55968aea9315]
buf/buf0buf.cc:3272(buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, dberr_t*))[0x55968ae66687]
include/btr0btr.ic:59(btr_block_get_func(unsigned long, unsigned long, unsigned long, unsigned long, char const*, unsigned long, dict_index_t*, mtr_t*))[0x55968ae667d4]
include/btr0btr.ic:124(btr_page_get(unsigned long, unsigned long, unsigned long, unsigned long, dict_index_t*, mtr_t*))[0x55968ae697d2]
btr/btr0btr.cc:1836(btr_free_but_not_root(unsigned long, unsigned long, unsigned long))[0x55968aed9590]
dict/dict0crea.cc:779(dict_drop_index_tree(unsigned char*, mtr_t*))[0x55968adff92a]
row/row0upd.cc:2697(row_upd_clust_step(upd_node_t*, que_thr_t*))[0x55968adffe7e]
row/row0upd.cc:2852(row_upd(upd_node_t*, que_thr_t*))[0x55968ae002f6]
row/row0upd.cc:3004(row_upd_step(que_thr_t*))[0x55968ad78f9e]
que/que0que.cc:1071(que_thr_step(que_thr_t*))[0x55968ad79286]
que/que0que.cc:1151(que_run_threads_low(que_thr_t*))[0x55968ad79415]
que/que0que.cc:1194(que_run_threads(que_thr_t*))[0x55968ad7969d]
que/que0que.cc:1277(que_eval_sql(pars_info_t*, char const*, unsigned long, trx_t*))[0x55968adc558f]
row/row0mysql.cc:4268(row_drop_table_for_mysql(char const*, trx_t*, bool, unsigned long, bool))[0x55968acb4583]
handler/ha_innodb.cc:13186(ha_innobase::delete_table(char const*))[0x55968a96ea90]
sql/handler.cc:4320(handler::ha_delete_table(char const*))[0x55968a969b35]
sql/handler.cc:2384(ha_delete_table(THD*, handlerton*, char const*, char const*, char const*, bool))[0x55968a7de55e]
sql/sql_table.cc:2469(mysql_rm_table_no_locks(THD*, TABLE_LIST*, bool, bool, bool, bool, bool))[0x55968a7dd77f]
sql/sql_table.cc:2084(mysql_rm_table(THD*, TABLE_LIST*, char, char))[0x55968a72b2cc]
sql/sql_parse.cc:4253(mysql_execute_command(THD*))[0x55968a734f33]
sql/sql_parse.cc:7449(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55968a723697]
sql/sql_parse.cc:1494(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x55968a722458]
sql/sql_parse.cc:1121(do_command(THD*))[0x55968a85bc6c]
sql/sql_connect.cc:1330(do_handle_one_connection(THD*))[0x55968a85b9bc]
sql/sql_connect.cc:1243(handle_one_connection)[0x55968ac8c49d]
nptl/pthread_create.c:463(start_thread)[0x7f86263525aa]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f8623689cbf]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f860c843088): drop table t11, t12, t13, t14, t15
Connection ID (thread ID): 2
Status: NOT_KILLED



 Comments   
Comment by Marko Mäkelä [ 2018-06-15 ]

It is well known that a corrupted page read typically causes the server to crash. For some reason, corrupted encrypted pages are treated as a special case (DB_DECRYPTION_FAILED) that avoids a crash at a low level; the server can instead crash later if a caller of buf_page_get_gen() does not check for a NULL return value.
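The caller-side check mentioned here can be sketched as follows. This is a simplified stand-in, not InnoDB code: Pool, Block, Err and free_pages are hypothetical names, and Pool::get only imitates a page getter that returns a null pointer and sets an error code when a page cannot be read.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

enum class Err { Success, PageCorrupted };

struct Block {
    std::uint32_t page_no;
};

// Hypothetical buffer pool: pages that were never added act as unreadable.
class Pool {
public:
    void add(std::uint32_t page_no) { blocks_[page_no] = Block{page_no}; }

    // Returns nullptr and sets *err when the page cannot be read.
    Block* get(std::uint32_t page_no, Err* err) {
        auto it = blocks_.find(page_no);
        if (it == blocks_.end()) {
            *err = Err::PageCorrupted;
            return nullptr;
        }
        *err = Err::Success;
        return &it->second;
    }

private:
    std::unordered_map<std::uint32_t, Block> blocks_;
};

// Walks pages first..last. On a null block it stops and propagates the
// error to its own caller instead of dereferencing the pointer.
Err free_pages(Pool& pool, std::uint32_t first, std::uint32_t last,
               std::uint32_t* freed) {
    *freed = 0;
    for (std::uint32_t p = first; p <= last; ++p) {
        Err err = Err::Success;
        Block* block = pool.get(p, &err);
        if (block == nullptr) {
            return err;
        }
        ++*freed;  // stands in for the real work done on block
    }
    return Err::Success;
}
```

The point of the sketch is only the control flow: every caller either handles the null return or passes the error code up, so an unreadable page becomes an error result rather than a crash.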

Comment by Jan Lindström (Inactive) [ 2018-06-18 ]

In the decryption case we want to avoid crashing when the wrong keys or methods were used, because the original persistent page is not really corrupted. For some reason, in my tests with the above test case, lz4 and lzo seem to be the most vulnerable to corruption (this could be because of better compression). In my opinion the server should not crash even on corrupted pages in normal tablespaces and should instead mark the table as corrupted. There are naturally cases, such as corruption of the system tablespace or an undo tablespace, where we cannot continue. In GA releases we can't make significant changes. In this case, marking the space as corrupted during DROP TABLE fails:

/* Try to set table as corrupted instead of
asserting. */
if (space > TRX_SYS_SPACE &&
    dict_set_corrupted_by_space(space)) {

The only way for that to happen is when the table has already been removed from the table_LRU list. In my opinion the user should be able to drop a table even when its pages are corrupted.
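The failure mode described in this comment can be modelled with a small C++ sketch. It is an illustration under assumptions, not the InnoDB implementation: Cache, Table, Action and on_unreadable_page are hypothetical names, kSystemSpace merely plays the role of TRX_SYS_SPACE, and the map stands in for the dictionary cache lookup that dict_set_corrupted_by_space() would need to succeed.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumption: plays the role of TRX_SYS_SPACE in the snippet above.
constexpr std::uint32_t kSystemSpace = 0;

struct Table {
    bool corrupted = false;
};

// Hypothetical dictionary cache keyed by tablespace id.
class Cache {
public:
    void add(std::uint32_t space) { tables_[space] = Table{}; }
    void evict(std::uint32_t space) { tables_.erase(space); }

    // Returns false when no cached table uses this space, so there is
    // nothing that could be flagged as corrupted.
    bool set_corrupted_by_space(std::uint32_t space) {
        auto it = tables_.find(space);
        if (it == tables_.end()) {
            return false;
        }
        it->second.corrupted = true;
        return true;
    }

    bool is_corrupted(std::uint32_t space) const {
        auto it = tables_.find(space);
        return it != tables_.end() && it->second.corrupted;
    }

private:
    std::unordered_map<std::uint32_t, Table> tables_;
};

enum class Action { MarkedCorrupted, MustAbort };

// Same shape as the condition quoted above: flag the table if possible,
// otherwise fall through to the abort path.
Action on_unreadable_page(Cache& cache, std::uint32_t space) {
    if (space > kSystemSpace && cache.set_corrupted_by_space(space)) {
        return Action::MarkedCorrupted;
    }
    return Action::MustAbort;
}
```

In this model, a table that has already been evicted from the cache, as can happen partway through DROP TABLE, takes the abort path even though its tablespace is an ordinary one, which mirrors the behaviour reported in this issue.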
