[MDEV-8877] Inrecoverable InnoDB/XtraDB database after power failure (hard crash) Created: 2015-09-30  Updated: 2015-11-06  Resolved: 2015-11-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1.7
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Norbert van Nobelen Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: innodb
Environment:

OpenSuse 12.1, ZFS 0.6.5, innodb doublewrite=off (ZFS takes care of consistency). ZFS config: atime=off, compress=lz4, recordsize=8k (to get rid of 16k page unsupported warnings), l2arc, logs on separate SSD devices, database on traditional HD. MariaDB 10.1.7 binary download (https://downloads.mariadb.org/interstitial/mariadb-10.1.7/bintar-linux-glibc_214-x86_64/mariadb-10.1.7-linux-glibc_214-x86_64.tar.gz), compressed and partitioned tables (barracuda file format in use)


Issue Links:
Relates
relates to MDEV-8963 innodb_force_recovery >= 4 won't work Closed

 Description   

After a power loss, the InnoDB/XtraDB database is not recoverable.

2015-09-30 20:17:29 140193769776992 [Note] /opt/mariadb-10.1.7-linux-x86_64/bin/mysqld (mysqld 10.1.7-MariaDB-log) starting as process 2319 ...
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: The InnoDB memory heap is disabled
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Memory barrier is not used
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Not using CPU crc32 instructions
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Initializing buffer pool, size = 256.0M
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Completed initialization of buffer pool
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Highest supported file format is Barracuda.
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Log scan progressed past the checkpoint lsn 84533366099
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Database was not shutdown normally!
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Starting crash recovery.
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: Restoring possible half-written data pages
2015-09-30 20:17:29 140193769776992 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 84538608640
InnoDB: Doing recovery: scanned up to log sequence number 84543851520
InnoDB: Doing recovery: scanned up to log sequence number 84549094400
InnoDB: Set innodb_force_recovery to ignore this error.
2015-09-30 20:17:30 7f8167d8e760 InnoDB: Assertion failure in thread 140193769776992 in file log0recv.cc line 2846
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150930 20:17:30 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.7-MariaDB-log
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 371998 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbe66be]
sql/signal_handler.cc:160(handle_fatal_signal)[0x754cc4]
/lib64/libpthread.so.0(+0xfd00)[0x7f816796fd00]
/lib64/libc.so.6(gsignal+0x35)[0x7f8166617d95]
/lib64/libc.so.6(abort+0x17b)[0x7f81666192ab]
log/log0recv.cc:2846(recv_scan_log_recs(unsigned long, unsigned long, unsigned char const*, unsigned long, unsigned long, unsigned long*, unsigned long*, dberr_t*))[0x93f15f]
log/log0recv.cc:3057(recv_group_scan_log_recs)[0x9401d7]
srv/srv0start.cc:2544(innobase_start_or_create_for_mysql())[0x9c13dc]
handler/ha_innodb.cc:4106(innobase_init)[0x8ecd90]
sql/handler.cc:513(ha_initialize_handlerton(st_plugin_int*))[0x75860e]
sql/sql_plugin.cc:1407(plugin_initialize)[0x5ba850]
sql/sql_plugin.cc:1684(plugin_init(int*, char**, int))[0x5bc5c2]
sql/mysqld.cc:5195(init_server_components)[0x518ece]
sql/mysqld.cc:5758(mysqld_main(int, char**))[0x51a7f4]
/lib64/libc.so.6(__libc_start_main+0xed)[0x7f816660423d]
/opt/mariadb-10.1.7-linux-x86_64/bin/mysqld[0x50fe49]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Elena Stepanova [ 2015-09-30 ]

InnoDB: Doing recovery: scanned up to log sequence number 84549094400
InnoDB: Set innodb_force_recovery to ignore this error.

Did you try using innodb_force_recovery as the message in the log suggests?

Comment by Norbert van Nobelen [ 2015-10-01 ]

No, I will once the database server is available again: Created a snapshot of the corrupt db, and then started over from a backup. It is an estimated 14 day processing run, so I will get back on this once this new run is over.

Comment by Norbert van Nobelen [ 2015-10-12 ]

Second server same issue.
Innodb recovery (levels 1-5 tested) is not able to recover the database at all, mariadb mysqld will not start.
Linux version: OpenSuse 12.3, filesystem zfs 0.6.3.

No other way to restore the database then from backup.

151012 18:41:47 [Note] InnoDB: Using mutexes to ref count buffer pool pages
151012 18:41:47 [Note] InnoDB: The InnoDB memory heap is disabled
151012 18:41:47 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
151012 18:41:47 [Note] InnoDB: Memory barrier is not used
151012 18:41:47 [Note] InnoDB: Compressed tables use zlib 1.2.3
151012 18:41:47 [Note] InnoDB: Using CPU crc32 instructions
151012 18:41:47 [Note] InnoDB: Initializing buffer pool, size = 2.0G
151012 18:41:47 [Note] InnoDB: Completed initialization of buffer pool
151012 18:41:47 [Note] InnoDB: Highest supported file format is Barracuda.
151012 18:41:47 [Note] InnoDB: Log scan progressed past the checkpoint lsn 1998060488869
151012 18:41:47 [Note] InnoDB: Database was not shutdown normally!
151012 18:41:47 [Note] InnoDB: Starting crash recovery.
151012 18:41:47 [Note] InnoDB: Reading tablespace information from the .ibd files...
151012 18:41:47 [Note] InnoDB: Restoring possible half-written data pages
151012 18:41:47 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1998061776360
InnoDB: Error: trying to access page number 29124520 in space 0,
InnoDB: space name /tank/mysql16k/ibdata1,
InnoDB: which is outside the tablespace bounds.
InnoDB: Byte offset 0, len 16384, i/o type 10.
InnoDB: If you get this error at mysqld startup, please check that
InnoDB: your my.cnf matches the ibdata files that you have in the
InnoDB: MySQL server.
2015-10-12 18:41:47 7f3e7d165780 InnoDB: Assertion failure in thread 139906363316096 in file fil0fil.cc line 5510
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
151012 18:41:47 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.0.17-MariaDB-log
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 372042 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbc4eee]
sql/signal_handler.cc:153(handle_fatal_signal)[0x71c4dc]
/lib64/libpthread.so.0(+0xf1f0)[0x7f3e7cd521f0]
/lib64/libc.so.6(gsignal+0x35)[0x7f3e7bb783d5]
/lib64/libc.so.6(abort+0x148)[0x7f3e7bb79858]
fil/fil0fil.cc:5572(_fil_io(unsigned long, bool, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, void*, void*, trx_t*))[0xa09a85]
buf/buf0rea.cc:239(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long, trx_t*))[0x9d3eeb]
include/ut0counter.h:148(ib_counter_t<unsigned long, 1, single_indexer_t>::add(unsigned long))[0x9d4f43]
buf/buf0buf.cc:2756(buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*))[0x9bdab5]
include/trx0undo.ic:174(trx_undo_page_get)[0x989250]
trx/trx0rseg.cc:213(trx_rseg_mem_create)[0x97d24f]
trx/trx0rseg.cc:289(trx_rseg_create_instance)[0x97d553]
trx/trx0sys.cc:521(trx_sys_init_at_db_start())[0x97f976]
srv/srv0start.cc:2527(innobase_start_or_create_for_mysql())[0x96994d]
handler/ha_innodb.cc:3802(innobase_init)[0x8a8452]
sql/handler.cc:501(ha_initialize_handlerton(st_plugin_int*))[0x72157e]
sql/sql_plugin.cc:1378(plugin_initialize)[0x5a53a8]
sql/sql_plugin.cc:1613(plugin_init(int*, char**, int))[0x5a6666]
sql/mysqld.cc:4820(init_server_components)[0x50e78c]
sql/mysqld.cc:5414(mysqld_main(int, char**))[0x50f773]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f3e7bb64a15]
/tank/mariadb/mariadb-10.0.17-linux-x86_64/bin/mysqld[0x503d09]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Comment by Elena Stepanova [ 2015-10-18 ]

Norbert van Nobelen,
With which value of innodb_force_recovery was the last stack trace produced?
By "MariaDB won't start", do you mean that it does not start with innodb_force_recovery, or that it does not start afterwards?
Please note that high values of innodb_force_recovery do not actually restore the database, they are just supposed to allow you to connect and retrieve your data.

Upd: regarding non-functional innodb_force_recovery, it turns out there is an upstream bug https://bugs.mysql.com/bug.php?id=77654 which was brought to our attention in recently filed MDEV-8963.

Comment by Axel Schwenke [ 2015-10-19 ]

That second case looks like a corruption of either table space or redo log. For the first case the assertion in line 2846 of log0recv.cc is for a garbled redo log block.

So at a first glance it looks as if the "atomic writes" guarantee does not hold for ZFS on Linux. Which would make the whole thing a ZFS bug. If the second one is also for a garbled redo log, it might not even be related to the double write buffer setting.

Norbert: can you share your my.cnf? Especially what InnoDB flushing method and flush log at trx commit setting you use.

Comment by Norbert van Nobelen [ 2015-10-21 ]

@Elana Stepanova:

The posted log is generated with innodb_force_recovery=0. Once using 4 or higher the InnoDB is just not loaded anymore and mariadb will start, but just without my database and some errors in the information schema.

The stack trace is pretty useless (I realize) since it just tells what is the result of the crash (data loss), but in the sceond case not the cause of the crash. The second machine I have the issue on did not crash, only mariadb crashed.
Load: Reasonable, about 3-4 with moments of low load.

Comment by Norbert van Nobelen [ 2015-10-21 ]

@Axel Schwenke:

I have been running MySQL with ZFS for about 6 years now, without any issues, with a well researched ZFS suitable configuration. ZFS is as far as I know the only FS where doublewrite can be turned off since it already takes care of data consistency (atomic). I actually used a backup script which just snapshots the running database without quieting the database for a moment and use the InnoDB recovery to get me a running database again. (Sadly I did not have this script running on these servers at the time of the crashes, else I would have had more chance of saving my data).

Please Google on this reported InnoDB failure: There are a few more posts on stackoverflow and other sites with the same crash type: MariaDB crashes (either due to system crash or by itself) and destroys the InnoDB database with it, so it seems to occur in multiple configurations with either ZFS or other FS.

My my.cnf on crash 1 (opensuse 12.1, original post) (relevant parts only):

[mysqld]
port            = 3306
socket          = /var/run/mysql/mysql.sock
datadir = /mariadb/databases
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 2000
sort_buffer_size = 2048K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 2048K
myisam_sort_buffer_size = 2M
query_cache_type=1
query_cache_size=4M
log-bin=mysql-bin
binlog_format=mixed
sync_binlog=0
server-id       = 1
tmpdir          = /tmp
 
innodb_file_format=BARRACUDA
innodb_data_home_dir = /mariadb/databases
innodb_log_group_home_dir = /mariadb/log
innodb_buffer_pool_size = 256M
innodb_log_file_size = 32M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method=O_DSYNC
innodb_use_native_aio=0
innodb_doublewrite=0
innodb_change_buffering=all
innodb_io_capacity=150

My my.cnf on crash 2 (opensuse 12.3, original post) (relevant parts only):

[client]
#password       = your_password
port            = 3306
socket          = /var/run/mysql/mysql.sock
 
# The MySQL server
[mysqld]
port            = 3306
socket          = /var/run/mysql/mysql.sock
datadir = /tank/mysql_n
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 256
sort_buffer_size = 2048K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 2048K
myisam_sort_buffer_size = 2M
query_cache_type=1
query_cache_size=4M
log-bin=mysql-bin
binlog_format=mixed
sync_binlog=0
server-id       = 1
tmpdir          = /tank/tmp
innodb_file_format=BARRACUDA
innodb_data_home_dir = /tank/mysql16k_n
innodb_log_group_home_dir = /tank/mysql-log_n
innodb_buffer_pool_size = 2048M
innodb_log_file_size = 32M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method=O_DSYNC
innodb_use_native_aio=0
innodb_doublewrite=0
innodb_change_buffering=all
innodb_io_capacity=150
innodb_force_recovery = 0

Sample of the data model (1 unrecoverable table even in Barracuda file format, import ibd file after recovery of the .frm file, table size ~122GB):

CREATE TABLE `pagecache` (
  `pagecache_id` int(11) NOT NULL AUTO_INCREMENT,
  `url` VARCHAR(400) DEFAULT NULL,
  `documentcontent` LONGTEXT DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `md5sum` CHAR(32) DEFAULT NULL,
  `status_code` mediumint(9) DEFAULT NULL,
  PRIMARY KEY (`pagecache_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

About the after crash situation:
Personally I would opt for recovery with data loss:
99.99% of data in this table was most likely OK, just a few pages are corrupted. If innodb_force_recovery=X (assign a number) would recover the data and drop a few pages, rebuild indexes (since it is unknown what data is gone, that is most likely required), it at least gives a running technically consistent database. It is then up to the user/admin to decide if this is an OK situation.

About the crash cause: Sadly I do not have more logging on the cause of the crash (especially in the second situation).

Comment by Elena Stepanova [ 2015-10-26 ]

jplindst,

So, we have (at least) two problems here:

1) innodb_force_recovery not working: it might be related https://bugs.mysql.com/bug.php?id=77654 or different. If it's the upstream bug, it's up to you to decide whether we will wait for their bugfix or not, but it seems strange to me that it has been staying open for quite a while by now – not being able to recover the data at all seems as critical as it gets;

2) second crash which was not caused by power failure: apparently we don't have a stack trace from the initial crash, but there is one when InnoDB fails to recover after it, see comments – maybe you will recognize something familiar in here.

Comment by Jan Lindström (Inactive) [ 2015-10-28 ]

1) Will be fixed on next release, just use innodb_plugin for recovery (not sure if xtradb 5.6.27 is out by then)

Generated at Thu Feb 08 07:30:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.