Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.2.22, 10.2.23
Description
The DB service stops intermittently, at irregular intervals (by our estimate mostly between 11pm and 9am), with the following error:
2019-03-28 23:31:08 0x7ff4dc62d700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.2.23/storage/innobase/rem/rem0rec.cc line 574
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
190328 23:31:08 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.23-MariaDB-10.2.23+maria~stretch
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=234
max_threads=752
thread_count=273
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1783435 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7ff4d8001f28
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7ff4dc62ccc8 thread_stack 0x49000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ffa09af5bfb]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7ffa09b7e437]
/lib/x86_64-linux-gnu/libc.so.6(+0xf7570)[0x7ffa09b7c570]
/lib/x86_64-linux-gnu/libc.so.6(+0xf93aa)[0x7ffa09b7e3aa]
/usr/sbin/mysqld(my_addr_resolve+0xe2)[0x55ca42284922]
/usr/sbin/mysqld(my_print_stacktrace+0x1bb)[0x55ca4226b1eb]
/usr/sbin/mysqld(handle_fatal_signal+0x41d)[0x55ca41d0a01d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7ffa0b4180e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7ffa09ab7fff]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7ffa09ab942a]
/usr/sbin/mysqld(+0x40f971)[0x55ca41ab8971]
/usr/sbin/mysqld(+0x887df6)[0x55ca41f30df6]
/usr/sbin/mysqld(+0x863673)[0x55ca41f0c673]
/usr/sbin/mysqld(+0x96648e)[0x55ca4200f48e]
/usr/sbin/mysqld(+0x89b559)[0x55ca41f44559]
/usr/sbin/mysqld(+0x8a15e4)[0x55ca41f4a5e4]
/usr/sbin/mysqld(+0x8a2187)[0x55ca41f4b187]
/usr/sbin/mysqld(+0x8b1a20)[0x55ca41f5aa20]
/usr/sbin/mysqld(+0x7f5c04)[0x55ca41e9ec04]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0x107)[0x55ca41d140d7]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x72)[0x55ca41b4b992]
/usr/sbin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0x1206)[0x55ca41b560f6]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3f68)[0x55ca41b6bee8]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x28a)[0x55ca41b70e4a]
/usr/sbin/mysqld(+0x4c864f)[0x55ca41b7164f]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1a7c)[0x55ca41b737fc]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x176)[0x55ca41b748a6]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x25a)[0x55ca41c3ec0a]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x55ca41c3ed7d]
/usr/sbin/mysqld(+0xb75791)[0x55ca4221e791]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7ffa0b40e4a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ffa09b6dd0f]
======= Memory map: ========
55ca416a9000-55ca4278f000 r-xp 00000000 fe:02 1320586 /usr/sbin/mysqld
55ca4298e000-55ca42a60000 r--p 010e5000 fe:02 1320586 /usr/sbin/mysqld
55ca42a60000-55ca42b17000 rw-p 011b7000 fe:02 1320586 /usr/sbin/mysqld
55ca42b17000-55ca433a9000 rw-p 00000000 00:00 0
55ca43458000-55ca508b9000 rw-p 00000000 00:00 0 [heap]
7ff3d9e04000-7ff3d9e05000 ---p 00000000 00:00 0
7ff3d9e05000-7ff3d9e4f000 rw-p 00000000 00:00 0
7ff3d9e4f000-7ff3d9e50000 ---p 00000000 00:00 0
...
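Since the crashes are irregular, it may help to collect every assertion failure from the error logs of all nodes and compare the timestamps and source locations. The following is only a minimal sketch: it assumes the assertion line always has the shape shown in the log above, and the function name `find_assertions` is made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2019-03-28 23:31:08 0x7ff4dc62d700 InnoDB: Assertion failure in file <path> line 574
ASSERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ InnoDB: "
    r"Assertion failure in file (?P<file>\S+) line (?P<line>\d+)"
)

def find_assertions(log_lines):
    """Return (timestamp, source file, line number) for each assertion failure."""
    hits = []
    for text in log_lines:
        m = ASSERT_RE.match(text.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            hits.append((ts, m.group("file"), int(m.group("line"))))
    return hits
```

Feeding it the lines of each node's error log yields a list that can be sorted by timestamp to see whether the failures cluster at particular hours or always hit the same source line.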
- The DB node that crashed first tonight (db1b) was set up yesterday (running 10.2.23).
- The second server (db1c) crashed 70 seconds later, after becoming the primary DB node and during the reconnect/resync (SST) of db1b (last update 3 weeks ago from 10.2.14 to 10.2.22).
- The third DB node, db1a, had not been running since the weekend and was therefore not involved in tonight's crash.
It seems that the processes keep running in some form, but the service (systemctl status mariadb.service) is in a failed state and nobody can connect to or use the DB services. Furthermore, the process has to be killed manually.
After that I could rebuild the cluster by bootstrapping a new Galera Cluster (galera_new_cluster) from the most advanced node.
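One way to decide which node is "most advanced" is to compare the seqno recorded in each node's grastate.dat. The sketch below assumes the usual `key: value` layout of that file; the helper names and the sample values in the usage note are hypothetical. After an unclean shutdown the seqno is often -1, in which case the position would first have to be recovered (for example with mysqld --wsrep-recover) before a comparison like this is meaningful.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state

def most_advanced(states):
    """states maps node name -> grastate.dat contents.

    Returns (node, seqno) for the highest known seqno, or (None, -1)
    if every node reports an unknown position (seqno -1).
    """
    best_node, best_seqno = None, -1
    for node, text in states.items():
        seqno = int(parse_grastate(text).get("seqno", "-1"))
        if seqno > best_seqno:
            best_node, best_seqno = node, seqno
    return best_node, best_seqno
```

For example, given contents with seqno 100 for db1b and seqno 105 for db1c (made-up numbers), `most_advanced` returns `("db1c", 105)`, which would be the node to run galera_new_cluster on.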
Issue Links
- relates to MDEV-13542 Crashing on a corrupted page is unhelpful (Closed)
InnoDB is crashing because it detects corruption in a record header of a table that does not use ROW_FORMAT=REDUNDANT (which was the original and only format before MySQL 5.0.3). The function rec_get_offsets_func() is reading the 3 bits from rec_get_status(), and expects the most significant bit of it to be zero.
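The check described above can be illustrated roughly as follows. This is a simplified sketch in Python, not the InnoDB source: the header offset (3 bytes before the record origin), the mask 0x7 and the four status names are assumptions based on the non-REDUNDANT record header layout, and the function names are invented for the example.

```python
# Assumed layout: the status occupies the low 3 bits of the byte located
# 3 bytes before the record origin (non-REDUNDANT row formats).
REC_NEW_STATUS_OFFSET = 3
REC_NEW_STATUS_MASK = 0x7

# Only four status values are defined, so the most significant of the
# three bits must be zero in an intact record.
STATUS_NAMES = {
    0: "ordinary",
    1: "node pointer",
    2: "infimum",
    3: "supremum",
}

def rec_status(page: bytes, rec_origin: int) -> int:
    """Read the 3-bit status field of the record starting at rec_origin."""
    return page[rec_origin - REC_NEW_STATUS_OFFSET] & REC_NEW_STATUS_MASK

def describe_status(status: int) -> str:
    """Name a status value, flagging values with the top bit set."""
    if status & 0x4:
        return "corrupted"
    return STATUS_NAMES[status]
```

A record whose status byte reads 0x05 would be reported as "corrupted" here; in the server, hitting such a value leads to the assertion failure shown in the log instead of a graceful error.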
In MDEV-13542 we should make the error handling of InnoDB more robust, so that crashes like this would be avoided.
But the cause of the corruption is unclear and cannot be found without a repeatable test case. I fixed some corruption bugs recently (MDEV-14126 just missed the 10.2.23 release), but I don’t think those fixes would affect this kind of corruption.
I would suggest using innodb_checksum_algorithm=strict_crc32 to improve the chances of detecting this kind of corruption earlier, when reading pages into the buffer pool. If corrupted pages enter the buffer pool, InnoDB can crash in very many places.
There could be a problem in the snapshot transfer (SST) procedure, or the corruption could simply have been propagated by the SST. That is the downside of physical replication.
If you can provide more information, such as page dumps (which could be obtained by using a debugger), I can try to help further.