Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Duplicate
-
10.3.13, 10.3.14, 10.3.15
-
Centos 7
Description
This will be a bit vague bug report, but I'll file it anyhow, as I think it relates to a couple of other reports and might give additional insight. We run a 5 node galera cluster, 1.1TB of data, quite heavy read/write during usage and high parallellism. This runs on Centos7.
The core of the issue are random crashes, giving either terminations on signal 6 or signal 11. The signal 6 ones are always intentionally crashed by innodb as it detects some sort of corruption, the signal 11s mostly come after it reports on a corrupted index, and then crashes without info a couple of seconds later.
In retrospect, we believe it started while we were at version 10.3.14. The first manifestation was with this stacktrace on one of our nodes:
2019-05-23 19:53:16 4 [ERROR] [FATAL] InnoDB: Rec offset 99, cur1 offset 14135, cur2 offset 16209
|
190523 19:53:16 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
Server version: 10.3.14-MariaDB-log
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=485
|
max_threads=1502
|
thread_count=179
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3432903 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
Thread pointer: 0x7f55780009a8
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f558cff8c30 thread_stack 0x49000
|
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
|
======= Backtrace: =========
|
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f6c417a19e7]
|
/lib64/libc.so.6(+0x115b62)[0x7f6c4179fb62]
|
/lib64/libc.so.6(+0x117947)[0x7f6c417a1947]
|
/usr/sbin/mysqld(my_addr_resolve+0xda)[0x55cf5d7408fa]
|
/usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x55cf5d729f92]
|
/usr/sbin/mysqld(handle_fatal_signal+0x357)[0x55cf5d1c9cc7]
|
/lib64/libpthread.so.0(+0xf5d0)[0x7f6c433ec5d0]
|
/lib64/libc.so.6(gsignal+0x37)[0x7f6c416c0207]
|
/lib64/libc.so.6(abort+0x148)[0x7f6c416c18f8]
|
/usr/sbin/mysqld(+0xa6c663)[0x55cf5d4b3663]
|
/usr/sbin/mysqld(+0x99b801)[0x55cf5d3e2801]
|
/usr/sbin/mysqld(+0xa7656e)[0x55cf5d4bd56e]
|
/usr/sbin/mysqld(+0xa79b1d)[0x55cf5d4c0b1d]
|
/usr/sbin/mysqld(+0xa7b016)[0x55cf5d4c2016]
|
/usr/sbin/mysqld(+0xa8b94e)[0x55cf5d4d294e]
|
Server restarted and continued happily after that. We didn't find a cause, and blamed it as a freak accident.
However, after a while, we started seeing issues on our backups. We take the backups with mariabackup, and part of the backup process is to actually restore the full backup again and verify that it load and runs fine on a test instance. Part of the verification is running a mysqlcheck. We had a couple of tables reporting that they were corrupt, but an optimize fixed this (secondary indexes being corrupt). We blamed this on running an older version of mariabackup on the recovery node to apply the logs, but think this might be unrelated.
Things got worse last week. We had a crash on 3 of our nodes, exactly at the same time, with exactly the same error on all three nodes (this is probably galera pushing a change to the same table on all nodes):
2019-06-13 13:23:18 9 [ERROR] [FATAL] InnoDB: Rec offset 99, cur1 offset 11337, cur2 offset 16190
|
190613 13:23:18 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
Server version: 10.3.15-MariaDB-log
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=301
|
max_threads=1502
|
thread_count=138
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3433196 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
Thread pointer: 0x7f06940009a8
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f18682b8d60 thread_stack 0x49000
|
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
|
======= Backtrace: =========
|
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f1d770279e7]
|
/lib64/libc.so.6(+0x115b62)[0x7f1d77025b62]
|
/lib64/libc.so.6(+0x117947)[0x7f1d77027947]
|
/usr/sbin/mysqld(my_addr_resolve+0xda)[0x5574e38f647a]
|
/usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x5574e38dfab2]
|
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x5574e337c53f]
|
/lib64/libpthread.so.0(+0xf5d0)[0x7f1d78c735d0]
|
/lib64/libc.so.6(gsignal+0x37)[0x7f1d76f46207]
|
/lib64/libc.so.6(abort+0x148)[0x7f1d76f478f8]
|
/usr/sbin/mysqld(+0xa6c223)[0x5574e366a223]
|
/usr/sbin/mysqld(+0x99ab61)[0x5574e3598b61]
|
/usr/sbin/mysqld(+0xa760be)[0x5574e36740be]
|
/usr/sbin/mysqld(+0xa83fde)[0x5574e3681fde]
|
/usr/sbin/mysqld(+0xa8e718)[0x5574e368c718]
|
/usr/sbin/mysqld(+0xa0da82)[0x5574e360ba82]
|
/usr/sbin/mysqld(+0xa12f99)[0x5574e3610f99]
|
/usr/sbin/mysqld(+0xa13c44)[0x5574e3611c44]
|
/usr/sbin/mysqld(+0x9e497b)[0x5574e35e297b]
|
/usr/sbin/mysqld(+0x92042c)[0x5574e351e42c]
|
/usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0x1c2)[0x5574e3388032]
|
/usr/sbin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x291)[0x5574e3480ca1]
|
2019-06-13 13:23:19 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.53285S), skipping check
|
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x24c)[0x5574e347425c]
|
/usr/sbin/mysqld(wsrep_apply_cb+0x4ac)[0x5574e32f8e9c]
|
/usr/lib64/galera/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xd8)[0x7f1d73314598]
|
/usr/lib64/galera/libgalera_smm.so(+0x1de753)[0x7f1d73351753]
|
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x15c)[0x7f1d733548ac]
|
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x10e)[0x7f1d73357a8e]
|
/usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x1b8)[0x7f1d73332f38]
|
/usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x4c)[0x7f1d7333462c]
|
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x6b)[0x7f1d7335806b]
|
/usr/lib64/galera/libgalera_smm.so(galera_recv+0x18)[0x7f1d73366068]
|
/usr/sbin/mysqld(+0x6fbb49)[0x5574e32f9b49]
|
/usr/sbin/mysqld(start_wsrep_THD+0x29e)[0x5574e32ed58e]
|
/lib64/libpthread.so.0(+0x7dd5)[0x7f1d78c6bdd5]
|
/lib64/libc.so.6(clone+0x6d)[0x7f1d7700dead]
|
======= Memory map: ========
|
5574e2bfe000-5574e3eb1000 r-xp 00000000 ca:01 12760256 /usr/sbin/mysqld
|
5574e40b0000-5574e41af000 r--p 012b2000 ca:01 12760256 /usr/sbin/mysqld
|
We recovered from this issue, and further investigation gave some tables which had the index corruption (redacted some names):
db_a.table_1
|
Warning : InnoDB: Index IDX_33BF90829E6B1585 is marked as corrupted
|
Warning : InnoDB: Index <redacted> is marked as corrupted
|
Warning : InnoDB: Index <redacted> is marked as corrupted
|
error : Corrupt
|
db_2.table_2
|
Warning : InnoDB: Index IDX_2AF5A5C39C6FCE9 is marked as corrupted
|
Warning : InnoDB: Index IDX_2AF5A5C41859289 is marked as corrupted
|
Warning : InnoDB: Index <redacted> is marked as corrupted
|
Warning : InnoDB: Index IDX_2AF5A5C54B9D732 is marked as corrupted
|
Warning : InnoDB: Index <redacted> is marked as corrupted
|
Warning : InnoDB: Index FK_2AF5A5CCD17CB07 is marked as corrupted
|
Warning : InnoDB: Index <redacted> is marked as corrupted
|
error : Corrupt
|
db_3.table_3
|
Warning : InnoDB: Index UNIQ_1B2C3247D0F6D69 is marked as corrupted
|
Warning : InnoDB: Index IDX_1B2C324796915FC is marked as corrupted
|
Warning : InnoDB: Index IDX_1B2C3247108B7592 is marked as corrupted
|
error : Corrupt
|
db_4.table_4
|
Warning : InnoDB: Index IDX_5615FCE3E7A1254A is marked as corrupted
|
error : Corrupt
|
db_5.table_5
|
Warning : InnoDB: Index IDX_4C62E638A76ED395 is marked as corrupted
|
error : Corrupt
|
db_6.table_6
|
Warning : InnoDB: Index IDX_F84AA10D613FECDF is marked as corrupted
|
Warning : InnoDB: Index IDX_F84AA10D5DA1941 is marked as corrupted
|
Warning : InnoDB: Index IDX_F84AA10DCCFA12B8 is marked as corrupted
|
Warning : InnoDB: Index IDX_F84AA10DF025383A is marked as corrupted
|
error : Corrupt
|
db_7.table_7
|
Warning : InnoDB: The B-tree of index PRIMARY is corrupted.
|
Warning : InnoDB: Index IDX_F84AA10D613FECDF is marked as corrupted
|
Warning : InnoDB: Index IDX_F84AA10D5DA1941 is marked as corrupted
|
Warning : InnoDB: Index IDX_F84AA10DCCFA12B8 is marked as corrupted
|
Warning : InnoDB: Index IDX_F84AA10DF025383A is marked as corrupted
|
error : Corrupt
|
db_8.table_8
|
Warning : InnoDB: Index 'IDX_FD2BEAC094A4C7D4' contains 1374306 entries, should be 1374337.
|
Warning : InnoDB: Index 'IDX_FD2BEAC0A76ED395' contains 1374306 entries, should be 1374337.
|
Warning : InnoDB: Index '<redacted>' contains 1374333 entries, should be 1374337.
|
Warning : InnoDB: Index '<redacted>' contains 1374332 entries, should be 1374337.
|
Warning : InnoDB: Index '<redacted>' contains 1374334 entries, should be 1374337.
|
Warning : InnoDB: Index 'IDX_FD2BEAC0727ACA70' contains 1374322 entries, should be 1374337.
|
Warning : InnoDB: Index 'FK_FD2BEAC067433D9C' contains 1374333 entries, should be 1374337.
|
Warning : InnoDB: Index 'IDX_FD2BEAC0727BAC71' contains 1374312 entries, should be 1374337.
|
Warning : InnoDB: Index 'IDX_FD2BEAC0E25D857E47CC8C92B5D554E5' contains 1374329 entries, should be 1374337.
|
Warning : InnoDB: Index '<redacted>' contains 1374333 entries, should be 1374337.
|
Warning : InnoDB: Index '<redacted>' contains 1374333 entries, should be 1374337.
|
error : Corrupt
|
db_9.table_9
|
Warning : InnoDB: Index 'IDX_9B8A8D46A76ED395' contains 669 entries, should be 670.
|
Warning : InnoDB: Index 'IDX_9B8A8D4694A4C7D4' contains 669 entries, should be 670.
|
Warning : InnoDB: Index 'IDX_9B8A8D46CCFA12B8' contains 669 entries, should be 670.
|
error : Corrupt
|
We recovered eventually from this by mysqldump-ing the data, dropping the table and reload the content. The dumping needed in one case the startup of the server with force_innodb_recovery set to 6.
There is one symptom which made us raise some eyebrows, and maybe this might help in debugging. The tables which had corruption had a field appended (alter table add column varchar...), using the new "instant" method (https://mariadb.com/kb/en/library/instant-add-column-for-innodb/). When mysqldumping the content from the corrupted tables, this field was actually filled with corrupted data (while it should have been NULL, we added it but didn't use/fill it yet). The data in the field looked like binary content of the table itself, as we could recognise parts of the other varchar columns in there. All other fields seemed correct.
My apologies if this is vague. We have the binary datafiles still on disk in our backups, so if needed we can "restore" the corrupted situation.
We thought our case has similarities to https://jira.mariadb.org/browse/MDEV-19666 and https://jira.mariadb.org/browse/MDEV-18223.
Attachments
Issue Links
- duplicates
-
MDEV-18519 0x7f0118195700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.7/storage/innobase/btr/btr0cur.cc line 4308
- Closed
-
MDEV-19916 Corruption after instant ADD/DROP and shrinking the index tree
- Closed
- is caused by
-
MDEV-11369 Instant add column for InnoDB
- Closed
- relates to
-
MDEV-19743 Crash while reorganizing an index page
- Closed
-
MDEV-19916 Corruption after instant ADD/DROP and shrinking the index tree
- Closed
-
MDEV-20590 Introduce a file format constraint to ALTER TABLE
- Closed
-
MDEV-21251 CHECK TABLE fails to check info_bits of records
- Closed
-
MDEV-21371 Assertion failure in page_rec_get_next_low() during innodb_gis.rtree_compress
- Closed
-
MDEV-28786 InnoDB crash leads to pagesize comparison failure
- Closed
-
MDEV-18932 MariaDB 10.3.10-10.3.13 corrupts table and refuses to start with assertion in row0sel.cc 2986
- Closed
-
MDEV-20413 Mariadb Galera Cluster Crash and won't restart
- Closed
-
MDEV-22404 Server Crash after Galera WSREP event received
- Closed