[MDEV-18932] MariaDB 10.3.10-10.3.13 corrupts table and refuses to start with assertion in row0sel.cc 2986 Created: 2019-03-14 Updated: 2019-10-21 Resolved: 2019-06-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Manipulation - Insert, Storage Engine - InnoDB |
| Affects Version/s: | 10.3.13 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Martin Schroeter | Assignee: | Marko Mäkelä |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | blob, corruption, crash, need_feedback | ||
| Environment: |
RedHat 7.4, multiple Kernel 3.10.0*.x86_64, Docker 18.09.3. DB User Zabbix 4.0.3-4.0.5, MariaDB 10.3.10 - 10.3.13 (InnoDB), Docker-Volume: Overlay Filesystem on xfs-Filesystem within LVM Volume. |
||
| Attachments: | |
| Issue Links: | |
| Description |
|
Our Zabbix servers with MariaDB come up after a restart with a corrupted Zabbix items table. It always affects the same table; all other tables are fine. Because of the corrupted table, the MariaDB server refuses to start, with an assertion in row0sel.cc line 2968. Trying to recover the corrupted table according to https://mariadb.com/kb/en/library/innodb-recovery-modes/ did not work for any of the recovery modes from 1 to 6. my.cnf: |
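(The my.cnf itself was not preserved in this export. As a hedged illustration only, the recovery attempt described above — stepping through the modes from the linked KB article — would be driven by a setting like the following; the value shown is a placeholder, not the reporter's actual configuration.)

```ini
# Hypothetical my.cnf fragment -- raise innodb_force_recovery one
# step at a time. Levels above 3 can permanently corrupt data, so
# they should only be used to dump the data and rebuild the instance.
[mysqld]
innodb_force_recovery = 1   # the reporter tried values 1 through 6
```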
| Comments |
| Comment by Marko Mäkelä [ 2019-03-21 ] |
|
msc, please attach text logs next time; images are not searchable. Regarding MariaDB_10.3.10-10.3.13_Innodb_Assertion_row0sel.cc.png:

A normal access to a table should never encounter a BLOB pointer that has not been written yet. Non-locking (MVCC) reads should ignore records or record versions that are in the process of being written (the change has not been committed yet), and locking reads should wait until the change has been committed and the record lock has been released. Only at the READ UNCOMMITTED isolation level are such incomplete BLOBs tolerated (and returned as empty columns).

The question is: how did the table get corrupted? Was there a database crash at some point? Did crash recovery fail? Did you remove the redo log files or set innodb_force_recovery? Did you copy the data files in an unsafe fashion at some point in the past? The complete server error log could help narrow down the problem.

By the way, it could be that CHECK TABLE is not trying to validate BLOB columns. CHECKSUM TABLE should be better in that respect. Until |
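(Marko's CHECK TABLE vs. CHECKSUM TABLE suggestion can be tried against the table from this report roughly as follows; this is a sketch that assumes a running server and the `zbx.items` table named later in the thread.)

```sql
-- CHECK TABLE may not validate externally stored BLOB columns, so a
-- table with a broken BLOB pointer can still report OK.
CHECK TABLE zbx.items EXTENDED;

-- CHECKSUM TABLE has to read every column value of every row, so it
-- is more likely to trip over an unwritten or dangling BLOB pointer.
CHECKSUM TABLE zbx.items;
```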
| Comment by Martin Schroeter [ 2019-03-28 ] |
|
@Marko Mäkelä: Thanks a lot for your response. I now have a separate machine for further tests, so next time I can provide logs instead of screenshots. I'm trying to reproduce the problem on this machine now. |
| Comment by Martin Schroeter [ 2019-04-15 ] |
|
Looks like I can reproduce the situation: 1) Running MariaDB 10.3.13; `check table items` says all good. |
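(The backup/prepare/restore cycle implied by these reproduction steps can be sketched with mariabackup as below. Directories and credentials are placeholders, and the incremental step mentioned in the next comment is omitted; this is an illustration, not the reporter's exact procedure.)

```shell
# Sketch: full backup, prepare, and restore with mariabackup.
# /backup/full, root, and 'secret' are hypothetical values.
mariabackup --backup --target-dir=/backup/full \
            --user=root --password=secret

# Apply the redo log so the backup is consistent:
mariabackup --prepare --target-dir=/backup/full

# Restore into an empty datadir while the server is stopped:
mariabackup --copy-back --target-dir=/backup/full
```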
| Comment by Martin Schroeter [ 2019-04-15 ] |
|
I tried the same as above once more, but without using the incremental backup. This leads to the same corrupted items table. The table has 37981 rows. `check table items` shows:

[Error] InnoDB: Record overlaps another

MariaDB [zbx]> check table items; |
| Comment by Martin Schroeter [ 2019-04-16 ] |
|
| Comment by Martin Schroeter [ 2019-04-25 ] |
|
The database has been freshly installed and the Zabbix DB has been imported from a mysqldump. This works fine and all tables are consistent now. I assume that mariabackup only produces damaged restores because something is damaged on the MariaDB instance, and mariabackup simply backs up these inconsistencies. I would recommend closing this ticket. The only concrete finding is that there were VM crashes which may have been caused by https://support.mesosphere.com/s/article/Critical-Issue-KMEM-MSPH-2018-0006 |
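(The dump-and-reimport that produced a consistent database can be sketched as follows; database name and options are taken from the thread where available, otherwise they are placeholders.)

```shell
# Logical dump of the zbx database; --single-transaction gives a
# consistent InnoDB snapshot without locking the tables.
mysqldump --single-transaction zbx > zbx.sql

# Recreate the schema and reload from the dump (destructive!):
mysql -e 'DROP DATABASE zbx; CREATE DATABASE zbx'
mysql zbx < zbx.sql
```

Because the reload rewrites every row and rebuilds every index from SQL, it leaves behind any physical-page or BLOB-pointer corruption present in the old tablespace files.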
| Comment by Marko Mäkelä [ 2019-05-21 ] |
|
msc, it sounds like there could be a bug in mariabackup (failure to detect an error while creating or preparing the backup). Can you please try to narrow it down more? |
| Comment by Walter Doekes [ 2019-10-18 ] |
|
Hi there. We just ran into this problem. It is apparently reproducible: we recently imported a dump and the problem persisted across imports.
In our case, the server starts just fine, but we have trouble with table accesses.

Source server: MariaDB 10.3.12 (origin of the DB dump). Problematic server:
In the source DB, this table is fine. Any UPDATE or ANALYZE TABLE on this table causes the assertion as mentioned:
If we're trying to allocate values close to (uint64_t)-1, then you'd expect failure indeed:
Data has been synced using:
If I come up with any other info, I'll amend. |
| Comment by Marko Mäkelä [ 2019-10-20 ] |
|
msc, I believe that the problems that you are seeing are related to using instant ADD COLUMN. Unfortunately, we have been unable to reproduce that so far, despite adding consistency checks to debug builds. Let us track that one in |
| Comment by Marko Mäkelä [ 2019-10-20 ] |
|
wdoekes, by data dump, do you mean a logical dump, which consists of SQL statements? If so, that could help us repeat the problem. A physical dump (used with ALTER TABLE…IMPORT TABLESPACE) would not help, because the steps between the time the corruption was introduced and the time it was noticed would be unknown. We have been trying hard to reproduce this with a sequence of SQL statements, but have so far been unsuccessful. If I had such a sequence, this bug should be trivial to locate and fix.

You can upload files to ftp.mariadb.com; they would only be accessible to some employees of MariaDB Corporation. You can obfuscate as much of the data as possible while still keeping it repeatable. I think that for this bug, the PRIMARY KEY values must be preserved. For any other columns, the data should not matter (except that for any variable-length columns, the lengths should be preserved). Once again, I am not interested in a physical dump (.ibd file). |
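(One way to obfuscate a dump along the lines Marko describes — keep PRIMARY KEY values and the lengths of variable-length columns, replace the actual bytes — could look like the following. The column names are hypothetical examples, not confirmed columns of the affected table.)

```sql
-- Run against a throwaway copy before taking the logical dump.
-- CHAR_LENGTH keeps each value's length identical, so row and
-- record layouts stay the same while the content is masked.
UPDATE items
   SET name   = REPEAT('x', CHAR_LENGTH(name)),    -- hypothetical column
       params = REPEAT('x', CHAR_LENGTH(params));  -- hypothetical column
-- The PRIMARY KEY column is deliberately left untouched.
```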
| Comment by Walter Doekes [ 2019-10-21 ] |
|
Hi Marko, thanks for the reply. I don't know if this table was altered with instant-columns. But the "fix" you wrote in https://jira.mariadb.org/browse/MDEV-19783?focusedCommentId=132884&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-132884 does not appear to work. Dropping the index on the FK succeeded, but the corruption persisted on the PK.
Suggested fix from
No luck. Is it expected to crash on ALTER TABLE FORCE? Or is this new to you?
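(For context, a table rebuild of the kind that crashed here would be issued roughly as below; the exact statement and options used are not shown in this export, so treat this as an illustration.)

```sql
-- ALTER TABLE ... FORCE rewrites the whole table and rebuilds all
-- indexes; on 10.3 a rebuild also converts away any "instant ADD
-- COLUMN" record format.
ALTER TABLE items FORCE;

-- For InnoDB, OPTIMIZE TABLE is mapped to an equivalent rebuild:
OPTIMIZE TABLE items;
```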
I'm sorry, no such luck. Only ibd files. We'll see if we can poke at it from another angle in the meantime. The production servers have been up and running since February, so after reading some bug comments, I'm becoming a bit nervous that they'll get into trouble after a restart. |
| Comment by Walter Doekes [ 2019-10-21 ] |
|
So. We got ourselves a cleaner environment and recreated the crash with only this table, using the original MariaDB 10.3.12. DROP TABLE didn't cause a crash, and reloading the data from a mysqldump fixed it in development. That sounds like a viable fix for production, then. (Unfortunately that won't bring this ticket any closer to a resolution.) |