[MDEV-18932] MariaDB 10.3.10-10.3.13 corrupts table and refuses to start with assertion in row0sel.cc 2986 Created: 2019-03-14  Updated: 2019-10-21  Resolved: 2019-06-24

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Insert, Storage Engine - InnoDB
Affects Version/s: 10.3.13
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Martin Schroeter Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: blob, corruption, crash, need_feedback
Environment:

RedHat 7.4, multiple Kernel 3.10.0*.x86_64, Docker 18.09.3. DB User Zabbix 4.0.3-4.0.5, MariaDB 10.3.10 - 10.3.13 (InnoDB), Docker-Volume: Overlay Filesystem on xfs-Filesystem within LVM Volume.


Attachments: PNG File MariaDB_10.3.10-10.3.13_Innodb_Assertion_row0sel.cc.png     PNG File MariaDB_10.3.10-10.3.13_Innodb_Table_items.png    
Issue Links:
Relates
relates to MDEV-19783 Random crashes and corrupt data in IN... Closed
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed

 Description   

Zabbix servers with MariaDB come up after a restart with a corrupted Zabbix items table. It always affects the same table; all other tables are fine.

Because of the corrupted table, MariaDB Server refuses to start with an assertion failure in row0sel.cc line 2968.

Trying to recover the corrupted tables according to https://mariadb.com/kb/en/library/innodb-recovery-modes/ did not work with any of the recovery modes 1 through 6.
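For reference, forced recovery is enabled by adding the setting below to the [mysqld] section of my.cnf and restarting the server; the value is raised step by step, and the setting must be removed again once the data has been dumped. This is a sketch of the standard procedure from the KB page above, not the reporter's exact steps:

```ini
[mysqld]
# Start at 1 and increase only as far as needed; levels >= 4 can
# permanently corrupt data and should only be used to salvage a dump.
innodb_force_recovery = 1
```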

my.cnf:
-------
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
max_connections = 100
connect_timeout = 5
wait_timeout = 600
max_allowed_packet = 16M
thread_cache_size = 128
sort_buffer_size = 4M
bulk_insert_buffer_size = 16M
tmp_table_size = 32M
max_heap_table_size = 32M
myisam_recover_options = BACKUP
key_buffer_size = 128M
table_open_cache = 400
myisam_sort_buffer_size = 512M
concurrent_insert = 2
read_buffer_size = 2M
read_rnd_buffer_size = 1M
query_cache_limit = 128K
query_cache_size = 64M
long_query_time = 10
expire_logs_days = 10
max_binlog_size = 100M
default_storage_engine = InnoDB
innodb_log_file_size = 1250M
innodb_log_files_in_group = 4
innodb_log_buffer_size = 1250M
innodb_flush_log_at_trx_commit = 1
innodb_buffer_pool_size = 5G
innodb_buffer_pool_instances = 4
innodb_buffer_pool_chunk_size = 1250M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 400
innodb_flush_method = O_DIRECT
[galera]
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
[mysql]
[isamchk]
key_buffer = 16M
!include /etc/mysql/mariadb.cnf
!includedir /etc/mysql/conf.d/



 Comments   
Comment by Marko Mäkelä [ 2019-03-21 ]

msc, please attach text logs next time. Images are not searchable. In MariaDB_10.3.10-10.3.13_Innodb_Assertion_row0sel.cc.png I can see that the following assertion failed when attempting to access an off-page (BLOB or long VARCHAR or CHAR) column in the function row_sel_store_mysql_field():

                if (UNIV_UNLIKELY(!data)) {
                        /* The externally stored field was not written
                        yet. This record should only be seen by
                        recv_recovery_rollback_active() or any
                        TRX_ISO_READ_UNCOMMITTED transactions. */
 
                        if (heap != prebuilt->blob_heap) {
                                mem_heap_free(heap);
                        }
 
                        ut_a(prebuilt->trx->isolation_level
                             == TRX_ISO_READ_UNCOMMITTED);
                        DBUG_RETURN(FALSE);
                }

A normal access to a table should never encounter a BLOB pointer that has not been written yet. Non-locking (MVCC) reads should ignore records or record versions whose changes have not been committed yet, and locking reads should wait until the change has been committed and the record lock has been released. Only at the READ UNCOMMITTED isolation level are such incomplete BLOBs tolerated (and returned as empty columns).
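As a hypothetical illustration of the isolation-level distinction above: only a session explicitly switched to READ UNCOMMITTED should ever take the `!data` branch without failing the assertion (the table is the reporter's; the query is made up):

```sql
-- A dirty read at READ UNCOMMITTED: half-written externally stored
-- columns are tolerated and returned as empty values instead of
-- hitting the ut_a() assertion shown above.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT itemid, params FROM zbx.items;
```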

The question is: How did the table get corrupted? Was there a database crash at some point? Did crash recovery fail? Did you remove the redo log files or set innodb_force_recovery? Did you copy the data files in an unsafe fashion at some point in the past?

The complete server error log could help narrow down the problem.

By the way, it could be that CHECK TABLE does not try to validate BLOB columns. CHECKSUM TABLE should be better in that respect.
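The distinction matters here because the assertion fires when an off-page BLOB pointer is followed; a check that never reads the BLOB data cannot trip over it. My reading of the suggestion, not a documented guarantee:

```sql
CHECK TABLE zbx.items;     -- validates index structure; may skip
                           -- reading off-page BLOB data
CHECKSUM TABLE zbx.items;  -- computes a checksum over full rows,
                           -- forcing the BLOB pointers to be followed
```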

Until MDEV-13542 has been fixed, InnoDB easily crashes when it encounters any form of corruption. So, I would suggest copying the data to a non-production system for analysis.

Comment by Martin Schroeter [ 2019-03-28 ]

@Marko Mäkelä: Thanks a lot for your response. I now have a separate machine for further tests, so next time I can provide logs instead of screenshots. I'm trying to reproduce the problem on this machine now.

Comment by Martin Schroeter [ 2019-04-15 ]

Looks like I can reproduce the situation:

1) Running MariaDB 10.3.13. `check table items` says all is good.
2) Doing a full backup with mariabackup.
3) Doing an incremental backup with mariabackup based on the last full backup.
4) Copying all data to another machine, also running MariaDB 10.3.13.
5) Preparing the full backup + preparing the incremental backup (based on this full backup).
6) Copying the data back (--copy-back) to the datadir and starting the DB. `check table items` fails.
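The steps above roughly correspond to the following mariabackup invocations (paths and credentials are placeholders; this is a sketch of the reporter's procedure, not their exact commands):

```sh
# 2) full backup
mariabackup --backup --user=root --target-dir=/backup/full
# 3) incremental backup based on the full one
mariabackup --backup --user=root --target-dir=/backup/inc1 \
            --incremental-basedir=/backup/full
# 5) prepare the full backup, then apply the incremental delta to it
mariabackup --prepare --target-dir=/backup/full
mariabackup --prepare --target-dir=/backup/full --incremental-dir=/backup/inc1
# 6) with the server stopped and the datadir empty, restore
mariabackup --copy-back --target-dir=/backup/full
```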

Comment by Martin Schroeter [ 2019-04-15 ]

I tried the same procedure once more, but without the incremental backup. This leads to the same corrupted items table.

This table has 37981 rows. `check table items` shows

[Error] InnoDB: Record overlaps another
[Error] InnoDB: Apparent corruption in space 179 page 240 index PRIMARY
[Error] InnoDB: In page 240 of index PRIMARY of table zbx.items

MariaDB [zbx]> check table items;
Table | Op | Msg_type | Msg_text
zbx.items | check | Warning | InnoDB: The B-tree of index PRIMARY is corrupted.
zbx.items | check | Warning | InnoDB: Index 'items_1' contains 37981 entries, should be 18446744073709551615
zbx.items | check | Warning | InnoDB: Index 'items_2' contains 37981 entries, should be 18446744073709551615
zbx.items | check | Warning | InnoDB: Index 'items_3' contains 37981 entries, should be 18446744073709551615
zbx.items | check | Warning | InnoDB: Index 'items_4' contains 37981 entries, should be 18446744073709551615
zbx.items | check | Warning | InnoDB: Index 'items_5' contains 37981 entries, should be 18446744073709551615
zbx.items | check | Warning | InnoDB: Index 'items_6' contains 37981 entries, should be 18446744073709551615
zbx.items | check | Warning | InnoDB: Index 'items_7' contains 37981 entries, should be 18446744073709551615
zbx.items | check | error | Corrupt
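The "should be" count 18446744073709551615 is not a real expected row count: it is the largest 64-bit unsigned value, which InnoDB uses internally as a sentinel for "unknown" once the clustered index is deemed corrupted. A quick check of where the magic number comes from (the constant name ULINT_UNDEFINED is from the InnoDB sources; the snippet itself is illustrative):

```python
# 18446744073709551615 == 2**64 - 1, i.e. (uint64_t)-1: InnoDB's
# "undefined" sentinel (ULINT_UNDEFINED) leaking into CHECK TABLE output.
ULINT_UNDEFINED = 2**64 - 1
print(ULINT_UNDEFINED)  # 18446744073709551615
```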

Comment by Martin Schroeter [ 2019-04-16 ]
  • Additionally ran memtest86+ on the RH7 VM with no errors.
  • Installed MariaDB 10.3.14 without Docker / overlayfs, but this leads to the same behavior as above (zbx.items corrupt).
Comment by Martin Schroeter [ 2019-04-25 ]

The database has been freshly installed and the Zabbix DB has been imported from a mysqldump. This works fine and all tables are consistent now.

I assume mariabackup only produces damaged restores because something is already damaged on the MariaDB instance, and mariabackup simply backs up these inconsistencies.

I would recommend closing this ticket. The only concrete finding is that there were VM crashes, which may have been caused by https://support.mesosphere.com/s/article/Critical-Issue-KMEM-MSPH-2018-0006

Comment by Marko Mäkelä [ 2019-05-21 ]

msc, it sounds like there could be a bug in mariabackup (a failure to detect an error while creating or preparing the backup). Can you please try to narrow it down further?

Comment by Walter Doekes [ 2019-10-18 ]

Hi there. We just ran into this problem. Apparently it is reproducible: we recently imported a dump and the problem persisted across the imports.

Because of the corrupted table, MariaDB Server refuses to start with assertion in row0sel.cc line 2968.

In our case, we start just fine, but we have trouble with table accesses.

Source server: mariadb 10.3.12 (origin of the db dump)

Problematic server:

Server version: 10.3.16-MariaDB-1:10.3.16+maria~bionic-log mariadb.org binary distribution

MariaDB [acceptance1]> check table xxx_xxxxx; 
+-----------------------+-------+----------+----------------------------------------------------------------------------------------------------+
| Table                 | Op    | Msg_type | Msg_text                                                                                           |
+-----------------------+-------+----------+----------------------------------------------------------------------------------------------------+
| acceptance1.xxx_xxxxx | check | Warning  | InnoDB: The B-tree of index PRIMARY is corrupted.                                                  |
| acceptance1.xxx_xxxxx | check | Warning  | InnoDB: Index 'xxx_xxxxx_xxxxxxx_id_index' contains 30399 entries, should be 18446744073709551615. |
| acceptance1.xxx_xxxxx | check | error    | Corrupt                                                                                            |
+-----------------------+-------+----------+----------------------------------------------------------------------------------------------------+
3 rows in set (0.379 sec)
 
MariaDB [(none)]> select count(*) from acceptance1.xxx_xxxxx;
+----------+
| count(*) |
+----------+
|    30399 |
+----------+

In the source DB, this table is fine.

Any UPDATE or ANALYZE TABLE on this table causes the assertion as mentioned:

2019-10-18 09:05:40 0x7f1bf05ad700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.16/storage/innobase/row/row0sel.cc line 2986
InnoDB: Failing assertion: prebuilt->trx->isolation_level == TRX_ISO_READ_UNCOMMITTED
InnoDB: We intentionally generate a memory trap.
...
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f1d43ac5801]
mysqld(+0x49f856)[0x557d83a93856]
mysqld(+0x49ec8c)[0x557d83a92c8c]
mysqld(+0x49ef89)[0x557d83a92f89]
mysqld(+0x9bb30a)[0x557d83faf30a]
mysqld(+0x8df357)[0x557d83ed3357]
mysqld(_ZN7handler11ha_rnd_nextEPh+0x79)[0x557d83d4bd29]
mysqld(_Z13rr_sequentialP11READ_RECORD+0x22)[0x557d83e5bcd2]
mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_orderybPySA_+0xe9e)[0x557d83c085de]
mysqld(_Z21mysql_execute_commandP3THD+0x3cb0)[0x557d83b6ab40]
mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x20a)[0x557d83b6f13a]
mysqld(+0x57b9e6)[0x557d83b6f9e6]
mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x133f)[0x557d83b711cf]
mysqld(_Z10do_commandP3THD+0x178)[0x557d83b72908]
mysqld(_Z24do_handle_one_connectionP7CONNECT+0x212)[0x557d83c44a12]
mysqld(handle_one_connection+0x3d)[0x557d83c44bed]

If we are trying to allocate something close to (uint64_t)-1 bytes, then you would indeed expect a failure:

                /* NOTE: if we are retrieving a big BLOB, we may
                already run out of memory in the next call, which
                causes an assert */
 
                data = btr_rec_copy_externally_stored_field(
                        rec, offsets,
                        dict_table_page_size(prebuilt->table),
                        field_no, &len, heap);
 
                if (UNIV_UNLIKELY(!data)) {
                    /* assertion is here, as you already pointed out */

Data has been synced using:

  • nightly mariabackup 1:10.3.12+maria~bionic
  • this data was then prepared on the source host (mariabackup --prepare --target-dir=/tank/mysql/restore)
  • this data was then copied (zfs send/recv) to the destination host
  • and started with mariadb 10.3.16

If I come up with any other info, I'll amend.

Comment by Marko Mäkelä [ 2019-10-20 ]

msc, I believe that the problems that you are seeing are related to using instant ADD COLUMN. Unfortunately, we have been unable to reproduce that so far, despite adding consistency checks to debug builds. Let us track that one in MDEV-19783.

Comment by Marko Mäkelä [ 2019-10-20 ]

wdoekes, by data dump, do you mean a logical dump, which consists of SQL statements? If so, that could help us repeat the problem.

A physical dump (used with ALTER TABLE…IMPORT TABLESPACE) would not help, because the steps between the time the corruption was introduced and it was noticed would be unknown. We have been trying hard to reproduce this with a sequence of SQL statements, but we have so far been unsuccessful. If I had such a sequence, this bug should be trivial to locate and fix.

You can upload files to ftp.mariadb.com. It would only be accessible to some employees of MariaDB Corporation. You can obfuscate as much of the data as possible, while still keeping it repeatable. I think that for this bug, the PRIMARY KEY values must be preserved. For any other columns, the data should not matter (except that for any variable-length columns, the lengths should be preserved).
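One way to follow that advice — keep the PRIMARY KEY values and the per-column lengths while destroying the actual content — is a small rewrite pass over the logical dump before uploading it. A hypothetical sketch (the function names are made up; this is not a MariaDB tool):

```python
def obfuscate_value(value: str) -> str:
    """Replace every character with 'x', preserving the length."""
    return "x" * len(value)

def obfuscate_row(pk, *columns):
    """Keep the primary-key value intact; mask all other columns."""
    return (pk,) + tuple(obfuscate_value(c) for c in columns)

# Row content is destroyed, but key values and lengths survive,
# which is what matters for reproducing the corruption.
print(obfuscate_row(42, "secret item name", "system.cpu.load"))
```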

Once again, I am not interested in a physical dump (.ibd file).

Comment by Walter Doekes [ 2019-10-21 ]

Hi Marko,

thanks for the reply.

I don't know if this table was altered with instant-columns. But the "fix" you wrote in https://jira.mariadb.org/browse/MDEV-19783?focusedCommentId=132884&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-132884 does not appear to work.

Dropping the index on the FK succeeded, but the corruption persisted on the PK.

MariaDB [acceptance1]> check table xxx_xxxxx;
+-----------------------+-------+----------+---------------------------------------------------+
| Table                 | Op    | Msg_type | Msg_text                                          |
+-----------------------+-------+----------+---------------------------------------------------+
| acceptance1.xxx_xxxxx | check | Warning  | InnoDB: The B-tree of index PRIMARY is corrupted. |
| acceptance1.xxx_xxxxx | check | error    | Corrupt                                           |
+-----------------------+-------+----------+---------------------------------------------------+
2 rows in set (0.335 sec)

Suggested fix from MDEV-19783:

MariaDB [acceptance1]> ALTER TABLE xxx_xxxxx FORCE;
command terminated with exit code 137

No luck. Is it expected to crash on ALTER TABLE FORCE? Or is this new to you?

do you mean a logical dump, which consists of SQL statements? If so, that could help us repeat the problem.

I'm sorry, no such luck. Only ibd files.

We'll see if we can poke at it from another angle in the meantime. The production servers have been up and running since February, so after reading some bug comments I'm becoming a bit nervous that they'll get into trouble after a restart.

Comment by Walter Doekes [ 2019-10-21 ]

So. We got ourselves a cleaner environment, and recreated the crash with only this table. Using the original MariaDB 10.3.12.

DROP TABLE didn't cause a crash. And reloading the data from a mysqldump fixed it in development. That sounds like a viable fix on production then. (Unfortunately that won't bring this ticket any closer to a resolution.)

Generated at Thu Feb 08 08:47:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.