[MDEV-13542] Crashing on a corrupted page is unhelpful - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2
Component/s: Storage Engine - InnoDB
Labels:

Description

Since its very first version, InnoDB has aborted the whole MySQL or MariaDB server when trying to access a corrupted page.

Earlier, small steps have been taken to improve MariaDB stability, such as ~~MDEV-12253~~.

We must enable error handling for corrupted pages instead of allowing the server to crash. This is not only about detecting a checksum failure when reading a corrupted page (as suggested in MySQL Bug#10132 Crashing the server on corrupt InnoDB page is unhelpful, filed on April 25, 2005), but also about handling errors when operations are performed on corrupted pages whose checksum appears valid.

In some background operations, such as the purge of committed transaction history, errors may be reported to the server error log only, or silently ignored. While executing SQL, errors should be reported back to the client, and the transaction could be rolled back.
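The policy described above can be illustrated with a small sketch. It is not InnoDB code: `PageErr`, `Context`, `Outcome` and `handle_page_error()` are hypothetical names invented here, and the only point is that the reaction to a corrupted page depends on who is executing, and never involves aborting the process.

```cpp
#include <cassert>  // used only by the usage example below
#include <string>
#include <vector>

// Hypothetical error codes; they only loosely echo InnoDB's dberr_t.
enum class PageErr { Success, Corruption };

// Hypothetical execution contexts for an operation that touched a page.
enum class Context { BackgroundPurge, SqlStatement };

// What the caller should do once a page operation has returned.
struct Outcome {
  bool report_to_client;  // send an error back to the SQL client
  bool rollback;          // roll back the current transaction
  bool log_to_error_log;  // write a message to the server error log
};

// Decide how to react to an error instead of aborting the process.
Outcome handle_page_error(PageErr err, Context ctx,
                          std::vector<std::string>& error_log) {
  if (err == PageErr::Success) return {false, false, false};
  if (ctx == Context::BackgroundPurge) {
    // No client is attached to a background task: log and carry on.
    error_log.push_back("purge: skipping corrupted page");
    return {false, false, true};
  }
  // While executing SQL: tell the client and undo the transaction.
  return {true, true, false};
}
```

For example, `handle_page_error(PageErr::Corruption, Context::SqlStatement, log)` yields an outcome with `report_to_client` and `rollback` set, while the same error in `Context::BackgroundPurge` only appends a line to `log`.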

Attachments

cursor_restore_supremum.patch
2 kB
2022-05-19 14:55

Issue Links

blocks

MDEV-18519 0x7f0118195700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.7/storage/innobase/btr/btr0cur.cc line 4308

Closed

MDEV-18976 Implement a CHECKSUM redo log record for improved validation

Closed

MDEV-21098 Crash in rec_get_offsets_func() due to invalid rec_get_status()

Closed

MDEV-22388 Corrupted undo log record leads to server crash

Closed

MDEV-27593 Crashing on I/O error is unhelpful

Closed

MDEV-28457 Crash in page_dir_find_owner_slot()

Closed

MDEV-29603 btr_cur_open_at_index_side() is missing some consistency checks

Closed

MDEV-34830 LSN in the future is not being treated as serious corruption

Closed

causes

MDEV-28845 InnoDB: Failing assertion: bpage->can_relocate() in buf0lru.cc

Closed

MDEV-28950 Assertion `*err == DB_SUCCESS' failed in btr_page_split_and_insert

Closed

MDEV-29321 Percona XtraDB 5.7 can't be upgrade to MariaDB 10.6 or above

Closed

MDEV-29374 Frequent "Data structure corruption" in InnoDB after OOM

Closed

MDEV-29383 Assertion mysql_mutex_assert_owner(&log_sys.flush_order_mutex) failed in mtr_t::commit()

Closed

MDEV-29435 CHECK TABLE forgets to release latches after reporting failure

Closed

MDEV-31333 fsp_free_page() fails to move the extent from FSP_FREE_FRAG to FSP_FREE list

Closed

MDEV-35430 my_snprintf fixes for 10.6+

Closed

is duplicated by

MDEV-14170 InnoDB: Failing assertion: sibling_mode == RW_NO_LATCH || btr_page_get_next(get_block->frame, mtr) == page_get_page_no(page)

Closed

MDEV-16497 Server crashes on corrupted pages during DROP TABLE

Closed

MDEV-26550 [FATAL] [ERROR] mysqld got signal 6 ;InnoDB: Aborting because of a corrupt database page.

Closed

MDEV-27865 buf_page_read_complete assertions during during innodb_init

Closed

MDEV-28148 InnoDB slave crash / signal 6

Closed

MDEV-28574 Assertion failure in fil_invalid_page_access_msg() while running CHECK TABLE to find the details on corruption suspected

Closed

MDEV-28786 InnoDB crash leads to pagesize comparison failure

Closed

MDEV-29126 DB is crashing repeatedly with error message "InnoDB: Rec offset 99, cur1 offset 1723, cur2 offset 16095"

Closed

MDEV-29130 InnoDB: Assertion failure in file storage/innobase/fsp/fsp0fsp.cc line 1562

Closed

MDEV-29150 DB continuously rebooting

Closed

MDEV-33398 Server crashes when executing DML with innodb_checksum_algorithm set to strict_innodb

Closed

relates to

MDEV-12112 corruption in encrypted table may be overlooked

Closed

MDEV-13680 InnoDB may crash when btr_page_alloc() fails

Closed

MDEV-17810 Improve error printout when decryption fails or we identify page as both encrypted and unencrypted

Closed

MDEV-17957 Make Innodb_checksum_algorithm stricter for strict_* values

Closed

MDEV-17958 Make bug-endian innodb_checksum_algorithm=crc32 optional

Closed

MDEV-18025 Mariabackup fails to detect corrupted page_compressed=1 tables

Closed

MDEV-18026 Make mariabackup option to double read the pages from data file.

Closed

MDEV-18455 Crash On Table structure Change on very large table - With filesystem not having enough space, rendering the table (and server) unusable

Closed

MDEV-18691 Failing assertion: page_is_comp(next_page) == page_is_comp(page) in btr0pcur.cc

Closed

MDEV-18932 MariaDB 10.3.10-10.3.13 corrupts table and refuses to start with assertion in row0sel.cc 2986

Closed

MDEV-19081 DB crashes periodicly with - InnoDB: Assertion failure

Closed

MDEV-19541 InnoDB crashes when trying to recover a corrupted page

Closed

MDEV-21030 MariaDB keepts crashing on start

Closed

MDEV-21513 Fix some crashes in ALTER TABLE…IMPORT TABLESPACE

Closed

MDEV-21863 InnoDB: Assertion failure rem0rec.cc 1878

Closed

MDEV-22373 Unable to find a record to delete-mark ends up crashing mysqld process after upgrading from 10.1.43 to 10.4

Closed

MDEV-22481 Server crash after access to table with imported tablespace (after restart)

Closed

MDEV-22624 mariadb used for wordpress restarts and crashes continuously

Closed

MDEV-23344 InnoDB: Fatal: Trying to access page number 57 in space 182 space name zabbix/items.

Closed

MDEV-23734 InnoDB: File (unknown): 'read' returned OS error 205. Cannot continue operation

Closed

MDEV-26780 Tablespaces corrupt, MariaDB does not start

Closed

MDEV-26893 innodb assertion on startup - rem0rec:877

Closed

MDEV-28143 Data table corruption/crashing on btrfs

Open

MDEV-28975 AWS RDS mariadb 10.5.12 crashes upon OS update

Closed

MDEV-29082 InnoDB: Failing assertion: !cursor->index->is_committed()

Closed

MDEV-29938 InnoDB: Assertion failure in btr0pcur.cc line 532

Open

MDEV-30132 Crash after recovery, with InnoDB: Tried to read ... bytes at offset

Closed

MDEV-31353 InnoDB recovery hangs after reporting corruption

Closed

MDEV-32269 InnoDB after ALTER TABLE…IMPORT TABLESPACE may not be crash safe

Closed

MDEV-33325 Crash in flst_read_addr on corrupted data

Closed

MDEV-35413 InnoDB: Cannot load compressed BLOB (ROW_FORMAT=COMPRESSED table)

Closed

MDEV-35729 InnoDB: Trying to read page … which is outside the tablespace bounds

Closed

MDEV-9663 InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX, or !cursor->index->is_committed()

Closed

MDEV-11125 Introduce a reduced doublewrite mode, handling error detection only

Closed

MDEV-12253 Buffer pool blocks are accessed after they have been freed

Closed

MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC

Closed

MDEV-13536 DB_TRX_ID is not actually being reset when the history is removed

Closed

MDEV-13559 encryption.innodb-redo-badkey failed in buildbot

Closed

MDEV-14132 innodb.innodb-64k failed in buildbot, lost connection to server

Closed

MDEV-14481 Execute InnoDB crash recovery in the background

Closed

MDEV-16763 SIGNAL 6 ERROR

Closed

MDEV-17482 InnoDB fails to say which fatal error fsync() returned

Closed

MDEV-17638 Improve error message about corruption of encrypted page

Closed

MDEV-17641 start slave and mariadb server crash

Closed

MDEV-17947 mariabackup SST encounters InnoDB tablespace errors during prepare, ignores them, server crashes

Closed

MDEV-18115 Remove dummy tablespace for the redo log

Closed

MDEV-18529 InnoDB wrongly skips decryption of encrypted page if unencrypted and post-encrypted checksums match

Closed

MDEV-18606 innodb crashes on large update and it gets corrupted

Closed

MDEV-19711 Assertion failure during IMPORT TABLESPACE

Closed

MDEV-20931 ALTER...IMPORT can crash the server

Closed

MDEV-24287 mysqlcheck unexpectedly causes table to be marked as corrupt

Closed

MDEV-24412 Mariadb 10.5: InnoDB: Upgrade after a crash is not supported

Closed

MDEV-24578 MariaDB 10.5 fails to join Galera cluster of MariaDB 10.1, root cause SST failure, symptom is InnoDB: Assertion failure

Closed

MDEV-26022 InnoDB page data size mismatch crash

Closed

MDEV-26537 InnoDB corrupts files due to incorrect st_blksize calculation

Closed

MDEV-27053 Crash on assertion failure in btr0cur.cc - apparent index corruption

Closed

MDEV-27538 MariaDB server startup crash

Closed

MDEV-28002 Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no()

Closed

MDEV-28174 InnoDB: Assertion failure

Closed

MDEV-28349 Provide "crash safe" options for CHECK TABLE and ALTER TABLE ... CHECK PARTITION ...

Open

MDEV-29276 [FATAL] InnoDB: Page old data size 7898 new data size 7922, page old max ins size 8288 new max ins size 8264

Closed

MDEV-30397 InnoDB crash due to DB_FAIL reported for a corrupted page

Closed

MDEV-31245 Assertion Failure results in crash

Closed

MDEV-31767 InnoDB tables are being flagged as corrupted on an I/O bound server

Closed

MDEV-32044 Mariadb crash after upgrading to 11.0.3: Failing assertion: local_len >= BTR_EXTERN_FIELD_REF_SIZE

Closed

MDEV-34233 InnoDB crashes due to corrupted ibdata1 (Assertion failure in innodb.undo_page)

Closed

MDEV-34773 Troubles with DB

Closed

links to

Intentionally cause an I/O error in Linux?

(3 blocks, 8 causes, 11 is duplicated by, 68 relates to, 1 links to)

Activity

Marko Mäkelä added a comment - 2020-10-28 10:04

~~MDEV-23344~~ reports a crash when trying to access a page that does not exist in the tablespace.

Marko Mäkelä added a comment - 2022-05-02 12:15

When it comes to reading corrupted pages into the buffer pool, we seem to already do the right thing with regard to buf_page_is_corrupted() and buf_page_t::read_complete(). One relevant change was ~~MDEV-19541~~.

It looks like the main part of the fix would be to check that error results from buf_page_get_gen() are propagated by its callers.
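As a rough illustration of what propagating an error from a page fetch through its callers means, here is a minimal sketch. All names (`FetchErr`, `Page`, `get_page()`, `count_records()`, `run_statement()`) are hypothetical stand-ins, and the real buf_page_get_gen() has a different signature; only the pattern of checking the result and handing the error code upwards is the point.

```cpp
#include <cassert>  // used only by the usage example below
#include <cstdint>

// Hypothetical stand-ins; real InnoDB types and signatures differ.
enum FetchErr { FETCH_SUCCESS, FETCH_CORRUPTION };

struct Page { std::uint32_t n_recs; };

// Stand-in for a page fetch: returns nullptr and sets *err when the
// page cannot be read.
Page* get_page(std::uint32_t page_no, FetchErr* err) {
  static Page good{3};
  if (page_no == 42) {  // pretend that page 42 is unreadable
    *err = FETCH_CORRUPTION;
    return nullptr;
  }
  *err = FETCH_SUCCESS;
  return &good;
}

// Mid-level caller: checks the result instead of asserting on it,
// and hands the error code upwards.
FetchErr count_records(std::uint32_t page_no, std::uint32_t* n_recs) {
  FetchErr err;
  Page* page = get_page(page_no, &err);
  if (!page) return err;  // propagate, do not crash
  *n_recs = page->n_recs;
  return FETCH_SUCCESS;
}

// Top-level caller: the statement fails, the process keeps running.
bool run_statement(std::uint32_t page_no) {
  std::uint32_t n = 0;
  return count_records(page_no, &n) == FETCH_SUCCESS;
}
```

In this sketch `run_statement(1)` succeeds and `run_statement(42)` returns `false`, so a failed fetch turns into a failed statement rather than an assertion failure.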

I would exclude the change buffer merge from this task. We have seen crashes in btr_page_reorganize() when it was invoked from ibuf_insert_to_index_page_low() during ibuf_merge_or_delete_for_page(). Due to the design of the change buffer, such crashes can also occur during CHECK TABLE, where some extra attention has been spent to avoid crashes due to corruption. We already disabled the change buffer by default (~~MDEV-27734~~) and deprecated it (~~MDEV-27735~~) due to corruption that we are unable to reproduce in our internal testing.

Marko Mäkelä added a comment - 2022-05-02 14:09

It looks like we already have a large part of these robustness fixes present in 10.5 (added by ~~MDEV-15528~~ and related changes) or 10.6 (added in ~~MDEV-25506~~ and changes related to atomic DDL). I already went through all direct callers of buf_page_get_gen(), and there is not that much to be fixed. I still have to check all callers of some 12 functions and then implement some fault injection in buf_page_get_gen() so that we can run some stress tests.

I would declare that those linked open bugs that are related to some other forms of corruption (say, a garbage BLOB pointer or DB_ROLL_PTR, or an incorrect key specified for decrypting an encrypted page) can be out of the scope of this fix. This fix should mainly be about handling failures to read a page.
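The idea of injecting read failures for stress testing, mentioned above, could look roughly like the following sketch. It is an assumption-laden illustration rather than the actual test hook: `FaultInjector`, `read_page()` and `count_failures()` are invented names, and a deterministic counter is used here only so that the example is reproducible.

```cpp
#include <cassert>  // used only by the usage example below
#include <cstdint>

enum ReadErr { READ_OK, READ_CORRUPTED };

// Hypothetical injector: fails every `period`-th read when enabled.
// A real test hook might fail at random instead of using a counter.
struct FaultInjector {
  bool enabled = false;
  unsigned period = 3;
  unsigned calls = 0;
  bool should_fail() {
    // The counter only advances while injection is enabled.
    return enabled && (++calls % period == 0);
  }
};

FaultInjector injector;

// Stand-in for a page read with an injection point in front of it.
ReadErr read_page(std::uint32_t /*page_no*/) {
  if (injector.should_fail()) return READ_CORRUPTED;
  return READ_OK;
}

// Stress loop: every injected failure must surface as an error code.
unsigned count_failures(unsigned n_reads) {
  unsigned failures = 0;
  for (unsigned i = 0; i < n_reads; i++)
    if (read_page(i) != READ_OK) failures++;
  return failures;
}
```

With the injector disabled, `count_failures(9)` returns 0; after setting `injector.enabled = true`, the same call reports 3 failures, one for every third read. A stress test would then assert that each such failure is handled by the caller without a crash.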

Matthias Leich added a comment - 2022-05-19 14:54

RQG testing on the development tree

    origin/bb-10.6-MDEV-13542 dca6937a30eeee56f46fc47ac85656243227e4d2 2022-05-16T16:31:15+03:00

showed frequent

- SEGVs with a backtrace like the one in the description at the top of MDEV-25257
- cases where the server error log contains the message '[ERROR] mysqld got signal 11' but the server process did not disappear even after more than 300s of waiting, so there was neither a core file nor an rr trace

After applying Thiru's initial patch for MDEV-25257

    diff --git a/storage/innobase/dict/dict0load.cc b/storage/innobase/dict/dict0load.cc
    index 20916c2c96e..2c3d48b9573 100644
    --- a/storage/innobase/dict/dict0load.cc
    +++ b/storage/innobase/dict/dict0load.cc
    @@ -1383,6 +1383,7 @@ dict_load_columns(
                            the flag is set before the table is created. */
                            if (table->fts == NULL) {
                                    table->fts = fts_create(table);
    +                               table->fts->cache = fts_cache_create(table);
                            }
                            ut_a(table->fts->doc_col == ULINT_UNDEFINED);

to the tree bb-10.6-MDEV-13542, neither problem was seen again in testing.

A new dominant problem then showed up.

    [rr 3532891 209362]mysqld: /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2682: buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool): Assertion `mode == 11 || mode == 12 || block->zip_size() == zip_size' failed.
    (rr) bt
    #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    #1  0x00007fcf37482859 in __GI_abort () at abort.c:79
    #2  0x00007fcf37482729 in __assert_fail_base (fmt=0x7fcf37618588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55996a841da0 "mode == 11 || mode == 12 || block->zip_size() == zip_size", file=0x55996a83d680 "/data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc", line=2682, function=<optimized out>) at assert.c:92
    #3  0x00007fcf37493f36 in __GI___assert_fail (assertion=0x55996a841da0 "mode == 11 || mode == 12 || block->zip_size() == zip_size", file=0x55996a83d680 "/data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc", line=2682, function=0x55996a841880 "buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool)") at assert.c:101
    #4  0x00005599696fa35f in buf_page_get_low (page_id=..., zip_size=0, rw_latch=8, guess=0x0, mode=16, mtr=0x7fcf1a61aec0, err=0x7fcf1a61a770, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2682
    #5  0x00005599696fbaad in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=8, guess=0x0, mode=16, mtr=0x7fcf1a61aec0, err=0x7fcf1a61a770, allow_ibuf_merge=false) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/buf/buf0buf.cc:2984
    #6  0x00005599696ae711 in btr_cur_pess_upd_restore_supremum (block=0x7fcf2aea57e0, rec=0x7fcf2b49e781 "", mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/btr/btr0cur.cc:4782
    #7  0x00005599696b1f88 in btr_cur_pessimistic_update (flags=10, cursor=0x61100005f180, offsets=0x7fcf1a61aac0, offsets_heap=0x7fcf1a61abc0, entry_heap=0x619000bf9080, big_rec=0x7fcf1a61aaa0, update=0x62100026f3f0, cmpl_info=0, thr=0x62100026f750, trx_id=1719, mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/btr/btr0cur.cc:5243
    #8  0x000055996956ff1b in row_upd_clust_rec (flags=0, node=0x62100026f2c0, index=0x616002f91a08, offsets=0x619000bf9850, offsets_heap=0x7fcf1a61abc0, thr=0x62100026f750, mtr=0x7fcf1a61aec0) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2456
    #9  0x0000559969571812 in row_upd_clust_step (node=0x62100026f2c0, thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2729
    #10 0x0000559969571f69 in row_upd (node=0x62100026f2c0, thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2792
    #11 0x0000559969572a6b in row_upd_step (thr=0x62100026f750) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0upd.cc:2934
    #12 0x00005599694d46e5 in row_update_for_mysql (prebuilt=0x62100026e988) at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/row/row0mysql.cc:1691
    #13 0x000055996914b046 in ha_innobase::update_row (this=0x61d0007f94b8, old_row=0x61a0003c80f8 <incomplete sequence \375>, new_row=0x61a0003c7eb8 "\265") at /data/Server/bb-10.6-MDEV-13542G/storage/innobase/handler/ha_innodb.cc:8668
    #14 0x00005599688687dd in handler::ha_update_row (this=0x61d0007f94b8, old_data=0x61a0003c80f8 <incomplete sequence \375>, new_data=0x61a0003c7eb8 "\265") at /data/Server/bb-10.6-MDEV-13542G/sql/handler.cc:7602
    #15 0x00005599683484c9 in mysql_update (thd=0x62b00009a218, table_list=0x62b0000a13f8, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551602, ignore=false, found_return=0x7fcf1a61c2f0, updated_return=0x7fcf1a61c310) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_update.cc:1087
    #16 0x000055996806794f in mysql_execute_command (thd=0x62b00009a218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:4423
    #17 0x000055996807fd40 in mysql_parse (thd=0x62b00009a218, rawbuf=0x62b0000a1238 "UPDATE t2 SET col_text = REPEAT(SUBSTR(CAST( 2361 AS CHAR),1,1), @fill_amount)  /* E_R Thread2 QNO 1139 CON_ID 16 */", length=116, parser_state=0x7fcf1a61cb20) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:8045
    #18 0x0000559968058133 in dispatch_command (command=COM_QUERY, thd=0x62b00009a218, packet=0x629000c53219 " UPDATE t2 SET col_text = REPEAT(SUBSTR(CAST( 2361 AS CHAR),1,1), @fill_amount)  /* E_R Thread2 QNO 1139 CON_ID 16 */ ", packet_length=118, blocking=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:1912
    #19 0x0000559968055369 in do_command (thd=0x62b00009a218, blocking=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_parse.cc:1409
    #20 0x0000559968458c53 in do_handle_one_connection (connect=0x608000002eb8, put_in_cache=true) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_connect.cc:1418
    #21 0x00005599684584df in handle_one_connection (arg=0x608000002838) at /data/Server/bb-10.6-MDEV-13542G/sql/sql_connect.cc:1312
    #22 0x00007fcf379ac609 in start_thread (arg=<optimized out>) at pthread_create.c:477
    #23 0x00007fcf3757f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    (rr)

pluto:/data/results/1652901598/TBR-1503/dev/shm/rqg/1652901598/57/1/rr

This problem disappeared after applying Thiru's cursor_restore_supremum.patch.

Marko Mäkelä added a comment - 2022-05-23 15:05

thiru, thank you for cursor_restore_supremum.patch. I hope that my simpler solution works as well.

Even today, the fault injection exposed a couple of new crashes, which I fixed:

    ./mtr --parallel=auto --big-test --force --max-test-fail=0 --mysqld=--debug=d,intermittent_read_failure

My main effort today was to prevent crashes on change buffer merge failure (motivated by MDEV-28349) as well as on page split or merge. Such corruption should not be a direct consequence of a page checksum mismatch, but of something else (say, ~~MDEV-28312~~, a bug in Galera snapshot transfer).

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 7

Watchers: 25

Dates

Created: 2017-08-16 06:02

Updated: 2025-03-21 06:56

Resolved: 2022-06-06 14:56
