[MDEV-22650] Dirty compressed page checksum validation fails Created: 2020-05-21  Updated: 2020-06-02  Resolved: 2020-06-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.2, 10.5.3, 10.2, 10.3, 10.4
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: not-10.1

Issue Links:
Relates
relates to MDEV-11686 Multiple encryption tests fail in bui... Closed

 Description   

USE test;
CREATE TABLE t (c int) ENGINE=InnoDB key_block_size= 4;
SET GLOBAL innodb_buffer_pool_evict='uncompressed';
SET GLOBAL innodb_checksum_algorithm=strict_none;
SELECT SLEEP(10);  # Server crashes during sleep

Leads to:

10.5.3 cfe5ee90c8e4b9dfa98a41fcd299197a59261be7

InnoDB: Failing assertion: page_zip_verify_checksum(frame, bpage->zip_size())

10.5.3 cfe5ee90c8e4b9dfa98a41fcd299197a59261be7

Core was generated by `/test/MD110520-mariadb-10.5.3-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x14d160bff700 (LWP 1008144))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000055618e9fec11 in my_write_core (sig=sig@entry=6) at /test/10.5_dbg/mysys/stacktrace.c:518
#2  0x000055618e1a3f8d in handle_fatal_signal (sig=6) at /test/10.5_dbg/sql/signal_handler.cc:329
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x000014d17eafa801 in __GI_abort () at abort.c:79
#6  0x000055618e7baaae in ut_dbg_assertion_failed (expr=expr@entry=0x55618ee9abc0 "page_zip_verify_checksum(frame, bpage->zip_size())", file=file@entry=0x55618ee9a538 "/test/10.5_dbg/storage/innobase/buf/buf0flu.cc", line=line@entry=1135) at /test/10.5_dbg/storage/innobase/ut/ut0dbg.cc:60
#7  0x000055618e85177f in buf_flush_write_block_low (sync=false, flush_type=BUF_FLUSH_LIST, bpage=0x14d15d071318) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:1135
#8  buf_flush_page (bpage=bpage@entry=0x14d15d071318, flush_type=flush_type@entry=BUF_FLUSH_LIST, sync=sync@entry=false) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:1325
#9  0x000055618e852f8f in buf_flush_try_neighbors (n_to_flush=200, n_flushed=132, flush_type=BUF_FLUSH_LIST, page_id=...) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:1535
#10 buf_flush_page_and_try_neighbors (bpage=bpage@entry=0x14d15d071318, flush_type=flush_type@entry=BUF_FLUSH_LIST, n_to_flush=n_to_flush@entry=200, count=count@entry=0x14d160bfe9d8) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:1598
#11 0x000055618e8553c6 in buf_do_flush_list_batch (lsn_limit=18446744073709551615, min_n=200) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:1827
#12 buf_flush_batch (flush_type=flush_type@entry=BUF_FLUSH_LIST, min_n=min_n@entry=200, lsn_limit=lsn_limit@entry=18446744073709551615, n=n@entry=0x14d160bfec10) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:1895
#13 0x000055618e855964 in buf_flush_do_batch (type=type@entry=BUF_FLUSH_LIST, min_n=200, lsn_limit=lsn_limit@entry=18446744073709551615, n=n@entry=0x14d160bfec10) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:2018
#14 0x000055618e855aca in buf_flush_lists (min_n=<optimized out>, lsn_limit=lsn_limit@entry=18446744073709551615, n_processed=n_processed@entry=0x14d160bfeca8) at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:2077
#15 0x000055618e856466 in buf_flush_page_cleaner () at /test/10.5_dbg/storage/innobase/buf/buf0flu.cc:2996
#16 0x000014d17f7dd6db in start_thread (arg=0x14d160bff700) at pthread_create.c:463
#17 0x000014d17ebdb88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.2.32 (dbg), 10.3.23 (dbg), 10.4.13 (dbg), 10.5.2 (dbg), 10.5.3 (dbg)

Bug confirmed not present in:
MariaDB: 10.1.45 (dbg), 10.1.45 (opt), 10.2.32 (opt), 10.3.23 (opt), 10.4.13 (opt), 10.5.2 (opt), 10.5.3 (opt), 10.5.4 (dbg), 10.5.4 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.47 (dbg), 5.6.47 (opt), 5.7.29 (dbg), 5.7.29 (opt), 8.0.19 (dbg), 8.0.19 (opt)

Apparently not in 10.5.4?



 Comments   
Comment by Roel Van de Paar [ 2020-05-21 ]

Based on the stack frames seen, this is potentially related to the Valgrind issues seen in MDEV-11686.

Comment by Thirunarayanan Balathandayuthapani [ 2020-05-21 ]

The strict_* modes of innodb_checksum_algorithm do not accept a page whose stored checksum was written with a different algorithm.

1) The page is initially created with a crc32 checksum.
2) The user changes the checksum algorithm to STRICT_NONE.
3) When the page that was created with crc32 is processed, it fails the STRICT_NONE check.

I think the assertion is too strict and is wrong in a few cases. InnoDB should remove the assertion.
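
The three steps above can be illustrated with a small toy model in Python. It is only a sketch under stated assumptions, not InnoDB code: `stamp` and `verify` are hypothetical names, `zlib.crc32` and `zlib.adler32` are stand-ins for InnoDB's real crc32 and legacy checksums, and the 0xDEADBEEF marker for "none" is assumed. The non-strict branch is simplified to "accept any known checksum".

```python
import zlib

# Marker assumed to be stored in the checksum field when checksums are disabled.
NO_CHECKSUM_MAGIC = 0xDEADBEEF


def stamp(algorithm, payload):
    """Return the value a writer using `algorithm` would store for `payload`."""
    if algorithm == "crc32":
        return zlib.crc32(payload) & 0xFFFFFFFF    # stand-in, not InnoDB's CRC-32C
    if algorithm == "innodb":
        return zlib.adler32(payload) & 0xFFFFFFFF  # stand-in for the legacy checksum
    if algorithm == "none":
        return NO_CHECKSUM_MAGIC
    raise ValueError(algorithm)


def verify(setting, payload, stored):
    """Toy check: strict_* accepts only its own algorithm, others accept any."""
    if setting.startswith("strict_"):
        return stored == stamp(setting[len("strict_"):], payload)
    return any(stored == stamp(a, payload) for a in ("crc32", "innodb", "none"))


payload = b"compressed page body"
stored = stamp("crc32", payload)               # step 1: page written with crc32
print(verify("crc32", payload, stored))        # same algorithm: accepted
print(verify("strict_none", payload, stored))  # steps 2 and 3: rejected
```

In this model the crc32-stamped page passes under `crc32` and under the non-strict `none`, and is rejected only once the setting becomes `strict_none`, which mirrors the failing assertion in the report.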

Comment by Marko Mäkelä [ 2020-05-21 ]

Starting with 10.4, files created with innodb_checksum_algorithm=full_crc32 will always use that algorithm. But the algorithm is intentionally not available for ROW_FORMAT=COMPRESSED (we did not want to introduce a new file format for such tables).

The crash is claimed to occur during page flushing, not when reading a page into the buffer pool. The checksum should be calculated during the flushing. So, this report seems legitimate to me, even though I see no valid use case for using anything other than innodb_checksum_algorithm=full_crc32 or innodb_checksum_algorithm=crc32.

Comment by Thirunarayanan Balathandayuthapani [ 2020-05-22 ]

buf_LRU_free_page() writes the checksum value for a compressed page.
buf_LRU_free_page() is triggered by SET GLOBAL innodb_buffer_pool_evict='uncompressed';
That is why we get the crash after changing the algorithm value.
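
The ordering described here (checksum stamped at eviction time, setting changed afterwards, page validated at flush time) can be sketched as a toy timeline in Python. This is an illustration under assumptions, not the server's implementation: `ToyServer`, `evict_uncompressed` and `flush_check` are hypothetical names, `zlib.crc32` is a stand-in for the real checksum, and the 0xDEADBEEF marker for "none" is assumed.

```python
import zlib

NO_CHECKSUM_MAGIC = 0xDEADBEEF  # assumed marker for "no checksum"


class ToyServer:
    """Toy model: one global checksum setting shared by eviction and flushing."""

    def __init__(self):
        self.checksum_algorithm = "crc32"

    def _expected(self, payload):
        # Value the current setting would store (strict_ prefix ignored here).
        base = self.checksum_algorithm.replace("strict_", "", 1)
        if base == "none":
            return NO_CHECKSUM_MAGIC
        return zlib.crc32(payload) & 0xFFFFFFFF

    def evict_uncompressed(self, page):
        # Eviction stamps the compressed copy using the setting in force now.
        page["stored"] = self._expected(page["payload"])

    def flush_check(self, page):
        # Flushing validates against the setting in force at flush time.
        if self.checksum_algorithm.startswith("strict_"):
            return page["stored"] == self._expected(page["payload"])
        crc = zlib.crc32(page["payload"]) & 0xFFFFFFFF
        return page["stored"] in (crc, NO_CHECKSUM_MAGIC)


server = ToyServer()
page = {"payload": b"dirty compressed page", "stored": None}
server.evict_uncompressed(page)            # stamped while the setting is crc32
server.checksum_algorithm = "strict_none"  # setting changed before the flush
print(server.flush_check(page))            # stale stamp fails the strict check
```

The point of the sketch is only the timing: the stored value reflects the setting at eviction, while the check reflects the setting at flush, so a change in between makes a strict check fail on a page that is not actually corrupted.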

Comment by Thirunarayanan Balathandayuthapani [ 2020-05-28 ]

The patch is in bb-10.2-thiru.

Comment by Marko Mäkelä [ 2020-05-28 ]

The code change looks OK to me. I would recommend not adding the test case, because innodb_checksum_algorithm=strict_none could cause crashes in background operations (change buffer merge, purge).

Comment by Roel Van de Paar [ 2020-06-02 ]

Secondary test case for verifying any fix:

USE test;
CREATE TABLE t(a INT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
SET GLOBAL innodb_buffer_pool_evict='uncompressed';
SET GLOBAL innodb_checksum_algorithm=3;
SELECT SLEEP(5);

Generated at Thu Feb 08 09:16:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.