[MDEV-29435] CHECK TABLE forgets to release latches after reporting failure Created: 2022-09-01  Updated: 2022-09-08  Resolved: 2022-09-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: regression-10.6

Issue Links:
Problem/Incident
is caused by MDEV-13542 Crashing on a corrupted page is unhel... Closed

 Description   

As part of MDEV-13542, the CHECK TABLE code was refactored to avoid crashes on corrupted data. mleich produced a core dump in which something had corrupted a table, after which the server crashed during shutdown:

10.6 92032499874259bae7455130958ea7f38c4d53a3

# 2022-08-31T09:36:38 [1748421] | Version: '10.6.10-MariaDB-debug-log'  socket: '/dev/shm/rqg/1661956911/39/1_clone/mysql.sock'  port: 25364  Source distribution
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:34:54 5 [Warning] InnoDB: Cannot save statistics for table test.t8 because file ./test/t8.ibd cannot be decrypted.
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:34:55 5 [ERROR] InnoDB: In page 14 of index PRIMARY of table test.t8
# 2022-08-31T09:36:38 [1748421] | InnoDB: broken FIL_PAGE_NEXT link
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:34:55 5 [ERROR] InnoDB: In page 42 of index k of table test.t8
# 2022-08-31T09:36:38 [1748421] | InnoDB: broken FIL_PAGE_NEXT link
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:35:04 0 [Note] /data/Server_bin/bb-10.6-MDEV-29374_asan/bin/mysqld (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:35:04 0 [Note] InnoDB: FTS optimize thread exiting.
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:35:04 0 [Note] InnoDB: Starting shutdown...
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:35:04 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/rqg/1661956911/39/1_clone/data/ib_buffer_pool
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:35:04 0 [Note] InnoDB: Restricted to 95 pages due to innodb_buf_pool_dump_pct=25
# 2022-08-31T09:36:38 [1748421] | 2022-08-31  9:35:04 0 [Note] InnoDB: Buffer pool(s) dump completed at 220831  9:35:04
# 2022-08-31T09:36:38 [1748421] | mysqld: /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/sux_lock.h:79: void sux_lock<ssux>::free() [with ssux = ssux_lock_impl<false>]: Assertion `!writer.load(std::memory_order_relaxed)' failed.

The "broken FIL_PAGE_NEXT link" messages were reported by btr_validate_level(), which is executed as part of a non-QUICK CHECK TABLE. The crash appears to occur because we forgot to release the index latch (by committing the mini-transaction) after reporting the corruption:

diff --git a/storage/innobase/btr/btr0btr.cc b/storage/innobase/btr/btr0btr.cc
index 772ac99a5d5..3e48955e85a 100644
--- a/storage/innobase/btr/btr0btr.cc
+++ b/storage/innobase/btr/btr0btr.cc
@@ -4879,6 +4879,7 @@ btr_validate_level(
 loop:
 	if (!block) {
 invalid_page:
+		mtr.commit();
 func_exit:
 		mem_heap_free(heap);
 		return err;
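To make the failure mode concrete, here is a minimal C++ sketch under a simplified model. `toy_latch`, `toy_mtr`, `toy_validate_level()` and `toy_err` are hypothetical stand-ins, not InnoDB's `sux_lock`, `mtr_t`, `btr_validate_level()` or `dberr_t`. The sketch only mimics the control flow of the patch: the error path jumps to `invalid_page`, and without the added `mtr.commit()` the latch stays write-locked. That leftover writer flag is the state that the `!writer.load(std::memory_order_relaxed)` assertion in `sux_lock<>::free()` caught at shutdown.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an index latch. It models only the "writer"
// flag that the failing debug assertion checked; this is not InnoDB code.
struct toy_latch {
  std::atomic<bool> writer{false};
  void x_lock() { assert(!writer.load()); writer.store(true); }
  void x_unlock() { assert(writer.load()); writer.store(false); }
  // Analogue of the check in sux_lock<>::free(): a latch may only be
  // freed when no writer still holds it.
  bool can_free() const { return !writer.load(std::memory_order_relaxed); }
};

// Hypothetical mini-transaction: remembers the latch it acquired and
// releases it on commit(), loosely like a mini-transaction commit.
struct toy_mtr {
  toy_latch* held = nullptr;
  void x_lock(toy_latch& l) { l.x_lock(); held = &l; }
  void commit() {
    if (held) { held->x_unlock(); held = nullptr; }
  }
};

enum class toy_err { SUCCESS, CORRUPTION };

// Control flow shaped like the patched function: when the page cannot be
// used, jump to invalid_page. With fixed == false, the error path skips
// mtr.commit(), so the latch is leaked (the bug); fixed == true applies
// the one-line fix from the diff.
toy_err toy_validate_level(toy_latch& index_lock, bool page_ok, bool fixed) {
  toy_mtr mtr;
  toy_err err = toy_err::SUCCESS;
  mtr.x_lock(index_lock);
  if (!page_ok) {
    err = toy_err::CORRUPTION;  // corruption would be reported here
    goto invalid_page;
  }
  // ... page validation would happen here ...
  mtr.commit();  // the normal path releases the latch
  goto func_exit;
invalid_page:
  if (fixed) mtr.commit();  // the line added by the patch
func_exit:
  return err;
}
```

For example, `toy_validate_level(latch, false, false)` returns `toy_err::CORRUPTION` and leaves `latch.can_free()` false, so the table's latch would remain held until shutdown; passing `fixed = true` releases it on the same error path.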

The messages about failing to save persistent statistics seem to be unrelated to this, because that code does not access dict_index_t::lock at all.

As for the cause of the corruption itself, I think an rr replay trace will be needed.



 Comments   
Comment by Sergei Golubchik [ 2022-09-08 ]

Summary: if a non-quick CHECK TABLE detects a corruption of an InnoDB table, the corrupted table might stay locked until server shutdown.
