[MDEV-24589] DROP TABLE is not crash-safe Created: 2021-01-14  Updated: 2021-05-21  Resolved: 2021-04-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.6.0

Type: Bug Priority: Blocker
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Blocks
blocks MDEV-17567 Atomic DDL Closed
blocks MDEV-18518 Implement atomic multi-table (or mult... Closed
blocks MDEV-25180 Atomic ALTER TABLE Closed
is blocked by MDEV-23844 Atomic DROP TABLE (single table) Closed
Problem/Incident
causes MDEV-25265 ALTER TABLE…IMPORT TABLESPACE fails a... Closed
causes MDEV-25491 Race condition between DROP TABLE and... Closed
causes MDEV-25522 Assertion `page_is_leaf(block->frame)... Closed
Relates
relates to MDEV-18518 Implement atomic multi-table (or mult... Closed
relates to MDEV-24569 Assertion `mach_read_from_4(frame + 4... Closed

 Description   

Workflow:
1. Start the server
2. Run a DDL/DML mix by 9 sessions
3. During 2. is ongoing kill the server
4. Restart attempt with success
5. Certain SQLs
      # 2021-01-13T09:36:00 [3527603] INFO: Reporter 'CrashRecovery1': Executing CHECK TABLE `test`.`t6` EXTENDED.
     # 2021-01-13T09:36:05 [3527603] INFO: Reporter 'CrashRecovery1': Executing ANALYZE TABLE `test`.`t6`.
     # 2021-01-13T09:36:06 [3527603] INFO: Reporter 'CrashRecovery1': Executing OPTIMIZE TABLE `test`.`t6`.
     # 2021-01-13T09:36:15 [3527603] INFO: Reporter 'CrashRecovery1': Executing REPAIR TABLE `test`.`t6` EXTENDED.
     # 2021-01-13T09:36:15 [3527603] INFO: Reporter 'CrashRecovery1': Executing ALTER TABLE `test`.`t6` FORCE.
     # 2021-01-13T09:37:03 [3527603] WARN: Auxpid 3743269 exited with exit status 134.  <== The process which started the server under "rr" exited.
 
Some pointers may be invalid and cause the dump to abort.
 
# 2021-01-13T09:37:03 [3527603] | [rr 3743312 163255]Query (0x0): [rr 3743312 163257](null)[rr 3743312 163259]
# 2021-01-13T09:37:03 [3527603] | Connection ID (thread ID): 0
# 2021-01-13T09:37:03 [3527603] | [rr 3743312 163261]Status: NOT_KILLED
 
sdp:/home/mleich/RQG_O/storage/1610556037/tmp1/dev/shm/vardir/1610556037/232/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
     (rr) bt
 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000032f55d44e859 in __GI_abort () at abort.c:79
#2  0x000032f55d44e729 in __assert_fail_base (fmt=0x32f55d5e4588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5558f9fcfd60 "index->id == btr_page_get_index_id(page)",  file=0x5558f9fcd660 "/Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc", line=1811, function=<optimized out>) at assert.c:92
#3  0x000032f55d45ff36 in __GI___assert_fail (assertion=0x5558f9fcfd60 "index->id == btr_page_get_index_id(page)", file=0x5558f9fcd660 "/Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc", line=1811, 
    function=0x5558f9fcef00 "dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t)") at assert.c:101
#4  0x00005558f890c882 in btr_cur_search_to_nth_level_func (index=0x61700008afa0, level=0, tuple=0x625001a1fdc0, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x61f000000540, ahi_latch=0x0, 
    file=0x5558f9eb08a0 "/Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc", line=1213, mtr=0x233825740060, autoinc=0) at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc:1811
#5  0x00005558f873fbdf in btr_pcur_open_low (index=0x61700008afa0, level=0, tuple=0x625001a1fdc0, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x61f000000540, file=0x5558f9eb08a0 "/Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc", line=1213, autoinc=0, 
    mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/include/btr0pcur.ic:441
#6  0x00005558f87461f6 in row_search_on_row_ref (pcur=0x61f000000540, mode=2, table=0x618000040920, ref=0x625001a1fdc0, mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc:1213
#7  0x00005558f872edfd in row_purge_reposition_pcur (mode=2, node=0x61f0000004a0, mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:78
#8  0x00005558f8731a05 in row_purge_reset_trx_id (node=0x61f0000004a0, mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:658
#9  0x00005558f87351ec in row_purge_record_func (node=0x61f0000004a0, undo_rec=0x62500194cf60 "", thr=0x61f0000003f0, updated_extern=false) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:1064
#10 0x00005558f873558b in row_purge (node=0x61f0000004a0, undo_rec=0x62500194cf60 "", thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:1108
#11 0x00005558f87358d1 in row_purge_step (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:1157
#12 0x00005558f860c887 in que_thr_step (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:946
#13 0x00005558f860ccf3 in que_run_threads_low (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1008
#14 0x00005558f860d152 in que_run_threads (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1048
#15 0x00005558f8825d1f in trx_purge (n_tasks=4, truncate=false) at /Server/bb-10.5-MDEV-24569/storage/innobase/trx/trx0purge.cc:1300
#16 0x00005558f87bc7bc in srv_do_purge (n_total_purged=0x2338257409a0) at /Server/bb-10.5-MDEV-24569/storage/innobase/srv/srv0srv.cc:1997
#17 0x00005558f87bd281 in purge_coordinator_callback_low () at /Server/bb-10.5-MDEV-24569/storage/innobase/srv/srv0srv.cc:2087
#18 0x00005558f87bd4e5 in purge_coordinator_callback () at /Server/bb-10.5-MDEV-24569/storage/innobase/srv/srv0srv.cc:2116
#19 0x00005558f8bf77d4 in tpool::task_group::execute (this=0x5558fbf541c0 <purge_coordinator_task_group>, t=0x5558fbf54280 <purge_coordinator_task>) at /Server/bb-10.5-MDEV-24569/tpool/task_group.cc:55
#20 0x00005558f8bf80db in tpool::task::execute (this=0x5558fbf54280 <purge_coordinator_task>) at /Server/bb-10.5-MDEV-24569/tpool/task.cc:47
#21 0x00005558f8be4099 in tpool::thread_pool_generic::worker_main (this=0x618000000480, thread_var=0x63000001a580) at /Server/bb-10.5-MDEV-24569/tpool/tpool_generic.cc:546
#22 0x00005558f8bf7235 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
    @0x6040000c0f28: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x5558f8be3eb6 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x6040000c0f20: 0x618000000480) at /usr/include/c++/9/bits/invoke.h:73
#23 0x00005558f8bf6fe0 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
    @0x6040000c0f28: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x5558f8be3eb6 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
#24 0x00005558f8bf6e71 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x6040000c0f18) at /usr/include/c++/9/thread:244
#25 0x00005558f8bf6dbb in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x6040000c0f18) at /usr/include/c++/9/thread:251
#26 0x00005558f8bf6d20 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x6040000c0f10)
    at /usr/include/c++/9/thread:195
#27 0x00003f787144fd84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#28 0x00003f787135f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x000032f55d54b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
 origin/bb-10.5-MDEV-24569 2181fa74fdda0bba8a2cc92b555e39d0b9162877 2021-01-13T21:54:30+05:30
 
perl rqg.pl \ 
--no-mask \
--seed=1610558771 \
--duration=300 \
--engine=InnoDB \
--queries=100000000 \
--mysqld=--loose_innodb_use_native_aio=1 \
--reporters=Backtrace,ErrorLog,Deadlock1,CrashRecovery1 \
--mysqld=--innodb_undo_tablespaces=3 \
--mysqld=--innodb_immediate_scrub_data_uncompressed=1 \
--mysqld=--innodb_file_per_table=0 \
--restart_timeout=600 \
--max_gd_duration=1200 \
--mysqld=--innodb_page_size=8K \
--mysqld=--innodb-buffer-pool-size=8M \
--mysqld=--log_output=none \
--grammar=conf/mariadb/innodb_compression_encryption.yy \
--gendata=conf/mariadb/innodb_compression_encryption.zz \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=/home/mleich/RQG_O/conf/mariadb/encryption_keys.txt \
--threads=9 \
--duration=300 \
--no_mask \
--workdir=/home/mleich/RQG_O/storage/1610556037/232 \
--vardir=/dev/shm/vardir/1610556037/232 \
--mtr-build-thread=961 \
--basedir1=/Server_bin/bb-10.5-MDEV-24569_asan \
--basedir2=/Server_bin/bb-10.5-MDEV-24569_asan \
--script_debug=_nix_ \
--rr=Extended \
--rr_options=--chaos
(rr)
 



 Comments   
Comment by Marko Mäkelä [ 2021-01-14 ]

As analyzed in MDEV-24569, if the server is killed between the time DROP TABLE has started to free pages and the commit of the DDL transaction, on restart it is possible that some pages of the table have already been freed. If those pages are reused for something else (which is very easy to achieve with innodb_file_per_table=0), an attempt to access the halfway-dropped table will likely result to the assertion failure.

While implementing MDEV-18518 would fix this, I think that there is a simpler fix that would be technically doable in MariaDB 10.2.19 or later.

  1. DROP TABLE would first rename the table to #sql-ib name and make that change durable
  2. If the server is killed before the table with the #sql-ib name is completely dropped, recovery should complete dropping the table, without allowing it to be accessed.

mleich, would the following crude patch fix the problem in 10.5 (where MDEV-24569 has now been fixed)?

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 23b38b23d33..48b820ba3d7 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -3489,7 +3489,7 @@ row_drop_table_for_mysql(
 	they can cope with the table having been dropped here? Foreign key
 	checks take an IS or IX lock on the table. */
 
-	if (table->n_foreign_key_checks_running > 0) {
+	if (true) {
 defer:
 		/* Rename #sql-backup to #sql-ib if table has open ref count
 		while dropping the table. This scenario can happen

Comment by Marko Mäkelä [ 2021-03-22 ]

It looks like this could be fixed by MDEV-23844 (Atomic DROP TABLE).

Comment by Marko Mäkelä [ 2021-03-23 ]

It seems to me that we can just move the dict_drop_index_tree() call from row_upd_clust_step() to row_purge_remove_clust_if_poss_low() to have mostly safe DDL operations in this respect. Years ago, someone (maybe Roel) filed a MySQL bug that when using a small value of innodb_trx_rseg_n_slots_debug, we would hit trouble on ALTER TABLE. With my patch, that use case should work just fine. If we run out of undo log space during ALTER TABLE, we would just roll back the operation, and no index trees would have been dropped.

Comment by Marko Mäkelä [ 2021-03-25 ]

During stress testing, only unrelated errors were found:

  • table->fts->cache is NULL after innobase_reload_table() on ALTER TABLE, causing a SIGSEGV a little later in an INSERT operation.
  • A latching order violation was found in ibuf_remove_free_page() between fil_system.sys_space->latch and ibuf_pessimistic_insert_mutex. This violation is there since the very first InnoDB version. Either the latching order is defined wrong, or the tablespace latch should be acquired after the mutex.

mleich will file these as separate tickets.

Comment by Marko Mäkelä [ 2021-04-22 ]

I observed a one-time failure related to this:

10.6 from yesterday or today with some changes

CURRENT_TEST: stress.ddl_innodb
mysqltest: In included file "./suite/stress/include/ddl7.inc": 
included from /mariadb/10.6-merge/mysql-test/suite/stress/t/ddl_innodb.test at line 43:
At line 272: query 'DROP TABLE t1' failed: 2013: Lost connection to server during query
2021-04-22  9:32:55 0 [Note] InnoDB: trying to read page [page id: space=17886, page number=366] in nonexisting or being-dropped tablespace
2021-04-22  9:32:55 0 [Note] InnoDB: trying to read page [page id: space=17886, page number=366] in nonexisting or being-dropped tablespace
2021-04-22  9:32:55 0 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=17886, page number=366] into the buffer pool after 100. The most probable cause of this error may be that the table has been corrupted. See https://mariadb.com/kb/en/library/innodb-recovery-modes/
#7  0x0000560d25c3dab5 in buf_page_get_gen (page_id=<optimized out>, page_id@entry={m_id = 76819785056622}, zip_size=<optimized out>, zip_size@entry=0, rw_latch=<optimized out>, rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=<optimized out>, mode@entry=10, mtr=<optimized out>, mtr@entry=0x7fdd19ffa238, err=0x0, allow_ibuf_merge=<optimized out>) at /mariadb/10.6-merge/storage/innobase/buf/buf0buf.cc:3104
#8  0x0000560d25bd2ec1 in btr_free_root_check (page_id=page_id@entry={m_id = 76819785056622}, zip_size=0, index_id=30915, mtr=0x7fdd19ffa238, mtr@entry=0x78c3) at /mariadb/10.6-merge/storage/innobase/btr/btr0btr.cc:963
#9  btr_free_if_exists (page_id=page_id@entry={m_id = 76819785056622}, zip_size=0, index_id=30915, mtr=mtr@entry=0x7fdd19ffa238) at /mariadb/10.6-merge/storage/innobase/btr/btr0btr.cc:1240
#10 0x0000560d25c6d5bd in dict_drop_index_tree (pcur=<optimized out>, pcur@entry=0x560d27652e68, trx=trx@entry=0x0, mtr=mtr@entry=0x7fdd19ffa238) at /mariadb/10.6-merge/storage/innobase/dict/dict0crea.cc:955
#11 0x0000560d25b3b62b in row_purge_remove_clust_if_poss_low (node=node@entry=0x560d27652dc8, mode=mode@entry=2) at /mariadb/10.6-merge/storage/innobase/row/row0purge.cc:125
#12 0x0000560d25b36204 in row_purge_remove_clust_if_poss (node=<optimized out>) at /mariadb/10.6-merge/storage/innobase/row/row0purge.cc:204

During this time, DROP TABLE had set the STOPPING flag, which caused buf_page_get_gen(…, BUF_GET, …) to fail like that:

#2  0x0000560d25cbfa9b in std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > (__rtime=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/thread:401
#3  fil_check_pending_operations (id=<optimized out>, id@entry=17886, truncate=false, path=path@entry=0x7fddc84ac830) at /mariadb/10.6-merge/storage/innobase/fil/fil0fil.cc:1664
#4  0x0000560d25cc0533 in fil_delete_tablespace (id=17886, if_exists=false, detached_handles=detached_handles@entry=0x7fddc84ace10) at /mariadb/10.6-merge/storage/innobase/fil/fil0fil.cc:1765
#5  0x0000560d25b15cde in row_drop_table_for_mysql (name=<optimized out>, name@entry=0x7fddc84ad600 "test/t1", trx=trx@entry=0x7fddc87643d8, sqlcom=sqlcom@entry=SQLCOM_DROP_TABLE, create_failed=<optimized out>, nonatomic=<optimized out>) at /mariadb/10.6-merge/storage/innobase/row/row0mysql.cc:3674

It is entirely possible that the tablespace will be marked inaccessible (the fil_space_t::STOPPING flag will be set) at the very moment when purge is processing a SYS_INDEXES for the same table.

Therefore, we must revise btr_free_if_exists() so that instead of BUF_GET, the mode BUF_GET_POSSIBLY_FREED will be used, and the operation will be aborted if the STOPPING flag has been set in the tablespace.

Comment by Marko Mäkelä [ 2021-04-22 ]

Because DDL operations will not be entirely crash-safe before MDEV-17567 (and MDEV-25180 in particular), and because the race condition between purge and DROP TABLE is not easy to repeat and could require extensive fixes, it is safest to revert the fix from 10.5 and target only the major version where MDEV-25180 will be pushed.

Comment by Marko Mäkelä [ 2021-04-22 ]

The original change will remain in the 10.6.0 alpha release despite the regression, which was filed as MDEV-25491.

Generated at Thu Feb 08 09:31:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.