[MDEV-26990] DDL_LOG error in the log and errno 168 Unknown (generic) error from engine upon concurrent DDL and DML Created: 2021-11-07  Updated: 2024-01-26

Status: Stalled
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Locking
Affects Version/s: 10.6, 10.7, 10.9, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 10.6, 10.11, 11.0, 11.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: foreign-keys

Issue Links:
Relates
relates to MDEV-33104 Assertion `table.get_ref_count() <= 1... Closed

 Description   

In the test case below, I don't see any further effects, except for the error on the log and the extra error in SHOW WARNINGS. So, if it's innocent, it's just a cosmetic issue (should probably be a warning in the log and nothing at all in the client), and then it would be Minor.
However, in stress tests I see similar DDL_LOG errors in connection with corruption-like outcome, with errors like "table doesn't exist in engine". It can be a coincidence or can be a real relation, in the latter case as a corruption it would probably be Critical.
So, I will keep it Major for now, as a middle ground. Please adjust according to results of the analysis.

The test case is highly non-deterministic, but it's fast, so run with a big enough --repeat-N. On my machine, it usually fails within 50-100 attempts, but it can vary on different machines and builds.

--source include/have_innodb.inc
 
CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY, f INT) ENGINE=InnoDB;
INSERT INTO t1 (f) VALUES (1),(2),(3),(4),(5);
CREATE TABLE t2 (pk INT AUTO_INCREMENT PRIMARY KEY, f INT, FOREIGN KEY (pk) REFERENCES t1(pk)) ENGINE=InnoDB;
SET innodb_lock_wait_timeout= 0;
START TRANSACTION;
 
--connect (con1,localhost,root,,test)
SET innodb_lock_wait_timeout= 0;
--send
  RENAME TABLE t1 TO tmp1, tmp1 TO tmp2, tmp2 TO t1;
 
--connection default
--error 0,ER_LOCK_WAIT_TIMEOUT
INSERT INTO t2 (f) VALUES (1),(2),(3),(4),(5);
 
--connection con1
--error 0,ER_LOCK_WAIT_TIMEOUT
--reap
SHOW WARNINGS;
--error 0,1
--exec grep DDL_LOG $MYSQLTEST_VARDIR/log/mysqld.1.err
if ($sys_errno == 0)
{
  --die # Found DDL_LOG error in the log
}
 
--connection default
DROP TABLE t2, t1;

bb-10.6-release 109fc67d4

SHOW WARNINGS;
Level	Code	Message
Error	1205	Lock wait timeout exceeded; try restarting transaction
Error	1025	Error on rename of './test/tmp1' to './test/tmp2' (errno: 168 "Unknown (generic) error from engine")
Error	1205	Lock wait timeout exceeded; try restarting transaction

2021-11-07 18:02:10 220 [ERROR] DDL_LOG: Got error 1205 when trying to execute action for entry 1 of type 'rename table'

Reproducible on 10.6-10.7.



 Comments   
Comment by Elena Stepanova [ 2023-06-08 ]

Still reproducible as of Q2.1 2023 release (10.6.14 etc.)

Comment by Marko Mäkelä [ 2024-01-05 ]

This test fails for me like this:

10.6 8bd5a3de7f44163deffc2c9cce570a6bee117678

mysqltest: At line 16: query 'INSERT INTO t2 (f) VALUES (1),(2),(3),(4),(5)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of  (0)...

After I added ER_LOCK_DEADLOCK as an alternative to the error ER_LOCK_WAIT_TIMEOUT, the test is almost always passing for me, with an occasional error

mysqltest: At line 26: # Found DDL_LOG error in the log
 
Warnings from just before the error:
Error 1205 Lock wait timeout exceeded; try restarting transaction 
Error 1025 Error on rename of './test/tmp1' to './test/tmp2' (errno: 168 "Unknown (generic) error from engine")

In that server error log I see the following:

10.6 8bd5a3de7f44163deffc2c9cce570a6bee117678

Version: '10.6.17-MariaDB-debug-log'  socket: '/mariadb/10.6/build/mysql-test/var/tmp/20/mysqld.1.sock'  port: 16300  Source distribution
2024-01-05 11:43:29 5 [ERROR] DDL_LOG: Got error 1205 when trying to execute action for entry 1 of type 'rename table'
2024-01-05 11:43:29 0 [Note] /mariadb/10.6/build/sql/mariadbd (initiated by: unknown): Normal shutdown
2024-01-05 11:43:29 0 [Note] /mariadb/10.6/build/sql/mariadbd: Shutdown complete

So, indeed the originally reported error seems to be reproducible. It is reproducible even if I create the tables with STATS_PERSISTENT=0.

Comment by Marko Mäkelä [ 2024-01-05 ]

Sorry, the deadlock error probably could have been due to a quick and dirty patch that I had applied to lock_clust_rec_read_check_and_lock() in order to catch one case of snapshot isolation violation, which started from a mailing list discussion related to https://jepsen.io/analyses/mysql-8.0.34:

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index c9072998e66..3b205e6be50 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -5924,6 +5924,13 @@ lock_clust_rec_read_check_and_lock(
 		return DB_SUCCESS;
 	}
 
+	if (auto offset = index->trx_id_offset) {
+		if (!trx->read_view.changes_visible(
+			    trx_read_trx_id(rec + offset))) {
+			return DB_DEADLOCK;
+		}
+	}
+
 	dberr_t err = lock_rec_lock(false, gap_mode | mode,
 				    block, heap_no, index, thr);
 

I got this reproduced with rr replay. We get a lock wait timeout reported for the attempt to acquire an exclusive lock on t2:

10.6 8bd5a3de7f44163deffc2c9cce570a6bee117678

#0  lock_table_for_trx (table=0x557afc0626e8, trx=0x2c9d50fd3680, mode=LOCK_X, no_wait=false) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:3976
#1  0x0000557b05c7972b in lock_table_children (table=0x557afc244d18, trx=0x2c9d50fd3680) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:4001
#2  0x0000557b05bc594d in ha_innobase::rename_table (this=0x6e52404c250, from=0x30b32e0e46f0 "./test/tmp1", to=0x30b32e0e44e0 "./test/t1") at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:14137
#3  0x0000557b05736efe in handler::ha_rename_table (this=0x6e52404c250, from=0x30b32e0e46f0 "./test/tmp1", to=0x30b32e0e44e0 "./test/t1") at /mariadb/10.6/sql/handler.cc:5397
#4  0x0000557b054a6c70 in execute_rename_table (ddl_log_entry=0x30b32e0e49a0, file=0x6e52404c250, from_db=0x30b32e0e49d0, from_table=0x30b32e0e49a0, to_db=0x30b32e0e49e0, to_table=0x30b32e0e49b0, flags=0, from_path=0x30b32e0e46f0 "./test/tmp1", to_path=0x30b32e0e44e0 "./test/t1") at /mariadb/10.6/sql/ddl_log.cc:1110
#5  0x0000557b054a7dbe in ddl_log_execute_action (thd=0x6e5240018d8, mem_root=0x30b32e0e4960, ddl_log_entry=0x30b32e0e49a0) at /mariadb/10.6/sql/ddl_log.cc:1458
#6  0x0000557b054aab23 in ddl_log_execute_entry_no_lock (thd=0x6e5240018d8, first_entry=3) at /mariadb/10.6/sql/ddl_log.cc:2414
#7  0x0000557b054abf7c in ddl_log_revert (thd=0x6e5240018d8, state=0x30b32e0e4b30) at /mariadb/10.6/sql/ddl_log.cc:2931
#8  0x0000557b05399cb2 in mysql_rename_tables (thd=0x6e5240018d8, table_list=0x6e5240140f8, silent=false, if_exists=false) at /mariadb/10.6/sql/sql_rename.cc:203
#9  0x0000557b053652dc in mysql_execute_command (thd=0x6e5240018d8, is_called_from_prepared_stmt=false) at /mariadb/10.6/sql/sql_parse.cc:4347
#10 0x0000557b053722b4 in mysql_parse (thd=0x6e5240018d8, rawbuf=0x6e524013fc0 "RENAME TABLE t1 TO tmp1, tmp1 TO tmp2, tmp2 TO t1", length=49, parser_state=0x30b32e0e5340) at /mariadb/10.6/sql/sql_parse.cc:8100
#11 0x0000557b0535dad1 in dispatch_command (command=COM_QUERY, thd=0x6e5240018d8, packet=0x6e52400bbd9 "RENAME TABLE t1 TO tmp1, tmp1 TO tmp2, tmp2 TO t1", packet_length=49, blocking=true) at /mariadb/10.6/sql/sql_parse.cc:1896

Table t2 contains a clause REFERENCES t1, which is why InnoDB needs to acquire an exclusive lock on it, to prevent concurrent DML from executing. A concurrently active transaction in connection default had acquired a conflicting table lock:

10.6 8bd5a3de7f44163deffc2c9cce570a6bee117678

#3  0x0000557b05c78ab3 in lock_table (table=0x557afc0626e8, fktable=0x0, mode=LOCK_IX, thr=0x557afc22dd08) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:3813
#4  0x0000557b05d725fb in row_ins_step (thr=0x557afc22dd08) at /mariadb/10.6/storage/innobase/row/row0ins.cc:3762
#5  0x0000557b05d9471f in row_insert_for_mysql (mysql_rec=0x557afc061838 "\375\001", prebuilt=0x557afc22d5d8, ins_mode=ROW_INS_NORMAL) at /mariadb/10.6/storage/innobase/row/row0mysql.cc:1315
#6  0x0000557b05bb4d27 in ha_innobase::write_row (this=0x557afc05de20, record=0x557afc061838 "\375\001") at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:7907
#7  0x0000557b0573e315 in handler::ha_write_row (this=0x557afc05de20, buf=0x557afc061838 "\375\001") at /mariadb/10.6/sql/handler.cc:7658
#8  0x0000557b0531201c in write_record (thd=0x557afc0020e8, table=0x557afc12d6f8, info=0x5c4863f5aa20, sink=0x0) at /mariadb/10.6/sql/sql_insert.cc:2168
#9  0x0000557b0530eaf1 in mysql_insert (thd=0x557afc0020e8, table_list=0x557afc02cb60, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /mariadb/10.6/sql/sql_insert.cc:1133
#10 0x0000557b05366218 in mysql_execute_command (thd=0x557afc0020e8, is_called_from_prepared_stmt=false) at /mariadb/10.6/sql/sql_parse.cc:4584
#11 0x0000557b053722b4 in mysql_parse (thd=0x557afc0020e8, rawbuf=0x557afc02ca60 "INSERT INTO t2 (f) VALUES (1),(2),(3),(4),(5)", length=45, parser_state=0x5c4863f5b340) at /mariadb/10.6/sql/sql_parse.cc:8100

I think that RENAME operations require a similar fix as MDEV-33104, which involves table-rebuilding operations such as ALTER TABLE or OPTIMIZE TABLE. That is, the RENAME TABLE operation needs to acquire MDL_EXCLUSIVE on t2 and report a deadlock at that level.

Comment by Marko Mäkelä [ 2024-01-26 ]

For the record, I tested this with a revised fix of MDEV-32899 (https://github.com/MariaDB/server/pull/3021), and there is no difference:

10.6-MDEV-32899 c851e172ea043985fc8d3cec46368004a174892d

2024-01-26  8:06:54 20 [ERROR] DDL_LOG: Got error 1205 when trying to execute action for entry 1 of type 'rename table'

Generated at Thu Feb 08 09:49:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.