[MDEV-20038] Assertion `rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)' failed. in btr_cur_search_to_nth_level_func Created: 2019-07-10  Updated: 2023-10-10

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.0.26, 10.0, 10.1, 10.3.11, 10.4.7, 10.2, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Vladislav Lesin
Resolution: Unresolved Votes: 1
Labels: not-10.6, rr-profile-analysed, upstream

Attachments: File MDEV-20038.tgz     File MDEV-20038.yy     HTML File prt     File simp_MDEV-20038.cfg    

 Description   

Assert hit during RQG testing on
10.4.7 commit 70c2bde931246ea4966d82fa56773b8ef1e0074f     2019-07-09
mysqld: storage/innobase/btr/btr0cur.cc:1444: dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t): Assertion `rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)' failed.
190710 19:41:21 [ERROR] mysqld got signal 6 ;
...
# 2019-07-10T19:41:42 [243109] | Query (0x7fd098011180): UPDATE t2 SET col1 = 1175 LIMIT 2  /* E_R Thread9 QNO 607 CON_ID 24 */
# 2019-07-10T19:41:42 [243109] | Connection ID (thread ID): 366
# 2019-07-10T19:41:42 [243109] | Status: NOT_KILLED



 Comments   
Comment by Matthias Leich [ 2019-07-11 ]

The problem could be 
- replayed too on 10.3.11 commit 620f4f8af98666e2efb7e14fb2663ab85b52bc12
- not replayed on 10.2.25 commit b90918dae33f8e3be4af2915984b5a4587e69f89  2019-05-21
 
prt == The output (with backtrace) of MTR during my test run on 10.3.
MDEV-20038.tgz == Archive with MTR based replay test case
 
How to replay the problem:
0. Use a Linux
1. cd <source>/mysql-test
2. tar xvzf MDEV-20038.tgz
     This unpacks a
     mysqltest_background.sh
     main/ml_MDEV-20038.test
3. cd <tree_with_binaries compiled with debug>/mysql-test
4. ./mysql-test-run.pl --mem ml_MDEV-20038
 
Please do not add my replay test case to the regression tests because
it consumes too much resources.
A perfect regression test should use debug_sync or similar.

Comment by Marko Mäkelä [ 2019-08-22 ]

The assertion fails, because the caller is holding index->lock in SX-mode and not S-mode. According to diagnostics, that SX-mode acquisition likely happened in row_ins_sec_mtr_start_and_check_if_aborted():

	if (search_mode & BTR_ALREADY_S_LATCHED) {
		mtr_s_lock(dict_index_get_lock(index), mtr);
	} else {
		mtr_sx_lock(dict_index_get_lock(index), mtr);
	}

That flag was not set; we had search_mode = BTR_INSERT | BTR_MODIFY_TREE at the time of the call, in row_ins_sec_index_entry_low():

		DEBUG_SYNC_C("row_ins_sec_index_unique");
 
		if (row_ins_sec_mtr_start_and_check_if_aborted(
			    &mtr, index, check, search_mode)) {
			goto func_exit;
		}
 
		err = row_ins_scan_sec_index_for_duplicate(
			flags, index, entry, thr, check, &mtr, offsets_heap);

The assertion failed during the execution of the last statement:

10.3 32ec5fb979760ba2db52d8afc39e8e090cf80a9c

#7  0x000055ceff9ceeda in btr_cur_search_to_nth_level_func (index=0x7fb2800cfab8, level=0, tuple=0x7fb288035388, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7fb2f4384298, ahi_latch=0x0, file=0x55cefff4e770 "/mariadb/10.3/storage/innobase/row/row0ins.cc", line=2169, mtr=0x7fb2f4385338, autoinc=0) at /mariadb/10.3/storage/innobase/btr/btr0cur.cc:1354
#8  0x000055ceff85d67d in btr_pcur_open_low (index=0x7fb2800cfab8, level=0, tuple=0x7fb288035388, mode=PAGE_CUR_GE, latch_mode=16385, cursor=0x7fb2f4384298, file=0x55cefff4e770 "/mariadb/10.3/storage/innobase/row/row0ins.cc", line=2169, autoinc=0, mtr=0x7fb2f4385338) at /mariadb/10.3/storage/innobase/include/btr0pcur.ic:441
#9  0x000055ceff864f87 in row_ins_scan_sec_index_for_duplicate (flags=0, index=0x7fb2800cfab8, entry=0x7fb288035388, thr=0x7fb288034f18, s_latch=true, mtr=0x7fb2f4385338, offsets_heap=0x7fb288035790) at /mariadb/10.3/storage/innobase/row/row0ins.cc:2165
#10 0x000055ceff863e89 in row_ins_sec_index_entry_low (flags=0, mode=33, index=0x7fb2800cfab8, offsets_heap=0x7fb288035790, heap=0x7fb288035c20, entry=0x7fb288035388, trx_id=0, thr=0x7fb288034f18, dup_chk_only=false) at /mariadb/10.3/storage/innobase/row/row0ins.cc:3069
#11 0x000055ceff8667ae in row_ins_sec_index_entry (index=0x7fb2800cfab8, entry=0x7fb288035388, thr=0x7fb288034f18, dup_chk_only=false) at /mariadb/10.3/storage/innobase/row/row0ins.cc:3347
#12 0x000055ceff9106fb in row_upd_sec_index_entry (node=0x7fb288034c10, thr=0x7fb288034f18) at /mariadb/10.3/storage/innobase/row/row0upd.cc:2536
#13 0x000055ceff90adfd in row_upd_sec_step (node=0x7fb288034c10, thr=0x7fb288034f18) at /mariadb/10.3/storage/innobase/row/row0upd.cc:2563
#14 0x000055ceff9071bc in row_upd (node=0x7fb288034c10, thr=0x7fb288034f18) at /mariadb/10.3/storage/innobase/row/row0upd.cc:3328
#15 0x000055ceff906ac5 in row_upd_step (thr=0x7fb288034f18) at /mariadb/10.3/storage/innobase/row/row0upd.cc:3443
#16 0x000055ceff894873 in row_update_for_mysql (prebuilt=0x7fb288034458) at /mariadb/10.3/storage/innobase/row/row0mysql.cc:1888
#17 0x000055ceff6d36e8 in ha_innobase::update_row (this=0x7fb288033cb0, old_row=0x7fb288032d98 "\376\017", new_row=0x7fb288032d88 "\376 \022") at /mariadb/10.3/storage/innobase/handler/ha_innodb.cc:8849
#18 0x000055ceff45f1e9 in handler::ha_update_row (this=0x7fb288033cb0, old_data=0x7fb288032d98 "\376\017", new_data=0x7fb288032d88 "\376 \022") at /mariadb/10.3/sql/handler.cc:6478
#19 0x000055ceff1f7910 in mysql_update (thd=0x7fb288000cf8, table_list=0x7fb288010c58, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=2, ignore=false, found_return=0x7fb2f43884f8, updated_return=0x7fb2f43884f0) at /mariadb/10.3/sql/sql_update.cc:955
#20 0x000055ceff0c481e in mysql_execute_command (thd=0x7fb288000cf8) at /mariadb/10.3/sql/sql_parse.cc:4289
#21 0x000055ceff0bc700 in mysql_parse (thd=0x7fb288000cf8, rawbuf=0x7fb288010b60 "UPDATE t2 SET col1 = @MY_INT LIMIT 2", length=36, parser_state=0x7fb2f4389da8, is_com_multi=false, is_next_command=false) at /mariadb/10.3/sql/sql_parse.cc:7829

The problem appears to be related to the following code in row_ins_sec_index_entry_low():

	/* Ensure that we acquire index->lock when inserting into an
	index with index->online_status == ONLINE_INDEX_COMPLETE, but
	could still be subject to rollback_inplace_alter_table().
	This prevents a concurrent change of index->online_status.
	The memory object cannot be freed as long as we have an open
	reference to the table, or index->table->n_ref_count > 0. */
	const bool	check = !index->is_committed();
	if (check) {
		DEBUG_SYNC_C("row_ins_sec_index_enter");
		if (mode == BTR_MODIFY_LEAF) {
			search_mode |= BTR_ALREADY_S_LATCHED;
			mtr_s_lock(dict_index_get_lock(index), &mtr);
		} else {
			mtr_sx_lock(dict_index_get_lock(index), &mtr);
		}
 
		if (row_log_online_op_try(
			    index, entry, thr_get_trx(thr)->id)) {
			goto func_exit;
		}
	}

Here, the index uidx of the test case is uncommitted, but its construction has been completed, by the following statement that is waiting for a metadata lock (MDL):

ALTER TABLE t2 ADD UNIQUE KEY IF NOT EXISTS `uidx1` ( col2 ) , ALGORITHM = NOCOPY , LOCK = DEFAULT

In the above code, we are correctly (recursively) acquiring the SX-latch, thanks to mode=BTR_MODIFY_TREE. But, the flag check is being passed to row_ins_scan_sec_index_for_duplicate() as s_latch, which is slightly misleading, because it can also mean SX-latch:

	ut_ad(s_latch == rw_lock_own_flagged(
			&index->lock, RW_LOCK_FLAG_S | RW_LOCK_FLAG_SX));

The actual problem ought to be this function call:

	btr_pcur_open(index, entry, PAGE_CUR_GE,
		      s_latch
		      ? BTR_SEARCH_LEAF_ALREADY_S_LATCHED
		      : BTR_SEARCH_LEAF,
		      &pcur, mtr);

I believe that we should use BTR_MODIFY_TREE or BTR_MODIFY_LEAF here. Using the former could cause unnecessary exclusive latching of leaf pages. I expect this bug to affect 10.2 and MySQL 5.7 as well. We will find it out, once I have created a reduced deterministic test case.

Comment by Marko Mäkelä [ 2019-08-22 ]

I analyzed two crashes of this while trying to create a DEBUG_SYNC test case. In one crash, the secondary index consisted of multiple pages.

In another case, it was a single secondary index page, with 1119 records WHERE col2 between 10 and 4979, and one purged record (col2,col1)=(10,2940) in the PAGE_FREE list. The index tuple is (col2,col1)=(10,1329). There is a delete-marked record (10,10) in the secondary index. In the clustered index we have a matching delete-marked record (col1,DB_TRX_ID,DB_ROLL_PTR,col2)=(10,30259,…,10) (and no record with col1=2940). The current trx->id is 30259, so this current transaction must have delete-marked that record, maybe as part of updating the PRIMARY KEY(col1) from 10 to 1329. In the clustered index we also have the record (col1,DB_TRX_ID,DB_ROLL_PTR,col2)=(1329,30259,…,10). The current statement UPDATE t2 SET col1 = @MY_INT LIMIT 2 matches this. The thr.graph.trx->undo_no is 2, confirming my other findings that we are updating the first row in the table. Updating the PRIMARY KEY generates two undo log records: one for delete-marking the old key, and another for inserting the new record. So, the UPDATE was processing its very first row, updating the col1 from 10 to 1329.

The unique uncommitted index had not been marked corrupted yet, so there must not have been any duplicate key violations.

Aimed with this knowledge, I tried to create a test case, but unfortunately it does not trigger the bug yet:

--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/have_sequence.inc
 
create table t(a int primary key, b int not null) engine=innodb;
insert into t select 4*seq, 4*seq from seq_1_to_1119;
 
connect ddl,localhost,root,,;
SET DEBUG_SYNC='inplace_after_index_build SIGNAL halfway WAIT_FOR done';
send ALTER TABLE t ADD UNIQUE INDEX(b);
connection default;
SET DEBUG_SYNC='now WAIT_FOR halfway';
 
--enable_info
BEGIN;
UPDATE t SET b=5 where a=1000;
DELETE FROM t WHERE a=1000;
COMMIT;
UPDATE t SET a=1329 where a=4;
--disable_info
 
SET DEBUG_SYNC='now SIGNAL done';
connection ddl;
reap;
disconnect ddl;
connection default;
SET DEBUG_SYNC='RESET';
check table t;
DROP TABLE t;

Comment by Marko Mäkelä [ 2019-08-22 ]

MDEV-20038.tgz repeats the bug on 10.1 and 10.2 as well, after replacing the 10.3-specific ALGORITHM=NOCOPY syntax with ALGORITHM=INPLACE. It took 34 seconds for me, and the crashing thread was executing a similar UPDATE statement.
Here is the stack trace from 10.1:

10.1 5b29820d800180816abd67f24414eaa0d51dd6b5

2019-08-22 15:38:02 140115392976832 [Note] /dev/shm/10.1/sql/mysqld: ready for connections.
Version: '10.1.42-MariaDB-debug'  socket: '/dev/shm/10.1/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
2019-08-22 15:38:35 7f6f2023e700  InnoDB: Assertion failure in thread 140115257321216 in file row0ins.cc line 1997
InnoDB: Failing assertion: s_latch == rw_lock_own(&index->lock, 352)
#6  0x0000560318003b52 in row_ins_scan_sec_index_for_duplicate (flags=flags@entry=0, index=index@entry=0x7f6ec8012e28, entry=entry@entry=0x7f6ed000e4b8, thr=thr@entry=0x7f6ed002efd0, s_latch=s_latch@entry=true, mtr=mtr@entry=0x7f6f2023a3a0, offsets_heap=<optimized out>) at /mariadb/10.1/storage/xtradb/row/row0ins.cc:1997
#7  0x0000560318004fce in row_ins_sec_index_entry_low (flags=flags@entry=0, mode=mode@entry=33, index=index@entry=0x7f6ec8012e28, offsets_heap=<optimized out>, offsets_heap@entry=0x7f6ed0023c90, heap=heap@entry=0x7f6ed001fee0, entry=entry@entry=0x7f6ed000e4b8, trx_id=0, thr=0x7f6ed002efd0) at /mariadb/10.1/storage/xtradb/row/row0ins.cc:2831
#8  0x000056031800ba1a in row_ins_sec_index_entry (index=index@entry=0x7f6ec8012e28, entry=entry@entry=0x7f6ed000e4b8, thr=thr@entry=0x7f6ed002efd0) at /mariadb/10.1/storage/xtradb/row/row0ins.cc:3101
#9  0x00005603180587e4 in row_upd_sec_index_entry (node=node@entry=0x7f6ed002ed20, thr=thr@entry=0x7f6ed002efd0) at /mariadb/10.1/storage/xtradb/row/row0upd.cc:2061
#10 0x00005603180592b8 in row_upd_sec_step (thr=0x7f6ed002efd0, node=0x7f6ed002ed20) at /mariadb/10.1/storage/xtradb/row/row0upd.cc:2088
#11 row_upd (thr=0x7f6ed002efd0, node=0x7f6ed002ed20) at /mariadb/10.1/storage/xtradb/row/row0upd.cc:2884
#12 row_upd_step (thr=thr@entry=0x7f6ed002efd0) at /mariadb/10.1/storage/xtradb/row/row0upd.cc:3001
#13 0x000056031801ede1 in row_update_for_mysql (mysql_rec=mysql_rec@entry=0x7f6ed001d920 "\376\002", prebuilt=0x7f6ed002e5d8) at /mariadb/10.1/storage/xtradb/row/row0mysql.cc:1862
#14 0x0000560317f35143 in ha_innobase::update_row (this=<optimized out>, old_row=0x7f6ed001d920 "\376\002", new_row=0x7f6ed001d910 "\376\320\n") at /mariadb/10.1/storage/xtradb/handler/ha_innodb.cc:9501
#15 0x0000560317c81f84 in handler::ha_update_row (this=0x7f6ed000ee90, old_data=0x7f6ed001d920 "\376\002", new_data=0x7f6ed001d910 "\376\320\n") at /mariadb/10.1/sql/handler.cc:6042
#16 0x0000560317b6a7b7 in mysql_update (thd=thd@entry=0x56031b7108a8, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=2, handle_duplicates=<optimized out>, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.1/sql/sql_update.cc:834
#17 0x0000560317ab19c2 in mysql_execute_command (thd=0x56031b7108a8) at /mariadb/10.1/sql/sql_parse.cc:3507
#18 0x0000560317ab8fcb in mysql_parse (thd=thd@entry=0x56031b7108a8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7f6f2023d140) at /mariadb/10.1/sql/sql_parse.cc:7209
#19 0x0000560317abca01 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x56031b7108a8, packet=packet@entry=0x56031b715e19 "UPDATE t2 SET col1 = @MY_INT LIMIT 2 ", packet_length=packet_length@entry=37) at /mariadb/10.1/sql/sql_class.h:1064

On 10.1 InnoDB, it fails in the same way:

10.1 5b29820d800180816abd67f24414eaa0d51dd6b5

Version: '10.1.42-MariaDB-debug'  socket: '/dev/shm/10.1/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
2019-08-22 15:43:00 7ff208400700  InnoDB: Assertion failure in thread 140677497227008 in file row0ins.cc line 1979
InnoDB: Failing assertion: s_latch == rw_lock_own(&index->lock, 352)
#6  0x00007ff2094613e2 in row_ins_scan_sec_index_for_duplicate (flags=flags@entry=0, index=index@entry=0x7ff1b80259d8, entry=entry@entry=0x7ff1b00146b8, thr=thr@entry=0x7ff1b001a7b0, s_latch=s_latch@entry=true, mtr=mtr@entry=0x7ff2083fc3a0, offsets_heap=<optimized out>) at /mariadb/10.1/storage/innobase/row/row0ins.cc:1979
#7  0x00007ff2094627b6 in row_ins_sec_index_entry_low (flags=flags@entry=0, mode=mode@entry=33, index=index@entry=0x7ff1b80259d8, offsets_heap=<optimized out>, offsets_heap@entry=0x7ff1b0035a70, heap=heap@entry=0x7ff1b000fb20, entry=entry@entry=0x7ff1b00146b8, trx_id=<optimized out>, thr=<optimized out>) at /mariadb/10.1/storage/innobase/row/row0ins.cc:2758
#8  0x00007ff20946976a in row_ins_sec_index_entry (index=index@entry=0x7ff1b80259d8, entry=entry@entry=0x7ff1b00146b8, thr=thr@entry=0x7ff1b001a7b0) at /mariadb/10.1/storage/innobase/row/row0ins.cc:3028
#9  0x00007ff2094bd764 in row_upd_sec_index_entry (node=node@entry=0x7ff1b001a500, thr=thr@entry=0x7ff1b001a7b0) at /mariadb/10.1/storage/innobase/row/row0upd.cc:2055
#10 0x00007ff2094be218 in row_upd_sec_step (thr=0x7ff1b001a7b0, node=0x7ff1b001a500) at /mariadb/10.1/storage/innobase/row/row0upd.cc:2082
#11 row_upd (thr=0x7ff1b001a7b0, node=0x7ff1b001a500) at /mariadb/10.1/storage/innobase/row/row0upd.cc:2851
#12 row_upd_step (thr=thr@entry=0x7ff1b001a7b0) at /mariadb/10.1/storage/innobase/row/row0upd.cc:2968
#13 0x00007ff20947c7e1 in row_update_for_mysql (mysql_rec=mysql_rec@entry=0x7ff1b0016720 "\376\001", prebuilt=0x7ff1b0019db8) at /mariadb/10.1/storage/innobase/row/row0mysql.cc:1874
#14 0x00007ff20939d094 in ha_innodb::update_row (this=<optimized out>, old_row=0x7ff1b0016720 "\376\001", new_row=0x7ff1b0016710 "\376\233\020") at /mariadb/10.1/storage/innobase/handler/ha_innodb.cc:8939
#15 0x0000563156b42f84 in handler::ha_update_row (this=0x7ff1b0018020, old_data=0x7ff1b0016720 "\376\001", new_data=0x7ff1b0016710 "\376\233\020") at /mariadb/10.1/sql/handler.cc:6042
#16 0x0000563156a2b7b7 in mysql_update (thd=thd@entry=0x563159774fd8, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=2, handle_duplicates=<optimized out>, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /mariadb/10.1/sql/sql_update.cc:834
#17 0x00005631569729c2 in mysql_execute_command (thd=0x563159774fd8) at /mariadb/10.1/sql/sql_parse.cc:3507
#18 0x0000563156979fcb in mysql_parse (thd=thd@entry=0x563159774fd8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7ff2083ff140) at /mariadb/10.1/sql/sql_parse.cc:7209
#19 0x000056315697da01 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x563159774fd8, packet=packet@entry=0x56315977a549 "UPDATE t2 SET col1 = @MY_INT LIMIT 2 ", packet_length=packet_length@entry=37) at /mariadb/10.1/sql/sql_class.h:1064

My educated guess is that MariaDB 10.0 and MySQL starting with 5.6 are affected as well. This was likely introduced by my online ALTER TABLE in MySQL 5.6.

Comment by Marko Mäkelä [ 2019-08-22 ]

I was unable to repeat the failure on MySQL 5.6 or 5.7, but that might be because InnoDB debug assertions were not enabled.
10.0 is affected as well:

10.0 828cc2ba7cdbe47c55fd679437e6e7f0bc714dff

Version: '10.0.39-MariaDB-debug'  socket: '/dev/shm/10.0/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
2019-08-22 16:18:22 7fdca03c5700  InnoDB: Assertion failure in thread 140585557841664 in file row0ins.cc line 1926
InnoDB: Failing assertion: s_latch == rw_lock_own(&index->lock, 352)
#5  0x00007fdca2099063 in row_ins_scan_sec_index_for_duplicate (flags=<optimized out>, index=<optimized out>, entry=<optimized out>, thr=<optimized out>, mtr=<optimized out>, offsets_heap=<optimized out>, s_latch=<optimized out>) at /mariadb/10.0/storage/innobase/row/row0ins.cc:1926
#6  row_ins_sec_index_entry_low (flags=0, mode=<optimized out>, index=0x7fdc4c01daf8, offsets_heap=<optimized out>, heap=<optimized out>, entry=<optimized out>, trx_id=<optimized out>, thr=<optimized out>) at /mariadb/10.0/storage/innobase/row/row0ins.cc:2686
#7  0x00007fdca2099b38 in row_ins_sec_index_entry (index=0x7fdc4c01daf8, entry=0x7fdc64020658, thr=0x7fdc6401df20) at /mariadb/10.0/storage/innobase/row/row0ins.cc:2956
#8  0x00007fdca20de4e9 in row_upd_sec_index_entry (node=<optimized out>, thr=<optimized out>) at /mariadb/10.0/storage/innobase/row/row0upd.cc:1831
#9  row_upd_sec_step (node=<optimized out>, thr=<optimized out>) at /mariadb/10.0/storage/innobase/row/row0upd.cc:1858
#10 row_upd (node=<optimized out>, thr=<optimized out>) at /mariadb/10.0/storage/innobase/row/row0upd.cc:2554
#11 row_upd_step (thr=<optimized out>) at /mariadb/10.0/storage/innobase/row/row0upd.cc:2671
#12 0x00007fdca20a9b4b in row_update_for_mysql (mysql_rec=<optimized out>, prebuilt=0x7fdc6401d528) at /mariadb/10.0/storage/innobase/row/row0mysql.cc:1812
#13 0x00007fdca1fe2336 in ha_innodb::update_row (this=<optimized out>, old_row=0x7fdc64014e50 "\376\022", new_row=<optimized out>) at /mariadb/10.0/storage/innobase/handler/ha_innodb.cc:7721
#14 0x0000000000747cbb in handler::ha_update_row (this=0x7fdc6401ee70, old_data=0x7fdc64014e50 "\376\022", new_data=0x7fdc64014e40 "\376\020\022") at /mariadb/10.0/sql/handler.cc:6066
#15 0x000000000066553c in mysql_update (thd=0x34b0398, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=2, handle_duplicates=DUP_ERROR, ignore=<optimized out>, found_return=0x7fdca03c4080, updated_return=0x7fdca03c3930) at /mariadb/10.0/sql/sql_update.cc:817
#16 0x00000000005c7b19 in mysql_execute_command (thd=0x34b0398) at /mariadb/10.0/sql/sql_parse.cc:3308
#17 0x00000000005c43bc in mysql_parse (thd=0x34b0398, rawbuf=0x7fdc64004fc0 "UPDATE t2 SET col1 = @MY_INT LIMIT 2", length=<optimized out>, parser_state=0x7fdca03c47e0) at /mariadb/10.0/sql/sql_parse.cc:6644

Comment by Marko Mäkelä [ 2019-08-22 ]

I was able to repeat the problem on MariaDB 10.0.26. With earlier versions, the test would abort on a line like this:

--file_exists $RUN_FILE

I reviewed the 13,536-line output of

diff -rwpu storage/innobase ../mysql-5.6.31/storage/innobase

The size is mostly due to changes to diagnostics and to FOREIGN KEY error reporting. I see some changes to transactional deadlock detection that might explain why the test case is not repeating the bug on MySQL, but I do not see any changes that would directly explain this bug.

Some code at the start of row_undo_mod_remove_clust_low() had been accidentally duplicated in MariaDB’s InnoDB (not XtraDB), but removing that did not affect the repeatability of this bug.

Comment by Marko Mäkelä [ 2019-08-23 ]

I gave MDEV-20038.tgz one more try on mysql-5.6.31 (replacing ALGORITHM=NOCOPY with ALGORITHM=INPLACE in order to avoid a syntax error). According to top, the mysqltest_background.sh is not running in parallel with the test, and that might explain why the failure is not being repeated.

mleich, could you please check if MySQL 5.6 or 5.7 are affected? I used the following procedure:

cd /mariadb/mysql-server
git checkout mysql-5.6.31
mkdir /dev/shm/5.6
cd /dev/shm/5.6
CXXFLAGS='-O2 -std=gnu++98' cmake -DWITH_DEBUG=ON -DWITH_EMBEDDED_SERVER=OFF /mariadb/mysql-server
nice make -j$(nproc}
cd /mariadb/mysql-server
MTR_BINDIR=/dev/shm/5.6 perl -I. mysql-test-run.pl main.ml_MDEV-20038

Comment by Matthias Leich [ 2019-08-23 ]

Replay also on MySQL 5.6.31-debug-log
      https://github.com/mysql/mysql-server , git checkout 5.6 , commit 6594835303da437105bb3841df976620b716e088
      There comes a
       2019-08-23 15:42:21 7f8c4d6b9700  InnoDB: Assertion failure in thread 140240571045632 in file row0ins.cc line 1913
       InnoDB: Failing assertion: s_latch == rw_lock_own(&index->lock, 352)
       ....
       Query (7f8c20004c90): UPDATE t2 SET col1 = 2722 LIMIT 2  /* E_R Thread2 QNO 7933 CON_ID 10 */
       Status: NOT_KILLED
if algorithm = NOCOPY gets replaced by algorithm INPLACE.
 
The MTR based brute force test case does not run well on 5.6.
So I uploaded the RQG test
MDEV-20038.yy -- Simplified RQG grammar
simp_MDEV-20038.cfg - RQG config file
The test needs (only) threads=4 and replays in ~ 60s on MySQL 5.6.

Comment by Elena Stepanova [ 2020-01-29 ]

Another attempt. The test case below currently fails for me on 10.4 debug within 1-100 attempts (unfortunately, can't give any better stats, it's really this volatile). Luckily, the test case is fast.
The rr profile with mysqld binary can be found here.

--source include/have_sequence.inc
--source include/have_innodb.inc
 
CREATE TABLE t1 (id INT, a CHAR(65), b VARCHAR(64), PRIMARY KEY(id,a,b), UNIQUE(a(64),id)) ENGINE=InnoDB;
INSERT INTO t1 (id,a,b) SELECT seq, '', '' FROM seq_1_to_100;
 
--connect (con1,localhost,root,,test)
ALTER TABLE t1 ADD f INT;
ALTER TABLE t1 ADD KEY(f);
 
--connection default
--send
    ALTER TABLE t1 DROP KEY f;
 
--connection con1
UPDATE t1 SET b = 'foo';
 
# Cleanup
--disconnect con1
--connection default
--reap
DROP TABLE t1;

10.4 bc891054

mysqld: /data/src/10.4/storage/innobase/btr/btr0cur.cc:1508: dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t): Assertion `rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)' failed.
200129 19:55:52 [ERROR] mysqld got signal 6 ;
 
#6  0x00007fd3fd51ae67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x561887cb0e30 "rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)", file=file@entry=0x561887cb0070 "/data/src/10.4/storage/innobase/btr/btr0cur.cc", line=line@entry=1508, function=function@entry=0x561887cb5140 <btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long)::__PRETTY_FUNCTION__> "dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t)") at assert.c:92
#7  0x00007fd3fd51af12 in __GI___assert_fail (assertion=0x561887cb0e30 "rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)", file=0x561887cb0070 "/data/src/10.4/storage/innobase/btr/btr0cur.cc", line=1508, function=0x561887cb5140 <btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long)::__PRETTY_FUNCTION__> "dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t)") at assert.c:101
#8  0x00005618875323a4 in btr_cur_search_to_nth_level_func (index=0x7fd3a41a7418, level=0, tuple=0x7fd3a802a308, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7fd3f805daa0, ahi_latch=0x0, file=0x561887c46b90 "/data/src/10.4/storage/innobase/row/row0ins.cc", line=2155, mtr=0x7fd3f805e0f0, autoinc=0) at /data/src/10.4/storage/innobase/btr/btr0cur.cc:1508
#9  0x00005618873e6e31 in btr_pcur_open_low (index=0x7fd3a41a7418, level=0, tuple=0x7fd3a802a308, mode=PAGE_CUR_GE, latch_mode=16385, cursor=0x7fd3f805daa0, file=0x561887c46b90 "/data/src/10.4/storage/innobase/row/row0ins.cc", line=2155, autoinc=0, mtr=0x7fd3f805e0f0) at /data/src/10.4/storage/innobase/include/btr0pcur.ic:441
#10 0x00005618873ed065 in row_ins_scan_sec_index_for_duplicate (flags=0, index=0x7fd3a41a7418, entry=0x7fd3a802a308, thr=0x7fd3a805c2e0, s_latch=true, mtr=0x7fd3f805e0f0, offsets_heap=0x7fd3a80936c0) at /data/src/10.4/storage/innobase/row/row0ins.cc:2155
#11 0x00005618873effe7 in row_ins_sec_index_entry_low (flags=0, mode=33, index=0x7fd3a41a7418, offsets_heap=0x7fd3a80936c0, heap=0x7fd3a80a0b50, entry=0x7fd3a802a308, trx_id=0, thr=0x7fd3a805c2e0) at /data/src/10.4/storage/innobase/row/row0ins.cc:3041
#12 0x00005618873f0e0b in row_ins_sec_index_entry (index=0x7fd3a41a7418, entry=0x7fd3a802a308, thr=0x7fd3a805c2e0, check_foreign=true) at /data/src/10.4/storage/innobase/row/row0ins.cc:3330
#13 0x0000561887479cf1 in row_upd_sec_index_entry (node=0x7fd3a805bf78, thr=0x7fd3a805c2e0) at /data/src/10.4/storage/innobase/row/row0upd.cc:2519
#14 0x0000561887479f56 in row_upd_sec_step (node=0x7fd3a805bf78, thr=0x7fd3a805c2e0) at /data/src/10.4/storage/innobase/row/row0upd.cc:2546
#15 0x000056188747cf77 in row_upd (node=0x7fd3a805bf78, thr=0x7fd3a805c2e0) at /data/src/10.4/storage/innobase/row/row0upd.cc:3327
#16 0x000056188747d3bd in row_upd_step (thr=0x7fd3a805c2e0) at /data/src/10.4/storage/innobase/row/row0upd.cc:3442
#17 0x0000561887418c94 in row_update_for_mysql (prebuilt=0x7fd3a805b718) at /data/src/10.4/storage/innobase/row/row0mysql.cc:1889
#18 0x0000561887273e06 in ha_innobase::update_row (this=0x7fd3a807a668, old_row=0x7fd3a8025360 "\377\021", new_row=0x7fd3a80252d0 "\377\021") at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:8830
#19 0x000056188703579a in handler::ha_update_row (this=0x7fd3a807a668, old_data=0x7fd3a8025360 "\377\021", new_data=0x7fd3a80252d0 "\377\021") at /data/src/10.4/sql/handler.cc:6720
#20 0x0000561886dd54b1 in mysql_update (thd=0x7fd3a8000af0, table_list=0x7fd3a8011eb0, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551599, ignore=false, found_return=0x7fd3f8060840, updated_return=0x7fd3f8060900) at /data/src/10.4/sql/sql_update.cc:1055
#21 0x0000561886cc5967 in mysql_execute_command (thd=0x7fd3a8000af0) at /data/src/10.4/sql/sql_parse.cc:4377
#22 0x0000561886cd290d in mysql_parse (thd=0x7fd3a8000af0, rawbuf=0x7fd3a8011dd8 "UPDATE t1 SET b = 'foo'", length=23, parser_state=0x7fd3f8061160, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7901
#23 0x0000561886cbdada in dispatch_command (command=COM_QUERY, thd=0x7fd3a8000af0, packet=0x7fd3a807d931 "UPDATE t1 SET b = 'foo'", packet_length=23, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1842
#24 0x0000561886cbc167 in do_command (thd=0x7fd3a8000af0) at /data/src/10.4/sql/sql_parse.cc:1360
#25 0x0000561886e45381 in do_handle_one_connection (connect=0x56188a5835c0) at /data/src/10.4/sql/sql_connect.cc:1412
#26 0x0000561886e450d0 in handle_one_connection (arg=0x56188a5835c0) at /data/src/10.4/sql/sql_connect.cc:1316
#27 0x000056188784d7cd in pfs_spawn_thread (arg=0x56188a4a22f0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#28 0x00007fd3ff4a34a4 in start_thread (arg=0x7fd3f8062700) at pthread_create.c:456
#29 0x00007fd3fd5d7d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Couldn't reproduce with this test case on other versions, only 10.4. Could be just the matter of luck.

Comment by Elena Stepanova [ 2023-10-09 ]

Yet another attempt
rr:/data/MDEV-20038/rr-profile-packed

10.5 b04af64882f233e3937f729aac321de504db3f63

(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000146bae9fe859 in __GI_abort () at abort.c:79
#2  0x0000146bae9fe729 in __assert_fail_base (fmt=0x146baeb94588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55f81ea7d680 "rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)", 
    file=0x55f81ea7b720 "/data/MDEV-20038/10.5/storage/innobase/btr/btr0cur.cc", line=1504, function=<optimized out>) at assert.c:92
#3  0x0000146baea0ffd6 in __GI___assert_fail (assertion=0x55f81ea7d680 "rw_lock_own(dict_index_get_lock(index), RW_LOCK_S)", file=0x55f81ea7b720 "/data/MDEV-20038/10.5/storage/innobase/btr/btr0cur.cc", 
    line=1504, function=0x55f81ea7ce40 "dberr_t btr_cur_search_to_nth_level(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, const char*, unsigned int, mtr_t*, ib_uint64_t)")
    at assert.c:101
#4  0x000055f81d6f30a7 in btr_cur_search_to_nth_level (index=0x617000080408, level=0, tuple=0x6190001aaa08, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x146b93667330, 
    file=0x55f81e92e580 "/data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc", line=2162, mtr=0x146b93667bd0, autoinc=0) at /data/MDEV-20038/10.5/storage/innobase/btr/btr0cur.cc:1504
#5  0x000055f81d48bfb7 in btr_pcur_open_low (index=0x617000080408, level=0, tuple=0x6190001aaa08, mode=PAGE_CUR_GE, latch_mode=16385, cursor=0x146b93667330, 
    file=0x55f81e92e580 "/data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc", line=2162, autoinc=0, mtr=0x146b93667bd0) at /data/MDEV-20038/10.5/storage/innobase/include/btr0pcur.inl:441
#6  0x000055f81d4985fb in row_ins_scan_sec_index_for_duplicate (flags=0, index=0x617000080408, entry=0x6190001aaa08, thr=0x6210000e9588, s_latch=true, mtr=0x146b93667bd0, offsets_heap=0x6190001aae80)
    at /data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc:2162
#7  0x000055f81d49d93d in row_ins_sec_index_entry_low (flags=0, mode=33, index=0x617000080408, offsets_heap=0x6190001aae80, heap=0x6190001ab380, entry=0x6190001aaa08, trx_id=0, thr=0x6210000e9588)
    at /data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc:3087
#8  0x000055f81d49efdc in row_ins_sec_index_entry (index=0x617000080408, entry=0x6190001aaa08, thr=0x6210000e9588, check_foreign=true) at /data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc:3372
#9  0x000055f81d5977ce in row_upd_sec_index_entry (node=0x6210000e91d8, thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:2237
#10 0x000055f81d597d0f in row_upd_sec_step (node=0x6210000e91d8, thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:2264
#11 0x000055f81d59ce68 in row_upd (node=0x6210000e91d8, thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:3028
#12 0x000055f81d59d634 in row_upd_step (thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:3143
#13 0x000055f81d4e7c33 in row_update_for_mysql (prebuilt=0x6210000e8988) at /data/MDEV-20038/10.5/storage/innobase/row/row0mysql.cc:1801
#14 0x000055f81d168e13 in ha_innobase::update_row (this=0x61d0002ae4b8, old_row=0x623000053290 "\004", new_row=0x623000052538 "\004") at /data/MDEV-20038/10.5/storage/innobase/handler/ha_innodb.cc:8477
#15 0x000055f81c932997 in handler::ha_update_row (this=0x61d0002ae4b8, old_data=0x623000053290 "\004", new_data=0x623000052538 "\004") at /data/MDEV-20038/10.5/sql/handler.cc:7334
#16 0x000055f81c453249 in mysql_update (thd=0x62b0000e0218, table_list=0x62b0000e7468, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, 
    found_return=0x146b9366a1d0, updated_return=0x146b9366a1f0) at /data/MDEV-20038/10.5/sql/sql_update.cc:1090
#17 0x000055f81c185263 in mysql_execute_command (thd=0x62b0000e0218) at /data/MDEV-20038/10.5/sql/sql_parse.cc:4484
#18 0x000055f81c19d075 in mysql_parse (thd=0x62b0000e0218, rawbuf=0x62b0000e7238 "/* WRK-1 QNO 9845 */ UPDATE `advanced_db`.`t5_InnoDB` SET `col_varchar` = CONVERT('rehabilitation' USING UCS2)", 
    length=110, parser_state=0x146b9366ab00, is_com_multi=false, is_next_command=false) at /data/MDEV-20038/10.5/sql/sql_parse.cc:8120
#19 0x000055f81c175363 in dispatch_command (command=COM_QUERY, thd=0x62b0000e0218, 
    packet=0x6290002ee219 "/* WRK-1 QNO 9845 */ UPDATE `advanced_db`.`t5_InnoDB` SET `col_varchar` = CONVERT('rehabilitation' USING UCS2)", packet_length=110, is_com_multi=false, is_next_command=false)
    at /data/MDEV-20038/10.5/sql/sql_parse.cc:1891
#20 0x000055f81c172622 in do_command (thd=0x62b0000e0218) at /data/MDEV-20038/10.5/sql/sql_parse.cc:1375
#21 0x000055f81c56061e in do_handle_one_connection (connect=0x608000002b38, put_in_cache=true) at /data/MDEV-20038/10.5/sql/sql_connect.cc:1416
#22 0x000055f81c55ffb2 in handle_one_connection (arg=0x608000002b38) at /data/MDEV-20038/10.5/sql/sql_connect.cc:1318
#23 0x0000146baef28609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#24 0x0000146baeafb133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Marko Mäkelä [ 2023-10-10 ]

In that trace, the mini-transaction is only holding the index.lock, in SX mode, not S mode:

(rr) p *(mtr_memo_slot_t*)mtr.m_memo.m_first_block.m_data
$2 = {object = 0x617000080578, type = MTR_MEMO_SX_LOCK}

The index latch was acquired a little earlier by the same thread:

#2  0x000055f81d2ce4af in mtr_t::sx_lock (this=0x146b93667bd0, lock=0x617000080578, file=0x55f81e92e580 "/data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc", line=2897)
    at /data/MDEV-20038/10.5/storage/innobase/include/mtr0mtr.h:251
#3  0x000055f81d49c8bf in row_ins_sec_mtr_start_and_check_if_aborted (mtr=0x146b93667bd0, index=0x617000080408, check=true, search_mode=545) at /data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc:2897
#4  0x000055f81d49d8f6 in row_ins_sec_index_entry_low (flags=0, mode=33, index=0x617000080408, offsets_heap=0x6190001aae80, heap=0x6190001ab380, entry=0x6190001aaa08, trx_id=0, thr=0x6210000e9588)
    at /data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc:3082
#5  0x000055f81d49efdc in row_ins_sec_index_entry (index=0x617000080408, entry=0x6190001aaa08, thr=0x6210000e9588, check_foreign=true) at /data/MDEV-20038/10.5/storage/innobase/row/row0ins.cc:3372
#6  0x000055f81d5977ce in row_upd_sec_index_entry (node=0x6210000e91d8, thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:2237
#7  0x000055f81d597d0f in row_upd_sec_step (node=0x6210000e91d8, thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:2264
#8  0x000055f81d59ce68 in row_upd (node=0x6210000e91d8, thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:3028
#9  0x000055f81d59d634 in row_upd_step (thr=0x6210000e9588) at /data/MDEV-20038/10.5/storage/innobase/row/row0upd.cc:3143
#10 0x000055f81d4e7c33 in row_update_for_mysql (prebuilt=0x6210000e8988) at /data/MDEV-20038/10.5/storage/innobase/row/row0mysql.cc:1801
#11 0x000055f81d168e13 in ha_innobase::update_row (this=0x61d0002ae4b8, old_row=0x623000053290 "\004", new_row=0x623000052538 "\004") at /data/MDEV-20038/10.5/storage/innobase/handler/ha_innodb.cc:8477

The failing debug assertion is too strict. It should qualify also an SX latch and not only S latch on the index. Would the following patch work, or would it trigger an assertion failure somewhere else?

diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
index e076a3b7962..c76d167dcab 100644
--- a/storage/innobase/btr/btr0cur.cc
+++ b/storage/innobase/btr/btr0cur.cc
@@ -1501,8 +1501,9 @@ dberr_t btr_cur_search_to_nth_level(dict_index_t *index, ulint level,
 	default:
 		if (!srv_read_only_mode) {
 			if (s_latch_by_caller) {
-				ut_ad(rw_lock_own(dict_index_get_lock(index),
-				              RW_LOCK_S));
+				ut_ad(rw_lock_own_flagged(&index->lock,
+							  RW_LOCK_FLAG_SX
+							  | RW_LOCK_FLAG_S));
 			} else if (!modify_external) {
 				/* BTR_SEARCH_TREE is intended to be used with
 				BTR_ALREADY_S_LATCHED */

In MDEV-30400 and MDEV-24142, this code had been refactored in MariaDB Server 10.6. A corresponding assertion is at the start of btr_cur_t::search_leaf():

  ut_ad(!(latch_mode & BTR_ALREADY_S_LATCHED) ||
        mtr->memo_contains_flagged(&index()->lock,
                                   MTR_MEMO_S_LOCK | MTR_MEMO_SX_LOCK |
                                   MTR_MEMO_X_LOCK));

This assertion checks that if the caller claims that the index tree was already latched, the mtr_t::m_memo must contain the index tree in one of the modes. For latch_mode=BTR_MODIFY_TREE we additionally assert that the exclusive index latch is being held.

Generated at Thu Feb 08 08:56:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.