[MDEV-20154] Assertion `len <= col->len || ((col->mtype) == 5 || (col->mtype) == 14)' failed in row_merge_buf_add Created: 2019-07-24  Updated: 2021-07-29  Resolved: 2021-07-29

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Virtual Columns
Affects Version/s: 10.2, 10.3
Fix Version/s: 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Nikita Malyavin
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Relates
relates to MDEV-26251 CREATE INDEX on virtual column ignore... Open
relates to MDEV-26263 Investigate possible race on vc_templ... Open
relates to MDEV-20799 DROP Virtual Column crashes MariaDB Closed

 Description   

Important note: Even though the test case looks deterministic, the failure seems come and go. Sometimes it happens every time, dozens times in a row, and suddenly just stops and the test starts passing, with the same options and the same binaries. It seems to be more reliable with ASAN, even though it's still the same assertion failure, but maybe it's just the matter of luck.

--source include/have_innodb.inc
 
CREATE TABLE t1 ( 
    a VARCHAR(2500),
    b VARCHAR(2499) AS (a) VIRTUAL
) ENGINE=InnoDB;
INSERT INTO t1 (a) VALUES ('foo');
 
ALTER TABLE t1 MODIFY a VARCHAR(2600);
ALTER TABLE t1 ADD KEY (b);
 
# Cleanup
DROP TABLE t1;

10.2 97055e6b

mysqld: /data/src/10.2/storage/innobase/row/row0merge.cc:752: ulint row_merge_buf_add(row_merge_buf_t*, dict_index_t*, const dict_table_t*, const dict_table_t*, fts_psort_t*, dtuple_t*, const row_ext_t*, doc_id_t*, mem_heap_t*, dberr_t*, mem_heap_t**, TABLE*, trx_t*): Assertion `len <= col->len || ((col->mtype) == 5 || (col->mtype) == 14)' failed.
190724 21:31:06 [ERROR] mysqld got signal 6 ;
 
#5  0x00007f42d12b542a in __GI_abort () at abort.c:89
#6  0x00007f42d12ace67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c2cceb0868 "len <= col->len || ((col->mtype) == 5 || (col->mtype) == 14)", file=file@entry=0x55c2cceb05e0 "/data/src/10.2/storage/innobase/row/row0merge.cc", line=line@entry=752, function=function@entry=0x55c2cceb2f20 <row_merge_buf_add(row_merge_buf_t*, dict_index_t*, dict_table_t const*, dict_table_t const*, fts_psort_t*, dtuple_t*, row_ext_t const*, unsigned long*, mem_block_info_t*, dberr_t*, mem_block_info_t**, TABLE*, trx_t*)::__PRETTY_FUNCTION__> "ulint row_merge_buf_add(row_merge_buf_t*, dict_index_t*, const dict_table_t*, const dict_table_t*, fts_psort_t*, dtuple_t*, const row_ext_t*, doc_id_t*, mem_heap_t*, dberr_t*, mem_heap_t**, TABLE*, tr"...) at assert.c:92
#7  0x00007f42d12acf12 in __GI___assert_fail (assertion=0x55c2cceb0868 "len <= col->len || ((col->mtype) == 5 || (col->mtype) == 14)", file=0x55c2cceb05e0 "/data/src/10.2/storage/innobase/row/row0merge.cc", line=752, function=0x55c2cceb2f20 <row_merge_buf_add(row_merge_buf_t*, dict_index_t*, dict_table_t const*, dict_table_t const*, fts_psort_t*, dtuple_t*, row_ext_t const*, unsigned long*, mem_block_info_t*, dberr_t*, mem_block_info_t**, TABLE*, trx_t*)::__PRETTY_FUNCTION__> "ulint row_merge_buf_add(row_merge_buf_t*, dict_index_t*, const dict_table_t*, const dict_table_t*, fts_psort_t*, dtuple_t*, const row_ext_t*, doc_id_t*, mem_heap_t*, dberr_t*, mem_heap_t**, TABLE*, tr"...) at assert.c:101
#8  0x000055c2cc85acc3 in row_merge_buf_add (buf=0x7f427c038918, fts_index=0x0, old_table=0x7f427c0311d8, new_table=0x7f427c0311d8, psort_info=0x0, row=0x7f427c09b1b0, ext=0x0, doc_id=0x7f42cc116908, conv_heap=0x0, err=0x7f42cc1168f4, v_heap=0x7f42cc116900, my_table=0x7f427c035d50, trx=0x7f42cc4bf148) at /data/src/10.2/storage/innobase/row/row0merge.cc:752
#9  0x000055c2cc85f685 in row_merge_read_clustered_index (trx=0x7f42cc4bf148, table=0x7f427c035d50, old_table=0x7f427c0311d8, new_table=0x7f427c0311d8, online=true, index=0x7f427c041f18, fts_sort_idx=0x0, psort_info=0x0, files=0x7f427c1702e0, key_numbers=0x7f427c041f20, n_index=1, add_cols=0x0, add_v=0x0, col_map=0x0, add_autoinc=18446744073709551615, sequence=..., block=0x7f42c0088000 <error: Cannot access memory at address 0x7f42c0088000>, skip_pk_sort=false, tmpfd=0x7f42cc117424, stage=0x7f427c11b630, pct_cost=50, crypt_block=0x0, eval_table=0x7f427c035d50) at /data/src/10.2/storage/innobase/row/row0merge.cc:2235
#10 0x000055c2cc865ef2 in row_merge_build_indexes (trx=0x7f42cc4bf148, old_table=0x7f427c0311d8, new_table=0x7f427c0311d8, online=true, indexes=0x7f427c041f18, key_numbers=0x7f427c041f20, n_indexes=1, table=0x7f427c035d50, add_cols=0x0, col_map=0x0, add_autoinc=18446744073709551615, sequence=..., skip_pk_sort=false, stage=0x7f427c11b630, add_v=0x0, eval_table=0x7f427c035d50) at /data/src/10.2/storage/innobase/row/row0merge.cc:4680
#11 0x000055c2cc76e362 in ha_innobase::inplace_alter_table (this=0x7f427c08c108, altered_table=0x7f427c035d50, ha_alter_info=0x7f42cc117c40) at /data/src/10.2/storage/innobase/handler/handler0alter.cc:6259
#12 0x000055c2cc381ef5 in handler::ha_inplace_alter_table (this=0x7f427c08c108, altered_table=0x7f427c035d50, ha_alter_info=0x7f42cc117c40) at /data/src/10.2/sql/handler.h:3796
#13 0x000055c2cc3778a6 in mysql_inplace_alter_table (thd=0x7f427c000b00, table_list=0x7f427c012540, table=0x7f427c08b500, altered_table=0x7f427c035d50, ha_alter_info=0x7f42cc117c40, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7f42cc117cd0, alter_ctx=0x7f42cc118280) at /data/src/10.2/sql/sql_table.cc:7367
#14 0x000055c2cc37d3f2 in mysql_alter_table (thd=0x7f427c000b00, new_db=0x7f427c012b50 "test", new_name=0x0, create_info=0x7f42cc118e90, table_list=0x7f427c012540, alter_info=0x7f42cc118de0, order_num=0, order=0x0, ignore=false) at /data/src/10.2/sql/sql_table.cc:9495
#15 0x000055c2cc3f9353 in Sql_cmd_alter_table::execute (this=0x7f427c012c18, thd=0x7f427c000b00) at /data/src/10.2/sql/sql_alter.cc:333
#16 0x000055c2cc2ae0da in mysql_execute_command (thd=0x7f427c000b00) at /data/src/10.2/sql/sql_parse.cc:5973
#17 0x000055c2cc2b31ce in mysql_parse (thd=0x7f427c000b00, rawbuf=0x7f427c012458 "ALTER TABLE t1 ADD KEY (b)", length=26, parser_state=0x7f42cc11a200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7760
#18 0x000055c2cc2a14dd in dispatch_command (command=COM_QUERY, thd=0x7f427c000b00, packet=0x7f427c095fd1 "ALTER TABLE t1 ADD KEY (b)", packet_length=26, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1832
#19 0x000055c2cc29fe31 in do_command (thd=0x7f427c000b00) at /data/src/10.2/sql/sql_parse.cc:1385
#20 0x000055c2cc3f3f1b in do_handle_one_connection (connect=0x55c2cfbf8000) at /data/src/10.2/sql/sql_connect.cc:1336
#21 0x000055c2cc3f3c86 in handle_one_connection (arg=0x55c2cfbf8000) at /data/src/10.2/sql/sql_connect.cc:1241
#22 0x000055c2ccc22d86 in pfs_spawn_thread (arg=0x55c2cfc1b2b0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#23 0x00007f42d2e214a4 in start_thread (arg=0x7f42cc11b700) at pthread_create.c:456
#24 0x00007f42d1369d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

The failure seemingly appeared in 10.2 tree after this commit:

commit 12614af1fe14a045f09dedaced7e2d4c4caf7bf4
Author: Nikita Malyavin
Date:   Mon Jul 15 20:30:45 2019 +1000
 
    MDEV-17005 ASAN heap-use-after-free in innobase_get_computed_value

As of now (July 24th), it's also in 10.3, but not yet in 10.4, and 10.4 does not fail.



 Comments   
Comment by Nikita Malyavin [ 2021-07-05 ]

Setting to critical since it's a regression

Comment by Nikita Malyavin [ 2021-07-19 ]

Commit for review:
commit e7e7db02a7 (HEAD -> bb-10.2-nikita, origin/bb-10.2-nikita)
Author: Nikita Malyavin <nikitamalyavin@gmail.com>
Date: Tue Jul 6 10:17:22 2021 +0300

Comment by Nikita Malyavin [ 2021-07-19 ]

commit for review:
commit 5ac3578488a56b9 (bb-10.2-nikita)
Author: Nikita Malyavin <nikitamalyavin@gmail.com>
Date: Tue Jul 6 10:17:22 2021 +0300

Comment by Marko Mäkelä [ 2021-07-19 ]

I think that this needs to be thoroughly stress tested. Where is the 10.6 based version?

Comment by Nikita Malyavin [ 2021-07-19 ]

marko pushed here https://github.com/MariaDB/server/commit/5ac3578488a56b94d19784eaeee77aba0584691

Comment by Marko Mäkelä [ 2021-07-27 ]

nikitamalyavin, I believe that we do allow ADD INDEX on a virtual column while concurrent DML is possible. The 10.6 version does not seem to address my review comments for the 10.2 version. I do not think that it is safe to modify dict_table_t::vc_templ while not holding MDL_EXCLUSIVE. We need to be prepared for a failed execution too.

Please add test case scenarios that include concurrent ADD UNIQUE INDEX, and triggers a duplicate key errors:

  • during row_log_apply
  • after row_log_apply, while the created index is uncommitted but completed; the DML thread must apply changes directly to the index, but report any duplicate key error to the DDL thread, which is waiting for MDL upgrade.

Please refer to the test innodb.innodb-index-online, in particular the DEBUG_SYNC labels row_log_apply_before and row_log_apply_after.

Comment by Marko Mäkelä [ 2021-07-27 ]

You may easily adapt the DEBUG_SYNC test case that I posted to MDEV-26251.

Comment by Nikita Malyavin [ 2021-07-27 ]

I have moved the change in ha_innobase::prepare_inplace_alter_table, which is supposed to be evaluated under ecxlusive MDL lock. This is guaranteed by the check in the beginning of the function:

	if (!(ha_alter_info->handler_flags & ~INNOBASE_INPLACE_IGNORE)) {
		/* Nothing to do */
		DBUG_ASSERT(m_prebuilt->trx->dict_operation_lock_mode == 0);
                ...
		DBUG_RETURN(false);
	}

Patch for review: d5c8842df0e

Comment by Marko Mäkelä [ 2021-07-28 ]

Thank you, it is clearer now.

I see that we are unnecessarily constructing dict_table_t::vc_templ already for the INSERT when there do not exist any indexes on the virtual column. Because of this, we must update or invalidate dict_table_t::vc_templ when the base columns of a non-indexed virtual column are changed (in the first ALTER TABLE statement).

Please make this clear in the commit message.

Initially, I thought that the fix would apply to the second ALTER TABLE statement. That is why I initially asked to cover a failed ALTER TABLE in the test case. That should not actually be necessary, because the fix only affects the first ALTER TABLE (metadata-only change) and it does not matter whether that operation will succeed or be rolled back.

While I do not see a problem with the updated solution, to be on the safe side, I think that it would be good to subject this to additional stress testing before pushing.

Related to the second ALTER TABLE statement, I do not completely understand the pre-existing logic in ha_innobase::inplace_alter_table() for updating dict_table_t::vc_templ. It would seem to me that there are race conditions with concurrently executed DML. Isn’t the s_templ only supposed to be used by the ALTER TABLE thread, and not by concurrently running DML operations? Wouldn’t it be safer to avoid updating dict_table_t::vc_templ before the transaction is committed, and make CREATE INDEX use its own private data structure? Maybe you can file a separate ticket for that?

Comment by Nikita Malyavin [ 2021-07-28 ]

marko I am not sure whether vc_templ is first created on CREATE TABLE, or on INSERT

Comment by Nikita Malyavin [ 2021-07-28 ]

New patch is: https://github.com/MariaDB/server/commit/3817d067f9bc51191e302887534576bf4903fe28

Comment by Nikita Malyavin [ 2021-07-28 ]

Marko:
I wouldn’t use Markdown in commit messages, because not even the Github web interface is interpreting that. So, please remove the backticks from the commit message.
There is a typo s/od/do/. Also, please do not refer to ha_innobase::open() by "open()".
Add --echo # End of 10.2 tests to the end of innodb_virtual_index.test.
I asked you to format the prototype of alter_templ_needs_rebuild with fewer line breaks. You can copy that code snippet from my previous review.
OK to push with these.

New patch is: https://github.com/MariaDB/server/commit/e7e8a2daa86e40c53eeaee81e0ccb9735ed9584c

Comment by Nikita Malyavin [ 2021-07-28 ]

10.6 version:
https://github.com/MariaDB/server/commit/122457d517dbf47a96f302a877d75f11ad9ced6a
(bb-10.6-nikita-MDEV-26229)

Comment by Marko Mäkelä [ 2021-07-28 ]

Thank you, this looks OK from my point of view. Let us wait for the test results from mleich.

Generated at Thu Feb 08 08:57:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.