[MDEV-16131] Assertion `is_instant() || id == DICT_INDEXES_ID' failed in dict_index_t::instant_field_value Created: 2018-05-09  Updated: 2021-11-17  Resolved: 2018-07-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.2
Fix Version/s: 10.3.9

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MDEV-16131.cc     File MDEV-16131.sh     File MDEV-16131.yy     File MDEV-16131.zz     HTML File threads    
Issue Links:
Problem/Incident
is caused by MDEV-11369 Instant add column for InnoDB Closed
Relates
relates to MDEV-24730 Corruption in online ALTER TABLE with... Closed
relates to MDEV-23805 Make Online DDL to Instant DDL when t... Closed

 Description   

10.3 85ccdd9ff0a7ca

mysqld: /data/src/10.3/storage/innobase/include/dict0mem.h:1075: const byte* dict_index_t::instant_field_value(ulint, ulint*) const: Assertion `is_instant() || id == DICT_INDEXES_ID' failed.
180509 23:06:47 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f519db22312 in __GI___assert_fail (assertion=0x7f51a0fb26c0 "is_instant() || id == DICT_INDEXES_ID", file=0x7f51a0fb2600 "/data/src/10.3/storage/innobase/include/dict0mem.h", line=1075, function=0x7f51a0fb8bc0 <dict_index_t::instant_field_value(unsigned long, unsigned long*) const::__PRETTY_FUNCTION__> "const byte* dict_index_t::instant_field_value(ulint, ulint*) const") at assert.c:101
#8  0x00007f51a07fbf44 in dict_index_t::instant_field_value (this=0x7f511c503868, n=7, len=0x7f519cbe79a8) at /data/src/10.3/storage/innobase/include/dict0mem.h:1075
#9  0x00007f51a08beabe in rec_init_offsets_comp_ordinary (rec=0x7f517830a007 "", index=0x7f511c503868, offsets=0x7f511c509c20, n_core=6, format=REC_LEAF_TEMP_COLUMNS_ADDED) at /data/src/10.3/storage/innobase/rem/rem0rec.cc:384
#10 0x00007f51a08c2f82 in rec_init_offsets_temp (rec=0x7f517830a007 "", index=0x7f511c503868, offsets=0x7f511c509c20, n_core=6, status=REC_STATUS_COLUMNS_ADDED) at /data/src/10.3/storage/innobase/rem/rem0rec.cc:1716
#11 0x00007f51a0922477 in row_log_table_apply_op (thr=0x7f511c521dd8, new_trx_id_col=1, dup=0x7f519cbe7de0, error=0x7f519cbe7b8c, offsets_heap=0x7f511c531f30, heap=0x7f511c52dea0, mrec=0x7f517830a007 "", mrec_end=0x7f517830a150 "", offsets=0x7f511c509c20) at /data/src/10.3/storage/innobase/row/row0log.cc:2476
#12 0x00007f51a092482f in row_log_table_apply_ops (thr=0x7f511c521dd8, dup=0x7f519cbe7de0, stage=0x7f511c50b6f0) at /data/src/10.3/storage/innobase/row/row0log.cc:3063
#13 0x00007f51a0924de7 in row_log_table_apply (thr=0x7f511c521dd8, old_table=0x7f511c506908, table=0x7f511c51e990, stage=0x7f511c50b6f0) at /data/src/10.3/storage/innobase/row/row0log.cc:3163
#14 0x00007f51a07fed5f in commit_try_rebuild (ha_alter_info=0x7f519cbe9450, ctx=0x7f511c0175c0, altered_table=0x7f511c51e990, old_table=0x7f510c053b90, trx=0x7f519cef3908, table_name=0x7f511c2ee8c5 "t1") at /data/src/10.3/storage/innobase/handler/handler0alter.cc:8488
#15 0x00007f51a07f93e6 in ha_innobase::commit_inplace_alter_table (this=0x7f510c0547d8, altered_table=0x7f511c51e990, ha_alter_info=0x7f519cbe9450, commit=true) at /data/src/10.3/storage/innobase/handler/handler0alter.cc:9354
#16 0x00007f51a05679e0 in handler::ha_commit_inplace_alter_table (this=0x7f510c0547d8, altered_table=0x7f511c51e990, ha_alter_info=0x7f519cbe9450, commit=true) at /data/src/10.3/sql/handler.cc:4407
#17 0x00007f51a034a019 in mysql_inplace_alter_table (thd=0x7f511c000b00, table_list=0x7f511c015d28, table=0x7f510c053b90, altered_table=0x7f511c51e990, ha_alter_info=0x7f519cbe9450, inplace_supported=HA_ALTER_INPLACE_COPY_NO_LOCK, target_mdl_request=0x7f519cbe94c0, alter_ctx=0x7f519cbea0c0) at /data/src/10.3/sql/sql_table.cc:7553
#18 0x00007f51a034fe76 in mysql_alter_table (thd=0x7f511c000b00, new_db=0x7f511c005170, new_name=0x7f511c005520, create_info=0x7f519cbeacb0, table_list=0x7f511c015d28, alter_info=0x7f519cbeabf0, order_num=0, order=0x0, ignore=false) at /data/src/10.3/sql/sql_table.cc:9644
#19 0x00007f51a03d61d6 in Sql_cmd_alter_table::execute (this=0x7f511c0163a8, thd=0x7f511c000b00) at /data/src/10.3/sql/sql_alter.cc:463
#20 0x00007f51a027b213 in mysql_execute_command (thd=0x7f511c000b00) at /data/src/10.3/sql/sql_parse.cc:6283
#21 0x00007f51a027fd4a in mysql_parse (thd=0x7f511c000b00, rawbuf=0x7f511c015c08 "ALTER TABLE t1 DROP COLUMN col3  /* QNO 229 CON_ID 16 */", length=56, parser_state=0x7f519cbec630, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8003
#22 0x00007f51a026d44f in dispatch_command (command=COM_QUERY, thd=0x7f511c000b00, packet=0x7f511c01c511 " ALTER TABLE t1 DROP COLUMN col3  /* QNO 229 CON_ID 16 */ ", packet_length=58, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1846
#23 0x00007f51a026be73 in do_command (thd=0x7f511c000b00) at /data/src/10.3/sql/sql_parse.cc:1391
#24 0x00007f51a03d088f in do_handle_one_connection (connect=0x7f51a3f1e010) at /data/src/10.3/sql/sql_connect.cc:1402
#25 0x00007f51a03d0613 in handle_one_connection (arg=0x7f51a3f1e010) at /data/src/10.3/sql/sql_connect.cc:1308
#26 0x00007f519f789064 in start_thread (arg=0x7f519cbed700) at pthread_create.c:309
#27 0x00007f519dbdc62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Query (0x7f511c015c08): ALTER TABLE t1 DROP COLUMN col3  /* QNO 229 CON_ID 16 */
Connection ID (thread ID): 16
Status: NOT_KILLED

Very poorly reproducible, so far once in travis tests and once locally out of ~40 attempts

experimental 6c40fc679d99

perl ./runall-trials.pl --trials=20 --duration=400 --threads=6 --seed=1525833353 --reporters=Backtrace,ErrorLog,Deadlock --validators=TransformerNoComparator --transformers=ConvertSubqueriesToViews,ConvertTablesToDerived,Count,DisableOptimizations,Distinct,EnableOptimizations,ExecuteAsCTE,ExecuteAsDeleteReturning,ExecuteAsDerived,ExecuteAsExcept,ExecuteAsExecuteImmediate,ExecuteAsInsertSelect,ExecuteAsIntersect,ExecuteAsSelectItem,ExecuteAsUnion,ExecuteAsUpdateDelete,ExecuteAsView,ExecuteAsWhereSubquery,Having,InlineSubqueries,LimitRowsExamined,OrderBy,StraightJoin,ExecuteAsPreparedTwice,ExecuteAsTrigger,ExecuteAsSPTwice,ExecuteAsFunctionTwice --redefine=conf/mariadb/general-workarounds.yy --mysqld=--log_output=FILE --mysqld=--log_bin_trust_function_creators=1 --mysqld=--log-bin --mysqld=--max-statement-time=30 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --views --redefine=conf/mariadb/sequences.yy --basedir=/data/bld/10.3 --grammar=conf/runtime/alter_online.yy --gendata=conf/runtime/alter_online.zz --engine=InnoDB --vardir=/dev/shm/vardir



 Comments   
Comment by Elena Stepanova [ 2018-05-15 ]

New occurrence: https://api.travis-ci.org/v3/job/378823604/log.txt

Comment by Matthias Leich [ 2018-07-19 ]

I have some RQG grammar which replays the problem quite fast.
But I do not know if the '+d,ib_rename_indexes_too_many_concurrent_trxs' has sideeffects
which leaves the server in some bad state and than is some assert maybe to expected.

query:
    ALTER TABLE t1 ADD COLUMN col2_copy INT ; ALTER TABLE t1 DROP COLUMN col2 ; ALTER TABLE t1 CHANGE COLUMN col2_copy col2 INT ; SET SESSION DEBUG_DBUG = '' |
    SET SESSION DEBUG_DBUG = '+d,ib_rename_indexes_too_many_concurrent_trxs' ; DELETE FROM t1 ; INSERT INTO t1 VALUES (1,1); SET SESSION DEBUG_DBUG = '';
 
query_init:
    CREATE TABLE IF NOT EXISTS t1 (col1 INT, col2 INT) ENGINE = InnoDB;

MariaDB 10.3 compiled with debug
commit 141a5b24843c626f545a0c2f49013b54da194c0d
~ Sat Jul 14
The binary claims to be a 10.3.9.

My backtrace:

mysqld: 10.3/storage/innobase/include/dict0mem.h:1094: const byte* dict_index_t::instant_field_value(ulint, ulint*) const: Assertion `is_instant() || id == DICT_INDEXES_ID' failed.
[ERROR] mysqld got signal 6 ;
...
Query (0x7fccc0012d70): ALTER TABLE t1 DROP COLUMN col2  /* E_R Thread6 QNO 2375 CON_ID 19 */
Connection ID (thread ID): 19
Status: NOT_KILLED
 
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x00007fcd70a97f5d in __GI_abort ()
    at abort.c:90
#6  0x00007fcd70a8df17 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x557d00ca6368 "is_instant() || id == DICT_INDEXES_ID", file=file@entry=0x557d00ca6288 "storage/innobase/include/dict0mem.h", line=line@entry=1094, function=function@entry=0x557d00cac820 <dict_index_t::instant_field_value(unsigned long, unsigned long*) const::__PRETTY_FUNCTION__> "const byte* dict_index_t::instant_field_value(ulint, ulint*) const") 
    at assert.c:92
#7  0x00007fcd70a8dfc2 in __GI___assert_fail (assertion=0x557d00ca6368 "is_instant() || id == DICT_INDEXES_ID", file=0x557d00ca6288 "storage/innobase/include/dict0mem.h", line=1094, function=0x557d00cac820 <dict_index_t::instant_field_value(unsigned long, unsigned long*) const::__PRETTY_FUNCTION__> "const byte* dict_index_t::instant_field_value(ulint, ulint*) const") 
    at assert.c:101
#8  0x0000557d0049776d in dict_index_t::instant_field_value (this=0x7fccd4029848, n=4, len=0x7fcd50c3b338) 
    at storage/innobase/include/dict0mem.h:1094
#9  0x0000557d0055db28 in rec_init_offsets_comp_ordinary (rec=0x7fcd50a63004 "", index=0x7fccd4029848, offsets=0x7fccc009a130, n_core=4, format=REC_LEAF_ORDINARY) 
    at storage/innobase/rem/rem0rec.cc:384
#10 0x0000557d0056203d in rec_init_offsets_temp (rec=0x7fcd50a63004 "", index=0x7fccd4029848, offsets=0x7fccc009a130, n_core=4, status=REC_STATUS_ORDINARY) 
    at storage/innobase/rem/rem0rec.cc:1713
#11 0x0000557d005c2617 in row_log_table_apply_op (thr=0x7fccc00a5358, new_trx_id_col=1, dup=0x7fcd50c3b780, error=0x7fcd50c3b4fc, offsets_heap=0x7fccc00d56a0, heap=0x7fccc00d1610, mrec=0x7fcd50a63004 "", mrec_end=0x7fcd50a6301f "", offsets=0x7fccc009a130) 
    at storage/innobase/row/row0log.cc:2418
#12 0x0000557d005c4970 in row_log_table_apply_ops (thr=0x7fccc00a5358, dup=0x7fcd50c3b780, stage=0x7fccc0094d40) 
    at storage/innobase/row/row0log.cc:2988
#13 0x0000557d005c4f72 in row_log_table_apply (thr=0x7fccc00a5358, old_table=0x7fcccc0234c8, table=0x7fccc00ab8f8, stage=0x7fccc0094d40, new_table=0x7fccc00234f8) 
    at storage/innobase/row/row0log.cc:3097
#14 0x0000557d0049a70d in commit_try_rebuild (ha_alter_info=0x7fcd50c3cc70, ctx=0x7fccc0014100, altered_table=0x7fccc00ab8f8, old_table=0x7fccc00a7b38, trx=0x7fcd5b3ff3b8, table_name=0x7fccdc244d5d "t1") 
    at storage/innobase/handler/handler0alter.cc:8692
#15 0x0000557d00494b94 in ha_innobase::commit_inplace_alter_table (this=0x7fccc00a6fb0, altered_table=0x7fccc00ab8f8, ha_alter_info=0x7fcd50c3cc70, commit=true) 
    at storage/innobase/handler/handler0alter.cc:9555
#16 0x0000557d00245579 in handler::ha_commit_inplace_alter_table (this=0x7fccc00a6fb0, altered_table=0x7fccc00ab8f8, ha_alter_info=0x7fcd50c3cc70, commit=true) 
    at sql/handler.cc:4416
#17 0x0000557d00027c69 in mysql_inplace_alter_table (thd=0x7fccc0000ce8, table_list=0x7fccc0012ea8, table=0x7fccc00a7b38, altered_table=0x7fccc00ab8f8, ha_alter_info=0x7fcd50c3cc70, inplace_supported=HA_ALTER_INPLACE_COPY_NO_LOCK, target_mdl_request=0x7fcd50c3cce0, alter_ctx=0x7fcd50c3d8e0) 
    at sql/sql_table.cc:7616
#18 0x0000557d0002daa7 in mysql_alter_table (thd=0x7fccc0000ce8, new_db=0x7fccc0005398, new_name=0x7fccc0005750, create_info=0x7fcd50c3e4d0, table_list=0x7fccc0012ea8, alter_info=0x7fcd50c3e410, order_num=0, order=0x0, ignore=false) 
    at sql/sql_table.cc:9710
#19 0x0000557d000b4212 in Sql_cmd_alter_table::execute (this=0x7fccc0013520, thd=0x7fccc0000ce8) 
    at sql/sql_alter.cc:489
#20 0x0000557cfff58972 in mysql_execute_command (thd=0x7fccc0000ce8) 
    at sql/sql_parse.cc:6280
#21 0x0000557cfff5dbf3 in mysql_parse (thd=0x7fccc0000ce8, rawbuf=0x7fccc0012d70 "ALTER TABLE t1 DROP COLUMN col2  /* E_R Thread6 QNO 2375 CON_ID 19 */", length=69, parser_state=0x7fcd50c3f630, is_com_multi=false, is_next_command=false) 
    at sql/sql_parse.cc:8073
#22 0x0000557cfff4acab in dispatch_command (command=COM_QUERY, thd=0x7fccc0000ce8, packet=0x7fccc000aa39 " ALTER TABLE t1 DROP COLUMN col2  /* E_R Thread6 QNO 2375 CON_ID 19 */ ", packet_length=71, is_com_multi=false, is_next_command=false) 
    at sql/sql_parse.cc:1846
#23 0x0000557cfff4970c in do_command (thd=0x7fccc0000ce8) 
    at sql/sql_parse.cc:1391

Comment by Matthias Leich [ 2018-07-19 ]

No replay on a 10.2 compiled with debug within 8 attempts.

I have uploaded the files required for the replay.
You will need my experimental RQG.
git clone https://github.com/mleich1/rqg RQG_mleich1
cd RQG_mleich1
git checkout experimental
In case that does not work please show up.

Comment by Marko Mäkelä [ 2018-07-26 ]

There is no debug injection point ib_rename_indexes_too_many_concurrent_trxs in MariaDB, so the SET DEBUG_DBUG should have no effect.
The problem appears to be that a table-rebuilding ALTER (as needed for the DROP COLUMN before MDEV-15562) fails to properly parse the table.
Likely, a purge following the DELETE FROM t1 will occasionally empty the table (removing its ‘instantness’), and row_log_table_apply() is not prepared for the table format change.

I can repeat this with a slightly simpler test:

--source include/have_innodb.inc
CREATE TABLE t1 (col1 INT PRIMARY KEY) ENGINE=InnoDB;
sleep 3600;
DROP TABLE t1;

Then, after starting the test, start two clients:

yes 'ALTER TABLE t1 ADD COLUMN col2 INT; ALTER TABLE t1 DROP COLUMN col2;'|
$BUILDDIR/client/mysql --defaults-file=$BUILDDIR/mysql-test/var/my.cnf -uroot test &
yes 'REPLACE INTO t1 SET col1=1; DELETE FROM t1;'|
$BUILDDIR/client/mysql --defaults-file=$BUILDDIR/mysql-test/var/my.cnf -uroot test &

This will result in the assertion failure during the execution of DROP COLUMN.

Comment by Marko Mäkelä [ 2018-07-26 ]

I think that the following is happening:

  1. ALTER TABLE t1 DROP COLUMN col2; was started.
  2. REPLACE INTO t1 SET col1=1; was executed while the table was in the "instantly added column" format, and it wrote a ROW_T_INSERT record that did not contain an explicit value for col2.
  3. DELETE FROM t1; and a subsequent purge were executed. This emptied the table and converted it to the canonical "non-instant" format.
  4. Applying the ROW_T_INSERT record fails in ALTER TABLE when trying to retrieve the value for col2, because the table no longer is in "instantly added column" format.

We could try 2 fixes:

  1. If a table rebuild is in progress, never convert the table to non-instant format when it becomes empty.
  2. In the log that is handled by row_log_table_apply(), explicitly store values for all columns, no matter if they were instantly added.

I believe that the first fix is easier and less risky. Converting the table to the canonical format is merely an optimization for performance and file format compatibility. If the table-rebuilding ALTER succeeds, the table would be in the canonical format again. If the ALTER fails, then the table would remain in the less efficient "instant" format.

Comment by Marko Mäkelä [ 2018-07-26 ]

Deterministic test case:

--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
 
SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
 
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 SET a=0;
ALTER TABLE t1 ADD COLUMN b INT;
 
connect (con1,localhost,root);
START TRANSACTION WITH CONSISTENT SNAPSHOT;
 
connection default;
DELETE FROM t1;
 
connection con1;
SET DEBUG_SYNC = 'row_log_table_apply1_before SIGNAL go WAIT_FOR insert';
send ALTER TABLE t1 FORCE;
 
connection default;
SET DEBUG_SYNC = 'now WAIT_FOR go';
let $wait_all_purged = 1;
--source include/wait_all_purged.inc
INSERT INTO t1 SET a=1;
SET DEBUG_SYNC = 'now SIGNAL insert';
 
connection con1;
reap;
disconnect con1;
 
connection default;
SET DEBUG_SYNC = RESET;
SELECT * FROM t1;
DROP TABLE t1;
SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;

Comment by Marko Mäkelä [ 2018-07-26 ]

The test case needs 2 auxiliary connections to be deterministic. We must keep the read view open until after the ALTER TABLE has been sent. Also, adding a default value of NULL is a special case. When adding a column with a NOT NULL DEFAULT value, then we would have a similar problem in row_log_table_apply_convert_mrec().

It turns out to be simplest to introduce row_log_t::non_core_fields[] for storing the default values, and rely on those during row_log_table_apply(). In that way, we are free to invoke dict_index_t::remove_instant() on the source table at any time.

I implemented this fix, but it needs some adjustment, because my test is randomly crashing due to row_log_table_apply_op() not consuming all data for instantly added columns.

Comment by Marko Mäkelä [ 2018-07-26 ]

My fix ensures that the online_log is written and parsed in a consistent way. It was easier to change how online table rebuild works than to try avoid calls to dict_index_t::remove_instant().

Generated at Thu Feb 08 08:26:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.