[MDEV-31441] BLOB corruption on UPDATE of PRIMARY KEY with FOREIGN KEY Created: 2023-06-09  Updated: 2023-12-15  Resolved: 2023-11-29

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.4, 10.5, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3, 10.6.15
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug Priority: Critical
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile-analyzed

Attachments: File TBR-1808.cfg     File TBR-1808.yy    
Issue Links:
Duplicate
duplicates MDEV-30869 Failing assertion: rec == NULL || !(d... Open
Relates
relates to MDEV-32174 ROW_FORMAT=COMPRESSED table corruptio... Confirmed

 Description   

origin/10.6 8171f9da87c9e062b056834942634f04f39d5aea 2023-06-09T10:22:34+10:00
mysqld: /data/Server/10.6K/storage/innobase/row/row0upd.cc:2132: bool row_upd_clust_rec_by_insert_inherit_func(const rec_t*, dict_index_t*, const rec_offs*, dtuple_t*, const upd_t*): Assertion `!(rec_data[12U] & 128U)' failed.
...
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000078866d959859 in __GI_abort () at abort.c:79
#2  0x000078866d959729 in __assert_fail_base (fmt=0x78866daef588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55bc4a7879a5 "!(rec_data[12U] & 128U)", 
    file=0x55bc4a787b78 "/data/Server/10.6K/storage/innobase/row/row0upd.cc", line=2132, function=<optimized out>) at assert.c:92
#3  0x000078866d96af36 in __GI___assert_fail (assertion=assertion@entry=0x55bc4a7879a5 "!(rec_data[12U] & 128U)", file=file@entry=0x55bc4a787b78 "/data/Server/10.6K/storage/innobase/row/row0upd.cc", line=line@entry=2132, 
    function=function@entry=0x55bc4a787dc8 "bool row_upd_clust_rec_by_insert_inherit_func(const rec_t*, dict_index_t*, const rec_offs*, dtuple_t*, const upd_t*)") at assert.c:101
#4  0x000055bc4a0f7f2a in row_upd_clust_rec_by_insert_inherit_func (rec=rec@entry=0xca11b4c0da "\200", index=index@entry=0x630d14020d10, offsets=offsets@entry=0x7fa148ee5db0, entry=entry@entry=0x3ef2c063640, 
    update=0x630d14025730) at /data/Server/10.6K/storage/innobase/row/row0upd.cc:2132
#5  0x000055bc4a100576 in row_upd_clust_rec_by_insert (node=node@entry=0x630d14025530, index=index@entry=0x630d14020d10, thr=thr@entry=0x3ef2c04eba0, referenced=referenced@entry=true, foreign=foreign@entry=false, 
    mtr=mtr@entry=0x7fa148ee6300) at /data/Server/10.6K/storage/innobase/row/row0upd.cc:2263
#6  0x000055bc4a102a43 in row_upd_clust_step (node=node@entry=0x630d14025530, thr=thr@entry=0x3ef2c04eba0) at /data/Server/10.6K/storage/innobase/row/row0upd.cc:2682
#7  0x000055bc4a102d95 in row_upd (node=node@entry=0x630d14025530, thr=thr@entry=0x3ef2c04eba0) at /data/Server/10.6K/storage/innobase/row/row0upd.cc:2758
#8  0x000055bc4a103440 in row_upd_step (thr=thr@entry=0x3ef2c04eba0) at /data/Server/10.6K/storage/innobase/row/row0upd.cc:2900
#9  0x000055bc4a0a3987 in row_update_for_mysql (prebuilt=0x630d14024b20) at /data/Server/10.6K/storage/innobase/row/row0mysql.cc:1691
#10 0x000055bc49f14b5b in ha_innobase::update_row (this=0x630d1402cf40, old_row=0x630d14027518 "\376\v\001", new_row=0x630d14027508 "\376", <incomplete sequence \304>)
    at /data/Server/10.6K/storage/innobase/handler/ha_innodb.cc:8691
#11 0x000055bc49ba740d in handler::ha_update_row (this=0x630d1402cf40, old_data=0x630d14027518 "\376\v\001", new_data=0x630d14027508 "\376", <incomplete sequence \304>) at /data/Server/10.6K/sql/handler.cc:7684
#12 0x000055bc499b14bc in mysql_update (thd=thd@entry=0x3ef2c002788, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, 
    ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /data/Server/10.6K/sql/sql_update.cc:1087
#13 0x000055bc498aa9c0 in mysql_execute_command (thd=thd@entry=0x3ef2c002788, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/10.6K/sql/sql_limit.h:94
#14 0x000055bc49894a47 in mysql_parse (thd=thd@entry=0x3ef2c002788, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7fa148ee7370) at /data/Server/10.6K/sql/sql_parse.cc:8036
#15 0x000055bc498a4204 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x3ef2c002788, packet=packet@entry=0x3ef2c00c489 "UPDATE test.t_p SET col1 = 196 /* E_R Thread2 QNO 57 CON_ID 17 */ ", 
    packet_length=packet_length@entry=66, blocking=blocking@entry=true) at /data/Server/10.6K/sql/sql_class.h:1388
#16 0x000055bc498a762c in do_command (thd=0x3ef2c002788, blocking=blocking@entry=true) at /data/Server/10.6K/sql/sql_parse.cc:1409
#17 0x000055bc49a0e122 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55bc4c842b68, put_in_cache=put_in_cache@entry=true) at /data/Server/10.6K/sql/sql_connect.cc:1416
#18 0x000055bc49a0e603 in handle_one_connection (arg=0x55bc4c842b68) at /data/Server/10.6K/sql/sql_connect.cc:1318
#19 0x000078866db2f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x000078866da56293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
pluto:/data/results/1686309626/TBR-1808$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
 
RQG
====
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD ad3888c646e0b35ae2e4aa3fa4a568e25dbd92fd 2023-06-08T21:26:08+02:00
# rqg.pl  : Version 4.2.1 (2022-12)
#
# $RQG_HOME/rqg.pl \
# --duration=300 \
# --queries=10000000 \
# --no_mask \
# --seed=random \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --rpl_mode=none \
# --max_gd_duration=1200 \
# --engine=InnoDB \
# --gendata=conf/mariadb/table_stress.zz \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--innodb-buffer-pool-size=256M \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--loose_innodb_change_buffering=none \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--log-output=none \
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--loose-max-statement-time=30 \
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--log-bin \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-innodb_compression_level=1 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--innodb_undo_log_truncate=OFF \
# --reporters=Backtrace,Deadlock,None \
# --validators=None \
# --grammar=TBR-1808.yy \
# --threads=2 \
# <local settings>
 
The same test showed on various development trees MDEV-30869 instead.
 
rqg grammar
create_child:
    CREATE TABLE birth . child LIKE birth . parent ; ALTER TABLE birth . child ADD FOREIGN KEY (col1) REFERENCES birth . parent(col1) ;
create_parent:
    CREATE TABLE birth . parent ( col1 INT PRIMARY KEY, col_text TEXT ) ;
move_to_test:
    RENAME TABLE birth . parent TO test . t_p , birth . child TO test . t_c ;
 
my_int:
    { $my_int= 1 } |
    { $my_int= $prng->int( 65, 512) } ;
 
thread1:
    UPDATE test . t_p SET col1 = 1 |
    UPDATE test . t_c SET col1 = 1 ;
 
thread1_connect:
    ;
 
thread1_init:
    CREATE SCHEMA birth ; create_parent ; create_child ; move_to_test ;
 
thread2:
    DELETE FROM test . t_p WHERE col1 = my_int OR col1 IS NULL |
    DELETE FROM test . t_c WHERE col1 = my_int OR col1 IS NULL |
    UPDATE test . t_p SET col1 = { $my_int= $prng->int( 65, 512) } |
    UPDATE test . t_p SET col1 = { $my_int= $prng->int( 65, 512) } |
    INSERT INTO test . t_p (col1, col_text) VALUES ( 1 , REPEAT('A', @fill_amount) ) |
    INSERT INTO test . t_c (col1, col_text) VALUES ( 1 , REPEAT('A', @fill_amount) ) ;
 
thread2_connect:
    SET AUTOCOMMIT = 1; SET @fill_amount = (@@innodb_page_size / 2 ) + 1 ;
 
thread2_init:
    ;



 Comments   
Comment by Marko Mäkelä [ 2023-06-09 ]

The assertion is failing because during the UPDATE of the PRIMARY KEY column col1, we notice that a BLOB is not "owned" by the record.

In the trace that I analyzed, the primary key value col1=1 is being updated: UPDATE test.t_p SET col1 = 267. This is supposed to cause delete-marking of the old record 1 and an ownership transfer to the new record 267.

Comment by Marko Mäkelä [ 2023-06-09 ]

Before the UPDATE that leads to the assertion failure, the primary key index consists of a single page, comprising the following records:
(col1,DB_TRX_ID,DB_ROLL_PTR,col_text)=
delete-marked(1,0xa2,(update),(disowned BLOB)),
delete-marked(222,0x63,(update),(disowned BLOB)),
(267,0xa2,(insert),(disowned BLOB))).

The UPDATE would delete-mark the record 267 and set its transaction ID to 0xa5. It looks like already transaction 0xa2 (executing an identical statement) did something fishy, inserting a disowned BLOB to the new record. Before that UPDATE, the contents of the index was as follows:
(col1,DB_TRX_ID,DB_ROLL_PTR,col_text)=
(1,0x63,(update),(inherited BLOB)),
delete-marked(222,0x63,(update),(disowned BLOB))).
This state looks valid to me.

This bug does not necessarily have anything to do with FOREIGN KEY constraints. The constraint might merely be useful for achieving (un)fortunate timing with respect to the purge of transaction history. It would seem that frequent updates of PRIMARY KEY could cause trouble with BLOBs.

Comment by Marko Mäkelä [ 2023-07-06 ]

I checked this in a little more detail, setting breakpoints on dispatch_command and row_upd_clust_step, and initially setting a watchpoint on PAGE_N_RECS ((char)0xca11b4c036@2).

The assertion fails on the parent table t_p. We should be able to ignore any operations on the child table t_c. There are no secondary indexes on either table.

transaction ID when statement
0x49 34376 INSERT INTO test.t_p (col1, col_text) VALUES ( 1, REPEAT('A', @fill_amount) );
0x55 34691 UPDATE test.t_p SET col1 = 237;
0x5a 34849 UPDATE test.t_p SET col1 = 118; (rolled back)
0x62 35092 UPDATE test.t_p SET col1 = 222;
0x63 35142 UPDATE test.t_p SET col1 = 1;
0x69 35311 UPDATE test.t_p SET col1 = 353; (probably rejected due to FK violation)
0x76 35665 UPDATE test.t_p SET col1 = 493; (ditto)
0x84 36038 INSERT INTO test.t_p (col1, col_text) VALUES ( 1, REPEAT('A', @fill_amount) ); (duplicate key error)
0x86 36088 UPDATE test.t_p SET col1 = 464; (will be rolled back)
0x89 36181 UPDATE test.t_p SET col1 = 420; (will be rolled back)
0x8a 36214 UPDATE test.t_p SET col1 = 1; (waiting in lock_wait() while 0x89 is rolled back; probably timed out)
0x8c 36272 UPDATE test.t_p SET col1 = 289; (will be rolled back)
0x8f 36365 UPDATE test.t_p SET col1 = 380; (will be rolled back)
0x92 36449 UPDATE test.t_p SET col1 = 216; (will be rolled back)
0x97 36560 UPDATE test.t_p SET col1 = 435; (will be rolled back)
0xa2 36841 UPDATE test.t_p SET col1 = 267;
0xa5 36929 UPDATE test.t_p SET col1 = 196; (assertion failure)

I think that it should be possible to convert this to a single-connection, single-table, single-row test case, something like this:

CREATE TABLE t_p (col1 INT PRIMARY KEY, col_text TEXT) ENGINE=InnoDB;
SET @fill_amount = (@@innodb_page_size / 2 ) + 1;
INSERT INTO test.t_p VALUES ( 1, REPEAT('A', @fill_amount) );
UPDATE t_p SET col1=237;
UPDATE t_p SET col1=222;
UPDATE t_p SET col1=1;
UPDATE t_p SET col1=267;
UPDATE test.t_p SET col1 = 196;

Timing or purge control might be important. Because we are updating the PRIMARY KEY here, multiple copies of the row will exist in the clustered index. In this trace, PAGE_N_RECS was incremented to 1 by the INSERT, to 2 by the first UPDATE, and to 3 by the UPDATE to 267.

Comment by Marko Mäkelä [ 2023-11-23 ]

The flag that trips the debug assertion is set in btr_cur_disown_inherited_fields(). It is supposed to be reset on a rollback, according to a comment in its only caller:

		/* If the the new row inherits externally stored
		fields (off-page columns a.k.a. BLOBs) from the
		delete-marked old record, mark them disowned by the
		old record and owned by the new entry. */
 
		if (rec_offs_any_extern(offsets)) {
			if (row_upd_clust_rec_by_insert_inherit(
				    rec, index, offsets,
				    entry, node->update)) {
				/* The blobs are disowned here, expecting the
				insert down below to inherit them.  But if the
				insert fails, then this disown will be undone
				when the operation is rolled back. */
				btr_cur_disown_inherited_fields(
					btr_cur_get_block(btr_cur),
					rec, index, offsets, node->update,
					mtr);
			}
		}

Unfortunately, the rr replay trace does not work anymore, because the file /usr/lib/rr/librrpage.so differs from the rr record run.

Is this reproducible if the table t_c is removed from the grammar?

Comment by Matthias Leich [ 2023-11-23 ]

pluto:/data/results/1700764387/TBR-1808-MDEV-31441

There was no replay when removing the table t_c from the grammar.

Comment by Marko Mäkelä [ 2023-11-24 ]

It looks like the PRIMARY KEY(col1) is being updated back and forth quite a bit. The flag is set during the execution of the following statement, which seems to match exactly one row.

UPDATE test.t_p SET col1 = 12 /* E_R Thread1 QNO 12 CON_ID 12 */;

It is writing 2 undo log records, one for inserting the updated row and another for delete-marking the old row. After this, we get the assertion failure when executing a statement in another connection:

UPDATE test.t_p SET col1 = 1 /* E_R Thread2 QNO 27 CON_ID 13 */;

At the time of the first UPDATE, there is exactly one record in the table, with col1=1.

There must be some reason why this is not repeatable by running the following test:

--source include/have_innodb.inc
CREATE TABLE t_p (col1 INT PRIMARY KEY, col_text TEXT) ENGINE=InnoDB;
INSERT INTO t_p VALUES (1, REPEAT('A', @@innodb_page_size / 2));
UPDATE t_p SET col1=12;
UPDATE t_p SET col1=1;
DROP TABLE t_p;

Comment by Marko Mäkelä [ 2023-11-24 ]

I can reproduce the failure with the following test case:

--source include/have_innodb.inc
CREATE TABLE t1 (pk INT PRIMARY KEY, t TEXT) ENGINE=InnoDB;
CREATE TABLE t2 (pk INT PRIMARY KEY, FOREIGN KEY (pk) REFERENCES t1(pk))
ENGINE=InnoDB;
 
INSERT INTO t1 SET pk=1, t=REPEAT('A', @@innodb_page_size / 2);
INSERT INTO t2 SET pk=1;
--connect (con1,localhost,root,,test)
BEGIN;
DELETE FROM t2;
--connection default
--send
UPDATE t1 SET pk=12;
--connection con1
let $wait_condition=
SELECT count(*) > 0 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE state='Updating';
--source include/wait_condition.inc
COMMIT;
--connection default
--reap
UPDATE t1 SET pk=1;
DROP TABLE t2, t1;

10.6 2f467de4c4851d2f9b0f3bec54f748d92349582a

mariadbd: /mariadb/10.6/storage/innobase/row/row0upd.cc:2130: bool row_upd_clust_rec_by_insert_inherit_func(const rec_t*, dict_index_t*, const rec_offs*, dtuple_t*, const upd_t*): Assertion `!(rec_data[12U] & 128U)' failed.

The issue is that the first UPDATE will copy the ‘disowned’ flag from the original record’s BLOB, so neither of the 2 records in the parent table (the delete-marked 1, or the normal 12) will ‘own’ the BLOB.

Comment by Marko Mäkelä [ 2023-11-27 ]

mleich, please test the patch on both 10.4 and 10.6.

Comment by Matthias Leich [ 2023-11-28 ]

The trees
- origin/10.4-MDEV-31441 89bc52f131440c93e79c32c04f1239ec6063ca03 2023-11-27T10:05:39+02:00
- origin/10.6 2f467de4c4851d2f9b0f3bec54f748d92349582a 2023-11-24T14:25:32+02:00
  + cherry pick of MDEV-31441 patch
performed well in RQG testing.

Comment by Vladislav Lesin [ 2023-11-29 ]

Looks good to me.

Generated at Thu Feb 08 10:23:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.