[MDEV-29181] Potential corruption on Foreign key update on a table with vcol index Created: 2022-07-27  Updated: 2024-01-03

Status: In Review
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Virtual Columns
Affects Version/s: None
Fix Version/s: 10.5, 10.6, 10.11, 11.0

Type: Bug Priority: Critical
Reporter: Nikita Malyavin Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Blocks
blocks MDEV-12302 Execute triggers for foreign key upda... Confirmed
blocks MDEV-22880 Honor constraints on UPDATE CASCADE Open
blocks MDEV-31654 Support STORED generated columns with... Open
blocks MDEV-31931 FK fields cannot be used anymore in g... Closed
blocks MDEV-31942 Online alter: support cascade foreign... Open
is blocked by MDEV-29299 SELECT from table with vcol index rep... Closed
is blocked by MDEV-30021 FK actions are ignored after tc_purge... Stalled
is blocked by MDEV-30378 Versioned REPLACE succeeds with ON DE... Closed
is blocked by MDEV-30869 Failing assertion: rec == NULL || !(d... Open
Relates
relates to MDEV-19601 MDEV-13708 is still unfixed Stalled
relates to MDEV-20640 [ERROR] InnoDB: tried to purge non-de... Closed
relates to MDEV-29182 Assertion `fld->field_no < table->n_v... Confirmed
relates to MDEV-30674 Implement CHECK constraints validatio... Closed
relates to MDEV-20640 [ERROR] InnoDB: tried to purge non-de... Closed
relates to MDEV-26951 gcol.innodb_virtual_basic fails with ... Stalled
relates to MDEV-30103 FK constraint fails upon referenced t... Stalled

 Description   

I have added an assertion to ensure that the correct table is chosen in innodb_find_table_for_vc:

--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -20007,6 +20007,9 @@
 	} else {
 		if (table->vc_templ->mysql_table_query_id
 		    == thd_get_query_id(thd)) {
+			DEBUG_SYNC(thd, "find_table_for_vc_table_valid");
+                        DBUG_ASSERT(table->vc_templ->mysql_table_query_id
+                                    == thd_get_query_id(thd));
 			return table->vc_templ->mysql_table;
 		}
 	}

This is important, since the table is thread-local data.

The above assertion fails with the following test:

--source include/have_debug_sync.inc
--source include/have_innodb.inc
--source include/have_sequence.inc
--connect (con2, localhost, root,,)
--connection default
set default_storage_engine= innodb;
 
 
create table t1 (a int primary key);
insert into t1(a) select * from seq_1_to_65536;
create table t2 (a int,  b int as (a), key(b), foreign key (a) references t1 (a)
  on delete cascade on update cascade);
insert into t2(a) select * from seq_1_to_65536;
set debug_sync= "find_table_for_vc_table_valid SIGNAL found WAIT_FOR go";
--send
UPDATE t1 set a = a + 1000000 where a <= 2;
--connection con2
set debug_sync= "now WAIT_FOR found TIMEOUT 3";
UPDATE t1 set a= 330000 where a = 33333;
set debug_sync= "now SIGNAL go";
 
--connection default
--reap
 
# Cleanup
drop table t2, t1;
set debug_sync= reset;
--connection default

I believe we need enough rows to have more that one leaf page in the tree, to avoid a row lock timeout.

The best way to fix it is to put mysql_table to a threadlocal data as well. A good place can be row_prebuilt_t (and table is already there): to access child prebuilt, parent prebuilt should contain references to it, we can fill it on open under certain conditions, or on demand



 Comments   
Comment by Nikita Malyavin [ 2022-08-09 ]

I have implemented the solution on the innodb side, based on row_prebuilt_t.
https://github.com/MariaDB/server/commit/e3114f8722a3edfe0dd790333134807b8cadc560

I feel unsatisfied of this, given how much effort it required to make a solution that passes current tests. Many corner cases were there, especially the interesting fact I wasn't aware about, that prebuilt and handler remain open is tc instance is still cached. This may affect the maintainance cost later, especially given that 10.11 is LTS, and it is likely that new FOREIGN KEYs implementation will be finished in 2023.

So I'm thinking of a simpler solution easier to maintain partially made on sql side.

Comment by Nikita Malyavin [ 2022-08-22 ]

A new solution is implemented in a simpler way, by extending TABLE_LIST class.
https://github.com/MariaDB/server/commit/1c3acbd08b20b20f975a1e68087414bc25608337

Comment by Nikita Malyavin [ 2022-08-22 ]

serg marko The solution requres the review from both innodb and runtime sides.
I have cherry-picked it on 10.5 branch. Please review it on bb-10.5-nikita branch.

Note that it can be merged to 10.3 and 10.4 only after MDEV-26951 is done.

10.10 version is here

Since innodb_find_table_for_vc is no longer called for foreign keys (only akcground thead can need it in some cases), only non-deterministic test is possible, I have added it as a separate commit.

Buildbot

Comment by Marko Mäkelä [ 2022-08-23 ]

I think that a new debug instrumentation dependent test needs to be created:

CURRENT_TEST: vcol.innodb_virtual_fk
mysqltest: At line 43: query 'set debug_sync= reset' failed: 1193: Unknown system variable 'debug_sync'

The InnoDB changes look OK to me, except that the upd_node_t::prebuilt needs to be documented better. The comments must say that essentially, row_create_update_node_for_mysql() is passing a parameter to row_upd_check_references_constraints(). Can that parameter be a pointer to const?

Some code changes mix TABs and spaces. Some comments say MySQL instead of MariaDB.

This will have to be extensively stress-tested before being pushed to a main branch.

Comment by Nikita Malyavin [ 2022-08-23 ]

Can that parameter be a pointer to const?

It can be done for an upd_node_t field, but doesn't lie so nasty for an argument passed in functions row_ins_check_foreign_constraint and row_ins_foreign_check_on_constraint, because I modify prebuilt->index to match clust_index in processing online changes in 10.11. This can be done differently, though, but doubt it's worth it. I also can do it consts for 10.3-10.10, and leave them mutable in 10.11.

I think that a new debug instrumentation dependent test needs to be created:

No, just forgot to remove the reset line.

Some comments say MySQL instead of MariaDB

In ha_innodb.h? It was just a move from private zone to public. But yes, can be updated, too.

The branch is updated, you can see the changes (except for the test, I've deleted the just like in-place there) in this commit

Comment by Nikita Malyavin [ 2022-11-02 ]

RQG testing reported a crash (by mleich)

mysqld: /data/Server/bb-10.6-MDEV-29181/storage/innobase/row/row0upd.cc:240: 
dberr_t row_upd_check_references_constraints(upd_node_t*, btr_pcur_t*, dict_table_t*, 
dict_index_t*, rec_offs*, que_thr_t*, mtr_t*): Assertion `maria_table->fk_ref_list' failed.
 
pluto:/data/results/1665405077/TBR-1621$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio

Comment by Matthias Leich [ 2022-11-04 ]

Results of RQG testing on
origin/bb-10.6-MDEV-29181 352acbb59a55b0723602dcf1796196cc8640cee2 2022-11-03T16:47:43+03:00
1. No further new bad effects ~ We have the same trouble on other source trees.
2. The bad effect "TBR-1621" which does not occur on main source trees was several times replayed.
      pluto:/data/results/1667488260/TBR-1621$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio

Comment by Nikita Malyavin [ 2022-11-11 ]

TBR-1621 demonstrated a flaw against not locked parent table of the ADDed FK. The fix is to prelock this table.

This was already done by Svoj in MDEV-16060 fix, but only for COPY algorithm. I have added the prelocking for INPLACE as well, right before commit_iinplace_alter_table.

The vulnerability for DROP was also existed, but a way to fix it without additional prelocking was found.

https://github.com/MariaDB/server/commit/57f69f307f47152aa6db54e7dac9c4549f5ac68c

Comment by Matthias Leich [ 2023-01-17 ]

origin/bb-10.6-MDEV-29181 332ca90e4e796c82733f9fcdd3bba7b9b7541d64 2023-01-12T17:48:32+03:00
performed well in RQG testing. The failures observed occur on main trees too.

Comment by Marko Mäkelä [ 2023-01-26 ]

I can’t find anything wrong with the algorithm here. The branch for testing included also MDEV-30103, which I think goes need some more work, and MDEV-30021, which looks OK to me.

I only have a coding style comment: It would be good to declare maria_table_to_prebuilt() in ha_prototypes.h, define it in ha_innodb.cc and avoid including ha_innodb.h in low-level source files.

OK to push after addressing this.

Comment by Marko Mäkelä [ 2023-07-28 ]

I am not sure if this would fix the following failure related to FOREIGN KEY checks, or if it would be something else, such as MDEV-30103:

ssh pluto
rr replay /data1/results/1690304709/TBR-1676/1/rr/latest-trace

10.6 b102872ad50cce5959ad95369740766d14e9e48c

mariadbd: /data/Server/bb-10.6-primary-corruption/storage/innobase/include/sux_lock.h:85: void sux_lock<ssux>::free() [with ssux = ssux_lock_impl<false>]: Assertion `r->empty()' failed.
(rr) backtrace
#8  0x000055ca271b15e8 in dict_index_remove_from_cache_low (table=table@entry=0x7fd0e83bd250, index=0x7fd0c983f880, lru_evict=lru_evict@entry=0)
    at /data/Server/bb-10.6-primary-corruption/storage/innobase/dict/dict0dict.cc:2166
#9  0x000055ca271b1b3a in dict_sys_t::remove (this=0x55ca27d87280 <dict_sys>, table=table@entry=0x7fd0e83bd250, lru=lru@entry=false, keep=keep@entry=false)
    at /data/Server/bb-10.6-primary-corruption/storage/innobase/dict/dict0dict.cc:1887
#10 0x000055ca26efd7bf in innobase_reload_table (…)
#18 0x000055ca2682eba8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd0c4020ee8, 
    packet=packet@entry=0x7fd0c40b5f69 " ALTER IGNORE TABLE test.t8_p DROP INDEX `Marvão_uidx3`, ALGORITHM = INPLACE  /* E_R Thread4 QNO 906 CON_ID 86 */ ", packet_length=packet_length@entry=115, 
    blocking=blocking@entry=true) at /data/Server/bb-10.6-primary-corruption/sql/sql_parse.cc:1896
(rr) thread apply 15 backtrace
#18 0x000055ca27140d7c in btr_cur_t::search_leaf (this=this@entry=0x7fd0de3cfd70, tuple=tuple@entry=0x7fd0cc101280, mode=mode@entry=PAGE_CUR_GE, latch_mode=<optimized out>, latch_mode@entry=BTR_SEARCH_LEAF, mtr=mtr@entry=0x7fd0de3d00f0) at /data/Server/bb-10.6-primary-corruption/storage/innobase/btr/btr0cur.cc:1177
#19 0x000055ca27024b97 in btr_pcur_open (mtr=0x7fd0de3d00f0, cursor=0x7fd0de3cfd70, latch_mode=BTR_SEARCH_LEAF, mode=PAGE_CUR_GE, tuple=0x7fd0cc101280) at /data/Server/bb-10.6-primary-corruption/storage/innobase/include/btr0pcur.h:431
#20 row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x7fd0e4098b10, table=table@entry=0x7fd0e807cd30, entry=entry@entry=0x7fd0cc101280, thr=thr@entry=0x7fd0ccb28ea8) at /data/Server/bb-10.6-primary-corruption/storage/innobase/row/row0ins.cc:1627
#34 0x000055ca2682eba8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd0d80017d8, packet=packet@entry=0x7fd0d800b789 " INSERT INTO test.t8_c (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1, 1 - 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread3 QN"..., packet_length=packet_length@entry=219, blocking=blocking@entry=true) at /data/Server/bb-10.6-primary-corruption/sql/sql_parse.cc:1896

That is, the ALTER TABLE test.t8_p is trying to evict the data dictionary object that the INSERT INTO test.t8_c is accessing, to see if an insert into a child table is allowed by the existence of the parent table. When I implemented MDEV-21175, I was assured that proper locking would be in place, but it appears to me that the final phase of ALTER TABLE test.t8_p fails to block concurrent operations on all parent and child tables (such as test.t8_c).

Comment by Marko Mäkelä [ 2023-09-20 ]

As part of applying this fix, please re-enable the test gcol.innodb_virtual_fk and ensure that it is stable.

Generated at Thu Feb 08 10:06:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.