Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29181

Potential corruption on Foreign key update on a table with vcol index

Details

    Description

      I have added an assertion to ensure that the correct table is chosen in innodb_find_table_for_vc:

      --- a/storage/innobase/handler/ha_innodb.cc
      +++ b/storage/innobase/handler/ha_innodb.cc
      @@ -20007,6 +20007,9 @@
       	} else {
       		if (table->vc_templ->mysql_table_query_id
       		    == thd_get_query_id(thd)) {
      +			DEBUG_SYNC(thd, "find_table_for_vc_table_valid");
      +                        DBUG_ASSERT(table->vc_templ->mysql_table_query_id
      +                                    == thd_get_query_id(thd));
       			return table->vc_templ->mysql_table;
       		}
       	}
      

      This is important, since the table is thread-local data.

      The above assertion fails with the following test:

      --source include/have_debug_sync.inc
      --source include/have_innodb.inc
      --source include/have_sequence.inc
      --connect (con2, localhost, root,,)
      --connection default
      set default_storage_engine= innodb;
       
       
      create table t1 (a int primary key);
      insert into t1(a) select * from seq_1_to_65536;
      create table t2 (a int,  b int as (a), key(b), foreign key (a) references t1 (a)
        on delete cascade on update cascade);
      insert into t2(a) select * from seq_1_to_65536;
      set debug_sync= "find_table_for_vc_table_valid SIGNAL found WAIT_FOR go";
      --send
      UPDATE t1 set a = a + 1000000 where a <= 2;
      --connection con2
      set debug_sync= "now WAIT_FOR found TIMEOUT 3";
      UPDATE t1 set a= 330000 where a = 33333;
      set debug_sync= "now SIGNAL go";
       
      --connection default
      --reap
       
      # Cleanup
      drop table t2, t1;
      set debug_sync= reset;
      --connection default
      

      I believe we need enough rows to have more that one leaf page in the tree, to avoid a row lock timeout.

      The best way to fix it is to put mysql_table to a threadlocal data as well. A good place can be row_prebuilt_t (and table is already there): to access child prebuilt, parent prebuilt should contain references to it, we can fill it on open under certain conditions, or on demand

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich added a comment - - edited

            origin/bb-10.6-MDEV-29181 332ca90e4e796c82733f9fcdd3bba7b9b7541d64 2023-01-12T17:48:32+03:00
            performed well in RQG testing. The failures observed occur on main trees too.

            mleich Matthias Leich added a comment - - edited origin/bb-10.6- MDEV-29181 332ca90e4e796c82733f9fcdd3bba7b9b7541d64 2023-01-12T17:48:32+03:00 performed well in RQG testing. The failures observed occur on main trees too.

            I can’t find anything wrong with the algorithm here. The branch for testing included also MDEV-30103, which I think goes need some more work, and MDEV-30021, which looks OK to me.

            I only have a coding style comment: It would be good to declare maria_table_to_prebuilt() in ha_prototypes.h, define it in ha_innodb.cc and avoid including ha_innodb.h in low-level source files.

            OK to push after addressing this.

            marko Marko Mäkelä added a comment - I can’t find anything wrong with the algorithm here. The branch for testing included also MDEV-30103 , which I think goes need some more work, and MDEV-30021 , which looks OK to me. I only have a coding style comment: It would be good to declare maria_table_to_prebuilt() in ha_prototypes.h , define it in ha_innodb.cc and avoid including ha_innodb.h in low-level source files. OK to push after addressing this.
            marko Marko Mäkelä added a comment - - edited

            I am not sure if this would fix the following failure related to FOREIGN KEY checks, or if it would be something else, such as MDEV-30103:

            ssh pluto
            rr replay /data1/results/1690304709/TBR-1676/1/rr/latest-trace
            

            10.6 b102872ad50cce5959ad95369740766d14e9e48c

            mariadbd: /data/Server/bb-10.6-primary-corruption/storage/innobase/include/sux_lock.h:85: void sux_lock<ssux>::free() [with ssux = ssux_lock_impl<false>]: Assertion `r->empty()' failed.
            …
            (rr) backtrace
            …
            #8  0x000055ca271b15e8 in dict_index_remove_from_cache_low (table=table@entry=0x7fd0e83bd250, index=0x7fd0c983f880, lru_evict=lru_evict@entry=0)
                at /data/Server/bb-10.6-primary-corruption/storage/innobase/dict/dict0dict.cc:2166
            #9  0x000055ca271b1b3a in dict_sys_t::remove (this=0x55ca27d87280 <dict_sys>, table=table@entry=0x7fd0e83bd250, lru=lru@entry=false, keep=keep@entry=false)
                at /data/Server/bb-10.6-primary-corruption/storage/innobase/dict/dict0dict.cc:1887
            #10 0x000055ca26efd7bf in innobase_reload_table (…)
            …
            #18 0x000055ca2682eba8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd0c4020ee8, 
                packet=packet@entry=0x7fd0c40b5f69 " ALTER IGNORE TABLE test.t8_p DROP INDEX `Marvão_uidx3`, ALGORITHM = INPLACE  /* E_R Thread4 QNO 906 CON_ID 86 */ ", packet_length=packet_length@entry=115, 
                blocking=blocking@entry=true) at /data/Server/bb-10.6-primary-corruption/sql/sql_parse.cc:1896
            …
            (rr) thread apply 15 backtrace
            …
            #18 0x000055ca27140d7c in btr_cur_t::search_leaf (this=this@entry=0x7fd0de3cfd70, tuple=tuple@entry=0x7fd0cc101280, mode=mode@entry=PAGE_CUR_GE, latch_mode=<optimized out>, latch_mode@entry=BTR_SEARCH_LEAF, mtr=mtr@entry=0x7fd0de3d00f0) at /data/Server/bb-10.6-primary-corruption/storage/innobase/btr/btr0cur.cc:1177
            #19 0x000055ca27024b97 in btr_pcur_open (mtr=0x7fd0de3d00f0, cursor=0x7fd0de3cfd70, latch_mode=BTR_SEARCH_LEAF, mode=PAGE_CUR_GE, tuple=0x7fd0cc101280) at /data/Server/bb-10.6-primary-corruption/storage/innobase/include/btr0pcur.h:431
            #20 row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x7fd0e4098b10, table=table@entry=0x7fd0e807cd30, entry=entry@entry=0x7fd0cc101280, thr=thr@entry=0x7fd0ccb28ea8) at /data/Server/bb-10.6-primary-corruption/storage/innobase/row/row0ins.cc:1627
            …
            #34 0x000055ca2682eba8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd0d80017d8, packet=packet@entry=0x7fd0d800b789 " INSERT INTO test.t8_c (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1, 1 - 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread3 QN"..., packet_length=packet_length@entry=219, blocking=blocking@entry=true) at /data/Server/bb-10.6-primary-corruption/sql/sql_parse.cc:1896
            

            That is, the ALTER TABLE test.t8_p is trying to evict the data dictionary object that the INSERT INTO test.t8_c is accessing, to see if an insert into a child table is allowed by the existence of the parent table. When I implemented MDEV-21175, I was assured that proper locking would be in place, but it appears to me that the final phase of ALTER TABLE test.t8_p fails to block concurrent operations on all parent and child tables (such as test.t8_c).

            marko Marko Mäkelä added a comment - - edited I am not sure if this would fix the following failure related to FOREIGN KEY checks, or if it would be something else, such as MDEV-30103 : ssh pluto rr replay /data1/results/1690304709/TBR-1676/1/rr/latest-trace 10.6 b102872ad50cce5959ad95369740766d14e9e48c mariadbd: /data/Server/bb-10.6-primary-corruption/storage/innobase/include/sux_lock.h:85: void sux_lock<ssux>::free() [with ssux = ssux_lock_impl<false>]: Assertion `r->empty()' failed. … (rr) backtrace … #8 0x000055ca271b15e8 in dict_index_remove_from_cache_low (table=table@entry=0x7fd0e83bd250, index=0x7fd0c983f880, lru_evict=lru_evict@entry=0) at /data/Server/bb-10.6-primary-corruption/storage/innobase/dict/dict0dict.cc:2166 #9 0x000055ca271b1b3a in dict_sys_t::remove (this=0x55ca27d87280 <dict_sys>, table=table@entry=0x7fd0e83bd250, lru=lru@entry=false, keep=keep@entry=false) at /data/Server/bb-10.6-primary-corruption/storage/innobase/dict/dict0dict.cc:1887 #10 0x000055ca26efd7bf in innobase_reload_table (…) … #18 0x000055ca2682eba8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd0c4020ee8, packet=packet@entry=0x7fd0c40b5f69 " ALTER IGNORE TABLE test.t8_p DROP INDEX `Marvão_uidx3`, ALGORITHM = INPLACE /* E_R Thread4 QNO 906 CON_ID 86 */ ", packet_length=packet_length@entry=115, blocking=blocking@entry=true) at /data/Server/bb-10.6-primary-corruption/sql/sql_parse.cc:1896 … (rr) thread apply 15 backtrace … #18 0x000055ca27140d7c in btr_cur_t::search_leaf (this=this@entry=0x7fd0de3cfd70, tuple=tuple@entry=0x7fd0cc101280, mode=mode@entry=PAGE_CUR_GE, latch_mode=<optimized out>, latch_mode@entry=BTR_SEARCH_LEAF, mtr=mtr@entry=0x7fd0de3d00f0) at /data/Server/bb-10.6-primary-corruption/storage/innobase/btr/btr0cur.cc:1177 #19 0x000055ca27024b97 in btr_pcur_open (mtr=0x7fd0de3d00f0, cursor=0x7fd0de3cfd70, latch_mode=BTR_SEARCH_LEAF, mode=PAGE_CUR_GE, tuple=0x7fd0cc101280) at /data/Server/bb-10.6-primary-corruption/storage/innobase/include/btr0pcur.h:431 #20 row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x7fd0e4098b10, table=table@entry=0x7fd0e807cd30, entry=entry@entry=0x7fd0cc101280, thr=thr@entry=0x7fd0ccb28ea8) at /data/Server/bb-10.6-primary-corruption/storage/innobase/row/row0ins.cc:1627 … #34 0x000055ca2682eba8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd0d80017d8, packet=packet@entry=0x7fd0d800b789 " INSERT INTO test.t8_c (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1, 1 - 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread3 QN"..., packet_length=packet_length@entry=219, blocking=blocking@entry=true) at /data/Server/bb-10.6-primary-corruption/sql/sql_parse.cc:1896 That is, the ALTER TABLE test.t8_p is trying to evict the data dictionary object that the INSERT INTO test.t8_c is accessing, to see if an insert into a child table is allowed by the existence of the parent table. When I implemented MDEV-21175 , I was assured that proper locking would be in place, but it appears to me that the final phase of ALTER TABLE test.t8_p fails to block concurrent operations on all parent and child tables (such as test.t8_c ).

            As part of applying this fix, please re-enable the test gcol.innodb_virtual_fk and ensure that it is stable.

            marko Marko Mäkelä added a comment - As part of applying this fix, please re-enable the test gcol.innodb_virtual_fk and ensure that it is stable.

            The part for serg is still to review, however there is also an issue in innodb reported by Mathias this December, that should be investigated by me. Serg is not blocked by it, though.

            nikitamalyavin Nikita Malyavin added a comment - The part for serg is still to review, however there is also an issue in innodb reported by Mathias this December, that should be investigated by me. Serg is not blocked by it, though.

            People

              nikitamalyavin Nikita Malyavin
              nikitamalyavin Nikita Malyavin
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.