[MDEV-29489] Assertion `lock_table_has(trx, index->table, LOCK_IX)' failed at lock0lock.cc:6025 Created: 2022-09-07  Updated: 2023-03-30  Resolved: 2022-11-09

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Delete
Affects Version/s: 10.10.1, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: Zuming Jiang Assignee: Vladislav Lesin
Resolution: Duplicate Votes: 0
Labels: crash
Environment:

Ubuntu 20.04


Attachments: Text File bug_report.txt     File mysql_bk.sql    
Issue Links:
Duplicate
duplicates MDEV-10087 mysqld_update()/mysql_delete() contin... Closed
is duplicated by MDEV-29118 Assertion `lock_table_has(trx, index-... Closed
Relates
relates to MDEV-21987 Assertion failure in file lock0lock.c... Confirmed

 Description   

I used my fuzzing tool to test MariaDB and found a transaction-related bug that make the crashes. This bug might be similar to the bug MDEV-29233 that I previously reported. But the file location of bug triggering and the stack trace are quite different from MDEV-29233. Based on my experience, it might be a different bug.

Mariadb installation
1) cd mariadb-10.10.1
2) mkdir build; cd build
3) cmake .. -DCMAKE_BUILD_TYPE=Debug -DWITH_ASAN=ON
4) make -j12 && sudo make install

Setup the environment
1) export ASAN_OPTIONS=detect_leaks=0
2) /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql &
3) /usr/local/mysql/bin/mysql -uroot -Dtestdb < mysql_bk.sql (attached) # set up the database

Reproduce bug

/usr/local/mysql/bin/mysql -uroot -Dtestdb # set up connection conn_0
/usr/local/mysql/bin/mysql -uroot -Dtestdb # set up connection conn_1

conn_1> START TRANSACTION;
conn_0> START TRANSACTION;
conn_1> update t_yfrkzd set wkey = 80;
conn_1> delete from t_yfrkzd
where
t_yfrkzd.c_n1makd between t_yfrkzd.c_n1makd and t_yfrkzd.wkey;
conn_0> delete from t_ywo4_b
where
t_ywo4_b.c_hlsgr not in (
select
ref_0.pkey as c0
from
(t_yfrkzd as ref_0
inner join (select
ref_1.wkey as c0
from
t_yfrkzd as ref_1
) as subq_0
on (ref_0.wkey = subq_0.c0 ))); --- this DELETE blocks
conn_1> delete from t_yfrkzd
where
(t_yfrkzd.c_aob5e not in (
select
ref_1.c_k4lijb as c0
from
t_ywo4_b as ref_1)); --- this DELETE make the above DELETE crash the maridb server
conn_1> ROLLBACK;
conn_0> ROLLBACK;

I have simplified the content of the test case, and I hope this report can help you reproduce and fix the bug. In addition, I attached the failure report (which has its stack trace).



 Comments   
Comment by Marko Mäkelä [ 2022-09-08 ]

I reproduced this on 10.6 without AddressSanitizer.

Comment by Marko Mäkelä [ 2022-09-08 ]

Here is a somewhat simplified mtr test case. I expect that some rows and maybe also columns may still be removed:

--source include/have_innodb.inc
 
CREATE TABLE t1 (
  wkey int,
  pkey int NOT NULL PRIMARY KEY,
  c_n1makd int,
  c_aob5e text
) ENGINE=InnoDB;
 
INSERT INTO t1 VALUES
(1,11000,41,NULL),
(1,12000,100,NULL),
(1,13000,62,NULL),
(4,29000,43,'avkwdb'),
(4,30000,44,'w9gz1b'),
(4,31000,56,'nxhhpc'),
(4,32000,71,'fiyrmc'),
(4,33000,42,'kil97c'),
(4,34000,10,NULL),
(4,35000,87,'bz5mw'),
(4,36000,17,'9cl1yd'),
(6,45000,NULL,NULL),
(6,46000,NULL,NULL),
(6,47000,NULL,NULL),
(8,55000,42,NULL),
(8,56000,35,'dothi'),
(8,57000,19,'ig_sbb'),
(8,58000,83,'_egg6b'),
(8,59000,100,'asdqy'),
(8,60000,49,'nj8akc'),
(8,61000,14,'gwkdmb'),
(8,62000,14,'qczqhc'),
(10,66000,NULL,'uay_8b'),
(10,67000,NULL,'eym53c'),
(10,68000,NULL,'ajrll'),
(10,69000,NULL,NULL),
(10,70000,NULL,'mepiuc'),
(12,77000,NULL,'9nn5yc'),
(12,78000,NULL,NULL),
(12,79000,NULL,'usjb8d'),
(12,80000,NULL,'lfyj9c');
 
CREATE TABLE t2 (
  wkey int,
  pkey int NOT NULL PRIMARY KEY,
  c_m53hyb double,
  c_k4lijb text,
  c_hlsgr int,
  c_s8yzsd int,
  UNIQUE KEY t_ghfno (wkey,pkey,c_m53hyb,c_hlsgr,c_s8yzsd)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
INSERT INTO t2 VALUES
(2,14000,13.7,'m3vhdd',53,NULL),
(2,15000,86.8,'sa9pq',37,NULL),
(2,16000,82.79,'njcxob',6,NULL),
(2,17000,24.14,'z0gx2c',30,NULL),
(2,18000,41.81,'5ehzm',17,NULL),
(2,19000,56.72,'u4c_id',38,NULL),
(2,20000,57.37,'kgbe3c',96,NULL),
(2,21000,32.17,'gwdbgc',57,NULL),
(3,22000,98.66,'hld0_b',60,12),
(3,23000,5.7,'x_zb6',32,61),
(3,24000,59.56,'9x_hvd',14,46),
(3,25000,24.51,'gbcmw',98,98),
(3,26000,64.96,'b1msib',37,36),
(3,27000,40.98,'oltxod',41,4),
(3,28000,40.26,'bk3rwc',31,99),
(5,37000,NULL,'yznqgd',23,68),
(5,38000,NULL,'mahkqb',45,3),
(5,39000,NULL,'m5ilgc',34,34),
(5,40000,NULL,'rjh_5c',57,9),
(5,41000,NULL,'esohkd',15,26),
(5,42000,NULL,'kpu3fd',47,64),
(5,43000,NULL,'bwhvic',24,21),
(5,44000,NULL,'ari9p',70,99),
(7,48000,NULL,'dmm0kd',96,32),
(7,49000,NULL,'0a5lzc',39,87),
(7,50000,NULL,'dxappd',97,69),
(7,51000,NULL,'u9s2dd',6,85),
(7,52000,NULL,'srl9yd',36,28),
(7,53000,NULL,'eqrp1d',22,44),
(7,54000,NULL,'war_wc',86,68),
(9,63000,46.28,NULL,56,52),
(9,64000,68.52,NULL,61,85),
(9,65000,39.93,NULL,43,78),
(11,71000,14.97,'4rujkb',50,NULL),
(11,72000,34.81,'3yrdbc',11,NULL),
(11,73000,47.6,'dhef5d',55,NULL),
(11,74000,51.23,'ub1_ib',82,NULL),
(11,75000,71.39,'_kdtrd',84,NULL),
(11,76000,61.65,'_ijplc',90,NULL),
(192,408000,NULL,'obvzkd',NULL,31),
(192,409000,NULL,'bbened',NULL,61),
(192,410000,NULL,'rtaz2c',NULL,48);
 
START TRANSACTION;
UPDATE t1 SET wkey = 80;
DELETE FROM t1 WHERE c_n1makd IS NOT NULL;
 
connect(con1, localhost,root,,);
START TRANSACTION;
send delete from t2 where
c_hlsgr not in (select ref_0.pkey from
                (t1 as ref_0 inner join
                 (select wkey from t1) as subq_0
                 on (ref_0.wkey = ref_0.pkey)));
 
connection default;
let $wait_condition=
  select count(*) = 1 from information_schema.processlist
  where state = 'Sending data'
  and info like 'delete from t2 where%';
--source include/wait_condition.inc
 
delete from t1 where c_aob5e not in (select c_k4lijb from t2);
 
ROLLBACK;
connection con1;
reap;
ROLLBACK;
disconnect con1;
connection default;
DROP TABLE t1,t2;

Comment by Vladislav Lesin [ 2022-10-31 ]

Take a look at the following stack trace gotten from the above test:

#9  0x0000556e66473b9e in trx_t::rollback (this=0x63b34902a680, savept=0x0) at /home/vlesin/unencrypted/work/git/10.6/storage/innobase/trx/trx0roll.cc:176
#10 0x0000556e663d99a7 in row_mysql_handle_errors (new_err=0x7ee765004c38, trx=0x63b34902a680, thr=0x590170238e90, savept=0x0)
    at /home/vlesin/unencrypted/work/git/10.6/storage/innobase/row/row0mysql.cc:696
#11 0x0000556e6641d755 in row_search_mvcc (buf=0x590170225858 "\377", mode=PAGE_CUR_G, prebuilt=0x590170238458, match_mode=0, direction=0)
    at /home/vlesin/unencrypted/work/git/10.6/storage/innobase/row/row0sel.cc:5842
#12 0x0000556e66216aee in ha_innobase::index_read (this=0x590170237780, buf=0x590170225858 "\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY)
    at /home/vlesin/unencrypted/work/git/10.6/storage/innobase/handler/ha_innodb.cc:9066
#13 0x0000556e66217b2e in ha_innobase::index_first (this=0x590170237780, buf=0x590170225858 "\377")
    at /home/vlesin/unencrypted/work/git/10.6/storage/innobase/handler/ha_innodb.cc:9435
#14 0x0000556e66217d1a in ha_innobase::rnd_next (this=0x590170237780, buf=0x590170225858 "\377")
    at /home/vlesin/unencrypted/work/git/10.6/storage/innobase/handler/ha_innodb.cc:9528
#15 0x0000556e65decfdc in handler::ha_rnd_next (this=0x590170237780, buf=0x590170225858 "\377") at /home/vlesin/unencrypted/work/git/10.6/sql/handler.cc:3416
#16 0x0000556e65fb926c in rr_sequential (info=0x590174021928) at /home/vlesin/unencrypted/work/git/10.6/sql/records.cc:519
#17 0x0000556e6598365f in READ_RECORD::read_record (this=0x590174021928) at /home/vlesin/unencrypted/work/git/10.6/sql/records.h:81
#18 0x0000556e65acb6df in read_first_record_seq (tab=0x590174021860) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_select.cc:22188
#19 0x0000556e65ac9526 in sub_select (join=0x590174018ee8, join_tab=0x590174021860, end_of_records=false)
    at /home/vlesin/unencrypted/work/git/10.6/sql/sql_select.cc:21291
#20 0x0000556e65ac8a21 in do_select (join=0x590174018ee8, procedure=0x0) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_select.cc:20839
#21 0x0000556e65a9ba40 in JOIN::exec_inner (this=0x590174018ee8) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_select.cc:4787
#22 0x0000556e65a9aab1 in JOIN::exec (this=0x590174018ee8) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_select.cc:4565
#23 0x0000556e65ee24d6 in subselect_single_select_engine::exec (this=0x590174018c10) at /home/vlesin/unencrypted/work/git/10.6/sql/item_subselect.cc:4093
#24 0x0000556e65ed4dc9 in Item_subselect::exec (this=0x5901740189e8) at /home/vlesin/unencrypted/work/git/10.6/sql/item_subselect.cc:811
#25 0x0000556e65ed552f in Item_in_subselect::exec (this=0x5901740189e8) at /home/vlesin/unencrypted/work/git/10.6/sql/item_subselect.cc:993
#26 0x0000556e65ed95ba in Item_in_subselect::val_bool (this=0x5901740189e8) at /home/vlesin/unencrypted/work/git/10.6/sql/item_subselect.cc:1941
#27 0x0000556e6590ade9 in Item::val_bool_result (this=0x5901740189e8) at /home/vlesin/unencrypted/work/git/10.6/sql/item.h:1772
#28 0x0000556e65e37191 in Item_in_optimizer::val_int (this=0x590174019db0) at /home/vlesin/unencrypted/work/git/10.6/sql/item_cmpfunc.cc:1650
#29 0x0000556e65cbdae3 in Type_handler_int_result::Item_val_bool (this=0x556e673e1320 <type_handler_bool>, item=0x590174019db0)
    at /home/vlesin/unencrypted/work/git/10.6/sql/sql_type.cc:5094
#30 0x0000556e6590ac0c in Item::val_bool (this=0x590174019db0) at /home/vlesin/unencrypted/work/git/10.6/sql/item.h:1676
--Type <RET> for more, q to quit, c to continue without paging--
#31 0x0000556e65e32096 in Item_func_not::val_int (this=0x590174018c50) at /home/vlesin/unencrypted/work/git/10.6/sql/item_cmpfunc.cc:203
#32 0x0000556e65b86cbd in SQL_SELECT::skip_record (this=0x590174024a50, thd=0x590174000db8) at /home/vlesin/unencrypted/work/git/10.6/sql/opt_range.h:1728
#33 0x0000556e65fd8a13 in record_should_be_deleted (thd=0x590174000db8, table=0x5901702325d8, sel=0x590174024a50, explain=0x590174024c20,
    truncate_history=false) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_delete.cc:221
#34 0x0000556e65fdae56 in mysql_delete (thd=0x590174000db8, table_list=0x590174014360, conds=0x590174018c50, order_list=0x590174005c30,
    limit=18446744073709551576, options=0, result=0x0) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_delete.cc:801
#35 0x0000556e65a45937 in mysql_execute_command (thd=0x590174000db8, is_called_from_prepared_stmt=false)
    at /home/vlesin/unencrypted/work/git/10.6/sql/sql_parse.cc:4807
#36 0x0000556e65a50389 in mysql_parse (thd=0x590174000db8,
    rawbuf=0x590174014190 "delete from t2 where\nc_hlsgr not in (select ref_0.pkey from\n(t1 as ref_0 inner join\n(select wkey from t1) as subq_0\non (ref_0.wkey = ref_0.pkey)))", length=146, parser_state=0x7ee7650073a0) at /home/vlesin/unencrypted/work/git/10.6/sql/sql_parse.cc:8016

trx_t::rollback() does rollback and releases all locks including table locks. row_mysql_handle_errors() passes DB_DEADLOCK up to the stack till sub_select(), converting it to HA_ERR_LOCK_DEADLOCK by the way.

sub_select() invokes evaluate_join_record(), which returns NESTED_LOOP_ERROR. The error is passed up the the call stack till do_select(), which converts is to -1 with the following code:

  if (error == NESTED_LOOP_OK)                                                  
  {                                                                             
    /*                                                                          
      Sic: this branch works even if rc != 0, e.g. when                         
      send_data above returns an error.                                         
    */                                                                          
    if (unlikely(join->result->send_eof()))                                     
      rc= 1;                                  // Don't send error               
    DBUG_PRINT("info",("%ld records output", (long) join->send_records));       
  }                                                                             
  else                                                                          
    rc= -1;   

Then that -1 is passed up to the stack till subselect_single_select_engine::exec(), which must return int. And it returns -1 to Item_subselect::exec(), but Item_subselect::exec() converts it to bool with the following code:

       bool res= engine->exec();

and passes that bool up to the stack till Item_in_subselect::val_bool(), which, in turns, just returns false with the following code:

      if (exec())
      {
        reset();
        return 0;
      }

And that false is again passed up to the stack to Item_in_optimizer::val_int(), and the value, returned by Item::val_bool_result() is just ignored in Item_in_optimizer::val_int():

            /* The subquery has to be evaluated */
            (void) item_subs->val_bool_result();

And Item_in_optimizer::val_int() just returns 0, like there was not error at all.

Then record_should_be_deleted() returns true to mysql_delete(), and mysql_delete() just invokes TABLE::delete_row() instead of finishing the statement with error.

So the transaction was rolled back by InnoDB as subquery got deadlock error, but runtime still continues statement execution, as error, returned by InnoDB, was lost in the above code. As all locks were released on that transaction rollback, the corresponding assertion let us know, that we are going to create record lock without corresponding intention table lock.

This is not InnoDB bug, we need to pass it to somebody from Runtime team.

Comment by Vladislav Lesin [ 2022-11-01 ]

The above analyses was done on 10.6 commit 64143741789a3e1c2bb8c6bf627eaec3751af0c6.

t1 IX lock is created with the following stack:

#3  0x0000556e662cc6ea in lock_table (table=0x590170232c08, fktable=0x0, mode=LOCK_IX, thr=0x590170067ac8)
    at ./storage/innobase/lock/lock0lock.cc:3577                                
#4  0x0000556e6641a3c0 in row_search_mvcc (buf=0x590170231658 "\377", mode=PAGE_CUR_G, prebuilt=0x590170066fd8, match_mode=0, direction=0)
    at ./storage/innobase/row/row0sel.cc:4713                                   
#5  0x0000556e66216aee in ha_innobase::index_read (this=0x590170065e90, buf=0x590170231658 "\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY)
    at ./storage/innobase/handler/ha_innodb.cc:9066                             
#6  0x0000556e66217b2e in ha_innobase::index_first (this=0x590170065e90, buf=0x590170231658 "\377")
    at ./storage/innobase/handler/ha_innodb.cc:9435                             
#7  0x0000556e66217d1a in ha_innobase::rnd_next (this=0x590170065e90, buf=0x590170231658 "\377")
    at ./storage/innobase/handler/ha_innodb.cc:9528                             
#8  0x0000556e65decfdc in handler::ha_rnd_next (this=0x590170065e90, buf=0x590170231658 "\377") at ./sql/handler.cc:3416
#9  0x0000556e65fb926c in rr_sequential (info=0x7ee7650064b0) at ./sql/records.cc:519
#10 0x0000556e6598365f in READ_RECORD::read_record (this=0x7ee7650064b0) at ./sql/records.h:81
#11 0x0000556e65fdada4 in mysql_delete (thd=0x590174000db8, table_list=0x590174014360, conds=0x590174018c50, order_list=0x590174005c30,
    limit=18446744073709551615, options=0, result=0x0) at ./sql/sql_delete.cc:797

t2 IS lock is created with the following stack:

#2  0x0000556e662cc6ea in lock_table (table=0x5901701a3a38, fktable=0x0, mode=LOCK_IS, thr=0x590170238e90)
    at ./storage/innobase/lock/lock0lock.cc:3577                                
#3  0x0000556e6641a3c0 in row_search_mvcc (buf=0x590170225858 "\377", mode=PAGE_CUR_GE, prebuilt=0x590170238458, match_mode=1, direction=0)
    at ./storage/innobase/row/row0sel.cc:4713                                   
#4  0x0000556e66216aee in ha_innobase::index_read (this=0x590170237780, buf=0x590170225858 "\377", key_ptr=0x590174022740 "5", key_len=4,
    find_flag=HA_READ_KEY_EXACT) at ./storage/innobase/handler/ha_innodb.cc:9066
#5  0x0000556e65d0fde8 in handler::index_read_map (this=0x590170237780, buf=0x590170225858 "\377", key=0x590174022740 "5", keypart_map=1,
    find_flag=HA_READ_KEY_EXACT) at ./sql/handler.h:3930                        
#6  0x0000556e65dedbd2 in handler::ha_index_read_map (this=0x590170237780, buf=0x590170225858 "\377", key=0x590174022740 "5", keypart_map=1,
    find_flag=HA_READ_KEY_EXACT) at ./sql/handler.cc:3473                       
#7  0x0000556e65acaebe in join_read_key2 (thd=0x590174000db8, tab=0x590174021860, table=0x59017022bde8, table_ref=0x590174021a88)
    at ./sql/sql_select.cc:21975                                                
#8  0x0000556e65acacd0 in join_read_key (tab=0x590174021860) at ./sql/sql_select.cc:21925
#9  0x0000556e65ac9526 in sub_select (join=0x590174018ee8, join_tab=0x590174021860, end_of_records=false)
    at ./sql/sql_select.cc:21291                                                
#10 0x0000556e65ac8a21 in do_select (join=0x590174018ee8, procedure=0x0) at ./sql/sql_select.cc:20839
#11 0x0000556e65a9ba40 in JOIN::exec_inner (this=0x590174018ee8) at ./sql/sql_select.cc:4787
#12 0x0000556e65a9aab1 in JOIN::exec (this=0x590174018ee8) at ./sql/sql_select.cc:4565
#13 0x0000556e65ee24d6 in subselect_single_select_engine::exec (this=0x590174018c10) at ./sql/item_subselect.cc:4093
#14 0x0000556e65ed4dc9 in Item_subselect::exec (this=0x5901740189e8) at ./sql/item_subselect.cc:811
#15 0x0000556e65ed552f in Item_in_subselect::exec (this=0x5901740189e8) at ./sql/item_subselect.cc:993
#16 0x0000556e65ed95ba in Item_in_subselect::val_bool (this=0x5901740189e8) at ./sql/item_subselect.cc:1941
#17 0x0000556e6590ade9 in Item::val_bool_result (this=0x5901740189e8) at ./sql/item.h:1772
#18 0x0000556e65e37257 in Item_in_optimizer::val_int (this=0x590174019db0) at ./sql/item_cmpfunc.cc:1665
#19 0x0000556e65cbdae3 in Type_handler_int_result::Item_val_bool (this=0x556e673e1320 <type_handler_bool>, item=0x590174019db0)
    at ./sql/sql_type.cc:5094                                                   
#20 0x0000556e6590ac0c in Item::val_bool (this=0x590174019db0) at ./sql/item.h:1676
#21 0x0000556e65e32096 in Item_func_not::val_int (this=0x590174018c50) at ./sql/item_cmpfunc.cc:203
#22 0x0000556e65b86cbd in SQL_SELECT::skip_record (this=0x590174024a50, thd=0x590174000db8) at ./sql/opt_range.h:1728
#23 0x0000556e65fd8a13 in record_should_be_deleted (thd=0x590174000db8, table=0x5901702325d8, sel=0x590174024a50, explain=0x590174024c20,
    truncate_history=false) at ./sql/sql_delete.cc:221                          
#24 0x0000556e65fdae56 in mysql_delete (thd=0x590174000db8, table_list=0x590174014360, conds=0x590174018c50, order_list=0x590174005c30,
    limit=18446744073709551615, options=0, result=0x0) at ./sql/sql_delete.cc:801

I.e. the t1 IX lock is created during info.read_record() call, the t2 IS lock is created during record_should_be_deleted() call, and both locks are released during record_should_be_deleted() from mysql_delete() as sub-select got deadlock error, then table->delete_row() is invoked despite the transaction was rolled back in record_should_be_deleted():

  while (likely(!(error=info.read_record())) && likely(!thd->killed) &&         
         likely(!thd->is_error()))                                              
  {                                                                             
    if (delete_while_scanning)                                                  
      delete_record= record_should_be_deleted(thd, table, select, explain,      
                                              delete_history);                  
    if (delete_record)                                                          
    {                                                                           
      ...                                                                       
      if (table_list->has_period() && portion_of_time_through_update)           
      {                                                                         
        ...                                                                     
      }                                                                         
      else                                                                      
      {                                                                         
        error= table->delete_row();                                             
        ...                                                                     
      }                                                                         
      ...                                                                       
    }                                                                           
    ...                                                                         
  }

thd->is_error() returns false after record_should_be_deleted() call.

Both locks are released with the stack showed in the previous comment.

Comment by Marko Mäkelä [ 2022-11-01 ]

This bug reminds me of MDEV-21987. While searching for that bug, I found also some other bugs that I now linked to MDEV-10087, which I think shows similar API misuse (attempting to carry on using an aborted transaction).

Comment by Vladislav Lesin [ 2022-11-02 ]

I simplified the test a bit:

--source include/have_innodb.inc
--source include/count_sessions.inc
 
CREATE TABLE t1 (
  pkey int NOT NULL PRIMARY KEY,
  c int
) ENGINE=InnoDB;
 
INSERT INTO t1 VALUES(1,1);
 
CREATE TABLE t2 (
  pkey int NOT NULL PRIMARY KEY,
  c int
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
INSERT INTO t2 VALUES (2, NULL);
 
# The following table is to increase tansaction weight on deadlock resolution
CREATE TABLE t3 (c int) engine = InnoDB STATS_PERSISTENT=0;
INSERT INTO t3 VALUES (10), (20), (30), (40), (50);
 
START TRANSACTION; # trx 1
# The following update is necessary to increase the transaction weight, which is
# calculated as the number of locks + the number of undo records during deadlock
# report. Victim's transaction should have minimum weight. We need trx 2 to be
# choosen as victim, that's why we need to increase the current transaction
# weight.
UPDATE t3 SET c=c+1000;
SELECT * FROM t1 FOR UPDATE;
 
--connect(con1, localhost,root,,)
START TRANSACTION; # trx 2
# 1) read record from t2, lock it
# 2) check if the read record should be deleted, i.e. read record from t1,
# as the record from t1 is locked by trx 1, the subselect will be suspended.
# see 'while' loop in mysql_delete() for details.
--send DELETE FROM t2 WHERE c NOT IN (SELECT ref_0.pkey FROM t1 AS ref_0 INNER JOIN t1 AS ref_1 ON ref_0.c = ref_0.pkey)
 
--connection default
let $wait_condition=
  SELECT count(*) = 1 FROM information_schema.processlist
  WHERE (state = 'Sending data' or state = "Updating")
  AND info LIKE 'delete from t2 where%';
--source include/wait_condition.inc
 
# The record from t2 is locked by the previous delete, so trx 2 is waiting for
# trx 1, and trx 1 will be blocked by trx 2 with the following SELECT. So we
# have deadlock here. And trx 2 is chosen as deadlock victim as trx 1 has
# greater weight.
SELECT * FROM t2 FOR UPDATE;
COMMIT;
 
--connection con1
# If the bug is not fixed, there will be assertion failure as mysql_delete()
# will continue execution despite its subselect got deadlock error
--error ER_LOCK_DEADLOCK
--reap
COMMIT;
--disconnect con1
 
--connection default
DROP TABLE t1,t2,t3;
--source include/wait_until_count_sessions.inc

There will not be assertion failure if we change the subselect in DELETE to: SELECT pkey FROM t1 WHERE c=10.

I tried to understand the difference, i.e. why record_should_be_deleted() returns true for the subselect with join, and false for the subselect without join.

So consider each of the cases.

DELETE FROM t2 WHERE c NOT IN (SELECT pkey FROM t1 WHERE c=10);

We have the following call stack from record_should_be_deleted() to InnoDB

#3  0x000055f31193ed1a in ha_innobase::rnd_next (this=0x7f9d40252a90, buf=0x7f9d40247258 "\377") at ./storage/innobase/handler/ha_innodb.cc:9528
#4  0x000055f311513fdc in handler::ha_rnd_next (this=0x7f9d40252a90, buf=0x7f9d40247258 "\377") at ./sql/handler.cc:3416
#5  0x000055f311609891 in subselect_uniquesubquery_engine::scan_table (this=0x7f9d4c01c8c8) at ./sql/item_subselect.cc:4164
#6  0x000055f311609c8e in subselect_uniquesubquery_engine::exec (this=0x7f9d4c01c8c8) at ./sql/item_subselect.cc:4282
#7  0x000055f3115fbdc9 in Item_subselect::exec (this=0x7f9d4c0167e8) at ./sql/item_subselect.cc:811
#8  0x000055f3115fc52f in Item_in_subselect::exec (this=0x7f9d4c0167e8) at ./sql/item_subselect.cc:993
#9  0x000055f3116005ba in Item_in_subselect::val_bool (this=0x7f9d4c0167e8) at ./sql/item_subselect.cc:1941
#10 0x000055f311031de9 in Item::val_bool_result (this=0x7f9d4c0167e8) at ./sql/item.h:1772
#11 0x000055f31155e191 in Item_in_optimizer::val_int (this=0x7f9d4c017340) at ./sql/item_cmpfunc.cc:1650
#12 0x000055f3113e4ae3 in Type_handler_int_result::Item_val_bool (this=0x55f312b08320 <type_handler_bool>, item=0x7f9d4c017340) at ./sql/sql_type.cc:5094
#13 0x000055f311031c0c in Item::val_bool (this=0x7f9d4c017340) at ./sql/item.h:1676
#14 0x000055f311559096 in Item_func_not::val_int (this=0x7f9d4c016a50) at ./sql/item_cmpfunc.cc:203
#15 0x000055f3112adcbd in SQL_SELECT::skip_record (this=0x7f9d4c01cfc0, thd=0x7f9d4c0011c8) at ./sql/opt_range.h:1728
#16 0x000055f3116ffa13 in record_should_be_deleted (thd=0x7f9d4c0011c8, table=0x7f9d40071558, sel=0x7f9d4c01cfc0, explain=0x7f9d4c01d190, truncate_history=false) at ./sql/sql_delete.cc:221

SQL_SELECT::skip_record() checks thd->is_error():

  inline int skip_record(THD *thd)                                               
  {                                                                              
    int rc= MY_TEST(!cond || cond->val_int());                                   
    if (thd->is_error())                                                         
      rc= -1;                                                                    
    return rc;                                                                   
  }

and thd->is_error() returns true, that's why record_should_be_deleted() returns false. The error is set with the following call stack:

#0  0x000055f31110c815 in Diagnostics_area::set_error_status (this=0x7f9d4c007070, sql_errno=1213,
    message=0x7f9d60b16e10 "Deadlock found when trying to get lock; try restarting transaction", sqlstate=0x55f311f25e81 "40001", ucid=...,
    error_condition=0x0) at ./sql/sql_error.cc:479                              
#1  0x000055f3110e3316 in THD::raise_condition (this=0x7f9d4c0011c8, sql_errno=1213, sqlstate=0x55f311f25e81 "40001",
    level=Sql_state_errno_level::WARN_LEVEL_ERROR, ucid=..., msg=0x7f9d60b16e10 "Deadlock found when trying to get lock; try restarting transaction")
    at ./sql/sql_class.cc:1137                                                  
#2  0x000055f31101d662 in THD::raise_condition (this=0x7f9d4c0011c8, sql_errno=1213, sqlstate=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR,
    msg=0x7f9d60b16e10 "Deadlock found when trying to get lock; try restarting transaction") at ./sql/sql_class.h:4863
#3  0x000055f311010331 in my_message_sql (error=1213, str=0x7f9d60b16e10 "Deadlock found when trying to get lock; try restarting transaction", MyFlags=0)
    at ./sql/mysqld.cc:3292                                                     
#4  0x000055f311dd922f in my_printf_error (error=1213, format=0x55f3120ee1d1 "%s", MyFlags=0) at ./mysys/my_error.c:153
#5  0x000055f311518aea in handler::print_error (this=0x7f9d40252a90, error=149, errflag=0) at ./sql/handler.cc:4475
#6  0x000055f3111f1227 in report_error (table=0x7f9d402401e8, error=149) at ./sql/sql_select.cc:21671
#7  0x000055f3116098c6 in subselect_uniquesubquery_engine::scan_table (this=0x7f9d4c01c8c8)
    at ./sql/item_subselect.cc:4174                                             
#8  0x000055f311609c8e in subselect_uniquesubquery_engine::exec (this=0x7f9d4c01c8c8) at ./sql/item_subselect.cc:4282
#9  0x000055f3115fbdc9 in Item_subselect::exec (this=0x7f9d4c0167e8) at ./sql/item_subselect.cc:811
#10 0x000055f3115fc52f in Item_in_subselect::exec (this=0x7f9d4c0167e8) at ./sql/item_subselect.cc:993
#11 0x000055f3116005ba in Item_in_subselect::val_bool (this=0x7f9d4c0167e8) at ./sql/item_subselect.cc:1941
#12 0x000055f311031de9 in Item::val_bool_result (this=0x7f9d4c0167e8) at ./sql/item.h:1772
#13 0x000055f31155e191 in Item_in_optimizer::val_int (this=0x7f9d4c017340) at ./sql/item_cmpfunc.cc:1650
#14 0x000055f3113e4ae3 in Type_handler_int_result::Item_val_bool (this=0x55f312b08320 <type_handler_bool>, item=0x7f9d4c017340)
    at ./sql/sql_type.cc:5094                                                   
#15 0x000055f311031c0c in Item::val_bool (this=0x7f9d4c017340) at ./sql/item.h:1676
#16 0x000055f311559096 in Item_func_not::val_int (this=0x7f9d4c016a50) at ./sql/item_cmpfunc.cc:203
#17 0x000055f3112adcbd in SQL_SELECT::skip_record (this=0x7f9d4c01cfc0, thd=0x7f9d4c0011c8) at ./sql/opt_range.h:1728
#18 0x000055f3116ffa13 in record_should_be_deleted (thd=0x7f9d4c0011c8, table=0x7f9d40071558, sel=0x7f9d4c01cfc0, explain=0x7f9d4c01d190,
    truncate_history=false) at ./sql/sql_delete.cc:221                          

I.e. subselect_uniquesubquery_engine::scan_table() invokes report_error().

DELETE FROM t2 WHERE c NOT IN (SELECT ref_0.pkey FROM t1 as ref_0 INNER JOIN t1 AS ref_1 ON ref_0.c = ref_0.pkey);

We have the following call stack from record_should_be_deleted() to InnoDB:

#3  0x000055fc65f81d1a in ha_innobase::rnd_next (this=0x7f3eb8252a90, buf=0x7f3eb8247258 "\377")
    at ./storage/innobase/handler/ha_innodb.cc:9528                             
#4  0x000055fc65b56fdc in handler::ha_rnd_next (this=0x7f3eb8252a90, buf=0x7f3eb8247258 "\377") at ./sql/handler.cc:3416
#5  0x000055fc65d2326c in rr_sequential (info=0x7f3eb401e6c8) at ./sql/records.cc:519
#6  0x000055fc656ed65f in READ_RECORD::read_record (this=0x7f3eb401e6c8) at ./sql/records.h:81
#7  0x000055fc658356df in read_first_record_seq (tab=0x7f3eb401e600) at ./sql/sql_select.cc:22188
#8  0x000055fc65833526 in sub_select (join=0x7f3eb4017d90, join_tab=0x7f3eb401e600, end_of_records=false)
    at ./sql/sql_select.cc:21291                                                
#9  0x000055fc65832a21 in do_select (join=0x7f3eb4017d90, procedure=0x0) at ./sql/sql_select.cc:20839
#10 0x000055fc65805a40 in JOIN::exec_inner (this=0x7f3eb4017d90) at ./sql/sql_select.cc:4787
#11 0x000055fc65804ab1 in JOIN::exec (this=0x7f3eb4017d90) at ./sql/sql_select.cc:4565
#12 0x000055fc65c4c4d6 in subselect_single_select_engine::exec (this=0x7f3eb4017ab8) at ./sql/item_subselect.cc:4093
#13 0x000055fc65c3edc9 in Item_subselect::exec (this=0x7f3eb4017890) at ./sql/item_subselect.cc:811
#14 0x000055fc65c3f52f in Item_in_subselect::exec (this=0x7f3eb4017890) at ./sql/item_subselect.cc:993
#15 0x000055fc65c435ba in Item_in_subselect::val_bool (this=0x7f3eb4017890) at ./sql/item_subselect.cc:1941
#16 0x000055fc65674de9 in Item::val_bool_result (this=0x7f3eb4017890) at ./sql/item.h:1772
#17 0x000055fc65ba1191 in Item_in_optimizer::val_int (this=0x7f3eb4018430) at ./sql/item_cmpfunc.cc:1650
#18 0x000055fc65a27ae3 in Type_handler_int_result::Item_val_bool (this=0x55fc6714b320 <type_handler_bool>, item=0x7f3eb4018430)
    at ./sql/sql_type.cc:5094                                                   
#19 0x000055fc65674c0c in Item::val_bool (this=0x7f3eb4018430) at ./sql/item.h:1676
#20 0x000055fc65b9c096 in Item_func_not::val_int (this=0x7f3eb4017af8) at ./sql/item_cmpfunc.cc:203
#21 0x000055fc658f0cbd in SQL_SELECT::skip_record (this=0x7f3eb4020ae8, thd=0x7f3eb40011c8) at ./sql/opt_range.h:1728
#22 0x000055fc65d42a13 in record_should_be_deleted (thd=0x7f3eb40011c8, table=0x7f3eb8071558, sel=0x7f3eb4020ae8, explain=0x7f3eb4020cb8,
    truncate_history=false) at ./sql/sql_delete.cc:221                          

thd->is_error() returns false in SQL_SELECT::skip_record(), that's why record_should_be_deleted() returns true despite InnoDB passed DB_DEADLOCK error for the subselect.

I suppose thd->m_stmt_da->m_status should be set in:

#0  rr_handle_error (info=0x7f3eb401e6c8, error=149) at ./sql/records.cc:389
#1  0x000055fc65d2328b in rr_sequential (info=0x7f3eb401e6c8) at ./sql/records.cc:521

with the following code:

static int rr_handle_error(READ_RECORD *info, int error)                        
{                                                                               
  ...                                                                           
  if (error == HA_ERR_END_OF_FILE)                                              
    error= -1;                                                                  
  else                                                                          
  {                                                                             
    if (info->print_error)                                                      
      info->table->file->print_error(error, MYF(0));                            
    ...                                                                         
  }                                                                             
  ...                                                                           
} 

But info->print_error is not set, that's why thd->is_error() returns false in SQL_SELECT::skip_record().

I think the following research should be focused on info->print_error, i.e. for what cases it should be set, and why it isn't set for our case.

Comment by Vladislav Lesin [ 2022-11-02 ]

The following fix:

--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -13683,6 +13683,7 @@ make_join_readinfo(JOIN *join, ulonglong options, uint no_jbuf_after)
     uint jcl= tab->used_join_cache_level;
     tab->read_record.table= table;
     tab->read_record.unlock_row= rr_unlock_row;
+    tab->read_record.print_error= true;
     tab->sorted= sorted;
     sorted= 0;                                  // only first must be sorted
     

makes the above test passed, as well as local mtr testing.

sanja, could you please ask somebody from Runtime team to check the correctness of the fix?

Comment by Vladislav Lesin [ 2022-11-02 ]

I have not checked yet, but MDEV-10087 looks very similar to this issue, and the above fix might also fix MDEV-10087.

Comment by Vladislav Lesin [ 2022-11-03 ]

I have just checked MDEV-10087, the above test crashes if we change DELETE to UPDATE in it. So the reason of the bug is the same in MDEV-10087. And the above fix also works for MDEV-10087.

Comment by Vladislav Lesin [ 2022-11-03 ]

It affects 10.3+. The fix is the same.

Comment by Sergei Petrunia [ 2022-11-09 ]

Looked at this from the SQL layer point of view and I approve the fix.

Comment by Vladislav Lesin [ 2022-11-09 ]

This issue is closed as a duplicate of MDEV-10087.

Generated at Thu Feb 08 10:08:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.