Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21987

Assertion failure in file lock0lock.cc line 6901 / DeadlockChecker::check_and_resolve

Details

    Description

      Notes:
      The test case currently fails every time for me, but there is still some timing involved, so try to run several times and/or modify timeouts if it doesn't fail right away.
      The problem is likely to be related to MDEV-20666 / MDEV-14846. On a debug build it causes the same assertion failure as MDEV-20666. However, unlike test cases in other reports, this one also affects non-debug builds (causing a non-debug assertion failure), which means it potentially more severe.

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (a DATE DEFAULT '2000-01-01', b DATE DEFAULT '2020-01-01', PERIOD FOR app(a,b)) ENGINE=InnoDB;
      INSERT INTO t1 () VALUES (),();
      START TRANSACTION;
      INSERT INTO t1 () VALUES (),();
       
      --connect (con1,localhost,root,,test)
      SET innodb_lock_wait_timeout= 2, lock_wait_timeout= 2;
      --send
        DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187';
       
      --connection default
      --sleep 1
      UPDATE t1 SET a = '2010-01-01';
       
      --connection con1
      --error ER_LOCK_WAIT_TIMEOUT
      --reap
       
      # Cleanup
      --disconnect con1
      --connection default
      DROP TABLE t1;
      

      10.4 RelWithDebInfo 6a634576

      2020-03-19 21:04:07 0x7fbf38105700  InnoDB: Assertion failure in file /data/src/10.4/storage/innobase/lock/lock0lock.cc line 6901
      InnoDB: We intentionally generate a memory trap.
       
      #6  0x00005632b153ba8d in ut_dbg_assertion_failed (expr=expr@entry=0x0, file=file@entry=0x5632b1ee1aa0 "/data/src/10.4/storage/innobase/lock/lock0lock.cc", line=line@entry=6901) at /data/src/10.4/storage/innobase/ut/ut0dbg.cc:60
      #7  0x00005632b19d7a46 in DeadlockChecker::check_and_resolve (lock=lock@entry=0x7fbf33400228, trx=trx@entry=0x7fbf33400158) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:6901
      #8  0x00005632b19d7b7c in lock_rec_enqueue_waiting (c_lock=0x7fbf333ff2e0, type_mode=type_mode@entry=1027, block=block@entry=0x7fbf32c19828, heap_no=heap_no@entry=2, index=index@entry=0x7fbee407fc88, thr=thr@entry=0x7fbee40c3420, prdt=0x0) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:1727
      #9  0x00005632b19d854e in lock_rec_lock (impl=impl@entry=false, mode=mode@entry=1027, block=block@entry=0x7fbf32c19828, heap_no=2, index=index@entry=0x7fbee407fc88, thr=thr@entry=0x7fbee40c3420) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:1972
      #10 0x00005632b19d8c9d in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7fbf32c19828, rec=rec@entry=0x7fbf3311007d <error: Cannot access memory at address 0x7fbf3311007d>, index=index@entry=0x7fbee407fc88, offsets=offsets@entry=0x7fbf38100020, mode=mode@entry=LOCK_X, gap_mode=1024, thr=0x7fbee40c3420) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:5832
      #11 0x00005632b1a613fe in sel_set_rec_lock (pcur=pcur@entry=0x7fbee40c2c00, rec=rec@entry=0x7fbf3311007d <error: Cannot access memory at address 0x7fbf3311007d>, index=index@entry=0x7fbee407fc88, offsets=0x7fbf38100020, mode=3, type=type@entry=1024, thr=0x7fbee40c3420, mtr=0x7fbf38100c60) at /data/src/10.4/storage/innobase/row/row0sel.cc:1264
      #12 0x00005632b1a6663c in row_search_mvcc (buf=buf@entry=0x7fbee4081728 "\377!\240\017!\310\017", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7fbee40c2a38, match_mode=match_mode@entry=1, direction=direction@entry=0) at /data/src/10.4/storage/innobase/row/row0sel.cc:5051
      #13 0x00005632b1997abb in ha_innobase::index_read (this=0x7fbee40839e0, buf=0x7fbee4081728 "\377!\240\017!\310\017", key_ptr=0x7fbee800fcb8 "", key_len=<optimized out>, find_flag=<optimized out>) at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:9296
      #14 0x00005632b1813603 in handler::ha_rnd_pos (this=0x7fbee40839e0, buf=0x7fbee4081728 "\377!\240\017!\310\017", pos=0x7fbee800fcb8 "") at /data/src/10.4/sql/handler.cc:2861
      #15 0x00005632b192478a in rr_from_pointers (info=0x7fbf38101370) at /data/src/10.4/sql/records.cc:543
      #16 0x00005632b19395e6 in READ_RECORD::read_record (this=0x7fbf38101370) at /data/src/10.4/sql/records.h:69
      #17 mysql_delete (thd=thd@entry=0x7fbee80009a8, table_list=0x7fbee800fe98, conds=<optimized out>, order_list=order_list@entry=0x7fbee80051a0, limit=18446744073709551615, options=<optimized out>, result=0x0) at /data/src/10.4/sql/sql_delete.cc:772
      #18 0x00005632b161759b in mysql_execute_command (thd=thd@entry=0x7fbee80009a8) at /data/src/10.4/sql/sql_parse.cc:4727
      #19 0x00005632b161c0c9 in mysql_parse (thd=thd@entry=0x7fbee80009a8, rawbuf=<optimized out>, length=93, parser_state=parser_state@entry=0x7fbf381041a0, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:7900
      #20 0x00005632b161e4c8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fbee80009a8, packet=packet@entry=0x7fbee8007a19 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", packet_length=packet_length@entry=93, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:1842
      #21 0x00005632b161fca9 in do_command (thd=0x7fbee80009a8) at /data/src/10.4/sql/sql_parse.cc:1360
      #22 0x00005632b16ef944 in do_handle_one_connection (connect=connect@entry=0x5632b5274548) at /data/src/10.4/sql/sql_connect.cc:1412
      #23 0x00005632b16ef9f4 in handle_one_connection (arg=arg@entry=0x5632b5274548) at /data/src/10.4/sql/sql_connect.cc:1316
      #24 0x00005632b1c9e154 in pfs_spawn_thread (arg=0x5632b52748e8) at /data/src/10.4/storage/perfschema/pfs.cc:1869
      #25 0x00007fbf3f92f4a4 in start_thread (arg=0x7fbf38105700) at pthread_create.c:456
      #26 0x00007fbf3da63d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
      

      Reproducible on 10.4, 10.5. The test case is not applicable to previous versions.

      The debug failure for the record:

      10.4 Debug 6a634576

      mysqld: /data/src/10.4/storage/innobase/row/row0sel.cc:4485: dberr_t row_search_mvcc(byte*, page_cur_mode_t, row_prebuilt_t*, ulint, ulint): Assertion `prebuilt->sql_stat_start || trx->state == TRX_STATE_ACTIVE || (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)' failed.
      200319 21:09:23 [ERROR] mysqld got signal 6 ;
       
      #6  0x00007f4dd79c7e67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55cbad89f388 "prebuilt->sql_stat_start || trx->state == TRX_STATE_ACTIVE || (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)", file=file@entry=0x55cbad89e0a0 "/data/src/10.4/storage/innobase/row/row0sel.cc", line=line@entry=4485, function=function@entry=0x55cbad8a2500 <row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)::__PRETTY_FUNCTION__> "dberr_t row_search_mvcc(byte*, page_cur_mode_t, row_prebuilt_t*, ulint, ulint)") at assert.c:92
      #7  0x00007f4dd79c7f12 in __GI___assert_fail (assertion=0x55cbad89f388 "prebuilt->sql_stat_start || trx->state == TRX_STATE_ACTIVE || (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)", file=0x55cbad89e0a0 "/data/src/10.4/storage/innobase/row/row0sel.cc", line=4485, function=0x55cbad8a2500 <row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)::__PRETTY_FUNCTION__> "dberr_t row_search_mvcc(byte*, page_cur_mode_t, row_prebuilt_t*, ulint, ulint)") at assert.c:101
      #8  0x000055cbad0964c6 in row_search_mvcc (buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", mode=PAGE_CUR_GE, prebuilt=0x7f4d84188220, match_mode=1, direction=0) at /data/src/10.4/storage/innobase/row/row0sel.cc:4485
      #9  0x000055cbacea87ba in ha_innobase::index_read (this=0x7f4d8400aa18, buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", key_ptr=0x7f4d78018f00 "", key_len=6, find_flag=HA_READ_KEY_EXACT) at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:9296
      #10 0x000055cbacea9add in ha_innobase::rnd_pos (this=0x7f4d8400aa18, buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", pos=0x7f4d78018f00 "") at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:9797
      #11 0x000055cbacc5ccff in handler::ha_rnd_pos (this=0x7f4d8400aa18, buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", pos=0x7f4d78018f00 "") at /data/src/10.4/sql/handler.cc:2861
      #12 0x000055cbace0252d in rr_from_pointers (info=0x7f4dd0d23e30) at /data/src/10.4/sql/records.cc:543
      #13 0x000055cbac845559 in READ_RECORD::read_record (this=0x7f4dd0d23e30) at /data/src/10.4/sql/records.h:69
      #14 0x000055cbace2068d in mysql_delete (thd=0x7f4d78000af0, table_list=0x7f4d78011f40, conds=0x7f4d780133c0, order_list=0x7f4d780054a8, limit=18446744073709551615, options=0, result=0x0) at /data/src/10.4/sql/sql_delete.cc:772
      #15 0x000055cbac8f9a0a in mysql_execute_command (thd=0x7f4d78000af0) at /data/src/10.4/sql/sql_parse.cc:4727
      #16 0x000055cbac905247 in mysql_parse (thd=0x7f4d78000af0, rawbuf=0x7f4d78011dd8 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", length=93, parser_state=0x7f4dd0d25160, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7900
      #17 0x000055cbac8f042a in dispatch_command (command=COM_QUERY, thd=0x7f4d78000af0, packet=0x7f4d780083a1 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", packet_length=93, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1842
      #18 0x000055cbac8eeab7 in do_command (thd=0x7f4d78000af0) at /data/src/10.4/sql/sql_parse.cc:1360
      #19 0x000055cbaca77f3d in do_handle_one_connection (connect=0x55cbb01a5e30) at /data/src/10.4/sql/sql_connect.cc:1412
      #20 0x000055cbaca77c8c in handle_one_connection (arg=0x55cbb01a5e30) at /data/src/10.4/sql/sql_connect.cc:1316
      #21 0x000055cbad47f129 in pfs_spawn_thread (arg=0x55cbb01107f0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
      #22 0x00007f4dd99504a4 in start_thread (arg=0x7f4dd0d26700) at pthread_create.c:456
      #23 0x00007f4dd7a84d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
      

      Attachments

        Issue Links

          Activity

            asun Andrew Sun added a comment - - edited

            I encountered this assertion failure on a busy database running 10.4.12 (in production-ish):

            mariadb-ssd_1           | 2020-05-17 19:01:16 0x7faab5457700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.12/storage/innobase/lock/lock0lock.cc line 6901
            mariadb-ssd_1           | InnoDB: We intentionally generate a memory trap.
            mariadb-ssd_1           | InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            mariadb-ssd_1           | InnoDB: If you get repeated assertion failures or crashes, even
            mariadb-ssd_1           | InnoDB: immediately after the mysqld startup, there may be
            mariadb-ssd_1           | InnoDB: corruption in the InnoDB tablespace. Please refer to
            mariadb-ssd_1           | InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
            mariadb-ssd_1           | InnoDB: about forcing recovery.
            mariadb-ssd_1           | 200517 19:01:16 [ERROR] mysqld got signal 6 ;
            mariadb-ssd_1           | This could be because you hit a bug. It is also possible that this binary
            mariadb-ssd_1           | or one of the libraries it was linked against is corrupt, improperly built,
            mariadb-ssd_1           | or misconfigured. This error can also be caused by malfunctioning hardware.
            mariadb-ssd_1           |
            mariadb-ssd_1           | To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            mariadb-ssd_1           |
            mariadb-ssd_1           | We will try our best to scrape up some info that will hopefully help
            mariadb-ssd_1           | diagnose the problem, but since we have already crashed,
            mariadb-ssd_1           | something is definitely wrong and this may fail.
            mariadb-ssd_1           |
            mariadb-ssd_1           | Server version: 10.4.12-MariaDB-1:10.4.12+maria~bionic
            mariadb-ssd_1           | key_buffer_size=134217728
            mariadb-ssd_1           | read_buffer_size=2097152
            mariadb-ssd_1           | max_used_connections=67
            mariadb-ssd_1           | max_threads=2050
            mariadb-ssd_1           | thread_count=73
            mariadb-ssd_1           | It is possible that mysqld could use up to
            mariadb-ssd_1           | key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12776904 K  bytes of memory
            mariadb-ssd_1           | Hope that's ok; if not, decrease some variables in the equation.
            mariadb-ssd_1           |
            mariadb-ssd_1           | Thread pointer: 0x7fa9ec412008
            mariadb-ssd_1           | Attempting backtrace. You can use the following information to find out
            mariadb-ssd_1           | where mysqld died. If you see no messages after this, something went
            mariadb-ssd_1           | terribly wrong...
            mariadb-ssd_1           | stack_bottom = 0x7faab5456d98 thread_stack 0x49000
            mariadb-ssd_1           | *** buffer overflow detected ***: mysqld terminated
            mariadb-ssd_1           | Fatal signal 11 while backtracing
            

            It is unclear whether this happened each time the database server has previously crashed, nor is it clear exactly what caused this crash (though the application running that uses this database uses a lot of SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE queries).

            Edit: This crash seems to happen every hour or so. It seems to coincide with a significant increase in database traffic involving a job queue table backed by InnoDB, where the job queue started being processed.

            asun Andrew Sun added a comment - - edited I encountered this assertion failure on a busy database running 10.4.12 (in production-ish): mariadb-ssd_1 | 2020-05-17 19:01:16 0x7faab5457700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.12/storage/innobase/lock/lock0lock.cc line 6901 mariadb-ssd_1 | InnoDB: We intentionally generate a memory trap. mariadb-ssd_1 | InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ mariadb-ssd_1 | InnoDB: If you get repeated assertion failures or crashes, even mariadb-ssd_1 | InnoDB: immediately after the mysqld startup, there may be mariadb-ssd_1 | InnoDB: corruption in the InnoDB tablespace. Please refer to mariadb-ssd_1 | InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/ mariadb-ssd_1 | InnoDB: about forcing recovery. mariadb-ssd_1 | 200517 19:01:16 [ERROR] mysqld got signal 6 ; mariadb-ssd_1 | This could be because you hit a bug. It is also possible that this binary mariadb-ssd_1 | or one of the libraries it was linked against is corrupt, improperly built, mariadb-ssd_1 | or misconfigured. This error can also be caused by malfunctioning hardware. mariadb-ssd_1 | mariadb-ssd_1 | To report this bug, see https://mariadb.com/kb/en/reporting-bugs mariadb-ssd_1 | mariadb-ssd_1 | We will try our best to scrape up some info that will hopefully help mariadb-ssd_1 | diagnose the problem, but since we have already crashed, mariadb-ssd_1 | something is definitely wrong and this may fail. mariadb-ssd_1 | mariadb-ssd_1 | Server version: 10.4.12-MariaDB-1:10.4.12+maria~bionic mariadb-ssd_1 | key_buffer_size=134217728 mariadb-ssd_1 | read_buffer_size=2097152 mariadb-ssd_1 | max_used_connections=67 mariadb-ssd_1 | max_threads=2050 mariadb-ssd_1 | thread_count=73 mariadb-ssd_1 | It is possible that mysqld could use up to mariadb-ssd_1 | key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12776904 K bytes of memory mariadb-ssd_1 | Hope that's ok; if not, decrease some variables in the equation. mariadb-ssd_1 | mariadb-ssd_1 | Thread pointer: 0x7fa9ec412008 mariadb-ssd_1 | Attempting backtrace. You can use the following information to find out mariadb-ssd_1 | where mysqld died. If you see no messages after this, something went mariadb-ssd_1 | terribly wrong... mariadb-ssd_1 | stack_bottom = 0x7faab5456d98 thread_stack 0x49000 mariadb-ssd_1 | *** buffer overflow detected ***: mysqld terminated mariadb-ssd_1 | Fatal signal 11 while backtracing It is unclear whether this happened each time the database server has previously crashed, nor is it clear exactly what caused this crash (though the application running that uses this database uses a lot of SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE queries). Edit: This crash seems to happen every hour or so. It seems to coincide with a significant increase in database traffic involving a job queue table backed by InnoDB, where the job queue started being processed.
            asun Andrew Sun added a comment -

            MariaDB seems to have finally gotten past segfaulting after the assertion failure, and I found this in the logs that may be relevant to the investigation of this bug: https://gist.github.com/as-com/6ecd0f0951efd187862e07b042573985

            We should probably escalate the priority of this issue since it's actively crashing every 30 seconds to 2 hours on a production server.

            asun Andrew Sun added a comment - MariaDB seems to have finally gotten past segfaulting after the assertion failure, and I found this in the logs that may be relevant to the investigation of this bug: https://gist.github.com/as-com/6ecd0f0951efd187862e07b042573985 We should probably escalate the priority of this issue since it's actively crashing every 30 seconds to 2 hours on a production server.

            Reproduced on mariadb-10.4.3, not on mariadb-10.4.2. However, this does not provide much information because the application-time period is introduced by 10.4.3.

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - - edited Reproduced on mariadb-10.4.3, not on mariadb-10.4.2. However, this does not provide much information because the application-time period is introduced by 10.4.3.

            I checked the test case with rr replay. It is pretty simple to see with a watchpoint on trx_t::state:

            10.5

            #5  0x000055894ca22572 in trx_t::rollback (this=0x7f72e5fb6360, savept=0x0)
                at /mariadb/10.5m/storage/innobase/trx/trx0roll.cc:178
            #6  0x000055894c973084 in row_mysql_handle_errors (new_err=new_err@entry=0x7f72d50adbcc, 
                trx=trx@entry=0x7f72e5fb6360, thr=thr@entry=0x7f72c400b838, savept=savept@entry=0x0)
                at /mariadb/10.5m/storage/innobase/row/row0mysql.cc:733
            #7  0x000055894c9bc85b in row_search_mvcc (buf=<optimized out>, 
                buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f72c400ae88, match_mode=<optimized out>, 
                match_mode@entry=0, direction=<optimized out>, direction@entry=1)
                at /mariadb/10.5m/storage/innobase/row/row0sel.cc:5691
            #8  0x000055894c7fbffe in ha_innobase::general_fetch (this=0x7f72c41f8400, 
                buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", 
                direction=1, match_mode=0) at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9116
            #9  0x000055894c4c57d5 in handler::ha_rnd_next (this=0x7f72c41f8400, 
                buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177")
                at /mariadb/10.5m/sql/handler.cc:3066
            #10 0x000055894c639f26 in rr_sequential (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:519
            #11 0x000055894c652506 in READ_RECORD::read_record (this=0x7f72d50ae898)
                at /mariadb/10.5m/sql/records.h:80
            #12 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>, 
                order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=<optimized out>, options=0, 
                result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:721
            #13 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8)
                at /mariadb/10.5m/sql/sql_parse.cc:4870
            #14 0x000055894c25d3b0 in mysql_parse (thd=thd@entry=0x7f72c0001af8, 
                rawbuf=0x7f72c0013b80 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", length=<optimized out>, parser_state=<optimized out>, 
                parser_state@entry=0x7f72d50af620, is_com_multi=false, is_next_command=<optimized out>)
                at /mariadb/10.5m/sql/sql_parse.cc:8099
            

            A DB_DEADLOCK error results in the entire transaction being rolled back. But, the SQL layer ignored this error and attempted to carry on as if nothing had happened:

            #4  0x000055894c9b801f in row_search_mvcc (buf=<optimized out>, 
                buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7f72c400ae88, match_mode=<optimized out>, 
                direction=<optimized out>, direction@entry=0) at /mariadb/10.5m/storage/innobase/row/row0sel.cc:4546
            #5  0x000055894c7fbbfb in ha_innobase::index_read (this=this@entry=0x7f72c41f8400, 
                buf=buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", key_ptr=key_ptr@entry=0x7f72c0008de8 "", key_len=<optimized out>, 
                find_flag=find_flag@entry=HA_READ_KEY_EXACT)
                at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:8858
            #6  0x000055894c7fc3d6 in ha_innobase::rnd_pos (this=0x7f72c41f8400, 
                buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", 
                pos=0x7f72c0008de8 "") at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9347
            #7  0x000055894c4c5bd1 in handler::ha_rnd_pos (this=0x7f72c41f8400, buf=<optimized out>, 
                pos=0x7f72c0008de8 "") at /mariadb/10.5m/sql/handler.cc:3098
            #8  0x000055894c639ea4 in rr_from_pointers (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:615
            #9  0x000055894c6528b7 in READ_RECORD::read_record (this=0x7f72d50ae898)
                at /mariadb/10.5m/sql/records.h:80
            #10 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>, 
                order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=18446744073709551615, options=0, 
                result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:772
            #11 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8)
                at /mariadb/10.5m/sql/sql_parse.cc:4870
            

            Here, the debug assertion would fail:

            	ut_ad(prebuilt->sql_stat_start
            	      || trx->state == TRX_STATE_ACTIVE
            	      || (prebuilt->table->no_rollback()
            		  && trx->state == TRX_STATE_NOT_STARTED));
            

            The problem should be in mysql_delete(). It is easier to provide a patch than to explain what is going wrong:

            diff --git a/sql/sql_delete.cc b/sql/sql_delete.cc
            index cf411d01f38..358887afd2b 100644
            --- a/sql/sql_delete.cc
            +++ b/sql/sql_delete.cc
            @@ -718,9 +718,18 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,
                                                          MEM_STRIP_BUF_SIZE);
             
                 THD_STAGE_INFO(thd, stage_searching_rows_for_update);
            -    while (!(error=info.read_record()) && !thd->killed &&
            -          ! thd->is_error())
            +    for (;;)
                 {
            +      error= info.read_record();
            +      if (error)
            +      {
            +        error= 1;
            +        goto terminate_delete;
            +      }
            +
            +      if (thd->killed || thd->is_error())
            +        break;
            +
                   if (record_should_be_deleted(thd, table, select, explain, delete_history))
                   {
                     table->file->position(table->record[0]);
            

            With this fix, the test no longer crashes. But the transaction will have been terminated due to deadlock, not lock wait timeout:

            mysqltest: At line 19: query 'reap' failed with wrong errno 1213: 'Deadlock found when trying to get lock; try restarting transaction', instead of 1205...
            

            marko Marko Mäkelä added a comment - I checked the test case with rr replay . It is pretty simple to see with a watchpoint on trx_t::state : 10.5 #5 0x000055894ca22572 in trx_t::rollback (this=0x7f72e5fb6360, savept=0x0) at /mariadb/10.5m/storage/innobase/trx/trx0roll.cc:178 #6 0x000055894c973084 in row_mysql_handle_errors (new_err=new_err@entry=0x7f72d50adbcc, trx=trx@entry=0x7f72e5fb6360, thr=thr@entry=0x7f72c400b838, savept=savept@entry=0x0) at /mariadb/10.5m/storage/innobase/row/row0mysql.cc:733 #7 0x000055894c9bc85b in row_search_mvcc (buf=<optimized out>, buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f72c400ae88, match_mode=<optimized out>, match_mode@entry=0, direction=<optimized out>, direction@entry=1) at /mariadb/10.5m/storage/innobase/row/row0sel.cc:5691 #8 0x000055894c7fbffe in ha_innobase::general_fetch (this=0x7f72c41f8400, buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", direction=1, match_mode=0) at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9116 #9 0x000055894c4c57d5 in handler::ha_rnd_next (this=0x7f72c41f8400, buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177") at /mariadb/10.5m/sql/handler.cc:3066 #10 0x000055894c639f26 in rr_sequential (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:519 #11 0x000055894c652506 in READ_RECORD::read_record (this=0x7f72d50ae898) at /mariadb/10.5m/sql/records.h:80 #12 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>, order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=<optimized out>, options=0, result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:721 #13 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8) at /mariadb/10.5m/sql/sql_parse.cc:4870 #14 0x000055894c25d3b0 in mysql_parse (thd=thd@entry=0x7f72c0001af8, rawbuf=0x7f72c0013b80 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", length=<optimized out>, parser_state=<optimized out>, parser_state@entry=0x7f72d50af620, is_com_multi=false, is_next_command=<optimized out>) at /mariadb/10.5m/sql/sql_parse.cc:8099 A DB_DEADLOCK error results in the entire transaction being rolled back. But, the SQL layer ignored this error and attempted to carry on as if nothing had happened: #4 0x000055894c9b801f in row_search_mvcc (buf=<optimized out>, buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7f72c400ae88, match_mode=<optimized out>, direction=<optimized out>, direction@entry=0) at /mariadb/10.5m/storage/innobase/row/row0sel.cc:4546 #5 0x000055894c7fbbfb in ha_innobase::index_read (this=this@entry=0x7f72c41f8400, buf=buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", key_ptr=key_ptr@entry=0x7f72c0008de8 "", key_len=<optimized out>, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:8858 #6 0x000055894c7fc3d6 in ha_innobase::rnd_pos (this=0x7f72c41f8400, buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", pos=0x7f72c0008de8 "") at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9347 #7 0x000055894c4c5bd1 in handler::ha_rnd_pos (this=0x7f72c41f8400, buf=<optimized out>, pos=0x7f72c0008de8 "") at /mariadb/10.5m/sql/handler.cc:3098 #8 0x000055894c639ea4 in rr_from_pointers (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:615 #9 0x000055894c6528b7 in READ_RECORD::read_record (this=0x7f72d50ae898) at /mariadb/10.5m/sql/records.h:80 #10 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>, order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=18446744073709551615, options=0, result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:772 #11 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8) at /mariadb/10.5m/sql/sql_parse.cc:4870 Here, the debug assertion would fail: ut_ad(prebuilt->sql_stat_start || trx->state == TRX_STATE_ACTIVE || (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)); The problem should be in mysql_delete() . It is easier to provide a patch than to explain what is going wrong: diff --git a/sql/sql_delete.cc b/sql/sql_delete.cc index cf411d01f38..358887afd2b 100644 --- a/sql/sql_delete.cc +++ b/sql/sql_delete.cc @@ -718,9 +718,18 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds, MEM_STRIP_BUF_SIZE); THD_STAGE_INFO(thd, stage_searching_rows_for_update); - while (!(error=info.read_record()) && !thd->killed && - ! thd->is_error()) + for (;;) { + error= info.read_record(); + if (error) + { + error= 1; + goto terminate_delete; + } + + if (thd->killed || thd->is_error()) + break; + if (record_should_be_deleted(thd, table, select, explain, delete_history)) { table->file->position(table->record[0]); With this fix, the test no longer crashes. But the transaction will have been terminated due to deadlock, not lock wait timeout: mysqltest: At line 19: query 'reap' failed with wrong errno 1213: 'Deadlock found when trying to get lock; try restarting transaction', instead of 1205...

            My crude fix would cause several regression tests to fail, including main.subselect. It hopefully gives an idea how this should be fixed. The API constraint always was that after a terminal error (such as ER_LOCK_DEADLOCK), no further row operations must be executed in the same thread before the transaction has been restarted. For DELETE, this could have been broken already in 10.3 with system-versioned tables (if not earlier). For versioned tables, DELETE would execute two operations for each affected row.

            marko Marko Mäkelä added a comment - My crude fix would cause several regression tests to fail, including main.subselect . It hopefully gives an idea how this should be fixed. The API constraint always was that after a terminal error (such as ER_LOCK_DEADLOCK ), no further row operations must be executed in the same thread before the transaction has been restarted. For DELETE , this could have been broken already in 10.3 with system-versioned tables (if not earlier). For versioned tables, DELETE would execute two operations for each affected row.

            At least the debug failure is still reproducible on all existing branches (10.4-11.4). I'm not sure about release builds, those may be already fixed, or at least the probability of the failure decreased.

            elenst Elena Stepanova added a comment - At least the debug failure is still reproducible on all existing branches (10.4-11.4). I'm not sure about release builds, those may be already fixed, or at least the probability of the failure decreased.

            People

              sanja Oleksandr Byelkin
              elenst Elena Stepanova
              Votes:
              5 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.