[MDEV-21987] Assertion failure in file lock0lock.cc line 6901 / DeadlockChecker::check_and_resolve - Jira

Details

Type: Bug
Status: Confirmed (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4
Fix Version/s: 10.5, 10.6, 10.11
Component/s: Locking, Storage Engine - InnoDB, Versioned Tables
Labels:
None

Description

Notes:
The test case currently fails every time for me, but there is still some timing involved, so try to run several times and/or modify timeouts if it doesn't fail right away.
The problem is likely to be related to ~~MDEV-20666~~ / ~~MDEV-14846~~. On a debug build it causes the same assertion failure as ~~MDEV-20666~~. However, unlike test cases in other reports, this one also affects non-debug builds (causing a non-debug assertion failure), which means it potentially more severe.

--source include/have_innodb.inc

CREATE TABLE t1 (a DATE DEFAULT '2000-01-01', b DATE DEFAULT '2020-01-01', PERIOD FOR app(a,b)) ENGINE=InnoDB;

INSERT INTO t1 () VALUES (),();

START TRANSACTION;

INSERT INTO t1 () VALUES (),();

--connect (con1,localhost,root,,test)

SET innodb_lock_wait_timeout= 2, lock_wait_timeout= 2;

--send

  DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187';

--connection default

--sleep 1

UPDATE t1 SET a = '2010-01-01';

--connection con1

--error ER_LOCK_WAIT_TIMEOUT

--reap

# Cleanup

--disconnect con1

--connection default

DROP TABLE t1;

10.4 RelWithDebInfo 6a634576
2020-03-19 21:04:07 0x7fbf38105700 InnoDB: Assertion failure in file /data/src/10.4/storage/innobase/lock/lock0lock.cc line 6901
InnoDB: We intentionally generate a memory trap.

#6 0x00005632b153ba8d in ut_dbg_assertion_failed (expr=expr@entry=0x0, file=file@entry=0x5632b1ee1aa0 "/data/src/10.4/storage/innobase/lock/lock0lock.cc", line=line@entry=6901) at /data/src/10.4/storage/innobase/ut/ut0dbg.cc:60
#7 0x00005632b19d7a46 in DeadlockChecker::check_and_resolve (lock=lock@entry=0x7fbf33400228, trx=trx@entry=0x7fbf33400158) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:6901
#8 0x00005632b19d7b7c in lock_rec_enqueue_waiting (c_lock=0x7fbf333ff2e0, type_mode=type_mode@entry=1027, block=block@entry=0x7fbf32c19828, heap_no=heap_no@entry=2, index=index@entry=0x7fbee407fc88, thr=thr@entry=0x7fbee40c3420, prdt=0x0) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:1727
#9 0x00005632b19d854e in lock_rec_lock (impl=impl@entry=false, mode=mode@entry=1027, block=block@entry=0x7fbf32c19828, heap_no=2, index=index@entry=0x7fbee407fc88, thr=thr@entry=0x7fbee40c3420) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:1972
#10 0x00005632b19d8c9d in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7fbf32c19828, rec=rec@entry=0x7fbf3311007d <error: Cannot access memory at address 0x7fbf3311007d>, index=index@entry=0x7fbee407fc88, offsets=offsets@entry=0x7fbf38100020, mode=mode@entry=LOCK_X, gap_mode=1024, thr=0x7fbee40c3420) at /data/src/10.4/storage/innobase/lock/lock0lock.cc:5832
#11 0x00005632b1a613fe in sel_set_rec_lock (pcur=pcur@entry=0x7fbee40c2c00, rec=rec@entry=0x7fbf3311007d <error: Cannot access memory at address 0x7fbf3311007d>, index=index@entry=0x7fbee407fc88, offsets=0x7fbf38100020, mode=3, type=type@entry=1024, thr=0x7fbee40c3420, mtr=0x7fbf38100c60) at /data/src/10.4/storage/innobase/row/row0sel.cc:1264
#12 0x00005632b1a6663c in row_search_mvcc (buf=buf@entry=0x7fbee4081728 "\377!\240\017!\310\017", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7fbee40c2a38, match_mode=match_mode@entry=1, direction=direction@entry=0) at /data/src/10.4/storage/innobase/row/row0sel.cc:5051
#13 0x00005632b1997abb in ha_innobase::index_read (this=0x7fbee40839e0, buf=0x7fbee4081728 "\377!\240\017!\310\017", key_ptr=0x7fbee800fcb8 "", key_len=<optimized out>, find_flag=<optimized out>) at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:9296
#14 0x00005632b1813603 in handler::ha_rnd_pos (this=0x7fbee40839e0, buf=0x7fbee4081728 "\377!\240\017!\310\017", pos=0x7fbee800fcb8 "") at /data/src/10.4/sql/handler.cc:2861
#15 0x00005632b192478a in rr_from_pointers (info=0x7fbf38101370) at /data/src/10.4/sql/records.cc:543
#16 0x00005632b19395e6 in READ_RECORD::read_record (this=0x7fbf38101370) at /data/src/10.4/sql/records.h:69
#17 mysql_delete (thd=thd@entry=0x7fbee80009a8, table_list=0x7fbee800fe98, conds=<optimized out>, order_list=order_list@entry=0x7fbee80051a0, limit=18446744073709551615, options=<optimized out>, result=0x0) at /data/src/10.4/sql/sql_delete.cc:772
#18 0x00005632b161759b in mysql_execute_command (thd=thd@entry=0x7fbee80009a8) at /data/src/10.4/sql/sql_parse.cc:4727
#19 0x00005632b161c0c9 in mysql_parse (thd=thd@entry=0x7fbee80009a8, rawbuf=<optimized out>, length=93, parser_state=parser_state@entry=0x7fbf381041a0, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:7900
#20 0x00005632b161e4c8 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fbee80009a8, packet=packet@entry=0x7fbee8007a19 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", packet_length=packet_length@entry=93, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:1842
#21 0x00005632b161fca9 in do_command (thd=0x7fbee80009a8) at /data/src/10.4/sql/sql_parse.cc:1360
#22 0x00005632b16ef944 in do_handle_one_connection (connect=connect@entry=0x5632b5274548) at /data/src/10.4/sql/sql_connect.cc:1412
#23 0x00005632b16ef9f4 in handle_one_connection (arg=arg@entry=0x5632b5274548) at /data/src/10.4/sql/sql_connect.cc:1316
#24 0x00005632b1c9e154 in pfs_spawn_thread (arg=0x5632b52748e8) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#25 0x00007fbf3f92f4a4 in start_thread (arg=0x7fbf38105700) at pthread_create.c:456
#26 0x00007fbf3da63d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Reproducible on 10.4, 10.5. The test case is not applicable to previous versions.

The debug failure for the record:

10.4 Debug 6a634576
mysqld: /data/src/10.4/storage/innobase/row/row0sel.cc:4485: dberr_t row_search_mvcc(byte, page_cur_mode_t, row_prebuilt_t, ulint, ulint): Assertion `prebuilt->sql_stat_start \|\| trx->state == TRX_STATE_ACTIVE \|\| (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)' failed.
200319 21:09:23 [ERROR] mysqld got signal 6 ;

#6 0x00007f4dd79c7e67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55cbad89f388 "prebuilt->sql_stat_start \|\| trx->state == TRX_STATE_ACTIVE \|\| (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)", file=file@entry=0x55cbad89e0a0 "/data/src/10.4/storage/innobase/row/row0sel.cc", line=line@entry=4485, function=function@entry=0x55cbad8a2500 <row_search_mvcc(unsigned char, page_cur_mode_t, row_prebuilt_t, unsigned long, unsigned long)::__PRETTY_FUNCTION__> "dberr_t row_search_mvcc(byte, page_cur_mode_t, row_prebuilt_t, ulint, ulint)") at assert.c:92
#7 0x00007f4dd79c7f12 in __GI___assert_fail (assertion=0x55cbad89f388 "prebuilt->sql_stat_start \|\| trx->state == TRX_STATE_ACTIVE \|\| (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)", file=0x55cbad89e0a0 "/data/src/10.4/storage/innobase/row/row0sel.cc", line=4485, function=0x55cbad8a2500 <row_search_mvcc(unsigned char, page_cur_mode_t, row_prebuilt_t, unsigned long, unsigned long)::__PRETTY_FUNCTION__> "dberr_t row_search_mvcc(byte, page_cur_mode_t, row_prebuilt_t, ulint, ulint)") at assert.c:101
#8 0x000055cbad0964c6 in row_search_mvcc (buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", mode=PAGE_CUR_GE, prebuilt=0x7f4d84188220, match_mode=1, direction=0) at /data/src/10.4/storage/innobase/row/row0sel.cc:4485
#9 0x000055cbacea87ba in ha_innobase::index_read (this=0x7f4d8400aa18, buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", key_ptr=0x7f4d78018f00 "", key_len=6, find_flag=HA_READ_KEY_EXACT) at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:9296
#10 0x000055cbacea9add in ha_innobase::rnd_pos (this=0x7f4d8400aa18, buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", pos=0x7f4d78018f00 "") at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:9797
#11 0x000055cbacc5ccff in handler::ha_rnd_pos (this=0x7f4d8400aa18, buf=0x7f4d8400a5c0 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\360\245", pos=0x7f4d78018f00 "") at /data/src/10.4/sql/handler.cc:2861
#12 0x000055cbace0252d in rr_from_pointers (info=0x7f4dd0d23e30) at /data/src/10.4/sql/records.cc:543
#13 0x000055cbac845559 in READ_RECORD::read_record (this=0x7f4dd0d23e30) at /data/src/10.4/sql/records.h:69
#14 0x000055cbace2068d in mysql_delete (thd=0x7f4d78000af0, table_list=0x7f4d78011f40, conds=0x7f4d780133c0, order_list=0x7f4d780054a8, limit=18446744073709551615, options=0, result=0x0) at /data/src/10.4/sql/sql_delete.cc:772
#15 0x000055cbac8f9a0a in mysql_execute_command (thd=0x7f4d78000af0) at /data/src/10.4/sql/sql_parse.cc:4727
#16 0x000055cbac905247 in mysql_parse (thd=0x7f4d78000af0, rawbuf=0x7f4d78011dd8 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", length=93, parser_state=0x7f4dd0d25160, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7900
#17 0x000055cbac8f042a in dispatch_command (command=COM_QUERY, thd=0x7f4d78000af0, packet=0x7f4d780083a1 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", packet_length=93, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1842
#18 0x000055cbac8eeab7 in do_command (thd=0x7f4d78000af0) at /data/src/10.4/sql/sql_parse.cc:1360
#19 0x000055cbaca77f3d in do_handle_one_connection (connect=0x55cbb01a5e30) at /data/src/10.4/sql/sql_connect.cc:1412
#20 0x000055cbaca77c8c in handle_one_connection (arg=0x55cbb01a5e30) at /data/src/10.4/sql/sql_connect.cc:1316
#21 0x000055cbad47f129 in pfs_spawn_thread (arg=0x55cbb01107f0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#22 0x00007f4dd99504a4 in start_thread (arg=0x7f4dd0d26700) at pthread_create.c:456
#23 0x00007f4dd7a84d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Attachments

Issue Links

relates to

MDEV-14846 InnoDB: assertion on trx->state because of deadlock error ignored

Closed

MDEV-20666 Assertion failures prebuilt->sql_stat_start ... in row_search / index_read upon concurrent INSERT and CREATE .. SELECT

Closed

MDEV-21986 Unexpected error 3 "No such process" from storage engine upon DELETE ... FOR PORTION

Open

MDEV-25594 Crash in deadlock checker under high load

Closed

MDEV-29489 Assertion `lock_table_has(trx, index->table, LOCK_IX)' failed at lock0lock.cc:6025

Closed

Activity

Ascending order - Click to sort in descending order

Andrew Sun added a comment - 2020-05-17 20:37 - edited

I encountered this assertion failure on a busy database running 10.4.12 (in production-ish):

mariadb-ssd_1           | 2020-05-17 19:01:16 0x7faab5457700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.12/storage/innobase/lock/lock0lock.cc line 6901

mariadb-ssd_1           | InnoDB: We intentionally generate a memory trap.

mariadb-ssd_1           | InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

mariadb-ssd_1           | InnoDB: If you get repeated assertion failures or crashes, even

mariadb-ssd_1           | InnoDB: immediately after the mysqld startup, there may be

mariadb-ssd_1           | InnoDB: corruption in the InnoDB tablespace. Please refer to

mariadb-ssd_1           | InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/

mariadb-ssd_1           | InnoDB: about forcing recovery.

mariadb-ssd_1           | 200517 19:01:16 [ERROR] mysqld got signal 6 ;

mariadb-ssd_1           | This could be because you hit a bug. It is also possible that this binary

mariadb-ssd_1           | or one of the libraries it was linked against is corrupt, improperly built,

mariadb-ssd_1           | or misconfigured. This error can also be caused by malfunctioning hardware.

mariadb-ssd_1           |

mariadb-ssd_1           | To report this bug, see https://mariadb.com/kb/en/reporting-bugs

mariadb-ssd_1           |

mariadb-ssd_1           | We will try our best to scrape up some info that will hopefully help

mariadb-ssd_1           | diagnose the problem, but since we have already crashed,

mariadb-ssd_1           | something is definitely wrong and this may fail.

mariadb-ssd_1           |

mariadb-ssd_1           | Server version: 10.4.12-MariaDB-1:10.4.12+maria~bionic

mariadb-ssd_1           | key_buffer_size=134217728

mariadb-ssd_1           | read_buffer_size=2097152

mariadb-ssd_1           | max_used_connections=67

mariadb-ssd_1           | max_threads=2050

mariadb-ssd_1           | thread_count=73

mariadb-ssd_1           | It is possible that mysqld could use up to

mariadb-ssd_1           | key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12776904 K  bytes of memory

mariadb-ssd_1           | Hope that's ok; if not, decrease some variables in the equation.

mariadb-ssd_1           |

mariadb-ssd_1           | Thread pointer: 0x7fa9ec412008

mariadb-ssd_1           | Attempting backtrace. You can use the following information to find out

mariadb-ssd_1           | where mysqld died. If you see no messages after this, something went

mariadb-ssd_1           | terribly wrong...

mariadb-ssd_1           | stack_bottom = 0x7faab5456d98 thread_stack 0x49000

mariadb-ssd_1           | *** buffer overflow detected ***: mysqld terminated

mariadb-ssd_1           | Fatal signal 11 while backtracing

It is unclear whether this happened each time the database server has previously crashed, nor is it clear exactly what caused this crash (though the application running that uses this database uses a lot of SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE queries).

Edit: This crash seems to happen every hour or so. It seems to coincide with a significant increase in database traffic involving a job queue table backed by InnoDB, where the job queue started being processed.

Andrew Sun added a comment - 2020-05-17 20:37 - edited I encountered this assertion failure on a busy database running 10.4.12 (in production-ish): mariadb-ssd_1 | 2020-05-17 19:01:16 0x7faab5457700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.12/storage/innobase/lock/lock0lock.cc line 6901 mariadb-ssd_1 | InnoDB: We intentionally generate a memory trap. mariadb-ssd_1 | InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ mariadb-ssd_1 | InnoDB: If you get repeated assertion failures or crashes, even mariadb-ssd_1 | InnoDB: immediately after the mysqld startup, there may be mariadb-ssd_1 | InnoDB: corruption in the InnoDB tablespace. Please refer to mariadb-ssd_1 | InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/ mariadb-ssd_1 | InnoDB: about forcing recovery. mariadb-ssd_1 | 200517 19:01:16 [ERROR] mysqld got signal 6 ; mariadb-ssd_1 | This could be because you hit a bug. It is also possible that this binary mariadb-ssd_1 | or one of the libraries it was linked against is corrupt, improperly built, mariadb-ssd_1 | or misconfigured. This error can also be caused by malfunctioning hardware. mariadb-ssd_1 | mariadb-ssd_1 | To report this bug, see https://mariadb.com/kb/en/reporting-bugs mariadb-ssd_1 | mariadb-ssd_1 | We will try our best to scrape up some info that will hopefully help mariadb-ssd_1 | diagnose the problem, but since we have already crashed, mariadb-ssd_1 | something is definitely wrong and this may fail. mariadb-ssd_1 | mariadb-ssd_1 | Server version: 10.4.12-MariaDB-1:10.4.12+maria~bionic mariadb-ssd_1 | key_buffer_size=134217728 mariadb-ssd_1 | read_buffer_size=2097152 mariadb-ssd_1 | max_used_connections=67 mariadb-ssd_1 | max_threads=2050 mariadb-ssd_1 | thread_count=73 mariadb-ssd_1 | It is possible that mysqld could use up to mariadb-ssd_1 | key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 12776904 K bytes of memory mariadb-ssd_1 | Hope that's ok; if not, decrease some variables in the equation. mariadb-ssd_1 | mariadb-ssd_1 | Thread pointer: 0x7fa9ec412008 mariadb-ssd_1 | Attempting backtrace. You can use the following information to find out mariadb-ssd_1 | where mysqld died. If you see no messages after this, something went mariadb-ssd_1 | terribly wrong... mariadb-ssd_1 | stack_bottom = 0x7faab5456d98 thread_stack 0x49000 mariadb-ssd_1 | *** buffer overflow detected ***: mysqld terminated mariadb-ssd_1 | Fatal signal 11 while backtracing It is unclear whether this happened each time the database server has previously crashed, nor is it clear exactly what caused this crash (though the application running that uses this database uses a lot of SELECT ... LOCK IN SHARE MODE and SELECT ... FOR UPDATE queries). Edit: This crash seems to happen every hour or so. It seems to coincide with a significant increase in database traffic involving a job queue table backed by InnoDB, where the job queue started being processed.

Andrew Sun added a comment - 2020-05-21 02:09

MariaDB seems to have finally gotten past segfaulting after the assertion failure, and I found this in the logs that may be relevant to the investigation of this bug: https://gist.github.com/as-com/6ecd0f0951efd187862e07b042573985

We should probably escalate the priority of this issue since it's actively crashing every 30 seconds to 2 hours on a production server.

Andrew Sun added a comment - 2020-05-21 02:09 MariaDB seems to have finally gotten past segfaulting after the assertion failure, and I found this in the logs that may be relevant to the investigation of this bug: https://gist.github.com/as-com/6ecd0f0951efd187862e07b042573985 We should probably escalate the priority of this issue since it's actively crashing every 30 seconds to 2 hours on a production server.

Nayuta Yanagisawa (Inactive) added a comment - 2020-11-13 19:43 - edited

Reproduced on mariadb-10.4.3, not on mariadb-10.4.2. However, this does not provide much information because the application-time period is introduced by 10.4.3.

Nayuta Yanagisawa (Inactive) added a comment - 2020-11-13 19:43 - edited Reproduced on mariadb-10.4.3, not on mariadb-10.4.2. However, this does not provide much information because the application-time period is introduced by 10.4.3.

Marko Mäkelä added a comment - 2021-05-07 10:07

I checked the test case with rr replay. It is pretty simple to see with a watchpoint on trx_t::state:

10.5
#5 0x000055894ca22572 in trx_t::rollback (this=0x7f72e5fb6360, savept=0x0)
at /mariadb/10.5m/storage/innobase/trx/trx0roll.cc:178
#6 0x000055894c973084 in row_mysql_handle_errors (new_err=new_err@entry=0x7f72d50adbcc,
trx=trx@entry=0x7f72e5fb6360, thr=thr@entry=0x7f72c400b838, savept=savept@entry=0x0)
at /mariadb/10.5m/storage/innobase/row/row0mysql.cc:733
#7 0x000055894c9bc85b in row_search_mvcc (buf=<optimized out>,
buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f72c400ae88, match_mode=<optimized out>,
match_mode@entry=0, direction=<optimized out>, direction@entry=1)
at /mariadb/10.5m/storage/innobase/row/row0sel.cc:5691
#8 0x000055894c7fbffe in ha_innobase::general_fetch (this=0x7f72c41f8400,
buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177",
direction=1, match_mode=0) at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9116
#9 0x000055894c4c57d5 in handler::ha_rnd_next (this=0x7f72c41f8400,
buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177")
at /mariadb/10.5m/sql/handler.cc:3066
#10 0x000055894c639f26 in rr_sequential (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:519
#11 0x000055894c652506 in READ_RECORD::read_record (this=0x7f72d50ae898)
at /mariadb/10.5m/sql/records.h:80
#12 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>,
order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=<optimized out>, options=0,
result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:721
#13 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8)
at /mariadb/10.5m/sql/sql_parse.cc:4870
#14 0x000055894c25d3b0 in mysql_parse (thd=thd@entry=0x7f72c0001af8,
rawbuf=0x7f72c0013b80 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", length=<optimized out>, parser_state=<optimized out>,
parser_state@entry=0x7f72d50af620, is_com_multi=false, is_next_command=<optimized out>)
at /mariadb/10.5m/sql/sql_parse.cc:8099

A DB_DEADLOCK error results in the entire transaction being rolled back. But, the SQL layer ignored this error and attempted to carry on as if nothing had happened:

#4  0x000055894c9b801f in row_search_mvcc (buf=<optimized out>,

    buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7f72c400ae88, match_mode=<optimized out>,

    direction=<optimized out>, direction@entry=0) at /mariadb/10.5m/storage/innobase/row/row0sel.cc:4546

#5  0x000055894c7fbbfb in ha_innobase::index_read (this=this@entry=0x7f72c41f8400,

    buf=buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", key_ptr=key_ptr@entry=0x7f72c0008de8 "", key_len=<optimized out>,

    find_flag=find_flag@entry=HA_READ_KEY_EXACT)

    at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:8858

#6  0x000055894c7fc3d6 in ha_innobase::rnd_pos (this=0x7f72c41f8400,

    buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177",

    pos=0x7f72c0008de8 "") at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9347

#7  0x000055894c4c5bd1 in handler::ha_rnd_pos (this=0x7f72c41f8400, buf=<optimized out>,

    pos=0x7f72c0008de8 "") at /mariadb/10.5m/sql/handler.cc:3098

#8  0x000055894c639ea4 in rr_from_pointers (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:615

#9  0x000055894c6528b7 in READ_RECORD::read_record (this=0x7f72d50ae898)

    at /mariadb/10.5m/sql/records.h:80

#10 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>,

    order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=18446744073709551615, options=0,

    result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:772

#11 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8)

    at /mariadb/10.5m/sql/sql_parse.cc:4870

Here, the debug assertion would fail:

	ut_ad(prebuilt->sql_stat_start

	      || trx->state == TRX_STATE_ACTIVE

	      || (prebuilt->table->no_rollback()

		  && trx->state == TRX_STATE_NOT_STARTED));

The problem should be in mysql_delete(). It is easier to provide a patch than to explain what is going wrong:

diff --git a/sql/sql_delete.cc b/sql/sql_delete.cc

index cf411d01f38..358887afd2b 100644

--- a/sql/sql_delete.cc

+++ b/sql/sql_delete.cc

@@ -718,9 +718,18 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds,

                                              MEM_STRIP_BUF_SIZE);

     THD_STAGE_INFO(thd, stage_searching_rows_for_update);

-    while (!(error=info.read_record()) && !thd->killed &&

-          ! thd->is_error())

+    for (;;)

+      error= info.read_record();

+      if (error)

+      {

+        error= 1;

+        goto terminate_delete;

+      }

+      if (thd->killed || thd->is_error())

+        break;

       if (record_should_be_deleted(thd, table, select, explain, delete_history))

         table->file->position(table->record[0]);

With this fix, the test no longer crashes. But the transaction will have been terminated due to deadlock, not lock wait timeout:

mysqltest: At line 19: query 'reap' failed with wrong errno 1213: 'Deadlock found when trying to get lock; try restarting transaction', instead of 1205...

Marko Mäkelä added a comment - 2021-05-07 10:07 I checked the test case with rr replay . It is pretty simple to see with a watchpoint on trx_t::state : 10.5 #5 0x000055894ca22572 in trx_t::rollback (this=0x7f72e5fb6360, savept=0x0) at /mariadb/10.5m/storage/innobase/trx/trx0roll.cc:178 #6 0x000055894c973084 in row_mysql_handle_errors (new_err=new_err@entry=0x7f72d50adbcc, trx=trx@entry=0x7f72e5fb6360, thr=thr@entry=0x7f72c400b838, savept=savept@entry=0x0) at /mariadb/10.5m/storage/innobase/row/row0mysql.cc:733 #7 0x000055894c9bc85b in row_search_mvcc (buf=<optimized out>, buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f72c400ae88, match_mode=<optimized out>, match_mode@entry=0, direction=<optimized out>, direction@entry=1) at /mariadb/10.5m/storage/innobase/row/row0sel.cc:5691 #8 0x000055894c7fbffe in ha_innobase::general_fetch (this=0x7f72c41f8400, buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", direction=1, match_mode=0) at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9116 #9 0x000055894c4c57d5 in handler::ha_rnd_next (this=0x7f72c41f8400, buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177") at /mariadb/10.5m/sql/handler.cc:3066 #10 0x000055894c639f26 in rr_sequential (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:519 #11 0x000055894c652506 in READ_RECORD::read_record (this=0x7f72d50ae898) at /mariadb/10.5m/sql/records.h:80 #12 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>, order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=<optimized out>, options=0, result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:721 #13 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8) at /mariadb/10.5m/sql/sql_parse.cc:4870 #14 0x000055894c25d3b0 in mysql_parse (thd=thd@entry=0x7f72c0001af8, rawbuf=0x7f72c0013b80 "DELETE FROM t1 FOR PORTION OF app FROM '2004-10-21 12:00:00.000' TO '2020-03-19 19:50:10.187'", length=<optimized out>, parser_state=<optimized out>, parser_state@entry=0x7f72d50af620, is_com_multi=false, is_next_command=<optimized out>) at /mariadb/10.5m/sql/sql_parse.cc:8099 A DB_DEADLOCK error results in the entire transaction being rolled back. But, the SQL layer ignored this error and attempted to carry on as if nothing had happened: #4 0x000055894c9b801f in row_search_mvcc (buf=<optimized out>, buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7f72c400ae88, match_mode=<optimized out>, direction=<optimized out>, direction@entry=0) at /mariadb/10.5m/storage/innobase/row/row0sel.cc:4546 #5 0x000055894c7fbbfb in ha_innobase::index_read (this=this@entry=0x7f72c41f8400, buf=buf@entry=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", key_ptr=key_ptr@entry=0x7f72c0008de8 "", key_len=<optimized out>, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:8858 #6 0x000055894c7fc3d6 in ha_innobase::rnd_pos (this=0x7f72c41f8400, buf=0x7f72c41f92d8 "\377!\240\017!\310\017", '\245' <repeats 17 times>, "\b\223\037\304r\177", pos=0x7f72c0008de8 "") at /mariadb/10.5m/storage/innobase/handler/ha_innodb.cc:9347 #7 0x000055894c4c5bd1 in handler::ha_rnd_pos (this=0x7f72c41f8400, buf=<optimized out>, pos=0x7f72c0008de8 "") at /mariadb/10.5m/sql/handler.cc:3098 #8 0x000055894c639ea4 in rr_from_pointers (info=0x7f72d50ae898) at /mariadb/10.5m/sql/records.cc:615 #9 0x000055894c6528b7 in READ_RECORD::read_record (this=0x7f72d50ae898) at /mariadb/10.5m/sql/records.h:80 #10 mysql_delete (thd=thd@entry=0x7f72c0001af8, table_list=<optimized out>, conds=<optimized out>, order_list=<optimized out>, order_list@entry=0x7f72c0006750, limit=18446744073709551615, options=0, result=<optimized out>) at /mariadb/10.5m/sql/sql_delete.cc:772 #11 0x000055894c267791 in mysql_execute_command (thd=thd@entry=0x7f72c0001af8) at /mariadb/10.5m/sql/sql_parse.cc:4870 Here, the debug assertion would fail: ut_ad(prebuilt->sql_stat_start || trx->state == TRX_STATE_ACTIVE || (prebuilt->table->no_rollback() && trx->state == TRX_STATE_NOT_STARTED)); The problem should be in mysql_delete() . It is easier to provide a patch than to explain what is going wrong: diff --git a/sql/sql_delete.cc b/sql/sql_delete.cc index cf411d01f38..358887afd2b 100644 --- a/sql/sql_delete.cc +++ b/sql/sql_delete.cc @@ -718,9 +718,18 @@ bool mysql_delete(THD *thd, TABLE_LIST *table_list, COND *conds, MEM_STRIP_BUF_SIZE); THD_STAGE_INFO(thd, stage_searching_rows_for_update); - while (!(error=info.read_record()) && !thd->killed && - ! thd->is_error()) + for (;;) { + error= info.read_record(); + if (error) + { + error= 1; + goto terminate_delete; + } + + if (thd->killed || thd->is_error()) + break; + if (record_should_be_deleted(thd, table, select, explain, delete_history)) { table->file->position(table->record[0]); With this fix, the test no longer crashes. But the transaction will have been terminated due to deadlock, not lock wait timeout: mysqltest: At line 19: query 'reap' failed with wrong errno 1213: 'Deadlock found when trying to get lock; try restarting transaction', instead of 1205...

Marko Mäkelä added a comment - 2021-05-07 10:17

My crude fix would cause several regression tests to fail, including main.subselect. It hopefully gives an idea how this should be fixed. The API constraint always was that after a terminal error (such as ER_LOCK_DEADLOCK), no further row operations must be executed in the same thread before the transaction has been restarted. For DELETE, this could have been broken already in 10.3 with system-versioned tables (if not earlier). For versioned tables, DELETE would execute two operations for each affected row.

Marko Mäkelä added a comment - 2021-05-07 10:17 My crude fix would cause several regression tests to fail, including main.subselect . It hopefully gives an idea how this should be fixed. The API constraint always was that after a terminal error (such as ER_LOCK_DEADLOCK ), no further row operations must be executed in the same thread before the transaction has been restarted. For DELETE , this could have been broken already in 10.3 with system-versioned tables (if not earlier). For versioned tables, DELETE would execute two operations for each affected row.

Elena Stepanova added a comment - 2024-02-06 13:52

At least the debug failure is still reproducible on all existing branches (10.4-11.4). I'm not sure about release builds, those may be already fixed, or at least the probability of the failure decreased.

Elena Stepanova added a comment - 2024-02-06 13:52 At least the debug failure is still reproducible on all existing branches (10.4-11.4). I'm not sure about release builds, those may be already fixed, or at least the probability of the failure decreased.

People

Assignee:: Oleksandr Byelkin

Reporter:: Elena Stepanova

Votes:: 5 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 2020-03-19 19:10

Updated:: 2024-11-22 17:11

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server