[MDEV-20775] InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index) - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.28
Fix Version/s: 10.5.1, 10.2.32, 10.3.23, 10.4.13
Component/s: Storage Engine - InnoDB
Labels:
None
Environment:
Ubuntu 17.04 but most probably unimportant

Description

The assert was observed on the trees

-    10.2 commit 27664ef29d97be2973982aedcc0c8c90d18b20a4 2019-10-08

and

-     bb-10.2-merge commit d9f224233497ce5e3706fe6539a2b711b491af1d 2019-10-08

Hitting that assert is very rare (one per ~ 700 RQG test runs).

The snippets of logs belong to the test run on bb-10.2-merge which contains

- a fix for https://jira.mariadb.org/browse/MDEV-19335 (fix is currently in review)

- some merge from 10.1

10.2.28-MariaDB-debug-log

2019-10-08 13:29:21 0x7f3bbc51b700  InnoDB: Assertion failure in file storage/innobase/btr/btr0cur.cc line 1434

InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index)

...

Query (0x7f3b140104e0): UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */

Connection ID (thread ID): 20

Status: NOT_KILLED

#3  <signal handler called>

#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58

#5  0x00007f3bc40ad37a in __GI_abort () at abort.c:89

#6  0x000055c433c9e014 in ut_dbg_assertion_failed (expr=0x55c4342cfe10 "!page_zip || page_zip_validate(page_zip, page, index)", file=0x55c4342cf0a0 "storage/innobase/btr/btr0cur.cc", line=1434) at storage/innobase/ut/ut0dbg.cc:60

#7  0x000055c433cc8e39 in btr_cur_search_to_nth_level (index=0x7f3b0c042d18, level=0, tuple=0x7f3b14060058, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7f3b1405fe68, has_search_latch=0, file=0x55c434282818 "storage/innobase/row/row0sel.cc", line=4724, mtr=0x7f3bbc5188d0, autoinc=0) at storage/innobase/btr/btr0cur.cc:1434

#8  0x000055c433bfb479 in btr_pcur_open_with_no_init_func (index=0x7f3b0c042d18, tuple=0x7f3b14060058, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7f3b1405fe68, has_search_latch=0, file=0x55c434282818 "storage/innobase/row/row0sel.cc", line=4724, mtr=0x7f3bbc5188d0) at storage/innobase/include/btr0pcur.ic:527

#9  0x000055c433c05f52 in row_search_mvcc (buf=0x7f3b14022fe8 "\377", mode=PAGE_CUR_GE, prebuilt=0x7f3b1405fc98, match_mode=1, direction=0) at storage/innobase/row/row0sel.cc:4722

#10 0x000055c433a9488f in ha_innobase::index_read (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key_ptr=0x7f3b140556d0 "~\v", key_len=4, find_flag=HA_READ_KEY_EXACT) at storage/innobase/handler/ha_innodb.cc:9471

#11 0x000055c433890b4c in handler::index_read_map (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key=0x7f3b140556d0 "~\v", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at sql/handler.h:3136

#12 0x000055c433886d8a in handler::ha_index_read_map (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key=0x7f3b140556d0 "~\v", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at sql/handler.cc:2711

#13 0x000055c43388d962 in handler::read_range_first (this=0x7f3b14029130, start_key=0x7f3b14029218, end_key=0x7f3b14029238, eq_range_arg=true, sorted=true) at sql/handler.cc:5458

#14 0x000055c43379cbc6 in handler::multi_range_read_next (this=0x7f3b14029130, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:298

#15 0x000055c43379ccff in Mrr_simple_index_reader::get_next (this=0x7f3b140296a0, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:335

#16 0x000055c43379f840 in DsMrr_impl::dsmrr_next (this=0x7f3b14029560, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:1411

#17 0x000055c433aa984e in ha_innobase::multi_range_read_next (this=0x7f3b14029130, range_info=0x7f3bbc519110) at storage/innobase/handler/ha_innodb.cc:21332

#18 0x000055c4339df8d2 in QUICK_RANGE_SELECT::get_next (this=0x7f3b1401ada0) at sql/opt_range.cc:11473

#19 0x000055c4339f0882 in rr_quick (info=0x7f3bbc519350) at sql/records.cc:365

#20 0x000055c4336f219b in mysql_update (thd=0x7f3b14000a98, table_list=0x7f3b14010640, fields=..., values=..., conds=0x7f3b14010fe8, order_num=0, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7f3bbc5198c0, updated_return=0x7f3bbc519970) at sql/sql_update.cc:749

#21 0x000055c433602a1f in mysql_execute_command (thd=0x7f3b14000a98) at sql/sql_parse.cc:4014

#22 0x000055c43360e8a0 in mysql_parse (thd=0x7f3b14000a98, rawbuf=0x7f3b140104e0 "UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */", length=82, parser_state=0x7f3bbc51a250, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:7759

#23 0x000055c4335fcba9 in dispatch_command (command=COM_QUERY, thd=0x7f3b14000a98, packet=0x7f3b140081a9 "UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */ ", packet_length=83, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:1831

#24 0x000055c4335fb4fd in do_command (thd=0x7f3b14000a98) at sql/sql_parse.cc:1384

#25 0x000055c433750957 in do_handle_one_connection (connect=0x55c4372aed98) at sql/sql_connect.cc:1336

#26 0x000055c4337506c2 in handle_one_connection (arg=0x55c4372aed98) at sql/sql_connect.cc:1241

#27 0x00007f3bc4cec6da in start_thread (arg=0x7f3bbc51b700) at pthread_create.c:456

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

simp_page_zip_validate_rqg.log
2020-01-14 14:16
312 kB
Matthias Leich
simp_page_zip_validate.cfg
2020-01-14 14:16
52 kB
Matthias Leich
simp_page_zip_validate.yy
2020-01-14 14:16
0.6 kB
Matthias Leich
simp_page_zip_validate.zz
2020-01-14 14:16
1 kB
Matthias Leich

Issue Links

is caused by

MDEV-6076 Persistent AUTO_INCREMENT for InnoDB

Closed

Activity

Ascending order - Click to sort in descending order

Matthias Leich added a comment - 2019-10-31 12:48

I have hit that assert on 10.5 too.

Matthias Leich added a comment - 2019-10-31 12:48 I have hit that assert on 10.5 too.

Matthias Leich added a comment - 2020-01-14 14:29 - edited

simp_page_zip_validate.* RQG files for replaying the problem

simp_page_zip_validate_rqg.log    Protocol (includes backtrace) of my RQG run against

                                                                       10.5 commit 8576a7bacfaa4f7ac3ec86cff436e0d6e7bb3a7d

Only two concurrent sessions are required.

One runs random

- (frequent) DELETE FROM oltp3 LIMIT 2 |

- (not often) FLUSH TABLE oltp3 FOR EXPORT

- (often) UNLOCK TABLES

The second runs random just

   SELECT SUM(k) FROM oltp3 WHERE id BETWEEN _smallint_unsigned AND _smallint_unsigned ;

After short time (~ 50s) InnoDB reports into the server error that there is a deadlock

and than DB server aborts with a core.

The uploaded test does not replay the assert on

- 10.2 commit 7d313214641241e8ce839d9de01529c2335c620f 2020-01-10

- 10.3 commit d531b4ee3a9bcd89a2fa6b49a2207eaf966f53e3 2020-01-12

- 10.4 commit 2d4b6571ecf15a4382702d26df9439bc5a1005de 2020-01-14

In case I memorize correct than the assert in the bug title occured at

several places in InnoDB code.

During the automatic+manual RQG grammar the variety of SQL shrinks

and by that also the places in InnoDB code where the assert was hit.

Please do not work on this bug.

Caused by some mistake in the setup of the simplifier the uploaded test

replays some other bug.

Matthias Leich added a comment - 2020-01-14 14:29 - edited simp_page_zip_validate.* RQG files for replaying the problem simp_page_zip_validate_rqg.log Protocol (includes backtrace) of my RQG run against 10.5 commit 8576a7bacfaa4f7ac3ec86cff436e0d6e7bb3a7d Only two concurrent sessions are required. One runs random - (frequent) DELETE FROM oltp3 LIMIT 2 | - (not often) FLUSH TABLE oltp3 FOR EXPORT - (often) UNLOCK TABLES The second runs random just SELECT SUM(k) FROM oltp3 WHERE id BETWEEN _smallint_unsigned AND _smallint_unsigned ; After short time (~ 50s) InnoDB reports into the server error that there is a deadlock and than DB server aborts with a core. The uploaded test does not replay the assert on - 10.2 commit 7d313214641241e8ce839d9de01529c2335c620f 2020-01-10 - 10.3 commit d531b4ee3a9bcd89a2fa6b49a2207eaf966f53e3 2020-01-12 - 10.4 commit 2d4b6571ecf15a4382702d26df9439bc5a1005de 2020-01-14 In case I memorize correct than the assert in the bug title occured at several places in InnoDB code. During the automatic+manual RQG grammar the variety of SQL shrinks and by that also the places in InnoDB code where the assert was hit. Please do not work on this bug. Caused by some mistake in the setup of the simplifier the uploaded test replays some other bug.

Marko Mäkelä added a comment - 2020-01-23 11:08

When testing ~~MDEV-12353~~, I accidentally ran into a failure like this. I believe that there is a race condition between page reads that call page_zip_validate() (with page S-latch) and writes that update PAGE_ROOT_AUTO_INC (with page SX-latch). That writer code was introduced in ~~MDEV-6076~~.

The following patch will ignore the PAGE_ROOT_AUTO_INC when comparing the two clustered index root page images (page number 3):

diff --git a/storage/innobase/page/page0zip.cc b/storage/innobase/page/page0zip.cc

index 3211c8fad83..40e3909e2f2 100644

--- a/storage/innobase/page/page0zip.cc

+++ b/storage/innobase/page/page0zip.cc

@@ -3310,7 +3310,13 @@ page_zip_validate_low(

 		   FIL_PAGE_LSN - FIL_PAGE_PREV)

 	    || memcmp(page_zip->data + FIL_PAGE_TYPE, page + FIL_PAGE_TYPE, 2)

 	    || memcmp(page_zip->data + FIL_PAGE_DATA, page + FIL_PAGE_DATA,

-		      PAGE_DATA - FIL_PAGE_DATA)) {

+		      PAGE_ROOT_AUTO_INC)

+	    || (page_get_page_no(page) != 3

+		&& memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC],

+			  &page[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], 8))

+	    || memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END],

+		      &page[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END],

+		      PAGE_DATA - FIL_PAGE_DATA - PAGE_HEADER_PRIV_END)) {

 		page_zip_fail(("page_zip_validate: page header\n"));

 		page_zip_hexdump(page_zip, sizeof *page_zip);

 		page_zip_hexdump(page_zip->data, page_zip_get_size(page_zip));

I successfully tested it on a 10.5-based ~~MDEV-12353~~ branch with the following test, adapted from innodb_zip.wl5522_zip:

-- source include/have_innodb.inc

set @save_limit= @@innodb_limit_optimistic_insert_debug;

set global innodb_limit_optimistic_insert_debug=2;

CREATE TABLE t1

(a INT AUTO_INCREMENT PRIMARY KEY,

 b char(22),

 c varchar(255),

 KEY (b))

ENGINE = InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;

insert into t1 (b, c) values ('Apa', 'Filler........'),

('Banan', 'Filler........'), ('Cavalry', '..asdasdfaeraf'),

('Devotion', 'asdfuihknaskdf'), ('Evolution', 'lsjndofiabsoibeg');

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;

DROP TABLE t1;

set global innodb_limit_optimistic_insert_debug=@save_limit;

./mtr --mysqld=--innodb-page-size=4k --parallel=auto --repeat=1000 innodb_zip.wz{,,,,,,,,,,,}

On my system, the test would typically crash on the first round already. With the fix, it already completed 20 or 21 parallel rounds (of 14 seconds each) without failures.

mleich, do you still see the failures with this fix? And was it always related to a table that contains an AUTO_INCREMENT column?

Marko Mäkelä added a comment - 2020-01-23 11:08 When testing MDEV-12353 , I accidentally ran into a failure like this. I believe that there is a race condition between page reads that call page_zip_validate() (with page S-latch) and writes that update PAGE_ROOT_AUTO_INC (with page SX-latch). That writer code was introduced in MDEV-6076 . The following patch will ignore the PAGE_ROOT_AUTO_INC when comparing the two clustered index root page images (page number 3): diff --git a/storage/innobase/page/page0zip.cc b/storage/innobase/page/page0zip.cc index 3211c8fad83..40e3909e2f2 100644 --- a/storage/innobase/page/page0zip.cc +++ b/storage/innobase/page/page0zip.cc @@ -3310,7 +3310,13 @@ page_zip_validate_low( FIL_PAGE_LSN - FIL_PAGE_PREV) || memcmp(page_zip->data + FIL_PAGE_TYPE, page + FIL_PAGE_TYPE, 2) || memcmp(page_zip->data + FIL_PAGE_DATA, page + FIL_PAGE_DATA, - PAGE_DATA - FIL_PAGE_DATA)) { + PAGE_ROOT_AUTO_INC) + || (page_get_page_no(page) != 3 + && memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], + &page[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], 8)) + || memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END], + &page[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END], + PAGE_DATA - FIL_PAGE_DATA - PAGE_HEADER_PRIV_END)) { page_zip_fail(("page_zip_validate: page header\n")); page_zip_hexdump(page_zip, sizeof *page_zip); page_zip_hexdump(page_zip->data, page_zip_get_size(page_zip)); I successfully tested it on a 10.5-based MDEV-12353 branch with the following test, adapted from innodb_zip.wl5522_zip : -- source include/have_innodb.inc set @save_limit= @@innodb_limit_optimistic_insert_debug; set global innodb_limit_optimistic_insert_debug=2; CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY , b char (22), c varchar (255), KEY (b)) ENGINE = InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1; insert into t1 (b, c) values ( 'Apa' , 'Filler........' ), ( 'Banan' , 'Filler........' ), ( 'Cavalry' , '..asdasdfaeraf' ), ( 'Devotion' , 'asdfuihknaskdf' ), ( 'Evolution' , 'lsjndofiabsoibeg' ); INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; DROP TABLE t1; set global innodb_limit_optimistic_insert_debug=@save_limit; ./mtr --mysqld=--innodb-page-size=4k --parallel=auto --repeat=1000 innodb_zip.wz{,,,,,,,,,,,} On my system, the test would typically crash on the first round already. With the fix, it already completed 20 or 21 parallel rounds (of 14 seconds each) without failures. mleich , do you still see the failures with this fix? And was it always related to a table that contains an AUTO_INCREMENT column?

MariaDB Server

InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index)

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration