[MDEV-20775] InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index) Created: 2019-10-08  Updated: 2020-01-23  Resolved: 2020-01-23

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.28
Fix Version/s: 10.5.1, 10.2.32, 10.3.23, 10.4.13

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: None
Environment:

Ubuntu 17.04 but most probably unimportant


Attachments: File simp_page_zip_validate.cfg     File simp_page_zip_validate.yy     File simp_page_zip_validate.zz     File simp_page_zip_validate_rqg.log    
Issue Links:
Problem/Incident
is caused by MDEV-6076 Persistent AUTO_INCREMENT for InnoDB Closed

 Description   

The assert was observed on the trees
-    10.2 commit 27664ef29d97be2973982aedcc0c8c90d18b20a4 2019-10-08
    and
-     bb-10.2-merge commit d9f224233497ce5e3706fe6539a2b711b491af1d 2019-10-08
Hitting that assert is very rare (one per ~ 700 RQG test runs).
 
The snippets of logs belong to the test run on bb-10.2-merge which contains
- a fix for https://jira.mariadb.org/browse/MDEV-19335 (fix is currently in review)
- some merge from 10.1
 
10.2.28-MariaDB-debug-log
2019-10-08 13:29:21 0x7f3bbc51b700  InnoDB: Assertion failure in file storage/innobase/btr/btr0cur.cc line 1434
InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index)
...
Query (0x7f3b140104e0): UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */
Connection ID (thread ID): 20
Status: NOT_KILLED
 
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#5  0x00007f3bc40ad37a in __GI_abort () at abort.c:89
#6  0x000055c433c9e014 in ut_dbg_assertion_failed (expr=0x55c4342cfe10 "!page_zip || page_zip_validate(page_zip, page, index)", file=0x55c4342cf0a0 "storage/innobase/btr/btr0cur.cc", line=1434) at storage/innobase/ut/ut0dbg.cc:60
#7  0x000055c433cc8e39 in btr_cur_search_to_nth_level (index=0x7f3b0c042d18, level=0, tuple=0x7f3b14060058, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7f3b1405fe68, has_search_latch=0, file=0x55c434282818 "storage/innobase/row/row0sel.cc", line=4724, mtr=0x7f3bbc5188d0, autoinc=0) at storage/innobase/btr/btr0cur.cc:1434
#8  0x000055c433bfb479 in btr_pcur_open_with_no_init_func (index=0x7f3b0c042d18, tuple=0x7f3b14060058, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7f3b1405fe68, has_search_latch=0, file=0x55c434282818 "storage/innobase/row/row0sel.cc", line=4724, mtr=0x7f3bbc5188d0) at storage/innobase/include/btr0pcur.ic:527
#9  0x000055c433c05f52 in row_search_mvcc (buf=0x7f3b14022fe8 "\377", mode=PAGE_CUR_GE, prebuilt=0x7f3b1405fc98, match_mode=1, direction=0) at storage/innobase/row/row0sel.cc:4722
#10 0x000055c433a9488f in ha_innobase::index_read (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key_ptr=0x7f3b140556d0 "~\v", key_len=4, find_flag=HA_READ_KEY_EXACT) at storage/innobase/handler/ha_innodb.cc:9471
#11 0x000055c433890b4c in handler::index_read_map (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key=0x7f3b140556d0 "~\v", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at sql/handler.h:3136
#12 0x000055c433886d8a in handler::ha_index_read_map (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key=0x7f3b140556d0 "~\v", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at sql/handler.cc:2711
#13 0x000055c43388d962 in handler::read_range_first (this=0x7f3b14029130, start_key=0x7f3b14029218, end_key=0x7f3b14029238, eq_range_arg=true, sorted=true) at sql/handler.cc:5458
#14 0x000055c43379cbc6 in handler::multi_range_read_next (this=0x7f3b14029130, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:298
#15 0x000055c43379ccff in Mrr_simple_index_reader::get_next (this=0x7f3b140296a0, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:335
#16 0x000055c43379f840 in DsMrr_impl::dsmrr_next (this=0x7f3b14029560, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:1411
#17 0x000055c433aa984e in ha_innobase::multi_range_read_next (this=0x7f3b14029130, range_info=0x7f3bbc519110) at storage/innobase/handler/ha_innodb.cc:21332
#18 0x000055c4339df8d2 in QUICK_RANGE_SELECT::get_next (this=0x7f3b1401ada0) at sql/opt_range.cc:11473
#19 0x000055c4339f0882 in rr_quick (info=0x7f3bbc519350) at sql/records.cc:365
#20 0x000055c4336f219b in mysql_update (thd=0x7f3b14000a98, table_list=0x7f3b14010640, fields=..., values=..., conds=0x7f3b14010fe8, order_num=0, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7f3bbc5198c0, updated_return=0x7f3bbc519970) at sql/sql_update.cc:749
#21 0x000055c433602a1f in mysql_execute_command (thd=0x7f3b14000a98) at sql/sql_parse.cc:4014
#22 0x000055c43360e8a0 in mysql_parse (thd=0x7f3b14000a98, rawbuf=0x7f3b140104e0 "UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */", length=82, parser_state=0x7f3bbc51a250, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:7759
#23 0x000055c4335fcba9 in dispatch_command (command=COM_QUERY, thd=0x7f3b14000a98, packet=0x7f3b140081a9 "UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */ ", packet_length=83, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:1831
#24 0x000055c4335fb4fd in do_command (thd=0x7f3b14000a98) at sql/sql_parse.cc:1384
#25 0x000055c433750957 in do_handle_one_connection (connect=0x55c4372aed98) at sql/sql_connect.cc:1336
#26 0x000055c4337506c2 in handle_one_connection (arg=0x55c4372aed98) at sql/sql_connect.cc:1241
#27 0x00007f3bc4cec6da in start_thread (arg=0x7f3bbc51b700) at pthread_create.c:456



 Comments   
Comment by Matthias Leich [ 2019-10-31 ]

I have hit that assert on 10.5 too.

Comment by Matthias Leich [ 2020-01-14 ]

simp_page_zip_validate.* RQG files for replaying the problem
simp_page_zip_validate_rqg.log    Protocol (includes backtrace) of my RQG run against
                                                                       10.5 commit 8576a7bacfaa4f7ac3ec86cff436e0d6e7bb3a7d
 
Only two concurrent sessions are required.
One runs random
- (frequent) DELETE FROM oltp3 LIMIT 2 |
- (not often) FLUSH TABLE oltp3 FOR EXPORT
- (often) UNLOCK TABLES
The second runs random just
   SELECT SUM(k) FROM oltp3 WHERE id BETWEEN _smallint_unsigned AND _smallint_unsigned ;
 
After short time (~ 50s) InnoDB reports into the server error that there is a deadlock
and than DB server aborts with a core.
 
The uploaded test does not replay the assert on
- 10.2 commit 7d313214641241e8ce839d9de01529c2335c620f 2020-01-10
- 10.3 commit d531b4ee3a9bcd89a2fa6b49a2207eaf966f53e3 2020-01-12
- 10.4 commit 2d4b6571ecf15a4382702d26df9439bc5a1005de 2020-01-14
In case I memorize correct than the assert in the bug title occured at
several places in InnoDB code.
During the automatic+manual RQG grammar the variety of SQL shrinks
and by that also the places in InnoDB code where the assert was hit.
 
Please do not work on this bug.
Caused by some mistake in the setup of the simplifier the uploaded test
replays some other bug.

Comment by Marko Mäkelä [ 2020-01-23 ]

When testing MDEV-12353, I accidentally ran into a failure like this. I believe that there is a race condition between page reads that call page_zip_validate() (with page S-latch) and writes that update PAGE_ROOT_AUTO_INC (with page SX-latch). That writer code was introduced in MDEV-6076.

The following patch will ignore the PAGE_ROOT_AUTO_INC when comparing the two clustered index root page images (page number 3):

diff --git a/storage/innobase/page/page0zip.cc b/storage/innobase/page/page0zip.cc
index 3211c8fad83..40e3909e2f2 100644
--- a/storage/innobase/page/page0zip.cc
+++ b/storage/innobase/page/page0zip.cc
@@ -3310,7 +3310,13 @@ page_zip_validate_low(
 		   FIL_PAGE_LSN - FIL_PAGE_PREV)
 	    || memcmp(page_zip->data + FIL_PAGE_TYPE, page + FIL_PAGE_TYPE, 2)
 	    || memcmp(page_zip->data + FIL_PAGE_DATA, page + FIL_PAGE_DATA,
-		      PAGE_DATA - FIL_PAGE_DATA)) {
+		      PAGE_ROOT_AUTO_INC)
+	    || (page_get_page_no(page) != 3
+		&& memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC],
+			  &page[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], 8))
+	    || memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END],
+		      &page[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END],
+		      PAGE_DATA - FIL_PAGE_DATA - PAGE_HEADER_PRIV_END)) {
 		page_zip_fail(("page_zip_validate: page header\n"));
 		page_zip_hexdump(page_zip, sizeof *page_zip);
 		page_zip_hexdump(page_zip->data, page_zip_get_size(page_zip));

I successfully tested it on a 10.5-based MDEV-12353 branch with the following test, adapted from innodb_zip.wl5522_zip:

-- source include/have_innodb.inc
 
set @save_limit= @@innodb_limit_optimistic_insert_debug;
 
set global innodb_limit_optimistic_insert_debug=2;
 
CREATE TABLE t1
(a INT AUTO_INCREMENT PRIMARY KEY,
 b char(22),
 c varchar(255),
 KEY (b))
ENGINE = InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
 
insert into t1 (b, c) values ('Apa', 'Filler........'),
('Banan', 'Filler........'), ('Cavalry', '..asdasdfaeraf'),
('Devotion', 'asdfuihknaskdf'), ('Evolution', 'lsjndofiabsoibeg');
 
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
 
DROP TABLE t1;
 
set global innodb_limit_optimistic_insert_debug=@save_limit;

./mtr --mysqld=--innodb-page-size=4k --parallel=auto --repeat=1000 innodb_zip.wz{,,,,,,,,,,,}

On my system, the test would typically crash on the first round already. With the fix, it already completed 20 or 21 parallel rounds (of 14 seconds each) without failures.

mleich, do you still see the failures with this fix? And was it always related to a table that contains an AUTO_INCREMENT column?

Generated at Thu Feb 08 09:02:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.