[MDEV-21353] ut0byte.ic:91: void* ut_align_down(const void*, ulint): Assertion `ptr' failed. Created: 2019-12-18  Updated: 2019-12-20  Resolved: 2019-12-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.1
Fix Version/s: 10.5.1

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: File MDEV-21353.tgz     HTML File prt     HTML File prt_other    

 Description   

Assert hit during RQG testing
10.5.1 commit 44be8652c567ccaa0067830fa327cbffbe66b4cd 2019-12-18  compiled with debug
Version: '10.5.1-MariaDB-debug-log'  socket: '... Source distribution
mysqld: /home/mleich/Server/10.5/storage/innobase/include/ut0byte.ic:91: void* ut_align_down(const void*, ulint): Assertion `ptr' failed.
...
Query (0x7ff33c0115a0): DELETE FROM t3 WHERE col2 = @MY_INT
Status: NOT_KILLED
 
The attached file 'prt' is the log of that MTR test run.
 
The MTR test could also show some other assert
mysqld: storage/innobase/btr/btr0pcur.cc:346: ulint btr_pcur_restore_position_func(ulint, btr_pcur_t*, const char*, unsigned int, mtr_t*): Assertion `!cmp_rec_rec(cursor->old_rec, rec, offsets1, offsets2, index)' failed.
...
Query (0x7f26d80115a0): DELETE FROM t3 WHERE col2 = @MY_INT
Status: NOT_KILLED
 
The attached file 'prt_other' is the log that MTR test run.
 
 
10.4.9 commit 576c96a9385f20cf4b6fe685e079dcf8026c3a24 2019-10-29
10.4.12 commit 088de81d965f1b5ae2f50b1ed6e7cfccc0366098 2019-12-18
did not replay that crash.



 Comments   
Comment by Matthias Leich [ 2019-12-18 ]

How to replay the problem
0. Use a Linux because the test needs a shellscript (in archive)
1. cd <source_dir>/mysql-test
2. tar xvzf MDEV-21353.tgz
3. cd <bin_dir>/mysql-test
4. mysql-test-run.pl --mem --parallel=8 MDEV-21353 MDEV-21353 MDEV-21353 MDEV-21353 MDEV-21353 MDEV-21353 MDEV-21353 MDEV-21353 2>&1 | tee prt
Please adjust the 8 and the list with 8 "MDEV-21353" to the number of CPU's of your box (nproc)
because it looks like exceptional CPU load is required for replay.
4. In case of hitting an error mysql-test-run.pl will stop. But at least in my runs it was likely that certain mysqld and perl processes did not stop immediate (but a few minutes later).
Some killall -9 perl ; killall -9 mysqld helps.
But please be aware that all processes with such names get killed.

Comment by Marko Mäkelä [ 2019-12-18 ]

In prt the problem is rec=NULL:

10.5 44be8652c567ccaa0067830fa327cbffbe66b4cd

#10 0x0000557505245500 in page_rec_is_comp (rec=0x0) at /home/mleich/Server/10.5/storage/innobase/include/page0page.h:274
#11 0x00005575053e183e in row_search_mvcc (buf=0x7ff34801e4c8 "\372\233\f", mode=PAGE_CUR_G, prebuilt=0x7ff34801f488, match_mode=0, direction=1) at /home/mleich/Server/10.5/storage/innobase/row/row0sel.cc:4679
#12 0x00005575051f2452 in ha_innobase::general_fetch (this=0x7ff34801e8b0, buf=0x7ff34801e4c8 "\372\233\f", direction=1, match_mode=0) at /home/mleich/Server/10.5/storage/innobase/handler/ha_innodb.cc:9321
#13 0x00005575051f2997 in ha_innobase::rnd_next (this=0x7ff34801e8b0, buf=0x7ff34801e4c8 "\372\233\f") at /home/mleich/Server/10.5/storage/innobase/handler/ha_innodb.cc:9530
#14 0x0000557504e19342 in handler::ha_rnd_next (this=0x7ff34801e8b0, buf=0x7ff34801e4c8 "\372\233\f") at /home/mleich/Server/10.5/sql/handler.cc:2816

In prt_other the problem is a failure to restore a persistent cursor position:

10.5

#7  0x00007f27483bffc2 in __GI___assert_fail (assertion=0x5577f52e9da0 "!cmp_rec_rec(cursor->old_rec, rec, offsets1, offsets2, index)", file=0x5577f52e99b0 "/home/mleich/Server/10.5/storage/innobase/btr/btr0pcur.cc", line=346, function=0x5577f52ea9c0 <btr_pcur_restore_position_func(unsigned long, btr_pcur_t*, char const*, unsigned int, mtr_t*)::__PRETTY_FUNCTION__> "ulint btr_pcur_restore_position_func(ulint, btr_pcur_t*, const char*, unsigned int, mtr_t*)") at assert.c:101
#8  0x00005577f4bd0fef in btr_pcur_restore_position_func (latch_mode=1, cursor=0x7f26d801f658, file=0x5577f5294d40 "/home/mleich/Server/10.5/storage/innobase/row/row0sel.cc", line=3511, mtr=0x7f2741744d60) at /home/mleich/Server/10.5/storage/innobase/btr/btr0pcur.cc:346
#9  0x00005577f4adde4a in sel_restore_position_for_mysql (same_user_rec=0x7f2741744888, latch_mode=1, pcur=0x7f26d801f658, moves_up=1, mtr=0x7f2741744d60) at /home/mleich/Server/10.5/storage/innobase/row/row0sel.cc:3511
#10 0x00005577f4ae128d in row_search_mvcc (buf=0x7f26d801e4c8 "\372l\021", mode=PAGE_CUR_G, prebuilt=0x7f26d801f488, match_mode=0, direction=1) at /home/mleich/Server/10.5/storage/innobase/row/row0sel.cc:4554
#11 0x00005577f48f2452 in ha_innobase::general_fetch (this=0x7f26d801e8b0, buf=0x7f26d801e4c8 "\372l\021", direction=1, match_mode=0) at /home/mleich/Server/10.5/storage/innobase/handler/ha_innodb.cc:9321
#12 0x00005577f48f2997 in ha_innobase::rnd_next (this=0x7f26d801e8b0, buf=0x7f26d801e4c8 "\372l\021") at /home/mleich/Server/10.5/storage/innobase/handler/ha_innodb.cc:9530

Comment by Marko Mäkelä [ 2019-12-20 ]

MDEV-21353.tgz failed to work in my environment, apparently something unexpected during the setup phase. With my help, mleich narrowed the cause down to the changes to page_cur_delete_rec() in befde6e97e8d14eb120fa28a012474dce2b7c185.

Comment by Marko Mäkelä [ 2019-12-20 ]

I suspect that the patch added a condition that incorrectly skips a buf_block_modify_clock_inc() call. To rescue the situation, maybe that call could be added to the callers? Here is a patch that reverts the condition (and breaks ALTER TABLE…IMPORT TABLESPACE:

diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc
index 771fbfae1ef..d7b0fa094f9 100644
--- a/storage/innobase/page/page0cur.cc
+++ b/storage/innobase/page/page0cur.cc
@@ -2525,7 +2525,7 @@ page_cur_delete_rec(
 	/* The page gets invalid for optimistic searches: increment
 	the frame modify clock. Avoid this during IMPORT; the block is
 	not actually in the buffer pool. */
-	if (mtr->get_log_mode() != MTR_LOG_NONE) {
+	if (true||mtr->get_log_mode() != MTR_LOG_NONE) {
 		buf_block_modify_clock_inc(block);
 		page_cur_delete_rec_write_log(current_rec, index, mtr);
 	}

mleich, can you please test with this patch? Since the buf_block_t::modify_clock affects optimistic btr_pcur_restore_position(), the failure to call buf_block_modify_clock_inc() should explain all the observed symptoms.

Generated at Thu Feb 08 09:06:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.