[MDEV-30777] InnoDB: Failing assertion: pcur->restore_position(BTR_MODIFY_TREE, mtr) == btr_pcur_t::SAME_ALL Created: 2023-03-02  Updated: 2024-01-23

Status: Needs Feedback
Project: MariaDB Server
Component/s: Data Manipulation - Update
Affects Version/s: 10.6.12, 11.2.0
Fix Version/s: 11.2

Type: Bug Priority: Major
Reporter: Inderjit Bassi Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: None
Environment:

MariaDB 10.6.12 running on AWS RDS r6g class.


Issue Links:
Relates
relates to MDEV-20605 Awaken transaction can miss inserted ... Closed

 Description   

Recently upgraded our DB replica server from MariaDB 10.4.22 to 10.6.12. We had upgraded a few other replicas before this one with no issues. Ran into an assertion failure that crashed only one of our replicas (oddly enough, the others were fine). The stack trace shows an UPDATE statement caused the crash.

```
0x40003629b8f0 InnoDB: Assertion failure in file /local/p4clients/pkgbuild-_nuWH/workspace/src/RDSMariaDB/storage/innobase/row/row0upd.cc line 2413
InnoDB: Failing assertion: pcur->restore_position(BTR_MODIFY_TREE, mtr) == btr_pcur_t::SAME_ALL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
230302 17:28:39 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.6.12-MariaDB-log source revision:
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=719
max_threads=12002
thread_count=753
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 27964704 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x4060c3208d18
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x40003629b060 thread_stack 0x40000
/rdsdbbin/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0xaaaae119ad18]
/rdsdbbin/mysql/bin/mysqld(handle_fatal_signal+0x490)[0xaaaae0a54f40]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x4000243d4860]
/lib64/libc.so.6(gsignal+0xb0)[0x4000245ff530]
/lib64/libc.so.6(abort+0x154)[0x400024600d74]
/rdsdbbin/mysql/bin/mysqld(+0x784864)[0xaaaae0784864]
/rdsdbbin/mysql/bin/mysqld(+0xff0bc4)[0xaaaae0ff0bc4]
/rdsdbbin/mysql/bin/mysqld(+0xff3f80)[0xaaaae0ff3f80]
/rdsdbbin/mysql/bin/mysqld(+0xff50dc)[0xaaaae0ff50dc]
/rdsdbbin/mysql/bin/mysqld(+0xfd4614)[0xaaaae0fd4614]
/rdsdbbin/mysql/bin/mysqld(+0xf2b478)[0xaaaae0f2b478]
/rdsdbbin/mysql/bin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0x228)[0xaaaae0a61014]
/rdsdbbin/mysql/bin/mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x234)[0xaaaae0b5c174]
/rdsdbbin/mysql/bin/mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x25c)[0xaaaae0b5115c]
/rdsdbbin/mysql/bin/mysqld(_Z39apply_event_and_update_pos_for_parallelP9Log_eventP3THDP14rpl_group_info+0x144)[0xaaaae07d3e0c]
Printing to addr2line failed
/rdsdbbin/mysql/bin/mysqld(+0x99ede8)[0xaaaae099ede8]
/rdsdbbin/mysql/bin/mysqld(handle_rpl_parallel_thread+0xa14)[0xaaaae09a3134]
/rdsdbbin/mysql/bin/mysqld(+0xea8730)[0xaaaae0ea8730]
/lib64/libpthread.so.0(+0x722c)[0x40002459f22c]
/lib64/libc.so.6(+0xd4a1c)[0x4000246a1a1c]
```

After digging around, this looks like an old bug that was already fixed once: https://bugs.mysql.com/bug.php?id=53390. I verified that no BLOB columns are being used either.
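Side note: the recovery-modes KB page referenced in the error log describes the `innodb_force_recovery` option. A minimal my.cnf fragment is sketched below purely to illustrate the mechanism; it is not a recommendation for this case, since forced recovery is a data-salvage mode, not a fix:

```ini
# Sketch only: start InnoDB in a degraded mode to dump data from a
# suspect tablespace. Levels 1-6 disable progressively more subsystems;
# per the MariaDB documentation, levels 4 and above can permanently
# corrupt data files, so start at 1 and raise only if startup still fails.
[mariadb]
innodb_force_recovery = 1
```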



 Comments   
Comment by Marko Mäkelä [ 2023-03-02 ]

Can you please post a fully resolved stack trace? We may also need a copy of the corrupted page. First, the page number needs to be determined based on the stack trace.

Comment by Inderjit Bassi [ 2023-03-02 ]

Is there a separate procedure for full stack traces on AWS RDS?

Comment by Matthias Leich [ 2023-06-22 ]

I was able to reproduce a similar assert on origin/preview-11.2-preview 7c7de020b576a42bd381a9060c3076f85da0d9d4 2023-06-18T23:49:17+02:00.
The assertion there looks slightly different:
```
2023-06-22 15:17:36 0x7f6279824700  InnoDB: Assertion failure in file preview-11.2-previewB/storage/innobase/row/row0uins.cc line 210
# 2023-06-22T15:18:57 [3327234] | # 2023-06-22T15:18:57 [3327234] | InnoDB: Failing assertion: node->pcur.restore_position(BTR_PURGE_TREE, &mtr) == btr_pcur_t::SAME_ALL

# 2023-06-22T15:17:50 [3327234] Program terminated with signal SIGABRT, Aborted.
# 2023-06-22T15:17:50 [3327234] #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
# 2023-06-22T15:17:50 [3327234] [Current thread is 1 (Thread 0x7f6279824700 (LWP 3368527))]
# 2023-06-22T15:17:50 [3327234] #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
# 2023-06-22T15:17:50 [3327234] #1  0x000055dd43c38166 in my_write_core (sig=sig@entry=6) at /data/Server/preview-11.2-previewB/mysys/stacktrace.c:424
# 2023-06-22T15:17:50 [3327234] #2  0x000055dd4349a442 in handle_fatal_signal (sig=6) at /data/Server/preview-11.2-previewB/sql/signal_handler.cc:360
# 2023-06-22T15:17:50 [3327234] #3  <signal handler called>
# 2023-06-22T15:17:50 [3327234] #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
# 2023-06-22T15:17:50 [3327234] #5  0x00007f6283acb859 in __GI_abort () at abort.c:79
# 2023-06-22T15:17:50 [3327234] #6  0x000055dd43a33182 in ut_dbg_assertion_failed (expr=expr@entry=0x55dd440a60e8 "node->pcur.restore_position(BTR_PURGE_TREE, &mtr) == btr_pcur_t::SAME_ALL", file=file@entry=0x55dd440a5de0 "/data/Server/preview-11.2-previewB/storage/innobase/row/row0uins.cc", line=line@entry=210) at /data/Server/preview-11.2-previewB/storage/innobase/ut/ut0dbg.cc:60
# 2023-06-22T15:17:50 [3327234] #7  0x000055dd43bb6478 in row_undo_ins_remove_clust_rec (node=node@entry=0x7f6270001b20) at /data/Server/preview-11.2-previewB/storage/innobase/row/row0uins.cc:210
# 2023-06-22T15:17:50 [3327234] #8  0x000055dd43bb8d4a in row_undo_ins (node=node@entry=0x7f6270001b20, thr=thr@entry=0x7f6270001940) at /data/Server/preview-11.2-previewB/storage/innobase/row/row0uins.cc:599
# 2023-06-22T15:17:50 [3327234] #9  0x000055dd439bba32 in row_undo (node=node@entry=0x7f6270001b20, thr=thr@entry=0x7f6270001940) at /data/Server/preview-11.2-previewB/storage/innobase/row/row0undo.cc:401
# 2023-06-22T15:17:50 [3327234] #10 0x000055dd439bdb5e in row_undo_step (thr=thr@entry=0x7f6270001940) at /data/Server/preview-11.2-previewB/storage/innobase/row/row0undo.cc:452
# 2023-06-22T15:17:50 [3327234] #11 0x000055dd438fc51d in que_thr_step (thr=0x7f6270001940) at /data/Server/preview-11.2-previewB/storage/innobase/que/que0que.cc:586
# 2023-06-22T15:17:50 [3327234] #12 que_run_threads_low (thr=thr@entry=0x7f6270001940) at /data/Server/preview-11.2-previewB/storage/innobase/que/que0que.cc:644
# 2023-06-22T15:17:50 [3327234] #13 0x000055dd438fc798 in que_run_threads (thr=0x7f6270001940) at /data/Server/preview-11.2-previewB/storage/innobase/que/que0que.cc:664
# 2023-06-22T15:17:50 [3327234] #14 0x000055dd43a0b098 in trx_rollback_active (trx=trx@entry=0x7f627b4e7280) at /data/Server/preview-11.2-previewB/storage/innobase/trx/trx0roll.cc:612
# 2023-06-22T15:17:50 [3327234] #15 0x000055dd43a0d37c in trx_rollback_recovered (all=all@entry=true) at /data/Server/preview-11.2-previewB/storage/innobase/trx/trx0roll.cc:750
# 2023-06-22T15:17:50 [3327234] #16 0x000055dd43a0d7bb in trx_rollback_all_recovered () at /data/Server/preview-11.2-previewB/storage/innobase/trx/trx0roll.cc:796
# 2023-06-22T15:17:50 [3327234] #17 0x000055dd43bcf8a3 in tpool::task_group::execute (this=0x55dd450edbe0 <rollback_all_recovered_group>, t=t@entry=0x55dd450edbc0 <rollback_all_recovered_task>) at /data/Server/preview-11.2-previewB/tpool/task_group.cc:70
```
But all my attempts to reproduce either of these asserts by applying the same test to
origin/10.6 c2d44ecb904dd5558d116433b6cceb5592182447 2023-06-21T12:51:01-06:00
failed.
Hence I assume the problem is fixed in current 10.6, and hitting the similar assert
on preview-11.2-preview has some other cause.

Comment by Marko Mäkelä [ 2024-01-05 ]

Is this failure reproducible with MariaDB Server 10.6.16? There have been fixes for some race condition bugs, such as MDEV-30531.

Comment by William Wong [ 2024-01-23 ]

Hi Marko, we are planning to upgrade to MariaDB 10.6.16, but we have not tested it yet.

Generated at Thu Feb 08 10:18:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.