Details
- Type: Bug
- Status: Confirmed
- Priority: Major
- Resolution: Unresolved
- Versions: 10.6, 10.11, 11.4
Description
While testing MDEV-34466, mleich produced an rr replay trace of a failure.
ssh pluto
rr replay /data/results/1728920463/MB-1782A/1_clone/rr/latest-trace
A running server had been backed up and the backup restored. There is exactly one incomplete transaction that needs to be rolled back:
2024-10-14 17:46:03 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 953 row operations to undo
2024-10-14 17:46:03 0 [Note] InnoDB: Trx id counter is 885
...
mariadbd: /data/Server/10.6-MDEV-34466/storage/innobase/trx/trx0rec.cc:510: const byte* trx_undo_rec_get_pars(const trx_undo_rec_t*, byte*, byte*, bool*, undo_no_t*, table_id_t*): Assertion `*table_id' failed.
The assertion fails during the execution of CHECK TABLE...EXTENDED. The reason is that trx_undo_prev_version_build() keeps fetching previous record versions that are not safe to access. The newest version of that record belongs to the recovered transaction 779 (0x30b), which is being rolled back. None of the older versions should be visible in purge_sys.view or purge_sys.end_view.
This happens because purge_sys.end_view.m_low_limit_id had been reset from 885 to 0, since both purge_sys.head and purge_sys.tail are 0,0 at that point in time:
#0 0x000055c1808858da in purge_sys_t::clone_oldest_view<true> (this=<optimized out>) at /data/Server/10.6-MDEV-34466/storage/innobase/include/trx0purge.h:427
#1 trx_lists_init_at_db_start () at /data/Server/10.6-MDEV-34466/storage/innobase/trx/trx0trx.cc:794
#2 0x000055c1808400a0 in srv_start (create_new_db=<optimized out>) at /data/Server/10.6-MDEV-34466/storage/innobase/srv/srv0start.cc:1514
#3 0x000055c1805ed787 in innodb_init (p=<optimized out>) at /data/Server/10.6-MDEV-34466/storage/innobase/handler/ha_innodb.cc:4317
During the execution, trx_purge() was invoked in the purge_coordinator_callback exactly once, and that invocation is still in progress at the time the assertion fails. The rollback of the incomplete transaction 779 is also in progress.
I think that at the end of the purge batch, purge_sys.end_view should be adjusted to something better. Because the purge_sys.end_view is only used by CHECK TABLE...EXTENDED, this bug should be limited to executing that statement soon after starting the server (which is what our stress tests do).
To fix this, I think that we must resolve the mystery of MDEV-22718.
Issue Links
- is caused by
  - MDEV-24402 CHECK TABLE may miss some cases of index inconsistencies (Closed)
- relates to
  - MDEV-22718 InnoDB: purge_sys.low_limit_no() is not protected (Stalled)
  - MDEV-34466 XA prepare don't release unmodified records in non-blocking mode (Closed)