[MDEV-32820] Race condition between trx_purge_free_segment() and trx_undo_create() - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4, 10.11.3, 11.0.2, 11.1.1, 11.2.1, 11.2(EOL)
Fix Version/s: 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Storage Engine - InnoDB
Labels: None

Description

The test encryption.create_or_replace failed on an IA-32 debug build like this:

bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6
encryption.create_or_replace 'ctr,innodb' w3 [ fail ]
Test ended at 2023-11-03 14:56:55

CURRENT_TEST: encryption.create_or_replace
mysqltest: At line 68: query 'CREATE OR REPLACE TABLE `create_or_replace_t` AS SELECT * FROM `table10_int_autoinc`' failed: <Unknown> (2013): Lost connection to server during query
...
Version: '11.2.2-MariaDB-debug-log' socket: '/mnt/buildbot/build/mariadb-11.2.2/mysql-test/var/tmp/3/mysqld.1.sock' port: 16040 Source distribution
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #1 encryption thread id 2890922816 total threads 4.
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #2 encryption thread id 2882530112 total threads 4.
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #3 encryption thread id 2874137408 total threads 4.
2023-11-03 14:56:46 4 [Note] InnoDB: Creating #4 encryption thread id 2865744704 total threads 4.
mariadbd: /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/buf0buf.h:789: uint32_t buf_page_t::fix(uint32_t): Assertion `f >= FREED' failed.
...
#9 0xb7042d8b in __assert_fail () from /lib/i386-linux-gnu/libc.so.6
#10 0x810c6b5b in buf_page_t::fix (this=0xb0c00068, count=1) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/buf0buf.h:789
#11 0x8119a97d in buf_page_create_low (page_id=..., zip_size=0, mtr=0xaefa5724, free_block=0xb09ff298) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0buf.cc:2861
#12 0x8119b23f in buf_page_create (space=0x83a87ed8, offset=46, zip_size=0, mtr=0xaefa5724, free_block=0xb09ff298) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0buf.cc:3010
#13 0x8123c91e in fsp_page_create (space=0x83a87ed8, offset=46, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1066
#14 0x8123cdaa in fsp_alloc_free_page (space=0x83a87ed8, hint=0, mtr=0xaefa5724, init_mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1178
#15 0x81240017 in fseg_alloc_free_page_low (space=0x83a87ed8, seg_inode=0xb0eca072 "", iblock=0xb0dffa08, hint=0, direction=111 'o', has_done_reservation=true, mtr=0xaefa5724, init_mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:2143
#16 0x8123e8e5 in fseg_create (space=0x83a87ed8, byte_offset=60, mtr=0xaefa5724, err=0xaefa56f4, has_done_reservation=true, block=0x0) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:1755
#17 0x811349b0 in trx_undo_seg_create (space=0x83a87ed8, rseg_hdr=0xb0fff870, id=0xaefa5638, err=0xaefa56f4, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:482
#18 0x81136d11 in trx_undo_create (trx=0xb1a00b80, rseg=0x81e6f280 <trx_sys+22336>, undo=0xb1a010f4, err=0xaefa56f4, mtr=0xaefa5724) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:1180
#19 0x81138602 in trx_undo_assign_low<false> (trx=0xb1a00b80, rseg=0x81e6f280 <trx_sys+22336>, undo=0xb1a010f4, mtr=0xaefa5724, err=0xaefa56f4) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0undo.cc:1372
#20 0x8111321e in trx_undo_report_row_operation (thr=0xa7b65fc8, index=0xac50a7f8, clust_entry=0x0, update=0xa7b69158, cmpl_info=0, rec=0xb1138152 "testcreate_or_replace_t", offsets=0xaefa60ac, roll_ptr=0xaefa5ac8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0rec.cc:1898

Concurrently, another thread is trying to free pages from this undo log tablespace:

bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6
#8 0x80fc5b41 in fil_space_t::x_lock (this=0x83a87ed8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/include/fil0fil.h:983
#9 0x80fc1175 in mtr_t::x_lock_space (this=0xa845bc44, space=0x83a87ed8) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/mtr/mtr0mtr.cc:790
#10 0x80fc1062 in mtr_t::x_lock_space (this=0xa845bc44, space_id=2) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/mtr/mtr0mtr.cc:776
#11 0x812420f9 in fseg_free_step_not_header (header=0xb0c5c03c "", mtr=0xa845bc44, ahi=false) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fsp/fsp0fsp.cc:2905
#12 0x810f5896 in trx_purge_free_segment (block=0xb0bfebb0, mtr=...) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:354
#13 0x810f5ed7 in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:459
#14 0x810f65fd in purge_sys_t::iterator::free_history (this=0xa845bf80) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/trx/trx0purge.cc:546
#15 0x810e0851 in purge_truncation_callback () at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/srv/srv0srv.cc:1101

This task (purge_truncation_callback) was added in MDEV-32050. Previously, this work was part of purge_coordinator_callback and was executed less frequently.

Yet another thread is trying to write out pages of this undo tablespace:

bb-11.2-release 23651e27c672aa57d0dfba1251a4fc0abc6c95d6
#5 0x811af4d0 in inline_mysql_mutex_lock (that=0x81e72080 <buf_pool>, src_file=0x81800010 "/home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0flu.cc", src_line=1677) at /home/buildbot/buildbot/build/mariadb-11.2.2/include/mysql/psi/mysql_thread.h:746
#6 0x811b6bd9 in buf_flush_list_space (space=0x83a87ed8, n_flushed=0xabcfdc8c) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/buf/buf0flu.cc:1677
#7 0x81236df5 in fil_crypt_flush_space (state=0xabcfe008) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:1909
#8 0x812372a3 in fil_crypt_complete_rotate_space (state=0xabcfe008) at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:1994
#9 0x812375af in fil_crypt_thread () at /home/buildbot/buildbot/build/mariadb-11.2.2/storage/innobase/fil/fil0crypt.cc:2066

So far, I have failed to reproduce this locally in 2,800 repetitions of the test on a 32-bit build. Another campaign of 56,000 repetitions is over ¼ through, with no failures.
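
To make the failed assertion in the first stack trace easier to reason about, here is a toy model of a page descriptor whose fix() requires the previous state to be at least FREED. This is only a sketch: ToyPage, its constant values, and the helper functions are hypothetical and are not taken from buf0buf.h. The interleaving it simulates (a page being recycled between lookup and fix) is one way such an assertion could trip; this report does not establish that it is the actual sequence.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model only: this is NOT InnoDB's buf_page_t. All names and
// constant values are hypothetical, chosen for illustration.
struct ToyPage {
  static constexpr uint32_t NOT_USED = 0;        // descriptor backs no page
  static constexpr uint32_t FREED = 3;           // lowest state that may be fixed
  static constexpr uint32_t UNFIXED = 1U << 29;  // in use, no references held

  std::atomic<uint32_t> state{NOT_USED};

  // Add `count` references. Returns false (instead of asserting) when the
  // previous state f was below FREED, that is, when the descriptor was not
  // backing a page at the moment of the call.
  bool fix(uint32_t count = 1) {
    assert(count > 0);
    const uint32_t f = state.fetch_add(count, std::memory_order_acquire);
    if (f >= FREED) return true;
    state.fetch_sub(count, std::memory_order_release);  // undo the increment
    return false;
  }
};

// Baseline: fixing a page that is in use succeeds.
inline bool fix_live_page() {
  ToyPage page;
  page.state.store(ToyPage::UNFIXED);
  return page.fix();
}

// Replays a hypothetical interleaving deterministically in one thread:
// the "creator" finds a live descriptor, the "freer" recycles it, and
// only then does the creator call fix().
inline bool fix_after_concurrent_free() {
  ToyPage page;
  page.state.store(ToyPage::UNFIXED);   // creator's lookup sees a live page
  page.state.store(ToyPage::NOT_USED);  // freer recycles it in between
  return page.fix();                    // previous state is now below FREED
}
```

In this model, fix_live_page() returns true and fix_after_concurrent_free() returns false. The false return marks the point where a debug build with a real assertion would abort, as mariadbd did above.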

Issue Links

causes: MDEV-33137 Assertion `end_lsn == page_lsn' failed in recv_recover_page (Closed)
is caused by: MDEV-30753 Possible corruption due to trx_purge_free_segment() (Closed)
relates to: MDEV-32050 UNDO logs still growing for write-intensive workloads (Closed)
