[MDEV-34259] Optimization in row_purge_poss_sec Function for Undo Purge Process - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.5, 10.6, 10.11, 11.4
Fix Version/s: N/A
Component/s: Storage Engine - InnoDB
Labels:
- performance
- purge

Description

I have report this performance issue to upstream. After review the code of mariadb, it seem mariadb has the same issue

https://bugs.mysql.com/bug.php?id=115107

Attachments

Issue Links

relates to

MDEV-32050 UNDO logs still growing for write-intensive workloads

Closed

MDEV-33213 History list is not shrunk unless there is a pause in the workload

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2024-05-29 05:40

Thank you! I missed this while working on ~~MDEV-32050~~ and ~~MDEV-33213~~. Thanks to ~~MDEV-24402~~, it should be rather easy to confirm if there is any fault in your analysis that might cause secondary index records to be left behind. If the idea is correct, MDEV-17598 might become less critical.

Marko Mäkelä added a comment - 2024-05-29 05:40 Thank you! I missed this while working on MDEV-32050 and MDEV-33213 . Thanks to MDEV-24402 , it should be rather easy to confirm if there is any fault in your analysis that might cause secondary index records to be left behind. If the idea is correct, MDEV-17598 might become less critical.

zongzhi chen added a comment - 2024-05-29 09:06

Yes, It seems that adding transaction ID to per secondary index may cause huge space overhead.
I was thinking only add max transaction ID to secondary index, that may not solve all of the 3 issue, but It can solve purge of transaction history, and implicit locking.

zongzhi chen added a comment - 2024-05-29 09:06 Yes, It seems that adding transaction ID to per secondary index may cause huge space overhead. I was thinking only add max transaction ID to secondary index, that may not solve all of the 3 issue, but It can solve purge of transaction history, and implicit locking.

Marko Mäkelä added a comment - 2024-05-29 14:00

baotiao, I implemented your idea as follows on top of MariaDB Server 10.6:

diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc

index 4533b7166b7..8e2e6f8cd82 100644

--- a/storage/innobase/row/row0purge.cc

+++ b/storage/innobase/row/row0purge.cc

@@ -305,11 +305,17 @@ row_purge_poss_sec(

 	mtr_t*		sec_mtr,

 	bool		is_tree)

+	ut_ad(index->is_btree() || index->is_spatial());

+	ut_ad(!index->is_primary());

+	if (sec_pcur && !purge_sys.is_purgeable(

+		    page_get_max_trx_id(btr_pcur_get_page(sec_pcur)))) {

+		return false;

+	}

 	bool	can_delete;

 	mtr_t	mtr;

-	ut_ad(!dict_index_is_clust(index));

 	mtr_start(&mtr);

 	can_delete = !row_purge_reposition_pcur(BTR_SEARCH_LEAF, node, &mtr)

We would have sec_pcur=nullptr if the caller is purge buffering, which after some refactoring in 10.6 would be btr_cur_t::search_leaf(). In that case the secondary index leaf page is not available in the buffer pool.

I am not completely convinced about the correctness of this yet. As far as I can tell, row_vers_old_has_index_entry(also_curr=true, …) does not consider the secondary index leaf page header field PAGE_MAX_TRX_ID field at all. It just traverses the stack of clustered index record versions down to the oldest version that is visible in purge_sys.view. If no matching clustered index record is found, the secondary index record is deemed safe to purge.

The PAGE_MAX_TRX_ID can be ‘too new to purge’ because a completely unrelated record in the secondary index is new. Until MDEV-17598 is implemented, we do not know which secondary index records are too new. In fact, with the above fix, I would have several tests failing. Here is an example:

CURRENT_TEST: parts.partition_alter2_2_2_innodb

--- /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.result	2023-08-22 11:40:09.227128297 +0300

+++ /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.reject	2024-05-29 16:51:04.449362810 +0300

@@ -18195,6 +18195,9 @@

 test.t1	analyze	status	OK

 CHECK    TABLE t1 EXTENDED;

 Table	Op	Msg_type	Msg_text

+test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `parta` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x7FFFF5),[3]   (0x7FFFF5),[6]     A(0x000000000C41)}

+test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partb` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x7FFFED),[3]   (0x800015),[6]      (0x000000000BDE)}

+test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partc` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x800005),[3]   (0x800005),[6]      (0x000000000BE3)}

 test.t1	check	status	OK

 CHECKSUM TABLE t1 EXTENDED;

 Table	Checksum

@@ -19248,6 +19251,7 @@

 test.t1	analyze	status	OK

 CHECK    TABLE t1 EXTENDED;

 Table	Op	Msg_type	Msg_text

+test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `subpart32` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x800006),[3]   (0x800006),[6]      (0x000000000D18)}

 test.t1	check	status	OK

 CHECKSUM TABLE t1 EXTENDED;

 Table	Checksum

@@ -20300,6 +20304,7 @@

 test.t1	analyze	status	OK

 CHECK    TABLE t1 EXTENDED;

 Table	Op	Msg_type	Msg_text

+test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `part3sp0` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x7FFFEC),NULL,[6]     h(0x000000000E68)}

 test.t1	check	status	OK

 CHECKSUM TABLE t1 EXTENDED;

 Table	Checksum

Note that to my understanding, nothing comparable to ~~MDEV-24402~~ was implemented in MySQL; MySQL would ignore the EXTENDED attribute of CHECK TABLE for InnoDB tables. Also note that such orphan records can be created due to a known bug MDEV-29823. In any case, I do not remember any failures of the regression tests that the above patch would seem to break.

The orphan secondary index records are mostly ‘invisible’ to users; you would only observe that the data files are larger than usual. MVCC reads and other operations would just simply ignore the orphan secondary indexes, of course with some performance overhead. At https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ you can find my presentation about this.

That said, this is a nice idea. Maybe you can come up with an improvement, or maybe I made a mistake in my translation to MariaDB.

Marko Mäkelä added a comment - 2024-05-29 14:00 baotiao , I implemented your idea as follows on top of MariaDB Server 10.6: diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc index 4533b7166b7..8e2e6f8cd82 100644 --- a/storage/innobase/row/row0purge.cc +++ b/storage/innobase/row/row0purge.cc @@ -305,11 +305,17 @@ row_purge_poss_sec( mtr_t* sec_mtr, bool is_tree) { + ut_ad(index->is_btree() || index->is_spatial()); + ut_ad(!index->is_primary()); + + if (sec_pcur && !purge_sys.is_purgeable( + page_get_max_trx_id(btr_pcur_get_page(sec_pcur)))) { + return false; + } + bool can_delete; mtr_t mtr; - ut_ad(!dict_index_is_clust(index)); - mtr_start(&mtr); can_delete = !row_purge_reposition_pcur(BTR_SEARCH_LEAF, node, &mtr) We would have sec_pcur=nullptr if the caller is purge buffering, which after some refactoring in 10.6 would be btr_cur_t::search_leaf() . In that case the secondary index leaf page is not available in the buffer pool. I am not completely convinced about the correctness of this yet. As far as I can tell, row_vers_old_has_index_entry(also_curr=true, …) does not consider the secondary index leaf page header field PAGE_MAX_TRX_ID field at all. It just traverses the stack of clustered index record versions down to the oldest version that is visible in purge_sys.view . If no matching clustered index record is found, the secondary index record is deemed safe to purge. The PAGE_MAX_TRX_ID can be ‘too new to purge’ because a completely unrelated record in the secondary index is new. Until MDEV-17598 is implemented, we do not know which secondary index records are too new. In fact, with the above fix, I would have several tests failing. Here is an example: CURRENT_TEST: parts.partition_alter2_2_2_innodb --- /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.result 2023-08-22 11:40:09.227128297 +0300 +++ /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.reject 2024-05-29 16:51:04.449362810 +0300 @@ -18195,6 +18195,9 @@ test.t1 analyze status OK CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `parta` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFF5),[3] (0x7FFFF5),[6] A(0x000000000C41)} +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partb` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFED),[3] (0x800015),[6] (0x000000000BDE)} +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partc` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x800005),[3] (0x800005),[6] (0x000000000BE3)} test.t1 check status OK CHECKSUM TABLE t1 EXTENDED; Table Checksum @@ -19248,6 +19251,7 @@ test.t1 analyze status OK CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `subpart32` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x800006),[3] (0x800006),[6] (0x000000000D18)} test.t1 check status OK CHECKSUM TABLE t1 EXTENDED; Table Checksum @@ -20300,6 +20304,7 @@ test.t1 analyze status OK CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `part3sp0` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFEC),NULL,[6] h(0x000000000E68)} test.t1 check status OK CHECKSUM TABLE t1 EXTENDED; Table Checksum Note that to my understanding, nothing comparable to MDEV-24402 was implemented in MySQL; MySQL would ignore the EXTENDED attribute of CHECK TABLE for InnoDB tables. Also note that such orphan records can be created due to a known bug MDEV-29823 . In any case, I do not remember any failures of the regression tests that the above patch would seem to break. The orphan secondary index records are mostly ‘invisible’ to users; you would only observe that the data files are larger than usual. MVCC reads and other operations would just simply ignore the orphan secondary indexes, of course with some performance overhead. At https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ you can find my presentation about this. That said, this is a nice idea. Maybe you can come up with an improvement, or maybe I made a mistake in my translation to MariaDB.

MariaDB Server

Optimization in row_purge_poss_sec Function for Undo Purge Process

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration