Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34259

Optimization in row_purge_poss_sec Function for Undo Purge Process

Details

    Description

      I have report this performance issue to upstream. After review the code of mariadb, it seem mariadb has the same issue

      https://bugs.mysql.com/bug.php?id=115107

      Attachments

        Issue Links

          Activity

            Thank you! I missed this while working on MDEV-32050 and MDEV-33213. Thanks to MDEV-24402, it should be rather easy to confirm if there is any fault in your analysis that might cause secondary index records to be left behind. If the idea is correct, MDEV-17598 might become less critical.

            marko Marko Mäkelä added a comment - Thank you! I missed this while working on MDEV-32050 and MDEV-33213 . Thanks to MDEV-24402 , it should be rather easy to confirm if there is any fault in your analysis that might cause secondary index records to be left behind. If the idea is correct, MDEV-17598 might become less critical.
            baotiao zongzhi chen added a comment -

            Yes, It seems that adding transaction ID to per secondary index may cause huge space overhead.
            I was thinking only add max transaction ID to secondary index, that may not solve all of the 3 issue, but It can solve purge of transaction history, and implicit locking.

            baotiao zongzhi chen added a comment - Yes, It seems that adding transaction ID to per secondary index may cause huge space overhead. I was thinking only add max transaction ID to secondary index, that may not solve all of the 3 issue, but It can solve purge of transaction history, and implicit locking.

            baotiao, I implemented your idea as follows on top of MariaDB Server 10.6:

            diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
            index 4533b7166b7..8e2e6f8cd82 100644
            --- a/storage/innobase/row/row0purge.cc
            +++ b/storage/innobase/row/row0purge.cc
            @@ -305,11 +305,17 @@ row_purge_poss_sec(
             	mtr_t*		sec_mtr,
             	bool		is_tree)
             {
            +	ut_ad(index->is_btree() || index->is_spatial());
            +	ut_ad(!index->is_primary());
            +
            +	if (sec_pcur && !purge_sys.is_purgeable(
            +		    page_get_max_trx_id(btr_pcur_get_page(sec_pcur)))) {
            +		return false;
            +	}
            +
             	bool	can_delete;
             	mtr_t	mtr;
             
            -	ut_ad(!dict_index_is_clust(index));
            -
             	mtr_start(&mtr);
             
             	can_delete = !row_purge_reposition_pcur(BTR_SEARCH_LEAF, node, &mtr)
            

            We would have sec_pcur=nullptr if the caller is purge buffering, which after some refactoring in 10.6 would be btr_cur_t::search_leaf(). In that case the secondary index leaf page is not available in the buffer pool.

            I am not completely convinced about the correctness of this yet. As far as I can tell, row_vers_old_has_index_entry(also_curr=true, …) does not consider the secondary index leaf page header field PAGE_MAX_TRX_ID field at all. It just traverses the stack of clustered index record versions down to the oldest version that is visible in purge_sys.view. If no matching clustered index record is found, the secondary index record is deemed safe to purge.

            The PAGE_MAX_TRX_ID can be ‘too new to purge’ because a completely unrelated record in the secondary index is new. Until MDEV-17598 is implemented, we do not know which secondary index records are too new. In fact, with the above fix, I would have several tests failing. Here is an example:

            CURRENT_TEST: parts.partition_alter2_2_2_innodb
            --- /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.result	2023-08-22 11:40:09.227128297 +0300
            +++ /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.reject	2024-05-29 16:51:04.449362810 +0300
            @@ -18195,6 +18195,9 @@
             test.t1	analyze	status	OK
             CHECK    TABLE t1 EXTENDED;
             Table	Op	Msg_type	Msg_text
            +test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `parta` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x7FFFF5),[3]   (0x7FFFF5),[6]     A(0x000000000C41)}
            +test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partb` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x7FFFED),[3]   (0x800015),[6]      (0x000000000BDE)}
            +test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partc` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x800005),[3]   (0x800005),[6]      (0x000000000BE3)}
             test.t1	check	status	OK
             CHECKSUM TABLE t1 EXTENDED;
             Table	Checksum
            @@ -19248,6 +19251,7 @@
             test.t1	analyze	status	OK
             CHECK    TABLE t1 EXTENDED;
             Table	Op	Msg_type	Msg_text
            +test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `subpart32` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x800006),[3]   (0x800006),[6]      (0x000000000D18)}
             test.t1	check	status	OK
             CHECKSUM TABLE t1 EXTENDED;
             Table	Checksum
            @@ -20300,6 +20304,7 @@
             test.t1	analyze	status	OK
             CHECK    TABLE t1 EXTENDED;
             Table	Op	Msg_type	Msg_text
            +test.t1	check	Warning	InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `part3sp0` */: COMPACT RECORD(info_bits=32, 3 fields): {[3]   (0x7FFFEC),NULL,[6]     h(0x000000000E68)}
             test.t1	check	status	OK
             CHECKSUM TABLE t1 EXTENDED;
             Table	Checksum
            

            Note that to my understanding, nothing comparable to MDEV-24402 was implemented in MySQL; MySQL would ignore the EXTENDED attribute of CHECK TABLE for InnoDB tables. Also note that such orphan records can be created due to a known bug MDEV-29823. In any case, I do not remember any failures of the regression tests that the above patch would seem to break.

            The orphan secondary index records are mostly ‘invisible’ to users; you would only observe that the data files are larger than usual. MVCC reads and other operations would just simply ignore the orphan secondary indexes, of course with some performance overhead. At https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ you can find my presentation about this.

            That said, this is a nice idea. Maybe you can come up with an improvement, or maybe I made a mistake in my translation to MariaDB.

            marko Marko Mäkelä added a comment - baotiao , I implemented your idea as follows on top of MariaDB Server 10.6: diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc index 4533b7166b7..8e2e6f8cd82 100644 --- a/storage/innobase/row/row0purge.cc +++ b/storage/innobase/row/row0purge.cc @@ -305,11 +305,17 @@ row_purge_poss_sec( mtr_t* sec_mtr, bool is_tree) { + ut_ad(index->is_btree() || index->is_spatial()); + ut_ad(!index->is_primary()); + + if (sec_pcur && !purge_sys.is_purgeable( + page_get_max_trx_id(btr_pcur_get_page(sec_pcur)))) { + return false; + } + bool can_delete; mtr_t mtr; - ut_ad(!dict_index_is_clust(index)); - mtr_start(&mtr); can_delete = !row_purge_reposition_pcur(BTR_SEARCH_LEAF, node, &mtr) We would have sec_pcur=nullptr if the caller is purge buffering, which after some refactoring in 10.6 would be btr_cur_t::search_leaf() . In that case the secondary index leaf page is not available in the buffer pool. I am not completely convinced about the correctness of this yet. As far as I can tell, row_vers_old_has_index_entry(also_curr=true, …) does not consider the secondary index leaf page header field PAGE_MAX_TRX_ID field at all. It just traverses the stack of clustered index record versions down to the oldest version that is visible in purge_sys.view . If no matching clustered index record is found, the secondary index record is deemed safe to purge. The PAGE_MAX_TRX_ID can be ‘too new to purge’ because a completely unrelated record in the secondary index is new. Until MDEV-17598 is implemented, we do not know which secondary index records are too new. In fact, with the above fix, I would have several tests failing. Here is an example: CURRENT_TEST: parts.partition_alter2_2_2_innodb --- /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.result 2023-08-22 11:40:09.227128297 +0300 +++ /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.reject 2024-05-29 16:51:04.449362810 +0300 @@ -18195,6 +18195,9 @@ test.t1 analyze status OK CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `parta` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFF5),[3] (0x7FFFF5),[6] A(0x000000000C41)} +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partb` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFED),[3] (0x800015),[6] (0x000000000BDE)} +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partc` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x800005),[3] (0x800005),[6] (0x000000000BE3)} test.t1 check status OK CHECKSUM TABLE t1 EXTENDED; Table Checksum @@ -19248,6 +19251,7 @@ test.t1 analyze status OK CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `subpart32` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x800006),[3] (0x800006),[6] (0x000000000D18)} test.t1 check status OK CHECKSUM TABLE t1 EXTENDED; Table Checksum @@ -20300,6 +20304,7 @@ test.t1 analyze status OK CHECK TABLE t1 EXTENDED; Table Op Msg_type Msg_text +test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `part3sp0` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFEC),NULL,[6] h(0x000000000E68)} test.t1 check status OK CHECKSUM TABLE t1 EXTENDED; Table Checksum Note that to my understanding, nothing comparable to MDEV-24402 was implemented in MySQL; MySQL would ignore the EXTENDED attribute of CHECK TABLE for InnoDB tables. Also note that such orphan records can be created due to a known bug MDEV-29823 . In any case, I do not remember any failures of the regression tests that the above patch would seem to break. The orphan secondary index records are mostly ‘invisible’ to users; you would only observe that the data files are larger than usual. MVCC reads and other operations would just simply ignore the orphan secondary indexes, of course with some performance overhead. At https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ you can find my presentation about this. That said, this is a nice idea. Maybe you can come up with an improvement, or maybe I made a mistake in my translation to MariaDB.

            People

              marko Marko Mäkelä
              baotiao zongzhi chen
              Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.