baotiao, I implemented your idea as follows on top of MariaDB Server 10.6:
diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
|
index 4533b7166b7..8e2e6f8cd82 100644
|
--- a/storage/innobase/row/row0purge.cc
|
+++ b/storage/innobase/row/row0purge.cc
|
@@ -305,11 +305,17 @@ row_purge_poss_sec(
|
mtr_t* sec_mtr,
|
bool is_tree)
|
{
|
+ ut_ad(index->is_btree() || index->is_spatial());
|
+ ut_ad(!index->is_primary());
|
+
|
+ if (sec_pcur && !purge_sys.is_purgeable(
|
+ page_get_max_trx_id(btr_pcur_get_page(sec_pcur)))) {
|
+ return false;
|
+ }
|
+
|
bool can_delete;
|
mtr_t mtr;
|
|
- ut_ad(!dict_index_is_clust(index));
|
-
|
mtr_start(&mtr);
|
|
can_delete = !row_purge_reposition_pcur(BTR_SEARCH_LEAF, node, &mtr)
|
We would have sec_pcur=nullptr if the caller is purge buffering, which after some refactoring in 10.6 would be btr_cur_t::search_leaf(). In that case the secondary index leaf page is not available in the buffer pool.
I am not completely convinced about the correctness of this yet. As far as I can tell, row_vers_old_has_index_entry(also_curr=true, …) does not consider the secondary index leaf page header field PAGE_MAX_TRX_ID field at all. It just traverses the stack of clustered index record versions down to the oldest version that is visible in purge_sys.view. If no matching clustered index record is found, the secondary index record is deemed safe to purge.
The PAGE_MAX_TRX_ID can be ‘too new to purge’ because a completely unrelated record in the secondary index is new. Until MDEV-17598 is implemented, we do not know which secondary index records are too new. In fact, with the above fix, I would have several tests failing. Here is an example:
CURRENT_TEST: parts.partition_alter2_2_2_innodb
|
--- /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.result 2023-08-22 11:40:09.227128297 +0300
|
+++ /mariadb/10.6/mysql-test/suite/parts/r/partition_alter2_2_2_innodb.reject 2024-05-29 16:51:04.449362810 +0300
|
@@ -18195,6 +18195,9 @@
|
test.t1 analyze status OK
|
CHECK TABLE t1 EXTENDED;
|
Table Op Msg_type Msg_text
|
+test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `parta` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFF5),[3] (0x7FFFF5),[6] A(0x000000000C41)}
|
+test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partb` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFED),[3] (0x800015),[6] (0x000000000BDE)}
|
+test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `partc` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x800005),[3] (0x800005),[6] (0x000000000BE3)}
|
test.t1 check status OK
|
CHECKSUM TABLE t1 EXTENDED;
|
Table Checksum
|
@@ -19248,6 +19251,7 @@
|
test.t1 analyze status OK
|
CHECK TABLE t1 EXTENDED;
|
Table Op Msg_type Msg_text
|
+test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `subpart32` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x800006),[3] (0x800006),[6] (0x000000000D18)}
|
test.t1 check status OK
|
CHECKSUM TABLE t1 EXTENDED;
|
Table Checksum
|
@@ -20300,6 +20304,7 @@
|
test.t1 analyze status OK
|
CHECK TABLE t1 EXTENDED;
|
Table Op Msg_type Msg_text
|
+test.t1 check Warning InnoDB: Clustered index record not found for index `uidx` of table `test`.`t1` /* Partition `part3`, Subpartition `part3sp0` */: COMPACT RECORD(info_bits=32, 3 fields): {[3] (0x7FFFEC),NULL,[6] h(0x000000000E68)}
|
test.t1 check status OK
|
CHECKSUM TABLE t1 EXTENDED;
|
Table Checksum
|
Note that to my understanding, nothing comparable to MDEV-24402 was implemented in MySQL; MySQL would ignore the EXTENDED attribute of CHECK TABLE for InnoDB tables. Also note that such orphan records can be created due to a known bug MDEV-29823. In any case, I do not remember any failures of the regression tests that the above patch would seem to break.
The orphan secondary index records are mostly ‘invisible’ to users; you would only observe that the data files are larger than usual. MVCC reads and other operations would just simply ignore the orphan secondary indexes, of course with some performance overhead. At https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ you can find my presentation about this.
That said, this is a nice idea. Maybe you can come up with an improvement, or maybe I made a mistake in my translation to MariaDB.
Thank you! I missed this while working on
MDEV-32050andMDEV-33213. Thanks toMDEV-24402, it should be rather easy to confirm if there is any fault in your analysis that might cause secondary index records to be left behind. If the idea is correct, MDEV-17598 might become less critical.