[MDEV-29666] InnoDB fails to purge secondary index records when indexed virtual columns exist Created: 2022-09-29  Updated: 2022-11-16  Resolved: 2022-10-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Virtual Columns
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.3.37, 10.4.27, 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: leak, purge

Issue Links:
Problem/Incident
causes MDEV-30024 InnoDB: tried to purge non-delete-mar... Closed
is caused by MDEV-5800 indexes on virtual (not materialized)... Closed
Relates
relates to MDEV-22363 Reimplement the InnoDB virtual column... Open
relates to MDEV-29823 Secondary index records may never be ... Open
relates to MDEV-14051 'Undo log record is too big.' error o... Closed
relates to MDEV-14799 After UPDATE of indexed columns, old ... Closed
relates to MDEV-20640 [ERROR] InnoDB: tried to purge non-de... Closed
relates to MDEV-24402 CHECK TABLE may miss some cases of in... Closed

 Description   

When I tested MDEV-24402 CHECK TABLE...EXTENDED, the test gcol.innodb_virtual_purge would fail because secondary indexes contain orphan records. Here is a reduced test case:

--source include/have_innodb.inc
 
CREATE TABLE t1 (a INT, b INT, a1 INT AS(a) VIRTUAL,
INDEX(a1),INDEX(b)) ENGINE=InnoDB;
INSERT INTO t1 SET a=1, b=1;
 
UPDATE t1 SET a=2, b=3;
let $datadir=`select @@datadir`;
let $sec=`select 3*@@innodb_page_size`;
 
SET GLOBAL innodb_fast_shutdown=0;
--source include/shutdown_mysqld.inc
--exec od -Ax -t x1 -j $sec $datadir/test/t1.ibd
--source include/start_mysqld.inc
DROP TABLE t1;

According to the od output, all history for the clustered index record (data fields starting at 0xc07e = 0xc063+0x001b) was purged. The fields are (DB_ROW_ID,DB_TRX_ID,DB_ROLL_PTR,a,b)=(0x200,0,1<<55,2,3):

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

00c060 02 00 1b 69 6e 66 69 6d 75 6d 00 02 00 0b 00 00
00c070 73 75 70 72 65 6d 75 6d 00 00 00 10 ff f2 00 00
00c080 00 00 02 00 00 00 00 00 00 00 80 00 00 00 00 00
00c090 00 80 00 00 02 80 00 00 03 00 00 00 00 00 00 00

In the secondary indexes, the delete-marked records created by the UPDATE were not purged. Here is the entry for index(a1):

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

010060 02 00 1b 69 6e 66 69 6d 75 6d 00 03 00 0b 00 00
010070 73 75 70 72 65 6d 75 6d 00 20 00 10 00 10 80 00
010080 00 01 00 00 00 00 02 00 00 00 00 18 ff e2 80 00
010090 00 02 00 00 00 00 02 00 00 00 00 00 00 00 00 00

We see both the delete-marked (a,DB_ROW_ID)=(1,0x200) at 0x1007e as well as the correct record (a,DB_ROW_ID)=2,0x200) at 0x1008e. There is a garbage record in index(b) as well:

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

014060 02 00 1b 69 6e 66 69 6d 75 6d 00 03 00 0b 00 00
014070 73 75 70 72 65 6d 75 6d 00 20 00 10 00 10 80 00
014080 00 01 00 00 00 00 02 00 00 00 00 18 ff e2 80 00
014090 00 03 00 00 00 00 02 00 00 00 00 00 00 00 00 00

The delete-marked (b,DB_ROW_ID)=(1,0x200) at 0x1e07e should have been purged. Only the record (b,DB_ROW_ID)=(3,0x200) should be there.

I tracked this down to the following code in the function row_purge_upd_exist_or_extern_func():

		if (row_upd_changes_ord_field_binary(node->index, node->update,
						     thr, NULL, NULL)) {
			/* Build the older version of the index entry */
			dtuple_t*	entry = row_build_index_entry_low(
				node->row, NULL, node->index,
				heap, ROW_BUILD_FOR_PURGE);
			row_purge_remove_sec_if_poss(node, node->index, entry);

We end up with entry=nullptr for both indexes, because node->row is invalid or being accessed incorrectly. The pointer to data_error indicates a column that has not been fully initialized:

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

Thread 22 hit Breakpoint 2, row_purge_remove_sec_if_poss (node=node@entry=0x5613c42f37a0, index=0x7f8414167a50, entry=0x0) at /mariadb/10.3/storage/innobase/row/row0purge.cc:688
688		if (!entry) {
1: node.table.name = {m_name = 0x7f8414158690 "test/t1"}
(rr) p *node.row
$1 = {info_bits = 0, n_fields = 5, n_fields_cmp = 5, fields = 0x7f841c009270, 
  n_v_fields = 1, v_fields = 0x7f841c0092e8, magic_n = 65478679}
(rr) p *node.row.fields@5
$2 = {{data = 0x5613c42f3d5d, ext = 0, spatial_status = 0, len = 4, type = {
      prtype = 1027, mtype = 6, len = 4, mbminlen = 0, mbmaxlen = 0}}, {
    data = 0x5613c42f3d63, ext = 0, spatial_status = 0, len = 4, type = {
      prtype = 1027, mtype = 6, len = 4, mbminlen = 0, mbmaxlen = 0}}, {
    data = 0x5613c3518908 <data_error>, ext = 0, spatial_status = 0, 
    len = 4294967295, type = {prtype = 111, mtype = 0, len = 0, mbminlen = 0, 
      mbmaxlen = 0}}, {data = 0x7f841c009210, ext = 0, spatial_status = 0, 
    len = 6, type = {prtype = 257, mtype = 8, len = 6, mbminlen = 0, 
      mbmaxlen = 0}}, {data = 0x7f841c009218, ext = 0, spatial_status = 0, 
    len = 7, type = {prtype = 258, mtype = 8, len = 7, mbminlen = 0, 
      mbmaxlen = 0}}}



 Comments   
Comment by Marko Mäkelä [ 2022-09-29 ]

As far as I can tell, the problem was caused by the unnecessary and problematic separation of virtual and regular columns in MySQL 5.7.8. The function trx_undo_rec_get_partial_row() will leave a gap in the table row tuple, and instead copy the value of the indexed virtual column a1=1 from the undo log record to a separate array:

		if (is_virtual) {
			dict_v_col_t* vcol = dict_table_get_nth_v_col(
						index->table, field_no);
			col = &vcol->m_col;
			col_no = dict_col_get_no(col);
			dfield = dtuple_get_nth_v_field(*row, vcol->v_pos);
			dict_col_copy_type(
				&vcol->m_col,
				dfield_get_type(dfield));
		} else {
		}
 
		dfield_set_data(dfield, field, len);

After the MDEV-22363 cleanup, we would just allocate a table row tuple containing all the columns in the same order as TABLE_SHARE::field, including any virtual columns.

Some insult to injury is being added in row_build_index_entry_low():

		if (col->is_virtual()) {
			const dict_v_col_t*	v_col
				= reinterpret_cast<const dict_v_col_t*>(col);
 
			ut_ad(v_col->v_pos < dtuple_get_n_v_fields(row));
			dfield2 = dtuple_get_nth_v_field(row, v_col->v_pos);
 
			ut_ad(dfield_is_null(dfield2) ||
			      dfield_get_len(dfield2) == 0 || dfield2->data);
		} else {
			dfield2 = dtuple_get_nth_field(row, col_no);
			ut_ad(dfield_get_type(dfield2)->mtype == DATA_MISSING
			      || (!(dfield_get_type(dfield2)->prtype
				    & DATA_VIRTUAL)));
		}
 
		if (UNIV_UNLIKELY(dfield_get_type(dfield2)->mtype
				  == DATA_MISSING)) {
			/* The field has not been initialized in the row.
			This should be from trx_undo_rec_get_partial_row(). */
			return(NULL);
		}

The assertion in the else branch had been relaxed in MySQL 5.7.9, with no analysis, explanation or test case, or identification of the failing assertion.

The column that we fail to find in the clustered index row tuple is very much present in the undo log record, because it is the primary key GEN_CLUST_INDEX(DB_ROW_ID):

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

Thread 3 hit Hardware access (read/write) watchpoint 7: *(dfield_t *) 0x7f841c0092a0
 
Value = {data = 0x5613c3518908 <data_error>, ext = 0, spatial_status = 0, len = 4294967295, type = {prtype = 111, mtype = 0, len = 0, mbminlen = 0, mbmaxlen = 0}}
0x00005613c30e496b in row_build_index_entry_low (row=0x7f841c009238, ext=ext@entry=0x0, index=0x7f8414168110, heap=heap@entry=0x7f841c00aff0, flag=flag@entry=1) at /mariadb/10.3/storage/innobase/row/row0row.cc:259
259				ut_ad(dfield_get_type(dfield2)->mtype == DATA_MISSING
(rr) p *ind_field
$25 = {col = 0x7f841400ca20, name = {m_name = 0x7f841400cb2c "DB_ROW_ID"}, prefix_len = 0, fixed_len = 6}

Comment by Marko Mäkelä [ 2022-09-29 ]

In the split encoding of the table row tuple, row_build_index_entry_low() should have the persistent columns of the table in row->fields, indexed by the column position (as if no virtual columns existed), and the virtual columns in row->v_fields. That is, row->fields should contain the value of (a,b,DB_ROW_ID,DB_TRX_ID,DB_ROLL_PTR) and row->v_fields the value of (a1).

This error affects all secondary indexes because the value of the hidden primary key column DB_ROW_ID is missing.

If the UPDATE statement is changed to UPDATE t1 SET b=3 (no change to the indexed virtual column), then trx_undo_rec_get_partial_row() will be invoked with the following:

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

#0  trx_undo_rec_get_partial_row (ptr=ptr@entry=0x55efec4809d1 "", index=index@entry=0x7f599800d850, update=0x7f59940090f0, row=row@entry=0x55efec480498, ignore_prefix=0, heap=0x55efec480920)
    at /mariadb/10.3/storage/innobase/trx/trx0rec.cc:1681
1681	{
(rr) p *update
$1 = {heap = 0x55efec480920, info_bits = 0, old_vrow = 0x0, n_fields = 3, fields = 0x7f5994009120, vers_sys_value = "\000\000\000\000\000\000\000"}
(rr) p *update.fields@update.n_fields
$2 = {{field_no = 4, orig_len = 0, exp = 0x0, new_val = {data = 0x55efec4809cd, ext = 0, spatial_status = 0, len = 4, type = {prtype = 1027, mtype = 6, len = 4, mbminlen = 0, mbmaxlen = 0}}, old_v_val = 0x0}, {
    field_no = 1, orig_len = 0, exp = 0x0, new_val = {data = 0x7f59940091b0, ext = 0, spatial_status = 0, len = 6, type = {prtype = 257, mtype = 8, len = 6, mbminlen = 0, mbmaxlen = 0}}, old_v_val = 0x0}, {
    field_no = 2, orig_len = 0, exp = 0x0, new_val = {data = 0x7f59940091b8, ext = 0, spatial_status = 0, len = 7, type = {prtype = 258, mtype = 8, len = 7, mbminlen = 0, mbmaxlen = 0}}, old_v_val = 0x0}}
(rr) p *index.fields@index.n_fields
$3 = {{col = 0x7f599800ca20, name = {m_name = 0x7f599800cb2c "DB_ROW_ID"}, prefix_len = 0, fixed_len = 6}, {col = 0x7f599800ca40, name = {m_name = 0x7f599800cb36 "DB_TRX_ID"}, prefix_len = 0, fixed_len = 6}, {
    col = 0x7f599800ca60, name = {m_name = 0x7f599800cb40 "DB_ROLL_PTR"}, prefix_len = 0, fixed_len = 7}, {col = 0x7f599800c9e0, name = {m_name = 0x7f599800cb28 "a"}, prefix_len = 0, fixed_len = 4}, {
    col = 0x7f599800ca00, name = {m_name = 0x7f599800cb2a "b"}, prefix_len = 0, fixed_len = 4}}
(rr) p/x *ptr@100
$4 = {0x0, 0x17, 0x0, 0x6, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0, 0x83, 0xff, 0xf1, 0x0, 0x5, 0x1, 0x1a, 0x0, 0x4, 0x80, 0x0, 0x0, 0x1, 0x2, 0xc3, 0x48, 0xec, 0xef, 0x55, 0x0, 0x0, 0x5a, 0xb6, 0x1c, 0xeb, 0xef, 0x55, 
  0x0, 0x0, 0xf3, 0xa8, 0x1c, 0xeb, 0xef, 0x55, 0x0, 0x0, 0x83, 0xc0, 0x1c, 0xeb, 0xef, 0x55, 0x0, 0x0, 0x20, 0x9, 0x48, 0xec, 0xef, 0x55, 0x0, 0x0, 0xf0, 0x9, 0x48, 0xec, 0xef, 0x55, 0x0, 0x0, 0xb0, 0x4f, 
  0xa, 0x98, 0x59, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2c, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x0}

The hidden primary key column DB_ROW_ID at field_no=0 is not affected by the update, but its value will be read by trx_undo_rec_get_col_val() from ptr+3 that was passed to trx_undo_rec_get_partial_row() and correctly stored to the row tuple. The primary key was also already parsed and stored in purge_node_t::ref by the caller.

In the case of UPDATE t1 SET a=2, b=3, the ptr that is passed to trx_undo_rec_get_partial_row() will look different:

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

Thread 3 hit Breakpoint 1, trx_undo_rec_get_partial_row (ptr=ptr@entry=0x557248fc29e9 "", index=index@entry=0x7f21bc00d850, update=0x7f21c40090f0, row=row@entry=0x557248fc2498, ignore_prefix=0, heap=0x557248fc2920) at /mariadb/10.3/storage/innobase/trx/trx0rec.cc:1681
1681	{
(rr) p/x *ptr@100
$1 = {0x0, 0xf, 0x83, 0xff, 0xf1, 0x0, 0x5, 0x1, 0x1a, 0x0, 0x4, 0x80, 0x0, 0x0, 0x1, 0x2, 0xc3, 0xa, 0xbc, 0x21, 0x7f, 0x0, 0x0, 0x5a, 0x46, 0x76, 0x47, 0x72, 0x55, 0x0, 0x0, 0xf3, 0x38, 0x76, 0x47, 0x72, 
  0x55, 0x0, 0x0, 0x83, 0x50, 0x76, 0x47, 0x72, 0x55, 0x0, 0x0, 0x20, 0x29, 0xfc, 0x48, 0x72, 0x55, 0x0, 0x0, 0x0, 0x2a, 0xfc, 0x48, 0x72, 0x55, 0x0, 0x0, 0xb0, 0x4f, 0xa, 0xbc, 0x21, 0x7f, 0x0, 0x0, 0x0, 0x0, 
  0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2c, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0 <repeats 12 times>}

The record for the DB_ROW_ID=0x200 is completely missing.

Even if I add an explicit primary key, the outcome is the same:

--source include/have_innodb.inc
 
CREATE TABLE t1 (pk INT PRIMARY KEY, a INT, b INT, a1 INT AS(a) VIRTUAL,
INDEX(a1),INDEX(b)) ENGINE=InnoDB;
INSERT INTO t1 SET pk=1, a=1, b=1;
 
UPDATE t1 SET a=2, b=3;
let $datadir=`select @@datadir`;
let $sec=`select 3*@@innodb_page_size`;
 
SET GLOBAL innodb_fast_shutdown=0;
--source include/shutdown_mysqld.inc
--exec od -Ax -t x1 -j $sec $datadir/test/t1.ibd
--source include/start_mysqld.inc
DROP TABLE t1;

10.3 ce23802c0e4c85018f2141d2ebb0602dc0d3d6d8

014060 02 00 1b 69 6e 66 69 6d 75 6d 00 03 00 0b 00 00
014070 73 75 70 72 65 6d 75 6d 00 20 00 10 00 0e 80 00
014080 00 01 80 00 00 01 00 00 00 18 ff e4 80 00 00 03
014090 80 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00

Here, we will have two records in the index: a delete-marked (1,1) that should have been purged, and the correct record (b,pk)=(3,1).

This bug reminds me of MDEV-14799. If I revert both that fix and the optimization MDEV-14051 that caused it, then the index record will be purged correctly. It looks like the optimization must be adapted for indexed virtual columns.

Generated at Thu Feb 08 10:10:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.