[MDEV-14799] After UPDATE of indexed columns, old values will not be purged from secondary indexes Created: 2017-12-29  Updated: 2022-10-05  Resolved: 2018-01-02

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 5.5.59, 10.0.33, 10.1.29, 10.0.33-galera, 10.2.10, 10.3.3
Fix Version/s: 5.5.59, 10.0.34, 10.2.12, 10.0.34-galera, 10.3.4, 10.1.31

Type: Bug Priority: Blocker
Reporter: Valerii Kravchuk Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: purge, regression

Issue Links:
Duplicate
is duplicated by MDEV-16003 too large INDEX_LENGTH Closed
Relates
relates to MDEV-29666 InnoDB fails to purge secondary index... Closed
relates to MDEV-14051 'Undo log record is too big.' error o... Closed

 Description   

It seems a regression happened after 10.0.31 with InnoDB handling of deleted secondary index records. The basic test case is the following:

drop table if exists test;
create table test(
  id int unsigned auto_increment primary key,
  serial int unsigned not null,
  acked boolean default false,
  index serial_acked (serial, acked)
);
 
insert into test (serial) values (12345);
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
insert into test (serial) select serial from test;
 
explain select * from test where serial = 12345 and acked = true;
explain select * from test where serial = 12345 and acked = false;
 
update test set acked = true;
 
explain select * from test where serial = 12345 and acked = true;
explain select * from test where serial = 12345 and acked = false;
 
analyze table test;
 
explain select * from test where serial = 12345 and acked = true;
explain select * from test where serial = 12345 and acked = false;

We get the following after populating the table:

MariaDB [indextest]> select count(*) from test where serial = 12345 and acked = true;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.03 sec)
 
MariaDB [indextest]> select count(*) from test where serial = 12345 and acked = false;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.52 sec)
 
MariaDB [indextest]> explain select * from test where serial = 12345 and acked = true;
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const |    1 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
 
MariaDB [indextest]> explain select * from test where serial = 12345 and acked = false;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 523672 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [indextest]> show table status like 'test'\G                           
*************************** 1. row ***************************
           Name: test
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1047345
 Avg_row_length: 31
    Data_length: 33095680
Max_data_length: 0
   Index_length: 17350656
      Data_free: 4194304
 Auto_increment: 1376221
    Create_time: 2017-12-29 12:58:19
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

Now, if we run UPDATE on 10.0.31 (all default settings besides port and socket), we get:

MariaDB [indextest]> update test set acked = true;
Query OK, 1048576 rows affected (23.26 sec)
Rows matched: 1048576  Changed: 1048576  Warnings: 0
 
MariaDB [indextest]> explain select * from test where serial = 12345 and acked = true;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 523672 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [indextest]> explain select * from test where serial = 12345 and acked = false;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 465874 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.01 sec)
 
MariaDB [indextest]> analyze table test;
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| indextest.test | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (0.10 sec)
 
MariaDB [indextest]> explain select * from test where serial = 12345 and acked = true;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 523672 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [indextest]> explain select * from test where serial = 12345 and acked = false;
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const |    1 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
 
MariaDB [indextest]> select count(*) from test where serial = 12345 and acked = true;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.58 sec)
 
MariaDB [indextest]> select count(*) from test where serial = 12345 and acked = false;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

That is, immediately after UPDATE the estimated number of rows in EXPLAIN for "false" case may be wrong, but as soon as we run ANALYZE we get expected estimation, and query that tries to get these rows (to find nothing) run fast.

Now, if we do the same on versions 10.0.33 or 10.2.11 (again all default settings besides port and socket), we get:

...
MariaDB [test]> select count(*) from test where serial = 12345 and acked = true;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.52 sec)
 
MariaDB [test]> select count(*) from test where serial = 12345 and acked = false;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.13 sec)
 
MariaDB [test]> analyze table test;
+-----------+---------+----------+----------+
| Table     | Op      | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.test | analyze | status   | OK       |
+-----------+---------+----------+----------+
1 row in set (0.17 sec)
 
MariaDB [test]> explain select * from test where serial = 12345 and acked = true;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 523672 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> explain select * from test where serial = 12345 and acked = false;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 523672 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)

Note that ANALYZE does NOT help to get more correct estimation for rows for the "false" case, moreover, attempt to read these rows (to find nothing) takes notable time. Moreover, in the table status we see:

MariaDB [test]> show table status like 'test'\G
*************************** 1. row ***************************
           Name: test
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 1047345
 Avg_row_length: 31
    Data_length: 33095680
Max_data_length: 0
   Index_length: 33095680
      Data_free: 5242880
 Auto_increment: 1376221
    Create_time: 2017-12-29 13:07:55
    Update_time: 2017-12-29 13:10:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.10 sec)

that secondary index size is estimated as too big, to it seems delete-marked records are taken into account. Further attempts to run ANALYZE with any settings do not help:

MariaDB [test]> set global innodb_stats_persistent_sample_pages = 20000;
Query OK, 0 rows affected (0.03 sec)
 
MariaDB [test]> analyze table test;
+-----------+---------+----------+----------+
| Table     | Op      | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.test | analyze | status   | OK       |
+-----------+---------+----------+----------+
1 row in set (0.41 sec)
 
MariaDB [test]> explain select * from test where serial = 12345 and acked = true;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 524288 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> explain select * from test where serial = 12345 and acked = false;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 524288 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> show table status like 'test'\G
*************************** 1. row ***************************
           Name: test
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 1048576
 Avg_row_length: 31
    Data_length: 33095680
Max_data_length: 0
   Index_length: 33095680
      Data_free: 5242880
 Auto_increment: 1376221
    Create_time: 2017-12-29 13:07:55
    Update_time: 2017-12-29 13:10:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

What does help is table rebuild/OPTIMIZE:

MariaDB [test]> optimize table test;
+-----------+----------+----------+-------------------------------------------------------------------+
| Table     | Op       | Msg_type | Msg_text                                                          |
+-----------+----------+----------+-------------------------------------------------------------------+
| test.test | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.test | optimize | status   | OK                                                                |
+-----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (16.62 sec)
 
MariaDB [test]> show table status like 'test'\G                                 *************************** 1. row ***************************
           Name: test
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 1048576
 Avg_row_length: 35
    Data_length: 37273600
Max_data_length: 0
   Index_length: 19447808
      Data_free: 2097152
 Auto_increment: 1376221
    Create_time: 2017-12-29 13:13:51
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)
 
MariaDB [test]> explain select * from test where serial = 12345 and acked = true;
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows   | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const | 524288 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+--------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> explain select * from test where serial = 12345 and acked = false;
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
| id   | select_type | table | type | possible_keys | key          | key_len | ref         | rows | Extra       |
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
|    1 | SIMPLE      | test  | ref  | serial_acked  | serial_acked | 6       | const,const |    1 | Using index |
+------+-------------+-------+------+---------------+--------------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select count(*) from test where serial = 12345 and acked = false;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

I consider the wrong estimated number of records a symptom, as even if we get with exactly the same access plan, SELECT itself seems to spend time checking deleted rows as well, somehow. In real life cases, many selects that get wrong rows estimations becomes slow for this and similar cases involving changes of many records in secondary indexes after upgrade from 10.0.31, so there is a regression somewhere.

This may have something to do with https://bugs.mysql.com/bug.php?id=75231

I think this may have something to do



 Comments   
Comment by Elena Stepanova [ 2017-12-30 ]

The change came with this patch in 5.5:

commit 439a7c994a6f01de1a3980c0405fb9e24ae3bd1a
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Tue Oct 24 15:20:54 2017 +0300
 
    MDEV-14051 'Undo log record is too big.' error occurring in very narrow range of string lengths
    
    InnoDB was writing unnecessary information to the
    update undo log records. Most notably, if an indexed column is updated,
    the old value of the column would be logged twice: first as part of
    the update vector, and then another time because it is an indexed column.
    
    Because the InnoDB undo log record must fit in a single page,
    this would cause unnecessary failure of certain updates.
    
    Even after this fix, InnoDB still seems to be unnecessarily logging
    indexed column values for non-updated columns. It seems that non-updated
    secondary index columns only need to be logged when a PRIMARY KEY
    column is updated. To reduce risk, we are not fixing this remaining flaw
    in GA versions.
    
    trx_undo_page_report_modify(): Log updated indexed columns only once.

Please note that the test case, if it's executed automatically or copy-pasted, might cause false positives even on "good" versions, it needs some time between ANALYZE and EXPLAIN to get the proper numbers.

Comment by Marko Mäkelä [ 2018-01-02 ]

Purge in the secondary index is wrongly skipped in row_purge_upd_exist_or_extern_func() because node->row only contains values for the columns id,serial but not acked:

		if (row_upd_changes_ord_field_binary(node->index, node->update,
						     thr, NULL, NULL)) {
			/* Build the older version of the index entry */
			dtuple_t*	entry = row_build_index_entry_low(
				node->row, NULL, node->index, heap);
			row_purge_remove_sec_if_poss(node, node->index, entry);
			mem_heap_empty(heap);
		}

The node->update only specifies values for serial,DB_TRX_ID,DB_ROLL_PTR, even though the value of the updated column acked is included in the undo log record:

p/x undo_page[first_free]@ptr-$44
$46 = {0x1, 0x94, 0xc, 0x2, 0x12, 0x0, 0x0, 0x0, 0x0, 0x5, 0x5, 0xe0, 0x85, 
  0x0, 0x0, 0x1, 0x38, 0x1, 0x10, 0x4, 0x0, 0x0, 0x0, 0x3, 0x1, 0x4, 0x1, 
  0x80, 0x0, 0xe, 0x0, 0x4, 0x0, 0x0, 0x0, 0x3, 0x3, 0x4, 0x0, 0x0, 0x30, 
  0x39, 0x1, 0x68}

The substring 0x4, 0x1, 0x80 represents the old value of the column acked=false.
After the fix of MDEV-14051, we are not restoring enough information from the undo log record; before the fix, the information was restored from the redundant portion of the undo log record.

Comment by Marko Mäkelä [ 2018-01-02 ]

The parsing problem should be local to row_purge_parse_undo_rec().

	ptr = trx_undo_update_rec_get_update(ptr, clust_index, type, trx_id,
					     roll_ptr, info_bits, trx,
					     node->heap, &(node->update));
 
	/* Read to the partial row the fields that occur in indexes */
 
	if (!(node->cmpl_info & UPD_NODE_NO_ORD_CHANGE)) {
		ptr = trx_undo_rec_get_partial_row(
			ptr, clust_index, &node->row,
			type == TRX_UNDO_UPD_DEL_REC,
			node->heap);
	}

Here, we should copy the missing columns from node->update to node->row.

Comment by Marko Mäkelä [ 2018-01-02 ]

No test case will be included before 10.2, because purge cannot be controlled reliably enough. The following test case was manually run in 5.5, 10.0, 10.1:

--source include/have_innodb.inc
 
CREATE TABLE t1 (a SERIAL, b CHAR(255) NOT NULL DEFAULT '', c BOOLEAN DEFAULT false, INDEX(b,c)) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
INSERT INTO t1 () VALUES (),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),();
INSERT INTO t1 (a) SELECT NULL FROM t1;
INSERT INTO t1 (a) SELECT NULL FROM t1;
CHECK TABLE t1;
UPDATE t1 SET c=true;
DELETE FROM t1;
 
# wait for purge (not reliable!)
--sleep 20
 
ANALYZE TABLE t1;
SELECT NAME, OTHER_INDEX_SIZE FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS;
DROP TABLE t1;

Comment by Marko Mäkelä [ 2018-01-02 ]

The bug should not affect any results, but it can severely hurt the performance of operations on secondary indexes.

With the bug, the old values of updated indexed columns will forever remain delete-marked in the secondary indexes, and each secondary index operation that stumbles upon these delete-marked records will have to look up the clustered index record and optionally search for old versions in the undo logs.

The expensive lookups were filed as an upstream bug that was never really fixed. But, with this bug, the impact is much more profound, as the purge would never remove the garbage.
MySQL Bug #67044 Secondary index updates make consistent reads do O(N^2) undo page lookups
To really fix this problem, we would have to include transaction identifiers in secondary index records, changing the data format.

To fix the performance regression after upgrade, you have to drop and rebuild the (worst) affected secondary indexes. There is no easy way to determine how many delete-marked records a secondary index contains.

Generated at Thu Feb 08 08:16:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.