[MDEV-26643] Inconsistent behaviors of UPDATE under RU & RC isolation level - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.5.12
Fix Version/s: 10.11.8, 10.6.18, 11.0.6, 11.1.5, 11.2.4, 11.4.2
Component/s: Documentation, Locking, Storage Engine - InnoDB
Labels:
- innodb
Environment:
OS: Ubuntu-20.04

Description

Isolation Level: Read Uncommitted & Read Committed.
The behaviors of UPDATE for different rows are inconsistent. Just some of the rows are not updated while others are.

/* init */ drop table if exists t;

/* init */ create table t(a int, b int);

/* init */ insert into t values(null, 1), (2, 2), (null, null), (null, 3), (4, null);

/* s1 */ begin;

/* s1 */ update t set a = 10 where 1;

/* s2 */ begin;

/* s2 */ update t set b = 20 where a; -- blocked

/* s1 */ commit; -- s2 unblocked

/* s2 */ commit;

select * from t;

+------+------+

| a    | b    |

+------+------+

|   10 |    1 |

|   10 |   20 |

|   10 |   20 |

|   10 |   20 |

|   10 |   20 |

+------+------+

The field b of the first row is not updated but other rows are updated.

After several attempts, I found that the results depend on the distribution of the data in the table rather than the timing.

If the initial table is like:

+------+------+

| a    | b    |

+------+------+

|    1 |    1 |

|    2 |    2 |

| null | null |

| null |    3 |

|    4 | null |

+------+------+

then, the result will be

+------+------+

| a    | b    |

+------+------+

|   10 |   20 |

|   10 |   20 |

|   10 |   20 |

|   10 |   20 |

|   10 |   20 |

+------+------+

More specifically, if there exists a row whose column `a` is not NULL, and it is the first record from top to bottom whose column `a` is not NULL. Then, all rows before it with a NULL column `a` will not be updated, while all rows after it with a NULL column `a` will be updated. This means that even the execution of a single statement is not atomic.

This bug can be reproduced under Read Uncommitted and Read Committed.

Attachments

Issue Links

blocks

MDEV-33802 Weird read view after ROLLBACK of other transactions.

Closed

relates to

MDEV-26642 Weird SELECT view when a record is modified to the same value by two transactions

Closed

MDEV-32898 Phantom rows caused by UPDATE of PRIMARY KEY

Closed

MDEV-35124 Set innodb_snapshot_isolation=ON by default

Closed

MDEV-26671 Inconsistent SELECT View when modify a record added by another transaction

Closed

MDEV-34108 Inappropriate semi-consistent read in RC if innodb_snapshot_isolation=ON

Closed

MDEV-35140 Support innodb-snapshot-isolation in Galera cluster

Stalled

(2 relates to)

Activity

Ascending order - Click to sort in descending order

View 5 older comments

Marko Mäkelä added a comment - 2022-04-05 11:46

I think that this must be addressed in our documentation.

Years ago, the semi-consistent read was originally implemented by me to fix MySQL Bug #3300. The function ha_innobase::try_semi_consistent_read() will enable the semi-consistent reads for the READ COMMITTED and READ UNCOMMITTED isolation levels. Normal consistent reads will wait for a lock on any uncommitted record version.

In the scenario given by the Description, when the second UPDATE statement visits the first row, it will be locked by the first UPDATE, which was not committed yet. For the semi-consistent read, the latest committed version (a,b)=(null,1) will be constructed, and a=null will not satisfy the condition WHERE a. Hence, the row will not be updated.

dinary, are you able to access MySQL Bug #104833, or did you receive any updates for it? It has been made private.

Marko Mäkelä added a comment - 2022-04-05 11:46 I think that this must be addressed in our documentation. Years ago, the semi-consistent read was originally implemented by me to fix MySQL Bug #3300 . The function ha_innobase::try_semi_consistent_read() will enable the semi-consistent reads for the READ COMMITTED and READ UNCOMMITTED isolation levels. Normal consistent reads will wait for a lock on any uncommitted record version. In the scenario given by the Description, when the second UPDATE statement visits the first row, it will be locked by the first UPDATE , which was not committed yet. For the semi-consistent read, the latest committed version (a,b)=(null,1) will be constructed, and a=null will not satisfy the condition WHERE a . Hence, the row will not be updated. dinary , are you able to access MySQL Bug #104833 , or did you receive any updates for it? It has been made private.

Marko Mäkelä added a comment - 2024-01-16 13:10

Unlike for my mtr test cases of ~~MDEV-26642~~, for vlad.lesin’s test on 2022-04-05 I can’t observe any difference when testing with or without my work-in-progress fix of ~~MDEV-32898~~. It is notable that for the READ UNCOMMITTED isolation level, no read view will be created. The work-in-progress fix of ~~MDEV-32898~~ only affects the lock acquisition of records when the transaction has a read view where the record is not visible.

Marko Mäkelä added a comment - 2024-01-16 13:10 Unlike for my mtr test cases of MDEV-26642 , for vlad.lesin ’s test on 2022-04-05 I can’t observe any difference when testing with or without my work-in-progress fix of MDEV-32898 . It is notable that for the READ UNCOMMITTED isolation level, no read view will be created. The work-in-progress fix of MDEV-32898 only affects the lock acquisition of records when the transaction has a read view where the record is not visible.

Marko Mäkelä added a comment - 2024-02-20 16:12

I wonder if a similar approach would work as in ~~MDEV-26642~~ a.k.a. ~~MDEV-32898~~: if a new parameter innodb_strict_isolation is set, we would return ER_CHECKREAD when the non-matching rows are newer than the transaction’s read view. This could address anomalies with the READ COMMITTED isolation level. This should not change anything about the READ UNCOMMITTED isolation level, because there should be no read view in that case.

Marko Mäkelä added a comment - 2024-02-20 16:12 I wonder if a similar approach would work as in MDEV-26642 a.k.a. MDEV-32898 : if a new parameter innodb_strict_isolation is set, we would return ER_CHECKREAD when the non-matching rows are newer than the transaction’s read view. This could address anomalies with the READ COMMITTED isolation level. This should not change anything about the READ UNCOMMITTED isolation level, because there should be no read view in that case.

Marko Mäkelä added a comment - 2024-02-22 11:10

This turns out to be just like vlad.lesin noted: working as intended due to the fix of MySQL Bug #3300 and some follow-up changes. Initially this behaviour may have been enabled by the setting innodb_locks_unsafe_for_binlog. At some later point, this behaviour was enabled for the READ COMMITTED and READ UNCOMMITTED isolation levels and the setting innodb_locks_unsafe_for_binlog deprecated, and ultimately removed in ~~MDEV-19544~~.

Both at READ COMMITTED and READ UNCOMMITTED isolation level, the simplified test does what it is expected to do. For the first row, row_sel_build_committed_vers_for_mysql() will return the initial version of the row, as it was before the UPDATE t SET a=10. Because this row does not match the WHERE condition, it will be skipped.

At the REPEATABLE READ isolation level, the second UPDATE will wait for the first UPDATE to be committed, and therefore it will update both rows.

I tested that the fix of ~~MDEV-26642~~ a.k.a. ~~MDEV-32898~~ does not have any impact on this, on any isolation level.

It could make sense to introduce a separate isolation level so that a correct READ COMMITTED isolation level would be available to those who want it. Even simpler, we could make this dependent on a parameter (suggestion: innodb_strict_isolation) that would enable the fix of ~~MDEV-26642~~. That is exactly what the following patch would do:

diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc

index edc62fbeef1..16a331f36f5 100644

--- a/storage/innobase/row/row0sel.cc

+++ b/storage/innobase/row/row0sel.cc

@@ -865,6 +865,11 @@ row_sel_build_committed_vers_for_mysql(

 					column version if any */

 	mtr_t*		mtr)		/*!< in: mtr */

+	if (prebuilt->trx->strict_isolation) {

+		*old_vers = rec;

+		return;

+	}

 	if (prebuilt->old_vers_heap) {

 		mem_heap_empty(prebuilt->old_vers_heap);

 	} else {

With this patch and innodb_strict_isolation=ON, both READ COMMITTED and READ UNCOMMITTED will return the correct result.

Marko Mäkelä added a comment - 2024-02-22 11:10 This turns out to be just like vlad.lesin noted: working as intended due to the fix of MySQL Bug #3300 and some follow-up changes. Initially this behaviour may have been enabled by the setting innodb_locks_unsafe_for_binlog . At some later point, this behaviour was enabled for the READ COMMITTED and READ UNCOMMITTED isolation levels and the setting innodb_locks_unsafe_for_binlog deprecated, and ultimately removed in MDEV-19544 . Both at READ COMMITTED and READ UNCOMMITTED isolation level, the simplified test does what it is expected to do. For the first row, row_sel_build_committed_vers_for_mysql() will return the initial version of the row, as it was before the UPDATE t SET a=10 . Because this row does not match the WHERE condition, it will be skipped. At the REPEATABLE READ isolation level, the second UPDATE will wait for the first UPDATE to be committed, and therefore it will update both rows. I tested that the fix of MDEV-26642 a.k.a. MDEV-32898 does not have any impact on this, on any isolation level. It could make sense to introduce a separate isolation level so that a correct READ COMMITTED isolation level would be available to those who want it. Even simpler, we could make this dependent on a parameter (suggestion: innodb_strict_isolation ) that would enable the fix of MDEV-26642 . That is exactly what the following patch would do: diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc index edc62fbeef1..16a331f36f5 100644 --- a/storage/innobase/row/row0sel.cc +++ b/storage/innobase/row/row0sel.cc @@ -865,6 +865,11 @@ row_sel_build_committed_vers_for_mysql( column version if any */ mtr_t* mtr) /*!< in: mtr */ { + if (prebuilt->trx->strict_isolation) { + *old_vers = rec; + return; + } + if (prebuilt->old_vers_heap) { mem_heap_empty(prebuilt->old_vers_heap); } else { With this patch and innodb_strict_isolation=ON , both READ COMMITTED and READ UNCOMMITTED will return the correct result.

Marko Mäkelä added a comment - 2024-03-20 08:39

This was fixed together with ~~MDEV-32898~~ (PR#3067). If SET innodb_snapshot_isolation=ON is executed before starting a transaction, then the “semi-consistent read” that had deliberately broken these scenarios will be disabled.

Marko Mäkelä added a comment - 2024-03-20 08:39 This was fixed together with MDEV-32898 ( PR#3067 ). If SET innodb_snapshot_isolation=ON is executed before starting a transaction, then the “semi-consistent read” that had deliberately broken these scenarios will be disabled.

MariaDB Server

Inconsistent behaviors of UPDATE under RU & RC isolation level

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration