Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26643

Inconsistent behaviors of UPDATE under RU & RC isolation level

Details

    Description

      Isolation Level: Read Uncommitted & Read Committed.
      The behaviors of UPDATE for different rows are inconsistent. Just some of the rows are not updated while others are.

      /* init */ drop table if exists t;
      /* init */ create table t(a int, b int);
      /* init */ insert into t values(null, 1), (2, 2), (null, null), (null, 3), (4, null);
       
      /* s1 */ begin;
      /* s1 */ update t set a = 10 where 1;
      /* s2 */ begin;
      /* s2 */ update t set b = 20 where a; -- blocked
      /* s1 */ commit; -- s2 unblocked
      /* s2 */ commit;
       
      select * from t;
       
      +------+------+
      | a    | b    |
      +------+------+
      |   10 |    1 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      +------+------+
      

      The field b of the first row is not updated but other rows are updated.

      After several attempts, I found that the results depend on the distribution of the data in the table rather than the timing.

      If the initial table is like:

      +------+------+
      | a    | b    |
      +------+------+
      |    1 |    1 |
      |    2 |    2 |
      | null | null |
      | null |    3 |
      |    4 | null |
      +------+------+
      

      then, the result will be

      +------+------+
      | a    | b    |
      +------+------+
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      +------+------+
      

      More specifically, if there exists a row whose column `a` is not NULL, and it is the first record from top to bottom whose column `a` is not NULL. Then, all rows before it with a NULL column `a` will not be updated, while all rows after it with a NULL column `a` will be updated. This means that even the execution of a single statement is not atomic.

      This bug can be reproduced under Read Uncommitted and Read Committed.

      Attachments

        Issue Links

          Activity

            I think that this must be addressed in our documentation.

            Years ago, the semi-consistent read was originally implemented by me to fix MySQL Bug #3300. The function ha_innobase::try_semi_consistent_read() will enable the semi-consistent reads for the READ COMMITTED and READ UNCOMMITTED isolation levels. Normal consistent reads will wait for a lock on any uncommitted record version.

            In the scenario given by the Description, when the second UPDATE statement visits the first row, it will be locked by the first UPDATE, which was not committed yet. For the semi-consistent read, the latest committed version (a,b)=(null,1) will be constructed, and a=null will not satisfy the condition WHERE a. Hence, the row will not be updated.

            dinary, are you able to access MySQL Bug #104833, or did you receive any updates for it? It has been made private.

            marko Marko Mäkelä added a comment - I think that this must be addressed in our documentation. Years ago, the semi-consistent read was originally implemented by me to fix MySQL Bug #3300 . The function ha_innobase::try_semi_consistent_read() will enable the semi-consistent reads for the READ COMMITTED and READ UNCOMMITTED isolation levels. Normal consistent reads will wait for a lock on any uncommitted record version. In the scenario given by the Description, when the second UPDATE statement visits the first row, it will be locked by the first UPDATE , which was not committed yet. For the semi-consistent read, the latest committed version (a,b)=(null,1) will be constructed, and a=null will not satisfy the condition WHERE a . Hence, the row will not be updated. dinary , are you able to access MySQL Bug #104833 , or did you receive any updates for it? It has been made private.

            Unlike for my mtr test cases of MDEV-26642, for vlad.lesin’s test on 2022-04-05 I can’t observe any difference when testing with or without my work-in-progress fix of MDEV-32898. It is notable that for the READ UNCOMMITTED isolation level, no read view will be created. The work-in-progress fix of MDEV-32898 only affects the lock acquisition of records when the transaction has a read view where the record is not visible.

            marko Marko Mäkelä added a comment - Unlike for my mtr test cases of MDEV-26642 , for vlad.lesin ’s test on 2022-04-05 I can’t observe any difference when testing with or without my work-in-progress fix of MDEV-32898 . It is notable that for the READ UNCOMMITTED isolation level, no read view will be created. The work-in-progress fix of MDEV-32898 only affects the lock acquisition of records when the transaction has a read view where the record is not visible.

            I wonder if a similar approach would work as in MDEV-26642 a.k.a. MDEV-32898: if a new parameter innodb_strict_isolation is set, we would return ER_CHECKREAD when the non-matching rows are newer than the transaction’s read view. This could address anomalies with the READ COMMITTED isolation level. This should not change anything about the READ UNCOMMITTED isolation level, because there should be no read view in that case.

            marko Marko Mäkelä added a comment - I wonder if a similar approach would work as in MDEV-26642 a.k.a. MDEV-32898 : if a new parameter innodb_strict_isolation is set, we would return ER_CHECKREAD when the non-matching rows are newer than the transaction’s read view. This could address anomalies with the READ COMMITTED isolation level. This should not change anything about the READ UNCOMMITTED isolation level, because there should be no read view in that case.

            This turns out to be just like vlad.lesin noted: working as intended due to the fix of MySQL Bug #3300 and some follow-up changes. Initially this behaviour may have been enabled by the setting innodb_locks_unsafe_for_binlog. At some later point, this behaviour was enabled for the READ COMMITTED and READ UNCOMMITTED isolation levels and the setting innodb_locks_unsafe_for_binlog deprecated, and ultimately removed in MDEV-19544.

            Both at READ COMMITTED and READ UNCOMMITTED isolation level, the simplified test does what it is expected to do. For the first row, row_sel_build_committed_vers_for_mysql() will return the initial version of the row, as it was before the UPDATE t SET a=10. Because this row does not match the WHERE condition, it will be skipped.

            At the REPEATABLE READ isolation level, the second UPDATE will wait for the first UPDATE to be committed, and therefore it will update both rows.

            I tested that the fix of MDEV-26642 a.k.a. MDEV-32898 does not have any impact on this, on any isolation level.

            It could make sense to introduce a separate isolation level so that a correct READ COMMITTED isolation level would be available to those who want it. Even simpler, we could make this dependent on a parameter (suggestion: innodb_strict_isolation) that would enable the fix of MDEV-26642. That is exactly what the following patch would do:

            diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc
            index edc62fbeef1..16a331f36f5 100644
            --- a/storage/innobase/row/row0sel.cc
            +++ b/storage/innobase/row/row0sel.cc
            @@ -865,6 +865,11 @@ row_sel_build_committed_vers_for_mysql(
             					column version if any */
             	mtr_t*		mtr)		/*!< in: mtr */
             {
            +	if (prebuilt->trx->strict_isolation) {
            +		*old_vers = rec;
            +		return;
            +	}
            +
             	if (prebuilt->old_vers_heap) {
             		mem_heap_empty(prebuilt->old_vers_heap);
             	} else {
            

            With this patch and innodb_strict_isolation=ON, both READ COMMITTED and READ UNCOMMITTED will return the correct result.

            marko Marko Mäkelä added a comment - This turns out to be just like vlad.lesin noted: working as intended due to the fix of MySQL Bug #3300 and some follow-up changes. Initially this behaviour may have been enabled by the setting innodb_locks_unsafe_for_binlog . At some later point, this behaviour was enabled for the READ COMMITTED and READ UNCOMMITTED isolation levels and the setting innodb_locks_unsafe_for_binlog deprecated, and ultimately removed in MDEV-19544 . Both at READ COMMITTED and READ UNCOMMITTED isolation level, the simplified test does what it is expected to do. For the first row, row_sel_build_committed_vers_for_mysql() will return the initial version of the row, as it was before the UPDATE t SET a=10 . Because this row does not match the WHERE condition, it will be skipped. At the REPEATABLE READ isolation level, the second UPDATE will wait for the first UPDATE to be committed, and therefore it will update both rows. I tested that the fix of MDEV-26642 a.k.a. MDEV-32898 does not have any impact on this, on any isolation level. It could make sense to introduce a separate isolation level so that a correct READ COMMITTED isolation level would be available to those who want it. Even simpler, we could make this dependent on a parameter (suggestion: innodb_strict_isolation ) that would enable the fix of MDEV-26642 . That is exactly what the following patch would do: diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc index edc62fbeef1..16a331f36f5 100644 --- a/storage/innobase/row/row0sel.cc +++ b/storage/innobase/row/row0sel.cc @@ -865,6 +865,11 @@ row_sel_build_committed_vers_for_mysql( column version if any */ mtr_t* mtr) /*!< in: mtr */ { + if (prebuilt->trx->strict_isolation) { + *old_vers = rec; + return; + } + if (prebuilt->old_vers_heap) { mem_heap_empty(prebuilt->old_vers_heap); } else { With this patch and innodb_strict_isolation=ON , both READ COMMITTED and READ UNCOMMITTED will return the correct result.

            This was fixed together with MDEV-32898 (PR#3067). If SET innodb_snapshot_isolation=ON is executed before starting a transaction, then the “semi-consistent read” that had deliberately broken these scenarios will be disabled.

            marko Marko Mäkelä added a comment - This was fixed together with MDEV-32898 ( PR#3067 ). If SET innodb_snapshot_isolation=ON is executed before starting a transaction, then the “semi-consistent read” that had deliberately broken these scenarios will be disabled.

            People

              marko Marko Mäkelä
              dinary dinary
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.