Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26643

Inconsistent behaviors of UPDATE under RU & RC isolation level

Details

    Description

      Isolation Level: Read Uncommitted & Read Committed.
      The behaviors of UPDATE for different rows are inconsistent. Just some of the rows are not updated while others are.

      /* init */ drop table if exists t;
      /* init */ create table t(a int, b int);
      /* init */ insert into t values(null, 1), (2, 2), (null, null), (null, 3), (4, null);
       
      /* s1 */ begin;
      /* s1 */ update t set a = 10 where 1;
      /* s2 */ begin;
      /* s2 */ update t set b = 20 where a; -- blocked
      /* s1 */ commit; -- s2 unblocked
      /* s2 */ commit;
       
      select * from t;
       
      +------+------+
      | a    | b    |
      +------+------+
      |   10 |    1 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      +------+------+
      

      The field b of the first row is not updated but other rows are updated.

      After several attempts, I found that the results depend on the distribution of the data in the table rather than the timing.

      If the initial table is like:

      +------+------+
      | a    | b    |
      +------+------+
      |    1 |    1 |
      |    2 |    2 |
      | null | null |
      | null |    3 |
      |    4 | null |
      +------+------+
      

      then, the result will be

      +------+------+
      | a    | b    |
      +------+------+
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      |   10 |   20 |
      +------+------+
      

      More specifically, if there exists a row whose column `a` is not NULL, and it is the first record from top to bottom whose column `a` is not NULL. Then, all rows before it with a NULL column `a` will not be updated, while all rows after it with a NULL column `a` will be updated. This means that even the execution of a single statement is not atomic.

      This bug can be reproduced under Read Uncommitted and Read Committed.

      Attachments

        Issue Links

          Activity

            dinary dinary added a comment - - edited

            This bug can also be reproduced in MySQL and has been verified by MySQL Verification Team.

            https://bugs.mysql.com/bug.php?id=104833

            dinary dinary added a comment - - edited This bug can also be reproduced in MySQL and has been verified by MySQL Verification Team. https://bugs.mysql.com/bug.php?id=104833

            I suspect that the root cause of this the way how InnoDB read views work. I posted some explanation and executable test cases in MDEV-26642.

            marko Marko Mäkelä added a comment - I suspect that the root cause of this the way how InnoDB read views work. I posted some explanation and executable test cases in MDEV-26642 .
            dinary dinary added a comment -

            Thank you for your reply. I agree with your explanation MDEV-26642.

            But this issue seems different with MDEV-26642 and MDEV-26671.

            dinary dinary added a comment - Thank you for your reply. I agree with your explanation MDEV-26642 . But this issue seems different with MDEV-26642 and MDEV-26671 .

            I think that this one would be good for vlad.lesin to check. He has been spending a lot of time diagnosing locking in the past months. I wonder if MDEV-27025 would affect this.

            marko Marko Mäkelä added a comment - I think that this one would be good for vlad.lesin to check. He has been spending a lot of time diagnosing locking in the past months. I wonder if MDEV-27025 would affect this.

            I have analyzed the above case. I simplified test case a little bit:

            --source include/have_innodb.inc                                                
                                                                                            
            create table t(a int, b int) engine = InnoDB;                                   
            insert into t values(null, 1), (2, 2);                                          
            SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;                               
                                                                                            
            begin;                                                                          
            update t set a = 10;                                                            
                                                                                            
            connect (con1,localhost,root);                                                  
            SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;                               
            begin;                                                                          
            --send update t set b = 20 where a                                              
                                                                                            
            --connection default                                                            
            select sleep(1);                                                                
            commit;                                                                         
                                                                                            
            --connection con1                                                               
            --reap                                                                          
            commit;                                                                         
                                                                                            
            --disconnect con1                                                               
            --connection default                                                            
            select * from t;                                                                
            drop table t;
            

            When the second transaction reads (10,1) row, the row is locked by the first transaction. For this case row_search_mvcc() reads previously committed (null,1) value:

            row_search_mvcc(...)                                                            
            {                                                                               
            ...                                                                             
            no_gap_lock:                                                                    
            ...                                                                             
                            err = sel_set_rec_lock(pcur,                                    
                                                   rec, index, offsets,                     
                                                   prebuilt->select_lock_type,              
                                                   lock_type, thr, &mtr);                   
                                                                                            
                            switch (err) {                                                  
                            ...                                                             
                            case DB_LOCK_WAIT:                                              
                                    ...                                                     
                                    row_sel_build_committed_vers_for_mysql(                 
                                            clust_index, prebuilt, rec,                     
                                            &offsets, &heap, &old_vers, need_vrow ? &vrow : NULL,
                                            &mtr);                                          
                                    ...                                                     
                                    if (old_vers == NULL) {                                 
                                            /* The row was not yet committed */             
                                                                                            
                                            goto next_rec;                                  
                                    }                                                       
                                                                                            
                                    did_semi_consistent_read = true;                        
                                    rec = old_vers;                                         
                                    break;                                                  
                                    ...                                                     
                            }                                                               
                                                                                            
            ...                                                                             
            }
            

            And as (null,1) does no match "where" condition of the 2nd update, i.e. a = null, it's not updated.

            This behavior is documented out for mysql https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html:

            "Using READ COMMITTED has additional effects:
            ...
            For UPDATE statements, if a row is already locked, InnoDB performs a “semi-consistent” read, returning the latest committed version to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE. If the row matches (must be updated), MySQL reads the row again and this time InnoDB either locks it or waits for a lock on it. "

            Unfortunately, out isolation levels documentation does not contain such description: https://mariadb.com/kb/en/mariadb-transactions-and-isolation-levels-for-sql-server-users.

            vlad.lesin Vladislav Lesin added a comment - I have analyzed the above case. I simplified test case a little bit: --source include/have_innodb.inc create table t(a int, b int) engine = InnoDB; insert into t values(null, 1), (2, 2); SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; begin; update t set a = 10; connect (con1,localhost,root); SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; begin; --send update t set b = 20 where a --connection default select sleep(1); commit; --connection con1 --reap commit; --disconnect con1 --connection default select * from t; drop table t; When the second transaction reads (10,1) row, the row is locked by the first transaction. For this case row_search_mvcc() reads previously committed (null,1) value: row_search_mvcc(...) { ... no_gap_lock: ... err = sel_set_rec_lock(pcur, rec, index, offsets, prebuilt->select_lock_type, lock_type, thr, &mtr); switch (err) { ... case DB_LOCK_WAIT: ... row_sel_build_committed_vers_for_mysql( clust_index, prebuilt, rec, &offsets, &heap, &old_vers, need_vrow ? &vrow : NULL, &mtr); ... if (old_vers == NULL) { /* The row was not yet committed */ goto next_rec; } did_semi_consistent_read = true ; rec = old_vers; break ; ... } ... } And as (null,1) does no match "where" condition of the 2nd update, i.e. a = null, it's not updated. This behavior is documented out for mysql https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html: "Using READ COMMITTED has additional effects: ... For UPDATE statements, if a row is already locked, InnoDB performs a “semi-consistent” read, returning the latest committed version to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE. If the row matches (must be updated), MySQL reads the row again and this time InnoDB either locks it or waits for a lock on it. " Unfortunately, out isolation levels documentation does not contain such description: https://mariadb.com/kb/en/mariadb-transactions-and-isolation-levels-for-sql-server-users .

            I think that this must be addressed in our documentation.

            Years ago, the semi-consistent read was originally implemented by me to fix MySQL Bug #3300. The function ha_innobase::try_semi_consistent_read() will enable the semi-consistent reads for the READ COMMITTED and READ UNCOMMITTED isolation levels. Normal consistent reads will wait for a lock on any uncommitted record version.

            In the scenario given by the Description, when the second UPDATE statement visits the first row, it will be locked by the first UPDATE, which was not committed yet. For the semi-consistent read, the latest committed version (a,b)=(null,1) will be constructed, and a=null will not satisfy the condition WHERE a. Hence, the row will not be updated.

            dinary, are you able to access MySQL Bug #104833, or did you receive any updates for it? It has been made private.

            marko Marko Mäkelä added a comment - I think that this must be addressed in our documentation. Years ago, the semi-consistent read was originally implemented by me to fix MySQL Bug #3300 . The function ha_innobase::try_semi_consistent_read() will enable the semi-consistent reads for the READ COMMITTED and READ UNCOMMITTED isolation levels. Normal consistent reads will wait for a lock on any uncommitted record version. In the scenario given by the Description, when the second UPDATE statement visits the first row, it will be locked by the first UPDATE , which was not committed yet. For the semi-consistent read, the latest committed version (a,b)=(null,1) will be constructed, and a=null will not satisfy the condition WHERE a . Hence, the row will not be updated. dinary , are you able to access MySQL Bug #104833 , or did you receive any updates for it? It has been made private.

            Unlike for my mtr test cases of MDEV-26642, for vlad.lesin’s test on 2022-04-05 I can’t observe any difference when testing with or without my work-in-progress fix of MDEV-32898. It is notable that for the READ UNCOMMITTED isolation level, no read view will be created. The work-in-progress fix of MDEV-32898 only affects the lock acquisition of records when the transaction has a read view where the record is not visible.

            marko Marko Mäkelä added a comment - Unlike for my mtr test cases of MDEV-26642 , for vlad.lesin ’s test on 2022-04-05 I can’t observe any difference when testing with or without my work-in-progress fix of MDEV-32898 . It is notable that for the READ UNCOMMITTED isolation level, no read view will be created. The work-in-progress fix of MDEV-32898 only affects the lock acquisition of records when the transaction has a read view where the record is not visible.

            I wonder if a similar approach would work as in MDEV-26642 a.k.a. MDEV-32898: if a new parameter innodb_strict_isolation is set, we would return ER_CHECKREAD when the non-matching rows are newer than the transaction’s read view. This could address anomalies with the READ COMMITTED isolation level. This should not change anything about the READ UNCOMMITTED isolation level, because there should be no read view in that case.

            marko Marko Mäkelä added a comment - I wonder if a similar approach would work as in MDEV-26642 a.k.a. MDEV-32898 : if a new parameter innodb_strict_isolation is set, we would return ER_CHECKREAD when the non-matching rows are newer than the transaction’s read view. This could address anomalies with the READ COMMITTED isolation level. This should not change anything about the READ UNCOMMITTED isolation level, because there should be no read view in that case.

            This turns out to be just like vlad.lesin noted: working as intended due to the fix of MySQL Bug #3300 and some follow-up changes. Initially this behaviour may have been enabled by the setting innodb_locks_unsafe_for_binlog. At some later point, this behaviour was enabled for the READ COMMITTED and READ UNCOMMITTED isolation levels and the setting innodb_locks_unsafe_for_binlog deprecated, and ultimately removed in MDEV-19544.

            Both at READ COMMITTED and READ UNCOMMITTED isolation level, the simplified test does what it is expected to do. For the first row, row_sel_build_committed_vers_for_mysql() will return the initial version of the row, as it was before the UPDATE t SET a=10. Because this row does not match the WHERE condition, it will be skipped.

            At the REPEATABLE READ isolation level, the second UPDATE will wait for the first UPDATE to be committed, and therefore it will update both rows.

            I tested that the fix of MDEV-26642 a.k.a. MDEV-32898 does not have any impact on this, on any isolation level.

            It could make sense to introduce a separate isolation level so that a correct READ COMMITTED isolation level would be available to those who want it. Even simpler, we could make this dependent on a parameter (suggestion: innodb_strict_isolation) that would enable the fix of MDEV-26642. That is exactly what the following patch would do:

            diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc
            index edc62fbeef1..16a331f36f5 100644
            --- a/storage/innobase/row/row0sel.cc
            +++ b/storage/innobase/row/row0sel.cc
            @@ -865,6 +865,11 @@ row_sel_build_committed_vers_for_mysql(
             					column version if any */
             	mtr_t*		mtr)		/*!< in: mtr */
             {
            +	if (prebuilt->trx->strict_isolation) {
            +		*old_vers = rec;
            +		return;
            +	}
            +
             	if (prebuilt->old_vers_heap) {
             		mem_heap_empty(prebuilt->old_vers_heap);
             	} else {
            

            With this patch and innodb_strict_isolation=ON, both READ COMMITTED and READ UNCOMMITTED will return the correct result.

            marko Marko Mäkelä added a comment - This turns out to be just like vlad.lesin noted: working as intended due to the fix of MySQL Bug #3300 and some follow-up changes. Initially this behaviour may have been enabled by the setting innodb_locks_unsafe_for_binlog . At some later point, this behaviour was enabled for the READ COMMITTED and READ UNCOMMITTED isolation levels and the setting innodb_locks_unsafe_for_binlog deprecated, and ultimately removed in MDEV-19544 . Both at READ COMMITTED and READ UNCOMMITTED isolation level, the simplified test does what it is expected to do. For the first row, row_sel_build_committed_vers_for_mysql() will return the initial version of the row, as it was before the UPDATE t SET a=10 . Because this row does not match the WHERE condition, it will be skipped. At the REPEATABLE READ isolation level, the second UPDATE will wait for the first UPDATE to be committed, and therefore it will update both rows. I tested that the fix of MDEV-26642 a.k.a. MDEV-32898 does not have any impact on this, on any isolation level. It could make sense to introduce a separate isolation level so that a correct READ COMMITTED isolation level would be available to those who want it. Even simpler, we could make this dependent on a parameter (suggestion: innodb_strict_isolation ) that would enable the fix of MDEV-26642 . That is exactly what the following patch would do: diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc index edc62fbeef1..16a331f36f5 100644 --- a/storage/innobase/row/row0sel.cc +++ b/storage/innobase/row/row0sel.cc @@ -865,6 +865,11 @@ row_sel_build_committed_vers_for_mysql( column version if any */ mtr_t* mtr) /*!< in: mtr */ { + if (prebuilt->trx->strict_isolation) { + *old_vers = rec; + return; + } + if (prebuilt->old_vers_heap) { mem_heap_empty(prebuilt->old_vers_heap); } else { With this patch and innodb_strict_isolation=ON , both READ COMMITTED and READ UNCOMMITTED will return the correct result.

            This was fixed together with MDEV-32898 (PR#3067). If SET innodb_snapshot_isolation=ON is executed before starting a transaction, then the “semi-consistent read” that had deliberately broken these scenarios will be disabled.

            marko Marko Mäkelä added a comment - This was fixed together with MDEV-32898 ( PR#3067 ). If SET innodb_snapshot_isolation=ON is executed before starting a transaction, then the “semi-consistent read” that had deliberately broken these scenarios will be disabled.

            People

              marko Marko Mäkelä
              dinary dinary
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.