Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33802

Weird read view after ROLLBACK of other transactions.

Details

    Description

      How to repeat:

      /* init */ CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE);
      /* init */ INSERT INTO t(a) VALUES (1);
      /* t1 */ SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
      /* t2 */ SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
       
      /* t1 */ BEGIN;
      /* t2 */ BEGIN;
      /* t1 */ INSERT INTO t(a) VALUES (2);
      /* t2 */ SELECT a, b FROM t WHERE TRUE;  -- [(1, null), (1, null)]
      /* t1 */ ROLLBACK;
      /* t2 */ COMMIT;
      

      The SELECT statement of t2 was initially blocked by t1. Then it recovered after t1's ROLLBACK, but its query result contained two identical records (1, null). Interestingly, removing the UNIQUE constraint on column b eliminates this anomaly.

      Attachments

        Issue Links

          Activity

            zhuangliu Zhuang Liu added a comment -

            It's a fuzzing tool in development designed for detecting transaction bugs, which is related to an academic paper I'm currently writing. I apologize that the tool isn't publicly available yet, but I'll inform you when it is. Until then, I'll continue testing on the fixed version and report any potential bugs.

            zhuangliu Zhuang Liu added a comment - It's a fuzzing tool in development designed for detecting transaction bugs, which is related to an academic paper I'm currently writing. I apologize that the tool isn't publicly available yet, but I'll inform you when it is. Until then, I'll continue testing on the fixed version and report any potential bugs.
            vlad.lesin Vladislav Lesin added a comment - - edited

            There are two (1, NULL) rows in the result because the second transaction restores cursor to (null, 1) for secondary index after resuming after the first transaction deleted (null,2) in the secondary index. Logically, the cursor should point to supremum after restoring.

            The error is in the following code:

            btr_pcur_t::restore_status                                                      
            btr_pcur_t::restore_position(btr_latch_mode restore_latch_mode, mtr_t *mtr)     
            {                                                                               
            ...                                                                             
                    if (rel_pos == BTR_PCUR_ON && btr_pcur_is_on_user_rec(this)) {                                                            
                            ulint n_matched_fields= 0;                                      
                            if (!cmp_dtuple_rec_with_match(                                 
                                  tuple, btr_pcur_get_rec(this),                            
                                  rec_get_offsets(btr_pcur_get_rec(this), index, offsets,   
                                    index->n_core_fields, ULINT_UNDEFINED, &heap),          
                                  &n_matched_fields)) {                                     
                                    ...                                                     
                                    return restore_status::SAME_ALL;                                                                          
                            }                                                                                                                 
                            if (n_matched_fields >= index->n_uniq)                          
                                    ret_val= restore_status::SAME_UNIQ;                     
                    }                                                                       
            ...                                                                             
            }
            

            "if (n_matched_fields >= index->n_uniq)" condition is wrong. It works when we have non-null unique key. But when unique index has null key, there can be several non-unique records with null key, and this logic does not work.

            So innodb_snapshot_isolation=OFF mode should also be fixed.

            vlad.lesin Vladislav Lesin added a comment - - edited There are two (1, NULL) rows in the result because the second transaction restores cursor to (null, 1) for secondary index after resuming after the first transaction deleted (null,2) in the secondary index. Logically, the cursor should point to supremum after restoring. The error is in the following code: btr_pcur_t::restore_status btr_pcur_t::restore_position(btr_latch_mode restore_latch_mode, mtr_t *mtr) { ... if (rel_pos == BTR_PCUR_ON && btr_pcur_is_on_user_rec( this )) { ulint n_matched_fields= 0 ; if (!cmp_dtuple_rec_with_match( tuple, btr_pcur_get_rec( this ), rec_get_offsets(btr_pcur_get_rec( this ), index, offsets, index->n_core_fields, ULINT_UNDEFINED, &heap), &n_matched_fields)) { ... return restore_status::SAME_ALL; } if (n_matched_fields >= index->n_uniq) ret_val= restore_status::SAME_UNIQ; } ... } "if (n_matched_fields >= index->n_uniq)" condition is wrong. It works when we have non-null unique key. But when unique index has null key, there can be several non-unique records with null key, and this logic does not work. So innodb_snapshot_isolation=OFF mode should also be fixed.

            Preliminary result of testing on origin/bb-10.6-MDEV-33802-unique-cursor-restore a61efa8651d9355a1f92cbdb43a0a6e7bca48211 2024-04-03T13:43:02+03:00

            pluto:/data/results/1712142700/TBR-2114$ gdb c ./1/data/core.1206604 /data/Server_bin/bb-10.6MDEV-33802-unique-cursor-restore_debug_Og/bin/mariadbd

            ....
            # 2024-04-03T17:53:22 [1206243] | 2024-04-03 17:52:43 0 [Note] InnoDB: Starting to truncate .//undo003
            # 2024-04-03T17:53:22 [1206243] | 2024-04-03 17:52:44 0 [Note] InnoDB: Truncating .//undo003
            # 2024-04-03T17:53:22 [1206243] | mariadbd: /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc:279: void mtr_t::commit_shrink(fil_space_t&, uint32_t): Assertion `m_made_dirty' failed.
            (gdb) bt
            #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:44
            #1  __pthread_kill_internal (signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:78
            #2  __GI___pthread_kill (threadid=140238180382272, signo=6) at ./nptl/pthread_kill.c:89
            #3  0x0000561ed7688e85 in my_write_core (sig=6) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/mysys/stacktrace.c:424
            #4  0x0000561ed6f25375 in handle_fatal_signal (sig=6) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/sql/signal_handler.cc:357
            #5  <signal handler called>
            #6  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:44
            #7  __pthread_kill_internal (signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:78
            #8  __GI___pthread_kill (threadid=140238180382272, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
            #9  0x00007f8bdf533476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
            #10 0x00007f8bdf5197f3 in __GI_abort () at ./stdlib/abort.c:79
            #11 0x00007f8bdf51971b in __assert_fail_base (fmt=0x7f8bdf6ce130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561ed7a9f86c "m_made_dirty", 
                file=0x561ed7a9f720 "/data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc", line=279, function=<optimized out>) at ./assert/assert.c:92
            #12 0x00007f8bdf52ae96 in __GI___assert_fail (assertion=0x561ed7a9f86c "m_made_dirty", file=0x561ed7a9f720 "/data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc", line=279, 
                function=0x561ed7aa07c8 "void mtr_t::commit_shrink(fil_space_t&, uint32_t)") at ./assert/assert.c:101
            #13 0x0000561ed7359d6d in mtr_t::commit_shrink (this=this@entry=0x7f8bbeece770, space=..., size=size@entry=640) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc:279
            #14 0x0000561ed7486924 in trx_purge_truncate_history () at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/trx/trx0purge.cc:819
            #15 0x0000561ed74769bc in purge_coordinator_state::do_purge (this=this@entry=0x561ed8a5bf20 <purge_state>) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/srv/srv0srv.cc:1645
            #16 0x0000561ed7475a85 in purge_coordinator_callback () at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/srv/srv0srv.cc:1716
            #17 0x0000561ed762ac3d in tpool::task_group::execute (this=0x561ed8a5bdc0 <purge_coordinator_task_group>, t=t@entry=0x561ed8a5bd20 <purge_coordinator_task>)
                at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/tpool/task_group.cc:55
            #18 0x0000561ed762af93 in tpool::task::execute (this=0x561ed8a5bd20 <purge_coordinator_task>) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/tpool/task.cc:32
            #19 0x0000561ed7629479 in tpool::thread_pool_generic::worker_main (this=0x561ed96a5fd0, thread_var=0x561ed96a6390) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/tpool/tpool_generic.cc:583
            #20 0x0000561ed76298b2 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>)
                at /usr/include/c++/11/bits/invoke.h:74
            #21 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
            #22 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>)
                at /usr/include/c++/11/bits/std_thread.h:259
            #23 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>)
                at /usr/include/c++/11/bits/std_thread.h:266
            #24 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>)
                at /usr/include/c++/11/bits/std_thread.h:211
            #25 0x00007f8bdf8ff253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
            #26 0x00007f8bdf585ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
            #27 0x00007f8bdf617850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
            (gdb)
            

            mleich Matthias Leich added a comment - Preliminary result of testing on origin/bb-10.6- MDEV-33802 -unique-cursor-restore a61efa8651d9355a1f92cbdb43a0a6e7bca48211 2024-04-03T13:43:02+03:00 pluto:/data/results/1712142700/TBR-2114$ gdb c ./1/data/core.1206604 /data/Server_bin/bb-10.6 MDEV-33802 -unique-cursor-restore_debug_Og/bin/mariadbd .... # 2024-04-03T17:53:22 [1206243] | 2024-04-03 17:52:43 0 [Note] InnoDB: Starting to truncate .//undo003 # 2024-04-03T17:53:22 [1206243] | 2024-04-03 17:52:44 0 [Note] InnoDB: Truncating .//undo003 # 2024-04-03T17:53:22 [1206243] | mariadbd: /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc:279: void mtr_t::commit_shrink(fil_space_t&, uint32_t): Assertion `m_made_dirty' failed. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140238180382272, signo=6) at ./nptl/pthread_kill.c:89 #3 0x0000561ed7688e85 in my_write_core (sig=6) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/mysys/stacktrace.c:424 #4 0x0000561ed6f25375 in handle_fatal_signal (sig=6) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/sql/signal_handler.cc:357 #5 <signal handler called> #6 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:44 #7 __pthread_kill_internal (signo=6, threadid=140238180382272) at ./nptl/pthread_kill.c:78 #8 __GI___pthread_kill (threadid=140238180382272, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #9 0x00007f8bdf533476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #10 0x00007f8bdf5197f3 in __GI_abort () at ./stdlib/abort.c:79 #11 0x00007f8bdf51971b in __assert_fail_base (fmt=0x7f8bdf6ce130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561ed7a9f86c "m_made_dirty", file=0x561ed7a9f720 "/data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc", line=279, function=<optimized out>) at ./assert/assert.c:92 #12 0x00007f8bdf52ae96 in __GI___assert_fail (assertion=0x561ed7a9f86c "m_made_dirty", file=0x561ed7a9f720 "/data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc", line=279, function=0x561ed7aa07c8 "void mtr_t::commit_shrink(fil_space_t&, uint32_t)") at ./assert/assert.c:101 #13 0x0000561ed7359d6d in mtr_t::commit_shrink (this=this@entry=0x7f8bbeece770, space=..., size=size@entry=640) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/mtr/mtr0mtr.cc:279 #14 0x0000561ed7486924 in trx_purge_truncate_history () at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/trx/trx0purge.cc:819 #15 0x0000561ed74769bc in purge_coordinator_state::do_purge (this=this@entry=0x561ed8a5bf20 <purge_state>) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/srv/srv0srv.cc:1645 #16 0x0000561ed7475a85 in purge_coordinator_callback () at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/storage/innobase/srv/srv0srv.cc:1716 #17 0x0000561ed762ac3d in tpool::task_group::execute (this=0x561ed8a5bdc0 <purge_coordinator_task_group>, t=t@entry=0x561ed8a5bd20 <purge_coordinator_task>) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/tpool/task_group.cc:55 #18 0x0000561ed762af93 in tpool::task::execute (this=0x561ed8a5bd20 <purge_coordinator_task>) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/tpool/task.cc:32 #19 0x0000561ed7629479 in tpool::thread_pool_generic::worker_main (this=0x561ed96a5fd0, thread_var=0x561ed96a6390) at /data/Server/bb-10.6-MDEV-33802-unique-cursor-restore/tpool/tpool_generic.cc:583 #20 0x0000561ed76298b2 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:74 #21 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96 #22 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259 #23 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266 #24 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211 #25 0x00007f8bdf8ff253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #26 0x00007f8bdf585ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 #27 0x00007f8bdf617850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 (gdb)

            This will be fixed in two parts. The fix that was applied to 10.6 and later (https://github.com/MariaDB/server/pull/3171) will ensure that in innodb_strict_isolation=ON mode we will return ER_CHECKREAD instead of waiting for a lock. (This revealed a bug MDEV-33864 in the test harness.)

            Another fix that vlad.lesin is working on is https://github.com/MariaDB/server/pull/3187 to correct the cursor restoration for UNIQUE NULL indexes. I have approved it, and it should be merged as soon as stress tests pass.

            marko Marko Mäkelä added a comment - This will be fixed in two parts. The fix that was applied to 10.6 and later ( https://github.com/MariaDB/server/pull/3171 ) will ensure that in innodb_strict_isolation=ON mode we will return ER_CHECKREAD instead of waiting for a lock. (This revealed a bug MDEV-33864 in the test harness.) Another fix that vlad.lesin is working on is https://github.com/MariaDB/server/pull/3187 to correct the cursor restoration for UNIQUE NULL indexes. I have approved it, and it should be merged as soon as stress tests pass.

            The fix of wrong persistent cursor position restoring is pushed to 10.4.

            vlad.lesin Vladislav Lesin added a comment - The fix of wrong persistent cursor position restoring is pushed to 10.4.

            People

              vlad.lesin Vladislav Lesin
              zhuangliu Zhuang Liu
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.