MariaDB Server
MDEV-27701

Race on trx->lock.wait_lock between lock_rec_move() and lock_sys_t::cancel()

Details

    Description

      origin/bb-10.6-MDEV-20605-cur-pos-fix ac48095d5ba1c408cdc77d095985b2f20e46c82f 2022-01-21T21:00:54+03:00
      Per Vladislav, his modifications for MDEV-20605 cannot be the reason for the bad effect.
       
      mysqld: /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:5744: static dberr_t lock_sys_t::cancel(trx_t*, ib_lock_t*) [with bool check_victim = false]: Assertion `trx->lock.wait_lock == lock' failed.
      # 2022-01-31T20:06:20 [219582] | Query (0x62b000237238): UPDATE t1 SET col1 = 2702 LIMIT 2 
      # 2022-01-31T20:06:20 [219582] | Status: NOT_KILLED
       
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x000013540ba13859 in __GI_abort () at abort.c:79
      #2  0x000013540ba13729 in __assert_fail_base (fmt=0x13540bba9588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55a9db267a60 "trx->lock.wait_lock == lock", 
          file=0x55a9db25d680 "/data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc", line=5744, function=<optimized out>) at assert.c:92
      #3  0x000013540ba24f36 in __GI___assert_fail (assertion=0x55a9db267a60 "trx->lock.wait_lock == lock", file=0x55a9db25d680 "/data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc", line=5744, 
          function=0x55a9db2679e0 "static dberr_t lock_sys_t::cancel(trx_t*, ib_lock_t*) [with bool check_victim = false]") at assert.c:101
      #4  0x000055a9d9f2543a in lock_sys_t::cancel<false> (trx=0x7ffa5799fe88, lock=0x7ffa579a01c8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:5744
      #5  0x000055a9d9efd1e2 in lock_wait (thr=0x621000899db8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:1918
      #6  0x000055a9da10c759 in row_mysql_handle_errors (new_err=0x7ffa4d83c4c0, trx=0x7ffa5799fe88, thr=0x621000899db8, savept=0x0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0mysql.cc:686
      #7  0x000055a9da192e2c in row_search_mvcc (buf=0x61a00028c2b8 "\377", mode=PAGE_CUR_G, prebuilt=0x621000899588, match_mode=0, direction=0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0sel.cc:5826
      #8  0x000055a9d9d8b7d9 in ha_innobase::index_read (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/handler/ha_innodb.cc:9015
      #9  0x000055a9d9d8e156 in ha_innobase::index_first (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377") at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/handler/ha_innodb.cc:9376
      #10 0x000055a9d9d8e394 in ha_innobase::rnd_next (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377") at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/handler/ha_innodb.cc:9469
      #11 0x000055a9d94936d0 in handler::ha_rnd_next (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377") at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/handler.cc:3397
      #12 0x000055a9d989cfc5 in rr_sequential (info=0x7ffa4d83dac0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/records.cc:519
      #13 0x000055a9d8b22cf6 in READ_RECORD::read_record (this=0x7ffa4d83dac0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/records.h:81
      #14 0x000055a9d8f98334 in mysql_update (thd=0x62b000230218, table_list=0x62b000237398, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=2, ignore=false, found_return=0x7ffa4d83e2f0, updated_return=0x7ffa4d83e310)
          at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_update.cc:1005
      #15 0x000055a9d8cb91e4 in mysql_execute_command (thd=0x62b000230218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:4407
      #16 0x000055a9d8cd1628 in mysql_parse (thd=0x62b000230218, rawbuf=0x62b000237238 "UPDATE t1 SET col1 = 2702 LIMIT 2  /* E_R Thread24 QNO 409 CON_ID 38 */", length=71, parser_state=0x7ffa4d83eb20) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:8029
      #17 0x000055a9d8ca99cd in dispatch_command (command=COM_QUERY, thd=0x62b000230218, packet=0x629000e3d219 " UPDATE t1 SET col1 = 2702 LIMIT 2  /* E_R Thread24 QNO 409 CON_ID 38 */ ", packet_length=73, blocking=true)
          at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:1896
      #18 0x000055a9d8ca6da5 in do_command (thd=0x62b000230218, blocking=true) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:1404
      #19 0x000055a9d90a72d4 in do_handle_one_connection (connect=0x608000003bb8, put_in_cache=true) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_connect.cc:1418
      #20 0x000055a9d90a6b60 in handle_one_connection (arg=0x608000003bb8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_connect.cc:1312
      #21 0x0000470645dd1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #22 0x000013540bb10293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr)
       
      pluto:/data/results/1643649542/TBR-1342/dev/shm/rqg/1643649542/10/1/rr
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental 067340e8ec78c25880ccd2e67dd62ad134edc531 2022-01-31T17:10:39+01:00
      # rqg.pl  : Version 4.0.4 (2021-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/table_stress_innodb.yy \
      # --gendata=conf/mariadb/table_stress.zz \
      # --gendata_sql=conf/mariadb/table_stress.sql \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma \
      # --mysqld=--plugin-load-add=provider_snappy \
      # --mysqld=--plugin-load-add=provider_lz4 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=on \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --vardir_type=fast \
      # --mysqld=--innodb_page_size=4K \
      # --mysqld=--innodb-buffer-pool-size=5M \
      # --no_mask \
      # <local settings>
      
      

      The MDEV-27701.test.diff test case shows the issue.

      The scenario is the following, at least for 10.6:

      1. trx2 acquires a lock on some record;

      2. trx3 requests a lock on the record that conflicts with trx2's lock, creates a waiting lock, and suspends. The wait then ends with a timeout, and trx3 tries to cancel the wait in a lock_sys_t::cancel() call.

      3. trx2 starts updating the record. It updates the clustered index record without changing the ordering fields but changing the size of non-ordering fields. In that case an in-place update is impossible, so the update is executed as a delete followed by an insert. Before the record is deleted, its locks are moved to the page infimum; then the record is deleted, the new record is inserted, and the locks are moved from the infimum to the inserted record. See the code between the lock_rec_store_on_page_infimum() and lock_rec_restore_from_page_infimum() calls for details. So trx2 moved trx3's locks from the deleted record to the newly inserted record, changing trx->lock.wait_lock for trx3 in lock_rec_create_low(). lock_rec_create_low() changes trx->lock.wait_lock without locking lock_sys.wait_mutex; I suppose this was done on the assumption that lock_rec_create_low() can be invoked only from the thread of the transaction that owns the lock. As we can see, lock_rec_move() can be invoked from a different thread.

      4. trx3 checks ut_ad(trx->lock.wait_lock == lock) in lock_sys_t::cancel(). Note that the check happens under lock_sys.wait_mutex, which was acquired in lock_wait(). As trx->lock.wait_lock was changed by trx2 when the locks were moved, the assertion fails (a self-contained toy model of this interleaving is sketched below).
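
      For illustration only, the interleaving can be modeled with a self-contained toy program. This is not InnoDB code; wait_mutex and wait_lock below merely stand in for lock_sys.wait_mutex and trx->lock.wait_lock. The "mover" thread rewrites the atomic pointer without taking the mutex, so the "cancelling" thread can observe a changed value even though it holds the mutex, which is exactly what the failing assertion detects:

      #include <atomic>
      #include <mutex>
      #include <thread>

      struct Lock {};

      static std::mutex wait_mutex;                  // stands in for lock_sys.wait_mutex
      static std::atomic<Lock*> wait_lock{nullptr};  // stands in for trx->lock.wait_lock

      int main()
      {
        Lock old_lock, new_lock;
        wait_lock.store(&old_lock, std::memory_order_relaxed);

        // "trx2": lock_rec_move() -> lock_rec_create_low() rewrites the waiting
        // lock pointer of another transaction without holding wait_mutex.
        std::thread mover([&] {
          wait_lock.store(&new_lock, std::memory_order_relaxed);
        });

        {
          // "trx3": lock_wait() -> lock_sys_t::cancel() holds wait_mutex, but that
          // does not exclude the mover; with unlucky timing the pointer read here
          // is &new_lock although the caller passed &old_lock, and
          // ut_ad(trx->lock.wait_lock == lock) would fail.
          std::lock_guard<std::mutex> g(wait_mutex);
          Lock *seen= wait_lock.load(std::memory_order_relaxed);
          (void) seen;
        }

        mover.join();
        return 0;
      }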

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment:

            vlad.lesin, can you debug this? Or would this already be fixed by one of your recent changes?
            I used the following commands:

            ssh pluto
            rr replay /data/results/1643649542/TBR-1342/dev/shm/rqg/1643649542/10/1/rr/latest-trace
            

            continue
            frame 4
            watch -l trx.lock.wait_lock.m._M_b._M_p
            reverse-continue
            reverse-continue
            thread apply 3 backtrace
            backtrace
            

            The assertion failed in Thread 3 for me. The lock had been replaced in Thread 39 executing ROLLBACK:

            bb-10.6-MDEV-20605-cur-pos-fix ac48095d5ba1c408cdc77d095985b2f20e46c82f

            #3  0x000055a9d9f24cda in Atomic_relaxed<ib_lock_t*>::operator= (this=0x7ffa5799fec0, i=0x7ffa579a0478) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/include/my_atomic_wrapper.h:49
            #4  0x000055a9d9ef8ca0 in lock_rec_create_low (c_lock=0x0, type_mode=259, page_id=..., page=0x7ffa49cf8000 "", heap_no=13, index=0x6160107b7208, trx=0x7ffa5799fe88, holds_trx_mutex=true, 
                insert_before_waiting=false) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:1300
            #5  0x000055a9d9efa572 in lock_rec_add_to_queue (type_mode=259, cell=..., id=..., page=0x7ffa49cf8000 "", heap_no=13, index=0x6160107b7208, trx=0x7ffa5799fe88, caller_owns_trx_mutex=true, 
                insert_before_waiting=false) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:1521
            #6  0x000055a9d9f00bfd in lock_rec_move (receiver_cell=..., receiver=..., receiver_id=..., donator_cell=..., donator_id=..., receiver_heap_no=13, donator_heap_no=0)
                at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:2283
            #7  0x000055a9d9f072a0 in lock_rec_restore_from_page_infimum (block=..., rec=0x7ffa49cf83fd "", donator=...) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:3187
            #8  0x000055a9da2f40a5 in btr_cur_pessimistic_update (flags=7, cursor=0x61a0000e4178, offsets=0x100353764ee0, offsets_heap=0x100353764ec0, entry_heap=0x6190003ec680, big_rec=0x100353764d40, 
                update=0x618000373d08, cmpl_info=0, thr=0x616004789cc0, trx_id=5395, mtr=0x100353764fc0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/btr/btr0cur.cc:5134
            #9  0x000055a9da5593eb in row_undo_mod_clust_low (node=0x61a0000e4108, offsets=0x100353764ee0, offsets_heap=0x100353764ec0, heap=0x6190003ec680, rebuilt_old_pk=0x100353764f00, 
                sys=0x100353765500 "0UvS\003\020", thr=0x616004789cc0, mtr=0x100353764fc0, mode=33) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0umod.cc:142
            

            At this time, the soon-to-crash Thread 3 had already reached the assertion that would fail moments later:

            bb-10.6-MDEV-20605-cur-pos-fix ac48095d5ba1c408cdc77d095985b2f20e46c82f

            #0  lock_sys_t::cancel<false> (trx=0x7ffa5799fe88, lock=0x7ffa579a01c8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:5744
            #1  0x000055a9d9efd1e2 in lock_wait (thr=0x621000899db8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:1918
            #2  0x000055a9da10c759 in row_mysql_handle_errors (new_err=0x7ffa4d83c4c0, trx=0x7ffa5799fe88, thr=0x621000899db8, savept=0x0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0mysql.cc:686
            #3  0x000055a9da192e2c in row_search_mvcc (buf=0x61a00028c2b8 "\377", mode=PAGE_CUR_G, prebuilt=0x621000899588, match_mode=0, direction=0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0sel.cc:5826
            

            Because this is an unoptimized build, the read of trx->lock.wait_lock would occur in a separate procedure:

            => 0x000055a9d9f253ad <+387>:	mov    -0x28(%rbp),%rax
               0x000055a9d9f253b1 <+391>:	add    $0x38,%rax
               0x000055a9d9f253b5 <+395>:	mov    %rax,%rdi
               0x000055a9d9f253b8 <+398>:	callq  0x55a9d9dddea8 <Atomic_relaxed<ib_lock_t*>::operator ib_lock_t*() const>
            

            The modification by Thread 39 occurred between the first mov and the add. So, the value had already been modified by the time it was being read. The corresponding source code is:

            template<bool check_victim>
            dberr_t lock_sys_t::cancel(trx_t *trx, lock_t *lock)
            {
              mysql_mutex_assert_owner(&lock_sys.wait_mutex);
              ut_ad(trx->lock.wait_lock == lock);
            

            It is worth noting that Thread 39 was not holding lock_sys.wait_mutex at the time it updated trx->lock.wait_lock.
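
            For context, trx->lock.wait_lock is an Atomic_relaxed<lock_t*>. A rough approximation of that wrapper (simplified; the real class in include/my_atomic_wrapper.h has more operators) shows why the atomic accesses seen in the disassembly only prevent torn reads and provide no mutual exclusion:

            #include <atomic>

            /* Simplified approximation of Atomic_relaxed; not the actual header.
               Loads and stores are individually atomic with relaxed ordering, so a
               writer that bypasses lock_sys.wait_mutex can still change the value
               underneath a reader that holds the mutex. */
            template <typename Type> class Atomic_relaxed_sketch
            {
              std::atomic<Type> m;
            public:
              operator Type() const { return m.load(std::memory_order_relaxed); }
              Type operator=(const Type val)
              { m.store(val, std::memory_order_relaxed); return val; }
            };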

            I may have wrongly assumed in lock_rec_create_low() that waiting locks are always being created for the transaction that is being executed in the current thread. It is definitely not the case here. This entire code in lock_rec_create_low() may need to be revised:

            	if (type_mode & LOCK_WAIT) {
            		if (trx->lock.wait_trx) {
            			ut_ad(!c_lock || trx->lock.wait_trx == c_lock->trx);
            			ut_ad(trx->lock.wait_lock);
            			ut_ad((*trx->lock.wait_lock).trx == trx);
            		} else {
            			ut_ad(c_lock);
            			trx->lock.wait_trx = c_lock->trx;
            			ut_ad(!trx->lock.wait_lock);
            		}
            		trx->lock.wait_lock = lock;
            	}
            

            When we are moving a waiting lock request of a ‘foreign’ thread, we must hold lock_sys.wait_mutex. Note: Starting with MDEV-24671 (10.6), lock_sys.wait_mutex must be acquired after lock_sys.latch.

            A possible fix would be to make all callers of lock_rec_move() check the return value, which would indicate whether any LOCK_WAIT records were encountered (and initially skipped). If they were, we would have to retry the operation. Any code like the following would have to be adjusted:

            void lock_rec_restore_from_page_infimum(const buf_block_t &block,
            					const rec_t *rec, page_id_t donator)
            {
              const ulint heap_no= page_rec_get_heap_no(rec);
              const page_id_t id{block.page.id()};
              LockMultiGuard g{lock_sys.rec_hash, id, donator};
              lock_rec_move(g.cell1(), block, id, g.cell2(), donator, heap_no,
                            PAGE_HEAP_NO_INFIMUM);
            }
            

            In lock_rec_move() we could mysql_mutex_trylock(&lock_sys.wait_mutex) upon encountering the first LOCK_WAIT object. If it succeeds, fine. If not, the LockMultiGuard latches on the hash table cells must be released, lock_sys.wait_mutex acquired, and the hash table latches reacquired.


            vlad.lesin Vladislav Lesin added a comment:

            The issue affects not only lock_rec_move(), but also lock_move_reorganize_page(), lock_move_rec_list_end(), and lock_move_rec_list_start().


            vlad.lesin Vladislav Lesin added a comment:

            The solution proposed by Marko above does not work, because in the middle of moving the locks the lock queue becomes invalid: some locks have already been moved, and some waiting locks do not have conflicting locks in the queue. If we release the cell latch for such a cell with an invalid lock queue in order to acquire lock_sys.wait_mutex, some other thread can add a new lock to the queue, and because the queue is invalid, a wrong decision can be made about whether a transaction needs to block.


            marko Marko Mäkelä added a comment:

            vlad.lesin, right, moving the locks must be atomic. Maybe I thought that the exclusive index tree leaf page latch would prevent any concurrent access to the index record locks on that page. I realize that when locks are released on transaction commit, no page latch would be acquired.

            How about the following? First, acquire the LockMultiGuard latches and determine whether lock_sys.wait_mutex needs to be acquired. If yes, and mysql_mutex_trylock(&lock_sys.wait_mutex) fails, release the latches (without having modified any locks yet) and then acquire lock_sys.wait_mutex and the latches in the correct order. Finally, while holding the necessary latches, proceed to move the locks, and release the latches.
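
            A minimal sketch of that flow, under the stated assumptions; queue_has_waiting_locks() and move_locks() are hypothetical placeholders, not existing InnoDB functions, and LockMultiGuard is used only as shown elsewhere in this issue:

            static void move_locks_with_wait_mutex_if_needed(const page_id_t id,
                                                             const page_id_t donator,
                                                             ulint heap_no)
            {
              bool wait_mutex_held= false;
              for (;;)
              {
                {
                  LockMultiGuard g{lock_sys.rec_hash, id, donator};
                  /* Step 1: only inspect the donator queue; modify nothing yet
                     (hypothetical helper). */
                  const bool need= queue_has_waiting_locks(g.cell2(), donator);
                  if (!need || wait_mutex_held ||
                      !mysql_mutex_trylock(&lock_sys.wait_mutex)) /* 0 = acquired */
                  {
                    const bool acquired_now= need && !wait_mutex_held;
                    /* Step 2: all required latches are held; perform the move
                       (hypothetical helper). */
                    move_locks(g, id, donator, heap_no);
                    if (wait_mutex_held || acquired_now)
                      mysql_mutex_unlock(&lock_sys.wait_mutex);
                    return;
                  }
                } /* cell latches released here; nothing was modified */
                mysql_mutex_lock(&lock_sys.wait_mutex); /* take it first this time */
                wait_mutex_held= true;                  /* retry with all latches  */
              }
            }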


            vlad.lesin Vladislav Lesin added a comment:

            The issue is that trx->lock.wait_lock can be changed by another thread even while lock_sys.wait_mutex is held by the current thread.

            lock_sys.wait_mutex is held whenever something is added to or removed from the waiting queues. On the one hand, each queue belongs to a certain record lock hash cell. On the other hand, each waiting transaction contains a pointer to the transaction it is waiting for, so we have a waits-for graph, which is used to find deadlocks. The vertices of the graph are all active transactions, and the edges are the wait dependencies. lock_sys.wait_mutex is mostly used to keep the graph in a consistent state so that the deadlock detector can traverse it.

            The deadlock resolver uses Brent's cycle detection algorithm for each transaction in a global set, protected by lock_sys.wait_mutex. The other version of the deadlock-resolving function works on a certain transaction instead of all transactions in the set.
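
            For reference, a generic sketch of Brent's cycle detection applied to such a waits-for chain; Trx and wait_trx below are simplified stand-ins, not the real trx_t, and this is not the actual Deadlock code:

            #include <cstddef>

            struct Trx { Trx *wait_trx= nullptr; };

            /* Follow wait_trx pointers from start; return a transaction on a cycle
               (a deadlock) or nullptr if the waits-for chain ends. */
            static Trx *find_cycle(Trx *start)
            {
              Trx *tortoise= start;
              Trx *hare= start ? start->wait_trx : nullptr;
              size_t power= 1, lam= 1;           /* lam: steps since tortoise moved */
              while (hare && hare != tortoise)
              {
                if (power == lam)                /* teleport tortoise, double range */
                {
                  tortoise= hare;
                  power*= 2;
                  lam= 0;
                }
                hare= hare->wait_trx;
                ++lam;
              }
              return hare;                       /* non-null => cycle found */
            }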

            So, lock_sys.wait_mutex protects:
            1. the Deadlock::to_check set;
            2. all trx->lock.wait_trx, i.e. the whole waits-for graph;
            3. all trx->lock.wait_lock, which is atomic; both wait_trx and wait_lock are changed in the same cases, i.e. either when wait_lock conflicts with some (granted or waiting) lock of wait_trx or when wait_lock is granted;
            4. some statistics in lock_sys.

            lock_sys.wait_mutex is also used as the mutex for the condition variable trx->lock.cond in lock_wait(), i.e. when a transaction is woken up after suspending on some record (or table) lock, wait_mutex is held by that transaction.

            Summarizing all the above, I see the following way to fix the bug. We could change the latching order and acquire cell latches before lock_sys.wait_mutex. In that case there is no need to reacquire cell latches; we can simply acquire wait_mutex from lock_rec_create_low() when we want to change trx->lock.wait_lock. But this approach can potentially have a negative performance impact. When we acquire separate cell latches instead of locking one global hash mutex, we distribute lock requests among the hash table array elements and thereby decrease the overall contention between threads requesting the latches. If, after acquiring a separate cell latch, we then wait for the global lock_sys.wait_mutex, we increase the time during which the cell latch is held, increasing the probability that another thread requests the same cell latch. So the overall contention on hash cell latches also increases. I would say we need performance testing after implementing this variant.
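
            Applying this idea to the earlier toy model gives the following sketch (again not InnoDB code): once the mover is also required to take wait_mutex, which the proposed latching order would permit while the cell latch is held, the pointer can no longer change underneath a reader holding wait_mutex.

            #include <atomic>
            #include <cassert>
            #include <mutex>
            #include <thread>

            struct Lock {};
            static std::mutex wait_mutex;                  // lock_sys.wait_mutex stand-in
            static std::atomic<Lock*> wait_lock{nullptr};  // trx->lock.wait_lock stand-in

            int main()
            {
              Lock old_lock, new_lock;
              wait_lock.store(&old_lock, std::memory_order_relaxed);

              std::thread mover([&] {
                // With the proposed order (cell latch, then wait_mutex), the mover
                // may take wait_mutex here, so the update is serialized with readers.
                std::lock_guard<std::mutex> g(wait_mutex);
                wait_lock.store(&new_lock, std::memory_order_relaxed);
              });

              {
                std::lock_guard<std::mutex> g(wait_mutex);
                Lock *first= wait_lock.load(std::memory_order_relaxed);
                Lock *second= wait_lock.load(std::memory_order_relaxed);
                // The value may be old or new, but it cannot change while we hold
                // wait_mutex; this is what ut_ad(trx->lock.wait_lock == lock) relies on.
                assert(first == second);
              }

              mover.join();
              return 0;
            }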


            mleich Matthias Leich added a comment:

            origin/bb-10.6-MDEV-27701 a876c4c303ae5942d30861edc6b8a14edbdc2db0 2023-02-14T10:46:37+03:00
            performed well in RQG testing.


            People

              vlad.lesin Vladislav Lesin
              mleich Matthias Leich