Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27701

Race on trx->lock.wait_lock between lock_rec_move() and lock_sys_t::cancel()

    XMLWordPrintable

Details

    Description

      origin/bb-10.6-MDEV-20605-cur-pos-fix ac48095d5ba1c408cdc77d095985b2f20e46c82f 2022-01-21T21:00:54+03:00
      Per Vladislav his modifications for MDEV-20605 cannot be the reason for the bad effect.
       
      mysqld: /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:5744: static dberr_t lock_sys_t::cancel(trx_t*, ib_lock_t*) [with bool check_victim = false]: Assertion `trx->lock.wait_lock == lock' failed.
      # 2022-01-31T20:06:20 [219582] | Query (0x62b000237238): UPDATE t1 SET col1 = 2702 LIMIT 2 
      # 2022-01-31T20:06:20 [219582] | Status: NOT_KILLED
       
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x000013540ba13859 in __GI_abort () at abort.c:79
      #2  0x000013540ba13729 in __assert_fail_base (fmt=0x13540bba9588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55a9db267a60 "trx->lock.wait_lock == lock", 
          file=0x55a9db25d680 "/data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc", line=5744, function=<optimized out>) at assert.c:92
      #3  0x000013540ba24f36 in __GI___assert_fail (assertion=0x55a9db267a60 "trx->lock.wait_lock == lock", file=0x55a9db25d680 "/data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc", line=5744, 
          function=0x55a9db2679e0 "static dberr_t lock_sys_t::cancel(trx_t*, ib_lock_t*) [with bool check_victim = false]") at assert.c:101
      #4  0x000055a9d9f2543a in lock_sys_t::cancel<false> (trx=0x7ffa5799fe88, lock=0x7ffa579a01c8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:5744
      #5  0x000055a9d9efd1e2 in lock_wait (thr=0x621000899db8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/lock/lock0lock.cc:1918
      #6  0x000055a9da10c759 in row_mysql_handle_errors (new_err=0x7ffa4d83c4c0, trx=0x7ffa5799fe88, thr=0x621000899db8, savept=0x0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0mysql.cc:686
      #7  0x000055a9da192e2c in row_search_mvcc (buf=0x61a00028c2b8 "\377", mode=PAGE_CUR_G, prebuilt=0x621000899588, match_mode=0, direction=0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/row/row0sel.cc:5826
      #8  0x000055a9d9d8b7d9 in ha_innobase::index_read (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/handler/ha_innodb.cc:9015
      #9  0x000055a9d9d8e156 in ha_innobase::index_first (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377") at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/handler/ha_innodb.cc:9376
      #10 0x000055a9d9d8e394 in ha_innobase::rnd_next (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377") at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/storage/innobase/handler/ha_innodb.cc:9469
      #11 0x000055a9d94936d0 in handler::ha_rnd_next (this=0x61d0000f1eb8, buf=0x61a00028c2b8 "\377") at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/handler.cc:3397
      #12 0x000055a9d989cfc5 in rr_sequential (info=0x7ffa4d83dac0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/records.cc:519
      #13 0x000055a9d8b22cf6 in READ_RECORD::read_record (this=0x7ffa4d83dac0) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/records.h:81
      #14 0x000055a9d8f98334 in mysql_update (thd=0x62b000230218, table_list=0x62b000237398, fields=..., values=..., conds=0x0, order_num=0, order=0x0, limit=2, ignore=false, found_return=0x7ffa4d83e2f0, updated_return=0x7ffa4d83e310)
          at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_update.cc:1005
      #15 0x000055a9d8cb91e4 in mysql_execute_command (thd=0x62b000230218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:4407
      #16 0x000055a9d8cd1628 in mysql_parse (thd=0x62b000230218, rawbuf=0x62b000237238 "UPDATE t1 SET col1 = 2702 LIMIT 2  /* E_R Thread24 QNO 409 CON_ID 38 */", length=71, parser_state=0x7ffa4d83eb20) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:8029
      #17 0x000055a9d8ca99cd in dispatch_command (command=COM_QUERY, thd=0x62b000230218, packet=0x629000e3d219 " UPDATE t1 SET col1 = 2702 LIMIT 2  /* E_R Thread24 QNO 409 CON_ID 38 */ ", packet_length=73, blocking=true)
          at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:1896
      #18 0x000055a9d8ca6da5 in do_command (thd=0x62b000230218, blocking=true) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_parse.cc:1404
      #19 0x000055a9d90a72d4 in do_handle_one_connection (connect=0x608000003bb8, put_in_cache=true) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_connect.cc:1418
      #20 0x000055a9d90a6b60 in handle_one_connection (arg=0x608000003bb8) at /data/Server/bb-10.6-MDEV-20605-cur-pos-fix/sql/sql_connect.cc:1312
      #21 0x0000470645dd1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #22 0x000013540bb10293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr)
       
      pluto:/data/results/1643649542/TBR-1342/dev/shm/rqg/1643649542/10/1/rr
       
      RQG
      ===
      # git clone https://github.com/mleich1/rqg --branch experimental RQG
      #
      # GIT_SHOW: HEAD -> experimental, origin/experimental 067340e8ec78c25880ccd2e67dd62ad134edc531 2022-01-31T17:10:39+01:00
      # rqg.pl  : Version 4.0.4 (2021-12)
      #
      # $RQG_HOME/rqg.pl \
      # --grammar=conf/mariadb/table_stress_innodb.yy \
      # --gendata=conf/mariadb/table_stress.zz \
      # --gendata_sql=conf/mariadb/table_stress.sql \
      # --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      # --mysqld=--loose-idle_write_transaction_timeout=0 \
      # --mysqld=--loose-idle_transaction_timeout=0 \
      # --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      # --mysqld=--connect_timeout=60 \
      # --mysqld=--interactive_timeout=28800 \
      # --mysqld=--slave_net_timeout=60 \
      # --mysqld=--net_read_timeout=30 \
      # --mysqld=--net_write_timeout=60 \
      # --mysqld=--loose-table_lock_wait_timeout=50 \
      # --mysqld=--wait_timeout=28800 \
      # --mysqld=--lock-wait-timeout=86400 \
      # --mysqld=--innodb-lock-wait-timeout=50 \
      # --no-mask \
      # --queries=10000000 \
      # --seed=random \
      # --reporters=Backtrace \
      # --reporters=ErrorLog \
      # --reporters=Deadlock1 \
      # --validators=None \
      # --mysqld=--log_output=none \
      # --mysqld=--log_bin_trust_function_creators=1 \
      # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      # --engine=InnoDB \
      # --restart_timeout=240 \
      # --mysqld=--plugin-load-add=file_key_management.so \
      # --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      # --mysqld=--plugin-load-add=provider_lzo.so \
      # --mysqld=--plugin-load-add=provider_bzip2.so \
      # --mysqld=--plugin-load-add=provider_lzma \
      # --mysqld=--plugin-load-add=provider_snappy \
      # --mysqld=--plugin-load-add=provider_lz4 \
      # --duration=300 \
      # --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      # --mysqld=--loose-innodb_read_only_compressed=OFF \
      # --mysqld=--innodb_stats_persistent=on \
      # --mysqld=--innodb_adaptive_hash_index=off \
      # --mysqld=--log-bin \
      # --mysqld=--sync-binlog=1 \
      # --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      # --mysqld=--loose-max-statement-time=30 \
      # --threads=33 \
      # --mysqld=--innodb-use-native-aio=0 \
      # --mysqld=--loose-gdb \
      # --mysqld=--loose-debug-gdb \
      # --rr=Extended \
      # --rr_options=--chaos --wait \
      # --mysqld=--innodb_rollback_on_timeout=OFF \
      # --vardir_type=fast \
      # --mysqld=--innodb_page_size=4K \
      # --mysqld=--innodb-buffer-pool-size=5M \
      # --no_mask \
      # <local settings>
      
      

      The MDEV-27701.test.diff test case shows the issue.

      The scenario is the following at least for 10.6:

      1. trx2 acquires lock on some record;

      2. trx3 requests conflicting with trx2 lock on the record, creates waiting lock and suspends. After that trx3 waiting is finished by timeout and trx3 tries to cancel waiting in lock_sys_t::cancel() call.

      3. trx2 starts record updating. It updates clustered record without changing ordering fields and changing the size of non-ordering fields. In that case inplace update is impossible, that's why the combination of delete+insert is executed. Before deleting record locks are moved from the deleting record to infimum, then the record is deleted, then new record is inserted, and then the locks are moved from the infimum to the inserted record. See the code between lock_rec_store_on_page_infimum() and lock_rec_restore_from_page_infimum() calls for details. So trx2 moved trx3 locks from the deleted record to the newly inserted record, changing trx->lock.wait_lock for trx3 in lock_rec_create_low(). lock_rec_create_low() changes trx->lock.wait_lock without locking lock_sys.wait_mutex. I suppose it was done because it was guessed that lock_rec_create_low() can be invoked only from the transaction owning the mutex. As we can see, lock_rec_move() can be invoked not from lock owning transaction.

      4. trx3 checks ut_ad(trx->lock.wait_lock == lock) in lock_sys_t::cancel(). Note the check happens under lock_sys.wait_mutex, which was acquired in lock_wait(). as trx->lock.wait_lock was changed by trx2 after locks moving, the assertion fails.

      Attachments

        Issue Links

          Activity

            People

              vlad.lesin Vladislav Lesin
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.