MariaDB Server / MDEV-27025

insert-intention lock conflicts with waiting ORDINARY lock

Details

    Description

      We have two transactions and one record. The first transaction holds an ORDINARY S-lock on the record. The second transaction has created a waiting ORDINARY X-lock and waits for the first transaction. Then the first transaction requests an insert-intention lock on the record, and this lock conflicts with the waiting ORDINARY X-lock of the second transaction, which causes a deadlock. Why should it conflict? The first transaction already holds a lock on the record, and the second transaction's lock has the "waiting" flag set.
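      To make the lock queue in this scenario concrete, here is a minimal C++ sketch. It is a simplified model, not InnoDB code: `RecLock`, `Mode`, `has_to_wait()`, `find_conflict()` and the helper functions are hypothetical names. `has_to_wait()` is assumed to paraphrase the record-lock wait rules of InnoDB's `lock_rec_has_to_wait()` (the supremum and Galera special cases are left out), and, like the conflict search discussed in this report, `find_conflict()` treats granted and waiting locks alike.

```cpp
#include <cassert>
#include <vector>

// Simplified, hypothetical model of the record-lock queue of ONE record.
// The flags mirror InnoDB's LOCK_GAP, LOCK_REC_NOT_GAP,
// LOCK_INSERT_INTENTION and LOCK_WAIT, but nothing here is InnoDB code.
enum Mode { MODE_S, MODE_X };

struct RecLock {
  int trx;                // owning transaction
  Mode mode;              // S or X
  bool gap;               // gap-only lock (LOCK_GAP)
  bool rec_not_gap;       // record-only lock (LOCK_REC_NOT_GAP)
  bool insert_intention;  // LOCK_INSERT_INTENTION
  bool waiting;           // enqueued but not yet granted (LOCK_WAIT)
};

// Only S/S is compatible in this two-mode model.
bool modes_compatible(Mode a, Mode b) { return a == MODE_S && b == MODE_S; }

// Paraphrase of the record-lock wait rules (supremum case omitted).
// Note: the `waiting` flag of `other` is not consulted.
bool has_to_wait(const RecLock& req, const RecLock& other) {
  assert(!(req.gap && req.rec_not_gap) && !(other.gap && other.rec_not_gap));
  if (req.trx == other.trx || modes_compatible(req.mode, other.mode))
    return false;
  // Gap requests without insert intention never wait.
  if (req.gap && !req.insert_intention) return false;
  // ORDINARY and REC_NOT_GAP requests do not wait for gap-only locks.
  if (!req.insert_intention && other.gap) return false;
  // A gap request does not wait for a record-only lock.
  if (req.gap && other.rec_not_gap) return false;
  // No request waits for an insert-intention lock.
  if (other.insert_intention) return false;
  return true;
}

// Returns the first lock in `queue` that `req` has to wait for, or nullptr.
// Granted and waiting locks are treated alike.
const RecLock* find_conflict(const std::vector<RecLock>& queue,
                             const RecLock& req) {
  for (const RecLock& l : queue)
    if (has_to_wait(req, l)) return &l;
  return nullptr;
}

// Queue from the report: trx 1 holds a granted ORDINARY S lock and
// trx 2 has a waiting ORDINARY X lock behind it.
std::vector<RecLock> reported_queue() {
  return {
      {1, MODE_S, false, false, false, false},
      {2, MODE_X, false, false, false, true},
  };
}

// LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION requested by `trx`.
RecLock insert_intention_request(int trx) {
  return {trx, MODE_X, true, false, true, false};
}
```

      With `auto q = reported_queue();`, the call `find_conflict(q, insert_intention_request(1))` returns trx 2's waiting ORDINARY X lock. Trx 1 would therefore be enqueued behind trx 2 while trx 2 is itself waiting for trx 1, which is a two-transaction wait-for cycle. The returned pointer refers into `q`, so `q` must outlive it.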

      Let's take a look at the 10.6 code:

      dberr_t                                                                      
      lock_rec_insert_check_and_lock(...)
      {
      ...
            const unsigned type_mode= LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION;      
       
            if (lock_t *c_lock= lock_rec_other_has_conflicting(type_mode,             
                                                               g.cell(), id,          
                                                               heap_no, trx))         
            {                                                                         
              trx->mutex_lock();                                                      
              err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->frame,      
                                            heap_no, index, thr, nullptr);            
              trx->mutex_unlock();                                                    
            }
      ...
      }
      

      Neither lock_rec_insert_check_and_lock() nor lock_rec_other_has_conflicting() checks whether the conflicting lock is in the waiting state and whether it is already waiting for the transaction that requests the insert-intention lock.
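      The check described above can be sketched in the same simplified terms. This is a hypothetical illustration only, not the MariaDB code or fix: `Lock`, `waits_for` and `find_conflict_skipping_own_waiters()` are invented names, `has_to_wait()` is assumed to paraphrase InnoDB's record-lock wait rules, the model tracks only a direct wait edge, and it says nothing about what should happen to the skipped waiting lock after the insert is done.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified lock model (not InnoDB code).
struct Lock {
  int trx;           // owning transaction
  bool x;            // true = X mode, false = S mode
  bool gap;          // LOCK_GAP
  bool rec_not_gap;  // LOCK_REC_NOT_GAP
  bool ii;           // LOCK_INSERT_INTENTION
  bool waiting;      // LOCK_WAIT
  int waits_for;     // transaction this lock waits for, or -1 if granted
};

// Paraphrase of the record-lock wait rules; `waiting` is not consulted here.
bool has_to_wait(const Lock& req, const Lock& other) {
  if (req.trx == other.trx || (!req.x && !other.x)) return false;
  if (req.gap && !req.ii) return false;           // plain gap requests never wait
  if (!req.ii && other.gap) return false;         // record requests ignore gap-only locks
  if (req.gap && other.rec_not_gap) return false; // gap requests ignore record-only locks
  return !other.ii;                               // nobody waits for an insert intention
}

// Hypothetical conflict search with the extra check the report asks about:
// a waiting lock whose owner is already waiting for the requesting
// transaction is skipped instead of being reported as a conflict.
const Lock* find_conflict_skipping_own_waiters(const std::vector<Lock>& queue,
                                               const Lock& req) {
  for (const Lock& l : queue) {
    assert(l.waiting == (l.waits_for >= 0));  // model invariant
    if (l.waiting && l.waits_for == req.trx) continue;
    if (has_to_wait(req, l)) return &l;
  }
  return nullptr;
}
```

      In the reported queue (trx 1 holding a granted ORDINARY S lock, trx 2 waiting for trx 1 with an ORDINARY X lock), trx 1's insert-intention request skips trx 2's lock and finds no conflict. The same request from an unrelated trx 3 is still blocked by trx 1's granted S lock.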

      The test is attached: ii-conflicts-waiting.test

          Activity

            vlad.lesin Vladislav Lesin created issue -
            vlad.lesin Vladislav Lesin made changes -
            Field Original Value New Value
            vlad.lesin Vladislav Lesin made changes -
            Description
            vlad.lesin Vladislav Lesin made changes -
            Description
            vlad.lesin Vladislav Lesin made changes -
            Description
            marko Marko Mäkelä made changes -
            Summary insert-intetion lock conflicts with waiting ORDINARY lock insert-intention lock conflicts with waiting ORDINARY lock
            vlad.lesin Vladislav Lesin made changes -
            Description
            valerii Valerii Kravchuk added a comment - See also https://bugs.mysql.com/bug.php?id=21356
            vlad.lesin Vladislav Lesin made changes -
            Affects Version/s 10.2 [ 14601 ]
            Affects Version/s 10.3 [ 22126 ]
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            vlad.lesin Vladislav Lesin made changes -
            Description
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            vlad.lesin Vladislav Lesin made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 127341 ] MariaDB v4 [ 144616 ]
            mleich Matthias Leich added a comment - - edited

            Not a rare problem; observed during RQG testing.
            sdp:/data/results/1639496318/TBR-1300/dev/shm/rqg/1639496318/112/1/rr
             
            (rr) bt
            #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1  0x00007f0ae1e11859 in __GI_abort () at abort.c:79
            #2  0x00007f0ae1e11729 in __assert_fail_base (fmt=0x7f0ae1fa7588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55ef2ea609e0 "lock_rec_has_expl(LOCK_X | 1024U, cell, id, heap_no, impl_trx)", 
                file=0x55ef2ea59580 "/data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc", line=4676, function=<optimized out>) at assert.c:92
            #3  0x00007f0ae1e22f36 in __GI___assert_fail (assertion=0x55ef2ea609e0 "lock_rec_has_expl(LOCK_X | 1024U, cell, id, heap_no, impl_trx)", file=0x55ef2ea59580 "/data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc", line=4676, 
                function=0x55ef2ea605e0 "bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*)") at assert.c:101
            #4  0x000055ef2d6eafc5 in lock_rec_queue_validate (locked_lock_trx_sys=false, id=..., rec=0x7f0ad5ab0369 "\200", index=0x6160070ec708, offsets=0x7f0abf5108c0) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:4676
            #5  0x000055ef2d6edf25 in lock_rec_insert_check_and_lock (rec=0x7f0ad5ab0143 "\200", block=0x7f0ad470ed70, index=0x6160070ec708, thr=0x620000326868, mtr=0x7f0abf511a70, inherit=0x7f0abf510e50)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:5034
            #6  0x000055ef2dab9ae8 in btr_cur_ins_lock_and_undo (flags=0, cursor=0x7f0abf511660, entry=0x616006f75408, thr=0x620000326868, mtr=0x7f0abf511a70, inherit=0x7f0abf510e50) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/btr/btr0cur.cc:3271
            #7  0x000055ef2dabb3fd in btr_cur_optimistic_insert (flags=0, cursor=0x7f0abf511660, offsets=0x7f0abf511620, heap=0x7f0abf511600, entry=0x616006f75408, rec=0x7f0abf511640, big_rec=0x7f0abf5115e0, n_ext=0, thr=0x620000326868, mtr=0x7f0abf511a70)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/btr/btr0cur.cc:3515
            #8  0x000055ef2d8a1590 in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x6160070ec708, n_uniq=1, entry=0x616006f75408, n_ext=0, thr=0x620000326868) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0ins.cc:2759
            #9  0x000055ef2d8a3cf2 in row_ins_clust_index_entry (index=0x6160070ec708, entry=0x616006f75408, thr=0x620000326868, n_ext=0) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0ins.cc:3230
            #10 0x000055ef2d8a45f1 in row_ins_index_entry (index=0x6160070ec708, entry=0x616006f75408, thr=0x620000326868) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0ins.cc:3356
            #11 0x000055ef2d8a5661 in row_ins_index_entry_step (node=0x620000326630, thr=0x620000326868) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0ins.cc:3524
            #12 0x000055ef2d8a6020 in row_ins (node=0x620000326630, thr=0x620000326868) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0ins.cc:3670
            #13 0x000055ef2d8a7148 in row_ins_step (thr=0x620000326868) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0ins.cc:3816
            #14 0x000055ef2d8e62e3 in row_insert_for_mysql (mysql_rec=0x6190005884d0 "\377\001", prebuilt=0x620000326108, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0mysql.cc:1318
            #15 0x000055ef2d55dd15 in ha_innobase::write_row (this=0x61d0014866b8, record=0x6190005884d0 "\377\001") at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:7836
            #16 0x000055ef2cc8cc86 in handler::ha_write_row (this=0x61d0014866b8, buf=0x6190005884d0 "\377\001") at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/handler.cc:7519
            #17 0x000055ef2c3e1820 in write_record (thd=0x62b00016c218, table=0x619000587f98, info=0x7f0abf512e60, sink=0x0) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_insert.cc:2146
            #18 0x000055ef2c3d9fd9 in mysql_insert (thd=0x62b00016c218, table_list=0x62b0001733b0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_insert.cc:1123
            #19 0x000055ef2c4971a3 in mysql_execute_command (thd=0x62b00016c218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:4565
            #20 0x000055ef2c4ae54a in mysql_parse (thd=0x62b00016c218, rawbuf=0x62b000173238 "INSERT INTO unrelated (a) VALUES ( 1) /* E_R Thread8 QNO 93 CON_ID 22 */", length=72, parser_state=0x7f0abf513b20) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:8030
            #21 0x000055ef2c4867c1 in dispatch_command (command=COM_QUERY, thd=0x62b00016c218, packet=0x629000c8f219 "INSERT INTO unrelated (a) VALUES ( 1) /* E_R Thread8 QNO 93 CON_ID 22 */ ", packet_length=73, blocking=true)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:1896
            #22 0x000055ef2c483b99 in do_command (thd=0x62b00016c218, blocking=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:1404
            #23 0x000055ef2c883cfc in do_handle_one_connection (connect=0x608000003338, put_in_cache=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_connect.cc:1418
            #24 0x000055ef2c883588 in handle_one_connection (arg=0x608000003338) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_connect.cc:1312
            #25 0x00007f0ae2339609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #26 0x00007f0ae1f0e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr)
             
            mysqld: /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:4676: bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `lock_rec_has_expl(LOCK_X | 1024U, cell, id, heap_no, impl_trx)' failed.
            for some INSERT INTO unrelated (a) VALUES ( 1)
            Status: NOT_KILLED
             
             
            sdp:/data/results/1639496318/TBR-1301/dev/shm/rqg/1639496318/38/1/rr
            (rr) bt
            #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1  0x00007fa520f7d859 in __GI_abort () at abort.c:79
            #2  0x00007fa520f7d729 in __assert_fail_base (fmt=0x7fa521113588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x559bd2c76c40 "!other_lock", file=0x559bd2c74580 "/data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc", line=4721, 
                function=<optimized out>) at assert.c:92
            #3  0x00007fa520f8ef36 in __GI___assert_fail (assertion=0x559bd2c76c40 "!other_lock", file=0x559bd2c74580 "/data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc", line=4721, 
                function=0x559bd2c7b5e0 "bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*)") at assert.c:101
            #4  0x0000559bd1906680 in lock_rec_queue_validate (locked_lock_trx_sys=false, id=..., rec=0x77414acd02c4 "\200", index=0x616005566f08, offsets=0x655873840a70) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:4721
            #5  0x0000559bd190ca0c in lock_clust_rec_read_check_and_lock (flags=0, block=0x774149e77e50, rec=0x77414acd02c4 "\200", index=0x616005566f08, offsets=0x655873840a70, mode=LOCK_X, gap_mode=0, thr=0x62100015ea10)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:5532
            #6  0x0000559bd1b68de9 in sel_set_rec_lock (pcur=0x62100015e368, rec=0x77414acd02c4 "\200", index=0x616005566f08, offsets=0x655873840a70, mode=3, type=0, thr=0x62100015ea10, mtr=0x655873840d50)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0sel.cc:1326
            #7  0x0000559bd1b7f270 in row_search_mvcc (buf=0x61a00011d6b8 "\377\377", mode=PAGE_CUR_G, prebuilt=0x62100015e188, match_mode=0, direction=0) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/row/row0sel.cc:5186
            #8  0x0000559bd177f0cb in ha_innobase::index_read (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:9016
            #9  0x0000559bd1781a48 in ha_innobase::index_first (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377") at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:9377
            #10 0x0000559bd1781c86 in ha_innobase::rnd_next (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377") at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:9470
            #11 0x0000559bd0e8ae3c in handler::ha_rnd_next (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377") at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/handler.cc:3396
            #12 0x0000559bd1294667 in rr_sequential (info=0x655873841710) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/records.cc:519
            #13 0x0000559bd051ab26 in READ_RECORD::read_record (this=0x655873841710) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/records.h:81
            #14 0x0000559bd12e1a87 in mysql_delete (thd=0x62b00012d218, table_list=0x62b0001343b8, conds=0x62b000135088, order_list=0x62b000131e60, limit=18446744073709551615, options=0, result=0x0) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_delete.cc:796
            #15 0x0000559bd06b4680 in mysql_execute_command (thd=0x62b00012d218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:4807
            #16 0x0000559bd06c954a in mysql_parse (thd=0x62b00012d218, rawbuf=0x62b000134238 "DELETE FROM t6 WHERE col2 = 16 OR col2 IS NULL  /* E_R Thread2 QNO 11207 CON_ID 55 */", length=85, parser_state=0x655873842b20)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:8030
            #17 0x0000559bd06a17c1 in dispatch_command (command=COM_QUERY, thd=0x62b00012d218, packet=0x629006095219 " DELETE FROM t6 WHERE col2 = 16 OR col2 IS NULL  /* E_R Thread2 QNO 11207 CON_ID 55 */ ", packet_length=87, blocking=true)
                at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:1896
            #18 0x0000559bd069eb99 in do_command (thd=0x62b00012d218, blocking=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:1404
            #19 0x0000559bd0a9ecfc in do_handle_one_connection (connect=0x608000038738, put_in_cache=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_connect.cc:1418
            #20 0x0000559bd0a9e588 in handle_one_connection (arg=0x608000003038) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_connect.cc:1312
            #21 0x00007fa521174609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #22 0x00007fa52107a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr)
            mysqld: /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:4721: bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `!other_lock' failed.
            for some DELETE FROM t6 WHERE col2 = 16 OR col2 IS NULL
            Status: KILL_TIMEOUT
            
            

(this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:9016 #9 0x0000559bd1781a48 in ha_innobase::index_first (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377") at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:9377 #10 0x0000559bd1781c86 in ha_innobase::rnd_next (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377") at /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/handler/ha_innodb.cc:9470 #11 0x0000559bd0e8ae3c in handler::ha_rnd_next (this=0x61d000e664b8, buf=0x61a00011d6b8 "\377\377") at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/handler.cc:3396 #12 0x0000559bd1294667 in rr_sequential (info=0x655873841710) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/records.cc:519 #13 0x0000559bd051ab26 in READ_RECORD::read_record (this=0x655873841710) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/records.h:81 #14 0x0000559bd12e1a87 in mysql_delete (thd=0x62b00012d218, table_list=0x62b0001343b8, conds=0x62b000135088, order_list=0x62b000131e60, limit=18446744073709551615, options=0, result=0x0) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_delete.cc:796 #15 0x0000559bd06b4680 in mysql_execute_command (thd=0x62b00012d218, is_called_from_prepared_stmt=false) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:4807 #16 0x0000559bd06c954a in mysql_parse (thd=0x62b00012d218, rawbuf=0x62b000134238 "DELETE FROM t6 WHERE col2 = 16 OR col2 IS NULL /* E_R Thread2 QNO 11207 CON_ID 55 */", length=85, parser_state=0x655873842b20) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:8030 #17 0x0000559bd06a17c1 in dispatch_command (command=COM_QUERY, thd=0x62b00012d218, packet=0x629006095219 " DELETE FROM t6 WHERE col2 = 16 OR col2 IS NULL /* E_R Thread2 QNO 11207 CON_ID 55 */ ", packet_length=87, blocking=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:1896 #18 
0x0000559bd069eb99 in do_command (thd=0x62b00012d218, blocking=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_parse.cc:1404 #19 0x0000559bd0a9ecfc in do_handle_one_connection (connect=0x608000038738, put_in_cache=true) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_connect.cc:1418 #20 0x0000559bd0a9e588 in handle_one_connection (arg=0x608000003038) at /data/Server/bb-10.6-MDEV-27025-deadlock/sql/sql_connect.cc:1312 #21 0x00007fa521174609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #22 0x00007fa52107a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (rr) mysqld: /data/Server/bb-10.6-MDEV-27025-deadlock/storage/innobase/lock/lock0lock.cc:4721: bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `!other_lock' failed. for some DELETE FROM t6 WHERE col2 = 16 OR col2 IS NULL Status: KILL_TIMEOUT
            vlad.lesin Vladislav Lesin made changes -
            Comment [ A comment with security level 'Developers' was removed. ]
            vlad.lesin Vladislav Lesin added a comment - - edited

            Some update. While fixing the above bugs, I found out that the initial fix is wrong. We need to preserve the invariant that any lock in a lock queue can wait only for a lock located before it in the queue. I added the corresponding code, but RQG testing showed one more crash, which I have not fixed yet. See the commit message for details (https://github.com/MariaDB/server/tree/bb-10.6-MDEV-27025-deadlock).
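
The invariant can be illustrated with a deliberately simplified model of a single record's lock queue. This is a sketch under stated assumptions, not InnoDB code: `Lock`, `Mode`, `conflicts()`, `blocker_before()` and `queue_invariant_holds()` are hypothetical names, only S and X modes are modelled, and gap and insert-intention flags are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model of one record's lock queue.
// Only S/X modes are modelled; gap and insert-intention flags are ignored.
enum class Mode { S, X };

struct Lock {
  int trx_id;    // owning transaction
  Mode mode;     // requested lock mode
  bool waiting;  // true if the lock is enqueued but not granted
};

// Locks of different transactions conflict unless both are shared.
bool conflicts(const Lock &a, const Lock &b) {
  return a.trx_id != b.trx_id && (a.mode == Mode::X || b.mode == Mode::X);
}

// Returns the index of the first lock located before position i that
// conflicts with queue[i], or -1 if there is none. Only earlier locks
// (granted or waiting) are examined.
long blocker_before(const std::vector<Lock> &queue, std::size_t i) {
  for (std::size_t j = 0; j < i; j++)
    if (conflicts(queue[j], queue[i]))
      return static_cast<long>(j);
  return -1;
}

// The invariant: every waiting lock is blocked by some lock that precedes
// it in the queue. A waiting lock with no earlier conflicting lock would
// either be waiting for a lock behind it or should already be granted.
bool queue_invariant_holds(const std::vector<Lock> &queue) {
  for (std::size_t i = 0; i < queue.size(); i++)
    if (queue[i].waiting && blocker_before(queue, i) < 0)
      return false;
  return true;
}
```

In the scenario from the description (trx 1 holds a granted S-lock, trx 2 waits for an X-lock), a new request of trx 1 appended at the tail finds trx 2's waiting lock as its blocker, while trx 2 is blocked by trx 1's S-lock at the head, so each transaction waits for the other.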

            vlad.lesin Vladislav Lesin added a comment -

            Update. I got rid of the SEGFAULT mentioned in the previous comment, but RQG testing still shows a case where there are two granted conflicting locks on the same record. It happens during "XA PREPARE", when all S-locks of the XA transaction are released and the X-lock of the other transaction is granted because it is located before the conflicting XA X-lock in the queue, even though the XA X-lock was created before the granted X-lock of the other transaction. So the question is how such an order of locks in the queue is possible. Still debugging it.

            vlad.lesin Vladislav Lesin added a comment -

            Update. I have fixed the above error. Matthias created a special RQG config file to trigger the above bugs; local RQG testing of my fix with this file shows none of the above errors. I pushed my branch for buildbot testing and, in parallel, launched local RQG testing with the InnoDB_standard.cc config.
            marko Marko Mäkelä made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]

            marko Marko Mäkelä added a comment -

            This is great work. I cannot see anything obviously wrong in the logic. I pushed some suggested cleanup. If you agree and tests pass, it should eventually be merged into the final commit.

            mleich, please test this extensively with RQG. Unfortunately, we had to relax a debug assertion in lock_rec_queue_validate() due to the improved conflict handling. Instead of asserting that the conflicting lock is exclusive, we now assert that it is at least shared.
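
To illustrate what "at least shared" means, here is a minimal sketch of a lock-mode strength check. It is loosely modelled on InnoDB's `lock_mode_stronger_or_eq()` as we understand it; the enum, the table contents and the names `old_check()` and `relaxed_check()` are assumptions made for this example, not the actual assertion code.

```cpp
#include <cassert>

// Lock modes (assumption: the five InnoDB modes; AI = AUTO-INC).
enum LockMode { IS, IX, S, X, AI, N_MODES };

// kStronger[a][b] is true if holding mode a implies the rights of mode b.
constexpr bool kStronger[N_MODES][N_MODES] = {
    //          IS     IX     S      X      AI
    /* IS */ {true,  false, false, false, false},
    /* IX */ {true,  true,  false, false, false},
    /* S  */ {true,  false, true,  false, false},
    /* X  */ {true,  true,  true,  true,  true},
    /* AI */ {false, false, false, false, true},
};

constexpr bool stronger_or_eq(LockMode a, LockMode b) { return kStronger[a][b]; }

// Previous debug expectation: the explicit lock must be exclusive.
constexpr bool old_check(LockMode held) { return stronger_or_eq(held, X); }

// Relaxed expectation: a shared (or stronger) lock is sufficient.
constexpr bool relaxed_check(LockMode held) { return stronger_or_eq(held, S); }
```

With the relaxed check, a held S-lock now passes, an X-lock still passes, and intention modes such as IX still fail.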
            marko Marko Mäkelä made changes -
            Assignee Vladislav Lesin [ vlad.lesin ] Matthias Leich [ mleich ]

            mleich Matthias Leich added a comment -

            origin/bb-10.6-MDEV-27025-deadlock 4eaea8c5032b9040394bc4138d28c0cb9e29caab 2022-01-04T15:01:42+02:00
            behaved well in RQG testing. Failing tests are caused either by bad effects also known from other branches
            or by weaknesses in RQG.
            mleich Matthias Leich made changes -
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            mleich Matthias Leich made changes -
            Assignee Matthias Leich [ mleich ] Vladislav Lesin [ vlad.lesin ]

            serg Sergei Golubchik added a comment -

            I think it has to be backported, at least to 10.5.
            vlad.lesin Vladislav Lesin added a comment - - edited

            Update: I have started backporting trx_lock_t::wait_trx to 10.5. I have also ported some debug checks; some of them fail on mtr tests due to an incorrect trx_lock_t::wait_trx reset. The locking code differs between 10.5 and 10.6. For example, in 10.5 lock_rec_move_low() invokes lock_reset_lock_and_trx_wait() before lock_rec_add_to_queue(). lock_reset_lock_and_trx_wait() resets trx_lock_t::wait_trx, which causes an assertion failure in lock_rec_add_to_queue(). 10.6 does not invoke lock_reset_lock_and_trx_wait() from lock_rec_move(); it performs the necessary actions inline instead. So I need to catch all such errors.
            vlad.lesin Vladislav Lesin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            vlad.lesin Vladislav Lesin added a comment -

            Update: backported trx_lock_t::wait_trx and the fix to 10.5 (branch bb-10.5-MDEV-27025-deadlock). The branch is not fully tested; code cleanup and review are necessary after RQG testing.

            vlad.lesin Vladislav Lesin added a comment -

            Update: did some code cleanup, fixed an embedded server compilation error caused by my changes, did some code research to make sure my changes will not break Galera cluster and innodb-lock-schedule-algorithm=VATS, and passed the branch on for RQG testing. My local RQG testing has not finished yet, but it looks promising; at least I have not seen any crashes during several hours.
            vlad.lesin Vladislav Lesin made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]
            vlad.lesin Vladislav Lesin made changes -
            Assignee Vladislav Lesin [ vlad.lesin ] Matthias Leich [ mleich ]

            mleich Matthias Leich added a comment -

            origin/bb-10.5-MDEV-27025-deadlock 1519b9a7aa774a6b728e29f107c16b6d713ce647 2022-01-12T19:32:22+03:00
            behaved well in RQG testing. The bad effects observed also occur on current MariaDB versions.
            mleich Matthias Leich made changes -
            Assignee Matthias Leich [ mleich ] Vladislav Lesin [ vlad.lesin ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]

            vlad.lesin Vladislav Lesin added a comment -

            I squashed the commits and rebased the branches. In the 10.6 branch I added some details to the comment to clarify why we don't need to acquire lock_sys.wait_mutex to check trx->lock.wait_trx. marko, you have already reviewed the 10.6 branch; could you please also review the 10.5 branch?
            vlad.lesin Vladislav Lesin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            vlad.lesin Vladislav Lesin added a comment -

            The bb-10.5-MDEV-27025-deadlock branch needs review.
            vlad.lesin Vladislav Lesin made changes -
            Assignee Vladislav Lesin [ vlad.lesin ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            marko Marko Mäkelä added a comment -

            The 10.6 version has already been reviewed and tested. That one is OK to push.

            Backporting to earlier versions is tricky and potentially risky, because the locking subsystem was heavily refactored in the 10.6 release.

            I posted some minor comments on the 10.5 version. I did not find anything really wrong there. I think we’d better wait for additional test results from our customer before pushing the 10.5 version.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Vladislav Lesin [ vlad.lesin ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.5.14 [ 26809 ]
            Fix Version/s 10.6.6 [ 26811 ]
            Fix Version/s 10.7.2 [ 26813 ]
            Fix Version/s 10.8.1 [ 26815 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            vlad.lesin Vladislav Lesin made changes -
            Fix Version/s 10.3.35 [ 27512 ]
            Fix Version/s 10.4.25 [ 27510 ]

            marko Marko Mäkelä added a comment -

            This change was reverted because it caused the incorrect-result bug MDEV-27992 when the PRIMARY KEY of a table was concurrently updated to a smaller value.

            vlad.lesin Vladislav Lesin added a comment -

            There is an interesting commit in MySQL 8: "Bug #11745929 CHANGE LOCK PRIORITY SO THAT THE TRANSACTION HOLDING S-LOCK GETS X-LOCK". It should be analyzed; it might fix the issue.
            vlad.lesin Vladislav Lesin added a comment - - edited

            I have analysed the "Bug #11745929 CHANGE LOCK PRIORITY SO THAT THE TRANSACTION HOLDING S-LOCK GETS X-LOCK" commit from Oracle.

            The main reason why we reverted MDEV-27025 is MDEV-27992. Let's take another look at MDEV-27992. Suppose we don't have the MDEV-27025 fix, and trx 1 holds an ordinary X-lock on some record. Trx 2 executes "DELETE FROM t" or "SELECT * FOR UPDATE" in REPEATABLE READ (RR), which creates a waiting ordinary X-lock on the same record. Then trx 1 wants to insert a record just before the locked record, so it requests an insert-intention lock. The only thing that prevents trx 1 from inserting the new record is trx 2's conflicting waiting X-lock. That is why MDEV-27025 is not a bug, it's a feature: if trx 1's insert-intention lock did not conflict with trx 2's waiting ordinary X-lock, there would be phantom records in RR. So there is nothing to fix.
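
The conflict rule behind this argument can be sketched as a simplified "has to wait" check for a record lock request against one existing lock, loosely modelled on the rules of InnoDB's `lock_rec_has_to_wait()` as we understand them. The flag names and values are hypothetical stand-ins for InnoDB's `type_mode` bits, and only S and X modes are modelled. Note that the check does not look at whether the other lock is waiting; that is what keeps the insert-intention request behind trx 2's waiting ordinary (next-key) lock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for InnoDB record-lock type_mode bits.
// An ORDINARY (next-key) lock has neither GAP nor REC_NOT_GAP set.
enum : std::uint32_t {
  MODE_S = 0,
  MODE_X = 1,
  MODE_MASK = 0xF,
  GAP = 1u << 4,               // covers only the gap before the record
  REC_NOT_GAP = 1u << 5,       // covers only the record itself
  INSERT_INTENTION = 1u << 6,  // gap lock requested by an inserter
};

struct RecLock {
  int trx_id;
  std::uint32_t type_mode;
};

// Does 'req' have to wait for 'other' (granted or waiting) on the same
// record? 'on_supremum' says whether the record is the page supremum.
bool has_to_wait(const RecLock &req, const RecLock &other, bool on_supremum) {
  if (req.trx_id == other.trx_id) return false;
  // Only S and X are modelled: two S locks are compatible.
  if ((req.type_mode & MODE_MASK) == MODE_S &&
      (other.type_mode & MODE_MASK) == MODE_S)
    return false;
  const bool ii = (req.type_mode & INSERT_INTENTION) != 0;
  // Plain gap requests, and requests on the supremum, never wait.
  if ((on_supremum || (req.type_mode & GAP)) && !ii) return false;
  // Only an insert-intention request has to wait for a gap lock.
  if (!ii && (other.type_mode & GAP)) return false;
  // A gap request does not wait for a record-only lock.
  if ((req.type_mode & GAP) && (other.type_mode & REC_NOT_GAP)) return false;
  // No request has to wait for an insert-intention lock.
  if (other.type_mode & INSERT_INTENTION) return false;
  return true;
}
```

For trx 1's request `MODE_X | GAP | INSERT_INTENTION` against trx 2's ordinary `MODE_X` lock, `has_to_wait()` returns true, whereas a record-only (`REC_NOT_GAP`) X-lock of trx 2 would not block the insert.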

            But what if trx 1 holds an S-lock and tries to acquire an X-lock on the same record after trx 2 has created a waiting X-lock? Why should trx 1 wait for trx 2? This is exactly the case addressed by the above MySQL fix. More generally, the rule can be formulated as "insert-intention locks must not overtake waiting ordinary or gap locks".
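
One way to express this rule (a transaction that already holds a granted lock on the record may overtake other transactions' waiting requests, but an insert-intention request may not) is sketched below. This is our own simplified illustration of the idea under discussion, not the MySQL implementation: `Lock`, `modes_conflict()` and `must_wait()` are hypothetical, and gap semantics are reduced to a plain S/X conflict.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified record lock used only in this sketch.
struct Lock {
  int trx_id;
  bool exclusive;         // X if true, S otherwise
  bool insert_intention;  // insert-intention request
  bool waiting;           // enqueued but not yet granted
};

static bool modes_conflict(const Lock &a, const Lock &b) {
  return a.exclusive || b.exclusive;
}

// Does 'req' have to wait, given the record's current lock queue?
// Assumed rule: if the requesting transaction already holds a granted lock
// on the record, it may overtake other transactions' WAITING locks, except
// for insert-intention requests, which must not overtake waiting locks.
bool must_wait(const std::vector<Lock> &queue, const Lock &req) {
  bool holds_granted = false;
  for (const Lock &l : queue)
    if (l.trx_id == req.trx_id && !l.waiting) holds_granted = true;

  for (const Lock &l : queue) {
    if (l.trx_id == req.trx_id || !modes_conflict(l, req)) continue;
    if (l.waiting && holds_granted && !req.insert_intention)
      continue;  // overtake the other transaction's waiting lock
    return true;
  }
  return false;
}
```

Under this rule, trx 1's S-to-X upgrade does not wait for trx 2's waiting X-lock, while trx 1's insert-intention request in the scenario from the description still has to wait.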

            I think it could be useful for us to port that commit to fix the above issue. Besides, it contains a useful optimization:

            `lock_rec_find_set_bit`, which searches for the first set bit in a bitmap, used a bit-by-bit loop. Now it uses a 13 times faster implementation that tries to skip 64, then 32, 16, or 8 bits at a time. This is important for WAITING locks, which have just a single bit set in a bitmap whose number of bits equals the number of records on a page (which can be ~500).
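
The word-skipping idea can be sketched as follows. This is not Oracle's implementation: `find_first_set_bit()` is a hypothetical name, and the bit layout (bit i stored as bit i % 8 of byte i / 8) is an assumption. `memcpy` is used to read multi-byte chunks without alignment or aliasing problems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the index of the first set bit in a bitmap of n_bytes bytes, or
// SIZE_MAX if no bit is set. Assumed layout: bit i is bit (i % 8) of byte i / 8.
std::size_t find_first_set_bit(const unsigned char *bitmap, std::size_t n_bytes) {
  std::size_t i = 0;
  // Skip all-zero chunks of 8, then 4, then 2 bytes (64, 32, 16 bits).
  static const std::size_t chunks[] = {8, 4, 2};
  for (std::size_t chunk : chunks) {
    while (i + chunk <= n_bytes) {
      std::uint64_t word = 0;
      std::memcpy(&word, bitmap + i, chunk);  // byte order is irrelevant for != 0
      if (word != 0) break;
      i += chunk;
    }
  }
  // Finish byte by byte (8 bits), then bit by bit inside the first non-zero byte.
  for (; i < n_bytes; i++)
    if (bitmap[i] != 0)
      for (unsigned b = 0; b < 8; b++)
        if (bitmap[i] & (1u << b)) return i * 8 + b;
  return SIZE_MAX;
}
```

For a waiting lock with a single bit set in a bitmap of a few hundred bits, most of the bitmap is skipped in 64-bit steps, and only the final byte is examined bit by bit.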

            If we port it at some point, we should also port "Bug #34123159 Assertion failure: lock0lock.cc:5161:lock_rec_has_expl(LOCK_X | LOCK_REC_NOT_GAP".

            One more note. In the MDEV-27992 case, DELETE converts an implicit lock to an explicit one even though the conflicting transaction already holds an ordinary X-lock on the record. That is because lock_rec_convert_impl_to_expl_for_trx() checks only LOCK_X | LOCK_REC_NOT_GAP when it looks for existing explicit locks. We could include LOCK_ORDINARY in the search as well.

            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 202352 201658 136328
            Zendesk active tickets 201658
            monty Michael Widenius added a comment - - edited

            What was done as part of fixing this issue? It is not clear whether anything from vlad's comment above has been addressed.

            Is the issue in https://github.com/mysql/mysql-server/commit/7037a0bdc83196755a3bf3e935cfb3c0127715d5 addressed now?
            The above commit includes a test case. Does that test case now work in MariaDB?

            monty Michael Widenius added a comment -

            Note that in MariaDB 10.6 we have now optimized the bit operations in my_bitmap.cc.
            There is no looping over bits anymore, and all operations are done on 64 bits at a time.
            vlad.lesin Vladislav Lesin made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            vlad.lesin Vladislav Lesin made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            vlad.lesin Vladislav Lesin added a comment -

            I filed MDEV-34877 for porting MySQL's "Bug #11745929".
            vlad.lesin Vladislav Lesin made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            JIraAutomate JiraAutomate made changes -
            Fix Version/s 10.5.16 [ 27508 ]
            Fix Version/s 10.6.8 [ 27506 ]
            Fix Version/s 10.7.4 [ 27504 ]
            Fix Version/s 10.8.3 [ 27502 ]
            marko Marko Mäkelä added a comment -

            In order to run the test ii-conflicts-waiting.test on MariaDB Server 10.6 or later, the following adjustment is needed:

            @@ -18,7 +18,7 @@
             
             --connect(con_del,localhost,root,,)
             SET DEBUG_SYNC = 'now WAIT_FOR ins_set_locks';
            -SET DEBUG_SYNC = 'lock_wait_suspend_thread_enter SIGNAL del_locked';
            +SET DEBUG_SYNC = 'lock_wait_start SIGNAL del_locked';
             ###############################################################################
             # This DELETE creates waiting ORDINARY X-lock for heap_no 2 as the record is
             # delete-marked, this lock conflicts with ORDINARY S-lock set by the the last
            

            vlad.lesin, it seems that MDEV-34877 does not fix the scenario of this test:

            11.4 30140c066d50f7e4ac4f490a9e081d9d605aea07

            mysqltest: At line 43: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction
            

            I tested this with both values of innodb_snapshot_isolation (MDEV-35124) and got the same result. Based on my reading of the analysis in MDEV-27992, this setting should make no difference in that scenario, but I did not check that by re-applying and retesting the original fix of MDEV-27025.

            The regression MDEV-27992, which forced us to revert the original fix, involves a scenario where some PRIMARY KEY columns are being updated, or more generally, if the same transaction is first deleting and then inserting rows in the same table. If the problems are only limited to only such scenarios, I wonder if we could treat those as a special case and enable the optimization in other cases. At the core of the MDEV-27992 fix is the added parameter bool insert_before_waiting, which is being set in the calls of lock_rec_add_to_queue() in lock_rec_convert_impl_to_expl_for_trx() and when lock_rec_other_has_conflicting() sets was_ignored in lock_rec_lock().

            marko Marko Mäkelä added a comment - In order to run the test ii-conflicts-waiting.test on MariaDB Server 10.6 or later, the following adjustment is needed. @@ -18,7 +18,7 @@ --connect(con_del,localhost,root,,) SET DEBUG_SYNC = 'now WAIT_FOR ins_set_locks'; -SET DEBUG_SYNC = 'lock_wait_suspend_thread_enter SIGNAL del_locked'; +SET DEBUG_SYNC = 'lock_wait_start SIGNAL del_locked'; ############################################################################### # This DELETE creates waiting ORDINARY X-lock for heap_no 2 as the record is # delete-marked, this lock conflicts with ORDINARY S-lock set by the the last vlad.lesin , it seems that MDEV-34877 is not fixing the scenario of this test: 11.4 30140c066d50f7e4ac4f490a9e081d9d605aea07 mysqltest: At line 43: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction I tested this with both values of innodb_snapshot_isolation ( MDEV-35124 ) and got the same result. Based on my reading of the analysis in MDEV-27992 , this setting should make no difference in that scenario, but I did not check that by re-applying and retesting the original fix of MDEV-27025 . The regression MDEV-27992 , which forced us to revert the original fix, involves a scenario where some PRIMARY KEY columns are being updated, or more generally, if the same transaction is first deleting and then inserting rows in the same table. If the problems are only limited to only such scenarios, I wonder if we could treat those as a special case and enable the optimization in other cases. At the core of the MDEV-27992 fix is the added parameter bool insert_before_waiting , which is being set in the calls of lock_rec_add_to_queue() in lock_rec_convert_impl_to_expl_for_trx() and when lock_rec_other_has_conflicting() sets was_ignored in lock_rec_lock() .

            People

              vlad.lesin Vladislav Lesin
              vlad.lesin Vladislav Lesin
              Votes: 1
              Watchers: 12
