[MDEV-32353] lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap Created: 2023-10-05  Updated: 2023-10-13  Resolved: 2023-10-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, XA
Affects Version/s: 10.6
Fix Version/s: 10.6.16

Type: Bug Priority: Blocker
Reporter: Roel Van de Paar Assignee: Vladislav Lesin
Resolution: Duplicate Votes: 0
Labels: debug, locking, regression

Issue Links:
Duplicate
duplicates MDEV-32272 lock_release_on_prepare_try() does no... Closed
Problem/Incident
is caused by MDEV-30165 X-lock on supremum for prepared trans... Closed
Relates

 Description   

--source include/have_innodb.inc
SET GLOBAL innodb_stats_persistent=DEFAULT;
CREATE TABLE t ENGINE=InnoDB AS SELECT 1;
XA START 'a';
INSERT INTO mysql.innodb_index_stats SELECT '','' AS table_name,index_name,LAST_UPDATE,stat_name,0 AS stat_value,sample_size,stat_description FROM mysql.innodb_index_stats WHERE table_name='dummy';  # Note the SELECT is empty
INSERT INTO t VALUES (1);
XA END 'a';
XA PREPARE 'a';

Leads to:

10.6.16 076df87b4c1dc5123d66600b82201050c45aa9d9 (Debug)

mariadbd: /test/10.6_dbg/storage/innobase/lock/lock0lock.cc:5046: bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `other_lock->is_waiting()' failed.

10.6.16 076df87b4c1dc5123d66600b82201050c45aa9d9 (Debug)

Core was generated by `/test/MD270923-mariadb-10.6.16-linux-x86_64-dbg/bin/mariadbd --no-defaults --co'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22878682601024)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x14cedbbfd640 (LWP 3118002))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22878682601024) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=22878682601024) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=22878682601024, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x000014cf0de42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000014cf0de287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000014cf0de2871b in __assert_fail_base (fmt=0x14cf0dfdd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5640babb2f76 "other_lock->is_waiting()", file=0x5640babb3628 "/test/10.6_dbg/storage/innobase/lock/lock0lock.cc", line=5046, function=<optimized out>) at ./assert/assert.c:92
#6  0x000014cf0de39e96 in __GI___assert_fail (assertion=0x5640babb2f76 "other_lock->is_waiting()", file=0x5640babb3628 "/test/10.6_dbg/storage/innobase/lock/lock0lock.cc", line=5046, function=0x5640babb4860 "bool lock_rec_queue_validate(bool, page_id_t, const rec_t*, const dict_index_t*, const rec_offs*)") at ./assert/assert.c:101
#7  0x00005640ba44b2a4 in lock_rec_queue_validate (locked_lock_trx_sys=locked_lock_trx_sys@entry=false, id=id@entry={m_id = 8589934595}, rec=rec@entry=0x14ceec624209 "testtGEN_CLUST_INDEXn_diff_pfx01", index=index@entry=0x14ceb000c0f8, offsets=offsets@entry=0x14cedbbfb100) at /test/10.6_dbg/storage/innobase/lock/lock0lock.cc:5046
#8  0x00005640ba455cd8 in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x14ceec00eba0, rec=rec@entry=0x14ceec624209 "testtGEN_CLUST_INDEXn_diff_pfx01", index=index@entry=0x14ceb000c0f8, offsets=offsets@entry=0x14cedbbfb100, mode=mode@entry=LOCK_S, gap_mode=1024, thr=0x14ceb00102a8) at /test/10.6_dbg/storage/innobase/lock/lock0lock.cc:5890
#9  0x00005640ba502ef4 in row_ins_set_shared_rec_lock (type=type@entry=1024, block=0x14ceec00eba0, rec=rec@entry=0x14ceec624209 "testtGEN_CLUST_INDEXn_diff_pfx01", index=0x14ceb000c0f8, offsets=offsets@entry=0x14cedbbfb100, thr=thr@entry=0x14ceb00102a8) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:1409
#10 0x00005640ba5101cb in row_ins_duplicate_error_in_clust (thr=0x14ceb00102a8, entry=0x14ceb0012a58, cursor=0x14cedbbfadc0, flags=0) at /test/10.6_dbg/storage/innobase/include/btr0cur.h:744
#11 row_ins_clust_index_entry_low (flags=flags@entry=0, mode=<optimized out>, mode@entry=BTR_MODIFY_LEAF, index=index@entry=0x14ceb000c0f8, n_uniq=n_uniq@entry=4, entry=entry@entry=0x14ceb0012a58, n_ext=n_ext@entry=0, thr=<optimized out>) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:2818
#12 0x00005640ba511380 in row_ins_clust_index_entry (index=index@entry=0x14ceb000c0f8, entry=entry@entry=0x14ceb0012a58, thr=thr@entry=0x14ceb00102a8, n_ext=n_ext@entry=0) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:3242
#13 0x00005640ba512b67 in row_ins_index_entry (thr=0x14ceb00102a8, entry=0x14ceb0012a58, index=0x14ceb000c0f8) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:3368
#14 row_ins_index_entry_step (thr=0x14ceb00102a8, node=0x14ceb000fff0) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:3536
#15 row_ins (thr=0x14ceb00102a8, node=0x14ceb000fff0) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:3661
#16 row_ins_step (thr=thr@entry=0x14ceb00102a8) at /test/10.6_dbg/storage/innobase/row/row0ins.cc:3790
#17 0x00005640ba4d2dda in que_thr_step (thr=0x14ceb00102a8) at /test/10.6_dbg/storage/innobase/que/que0que.cc:567
#18 que_run_threads_low (thr=0x14ceb00102a8) at /test/10.6_dbg/storage/innobase/que/que0que.cc:644
#19 que_run_threads (thr=0x14ceb00102a8) at /test/10.6_dbg/storage/innobase/que/que0que.cc:664
#20 0x00005640ba4d3391 in que_eval_sql (info=info@entry=0x14ceb0012618, sql=sql@entry=0x5640bac09718 "PROCEDURE INDEX_STATS_SAVE () IS\nBEGIN\nDELETE FROM \"mysql/innodb_index_stats\"\nWHERE\ndatabase_name = :database_name AND\ntable_name = :table_name AND\nindex_name = :index_name AND\nstat_name = :stat_name;"..., trx=trx@entry=0x14cf04822680) at /test/10.6_dbg/storage/innobase/que/que0que.cc:703
#21 0x00005640ba6aecf8 in dict_stats_exec_sql (pinfo=pinfo@entry=0x14ceb0012618, sql=sql@entry=0x5640bac09718 "PROCEDURE INDEX_STATS_SAVE () IS\nBEGIN\nDELETE FROM \"mysql/innodb_index_stats\"\nWHERE\ndatabase_name = :database_name AND\ntable_name = :table_name AND\nindex_name = :index_name AND\nstat_name = :stat_name;"..., trx=trx@entry=0x14cf04822680) at /test/10.6_dbg/storage/innobase/dict/dict0stats.cc:548
#22 0x00005640ba6af421 in dict_stats_save_index_stat (index=index@entry=0x14ceb000d868, last_update=last_update@entry=1696488608, stat_name=<optimized out>, stat_name@entry=0x14cedbbfc120 "n_diff_pfx01", stat_value=2, sample_size=0x14ceb000da50, stat_description=stat_description@entry=0x14cedbbfc2d0 "DB_ROW_ID", trx=0x14cf04822680) at /test/10.6_dbg/storage/innobase/dict/dict0stats.cc:3156
#23 0x00005640ba6b3ae0 in dict_stats_save (table_orig=table_orig@entry=0x14ce9801eb08, only_for_index=only_for_index@entry=0x0) at /test/10.6_dbg/storage/innobase/dict/dict0stats.cc:3425
#24 0x00005640ba6b666b in dict_stats_update (table=table@entry=0x14ce9801eb08, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /test/10.6_dbg/storage/innobase/dict/dict0stats.cc:4087
#25 0x00005640ba6b9abc in dict_stats_process_entry_from_recalc_pool (thd=0x14ceb0003848) at /test/10.6_dbg/storage/innobase/dict/dict0stats_bg.cc:343
#26 dict_stats_func () at /test/10.6_dbg/storage/innobase/dict/dict0stats_bg.cc:382
#27 0x00005640ba746269 in tpool::thread_pool_generic::timer_generic::run (this=0x5640bcac47e0) at /test/10.6_dbg/tpool/tpool_generic.cc:343
#28 tpool::thread_pool_generic::timer_generic::execute (arg=0x5640bcac47e0) at /test/10.6_dbg/tpool/tpool_generic.cc:363
#29 0x00005640ba747126 in tpool::task::execute (this=0x5640bcac4820) at /test/10.6_dbg/tpool/task.cc:37
#30 0x00005640ba744cee in tpool::thread_pool_generic::worker_main (this=0x5640bc7350f0, thread_var=0x5640bc735920) at /test/10.6_dbg/tpool/tpool_generic.cc:580
#31 0x00005640ba7460be in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:74
#32 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
#33 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
#34 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
#35 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
#36 0x000014cf0e2dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#37 0x000014cf0de94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#38 0x000014cf0df26a40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Bug confirmed present in:
MariaDB: 10.6.16 (dbg)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.4.32 (dbg), 10.4.32 (opt), 10.5.23 (dbg), 10.5.23 (opt), 10.6.16 (opt), 10.9.8 (dbg), 10.9.8 (opt), 10.10.7 (dbg), 10.10.7 (opt), 10.11.6 (dbg), 10.11.6 (opt), 11.0.4 (dbg), 11.0.4 (opt), 11.1.3 (dbg), 11.1.3 (opt), 11.2.2 (dbg), 11.2.2 (opt), 11.3.0 (dbg), 11.3.0 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.40 (dbg), 8.0.33 (dbg), 8.0.33 (opt)



 Comments   
Comment by Marko Mäkelä [ 2023-10-06 ]

I can reproduce this with the merge of MDEV-30165 but not its parent commit.

Comment by Vladislav Lesin [ 2023-10-09 ]

An update: I am still analysing it; my goal is to create a stable test case. But, more likely, it does not depend on MDEV-30165.

What I see is that after "XA PREPARE" there is a dict_stats_func() call triggered by a timer. It calls dict_stats_save_index_stat(), which, in turn, executes the following internal query:

PROCEDURE INDEX_STATS_SAVE () IS\nBEGIN\nDELETE FROM \"mysql/innodb_index_stats\"\nWHERE\ndatabase_name = :database_name AND\ntable_name = :table_name AND\nindex_name = :index_name AND\nstat_name = :stat_name;\nINSERT INTO \"mysql/innodb_index_stats\"\nVALUES\n(\n:database_name,\n:table_name,\n:index_name,\n:last_update,\n:stat_name,\n:stat_value,\n:sample_size,\n:stat_description\n);\nEND;

During query parsing, node->has_clust_rec_x_lock is set to TRUE in the following code:

upd_node_t*                                                                     
pars_update_statement(...)                                                      
{                                                                               
...                                                                             
        if (node->searched_update) {                                            
                node->has_clust_rec_x_lock = TRUE;                              
                sel_node->set_x_locks = TRUE;                                   
                sel_node->row_lock_mode = LOCK_X;                               
        } else {                                                                
        ...                                                                     
        }                                                                       
...                                                                             
}

Then row_upd_del_mark_clust_rec() is executed. An X-lock should be set before delete-marking the record, but it was not, because node->has_clust_rec_x_lock was set:

static MY_ATTRIBUTE((nonnull, warn_unused_result))                              
dberr_t                                                                         
row_upd_clust_step(...)                                                         
{                                                                               
...                                                                             
        if (!flags && !node->has_clust_rec_x_lock) {                            
                err = lock_clust_rec_modify_check_and_lock(                     
                        btr_pcur_get_block(pcur),                               
                        rec, index, offsets, thr);                              
                if (err != DB_SUCCESS) {                                        
                        goto exit_func;                                         
                }                                                               
        }                                                                       
...                                                                             
        if (node->is_delete == PLAIN_DELETE) {                                  
                err = row_upd_del_mark_clust_rec(                               
                        node, ...);                                             
        ...                                                                     
        }                                                                       
...                                                                             
}

node->has_clust_rec_x_lock is set because the lock is supposed to be set during row_sel() execution. But when row_sel() tries to set the X-lock, it finds that the transaction already owns a table X-lock, and skips requesting the record lock. The table X-lock is set in the dict_stats_save() call with lock_table_for_trx(). It can be set because "XA PREPARE" releases the table's S and IS locks.

Then, when the INSERT statement of that internal procedure is executed, it invokes the lock queue validation check during the duplicate check. The validation sees that a record modified by one transaction carries an explicit lock of another transaction.

Comment by Marko Mäkelä [ 2023-10-10 ]

vlad.lesin, while analyzing MDEV-32272 I noticed that INSERT…SELECT is only acquiring an intention-shared but not intention-exclusive table lock when the source and the target tables are the same. Could that be the root cause also here?

Comment by Vladislav Lesin [ 2023-10-10 ]

marko, yes, this is more likely the root cause: "XA PREPARE" releases the table's S and IS locks, but not its X and IX locks.

Comment by Vladislav Lesin [ 2023-10-12 ]

Roel, as we can see from the stack trace in the description, the assertion fails in a background thread. dict_stats_save() execution in the background thread is triggered by a dict_stats_update_if_needed() call during "INSERT INTO t VALUES (1);" execution. So, to make the test stable, we have to wait until the dict_stats_save() function has been executed after "INSERT INTO t VALUES (1);" has finished. For this purpose I added a debug sync point:

--- a/storage/innobase/dict/dict0stats.cc
+++ b/storage/innobase/dict/dict0stats.cc
@@ -34,6 +34,8 @@ Created Jan 06, 2010 Vasil Dimov
 #include "log.h"
 #include "btr0btr.h"
 #include "que0que.h"
+#include "scope.h"
+#include "debug_sync.h"
 
 #include <algorithm>
 #include <map>
@@ -3245,6 +3247,13 @@ dict_stats_save(
        char            db_utf8[MAX_DB_UTF8_LEN];
        char            table_utf8[MAX_TABLE_UTF8_LEN];
 
+#ifdef ENABLED_DEBUG_SYNC
+       DBUG_EXECUTE_IF("dict_stats_save_exit_notify",
+          SCOPE_EXIT([] {
+              debug_sync_set_action(current_thd,
+              STRING_WITH_LEN("now SIGNAL dict_stats_save_finished"));}););
+#endif /* ENABLED_DEBUG_SYNC */
+
        if (high_level_read_only) {
                return DB_READ_ONLY;
        }

And used it in mtr test case:

--source include/have_innodb.inc                                                
--source include/have_debug.inc                                                 
--source include/have_debug_sync.inc                                            
                                                                                
SET @old_innodb_stats_persistent = @@innodb_stats_persistent;                   
SET GLOBAL innodb_stats_persistent=1;                                           
CREATE TABLE t ENGINE=InnoDB AS SELECT 1;                                       
                                                                                
SET @old_debug_dbug = @@global.debug_dbug;                                      
                                                                                
XA START 'a';                                                                   
INSERT INTO mysql.innodb_index_stats SELECT '','' AS table_name,index_name,LAST_UPDATE,stat_name,0 AS stat_value,sample_size,stat_description FROM mysql.innodb_index_stats WHERE table_name='dummy' FOR UPDATE;  # Note the SELECT is empty
SET GLOBAL debug_dbug = "+d,dict_stats_save_exit_notify";                       
INSERT INTO t VALUES (1);                                                       
XA END 'a';                                                                     
XA PREPARE 'a';                                                                 
                                                                                
# Locking queue validation will crash the server if the bug is not fixed        
SET DEBUG_SYNC="now WAIT_FOR dict_stats_save_finished";                         
SET @@global.debug_dbug = @old_debug_dbug;                                      
SET DEBUG_SYNC="RESET";                                                         
SET GLOBAL innodb_stats_persistent = @old_innodb_stats_persistent;              
XA COMMIT 'a';                                                                  
DROP TABLE t;

Comment by Vladislav Lesin [ 2023-10-12 ]

The fact that I could not repeat it on 10.6 without the MDEV-30165 fix, even after I added the sync point, looked very suspicious to me, so I continued the analysis.

In the above comment I wrongly supposed that "INSERT…SELECT" requests X-locks during "SELECT" subquery execution. I supposed that because S-locks should be released on "XA PREPARE", and I did not check the lock type during debugging. De facto, S-locks are requested.

As Marko noted, "INSERT…SELECT" requests an IS but not an IX table lock. But if we modify the statement to force X record lock requests, as in "INSERT…SELECT...FOR UPDATE", a table IX lock will be requested during row_search_mvcc() execution for the "SELECT...FOR UPDATE" subquery:

                if (prebuilt->select_lock_type == LOCK_NONE) {                  
                        trx->read_view.open(trx);                               
                } else {                                                        
wait_table_again:                                                               
                        err = lock_table(prebuilt->table, nullptr,              
                                         prebuilt->select_lock_type == LOCK_S   
                                         ? LOCK_IS : LOCK_IX, thr);
                       ...
                } 

As we can see, the table lock type is either IS or IX. Both of them conflict with the X-lock requested from the dict_stats_save() call with lock_table_for_trx().

So my answer above is wrong. Even if INSERT…SELECT acquires only an IS but not an IX table lock, the "SELECT" subquery acquires either IS table and S record locks, which must be released on "XA PREPARE", or an IX table lock.

So either dict_stats_save() acquires the X-lock, because there are no IS table and S record locks left after "XA PREPARE" released them, or it hits DB_LOCK_WAIT and the next dict_stats_save() call is rescheduled by the timer. In either case there cannot be a lock queue validation crash, because dict_stats_save() executes no internal queries while the XA transaction still holds conflicting record locks.

But what is the reason for the crash? There is a logical error in lock_release_on_prepare_try():

        if (supremum_bit)
          lock_rec_unlock_supremum(*cell, lock);
        else
          lock_rec_dequeue_from_page(lock, false);

So, if the supremum bit is set, we only reset it and rebuild the waiting queue, even if !lock->is_rec_granted_exclusive_not_gap(). The other bits in the lock's bitmap are simply ignored. The correct logic is the same as in lock_release_on_prepare():

        if (!rec_granted_exclusive_not_gap)                                     
          lock_rec_dequeue_from_page(lock, false);                              
        else if (supremum_bit)                                                  
          lock_rec_unlock_supremum(*cell, lock);

So the issue is indeed caused by an error in the MDEV-30165 fix.

Comment by Vladislav Lesin [ 2023-10-12 ]

10.5 is not affected, but a backport of the test cases would be useful.

Comment by Marko Mäkelä [ 2023-10-12 ]

The fix looks good to me. Would this fix MDEV-32272 as well?

Generated at Thu Feb 08 10:30:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.