  1. MariaDB Server
  2. MDEV-25037

SIGSEGV in MDL_lock::hog_lock_types_bitmap on INSERT w/ RELEASE_ALL_LOCKS()

Details

    Description

      This issue is similar to MDEV-20945, but reproducing it on a Galera cluster requires setting wsrep_trx_fragment_size.

      CREATE TABLE t (c DOUBLE,c2 INT,PRIMARY KEY(c));
      SELECT GET_LOCK('a',1);
      START TRANSACTION;
      SET SESSION wsrep_trx_fragment_size=1;
      INSERT INTO t VALUES(1,1), (1,2);
      SELECT RELEASE_ALL_LOCKS();
      

      10.6.0 74281fe1fb0faf444aec3744b90995156f9f58f9 (Optimized)

      Core was generated by `/test/GAL_MD240221-mariadb-10.6.0-linux-x86_64-opt/bin/mysqld --defaults-file=/'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x147cadccf700 (LWP 2135279))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1  0x000055cdd403f67f in my_write_core (sig=sig@entry=11) at /test/10.6_opt/mysys/stacktrace.c:424
      #2  0x000055cdd3ab1510 in handle_fatal_signal (sig=11) at /test/10.6_opt/sql/signal_handler.cc:331
      #3  <signal handler called>
      #4  MDL_lock::hog_lock_types_bitmap (this=0x147c70021ec8) at /test/10.6_opt/sql/mdl.cc:587
      #5  MDL_lock::reschedule_waiters (this=0x147c70021ec8) at /test/10.6_opt/sql/mdl.cc:1273
      #6  0x000055cdd39b2bc6 in MDL_lock::remove_ticket (this=0x147c70021ec8, pins=0x55cdd6c24468, list=list@entry=&MDL_lock::m_granted, ticket=ticket@entry=0x55cdd6ca8fb0) at /test/10.6_opt/sql/mdl.cc:1811
      #7  0x000055cdd39b3529 in MDL_context::release_lock (this=<optimized out>, duration=<optimized out>, ticket=0x55cdd6ca8fb0) at /test/10.6_opt/sql/mdl.cc:2822
      #8  0x000055cdd39b3571 in MDL_context::release_lock (this=<optimized out>, ticket=<optimized out>) at /test/10.6_opt/sql/mdl.cc:2842
      #9  0x000055cdd3b186bd in Item_func_release_all_locks::val_int (this=<optimized out>) at /test/10.6_opt/sql/item_func.cc:4294
      #10 0x000055cdd3a19e1d in Type_handler::Item_send_long (this=<optimized out>, item=0x147c70010988, protocol=0x147c700011b0, buf=<optimized out>) at /test/10.6_opt/sql/sql_type.cc:7392
      #11 0x000055cdd37de910 in Protocol::send_result_set_row (this=this@entry=0x147c700011b0, row_items=row_items@entry=0x147c70010658) at /test/10.6_opt/sql/protocol.cc:1331
      #12 0x000055cdd384fba7 in select_send::send_data (this=0x147c70011330, items=@0x147c70010658: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x147c70010a50, last = 0x147c70010a50, elements = 1}, <No data fields>}) at /test/10.6_opt/sql/sql_class.cc:3019
      #13 0x000055cdd390d226 in select_result_sink::send_data_with_check (u=<optimized out>, sent=0, items=<optimized out>, this=<optimized out>) at /test/10.6_opt/sql/sql_class.h:5529
      #14 select_result_sink::send_data_with_check (sent=0, u=<optimized out>, items=<optimized out>, this=<optimized out>) at /test/10.6_opt/sql/sql_class.h:5519
      #15 JOIN::exec_inner (this=0x147c70011358) at /test/10.6_opt/sql/sql_select.cc:4345
      #16 0x000055cdd390d519 in JOIN::exec (this=this@entry=0x147c70011358) at /test/10.6_opt/sql/sql_select.cc:4257
      #17 0x000055cdd390b5da in mysql_select (thd=0x147c70000c58, tables=0x0, fields=<optimized out>, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x147c70011330, unit=0x147c70004c68, select_lex=0x147c70010508) at /test/10.6_opt/sql/sql_select.cc:4730
      #18 0x000055cdd390bfa7 in handle_select (thd=thd@entry=0x147c70000c58, lex=lex@entry=0x147c70004ba0, result=result@entry=0x147c70011330, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.6_opt/sql/sql_select.cc:417
      #19 0x000055cdd389acb1 in execute_sqlcom_select (thd=0x147c70000c58, all_tables=0x0) at /test/10.6_opt/sql/sql_parse.cc:6204
      #20 0x000055cdd38a8983 in mysql_execute_command (thd=0x147c70000c58) at /test/10.6_opt/sql/sql_parse.cc:3900
      #21 0x000055cdd389583f in mysql_parse (thd=thd@entry=0x147c70000c58, rawbuf=rawbuf@entry=0x147c70010470 "SELECT RELEASE_ALL_LOCKS()", length=length@entry=26, parser_state=parser_state@entry=0x147cadcce400) at /test/10.6_opt/sql/sql_parse.cc:7972
      #22 0x000055cdd38952b6 in wsrep_mysql_parse (thd=0x147c70000c58, rawbuf=0x147c70010470 "SELECT RELEASE_ALL_LOCKS()", length=26, parser_state=0x147cadcce400) at /test/10.6_opt/sql/sql_parse.cc:7786
      #23 0x000055cdd38a26ee in dispatch_command (command=COM_QUERY, thd=0x147c70000c58, packet=0x147c700080d9 "", packet_length=<optimized out>, blocking=<optimized out>) at /test/10.6_opt/sql/sql_class.h:1295
      #24 0x000055cdd38a3136 in do_command (thd=0x147c70000c58, blocking=blocking@entry=true) at /test/10.6_opt/sql/sql_parse.cc:1397
      #25 0x000055cdd39a91dd in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55cdd6ccb3f8, put_in_cache=put_in_cache@entry=true) at /test/10.6_opt/sql/sql_connect.cc:1410
      #26 0x000055cdd39a968d in handle_one_connection (arg=arg@entry=0x55cdd6ccb3f8) at /test/10.6_opt/sql/sql_connect.cc:1312
      #27 0x000055cdd3d30846 in pfs_spawn_thread (arg=0x55cdd6cdeb68) at /test/10.6_opt/storage/perfschema/pfs.cc:2201
      #28 0x0000147ce397d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #29 0x0000147ce356c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      10.6.0 208233be5af55072d7ef80c37ddbc664bc51f342 (Debug)

      Core was generated by `/test/GAL_MD230221-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --defaults-file=/'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x154c5c96a700 (LWP 2121372))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1  0x0000559915c254db in my_write_core (sig=sig@entry=11) at /test/10.6_dbg/mysys/stacktrace.c:424
      #2  0x00005599153b52df in handle_fatal_signal (sig=11) at /test/10.6_dbg/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  0x00005599152563e4 in ilist<MDL_ticket, void>::erase (pos={node_ = 0x154bf40083b8}, this=0x154bf404ab20) at /test/10.6_dbg/include/ilist.h:207
      #5  ilist<MDL_ticket, void>::remove (value=@0x154bf40083b0: {<MDL_wait_for_subgraph> = {_vptr.MDL_wait_for_subgraph = 0x154c4c026df0}, <ilist_node<void>> = {next = 0x154bf40008d0, prev = 0x0}, next_in_context = 0x0, prev_in_context = 0x154bf4000fe8, m_type = MDL_SHARED_NO_WRITE, m_duration = MDL_EXPLICIT, m_ctx = 0x154bf4000ee8, m_lock = 0x154bf404a918, m_psi = 0x0}, this=0x154bf404ab20) at /test/10.6_dbg/include/ilist.h:207
      #6  MDL_lock::Ticket_list::remove_ticket (this=this@entry=0x154bf404ab20, ticket=ticket@entry=0x154bf40083b0) at /test/10.6_dbg/sql/mdl.cc:1247
      #7  0x0000559915257d27 in MDL_lock::remove_ticket (this=this@entry=0x154bf404a918, pins=0x559917f94fb0, list=list@entry=&MDL_lock::m_granted, ticket=ticket@entry=0x154bf40083b0) at /test/10.6_dbg/sql/mdl.cc:1792
      #8  0x0000559915257e3c in MDL_context::release_lock (this=this@entry=0x154bf4000ee8, duration=duration@entry=MDL_EXPLICIT, ticket=ticket@entry=0x154bf40083b0) at /test/10.6_dbg/sql/mdl.cc:2822
      #9  0x0000559915257f1b in MDL_context::release_lock (this=this@entry=0x154bf4000ee8, ticket=0x154bf40083b0) at /test/10.6_dbg/sql/mdl.cc:2842
      #10 0x000055991544cd20 in Item_func_release_all_locks::val_int (this=<optimized out>) at /test/10.6_dbg/sql/item_func.cc:4294
      #11 0x00005599152e7f8b in Type_handler::Item_send_long (this=<optimized out>, item=0x154bf4014078, protocol=0x154bf4001398, buf=<optimized out>) at /test/10.6_dbg/sql/sql_type.cc:7392
      #12 0x00005599152f11af in Type_handler_long::Item_send (this=<optimized out>, item=<optimized out>, protocol=<optimized out>, buf=<optimized out>) at /test/10.6_dbg/sql/sql_type.h:5595
      #13 0x0000559914fe64d8 in Item::send (this=0x154bf4014078, protocol=0x154bf4001398, buffer=0x154c5c968240) at /test/10.6_dbg/sql/item.h:1066
      #14 0x0000559914fe39da in Protocol::send_result_set_row (this=this@entry=0x154bf4001398, row_items=row_items@entry=0x154bf4013d48) at /test/10.6_dbg/sql/protocol.cc:1331
      #15 0x000055991507da3b in select_send::send_data (this=0x154bf4014a20, items=@0x154bf4013d48: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x154bf4014140, last = 0x154bf4014140, elements = 1}, <No data fields>}) at /test/10.6_dbg/sql/sql_class.cc:3019
      #16 0x0000559915170c40 in select_result_sink::send_data_with_check (sent=0, u=<optimized out>, items=<optimized out>, this=<optimized out>) at /test/10.6_dbg/sql/sql_class.h:5529
      #17 JOIN::exec_inner (this=this@entry=0x154bf4014a48) at /test/10.6_dbg/sql/sql_select.cc:4344
      #18 0x0000559915171b2d in JOIN::exec (this=this@entry=0x154bf4014a48) at /test/10.6_dbg/sql/sql_select.cc:4256
      #19 0x000055991516fd89 in mysql_select (thd=thd@entry=0x154bf4000db8, tables=0x0, fields=@0x154bf4013d48: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x154bf4014140, last = 0x154bf4014140, elements = 1}, <No data fields>}, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x154bf4014a20, unit=0x154bf4004f88, select_lex=0x154bf4013bf8) at /test/10.6_dbg/sql/sql_select.cc:4729
      #20 0x0000559915170050 in handle_select (thd=thd@entry=0x154bf4000db8, lex=lex@entry=0x154bf4004ec0, result=result@entry=0x154bf4014a20, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.6_dbg/sql/sql_select.cc:417
      #21 0x00005599150e21e0 in execute_sqlcom_select (thd=thd@entry=0x154bf4000db8, all_tables=0x0) at /test/10.6_dbg/sql/sql_parse.cc:6204
      #22 0x00005599150eeee2 in mysql_execute_command (thd=thd@entry=0x154bf4000db8) at /test/10.6_dbg/sql/sql_parse.cc:3900
      #23 0x00005599150db360 in mysql_parse (thd=thd@entry=0x154bf4000db8, rawbuf=rawbuf@entry=0x154bf4013b60 "SELECT RELEASE_ALL_LOCKS()", length=length@entry=26, parser_state=parser_state@entry=0x154c5c9693d0) at /test/10.6_dbg/sql/sql_parse.cc:7972
      #24 0x00005599150dacbc in wsrep_mysql_parse (thd=thd@entry=0x154bf4000db8, rawbuf=0x154bf4013b60 "SELECT RELEASE_ALL_LOCKS()", length=26, parser_state=parser_state@entry=0x154c5c9693d0) at /test/10.6_dbg/sql/sql_parse.cc:7786
      #25 0x00005599150e92a0 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x154bf4000db8, packet=packet@entry=0x154bf400b319 "", packet_length=packet_length@entry=26, blocking=blocking@entry=true) at /test/10.6_dbg/sql/sql_class.h:1295
      #26 0x00005599150ec70c in do_command (thd=0x154bf4000db8, blocking=blocking@entry=true) at /test/10.6_dbg/sql/sql_parse.cc:1397
      #27 0x0000559915249bf5 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55991805aeb8, put_in_cache=put_in_cache@entry=true) at /test/10.6_dbg/sql/sql_connect.cc:1410
      #28 0x000055991524a2fb in handle_one_connection (arg=arg@entry=0x55991805aeb8) at /test/10.6_dbg/sql/sql_connect.cc:1312
      #29 0x00005599157015cd in pfs_spawn_thread (arg=0x5599180d26c8) at /test/10.6_dbg/storage/perfschema/pfs.cc:2201
      #30 0x0000154c6e8a5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #31 0x0000154c6e494293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      10.5.10 85bec9d691bb69ed20beb565b03d5585b94624fe (Optimized)

      Core was generated by `/test/GAL_MD240221-mariadb-10.5.10-linux-x86_64-opt/bin/mysqld --defaults-file='.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x152674098700 (LWP 2822567))]
      (gdb) bt
      #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1  0x0000562d5adc02ef in my_write_core (sig=sig@entry=11) at /test/10.5_opt/mysys/stacktrace.c:424
      #2  0x0000562d5a7ebb80 in handle_fatal_signal (sig=11) at /test/10.5_opt/sql/signal_handler.cc:330
      #3  <signal handler called>
      #4  MDL_lock::hog_lock_types_bitmap (this=0x15260c0218f8) at /test/10.5_opt/sql/mdl.cc:587
      #5  MDL_lock::reschedule_waiters (this=0x15260c0218f8) at /test/10.5_opt/sql/mdl.cc:1273
      #6  0x0000562d5a6ed136 in MDL_lock::remove_ticket (this=0x15260c0218f8, pins=0x562d5c8c08b8, list=list@entry=&MDL_lock::m_granted, ticket=ticket@entry=0x562d5c943fb0) at /test/10.5_opt/sql/mdl.cc:1811
      #7  0x0000562d5a6eda99 in MDL_context::release_lock (this=<optimized out>, duration=<optimized out>, ticket=0x562d5c943fb0) at /test/10.5_opt/sql/mdl.cc:2822
      #8  0x0000562d5a6edae1 in MDL_context::release_lock (this=<optimized out>, ticket=<optimized out>) at /test/10.5_opt/sql/mdl.cc:2842
      #9  0x0000562d5a852d2d in Item_func_release_all_locks::val_int (this=<optimized out>) at /test/10.5_opt/sql/item_func.cc:4294
      #10 0x0000562d5a75431d in Type_handler::Item_send_long (this=<optimized out>, item=0x15260c0108d8, protocol=0x15260c0011a0, buf=<optimized out>) at /test/10.5_opt/sql/sql_type.cc:7392
      #11 0x0000562d5a5199b0 in Protocol::send_result_set_row (this=this@entry=0x15260c0011a0, row_items=row_items@entry=0x15260c0105a8) at /test/10.5_opt/sql/protocol.cc:1085
      #12 0x0000562d5a58abf7 in select_send::send_data (this=0x15260c011280, items=@0x15260c0105a8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x15260c0109a0, last = 0x15260c0109a0, elements = 1}, <No data fields>}) at /test/10.5_opt/sql/sql_class.cc:3018
      #13 0x0000562d5a647d16 in select_result_sink::send_data_with_check (u=<optimized out>, sent=0, items=<optimized out>, this=<optimized out>) at /test/10.5_opt/sql/sql_class.h:5328
      #14 select_result_sink::send_data_with_check (sent=0, u=<optimized out>, items=<optimized out>, this=<optimized out>) at /test/10.5_opt/sql/sql_class.h:5318
      #15 JOIN::exec_inner (this=0x15260c0112a8) at /test/10.5_opt/sql/sql_select.cc:4334
      #16 0x0000562d5a648009 in JOIN::exec (this=this@entry=0x15260c0112a8) at /test/10.5_opt/sql/sql_select.cc:4246
      #17 0x0000562d5a6460ca in mysql_select (thd=0x15260c000c58, tables=0x0, fields=<optimized out>, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x15260c011280, unit=0x15260c004c40, select_lex=0x15260c010458) at /test/10.5_opt/sql/sql_select.cc:4719
      #18 0x0000562d5a646a97 in handle_select (thd=thd@entry=0x15260c000c58, lex=lex@entry=0x15260c004b78, result=result@entry=0x15260c011280, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.5_opt/sql/sql_select.cc:417
      #19 0x0000562d5a5d5381 in execute_sqlcom_select (thd=0x15260c000c58, all_tables=0x0) at /test/10.5_opt/sql/sql_parse.cc:6282
      #20 0x0000562d5a5e3617 in mysql_execute_command (thd=0x15260c000c58) at /test/10.5_opt/sql/sql_parse.cc:3978
      #21 0x0000562d5a5cff0f in mysql_parse (thd=thd@entry=0x15260c000c58, rawbuf=rawbuf@entry=0x15260c0103c0 "SELECT RELEASE_ALL_LOCKS()", length=length@entry=26, parser_state=parser_state@entry=0x152674097400, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.5_opt/sql/sql_parse.cc:8063
      #22 0x0000562d5a5cf94c in wsrep_mysql_parse (thd=0x15260c000c58, rawbuf=0x15260c0103c0 "SELECT RELEASE_ALL_LOCKS()", length=26, parser_state=0x152674097400, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.5_opt/sql/sql_parse.cc:7866
      #23 0x0000562d5a5dd607 in dispatch_command (command=COM_QUERY, thd=0x15260c000c58, packet=<optimized out>, packet_length=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.5_opt/sql/sql_class.h:1257
      #24 0x0000562d5a5ddea7 in do_command (thd=0x15260c000c58) at /test/10.5_opt/sql/sql_parse.cc:1370
      #25 0x0000562d5a6e3781 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x562d5c9547f8, put_in_cache=put_in_cache@entry=true) at /test/10.5_opt/sql/sql_connect.cc:1410
      #26 0x0000562d5a6e3bfd in handle_one_connection (arg=arg@entry=0x562d5c9547f8) at /test/10.5_opt/sql/sql_connect.cc:1312
      #27 0x0000562d5aa6c4b6 in pfs_spawn_thread (arg=0x562d5c97c208) at /test/10.5_opt/storage/perfschema/pfs.cc:2201
      #28 0x000015268846d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #29 0x000015268805c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Bug confirmed present in:
      MariaDB: 10.6.0 (opt), 10.6.0 (dbg), 10.5.10 (opt), 10.5.10 (dbg)

      Bug confirmed not present in:
      MariaDB: 10.2.38, 10.3.29, 10.4.19

          Activity

            Tested as main/skr.test:
            BACKUP STAGE START;
            START TRANSACTION;
            Running mtr main.skr works fine.

            When I then copied the test to suite/galera/t, I got a crash.

            The reason is this code in sql_connect.cc:

            #ifdef WITH_WSREP
              if (thd->wsrep_cs().state() == wsrep::client_state::s_exec)
              {
                /*
                  Error happened after the thread acquired ownership to wsrep
                  client state, but before command was processed. Clean up the
                  state before wsrep_close().
                */
                wsrep_after_command_ignore_result(thd);
              }
              wsrep_close(thd);
            #endif /* WITH_WSREP */

            wsrep_close() eventually calls wsrep_client_service::bf_rollback(), which calls MDL_context::release_explicit_locks(), something it is NOT ALLOWED TO DO.
            This deletes all explicit locks in the server, including locks that are still active and that will be released
            later by the subsystem that is using them (which leads to crashes)!

            In the above example, it releases the backup lock, which is released again later when the THD is deleted, and that leads to a crash. I suspect we will have similar issues with GLOBAL READ LOCK and other constructs that take DDL locks.
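
            The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyTicket, ToyLockContext and replay_early_bulk_release are hypothetical names invented for the illustration and are not the server's MDL classes. The model only shows how a bulk release followed by the owner's own release turns into a double release.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy model only: hypothetical types, NOT MariaDB's MDL_ticket / MDL_context.
struct ToyTicket {
  std::string name;
  bool granted = true;  // becomes false once the ticket has been released
};

class ToyLockContext {
public:
  // Acquire an explicit lock and hand a handle to the subsystem that owns it.
  std::shared_ptr<ToyTicket> acquire(const std::string &name) {
    auto t = std::make_shared<ToyTicket>();
    t->name = name;
    explicit_tickets_.push_back(t);
    return t;
  }

  // Release one ticket. Returns false when the ticket was already released;
  // a double release is the hazard this toy is meant to expose.
  bool release(const std::shared_ptr<ToyTicket> &t) {
    if (!t->granted)
      return false;
    t->granted = false;
    explicit_tickets_.erase(
        std::remove(explicit_tickets_.begin(), explicit_tickets_.end(), t),
        explicit_tickets_.end());
    return true;
  }

  // Bulk release: drops every explicit lock without asking the owners,
  // similar in spirit to a rollback path that releases all explicit locks.
  void release_all_explicit() {
    for (auto &t : explicit_tickets_)
      t->granted = false;
    explicit_tickets_.clear();
  }

  bool has_locks() const { return !explicit_tickets_.empty(); }

private:
  std::vector<std::shared_ptr<ToyTicket>> explicit_tickets_;
};

// Replays the sequence from the comment: a subsystem takes a lock, a bulk
// release runs early, then the subsystem releases "its" lock at cleanup.
// Returns true if the subsystem's own release turned out to be a double release.
inline bool replay_early_bulk_release() {
  ToyLockContext ctx;
  auto backup_lock = ctx.acquire("backup");
  ctx.release_all_explicit();        // early, rollback-style bulk release
  return !ctx.release(backup_lock);  // the owner releases later
}
```

            In the toy the second release is detected and refused. The real server has no such guard on this path, which is consistent with the crashes in the traces above.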

            Suggested fix (one of the following):

            • Move the call to wsrep_close() to the absolute end of THD::cleanup()
            • Don't call bf_rollback() in wsrep_close(), but instead in THD::cleanup()

            and also remove the call m_thd->mdl_context.release_explicit_locks() from bf_rollback()!

            I think it is wrong that bf_rollback() does a lot of things that THD::cleanup() is already doing.
            It would be better to stop calling this function from wsrep_close() altogether.

            monty Michael Widenius added the comment above.

            seppo This can't be solved by simply removing the call m_thd->mdl_context.release_explicit_locks() from bf_rollback(). That naturally solves the problem above, but it causes a lot of regressions when we do a BF kill. Adding a check if (thd->killed) before the piece of code that is also done in THD::cleanup() helps in most cases but not all, especially not in the streaming replication tests. For some reason, when we do a BF kill we do not call THD::cleanup() before we re-enter do_command(), in particular here:

              /*
                If this command does not return a result, then we
                instruct wsrep_before_command() to skip result handling.
                This causes BF aborted transaction to roll back but keep
                the error state until next command which is able to return
                a result to the client.
              */
              if (unlikely(wsrep_service_started) &&
                  wsrep_before_command(thd, wsrep_command_no_result(command)))
              {
                /*
                  Aborted by background rollbacker thread.
                  Handle error here and jump straight to out.
                  Notice that thd->store_globals() is called
                  in wsrep_before_command().
                */
                WSREP_LOG_THD(thd, "enter found BF aborted");
                DBUG_ASSERT(!thd->mdl_context.has_locks());
                DBUG_ASSERT(!thd->get_stmt_da()->is_set());

            and naturally mdl_context.has_locks() is true at this point, so the first assertion fails.
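
            The conflict between the two requirements can be sketched with a minimal toy. This is an illustration under assumptions, not server code: ToySession, toy_bf_rollback and invariant_holds_after_bf_abort are hypothetical names, and the single integer counter stands in for the explicit locks held by a connection.

```cpp
#include <cassert>

// Hypothetical toy state, not the server's THD or MDL_context.
struct ToySession {
  int explicit_locks = 0;
  bool has_locks() const { return explicit_locks > 0; }
};

// Toy rollback: releases explicit locks only when asked to.
inline void toy_bf_rollback(ToySession &s, bool release_explicit) {
  if (release_explicit)
    s.explicit_locks = 0;
}

// Returns true when the "no locks held" condition that is asserted on
// re-entering the command loop would hold after a BF abort.
inline bool invariant_holds_after_bf_abort(bool release_explicit) {
  ToySession s;
  s.explicit_locks = 1;  // one explicit lock taken before the abort
  toy_bf_rollback(s, release_explicit);
  return !s.has_locks();
}
```

            With release_explicit set to true the condition holds, but the owner of the lock may release it a second time later. With it set to false the owner stays in control, but the condition checked in do_command() no longer holds.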
            

            jplindst Jan Lindström (Inactive) added the comment above.

            Update: Removing the code from bf_rollback() and removing the assertion above does not fix the issue either, especially for TOI.

            jplindst Jan Lindström (Inactive) added the comment above (edited).
            seppo Seppo Jaakola added a comment -

            Note that explicit locks taken with GET_LOCK() are not supported in the cluster. The Knowledge Base has a page about this too: https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/. That page is outdated and needs to be updated, but it is still a fact that such explicit named locks are not supported in a cluster environment. Adding explicit locking support is not on the current roadmap either.

            To fix this issue, the GET_LOCK() / RELEASE_LOCK() calls should be rejected with a proper warning message.


            seppo My test case contained only:

            BACKUP STAGE START;
            START TRANSACTION;
            

            This we need to support.

            jplindst Jan Lindström (Inactive) added the comment above.

            People

              jplindst Jan Lindström (Inactive)
              ramesh Ramesh Sivaraman