Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27039

Trying to lock mutex ... when the mutex was already locked | SIGABRT in safe_mutex_lock | Hangs

Details

    Description

      Very similar to MDEV-25890 but different versions and different area. Almost definitely related. This testcase too causes mysqld hangs and likely (though not confirmed) cross-mysqld interaction (i.e. one mysqld affecting another, without anything shared). Testcase is sporadic, though usually only 1-2 executions are needed.

      # mysqld options required for replay: --log-bin --performance-schema
      INSTALL PLUGIN spider SONAME 'ha_spider.so';
      SET GLOBAL expire_logs_days=11;
      SET GLOBAL binlog_checksum=NONE;
      

      Leads to:

      10.8.0 bc57ff7cf79504a198d4752b4748340a4afd886c (Debug)

      2021-11-13 14:21:40 0 [Note] /test/MD121121-mariadb-10.8.0-linux-x86_64-dbg/bin/mysqld: ready for connections.
      Version: '10.8.0-MariaDB-debug-log'  socket: '/test/MD121121-mariadb-10.8.0-linux-x86_64-dbg/socket.sock'  port: 37855  MariaDB Server
      safe_mutex: Trying to lock mutex at /test/10.8_dbg/sql/sql_plugin.cc, line 3093, when the mutex was already locked at /test/10.8_dbg/sql/sys_vars_shared.h, line 56 in thread T@47
      

      10.8.0 bc57ff7cf79504a198d4752b4748340a4afd886c (Debug)

      Core was generated by `/test/MD121121-mariadb-10.8.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGABRT, Aborted.
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      [Current thread is 1 (Thread 0x154df8bed700 (LWP 1629659))]
      (gdb) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x0000154e130d3859 in __GI_abort () at abort.c:79
      #2  0x00005603c8c83268 in safe_mutex_lock (mp=mp@entry=0x5603c9795b00 <LOCK_global_system_variables>, my_flags=my_flags@entry=0, file=file@entry=0x5603c8e31318 "/test/10.8_dbg/sql/sql_plugin.cc", line=line@entry=3093) at /test/10.8_dbg/mysys/thr_mutex.c:262
      #3  0x00005603c8c7d770 in psi_mutex_lock (that=0x5603c9795b00 <LOCK_global_system_variables>, file=file@entry=0x5603c8e31318 "/test/10.8_dbg/sql/sql_plugin.cc", line=line@entry=3093) at /test/10.8_dbg/mysys/my_thr_init.c:487
      #4  0x00005603c817f81c in inline_mysql_mutex_lock (src_line=3093, src_file=0x5603c8e31318 "/test/10.8_dbg/sql/sql_plugin.cc", that=<optimized out>) at /test/10.8_dbg/include/mysql/psi/mysql_thread.h:746
      #5  sync_dynamic_session_variables (thd=thd@entry=0x154dc0000db8, global_lock=global_lock@entry=true) at /test/10.8_dbg/sql/sql_plugin.cc:3093
      #6  0x00005603c817fa9a in intern_sys_var_ptr (thd=0x154dc0000db8, offset=160, global_lock=global_lock@entry=true) at /test/10.8_dbg/sql/sql_plugin.cc:3160
      #7  0x00005603c817fb8a in mysql_sys_var_int (thd=<optimized out>, offset=<optimized out>) at /test/10.8_dbg/sql/sql_plugin.cc:3182
      #8  0x0000154de91d59bf in spider_param_init_sql_alloc_size (thd=0x154dc0000db8, init_sql_alloc_size=0) at /test/10.8_dbg/storage/spider/spd_param.cc:600
      #9  0x0000154de92a0b01 in spider_mbase_handler::init (this=0x154dc005d950) at /test/10.8_dbg/storage/spider/spd_db_mysql.cc:8993
      #10 0x0000154de91edf2e in spider_get_trx (thd=thd@entry=0x154dc0000db8, regist_allocated_thds=regist_allocated_thds@entry=true, error_num=error_num@entry=0x154df8bebb1c) at /test/10.8_dbg/storage/spider/spd_trx.cc:1518
      #11 0x0000154de9248946 in spider_flush_logs (hton=<optimized out>) at /test/10.8_dbg/storage/spider/spd_table.cc:7040
      #12 0x00005603c845bf88 in flush_handlerton (thd=thd@entry=0x0, plugin=<optimized out>, arg=arg@entry=0x0) at /test/10.8_dbg/sql/handler.cc:3011
      #13 0x00005603c8180ab0 in plugin_foreach_with_mask (thd=thd@entry=0x0, func=func@entry=0x5603c845bf66 <flush_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, arg=arg@entry=0x0) at /test/10.8_dbg/sql/sql_plugin.cc:2509
      #14 0x00005603c8463419 in ha_flush_logs () at /test/10.8_dbg/sql/handler.cc:3017
      #15 0x00005603c85ca5e8 in MYSQL_BIN_LOG::purge (this=this@entry=0x5603c97ca160 <mysql_bin_log>) at /test/10.8_dbg/sql/log.cc:7031
      #16 0x00005603c85cf2f9 in MYSQL_BIN_LOG::checkpoint_and_purge (this=this@entry=0x5603c97ca160 <mysql_bin_log>, binlog_id=binlog_id@entry=1) at /test/10.8_dbg/sql/log.cc:7043
      #17 0x00005603c85da247 in binlog_checksum_update (thd=<optimized out>, var=<optimized out>, var_ptr=<optimized out>, save=<optimized out>) at /test/10.8_dbg/sql/log.cc:11570
      #18 0x00005603c8181bf1 in sys_var_pluginvar::global_update (this=0x5603cbead8b0, thd=0x154dc0000db8, var=0x154dc0014308) at /test/10.8_dbg/sql/sql_plugin.cc:3621
      #19 0x00005603c805b6ae in sys_var::update (this=0x5603cbead8b0, thd=0x154dc0000db8, var=0x154dc0014308) at /test/10.8_dbg/sql/set_var.cc:207
      #20 0x00005603c805bbdb in set_var::update (this=<optimized out>, thd=<optimized out>) at /test/10.8_dbg/sql/set_var.cc:863
      #21 0x00005603c805cf21 in sql_set_variables (thd=thd@entry=0x154dc0000db8, var_list=var_list@entry=0x154dc00060d8, free=free@entry=true) at /test/10.8_dbg/sql/set_var.cc:745
      #22 0x00005603c81638f4 in mysql_execute_command (thd=thd@entry=0x154dc0000db8, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.8_dbg/sql/sql_parse.cc:5034
      #23 0x00005603c814ccad in mysql_parse (thd=thd@entry=0x154dc0000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x154df8bec400) at /test/10.8_dbg/sql/sql_parse.cc:8028
      #24 0x00005603c815b949 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x154dc0000db8, packet=packet@entry=0x154dc001a239 "SET GLOBAL binlog_checksum=NONE", packet_length=packet_length@entry=31, blocking=blocking@entry=true) at /test/10.8_dbg/sql/sql_class.h:1360
      #25 0x00005603c815ed83 in do_command (thd=0x154dc0000db8, blocking=blocking@entry=true) at /test/10.8_dbg/sql/sql_parse.cc:1402
      #26 0x00005603c82d8e2a in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5603cc5ccba8, put_in_cache=put_in_cache@entry=true) at /test/10.8_dbg/sql/sql_connect.cc:1418
      #27 0x00005603c82d942f in handle_one_connection (arg=arg@entry=0x5603cc5ccba8) at /test/10.8_dbg/sql/sql_connect.cc:1312
      #28 0x00005603c87594ce in pfs_spawn_thread (arg=0x5603cc4c3b88) at /test/10.8_dbg/storage/perfschema/pfs.cc:2201
      #29 0x0000154e135e2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #30 0x0000154e131d0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      Bug confirmed present in:
      MariaDB: 10.6.6 (dbg), 10.7.2 (dbg), 10.8.0 (dbg)

      Bug (or feature/syntax) confirmed not present in (unless sporadicity caused it not to crash/fail):
      MariaDB: 10.2.42 (dbg), 10.2.42 (opt), 10.3.33 (dbg), 10.3.33 (opt), 10.4.23 (dbg), 10.4.23 (opt), 10.5.14 (dbg), 10.5.14 (opt), 10.6.6 (opt), 10.7.2 (opt), 10.8.0 (opt)
      MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.36 (dbg), 5.7.36 (opt), 8.0.27 (dbg), 8.0.27 (opt)

      Attachments

        Issue Links

          Activity

            The bug is reproducible on 10.6.3 but not on 10.6.0.

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - The bug is reproducible on 10.6.3 but not on 10.6.0.

            The result of git-bisect:

            49ff2cbff443b7d7e688f97887a05297901601da is the first bad commit
            commit 49ff2cbff443b7d7e688f97887a05297901601da
            Author: Sujatha <sujatha.sivakumar@mariadb.com>
            Date:   Fri May 7 12:57:20 2021 +0530
             
                MDEV-19371: Implement binlog_expire_logs_seconds for purging of binary logs
                
                Part1: Functional changes
                
                Backporting upstream changes.
                commit a7e1ef858ee82493dd8ad9a76bc9c22fe3b8c05b
                Author: Neha Kumari <neha.n.kumari@oracle.com>
                Note:
                From the upstream patch only the new option binlog_expire_logs_seconds
                specific changes are taken.
                
                * Unlike in the upstream patch 'binlog_expire_logs_seconds' does not
                  replace the "old" 'expire_logs_days', to preserve backward-compatibility.
                
                * Datatype of 'expire_logs_days' variable is changed to double.
                
                * Default value of 'binlog_expire_logs_seconds=0' similar to
                  'expire_logs_days'.
                
                * The purge_time can be specified in days with the micro-day precision.
                  Eg:
                  expire_logs_days=1 is the same as expire_logs_days=1.000000 to make
                    binlog_expire_logs_seconds=86400.
                  binlog_expire_logs_seconds=1 is the same as expire_logs_days=0.000012.
                
                * If binary log is disabled and option 'expire_logs_days' or
                  'binlog_expire_logs_seconds' used with purge_time > 0 a warning will be
                  issued.
             
             sql/log.cc      |  6 ++++--
             sql/mysqld.cc   | 34 +++++++++++++++++++++++++++++-----
             sql/mysqld.h    |  5 ++++-
             sql/sys_vars.cc | 38 +++++++++++++++++++++++++++++++++++---
             4 files changed, 72 insertions(+), 11 deletions(-)
            

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - The result of git-bisect : 49ff2cbff443b7d7e688f97887a05297901601da is the first bad commit commit 49ff2cbff443b7d7e688f97887a05297901601da Author: Sujatha <sujatha.sivakumar@mariadb.com> Date: Fri May 7 12:57:20 2021 +0530   MDEV-19371: Implement binlog_expire_logs_seconds for purging of binary logs Part1: Functional changes Backporting upstream changes. commit a7e1ef858ee82493dd8ad9a76bc9c22fe3b8c05b Author: Neha Kumari <neha.n.kumari@oracle.com> Note: From the upstream patch only the new option binlog_expire_logs_seconds specific changes are taken. * Unlike in the upstream patch 'binlog_expire_logs_seconds' does not replace the "old" 'expire_logs_days', to preserve backward-compatibility. * Datatype of 'expire_logs_days' variable is changed to double. * Default value of 'binlog_expire_logs_seconds=0' similar to 'expire_logs_days'. * The purge_time can be specified in days with the micro-day precision. Eg: expire_logs_days=1 is the same as expire_logs_days=1.000000 to make binlog_expire_logs_seconds=86400. binlog_expire_logs_seconds=1 is the same as expire_logs_days=0.000012. * If binary log is disabled and option 'expire_logs_days' or 'binlog_expire_logs_seconds' used with purge_time > 0 a warning will be issued.   sql/log.cc | 6 ++++-- sql/mysqld.cc | 34 +++++++++++++++++++++++++++++----- sql/mysqld.h | 5 ++++- sql/sys_vars.cc | 38 +++++++++++++++++++++++++++++++++++--- 4 files changed, 72 insertions(+), 11 deletions(-)

            Elkin The bug seems to be due to the change in the replication component. Please check it.

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - - edited Elkin The bug seems to be due to the change in the replication component. Please check it.

            In particular, this change which is not explained in the commit comment and is not present in the upstream patch.

            It comes from a different upstream commit and I suspect it's just wrong there.

            serg Sergei Golubchik added a comment - In particular, this change which is not explained in the commit comment and is not present in the upstream patch . It comes from a different upstream commit and I suspect it's just wrong there.

            What happens in the above back trace:

            LOCK_global_system_variables is acquired in sys_var::update(). Then, spider_param_init_sql_alloc_size() tries to access the Spider's system variable spider_init_sql_alloc_size and tries to take LOCK_global_system_variables lock (dead lock!).

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - What happens in the above back trace: LOCK_global_system_variables is acquired in sys_var::update() . Then, spider_param_init_sql_alloc_size() tries to access the Spider's system variable spider_init_sql_alloc_size and tries to take LOCK_global_system_variables lock (dead lock!).

            As serg pointed out, removing ha_flush_logs() fixes the problem.

            The following makeshift patch also fix the problem (note: this cannot be applied as is):

            diff --git a/storage/spider/spd_param.cc b/storage/spider/spd_param.cc
            index b23877ca92a..e8d9dd4fca3 100644
            --- a/storage/spider/spd_param.cc
            +++ b/storage/spider/spd_param.cc
            @@ -597,8 +597,7 @@ int spider_param_init_sql_alloc_size(
               int init_sql_alloc_size
             ) {
               DBUG_ENTER("spider_param_init_sql_alloc_size");
            -  DBUG_RETURN(THDVAR(thd, init_sql_alloc_size) < 0 ?
            -    init_sql_alloc_size : THDVAR(thd, init_sql_alloc_size));
            +  DBUG_RETURN(1024);
             }
             
             /*
            @@ -1147,7 +1146,7 @@ bool spider_param_use_flash_logs(
               THD *thd
             ) {
               DBUG_ENTER("spider_param_use_flash_logs");
            -  DBUG_RETURN(THDVAR(thd, use_flash_logs));
            +  DBUG_RETURN(FALSE);
             }
            

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - As serg pointed out, removing ha_flush_logs() fixes the problem. The following makeshift patch also fix the problem (note: this cannot be applied as is): diff --git a/storage/spider/spd_param.cc b/storage/spider/spd_param.cc index b23877ca92a..e8d9dd4fca3 100644 --- a/storage/spider/spd_param.cc +++ b/storage/spider/spd_param.cc @@ -597,8 +597,7 @@ int spider_param_init_sql_alloc_size( int init_sql_alloc_size ) { DBUG_ENTER( "spider_param_init_sql_alloc_size" ); - DBUG_RETURN(THDVAR(thd, init_sql_alloc_size) < 0 ? - init_sql_alloc_size : THDVAR(thd, init_sql_alloc_size)); + DBUG_RETURN(1024); } /* @@ -1147,7 +1146,7 @@ bool spider_param_use_flash_logs( THD *thd ) { DBUG_ENTER( "spider_param_use_flash_logs" ); - DBUG_RETURN(THDVAR(thd, use_flash_logs)); + DBUG_RETURN(FALSE); }

            Elkin Please check comments made by serg and me.

            If the log flushing is necessary for setting the binlog_checksum variable, we may have to change the Spider implementation. However, if it is wrongly there, please remove it.

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - Elkin Please check comments made by serg and me. If the log flushing is necessary for setting the binlog_checksum variable, we may have to change the Spider implementation. However, if it is wrongly there, please remove it.

            Elkin FYI, MDEV-25890 also seems to be introduced by 49ff2cb.

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - Elkin FYI, MDEV-25890 also seems to be introduced by 49ff2cb.

            I suspect that ha_flush_logs() there is not needed.
            But if it is, it certainly cannot be done under LOCK_global_system_variables mutex.

            serg Sergei Golubchik added a comment - I suspect that ha_flush_logs() there is not needed. But if it is, it certainly cannot be done under LOCK_global_system_variables mutex.

            Elkin FYI, some related info (testcase, notes) in MDEV-25890.

            Roel Roel Van de Paar added a comment - Elkin FYI, some related info (testcase, notes) in MDEV-25890 .
            Elkin Andrei Elkin added a comment -

            Indeed, ha_flush_logs() is unnecessary. Unlike the upstream, where it arrived from, Mariadb exploits different
            mechanisms for not letting PURGE or RESET-MASTER to trouble transaction recovery. That is in case should a trx
            be prepared but its binlog file gone, the trx then is committed on disk too.
            I am reverting the culprit to submit a commit for review.

            Elkin Andrei Elkin added a comment - Indeed, ha_flush_logs() is unnecessary. Unlike the upstream, where it arrived from, Mariadb exploits different mechanisms for not letting PURGE or RESET-MASTER to trouble transaction recovery. That is in case should a trx be prepared but its binlog file gone, the trx then is committed on disk too. I am reverting the culprit to submit a commit for review.
            Elkin Andrei Elkin added a comment -

            Howdy, Sergei.

            I'm inviting you to check 'cos you've been looking into anyway.

            Thank you.

            /ndrei.

            Elkin Andrei Elkin added a comment - Howdy, Sergei. I'm inviting you to check 'cos you've been looking into anyway. Thank you. /ndrei.

            the commit de548dbb0f2089 is, of course, ok, as discussed above.
            WIll you add a test case?

            serg Sergei Golubchik added a comment - the commit de548dbb0f2089 is, of course, ok, as discussed above. WIll you add a test case?
            Elkin Andrei Elkin added a comment -

            serg, no need to add more tests than those that the commit message points. They test PURGE and more.

            Elkin Andrei Elkin added a comment - serg , no need to add more tests than those that the commit message points. They test PURGE and more.

            what about the test for MDEV-27039? The commit specifically mentions this MDEV in the subject, but there is no test that the bug — the crash — went away.

            serg Sergei Golubchik added a comment - what about the test for MDEV-27039 ? The commit specifically mentions this MDEV in the subject, but there is no test that the bug — the crash — went away.
            Elkin Andrei Elkin added a comment -

            serg, You're right. My focus here was too much replication-centric. I adding the description based one to spider 's own suite.

            Elkin Andrei Elkin added a comment - serg , You're right. My focus here was too much replication-centric. I adding the description based one to spider 's own suite.
            Elkin Andrei Elkin added a comment -

            A regression test is added as a separate commit (to be fixed up at future pushing).
            It actually makes clear about the necessary condition to summon the bug.

            Cheers.

            Elkin Andrei Elkin added a comment - A regression test is added as a separate commit (to be fixed up at future pushing). It actually makes clear about the necessary condition to summon the bug. Cheers.

            2611abb3562 is to to push too, thanks!

            serg Sergei Golubchik added a comment - 2611abb3562 is to to push too, thanks!

            People

              Elkin Andrei Elkin
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.