MariaDB Server / MDEV-22370

safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL

Details

    Description

      10.5 87a7968c

      2020-04-14 10:14:49 0 [Note] /ten1/builds/10.5-87a7968c-deb/bin/mysqld (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
      2020-04-14 10:14:49 0 [Note] Event Scheduler: Purging the queue. 14 events
      safe_mutex: Trying to lock uninitialized mutex at /home/elenst/src/10.5/sql/rpl_parallel.cc, line 470
      200414 10:14:49 [ERROR] mysqld got signal 6 ;
       
      #3  <signal handler called>
      #4  0x00007f73a1ad5207 in raise () from /lib64/libc.so.6
      #5  0x00007f73a1ad68f8 in abort () from /lib64/libc.so.6
      #6  0x0000560ad0f8b4cf in safe_mutex_lock (mp=0x560ad1f58d90 <global_rpl_thread_pool+16>, my_flags=0, file=0x560ad11aaf40 "/home/elenst/src/10.5/sql/rpl_parallel.cc", line=470) at /home/elenst/src/10.5/mysys/thr_mutex.c:244
      #7  0x0000560ad05107d1 in inline_mysql_mutex_lock (that=0x560ad1f58d90 <global_rpl_thread_pool+16>, src_file=0x560ad11aaf40 "/home/elenst/src/10.5/sql/rpl_parallel.cc", src_line=470) at /home/elenst/src/10.5/include/mysql/psi/mysql_thread.h:763
      #8  0x0000560ad051229d in pool_mark_busy (pool=0x560ad1f58d80 <global_rpl_thread_pool>, thd=0x7f7328000b18) at /home/elenst/src/10.5/sql/rpl_parallel.cc:470
      #9  0x0000560ad05126f2 in rpl_pause_for_ftwrl (thd=0x7f7328000b18) at /home/elenst/src/10.5/sql/rpl_parallel.cc:563
      #10 0x0000560ad02eb534 in mysql_execute_command (thd=0x7f7328000b18) at /home/elenst/src/10.5/sql/sql_parse.cc:5374
      #11 0x0000560ad02f3912 in mysql_parse (thd=0x7f7328000b18, rawbuf=0x7f7328012460 "FLUSH /* QNO 1411 CON_ID 9 */ TABLES WITH READ LOCK", length=51, parser_state=0x7f7384acf520, is_com_multi=false, is_next_command=false) at /home/elenst/src/10.5/sql/sql_parse.cc:7953
      #12 0x0000560ad02df2c0 in dispatch_command (command=COM_QUERY, thd=0x7f7328000b18, packet=0x7f7328008809 "FLUSH /* QNO 1411 CON_ID 9 */ TABLES WITH READ LOCK", packet_length=51, is_com_multi=false, is_next_command=false) at /home/elenst/src/10.5/sql/sql_parse.cc:1840
      #13 0x0000560ad02dd93c in do_command (thd=0x7f7328000b18) at /home/elenst/src/10.5/sql/sql_parse.cc:1359
      #14 0x0000560ad0483c6d in do_handle_one_connection (connect=0x560ad51db8f8, put_in_cache=true) at /home/elenst/src/10.5/sql/sql_connect.cc:1422
      #15 0x0000560ad0483986 in handle_one_connection (arg=0x560ad51db8f8) at /home/elenst/src/10.5/sql/sql_connect.cc:1319
      #16 0x0000560ad09c173e in pfs_spawn_thread (arg=0x560ad52f7448) at /home/elenst/src/10.5/storage/perfschema/pfs.cc:2201
      #17 0x00007f73a310bdd5 in start_thread () from /lib64/libpthread.so.0
      #18 0x00007f73a1b9cead in clone () from /lib64/libc.so.6
      

      The stack trace above is from an original failure in concurrent tests.

      The test case below will produce a failure with line numbers in rpl_parallel.cc shifted by 1, due to the suggested debug injection.
      In theory it can also fail without the injection, but the chances are slim; with the injection it fails for me every time.

      Injection

      diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
      index 9344312fe62..e42e6b2ebb5 100644
      --- a/sql/rpl_parallel.cc
      +++ b/sql/rpl_parallel.cc
      @@ -467,6 +467,7 @@ pool_mark_busy(rpl_parallel_thread_pool *pool, THD *thd)
           So we protect the infrequent operations of FLUSH TABLES WITH READ LOCK and
           pool size changes with this condition wait.
         */
      +  DBUG_EXECUTE_IF("sleep_mark_busy",my_sleep(1000000););
         mysql_mutex_lock(&pool->LOCK_rpl_thread_pool);
         if (thd)
         {
      

      Test case

      --connect (con1,localhost,root,,)
      SET DEBUG_DBUG='+d,sleep_mark_busy';
      --send
        FLUSH TABLES WITH READ LOCK;
       
      --connection default
      --source include/restart_mysqld.inc
      

      Reproducible on 10.4 and 10.5.
      Could not reproduce and didn't observe it on 10.3.
      I have no information on whether the problem has any effect on non-debug builds.
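
      As a minimal standalone sketch of the failure mechanism (illustrative only, not MariaDB code): locking a pthread mutex after it has been destroyed is undefined behaviour, and under glibc it typically fails with EINVAL (error 22); MariaDB's safe_mutex wrapper escalates such failures into the aborts shown in this report.

      /*
        Sketch: lock-after-destroy, the pattern behind the crash.
        Build with: g++ -pthread lock_after_destroy.cc
      */
      #include <pthread.h>
      #include <stdio.h>
      #include <string.h>

      int main()
      {
        pthread_mutex_t mtx;
        pthread_mutex_init(&mtx, NULL);
        pthread_mutex_destroy(&mtx);        /* shutdown path destroys the mutex */

        int err= pthread_mutex_lock(&mtx);  /* FTWRL path then tries to lock it */
        if (err)
          fprintf(stderr, "lock failed: %d (%s)\n", err, strerror(err));
        return 0;
      }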

      Attachments

        Issue Links

          Activity

            Elkin Andrei Elkin added a comment -

            sachin.setiya.007 I think we can crack this one quickly. Could you check it once you have a break from ALTER?

            elenst Elena Stepanova added a comment - - edited

            In concurrent tests, I am also getting what appears to be a closely related problem. It's harder for me to simulate; maybe the race window is narrower, or I just don't see the right spot, but I hope you'll take care of it along with the reported one.
            In all cases that I looked at, the one in the description happens in rpl_pause_for_ftwrl/pool_mark_busy, while the one below happens in rpl_unpause_after_ftwrl/pool_mark_not_busy. Both occur on FTWRL during server shutdown.

            10.5 2655984e

            2020-04-23  8:54:58 0 [Note] /ten1/builds/10.5-2655984e-deb/bin/mysqld (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
            2020-04-23  8:54:58 0 [Note] Event Scheduler: Purging the queue. 0 events
            Got error 22 when trying to lock mutex LOCK_rpl_thread_pool at /home/elenst/src/10.5/sql/rpl_parallel.cc, line 500
            200423  8:54:58 [ERROR] mysqld got signal 6 ;
             
            #3  <signal handler called>
            #4  0x00007f43283f2207 in raise () from /lib64/libc.so.6
            #5  0x00007f43283f38f8 in abort () from /lib64/libc.so.6
            #6  0x000056135e2a6474 in safe_mutex_lock (mp=0x56135f273d90 <global_rpl_thread_pool+16>, my_flags=0, file=0x56135e4c6060 "/home/elenst/src/10.5/sql/rpl_parallel.cc", line=500) at /home/elenst/src/10.5/mysys/thr_mutex.c:297
            #7  0x000056135d82b3e3 in inline_mysql_mutex_lock (that=0x56135f273d90 <global_rpl_thread_pool+16>, src_file=0x56135e4c6060 "/home/elenst/src/10.5/sql/rpl_parallel.cc", src_line=500) at /home/elenst/src/10.5/include/mysql/psi/mysql_thread.h:763
            #8  0x000056135d82d016 in pool_mark_not_busy (pool=0x56135f273d80 <global_rpl_thread_pool>) at /home/elenst/src/10.5/sql/rpl_parallel.cc:500
            #9  0x000056135d82d280 in rpl_unpause_after_ftwrl (thd=0x7f42b0000b18) at /home/elenst/src/10.5/sql/rpl_parallel.cc:537
            #10 0x000056135d605bb0 in mysql_execute_command (thd=0x7f42b0000b18) at /home/elenst/src/10.5/sql/sql_parse.cc:5433
            #11 0x000056135d60dcda in mysql_parse (thd=0x7f42b0000b18, rawbuf=0x7f42b0012580 "FLUSH /* QNO 3372 CON_ID 10 */ TABLES WITH READ LOCK", length=52, parser_state=0x7f43107bd520, is_com_multi=false, is_next_command=false) at /home/elenst/src/10.5/sql/sql_parse.cc:7957
            #12 0x000056135d5f962c in dispatch_command (command=COM_QUERY, thd=0x7f42b0000b18, packet=0x7f42b0008929 "FLUSH /* QNO 3372 CON_ID 10 */ TABLES WITH READ LOCK", packet_length=52, is_com_multi=false, is_next_command=false) at /home/elenst/src/10.5/sql/sql_parse.cc:1840
            #13 0x000056135d5f7ca8 in do_command (thd=0x7f42b0000b18) at /home/elenst/src/10.5/sql/sql_parse.cc:1359
            #14 0x000056135d79e479 in do_handle_one_connection (connect=0x56136289ed68, put_in_cache=true) at /home/elenst/src/10.5/sql/sql_connect.cc:1422
            #15 0x000056135d79e192 in handle_one_connection (arg=0x56136289ed68) at /home/elenst/src/10.5/sql/sql_connect.cc:1319
            #16 0x000056135dcdc84c in pfs_spawn_thread (arg=0x561362781be8) at /home/elenst/src/10.5/storage/perfschema/pfs.cc:2201
            #17 0x00007f4329a28dd5 in start_thread () from /lib64/libpthread.so.0
            #18 0x00007f43284b9ead in clone () from /lib64/libc.so.6
            

            All threads stack trace is attached: error22_all_threads


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            In versions before 10.4, pool_mark_busy is called only once, but from 10.4 onwards the slave shutdown path also calls pool_mark_busy:

            (gdb) bt
            #0  pool_mark_busy (pool=0x5555576f07e0 <global_rpl_thread_pool>, thd=0x0) at /home/sachin/10.4/server/sql/rpl_parallel.cc:455
            #1  0x0000555555f67532 in rpl_parallel_change_thread_count (pool=0x5555576f07e0 <global_rpl_thread_pool>, new_count=0, force=true) at /home/sachin/10.4/server/sql/rpl_parallel.cc:1530
            #2  0x0000555555f68e88 in rpl_parallel_thread_pool::destroy (this=0x5555576f07e0 <global_rpl_thread_pool>) at /home/sachin/10.4/server/sql/rpl_parallel.cc:2017
            #3  0x0000555555c51eb3 in slave_prepare_for_shutdown () at /home/sachin/10.4/server/sql/slave.cc:1448
            #4  0x0000555555c1eb89 in close_connections () at /home/sachin/10.4/server/sql/mysqld.cc:1765
            #5  0x0000555555c2785b in mysqld_main (argc=131, argv=0x555557f868e0) at /home/sachin/10.4/server/sql/mysqld.cc:5936
            #6  0x0000555555c1b77a in main (argc=7, argv=0x7fffffffb2c8) at /home/sachin/10.4/server/sql/main.cc:25
            (gdb)
            

            The commit that seems responsible is 5cd21ac20225 (MDEV-20821, https://jira.mariadb.org/browse/MDEV-20821).


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            So rpl_parallel_thread_pool::destroy destroys the mutex LOCK_rpl_thread_pool while FTWRL is still waiting to lock it.
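
            A hedged sketch of that ordering in plain pthreads (the names, the sleep, and the thread bodies are illustrative stand-ins for pool_mark_busy and rpl_parallel_thread_pool::destroy, not the server's actual code):

            #include <pthread.h>
            #include <unistd.h>
            #include <stdio.h>
            #include <string.h>

            /* stands in for global_rpl_thread_pool's LOCK_rpl_thread_pool */
            static pthread_mutex_t LOCK_rpl_thread_pool;

            static void *ftwrl_thread(void *arg)
            {
              sleep(1);  /* models the sleep_mark_busy debug injection */
              /* models pool_mark_busy() finally taking the pool mutex */
              int err= pthread_mutex_lock(&LOCK_rpl_thread_pool);
              if (err)
                fprintf(stderr, "pool_mark_busy: lock failed: %d (%s)\n",
                        err, strerror(err));
              return NULL;
            }

            static void *shutdown_thread(void *arg)
            {
              /* models rpl_parallel_thread_pool::destroy() run from
                 slave_prepare_for_shutdown() during server shutdown */
              pthread_mutex_destroy(&LOCK_rpl_thread_pool);
              return NULL;
            }

            int main()
            {
              pthread_t t1, t2;
              pthread_mutex_init(&LOCK_rpl_thread_pool, NULL);
              pthread_create(&t1, NULL, ftwrl_thread, NULL);
              pthread_create(&t2, NULL, shutdown_thread, NULL);
              pthread_join(t1, NULL);
              pthread_join(t2, NULL);
              return 0;
            }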


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            This also fails:

            --source include/master-slave.inc
             
            --connection master
             
            --connection slave
            stop slave;
            SET GLOBAL slave_parallel_threads=10;
            set global slave_parallel_mode=optimistic;
            change master to master_use_gtid=slave_pos;
            start slave;
            --connect (con1,127.0.0.1,root,,test,$SLAVE_MYPORT)
            SET DEBUG_DBUG='+d,sleep_mark_busy';
            --send
              FLUSH TABLES WITH READ LOCK;
             
            --connection slave
            --source include/rpl_restart_server.inc
             
            --connection slave
            stop slave;
            SET GLOBAL slave_parallel_threads=0;
            set global slave_parallel_mode=conservative;
            start slave;
            --disconnect con1
            --source include/rpl_end.inc
            
            


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            So this patch will not work, because it still fails with the earlier test case:

            diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
            index 9344312fe62..1dea1caae7e 100644
            --- a/sql/rpl_parallel.cc
            +++ b/sql/rpl_parallel.cc
            @@ -467,6 +467,7 @@ pool_mark_busy(rpl_parallel_thread_pool *pool, THD *thd)
                 So we protect the infrequent operations of FLUSH TABLES WITH READ LOCK and
                 pool size changes with this condition wait.
               */
            +  DBUG_EXECUTE_IF("sleep_mark_busy",my_sleep(1000000););
               mysql_mutex_lock(&pool->LOCK_rpl_thread_pool);
               if (thd)
               {
            @@ -511,6 +512,8 @@ rpl_unpause_after_ftwrl(THD *thd)
               uint32 i;
               rpl_parallel_thread_pool *pool= &global_rpl_thread_pool;
               DBUG_ENTER("rpl_unpause_after_ftwrl");
            +  if (!pool->count)
            +    DBUG_VOID_RETURN;
             
               DBUG_ASSERT(pool->busy);
             
            @@ -552,6 +555,8 @@ rpl_pause_for_ftwrl(THD *thd)
               int err;
               DBUG_ENTER("rpl_pause_for_ftwrl");
             
            +  if (!pool->count)
            +    DBUG_RETURN(0);
               /*
                 While the count_pending_pause_for_ftwrl counter is non-zero, the pool
                 cannot be shutdown/resized, so threads are guaranteed to not disappear.
            
            


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            Patch branch: bb-10.4-sachin

            Elkin Andrei Elkin added a comment -

            A couple of cosmetic comments are left on GH; otherwise the patch looks nice.
            Thanks!


            People

              sachin.setiya.007 Sachin Setiya (Inactive)
              elenst Elena Stepanova

