Details

    Description

      mtr --repeat=10 rpl.rpl_parallel2 sometimes hangs in 10.5.
      It works in 10.4.

      The problem in 10.5 started after a merge from 10.3 -> 10.5 on 2 July that enabled the test in
      10.5. The issue is probably caused by the slightly different implementation of FLUSH TABLES WITH READ LOCK in 10.5 compared to 10.4.

      The hang happens in the reap of "FLUSH TABLES WITH READ LOCK" at line
      157 in suite/rpl/t/rpl_parallel2.test

      Other things:

      • The comments in ba02550166eb39c0375a6422ecaa4731421250b6 may be useful for finding and fixing the bug.
      • The code for flush_tables_with_read_lock is in sql_reload.cc. I suggest comparing the 10.4 and 10.5
        versions of the function to find out what is going on.

          Activity

            sachin.setiya.007 Sachin Setiya (Inactive) added a comment - edited

            It also fails in 10.4, but only after changing slave_parallel_mode to optimistic. In 10.1 to 10.3 a debug assert fires when I change slave_parallel_mode. So the patch in bb-10.5-23089 applies to 10.4 and 10.5.

            sachin.setiya.007 Sachin Setiya (Inactive) added a comment

            If I change slave_parallel_mode in 10.1, 10.2, or 10.3, I get the following assert:

            Thread 1 (Thread 0x7f1802b68700 (LWP 24673)):
            #0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
            #1  0x0000555bd263e22f in my_write_core (sig=6) at mysys/stacktrace.c:477
            #2  0x0000555bd1fecb79 in handle_fatal_signal (sig=6) at sql/signal_handler.cc:296
            #3  <signal handler called>
            #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
            #5  0x00007f1808af58b1 in __GI_abort () at abort.c:79
            #6  0x00007f1808ae542a in __assert_fail_base (fmt=0x7f1808c6ca38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555bd2743518 "(mdl_request->type != MDL_INTENTION_EXCLUSIVE && mdl_request->type != MDL_EXCLUSIVE) || !(get_thd()->rgi_slave && get_thd()->rgi_slave->is_parallel_exec && lock->check_if_conflicting_replication_locks(this))", file=file@entry=0x555bd2742f8a "sql/mdl.cc", line=line@entry=2104, function=function@entry=0x555bd2743da0 <MDL_context::acquire_lock(MDL_request*, double)::__PRETTY_FUNCTION__> "bool MDL_context::acquire_lock(MDL_request*, double)") at assert.c:92
            #7  0x00007f1808ae54a2 in __GI___assert_fail (assertion=0x555bd2743518 "(mdl_request->type != MDL_INTENTION_EXCLUSIVE && mdl_request->type != MDL_EXCLUSIVE) || !(get_thd()->rgi_slave && get_thd()->rgi_slave->is_parallel_exec && lock->check_if_conflicting_replication_locks(this))", file=0x555bd2742f8a "sql/mdl.cc", line=2104, function=0x555bd2743da0 <MDL_context::acquire_lock(MDL_request*, double)::__PRETTY_FUNCTION__> "bool MDL_context::acquire_lock(MDL_request*, double)") at assert.c:101
            #8  0x0000555bd1ef3567 in MDL_context::acquire_lock (this=0x7f17ef051168, mdl_request=0x7f1802b66fa0, lock_wait_timeout=31536000) at sql/mdl.cc:2100
            #9  0x0000555bd1d46649 in open_table (thd=0x7f17ef051070, table_list=0x7f1802b67590, ot_ctx=0x7f1802b672e0) at sql/sql_base.cc:2403
            #10 0x0000555bd1d496d4 in open_and_process_table (thd=0x7f17ef051070, tables=0x7f1802b67590, counter=0x7f1802b67374, flags=0, prelocking_strategy=0x7f1802b673f8, has_prelocking_list=false, ot_ctx=0x7f1802b672e0) at sql/sql_base.cc:4168
            #11 0x0000555bd1d4a4bd in open_tables (thd=0x7f17ef051070, options=..., start=0x7f1802b67358, counter=0x7f1802b67374, flags=0, prelocking_strategy=0x7f1802b673f8) at sql/sql_base.cc:4627
            #12 0x0000555bd1d4bc22 in open_and_lock_tables (thd=0x7f17ef051070, options=..., tables=0x7f1802b67590, derived=false, flags=0, prelocking_strategy=0x7f1802b673f8) at sql/sql_base.cc:5386
            #13 0x0000555bd1d149a1 in open_and_lock_tables (thd=0x7f17ef051070, tables=0x7f1802b67590, derived=false, flags=0) at sql/sql_base.h:547
            #14 0x0000555bd1f4d7d2 in rpl_slave_state::record_gtid (this=0x7f1808047c00, thd=0x7f17ef051070, gtid=0x7f1802b67be0, sub_id=20, rgi=0x7f17f0c1a800, in_statement=false) at sql/rpl_gtid.cc:558
            #15 0x0000555bd20ec9a8 in Xid_log_event::do_apply_event (this=0x7f17f0c7b670, rgi=0x7f17f0c1a800) at sql/log_event.cc:7703
            #16 0x0000555bd1d068ad in Log_event::apply_event (this=0x7f17f0c7b670, rgi=0x7f17f0c1a800) at sql/log_event.h:1343
            #17 0x0000555bd1cfc2f2 in apply_event_and_update_pos_apply (ev=0x7f17f0c7b670, thd=0x7f17ef051070, rgi=0x7f17f0c1a800, reason=0) at sql/slave.cc:3482
            #18 0x0000555bd1cfc792 in apply_event_and_update_pos_for_parallel (ev=0x7f17f0c7b670, thd=0x7f17ef051070, rgi=0x7f17f0c1a800) at sql/slave.cc:3626
            #19 0x0000555bd1f529d7 in rpt_handle_event (qev=0x7f17f0c6e270, rpt=0x7f17f0c7ece0) at sql/rpl_parallel.cc:50
            #20 0x0000555bd1f5584f in handle_rpl_parallel_thread (arg=0x7f17f0c7ece0) at sql/rpl_parallel.cc:1274
            #21 0x0000555bd2305f93 in pfs_spawn_thread (arg=0x7f17f0c58270) at storage/perfschema/pfs.cc:1868
            #22 0x00007f18095d46db in start_thread (arg=0x7f1802b68700) at pthread_create.c:463
            #23 0x00007f1808bd6a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
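
For readers decoding the assertion text in frames #6 and #7, the condition can be restated as a small standalone predicate. This is only a sketch: `MdlTypeModel`, `LockRequestModel` and `assertion_holds` are hypothetical stand-ins invented here, not the server's types, and the three booleans merely model the sub-expressions that appear in the assertion string. Since the assert fired, the condition evaluated to false in that frame, meaning the request was for MDL_INTENTION_EXCLUSIVE or MDL_EXCLUSIVE, the thread was a parallel replication worker, and check_if_conflicting_replication_locks() returned true.

```cpp
// Reduced stand-in for the MDL lock types; only the two types named in the
// assertion matter here. Not the server's real enum.
enum MdlTypeModel { MDL_INTENTION_EXCLUSIVE, MDL_SHARED, MDL_EXCLUSIVE };

// Hypothetical flattening of the sub-expressions in the assertion string.
struct LockRequestModel {
  MdlTypeModel type;            // models mdl_request->type
  bool has_rgi_slave;           // models get_thd()->rgi_slave being non-null
  bool is_parallel_exec;        // models get_thd()->rgi_slave->is_parallel_exec
  bool conflicting_repl_locks;  // models lock->check_if_conflicting_replication_locks(this)
};

// True when the asserted condition holds, i.e. a debug build would not abort.
constexpr bool assertion_holds(const LockRequestModel &r) {
  return (r.type != MDL_INTENTION_EXCLUSIVE && r.type != MDL_EXCLUSIVE) ||
         !(r.has_rgi_slave && r.is_parallel_exec && r.conflicting_repl_locks);
}

// A weaker lock type never violates the condition, whatever the thread state.
static_assert(assertion_holds(LockRequestModel{MDL_SHARED, true, true, true}),
              "non-exclusive request must pass");
// The combination implied by the trace: strong lock + parallel worker + conflict.
static_assert(!assertion_holds(LockRequestModel{MDL_EXCLUSIVE, true, true, true}),
              "exclusive request with conflicting replication locks must fail");
```

The model says nothing about why the conflicting locks exist when slave_parallel_mode is changed; it only makes explicit which combination of inputs the assert rejects.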

            monty Michael Widenius added a comment

            Thanks for the information. I will review the patch tomorrow morning.

            monty Michael Widenius added a comment

            Review sent by email.

            sachin.setiya.007 Sachin Setiya (Inactive) added a comment

            Removing versions 10.1 to 10.3, since they fail with a different error. I have created MDEV-23381 for them.

            People

              sachin.setiya.007 Sachin Setiya (Inactive)
              monty Michael Widenius