[MDEV-23381] rpl_parallel2 fails in 10.1 to 10.3 if slave_parallel_mode is changed to optimistic Created: 2020-08-03  Updated: 2020-08-04  Resolved: 2020-08-03

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 10.1.46, 10.2.33, 10.3.24

Type: Bug Priority: Major
Reporter: Sachin Setiya (Inactive) Assignee: Sachin Setiya (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-23089 rpl_parallel2 fails in 10.5 Closed

 Description   

I got a similar crash in 10.1 to 10.3:

Thread 1 (Thread 0x7f1802b68700 (LWP 24673)):                                                                                                                                                 
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57                                                                                        
#1  0x0000555bd263e22f in my_write_core (sig=6) at mysys/stacktrace.c:477                                                                                                                     
#2  0x0000555bd1fecb79 in handle_fatal_signal (sig=6) at sql/signal_handler.cc:296                                                                                                            
#3  <signal handler called>                                                                                                                                                                   
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51                                                                                                                     
#5  0x00007f1808af58b1 in __GI_abort () at abort.c:79                                                                                                                                         
#6  0x00007f1808ae542a in __assert_fail_base (fmt=0x7f1808c6ca38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555bd2743518 "(mdl_request->type != MDL_INTENTION_EXCLUSIVE && mdl_request->type != MDL_EXCLUSIVE) || !(get_thd()->rgi_slave && get_thd()->rgi_slave->is_parallel_exec && lock->check_if_conflicting_replication_locks(this))", file=file@entry=0x555bd2742f8a "sql/mdl.cc", line=line@entry=2104, function=function@entry=0x555bd2743da0 <MDL_context::acquire_lock(MDL_request*, double)::__PRETTY_FUNCTION__> "bool MDL_context::acquire_lock(MDL_request*, double)") at assert.c:92
#7  0x00007f1808ae54a2 in __GI___assert_fail (assertion=0x555bd2743518 "(mdl_request->type != MDL_INTENTION_EXCLUSIVE && mdl_request->type != MDL_EXCLUSIVE) || !(get_thd()->rgi_slave && get_thd()->rgi_slave->is_parallel_exec && lock->check_if_conflicting_replication_locks(this))", file=0x555bd2742f8a "sql/mdl.cc", line=2104, function=0x555bd2743da0 <MDL_context::acquire_lock(MDL_request*, double)::__PRETTY_FUNCTION__> "bool MDL_context::acquire_lock(MDL_request*, double)") at assert.c:101
#8  0x0000555bd1ef3567 in MDL_context::acquire_lock (this=0x7f17ef051168, mdl_request=0x7f1802b66fa0, lock_wait_timeout=31536000) at sql/mdl.cc:2100                                          
#9  0x0000555bd1d46649 in open_table (thd=0x7f17ef051070, table_list=0x7f1802b67590, ot_ctx=0x7f1802b672e0) at sql/sql_base.cc:2403                                                           
#10 0x0000555bd1d496d4 in open_and_process_table (thd=0x7f17ef051070, tables=0x7f1802b67590, counter=0x7f1802b67374, flags=0, prelocking_strategy=0x7f1802b673f8, has_prelocking_list=false, ot_ctx=0x7f1802b672e0) at sql/sql_base.cc:4168
#11 0x0000555bd1d4a4bd in open_tables (thd=0x7f17ef051070, options=..., start=0x7f1802b67358, counter=0x7f1802b67374, flags=0, prelocking_strategy=0x7f1802b673f8) at sql/sql_base.cc:4627    
#12 0x0000555bd1d4bc22 in open_and_lock_tables (thd=0x7f17ef051070, options=..., tables=0x7f1802b67590, derived=false, flags=0, prelocking_strategy=0x7f1802b673f8) at sql/sql_base.cc:5386   
#13 0x0000555bd1d149a1 in open_and_lock_tables (thd=0x7f17ef051070, tables=0x7f1802b67590, derived=false, flags=0) at sql/sql_base.h:547                                                      
#14 0x0000555bd1f4d7d2 in rpl_slave_state::record_gtid (this=0x7f1808047c00, thd=0x7f17ef051070, gtid=0x7f1802b67be0, sub_id=20, rgi=0x7f17f0c1a800, in_statement=false) at sql/rpl_gtid.cc:558
#15 0x0000555bd20ec9a8 in Xid_log_event::do_apply_event (this=0x7f17f0c7b670, rgi=0x7f17f0c1a800) at sql/log_event.cc:7703                                                                    
#16 0x0000555bd1d068ad in Log_event::apply_event (this=0x7f17f0c7b670, rgi=0x7f17f0c1a800) at sql/log_event.h:1343                                                                            
#17 0x0000555bd1cfc2f2 in apply_event_and_update_pos_apply (ev=0x7f17f0c7b670, thd=0x7f17ef051070, rgi=0x7f17f0c1a800, reason=0) at sql/slave.cc:3482                                         
#18 0x0000555bd1cfc792 in apply_event_and_update_pos_for_parallel (ev=0x7f17f0c7b670, thd=0x7f17ef051070, rgi=0x7f17f0c1a800) at sql/slave.cc:3626                                            
#19 0x0000555bd1f529d7 in rpt_handle_event (qev=0x7f17f0c6e270, rpt=0x7f17f0c7ece0) at sql/rpl_parallel.cc:50                                                                                 
#20 0x0000555bd1f5584f in handle_rpl_parallel_thread (arg=0x7f17f0c7ece0) at sql/rpl_parallel.cc:1274                                                                                         
#21 0x0000555bd2305f93 in pfs_spawn_thread (arg=0x7f17f0c58270) at storage/perfschema/pfs.cc:1868                                                                                             
#22 0x00007f18095d46db in start_thread (arg=0x7f1802b68700) at pthread_create.c:463                                                                                                           
#23 0x00007f1808bd6a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95   



 Comments   
Comment by Sachin Setiya (Inactive) [ 2020-08-03 ]

The reason it fails in 10.1 to 10.3 but not in 10.4 is the type of the MDL request.

In 10.4 onwards:

(rr) p mdl_request->type
$10 = MDL_SHARED_NO_WRITE

In 10.1 to 10.3:

(rr) p mdl_request->type 
$7 = MDL_INTENTION_EXCLUSIVE
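
To make the difference concrete, here is a minimal sketch of the assertion predicate quoted in frames #6 and #7, reduced to plain booleans. The enum and the function `assertion_holds` are hypothetical stand-ins written for this illustration, not the server's definitions; `is_parallel_slave_worker` collapses `get_thd()->rgi_slave && get_thd()->rgi_slave->is_parallel_exec`, and `conflicting_replication_locks` stands for the result of `lock->check_if_conflicting_replication_locks(this)`. Since the assertion fired with `MDL_INTENTION_EXCLUSIVE`, both of those must have been true in the crashing run.

```cpp
#include <cassert>

// Hypothetical stand-ins for the lock types named in this report.
// They are NOT the real definitions from sql/mdl.h.
enum LockType { MDL_INTENTION_EXCLUSIVE, MDL_SHARED_NO_WRITE, MDL_EXCLUSIVE };

// Same boolean shape as the assertion text at sql/mdl.cc:2104.
// Returns true when the assertion holds, false when it would fire.
constexpr bool assertion_holds(LockType type,
                               bool is_parallel_slave_worker,
                               bool conflicting_replication_locks)
{
  return (type != MDL_INTENTION_EXCLUSIVE && type != MDL_EXCLUSIVE) ||
         !(is_parallel_slave_worker && conflicting_replication_locks);
}

// 10.1 to 10.3: the request type is MDL_INTENTION_EXCLUSIVE, so a parallel
// worker that sees conflicting replication locks trips the assertion.
static_assert(!assertion_holds(MDL_INTENTION_EXCLUSIVE, true, true),
              "assertion fires for MDL_INTENTION_EXCLUSIVE");

// 10.4 onwards: the request type is MDL_SHARED_NO_WRITE, so the first
// disjunct is already true and the conflict check no longer matters.
static_assert(assertion_holds(MDL_SHARED_NO_WRITE, true, true),
              "assertion holds for MDL_SHARED_NO_WRITE");
```

The sketch only models the condition itself. It says nothing about why a conflicting replication lock exists at that point, which is what the later comments trace back to MDEV-23089.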

Comment by Sachin Setiya (Inactive) [ 2020-08-03 ]

After removing the assert I get the same hang as in MDEV-23089, and after applying the patch for MDEV-23089 there is no hang.

Comment by Sachin Setiya (Inactive) [ 2020-08-03 ]

Actually, after applying the patch for MDEV-23089, I don't even need to remove the assert.

Comment by Sachin Setiya (Inactive) [ 2020-08-03 ]

When I hit the assert in the worker thread, I checked rpl_parallel_entry->force_abort and it was true. This explains why I no longer get the crash after applying the patch for MDEV-23089.

Closing it as a duplicate of MDEV-23089.

Comment by Sachin Setiya (Inactive) [ 2020-08-03 ]

Duplicate of MDEV-23089
