[MDEV-12229] Crashes due to parallel replication slave Created: 2017-03-10  Updated: 2018-10-16

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Chris Calender (Inactive) Assignee: Andrei Elkin
Resolution: Unresolved Votes: 1
Labels: None


 Description   

We are seeing frequent crashes in MariaDB 10.1.21 with slave_parallel_threads=4 and slave_parallel_mode=optimistic.

If we set slave_parallel_threads=0, the crashing stops.
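
For reference, the workaround can be applied on the slave without a server restart. The statements below are a minimal sketch, assuming a single default replication connection and an account with the SUPER privilege. The slave threads have to be stopped before slave_parallel_threads can be changed, and a value set with SET GLOBAL is lost on restart unless it is also added to the server configuration file.

```sql
-- Inspect the current parallel replication settings.
SHOW GLOBAL VARIABLES LIKE 'slave_parallel%';

-- The slave threads must be stopped before the thread pool size can change.
STOP SLAVE;

-- 0 disables the parallel worker pool; events are applied by the single SQL thread.
SET GLOBAL slave_parallel_threads = 0;

START SLAVE;

-- Confirm the new value took effect.
SHOW GLOBAL VARIABLES LIKE 'slave_parallel_threads';
```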

o The table below is updated very frequently, including by concurrent updates.
o Each update query affects only one record.
o The table is very small.

CREATE TABLE `t1` (
`id` int(6) NOT NULL,
`count1` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
CONSTRAINT `fk_id` FOREIGN KEY (`id`) REFERENCES `t2` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Here is the most recent stack trace:

170307 18:06:14 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.21-MariaDB-enterprise
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=64
max_threads=1002
thread_count=20
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2266447 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fcbf8a14008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fcbfa87b070 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7fd270256bee]
/usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7fd26fd7c455]
/lib64/libpthread.so.0(+0xf370)[0x7fd26f398370]
/usr/sbin/mysqld(_ZN5TABLE4initEP3THDP10TABLE_LIST+0x150)[0x7fd26fca7040]
/usr/sbin/mysqld(_Z10open_tableP3THDP10TABLE_LISTP18Open_table_context+0x6b4)[0x7fd26fbb2c84]
/usr/sbin/mysqld(_Z11open_tablesP3THDRK14DDL_options_stPP10TABLE_LISTPjjP19Prelocking_strategy+0xfa0)[0x7fd26fbb7130]
/usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x150)[0x7fd26fc96510]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2d4a)[0x7fd26fbf945a]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x332)[0x7fd26fbffb02]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x11e8)[0x7fd26fe52c88]
/usr/sbin/mysqld(+0x3b275b)[0x7fd26fb7975b]
/usr/sbin/mysqld(+0x547abe)[0x7fd26fd0eabe]
/usr/sbin/mysqld(handle_rpl_parallel_thread+0xfee)[0x7fd26fd12f2e]
/lib64/libpthread.so.0(+0x7dc5)[0x7fd26f390dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fd26d7af73d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fcc0bed2df3): is an invalid pointer
Connection ID (thread ID): 321778
Status: NOT_KILLED
 
Optimizer switch: index_merge=off,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway.
Query: UPDATE t1 SET count1=count1+(1) WHERE id=1234
 
170307 18:06:15 mysqld_safe Number of processes running now: 0
170307 18:06:15 mysqld_safe mysqld restarted



 Comments   
Comment by Elena Stepanova [ 2017-04-17 ]

Do you know whether it ever happened with parallel replication, but without optimistic mode?
Does the table only receive this kind of UPDATEs, or can there be other concurrent actions on the table, or on the table it references?

Comment by Chris Calender (Inactive) [ 2017-04-25 ]

As far as I know, they only tested with "optimistic". I am confirming that with them now; if that's the case, I'm asking whether they can test both "conservative" and "minimal".

I've also asked the other question. I'll post the details here as soon as I hear back.
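
Testing another mode would amount to something like the following on the slave. This is only a sketch, assuming a single default replication connection and the same thread count as in the report; "conservative" and "minimal" are existing slave_parallel_mode values in 10.1, and the slave has to be stopped while the mode is changed.

```sql
STOP SLAVE;

-- Keep the worker pool at the size used in the report.
SET GLOBAL slave_parallel_threads = 4;

-- Switch away from optimistic mode; repeat the test with 'minimal' as well.
SET GLOBAL slave_parallel_mode = 'conservative';

START SLAVE;
```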

Comment by Chris Calender (Inactive) [ 2017-05-01 ]

They only used "optimistic" mode and had not tested any other modes. This is a production system, so they cannot re-enable parallel replication for testing and risk crashing the system.

Regarding the updates: yes, this table receives almost exclusively concurrent updates, with very few inserts.
