[MDEV-6225] Idle replication slave keeps crashing. Created: 2014-05-10  Updated: 2014-06-30  Resolved: 2014-06-30

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.37-galera
Fix Version/s: 5.5.39-galera, 5.5.39

Type: Bug
Priority: Minor
Reporter: Eric Webster
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 0
Labels: galera, xtradb
Environment:

Debian 7.5, kernel 3.2.54-2, 24-core Xeon E5645 @ 2.40 GHz, 48 GB RAM
Running mysqld_multi with a dozen instances; the crashing one has a 10 GB buffer pool.


Attachments: core-2014-05-12, core-2014-05-21.tgz, core-dumps, crashes-2014-05-23-and-25.tgz, mysqld.err.2014-05-12

Description

I'm using mysqld_multi to run many instances of MySQL on the same machine. Each instance belongs to a 3-node Galera cluster, so three physical boxes each run a dozen instances, together forming a dozen clusters.

One instance will always crash given enough time. Sometimes it takes less than a day, sometimes it runs for a week, but it always crashes eventually. That instance is no longer in a cluster while I troubleshoot it, but Galera is still loaded. I have also tried running it without loading the Galera provider, with the same result.

It is just replicating and not serving any real queries or traffic. Unfortunately, I haven't been able to narrow it down to a specific table or query. Today was the first time another instance has ever crashed (it has been running for several months). What does stand out, though, is that the instance that crashes is always the one acting as a replication slave; the other members of the cluster never crashed while it was clustered. Someone on the #maria IRC channel mentioned deadlocks involving replication and SHOW SLAVE STATUS, but I can't confirm anything.

I was encouraged to submit a report and include my core dump traces. The third dump, labelled mysql-13, is the new crash I mentioned. The other two are from the instance that has been crashing repeatedly. Whenever it crashes, I rebuild its data from scratch. I have the same data on a host running mariadb-server-5.5 (not Galera) that has never crashed; that host is what I use to reseed this system after a crash.



Comments
Comment by Jan Lindström (Inactive) [ 2014-05-10 ]

Hi,

Could you also attach the full, unedited error log (at least one) from the crashing server?

R: Jan

Comment by Eric Webster [ 2014-05-12 ]

Unfortunately I did not save them, only the core dumps. I will post one as soon as it happens again.

Comment by Eric Webster [ 2014-05-12 ]

New core backtrace and the associated error log attached.

Comment by Eric Webster [ 2014-05-21 ]

Another core backtrace and the associated error log.

Comment by Eric Webster [ 2014-05-27 ]

Two more crashes from the weekend. I'm now running it on two machines (identical hardware, software and data) and they are both crashing, but at different times. If it were a bad query, you would expect both to crash at the same time.

Comment by Eric Webster [ 2014-06-17 ]

I'm still collecting core dumps from these slaves, but I've stopped posting them since there hasn't been any reply. If they are still useful, let me know and I'll continue to post them.

Comment by Jan Lindström (Inactive) [ 2014-06-30 ]

revno: 3506
committer: Jan Lindström <jplindst@mariadb.org>
branch nick: maria-5.5-galera
timestamp: Mon 2014-06-30 14:02:54 +0300
message:
MDEV-6225: Idle replication slave keeps crashing.

Analysis: Based on the crashes, the buffer pool instance identifier is
not correct on the block to be freed. Add LRU list mutex holding
on functions calling free and add additional safety checks.

Comment by Jan Lindström (Inactive) [ 2014-06-30 ]

5.5:

revno: 4221
committer: Jan Lindström <jplindst@mariadb.org>
branch nick: 5.5
timestamp: Mon 2014-06-30 14:06:28 +0300
message:
MDEV-6225: Idle replication slave keeps crashing.

Analysis: Based on the crashes, the buffer pool instance identifier is
not correct on the block to be freed. Add LRU list mutex holding
on functions calling free and add additional safety checks.
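The commit message describes the fix only in outline: hold the LRU list mutex across the free path, and check that the block being freed really belongs to the buffer pool instance freeing it. The sketch below illustrates that general pattern with a toy model. It is an assumption-laden illustration, not the server's code: the actual fix is in the C/C++ XtraDB/InnoDB sources, and every name here (`Block`, `BufferPoolInstance`, `pool_id`, `add_block`, `free_block`) is hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(eq=False)  # compare blocks by identity, not by field values
class Block:
    # Hypothetical page block tagged with the id of the pool instance that owns it.
    page_no: int
    pool_id: int


class BufferPoolInstance:
    """Toy model of one buffer pool instance; NOT the InnoDB/XtraDB API."""

    def __init__(self, instance_id: int) -> None:
        self.instance_id = instance_id
        self.lru_mutex = threading.Lock()  # guards both lru and free_list
        self.lru: list[Block] = []
        self.free_list: list[Block] = []

    def add_block(self, page_no: int) -> Block:
        block = Block(page_no, self.instance_id)
        with self.lru_mutex:
            self.lru.append(block)
        return block

    def free_block(self, block: Block) -> None:
        # Hold the LRU mutex for the whole free operation, so no other
        # thread can relocate or free the block while it is being released.
        with self.lru_mutex:
            # Safety check: refuse to free a block owned by another instance
            # instead of corrupting this instance's lists.
            if block.pool_id != self.instance_id:
                raise ValueError(
                    f"block {block.page_no} belongs to instance "
                    f"{block.pool_id}, not {self.instance_id}"
                )
            self.lru.remove(block)
            self.free_list.append(block)
```

In this model, freeing a block through the wrong instance fails loudly with `ValueError` rather than silently corrupting another instance's lists; freeing it through its owner moves it from `lru` to `free_list` under the mutex.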

Generated at Thu Feb 08 07:10:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.