[MDEV-6225] Idle replication slave keeps crashing.

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 5.5.37-galera
Fix Version/s: 5.5.39-galera, 5.5.39
Component/s: None
Labels:
- galera
- xtradb
Environment:
Debian 7.5, kernel 3.2.54-2, 24 core Xeon E5645 @ 2.40 GHz, 48GB RAM
Running mysqld_multi with a dozen instances, the crashing one has a 10G buffer pool.

Description

I'm using mysqld_multi to run many instances of MySQL on the same machine. Each instance is part of a 3-node Galera cluster, so there are three physical boxes, each running a dozen instances, which together make up a dozen clusters.

One instance will always crash given enough time. Sometimes it's less than a day, sometimes it runs for a week, but it always eventually crashes. That instance is no longer in a cluster as I'm trying to troubleshoot it but Galera is still loaded. I've tried not loading the provider as well with the same results.

It is just replicating and not serving any real queries or traffic. Unfortunately, I haven't been able to narrow it down to a specific table or query. Today, for the first time ever, a different instance crashed (it has been running for several months). What does stand out, though, is that the host that crashes is always the one actually acting as a slave. I have not had the other instances of the cluster crash while it was clustered. Someone on the #maria IRC channel mentioned deadlocks involving replication and SHOW SLAVE STATUS, but I can't confirm anything.

I was encouraged to submit a report and include my core dump traces. The third dump, labelled mysql-13, is the new crash I was referring to. The other two are from the same instance that has been crashing repeatedly. Whenever it crashes, I rebuild the data from scratch. I have the same data on a host running mariadb-server-5.5 (not Galera) that has never crashed; it's actually the host I use to rebuild and reseed this system after a crash.

Attachments


- core-2014-05-12 (6 kB, 2014-05-12 23:06)
- core-2014-05-21.tgz (4 kB, 2014-05-21 22:36)
- core-dumps (12 kB, 2014-05-10 01:15)
- crashes-2014-05-23-and-25.tgz (3 kB, 2014-05-27 19:54)
- mysqld.err.2014-05-12 (5 kB, 2014-05-12 23:06)

Activity

Jan Lindström (Inactive) added a comment - 2014-05-10 10:08

Hi,

Could you also attach the full, unedited error log (at least one) from the crashing server here?

R: Jan

Eric Webster added a comment - 2014-05-12 20:28

I did not save them unfortunately, only the core dumps. I can post one as soon as it happens again though.

Eric Webster added a comment - 2014-05-12 23:06

New core backtrace attached and associated error log.

Eric Webster added a comment - 2014-05-21 22:36

Another core backtrace and associated error log.

Eric Webster added a comment - 2014-05-27 19:54

Two more crashes from the weekend. I'm now running it on two machines (identical hardware, software and data), and they are both crashing, but at different times. If it were a bad query or something similar, you'd expect both to crash.

Eric Webster added a comment - 2014-06-17 20:01

I'm still collecting core dumps from these slaves but I've stopped posting them since there hasn't been any reply. If they are still useful, let me know and I'll continue to post them.

Jan Lindström (Inactive) added a comment - 2014-06-30 14:06

revno: 3506
committer: Jan Lindström <jplindst@mariadb.org>
branch nick: maria-5.5-galera
timestamp: Mon 2014-06-30 14:02:54 +0300
message:
MDEV-6225: Idle replication slave keeps crashing.

Analysis: Based on the crashes, the buffer pool instance identifier is
not correct on the block to be freed. Add LRU list mutex holding
on functions calling free and add additional safety checks.
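The commit message is terse, so here is a minimal C sketch of the general pattern it describes: each buffer pool instance owns an LRU list protected by its own mutex, every block records the index of the instance that owns it, and freeing a block validates that index and then performs the unlink and bookkeeping while holding that instance's LRU mutex. This is a simplified model under stated assumptions, not the actual XtraDB/InnoDB patch; `pool_t`, `block_t`, `block_free`, `lru_add` and `N_POOL_INSTANCES` are hypothetical names invented for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical number of buffer pool instances (not a real server setting). */
#define N_POOL_INSTANCES 4

/* Hypothetical, heavily simplified stand-in for a buffer pool block. */
typedef struct block {
    unsigned pool_id;   /* index of the instance that owns this block */
    struct block *next; /* link in the owning instance's LRU list */
} block_t;

/* Hypothetical stand-in for one buffer pool instance. */
typedef struct {
    pthread_mutex_t lru_mutex; /* protects lru_head and n_freed */
    block_t *lru_head;
    size_t n_freed;
} pool_t;

static pool_t pools[N_POOL_INSTANCES];

void pools_init(void)
{
    for (unsigned i = 0; i < N_POOL_INSTANCES; i++) {
        pthread_mutex_init(&pools[i].lru_mutex, NULL);
        pools[i].lru_head = NULL;
        pools[i].n_freed = 0;
    }
}

/* Link blk into instance pool_id's LRU list and record the owning instance. */
int lru_add(unsigned pool_id, block_t *blk)
{
    if (pool_id >= N_POOL_INSTANCES)
        return -1;
    pool_t *pool = &pools[pool_id];
    pthread_mutex_lock(&pool->lru_mutex);
    blk->pool_id = pool_id;
    blk->next = pool->lru_head;
    pool->lru_head = blk;
    pthread_mutex_unlock(&pool->lru_mutex);
    return 0;
}

/* Caller must hold pool->lru_mutex. Returns 0 if blk was found and unlinked. */
static int lru_remove_locked(pool_t *pool, block_t *blk)
{
    for (block_t **pp = &pool->lru_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == blk) {
            *pp = blk->next;
            blk->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Free a block; returns 0 on success, -1 if a safety check fails. */
int block_free(block_t *blk)
{
    /* Safety check 1: the recorded instance id must be in range. */
    if (blk == NULL || blk->pool_id >= N_POOL_INSTANCES) {
        fprintf(stderr, "block_free: invalid buffer pool instance id\n");
        return -1;
    }
    pool_t *pool = &pools[blk->pool_id];

    /* Hold the owning instance's LRU mutex for the whole unlink + accounting. */
    pthread_mutex_lock(&pool->lru_mutex);
    int rc = lru_remove_locked(pool, blk);
    if (rc == 0)
        pool->n_freed++;
    pthread_mutex_unlock(&pool->lru_mutex);

    /* Safety check 2: the block must actually be on that instance's list. */
    if (rc != 0)
        fprintf(stderr, "block_free: block not on instance %u LRU list\n",
                blk->pool_id);
    return rc;
}
```

In this sketch a block whose recorded instance id is out of range, or points at an instance whose LRU list does not contain it, makes `block_free` fail with an error instead of unlinking the block from the wrong list without holding the right mutex, which is the kind of inconsistency the analysis above points at.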

Jan Lindström (Inactive) added a comment - 2014-06-30 14:06

5.5:

revno: 4221
committer: Jan Lindström <jplindst@mariadb.org>
branch nick: 5.5
timestamp: Mon 2014-06-30 14:06:28 +0300
message:
MDEV-6225: Idle replication slave keeps crashing.

Analysis: Based on the crashes, the buffer pool instance identifier is
not correct on the block to be freed. Add LRU list mutex holding
on functions calling free and add additional safety checks.

People

Assignee: Jan Lindström (Inactive)

Reporter: Eric Webster

Votes: 0

Watchers: 2

Dates

Created: 2014-05-10 01:15

Updated: 2014-06-30 14:07

Resolved: 2014-06-30 14:07

MariaDB Server
