[MDEV-9715] mysqld got signal 11

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1.12
Fix Version/s: 10.1.13
Component/s: Galera, Storage Engine - InnoDB
Labels:
None
Environment:
Ubuntu 14.04.4 LTS / 3.13.0-79-generic x86_64 GNU/Linux
Ubuntu 14.04.4 LTS / 4.2.0-30-generic x86_64 GNU/Linux

Description

Hi,

we're running a 5-node Galera cluster based on MariaDB 10.1.12 (+maria-1~trusty). The only engine in use is, of course, InnoDB.

Two days ago it started crashing (one node at a time, very quickly), with the error and stack trace you can find in the attachment. For some reason, every time we restored the cluster, 3 of the 5 nodes crashed within 5 to 10 minutes, the fourth stayed up for 30 to 45 minutes, and the last one ran for hours (but eventually crashed with the same error). This is purely an empirical observation, and we can't figure out why it was happening.

A TRUNCATE had been executed a few minutes before the first crash, so we focused our efforts on recovering what we thought was a corrupted .ibd file or InnoDB index (http://dba.stackexchange.com/questions/29870/mysql-innodb-corruption-after-server-crash-during-concurrent-truncate-command). We don't have good experience with TRUNCATE DDL in Galera: it always ends up in deadlocks, so we're trying to replace it with "DELETE FROM" or with RENAME + TRUNCATE on an "offline" table.

This led nowhere: on a new cluster, into which we had imported data dumped from the old one with innodb_force_recovery = 1, the issue was still present.

After this, we noticed that some kind of bot hitting a payment gateway page was causing the CMS (Prestashop) to execute tens of DELETE and SELECT statements per second on the same table:

SELECT * FROM `ps_ccpayments` WHERE `id_cart` = 0 LIMIT 1;
DELETE FROM `ps_ccpayments` WHERE `id_cart` = 0;

We saw no INSERTs, and under no circumstances would that table contain a row with id_cart = 0, so we would expect those queries to always operate on an empty data set.

Changing the code so that those queries are no longer executed fixed the issue. Let me clarify that the nodes are not under high load and there are no resource constraints. MariaDB is configured so that it cannot fill up a node's RAM.

We're not able to reproduce the issue. If we run those queries manually, they execute properly. I think the issue could be related to:

the extremely high number of concurrent requests of that kind
some kind of 0-day triggered by malicious input (I'm saying this because this is happening on a page where we process credit card payments)
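Since the two queries succeed when run by hand, one way to test the first hypothesis is to replay them from many connections at once. The following is a minimal sketch, assuming a Python client; `hammer` and `make_executor` are hypothetical names, and wiring up the actual MariaDB driver (one autocommit connection per thread) is left to the caller.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# The two statements the CMS issued for each bot request, as quoted in this report.
STATEMENTS = (
    "SELECT * FROM `ps_ccpayments` WHERE `id_cart` = 0 LIMIT 1",
    "DELETE FROM `ps_ccpayments` WHERE `id_cart` = 0",
)


def hammer(make_executor: Callable[[], Callable[[str], None]],
           workers: int = 32, iterations: int = 1000) -> int:
    """Run the SELECT/DELETE pair `iterations` times in each of `workers` threads.

    `make_executor` is a hypothetical factory that returns a per-thread callable
    executing one SQL statement on its own connection (assumed: a thin wrapper
    around a MySQL/MariaDB driver cursor with autocommit enabled).
    Returns the total number of statements issued.
    """
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        execute = make_executor()  # each thread gets its own connection
        for _ in range(iterations):
            for stmt in STATEMENTS:
                execute(stmt)
        with lock:
            total += iterations * len(STATEMENTS)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        for f in futures:
            # Re-raise any worker error, e.g. a dropped connection if a node crashes.
            f.result()
    return total
```

Keeping the executor injectable lets the same loop be pointed at a single node or spread across several Galera nodes, which may matter if the crash depends on replicated DELETEs arriving while local SELECTs run.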

If you need any additional details, please just ask; any help would be really appreciated. I'm attaching my.cnf as well.

Giorgio

Attachments


my.cnf
2 kB
2016-03-12 10:49
SJ-ERROR.txt
4 kB
2016-03-12 10:26

Issue Links

relates to

MDEV-9498 MariaDB server with Galera replication crashes randomly

Closed

Activity

There are no comments yet on this issue.

People

Assignee:: Nirbhay Choubey (Inactive)

Reporter:: Giorgio Bonfiglio

Votes:: 0

Watchers:: 1

Dates

Created:: 2016-03-12 10:49

Updated:: 2016-03-30 04:45

Resolved:: 2016-03-30 04:42
