[MDEV-6825] mysqld server "deadlocks" after issuing STOP SLAVE command while replicated DELETE query is executing (freeing items vs killing slave)
Created: 2014-10-01 Updated: 2015-05-04 Resolved: 2015-05-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - InnoDB |
| Affects Version/s: | 5.5.37, 5.5.39 |
| Fix Version/s: | 5.5.43 |
| Type: | Bug | Priority: | Major |
| Reporter: | Maciej Zalewski | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | innodb, replication |
| Environment: |
Linux 2.6.32-5-amd64 #1 SMP Fri May 10 08:43:19 UTC 2013 x86_64 GNU/Linux
> show global variables like '%version%';
|
| Attachments: |
|
| Description |
|
Hello,
I noticed this problem after replacing MySQL 5.5.x with MariaDB 5.5. It is possible to render a MariaDB instance unusable by putting it into some kind of internal "deadlock state".
How to reproduce:
1. Use a master-slave setup with statement-based replication and InnoDB as the storage engine.
Effect:
Last entry in mysql.err:
Output head from "SHOW PROCESSLIST" (taken after the "deadlock state" had been in place for a few hours; normally our DELETE queries take much less time to complete):
(full processlist attached) Output head from running Poor Man's Profiler on mysqld in the "deadlock state":
(full output attached) |
| Comments |
| Comment by Elena Stepanova [ 2014-10-01 ] |
|
Hi, Could you please also attach your cnf files from the master and slave or output of SHOW GLOBAL VARIABLES if you so prefer? Thanks. |
| Comment by Maciej Zalewski [ 2014-10-02 ] |
|
Hello, I've attached my.cnf files from both master and slave. binlog_format is set to 'STATEMENT'. The problem occurred on a few servers with different database schemas. These are deletes that affect multiple rows (up to 1000 rows per delete). In this particular case the rows in the delete query are specified by primary key, which consists of
DELETE FROM t1 WHERE (c1=3345 AND c2='1234567890ab' ) OR (c1=111111 AND c2='abcABCabc+++' ) OR ...
The volume of this particular table varies every day between 20,000 and 500,000, with many rows being inserted and deleted. |
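For anyone trying to reproduce this, here is a minimal Python sketch of how a DELETE of the shape quoted above could be generated. The helper names `make_keys` and `build_delete`, the random key values, and the 12-character key length are assumptions for illustration; only the table name `t1`, the columns `c1` and `c2`, and the OR-of-pairs shape come from the report.

```python
import random
import string


def make_keys(n, seed=0):
    """Return n hypothetical (c1, c2) pairs; the values are made up."""
    rng = random.Random(seed)
    # Assumed alphabet: letters, digits and '+', as in the quoted example.
    alphabet = string.ascii_letters + string.digits + "+"
    return [
        (rng.randint(1, 999999),
         "".join(rng.choice(alphabet) for _ in range(12)))
        for _ in range(n)
    ]


def build_delete(table, keys):
    """Build one DELETE that ORs together a (c1, c2) condition per key."""
    if not keys:
        raise ValueError("at least one key pair is required")
    clauses = []
    for c1, c2 in keys:
        # Illustration only: doubling quotes is not complete escaping.
        # Real code should use the driver's parameter binding.
        escaped = str(c2).replace("'", "''")
        clauses.append(f"(c1={int(c1)} AND c2='{escaped}')")
    return f"DELETE FROM {table} WHERE " + " OR ".join(clauses)


if __name__ == "__main__":
    # Up to 1000 rows per delete, as described in the report.
    statement = build_delete("t1", make_keys(1000))
    print(statement[:120] + " ...")
```

The generated text can be piped into the mysql client on the master; the number of pairs can be raised if the statement finishes too quickly to observe on the slave.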
| Comment by Elena Stepanova [ 2014-10-15 ] |
|
I've been trying to reproduce the problem and haven't succeeded so far, but here are two observations:
1) With 500,000 rows and a 1000-value DELETE statement, even on my modest local machine it's nearly impossible to catch the DELETE statement: it finishes almost instantly, generally in less than a second. I had to raise the numbers to 1,000,000 rows and 10,000 values correspondingly before I could reliably issue STOP SLAVE while the DELETE was running.
2) Even with these increased numbers, I cannot actually see the state 'Freeing items', neither when the event is executed normally nor when STOP SLAVE is waiting for it. I only see 'updating'; apparently 'freeing items' is too fast.
So, the questions:
How long does the DELETE normally take, when you don't run STOP SLAVE?
When you reproduce the problem and run STOP SLAVE, do you do it when the DELETE is already in the 'Freeing items' state, or does it reach it later?
Thanks. |
| Comment by Maciej Zalewski [ 2014-10-27 ] |
|
Unfortunately, it seems I am no longer able to reproduce the issue for some reason. As for your questions:
1) The delete takes about 10-20 seconds to complete. This is one of the smaller tables in the database and not so frequently used (a series of INSERTs once an hour, then a SELECT and a series of DELETEs, also once per hour). There are other tables, much bigger in size, accessed many times per second, so I guess the table data and index pages get rotated out of the InnoDB buffer pool pretty quickly. The table size:
The database size:
2) The STOP SLAVE was originally issued by a crontab script. I am no longer able to reproduce it myself for some reason. |
| Comment by Elena Stepanova [ 2015-05-04 ] |
|
Given that here we also had concurrent execution of STOP SLAVE and SHOW GLOBAL STATUS, I think it's reasonable to believe it's the same issue as |