[MDEV-6751] possible deadlock with ALTER + threadpool + events Created: 2014-09-17 Updated: 2014-10-24 Resolved: 2014-10-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Events, Storage Engine - InnoDB |
| Affects Version/s: | 10.0.13 |
| Fix Version/s: | 10.0.14 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sean Pringle | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Ubuntu 12.04 LTS Linux db1062 3.2.0-60-generic #91-Ubuntu SMP |
||
| Attachments: |
|
| Description |
|
Am running the following on a slave:
After several hours, MariaDB locks up with 0% CPU and disk activity, and no response on existing or new connections on port, extra_port, or socket. Attached are gdb backtraces for two occurrences, examples of the ALTER and the INFORMATION_SCHEMA activity, and other info. Would appreciate any insight from devs to identify the deadlock, and to narrow down the variables for a test case that isn't 200G. Am presently trialing the ALTER outside the threadpool using the extra_port, with all other settings unchanged. Other notes:
|
| Comments |
| Comment by Sean Pringle [ 2014-09-18 ] |
|
Using extra_port for the ALTER still locked up, but managed to catch it in action. See attached processlist and gdb trace #3. Similar to https://mariadb.atlassian.net/browse/MDEV-5551 ? The client connections in Command="Killed" and State="Cleaning up" successfully connected according to client logs, and apparently got past authentication Subsequently the client did not show open TCP connections hanging around in lsof -i tcp output. However the server did continue to show open TCP connection in lsof, and did not report anything in err log with log_warnings=2. Notice the REPLACE INTO `heartbeat` query in state Update for many seconds; that is percona toolkit pt-heartbeat writing to an InnoDB table which is usually instant. Attempting SHOW ENGINE INNODB STATUS to investigate simply hung indefinitely. |
| Comment by Elena Stepanova [ 2014-09-22 ] |
|
Hi, What happened to the ALTER query? Who or what killed it, and was it before or after you identified the situation as a deadlock? Thanks. |
| Comment by Sean Pringle [ 2014-09-23 ] |
|
In the example processlist, I killed the ALTER after everything else had already become Command=Killed without known intervention by human or tool. It didn't help. |
| Comment by Elena Stepanova [ 2014-09-24 ] |
|
Could you please check if it has something in common with other problems reported or observed, or it's a new one? |
| Comment by Sean Pringle [ 2014-09-24 ] |
|
Tried disabling thread pool, replication, other traffic, all to no effect. The problem recurred. Each time after restart I waited until transactions had completed rollback. Observing the threads stuck in buf_, mtr_, and log_ calls I tried reverting to a single buffer pool instance, innodb_buffer_pool_instances=1 (which matches our 5.5 config). That allowed the ALTER to complete normally without a hiccup. A flushing lock-cycle of some sort? |
| Comment by Elena Stepanova [ 2014-09-24 ] |
|
If you can experiment with your instance, could you maybe try setting innodb_buffer_pool_instances to 1? |
| Comment by Jan Lindström (Inactive) [ 2014-09-24 ] |
|
Do you see long semaphore wait warnings/errors on error log ? |
| Comment by Sean Pringle [ 2014-09-24 ] |
|
@Elena, yes innodb_buffer_pool_instances=1 helped. See comment above. @Jan, no long semaphore warnings, and no 600 second abort. |
| Comment by Elena Stepanova [ 2014-09-24 ] |
Sorry, I somehow missed that part of the comment. Thanks. |