[MDEV-6751] possible deadlock with ALTER + threadpool + events - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.13
Fix Version/s: 10.0.14
Component/s: Data Definition - Alter Table, Events, Storage Engine - InnoDB
Labels: None
Environment:
Ubuntu 12.04 LTS Linux db1062 3.2.0-60-generic #91-Ubuntu SMP
Server version: 10.0.13-MariaDB-log Source distribution

Description

Am running the following on a slave:

- Largish (24h, 600M rows, 200G) ALTER TABLE
- Events with INFORMATION_SCHEMA queries
- Threadpool pool-of-threads active
- Replication active
- No other significant traffic

After several hours, MariaDB locks up with 0% CPU and disk activity, and no response on existing or new connections on port, extra_port, or socket.

Attached are gdb backtraces for two occurrences, examples of the ALTER and the INFORMATION_SCHEMA activity, and other info. Would appreciate any insight from devs to identify the deadlock, and to narrow down the variables for a test case that isn't 200G.

Am presently trialing the ALTER outside the threadpool using the extra_port, with all other settings unchanged.
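
For readers unfamiliar with the setup, a minimal my.cnf sketch of a thread pool with an extra port is shown here. The port number is a hypothetical value, not taken from this report; the actual settings are in the attached db1062_lockup.global_vars.txt.

```ini
[mysqld]
# Serve regular client connections through the thread pool
thread_handling = pool-of-threads

# Additional listener whose connections bypass the thread pool
# (one thread per connection). 3307 is a hypothetical port.
extra_port = 3307
extra_max_connections = 1
```

Connecting to the extra port (for example with `mysql --port=3307`, again assuming that port) would run the ALTER outside the pool, as described above.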

Other notes:

- It doesn't seem to be a thread pool overload, as there aren't enough threads in the backtrace.
- The INFORMATION_SCHEMA event traffic uses GET_LOCK to serialize some activity and prevent pile-up.

Attachments

- db1062_lockup_gdb.2.log (172 kB, 2014-09-17 17:45)
- db1062_lockup_gdb.3.log (183 kB, 2014-09-18 06:29)
- db1062_lockup_gdb.log (162 kB, 2014-09-17 17:45)
- db1062_lockup.ALTER.txt (4 kB, 2014-09-17 17:45)
- db1062_lockup.cmake.txt (0.6 kB, 2014-09-17 17:45)
- db1062_lockup.global_vars.txt (325 kB, 2014-09-17 17:45)
- db1062_lockup.i_s_query.txt (0.4 kB, 2014-09-17 17:45)
- db1062_lockup.processlist.txt (13 kB, 2014-09-18 06:29)

Activity

Sean Pringle added a comment - 2014-09-24 04:21 - edited

Tried disabling thread pool, replication, other traffic, all to no effect. The problem recurred. Each time after restart I waited until transactions had completed rollback.

Observing the threads stuck in buf_, mtr_, and log_ calls, I tried reverting to a single buffer pool instance, innodb_buffer_pool_instances=1 (which matches our 5.5 config). That allowed the ALTER to complete normally without a hiccup.

A flushing lock-cycle of some sort?
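
A minimal my.cnf sketch of the workaround described in this comment follows. The section name and the use of my.cnf are assumptions; only the variable and its value come from the report. The variable is not dynamic, so a server restart is needed for it to take effect.

```ini
[mysqld]
# Workaround from this report: fall back to a single InnoDB buffer pool
# instance, which allowed the ALTER to complete.
innodb_buffer_pool_instances = 1
```

After the restart, `SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances';` can be used to confirm the value in effect.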

Elena Stepanova added a comment - 2014-09-24 10:48

If you can experiment with your instance, could you maybe try setting innodb_buffer_pool_instances to 1?
We are currently investigating a problem which looks related to multiple buffer pool instances. I am not sure at all that your deadlock is the same issue, but it won't hurt to try.

Jan Lindström (Inactive) added a comment - 2014-09-24 13:47

Do you see long semaphore wait warnings/errors in the error log?

Sean Pringle added a comment - 2014-09-24 16:05

@Elena, yes innodb_buffer_pool_instances=1 helped. See comment above.

@Jan, no long semaphore warnings, and no 600 second abort.

Elena Stepanova added a comment - 2014-09-24 16:09

"Elena, yes innodb_buffer_pool_instances=1 helped. See comment above."

Sorry, I somehow missed that part of the comment. Thanks.

People

Assignee: Jan Lindström (Inactive)
Reporter: Sean Pringle
Votes: 0
Watchers: 5

Dates

Created: 2014-09-17 17:45
Updated: 2014-10-24 14:14
Resolved: 2014-10-24 14:14
