[MDEV-12496] mtflush thread's hang cause mysqld crash - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.1.17
Fix Version/s: N/A
Component/s: Replication, Storage Engine - XtraDB
Labels:
Environment:
linux

Description

Problem:
There is a deadlock involving three threads: the SQL thread, the page cleaner coordinator thread, and a page cleaner worker thread.

thread 8 (page cleaner coordinator thread) waits for -> thread 5 (mtflush_io_thread) waits for -> thread 181 (handle_rpl_parallel_thread) [waiting for a free block]

For detailed stack information, refer to the attachment [gdb.txt].

Analysis:
There is a problem between the page cleaner coordinator thread and the worker threads (mtflush_io_thread): the coordinator may miss an os_event_set from mtflush_io_thread, which causes mysqld to crash. Take the following sequence as an example:

1) The coordinator produces work items for mtflush_io_thread:
    for(i=0;i<buf_pool_inst; i++) {
        work_item[i].tsk = MT_WRK_WRITE;
        work_item[i].wr.buf_pool = buf_pool_from_array(i);
        work_item[i].wr.flush_type = flush_type;
        work_item[i].wr.min = min_n;
        work_item[i].wr.lsn_limit = lsn_limit;
        work_item[i].wi_status = WRK_ITEM_UNSET;
        work_item[i].wheap = work_heap;
        work_item[i].rheap = reply_heap;
        work_item[i].n_flushed = 0;
        work_item[i].n_evicted = 0;
        work_item[i].id_usr = 0;

        ib_wqueue_add(mtflush_ctx->wq,
                      (void *)(work_item + i),
                      work_heap);
    }

2) A consumer thread (mtflush_io_thread) consumes a work item and sends an event to the coordinator thread, which has not yet called os_event_wait;

3) The coordinator thread calls os_event_wait to collect the status produced by mtflush_io_thread, but the events sent by mtflush_io_thread were signaled before it started waiting and are gone;

4) The coordinator thread blocks in ib_wqueue_wait(mtflush_ctx->wr_cq) and produces no further work items for flush jobs; as a result, user threads serving clients use up the free blocks in buf->free_list;

5) As a result of the above, mysqld eventually crashes.
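
The ordering in steps 2) and 3) is the classic lost-wakeup pattern. The following is a minimal, self-contained C++ sketch of that pattern as described in this report; it is not the XtraDB code and makes no claim about how os_event is actually implemented. PulseEvent, CompletionCounter and the two helper functions are hypothetical names introduced only for illustration. The first class forgets a signal that arrives before the wait starts; the second keeps the completed work in a counter protected by a mutex, so a late waiter still sees it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

using std::chrono::milliseconds;

// Hypothetical "pulse" event (NOT InnoDB's os_event): set() only wakes
// threads that are already inside wait_for(); nothing is remembered.
class PulseEvent {
public:
    void set() {
        std::lock_guard<std::mutex> guard(mutex_);
        ++generation_;
        cond_.notify_all();
    }

    // Returns true if set() was called while waiting, false on timeout.
    bool wait_for(milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long seen = generation_;  // signals before this are lost
        return cond_.wait_for(lock, timeout,
                              [&] { return generation_ != seen; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    unsigned long generation_ = 0;
};

// Hypothetical alternative: completed work is recorded in shared state,
// so the order of "complete" and "wait" no longer matters.
class CompletionCounter {
public:
    void complete_one() {
        std::lock_guard<std::mutex> guard(mutex_);
        ++done_;
        cond_.notify_all();
    }

    // Returns true once at least `expected` items are done, false on timeout.
    bool wait_for_all(unsigned expected, milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cond_.wait_for(lock, timeout,
                              [&] { return done_ >= expected; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    unsigned done_ = 0;
};

// Worker signals first, coordinator waits afterwards: the wait times out.
bool pulse_signal_is_lost() {
    PulseEvent ev;
    ev.set();
    return !ev.wait_for(milliseconds(50));
}

// Same ordering with a counter: the coordinator returns immediately.
bool counted_signal_is_kept() {
    CompletionCounter done;
    done.complete_one();
    done.complete_one();
    return done.wait_for_all(2, milliseconds(50));
}

// Small driver that checks both behaviours.
inline void run_demo() {
    assert(pulse_signal_is_lost());
    assert(counted_signal_is_kept());
}
```

Calling run_demo() from a test program exercises both cases without any extra threads, because the race is reproduced simply by signaling before waiting. Whether the real coordinator can hit this ordering depends on how os_event_set and os_event_wait are paired in the mtflush code, which this sketch does not model.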

The attached gdb.txt includes the stack information for all threads.

Attachments

gdb.txt
18 kB
2017-04-13 06:40

Issue Links

is duplicated by

MDEV-12722 Maria DB 10.1.16 freeze

Closed

relates to

MDEV-10843 XtraDB Semaphore Stalls with innodb_use_mtflush enabled

Closed

MDEV-14497 rpl.rpl_gtid_reconnect failed in buildbot, lost connection to server

Closed

People

Assignee: Jan Lindström (Inactive)

Reporter: musazhang

Votes: 1

Watchers: 8

Dates

Created: 2017-04-13 07:03

Updated: 2024-07-08 00:57

Resolved: 2018-03-08 18:42
