[MDEV-12496] mtflush thread's hang cause mysqld crash Created: 2017-04-13  Updated: 2020-08-25  Resolved: 2018-03-08

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - XtraDB
Affects Version/s: 10.1.17
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: musazhang Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 1
Labels: buf_LRU_get_free_block, need_feedback, page_cleaner
Environment:

linux


Attachments: Text File gdb.txt    
Issue Links:
Duplicate
is duplicated by MDEV-12722 Maria DB 10.1.16 freeze Closed
Relates
relates to MDEV-10843 XtraDB Semaphore Stalls with innodb_u... Closed
relates to MDEV-14497 rpl.rpl_gtid_reconnect failed in bui... Closed

 Description   

Problem:
There is a deadlock involving three threads: the SQL thread, the page cleaner coordinator thread, and the page cleaner worker thread.

thread 8 (page cleaner coordinator thread) waits for -> thread 5 (mtflush_io_thread) waits for -> thread 181 (handle_rpl_parallel_thread) [waiting for a free block]

For detailed stack info, please refer to the attachment [gdb.txt].

Analyze:
There is a problem between the page cleaner coordinator thread and the worker threads (mtflush_io_thread): the coordinator may miss an os_event_set from mtflush_io_thread, which causes mysqld to crash. Take the situation below as an example:

1) The coordinator produces work items for mtflush_io_thread:
    for (i = 0; i < buf_pool_inst; i++) {
        work_item[i].tsk = MT_WRK_WRITE;
        work_item[i].wr.buf_pool = buf_pool_from_array(i);
        work_item[i].wr.flush_type = flush_type;
        work_item[i].wr.min = min_n;
        work_item[i].wr.lsn_limit = lsn_limit;
        work_item[i].wi_status = WRK_ITEM_UNSET;
        work_item[i].wheap = work_heap;
        work_item[i].rheap = reply_heap;
        work_item[i].n_flushed = 0;
        work_item[i].n_evicted = 0;
        work_item[i].id_usr = 0;

        ib_wqueue_add(mtflush_ctx->wq,
                      (void *)(work_item + i),
                      work_heap);
    }

2) A consumer thread (mtflush_io_thread) consumes a work item and sends an event to the coordinator thread, which has not yet called os_event_wait;

3) The coordinator thread calls os_event_wait to collect the status produced by mtflush_io_thread, but the events sent by mtflush_io_thread have already come and gone;

4) The coordinator thread then blocks in ib_wqueue_wait(mtflush_ctx->wr_cq) and produces no further work items for flush jobs. As a result, user threads from clients use up the free blocks in buf->free_list;

5) For the reasons above, mysqld eventually crashes.
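
The race in steps 2) and 3) is an instance of the general "lost wake-up" pattern: a signal sent before the waiter starts waiting is missed unless the signaled state is remembered. The sketch below is not XtraDB code. It is a minimal illustration using the C++ standard library, and the names CompletionCounter, mark_done, wait_for and run_workers_then_wait are hypothetical. It shows one common way to make the wait robust: the workers record completion in a counter under a mutex, and the coordinator waits on a predicate over that counter, so a completion that arrives early is still observed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper, not part of XtraDB: remembers how many work
// items have completed so that early signals are never lost.
class CompletionCounter {
public:
    // Called by a worker after it finishes one work item.
    void mark_done() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ++done_;
        }
        cond_.notify_one();
    }

    // Called by the coordinator. Returns once `expected` items are done,
    // even if every worker signaled before this call started.
    void wait_for(int expected) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [&] { return done_ >= expected; });
    }

    int done() {
        std::lock_guard<std::mutex> guard(mutex_);
        return done_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    int done_ = 0;
};

// Reproduces the ordering from steps 2) and 3): all workers finish and
// signal BEFORE the coordinator begins to wait.
int run_workers_then_wait(int n_workers) {
    CompletionCounter counter;
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&counter] { counter.mark_done(); });
    }
    for (auto& w : workers) {
        w.join();  // every signal has been sent at this point
    }
    counter.wait_for(n_workers);  // does not block: the count is sticky
    return counter.done();
}
```

Calling run_workers_then_wait(4) returns 4 without hanging, because the predicate is checked against the stored count before the coordinator ever sleeps. Whether the same approach fits the mtflush coordinator is an open question that depends on how its events are set and reset.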

The attached gdb.txt includes info for all threads.



 Comments   
Comment by Jan Lindström (Inactive) [ 2017-07-26 ]

Could you provide your my.cnf file? Did you set innodb-use-mtflush=1?

Comment by musazhang [ 2017-07-27 ]

yes, innodb-use-mtflush=1

Generated at Thu Feb 08 07:58:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.