MariaDB Server / MDEV-12496

mtflush thread hang causes mysqld crash

Details

    Description

      Problem:
      There is a deadlock involving three threads: an SQL thread, the page cleaner coordinator thread, and a page cleaner worker thread.

      thread 8 (page cleaner coordinator thread) waits for -> thread 5 (mtflush_io_thread) waits for -> thread 181 (handle_rpl_parallel_thread) [waiting for a free block]

      For detailed stack traces, please refer to the attachment [gdb.txt].

      Analysis:
      There is a problem between the page cleaner coordinator thread and the worker threads (mtflush_io_thread): the coordinator may miss an os_event_set from mtflush_io_thread, which causes mysqld to crash. Take the situation below as an example:

      1) The coordinator produces work items for mtflush_io_thread:

          for (i = 0; i < buf_pool_inst; i++) {
              work_item[i].tsk = MT_WRK_WRITE;
              work_item[i].wr.buf_pool = buf_pool_from_array(i);
              work_item[i].wr.flush_type = flush_type;
              work_item[i].wr.min = min_n;
              work_item[i].wr.lsn_limit = lsn_limit;
              work_item[i].wi_status = WRK_ITEM_UNSET;
              work_item[i].wheap = work_heap;
              work_item[i].rheap = reply_heap;
              work_item[i].n_flushed = 0;
              work_item[i].n_evicted = 0;
              work_item[i].id_usr = 0;

              ib_wqueue_add(mtflush_ctx->wq,
                            (void *)(work_item + i),
                            work_heap);
          }

      2) The consumer thread (mtflush_io_thread) consumes the work items and sends an event to the coordinator thread, which has not yet called os_event_wait;

      3) The coordinator thread then calls os_event_wait to collect the status produced by mtflush_io_thread, but the events sent by mtflush_io_thread have already come and gone;

      4) The coordinator thread therefore blocks in ib_wqueue_wait(mtflush_ctx->wr_cq) and stops producing work items for flush jobs. As a result, user threads serving clients use up the free blocks in buf->free_list;

      5) For the reasons above, mysqld eventually crashes.
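
The sequence in steps 2) and 3) is a lost wakeup: the signal is sent before anyone waits for it. A minimal sketch of one common way to avoid that pattern is shown here, using plain POSIX threads rather than the InnoDB os_event API. It is not the fix applied in MariaDB. The names completion_t, completion_post, completion_wait and run_demo are hypothetical and exist only for this illustration. The idea is that the worker records its completion in state protected by a mutex, and the coordinator checks that state in a loop, so a signal sent before the wait begins cannot be missed.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical completion tracker for illustration; not an InnoDB type. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             done;   /* completions recorded so far (persistent) */
} completion_t;

static void completion_init(completion_t *c)
{
    pthread_mutex_init(&c->mutex, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* Worker side: record the completion, then wake the coordinator. */
static void completion_post(completion_t *c)
{
    pthread_mutex_lock(&c->mutex);
    c->done++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->mutex);
}

/* Coordinator side: block until `expected` completions are recorded.
 * The predicate is checked before sleeping, so completions posted
 * before this call are still seen. */
static void completion_wait(completion_t *c, int expected)
{
    pthread_mutex_lock(&c->mutex);
    while (c->done < expected)
        pthread_cond_wait(&c->cond, &c->mutex);
    pthread_mutex_unlock(&c->mutex);
}

static void *worker(void *arg)
{
    completion_post((completion_t *)arg);
    return NULL;
}

/* Starts n workers and lets them ALL finish before the coordinator
 * waits, which reproduces the ordering from steps 2) and 3).
 * Returns the number of completions the coordinator observed. */
static int run_demo(int n)
{
    completion_t c;
    pthread_t    tid[16];
    int          i, rc, seen;

    assert(n > 0 && n <= 16);
    completion_init(&c);

    for (i = 0; i < n; i++) {
        rc = pthread_create(&tid[i], NULL, worker, &c);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);   /* every signal is already sent */

    completion_wait(&c, n);           /* does not block: count is kept */
    seen = c.done;

    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.mutex);
    return seen;
}
```

Calling run_demo(4) from a program linked with the pthread library returns once all four completions are observed, even though every worker signalled before the coordinator started waiting.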

      The attached gdb.txt includes the stack information for all threads.


            Transition: Open -> Closed (made by Sergei Golubchik)
            Time in source status: 329d 11h 38m
            Execution times: 1

            People

              jplindst Jan Lindström (Inactive)
              qinglin musazhang
