The bug is well reproducible and can be recorded with rr.
The initial state before the bug:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on "ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by trx_1 record, waiting for trx_2
5. connection "default" is waiting on 'now WAIT_FOR row_purge_del_mark_finished'
The bug scenario is the following:
1. trx_purge() is invoked by purge coordinator thread, trx_purge_fetch_next_rec() returns NULL because "START TRANSACTION WITH CONSISTENT SNAPSHOT" is still active and the condition (purge_sys.tail.trx_no >= purge_sys.low_limit_no()) is true. trx_purge_attach_undo_recs() and trx_purge() return 0 to srv_do_purge(), see the following call stack:
▾ TrxUndoRsegsIterator::set_next
|
▾ trx_purge_choose_next_log
|
▾ trx_purge_get_next_rec
|
▾ trx_purge_fetch_next_rec
|
▾ trx_purge_attach_undo_recs
|
▾ trx_purge
|
▾ srv_do_purge
|
▾ purge_coordinator_callback_low
|
• purge_coordinator_callback
|
2. Consider the following code in srv_do_purge():
do {
|
...
|
/* Take a snapshot of the history list before purge. */
|
if (!(rseg_history_len = trx_sys.rseg_history_len)) {
|
...
|
}
|
|
n_pages_purged = trx_purge(n_use_threads);
|
...
|
} while (n_pages_purged > 0 && !purge_sys.paused()
|
&& !srv_purge_should_exit(rseg_history_len));
|
return(rseg_history_len);
|
So, trx_purge() purges nothing because there is transaction with view, which can see the changes, made by trx_1 and trx2, i.e. delete-marked and inserted records. trx_sys.rseg_history_len equal to 2. srv_do_purge() returns 2 despite it purged nothing.
3. Consider the following code in purge_coordinator_callback_low():
do
|
{
|
...
|
purge_state.m_history_length= srv_do_purge(&n_total_purged);
|
...
|
/* If last purge batch processed less than 1 page and there is
|
still work to do, delay the next batch by 10ms. Unless
|
someone added work and woke us up. */
|
if (n_total_purged == 0)
|
{
|
...
|
if (!woken_during_purge)
|
{
|
/* Delay next purge round*/
|
purge_coordinator_timer->set_time(10, 0);
|
return;
|
}
|
}
|
}
|
while ((purge_sys.enabled() && !purge_sys.paused()) ||
|
!srv_purge_should_exit(trx_sys.rseg_history_len));
|
After the above code execution purge_state.m_history_length is equal to 2, the timer is set to 10 milliseconds and purge_coordinator_callback() is quitted.
4. When 10 milliseconds is expired, purge_coordinator_timer_callback() is invoked. Take a look the following code in purge_coordinator_timer_callback():
if (purge_state.m_history_length < 5000 &&
|
purge_state.m_history_length == trx_sys.rseg_history_len)
|
/* No new records were added since wait started.
|
Simply wait for new records. The magic number 5000 is an
|
approximation for the case where we have cached UNDO
|
log records which prevent truncate of the UNDO segments.*/
|
return;
|
srv_wake_purge_thread_if_not_active();
|
So, purge_state.m_history_length was set to 2 on step 3. It was set to the value of trx_sys.rseg_history_len. The (purge_state.m_history_length == trx_sys.rseg_history_len) condition is true, that's why purge_coordinator_timer_callback() does not awake purge coordinator thread. Every further purge_coordinator_timer_callback() will not awake purge coordinator thread because the above condition stays true.
The overall logic is the following. Save history length before suspending, and if the history length was not changed while purge coordinator thread was suspended, then just do nothing. But it does not take into account that history can be not purged before coordinator thread is suspended.
The bug is well reproducible and can be recorded with rr.
The initial state before the bug:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on "ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by trx_1 record, waiting for trx_2
5. connection "default" is waiting on 'now WAIT_FOR row_purge_del_mark_finished'
The bug scenario is the following:
1. trx_purge() is invoked by purge coordinator thread, trx_purge_fetch_next_rec() returns NULL because "START TRANSACTION WITH CONSISTENT SNAPSHOT" is still active and the condition (purge_sys.tail.trx_no >= purge_sys.low_limit_no()) is true. trx_purge_attach_undo_recs() and trx_purge() return 0 to srv_do_purge(), see the following call stack:
▾ TrxUndoRsegsIterator::set_next
▾ trx_purge_choose_next_log
▾ trx_purge_get_next_rec
▾ trx_purge_fetch_next_rec
▾ trx_purge_attach_undo_recs
▾ trx_purge
▾ srv_do_purge
▾ purge_coordinator_callback_low
• purge_coordinator_callback
2. Consider the following code in srv_do_purge():
...
...
}
n_pages_purged = trx_purge(n_use_threads);
...
&& !srv_purge_should_exit(rseg_history_len));
So, trx_purge() purges nothing because there is transaction with view, which can see the changes, made by trx_1 and trx2, i.e. delete-marked and inserted records. trx_sys.rseg_history_len equal to 2. srv_do_purge() returns 2 despite it purged nothing.
3. Consider the following code in purge_coordinator_callback_low():
{
...
purge_state.m_history_length= srv_do_purge(&n_total_purged);
...
still work to do, delay the next batch by 10ms. Unless
{
...
{
}
}
}
!srv_purge_should_exit(trx_sys.rseg_history_len));
After the above code execution purge_state.m_history_length is equal to 2, the timer is set to 10 milliseconds and purge_coordinator_callback() is quitted.
4. When 10 milliseconds is expired, purge_coordinator_timer_callback() is invoked. Take a look the following code in purge_coordinator_timer_callback():
purge_state.m_history_length == trx_sys.rseg_history_len)
Simply wait for new records. The magic number 5000 is an
approximation for the case where we have cached UNDO
srv_wake_purge_thread_if_not_active();
So, purge_state.m_history_length was set to 2 on step 3. It was set to the value of trx_sys.rseg_history_len. The (purge_state.m_history_length == trx_sys.rseg_history_len) condition is true, that's why purge_coordinator_timer_callback() does not awake purge coordinator thread. Every further purge_coordinator_timer_callback() will not awake purge coordinator thread because the above condition stays true.
The overall logic is the following. Save history length before suspending, and if the history length was not changed while purge coordinator thread was suspended, then just do nothing. But it does not take into account that history can be not purged before coordinator thread is suspended.