The bug is well reproducible and can be recorded with rr.
The initial state before the bug:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on "ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by trx_1 record, waiting for trx_2
5. connection "default" is waiting on 'now WAIT_FOR row_purge_del_mark_finished'
The bug scenario is the following:
1. trx_purge() is invoked by purge coordinator thread, trx_purge_fetch_next_rec() returns NULL because "START TRANSACTION WITH CONSISTENT SNAPSHOT" is still active and the condition (purge_sys.tail.trx_no >= purge_sys.low_limit_no()) is true. trx_purge_attach_undo_recs() and trx_purge() return 0 to srv_do_purge(), see the following call stack:
▾ TrxUndoRsegsIterator::set_next
▾ trx_purge_choose_next_log
▾ trx_purge_get_next_rec
▾ trx_purge_fetch_next_rec
▾ trx_purge_attach_undo_recs
▾ trx_purge
▾ srv_do_purge
▾ purge_coordinator_callback_low
• purge_coordinator_callback
2. Consider the following code in srv_do_purge():
do {
...
/* Take a snapshot of the history list before purge. */
if (!(rseg_history_len = trx_sys.rseg_history_len)) {
...
}
n_pages_purged = trx_purge(n_use_threads);
...
} while (n_pages_purged > 0 && !purge_sys.paused()
&& !srv_purge_should_exit(rseg_history_len));
return(rseg_history_len);
So, trx_purge() purges nothing because there is transaction with view, which can see the changes, made by trx_1 and trx2, i.e. delete-marked and inserted records. trx_sys.rseg_history_len equal to 2. srv_do_purge() returns 2 despite it purged nothing.
3. Consider the following code in purge_coordinator_callback_low():
After the above code execution purge_state.m_history_length is equal to 2, the timer is set to 10 milliseconds and purge_coordinator_callback() is quitted.
4. When 10 milliseconds is expired, purge_coordinator_timer_callback() is invoked. Take a look the following code in purge_coordinator_timer_callback():
Simply wait for new records. The magic number 5000 is an
approximation for the case where we have cached UNDO
log records which prevent truncate of the UNDO segments.*/
return;
srv_wake_purge_thread_if_not_active();
So, purge_state.m_history_length was set to 2 on step 3. It was set to the value of trx_sys.rseg_history_len. The (purge_state.m_history_length == trx_sys.rseg_history_len) condition is true, that's why purge_coordinator_timer_callback() does not awake purge coordinator thread. Every further purge_coordinator_timer_callback() will not awake purge coordinator thread because the above condition stays true.
The overall logic is the following. Save history length before suspending, and if the history length was not changed while purge coordinator thread was suspended, then just do nothing. But it does not take into account that history can be not purged before coordinator thread is suspended.
Vladislav Lesin
added a comment - The bug is well reproducible and can be recorded with rr.
The initial state before the bug:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on "ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by trx_1 record, waiting for trx_2
5. connection "default" is waiting on 'now WAIT_FOR row_purge_del_mark_finished'
The bug scenario is the following:
1. trx_purge() is invoked by purge coordinator thread, trx_purge_fetch_next_rec() returns NULL because "START TRANSACTION WITH CONSISTENT SNAPSHOT" is still active and the condition (purge_sys.tail.trx_no >= purge_sys.low_limit_no()) is true. trx_purge_attach_undo_recs() and trx_purge() return 0 to srv_do_purge(), see the following call stack:
▾ TrxUndoRsegsIterator::set_next
▾ trx_purge_choose_next_log
▾ trx_purge_get_next_rec
▾ trx_purge_fetch_next_rec
▾ trx_purge_attach_undo_recs
▾ trx_purge
▾ srv_do_purge
▾ purge_coordinator_callback_low
• purge_coordinator_callback
2. Consider the following code in srv_do_purge():
do {
...
/* Take a snapshot of the history list before purge. */
if (!(rseg_history_len = trx_sys.rseg_history_len)) {
...
}
n_pages_purged = trx_purge(n_use_threads);
...
} while (n_pages_purged > 0 && !purge_sys.paused()
&& !srv_purge_should_exit(rseg_history_len));
return (rseg_history_len);
So, trx_purge() purges nothing because there is transaction with view, which can see the changes, made by trx_1 and trx2, i.e. delete-marked and inserted records. trx_sys.rseg_history_len equal to 2. srv_do_purge() returns 2 despite it purged nothing.
3. Consider the following code in purge_coordinator_callback_low():
do
{
...
purge_state.m_history_length= srv_do_purge(&n_total_purged);
...
/* If last purge batch processed less than 1 page and there is
still work to do, delay the next batch by 10ms. Unless
someone added work and woke us up. */
if (n_total_purged == 0 )
{
...
if (!woken_during_purge)
{
/* Delay next purge round*/
purge_coordinator_timer->set_time( 10 , 0 );
return ;
}
}
}
while ((purge_sys.enabled() && !purge_sys.paused()) ||
!srv_purge_should_exit(trx_sys.rseg_history_len));
After the above code execution purge_state.m_history_length is equal to 2, the timer is set to 10 milliseconds and purge_coordinator_callback() is quitted.
4. When 10 milliseconds is expired, purge_coordinator_timer_callback() is invoked. Take a look the following code in purge_coordinator_timer_callback():
if (purge_state.m_history_length < 5000 &&
purge_state.m_history_length == trx_sys.rseg_history_len)
/* No new records were added since wait started.
Simply wait for new records. The magic number 5000 is an
approximation for the case where we have cached UNDO
log records which prevent truncate of the UNDO segments.*/
return ;
srv_wake_purge_thread_if_not_active();
So, purge_state.m_history_length was set to 2 on step 3. It was set to the value of trx_sys.rseg_history_len. The (purge_state.m_history_length == trx_sys.rseg_history_len) condition is true, that's why purge_coordinator_timer_callback() does not awake purge coordinator thread. Every further purge_coordinator_timer_callback() will not awake purge coordinator thread because the above condition stays true.
The overall logic is the following. Save history length before suspending, and if the history length was not changed while purge coordinator thread was suspended, then just do nothing. But it does not take into account that history can be not purged before coordinator thread is suspended.
marko
In 10.6 717a32154cc6de81d1a0d3a820928b78aa7b1140 it was refactored:
+ /* The magic number 5000 is an approximation for the case where we have
+ cached undo log records which prevent truncate of rollback segments. */
+ wakeup= history_size &&
+ (history_size >= 5000 ||
+ history_size != trx_sys.history_size_approx());
marko
and in 44689eb7d85b05df9ffbc631735447ba96c92e41 (MDEV-32050) this code was removed:
- if (wakeup)
- purge_coordinator_timer->set_time(10, 0);
marko
It might be easiest to remove this test from 10.5 (but not 10.6).
Vladislav Lesin
added a comment - Some useful notes from Marko:
marko
In 10.6 717a32154cc6de81d1a0d3a820928b78aa7b1140 it was refactored:
+ /* The magic number 5000 is an approximation for the case where we have
+ cached undo log records which prevent truncate of rollback segments. */
+ wakeup= history_size &&
+ (history_size >= 5000 ||
+ history_size != trx_sys.history_size_approx());
marko
and in 44689eb7d85b05df9ffbc631735447ba96c92e41 ( MDEV-32050 ) this code was removed:
- if (wakeup)
- purge_coordinator_timer->set_time(10, 0);
marko
It might be easiest to remove this test from 10.5 (but not 10.6).
This was fixed by removing the test innodb.cursor-restore-locking from 10.5 only. The test will remain in 10.6 and later releases, where it is stable.
Marko Mäkelä
added a comment - This was fixed by removing the test innodb.cursor-restore-locking from 10.5 only. The test will remain in 10.6 and later releases, where it is stable.
People
Vladislav Lesin
Vladislav Lesin
Votes:
0Vote for this issue
Watchers:
2Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
[{"id":-1,"name":"My open issues","jql":"assignee = currentUser() AND resolution = Unresolved order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":true},{"id":-2,"name":"Reported by me","jql":"reporter = currentUser() order by created DESC","isSystem":true,"sharePermissions":[],"requiresLogin":true},{"id":-4,"name":"All issues","jql":"order by created DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-5,"name":"Open issues","jql":"resolution = Unresolved order by priority DESC,updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-9,"name":"Done issues","jql":"statusCategory = Done order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-3,"name":"Viewed recently","jql":"issuekey in issueHistory() order by lastViewed DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-6,"name":"Created recently","jql":"created >= -1w order by created DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-7,"name":"Resolved recently","jql":"resolutiondate >= -1w order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-8,"name":"Updated recently","jql":"updated >= -1w order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false}]
The bug is well reproducible and can be recorded with rr.
The initial state before the bug:
1. "START TRANSACTION WITH CONSISTENT SNAPSHOT" - is active, has not been rolled back yet
2. "DELETE FROM t WHERE b = 20 # trx_1" - is committed
3. "INSERT INTO t VALUES(10, 20) # trx_2" - hanging on "ib_after_row_insert" sync point, waiting for "first_ins_cont" signal
4. "DELETE FROM t WHERE b = 20 # trx_3" - blocked on delete-marked by trx_1 record, waiting for trx_2
5. connection "default" is waiting on 'now WAIT_FOR row_purge_del_mark_finished'
The bug scenario is the following:
1. trx_purge() is invoked by purge coordinator thread, trx_purge_fetch_next_rec() returns NULL because "START TRANSACTION WITH CONSISTENT SNAPSHOT" is still active and the condition (purge_sys.tail.trx_no >= purge_sys.low_limit_no()) is true. trx_purge_attach_undo_recs() and trx_purge() return 0 to srv_do_purge(), see the following call stack:
▾ TrxUndoRsegsIterator::set_next
▾ trx_purge_choose_next_log
▾ trx_purge_get_next_rec
▾ trx_purge_fetch_next_rec
▾ trx_purge_attach_undo_recs
▾ trx_purge
▾ srv_do_purge
▾ purge_coordinator_callback_low
• purge_coordinator_callback
2. Consider the following code in srv_do_purge():
...
...
}
n_pages_purged = trx_purge(n_use_threads);
...
&& !srv_purge_should_exit(rseg_history_len));
So, trx_purge() purges nothing because there is transaction with view, which can see the changes, made by trx_1 and trx2, i.e. delete-marked and inserted records. trx_sys.rseg_history_len equal to 2. srv_do_purge() returns 2 despite it purged nothing.
3. Consider the following code in purge_coordinator_callback_low():
{
...
purge_state.m_history_length= srv_do_purge(&n_total_purged);
...
still work to do, delay the next batch by 10ms. Unless
{
...
{
}
}
}
!srv_purge_should_exit(trx_sys.rseg_history_len));
After the above code execution purge_state.m_history_length is equal to 2, the timer is set to 10 milliseconds and purge_coordinator_callback() is quitted.
4. When 10 milliseconds is expired, purge_coordinator_timer_callback() is invoked. Take a look the following code in purge_coordinator_timer_callback():
purge_state.m_history_length == trx_sys.rseg_history_len)
Simply wait for new records. The magic number 5000 is an
approximation for the case where we have cached UNDO
srv_wake_purge_thread_if_not_active();
So, purge_state.m_history_length was set to 2 on step 3. It was set to the value of trx_sys.rseg_history_len. The (purge_state.m_history_length == trx_sys.rseg_history_len) condition is true, that's why purge_coordinator_timer_callback() does not awake purge coordinator thread. Every further purge_coordinator_timer_callback() will not awake purge coordinator thread because the above condition stays true.
The overall logic is the following. Save history length before suspending, and if the history length was not changed while purge coordinator thread was suspended, then just do nothing. But it does not take into account that history can be not purged before coordinator thread is suspended.