Details
-
Bug
-
Status: Stalled (View Workflow)
-
Critical
-
Resolution: Unresolved
-
5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
-
None
Description
Check if this is fixable and re-enable this assertion:
/** A wrapper around ReadView::low_limit_no(). */
|
trx_id_t low_limit_no() const
|
{
|
#if 0 /* Unfortunately we don't hold this assertion, see MDEV-22718. */
|
ut_ad(rw_lock_own(&latch, RW_LOCK_S));
|
#endif
|
return view.low_limit_no();
|
}
|
Attachments
Issue Links
- relates to
-
MDEV-16260 Scale the purge effort according to the workload
-
- Open
-
-
MDEV-30671 innodb_undo_log_truncate=ON fails to wait for purge of transaction history
-
- Closed
-
-
MDEV-35227 Executing CHECK TABLE...EXTENDED right after server startup may attempt to access too old history
-
- Confirmed
-
-
MDEV-31234 InnoDB does not free UNDO after the fix of MDEV-30671, thus shared tablespace (ibdata1) may grow indefinitely for no good reason
-
- Closed
-
Activity
The following change would enable the assertion:
diff --git a/storage/innobase/include/trx0purge.h b/storage/innobase/include/trx0purge.h
|
index 38e9e7207a2..61903d9a3e4 100644
|
--- a/storage/innobase/include/trx0purge.h
|
+++ b/storage/innobase/include/trx0purge.h
|
@@ -255,9 +255,7 @@ class purge_sys_t
|
/** A wrapper around ReadView::low_limit_no(). */
|
trx_id_t low_limit_no() const
|
{
|
-#if 0 /* Unfortunately we don't hold this assertion, see MDEV-22718. */
|
- ut_ad(rw_lock_own(&latch, RW_LOCK_S));
|
-#endif
|
+ ut_ad(rw_lock_own_flagged(&latch, RW_LOCK_FLAG_X | RW_LOCK_FLAG_S));
|
return view.low_limit_no();
|
}
|
/** A wrapper around trx_sys_t::clone_oldest_view(). */
|
diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
|
index c37a8b98cbd..7c03a232efe 100644
|
--- a/storage/innobase/trx/trx0purge.cc
|
+++ b/storage/innobase/trx/trx0purge.cc
|
@@ -553,6 +553,8 @@ function is called, the caller must not have any latches on undo log pages!
|
*/
|
static void trx_purge_truncate_history()
|
{
|
+ rw_lock_x_lock(&purge_sys.latch);
|
+
|
ut_ad(purge_sys.head <= purge_sys.tail);
|
purge_sys_t::iterator& head = purge_sys.head.commit
|
? purge_sys.head : purge_sys.tail;
|
@@ -570,6 +572,8 @@ static void trx_purge_truncate_history()
|
}
|
}
|
|
+ rw_lock_x_unlock(&purge_sys.latch);
|
+
|
if (srv_undo_tablespaces_active < 2) {
|
return;
|
}
|
@@ -980,7 +984,9 @@ trx_purge_get_next_rec(
|
mtr_t mtr;
|
|
ut_ad(purge_sys.next_stored);
|
+ ut_d(rw_lock_s_lock(&purge_sys.latch));
|
ut_ad(purge_sys.tail.trx_no() < purge_sys.low_limit_no());
|
+ ut_d(rw_lock_s_unlock(&purge_sys.latch));
|
|
const ulint space = purge_sys.rseg->space->id;
|
const uint32_t page_no = purge_sys.page_no;
|
@@ -1070,11 +1076,16 @@ trx_purge_fetch_next_rec(
|
}
|
}
|
|
+ ut_d(rw_lock_s_lock(&purge_sys.latch));
|
+
|
if (purge_sys.tail.trx_no() >= purge_sys.low_limit_no()) {
|
|
+ ut_d(rw_lock_s_unlock(&purge_sys.latch));
|
return(NULL);
|
}
|
|
+ ut_d(rw_lock_s_unlock(&purge_sys.latch));
|
+
|
/* fprintf(stderr, "Thread %lu purging trx %llu undo record %llu\n",
|
os_thread_get_curr_id(), iter->trx_no, iter->undo_no); */
|
|
But, I am not sure if trx_purge_truncate_history() really needs to hold the purge_sys.latch exclusively for so long. Also, the above patch is not removing the above mentioned code snippet.
It seems that trx_purge_truncate_history() is being executed by the purge coordinator only while the purge workers are inactive, and therefore there might not be any correctness issue.
In MDEV-16260, we might want to eliminate the purge coordinator altogether. In that case, this bug would definitely have to be fixed.
As far as I can tell, purge_sys.head and purge_sys.tail are only accessed by the purge coordinator, never by purge workers.
The trx_purge_fetch_next_rec() and trx_purge_get_next_rec() are executed by the purge coordinator before any workers are launched. The function trx_purge_truncate_history() is executed by the purge coordinator after all workers had completed (trx_purge_wait_for_workers_to_complete()). All this code will be executed in the purge coordinator, which ‘owns’ most of purge_sys. Hence, the lack of locking does not seem to cause any obvious correctness problem.
The main purpose of purge_sys.latch is that acquiring it will freeze purge_sys.view. That is being done in ROLLBACK and MVCC reads.
That said, I do find the following code suspicious:
static void trx_purge_truncate_history() |
{
|
ut_ad(purge_sys.head <= purge_sys.tail);
|
purge_sys_t::iterator &head= purge_sys.head.trx_no
|
? purge_sys.head : purge_sys.tail;
|
|
if (head.trx_no >= purge_sys.low_limit_no()) |
{
|
/* This is sometimes necessary. TODO: find out why. */ |
head.trx_no= purge_sys.low_limit_no();
|
head.undo_no= 0;
|
}
|
It may have something to do with TrxUndoRsegsIterator::set_next() (which is executed solely in the purge coordinator), which is copying something from purge_sys.rseg and then releasing the purge_sys.rseg->latch.
I tested the following patch on 10.6.
diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
|
index c0fafe1ec6b..ef88ebb922a 100644
|
--- a/storage/innobase/trx/trx0purge.cc
|
+++ b/storage/innobase/trx/trx0purge.cc
|
@@ -607,6 +607,9 @@ TRANSACTIONAL_TARGET static void trx_purge_truncate_history()
|
|
if (head.trx_no >= purge_sys.low_limit_no())
|
{
|
+ ut_a(head.undo_no
|
+ ? head.trx_no < purge_sys.low_limit_no()
|
+ : head.trx_no == purge_sys.low_limit_no());
|
/* This is sometimes necessary. TODO: find out why. */
|
head.trx_no= purge_sys.low_limit_no();
|
head.undo_no= 0; |
It would often fail in a few tests:
10.6 5e270ca28d05acb72c6aec9f1d37f9610fc11a0e |
Failing test(s): innodb.instant_alter_purge innodb.innodb_bug84958 innodb.foreign_key
|
In one failure, the crashing thread (the purge coordinator) had advanced the position by itself:
10.6 5e270ca28d05acb72c6aec9f1d37f9610fc11a0e |
(rr) continue
|
Continuing.
|
|
Thread 2 hit Hardware watchpoint 4: -location purge_sys.tail.trx_no
|
|
Old value = 36
|
New value = 39
|
TrxUndoRsegsIterator::set_next (this=0x557e00a6ec50 <purge_sys+144>) at /mariadb/10.6/storage/innobase/trx/trx0purge.cc:125
|
125 purge_sys.hdr_offset = purge_sys.rseg->last_offset();
|
(rr) continue
|
Continuing.
|
|
Thread 2 hit Hardware watchpoint 1: -location head.trx_no
|
|
Old value = 35
|
New value = 39
|
trx_purge_attach_undo_recs (n_purge_threads=n_purge_threads@entry=4) at /mariadb/10.6/storage/innobase/trx/trx0purge.cc:1225
|
1225 purge_rec.undo_rec = trx_purge_fetch_next_rec(
|
(rr) continue
|
Continuing.
|
|
Thread 2 received signal SIGABRT, Aborted.
|
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
|
44 ./nptl/pthread_kill.c: Tiedostoa tai hakemistoa ei ole.
|
(rr) frame 5
|
#5 0x0000557dffec2414 in trx_purge_truncate_history ()
|
at /mariadb/10.6/storage/innobase/trx/trx0purge.cc:610
|
610 ut_a(head.undo_no
|
(rr) p head
|
$4 = (purge_sys_t::iterator &) @0x557e00a6ec20: {trx_no = 39, undo_no = 0}
|
(rr) p purge_sys.view
|
$5 = {m_low_limit_id = 36, m_up_limit_id = 36,
|
m_ids = std::vector of length 0, capacity 32, m_low_limit_no = 36}
|
Because the purge coordinator is in sole control of all this, this does not look like a race condition. It would be good to clean it up anyway.
I debugged the server execution corresponding to the following section of innodb.instant_alter_purge:
connection prevent_purge; |
COMMIT; |
START TRANSACTION WITH CONSISTENT SNAPSHOT; |
connection default; |
|
ALTER TABLE t1 ADD COLUMN extra TINYINT UNSIGNED NOT NULL DEFAULT 42; |
let $wait_all_purged= 1;
|
--source include/wait_all_purged.inc |
The assertion would fail during the wait. The purge coordinator would be invoked after the ALTER TABLE transaction commit invoked trx_purge_add_undo_to_history(), which invoked rseg->set_last_commit(86,39). The purge coordinator would first update purge_sys.view.m_low_limit_no to 40 (corresponding to the time after the ALTER TABLE) and then adjust it to 36 corresponding to the read view of the older started transaction. Shortly after this, the assertion failed within the same invocation of purge_coordinator_callback(). The state would be as follows:
(rr) p purge_sys.tail
|
$10 = {trx_no = 39, undo_no = 0}
|
(rr) p purge_sys.head
|
$11 = {trx_no = 39, undo_no = 0}
|
(rr) p purge_sys.view
|
$12 = {m_low_limit_id = 36, m_up_limit_id = 36,
|
m_ids = std::vector of length 0, capacity 32, m_low_limit_no = 36}
|
Simply put, TrxUndoRsegsIterator::set_next() is ‘overshooting’ the purge_sys.view when updating purge_sys.tail. I tried the following patch:
diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
|
index c0fafe1ec6b..8ee1bd688cd 100644
|
--- a/storage/innobase/trx/trx0purge.cc
|
+++ b/storage/innobase/trx/trx0purge.cc
|
@@ -110,7 +110,7 @@ TRANSACTIONAL_INLINE inline bool TrxUndoRsegsIterator::set_next()
|
ut_ad(purge_sys.rseg->space->id == TRX_SYS_SPACE
|
|| srv_is_undo_tablespace(purge_sys.rseg->space->id));
|
|
- trx_id_t last_trx_no, tail_trx_no;
|
+ trx_id_t last_trx_no;
|
{
|
#ifdef SUX_LOCK_GENERIC
|
purge_sys.rseg->latch.rd_lock(SRW_LOCK_CALL);
|
@@ -119,9 +119,7 @@ TRANSACTIONAL_INLINE inline bool TrxUndoRsegsIterator::set_next()
|
{purge_sys.rseg->latch};
|
#endif
|
last_trx_no = purge_sys.rseg->last_trx_no();
|
- tail_trx_no = purge_sys.tail.trx_no;
|
|
- purge_sys.tail.trx_no = last_trx_no;
|
purge_sys.hdr_offset = purge_sys.rseg->last_offset();
|
purge_sys.hdr_page_no = purge_sys.rseg->last_page_no;
|
|
@@ -131,10 +129,14 @@ TRANSACTIONAL_INLINE inline bool TrxUndoRsegsIterator::set_next()
|
}
|
|
/* Only the purge coordinator task will access
|
- purge_sys.rseg_iter or purge_sys.hdr_page_no. */
|
+ purge_sys.rseg_iter, purge_sys.hdr_page_no,
|
+ or modify purge_sys.view. */
|
ut_ad(last_trx_no == m_rsegs.trx_no);
|
ut_a(purge_sys.hdr_page_no != FIL_NULL);
|
- ut_a(tail_trx_no <= last_trx_no);
|
+ ut_a(purge_sys.tail.trx_no <= last_trx_no);
|
+ if (last_trx_no <= purge_sys.low_limit_no()) {
|
+ purge_sys.tail.trx_no = last_trx_no;
|
+ }
|
|
return(true);
|
}
|
@@ -607,6 +609,9 @@ TRANSACTIONAL_TARGET static void trx_purge_truncate_history()
|
|
if (head.trx_no >= purge_sys.low_limit_no())
|
{
|
+ ut_a(head.undo_no
|
+ ? head.trx_no < purge_sys.low_limit_no()
|
+ : head.trx_no == purge_sys.low_limit_no());
|
/* This is sometimes necessary. TODO: find out why. */
|
head.trx_no= purge_sys.low_limit_no();
|
head.undo_no= 0; |
Not only would the added assertion in trx_purge_truncate_history() fail in the test innodb.instant_alter_bugs, but the test innodb.innodb-isolation would show result differences.
vlad.lesin, since you are familiar with read views and purge, do you have any suggestions? The options are to replace the FIXME comment with a good explanation, or to adjust the code so that no adjustment in trx_purge_truncate_history() would be needed.
The ‘overshooting’ might be related to innodb_purge_batch_size. In MDEV-24402 I had to tweak the purge_sys.end_view creation so that it will not be a copy of purge_sys.view, but capped by purge_sys.head.trx_no or purge_sys.tail.trx_no. Few mtr tests that would invoke CHECK TABLE are generating enough undo logs so that the default innodb_purge_batch_size=300 would start to matter. A few tests for partitioned tables do that.
In MDEV-24402, I found out that it is not easy to reconstruct the state of purge after server restart. The tweaks might also be related to that.
The MDEV-24402 CHECK TABLE…EXTENDED sometimes produces warnings when enabling innodb_undo_log_truncate. I observed it for 2 tests:
parts.partition_alter4_innodb 'innodb' w13 [ fail ]
|
Test ended at 2022-12-08 11:12:16
|
|
CURRENT_TEST: parts.partition_alter4_innodb
|
--- /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/r/partition_alter4_innodb.result 2022-12-08 10:44:53.105837636 +0000
|
+++ /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/r/partition_alter4_innodb.reject 2022-12-08 11:12:15.246208692 +0000
|
@@ -42328,6 +42328,8 @@
|
test.t1 analyze status OK
|
CHECK TABLE t1 EXTENDED;
|
Table Op Msg_type Msg_text
|
+test.t1 check Warning InnoDB: Unpurged clustered index record in table `test`.`t1` /* Partition `part_2` */: COMPACT RECORD(info_bits=32, 8 fields): {[6] X8(0x000000005838),[6] G.(0x00000001472E),[7]( (0x2800000182079C),[4] (0x8000000A),[4] (0x8000000A),[20]10 (0x3130202020202020202020202020202020202020),[20]10 (0x3130202020202020202020202020202020202020),[13]just inserted(0x6A75737420696E736572746564)}
|
+test.t1 check Warning InnoDB: Unpurged clustered index record in table `test`.`t1` /* Partition `part_5` */: COMPACT RECORD(info_bits=32, 8 fields): {[6] X7(0x000000005837),[6] G.(0x00000001472E),[7]( (0x280000018207D2),[4] (0x80000009),[4] (0x80000009),[20]9 (0x3920202020202020202020202020202020202020),[20]9 (0x3920202020202020202020202020202020202020),[13]just inserted(0x6A75737420696E736572746564)}
|
test.t1 check status OK
|
CHECKSUM TABLE t1 EXTENDED;
|
Table Checksum
|
|
mysqltest: Result length mismatch
|
and
CURRENT_TEST: parts.partition_alter1_2_innodb
|
--- /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/r/partition_alter1_2_innodb.result 2022-12-07 21:27:06.327078615 +0000
|
+++ /home/buildbot/maria-slave/fulltest-debug-big-only/build/mysql-test/suite/parts/r/partition_alter1_2_innodb.reject 2022-12-08 11:11:17.827001454 +0000
|
@@ -30611,6 +30611,8 @@
|
test.t1 analyze status OK
|
CHECK TABLE t1 EXTENDED;
|
Table Op Msg_type Msg_text
|
+test.t1 check Warning InnoDB: Unpurged clustered index record in table `test`.`t1` /* Partition `part0` */: COMPACT RECORD(info_bits=32, 8 fields): {[6] B (0x00000000421A),[6] Ff(0x000000014666),[7] 0!z(0x0B00000130217A),[4] (0x8000000E),[4] (0x8000000E),[20] (0x2020202020202020202020202020202020202020),[20] (0x2020202020202020202020202020202020202020),[12]was inserted(0x77617320696E736572746564)}
|
+test.t1 check Warning InnoDB: Unpurged clustered index record in table `test`.`t1` /* Partition `part2` */: COMPACT RECORD(info_bits=32, 8 fields): {[6] B (0x000000004219),[6] Ff(0x000000014666),[7] 0"=(0x0B00000130223D),[4] (0x8000000F),[4] (0x8000000F),[20] (0x2020202020202020202020202020202020202020),[20] (0x2020202020202020202020202020202020202020),[12]was inserted(0x77617320696E736572746564)}
|
test.t1 check status OK
|
CHECKSUM TABLE t1 EXTENDED;
|
Table Checksum
|
|
mysqltest: Result length mismatch
|
This is not easily reproducible, and I never reproduced it with rr record. To avoid CI failures due to this, in MDEV-29986 I had changed the CHECK TABLE t1 EXTENDED to CHECK TABLE t1.
Edit: Due to a performance impact, MDEV-29986 will not change the default from innodb_undo_log_truncate=OFF after all; hence, the tests will not have to be touched.
I believe that the above mentioned bugs when using innodb_undo_log_truncate=ON could have been fixed by MDEV-30671. I did not test that.
Related comment:
head.reset_trx_no(purge_sys.low_limit_no());
head.undo_no = 0;
}