Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.4.0
Description
Assert hit during RQG testing on
|
10.4.7 70c2bde931246ea4966d82fa56773b8ef1e0074f 2019-07-09T15:10:45+04:00
|
No replay on 10.3.11 07b1a26c33b28812a4fd8c814de0fe7d943bbd6b 2019-07-10
|
|
mysqld: /home/mleich/work/10.4/storage/innobase/include/data0data.h:621: dfield_t* dtuple_get_nth_field(dtuple_t*, ulint): Assertion `n < tuple->n_fields' failed.
|
190711 20:17:23 [ERROR] mysqld got signal 6 ;
|
...
|
Query (0x7f9aa0011180): ROLLBACK /* E_R Thread3 QNO 358 CON_ID 18 */
|
Connection ID (thread ID): 18
|
Status: NOT_KILLED
|
|
Attachments
- MDEV-20048.tgz
- 3 kB
- prt
- 66 kB
Issue Links
- is caused by
-
MDEV-15562 Instant DROP COLUMN or changing the order of columns
-
- Closed
-
- relates to
-
MDEV-11369 Instant add column for InnoDB
-
- Closed
-
Activity
I think that the problem occurs due to the btr_cur_trim() call, which would reduce the number of columns. With the following patch, the test does not crash for me:
diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
|
index 4ec921991a7..e1566d22152 100644
|
--- a/storage/innobase/btr/btr0cur.cc
|
+++ b/storage/innobase/btr/btr0cur.cc
|
@@ -7266,9 +7266,9 @@ btr_push_update_extern_fields(
|
n = upd_get_n_fields(update);
|
|
for (; n--; uf++) {
|
- if (dfield_is_ext(&uf->new_val)) {
|
- dfield_t* field
|
- = dtuple_get_nth_field(tuple, uf->field_no);
|
+ if (dfield_is_ext(&uf->new_val)
|
+ && uf->field_no < tuple->n_fields) {
|
+ dfield_t* field = &tuple->fields[uf->field_no];
|
|
if (!dfield_is_ext(field)) {
|
dfield_set_ext(field); |
Only after developing a test case, I will know if also 10.3 is affected. I think that it should be.
This is definitely caused by MDEV-15562 (instant DROP COLUMN).
I added some instrumentation to be able to ‘roll back’ the btr_cur_trim():
diff --git a/storage/innobase/data/data0data.cc b/storage/innobase/data/data0data.cc
|
index 02f921f716c..7c1cd292e5b 100644
|
--- a/storage/innobase/data/data0data.cc
|
+++ b/storage/innobase/data/data0data.cc
|
@@ -76,6 +76,7 @@ void dtuple_t::trim(const dict_index_t& index)
|
}
|
}
|
|
+ if (n_fields != i) ib::info() << "trimmed " << n_fields << " to " << i;
|
n_fields = i;
|
}
|
|
This would display the following:
10.4 90600e8fc5d731720af557ddb31d1eab9ec47f72 |
…
|
2019-07-19 16:25:49 12 [Note] InnoDB: trimmed 278 to 5
|
mysqld: /mariadb/10.4/storage/innobase/include/data0data.h:621: dfield_t* dtuple_get_nth_field(dtuple_t*, ulint): Assertion `n < tuple->n_fields' failed.
|
The call must have occurred before btr_cur_pessimistic_update() was executed, possibly in btr_cur_optimistic_update(). With this information, I can view the full clustered index tuple:
Program terminated with signal SIGABRT, Aborted.
|
#0 __pthread_kill (threadid=<optimized out>, signo=6)
|
at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
|
56 ../sysdeps/unix/sysv/linux/pthread_kill.c: Tiedostoa tai hakemistoa ei ole.
|
[Current thread is 1 (Thread 0x7f4eeb46a700 (LWP 17726))]
|
(gdb) up 8
|
#8 0x00005558594ac193 in btr_push_update_extern_fields (
|
tuple=0x7f4e8c0674f8, update=0x7f4e8c020c58, heap=0x7f4e8c065970)
|
at /mariadb/10.4/storage/innobase/btr/btr0cur.cc:7271
|
(gdb) p *tuple->fields@278
|
$1 = {{data = 0x7f4e8c065a24, ext = 0, spatial_status = 2, len = 4, type = {
|
prtype = 1283, mtype = 6, len = 4, mbminlen = 0, mbmaxlen = 0}}, {
|
data = 0x7f4e8c069008, ext = 0, spatial_status = 0, len = 6, type = {
|
prtype = 257, mtype = 8, len = 6, mbminlen = 0, mbmaxlen = 0}}, {
|
data = 0x7f4e8c069000, ext = 0, spatial_status = 0, len = 7, type = {
|
prtype = 258, mtype = 8, len = 7, mbminlen = 0, mbmaxlen = 0}}, {
|
data = 0x7f4e8c065a35, ext = 0, spatial_status = 2, len = 4294967295,
|
type = {prtype = 1024, mtype = 4, len = 65535, mbminlen = 0,
|
mbmaxlen = 0}} <repeats 270 times>, {data = 0x7f4e8c068fe8, ext = 1,
|
spatial_status = 0, len = 20, type = {prtype = 1024, mtype = 4,
|
len = 65535, mbminlen = 0, mbmaxlen = 0}}, {data = 0x7f4e8c065a35,
|
ext = 0, spatial_status = 2, len = 4294967295, type = {prtype = 1024,
|
mtype = 4, len = 65535, mbminlen = 0, mbmaxlen = 0}}, {
|
data = 0x7f4e8c065a35, ext = 0, spatial_status = 2, len = 4294967295,
|
type = {prtype = 1024, mtype = 4, len = 65535, mbminlen = 0,
|
mbmaxlen = 0}}, {data = 0x7f4e8c065a35, ext = 0, spatial_status = 2,
|
len = 4294967295, type = {prtype = 1024, mtype = 4, len = 65535,
|
mbminlen = 0, mbmaxlen = 0}}, {data = 0x0, ext = 0, spatial_status = 0,
|
len = 4294967295, type = {prtype = 524540, mtype = 5, len = 10,
|
mbminlen = 1, mbmaxlen = 1}}}
|
(gdb) p *update
|
$2 = {heap = 0x7f4e8c026400, info_bits = 32, old_vrow = 0x0, n_fields = 4,
|
fields = 0x7f4e8c020c88, vers_sys_value = "\000\000\000\000\000\000\000"}
|
(gdb) p *update->fields@update->n_fields
|
$3 = {{field_no = 273, orig_len = 0, exp = 0x0, new_val = {
|
data = 0x7f4e8c0264a8, ext = 1, spatial_status = 0, len = 20, type = {
|
prtype = 1024, mtype = 4, len = 65535, mbminlen = 0, mbmaxlen = 0}},
|
old_v_val = 0x0}, {field_no = 277, orig_len = 0, exp = 0x0, new_val = {
|
data = 0x0, ext = 0, spatial_status = 0, len = 4294967295, type = {
|
prtype = 524540, mtype = 5, len = 10, mbminlen = 1, mbmaxlen = 1}},
|
old_v_val = 0x0}, {field_no = 1, orig_len = 0, exp = 0x0, new_val = {
|
data = 0x7f4e8c020d48, ext = 0, spatial_status = 0, len = 6, type = {
|
prtype = 257, mtype = 8, len = 6, mbminlen = 0, mbmaxlen = 0}},
|
old_v_val = 0x0}, {field_no = 2, orig_len = 0, exp = 0x0, new_val = {
|
data = 0x7f4e8c020d50, ext = 0, spatial_status = 0, len = 7, type = {
|
prtype = 258, mtype = 8, len = 7, mbminlen = 0, mbmaxlen = 0}},
|
old_v_val = 0x0}}
|
(gdb) p tuple->fields[273]
|
$4 = {data = 0x7f4e8c068fe8, ext = 1, spatial_status = 0, len = 20, type = {
|
prtype = 1024, mtype = 4, len = 65535, mbminlen = 0, mbmaxlen = 0}}
|
(gdb) p tuple->fields[277]
|
$5 = {data = 0x0, ext = 0, spatial_status = 0, len = 4294967295, type = {
|
prtype = 524540, mtype = 5, len = 10, mbminlen = 1, mbmaxlen = 1}}
|
(gdb) up
|
#9 0x00005558594a63a2 in btr_cur_pessimistic_update (flags=7,
|
cursor=0x7f4e8c0654c8, offsets=0x7f4eeb4666b0,
|
offsets_heap=0x7f4eeb4666b8, entry_heap=0x7f4e8c065970,
|
big_rec=0x7f4eeb4664c8, update=0x7f4e8c020c58, cmpl_info=0,
|
thr=0x7f4e8c0204e8, trx_id=5055, mtr=0x7f4eeb4666d0)
|
at /mariadb/10.4/storage/innobase/btr/btr0cur.cc:4914
|
(gdb) p index->fields[273]
|
$6 = {col = 0x7f4e885821a8, name = {m_name = 0x0}, prefix_len = 0,
|
fixed_len = 0}
|
(gdb) p index->fields[277]
|
$7 = {col = 0x7f4e8856d900, name = {m_name = 0x7f4e8856d8ad "col_text"},
|
prefix_len = 0, fixed_len = 0}
|
The problematic field at offset 273 is for a column that had been instantly dropped. For the very last field (the latest instantly added column, col_text) at offset 277 there is a NULL value.
So, this may indeed be specific to instant DROP COLUMN. I failed to create a simple test case with only ADD, but I successfully created a deterministic test case that involves both ADD and DROP:
--source include/have_innodb.inc
|
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB; |
INSERT INTO t1 VALUES (1); |
ALTER TABLE t1 ADD b TEXT, ALGORITHM=INSTANT; |
SET @fill_amount = (@@innodb_page_size / 2 ) + 1 ; |
INSERT INTO t1 VALUES(2, REPEAT('b',@fill_amount)); |
BEGIN; |
DELETE FROM t1; |
|
# Stop purge so that it doesn't remove the delete marked entries. |
connect (purge_control,localhost,root); |
START TRANSACTION WITH CONSISTENT SNAPSHOT; |
connection default; |
|
COMMIT; |
|
ALTER TABLE t1 DROP b, ALGORITHM=INSTANT; |
BEGIN; |
INSERT INTO t1 VALUES (2); |
ROLLBACK; |
|
disconnect purge_control;
|
|
SELECT * FROM t1; |
DROP TABLE t1; |
This will crash on the ROLLBACK.
I think that we need a slightly better fix that somehow takes into account the original number of fields in the tuple. We do not want to effectively remove the failing assertion, but instead we want to adjust the logic only for this specific case.
I am raising this to blocker priority, because this is a simple scenario that could potentially cause corruption. I did not get a crash from a non-debug server, but I think we should be better safe than sorry.
The impact for the non-debug server should be that btr_push_update_extern_fields() returns a too large value. As a result, dtuple_convert_big_rec() will be invoked with too large value of n_ext, and it will overestimate the size of the record. Furthermore, for ROW_FORMAT=REDUNDANT, we could unnecessarily use 2-byte offsets to fields. This could lead to the rollback failing with DB_TOO_BIG_RECORD. InnoDB assumes that rollback will always succeed, so recovery from the situation would involve setting innodb_force_recovery=3 or upgrading to a version where the bug has been fixed.
prt == The output (with backtrace) of MTR during my test run on 10.4.
MDEV-20048.tgz == Archive with MTR based replay test case
How to replay the problem:
0. Use a Linux
1. cd <source>/mysql-test
2. tar xvzf MDEV-20048.tgz
This unpacks a
mysqltest_background.sh
main/ml_MDEV-20048.test
3. cd <tree_with_binaries compiled with debug>/mysql-test
4. ./mysql-test-run.pl --mem ml_MDEV-20048
Please do not add my replay test case to the regression tests because
it consumes too much resources.
A perfect regression test should use debug_sync or similar.