2019-08-30 8:58:22 11 [ERROR] [FATAL] InnoDB: Record in index `uidx3` of table `test`.`t3` was not found on update: TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} at: RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)}
(col1,DB_TRX_ID,DB_ROLL_PTR,dropped_col_text,col_text)=(metadata record with purged history),delete-marked(7,0x6f,0x460000014a0110,0,77)
(col_text,col1)=(0,7), PAGE_MAX_TRX_ID=0x6d
Again, instead of finding a delete-marked record (77,7), we find the non-delete-marked (0,7) in the secondary index.
The history of the record (77,7) is as follows:
inserted by INSERT INTO t3 VALUES (7,77), transaction 0x64
delete-marked by DELETE FROM t3 WHERE col1=7, transaction 0x69
purged
assertion failure on DELETE FROM t3 WHERE col1=7, transaction 0x6f
(other transactions 0x71,0x72 are waiting for locks, did not modify anything)
The history of the record (0,7) is as follows:
delete-marked by UPDATE t3 SET col1=4, transaction 0x63
purged
inserted by UPDATE t3 SET col1=7, transaction 0x6d updating (0,4) to (0,7)
Let us reconstruct the complete contents of the secondary index (col_text,col1) at each step of the log:
initial state: (0,7) after ADD INDEX (not logged)
del-mark(0,7) after UPDATE t3 SET col1 = 4, transaction 0x63
(0,4),del-mark(0,7) after UPDATE t3 SET col1 = 4, transaction 0x63
(0,4),del-mark(0,7),(77,7) after INSERT INTO t3 VALUES (7,77), transaction 0x64
(0,4),(77,7) after purge
del-mark(0,4),(77,7) after UPDATE t3 SET col1 = 7, transaction 0x66
del-mark(0,4),del-mark(77,7) after DELETE FROM t3 WHERE col1=7, transaction 0x69
del-mark(0,4) after purge
del-mark(0,4) after UPDATE t3 SET col1 = 7, transaction 0x6d
del-mark(0,4),(0,7) after UPDATE t3 SET col1 = 7, transaction 0x6d
(0,7) after purge
(0,7) instead of (77,7) on the DELETE FROM t3 WHERE col1=7, transaction 0x6f
The last UPDATE is attempting to delete-mark an already delete-marked record (0,4). The corruption seems to have been caused earlier, by UPDATE t3 SET col1 = 7, transaction 0x66, which did not attempt to insert the record (0,7) as expected. Actually, that transaction should have been rolled back, because the record (7,77) already existed in the clustered index, so the attempt to update (4,0) to (7,0) should have violated the uniqueness of the PRIMARY KEY(col1).
I must add more diagnostics to cover transaction rollback. The bug appears to be in the UPDATE or its rollback.
Marko Mäkelä
added a comment - Another day, another crash, this time with some more diagnostics:
2019-08-30 8:58:21 9 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[8] (0x0000000000000014),[7]test/t3(0x746573742F7433)} by 0x22,CREATE TABLE t3 (col1 TINYINT UNSIGNED PRIMARY KEY, col_text TINYINT UNSIGNED NOT NULL DEFAULT 0) ENGINE=InnoDB ROW_FORMAT=REDUNDANT
2019-08-30 8:58:22 12 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x63,UPDATE t3 SET col1 = 4
2019-08-30 8:58:22 12 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x63,UPDATE t3 SET col1 = 4
2019-08-30 8:58:22 10 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x64,INSERT INTO t3 VALUES (7,77)
2019-08-30 8:58:22 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x07)}
2019-08-30 8:58:22 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x07)}:1
2019-08-30 8:58:22 12 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x66,UPDATE t3 SET col1 = 7
2019-08-30 8:58:22 11 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x69,DELETE FROM t3 WHERE col1=7
2019-08-30 8:58:22 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}
2019-08-30 8:58:22 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}:1
2019-08-30 8:58:22 12 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x6d,UPDATE t3 SET col1 = 7
2019-08-30 8:58:22 12 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x6d,UPDATE t3 SET col1 = 7
2019-08-30 8:58:22 4 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x04)}
2019-08-30 8:58:22 4 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x04)}:1
2019-08-30 8:58:22 4 [Note] InnoDB: purged RECORD(info_bits=32, 5 fields): {[1] (0x04),[6] m(0x00000000006D),[7]D I (0x44000001490110),[1] (0x00),DEFAULT}
2019-08-30 8:58:22 11 [ERROR] [FATAL] InnoDB: Record in index `uidx3` of table `test`.`t3` was not found on update: TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} at: RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)}
(col1,DB_TRX_ID,DB_ROLL_PTR,dropped_col_text,col_text)=(metadata record with purged history),delete-marked(7,0x6f,0x460000014a0110,0,77)
(col_text,col1)=(0,7), PAGE_MAX_TRX_ID=0x6d
Again, instead of finding a delete-marked record (77,7), we find the non-delete-marked (0,7) in the secondary index.
The history of the record (77,7) is as follows:
inserted by INSERT INTO t3 VALUES (7,77) , transaction 0x64
delete-marked by DELETE FROM t3 WHERE col1=7 , transaction 0x69
purged
assertion failure on DELETE FROM t3 WHERE col1=7 , transaction 0x6f
(other transactions 0x71,0x72 are waiting for locks, did not modify anything)
The history of the record (0,7) is as follows:
delete-marked by UPDATE t3 SET col1=4 , transaction 0x63
purged
inserted by UPDATE t3 SET col1=7 , transaction 0x6d updating (0,4) to (0,7)
Let us reconstruct the complete contents of the secondary index (col_text,col1) at each step of the log:
initial state: (0,7) after ADD INDEX (not logged)
del-mark(0,7) after UPDATE t3 SET col1 = 4 , transaction 0x63
(0,4),del-mark(0,7) after UPDATE t3 SET col1 = 4 , transaction 0x63
(0,4),del-mark(0,7),(77,7) after INSERT INTO t3 VALUES (7,77) , transaction 0x64
(0,4),(77,7) after purge
del-mark(0,4),(77,7) after UPDATE t3 SET col1 = 7 , transaction 0x66
del-mark(0,4),del-mark(77,7) after DELETE FROM t3 WHERE col1=7 , transaction 0x69
del-mark(0,4) after purge
del-mark(0,4) after UPDATE t3 SET col1 = 7 , transaction 0x6d
del-mark(0,4),(0,7) after UPDATE t3 SET col1 = 7 , transaction 0x6d
(0,7) after purge
(0,7) instead of (77,7) on the DELETE FROM t3 WHERE col1=7 , transaction 0x6f
The last UPDATE is attempting to delete-mark an already delete-marked record (0,4). The corruption seems to have been caused earlier, by UPDATE t3 SET col1 = 7 , transaction 0x66, which did not attempt to insert the record (0,7) as expected. Actually, that transaction should have been rolled back, because the record (7,77) already existed in the clustered index, so the attempt to update (4,0) to (7,0) should have violated the uniqueness of the PRIMARY KEY(col1) .
I must add more diagnostics to cover transaction rollback. The bug appears to be in the UPDATE or its rollback.
The diagnostics for del-mark both in the previous logs and the following log was misleading, because it is not showing the new value of the flag, and it is being displayed even when there is no change.
I analyzed another crash, with an explicit trace of persistent transaction commit (undo_no>0) or rollback (undo_no=0).
Again, we crash on DELETE (transaction 0x57), while other transactions 0x59,0x5a are blocked before making any changes.
And again, the table contents is the same, except for transaction identifiers and undo log pointers.
According to the available information that I gathered from the new crash, the clustered index should have contained the record (7,0) instead of (7,77), after the UPDATE t3 SET col1 = 7, transaction 0x53.
Similarly for the earlier case, the record (0,4) was not delete-marked by the UPDATE (transaction 0x66); it was a null change performed by a rollback that was made due to duplicate primary key. Also in that case, it looks like the clustered index should have contained (7,0) instead of (7,77).
I reran with more diagnostics (also including operations on the clustered index), to get a similar crash again, at DELETE, transaction 0xac. In the undo log record for that transaction, the DB_TRX_ID had been zeroed out by row_purge_reset_trx_id(). I might disable that, so that we can try to cross-reference the undo log records with the diagnostic output.
Marko Mäkelä
added a comment - The diagnostics for del-mark both in the previous logs and the following log was misleading, because it is not showing the new value of the flag, and it is being displayed even when there is no change.
I analyzed another crash, with an explicit trace of persistent transaction commit (undo_no>0) or rollback (undo_no=0).
Again, we crash on DELETE (transaction 0x57), while other transactions 0x59,0x5a are blocked before making any changes.
And again, the table contents is the same, except for transaction identifiers and undo log pointers.
According to the available information that I gathered from the new crash, the clustered index should have contained the record (7,0) instead of (7,77), after the UPDATE t3 SET col1 = 7 , transaction 0x53.
Similarly for the earlier case, the record (0,4) was not delete-marked by the UPDATE (transaction 0x66); it was a null change performed by a rollback that was made due to duplicate primary key. Also in that case, it looks like the clustered index should have contained (7,0) instead of (7,77).
I reran with more diagnostics (also including operations on the clustered index), to get a similar crash again, at DELETE , transaction 0xac. In the undo log record for that transaction, the DB_TRX_ID had been zeroed out by row_purge_reset_trx_id() . I might disable that, so that we can try to cross-reference the undo log records with the diagnostic output.
TL;DR: Here is a deterministic test case that does not repeat any crash, but shows error in CHECK TABLE even with innodb_force_recovery=2 (purge disabled):
--source include/have_innodb.inc
CREATETABLE t1(a INTPRIMARYKEY, b INTNOTNULL) ENGINE=InnoDB;
INSERTINTO t1 VALUES (7,77);
ALTERTABLE t1 DROPCOLUMN b, ADDCOLUMN c INTNOTNULLDEFAULT 0;
ALTERTABLE t1 ADDINDEX (c);
BEGIN;
DELETEFROM t1;
INSERTINTO t1 VALUES (4,0),(7,77);
COMMIT;
BEGIN;
DELETEFROM t1 WHERE a=7;
UPDATE t1 SET a=7;
COMMIT;
DELETEFROM t1;
CHECKTABLE t1;
DROPTABLE t1;
How did I arrive at that?
After disabling row_purge_reset_trx_id(), I can follow the history from the undo log page for failures of mmr_MDEV-20066.test with btr-instr.patch.
With this crash, the DELETE is transaction 0x64. The record (7,77) was previously updated by the second and last undo log record of transaction 0x5f, which recorded TRX_UNDO_DEL_MARK_REC for (4,0) and TRX_UNDO_UPD_DEL_REC for (7,77) that had been delete-marked by the only undo log record of transaction 0x60, and previously inserted by transaction 0x5e, which is where the history starts by TRX_UNDO_INSERT_REC.
The ADD INDEX (transaction 0x2b) was committed long before 0x5e started. So, the clustered index record with col1=7 must have carried the value col_text=77 from this point on, and a previous entry must have been purged before 0x5e started. The log confirms this: there was a DELETE and purge right before the INSERT.
According to the following, the index did contain a record (col1,col_text)=(4,0) since this point:
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 4 fields): {[1] (0x04),[6] E(0x000000000045),[7] I /(0x9700000149012F),[1] (0x00)} by 0x45,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x45,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x45,UPDATE t3 SET col1 = 4
The DB_ROLL_PTR=0x9700000149012F points to the second and last record of a transaction that logged TRX_UNDO_DEL_MARK_REC for col1=7 that had been modified by transaction 0x43, and TRX_UNDO_INSERT_REC for the (4,0). There is a chain of undo log records, pointing to transactions 0x43, 0x40, 0x3d, 0x3b, 0x39, 0x37, 0x35, 0x33, 0x31, 0x2f, 0x24, which was the initial INSERT of the record col1=7 before the ALTER TABLE (0x28 and 0x2b). After the first ALTER, the record should be interpreted as (col1,dropped_col_text,col_text)=(7,77,DEFAULT=0), that is, (7,0).
So, the record value (4,0) after the long chain of UPDATE seems to be legitimate. The original record (7,0) with dropped_col_text=77 was purged already between transaction 0x2f and 0x31:
2019-08-30 11:03:50 11 [ERROR] [FATAL] InnoDB: Record in index `uidx3` of table `test`.`t3` was not found on update: TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} at: RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)}
Let me reconstruct the clustered index contents from the start of transaction 0x5e:
(4,0),(7,77) at the start of transaction 0x51 (rolled back)
(4,0),delete-marked(7,77) after DELETE, transaction 0x55 (committed)
(4,0),(7,77) after INSERT, transaction 0x54 (committed)
(4,0),delete-marked(7,77) after DELETE, transaction 0x59 (committed)
(4,0) after purge (in both indexes)
(4,0),(7,77) after INSERT, transaction 0x5e (committed)
(4,0),delete-marked(7,77) after DELETE, transaction 0x60 (committed)
unchanged after (77,7) was purged from secondary index
delete-marked(4,0),(7,77) after UPDATE, transaction 0x5f
(7,77) after purge (in both indexes)
crash on DELETE because the record should have been (7,0)
The logging for clustered index operations is not complete, but the evidence for the corruption is there, wrongly using update-in-place and inserting (7,77) instead of (7,0) when updating (4,0):
2019-08-30 11:03:50 10 [Note] InnoDB: uip RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] `(0x000000000060),[7]' T (0x27000001540110),[1] (0x00),[1]M(0x4D)} to RECORD(info_bits=0, 5 fields): {[1] (0x07),[6] _(0x00000000005F),[7]& U /(0x2600000155012F),[1] (0x00),[1]M(0x4D)} by 0x5f,UPDATE t3 SET col1 = 7
Marko Mäkelä
added a comment - TL;DR: Here is a deterministic test case that does not repeat any crash, but shows error in CHECK TABLE even with innodb_force_recovery=2 (purge disabled):
--source include/have_innodb.inc
CREATE TABLE t1(a INT PRIMARY KEY , b INT NOT NULL ) ENGINE=InnoDB;
INSERT INTO t1 VALUES (7,77);
ALTER TABLE t1 DROP COLUMN b, ADD COLUMN c INT NOT NULL DEFAULT 0;
ALTER TABLE t1 ADD INDEX (c);
BEGIN ;
DELETE FROM t1;
INSERT INTO t1 VALUES (4,0),(7,77);
COMMIT ;
BEGIN ;
DELETE FROM t1 WHERE a=7;
UPDATE t1 SET a=7;
COMMIT ;
DELETE FROM t1;
CHECK TABLE t1;
DROP TABLE t1;
How did I arrive at that?
After disabling row_purge_reset_trx_id() , I can follow the history from the undo log page for failures of mmr_MDEV-20066.test with btr-instr.patch .
With this crash, the DELETE is transaction 0x64. The record (7,77) was previously updated by the second and last undo log record of transaction 0x5f, which recorded TRX_UNDO_DEL_MARK_REC for (4,0) and TRX_UNDO_UPD_DEL_REC for (7,77) that had been delete-marked by the only undo log record of transaction 0x60, and previously inserted by transaction 0x5e, which is where the history starts by TRX_UNDO_INSERT_REC .
The ADD INDEX (transaction 0x2b) was committed long before 0x5e started. So, the clustered index record with col1=7 must have carried the value col_text=77 from this point on, and a previous entry must have been purged before 0x5e started. The log confirms this: there was a DELETE and purge right before the INSERT .
According to the following, the index did contain a record (col1,col_text)=(4,0) since this point:
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 4 fields): {[1] (0x04),[6] E(0x000000000045),[7] I /(0x9700000149012F),[1] (0x00)} by 0x45,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x45,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x45,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: commit 0x45,2
The DB_ROLL_PTR=0x9700000149012F points to the second and last record of a transaction that logged TRX_UNDO_DEL_MARK_REC for col1=7 that had been modified by transaction 0x43, and TRX_UNDO_INSERT_REC for the (4,0). There is a chain of undo log records, pointing to transactions 0x43, 0x40, 0x3d, 0x3b, 0x39, 0x37, 0x35, 0x33, 0x31, 0x2f, 0x24, which was the initial INSERT of the record col1=7 before the ALTER TABLE (0x28 and 0x2b). After the first ALTER , the record should be interpreted as (col1,dropped_col_text,col_text)=(7,77,DEFAULT=0), that is, (7,0).
So, the record value (4,0) after the long chain of UPDATE seems to be legitimate. The original record (7,0) with dropped_col_text=77 was purged already between transaction 0x2f and 0x31:
2019-08-30 11:03:49 9 [Note] InnoDB: insert TUPLE (info_bits=0, 5 fields): {[8] (0x000000000000001A),[4] (0x00000000),[6] +(0x00000000002B),[7] < (0x880000013C029F),[8]col_text(0x636F6C5F74657874)} by 0x2b,ALTER TABLE t3 ADD INDEX uidx3 ( col_text ), ALGORITHM = NOCOPY, LOCK = EXCLUSIVE
2019-08-30 11:03:49 9 [Note] InnoDB: commit 0x2b,2
2019-08-30 11:03:49 9 [Note] InnoDB: commit 0x2d,1
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 4 fields): {[1] (0x04),[6] /(0x00000000002F),[7] = (0x8A0000013D02D2),[1] (0x00)} by 0x2f,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x2f,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x2f,UPDATE t3 SET col1 = 4
2019-08-30 11:03:50 10 [Note] InnoDB: commit 0x2f,2
2019-08-30 11:03:50 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x07)}
2019-08-30 11:03:50 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x07)}:1
2019-08-30 11:03:50 2 [Note] InnoDB: delete RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] /(0x00000000002F),[7] = (0x0A0000013D02B3),[1]M(0x4D),DEFAULT}
2019-08-30 11:03:50 2 [Note] InnoDB: purged RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] /(0x00000000002F),[7] = (0x0A0000013D02B3),[1]M(0x4D),DEFAULT}
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 4 fields): {[1] (0x07),[6] 1(0x000000000031),[7] > O(0x8B0000013E024F),[1] (0x00)} by 0x31,UPDATE t3 SET col1 = 7
Here is the end of the log:
2019-08-30 11:03:50 10 [Note] InnoDB: uip RECORD(info_bits=32, 5 fields): {[1] (0x04),[6] Q(0x000000000051),[7] O (0x1E0000014F0110),[1] (0x00),DEFAULT} to RECORD(info_bits=0, 5 fields): {[1] (0x04),[6] (0x000000000000),[7] (0x80000000000000),[1] (0x00),DEFAULT} by 0x51,UPDATE t3 SET col1 = 7
2019-08-30 11:03:50 10 [Note] InnoDB: commit 0x51,0
2019-08-30 11:03:50 11 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x55,DELETE FROM t3 WHERE col1=7
2019-08-30 11:03:50 11 [Note] InnoDB: commit 0x55,1
2019-08-30 11:03:50 12 [Note] InnoDB: uip RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] U(0x000000000055),[7] P (0x20000001500110),[1] (0x00),[1]M(0x4D)} to RECORD(info_bits=0, 5 fields): {[1] (0x07),[6] T(0x000000000054),[7] Q (0x1F000001510110),[1] (0x00),[1]M(0x4D)} by 0x54,INSERT INTO t3 VALUES (7,77)
2019-08-30 11:03:50 12 [Note] InnoDB: uip RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)} to RECORD(info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x54,INSERT INTO t3 VALUES (7,77)
2019-08-30 11:03:50 12 [Note] InnoDB: commit 0x54,1
2019-08-30 11:03:50 11 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x59,DELETE FROM t3 WHERE col1=7
2019-08-30 11:03:50 11 [Note] InnoDB: commit 0x59,1
2019-08-30 11:03:50 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}
2019-08-30 11:03:50 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}:1
2019-08-30 11:03:50 2 [Note] InnoDB: delete RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] Y(0x000000000059),[7]! R (0x21000001520110),[1] (0x00),[1]M(0x4D)}
2019-08-30 11:03:50 2 [Note] InnoDB: purged RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] Y(0x000000000059),[7]! R (0x21000001520110),[1] (0x00),[1]M(0x4D)}
2019-08-30 11:03:50 12 [Note] InnoDB: insert TUPLE (info_bits=0, 5 fields): {[1] (0x07),[6] ^(0x00000000005E),[7] S (0xA5000001530110),[1] (0x00),[1]M(0x4D)} by 0x5e,INSERT INTO t3 VALUES (7,77)
2019-08-30 11:03:50 12 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x5e,INSERT INTO t3 VALUES (7,77)
2019-08-30 11:03:50 12 [Note] InnoDB: commit 0x5e,1
2019-08-30 11:03:50 11 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x60,DELETE FROM t3 WHERE col1=7
2019-08-30 11:03:50 11 [Note] InnoDB: commit 0x60,1
2019-08-30 11:03:50 4 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}
2019-08-30 11:03:50 4 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}:1
2019-08-30 11:03:50 10 [Note] InnoDB: uip RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] `(0x000000000060),[7]' T (0x27000001540110),[1] (0x00),[1]M(0x4D)} to RECORD(info_bits=0, 5 fields): {[1] (0x07),[6] _(0x00000000005F),[7]& U /(0x2600000155012F),[1] (0x00),[1]M(0x4D)} by 0x5f,UPDATE t3 SET col1 = 7
2019-08-30 11:03:50 10 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x5f,UPDATE t3 SET col1 = 7
2019-08-30 11:03:50 10 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x5f,UPDATE t3 SET col1 = 7
2019-08-30 11:03:50 10 [Note] InnoDB: commit 0x5f,2
2019-08-30 11:03:50 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x04)}
2019-08-30 11:03:50 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x04)}:1
2019-08-30 11:03:50 2 [Note] InnoDB: delete RECORD(info_bits=32, 5 fields): {[1] (0x04),[6] _(0x00000000005F),[7]& U (0x26000001550110),[1] (0x00),DEFAULT}
2019-08-30 11:03:50 2 [Note] InnoDB: purged RECORD(info_bits=32, 5 fields): {[1] (0x04),[6] _(0x00000000005F),[7]& U (0x26000001550110),[1] (0x00),DEFAULT}
2019-08-30 11:03:50 11 [ERROR] [FATAL] InnoDB: Record in index `uidx3` of table `test`.`t3` was not found on update: TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} at: RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)}
Let me reconstruct the clustered index contents from the start of transaction 0x5e:
(4,0),(7,77) at the start of transaction 0x51 ( rolled back )
(4,0),delete-marked(7,77) after DELETE , transaction 0x55 (committed)
(4,0),(7,77) after INSERT , transaction 0x54 (committed)
(4,0),delete-marked(7,77) after DELETE , transaction 0x59 (committed)
(4,0) after purge (in both indexes)
(4,0),(7,77) after INSERT , transaction 0x5e (committed)
(4,0),delete-marked(7,77) after DELETE , transaction 0x60 (committed)
unchanged after (77,7) was purged from secondary index
delete-marked(4,0), (7,77) after UPDATE , transaction 0x5f
(7,77) after purge (in both indexes)
crash on DELETE because the record should have been (7,0)
The logging for clustered index operations is not complete, but the evidence for the corruption is there, wrongly using update-in-place and inserting (7,77) instead of (7,0) when updating (4,0):
2019-08-30 11:03:50 10 [Note] InnoDB: uip RECORD(info_bits=32, 5 fields): {[1] (0x07),[6] `(0x000000000060),[7]' T (0x27000001540110),[1] (0x00),[1]M(0x4D)} to RECORD(info_bits=0, 5 fields): {[1] (0x07),[6] _(0x00000000005F),[7]& U /(0x2600000155012F),[1] (0x00),[1]M(0x4D)} by 0x5f,UPDATE t3 SET col1 = 7
The corruption is repeatable on 10.3, which does not support instant DROP COLUMN:
--source include/have_innodb.inc
CREATETABLE t1(a INTPRIMARYKEY) ENGINE=InnoDB;
INSERTINTO t1 VALUES (7);
ALTERTABLE t1 ADDCOLUMN c INTNOTNULLDEFAULT 0;
ALTERTABLE t1 ADDINDEX (c);
BEGIN;
DELETEFROM t1;
INSERTINTO t1 VALUES (4,0),(7,77);
COMMIT;
BEGIN;
DELETEFROM t1 WHERE a=7;
UPDATE t1 SET a=7;
COMMIT;
SELECT * FROM t1 FORCEINDEX(PRIMARY);
SELECT * FROM t1 FORCEINDEX(c);
DELETEFROM t1;
CHECKTABLE t1;
DROPTABLE t1;
The problem is that row_upd_build_difference_binary() fails to account for index->n_fields > entry->n_fields. Hence, the UPDATE of (4,0) to (7,0) will write the wrong value (7,77) to the clustered index and the correct value (0,7) to the secondary index.
Marko Mäkelä
added a comment - - edited The corruption is repeatable on 10.3, which does not support instant DROP COLUMN :
--source include/have_innodb.inc
CREATE TABLE t1(a INT PRIMARY KEY ) ENGINE=InnoDB;
INSERT INTO t1 VALUES (7);
ALTER TABLE t1 ADD COLUMN c INT NOT NULL DEFAULT 0;
ALTER TABLE t1 ADD INDEX (c);
BEGIN ;
DELETE FROM t1;
INSERT INTO t1 VALUES (4,0),(7,77);
COMMIT ;
BEGIN ;
DELETE FROM t1 WHERE a=7;
UPDATE t1 SET a=7;
COMMIT ;
SELECT * FROM t1 FORCE INDEX ( PRIMARY );
SELECT * FROM t1 FORCE INDEX (c);
DELETE FROM t1;
CHECK TABLE t1;
DROP TABLE t1;
The problem is that row_upd_build_difference_binary() fails to account for index->n_fields > entry->n_fields . Hence, the UPDATE of (4,0) to (7,0) will write the wrong value (7,77) to the clustered index and the correct value (0,7) to the secondary index.
Another day, another crash, this time with some more diagnostics:
2019-08-30 8:58:21 9 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[8] (0x0000000000000014),[7]test/t3(0x746573742F7433)} by 0x22,CREATE TABLE t3 (col1 TINYINT UNSIGNED PRIMARY KEY, col_text TINYINT UNSIGNED NOT NULL DEFAULT 0) ENGINE=InnoDB ROW_FORMAT=REDUNDANT
2019-08-30 8:58:22 12 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x63,UPDATE t3 SET col1 = 4
2019-08-30 8:58:22 12 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x63,UPDATE t3 SET col1 = 4
2019-08-30 8:58:22 10 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x64,INSERT INTO t3 VALUES (7,77)
2019-08-30 8:58:22 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x07)}
2019-08-30 8:58:22 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x07)}:1
2019-08-30 8:58:22 12 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x66,UPDATE t3 SET col1 = 7
2019-08-30 8:58:22 11 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} by 0x69,DELETE FROM t3 WHERE col1=7
2019-08-30 8:58:22 2 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}
2019-08-30 8:58:22 2 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1]M(0x4D),[1] (0x07)}:1
2019-08-30 8:58:22 12 [Note] InnoDB: del-mark RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x04)} by 0x6d,UPDATE t3 SET col1 = 7
2019-08-30 8:58:22 12 [Note] InnoDB: insert TUPLE (info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)} by 0x6d,UPDATE t3 SET col1 = 7
2019-08-30 8:58:22 4 [Note] InnoDB: delete RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x04)}
2019-08-30 8:58:22 4 [Note] InnoDB: purging RECORD(info_bits=32, 2 fields): {[1] (0x00),[1] (0x04)}:1
2019-08-30 8:58:22 4 [Note] InnoDB: purged RECORD(info_bits=32, 5 fields): {[1] (0x04),[6] m(0x00000000006D),[7]D I (0x44000001490110),[1] (0x00),DEFAULT}
2019-08-30 8:58:22 11 [ERROR] [FATAL] InnoDB: Record in index `uidx3` of table `test`.`t3` was not found on update: TUPLE (info_bits=0, 2 fields): {[1]M(0x4D),[1] (0x07)} at: RECORD(info_bits=0, 2 fields): {[1] (0x00),[1] (0x07)}
(col1,DB_TRX_ID,DB_ROLL_PTR,dropped_col_text,col_text)=(metadata record with purged history),delete-marked(7,0x6f,0x460000014a0110,0,77)
(col_text,col1)=(0,7), PAGE_MAX_TRX_ID=0x6d
Again, instead of finding a delete-marked record (77,7), we find the non-delete-marked (0,7) in the secondary index.
The history of the record (77,7) is as follows:
(other transactions 0x71,0x72 are waiting for locks, did not modify anything)
The history of the record (0,7) is as follows:
Let us reconstruct the complete contents of the secondary index (col_text,col1) at each step of the log:
The last UPDATE is attempting to delete-mark an already delete-marked record (0,4). The corruption seems to have been caused earlier, by UPDATE t3 SET col1 = 7, transaction 0x66, which did not attempt to insert the record (0,7) as expected. Actually, that transaction should have been rolled back, because the record (7,77) already existed in the clustered index, so the attempt to update (4,0) to (7,0) should have violated the uniqueness of the PRIMARY KEY(col1).
I must add more diagnostics to cover transaction rollback. The bug appears to be in the UPDATE or its rollback.