1. Start the server, create one table and fill it with 100 rows.
2. Session 1 runs in a loop some update like
UPDATE table100_innodb_int_autoinc SET `col_varchar_255_ucs2_key` = CONVERT( 'degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui' USING ASCII )
Session 2 runs in a loop
CHECK TABLE table100_innodb_int_autoinc EXTENDED
After short time the CHECK TABLE harvests a
test.table100_innodb_int_autoinc check Warning InnoDB: Index 'col_varchar_255_ucs2_key' contains 98 entries, should be 100.
Scenario:
1. Start the server, create one table and fill it with 100 rows.
2. Session 1 runs in a loop some update like
UPDATE table100_innodb_int_autoinc SET `col_varchar_255_ucs2_key` = CONVERT( 'degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui' USING ASCII )
Session 2 runs in a loop
CHECK TABLE table100_innodb_int_autoinc EXTENDED
After short time the CHECK TABLE harvests a
test.table100_innodb_int_autoinc check Warning InnoDB: Index 'col_varchar_255_ucs2_key' contains 98 entries, should be 100.
Scenario:
1. Start the server, create one table and fill it with 100 rows.
2. Session 1 runs in a loop some update like
UPDATE table100_innodb_int_autoinc SET `col_varchar_255_ucs2_key` = CONVERT( 'degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui' USING ASCII )
Session 2 runs in a loop
CHECK TABLE table100_innodb_int_autoinc EXTENDED
After short time the CHECK TABLE harvests a
test.table100_innodb_int_autoinc check Warning InnoDB: Index 'col_varchar_255_ucs2_key' contains 98 entries, should be 100.
To avoid a debug assertion failure in purge, I started the patched server with innodb_force_recovery=2. There are 100 records in both the secondary index and the primary key index, but the primary key index contains the string 'egsotrsfannidwyvkuvlkeslrryhpkeevqmb' in many columns, while the secondary index is missing this update. Due to the way how the secondary index MVCC works, some of the secondary index records will be ignored, and a count mismatch will be reported.
It would be nice to get a simpler rr replay trace of this. It is possible that the problem is related to page splits and merges in the clustered index. Apparently only 14 of the 100 rows would match the WHERE condition. Would this be repeatable with fewer than 86 'extra' rows? An mtr based test would help me a lot.
Marko Mäkelä
added a comment - I accessed the index contents of the data directory of the rr replay trace using a patched 10.6 server:
diff --git a/storage/innobase/include/dict0mem.h b/storage/innobase/include/dict0mem.h
index a7f04c3015e..d5d9bfe3715 100644
--- a/storage/innobase/include/dict0mem.h
+++ b/storage/innobase/include/dict0mem.h
@@ -2473,7 +2473,6 @@ inline bool dict_index_t::is_instant() const
inline bool dict_index_t::is_corrupted() const
{
return UNIV_UNLIKELY(online_status >= ONLINE_INDEX_ABORTED
- || (type & DICT_CORRUPT)
|| (table && table->corrupted));
}
To avoid a debug assertion failure in purge, I started the patched server with innodb_force_recovery=2 . There are 100 records in both the secondary index and the primary key index, but the primary key index contains the string 'egsotrsfannidwyvkuvlkeslrryhpkeevqmb' in many columns, while the secondary index is missing this update. Due to the way how the secondary index MVCC works, some of the secondary index records will be ignored, and a count mismatch will be reported.
It would be nice to get a simpler rr replay trace of this. It is possible that the problem is related to page splits and merges in the clustered index. Apparently only 14 of the 100 rows would match the WHERE condition. Would this be repeatable with fewer than 86 'extra' rows? An mtr based test would help me a lot.
/*!40101 SET character_set_client = @saved_cs_client */;
INSERT INTO `t1` VALUES ('ship',NULL,'unbrfeqlhdxamoxyjrdpanbbojmgfvaikcdulupfslgcghacscrvuvbrwnoqifmmmcbcqzzvplrdhaetgjpebshvxjtyjeytzfsvthwhvh','nunbrfeqlhdxamoxyjrdpanbbojmgfvaikcdulupfslgcghacscrvuvbrwnoqifmmmcbcqzzvplrdhaetgjpebshvxjtyjeytzfsvthwhvhfgzlecuozuucsaybsdbrakgyprqnygxwagryhzmfirpduujnymlfijzbdosrmgienkjovqktvkjxwxdvgcldiilzcaurcpeuturnvpnirubyhzpxa','judgment','active',63);
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
RQG grammar:
thread1:
UPDATE t1 SET `col_varchar_255_ucs2_key` = CONVERT( 'degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui' USING ASCII ) WHERE col_int IS NULL ;
thread2:
UPDATE t1 SET `col_varchar_255_ucs2_key` = CONVERT( _varchar(255) USING ASCII ) |
CHECK TABLE t1 EXTENDED ;
The RQG run needs only "--threads=2".
Matthias Leich
added a comment -
sdp:/data/Results/1619015791/TBR-36-MDEV-25459/dev/shm/vardir/1619015791/4/1/rr
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `test` /*!40100 DEFAULT CHARACTER SET latin1 */;
USE `test`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `t1` (
`col_varchar_255_ucs2_key` varchar(255) CHARACTER SET ucs2 DEFAULT NULL,
`col_int` INT,
`col_char_255_ucs2` char(255) CHARACTER SET ucs2 DEFAULT NULL,
`col_varchar_255_ucs2` varchar(255) CHARACTER SET ucs2 DEFAULT NULL,
`col_char_255_utf8_key` char(255) CHARACTER SET utf8 DEFAULT NULL,
`col_char_255_utf8` char(255) CHARACTER SET utf8 DEFAULT NULL,
`pk` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`pk`),
KEY `col_varchar_255_ucs2_key` (`col_varchar_255_ucs2_key`)
) ENGINE=InnoDB AUTO_INCREMENT=95 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
INSERT INTO `t1` VALUES ('ship',NULL,'unbrfeqlhdxamoxyjrdpanbbojmgfvaikcdulupfslgcghacscrvuvbrwnoqifmmmcbcqzzvplrdhaetgjpebshvxjtyjeytzfsvthwhvh','nunbrfeqlhdxamoxyjrdpanbbojmgfvaikcdulupfslgcghacscrvuvbrwnoqifmmmcbcqzzvplrdhaetgjpebshvxjtyjeytzfsvthwhvhfgzlecuozuucsaybsdbrakgyprqnygxwagryhzmfirpduujnymlfijzbdosrmgienkjovqktvkjxwxdvgcldiilzcaurcpeuturnvpnirubyhzpxa','judgment','active',63);
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
RQG grammar:
thread1:
UPDATE t1 SET `col_varchar_255_ucs2_key` = CONVERT( 'degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui' USING ASCII ) WHERE col_int IS NULL ;
thread2:
UPDATE t1 SET `col_varchar_255_ucs2_key` = CONVERT( _varchar(255) USING ASCII ) |
CHECK TABLE t1 EXTENDED ;
The RQG run needs only "--threads=2".
Thank you, the new trace is perfect. Just one page in the clustered index (with 1 record) and one in the secondary index:
Clustered index:
(pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
(63,0x3e,(update),'degsotrsfann'…,…)
Secondary index (PAGE_MAX_TRX_ID=0x3e):
(col_varchar_255_ucs2_key,pk)=
('degsotrsfann'…,63),
('qemleuzymt'…,63)(delete-marked)
The full length of the secondary index column is 340 bytes (2 bytes per ASCII character, UTF-16BE format). Those byte strings are identical between the secondary and clustered index. Here is the string in ASCII (omitting every second byte):
On a quick look, the collation function seems to work correctly. I wonder if this could be another problem with this optimization that was introduced in MDEV-20301:
at /data/Server/10.6A/storage/innobase/include/row0sel.ic:137
#4 0x000055a4ef351d0a in row_scan_index_for_mysql (prebuilt=0x621000260d88, index=0x6160008abe08, n_rows=0x6e0225b6ca30) at /data/Server/10.6A/storage/innobase/row/row0mysql.cc:4631
#5 0x000055a4eeff45b9 in ha_innobase::check (this=0x61d0008246b8, thd=0x62b00011f218, check_opt=0x62b0001244a8) at /data/Server/10.6A/storage/innobase/handler/ha_innodb.cc:14665
Marko Mäkelä
added a comment - Thank you, the new trace is perfect. Just one page in the clustered index (with 1 record) and one in the secondary index:
Clustered index:
(pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
(63,0x3e,(update),'degsotrsfann'…,…)
Secondary index (PAGE_MAX_TRX_ID=0x3e):
(col_varchar_255_ucs2_key,pk)=
('degsotrsfann'…,63),
('qemleuzymt'…,63)(delete-marked)
The full length of the secondary index column is 340 bytes (2 bytes per ASCII character, UTF-16BE format). Those byte strings are identical between the secondary and clustered index. Here is the string in ASCII (omitting every second byte):
perl -pe 's/.(.)/$1/ge' < /dev/shm/340.bin
degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui
On a quick look, the collation function seems to work correctly. I wonder if this could be another problem with this optimization that was introduced in MDEV-20301 :
10.6 af418bb9ef7e422282dc976640409a6af8fcd58c
#0 row_sel_sec_rec_is_for_clust_rec (sec_rec=0x7f24c5fb9093 "", sec_index=0x6160008abe08, clust_rec=0x61b000064b16 "\200", clust_index=0x6160008a4308, thr=0x621000261580)
at /data/Server/10.6A/storage/innobase/row/row0sel.cc:183
#1 0x000055a4ef3c0a8d in Row_sel_get_clust_rec_for_mysql::operator() (this=0x6e0225b6b1f0, prebuilt=0x621000260d88, sec_index=0x6160008abe08, rec=0x7f24c5fb9093 "", thr=0x621000261580, out_rec=0x6e0225b6b0d0,
offsets=0x6e0225b6b150, offset_heap=0x6e0225b6b130, vrow=0x0, mtr=0x6e0225b6bc10) at /data/Server/10.6A/storage/innobase/row/row0sel.cc:3496
#2 0x000055a4ef3ccb93 in row_search_mvcc (buf=0x6210002bc100 '\276' <repeats 200 times>..., mode=PAGE_CUR_G, prebuilt=0x621000260d88, match_mode=0, direction=0)
at /data/Server/10.6A/storage/innobase/row/row0sel.cc:5383
#3 0x000055a4ef336896 in row_search_for_mysql (buf=0x6210002bc100 '\276' <repeats 200 times>..., mode=PAGE_CUR_G, prebuilt=0x621000260d88, match_mode=0, direction=0)
at /data/Server/10.6A/storage/innobase/include/row0sel.ic:137
#4 0x000055a4ef351d0a in row_scan_index_for_mysql (prebuilt=0x621000260d88, index=0x6160008abe08, n_rows=0x6e0225b6ca30) at /data/Server/10.6A/storage/innobase/row/row0mysql.cc:4631
#5 0x000055a4eeff45b9 in ha_innobase::check (this=0x61d0008246b8, thd=0x62b00011f218, check_opt=0x62b0001244a8) at /data/Server/10.6A/storage/innobase/handler/ha_innodb.cc:14665
Currently it looks like we are constructing wrong old_vers with (pk=63,DB_TRX_ID=0x3c,(update)) and apparently empty string. It is as if CHECK TABLE is using a newer read view for accessing the clustered index.
Marko Mäkelä
added a comment - I will have to debug the trace further (around when 61095) to check why exactly we are assigning clust_rec=NULL here:
if (clust_rec
&& (old_vers
|| trx->isolation_level <= TRX_ISO_READ_UNCOMMITTED
|| dict_index_is_spatial(sec_index)
|| rec_get_deleted_flag(rec, dict_table_is_comp(
sec_index->table)))) {
err = row_sel_sec_rec_is_for_clust_rec(rec, sec_index,
clust_rec, clust_index, thr);
switch (err) {
case DB_SUCCESS:
clust_rec = NULL;
break ;
Currently it looks like we are constructing wrong old_vers with (pk=63,DB_TRX_ID=0x3c,(update)) and apparently empty string. It is as if CHECK TABLE is using a newer read view for accessing the clustered index.
For the clustered index scan in the failing CHECK TABLE, row_sel_clust_sees() will not hold for the single record
(pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
(63,0x3e,(update),'degsotrsfann'…,…)
so we constructed old_vers that contains a BLOB reference for that column:
(pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
(63,0x3c,(update),(space=8, page=5, offset=0x26, length=0x1f0),…)
The payload of the BLOB page is the following UTF-16BE encoded 248-char ASCII string:
The prefix of this seems to match the delete-marked record in the secondary index page that I posted earlier.
So, nothing seems to be really corrupted, but the secondary index MVCC code probably fails to read the BLOB, and instead treats the 20-byte BLOB pointer as the actual string. I will post more, once I have debugged that part of the code.
Marko Mäkelä
added a comment - For the clustered index scan in the failing CHECK TABLE , row_sel_clust_sees() will not hold for the single record
(pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
(63,0x3e,(update),'degsotrsfann'…,…)
so we constructed old_vers that contains a BLOB reference for that column:
(pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
(63,0x3c,(update),(space=8, page=5, offset=0x26, length=0x1f0),…)
The payload of the BLOB page is the following UTF-16BE encoded 248-char ASCII string:
qemleuzymtgopfikwzloyibtgehtjhlablwaewqzuglpeoqnnoxsqeyzbocilomtnpaxihztdrtqahffxekdhrwdvtdvmlshvdhrgfkmuwonkyqzwelujpeyggixymdugzhiqlmnrlpxzrytoatxolflxdzknhkgyttnjqwqcutogtwviuoqkbzxlbzkzdupmxeeifroaweulyyetwdnnrvxrtoksrgcplpyubzpyntedwlgnvzehakg
The prefix of this seems to match the delete-marked record in the secondary index page that I posted earlier.
So, nothing seems to be really corrupted, but the secondary index MVCC code probably fails to read the BLOB, and instead treats the 20-byte BLOB pointer as the actual string. I will post more, once I have debugged that part of the code.
We have this code in row_sel_sec_rec_is_for_clust_rec():
if (ifield->prefix_len > 0 && len != UNIV_SQL_NULL
&& sec_len != UNIV_SQL_NULL && !is_virtual) {
if (rec_offs_nth_extern(clust_offs, clust_pos)) {
len -= BTR_EXTERN_FIELD_REF_SIZE;
}
len = dtype_get_at_most_n_mbchars(
col->prtype, col->mbminlen, col->mbmaxlen,
ifield->prefix_len, len, (char*) clust_field);
if (rec_offs_nth_extern(clust_offs, clust_pos)
&& len < sec_len) {
if (!row_sel_sec_rec_is_for_blob(
Because the secondary index does not comprise a column prefix, but in fact the full column col_varchar_255_ucs2_key, we fail to invoke the check for BLOB (really, off-page column, in this case VARCHAR).
This should mean that MVCC is broken for full-column secondary indexes in ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED InnoDB tables if the column happened to become chosen for external storage in the clustered index. In ROW_FORMAT=REDUNDANT (the default and only format until MySQL 5.0.3) and ROW_FORMAT=COMPACT (the default between MySQL 5.0.3 and 5.7, or MariaDB Server before 10.2.2), we would always store the entire column in the clustered index.
I will have to check if something similar is broken in the implicit locking of secondary indexes, in row_vers_impl_x_locked_low(). If the call to row_build() is not fetching the externally stored VARCHAR column, locking for secondary index records could be broken. I will try to verify that on a copy of the data directory and possibly a patched server.
Marko Mäkelä
added a comment - We have this code in row_sel_sec_rec_is_for_clust_rec() :
if (ifield->prefix_len > 0 && len != UNIV_SQL_NULL
&& sec_len != UNIV_SQL_NULL && !is_virtual) {
if (rec_offs_nth_extern(clust_offs, clust_pos)) {
len -= BTR_EXTERN_FIELD_REF_SIZE;
}
len = dtype_get_at_most_n_mbchars(
col->prtype, col->mbminlen, col->mbmaxlen,
ifield->prefix_len, len, ( char *) clust_field);
if (rec_offs_nth_extern(clust_offs, clust_pos)
&& len < sec_len) {
if (!row_sel_sec_rec_is_for_blob(
Because the secondary index does not comprise a column prefix, but in fact the full column col_varchar_255_ucs2_key , we fail to invoke the check for BLOB (really, off-page column, in this case VARCHAR ).
This should mean that MVCC is broken for full-column secondary indexes in ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED InnoDB tables if the column happened to become chosen for external storage in the clustered index. In ROW_FORMAT=REDUNDANT (the default and only format until MySQL 5.0.3) and ROW_FORMAT=COMPACT (the default between MySQL 5.0.3 and 5.7, or MariaDB Server before 10.2.2), we would always store the entire column in the clustered index.
I will have to check if something similar is broken in the implicit locking of secondary indexes, in row_vers_impl_x_locked_low() . If the call to row_build() is not fetching the externally stored VARCHAR column, locking for secondary index records could be broken. I will try to verify that on a copy of the data directory and possibly a patched server.
I believe that the bug affects also earlier versions, but I did not test it, because MariaDB 5.5, 10.0, 10.1 have already reached their end of life.
Also, this test case only repeats the problem for innodb_page_size=4k, it should be repeatable with any page size when using appropriately sized records.
Marko Mäkelä
added a comment - I can repeat the problem with the following:
--source include/innodb_page_size_small.inc
CREATE TABLE t1 (
pk int PRIMARY KEY , c varchar (255) UNIQUE ,
d char (255), e varchar (255), f char (255), g char (255)
) ENGINE=InnoDB ROW_FORMAT= DYNAMIC DEFAULT CHARACTER SET ucs2;
INSERT INTO t1 VALUES
(1,REPEAT( 'c' ,248),REPEAT( 'a' ,106),REPEAT( 'b' ,220),REPEAT( 'x' ,14), '' );
BEGIN ;
UPDATE t1 SET c=REPEAT( 'd' ,170);
connect (con1,localhost,root,,);
SELECT pk FROM t1 FORCE INDEX (c);
connection default ;
COMMIT ;
connection con1;
SELECT pk FROM t1 FORCE INDEX (c);
disconnect con1;
connection default ;
DROP TABLE t1;
10.2 922e676b43c7b5cb0f20ca67c6d2222e2fc5ec03
innodb.mvcc_secondary '16k,innodb' w1 [ pass ] 6
innodb.mvcc_secondary '8k,innodb' w3 [ pass ] 5
innodb.mvcc_secondary '4k,innodb' w2 [ fail ]
Test ended at 2021-04-23 14:06:40
CURRENT_TEST: innodb.mvcc_secondary
--- /mariadb/10.2o/mysql-test/suite/innodb/r/mvcc_secondary.result 2021-04-23 14:06:25.841488727 +0300
+++ /mariadb/10.2o/mysql-test/suite/innodb/r/mvcc_secondary.reject 2021-04-23 14:06:40.565659501 +0300
@@ -9,7 +9,6 @@
connect con1,localhost,root,,;
SELECT pk FROM t1 FORCE INDEX (c);
pk
-1
connection default;
COMMIT;
connection con1;
mysqltest: Result length mismatch
I believe that the bug affects also earlier versions, but I did not test it, because MariaDB 5.5, 10.0, 10.1 have already reached their end of life.
Also, this test case only repeats the problem for innodb_page_size=4k , it should be repeatable with any page size when using appropriately sized records.
If I append FOR UPDATE or LOCK IN SHARE MODE to the first SELECT, a locking conflict will occur:
10.2 922e676b43c7b5cb0f20ca67c6d2222e2fc5ec03
mysqltest: At line 15: query 'SELECT pk FROM t1 FORCE INDEX (c) LOCK IN SHARE MODE' failed: 1205: Lock wait timeout exceeded; try restarting transaction
This suggests that no change for row_vers_impl_x_locked_low() is needed, and only row_sel_sec_rec_is_for_clust_rec() is wrong.
Marko Mäkelä
added a comment - If I append FOR UPDATE or LOCK IN SHARE MODE to the first SELECT , a locking conflict will occur:
10.2 922e676b43c7b5cb0f20ca67c6d2222e2fc5ec03
mysqltest: At line 15: query 'SELECT pk FROM t1 FORCE INDEX (c) LOCK IN SHARE MODE' failed: 1205: Lock wait timeout exceeded; try restarting transaction
This suggests that no change for row_vers_impl_x_locked_low() is needed, and only row_sel_sec_rec_is_for_clust_rec() is wrong.
RQG error patterns
[ 'TBR-36' , 'InnoDB: Index .{1,30} contains .{1,10} entries, should be .{1,10}.' ],
[ 'TBR-36' , 'failed: 1712 Index .* is corrupted' ],
[ 'TBR-263' , 'Index for table .{1,150} is corrupt; try to repair it' ],
git clone https://github.com/mleich1/rqg --branch experimental RQG
./RR_REPLAY.sh simp_many_indexes.cfg <path to MariaDb binaries> TBR-36_micro.yy