|
Debugging both MyRocks and InnoDB. In both cases, the slave tries to find the record for update in Rows_log_event::find_row using key=0 (that is, KEY(b(8)).
The lookup key is
(gdb) x/11c m_key
|
0x7fff94015460: 0 '\000' 8 '\b' 0 '\000' 97 'a' 97 'a' 97 'a' 97 'a' 97 'a'
|
0x7fff94015468: 97 'a' 97 'a' 97 'a'
|
if ((error= table->file->ha_index_read_map(table->record[0], m_key,
|
HA_WHOLE_KEY,
|
HA_READ_KEY_EXACT)))
|
InnoDB finds the row and returns error=0 here, while MyRocks returns error=120.
|
|
Not reproducible on the upstream.
Debugging it in MyRocks, the issue looks a bit similar to MDEV-16155.
Upstream has a slightly different fix for MDEV-16155, could it be that their fix covers this issue while ours doesnt?
|
|
...no. Reverting our fix for MDEV-16155 and applying theirs doesn't fix the issue.
|
|
Hi, this patch fixed it.
https://github.com/MariaDB/server/pull/837
|
|
Looking at the patch...
A testcase without replication:
CREATE TABLE t2 (b BLOB, i INT, KEY(b(8))) ENGINE=RocksDB;
|
INSERT INTO t2 VALUES (REPEAT('a',22),1);
|
|
explain
|
select * from t2 where b=repeat('a', 22);
|
select * from t2 where b=repeat('a', 22);
|
select * from t2 ignore index (b) where b=repeat('a', 22);
|
drop table t2;
|
MariaDB [j22]> explain select * from t2 where b=repeat('a', 22);
|
+------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
|
| 1 | SIMPLE | t2 | ref | b | b | 11 | const | 1 | Using where |
|
+------+-------------+-------+------+---------------+------+---------+-------+------+-------------+
|
1 row in set (0.03 sec)
|
|
MariaDB [j22]> select * from t2 where b=repeat('a', 22);
|
Empty set (0.00 sec)
|
|
MariaDB [j22]> select * from t2 ignore index (b) where b=repeat('a', 22);
|
+------------------------+------+
|
| b | i |
|
+------------------------+------+
|
| aaaaaaaaaaaaaaaaaaaaaa | 1 |
|
+------------------------+------+
|
1 row in set (0.01 sec)
|
|
|
Looking at the on-disk data for the above example.
The execution here
(gdb) wher
|
#0 Field_blob::sort_string (this=0x7fff70049680, to=0x7fff70006645 "aaaaaaaa", length=8) at /home/psergey/dev-git/10.2-r14/sql/field.cc:8326
|
#1 0x00007fffe7668505 in myrocks::Rdb_key_def::pack_with_make_sort_key (this=0x7fff7003f160, fpi=0x7fff700466d0, field=0x7fff70049680, buf=0x7fff70046a50 '\245' <repeats 24 times>, "h4z\025", '\217' <repeats 12 times>, "\245", dst=0x7ffff413eea0, pack_ctx=0x7ffff413eed0) at /home/psergey/dev-git/10.2-r14/storage/rocksdb/rdb_datadic.cc:1264
|
#2 0x00007fffe76676d3 in myrocks::Rdb_key_def::pack_field (this=0x7fff7003f160, field=0x7fff70049680, pack_info=0x7fff700466d0, tuple=0x7fff70006645 "aaaaaaaa", packed_tuple=0x7fff70006640 "", pack_buffer=0x7fff70046a50 '\245' <repeats 24 times>, "h4z\025", '\217' <repeats 12 times>, "\245", unpack_info=0x7fff7003ef38, n_null_fields=0x0) at /home/psergey/dev-git/10.2-r14/storage/rocksdb/rdb_datadic.cc:993
|
#3 0x00007fffe7667e38 in myrocks::Rdb_key_def::pack_record (this=0x7fff7003f160, tbl=0x7fff70043480, pack_buffer=0x7fff70046a50 '\245' <repeats 24 times>, "h4z\025", '\217' <repeats 12 times>, "\245", record=0x7fff70049648 "\374\026", packed_tuple=0x7fff70006640 "", unpack_info=0x7fff7003ef38, should_store_row_debug_checksums=@0x7ffff413f03b: false, hidden_pk_id=@0x7ffff413f1a0: 3, n_key_parts=2, n_null_fields=0x0, ttl_pk_offset=0x0, ttl_bytes=0x7fff7003efe8 '\245' <repeats 16 times>, "\360<\002p\377\177") at /home/psergey/dev-git/10.2-r14/storage/rocksdb/rdb_datadic.cc:1149
|
#4 0x00007fffe75e7701 in myrocks::ha_rocksdb::update_sk (this=0x7fff7003ea28, table_arg=0x7fff70043480, kd=..., row_info=...) at /home/psergey/dev-git/10.2-r14/storage/rocksdb/ha_rocksdb.cc:9630
|
produces this secondary key:
(gdb) print new_packed_size
|
$97 = 23
|
(gdb) print m_sk_packed_tuple
|
$98 = (uchar *) 0x7fff70006640 ""
|
(gdb) x/23c m_sk_packed_tuple
|
0x7fff70006640: 0 '\000' 0 '\000' 1 '\001' 4 '\004' 1 '\001' 97 'a' 97 'a' 97 'a'
|
|======= indexnr ======================================| |= not null == | |====== value (3 chars) ==
|
0x7fff70006648: 97 'a' 97 'a' 97 'a' 97 'a' 97 'a' 0 '\000' 22 '\026' 0 '\000'
|
|===== value (5 chars ) ==============| |===== length ============ | |== hidden_pk byte #1 ==
|
0x7fff70006650: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 2 '\002'
|
|======== hidden_pk bytes 2 - 8. Hidden pk has 8 bytes ================================================|
|
note that it stores length=22. This is incorrect. Ming's patch fixes this.
Concerns:
- this will change on-disk data format.
- Will changing Field_blob::sort_string affect something else in the server?
|
|
Upstream is not affected, they store 8 as length:
(gdb) print new_packed_size
|
$7 = 23
|
(gdb) x/23c m_sk_packed_tuple
|
0x7fffe00196c0: 0 '\000' 0 '\000' 1 '\001' 6 '\006' 1 '\001' 97 'a' 97 'a' 97 'a'
|
0x7fffe00196c8: 97 'a' 97 'a' 97 'a' 97 'a' 97 'a' 0 '\000' 8 '\b' 0 '\000'
|
0x7fffe00196d0: 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 1 '\001'
|
Field_blob::make_sort_key() has this piece:
uint key_length= MY_MIN(blob_length, length);
|
It was added by this patch:
https://github.com/facebook/mysql-5.6/commit/fecef54c24b5b64037af1ecf2f8e6ced5a90a3ac
commit fecef54c24b5b64037af1ecf2f8e6ced5a90a3ac
|
Author: Manuel Ung <mung@fb.com>
|
Date: Tue Jun 14 09:50:58 2016 -0700
|
|
Fix length in Field_blob::make_sort_key
|
|
Summary:
|
For the binary collation in Field_blob::make_sort_key, we write out the length of the entire blob, even though the key itself may be shorter. We can miss out on duplicate key errors because of this:
|
...
|
|
|
Another question is, why does MyRocks not use variable-length-varchar encoding in this case?
Consider two cases:
create table t4 ( pk int not null primary key, b blob, key(b(100)))engine=rocksdb;
|
insert into t4 values (1, 'abc');
|
Here, secondary key length is 111 bytes.
create table t5 ( pk int not null primary key, b varbinary(120), key(b(100)))engine=rocksdb;
|
insert into t5 values (1, 'abc');
|
Here, key length is 18 bytes. (Which IIRC is 18= (indexnr 4) + (null_flag 1) + (8 chunk_size) + (4 pk_value))
|
|
The above is not a critical limitation.
|
|
There's a bunch of problems with the commit 87609324e0ee:
- it breaks sys_vars.max_sort_length_func. The test result is correct, commit introduced a bug. Because Field_blob::make_sort_key works correctly, exactly as it should, although not how RocksDB wants it to. The correct fix would be to fix RocksDB not to expect Field_blob::make_sort_key to do something it should not.
- It comes with a complex replication test, instead of a simple test, as in the comment above.
- it changes on-disk format
- it's in 10.3 only, so 10.2 is still not fixed
I've fixed first two points in my commit 66750323e57 (it's only in bb-10.3-serg now, so commit hash might still change). Don't know if it's possible to avoid on-disk format changes.
|
|
It's not in review, as far as I can see. There is nothing to review, there was an old patch that was wrong and was reverted two years ago. Since then I don't see any other attempts to fix the bug.
|