[MDEV-21735] Rocksdb text prefix indexes return wrong results Created: 2020-02-14  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Character Sets, Storage Engine - RocksDB
Affects Version/s: 10.3.22, 10.4.12, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Michal Povinsky Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Tested on 10.3 on Debian and 10.4 on Gentoo



 Description   

Rows are missing from the output when searching by indexed text column when the value is longer than the index's prefix length. Using "like 'prefix%'" only works when the prefix is shorter than the index's length (or equal, if value=prefix).
Deleting the index, using ignore index or converting the table to another engine gives expected results.
The issue seems to only happen when using utf8/utf8mb4/utf16/utf32 collations, latin1/ascii/ucs2 collations work correctly.

create table tt5 (tx text) engine=rocksdb;
insert into tt5 values ('long text');
select * from tt5 where tx='long text';      # returns the row
create index z on tt5(tx(3));
select * from tt5 where tx='long text';      # empty result
select * from tt5 where tx like 'long text'; # also empty
select * from tt5 where tx like 'lon%';      # empty
select * from tt5 where tx like 'lo%';       # this returns the row again



 Comments   
Comment by Elena Stepanova [ 2020-03-07 ]

Please paste or attach your config file(s).

MariaDB [test]> create table tt5 (tx text) engine=rocksdb;
Query OK, 0 rows affected (0.123 sec)
 
MariaDB [test]> insert into tt5 values ('long text');
Query OK, 1 row affected (0.041 sec)
 
MariaDB [test]> select * from tt5 where tx='long text';      # returns the row
+-----------+
| tx        |
+-----------+
| long text |
+-----------+
1 row in set (0.000 sec)

MariaDB [test]> create index z on tt5(tx(3));
Query OK, 0 rows affected (0.400 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select * from tt5 where tx='long text';      # empty result
+-----------+
| tx        |
+-----------+
| long text |
+-----------+
1 row in set (0.000 sec)

MariaDB [test]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.4.12-MariaDB |
+-----------------+
1 row in set (0.000 sec)

Comment by Michal Povinsky [ 2020-03-08 ]

Here's the content of my config files (with removed comments and irrelevant sections.) It's an almost unmodified Gentoo config (except for enabling rocksdb) and commenting out the character set options doesn't make any difference.

I did a little more testing, and the bug seems to depend on database collation - I didn't test all collations, but it seems that running the test in a fresh database using ascii/latin1/ucs2 collations works correctly, but utf8/utf8mb4/utf16/utf32 shows this bug.

I didn't know that the official MariaDB builds default to latin1, but I can reproduce it there with no config changes, as long as I use utf8mb4_bin database collation.

[client]
socket						= /var/run/mysqld/mysqld.sock
character-sets-dir				= /usr/share/mariadb/charsets
loose-default-character-set=utf8
 
[mysqld]
character-set-server				= utf8
user						= mysql
port						= 3306
socket						= /var/run/mysqld/mysqld.sock
pid-file					= /var/run/mysqld/mariadb.pid
log-error					= /var/log/mysql/mysqld.err
basedir						= /usr
datadir						= /var/lib/mysql
skip-external-locking
lc_messages_dir					= /usr/share/mariadb
lc_messages					= en_US
 
bind-address					= 127.0.0.1
 
log-bin
server-id					= 1
 
tmpdir						= /tmp/
 
[mysqld]
plugin-load-add=auth_socket.so
log_error = /var/log/mysql/error.log
 
[mariadb]
plugin-load-add=ha_rocksdb.so

Comment by Elena Stepanova [ 2020-04-05 ]

Reproducible as described:

create database db collate utf8_general_ci;
use db;
 
create table tt5 (tx text) engine=rocksdb;
insert into tt5 values ('long text');
select * from tt5 where tx='long text';
 
create index z on tt5(tx(3));
select * from tt5 where tx='long text';

Please note that index creation in this case throws a warning:

create index z on tt5(tx(3));
Warnings:
Warning	1815	Internal error: Indexed column db.tt5.tx uses a collation that does not allow index-only access in secondary key and has reduced disk space efficiency in primary key.
select * from tt5 where tx='long text';
tx

Generated at Thu Feb 08 09:09:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.