[MCOL-3395] regression: dictionary de-duplication cache bleeding between columns Created: 2019-06-27  Updated: 2019-10-28  Resolved: 2019-07-05

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.2.4
Fix Version/s: 1.2.5

Type: Bug Priority: Critical
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MCOL-1559 Regression on working_tpch1/misc/bug3... Closed
Problem/Incident
is caused by MCOL-3270 Improve cpimport ingest speed into Di... Closed
Sprint: 2019-06

 Description   

Consider the following table (from misc/bug3669)

create table stringtest (c1 char(10), c2 varchar(10), c3 varchar(6))engine=columnstore;
insert into stringtest values ('abc','cde','abc'), ('cde','abc','cde');

MariaDB [tpch1]> select * from stringtest where c1='abc';
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| abc  | cde  | abc  |
+------+------+------+
1 row in set (0.039 sec)
 
MariaDB [tpch1]> select * from stringtest where c2='abc';
Empty set (0.036 sec)
 
MariaDB [tpch1]> select * from stringtest where c3='abc';
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| abc  | cde  | abc  |
+------+------+------+
1 row in set (0.029 sec)

In the first query, we filter on a char column less than 7 wide; in the third, on a varchar column less than 7 wide. Both return the row. The second filters on a varchar wider than 7 and incorrectly returns nothing.

In Columnstore 1.1, we get:

MariaDB [dhall]> select * from stringtest where c2='abc';
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| cde  | abc  | cde  |
+------+------+------+
1 row in set (0.04 sec)



 Comments   
Comment by David Hall (Inactive) [ 2019-07-01 ]

A new development:
When inserting multiple rows in a single statement, it gives the wrong answer. When inserting one row at a time, it gives the correct answer. Is the problem in cpimport, or in the connector feeding cpimport?

This works:

MariaDB [tpch1]> truncate st1;
Query OK, 0 rows affected (0.638 sec)
 
MariaDB [tpch1]> insert into st1 values('abc','cde', 'abc');
Query OK, 1 row affected (0.904 sec)
 
MariaDB [tpch1]> insert into st1 values('cde','abc','cde');
Query OK, 1 row affected (0.477 sec)
 
MariaDB [tpch1]> select * from st1 where c2='abc';
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| cde  | abc  | cde  |
+------+------+------+
1 row in set (0.437 sec)
 
MariaDB [tpch1]> 

This does not work:

MariaDB [tpch1]> truncate st1;
Query OK, 0 rows affected (0.638 sec)
 
MariaDB [tpch1]> insert into st1 values('abc','cde', 'abc'), ('cde','abc','cde');
Query OK, 2 rows affected (0.932 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [tpch1]> select * from st1 where c2='abc';
Empty set (0.975 sec)

Comment by David Hall (Inactive) [ 2019-07-01 ]

Tested cpimport and it breaks:

st1.tbl:
'abc'|'cde'|'abc'
'cde'|'abc'|'cde'

[root@srvhall04 queries]# cpimport tpch1 st1 /home/calpont/st1.tbl
Locale is : C

Using table OID 37916 as the default JOB ID
Input file(s) will be read from : /home/calpont/mariadb-columnstore-regression-test/mysql/queries
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/37916_D20190701_T122149_S151243_Job_37916.xml
Log file for this job: /usr/local/mariadb/columnstore/data/bulk/log/Job_37916.log
2019-07-01 12:21:49 (10256) INFO : successfully loaded job file /usr/local/mariadb/columnstore/data/bulk/tmpjob/37916_D20190701_T122149_S151243_Job_37916.xml
2019-07-01 12:21:49 (10256) INFO : Job file loaded, run time for this step : 0.179432 seconds
2019-07-01 12:21:49 (10256) INFO : PreProcessing check starts
2019-07-01 12:21:49 (10256) INFO : input data file /home/calpont/st1.tbl
2019-07-01 12:21:49 (10256) INFO : PreProcessing check completed
2019-07-01 12:21:49 (10256) INFO : preProcess completed, run time for this step : 0.0465939 seconds
2019-07-01 12:21:49 (10256) INFO : No of Read Threads Spawned = 1
2019-07-01 12:21:49 (10256) INFO : No of Parse Threads Spawned = 3
2019-07-01 12:21:49 (10256) INFO : For table tpch1.st1: 2 rows processed and 2 rows inserted.
2019-07-01 12:21:50 (10256) INFO : Bulk load completed, total run time : 1.23987 seconds

[root@srvhall04 queries]#

MariaDB [tpch1]> select * from st1;
+-------+-------+-------+
| c1    | c2    | c3    |
+-------+-------+-------+
| 'abc' | 'cde' | 'abc' |
| 'cde' | 'abc' | 'cde' |
+-------+-------+-------+
2 rows in set (0.444 sec)

MariaDB [tpch1]> select * from st1 where c2='abc';
Empty set (0.067 sec)

Comment by Andrew Hutchings (Inactive) [ 2019-07-02 ]

Regression happened between 1.2.3 and 1.2.4 and only seems to affect compressed tables

Comment by Andrew Hutchings (Inactive) [ 2019-07-02 ]

Regression caused by MCOL-3270 (found using git bisect)

Comment by Andrew Hutchings (Inactive) [ 2019-07-03 ]

PR in engine and regression suite.

The cause is the new dictionary de-duplication code introduced in 1.2.4.

Basically, the cache persists between columns within a single insert, so when different columns contain the same data, a token belonging to the wrong column is returned.

This patch clears the cache on soft as well as hard close.

For QA: a test has been added to the regression suite, and you can also use the test in the description (the c2 WHERE condition should return a result).

Comment by Andrew Hutchings (Inactive) [ 2019-07-03 ]

Full explanation sent to David Hall:

When a dictionary write is happening the Dctnry class is used to check/insert into the de-duplication cache, write the dictionary data if required and return the token.

The problem arises when the Dctnry class is reused: it simply opens a new file for the next column. With the new de-duplication code in 1.2.4, the cache is no longer cleared when this happens.

Let's take your insert query as an example:

insert into stringtest values ('abc','cde','abc'), ('cde','abc','cde');

First, the Dctnry class is used to insert 'abc' and 'cde' into c1's dictionary file, adding both signatures to the de-duplication cache and returning tokens.

Then 'cde' and 'abc' are inserted into c2. Because the de-duplication cache isn't cleared on file open, the Dctnry class gets cache hits and returns the tokens for c1's dictionary file.

This means a basic select still shows the data, because c2's token column points at the LBIDs/offsets of c1's dictionary. But when the WHERE condition scans c2's own dictionary file for matching signatures to produce tokens, that file is empty, so nothing matches.

Comment by Daniel Lee (Inactive) [ 2019-07-05 ]

Build verified: 1.2.5-1 nightly

Reproduced the issue in 1.2.4-1.

Generated at Thu Feb 08 02:42:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.