[MCOL-1968] wrong string comparisation after dataimport and extents Created: 2018-11-26  Updated: 2020-08-25  Resolved: 2019-06-18

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.5, 1.1.6, 1.2.5
Fix Version/s: 1.1.0

Type: Bug Priority: Major
Reporter: Richard Stracke Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File export-fc.tar.xz    
Issue Links:
Problem/Incident
causes MCOL-3399 Regression in LDI string length handling Closed
Relates
relates to MCOL-3388 Merge up 1.1 -> 1.2 Closed
Sprint: 2019-06

 Description   

To Reproduce

Copy attached export-fc.csv to /usr/local/mariadb/columnstore/mysql/db

DROP TABLE IF EXISTS t1;
 
CREATE TABLE t1 (
`s1` varchar(2) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=utf8;
 
LOAD DATA INFILE '/usr/local/mariadb/columnstore/mysql/db/export-fc.csv'
INTO TABLE t1
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ';' ENCLOSED BY '\"'
LINES TERMINATED BY '\n';

Comparisation result in where clause is everytime 0

MariaDB [test]> select COUNT(*) from t1 where s1 = '60';                      
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

It works with converting

MariaDB [test]> select COUNT(*) from t1 where convert(s1,CHAR(2)) = '60';
+----------+
| COUNT(*) |
+----------+
|    65572 |
+----------+
1 row in set (0.41 sec)

Interesting is, if we create random strings into an innodb table and inserting into a columnstore table, the opposite is the case

Testcase:

drop table if exists `t1`;
drop table if exists `t2`;
 
CREATE TABLE `t1` (
`s1` varchar(2) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
CREATE TABLE `t2` (
`s1` varchar(2) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=utf8;
 
DROP PROCEDURE IF EXISTS InsertRandt1;
 
DELIMITER $$
CREATE PROCEDURE InsertRandt1(IN NumRows INT, IN MinVal INT, IN MaxVal INT)
    BEGIN
        DECLARE i INT;
        SET i = 1;
        START TRANSACTION;
        WHILE i <= NumRows DO
            INSERT INTO t1 VALUES (lpad(MinVal + CEIL(RAND() * (MaxVal - MinVal)),2,'0' ));
            SET i = i + 1;
        END WHILE;
        COMMIT;
    END$$
DELIMITER ;
 
call InsertRandt1(5000000, 1,100);
 
insert into `t2` select * from `t1`;

MariaDB [test]> select COUNT(*) from t1 where s1 = '60';
+----------+
| COUNT(*) |
+----------+
|    50385 |
+----------+
1 row in set (1.39 sec)
 
MariaDB [test]> select COUNT(*) from t1 where s1 = convert(s1,CHAR(2)) = '60';
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (1.97 sec)



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-06-10 ]

Somehow during the LDI the min/max values are getting bad data. This is causing PBE on equality matching when not part of a function.

Comment by Andrew Hutchings (Inactive) [ 2019-06-10 ]

There is bad data in the import file. One value is hex C2 AB 33. For some reason the Min value is getting set to the first 2 bytes of this (with reversed byte order). This doesn't happen with direct cpimport which seems to imply some kind of bad manipulation when LDI is used.

Comment by Andrew Hutchings (Inactive) [ 2019-06-11 ]

If the first byte of a char/varchar was >= 0x80 the min/max would break for that extent. Submitted patch fixes that.

Comment by Daniel Lee (Inactive) [ 2019-06-18 ]

Build verified: 1.1.8-1 nightly

[root@localhost ~]# cat gitversionInfo.txt
server commit:
09faec8
engine commit:
0f5cc93

Reproduced the issue in 1.1.6-1 and verified the fix in 1.1.8-1 nightly.

[root@localhost ~]# mcsmysql mytest
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 14
Server version: 10.2.22-MariaDB-log Columnstore 1.1.8-1

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [mytest]> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.20 sec)

MariaDB [mytest]>
MariaDB [mytest]> CREATE TABLE t1 (
-> `s1` varchar(2) DEFAULT NULL
-> ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.20 sec)

MariaDB [mytest]> LOAD DATA INFILE '/tmp/export-fc.csv'
-> INTO TABLE t1
-> CHARACTER SET 'utf8'
-> FIELDS TERMINATED BY ';' ENCLOSED BY '\"'
-> LINES TERMINATED BY '\n';
Query OK, 3830937 rows affected, 1 warning (6.81 sec)
Records: 3830937 Deleted: 0 Skipped: 0 Warnings: 1

MariaDB [mytest]> select COUNT from t1 where s1 = '60';
----------

COUNT

----------

65572

----------
1 row in set (0.68 sec)

MariaDB [mytest]> select COUNT from t1 where convert(s1,CHAR(2)) = '60';
----------

COUNT

----------

65572

----------
1 row in set (0.85 sec)

Generated at Thu Feb 08 02:32:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.