[MCOL-1396] VARCHAR returning NULL when StringStore memory limit exceeded Created: 2018-05-08  Updated: 2018-05-25  Resolved: 2018-05-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.1.4
Fix Version/s: 1.1.5

Type: Bug Priority: Critical
Reporter: Andrew Hutchings (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 1
Labels: None

Attachments: File create.sql     File generate.php    
Sprint: 2018-10, 2018-11

 Description   

Report of a table with 100M rows containing VARCHAR(48) and VARCHAR(32) columns that starts returning NULL and truncated versions of:

_CpNuLl_



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-05-08 ]

Current working theory:

StringStore in 1.0 could hold up to 4GB of strings before it was full. In 1.1 we use the high bit of the StringStore offset to mark long strings (TEXT/BLOB) so that we can store those separately. This means StringStore can only store 2GB of ordinary strings; beyond that point the high bit gets set unintentionally (there is no check to catch this), and retrieval will then try to fetch from the long-string storage (which is empty).

Regardless of the outcome, we need to modify StringStore to handle > 4GB of data (64-bit ints).

Comment by Andrew Hutchings (Inactive) [ 2018-05-08 ]

How to reproduce (using attachments):

1. Import create.sql into a test database
2. php generate.php > data.tbl (go make coffee, this will take a long time)
3. cpimport test mcol1396 data.tbl
4. Execute the following:

mcsmysql -uroot test -r -q -e "select if(a > 0, b, c), if(a > 0, c, b) from (select * from mcol1396) as se;" > output.data

Some of the rows will have "NULL" instead of data.

Comment by Andrew Hutchings (Inactive) [ 2018-05-08 ]

Confirmed the problem is as described in comment #1

Comment by Andrew Hutchings (Inactive) [ 2018-05-08 ]

For QA: see comment #2

Comment by Daniel Lee (Inactive) [ 2018-05-23 ]

Build tested: 1.1.5-1 source

/root/columnstore/mariadb-columnstore-server
commit 0c983bff02172849a174dde46b62d76aa66485f8
Merge: 6b8a674 d5e6d89
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu Apr 26 16:16:51 2018 -0500

Merge pull request #112 from mariadb-corporation/davidhilldallas-patch-3

update to 1.1.5

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 1ea5198e0e9ecc2a8d13e6b44bf6c632f8561199
Merge: 4533116 59858aa
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Fri May 18 12:37:47 2018 +0100

Merge pull request #475 from drrtuy/MCOL-1415

MCOL-1415

/root/mariadb-columnstore-tools
commit e83f2713be574c0b98b37faf4fa61c8ce4997e90
Author: david hill <david.hill@mariadb.com>
Date: Wed Apr 25 14:07:58 2018 -0500

update to 1.1.5

I reproduced the issue in 1.1.4-1. The query ran for a while and ColumnStore eventually restarted due to swap space usage, which is expected behavior (my test VM has a limited amount of memory). When I checked the output.data file, there were 24117249 lines, with 10698751 lines of "NULL NULL" toward the end of the file.

In 1.1.5-1, the same test produced 18692097 lines with no "NULL NULL" rows. That means the query did not return NULLs when memory was running out.

Therefore, the reported issue seems to have been fixed. However, an additional test uncovered a compression issue in 1.1.5-1. The same issue did not occur in 1.1.4-1.

After creating the table and loading the data file (both the 1.1.5-1 and 1.1.4-1 tests used the same data file, copied across the network), I executed the following query to make sure there are no NULLs in the table.

MariaDB [mytest]> select count(*), sum(isnull(a)), sum(isnull(b)), sum(isnull(c)) from mcol1396;
ERROR 1815 (HY000): Internal error: An unexpected condition within the query caused an internal processing error within InfiniDB. Please check the log files for more details. Additional Information: error in BatchPrimitiveProces

Further investigation showed that column c had a decompression issue. Here is what's in the err.log file:

May 23 18:40:01 localhost PrimProc[666]: 01.296203 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3791; Error decompressing block 63 code=-1 part=0 seg=2
May 23 18:40:01 localhost PrimProc[666]: 01.305513 |0|0|0| C 28 CAL0000: Error decompressing block 63 code=-1 part=0 seg=2
May 23 18:40:04 localhost PrimProc[666]: 04.470432 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3791; Error decompressing block 63 code=-1 part=0 seg=2
May 23 18:40:04 localhost PrimProc[666]: 04.471273 |0|0|0| C 28 CAL0000: Error decompressing block 63 code=-1 part=0 seg=2

I recreated the table and imported the data file again; this time the decompression error occurred on column b instead, with similar error messages.

For testing, I am using a CentOS 7 Vagrant VM box with 6 GB of memory configured. The ColumnStore stack is a single server with one local dbroot.

Comment by Andrew Hutchings (Inactive) [ 2018-05-24 ]

Struggling to reproduce this. I'm wondering if it is a RAM issue. I don't have anything that small, but I'll create one and try.

Comment by Andrew Hutchings (Inactive) [ 2018-05-24 ]

Tried many ways to reproduce this; re-assigning to Daniel to see if he can reproduce it again on a new build.

Comment by Daniel Lee (Inactive) [ 2018-05-25 ]

Did more testing with a new build and newly generated data and could not reproduce the issue.

Generated at Thu Feb 08 02:28:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.