[MCOL-1176] 10Mio Row test fails, only 9988608 rows are written to ColumnStore
Created: 2018-01-24  Updated: 2023-10-26  Resolved: 2018-01-31

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.2
Fix Version/s: 1.1.3

Type: Bug
Priority: Blocker
Reporter: Jens Röwekamp (Inactive)
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment: Debian 9 (no other distributions tested)


Sprint: 2018-02, 2018-03

 Description   

When expanding the 1Mio row Python test [1] to 10Mio rows, only 9988608 rows are written to ColumnStore. The first 11392 rows are somehow dropped.

The 10Mio row Java test [2] shows the same behaviour: only 9988608 rows are written.

Therefore, this is probably a bug in the C++ implementation or in SWIG.

[1] https://github.com/mariadb-corporation/mariadb-columnstore-api/blob/MCOL-1091/python/test/test_million_row.py
[2] https://github.com/mariadb-corporation/mariadb-columnstore-api/blob/MCOL-1091/java/src/test/java/com/mariadb/columnstore/api/MillionRowTest.java



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-01-24 ]

Confirmed in the C++ API too. Assigned to me

Comment by Andrew Hutchings (Inactive) [ 2018-01-24 ]

10,000,000 rows means there will be 2 extents, the first with 8388608. The writeRow() that triggers the extent rollover will have 11392 rows left to write. So these must be getting dropped when the new extent is being created.
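
The numbers above can be checked with a short arithmetic sketch. The extent size and row counts are taken from this ticket; the batch size of 100,000 rows per flush is an assumption that is not stated here, chosen because it reproduces the 11392 figure. The variable names are illustrative only and are not part of the API.

```python
# Row counts reported in this ticket.
TOTAL_ROWS = 10_000_000
WRITTEN_ROWS = 9_988_608
EXTENT_ROWS = 8_388_608  # rows in the first extent (8 * 1024 * 1024)

# ASSUMPTION: rows are sent to ColumnStore in batches of 100,000.
# This value does not appear in the ticket.
BATCH_ROWS = 100_000

# Rows missing after the test run.
missing = TOTAL_ROWS - WRITTEN_ROWS

# The batch that straddles the extent boundary: part of it still fits
# into the first extent, the remainder has to go into the second one.
rows_fitting_first_extent = EXTENT_ROWS % BATCH_ROWS
rows_spilling_over = BATCH_ROWS - rows_fitting_first_extent

print("missing rows:           ", missing)
print("rows fitting in extent 1:", rows_fitting_first_extent)
print("rows spilling to extent 2:", rows_spilling_over)
```

Under that assumption the number of rows spilling into the second extent equals the number of missing rows, which is consistent with the remainder of one batch being dropped at the rollover.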

Comment by Andrew Hutchings (Inactive) [ 2018-01-25 ]

I think this affects DML INSERTS too but I haven't yet been able to build a test to prove it. I did try to get the API to insert just under 8388608 rows to try and push it over the edge with DML and it creates an extra unused extent which fires an error when read.

Comment by Andrew Hutchings (Inactive) [ 2018-01-25 ]

DML INSERT is not affected. The problem seen with just under 8388608 rows was due to cpimport and bulk write skipping the first block even if it is empty, which is part of the same problem.

Comment by Andrew Hutchings (Inactive) [ 2018-01-25 ]

Made this a blocker as this is a data loss issue with no good workaround.

Comment by Andrew Hutchings (Inactive) [ 2018-01-30 ]

Branches in API and engine to be merged.

For QA: There is a test in the API's built-in regression suite.

Comment by Daniel Lee (Inactive) [ 2018-01-31 ]

Build verified: GitHub source 1.1.3-1

[root@localhost ~]# cat mariadb-columnstore-1.1.3-1-centos7.x86_64.bin.tar.txt
/root/columnstore/mariadb-columnstore-server
commit 99cdb0a4b5626426402a2be2572844409e4db18d
Merge: f56e806 936a78c
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Wed Jan 31 09:35:48 2018 +0200

Merge pull request #91 from mariadb-corporation/MCOL-964

MCOL-964 Use cache_table name only if from view

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit dd79644ae7003754b014bfbee1f33763286f8ad2
Merge: 70993d4 2b944eb
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Jan 30 15:32:43 2018 -0600

Merge pull request #381 from mariadb-corporation/MCOL-1160

MCOL-1160 Track and flush dictionary blocks

[root@localhost mariadb-columnstore-api]# git show
commit f053df67f2efe6ea64b67be0839b31d3a57d1784
Merge: 8246470 4279cd6
Author: David.Hall <david.hall@mariadb.com>
Date: Tue Jan 30 15:33:40 2018 -0600

Merge pull request #41 from mariadb-corporation/MCOL-1160

MCOL-1160 Fix dictionary flushing

All 24 API tests passed.
