[MCOL-984] Error 1815 after several executions of example/basic_bulk_insert having SMALLINT in t1 Created: 2017-10-24  Updated: 2023-10-26  Resolved: 2017-10-27

Status: Closed
Project: MariaDB ColumnStore
Component/s: writeengine
Affects Version/s: 1.1.0
Fix Version/s: 1.1.1

Type: Bug Priority: Critical
Reporter: Sasha V Assignee: David Thompson (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment: Docker container with Ubuntu 16.04.2 LTS and 2 GB of RAM


Issue Links:
Relates
relates to MCOL-1675 Incorrect HWM calculation if the tabl... Closed
relates to MCOL-930 Cannot execute queries Closed
Sprint: 2017-21

 Description   

After several executions of example/basic_bulk_insert the query
SELECT * FROM test.t1;
results in:
Error Code: 1815. Internal error: An unexpected condition within the query caused an internal processing error within InfiniDB. Please check the log files for more details. Additional Information: error in BatchPrimitiveProces
The crit.log has:
Oct 23 20:02:41 s_columnstore@mcs11 PrimProc[4175]: 41.881088 |0|0|0| C 28 CAL0000: /home/builder/mariadb-columnstore-server/mariadb-columnstore-engine/primitives/primproc/columncommand.cpp error on projectResultRG for oid 3248 lbid 2053120: input rids 3, output rids 2

The error is cleared by executing
select calFlushCache();
or restarting the columnstore system.

To reproduce the issue one must change the t1 table definition to have SMALLINT:

CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` smallint(6) DEFAULT NULL
) ENGINE=Columnstore;



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-10-24 ]

Managed to reproduce this after about 20 executions.

Comment by Andrew Hutchings (Inactive) [ 2017-10-24 ]

The WriteEngine commit is sending us the wrong HWM for the smallint column on the commit call. Either we are appending to an existing block instead of creating a new one, or the block calculation is incorrect (the former looks more likely).

Comment by Sasha V [ 2017-10-24 ]

The same error 1815 also happens for a peculiar datatype, varchar(1). To reproduce the issue, please change the t1 table definition to:
CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` varchar(1) DEFAULT NULL
) ENGINE=Columnstore;
and execute example/basic_bulk_insert repeatedly until the query
SELECT * FROM test.t1;
results in:
Error Code: 1815. Internal error: An unexpected condition within the query caused an internal processing error within InfiniDB. Please check the log files for more details. Additional Information: error in BatchPrimitiveProces
with crit.log having:
Oct 24 19:23:43 s_columnstore@mcs11 PrimProc[4175]: 43.864529 |0|0|0| C 28 CAL0000: /home/builder/mariadb-columnstore-server/mariadb-columnstore-engine/primitives/primproc/columncommand.cpp error on projectResultRG for oid 3385 lbid 693248: input rids 3000, output rids 2000

Comment by Andrew Hutchings (Inactive) [ 2017-10-24 ]

Thank you for the information. It will happen whenever the columns in the table do not all have the same data type. I know roughly what is happening, but not yet where the calculation and tracking of the HWM is failing. I'll dig into this further ASAP because, even though the flush lets you read the data, this could cause data corruption.

Comment by Andrew Hutchings (Inactive) [ 2017-10-26 ]

OK, the problem is that rowId allocation is happening on the first column rather than the smallest column. This means we are writing mid-block on the smaller columns, so we are not doing things in a crash-safe way (which confuses the heck out of PrimProc). To fix this, we need to select the smallest column in the table to calculate the rowId from.
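
The alignment problem described above can be illustrated with a small model. This is only a sketch, not the WriteEngine code: the 8192-byte block size, the helper names, and the "round the starting rowId up to the next block boundary" rule are assumptions made for illustration. It shows that a starting rowId aligned for the 4-byte int column can land mid-block for the 2-byte smallint column, while a rowId aligned for the smallest column is aligned for both.

```python
BLOCK_SIZE = 8192  # bytes per block (assumed value for this illustration)


def rows_per_block(width_bytes: int) -> int:
    """Number of fixed-width values that fit in one block."""
    return BLOCK_SIZE // width_bytes


def is_block_aligned(row_id: int, width_bytes: int) -> bool:
    """True if row_id is the first row of a block for a column of this width."""
    return (row_id * width_bytes) % BLOCK_SIZE == 0


def next_aligned_row_id(current_rows: int, width_bytes: int) -> int:
    """Round current_rows up to the next block boundary for this column width."""
    per_block = rows_per_block(width_bytes)
    return -(-current_rows // per_block) * per_block  # ceiling division


# Hypothetical table matching the ticket: a INT (4 bytes), b SMALLINT (2 bytes).
widths = {"a": 4, "b": 2}
existing_rows = 1000

# Pre-fix behaviour in this model: allocate from the first column.
start_first = next_aligned_row_id(existing_rows, widths["a"])

# Post-fix behaviour in this model: allocate from the smallest column.
start_smallest = next_aligned_row_id(existing_rows, min(widths.values()))

for name, width in widths.items():
    print(name, "first-column start aligned:", is_block_aligned(start_first, width))
    print(name, "smallest-column start aligned:", is_block_aligned(start_smallest, width))
```

In this model the first-column start is aligned for a but not for b, whereas the smallest-column start is aligned for every column, provided the column widths are powers of two.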

Comment by Andrew Hutchings (Inactive) [ 2017-10-26 ]

Pull request for engine develop-1.1 (fix is in WriteEngine).

DT may be the best person to QA this, given his experience using mcsapi.

For QA: This is with WriteEngine's mcsapi hooks. The only way to reproduce is to use the "create table" statement from the description (in the test database) and run basic_bulk_insert in the example directory of mcsapi about 20 times.
Before the fix, a select * on the table should fail with an error, and the HWM will be wrong. After the fix, the hwm for a should be double the hwm for b.
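
The expected 2:1 relationship follows from the column widths. As a rough sketch only, assuming 8192-byte blocks and assuming the HWM grows in proportion to the number of blocks a column has written (the function name is hypothetical, not a ColumnStore API):

```python
BLOCK_SIZE = 8192  # bytes per block (assumed value for this illustration)


def blocks_used(row_count: int, width_bytes: int) -> int:
    """Blocks needed to hold row_count fixed-width values (ceiling division)."""
    return -(-(row_count * width_bytes) // BLOCK_SIZE)


rows = 8192
blocks_a = blocks_used(rows, 4)  # a INT, 4 bytes per value
blocks_b = blocks_used(rows, 2)  # b SMALLINT, 2 bytes per value
print(blocks_a, blocks_b)
```

For the same number of rows, the 4-byte column fills twice as many blocks as the 2-byte column, which is the ratio QA should see once the block accounting is correct.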

Comment by David Thompson (Inactive) [ 2017-10-27 ]

Verified the before-and-after behavior for the reported issue, and ran a few other sanity regression tests, using the latest build from develop-1.1.
