[MCOL-271] Improved support for NULL for Varchar/char/text/datetime/timestamp + ? Created: 2016-08-18  Updated: 2023-11-17  Resolved: 2023-04-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.10.0

Type: New Feature Priority: Blocker
Reporter: Andrew Hutchings (Inactive) Assignee: Sergey Zefirov
Resolution: Fixed Votes: 1
Labels: rm_invalid_data

Issue Links:
Blocks
is blocked by MCOL-92 Support full numeric ranges for all d... Closed
Duplicate
is duplicated by MCOL-1423 cpimport loads empty value ("") as NU... Closed
is duplicated by MCOL-4403 Trailing spaces not behaving as InnoDB Closed
is duplicated by MCOL-4579 CHAR(2) NULL: empty strings or SPACE(... Closed
is duplicated by MCOL-4729 ColumnStore treating NULL and empty s... Closed
PartOf
is part of MCOL-25 001 Working Folder UM Join Test Fails Closed
Problem/Incident
causes MCOL-5035 MariaDB Columnstore all versions: upd... Confirmed
Relates
relates to MCOL-4403 Trailing spaces not behaving as InnoDB Closed
relates to MCOL-4577 VARCHAR(2) NOT NULL: explicit empty s... Closed
relates to MCOL-4578 CHAR(2) NOT NULL: Empty string or SPA... Closed
relates to MCOL-171 0000-00-00 dates are not supported an... Closed
Epic Link: ColumnStore Compatibility Improvements
Sprint: 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6
Assigned for Review: Roman Roman

 Description   

I suspect this is related to MCOL-171.

MariaDB [sakila]> create table t2 (a varchar(255)) engine=columnstore;Query OK, 0 rows affected (0.61 sec)

MariaDB [sakila]> insert into t2 values ('');
Query OK, 1 row affected (0.26 sec)

MariaDB [sakila]> select * from t2;
------

a

------

NULL

------
1 row in set (0.10 sec)

MariaDB [sakila]> create table t1 (a varchar(255) not null) engine=columnstore;
Query OK, 0 rows affected (0.53 sec)

MariaDB [sakila]> insert into t1 values ('');
ERROR 1815 (HY000): Internal error: CAL0001: Insert Failed: IDB-4015: Column 'a' cannot be null.

--------------------------
external effects on NULL where clause used in appications

https://docs.google.com/document/d/1dZ6jlp3_tZ0NPQp9LXG0sFVvrMvJOKGTNwwVy80SCws/edit



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2016-08-19 ]

Problem is in WE_DMLCommandProc::processSingleInsert:

origVals = columnPtr->get_DataVector();
 
...
 
tmpStr = origVals[i];
if ( tmpStr.length() == 0 )
        isNULL = true;
else
        isNULL = false;

So an empty string would be set to NULL.

Comment by Andrew Hutchings (Inactive) [ 2016-08-19 ]

Written a 16KB patch so far on this basically implementing NULL support at every stage throughout a DML query. Still some work to do in the WriteEngine area and possible data reading to differentiate between empty strings and NULL data.

Comment by Justin Swanhart (Inactive) [ 2016-08-19 ]

I believe cpimport also has code that treats empty string as NULL, and perhaps the bridge between cpimport and the bulk loading interface too.

Comment by Andrew Hutchings (Inactive) [ 2016-08-24 ]

Progress so far in this patch: https://github.com/LinuxJedi/mariadb-columnstore-engine/commit/0434612514f851e459595924c428af30f7914118

MCOL-171 is fixed

MCOL-271 is mostly fixed. IS NULL/NOT NULL comparisons aren't yet working due to some issue in PrimProc that I haven't managed to fully diagnose yet.

Comment by Andrew Hutchings (Inactive) [ 2016-08-30 ]

On hold for now. I have to make heavy changes to the write engine to make this work.

Comment by Andrew Hutchings (Inactive) [ 2016-11-08 ]

I now think this should be fixed as part of MCOL-92. If we have a separate container for NULL entries in columns (and maybe other potential bitmasks?) the problem with int ranges goes away and it can be used with character type columns too.

Comment by Sergey Zefirov [ 2022-03-04 ]

I think that, not having special NULL bitmap as a part of column, it is possible to use something akin to ```boost::option<string>``` (not actual boost type, but something alike) where empty value represents NULL and one has to obtain a reference to string contained by checking the non-emptiness.

Type checking can be leveraged to find out all places where this replacement has to be done.

I'll discuss that with Roman and, probably, Gagan further. Most probably, Gagan will help with advices on how he needs it to be done in his work.

Comment by Sergey Zefirov [ 2022-03-05 ]

Relevant branch: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/tree/MCOL-271-empty-strings-should-not-be-nulls

Comment by Sergey Zefirov [ 2022-04-07 ]

Here are some more NULL values:

        if (((colType.colDataType == execplan::CalpontSystemCatalog::DATE) && (inData == "0000-00-00")) ||
            ((colType.colDataType == execplan::CalpontSystemCatalog::DATETIME) &&
             (inData == "0000-00-00 00:00:00")) ||
            ((colType.colDataType == execplan::CalpontSystemCatalog::TIMESTAMP) &&
             (inData == "0000-00-00 00:00:00")))
        {
          isNull = true;
        }

(writeengine/server/we_dmlcommandproc.cpp)

I haven't found anything else that is also implicitly converted to NULL.

Comment by alexey vorovich (Inactive) [ 2022-05-26 ]

sergey.zefirov drrtuy I want to share some related experience we had in IBI in the area.
We also started with having '.' string treated as NULL.

It was major project to support the correct approach . We broke it into 2 major stages:

1. Correct the "object orientation" by removing direct access to control blocks and encapsulating access into functions
2. change the actual function from using '.' to using a special bit

Of course stage 1 was 99% of the work and we kept merging into main branch without changing overall desired behavior. This allowed immediate daily testing by QA and the rest of team.

once stage 1 was done we did an additional step of having a system-wide setting that was used in each of the encapsulated functions, that allowed to switch back/forth on the NULL implementation.

we tested with old approach being default for some time before changing the default to new.

This may or may not be relevant to us here .

av

Comment by Sergey Zefirov [ 2022-06-02 ]

Tests that fail to load data:

  1. mcs103_ldi_fields_enclosed_by
  2. mcs102_lds_transform_csv
  3. mcs28_load_data_local_infile
  4. mcs29_load_data_local_infile_negative
  5. unsigned_joins

There are 23 tests that call test01_distinct procedure from ctype_cmp_create.inc. Some of them pass, but majority fail. Judging from the output, the problem is in use of GROUP_CONCAT in the test01_distinct procedure. There are 6 other tests that use GROUP_CONCAT and also fail.

I see no other "hot spots" here.

Comment by alexey vorovich (Inactive) [ 2022-07-26 ]

sergey.zefirov
You say "only new are used" . Should we remove unused code ?

Comment by Sergey Zefirov [ 2022-12-09 ]

No new problems observed.

The list to fix:

Working Folder Test scripts that failed:
Compare failed - working_ssb_compareLogOnly/ssb_dim_mixed_outer_2colgrp/ssb_dim_mixed_outer_2colgrp3.sql
Compare failed - working_tpch1/aggregation/bitop.sql

Comment by Daniel Lee (Inactive) [ 2023-04-25 ]

Build verified:

engine: 1e56a0b557efb677d07533d05eb02ad723955317
server: 11c83d9ae9eb249d00589cc6ab71e7f4e67ffa27
buildNo: 7534

Testing the following areas:

INSERT
UPDATE
LDI, not using cpimport
LDI, using cpimport
cpimport, with and without -n
queries, with IS NULL and IS NOT NULL
subquery, with IS NULL and IS NOT NULL
sql_mode, with and without EMPTY_STRING_IS_NULL;
NULL values and '' (empty) strings
ETC

There is a MTR performance issue for the develop branch as of 04/24 night. According to the developer, MTR tests passed on the MCOL-271 commit.

Two tickets have been opened.
MCOL-5480 LDI loads values incorrectly for MEDIUMINT, TIME and TIMESTAMP when cpimport is used for batch insert
The issue is not caused by this fix. It also exists in release 23.02.2. For completeness, we should fix this issue, as part of MCOL-271.

MCOL-5483 For UPDATE operation, row count for 'Changed: ' returned is incorrect
Not release to this fix.

Generated at Thu Feb 08 02:19:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.