[MCOL-271] Improved support for NULL for Varchar/char/text/datetime/timestamp + ? Created: 2016-08-18 Updated: 2023-11-17 Resolved: 2023-04-25 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 23.10.0 |
| Type: | New Feature | Priority: | Blocker |
| Reporter: | Andrew Hutchings (Inactive) | Assignee: | Sergey Zefirov |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | rm_invalid_data | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Epic Link: | ColumnStore Compatibility Improvements | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned for Review: | |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
I suspect this is related to MariaDB [sakila]> create table t2 (a varchar(255)) engine=columnstore;Query OK, 0 rows affected (0.61 sec) MariaDB [sakila]> insert into t2 values (''); MariaDB [sakila]> select * from t2;
------
------ MariaDB [sakila]> create table t1 (a varchar(255) not null) engine=columnstore; MariaDB [sakila]> insert into t1 values (''); -------------------------- https://docs.google.com/document/d/1dZ6jlp3_tZ0NPQp9LXG0sFVvrMvJOKGTNwwVy80SCws/edit |
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2016-08-19 ] | |||||||||
|
Problem is in WE_DMLCommandProc::processSingleInsert:
So an empty string would be set to NULL. | |||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-08-19 ] | |||||||||
|
Written a 16KB patch so far on this basically implementing NULL support at every stage throughout a DML query. Still some work to do in the WriteEngine area and possible data reading to differentiate between empty strings and NULL data. | |||||||||
| Comment by Justin Swanhart (Inactive) [ 2016-08-19 ] | |||||||||
|
I believe cpimport also has code that treats empty string as NULL, and perhaps the bridge between cpimport and the bulk loading interface too. | |||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-08-24 ] | |||||||||
|
Progress so far in this patch: https://github.com/LinuxJedi/mariadb-columnstore-engine/commit/0434612514f851e459595924c428af30f7914118
| |||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-08-30 ] | |||||||||
|
On hold for now. I have to make heavy changes to the write engine to make this work. | |||||||||
| Comment by Andrew Hutchings (Inactive) [ 2016-11-08 ] | |||||||||
|
I now think this should be fixed as part of | |||||||||
| Comment by Sergey Zefirov [ 2022-03-04 ] | |||||||||
|
I think that, not having special NULL bitmap as a part of column, it is possible to use something akin to ```boost::option<string>``` (not actual boost type, but something alike) where empty value represents NULL and one has to obtain a reference to string contained by checking the non-emptiness. Type checking can be leveraged to find out all places where this replacement has to be done. I'll discuss that with Roman and, probably, Gagan further. Most probably, Gagan will help with advices on how he needs it to be done in his work. | |||||||||
| Comment by Sergey Zefirov [ 2022-03-05 ] | |||||||||
|
Relevant branch: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/tree/MCOL-271-empty-strings-should-not-be-nulls | |||||||||
| Comment by Sergey Zefirov [ 2022-04-07 ] | |||||||||
|
Here are some more NULL values:
(writeengine/server/we_dmlcommandproc.cpp) I haven't found anything else that is also implicitly converted to NULL. | |||||||||
| Comment by alexey vorovich (Inactive) [ 2022-05-26 ] | |||||||||
|
sergey.zefirov drrtuy I want to share some related experience we had in IBI in the area. It was major project to support the correct approach . We broke it into 2 major stages: 1. Correct the "object orientation" by removing direct access to control blocks and encapsulating access into functions Of course stage 1 was 99% of the work and we kept merging into main branch without changing overall desired behavior. This allowed immediate daily testing by QA and the rest of team. once stage 1 was done we did an additional step of having a system-wide setting that was used in each of the encapsulated functions, that allowed to switch back/forth on the NULL implementation. we tested with old approach being default for some time before changing the default to new. This may or may not be relevant to us here . av | |||||||||
| Comment by Sergey Zefirov [ 2022-06-02 ] | |||||||||
|
Tests that fail to load data:
There are 23 tests that call test01_distinct procedure from ctype_cmp_create.inc. Some of them pass, but majority fail. Judging from the output, the problem is in use of GROUP_CONCAT in the test01_distinct procedure. There are 6 other tests that use GROUP_CONCAT and also fail. I see no other "hot spots" here. | |||||||||
| Comment by alexey vorovich (Inactive) [ 2022-07-26 ] | |||||||||
|
sergey.zefirov | |||||||||
| Comment by Sergey Zefirov [ 2022-12-09 ] | |||||||||
|
No new problems observed. The list to fix:
| |||||||||
| Comment by Daniel Lee (Inactive) [ 2023-04-25 ] | |||||||||
|
Build verified: engine: 1e56a0b557efb677d07533d05eb02ad723955317 Testing the following areas: INSERT There is a MTR performance issue for the develop branch as of 04/24 night. According to the developer, MTR tests passed on the Two tickets have been opened. MCOL-5483 For UPDATE operation, row count for 'Changed: ' returned is incorrect |