[MCOL-4580] Extent's approximate range keeping for dictionaries Created: 2021-03-04  Updated: 2023-12-21

Status: In Testing
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.10

Type: Task Priority: Major
Reporter: Sergey Zefirov Assignee: Sergey Zefirov
Resolution: Unresolved Votes: 0
Labels: rm_perf

Issue Links:
Blocks
blocks MCOL-4522 calGetTrace shows double LIO from Com... Closed
is blocked by MCOL-5005 Add charset number to system catalog Closed
Duplicate
duplicates MCOL-1090 Extent Elimination for char/varchar c... Closed
PartOf
includes MCOL-4529 Design the proper way of doing extent... Closed
Problem/Incident
causes MCOL-5346 Excessive log messages converting NUL... Closed
Epic Link: ColumnStore Performance Improvements
Sprint: 2021-4, 2021-5, 2021-6, 2021-7, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10, 2023-11, 2023-12
Assigned for Review: Roman
Assigned for Testing: Kirill Perov

 Description   

Following the ideas from the design document, we may add extent range information for dictionaries.

Token columns are, essentially, 8-byte-wide unsigned integers. They can be tracked just like other integer columns in extentmap. But instead of tracking values of tokens we will track values of prefixes of corresponding dictionary strings, with collation applied.
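
A minimal sketch of that idea, not the extentmap code: assuming the collated key of each string is already available as a byte string, its first 8 bytes can be packed most-significant-byte first into an unsigned 64-bit integer, so that integer comparison agrees with byte-wise comparison of the prefixes. The names packPrefix and PrefixRange are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: pack up to 8 leading bytes of an (already collated)
// key into a uint64_t, most significant byte first, zero-padded on the right.
std::uint64_t packPrefix(const std::string& key)
{
    std::uint64_t packed = 0;
    for (std::size_t i = 0; i < 8; ++i)
    {
        std::uint8_t byte = i < key.size() ? static_cast<std::uint8_t>(key[i]) : 0;
        packed = (packed << 8) | byte;
    }
    return packed;
}

// Hypothetical min/max record for one extent, kept in prefix space.
struct PrefixRange
{
    std::uint64_t lo = ~std::uint64_t{0};
    std::uint64_t hi = 0;

    // Widen the range so that it covers the prefix of one more key.
    void add(const std::string& key)
    {
        std::uint64_t p = packPrefix(key);
        if (p < lo) lo = p;
        if (p > hi) hi = p;
    }

    // Conservative test: false means the key cannot be in the extent.
    bool mayContain(const std::string& key) const
    {
        std::uint64_t p = packPrefix(key);
        return lo <= p && p <= hi;
    }
};
```

With three 'b' values added, such a range would rule out d = 'a' without reading any block, which is the effect this task is after.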

I will add a more precise plan in the comments below.



 Comments   
Comment by Sergey Zefirov [ 2021-03-10 ]

Relevant branch: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/tree/MCOL-4580-extent-range-keeping-for-dictionaries

Let's start with functionality tests!

Comment by Sergey Zefirov [ 2021-03-25 ]

cpimport does not handle collation(s).

Comment by Sergey Zefirov [ 2021-03-25 ]

Why would it - we process collations dynamically during query plan execution.

This is relatively good, actually.

Comment by Sergey Zefirov [ 2021-03-26 ]

https://collation-charts.org/mysql60/mysql604.utf8_slovak_ci.html - recommended by bar.

Please note that CH (sequence of chars 0x43 0x48) goes between H (0x48) and I (0x49) with code 0x49.

Comment by Sergey Zefirov [ 2021-03-26 ]

Corner case: part Slovakian, part Russian string in the column with Slovakian collation.

The Slovakian part must be sorted (eliminated) according to Slovakian rules, and the Russian parts according to Russian rules.

Comment by Sergey Zefirov [ 2021-03-26 ]

The rules of the Slovakian collation call for, let's say, "safe min/max" computations for strings.

If the accented O (Ó, code 0xd3) is put into the range as it is, we will conclude that "y" (code 0x79) is also included, which may not be the case.
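
To make the problem concrete, here is a small sketch with a toy weight table. It is an assumption for illustration only and not the real Slovakian collation: case is folded and Ó (0xd3) gets a weight right after plain O. The function names are hypothetical.

```cpp
#include <cstdint>

// Toy weights, NOT the real Slovakian table: case is folded and the
// accented O (byte 0xD3) is placed right after plain O.
int toyWeight(std::uint8_t c)
{
    if (c >= 'a' && c <= 'z')
        c = static_cast<std::uint8_t>(c - 'a' + 'A');
    if (c == 0xD3)
        return 2 * 'O' + 1;
    return 2 * c;
}

// Range check on raw byte codes.
bool inRawRange(std::uint8_t lo, std::uint8_t hi, std::uint8_t v)
{
    return lo <= v && v <= hi;
}

// Range check on collation weights, i.e. in the order the collation defines.
bool inWeightRange(std::uint8_t lo, std::uint8_t hi, std::uint8_t v)
{
    return toyWeight(lo) <= toyWeight(v) && toyWeight(v) <= toyWeight(hi);
}
```

For an extent holding only 'a' and Ó, the raw byte range [0x61, 0xd3] claims to contain 'y', while the weight-based range does not.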

Comment by Sergey Zefirov [ 2021-03-26 ]

Test I have right now depends on the MCOL-2044 functionality. I think I'll go for cpimport changes while 2044 is not ready.

Comment by Sergey Zefirov [ 2021-03-29 ]

https://collation-charts.org/mysql60/mysql604.koi8r_general_ci.html

A collation table where 0xFF (Ъ) is lexically smaller than 0xF1 (Я).

Comment by Sergey Zefirov [ 2021-03-29 ]

Interesting bits from bar:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_czech_ci);
INSERT INTO t1 values('a'),('h'),('j'),('ch'),('CH'),('Ch'),('cH');
SELECT a, HEX(WEIGHT_STRING(a)) FROM t1 ORDER BY a;

weight_string corresponds to

size_t     (*strnxfrm)(CHARSET_INFO *,
                         uchar *dst, size_t dstlen, uint nweights,
                         const uchar *src, size_t srclen, uint flags);

And

my_bool (*like_range)(CHARSET_INFO *,
                        const char *s, size_t s_length,
                        pchar w_prefix, pchar w_one, pchar w_many, 
                        size_t res_length,
                        char *min_str, char *max_str,
                        size_t *min_len, size_t *max_len);

This last function expands a LIKE pattern such as "a%" into two strings, padding it with the characters that sort first and last in the collation.
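
A much simplified sketch of what such an expansion does, assuming a single-byte collation in which the first and last characters in sort order are known. The function toyLikeRange and its parameters are hypothetical; it ignores '_', escape characters, multi-byte characters and contractions, all of which the real like_range has to handle.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Toy expansion of a LIKE pattern into a {min, max} pair of strings.
// Everything before the first '%' is copied literally; from there on the
// min string is padded with minChar and the max string with maxChar.
std::pair<std::string, std::string> toyLikeRange(const std::string& pattern,
                                                 std::size_t resLength,
                                                 char minChar, char maxChar)
{
    std::string lo;
    std::string hi;
    for (char c : pattern)
    {
        if (lo.size() >= resLength)
            break;
        if (c == '%')
        {
            lo.append(resLength - lo.size(), minChar);
            hi.append(resLength - hi.size(), maxChar);
            break;
        }
        lo.push_back(c);
        hi.push_back(c);
    }
    return {lo, hi};
}
```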

Comment by Sergey Zefirov [ 2021-03-29 ]

(debug build of MariaDB)

The script:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_czech_ci);
INSERT INTO t1 values('a'),('h'),('j'),('ch'),('CH'),('Ch'),('cH');
SELECT a, HEX(LIKE_RANGE_MIN(CONCAT(a,'%'),8)), HEX(LIKE_RANGE_MAX(CONCAT(a,'%'),8)) FROM t1;

The result:

+------+--------------------------------------+--------------------------------------+
| a    | HEX(LIKE_RANGE_MIN(CONCAT(a,'%'),8)) | HEX(LIKE_RANGE_MAX(CONCAT(a,'%'),8)) |
+------+--------------------------------------+--------------------------------------+
| a    | 6109090909090909                     | 61EFBFBFEFBFBF20                     |
| h    | 6809090909090909                     | 68EFBFBFEFBFBF20                     |
| j    | 6A09090909090909                     | 6AEFBFBFEFBFBF20                     |
| ch   | 6368                                 | 6368                                 |
| CH   | 4348                                 | 4348                                 |
| Ch   | 4368                                 | 4368                                 |
| cH   | 6348                                 | 6348                                 |
+------+--------------------------------------+--------------------------------------+

This is the result of applying "like_range" (above) to each pattern.

Comment by Sergey Zefirov [ 2021-04-07 ]

Have some small success:

DROP DATABASE IF EXISTS MCOL4580;
CREATE DATABASE MCOL4580;
USE MCOL4580;
CREATE TABLE t(d TEXT) ENGINE=COLUMNSTORE;
INSERT INTO t(d) VALUES ('b'),('b'),('b');
SELECT min_value, max_value FROM information_schema.columnstore_extents WHERE width=8;
min_value	max_value
7061644215716938000	NULL
DROP DATABASE MCOL4580;
../storage/columnstore/columnstore/mtr/basic.a [ pass ]    878

The value 7061644215716938000 is 0x6200000000000110: I've got an almost valid prefix for the string 'b' (0x62 in the top byte). The junk in the lower bytes is something I will debug shortly.
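
The conversion can be checked with a few lines. This is only a sketch; toHex and leadingByte are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit range value as 16 hex digits.
std::string toHex(std::uint64_t v)
{
    char buf[19];  // "0x" + 16 digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%016llX", static_cast<unsigned long long>(v));
    return buf;
}

// The most significant byte holds the first character of the prefix.
char leadingByte(std::uint64_t v)
{
    return static_cast<char>(v >> 56);
}
```

Here toHex(7061644215716938000ULL) gives "0x6200000000000110" and leadingByte returns 'b'.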

Comment by Sergey Zefirov [ 2021-04-09 ]

The "signedness" of column type partially depends on the width of the column. This made me make column range signed.

It looks like job execution engine already tries to skip some extents and incorrectly panics.

Comment by Sergey Zefirov [ 2021-04-12 ]

TupleBPS::makeJobs receives TOKEN extents with type TEXT and size 8. Due to the CasualPartitioningType logic, these columns are not eligible for CP elimination, which means they will be scanned anyway.

I have to think about how to overcome this limitation.

Comment by Sergey Zefirov [ 2021-04-13 ]

TOKEN columns are marked as TEXT with width=8 in CalpontSystemCatalog nomenclature.

The CasualPartitionDataType(ColDataType, int) predicate in LbidList checks if width is < 8 for TEXT columns - in that case they are eligible for elimination. I've changed the comparison to width <= 8. This change made TOKEN extent in my test to be scrutinized with predicates.
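
The change can be pictured like this. It is a simplified sketch and not the LbidList code: ToyColType and both functions are hypothetical, and the assumption that integer-like types are always eligible is mine.

```cpp
// Hypothetical, simplified column types; the real enum has many more members.
enum class ToyColType { BIGINT, TEXT };

// Behaviour before the change: TEXT is eligible only when narrower than 8 bytes.
bool eligibleBefore(ToyColType type, int width)
{
    if (type == ToyColType::TEXT)
        return width < 8;
    return true;  // assumption: integer-like types are always eligible
}

// Behaviour after the change: width 8 (a TOKEN column) is eligible as well.
bool eligibleAfter(ToyColType type, int width)
{
    if (type == ToyColType::TEXT)
        return width <= 8;
    return true;
}
```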

There is a single predicate on that extent (which is correct), but the list of filters in that predicate is empty. That means the extent is marked for scanning: filters can only eliminate extents from scanning, and there are no filters to do so.
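
Why an empty filter list leads to a scan can be shown with a small sketch (hypothetical types, not the job list code): an extent is skipped only when some filter proves it cannot match.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical min/max range of one extent.
struct ToyRange
{
    std::uint64_t lo;
    std::uint64_t hi;
};

// A filter returns true when the extent may still hold matching rows.
using ToyFilter = std::function<bool(const ToyRange&)>;

// The extent is scanned unless at least one filter rules it out.
// With no filters at all, nothing can rule it out, so it is scanned.
bool mustScan(const ToyRange& range, const std::vector<ToyFilter>& filters)
{
    for (const auto& filter : filters)
    {
        if (!filter(range))
            return false;
    }
    return true;
}
```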

Comment by Sergey Zefirov [ 2021-04-19 ]

The test SQL code:

DROP DATABASE IF EXISTS MCOL4580;
CREATE DATABASE MCOL4580;
USE MCOL4580;
CREATE TABLE t(d TEXT) ENGINE=COLUMNSTORE;
INSERT INTO t(d) VALUES ('b'),('b'),('b');
SELECT calsettrace(1);
SELECT COUNT(*) FROM t WHERE d = 'a';
SHOW WARNINGS;   -- <--- this reports number of blocks eliminated.
SELECT calgettrace();
DROP DATABASE MCOL4580;

First, a run of my test without any attempt to add a filter that would eliminate the TOKEN extent:

# mysql <test
calsettrace(1)
0
COUNT(*)
0
Level	Code	Message
Note	9999	Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-2; CacheI/O-2; BlocksTouched-2; PartitionBlocksEliminated-0; MsgBytesIn-108B; MsgBytesOut-662B; Mode-Distributed
calgettrace()
\nDesc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows \nBPS  PM   t     3000     (d)               2   2   0   0.008   1    \nTAS  UM   -     -        -                 -   -   -   0.000   1    \nTNS  UM   -     -        -                 -   -   -   0.000   1    \n

And second run of the same test, but when I added a filter to a TOKEN column:

# mysql <test
calsettrace(1)
0
COUNT(*)
0
Level	Code	Message
Note	9999	Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-1; CacheI/O-1; BlocksTouched-1; PartitionBlocksEliminated-0; MsgBytesIn-108B; MsgBytesOut-672B; Mode-Distributed
calgettrace()
\nDesc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows \nBPS  PM   t     3000     (d)               1   1   0   0.006   1    \nTAS  UM   -     -        -                 -   -   -   0.000   1    \nTNS  UM   -     -        -                 -   -   -   0.000   1    \n

Note that the PIO and LIO counts have changed, and the warnings report shows fewer blocks touched.

We read fewer blocks.

I think we still read the dictionary for some reason, and I plan to investigate that.
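
To compare such runs automatically, a counter can be pulled out of the "Query Stats" note. The helper below is a hypothetical sketch that assumes the "Name-Value; " layout seen in the two warnings above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the numeric value of one "Key-Value" field, or -1 if the field is
// missing or not numeric. A field must start the string or follow a space,
// so "BlocksEliminated" does not match "PartitionBlocksEliminated".
long statValue(const std::string& stats, const std::string& key)
{
    const std::string needle = key + "-";
    std::size_t pos = stats.find(needle);
    while (pos != std::string::npos)
    {
        const bool atFieldStart = pos == 0 || stats[pos - 1] == ' ';
        const std::size_t valuePos = pos + needle.size();
        if (atFieldStart && valuePos < stats.size() &&
            std::isdigit(static_cast<unsigned char>(stats[valuePos])))
        {
            return std::stol(stats.substr(valuePos));
        }
        pos = stats.find(needle, pos + 1);
    }
    return -1;
}
```

For the two runs above, statValue(line, "BlocksTouched") gives 2 and 1 respectively.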

Comment by Sergey Zefirov [ 2021-05-14 ]

Each dictionary column is split into two parts: a TOKEN column, which contains references to the actual data, and the actual dictionary data. The dictionary data extent does not contain structured information; its structure is defined elsewhere, in the TOKEN column.

Before any changes related to this task, the situation was as follows: when we tried to eliminate dictionary extents, we applied predicates to the dictionary data extents, not to the TOKEN extents.

That may turn out to be unnecessary, and right now I am trying to prove it. E.g., we could keep only the scan operation on TOKEN extents and drop the operation on the dictionary data extents.

Comment by Sergey Zefirov [ 2021-05-18 ]

Tokens (write engine parlance) in old calpont code are TEXT columns with width exactly 8. So they are treated everywhere as if they are of type CHAR(8).

For example, this is the code in lbidlist.cpp, CasualPartitioningPredicate method:

        if (bIsChar && 1 < ct.colWidth) // <- is true for TOKEN column
        {
            datatypes::Charset cs(ct.charsetNumber);
            utils::ConstString sMin((const char *) &cpRange.loVal, 8); // XXX: note the treatment of values.
            utils::ConstString sMax((const char *) &cpRange.hiVal, 8);
            utils::ConstString sVal((const char *) &value, 8);
            scan = compareStr(cs, sMin.rtrimZero(),
                                  sMax.rtrimZero(),
                                  sVal.rtrimZero(), op, lcf);
//                      cout << "scan=" << (uint32_t) scan << endl;
        }

The Casual Partitioning range values and the value itself are treated as if they were strings encoded as integers. This direct conversion is compatible with the handling of ranges in other parts of the CS engine (versioning, mainly) only on big-endian platforms.
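
The endianness issue can be seen in a small sketch. Both functions are hypothetical: the first mimics what copying string bytes straight into a 64-bit integer yields on a little-endian machine, the second what it yields on a big-endian one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// First string byte ends up in the LEAST significant byte (little-endian view).
std::uint64_t asLittleEndianInt(const std::string& s)
{
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < s.size() && i < 8; ++i)
        v |= static_cast<std::uint64_t>(static_cast<std::uint8_t>(s[i])) << (8 * i);
    return v;
}

// First string byte ends up in the MOST significant byte (big-endian view).
std::uint64_t asBigEndianInt(const std::string& s)
{
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
    {
        std::uint8_t b = i < s.size() ? static_cast<std::uint8_t>(s[i]) : 0;
        v = (v << 8) | b;
    }
    return v;
}
```

As strings, "ab" sorts before "b". The big-endian integers keep that order, while the little-endian ones reverse it (0x6261 > 0x62), so a min/max kept as plain integers does not describe the string range.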

The code above is also not quite correct because we would like to keep ranges' values in the collated state anyway. It may be possible that ranges are not correctly kept in the present design - I have to investigate.

Comment by Sergey Zefirov [ 2021-05-19 ]

I've got BlocksTouched=0 on this test:

DROP DATABASE IF EXISTS MCOL4580;
CREATE DATABASE MCOL4580;
USE MCOL4580;
CREATE TABLE t(d TEXT) ENGINE=COLUMNSTORE;
INSERT INTO t(d) VALUES ('b'),('b'),('b');
SELECT * FROM information_schema.columnstore_extents;
SELECT calsettrace(1);
SELECT COUNT(*) FROM t WHERE d = 'a';
SHOW WARNINGS;
SELECT calgettrace();
DROP DATABASE MCOL4580;

SHOW WARNINGS report:

9999	Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-0; CacheI/O-0; BlocksTouched-0; PartitionBlocksEliminated-1; MsgBytesIn-0B; MsgBytesOut-602B; Mode-Distributed

I've got (even before fix of code above) the following:

# mysql <test 
ERROR 1815 (HY000) at line 5: Internal error: CAL0001: image inconsistency 

This means that ranges for the TOKEN column are kept in different states across program components. I am investigating this phenomenon.

Comment by Sergey Zefirov [ 2021-05-25 ]

The source of "Image inconsistency" result was due to WriteEngineServer crash:

Thread 23 "WriteEngineServ" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffaabe06700 (LWP 117237)]
0x00007ffabbd6438b in je_large_dalloc () from /lib64/libjemalloc.so.2
(gdb) backtrace 
#0  0x00007ffabbd6438b in je_large_dalloc () from /lib64/libjemalloc.so.2
#1  0x00007ffabbd21c9c in je_free_default () from /lib64/libjemalloc.so.2
#2  0x00007ffaba25fd88 in __gnu_cxx::new_allocator<boost::shared_ptr<joblist::AnyDataList> >::deallocate (this=0x7ffaabe01680, 
    __p=0x7ffaab204000) at /usr/include/c++/8/ext/new_allocator.h:125
#3  0x00007ffaba25e9cd in std::allocator_traits<std::allocator<boost::shared_ptr<joblist::AnyDataList> > >::deallocate (__a=..., 
    __p=0x7ffaab204000, __n=2) at /usr/include/c++/8/bits/alloc_traits.h:462
#4  0x00007ffaba25d194 in std::_Vector_base<boost::shared_ptr<joblist::AnyDataList>, std::allocator<boost::shared_ptr<joblist::AnyDataList> > >::_M_deallocate (this=0x7ffaabe01680, __p=0x7ffaab204000, __n=2) at /usr/include/c++/8/bits/stl_vector.h:304
#5  0x00007ffaba25cfbe in std::_Vector_base<boost::shared_ptr<joblist::AnyDataList>, std::allocator<boost::shared_ptr<joblist::AnyDataList> > >::~_Vector_base (this=0x7ffaabe01680, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/stl_vector.h:285
#6  0x00007ffaba25b48f in std::vector<boost::shared_ptr<joblist::AnyDataList>, std::allocator<boost::shared_ptr<joblist::AnyDataList> > >::~vector (this=0x7ffaabe01680, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/stl_vector.h:570
#7  0x00007ffaba259b94 in joblist::JobStepAssociation::~JobStepAssociation (this=0x7ffaabe01660, __in_chrg=<optimized out>)
    at /data/mdb-server/storage/columnstore/columnstore/dbcon/joblist/jobstep.h:67
#8  0x00007ffaba2e2588 in (anonymous namespace)::doSimpleFilter (sf=0x7ff84a400400, jobInfo=...)
    at /data/mdb-server/storage/columnstore/columnstore/dbcon/joblist/jlf_execplantojoblist.cpp:1836

The answer to a broadcast message was empty (the WE server was down), so it differed from the other answers (which I presume were correct), and this divergence of answers produced the "image inconsistency" error.
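
In other words, the check behaves roughly like the sketch below. This is an assumption about the logic, written with a hypothetical function name, and not the actual broadcast code.

```cpp
#include <string>
#include <vector>

// All replies to one broadcast are expected to be identical. An empty reply
// from a crashed node differs from the rest and is reported as inconsistency.
bool repliesConsistent(const std::vector<std::string>& replies)
{
    for (const auto& reply : replies)
    {
        if (reply != replies.front())
            return false;
    }
    return true;
}
```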

Comment by Sergey Zefirov [ 2021-05-26 ]

Complete stack trace:

#0  0x00007f606c1ce38b in je_large_dalloc () from /lib64/libjemalloc.so.2
#1  0x00007f606c18bc9c in je_free_default () from /lib64/libjemalloc.so.2
#2  0x00007f606a710388 in __gnu_cxx::new_allocator<boost::shared_ptr<joblist::JobStep> >::deallocate (
    this=0x7f605c2024a0, __p=0x7f605b604000) at /usr/include/c++/8/ext/new_allocator.h:125
#3  0x00007f606a70c828 in std::allocator_traits<std::allocator<boost::shared_ptr<joblist::JobStep> > >::deallocate (__a=..., __p=0x7f605b604000, __n=2) at /usr/include/c++/8/bits/alloc_traits.h:462
#4  0x00007f606a709e96 in std::_Vector_base<boost::shared_ptr<joblist::JobStep>, std::allocator<boost::shared_ptr<joblist::JobStep> > >::_M_deallocate (this=0x7f605c2024a0, __p=0x7f605b604000, __n=2)
    at /usr/include/c++/8/bits/stl_vector.h:304
#5  0x00007f606a70a084 in std::_Vector_base<boost::shared_ptr<joblist::JobStep>, std::allocator<boost::shared_ptr<joblist::JobStep> > >::~_Vector_base (this=0x7f605c2024a0, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/stl_vector.h:285
#6  0x00007f606a7084df in std::vector<boost::shared_ptr<joblist::JobStep>, std::allocator<boost::shared_ptr<joblist::JobStep> > >::~vector (this=0x7f605c2024a0, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/stl_vector.h:570
#7  0x00007f606a7dba7b in (anonymous namespace)::makeJobList_ (cplan=0x7f605c203d40, rm=0x7f605b63c380, 
    isExeMgr=false, errCode=@0x7f605c20336c: 0, emsg="")
    at /data/mdb-server/storage/columnstore/columnstore/dbcon/joblist/joblistfactory.cpp:1910
#8  0x00007f606a7dc1ba in joblist::JobListFactory::makeJobList (cplan=0x7f605c203d40, rm=0x7f605b63c380, 
    tryTuple=true, isExeMgr=false)
    at /data/mdb-server/storage/columnstore/columnstore/dbcon/joblist/joblistfactory.cpp:2065
#9  0x00007f6069c95eaf in execplan::CalpontSystemCatalog::getSysData_EC (this=0x7f605b63dc00, csep=..., 
    sysDataList=..., sysTableName="systable")
    at /data/mdb-server/storage/columnstore/columnstore/dbcon/execplan/calpontsystemcatalog.cpp:906
#10 0x00007f6069c959c5 in execplan::CalpontSystemCatalog::getSysData (this=0x7f605b63dc00, csep=..., 
    sysDataList=..., sysTableName="systable")
    at /data/mdb-server/storage/columnstore/columnstore/dbcon/execplan/calpontsystemcatalog.cpp:841
#11 0x00007f6069ca742c in execplan::CalpontSystemCatalog::tableName (this=0x7f605b63dc00, tableoid=@0x7f605c2041e0: 3003) at /data/mdb-server/storage/columnstore/columnstore/dbcon/execplan/calpontsystemcatalog.cpp:3469
#12 0x0000564293710117 in WriteEngine::WE_DMLCommandProc::commitBatchAutoOn (this=0x7f605ff35700, bs=..., err="") at /data/mdb-server/storage/columnstore/columnstore/writeengine/server/we_dmlcommandproc.cpp:2171
#13 0x00005642936b519d in WriteEngine::DmlReadThread::operator() (this=0x7f605ff62838) at /data/mdb-server/storage/columnstore/columnstore/writeengine/server/we_readthread.cpp:134
#14 0x00005642936bfa41 in boost::detail::thread_data<WriteEngine::DmlReadThread>::run (this=0x7f605ff62680) at /usr/include/boost/thread/detail/thread.hpp:116
#15 0x00007f6065e6b5e1 in thread_proxy () from /lib64/libboost_thread.so.1.66.0
#16 0x00007f6064f9914a in start_thread () from /lib64/libpthread.so.0
#17 0x00007f6063410f23 in clone () from /lib64/libc.so.6

Comment by Sergey Zefirov [ 2021-05-26 ]

The problem above was fixed by updating to a fresh version of MCS and, subsequently, of the server.

I have MTR running right now and some tests pass. The failing ones so far are testing collation functionality.

Comment by Sergey Zefirov [ 2021-05-26 ]

I was too optimistic about the segfault. Most of tests below were failing with "Lost connection to DDLProc":

 - skipping '/data/mdb-server/mysql-test/var/log/columnstore/basic.unsigned_least/'
columnstore/basic.unsigned_math          [ fail ]
        Test ended at 2021-05-26 15:57:43
 
CURRENT_TEST: columnstore/basic.unsigned_math
mysqltest: At line 10: query 'create table j1 (j1_key int)engine=columnstore' failed: ER_INTERNAL_ERROR (1815): Internal error: Lost connection to DDLProc
 
The result from queries just before the failure was:
DROP DATABASE IF EXISTS unsigned_math_db;
CREATE DATABASE unsigned_math_db;
USE unsigned_math_db;
create table j1 (j1_key int)engine=columnstore;
 
 - skipping '/data/mdb-server/mysql-test/var/log/columnstore/basic.unsigned_math/'
--------------------------------------------------------------------------
The servers were restarted 391 times
Spent 8.007 of 1236 seconds executing testcases
 
Completed: Failed 391/434 tests, 9.91% were successful.
 
Failing test(s): columnstore/basic.ctype_cmp_char1_latin1_swedish_ci ...many more...

Comment by Sergey Zefirov [ 2021-05-28 ]

I had segfaults for the test columnstore/basic.ctype_cmp_varchar32_latin1_bin in the native centos build. But it passes under valgrind.

A strange situation.

Comment by Sergey Zefirov [ 2021-06-09 ]

The crash was due to the use of a non-managed pointer instead of shared_ptr. The crash is gone now and I am working on the tests (an exception is raised during an INSERT operation).

Comment by Sergey Zefirov [ 2021-06-09 ]

Fixed the exception.

Now I wait for MTR tests to finish to see what I broke.

Thus far my changes broke many tests dealing with collation.

Comment by Sergey Zefirov [ 2021-06-09 ]

MTR reports "Completed: Failed 183/435 tests, 57.93% were successful." 90 of failing tests are MCOL-2044-related (some tests did not do a proper cleanup).

There were wide decimal failures and more. Of course, there are failures in tests dealing with collations.

Comment by Sergey Zefirov [ 2021-06-10 ]

One of the affected tests is mcs24 "INSERT from another table", which throws an exception in one of its queries and does not clean up after itself. This is one reason why MCOL-2044 blows up.

Comment by Sergey Zefirov [ 2021-06-11 ]

Here are latest results:

Completed: Failed 15/404 tests, 96.29% were successful.
 
Failing test(s): columnstore/basic.mcol4580-binary-collation-range-cpimport columnstore/basic.mcol4580-binary-collation-range-insert-clause columnstore/basic.mcs212_idbExtentMax_function columnstore/basic.mcs213_idbExtentMin_function columnstore/basic.mcs251_time_to_sec_function columnstore/basic.mcs28_load_data_local_infile columnstore/basic.mcs30_update_table columnstore/basic.mcs38_select_limit columnstore/basic.mcs48_cpimport_central_loc_dist_source columnstore/basic.mcs67_ldi_datafile_separators columnstore/basic.mcs76_having columnstore/basic.mcs77_where_conditions columnstore/basic.mcs80_set_operations columnstore/basic.unsigned_aggregate columnstore/basic.unsigned_joins

An improvement upon what was before.

Comment by Sergey Zefirov [ 2021-06-15 ]

After adding cleanup to my mcol4580 tests, I've reduced the number of failing tests to 8:

Tests | Reason
columnstore/basic.mcs212_idbExtentMax_function, columnstore/basic.mcs213_idbExtentMin_function | These lose connection to ExeMgr: query 'SELECT idbExtentMin(col1) FROM t1 LIMIT 1' failed: ER_INTERNAL_ERROR (1815): Internal error: Error while fetching from ExeMgr: IDB-2035: An internal error occurred. Check the error log file & contact support.
columnstore/basic.mcs251_time_to_sec_function | Failed to properly convert constant string to time in seconds: SELECT TIME_TO_SEC('10:50:40.9999') FROM t1 LIMIT 1; returns 0.000 instead of correct result
columnstore/basic.mcs38_select_limit | Shows that LIKE operator does not work
columnstore/basic.mcs30_update_table | Most probably, WHERE condition does not work
columnstore/basic.mcs76_having | Again, LIKE operator does not work
columnstore/basic.mcs77_where_conditions | An internal error occurred: ER_INTERNAL_ERROR (1815): Internal error: combineJobStepsByTable failed.
columnstore/basic.mcs80_set_operations | Appears as different ordering of results - results are there but in wrong place

Comment by Sergey Zefirov [ 2021-06-15 ]

Fixed handling of LIKE/NLIKE, got 5 tests failing:

Failing test(s): columnstore/basic.mcs212_idbExtentMax_function columnstore/basic.mcs213_idbExtentMin_function columnstore/basic.mcs251_time_to_sec_function
columnstore/basic.mcs30_update_table columnstore/basic.mcs80_set_operations

Comment by Sergey Zefirov [ 2021-09-30 ]

The failure of the queries in the comment just above was a fluke: it appears that right now regression tests may fail differently on different days of the year.

Appropriate bug reported here.

When I ran these tests for 10.6/develop and for my patch in quick succession, the only problems I got were with IS NULL.

The operation "column IS NULL" is translated to an equality test with the specific value, e.g., "column=COLUMN_TYPE_NULL_VALUE".

Logs indicate that the type used for comparison in the extent elimination code is the one used for short character strings, because token columns in ColumnStore are marked as CHARACTER columns with width 8 (exactly short char strings).

It is not signedness, as I suspected, but a type error nonetheless.

Comment by Roman [ 2022-03-03 ]

Implemented but the issue will be waiting for a prerequisite.

Comment by Roman [ 2022-08-16 ]

Can not deliver in time for 22.08.01 so moving this important perf optimization to a later release.

Comment by Kirill Perov [ 2023-12-02 ]

sergey.zefirov is veeeery long winded... It's difficult to find reproduction script, can you point it?

Generated at Thu Feb 08 02:51:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.