[MCOL-1023] Add char(>8) and double to feature extends elimination Created: 2017-11-09  Updated: 2022-02-25  Resolved: 2017-12-08

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.0.0
Fix Version/s: Icebox

Type: New Feature Priority: Major
Reporter: Richard Stracke Assignee: Todd Stoffel (Inactive)
Resolution: Duplicate Votes: 1
Labels: extentelimination

Issue Links:
PartOf
includes MCOL-1089 Extent elimination for double datatype Closed
includes MCOL-1090 Extent Elimination for char/varchar c... Closed

 Description   

Extent exlimination works regarding
https://mariadb.com/kb/en/library/columnstore-storage-architecture/

"filter value is compared to the minimum and maximum values stored in each extent for the column: If the filter value is outside of a extent's min-max value range, the extent is eliminated."

This is not the case for the datatypes char(>8) and double (+real)

To Reproduce:

Create Table

create database `etest`;
 
USE `etest`  ; 
CREATE TABLE `etest` 
  ( 
     `e01`    int         , 
     `e02`    char(8)     , 
     `e03`    char(9)     , 
     `e04`    decimal     , 
     `e05`    double      , 
     `e06`    real        
) engine=columnstore;
 
}

Activate extends elimination with insert value and select col from table for every column.

insert into `etest` (`e01`,`e02`,`e03`,`e04`,`e05`,`e06`) VALUES (1,'12345678','123456789',1.1,1.1,1.1);
 
select `e01` from `etest` ;
select `e02` from `etest` ;
select `e03` from `etest` ;
select `e04` from `etest` ;
select `e05` from `etest` ;
select `e06` from `etest` ;
 

MariaDB [etest]> select ce.object_id,object_type,min_value,max_value,state,status,cc.data_type,cc.column_length from  `information_schema`.`COLUMNSTORE_EXTENTS`  ce left join   `information_schema`.`COLUMNSTORE_COLUMNS`  cc on ce.object_id = cc.object_id where cc.table_schema = 'etest' and cc.table_name = 'etest';
+-----------+-------------+---------------------+---------------------+---------+-----------+-----------+---------------+
| object_id | object_type | min_value           | max_value           | state   | status    | data_type | column_length |
+-----------+-------------+---------------------+---------------------+---------+-----------+-----------+---------------+
|      3023 | Column      |                   1 |                   1 | Valid   | Available | int       |             4 |
|      3024 | Column      | 4050765991979987456 | 4050765991979987456 | Valid   | Available | char      |             8 |
|      3025 | Column      |                NULL |                NULL | Invalid | Available | char      |             9 |
|      3026 | Column      |                NULL |                NULL | Invalid | Available | decimal   |             8 |
|      3027 | Column      |                NULL |                NULL | Invalid | Available | double    |             8 |
|      3028 | Column      |                NULL |                NULL | Invalid | Available | double    |             8 |
+-----------+-------------+---------------------+---------------------+---------+-----------+-----------+---------------+
6 rows in set (0.06 sec)

CHAR(8) and int will be recognized feature.

CHAR(9) also double(real) even if column length is only 8.

It would be nice and a performance bonus, if these datatypes will be included into the extents elimination feature.

Richard



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-11-09 ]

These are two different problems to solve that should eventually have their own tickets.

The float/double is a problem because the binary representation is not in order (0.1 in binary is more than 1.0 for example). So we would need some interpretation of the data beyond basic binary ordering (it may not matter for display purposes?).

The CHAR issue would require re-architecting the system catalog. The data storage for min/max is an 8-byte value and this would need to become a varying length column. It may instead be better to implement a hash index for dictionaries as you are probably more likely to do equality matching than value ranges for CHAR/VARCHAR in ColumnStore (made simpler because a hash index is already partly implemented from InfiniDB days but not completed and disabled).

Comment by Dipti Joshi (Inactive) [ 2017-12-08 ]

MCOL-1089 and MCOL-1090 created to address the two data type. Please see the actual resolution and fixVersion there

Generated at Thu Feb 08 02:25:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.