[MCOL-4871] Extent elimination for short CHAR does not work well Created: 2021-09-20  Updated: 2021-12-20  Resolved: 2021-12-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 5.6.2
Fix Version/s: 6.3.1

Type: Bug Priority: Blocker
Reporter: Alexander Barkov Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-26669 Add MY_COLLATION_HANDLER functions mi... Closed
Relates
relates to MCOL-4872 HEX hybrid is treated as a number in ... Open
Sprint: 2021-10, 2021-11, 2021-12

 Description   

I create and populate a table as follows:

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET latin1 COLLATE latin1_swedish_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES (0xE9);
SELECT a, HEX(a) FROM t1;

+------+--------+
| a    | HEX(a) |
+------+--------+
| é    | E9     |
+------+--------+

Looks good so far.

Now I'm checking min and max values for the underlying extent:

SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | FFFFFFFFFFFFFFE9 | FFFFFFFFFFFFFF80 |
+--------------+------------+-------------+------------------+------------------+

The min value does not look fine - for some reasons the 0xE9 went to lowest byte, while all higher bytes were filled with 0xFF. I'd expect:

  • Either the result in hex(e.min_value) to be just 0xE9 - this is a CHAR(1) column. It's not clear why it gets converted to uint64_t.
  • Or at least 0xE9 goes to the highest byte.

The max value does not look fine either - it contains something not correlating with the character 0xE9 at all.

Now let's check how these min/max values affect WHERE:

Non-matching blocks do not get eliminated

Now I run a query with a value which is not in the table:

DO calsettrace(1);
SELECT * FROM t1 WHERE a='o';
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.045 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

It correctly returned empty set, but the "touched" and "eliminated" values are wrong. It should have eliminated this block.

More examples with CHAR(1)

CHAR(1) CHARACER SET cp1251

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET cp1251) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('г' /*_cp1251 x'E3'*/);
SELECT a, HEX(a) FROM t1;

+------+--------+
| a    | HEX(a) |
+------+--------+
| г    | E3     |
+------+--------+

Looks good so far.

SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | FFFFFFFFFFFFFFE3 | FFFFFFFFFFFFFF80 |
+--------------+------------+-------------+------------------+------------------+

max_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a='д'/*_cp1251 x'E4'*/;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.046 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

CHAR(1) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('a');
SELECT a, HEX(a) FROM t1;

+------+--------+
| a    | HEX(a) |
+------+--------+
| a    | 61     |
+------+--------+

Looks good so far.

SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | 61               | FFFFFFFF80000000 |
+--------------+------------+-------------+------------------+------------------+

max_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a='b';
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.040 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.



 Comments   
Comment by Alexander Barkov [ 2021-09-22 ]

More examples with CHAR(2)

CHAR(2) CHARACTER SET latin1

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2) CHARACTER SET latin1 COLLATE latin1_swedish_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('ÿ€' /*_latin1 x'FF80'*/);
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;
DO calsettrace(1);

+-------+--------+
| a     | HEX(a) |
+-------+--------+
| ÿ€    | FF80   |
+-------+--------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | 7FFF             | FFFFFFFFFFFF80FF |
+--------------+------------+-------------+------------------+------------------+

min_value looks wrong.

SELECT * FROM t1 WHERE a=x'FF7F' /* 'ÿ<DEL>' */;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.010 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

CHAR(2) CHARACTER SET koi8u

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2) CHARACTER SET koi8u COLLATE koi8u_general_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('Ъ─') /*_koi8u x'FF80'*/;
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+-------+--------+
| a     | HEX(a) |
+-------+--------+
| Ъ─    | FF80   |
+-------+--------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | 7FFF             | FFFFFFFFFFFF80FF |
+--------------+------------+-------------+------------------+------------------+

min_value Looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a='Ъя' /*_koi8u x'FFF1'*/;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.016 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

CHAR(2) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(2) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('aa');
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+------+--------+
| a    | HEX(a) |
+------+--------+
| aa   | 6161   |
+------+--------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | 6161             | NULL             |
+--------------+------------+-------------+------------------+------------------+

max_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a='bb';
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

Comment by Alexander Barkov [ 2021-09-22 ]

More examples with CHAR(4)

CHAR(4) CHARACTER SET latin1

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(4) CHARACTER SET latin1 COLLATE latin1_swedish_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('ÿÿÿ€' /*_latin1 x'FFFFFF80'*/);
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+-----------+----------+
| a         | HEX(a)   |
+-----------+----------+
| ÿÿÿ€      | FFFFFF80 |
+-----------+----------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | 7FFFFFFF         | FFFFFFFF80FFFFFF |
+--------------+------------+-------------+------------------+------------------+

min_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a=x'FFFFFF7F' /* 'ÿÿÿ<DEL>' */;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.016 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

CHAR(4) CHARACTER SET koi8u

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(4) CHARACTER SET koi8u COLLATE koi8u_general_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('ЪЪЪ─') /*_koi8u x'FFFFFF80'*/;
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+-----------+----------+
| a         | HEX(a)   |
+-----------+----------+
| ЪЪЪ─      | FFFFFF80 |
+-----------+----------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | 7FFFFFFF         | FFFFFFFF80FFFFFF |
+--------------+------------+-------------+------------------+------------------+

min_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a='ЪЪЪя' /*_koi8u x'FFFFFFF1'*/;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.041 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

Comment by Alexander Barkov [ 2021-09-22 ]

More examples with CHAR(8)

CHAR(8) CHARACTER SET latin1

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(8) CHARACTER SET latin1 COLLATE latin1_swedish_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('ÿÿÿÿÿÿÿ€' /*_latin1 x'FFFFFFFFFFFFFF80'*/);
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+-------------------+------------------+
| a                 | HEX(a)           |
+-------------------+------------------+
| ÿÿÿÿÿÿÿ€          | FFFFFFFFFFFFFF80 |
+-------------------+------------------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | NULL             | 8100000000000000 |
+--------------+------------+-------------+------------------+------------------+

min_value looks wrong. max_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a=x'FFFFFFFFFFFFFF7F' /* 'ÿÿÿÿÿÿÿ<DEL>' */;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.020 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

CHAR(8) CHARACTER SET koi8u

SET NAMES utf8;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a CHAR(8) CHARACTER SET koi8u COLLATE koi8u_general_ci) ENGINE=ColumnStore;
INSERT INTO t1 VALUES ('ЪЪЪЪЪЪЪ─') /*_koi8u x'FFFFFFFFFFFFFF80'*/;
SELECT a, HEX(a) FROM t1;
SELECT
  c.table_schema,
  c.table_name,
  c.column_name,
  hex(e.min_value),
  hex(e.max_value)
FROM
  information_schema.columnstore_extents e,
  information_schema.columnstore_columns c
WHERE c.table_schema='test'
  AND c.table_name='t1'
  AND c.column_name='a'
  AND c.object_id=e.object_id;

+-------------------+------------------+
| a                 | HEX(a)           |
+-------------------+------------------+
| ЪЪЪЪЪЪЪ─          | FFFFFFFFFFFFFF80 |
+-------------------+------------------+

+--------------+------------+-------------+------------------+------------------+
| table_schema | table_name | column_name | hex(e.min_value) | hex(e.max_value) |
+--------------+------------+-------------+------------------+------------------+
| test         | t1         | a           | NULL             | 8100000000000000 |
+--------------+------------+-------------+------------------+------------------+

min_value look wrong. max_value looks wrong.

DO calsettrace(1);
SELECT * FROM t1 WHERE a='ЪЪЪЪЪЪЪя' /*_koi8u x'FFFFFFFFFFFFFFF1'*/;
SELECT
  regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
  regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
DO calsettrace(0);

Empty set, 1 warning (0.018 sec)

+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-1; | PartitionBlocksEliminated-0; |
+------------------+------------------------------+

Looks wrong. The block should have been eliminated.

Comment by Daniel Lee (Inactive) [ 2021-12-20 ]

Build verified: 6.3.1-1 (#3562)

Repeated tests cases in ticket. Partition block eliminated.

MariaDB [mytest]> SELECT
    ->   regexp_substr(calgettrace(1),'(BlocksTouched-[^;]*);') AS touched,
    ->   regexp_substr(calgettrace(1),'(PartitionBlocksEliminated-[^;]*);') AS eliminated;
+------------------+------------------------------+
| touched          | eliminated                   |
+------------------+------------------------------+
| BlocksTouched-0; | PartitionBlocksEliminated-1; |
+------------------+------------------------------+
1 row in set (0.000 sec)

Generated at Thu Feb 08 02:53:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.