[MCOL-4522] calGetTrace shows double LIO from Community 5.4.1 to Enterprise 5.5.1 Created: 2021-01-29  Updated: 2021-04-15  Resolved: 2021-04-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 5.5.1
Fix Version/s: 5.5.2

Type: Bug Priority: Major
Reporter: Todd Stoffel (Inactive) Assignee: Gregory Dorman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MCOL-4529 Design the proper way of doing extent... Closed
is blocked by MCOL-4580 Extent's approximate range keeping fo... In Testing
PartOf
includes MCOL-4527 Simple query performace is degraded b... Closed

 Description   

SELECT calSetTrace(1);

SELECT addresskey,
       cityname,
       statecode,
       zip
FROM   mailing_addresses_cis
WHERE  zip = '33776';

SELECT calGetTrace();

Community 5.4.1 Results:

Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM mailing_addresses_cis 4390 (addresskey,cityname,statecode,zip) 0 109325 49152 0.815 5679
TNS UM - - - - - - 0.801 5679

Enterprise 5.5.1 Results:

Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM mailing_addresses_cis 4390 (addresskey,cityname,statecode,zip) 0 288623 0 1.604 5679
TNS UM - - - - - - 1.585 5679

It appears that the Logical I/O is more than double in enterprise, we see 0 Partition Blocks Eliminated, and the query time is almost double.



 Comments   
Comment by David Hall (Inactive) [ 2021-02-01 ]

In 5.5, in order to be in line with InnoDB, we changed the way we declare char and varchar. It used to be declarations of length were in bytes, now length is declared in number of characters. This means that a varchar(5) declared as UTF will create a field with max length of 15 (3 being the largest number of bytes for a UTF character). The largest number of bytes that can currently be stored in the data file of a varchar is 7 (1st byte is length). So the ZIP varchar(5) becomes a dictionary field. Dictionary fields cannot currently be used in extent elimination, so it requires a full scan.

In the future we hope to implement 16 byte char data files (15 usable bytes for varchar), which means this particular definition would again participate in extent elimination. In addition we are working on a design to allow dictionary columns to participate in extent elimination. However, neither of these efforts has been scheduled.

There are two workarounds:
1) declare the field as a numeric type. In all cases, if a choice to declare a field as numeric is possible, it should be done:
`ZIP` int NOT NULL,
2) if it is impossible to declare the field as numeric, but it can be declared as latin1, it should be done. Better yet, declare it with collate latin1_bin:
`ZIP` varchar(5) character set 'latin1' COLLATE latin1_bin NOT NULL,

When used in a WHERE clause. numeric fields are significantly faster than character fields, even when those character fields can participate in extent elimination.

Comment by Gregory Dorman (Inactive) [ 2021-02-21 ]

This depends on 4529, which will be 6.1.1 at best...

In 5.5.2 the situation is improved due to 4527, which restores the performance for latin1 charsets at least. utf8 series require more work.

Comment by Gregory Dorman (Inactive) [ 2021-04-04 ]

Tested in 5.5.2 GA, with latin1. extent elimination of character columns in utf8 needs to wait for 6.1.1

Generated at Thu Feb 08 02:50:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.