[MCOL-4529] Design the proper way of doing extent elimination on varchars with utf8 or utf8mb4 Created: 2021-02-02 Updated: 2021-04-15 Resolved: 2021-03-11 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | Documentation |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | Task | Priority: | Critical |
| Reporter: | Gregory Dorman (Inactive) | Assignee: | Sergey Zefirov |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Epic Link: | ColumnStore Performance Improvements | ||||||||||||||||||||||||
| Sprint: | 2021-2, 2021-3, 2021-4, 2021-5 | ||||||||||||||||||||||||
| Description |
|
Overall - the top ticket (4522) is a massive regression. Starting in 5.5, the true length of varchar and char conforms to the SQL standard and incorporates byte counts for character sets such as utf8 and utf8mb4. Up to and including 5.4, the length was computed incorrectly: only as many bytes were reserved as there were characters in the column DDL. The side effect is exemplified by the customer complaint: a varchar(5) under the old utf8 handling used to fit, and filters on it were subject to extent elimination. With 5.5 it no longer fits, which causes ColumnStore to essentially ignore an otherwise tight filter and to proceed with a full scan. |
| Comments |
| Comment by Sergey Zefirov [ 2021-02-04 ] | ||||||||||||||||||||||||||
|
Roman proposed to split the extent map by column width, and I would like to piggyback on that idea. I propose to keep ranges in maps indexed by key length. Right now the structure is as follows:
What I propose is the following:
And the extent map will now contain several "extentarrays", one for each key length. It is easy to define maximum and minimum values for integers. Floating-point numbers can also be handled, possibly with the one exception of negative zero. For strings of arbitrary length, the minimum and maximum can be defined as follows:
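As an illustration only (none of these names come from the actual ColumnStore sources), the proposed split could look like one range array per key width, with 16-byte slots wide enough for the largest key:

```cpp
#include <map>
#include <vector>

// Hypothetical sketch: one "extentarray" per key length (1, 2, 4, 8 or
// 16 bytes), instead of a single flat extent array.  Min/max values are
// stored in 16-byte slots, left-aligned, so string prefixes fit as well.
struct Range128 {
    unsigned char min[16];  // value or string prefix
    unsigned char max[16];
};

struct ExtentMapSketch {
    // key: key length in bytes -> the range array for that width
    std::map<int, std::vector<Range128>> arraysByKeyLen;
};
```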
Please note that we consider the minimum prefix to be extended to the right with zero bytes infinitely, and the maximum to be extended to the right with 0xff bytes infinitely. That makes comparing key-length prefixes of strings with the maximums and minimums valid: the minimum is less than or equal to any string in the range, and the maximum is greater than or equal to any string in the range. The prefix of the string preserves the utf8 encoding, except possibly for the last symbol. That last symbol can be dropped, and the first several symbols can be used with the collation for better extent use or elimination. | ||||||||||||||||||||||||||
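The padding rule above can be made concrete with a small comparison helper (a sketch, not ColumnStore code): a boundary is conceptually extended with 0x00 bytes for a minimum and 0xff bytes for a maximum, and the probe string is treated as 0x00-padded as well, so a string equal to a minimum prefix compares equal to it:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Compare a finite string against a range boundary that is conceptually
// extended to the right forever: with 0x00 bytes for a minimum boundary,
// 0xff bytes for a maximum one.  Returns <0, 0, >0 like memcmp.
int cmpPadded(const std::string& s, const std::string& bound, uint8_t pad)
{
    const std::size_t n = std::max(s.size(), bound.size());
    for (std::size_t i = 0; i < n; ++i) {
        uint8_t a = i < s.size() ? uint8_t(s[i]) : uint8_t(0x00);
        uint8_t b = i < bound.size() ? uint8_t(bound[i]) : pad;
        if (a != b) return a < b ? -1 : 1;
    }
    return 0;
}
```

With this, `cmpPadded(s, minPrefix, 0x00) >= 0 && cmpPadded(s, maxPrefix, 0xff) <= 0` decides whether the string may belong to the extent's range.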
| Comment by Sergey Zefirov [ 2021-02-04 ] | ||||||||||||||||||||||||||
|
The extent map is internally defined as a preallocated array because of the way changes are handled: changes are recorded as n-byte values starting at some memory address(es). When something is about to be changed, its value (the n bytes of it) is first recorded in the rollback record, and only then is it changed. ExtentMap inherits from the Undoable class and uses the default implementation. Thus, we should keep the "array of records" design if we do not want to change the in-memory rollback mechanism. We can do that by having several arrays instead of one, and possibly even save some memory; I'll expand on that soon. Atomic update of several ranges with different key lengths is also possible: we can acquire the locks for each individual key length sequentially, using some fixed order on key lengths (ascending or any other). The same lock-acquisition order across all mutator processes prevents deadlock. | ||||||||||||||||||||||||||
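The record-then-change scheme can be sketched as follows; this mirrors only the spirit of the default Undoable behaviour described above (the names here are illustrative, not the real ColumnStore class):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal sketch of record-then-change rollback: before mutating n bytes
// at some address, the old bytes are appended to an undo log; rollback
// replays the log in reverse to restore every touched region.
class UndoLog {
    struct Entry { void* addr; std::vector<unsigned char> old; };
    std::vector<Entry> log_;
public:
    void makeUndoRecord(void* addr, std::size_t n) {
        Entry e{addr, std::vector<unsigned char>(n)};
        std::memcpy(e.old.data(), addr, n);  // save the bytes BEFORE the change
        log_.push_back(std::move(e));
    }
    void rollback() {
        for (auto it = log_.rbegin(); it != log_.rend(); ++it)
            std::memcpy(it->addr, it->old.data(), it->old.size());
        log_.clear();
    }
};
```

Because the undo record is just "n bytes at an address", records must not move in memory between the save and a possible rollback, which is why the "array of records" layout matters.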
| Comment by Sergey Zefirov [ 2021-02-04 ] | ||||||||||||||||||||||||||
|
We can use an "array of arrays" storage scheme to keep allocations smaller. The idea is described here, in the Compact Dynamic Arrays section. Instead of fully dynamic arrays, we can allocate only the needed parts of a static array: a sqrt(N)-sized directory of pointers to several sqrt(N)-sized subarrays, where N is the maximum number of extent records kept. This way record addresses stay the same for our implementation of Undoable, with not more than O(sqrt(N)) overhead over the number of extents actually used. | ||||||||||||||||||||||||||
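A minimal sketch of that layout, assuming N is fixed up front: a directory of ceil(sqrt(N)) block pointers, each block of ceil(sqrt(N)) records allocated on first touch. Blocks are never reallocated, so record addresses stay stable, which is what the rollback mechanism needs. Names are illustrative:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Blocked ("array of arrays") storage: a fixed directory of block
// pointers; each block is allocated lazily and never moved afterwards.
template <typename Rec>
class BlockedArray {
public:
    explicit BlockedArray(std::size_t maxRecords)
        : blockSize_(static_cast<std::size_t>(
              std::ceil(std::sqrt(double(maxRecords))))),
          dir_((maxRecords + blockSize_ - 1) / blockSize_, nullptr) {}
    ~BlockedArray() { for (Rec* b : dir_) delete[] b; }

    Rec& at(std::size_t i) {
        Rec*& block = dir_[i / blockSize_];
        if (!block) block = new Rec[blockSize_]();  // allocate on first touch
        return block[i % blockSize_];               // address stays stable
    }
    std::size_t allocatedBlocks() const {
        std::size_t n = 0;
        for (Rec* b : dir_) if (b) ++n;
        return n;
    }
private:
    std::size_t blockSize_;
    std::vector<Rec*> dir_;
};
```

Memory overhead is the directory plus at most one partially used block per touched region, i.e. O(sqrt(N)) beyond the records in use.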
| Comment by Roman [ 2021-02-05 ] | ||||||||||||||||||||||||||
|
sergey.zefirov There will be no uint128-based dtypes, so we do not need this integral type. | ||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-08 ] | ||||||||||||||||||||||||||
|
Locking in ExtentMap is done through a call to grabEMEntryTable(OPS op), where the operation op can be READ or WRITE, and it locks the whole table at once. For more efficient split access we need to lock multiple tables, and for that we need to establish a locking order on them; without lock ordering we can get deadlocks. The easiest ordering for extent map operations is by the length of the key in an extent. We have a fixed number of key lengths (1, 2, 4, 8 and 16 bytes), so a set of key lengths to lock can be encoded as a single integer value with some bits set. Thus grabEMEntryTable changes to grabEMEntryTables(OPS op, smallintset keyLenSet). The operation is still READ or WRITE; the key length set is a set of small integers i (i = 0..4) for key lengths 2^i, represented as a structure with a single field. The corresponding release operation should receive the same parameters and unlock the tables in the reverse of the locking order. Introducing the smallintset structure helps thread the values through with types: we can provide singleton, union and enumeration operations, as well as operations for collections of LBID-associated ranges. | ||||||||||||||||||||||||||
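A sketch of the proposed smallintset and the ordered acquisition it enables (hypothetical names; "locking" here is reduced to recording the acquisition order, since the point is the ordering discipline, not the lock primitive):

```cpp
#include <cstdint>
#include <vector>

// A set of small integers i (one per key length 2^i bytes) packed into
// the bits of a single integer field, as proposed above.
struct smallintset {
    uint8_t bits = 0;
    static smallintset singleton(int i) {
        smallintset s; s.bits = uint8_t(1u << i); return s;
    }
    smallintset unite(smallintset o) const {
        smallintset s; s.bits = uint8_t(bits | o.bits); return s;
    }
    bool contains(int i) const { return (bits >> i) & 1u; }
};

// Acquire per-table locks in ascending key-length order.  Every process
// using the same ascending order cannot deadlock; release would walk
// the returned order in reverse.
std::vector<int> grabEMEntryTables(smallintset keyLenSet) {
    std::vector<int> order;
    for (int i = 0; i < 8; ++i)
        if (keyLenSet.contains(i))
            order.push_back(i);  // lock table for key length 2^i here
    return order;
}
```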
| Comment by Sergey Zefirov [ 2021-02-10 ] | ||||||||||||||||||||||||||
|
I definitely should record what Roman suggested: we may check extents holding textual information with a simple comparison of prefixes: extent_range_min <= SUBSTR(string, 0, LENGTH(extent_range_min)) && SUBSTR(string, 0, LENGTH(extent_range_max)) <= extent_range_max The "less than or equal" comparison is collation-aware. We keep ranges for TOKEN column extents in 16-byte (128-bit) keys. These TOKEN columns provide information about the string layout in the dictionary file. Please note that TOKEN columns themselves are 8 bytes wide, which can be a source of confusion. The string prefixes in the ranges are binary: no collation is applied at the time the strings are written into the dictionary files. Collation, if any, is applied at the extent elimination phase. | ||||||||||||||||||||||||||
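The check can be written as a small helper, with a plain bytewise comparison standing in for the collation-aware one (SUBSTR here follows the 0-based pseudocode above; the name is illustrative):

```cpp
#include <string>

// May the string fall inside the extent's [rangeMin, rangeMax] prefix
// range?  Truncate the probe to each boundary's length and compare.
// std::string's operator<= is a bytewise comparison; a collation-aware
// comparator would replace it at the extent elimination phase.
bool mayBeInExtent(const std::string& s,
                   const std::string& rangeMin,
                   const std::string& rangeMax)
{
    return rangeMin <= s.substr(0, rangeMin.size())
        && s.substr(0, rangeMax.size()) <= rangeMax;
}
```

A false positive (the extent is scanned although the string is absent) is harmless; a false negative would lose rows, which is why the truncation must never make the range tighter than the stored prefixes.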
| Comment by Sergey Zefirov [ 2021-02-10 ] | ||||||||||||||||||||||||||
|
Elena Starikova from the server team shared this interesting bug from MySQL: https://bugs.mysql.com/bug.php?id=20247 With binary collation, the strings "B", "D" and "Č" (accented C) produce the range "B".."Č", which is both excessively large and, when its endpoints are compared with the proper collation, would not include "D" at all. Using the proper collation when building the range would give "B".."D", but would preclude the use of any other collation. | ||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-10 ] | ||||||||||||||||||||||||||
|
Alexander Barkov explained that MariaDB does not optimize "some_column COLLATE collation" if the specified collation differs from the column's collation. Still, we need to store the binary (utf-8) representation of the strings so that different collations can be applied. | ||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-11 ] | ||||||||||||||||||||||||||
|
Started a document on this, shared it with Greg and Roman. | ||||||||||||||||||||||||||
| Comment by Gregory Dorman (Inactive) [ 2021-03-11 ] | ||||||||||||||||||||||||||
|
This task was to produce the design, and that was accomplished. Implementation is now tracked as MCOL-4580. |