[MCOL-4950] Extent elimination for floating point values Created: 2021-12-20 Updated: 2022-01-05 |
|
| Status: | Open |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | New Feature | Priority: | Major |
| Reporter: | Sergey Zefirov | Assignee: | Sergey Zefirov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Floating point values are encoded using some cryptic scheme that appears to be not easily ordered using integer comparison rules. But that appearance is false - floating point values can easily be translated into unsigned integers and compared using unsigned integer comparison. The following is an idea, not tested in code. For our purposes we will distinguish three kinds of values:
I will use FP32 as example as they are shorter to write, but reasoning applies to FP64 values as well. Positive FP32 values have form 0 EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM - 0, then 8 bits of exponent and 23 bits of mantissa. exponent_bias constant for FP32 is 127. Exponent of all zeroes is reserved to represent denormalized values, most useful of them is 0.0: 0 00000000 00000000000000000000000. Exponent of all ones is for NAN (mantissa is not 0) and Inifinity (mantissa is 0) values. Least exponent of normalized FP32 value (-127) is encoded as 00000001 (-127+128), zero exponent is encoded as 128 and biggest exponent (126) isi encoded as 11111110. Note that exponents are ordered as unsigned values. Mantissa represents fractional part Thus, positive normalized FP values and zero can be ordered as unsigned values: zero is less than any non-zero value, values with different exponents are ordered by their exponent values and values with eqeual exponents are ordered by mantissa values, and implicit integerr part 1 can be ignored in that case. The problem is with negative values - they are ordered backwards. We can emply the following transformation (FP32 is a floating point value cast to bit-by-bit equal unsigned integer): FP32UINT = SIGN(FP32) ? 0x80000000 - FP32 : 0x80000000 + FP32. In that case, all negative values will have ordering from largest to smallest (because we reversed theirordering) and all positive values, including zero, will have ordering from smallest to largest. The NULL value for unsigned 32-bit integers is 0xfffffffe, which is NAN and NAN is used to represent NULL value inside MCS. We may account for that in our transformation code above. In the end, we can convert between floating point values and unsigned integers and use existing extentmap functionality to eliminate extents with floating point values. |
| Comments |
| Comment by David Hall (Inactive) [ 2021-12-20 ] |
|
Be careful: Floating point encoding may not be consistent between architectures (Say ARM64). It may be fine, but check. |