Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4950

Extent elimination for floating point values

    XMLWordPrintable

Details

    • New Feature
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • Icebox
    • None
    • None

    Description

      Floating point values are encoded using some cryptic scheme that appears to be not easily ordered using integer comparison rules. But that appearance is false - floating point values can easily be translated into unsigned integers and compared using unsigned integer comparison.

      The following is an idea, not tested in code.

      For our purposes we will distinguish three kinds of values:

      • regular FP values in the form S*1.MMMMM*2^(EEEE+exponent_bias), where S is the sign (1, encoded as 0, or -1, encoded as 1), MMMM are mantissa bits and EEEE are exponent bits; exponent_bias is an offset for exponent so that exponent can be compared using unsigned comparison rules;
      • zero - encoded as all zeroes except for sign bit and can be negative or positive zero; we should normalize zero to be only positive zero (0).
      • NAN, not a number, MCS uses - positive or negative, exponent bits are all 1, mantissa bits are anything but all 0.

      I will use FP32 as example as they are shorter to write, but reasoning applies to FP64 values as well.

      Positive FP32 values have form 0 EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM - 0, then 8 bits of exponent and 23 bits of mantissa. exponent_bias constant for FP32 is 127.

      Exponent of all zeroes is reserved to represent denormalized values, most useful of them is 0.0: 0 00000000 00000000000000000000000. Exponent of all ones is for NAN (mantissa is not 0) and Inifinity (mantissa is 0) values. Least exponent of normalized FP32 value (-127) is encoded as 00000001 (-127+128), zero exponent is encoded as 128 and biggest exponent (126) isi encoded as 11111110. Note that exponents are ordered as unsigned values.

      Mantissa represents fractional part of a value 1.x (1 is implicit for normalized FP values). Mantissa values also can be ordered as unsigned values.

      Thus, positive normalized FP values and zero can be ordered as unsigned values: zero is less than any non-zero value, values with different exponents are ordered by their exponent values and values with eqeual exponents are ordered by mantissa values, and implicit integerr part 1 can be ignored in that case.

      The problem is with negative values - they are ordered backwards.

      We can emply the following transformation (FP32 is a floating point value cast to bit-by-bit equal unsigned integer): FP32UINT = SIGN(FP32) ? 0x80000000 - FP32 : 0x80000000 + FP32. In that case, all negative values will have ordering from largest to smallest (because we reversed theirordering) and all positive values, including zero, will have ordering from smallest to largest.

      The NULL value for unsigned 32-bit integers is 0xfffffffe, which is NAN and NAN is used to represent NULL value inside MCS. We may account for that in our transformation code above.

      In the end, we can convert between floating point values and unsigned integers and use existing extentmap functionality to eliminate extents with floating point values.

      Attachments

        Activity

          People

            sergey.zefirov Sergey Zefirov
            sergey.zefirov Sergey Zefirov
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.