Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-5104

Statistic Functions improvement: STDDEV_SAMP, STDDEV_POP, VAR_POP, VAP_SAMP

    XMLWordPrintable

Details

    • New Feature
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • 6.2.3
    • PrimProc
    • None
    • 2021-17

    Description

      Columnstore statistic functions STDDEV_SAMP, STDDEV_POP, VAR_POP, VAP_SAMP
      use naive algorithm, that looses precision dramatically
      see example

      MariaDB [test]> create table t(a bigint) engine columnstore;
      Query OK, 0 rows affected (0.344 sec)
       
      MariaDB [test]> insert into t value (80000000001);
      Query OK, 1 row affected (0.144 sec)
       
      MariaDB [test]> insert into t value (80000000003);
      Query OK, 1 row affected (0.076 sec)
       
      MariaDB [test]> select std(a) from t;
      +--------+
      | std(a) |
      +--------+
      | 0.0000 |
      +--------+
      1 row in set (0.059 sec)
       
      MariaDB [test]> create table tm(a bigint);
      Query OK, 0 rows affected (0.010 sec)
       
      MariaDB [test]> insert into tm value (80000000003);
      Query OK, 1 row affected (0.001 sec)
       
      MariaDB [test]> insert into tm value (80000000001);
      Query OK, 1 row affected (0.001 sec)
       
      MariaDB [test]> select std(a) from tm;
      +--------+
      | std(a) |
      +--------+
      | 1.0000 |
      +--------+
      1 row in set (0.001 sec)}}
      

      Same problem we have in twin window functions.
      Because of this problem we have different behavior on ARM and x86, and broken tests also.
      We should replace naive algorithm with more stable like server does

      Also Columnstore functions work improperly with decimals

      MariaDB [test]> select * from tdec;
      +--------------------------------------+
      | col1                                 |
      +--------------------------------------+
      | 800000000000000000000000000000000001 |
      | 800000000000000000000000000000000003 |
      | 800000000000000000000000000000000005 |
      | 800000000000000000000000000000000007 |
      | 800000000000000000000000000000000011 |
      | 800000000000000000000000000000000013 |
      | 800000000000000000000000000000000015 |
      | 800000000000000000000000000000000017 |
      | 800000000000000000000000000000000019 |
      | 800000000000000000000000000000000021 |
      +--------------------------------------+
      10 rows in set (0.045 sec)
       
      MariaDB [test]> SELECT 'q1', floor(STD(col1) OVER ()) AS std FROM tdec;
      +----+------+
      | q1 | std  |
      +----+------+
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      | q1 | NULL |
      +----+------+
      10 rows in set (0.010 sec)
      

      compare with server

      MariaDB [test]> select * from tdecserv;
      +--------------------------------------+
      | col1                                 |
      +--------------------------------------+
      | 800000000000000000000000000000000001 |
      | 800000000000000000000000000000000003 |
      | 800000000000000000000000000000000005 |
      | 800000000000000000000000000000000007 |
      | 800000000000000000000000000000000011 |
      | 800000000000000000000000000000000013 |
      | 800000000000000000000000000000000015 |
      | 800000000000000000000000000000000017 |
      | 800000000000000000000000000000000019 |
      | 800000000000000000000000000000000021 |
      +--------------------------------------+
      10 rows in set (0.000 sec)
       
      MariaDB [test]> SELECT 'q1', floor(STD(col1) OVER ()) AS std FROM tdecserv;
      +----+------+
      | q1 | std  |
      +----+------+
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      | q1 |    0 |
      +----+------+
      10 rows in set (0.001 sec)
      

      All of this should be fixed

      See
      Algorythms

      Attachments

        Issue Links

          Activity

            People

              Andrey Andrey Piskunov (Inactive)
              leonid.fedorov Leonid Fedorov
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.