[MCOL-4738] AVG returns a wrong result Created: 2021-05-28  Updated: 2021-07-09  Resolved: 2021-07-09

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 6.1.1
Fix Version/s: 6.1.1

Type: Bug Priority: Blocker
Reporter: Alexander Barkov Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: regression

Sprint: 2021-8, 2021-9

 Description   

This problem is not repeatable with the develop-5 branch.

This problem is repeatable with the develop branch on Fedora (5.5.9-200.fc31.x86_64), only with RelWithDebInfo build.
Debug build returns expected results (work exactly in the same way with InnoDB).

I run this script:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT, b CHAR(5))ENGINE=Columnstore;
INSERT INTO t1 VALUES (NULL, ''),(1, 'aaa'),(2, 'aaa'),(3, 'ccc'),(4, 'ddd'),(5, 'aaa'),(6, 'ddd'),(7, 'eee');
SELECT * FROM t1;

+------+------+
| a    | b    |
+------+------+
| NULL | NULL |
|    1 | aaa  |
|    2 | aaa  |
|    3 | ccc  |
|    4 | ddd  |
|    5 | aaa  |
|    6 | ddd  |
|    7 | eee  |
+------+------+

Looks good so far.

SELECT a, AVG(a) FROM t1 GROUP BY a ORDER BY a;

+------+--------+
| a    | AVG(a) |
+------+--------+
| NULL |   NULL |
|    1 |   NULL |
|    2 |   NULL |
|    3 | 0.0000 |
|    4 |   NULL |
|    5 | 0.0000 |
|    6 | 0.0000 |
|    7 | 0.0000 |
+------+--------+

Looks wrong.

SELECT AVG(DISTINCT a) FROM t1;

+-----------------+
| AVG(DISTINCT a) |
+-----------------+
|            NULL |
+-----------------+

Also looks wrong.

If I change ENGINE to InnoDB, it returns expected results:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT, b CHAR(5)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (NULL, ''),(1, 'aaa'),(2, 'aaa'),(3, 'ccc'),(4, 'ddd'),(5, 'aaa'),(6, 'ddd'),(7, 'eee');

SELECT a, AVG(a) FROM t1 GROUP BY a ORDER BY a;

+------+--------+
| a    | AVG(a) |
+------+--------+
| NULL |   NULL |
|    1 | 1.0000 |
|    2 | 2.0000 |
|    3 | 3.0000 |
|    4 | 4.0000 |
|    5 | 5.0000 |
|    6 | 6.0000 |
|    7 | 7.0000 |
+------+--------+

SELECT AVG(DISTINCT a) FROM t1;

+-----------------+
| AVG(DISTINCT a) |
+-----------------+
|          4.0000 |
+-----------------+



 Comments   
Comment by David Hall (Inactive) [ 2021-06-30 ]

AMD64 machines use 80 bits for long double, but (as a default) store the values in 128 bits.

After much testing, I've determined that gcc does not fill these unused bits, but rather leaves them undisturbed. Since we may compare rows using memcmp, any uninitialized bits such as this will compromise the compare. So we need to mask these bits to something so that memcmp works.

In order to work for systems (such as ARM) that use 128 bit long doubles rather than 80- bit, we can use

    int64_t d = std::numeric_limits<long double>::digits;  // Number of base 2 digits (64 for 80 bit fpu)
    int64_t e = std::numeric_limits<long double>::max_exponent; // max base 2 exponent (16384 for 80 bit fpu)

To determine if masking is needed.

Comment by David Hall (Inactive) [ 2021-07-07 ]

Put logic in cmake to test for an 80 bit fpu and set a compiler flag. Then added an ifdef into setLongDoubleField to mask out the unused bits.

Comment by Roman [ 2021-07-07 ]

I am not sure if it is possible to test this w/o developer ATM. The patch affects ARM or custom builds on either Fedora(we don't have it in our CI)
or using Ninja instead of CMake.
I will ask bar and tntnatbry to test the patch.

Comment by Gagan Goel (Inactive) [ 2021-07-07 ]

David.Hall, drrtuy The patch still does not fix Ninja RelWithDebInfo builds. I am re-opening the ticket:

MariaDB [test]> create table t1 (a int)engine=columnstore;
Query OK, 0 rows affected (0.451 sec)
 
MariaDB [test]> insert into t1 values (1), (2);
Query OK, 2 rows affected (0.197 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [test]>  select avg(a) from t1;
+--------+
| avg(a) |
+--------+
|   NULL |
+--------+
1 row in set (0.050 sec)

Comment by Alexander Barkov [ 2021-07-07 ]

The patch does not fix the problem on my Fedora box.

Comment by David Hall (Inactive) [ 2021-07-07 ]

I came up with a solution that should work wit strict_aliasing

Comment by Gagan Goel (Inactive) [ 2021-07-08 ]

Changes look good to me. I have confirmed Ninja RelWithDebInfo builds are now giving expected results.

bar Can you please confirm if the problem is also now fixed in your Fedora box?

Comment by Gagan Goel (Inactive) [ 2021-07-09 ]

Bar confirmed the issue is also fixed on Fedora. Closing this as fixed.

Generated at Thu Feb 08 02:52:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.