[MCOL-5451] Repeated read on large disk agg query inconsistent Created: 2023-03-10  Updated: 2023-11-17  Resolved: 2023-04-01

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: None
Fix Version/s: 23.02.2

Type: Bug Priority: Major
Reporter: Allen Herrera Assignee: Roman
Resolution: Fixed Votes: 2
Labels: None

Attachments: XML File Columnstore_xml_diskagg_1thr.xml    
Assigned for Review: Leonid Fedorov Leonid Fedorov
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

Large group by on 40 billion records has different results when run. Implying something is broken with disk agg/query execution

Setup:

mcsSetConfig HashJoin TotalUmMemory 10%
mcsSetConfig DBBC NumBlocksPct 10
mcsSetConfig RowAggregation AllowDiskBasedAggregation Y
systemctl stop mariadb-columnstore
systemctl start mariadb-columnstore
 
create database if not exists test; use test;
create table test_256 (a varchar(30)) ENGINE=Columnstore DEFAULT CHARACTER SET=utf8;
cpimport test test_256 /data/split_twohundredfiftysixth_1

Problem Query:

time mariadb test -qse "select count(*) from (select count(a) from test_256 GROUP BY a) s;

Data in first comment

Proof

[root@ip-172-31-19-93 data]# time mariadb test -qse "select count(*) from (select count(a) from test_256 GROUP BY a) s;"
count(*)
142064923
real    0m40.239s
[root@ip-172-31-19-93 data]# time mariadb test -qse "select count(*) from (select count(a) from test_256 GROUP BY a) s;"
count(*)
142022851
real    0m38.298s
[root@ip-172-31-19-93 data]# time mariadb test -qse "select count(*) from (select count(a) from test_256 GROUP BY a) s;"
count(*)
142016464
real    0m37.936s



 Comments   
Comment by Daniel Lee (Inactive) [ 2023-04-01 ]

Build verified: 23.02.2 (Jenkins RC)

Repeat execution returned consistent results. Results also matched that of InnoDB.

MariaDB [tpch1]> select count(*) from lineitem;
+----------+
| count(*) |
+----------+
|  6001215 |
+----------+
1 row in set (0.113 sec)
 
MariaDB [tpch1]> select count(*) from (select count(l_comment) from lineitem GROUP BY l_comment) s;
+----------+
| count(*) |
+----------+
|  4501941 |
+----------+
1 row in set (1.540 sec)

Generated at Thu Feb 08 02:57:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.