[MCOL-4316] columnstore query with aggregation and counts returning incorrect data with mysql, jdbc, but correct with odbc Created: 2020-09-18  Updated: 2021-05-02  Resolved: 2020-11-07

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: 1.5.3
Fix Version/s: 5.5.1

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

single node community addition



 Description   

our columnstore database (community edition) has incorrect results from direct query tools but works correctly from an external link in SQL Server using ODBC.



 Comments   
Comment by Todd Stoffel (Inactive) [ 2020-09-18 ]

I am not able to reproduce this issue. We'll need information about the connectors involved and their versions.

Comment by David Hill (Inactive) [ 2020-09-18 ]

We are using this JDBC driver that gets the wrong data:
mariadb-java-client-2,6,2,jar

I also got the wrong data using mysql client running in Ubuntu:
mysql Ver 15.1 Distrib 10.3.22-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

I am also attaching an export of the columnstore table. This table just has FK values which you can then add corresponding data in the children tables shown in the join.

Comment by David Hill (Inactive) [ 2020-09-18 ]

from customer

Please note also that the table rslt_somatic_variation is set for snappy compression and we insert the data in one statement from a staging table to get the best compression. You should be able to get a similar compression with import.

Comment by Todd Stoffel (Inactive) [ 2020-11-07 ]

ColumnStore is a columnar based engine with no indexes. As such, the data is returned in non-deterministic order. The sample query provided was a cross engine join and was ordered by an InnoDB field only. Once that order is satisfied, the remaining results are returned in random order. To get the same results every time, you must order by the ColumnStore fields as well. Otherwise this might give the illusion of different results.

For instance:

SELECT 
  rsv.biomarker_id, 
  t.primary_dx, 
  max(b.name) AS gene, 
  count(DISTINCT rsv.patient_id) AS patient_count 
FROM 
  coh_genomics.rslt_somatic_variation rsv 
  JOIN coh_genomics.biomarker b ON rsv.biomarker_id = b.biomarker_id 
  AND b.name NOT IN ('MSI', 'TMB') 
  JOIN test_order t ON t.load_id = rsv.load_id 
GROUP BY 
  rsv.biomarker_id, 
  t.primary_dx 
ORDER BY 
  count(DISTINCT rsv.patient_id) DESC, biomarker_id, primary_dx;

Generated at Thu Feb 08 02:49:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.