Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Do
-
1.2.4, 1.2.5
Description
Problem:
We are having an issue when using window functions with newer Maria DB Columnstore version such as 1.2.4 and 1.2.5 versions. Same query runs over same data in MariaDB Columnstore 1.0.8 without any issue in seconds but never finishes on newer versions.
Story:
We migrated a database from version 1.0.8 to 1.2.4. After the migration, Users started to complain about non-ending queries with window functions. Processlist shows "creating sort index" as query state.
Test Results:
Below you can see test results in detail (execution time for same query based on different Columnstore versions with different data sizes.) As soon as table passes a certain number of rows(roughly 50K), execution time increases exponentially and becomes unacceptable.
1.2.4 and 1.2.5 versions are running on better hosts with more resources.
Hint:
One thing worth to note down here is that Query status is shown as "Creating Sort Index" for almost entire duration in Test 4 and Test 5 in both newer versions.
I added Columnstore.xml file for 1.2.4 version.
Test 1: Original Query Execution with Full Table Data more than 6 million
| Version | Table Row Count | Execution Time |
|---|---|---|
| 1.0.8 | 6 M | 19 Seconds |
| 1.2.4 | 6 M | Killed after 6 hours |
| 1.2.5 | 6 M | Killed after 6 hours |
Test 2: Limited Data (1000 rows)
| Version | Table Row Count | Execution Time |
|---|---|---|
| 1.0.8 | 1K | 0.24 seconds |
| 1.2.4 | 1K | 0.22 seconds |
| 1.2.5 | 1K | 0.26 seconds |
Test 3: Limited Data (10000 rows)
| Version | Table Row Count | Execution Time |
|---|---|---|
| 1.0.8 | 10K | 0.30 seconds |
| 1.2.4 | 10K | 9 seconds |
| 1.2.5 | 10K | 1.5 seconds |
Test 4: Limited Data (100000 rows)
| Version | Table Row Count | Execution Time |
|---|---|---|
| 1.0.8 | 100K | 0.80 seconds |
| 1.2.4 | 100K | 10 min 19.03 sec |
| 1.2.5 | 100K | 1 min. 32 seconds |
Test 5: Limited Data (1000000 rows)
| Version | Table Row Count | Execution Time |
|---|---|---|
| 1.0.8 | 1M | 3.63 seconds |
| 1.2.4 | 1M | Killed after 2.5 hours |
| 1.2.5 | 1M | Killed after 2.5 hours |
EXPLAIN Results
For 1.0.8:
{
|
"query_block": {
|
"select_id": 1,
|
"const_condition": "1",
|
"filesort": {
|
"temporary_table": {
|
"function": "buffer",
|
"table": {
|
"table_name": "<derived2>",
|
"access_type": "ALL",
|
"rows": 2000,
|
"filtered": 100,
|
"materialized": {
|
"query_block": {
|
"select_id": 2,
|
"table": {
|
"table_name": "tblnoporderadd_yesterday",
|
"access_type": "ALL",
|
"rows": 2000,
|
"filtered": 100
|
}
|
}
|
}
|
},
|
"table": {
|
"table_name": "i",
|
"access_type": "ALL",
|
"rows": 2000,
|
"filtered": 100,
|
"attached_condition": "trigcond((i.ipaddress = t.dst_ip_left))"
|
}
|
}
|
}
|
}
|
}
|
|
1.2.5:
{
|
"query_block": {
|
"select_id": 1,
|
"const_condition": "1",
|
"filesort": {
|
"sort_key": "if(i.alias is null,t.dst_ip_left,i.alias), cast(t.pct_50 as signed), cast(t.pct_80 as signed), cast(t.pct_90 as signed), cast(t.pct_95 as signed), cast(t.pct_99 as signed)",
|
"temporary_table": {
|
"table": {
|
"table_name": "<derived2>",
|
"access_type": "ALL",
|
"rows": 2000,
|
"filtered": 100,
|
"materialized": {
|
"query_block": {
|
"select_id": 2,
|
"window_functions_computation": {
|
"sorts": {
|
"filesort": {
|
"sort_key": "tblnoporderadd_yesterday.dst_ip_left, tblnoporderadd_yesterday.latency"
|
}
|
},
|
"temporary_table": {
|
"table": {
|
"table_name": "tblnoporderadd_yesterday",
|
"access_type": "ALL",
|
"rows": 2000,
|
"filtered": 100
|
}
|
}
|
}
|
}
|
}
|
},
|
"table": {
|
"table_name": "i",
|
"access_type": "ALL",
|
"rows": 2000,
|
"filtered": 100,
|
"attached_condition": "trigcond(i.ipaddress = t.dst_ip_left)"
|
}
|
}
|
}
|
}
|
// Some comments here
|
SELECT '2019-09-19' date, if(isnull(i.alias),dst_ip_left,i.alias) host, 'ORDERADD' message_type , cast(pct_50 as integer) pct50, cast(pct_80 as integer) pct80, cast(pct_90 as integer) pct90, cast(pct_95 as integer) pct95, cast(pct_99 as integer) pct99, count(*) samplesize
|
FROM (
|
SELECT dst_ip_left,
|
percentile_cont(0.5) within group (order by latency) OVER (PARTITION BY dst_ip_left) pct_50,
|
percentile_cont(0.8) within group (order by latency) OVER (PARTITION BY dst_ip_left) pct_80,
|
percentile_cont(0.9) within group (order by latency) OVER (PARTITION BY dst_ip_left) pct_90,
|
percentile_cont(0.95) within group (order by latency) OVER (PARTITION BY dst_ip_left) pct_95,
|
percentile_cont(0.99) within group (order by latency) OVER (PARTITION BY dst_ip_left) pct_99
|
FROM Table1
|
) as t
|
LEFT JOIN Table2 as i on i.ipaddress = t.dst_ip_left
|
GROUP BY host, pct50, pct80, pct90, pct95, pct99
|
ORDER BY host;
|