# TPC-H SF=10 Benchmark: InnoDB + Query Accelerator vs InnoDB + HASH JOIN Tweak

## Cluster

Each run used a **fresh cluster provisioned from scratch** via Terraform + Ansible (no data or state reused between runs).

| Parameter        | Value                                        |
| ---------------- | -------------------------------------------- |
| **MariaDB**      | 11.4.10-7 Enterprise                         |
| **Columnstore**  | 25.10.3 (CI build stable-23.10/cron/3028)    |
| **Instances**    | c5n.metal (72 vCPU, 192 GB RAM) x 3 (mcs1-3) |
| **Load Gen**     | mx1 (c5n.metal) -- go-tpc -> mcs1 via VPC    |
| **OS / Region**  | Ubuntu 24.04 / us-west-2                     |
| **EBS**          | 70 GB gp3                                    |
| **Scale Factor** | 10 (~10 GB raw, 60M lineitem rows)           |
| **Engine**       | InnoDB (all tables)                          |
| **Tool**         | go-tpc tpch, 22 queries, single run each     |

---

## Run 1 -- InnoDB + Query Accelerator

QA routes InnoDB queries transparently to Columnstore, distributing work across all 3 PM nodes.

| Parameter                                 | Value                         |
| ----------------------------------------- | ----------------------------- |
| `columnstore_innodb_queries_use_mcs`      | ON (config, requires restart) |
| `columnstore_unstable_optimizer`          | ON                            |
| `columnstore_query_accel_parallel_factor` | 50                            |
| `max_connections`                         | 500                           |
| `optimizer_switch`                        | RBO string (see below)        |

Optimizer switch: all flags OFF except `materialization`, `condition_pushdown_for_derived`, and `condition_pushdown_from_having`.

Engine-Independent Statistics with histograms were collected to activate parallel CES:

| Table    | Column     | histogram_size |
| -------- | ---------- | -------------: |
| lineitem | l_orderkey |             10 |
| partsupp | ps_partkey |              3 |
| part     | p_partkey  |              2 |
| orders   | o_orderkey |              5 |

Additional indexes: `orders(o_orderkey, o_orderstatus)`, `supplier(s_nationkey, s_suppkey, s_name)`.

Verification: `mcs_get_plan('rules')` -> **parallel_ces**, predicate_pushdown.

**Result:** all 22 queries completed in **2558s (~43 min)**.

---

## Run 2 -- InnoDB (no QA) + HASH JOIN Tweak

QA disabled (`columnstore_innodb_queries_use_mcs = OFF`).
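Since the QA switch is a config-file setting that requires a restart, Run 2's setup amounts to a server-config fragment along these lines (a sketch only; the section name and value syntax may vary by installation):

```ini
[mariadb]
# Run 2: Query Accelerator off (config-file change, restart required)
columnstore_innodb_queries_use_mcs = OFF
# HASH JOIN tweak: level 3 selects the hashed (BNLH) join cache,
# and optimize_join_buffer_size=off keeps the full 3 GB buffer
join_cache_level = 3
join_buffer_size = 3G
optimizer_switch = 'optimize_join_buffer_size=off'
```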
| Parameter          | Value                           |
| ------------------ | ------------------------------- |
| `join_cache_level` | 3                               |
| `join_buffer_size` | 3 GB                            |
| `optimizer_switch` | `optimize_join_buffer_size=off` |

Verification: `EXPLAIN FORMAT=JSON` -> `access_type: hash_ALL`, `join_type: BNLH`, `buffer_size: 3072Mb`.

**Result:** 19 of 22 queries completed in **3430s (~57 min)**. Three queries (Q17, Q20, Q22) did not finish: each contains a correlated subquery with an estimated run time of 38 hours to 3 days, and each was killed after 6-7 minutes of execution.

---

## Query Times

| Query | QA (s) | noQA (s)      | Ratio      |
| ----- | -----: | ------------: | ---------: |
| Q1    |  76.47 |        124.79 |      1.63x |
| Q2    |  75.67 |       1199.91 |     15.86x |
| Q3    |  96.20 |         95.87 |      1.00x |
| Q4    |  93.85 |         93.99 |      1.00x |
| Q5    |  96.47 |         97.14 |      1.01x |
| Q6    |  77.28 |         76.54 |      0.99x |
| Q7    |  97.68 |        116.27 |      1.19x |
| Q8    |  98.62 |         99.15 |      1.01x |
| Q9    | 163.28 |        171.33 |      1.05x |
| Q10   |  95.80 |         96.20 |      1.00x |
| Q11   | 136.60 |        135.46 |      0.99x |
| Q12   |  93.38 |         93.31 |      1.00x |
| Q13   |  19.50 |        101.70 |      5.22x |
| Q14   |  79.89 |         79.22 |      0.99x |
| Q15   | 154.82 |        154.55 |      1.00x |
| Q16   |  71.24 |         70.56 |      0.99x |
| Q17   | 152.54 |  ~162000 est. |     ~1062x |
| Q18   | 170.79 |        178.71 |      1.05x |
| Q19   |  80.03 |        161.53 |      2.02x |
| Q20   | 140.32 |  ~137000 est. |      ~976x |
| Q21   | 465.60 |        283.60 |      0.61x |
| Q22   |  21.64 | ~83-256K est. | ~3800-12Kx |

### Summary

| Metric                    | QA                  | noQA + HASH JOIN    |
| ------------------------- | ------------------- | ------------------- |
| **Completed**             | 22 / 22             | 19 / 22             |
| **Total (22 / 19 queries)** | **2558s (~43 min)** | **3430s (~57 min)** |
| **Ratio (19 comparable)** |                     | **1.53x slower**    |
| **Parity (<=5%)**         |                     | 11 queries          |
| **Slower (>5%)**          |                     | 5 queries           |
| **DNF**                   | 0                   | 3 queries           |

---

## Node Metrics

### CPU (% of 72 cores)

| Node | QA Min | QA Avg | QA P95 | QA Max | noQA Min | noQA Avg | noQA P95 | noQA Max |
| ---- | -----: | -----: | -----: | -----: | -------: | -------: | -------: | -------: |
| mcs1 |   0.1% |   2.2% |   4.1% |  33.9% |     0.1% |     1.4% |     2.5% |     3.8% |
| mcs2 |   0.0% |   0.5% |   1.5% |  16.9% |     0.1% |     0.1% |     0.2% |     3.4% |
| mcs3 |   0.0% |   0.5% |   1.5% |  17.3% |     0.1% |     0.1% |     0.2% |     3.5% |

### RAM (% of 192 GB)

| Node | QA Min | QA Avg | QA P95 | QA Max | noQA Min | noQA Avg | noQA P95 | noQA Max |
| ---- | -----: | -----: | -----: | -----: | -------: | -------: | -------: | -------: |
| mcs1 |   1.9% |   4.3% |   9.8% |  18.2% |     1.8% |     2.6% |     3.3% |     4.8% |
| mcs2 |   1.4% |   1.5% |   1.5% |   1.5% |     1.5% |     1.5% |     1.5% |     1.6% |
| mcs3 |   1.3% |   1.4% |   1.4% |   1.4% |     1.4% |     1.4% |     1.4% |     1.5% |

### Disk IO (MiB/s)

| Node | QA Min | QA Avg | QA P95 | QA Max | noQA Min | noQA Avg | noQA P95 | noQA Max |
| ---- | -----: | -----: | -----: | -----: | -------: | -------: | -------: | -------: |
| mcs1 |    0.0 |   88.2 |  125.7 |  213.3 |      0.0 |     70.9 |    125.2 |    183.8 |
| mcs2 |    0.0 |   14.8 |  125.2 |  202.0 |      0.0 |      0.5 |      0.0 |      1.3 |
| mcs3 |    0.0 |   14.8 |  125.2 |  203.0 |      0.0 |      0.5 |      0.0 |      0.7 |

---

## Estimation Methodology

Queries Q17, Q20, and Q22 could not complete without QA and were killed. Estimated durations use the same methodology and throughput measurements from the identical hardware/data combination:

1. **EXPLAIN FORMAT=JSON** -- query plan analysis: correlated subquery type, `expression_cache` keys.
2. **Unique key count** -- verified via `SELECT COUNT(*)` (Q17: 2044) or the TPC-H data distribution spec (Q20, Q22).
3. **`Innodb_rows_read`** -- sampled twice, 10-30s apart, during execution -> measured throughput.
4. **Total rows** = unique keys x table size. Aggregate subqueries (`avg`, `sum`) read all rows (no short-circuit).
5. **Estimated time** = total rows / throughput.

| Query | Subquery type     | Unique keys   | Rows/key | Total rows | Speed   | Est.       |
| ----- | ----------------- | ------------- | -------- | ---------- | ------- | ---------- |
| Q17   | `avg` (aggregate) | 2044 verified | 59M      | 121B       | 750K/s  | **~45h**   |
| Q20   | `sum` (aggregate) | ~2133 spec    | 59M      | 127B       | 915K/s  | **~38h**   |
| Q22   | `NOT EXISTS`      | 8-25K spec    | 15M      | 119-373B   | 1.46M/s | **23-71h** |

Q21 (`EXISTS`/`NOT EXISTS`) completed in 284s (noQA) despite a similar structure, because `EXISTS` short-circuits on the first match, the correlation by `l_orderkey` has high locality (~4 rows per order), and pre-filtering by `nation = 'SAUDI ARABIA'` limits the number of subquery invocations.

---

## Conclusions

### 1. QA completes the full TPC-H benchmark; InnoDB alone cannot

With QA, all 22 queries completed in **~43 minutes**. Without QA, 3 of 22 queries (Q17, Q20, Q22) could not finish -- estimated run times range from **38 hours to 3 days** per query. These queries contain correlated subqueries that force InnoDB into billions of full table scans.

### 2. For simple queries, HASH JOIN achieves near-parity with QA

11 of 22 queries ran within +/-5% of QA times. The HASH JOIN tweak (`join_cache_level=3`, `join_buffer_size=3GB`) effectively replaces nested-loop joins with hash joins for straightforward multi-table queries.

### 3. QA advantage grows with query complexity

| Query pattern                   | noQA / QA ratio |
| ------------------------------- | --------------- |
| Simple scans / 2-3 table joins  | ~1.0x           |
| Multi-table joins (6+ tables)   | 1.2-2.0x        |
| GROUP BY + subqueries           | 5-16x           |
| Correlated aggregate subqueries | ~1000x (DNF)    |

### 4. Without QA, only 1 of 3 nodes is utilized

InnoDB runs exclusively on mcs1; mcs2 and mcs3 sit essentially idle (CPU ~0.1%, Disk IO ~0.5 MiB/s). QA distributes work across all PMs: CPU peaks of ~17% on mcs2/mcs3 and average Disk IO of 14.8 MiB/s vs 0.5 MiB/s without QA.
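As a cross-check, the DNF estimates from the Estimation Methodology section reduce to one line of arithmetic (unique keys x rows per key / measured throughput). A minimal sketch using the measured and spec-derived figures from the estimation table:

```python
# Reproduce the DNF estimates: est_hours = unique_keys * rows_per_key / throughput / 3600.
# Inputs are the values from the estimation table; aggregate subqueries
# (avg/sum) scan every row per key, so there is no short-circuit discount.
def estimate_hours(unique_keys: int, rows_per_key: int, rows_per_sec: float) -> float:
    total_rows = unique_keys * rows_per_key
    return total_rows / rows_per_sec / 3600

print(round(estimate_hours(2044, 59_000_000, 750_000), 1))      # Q17 -> 44.7 (~45h)
print(round(estimate_hours(2133, 59_000_000, 915_000), 1))      # Q20 -> 38.2 (~38h)
print(round(estimate_hours(8_000, 15_000_000, 1_460_000), 1))   # Q22 low  -> 22.8
print(round(estimate_hours(25_000, 15_000_000, 1_460_000), 1))  # Q22 high -> 71.3
```

Multiplying the hour figures back to seconds also recovers the ~162000s (Q17), ~137000s (Q20), and ~83-256K s (Q22) entries in the Query Times table.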