# TPC-H SF=10 Benchmark: InnoDB + Query Accelerator vs InnoDB + HASH JOIN Tweak

## Cluster

Each run used a **fresh cluster provisioned from scratch** via Terraform + Ansible (no data or state reused between runs).

| Parameter        | Value                                        |
| ---------------- | -------------------------------------------- |
| **MariaDB**      | 11.4.10-7 Enterprise                         |
| **Columnstore**  | 25.10.3 (CI build stable-23.10/cron/3028)    |
| **Instances**    | c5n.metal (72 vCPU, 192 GB RAM) x 3 (mcs1-3) |
| **Load Gen**     | mx1 (c5n.metal) -- go-tpc -> mcs1 via VPC    |
| **OS / Region**  | Ubuntu 24.04 / us-west-2                     |
| **EBS**          | 70 GB gp3                                    |
| **Scale Factor** | 10 (~10 GB raw, 60M lineitem rows)           |
| **Engine**       | InnoDB (all tables)                          |
| **Tool**         | go-tpc tpch, 22 queries, single run each     |

---

## Run 1 -- InnoDB + Query Accelerator

QA routes InnoDB queries transparently to Columnstore, distributing work across all 3 PM nodes.

| Parameter                                 | Value                         |
| ----------------------------------------- | ----------------------------- |
| `columnstore_innodb_queries_use_mcs`      | ON (config, requires restart) |
| `columnstore_unstable_optimizer`          | ON                            |
| `columnstore_query_accel_parallel_factor` | 50                            |
| `max_connections`                         | 500                           |
| `optimizer_switch`                        | RBO string (see below)        |

Optimizer switch: all flags OFF except `materialization`, `condition_pushdown_for_derived`, and `condition_pushdown_from_having`.

Engine-Independent Statistics with histograms were collected to activate parallel CES:

| Table    | Column     | histogram_size |
| -------- | ---------- | -------------: |
| lineitem | l_orderkey |             10 |
| partsupp | ps_partkey |              3 |
| part     | p_partkey  |              2 |
| orders   | o_orderkey |              5 |

Additional indexes: `orders(o_orderkey, o_orderstatus)`, `supplier(s_nationkey, s_suppkey, s_name)`.

Verification: `mcs_get_plan('rules')` -> **parallel_ces**, predicate_pushdown.

**Result:** all 22 queries completed in **2558s (~43 min)**.

---

## Run 2 -- InnoDB (no QA) + HASH JOIN Tweak

QA disabled (`columnstore_innodb_queries_use_mcs = OFF`).
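Since the QA switch is a config-file setting that requires a restart, Run 2's setup amounts to a server-config fragment along these lines (a sketch only; the section name and value syntax may vary by installation):

```ini
[mariadb]
# Run 2: Query Accelerator off (config-file change, restart required)
columnstore_innodb_queries_use_mcs = OFF
# HASH JOIN tweak: level 3 selects the hashed (BNLH) join cache,
# and optimize_join_buffer_size=off keeps the full 3 GB buffer
join_cache_level = 3
join_buffer_size = 3G
optimizer_switch = 'optimize_join_buffer_size=off'
```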
| Parameter          | Value                           |
| ------------------ | ------------------------------- |
| `join_cache_level` | 3                               |
| `join_buffer_size` | 3 GB                            |
| `optimizer_switch` | `optimize_join_buffer_size=off` |

Verification: `EXPLAIN FORMAT=JSON` -> `access_type: hash_ALL`, `join_type: BNLH`, `buffer_size: 3072Mb`.

**Result:** 19 of 22 queries completed in **3430s (~57 min)**. Three queries (Q17, Q20, Q22) did not finish: each contains a correlated subquery with an estimated run time of 38 hours to 3 days, and each was killed after 6-7 minutes of execution.

---

## Query Times

| Query | QA (s) | noQA (s)      | Ratio      |
| ----- | -----: | ------------: | ---------: |
| Q1    |  76.47 |        124.79 |      1.63x |
| Q2    |  75.67 |       1199.91 |     15.86x |
| Q3    |  96.20 |         95.87 |      1.00x |
| Q4    |  93.85 |         93.99 |      1.00x |
| Q5    |  96.47 |         97.14 |      1.01x |
| Q6    |  77.28 |         76.54 |      0.99x |
| Q7    |  97.68 |        116.27 |      1.19x |
| Q8    |  98.62 |         99.15 |      1.01x |
| Q9    | 163.28 |        171.33 |      1.05x |
| Q10   |  95.80 |         96.20 |      1.00x |
| Q11   | 136.60 |        135.46 |      0.99x |
| Q12   |  93.38 |         93.31 |      1.00x |
| Q13   |  19.50 |        101.70 |      5.22x |
| Q14   |  79.89 |         79.22 |      0.99x |
| Q15   | 154.82 |        154.55 |      1.00x |
| Q16   |  71.24 |         70.56 |      0.99x |
| Q17   | 152.54 |  ~162000 est. |     ~1062x |
| Q18   | 170.79 |        178.71 |      1.05x |
| Q19   |  80.03 |        161.53 |      2.02x |
| Q20   | 140.32 |  ~137000 est. |      ~976x |
| Q21   | 465.60 |        283.60 |      0.61x |
| Q22   |  21.64 | ~83-256K est. | ~3800-12Kx |

### Summary

| Metric                    | QA                  | noQA + HASH JOIN    |
| ------------------------- | ------------------- | ------------------- |
| **Completed**             | 22 / 22             | 19 / 22             |
| **Total (22 / 19 queries)** | **2558s (~43 min)** | **3430s (~57 min)** |
| **Ratio (19 comparable)** |                     | **1.53x slower**    |
| **Parity (<=5%)**         |                     | 11 queries          |
| **Slower (>5%)**          |                     | 5 queries           |
| **DNF**                   | 0                   | 3 queries           |

---

## Node Metrics

### CPU (% of 72 cores)

| Node | QA Min | QA Avg | QA P95 | QA Max | noQA Min | noQA Avg | noQA P95 | noQA Max |
| ---- | -----: | -----: | -----: | -----: | -------: | -------: | -------: | -------: |
| mcs1 |   0.1% |   2.2% |   4.1% |  33.9% |     0.1% |     1.4% |     2.5% |     3.8% |
| mcs2 |   0.0% |   0.5% |   1.5% |  16.9% |     0.1% |     0.1% |     0.2% |     3.4% |
| mcs3 |   0.0% |   0.5% |   1.5% |  17.3% |     0.1% |     0.1% |     0.2% |     3.5% |

### RAM (% of 192 GB)

| Node | QA Min | QA Avg | QA P95 | QA Max | noQA Min | noQA Avg | noQA P95 | noQA Max |
| ---- | -----: | -----: | -----: | -----: | -------: | -------: | -------: | -------: |
| mcs1 |   1.9% |   4.3% |   9.8% |  18.2% |     1.8% |     2.6% |     3.3% |     4.8% |
| mcs2 |   1.4% |   1.5% |   1.5% |   1.5% |     1.5% |     1.5% |     1.5% |     1.6% |
| mcs3 |   1.3% |   1.4% |   1.4% |   1.4% |     1.4% |     1.4% |     1.4% |     1.5% |

### Disk IO (MiB/s)

| Node | QA Min | QA Avg | QA P95 | QA Max | noQA Min | noQA Avg | noQA P95 | noQA Max |
| ---- | -----: | -----: | -----: | -----: | -------: | -------: | -------: | -------: |
| mcs1 |    0.0 |   88.2 |  125.7 |  213.3 |      0.0 |     70.9 |    125.2 |    183.8 |
| mcs2 |    0.0 |   14.8 |  125.2 |  202.0 |      0.0 |      0.5 |      0.0 |      1.3 |
| mcs3 |    0.0 |   14.8 |  125.2 |  203.0 |      0.0 |      0.5 |      0.0 |      0.7 |

---

## Estimation Methodology

Queries Q17, Q20, and Q22 could not complete without QA and were killed. Estimated durations use the same methodology and throughput measurements from the identical hardware/data combination:

1. **EXPLAIN FORMAT=JSON** -- query plan analysis: correlated subquery type, `expression_cache` keys.
2. **Unique key count** -- verified via `SELECT COUNT(*)` (Q17: 2044) or the TPC-H data distribution spec (Q20, Q22).
3. **`Innodb_rows_read`** -- sampled twice, 10-30s apart, during execution -> measured throughput.
4. **Total rows** = unique keys x table size. Aggregate subqueries (`avg`, `sum`) read all rows (no short-circuit).
5. **Estimated time** = total rows / throughput.

| Query | Subquery type     | Unique keys   | Rows/key | Total rows | Speed   | Est.       |
| ----- | ----------------- | ------------- | -------- | ---------- | ------- | ---------- |
| Q17   | `avg` (aggregate) | 2044 verified | 59M      | 121B       | 750K/s  | **~45h**   |
| Q20   | `sum` (aggregate) | ~2133 spec    | 59M      | 127B       | 915K/s  | **~38h**   |
| Q22   | `NOT EXISTS`      | 8-25K spec    | 15M      | 119-373B   | 1.46M/s | **23-71h** |

Q21 (`EXISTS`/`NOT EXISTS`) completed in 284s (noQA) despite a similar structure, because `EXISTS` short-circuits on the first match, the correlation by `l_orderkey` has high locality (~4 rows per order), and pre-filtering by `nation = 'SAUDI ARABIA'` limits the number of subquery invocations.

---

## Conclusions

### 1. QA completes the full TPC-H benchmark; InnoDB alone cannot

With QA, all 22 queries completed in **~43 minutes**. Without QA, 3 of 22 queries (Q17, Q20, Q22) could not finish -- estimated run times range from **38 hours to 3 days** per query. These queries contain correlated subqueries that force InnoDB into billions of full table scans.

### 2. For simple queries, HASH JOIN achieves near-parity with QA

11 of 22 queries ran within +/-5% of QA times. The HASH JOIN tweak (`join_cache_level=3`, `join_buffer_size=3GB`) effectively replaces nested-loop joins with hash joins for straightforward multi-table queries.

### 3. QA advantage grows with query complexity

| Query pattern                   | noQA / QA ratio |
| ------------------------------- | --------------- |
| Simple scans / 2-3 table joins  | ~1.0x           |
| Multi-table joins (6+ tables)   | 1.2-2.0x        |
| GROUP BY + subqueries           | 5-16x           |
| Correlated aggregate subqueries | ~1000x (DNF)    |

### 4. Without QA, only 1 of 3 nodes is utilized

InnoDB runs exclusively on mcs1; mcs2 and mcs3 sit essentially idle (CPU ~0.1%, Disk IO ~0.5 MiB/s). QA distributes work across all PMs: CPU peaks of ~17% on mcs2/mcs3 and average Disk IO of 14.8 MiB/s vs 0.5 MiB/s without QA.
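As a cross-check, the DNF estimates from the Estimation Methodology section reduce to one line of arithmetic (unique keys x rows per key / measured throughput). A minimal sketch using the measured and spec-derived figures from the estimation table:

```python
# Reproduce the DNF estimates: est_hours = unique_keys * rows_per_key / throughput / 3600.
# Inputs are the values from the estimation table; aggregate subqueries
# (avg/sum) scan every row per key, so there is no short-circuit discount.
def estimate_hours(unique_keys: int, rows_per_key: int, rows_per_sec: float) -> float:
    total_rows = unique_keys * rows_per_key
    return total_rows / rows_per_sec / 3600

print(round(estimate_hours(2044, 59_000_000, 750_000), 1))      # Q17 -> 44.7 (~45h)
print(round(estimate_hours(2133, 59_000_000, 915_000), 1))      # Q20 -> 38.2 (~38h)
print(round(estimate_hours(8_000, 15_000_000, 1_460_000), 1))   # Q22 low  -> 22.8
print(round(estimate_hours(25_000, 15_000_000, 1_460_000), 1))  # Q22 high -> 71.3
```

Multiplying the hour figures back to seconds also recovers the ~162000s (Q17), ~137000s (Q20), and ~83-256K s (Q22) entries in the Query Times table.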