# MCOL-6290: Query Accelerator TPC-H Benchmark Report

## Environment

| Component              | Details                                                   |
| ---------------------- | --------------------------------------------------------- |
| **CPU**                | AMD EPYC 7R32, 8 cores / 16 threads, 1 socket             |
| **RAM**                | 30 GiB                                                    |
| **Disk**               | 96 GB EBS (`/dev/root`)                                   |
| **OS**                 | Ubuntu 24.04.3 LTS (Noble Numbat), kernel 6.14.0-1018-aws |
| **MariaDB Server**     | 11.4.10-7-MariaDB-enterprise                              |
| **ColumnStore**        | 25.10.3 (stable-23.10 branch)                             |
| **InnoDB Buffer Pool** | 128 MB (default)                                          |
| **Benchmark Tool**     | go-tpc v1.0.9 (commit d05fdf8a, built 2022-10-24)         |
| **Data Generator**     | tpchgen-cli (Rust-based TPC-H data generator)             |
| **Scale Factor**       | 0.1 (~100 MB, lineitem: 600K rows, orders: 150K rows)     |
| **Iterations**         | 5 runs per configuration, averages reported               |

## Notes on Scale Factor

SF=0.1 was chosen because at SF=1 several InnoDB queries without QA (Q13, Q17, Q20) exceeded 30+ minutes each, making a full 5-run benchmark impractical.

Since `go-tpc tpch prepare` only accepts integer `--sf` values (minimum 1), data for SF=0.1 was generated and loaded separately:

```bash
# Generate TPC-H SF=0.1 data
tpchgen-cli -s 0.1 -o /tmp/tpch01

# Create InnoDB tables with PKs (matching MTR createTables.inc schema)
# Load data via LOAD DATA LOCAL INFILE ... FIELDS TERMINATED BY '|' LINES TERMINATED BY '|\n'
```

## Query Accelerator Configuration

QA requires the following server and session settings:

**Server (read-only, requires restart):**

```
--columnstore_innodb_queries_use_mcs=on
```

**Session (set globally for benchmark):**

```sql
SET GLOBAL columnstore_unstable_optimizer=on;
SET GLOBAL columnstore_query_accel_parallel_factor=5;
SET GLOBAL optimizer_switch='index_merge=off,index_merge_union=off,index_merge_sort_union=off,
  index_merge_intersection=off,index_merge_sort_intersection=off,index_condition_pushdown=off,
  derived_merge=off,derived_with_keys=off,firstmatch=off,loosescan=off,materialization=on,
  in_to_exists=off,semijoin=off,partial_match_rowid_merge=off,partial_match_table_scan=off,
  subquery_cache=off,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=off,
  semijoin_with_cache=off,join_cache_incremental=off,join_cache_hashed=off,join_cache_bka=off,
  optimize_join_buffer_size=off,table_elimination=off,extended_keys=off,exists_to_in=off,
  orderby_uses_equalities=off,condition_pushdown_for_derived=on,split_materialized=off,
  condition_pushdown_for_subquery=off,rowid_filter=off,condition_pushdown_from_having=on,
  not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off,sargable_casefold=off';
```

**Cross-engine credentials (Columnstore.xml):**

```
CrossEngineSupport.User = <user>
CrossEngineSupport.Password = <password>
CrossEngineSupport.Port = 3306
```

**InnoDB table indexes (matching MTR `indices.inc`):**

```sql
ALTER TABLE partsupp ADD INDEX I1 (ps_partkey, ps_suppkey);
ALTER TABLE lineitem ADD INDEX (l_orderkey, l_suppkey);
ALTER TABLE orders   ADD INDEX (o_orderkey, o_orderstatus);
ALTER TABLE supplier ADD INDEX (s_nationkey, s_suppkey, s_name);
```

**Engine-independent statistics with histograms:**

```sql
SET histogram_size=10;
ANALYZE TABLE lineitem PERSISTENT FOR COLUMNS (l_orderkey) INDEXES();
SET histogram_size=3;
ANALYZE TABLE partsupp PERSISTENT FOR COLUMNS (ps_partkey) INDEXES();
SET histogram_size=2;
ANALYZE TABLE part PERSISTENT FOR COLUMNS (p_partkey) INDEXES();
SET histogram_size=5;
ANALYZE TABLE orders PERSISTENT FOR COLUMNS (o_orderkey) INDEXES();
```

## Benchmark Commands

**Baseline (InnoDB, QA OFF):**

```bash
# Server started WITHOUT --columnstore_innodb_queries_use_mcs=on
go-tpc tpch run -H 127.0.0.1 -P 3306 -U bench -p bench -D tpch_bench --sf 1 --count 22
```

**QA enabled (InnoDB + Query Accelerator):**

```bash
# Server started WITH --columnstore_innodb_queries_use_mcs=on
# Global session variables set as above
go-tpc tpch run -H 127.0.0.1 -P 3306 -U bench -p bench -D tpch_bench --sf 1 --count 22
```

> **Note:** `--sf 1` is passed to `go-tpc tpch run` because it only accepts integer values. It selects which TPC-H queries to execute but does not affect the data already loaded at SF=0.1.

Each configuration was run 5 times sequentially.

## Results (5-run average, seconds)

| Query     | InnoDB (no QA) | InnoDB + QA | Speedup    |
| --------- | -------------- | ----------- | ---------- |
| Q1        | 1.18s          | 0.62s       | 1.9x       |
| Q2        | 0.10s          | 0.89s       | 0.1x       |
| Q3        | 0.52s          | 0.53s       | 1.0x       |
| Q4        | 0.35s          | 0.36s       | 1.0x       |
| Q5        | 0.11s          | 0.60s       | 0.2x       |
| Q6        | 0.17s          | 0.22s       | 0.8x       |
| Q7        | 0.37s          | 0.64s       | 0.6x       |
| Q8        | 0.31s          | 0.80s       | 0.4x       |
| Q9        | 3.69s          | 1.13s       | **3.3x**   |
| Q10       | 1.13s          | 0.57s       | **2.0x**   |
| Q11       | 0.58s          | 0.96s       | 0.6x       |
| Q12       | 2.68s          | 0.49s       | **5.5x**   |
| Q13       | 100.83s        | 0.23s       | **438.4x** |
| Q14       | 2.26s          | 0.46s       | **4.9x**   |
| Q15       | 1.04s          | 0.45s       | **2.3x**   |
| Q16       | 0.10s          | 0.29s       | 0.3x       |
| Q17       | 3.57s          | 0.58s       | **6.1x**   |
| Q18       | 0.49s          | 0.66s       | 0.7x       |
| Q19       | 0.40s          | 0.48s       | 0.8x       |
| Q20       | 105.63s        | 0.93s       | **113.1x** |
| Q21       | 3.99s          | 1.44s       | **2.8x**   |
| Q22       | 10.16s         | 0.31s       | **32.3x**  |
| **Total** | **239.67s**    | **13.65s**  | **17.6x**  |

## Per-Run Raw Data

### Baseline (InnoDB, no QA) — 5 runs (seconds)

| Query | Run 1  | Run 2  | Run 3  | Run 4  | Run 5  | Avg    |
| ----- | ------ | ------ | ------ | ------ | ------ | ------ |
| Q1    | 1.24   | 1.17   | 1.17   | 1.17   | 1.17   | 1.18   |
| Q2    | 0.10   | 0.10   | 0.10   | 0.10   | 0.10   | 0.10   |
| Q3    | 0.44   | 0.57   | 0.50   | 0.57   | 0.50   | 0.52   |
| Q4    | 0.84   | 0.23   | 0.23   | 0.23   | 0.23   | 0.35   |
| Q5    | 0.17   | 0.10   | 0.10   | 0.10   | 0.10   | 0.11   |
| Q6    | 0.17   | 0.17   | 0.17   | 0.17   | 0.17   | 0.17   |
| Q7    | 0.37   | 0.37   | 0.37   | 0.37   | 0.37   | 0.37   |
| Q8    | 0.64   | 0.23   | 0.23   | 0.23   | 0.23   | 0.31   |
| Q9    | 2.99   | 3.86   | 3.93   | 3.86   | 3.79   | 3.69   |
| Q10   | 0.97   | 1.11   | 1.24   | 1.17   | 1.17   | 1.13   |
| Q11   | 0.64   | 0.57   | 0.57   | 0.57   | 0.57   | 0.58   |
| Q12   | 2.52   | 2.72   | 2.72   | 2.72   | 2.72   | 2.68   |
| Q13   | 104.12 | 100.03 | 99.96  | 99.96  | 100.09 | 100.83 |
| Q14   | 2.18   | 2.25   | 2.32   | 2.25   | 2.32   | 2.26   |
| Q15   | 0.37   | 0.70   | 1.71   | 0.70   | 1.71   | 1.04   |
| Q16   | 0.10   | 0.10   | 0.10   | 0.10   | 0.10   | 0.10   |
| Q17   | 3.19   | 3.66   | 3.66   | 3.66   | 3.66   | 3.57   |
| Q18   | 0.57   | 0.37   | 0.50   | 0.44   | 0.57   | 0.49   |
| Q19   | 0.37   | 0.37   | 0.50   | 0.37   | 0.37   | 0.40   |
| Q20   | 105.46 | 105.73 | 105.66 | 105.66 | 105.66 | 105.63 |
| Q21   | 3.99   | 3.99   | 3.99   | 3.99   | 3.99   | 3.99   |
| Q22   | 10.17  | 10.17  | 10.17  | 10.10  | 10.17  | 10.16  |

### QA Enabled (InnoDB + Query Accelerator) — 5 runs (seconds)

| Query | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg  |
| ----- | ----- | ----- | ----- | ----- | ----- | ---- |
| Q1    | 0.84  | 0.57  | 0.57  | 0.57  | 0.57  | 0.62 |
| Q2    | 0.77  | 0.97  | 0.84  | 0.97  | 0.91  | 0.89 |
| Q3    | 0.70  | 0.57  | 0.44  | 0.44  | 0.50  | 0.53 |
| Q4    | 0.37  | 0.37  | 0.37  | 0.30  | 0.37  | 0.36 |
| Q5    | 0.70  | 0.57  | 0.64  | 0.50  | 0.57  | 0.60 |
| Q6    | 0.30  | 0.23  | 0.23  | 0.17  | 0.17  | 0.22 |
| Q7    | 0.64  | 0.64  | 0.64  | 0.64  | 0.64  | 0.64 |
| Q8    | 0.77  | 0.84  | 0.91  | 0.70  | 0.77  | 0.80 |
| Q9    | 1.11  | 1.17  | 1.04  | 1.17  | 1.17  | 1.13 |
| Q10   | 0.70  | 0.57  | 0.50  | 0.64  | 0.44  | 0.57 |
| Q11   | 0.97  | 0.97  | 0.91  | 0.97  | 0.97  | 0.96 |
| Q12   | 0.64  | 0.50  | 0.57  | 0.37  | 0.37  | 0.49 |
| Q13   | 0.23  | 0.23  | 0.23  | 0.23  | 0.23  | 0.23 |
| Q14   | 0.64  | 0.50  | 0.37  | 0.37  | 0.44  | 0.46 |
| Q15   | 0.44  | 0.44  | 0.44  | 0.44  | 0.50  | 0.45 |
| Q16   | 0.30  | 0.23  | 0.30  | 0.30  | 0.30  | 0.29 |
| Q17   | 0.57  | 0.64  | 0.50  | 0.57  | 0.64  | 0.58 |
| Q18   | 0.70  | 0.64  | 0.64  | 0.70  | 0.64  | 0.66 |
| Q19   | 0.50  | 0.50  | 0.44  | 0.44  | 0.50  | 0.48 |
| Q20   | 0.97  | 0.91  | 0.91  | 0.91  | 0.97  | 0.93 |
| Q21   | 1.44  | 1.51  | 1.38  | 1.44  | 1.44  | 1.44 |
| Q22   | 0.30  | 0.30  | 0.30  | 0.30  | 0.37  | 0.31 |

## Analysis

### Queries with significant QA speedup (>2x)

- **Q13** (438x): LEFT OUTER JOIN customer/orders with GROUP BY — InnoDB struggles with the massive hash join; QA parallelizes via ColumnStore engine
- **Q20** (113x): Correlated subquery with EXISTS over lineitem — QA eliminates the row-by-row subquery evaluation
- **Q22** (32x): Nested NOT EXISTS subqueries with AVG — QA benefits from parallel range scans
- **Q17** (6.1x): Correlated subquery with AVG over lineitem — similar pattern to Q20
- **Q12** (5.5x): Two-table join lineitem/orders with CASE aggregation
- **Q14** (4.9x): Two-table join lineitem/part with conditional SUM
- **Q9** (3.3x): Multi-table join (5 tables) with expression in SELECT
- **Q21** (2.8x): Multi-table join with EXISTS/NOT EXISTS subqueries
- **Q15** (2.3x): Revenue view with MAX subquery
- **Q10** (2.0x): Four-table join with aggregation

### Queries where QA is slower (<1x)

- **Q2** (0.1x), **Q5** (0.2x), **Q16** (0.3x), **Q8** (0.4x): Fast queries (<0.5s on InnoDB) where cross-engine overhead dominates
- **Q7** (0.6x), **Q11** (0.6x): Mid-range queries where QA overhead exceeds parallelism benefit

### Key Takeaways

1. **QA excels on heavy analytical queries** — correlated subqueries, large JOINs, and GROUP BY operations see 3x–438x improvement
2. **QA adds overhead to fast queries** — cross-engine data transfer and parallel coordination cost ~0.3–0.9s, penalizing sub-second InnoDB queries
3. **Overall 17.6x speedup** — total benchmark time drops from 240s to 14s
4. **QA verified active** — `SELECT mcs_get_plan('rules')` returned `parallel_ces,predicate_pushdown`; PROCESSLIST confirmed `cejuser` making parallel cross-engine reads during QA execution