# MCOL-6290: Query Accelerator TPC-H Benchmark Report ## Environment | Component | Details | | ---------------------- | --------------------------------------------------------- | | **CPU** | AMD EPYC 7R32, 8 cores / 16 threads, 1 socket | | **RAM** | 30 GiB | | **Disk** | 96 GB EBS (`/dev/root`) | | **OS** | Ubuntu 24.04.3 LTS (Noble Numbat), kernel 6.14.0-1018-aws | | **MariaDB Server** | 11.4.10-7-MariaDB-enterprise | | **ColumnStore** | 25.10.3 (stable-23.10 branch) | | **InnoDB Buffer Pool** | 128 MB (default) | | **Benchmark Tool** | go-tpc v1.0.9 (commit d05fdf8a, built 2022-10-24) | | **Data Generator** | tpchgen-cli (Rust-based TPC-H data generator) | | **Scale Factor** | 0.1 (~100 MB, lineitem: 600K rows, orders: 150K rows) | | **Iterations** | 5 runs per configuration, averages reported | ## Notes on Scale Factor SF=0.1 was chosen because at SF=1 several InnoDB queries without QA (Q13, Q17, Q20) exceeded 30+ minutes each, making a full 5-run benchmark impractical. Since `go-tpc tpch prepare` only accepts integer `--sf` values (minimum 1), data for SF=0.1 was generated and loaded separately: ```bash # Generate TPC-H SF=0.1 data tpchgen-cli -s 0.1 -o /tmp/tpch01 # Create InnoDB tables with PKs (matching MTR createTables.inc schema) # Load data via LOAD DATA LOCAL INFILE ... FIELDS TERMINATED BY '|' LINES TERMINATED BY '|\n' ``` ## Query Accelerator Configuration QA requires the following server and session settings: **Server (read-only, requires restart):** ``` --columnstore_innodb_queries_use_mcs=on ``` **Session (set globally for benchmark):** ```sql SET GLOBAL columnstore_unstable_optimizer=on; SET GLOBAL columnstore_query_accel_parallel_factor=5; SET GLOBAL optimizer_switch='index_merge=off,index_merge_union=off,index_merge_sort_union=off, index_merge_intersection=off,index_merge_sort_intersection=off,index_condition_pushdown=off, derived_merge=off,derived_with_keys=off,firstmatch=off,loosescan=off,materialization=on, in_to_exists=off,semijoin=off,partial_match_rowid_merge=off,partial_match_table_scan=off, subquery_cache=off,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=off, semijoin_with_cache=off,join_cache_incremental=off,join_cache_hashed=off,join_cache_bka=off, optimize_join_buffer_size=off,table_elimination=off,extended_keys=off,exists_to_in=off, orderby_uses_equalities=off,condition_pushdown_for_derived=on,split_materialized=off, condition_pushdown_for_subquery=off,rowid_filter=off,condition_pushdown_from_having=on, not_null_range_scan=off,hash_join_cardinality=off,cset_narrowing=off,sargable_casefold=off'; ``` **Cross-engine credentials (Columnstore.xml):** ``` CrossEngineSupport.User = CrossEngineSupport.Password = CrossEngineSupport.Port = 3306 ``` **InnoDB table indexes (matching MTR `indices.inc`):** ```sql ALTER TABLE partsupp ADD INDEX I1 (ps_partkey, ps_suppkey); ALTER TABLE lineitem ADD INDEX (l_orderkey, l_suppkey); ALTER TABLE orders ADD INDEX (o_orderkey, o_orderstatus); ALTER TABLE supplier ADD INDEX (s_nationkey, s_suppkey, s_name); ``` **Engine-independent statistics with histograms:** ```sql SET histogram_size=10; ANALYZE TABLE lineitem PERSISTENT FOR COLUMNS (l_orderkey) INDEXES(); SET histogram_size=3; ANALYZE TABLE partsupp PERSISTENT FOR COLUMNS (ps_partkey) INDEXES(); SET histogram_size=2; ANALYZE TABLE part PERSISTENT FOR COLUMNS (p_partkey) INDEXES(); SET histogram_size=5; ANALYZE TABLE orders PERSISTENT FOR COLUMNS (o_orderkey) INDEXES(); ``` ## Benchmark Commands **Baseline (InnoDB, QA OFF):** ```bash # Server started WITHOUT --columnstore_innodb_queries_use_mcs=on go-tpc tpch run -H 127.0.0.1 -P 3306 -U bench -p bench -D tpch_bench --sf 1 --count 22 ``` **QA enabled (InnoDB + Query Accelerator):** ```bash # Server started WITH --columnstore_innodb_queries_use_mcs=on # Global session variables set as above go-tpc tpch run -H 127.0.0.1 -P 3306 -U bench -p bench -D tpch_bench --sf 1 --count 22 ``` > **Note:** `--sf 1` is passed to `go-tpc tpch run` because it only accepts integer values. It selects which TPC-H queries to execute but does not affect the data already loaded at SF=0.1. Each configuration was run 5 times sequentially. ## Results (5-run average, seconds) | Query | InnoDB (no QA) | InnoDB + QA | Speedup | | --------- | -------------- | ----------- | ---------- | | Q1 | 1.18s | 0.62s | 1.9x | | Q2 | 0.10s | 0.89s | 0.1x | | Q3 | 0.52s | 0.53s | 1.0x | | Q4 | 0.35s | 0.36s | 1.0x | | Q5 | 0.11s | 0.60s | 0.2x | | Q6 | 0.17s | 0.22s | 0.8x | | Q7 | 0.37s | 0.64s | 0.6x | | Q8 | 0.31s | 0.80s | 0.4x | | Q9 | 3.69s | 1.13s | **3.3x** | | Q10 | 1.13s | 0.57s | **2.0x** | | Q11 | 0.58s | 0.96s | 0.6x | | Q12 | 2.68s | 0.49s | **5.5x** | | Q13 | 100.83s | 0.23s | **438.4x** | | Q14 | 2.26s | 0.46s | **4.9x** | | Q15 | 1.04s | 0.45s | **2.3x** | | Q16 | 0.10s | 0.29s | 0.3x | | Q17 | 3.57s | 0.58s | **6.1x** | | Q18 | 0.49s | 0.66s | 0.7x | | Q19 | 0.40s | 0.48s | 0.8x | | Q20 | 105.63s | 0.93s | **113.1x** | | Q21 | 3.99s | 1.44s | **2.8x** | | Q22 | 10.16s | 0.31s | **32.3x** | | **Total** | **239.67s** | **13.65s** | **17.6x** | ## Per-Run Raw Data ### Baseline (InnoDB, no QA) — 5 runs (seconds) | Query | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg | | ----- | ------ | ------ | ------ | ------ | ------ | ------ | | Q1 | 1.24 | 1.17 | 1.17 | 1.17 | 1.17 | 1.18 | | Q2 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | | Q3 | 0.44 | 0.57 | 0.50 | 0.57 | 0.50 | 0.52 | | Q4 | 0.84 | 0.23 | 0.23 | 0.23 | 0.23 | 0.35 | | Q5 | 0.17 | 0.10 | 0.10 | 0.10 | 0.10 | 0.11 | | Q6 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | | Q7 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | 0.37 | | Q8 | 0.64 | 0.23 | 0.23 | 0.23 | 0.23 | 0.31 | | Q9 | 2.99 | 3.86 | 3.93 | 3.86 | 3.79 | 3.69 | | Q10 | 0.97 | 1.11 | 1.24 | 1.17 | 1.17 | 1.13 | | Q11 | 0.64 | 0.57 | 0.57 | 0.57 | 0.57 | 0.58 | | Q12 | 2.52 | 2.72 | 2.72 | 2.72 | 2.72 | 2.68 | | Q13 | 104.12 | 100.03 | 99.96 | 99.96 | 100.09 | 100.83 | | Q14 | 2.18 | 2.25 | 2.32 | 2.25 | 2.32 | 2.26 | | Q15 | 0.37 | 0.70 | 1.71 | 0.70 | 1.71 | 1.04 | | Q16 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | | Q17 | 3.19 | 3.66 | 3.66 | 3.66 | 3.66 | 3.57 | | Q18 | 0.57 | 0.37 | 0.50 | 0.44 | 0.57 | 0.49 | | Q19 | 0.37 | 0.37 | 0.50 | 0.37 | 0.37 | 0.40 | | Q20 | 105.46 | 105.73 | 105.66 | 105.66 | 105.66 | 105.63 | | Q21 | 3.99 | 3.99 | 3.99 | 3.99 | 3.99 | 3.99 | | Q22 | 10.17 | 10.17 | 10.17 | 10.10 | 10.17 | 10.16 | ### QA Enabled (InnoDB + Query Accelerator) — 5 runs (seconds) | Query | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg | | ----- | ----- | ----- | ----- | ----- | ----- | ---- | | Q1 | 0.84 | 0.57 | 0.57 | 0.57 | 0.57 | 0.62 | | Q2 | 0.77 | 0.97 | 0.84 | 0.97 | 0.91 | 0.89 | | Q3 | 0.70 | 0.57 | 0.44 | 0.44 | 0.50 | 0.53 | | Q4 | 0.37 | 0.37 | 0.37 | 0.30 | 0.37 | 0.36 | | Q5 | 0.70 | 0.57 | 0.64 | 0.50 | 0.57 | 0.60 | | Q6 | 0.30 | 0.23 | 0.23 | 0.17 | 0.17 | 0.22 | | Q7 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | | Q8 | 0.77 | 0.84 | 0.91 | 0.70 | 0.77 | 0.80 | | Q9 | 1.11 | 1.17 | 1.04 | 1.17 | 1.17 | 1.13 | | Q10 | 0.70 | 0.57 | 0.50 | 0.64 | 0.44 | 0.57 | | Q11 | 0.97 | 0.97 | 0.91 | 0.97 | 0.97 | 0.96 | | Q12 | 0.64 | 0.50 | 0.57 | 0.37 | 0.37 | 0.49 | | Q13 | 0.23 | 0.23 | 0.23 | 0.23 | 0.23 | 0.23 | | Q14 | 0.64 | 0.50 | 0.37 | 0.37 | 0.44 | 0.46 | | Q15 | 0.44 | 0.44 | 0.44 | 0.44 | 0.50 | 0.45 | | Q16 | 0.30 | 0.23 | 0.30 | 0.30 | 0.30 | 0.29 | | Q17 | 0.57 | 0.64 | 0.50 | 0.57 | 0.64 | 0.58 | | Q18 | 0.70 | 0.64 | 0.64 | 0.70 | 0.64 | 0.66 | | Q19 | 0.50 | 0.50 | 0.44 | 0.44 | 0.50 | 0.48 | | Q20 | 0.97 | 0.91 | 0.91 | 0.91 | 0.97 | 0.93 | | Q21 | 1.44 | 1.51 | 1.38 | 1.44 | 1.44 | 1.44 | | Q22 | 0.30 | 0.30 | 0.30 | 0.30 | 0.37 | 0.31 | ## Analysis ### Queries with significant QA speedup (>2x) - **Q13** (438x): LEFT OUTER JOIN customer/orders with GROUP BY — InnoDB struggles with the massive hash join; QA parallelizes via ColumnStore engine - **Q20** (113x): Correlated subquery with EXISTS over lineitem — QA eliminates the row-by-row subquery evaluation - **Q22** (32x): Nested NOT EXISTS subqueries with AVG — QA benefits from parallel range scans - **Q17** (6.1x): Correlated subquery with AVG over lineitem — similar pattern to Q20 - **Q12** (5.5x): Two-table join lineitem/orders with CASE aggregation - **Q14** (4.9x): Two-table join lineitem/part with conditional SUM - **Q9** (3.3x): Multi-table join (5 tables) with expression in SELECT - **Q21** (2.8x): Multi-table join with EXISTS/NOT EXISTS subqueries - **Q15** (2.3x): Revenue view with MAX subquery - **Q10** (2.0x): Four-table join with aggregation ### Queries where QA is slower (<1x) - **Q2** (0.1x), **Q5** (0.2x), **Q16** (0.3x), **Q8** (0.4x): Fast queries (<0.5s on InnoDB) where cross-engine overhead dominates - **Q7** (0.6x), **Q11** (0.6x): Mid-range queries where QA overhead exceeds parallelism benefit ### Key Takeaways 1. **QA excels on heavy analytical queries** — correlated subqueries, large JOINs, and GROUP BY operations see 3x–438x improvement 2. **QA adds overhead to fast queries** — cross-engine data transfer and parallel coordination cost ~0.3–0.9s, penalizing sub-second InnoDB queries 3. **Overall 17.6x speedup** — total benchmark time drops from 240s to 14s 4. **QA verified active** — `SELECT mcs_get_plan('rules')` returned `parallel_ces,predicate_pushdown`; PROCESSLIST confirmed `cejuser` making parallel cross-engine reads during QA execution