Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
MariaDB (and historically, MySQL) had a BNL-H join (in MariaDB, join_cache_level>=3).
Then MySQL has added "proper" hash join.
This MDEV is about checking the performance of the two.
Query pattern
SELECT |
MAX(column1), ... |
FROM
|
t1, t2
|
WHERE
|
t1.col1=t2.col2
|
Let the join columns (col1 and col2 above) be integers for now.
Tables
Use 500bytes / row in both tables.
No indexes.
20M rows in one table, 20M (or 200M?) in another.
Run 1:
- table 1: 20M rows, unique values.
- table 2: 20M rows, 10 rows with same value on average.
Run 2:
- table 1: 20M rows, unique values.
- table 2: 200M rows, 10 rows with same value on average.
(Please don't "co-locate" rows with duplicates next to one another).
Table sizes
The first table: 20M * 0.5 KB/record = 10G
The second table: also 10G in try1, 100G in try2.
InnoDB Buffer pool
Start from "hot" buffer pool for both tables.
(Set innodb_buffer_pool_size to fit both tables, do multiple query runs).
Memory used for joining
MariaDB
Default is join_buffer_size=256K
MySQL
Default is join_buffer_size=256K
MySQL: "Memory usage by hash joins can be controlled using the join_buffer_size
system variable"
To use in benchmark
Set the common default: let'start with join_buffer_size=1G.
Links
https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html
https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/
https://mariadb.com/kb/en/server-system-variables/#join_cache_level
Attachments
- t1_small.txt
- 1.23 MB
- t2_small.txt
- 1.04 MB
Issue Links
- relates to
-
MDEV-34352 Execution time for Block Nested Loop Hash (BNLH) join with charset utf8mb4 is 7-20 times slower than with latin1
-
- Closed
-
-
MDEV-34427 BNL-H has not optimal implementation for varchar type
-
- Open
-
-
MDEV-34417 Wrong result set with utf8mb4_danish_ci and BNLH join
-
- Closed
-
-
MDEV-35855 Make it possible to enable BNL-H join without hitting regressions
-
- Open
-
Activity
Results of testing
For condition:
Run 1:
|
table 1: 20M rows, unique values. |
table 2: 20M rows, 10 rows with same value on average. |
join_buffer_size=1G
|
CREATE TABLE t1(col1 VARCHAR(500)) engine=INNODB; |
CREATE TABLE t2(col2 VARCHAR(500)) engine=INNODB; |
and query:
select max(A.col1) from t1 A, t2 B where A.col1=B.col2; |
the following results were obtained (with the default charset):
DB | Engine | Charset | Collation | Execution time |
---|---|---|---|---|
MySQL 8.3 | InnoDB | utf8mb4 | utf8mb4_0900_ai_ci | 3 hours 46 min 20,74 sec |
MariaDB 11.5 | InnoDB | latin1 | latin1_swedish_ci | more then 3 days and not finished |
On a dataset with 3000000 rows in table were obtained the following results :
DB | Engine | Charset | Collation | Execution time |
---|---|---|---|---|
MySQL 8.3 | InnoDB | utf8mb4 | utf8mb4_0900_ai_ci | 2 min 18,16 sec |
MariaDB 11.5 | InnoDB | latin1 | latin1_swedish_ci | 11 min 25,869 sec |
MariaDB 11.5 | InnoDB | utf8mb4 | utf8mb4_unicode_ci | 21 hours 24 min 10,971 sec |
On a small dataset (200000 rows in table) were obtained the following results :
DB | Engine | Charset | Collation | Execution time |
---|---|---|---|---|
MySQL 8.3 | InnoDB | utf8mb4 | utf8mb4_0900_ai_ci | 9,62 sec |
MySQL 8.3 | InnoDB | utf8mb4 | utf8mb4_general_ci | 10,13 sec |
MariaDB 11.5 | InnoDB | latin1 | latin1_swedish_ci | 4,031 sec |
MariaDB 11.5 | InnoDB | utf8mb4 | utf8mb4_unicode_ci | 7 min 7,990 sec |
MariaDB 11.5 | InnoDB | utf8mb4 | utf8mb4_general_ci | 5 min 39,216 sec |
ANALYZE for the small dataset:
MySQL (CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci):
| -> Aggregate: max(A.col5) (cost=4.41e+9 rows=1) (actual time=9616..9616 rows=1 loops=1) |
-> Filter: (B.col6 = A.col5) (cost=4e+9 rows=4e+9) (actual time=4753..9498 rows=201043 loops=1) |
-> Inner hash join (<hash>(B.col6)=<hash>(A.col5)) (cost=4e+9 rows=4e+9) (actual time=4753..9374 rows=201043 loops=1) |
-> Table scan on B (cost=0.0109 rows=200164) (actual time=0.181..4188 rows=200000 loops=1) |
-> Hash
|
-> Table scan on A (cost=20179 rows=200080) (actual time=0.563..4249 rows=200000 loops=1) |
|
|
MySQL (CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci):
| -> Aggregate: max(A.col5) (cost=4.4e+9 rows=1) (actual time=10118..10118 rows=1 loops=1) |
-> Inner hash join (A.col5 = B.col6) (cost=4e+9 rows=4e+9) (actual time=5034..10030 rows=201043 loops=1) |
-> Table scan on A (cost=0.0112 rows=200059) (actual time=0.433..4414 rows=200000 loops=1) |
-> Hash
|
-> Table scan on B (cost=20096 rows=199978) (actual time=0.353..4307 rows=200000 loops=1) |
|
|
Mariadb (CHARSET=latin1 COLLATE=latin1_swedish_ci):
| {
|
"query_optimization": { |
"r_total_time_ms": 0.075227284 |
},
|
"query_block": { |
"select_id": 1, |
"cost": 371917.4026, |
"r_loops": 1, |
"r_total_time_ms": 4038.20338, |
"nested_loop": [ |
{
|
"table": { |
"table_name": "A", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 200017, |
"r_rows": 200000, |
"cost": 33.0239298, |
"r_table_time_ms": 148.2765999, |
"r_other_time_ms": 28.96390798, |
"r_engine_stats": { |
"pages_accessed": 520, |
"pages_read_count": 518, |
"pages_read_time_ms": 251875.792 |
},
|
"filtered": 100, |
"r_total_filtered": 100, |
"attached_condition": "A.col5 is not null", |
"r_filtered": 100 |
}
|
},
|
{
|
"block-nl-join": { |
"table": { |
"table_name": "B", |
"access_type": "hash_ALL", |
"key": "#hash#$hj", |
"key_length": "503", |
"used_key_parts": ["col6"], |
"ref": ["test1.A.col5"], |
"loops": 200017, |
"r_loops": 50, |
"rows": 200185, |
"r_rows": 200000, |
"cost": 371884.3787, |
"r_table_time_ms": 2644.038343, |
"r_other_time_ms": 1188.554524, |
"r_engine_stats": { |
"pages_accessed": 20450, |
"pages_read_count": 1450, |
"pages_read_time_ms": 937762.289 |
},
|
"filtered": 10, |
"r_total_filtered": 100, |
"r_filtered": 100 |
},
|
"buffer_type": "flat", |
"buffer_size": "2048Kb", |
"join_type": "BNLH", |
"attached_condition": "B.col6 = A.col5", |
"r_loops": 200000, |
"r_filtered": 100, |
"r_unpack_time_ms": 11.28171225, |
"r_other_time_ms": 17.084369, |
"r_effective_rows": 1.005215 |
}
|
}
|
]
|
}
|
} |
|
Mariadb (CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci):
| {
|
"query_optimization": { |
"r_total_time_ms": 0.021029488 |
},
|
"query_block": { |
"select_id": 1, |
"cost": 371541.3949, |
"r_loops": 1, |
"r_total_time_ms": 428806.3792, |
"nested_loop": [ |
{
|
"table": { |
"table_name": "A", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 200000, |
"r_rows": 200000, |
"cost": 32.6108548, |
"r_table_time_ms": 311.4800239, |
"r_other_time_ms": 1086.996091, |
"r_engine_stats": { |
"pages_accessed": 808, |
"pages_read_count": 808, |
"pages_read_time_ms": 495295.48 |
},
|
"filtered": 100, |
"r_total_filtered": 100, |
"attached_condition": "A.col5 is not null", |
"r_filtered": 100 |
}
|
},
|
{
|
"block-nl-join": { |
"table": { |
"table_name": "B", |
"access_type": "hash_ALL", |
"key": "#hash#$hj", |
"key_length": "2003", |
"used_key_parts": ["col6"], |
"ref": ["test1.A.col5"], |
"loops": 200000, |
"r_loops": 194, |
"rows": 200000, |
"r_rows": 200000, |
"cost": 371508.7841, |
"r_table_time_ms": 20130.76286, |
"r_other_time_ms": 407173.4794, |
"r_engine_stats": { |
"pages_accessed": 79346, |
"pages_read_count": 37842, |
"pages_read_time_ms": 28388551.35 |
},
|
"filtered": 10, |
"r_total_filtered": 100, |
"r_filtered": 100 |
},
|
"buffer_type": "flat", |
"buffer_size": "2048Kb", |
"join_type": "BNLH", |
"attached_condition": "B.col6 = A.col5", |
"r_loops": 200000, |
"r_filtered": 100, |
"r_unpack_time_ms": 52.00349565, |
"r_other_time_ms": 51.65476019, |
"r_effective_rows": 1.005215 |
}
|
}
|
]
|
}
|
} |
|
Mariadb (CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci):
| {
|
"query_optimization": { |
"r_total_time_ms": 0.079407492 |
},
|
"query_block": { |
"select_id": 1, |
"cost": 371917.4026, |
"r_loops": 1, |
"r_total_time_ms": 339859.7986, |
"nested_loop": [ |
{
|
"table": { |
"table_name": "A", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 200017, |
"r_rows": 200000, |
"cost": 33.0239298, |
"r_table_time_ms": 387.6377464, |
"r_other_time_ms": 876.243192, |
"r_engine_stats": { |
"pages_accessed": 808, |
"pages_read_count": 808, |
"pages_read_time_ms": 693960.456 |
},
|
"filtered": 100, |
"r_total_filtered": 100, |
"attached_condition": "A.col5 is not null", |
"r_filtered": 100 |
}
|
},
|
{
|
"block-nl-join": { |
"table": { |
"table_name": "B", |
"access_type": "hash_ALL", |
"key": "#hash#$hj", |
"key_length": "2003", |
"used_key_parts": ["col6"], |
"ref": ["test1.A.col5"], |
"loops": 200017, |
"r_loops": 194, |
"rows": 200185, |
"r_rows": 200000, |
"cost": 371884.3787, |
"r_table_time_ms": 17617.89978, |
"r_other_time_ms": 320885.0351, |
"r_engine_stats": { |
"pages_accessed": 79346, |
"pages_read_count": 23326, |
"pages_read_time_ms": 20330041.15 |
},
|
"filtered": 10, |
"r_total_filtered": 100, |
"r_filtered": 100 |
},
|
"buffer_type": "flat", |
"buffer_size": "2048Kb", |
"join_type": "BNLH", |
"attached_condition": "B.col6 = A.col5", |
"r_loops": 200000, |
"r_filtered": 100, |
"r_unpack_time_ms": 45.64252627, |
"r_other_time_ms": 47.33608526, |
"r_effective_rows": 1.005215 |
}
|
}
|
]
|
}
|
} |
|
What are the attached t1_small.txt and t2_small.txt files?
Please attach the benchmark script (or link to it) so we can get back to it and/or
look at other details.
On a small dataset (200,000 rows in table) were obtained the following results :
notes:
- this is with large key_length (500 or 2K bytes). Correct this was in the request in this MDEV, but we may want to check a more common case of smaller comparisons...
- ANALYZE shows buffer_size=2M.
- the second table is read 50 or 194 times (lots of refills).
Do we really have join_buffer_size=1G (as said above in that comment), while join buffer code uses 2M for buffers (as ANALYZE shows?). Any idea why?
Is there any way to see how much memory was used by MySQL ? (please check Performance schema? )
- Ok it probably uses such small buffer size because of join_buffer_space_limit. Need to increase that, too.
- For MySQL, please check https://dev.mysql.com/doc/refman/8.4/en/performance-schema-memory-summary-tables.html. Look at what PS shows while hash join is running...
- Can we try with the "small" dataset, latin1 character set, and large buffer sizes (large enough so that there are no refills). How does MariaDB compare to MySQL in this case?
Accidentally discovered this:
Consider a BNL-H join done over a column that is defined as VARCHAR(100) but has actual data of much smaller length N. The column itself is stored in the buffer efficiently, taking space for N characters.
But the buffer record also contains a "non-embedded key", which takes the space of 100 characters.
This means BNL-H joins over dynamic-length columns have a lot of overhead.
The decision whether to use the embedded key is made in JOIN_CACHE::check_emb_key_usage().
Increasing join_buffer_space_limit was helpful, but it looks like as soon as the data size exceeds the allocated memory, performance problems begin.
Because there are some problems with collation utf8mb4 (MDEV-34427 and MDEV-34417), latin1 is used for testing.
Variables:
+-------------------------+----------------------+ |
| Variable_name | Value |
|
+-------------------------+----------------------+ |
| join_buffer_size | 524288 |
|
| join_buffer_space_limit | 2097152 |
|
| join_cache_level | 3 |
|
+-------------------------+----------------------+ |
Results:
N=50000
DB | Execution time |
---|---|
MySQL 8.3 | 2,00 sec |
MariaDB 11.5 | 1,067 sec |
N=100000
DB | Execution time |
---|---|
MySQL 8.3 | 5,91 sec |
MariaDB 11.5 | 5,088 sec |
N=200000
DB | Execution time |
---|---|
MySQL 8.3 | 8,67 sec |
MariaDB 11.5 | 26,720 sec |
There is an information about temporary files from run with N=200000 for MySQL in temp_files_mysql_n=200000.txt
For int type:
create TABLE t1(col int) engine=INNODB character set latin1 collate latin1_swedish_ci; |
create TABLE t2(col int) engine=INNODB character set latin1 collate latin1_swedish_ci; |
and
Variables:
+-------------------------+----------------------+ |
| Variable_name | Value |
|
+-------------------------+----------------------+ |
| join_buffer_size | 524288 |
|
| join_buffer_space_limit | 2097152 |
|
| join_cache_level | 3 |
|
+-------------------------+----------------------+ |
Results:
N=1000000
DB | Execution time |
---|---|
MySQL 8.3 | 36,22 sec |
MariaDB 11.5 | 31,576 sec |
N=2000000
DB | Execution time |
---|---|
MySQL 8.3 | 1 min 21,35 sec |
MariaDB 11.5 | 1 min 34,099 sec |
In MySQL, hash join is enabled by default
mysql> explain select * from t1 A, t2 B where A.a=B.a;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+--------------------------------------------+
| 1 | SIMPLE | A | NULL | ALL | NULL | NULL | NULL | NULL | 1000 | 100.00 | NULL |
| 1 | SIMPLE | B | NULL | ALL | NULL | NULL | NULL | NULL | 998842 | 10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
explain analyze select * from t1 A, t2 B where A.a=B.a\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (B.a = A.a) (cost=99.9e+6 rows=99.9e+6) (actual time=2.36..370 rows=1000 loops=1)
-> Table scan on B (cost=10.5 rows=998842) (actual time=0.0137..317 rows=1e+6 loops=1)
-> Hash
-> Table scan on A (cost=101 rows=1000) (actual time=0.0389..1.77 rows=1000 loops=1)
1 row in set (0.37 sec)
In MariaDB:
MariaDB [test]> set join_cache_level=3;
Query OK, 0 rows affected (0.000 sec)
MariaDB [test]> explain select * from t1 A, t2 B where A.a=B.a;
+------+-------------+-------+----------+---------------+-----------+---------+----------+------+--------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+----------+---------------+-----------+---------+----------+------+--------------------------------------------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | SIMPLE | B | hash_ALL | NULL | #hash#$hj | 4 | test.A.a | 1000 | Using where; Using join buffer (flat, BNLH join) |
+------+-------------+-------+----------+---------------+-----------+---------+----------+------+--------------------------------------------------+
ANALYZE FORMAT=JSON shows various details, e.g. buffer size.