
MDEV-35049: fix innodb-adaptive-hash-index scalability with multiple threads


    Description

      The InnoDB adaptive hash index has performance problems when queries that use InnoDB hash indexes (at least on the same tables) are run in parallel.

      Here is an example of running the same SELECT query with 10-256 concurrent threads.
      (The query normally takes about 0.36 seconds.)

      With --innodb-adaptive-hash-index=0 we get:

      Total execution time with 10 threads: 1 second
      Total execution time with 32 threads: 1 second
      Total execution time with 64 threads: 3 seconds
      Total execution time with 128 threads: 5 seconds
      Total execution time with 256 threads: 9 seconds

      With --innodb-adaptive-hash-index=1 we get:

      Total execution time with 10 threads: 1 second
      Total execution time with 32 threads: 5 seconds
      Total execution time with 64 threads: 9 seconds
      Total execution time with 128 threads: 18 seconds
      Total execution time with 256 threads: 50 seconds

      MySQL 5.7 does not have this issue. It has roughly the same speed with and without the hash indexes.

      In the above query, there was a driving table with 400 rows, and for every row from the driving table there were 200 lookups on a secondary key in a child table.
      Using innodb-adaptive-hash-index=1 speeds up the query by 37% when it runs in a single thread, but it slows the query down drastically when running 32-256 concurrent threads.
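      The comments below attribute the regression to contention on the adaptive hash index latches. As a rough illustration only, here is a toy C++ program (written for this report; it is not InnoDB code and not the benchmark used above) in which every thread repeats the same point lookups against one shared hash table guarded by a single reader-writer lock, so every lookup has to touch the same lock word:

      // Toy model: concurrent point lookups in a shared, latch-protected hash table.
      // The map and the shared_mutex merely stand in for an adaptive hash index
      // partition and its rw-latch; nothing here is taken from the InnoDB source.
      #include <chrono>
      #include <cstdint>
      #include <cstdio>
      #include <shared_mutex>
      #include <thread>
      #include <unordered_map>
      #include <vector>

      int main()
      {
        std::unordered_map<uint64_t, uint64_t> table;   // stand-in for one AHI partition
        for (uint64_t i = 0; i < 100000; i++)
          table.emplace(i, i);
        std::shared_mutex latch;                        // stand-in for the partition latch

        for (unsigned n_threads : {1u, 10u, 32u, 64u, 128u, 256u}) {
          const auto start = std::chrono::steady_clock::now();
          std::vector<std::thread> workers;
          for (unsigned t = 0; t < n_threads; t++)
            workers.emplace_back([&] {
              uint64_t sum = 0;
              for (uint64_t i = 0; i < 200000; i++) {
                std::shared_lock<std::shared_mutex> s(latch);  // "read" latch per lookup
                sum += table.find(i % 100000)->second;
              }
              (void) sum;
            });
          for (std::thread &w : workers)
            w.join();
          const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
              std::chrono::steady_clock::now() - start).count();
          std::printf("%3u threads: %lld ms\n", n_threads, static_cast<long long>(ms));
        }
        return 0;
      }

      How much of the real slowdown comes from shared versus exclusive latch acquisitions is exactly what the analysis in the comments below is about.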



          Activity

            marko Marko Mäkelä added a comment:

            There was one more observed ut_ad(block->n_pointers) assertion failure. It is the result of a race condition between btr_search_drop_page_hash_index() and btr_search_update_hash_ref() between two SELECT operations. The workload is also frequently rebuilding the adaptive hash index for this index, between 1 field (left side) and 2 fields (right side). It demonstrates that ideally, it should be possible to enable or disable the adaptive hash index not only on a per-index basis, but also on a per-query basis.

            I do not have a fix for this yet. Possibly, btr_search_drop_page_hash_index() needs to be made to use a single batch, by allocating a large enough buffer. Multiple batches in btr_search_build_page_hash_index() should not be a problem, because the AHI can always be a proper subset of the contents of the index.
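            To make that idea concrete, here is a minimal C++ sketch using toy types: an ordinary hash container and a std::shared_mutex stand in for the AHI partition and its latch, and rec_fold() is replaced by a plain hash function, so none of this is the actual InnoDB interface. The point is only that all fold values of the page are computed first, and the exclusive latch is then taken exactly once per page:

            // Toy sketch of the "single batch" idea for btr_search_drop_page_hash_index():
            // fold every record of the page first, then purge the hash entries under a
            // single exclusive latch acquisition. None of these types are InnoDB's own.
            #include <cstdint>
            #include <functional>
            #include <mutex>
            #include <shared_mutex>
            #include <string>
            #include <unordered_map>
            #include <vector>

            struct toy_ahi_partition {
              std::shared_mutex latch;                                // partition rw-latch
              std::unordered_multimap<uint32_t, const void *> table;  // fold -> record
            };

            // Stand-in for rec_fold(): any deterministic hash of the hashed index prefix.
            uint32_t toy_fold(const std::string &rec)
            {
              return static_cast<uint32_t>(std::hash<std::string>{}(rec));
            }

            // Drop the hash entries of one page in a single batch.
            void drop_page_hash_index(toy_ahi_partition &part,
                                      const std::vector<std::string> &page_records)
            {
              std::vector<uint32_t> folds;                // the "large enough buffer"
              folds.reserve(page_records.size());
              for (const std::string &rec : page_records)
                folds.push_back(toy_fold(rec));           // no AHI latch held while folding

              std::unique_lock<std::shared_mutex> x(part.latch);  // one exclusive acquisition
              for (uint32_t f : folds)
                part.table.erase(f);  // toy shortcut: real code would also check that the
                                      // entry actually points into the page being dropped
            }

            A real implementation would also need to bound the size of that buffer somehow, which is what the later comments on this issue turn to.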

            For performance testing, I also created the branch bb-10.11-MDEV-35049, with two variants: something that aims to be a correct merge and a deliberately incorrect merge to assess the impact of MDEV-13756.


            marko Marko Mäkelä added a comment:

            I ended up fixing the btr_search_drop_page_hash_index() problem by allocating enough stack to accommodate the maximum number of records per page. I empirically determined the maximum number of records with the following test case:

            --source include/have_innodb.inc
            --source include/have_innodb_64k.inc
            --source include/have_sequence.inc
            CREATE TABLE t(a SMALLINT PRIMARY KEY, INDEX(a)) ENGINE=InnoDB;
            INSERT INTO t SELECT * FROM seq_1_to_8189;
            

            od -Ax -t x1 -j 0x40000 -N 0x10000 var/mysqld.1/data/test/t.ibd
            

            There is a hard limit of 8191 records per page. InnoDB reserves 13 bits for the heap number in the record header, and the predefined infimum and supremum records consume one heap number each.

            So, we would have to allocate 8189*4 = 32756 bytes of stack for the rec_fold() value buffer in btr_search_drop_page_hash_index() in order to be able to compute all rec_fold() values before acquiring an exclusive latch on an adaptive hash index partition. I did consider the alternative of allocating a buffer from the heap; I am afraid that if we failed to allocate it, we’d have no other option than to crash the server.

            I just realized that there is one more option: keep computing rec_fold() for the "overflow" while holding an exclusive adaptive hash index latch. In that way, for typical pages where we have at most a few hundred records, we’d invoke rec_fold() while not blocking others. In the non-typical case, there would be a bit more blocking going on. This could be the best compromise. We know that Alpine Linux (MDEV-18462) is configuring a small stack size, which is already leading to some problems elsewhere (MDEV-35705, MDEV-35022).
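            Continuing the same toy model (again, the types, helper names and the threshold are illustrative assumptions, not the InnoDB code), the hybrid option could look roughly like this: fold values for a typical page fit into a small fixed stack buffer and are computed without any latch, and only an unusually full page keeps folding the remaining records after the exclusive latch has been taken:

            // Toy sketch of the hybrid "overflow" option: a fixed stack buffer covers
            // typical pages without holding the latch; the rare overflow records are
            // folded under the exclusive latch, trading a bit of blocking for bounded
            // stack usage. Types and the threshold are illustrative assumptions.
            #include <algorithm>
            #include <array>
            #include <cstddef>
            #include <cstdint>
            #include <functional>
            #include <mutex>
            #include <shared_mutex>
            #include <string>
            #include <unordered_map>
            #include <vector>

            struct toy_ahi_partition {
              std::shared_mutex latch;
              std::unordered_multimap<uint32_t, const void *> table;
            };

            uint32_t toy_fold(const std::string &rec)
            {
              return static_cast<uint32_t>(std::hash<std::string>{}(rec));
            }

            void drop_page_hash_index_hybrid(toy_ahi_partition &part,
                                             const std::vector<std::string> &page_records)
            {
              constexpr size_t STACK_FOLDS = 1024;        // assumed "typical page" bound
              std::array<uint32_t, STACK_FOLDS> folds;    // 4 KiB of stack, not 32756 bytes

              const size_t prefix = std::min(page_records.size(), STACK_FOLDS);
              for (size_t i = 0; i < prefix; i++)
                folds[i] = toy_fold(page_records[i]);     // latch not held: readers not blocked

              std::unique_lock<std::shared_mutex> x(part.latch);
              for (size_t i = 0; i < prefix; i++)
                part.table.erase(folds[i]);
              // Overflow: only pages with more than STACK_FOLDS records reach this loop,
              // and they pay for it by folding while the exclusive latch is held.
              for (size_t i = prefix; i < page_records.size(); i++)
                part.table.erase(toy_fold(page_records[i]));
            }

            With a threshold of a few hundred to a thousand records, the common case never folds under the exclusive latch, which matches the compromise described above while keeping the stack usage bounded.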


            marko Marko Mäkelä added a comment:

            I checked some performance test results. As expected, I see a slight improvement in INSERT for the innodb_adaptive_hash_index=OFF case, due to MDEV-35312 and the fact that we now enable that code even when the adaptive hash index is disabled.

            Curiously, for the t_oltp_pointselect_innodb test in the innodb_adaptive_hash_index=ON case, there is hardly any improvement observed. This seems to suggest that we should include something like ahi.sh in our performance regression test suite.


            mleich Matthias Leich added a comment: RQG testing etc.


            marko Marko Mäkelä added a comment:

            I created the branch bb-10.11-MDEV-35049-rebase (and tested all 10 commits in it separately) as well as pushed the 10 commits to the main (currently 11.8) branch.


            People

              marko Marko Mäkelä
              monty Michael Widenius