MariaDB Server / MDEV-37115

Reverse index scan incurs unnecessary overhead


Details

    • Related to performance
    • Q4/2025 Server Maintenance

    Description

  Reverse index scans in InnoDB currently use rather expensive logic for moving to the previous index page, in the function btr_pcur_move_backward_from_page(): InnoDB releases all latches and then searches for a key smaller than the current one.

  We are missing an opportunity to acquire a latch on the preceding index page directly whenever it is available without any waiting. If any waiting would be involved, we had better fall back to the existing logic, whose motivation is to avoid lock order inversion, and therefore deadlocks, between concurrent forward and reverse index scans.
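The decision can be modeled with a minimal sketch (this is not the actual InnoDB code; the type and function names here are hypothetical, with an atomic flag standing in for the page rw-latch):

```cpp
#include <atomic>

// Hypothetical model of an index page latch. Blocking on the preceding
// page's latch could deadlock against a forward scan that already holds
// that page and is waiting for the current one, so the optimistic path
// may only take the latch if that cannot block.
struct Page {
  std::atomic_flag latch = ATOMIC_FLAG_INIT;
  // Acquire the latch only if it is free right now; never waits.
  bool try_latch() { return !latch.test_and_set(std::memory_order_acquire); }
  void unlatch() { latch.clear(std::memory_order_release); }
};

enum class Path { Optimistic, Pessimistic };

// Sketch of the proposed logic in btr_pcur_move_backward_from_page():
// if the preceding page's latch is immediately available, proceed
// directly; otherwise fall back to the existing pessimistic path
// (release all latches, then search for a smaller key from the root).
Path move_backward_to(Page &prev) {
  return prev.try_latch() ? Path::Optimistic : Path::Pessimistic;
}
```

The key property is that the optimistic path never waits while holding a latch on the current page, so it cannot introduce a new lock order inversion.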

      The following test demonstrates this:

      --source include/have_innodb.inc
      --source include/have_innodb_4k.inc
      --source include/have_profiling.inc
      --source include/have_sequence.inc
      CREATE TABLE tbig2 (id int auto_increment not null,
      c2 char(255) not null default '',
      c3 char(255) not null default '',
      c4 char(255) not null default '', primary key(id,c2,c3,c4(20)))
      ENGINE=InnoDB default charset=ucs2;
      SET STATEMENT unique_checks=0,foreign_key_checks=0 FOR
      insert into tbig2 (c2) select '' from seq_1_to_524288;
      --disable_result_log
      set profiling=ON;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      --enable_result_log
      show profiles;
      DROP TABLE tbig2;
      

  This will create a 1.7 GiB data file, with 2 rows per index leaf page and 3 records per non-leaf page. That makes 262,144 leaf pages of 4 KiB each (1 GiB) plus 0.7 GiB of non-leaf pages. I padded the PRIMARY KEY with the data columns, so that the B-tree would grow taller.
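The leaf-level figures above can be checked with a quick back-of-the-envelope computation (idealized: real InnoDB pages also spend bytes on page headers and the record directory):

```cpp
#include <cstdint>

// Back-of-the-envelope sizing for the test case above.
constexpr uint64_t rows = 524288;      // insert ... from seq_1_to_524288
constexpr uint64_t rows_per_leaf = 2;  // 2 records fit in one 4 KiB leaf
constexpr uint64_t page_size = 4096;   // have_innodb_4k.inc

constexpr uint64_t leaf_pages = rows / rows_per_leaf;   // 262,144 pages
constexpr uint64_t leaf_bytes = leaf_pages * page_size; // exactly 1 GiB
```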

  If I run the test with a large enough innodb_buffer_pool_size so that the data fits in it (say, 2 GiB), then my current implementation of the optimization in btr_pcur_move_backward_from_page() improves throughput by about 10% in one environment. However, if I run the test with a smaller buffer pool, so that the data cannot stay resident, then the old implementation turns out to perform faster. The likely reason is that while InnoDB is searching for the key, it may trigger read-ahead of some of the preceding pages.

            People

              Marko Mäkelä
              Votes: 1
              Watchers: 5

