MariaDB Server / MDEV-37115

Reverse index scan incurs unnecessary overhead


Details

    • Related to performance
    • Q4/2025 Server Maintenance

    Description

  Reverse index scans in InnoDB currently use rather expensive logic for moving to the previous index page, in the function btr_pcur_move_backward_from_page(): InnoDB releases all latches and then searches for a key smaller than the current one.

  We are missing an opportunity to acquire a latch on the preceding index page directly whenever it is available without any waiting. If any waiting would be involved, we had better fall back to the existing logic, whose motivation is to avoid lock order inversion, and therefore deadlocks, between concurrent forward and reverse index scans.
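The decision can be modeled with a minimal sketch (this is not the actual InnoDB code; the type and function names here are hypothetical, with an atomic flag standing in for the page rw-latch):

```cpp
#include <atomic>

// Hypothetical model of an index page latch. Blocking on the preceding
// page's latch could deadlock against a forward scan that already holds
// that page and is waiting for the current one, so the optimistic path
// may only take the latch if that cannot block.
struct Page {
  std::atomic_flag latch = ATOMIC_FLAG_INIT;
  // Acquire the latch only if it is free right now; never waits.
  bool try_latch() { return !latch.test_and_set(std::memory_order_acquire); }
  void unlatch() { latch.clear(std::memory_order_release); }
};

enum class Path { Optimistic, Pessimistic };

// Sketch of the proposed logic in btr_pcur_move_backward_from_page():
// if the preceding page's latch is immediately available, proceed
// directly; otherwise fall back to the existing pessimistic path
// (release all latches, then search for a smaller key from the root).
Path move_backward_to(Page &prev) {
  return prev.try_latch() ? Path::Optimistic : Path::Pessimistic;
}
```

The key property is that the optimistic path never waits while holding a latch on the current page, so it cannot introduce a new lock order inversion.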

      The following test demonstrates this:

      --source include/have_innodb.inc
      --source include/have_innodb_4k.inc
      --source include/have_profiling.inc
      --source include/have_sequence.inc
      CREATE TABLE tbig2 (id int auto_increment not null,
      c2 char(255) not null default '',
      c3 char(255) not null default '',
      c4 char(255) not null default '', primary key(id,c2,c3,c4(20)))
      ENGINE=InnoDB default charset=ucs2;
      SET STATEMENT unique_checks=0,foreign_key_checks=0 FOR
      insert into tbig2 (c2) select '' from seq_1_to_524288;
      --disable_result_log
      set profiling=ON;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      select id from tbig2 order by id desc;
      --enable_result_log
      show profiles;
      DROP TABLE tbig2;
      

  This will create a 1.7 GiB data file, with 2 rows per index leaf page and 3 records per non-leaf page. That makes 262,144 leaf pages of 4 KiB each (1 GiB) plus 0.7 GiB of non-leaf pages. I padded the PRIMARY KEY with the data columns, so that the B-tree would grow taller.
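The leaf-level figures above can be checked with a quick back-of-the-envelope computation (idealized: real InnoDB pages also spend bytes on page headers and the record directory):

```cpp
#include <cstdint>

// Back-of-the-envelope sizing for the test case above.
constexpr uint64_t rows = 524288;      // insert ... from seq_1_to_524288
constexpr uint64_t rows_per_leaf = 2;  // 2 records fit in one 4 KiB leaf
constexpr uint64_t page_size = 4096;   // have_innodb_4k.inc

constexpr uint64_t leaf_pages = rows / rows_per_leaf;   // 262,144 pages
constexpr uint64_t leaf_bytes = leaf_pages * page_size; // exactly 1 GiB
```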

  If I run the test with a large enough innodb_buffer_pool_size so that the data fits in it (say, 2 GiB), then my current implementation of the optimization in btr_pcur_move_backward_from_page() improves throughput by about 10% in one environment. However, if I run the test with a smaller buffer pool, so that the data cannot stay resident, then the old implementation turns out to perform faster. The likely reason is that while InnoDB is searching for the key, it may trigger read-ahead of some of the preceding pages.

            People

              Marko Mäkelä
              Votes: 1
              Watchers: 5

