[MDEV-11384] AliSQL: [Feature] Issue#19 BUFFER POOL LIST SCAN OPTIMIZATION Created: 2016-11-29  Updated: 2021-03-09  Resolved: 2021-03-09

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.6.0

Type: Task Priority: Major
Reporter: Sergey Vojtovich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-25085 Simplify instrumentation for LRU evic... Closed
Relates
relates to MDEV-15706 Remove information_schema.innodb_metr... Open
relates to MDEV-16526 Overhaul the InnoDB page flushing Closed
relates to MDEV-16580 Remove unused monitor counters from I... Closed
relates to MDEV-23399 10.5 performance regression with IO-b... Closed
relates to MDEV-23855 InnoDB log checkpointing causes regre... Closed
Epic Link: AliSQL patches

 Description   

Description:
------------
backport from WebScaleSQL
 
This patch includes:
--- backport of upstream work around buffer pool list scan.
 
     WL#7047 - Optimize buffer pool list scans and related batch processing code
 
     Reduce excessive scanning of pages when doing flush list batches. The
     fix is to introduce the concept of a "Hazard Pointer", which reduces the
     time complexity of the scan from O(n*n) to O(n).
 
     The concept of hazard pointer is reversed in this work.  Academically a
     hazard pointer is a pointer that the thread working on it will declare as
     such and as long as that thread is not done no other thread is allowed to
     do anything with it.
 
     In this WL we declare the pointer as a hazard pointer and then if any other
     thread attempts to work on it, it is allowed to do so but it has to adjust
     the hazard pointer to the next valid value. We use hazard pointer solely for
     reverse traversal of lists within a buffer pool instance.
 
     Add an event to control the background flush thread. The background flush
     thread wait has been converted to an os event timed wait so that it can be
     signalled by threads that want to kick start a background flush when the
     buffer pool is running low on free/dirty pages.
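
The timed wait that can be cut short by a signal could look roughly like the following. This is a sketch using the C++ standard library, not the InnoDB os_event interface; FlushEvent and its member names are assumptions made for the example.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an OS event; not the InnoDB os_event API.
class FlushEvent {
public:
    // Called by a thread that wants the background flush to start now.
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signalled_ = true;
        }
        cv_.notify_one();
    }

    // Called by the background flush thread instead of a plain sleep.
    // Returns true if woken by set(), false if the timeout expired.
    // The event is reset before returning.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool woken =
            cv_.wait_for(lock, timeout, [this] { return signalled_; });
        signalled_ = false;
        return woken;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signalled_ = false;
};
```

The flush thread would loop on wait_for() with its usual sleep interval and run a batch either way; a signal only shortens the wait.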
 
--- fix for mysql bug#71411
     buf_flush_LRU() returns the number of pages processed. There are
     two types of processing that can happen. A page can get evicted or
     a page can get flushed. These two numbers are quite distinct and
     should not be mixed.
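
To make the distinction concrete, a batch function could report the two numbers separately rather than as one total. The sketch below is hypothetical (LruBatchResult, PageState and lru_batch are invented for this example and do not mirror buf_flush_LRU()); it assumes that a clean page can be evicted directly while a dirty page has to be written out first.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
enum class PageState { Clean, Dirty };

struct LruBatchResult {
    std::size_t flushed = 0;  // dirty pages submitted for writing
    std::size_t evicted = 0;  // clean pages freed directly
};

// Walk the pages at the LRU tail and count the two outcomes separately
// instead of returning one "pages processed" total.
LruBatchResult lru_batch(const std::vector<PageState>& tail_pages) {
    LruBatchResult result;
    for (PageState state : tail_pages) {
        if (state == PageState::Dirty) {
            ++result.flushed;
        } else {
            ++result.evicted;
        }
    }
    return result;
}
```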

https://github.com/alibaba/AliSQL/commit/2645293fb0c1ed398f7243da2c14ab07572045b0



 Comments   
Comment by Marko Mäkelä [ 2017-11-17 ]

The commit includes two things: a backport of a feature from MySQL 5.7 to AliSQL 5.6, and a change to split a "processed blocks" counter into "flushed blocks" and "evicted blocks" counters.

MariaDB 10.2+ is based on MySQL 5.7, so the only addition of this contribution is the split of the counter. I would prefer to do it differently, if it is OK from a performance point of view:

  1. Move the counters from innodb_monitor to server status variables (export_vars).
  2. Instead of passing the counters as return values or output parameters, just do my_atomic_add(&export_vars.counter_name, ...) at the low level.
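
A sketch of what the two points above might look like, using std::atomic in place of my_atomic_add(). The struct, the variable and the counter names are placeholders invented for this example; they are not existing members of export_vars.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical status-variable block; the field names are placeholders.
struct ExportVarsSketch {
    std::atomic<std::size_t> lru_pages_flushed{0};
    std::atomic<std::size_t> lru_pages_evicted{0};
};

ExportVarsSketch export_vars_sketch;

// Low-level code bumps the counter where the event happens, so nothing
// has to be passed back through return values or output parameters.
// Relaxed ordering is assumed to be enough for pure statistics.
void note_pages_flushed(std::size_t n) {
    export_vars_sketch.lru_pages_flushed.fetch_add(
        n, std::memory_order_relaxed);
}

void note_pages_evicted(std::size_t n) {
    export_vars_sketch.lru_pages_evicted.fetch_add(
        n, std::memory_order_relaxed);
}
```

Whether the atomic increment on a hot path is acceptable is exactly the performance question raised above; this sketch does not answer it.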

plinux, I think that the above is feasible to do in MariaDB 10.3.

Comment by Marko Mäkelä [ 2021-02-26 ]

In MDEV-23399, the LRU eviction flushing was moved from the single page cleaner thread to user threads that are allocating buffer pool pages.

I would not extend the monitor interface with new counters; that interface should hopefully be removed (MDEV-15706) and replaced with innodb_status_variables.

We actually do have MONITOR_LRU_BATCH_EVICT_TOTAL_PAGE, which is being incremented in buf_do_LRU_batch() (and not exposed elsewhere). We also have a buf_flush_page_count (innodb_buffer_pool_pages_flushed) that is incremented in each call of buf_flush_page(). That counter does not distinguish the two types of page writes (checkpoint or eviction).

It seems that we could address this by extending innodb_status_variables as follows:

  • Exposing the MONITOR_LRU_BATCH_EVICT_TOTAL_PAGE counter.
  • Introducing a new counter of page writes triggered by eviction flushing, updated while holding buf_pool.mutex or buf_pool.flush_list_mutex in buf_do_LRU_batch() or buf_flush_LRU_list_batch() or buf_page_write_complete().

Comment by Marko Mäkelä [ 2021-03-08 ]

bb-10.6-MDEV-11384

Comment by Vladislav Vaintroub [ 2021-03-08 ]

Looks good to me.

Comment by Marko Mäkelä [ 2021-03-08 ]

wlad, thank you. I filed MDEV-25085 for the change of instrumentation, because it has rather little to do with the original Description. Once that is closed, I intend to close this ticket as well, because then everything mentioned in the Description would be addressed.

Comment by Marko Mäkelä [ 2021-03-09 ]

The first part (MySQL WL#7047) was already part of MariaDB Server 10.2 (via MySQL 5.7). The second part (the counters) was mostly done in MariaDB 10.5.7 (MDEV-23399 and MDEV-23855), with a last bit (MDEV-25085) done in MariaDB 10.6.0.
