MariaDB Server / MDEV-11384

AliSQL: [Feature] Issue#19 BUFFER POOL LIST SCAN OPTIMIZATION

Details

    Description

      Description:
      ------------
      backport from WebScaleSQL
       
      This patch includes:
      --- backport of upstream work on buffer pool list scans.
       
           WL#7047 - Optimize buffer pool list scans and related batch processing code
       
           Reduce excessive scanning of pages when doing flush list batches. The
           fix is to introduce the concept of a "Hazard Pointer", which reduces the
           time complexity of the scan from O(n*n) to O(n).
       
           The concept of hazard pointer is reversed in this work.  Academically a
           hazard pointer is a pointer that the thread working on it will declare as
           such and as long as that thread is not done no other thread is allowed to
           do anything with it.
       
           In this WL we declare the pointer as a hazard pointer and then if any other
           thread attempts to work on it, it is allowed to do so but it has to adjust
           the hazard pointer to the next valid value. We use hazard pointer solely for
           reverse traversal of lists within a buffer pool instance.
       
           Add an event to control the background flush thread. The background flush
           thread wait has been converted to an os event timed wait so that it can be
           signalled by threads that want to kick start a background flush when the
           buffer pool is running low on free/dirty pages.
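
A minimal sketch of such a timed event wait, using the C++ standard library as a stand-in for the OS event mentioned above. `FlushEvent` and its methods are hypothetical names chosen for illustration; the auto-reset behaviour is an assumption, not something the commit message specifies.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an OS event; not InnoDB's os_event API.
class FlushEvent {
public:
    // Called by a thread that wants the background flusher to run now.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signalled_ = true;
        }
        cv_.notify_one();
    }

    // Sleep until signalled or until the timeout expires.
    // Returns true if woken by a signal, false on timeout.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool woken = cv_.wait_for(lock, timeout,
                                  [this] { return signalled_; });
        signalled_ = false;  // assumed auto-reset, ready for the next round
        return woken;
    }

private:
    std::mutex              mutex_;
    std::condition_variable cv_;
    bool                    signalled_ = false;
};
```

A background flush loop would call `wait_for()` with its usual sleep interval and start a batch either way; the return value only tells it whether it was kicked early.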
       
      --- fix for mysql bug#71411
           buf_flush_LRU() returns the number of pages processed. There are
           two types of processing that can happen. A page can get evicted or
           a page can get flushed. These two numbers are quite distinct and
           should not be mixed.
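
To illustrate the distinction, here is a small sketch that reports the two numbers separately instead of one "processed" total. The types and the function `process_lru_tail()` are hypothetical, and the rule "dirty pages are flushed, clean pages are evicted" is a simplifying assumption for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical page descriptor; only the dirty flag matters here.
struct LruPage {
    bool dirty;
};

// Two distinct results instead of one combined count.
struct FlushCounts {
    std::size_t flushed = 0;  // dirty pages that would be written out
    std::size_t evicted = 0;  // clean pages that would be freed directly
};

// Walk the pages taken from the LRU tail and classify each one.
FlushCounts process_lru_tail(const std::vector<LruPage>& tail) {
    FlushCounts n;
    for (const LruPage& page : tail) {
        if (page.dirty) {
            ++n.flushed;
        } else {
            ++n.evicted;
        }
    }
    return n;
}
```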
      

      https://github.com/alibaba/AliSQL/commit/2645293fb0c1ed398f7243da2c14ab07572045b0

          Activity

            marko Marko Mäkelä added a comment:

            The commit includes two things: a backport of a feature from MySQL 5.7 to AliSQL 5.6, and a change to split a "processed blocks" counter into "flushed blocks" and "evicted blocks" counters.

            MariaDB 10.2+ is based on MySQL 5.7, so the only addition of this contribution is the split of the counter. I would prefer to do it differently, if it is OK from a performance point of view:

            1. Move the counters from innodb_monitor to server status variables (export_vars).
            2. Instead of passing the counters as return values or output parameters, just do my_atomic_add(&export_vars.counter_name, ...) at the low level.

            plinux, I think that the above is feasible to do in MariaDB 10.3.
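
The suggestion of updating counters atomically at the low level could look roughly like this. It is only a sketch: `std::atomic` is used here as a stand-in for `my_atomic_add`, and `ExportVars`, its field names, and the `note_*` helpers are hypothetical, not the actual contents of `export_vars`.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for export_vars; field names are illustrative.
struct ExportVars {
    std::atomic<std::uint64_t> lru_pages_flushed{0};
    std::atomic<std::uint64_t> lru_pages_evicted{0};
};

ExportVars export_vars_sketch;

// Low-level code bumps the counter directly, so callers do not need
// to carry counts back through return values or output parameters.
inline void note_pages_flushed(std::uint64_t n) {
    export_vars_sketch.lru_pages_flushed.fetch_add(
        n, std::memory_order_relaxed);
}

inline void note_pages_evicted(std::uint64_t n) {
    export_vars_sketch.lru_pages_evicted.fetch_add(
        n, std::memory_order_relaxed);
}
```

Relaxed ordering is assumed to be enough here because the values are statistics only; whether the atomic increments are acceptable on hot paths is exactly the performance question raised in the comment.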

            marko Marko Mäkelä added a comment:

            In MDEV-23399, the LRU eviction flushing was moved from the single page cleaner thread to user threads that are allocating buffer pool pages.

            I would not extend the monitor interface with new counters; that interface should hopefully be removed (MDEV-15706) and replaced with innodb_status_variables.

            We actually do have MONITOR_LRU_BATCH_EVICT_TOTAL_PAGE, which is being incremented in buf_do_LRU_batch() (and not exposed elsewhere). We also have a buf_flush_page_count (innodb_buffer_pool_pages_flushed) that is incremented in each call of buf_flush_page(). That counter does not distinguish the two types of page writes (checkpoint or eviction).

            It seems that we could address this by extending innodb_status_variables as follows:

            • Exposing the MONITOR_LRU_BATCH_EVICT_TOTAL_PAGE counter.
            • Introducing a new counter of page writes triggered by eviction flushing, updated while holding buf_pool.mutex or buf_pool.flush_list_mutex in buf_do_LRU_batch() or buf_flush_LRU_list_batch() or buf_page_write_complete().
            marko Marko Mäkelä added a comment - bb-10.6-MDEV-11384

            wlad Vladislav Vaintroub added a comment:

            Looks good to me.

            marko Marko Mäkelä added a comment:

            wlad, thank you. I filed MDEV-25085 for the change of instrumentation, because it has rather little to do with the original Description. Once that is closed, I intend to close this ticket as well, because then everything mentioned in the Description would be addressed.

            marko Marko Mäkelä added a comment:

            The first part (MySQL WL#7047) was already part of MariaDB Server 10.2 (via MySQL 5.7). The second part (the counters) was mostly done in MariaDB 10.5.7 (MDEV-23399 and MDEV-23855), with a last bit (MDEV-25085) done in MariaDB 10.6.0.

            People

              marko Marko Mäkelä
              svoj Sergey Vojtovich