Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30089

Metrics not incremented for 1st iteration in buf_LRU_free_from_common_LRU_list.

Details

    Description

      Hi,

      It looks like MariaDB is affected by this upstream bug:
      https://bugs.mysql.com/bug.php?id=109200

      I quickly checked the code in 10.11 (affected), 10.0 (not affected, like 5.6) and 10.1 (affected). I am extrapolating 10.2 to 10.10 affected.

      Many thanks for looking into this,

      Jean-François Gagné

      Attachments

        Issue Links

          Activity

            jeanfrancois.gagne Jean-François Gagné created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value

            Thank you, jeanfrancois.gagne! I can confirm your finding, also for the function buf_LRU_free_from_unzip_LRU_list().

            In MariaDB Server 10.1.2, this was changed by MDEV-6936. I had first encountered this optimization when it was implemented in MySQL 5.7.2 more than a year before it appeared in the mostly 5.6-based MariaDB Server 10.1 series.

            I think that removing the scanned=1 was a good idea, because it allows us to avoid accessing a global data structure (the monitor counter) in some cases. But, indeed, if a block is freed on the first iteration, we fail to update the counter. Would this patch (against 10.5) work for you?

            diff --git a/storage/innobase/buf/buf0lru.cc b/storage/innobase/buf/buf0lru.cc
            index b282eb17dae..8acb0aa203f 100644
            --- a/storage/innobase/buf/buf0lru.cc
            +++ b/storage/innobase/buf/buf0lru.cc
            @@ -208,6 +208,7 @@ static bool buf_LRU_free_from_unzip_LRU_list(ulint limit)
             
             		freed = buf_LRU_free_page(&block->page, false);
             		if (freed) {
            +			scanned++;
             			break;
             		}
             
            @@ -252,6 +253,7 @@ static bool buf_LRU_free_from_common_LRU_list(ulint limit)
             			}
             
             			freed = true;
            +			scanned++;
             			break;
             		}
             	}
            

            I had expected that this code could have been changed in MariaDB Server 10.5 by MDEV-23399 or MDEV-23855, but that is not the case.

            Note: I would like to replace this counter interface with something simpler (status variables). See MDEV-15706.

            marko Marko Mäkelä added a comment - Thank you, jeanfrancois.gagne ! I can confirm your finding, also for the function buf_LRU_free_from_unzip_LRU_list() . In MariaDB Server 10.1.2, this was changed by MDEV-6936 . I had first encountered this optimization when it was implemented in MySQL 5.7.2 more than a year before it appeared in the mostly 5.6-based MariaDB Server 10.1 series. I think that removing the scanned=1 was a good idea, because it allows us to avoid accessing a global data structure (the monitor counter) in some cases. But, indeed, if a block is freed on the first iteration, we fail to update the counter. Would this patch (against 10.5) work for you? diff --git a/storage/innobase/buf/buf0lru.cc b/storage/innobase/buf/buf0lru.cc index b282eb17dae..8acb0aa203f 100644 --- a/storage/innobase/buf/buf0lru.cc +++ b/storage/innobase/buf/buf0lru.cc @@ -208,6 +208,7 @@ static bool buf_LRU_free_from_unzip_LRU_list(ulint limit) freed = buf_LRU_free_page(&block->page, false); if (freed) { + scanned++; break; } @@ -252,6 +253,7 @@ static bool buf_LRU_free_from_common_LRU_list(ulint limit) } freed = true; + scanned++; break; } } I had expected that this code could have been changed in MariaDB Server 10.5 by MDEV-23399 or MDEV-23855 , but that is not the case. Note: I would like to replace this counter interface with something simpler (status variables). See MDEV-15706 .

            Before implementing this, I think that we must assess the performance impact. jeanfrancois.gagne, could you share some benchmark scripts or performance testing results?

            marko Marko Mäkelä added a comment - Before implementing this, I think that we must assess the performance impact. jeanfrancois.gagne , could you share some benchmark scripts or performance testing results?
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ]
            Status Open [ 1 ] Needs Feedback [ 10501 ]

            Thanks for your quick reply Marko.

            > I had first encountered this optimization when it was implemented in MySQL 5.7.2

            So if I understand correctly, for you, this is an optimization...

            > I think that removing the scanned=1 was a good idea, because it allows us to avoid accessing a global data structure (the monitor counter) in some cases.

            Well... right: we avoid accessing a global data structure, but we loose in observability...

            > I think that we must assess the performance impact

            IMHO, observability is more important than performance. Not being able to detect that we run out of free pages in the free list and that a query thread had to do a LRU scan is not good for DBA that are keeping an eye on the behavior of their dbs.

            Moreover and IMHO, accessing a global data structure is only adding a small cost compared to taking the LRU mutex. There are also mutex counters in performance schema, so intuitively, I would say that skipping incrementing MONITOR_LRU_SEARCH_SCANNED is a very small optimization.

            Furthermore, two global data structures are accessed in buf_LRU_get_free_block before calling buf_LRU_scan_and_free_block and buf_LRU_free_from_unzip_LRU_list and buf_LRU_free_from_unzip_LRU_list: MONITOR_LRU_GET_FREE_SEARCH [1] and MONITOR_LRU_GET_FREE_LOOPS [2]. I do not think "saving" accessing a few more global data structures will significantly slow things down, and as written above, I am ok with slowing things down a little to get better observability. Also, these counters are disabled by default, which probably makes things fast.

            [1]: https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L397

            [2]: https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L423

            > Would this patch (against 10.5) work for you?

            Yes it would, and I would add a little more (see next two paragraphs). Your patch is better than what I had in mind. Starting the loop with scanned=1 causes an off-by-one error when no pages are freed, so does incrementing the counters by scanned+1.

            You can probably also remove the "if (scanned)" as the counter should always be incremented (even if there was a case where the counter would be incremented by 0, which I think does not happen, we probably want to increment the NUM_CALL counter by 1 to trace that the function was entered).

            I think there are other places where this bug was introduced in 5.7 / 10.1. After opening the upstream bug, I found buf_flush_single_page_from_LRU which I added in a comment. There might be others that I did not find yet.

            > I would like to replace this counter interface with something simpler

            IMHO, this counter interface works quite well. There is already a lot of things in the status variables, I am not sure I would want to put more things there. And I think both status variables and the InnoDB counters are "legacy" counters interfaces, the right way to show these now would be in performance schema. But this is a little out of the subject of this bug. I will look at MDEV-15706. I also have many more ideas on this subject, mostly coming from the Percona Extended InnoDB Slow Log which I would like to have somewhere in Performance Schema, ideally aggregated at the query level (like "Server" counters - rows examined, ... - in the digest table).

            > could you share some benchmark scripts or performance testing results?

            I will soon, in a blog post about Free Pages, Free List, etc...

            Thanks again Marko for spending time on my report and working on this with me.

            jeanfrancois.gagne Jean-François Gagné added a comment - Thanks for your quick reply Marko. > I had first encountered this optimization when it was implemented in MySQL 5.7.2 So if I understand correctly, for you, this is an optimization... > I think that removing the scanned=1 was a good idea, because it allows us to avoid accessing a global data structure (the monitor counter) in some cases. Well... right: we avoid accessing a global data structure, but we loose in observability... > I think that we must assess the performance impact IMHO, observability is more important than performance. Not being able to detect that we run out of free pages in the free list and that a query thread had to do a LRU scan is not good for DBA that are keeping an eye on the behavior of their dbs. Moreover and IMHO, accessing a global data structure is only adding a small cost compared to taking the LRU mutex. There are also mutex counters in performance schema, so intuitively, I would say that skipping incrementing MONITOR_LRU_SEARCH_SCANNED is a very small optimization. Furthermore, two global data structures are accessed in buf_LRU_get_free_block before calling buf_LRU_scan_and_free_block and buf_LRU_free_from_unzip_LRU_list and buf_LRU_free_from_unzip_LRU_list: MONITOR_LRU_GET_FREE_SEARCH [1] and MONITOR_LRU_GET_FREE_LOOPS [2] . I do not think "saving" accessing a few more global data structures will significantly slow things down, and as written above, I am ok with slowing things down a little to get better observability. Also, these counters are disabled by default, which probably makes things fast. [1] : https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L397 [2] : https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L423 > Would this patch (against 10.5) work for you? Yes it would, and I would add a little more (see next two paragraphs). Your patch is better than what I had in mind. Starting the loop with scanned=1 causes an off-by-one error when no pages are freed, so does incrementing the counters by scanned+1. You can probably also remove the "if (scanned)" as the counter should always be incremented (even if there was a case where the counter would be incremented by 0, which I think does not happen, we probably want to increment the NUM_CALL counter by 1 to trace that the function was entered). I think there are other places where this bug was introduced in 5.7 / 10.1. After opening the upstream bug, I found buf_flush_single_page_from_LRU which I added in a comment. There might be others that I did not find yet. > I would like to replace this counter interface with something simpler IMHO, this counter interface works quite well. There is already a lot of things in the status variables, I am not sure I would want to put more things there. And I think both status variables and the InnoDB counters are "legacy" counters interfaces, the right way to show these now would be in performance schema. But this is a little out of the subject of this bug. I will look at MDEV-15706 . I also have many more ideas on this subject, mostly coming from the Percona Extended InnoDB Slow Log which I would like to have somewhere in Performance Schema, ideally aggregated at the query level (like "Server" counters - rows examined, ... - in the digest table). > could you share some benchmark scripts or performance testing results? I will soon, in a blog post about Free Pages, Free List, etc... Thanks again Marko for spending time on my report and working on this with me.

            I had thought that the condition if (scanned) might still be useful in the case that the list is empty. For buf_LRU_free_from_unzip_LRU_list() it is the case unless some ROW_FORMAT=COMPRESSED tables exist in the buffer pool. Testing a local variable is cheaper than loading and testing a global variable, so I would retain the condition there.

            For buf_LRU_free_from_common_LRU_list() you are right that it practically never is the case, and the condition can be removed as redundant.

            The status variable interface was extended in 10.5 by MDEV-18582, mostly by exposing InnoDB counters that were already being updated. I have removed some counters that I considered to be useless or redundant or found to be unused, in MDEV-28540, MDEV-28539, MDEV-27848, MDEV-14425.

            marko Marko Mäkelä added a comment - I had thought that the condition if (scanned) might still be useful in the case that the list is empty. For buf_LRU_free_from_unzip_LRU_list() it is the case unless some ROW_FORMAT=COMPRESSED tables exist in the buffer pool. Testing a local variable is cheaper than loading and testing a global variable, so I would retain the condition there. For buf_LRU_free_from_common_LRU_list() you are right that it practically never is the case, and the condition can be removed as redundant. The status variable interface was extended in 10.5 by MDEV-18582 , mostly by exposing InnoDB counters that were already being updated. I have removed some counters that I considered to be useless or redundant or found to be unused, in MDEV-28540 , MDEV-28539 , MDEV-27848 , MDEV-14425 .

            The single-page flushing and related counters were removed in MDEV-23399. I might not fix this in older versions than 10.5.

            marko Marko Mäkelä added a comment - The single-page flushing and related counters were removed in MDEV-23399 . I might not fix this in older versions than 10.5.

            > The single-page flushing and related counters were removed in MDEV-23399.

            Ah, ok. IMHO, this is a lost in observability. I have not read the details of the MDEV.

            > I might not fix this in older versions than 10.5.

            I understand.

            > For buf_LRU_free_from_unzip_LRU_list() it is the case [...]

            Hum, right. I am ok with not incrementing MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL if the unzip LRU is empty.

            As a side note, MariaDB and MySQL have diverged here. In MySQL, buf_LRU_free_from_unzip_LRU_list is not called if the unzip LRU is empty [1], but it is called in MariaDB [2]. IMHO, MySQL is better here as it does not call buf_LRU_free_from_unzip_LRU_list if the unzip LRU is empty. Maybe the "return quick" at the beginning of buf_LRU_free_from_unzip_LRU_list in MariaDB should also return if the list is empty as the 1st test, and then we can remove the "if (scanned)" . But I might be bike-shedding here.

            [1]: https://github.com/mysql/mysql-server/blob/mysql-8.0.30/storage/innobase/buf/buf0lru.cc#L1183

            [2]: https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L1322

            [3]: https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L195

            jeanfrancois.gagne Jean-François Gagné added a comment - > The single-page flushing and related counters were removed in MDEV-23399 . Ah, ok. IMHO, this is a lost in observability. I have not read the details of the MDEV. > I might not fix this in older versions than 10.5. I understand. > For buf_LRU_free_from_unzip_LRU_list() it is the case [...] Hum, right. I am ok with not incrementing MONITOR_LRU_UNZIP_SEARCH_SCANNED_NUM_CALL if the unzip LRU is empty. As a side note, MariaDB and MySQL have diverged here. In MySQL, buf_LRU_free_from_unzip_LRU_list is not called if the unzip LRU is empty [1] , but it is called in MariaDB [2] . IMHO, MySQL is better here as it does not call buf_LRU_free_from_unzip_LRU_list if the unzip LRU is empty. Maybe the "return quick" at the beginning of buf_LRU_free_from_unzip_LRU_list in MariaDB should also return if the list is empty as the 1st test, and then we can remove the "if (scanned)" . But I might be bike-shedding here. [1] : https://github.com/mysql/mysql-server/blob/mysql-8.0.30/storage/innobase/buf/buf0lru.cc#L1183 [2] : https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L1322 [3] : https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L195
            marko Marko Mäkelä made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]

            The single-page flushing was a known performance bottleneck. It is a special case of "LRU flushing", that is, writing out a dirty page so that it can be replaced with something else in the buffer pool, which I prefer to call "eviction flushing". We found that it is simpler to always initiate a number of dirty page writes (a "LRU flushing batch") when needed. The single buf_flush_page_cleaner thread in MariaDB starting with 10.5.7 only initiates checkpoint-related page writes. The eviction flushing is always initiated by a thread that needs to replace a page in the buffer pool.

            marko Marko Mäkelä added a comment - The single-page flushing was a known performance bottleneck. It is a special case of "LRU flushing", that is, writing out a dirty page so that it can be replaced with something else in the buffer pool, which I prefer to call "eviction flushing". We found that it is simpler to always initiate a number of dirty page writes (a "LRU flushing batch") when needed. The single buf_flush_page_cleaner thread in MariaDB starting with 10.5.7 only initiates checkpoint-related page writes. The eviction flushing is always initiated by a thread that needs to replace a page in the buffer pool.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2022-11-28 11:35:58.0 2022-11-28 11:35:58.333
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Fix Version/s 10.5.19 [ 28511 ]
            Fix Version/s 10.6.12 [ 28513 ]
            Fix Version/s 10.7.8 [ 28515 ]
            Fix Version/s 10.8.7 [ 28517 ]
            Fix Version/s 10.9.5 [ 28519 ]
            Fix Version/s 10.10.3 [ 28521 ]
            Fix Version/s 10.11.2 [ 28523 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]

            People

              marko Marko Mäkelä
              jeanfrancois.gagne Jean-François Gagné
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.