Thanks for your quick reply Marko.
> I had first encountered this optimization when it was implemented in MySQL 5.7.2
So if I understand correctly, for you, this is an optimization...
> I think that removing the scanned=1 was a good idea, because it allows us to avoid accessing a global data structure (the monitor counter) in some cases.
Well... right: we avoid accessing a global data structure, but we loose in observability...
> I think that we must assess the performance impact
IMHO, observability is more important than performance. Not being able to detect that we run out of free pages in the free list and that a query thread had to do a LRU scan is not good for DBA that are keeping an eye on the behavior of their dbs.
Moreover and IMHO, accessing a global data structure is only adding a small cost compared to taking the LRU mutex. There are also mutex counters in performance schema, so intuitively, I would say that skipping incrementing MONITOR_LRU_SEARCH_SCANNED is a very small optimization.
Furthermore, two global data structures are accessed in buf_LRU_get_free_block before calling buf_LRU_scan_and_free_block and buf_LRU_free_from_unzip_LRU_list and buf_LRU_free_from_unzip_LRU_list: MONITOR_LRU_GET_FREE_SEARCH [1] and MONITOR_LRU_GET_FREE_LOOPS [2]. I do not think "saving" accessing a few more global data structures will significantly slow things down, and as written above, I am ok with slowing things down a little to get better observability. Also, these counters are disabled by default, which probably makes things fast.
[1]: https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L397
[2]: https://github.com/MariaDB/server/blob/10.11/storage/innobase/buf/buf0lru.cc#L423
> Would this patch (against 10.5) work for you?
Yes it would, and I would add a little more (see next two paragraphs). Your patch is better than what I had in mind. Starting the loop with scanned=1 causes an off-by-one error when no pages are freed, so does incrementing the counters by scanned+1.
You can probably also remove the "if (scanned)" as the counter should always be incremented (even if there was a case where the counter would be incremented by 0, which I think does not happen, we probably want to increment the NUM_CALL counter by 1 to trace that the function was entered).
I think there are other places where this bug was introduced in 5.7 / 10.1. After opening the upstream bug, I found buf_flush_single_page_from_LRU which I added in a comment. There might be others that I did not find yet.
> I would like to replace this counter interface with something simpler
IMHO, this counter interface works quite well. There is already a lot of things in the status variables, I am not sure I would want to put more things there. And I think both status variables and the InnoDB counters are "legacy" counters interfaces, the right way to show these now would be in performance schema. But this is a little out of the subject of this bug. I will look at MDEV-15706. I also have many more ideas on this subject, mostly coming from the Percona Extended InnoDB Slow Log which I would like to have somewhere in Performance Schema, ideally aggregated at the query level (like "Server" counters - rows examined, ... - in the digest table).
> could you share some benchmark scripts or performance testing results?
I will soon, in a blog post about Free Pages, Free List, etc...
Thanks again Marko for spending time on my report and working on this with me.
Thank you, jeanfrancois.gagne! I can confirm your finding, also for the function buf_LRU_free_from_unzip_LRU_list().
In MariaDB Server 10.1.2, this was changed by
MDEV-6936. I had first encountered this optimization when it was implemented in MySQL 5.7.2 more than a year before it appeared in the mostly 5.6-based MariaDB Server 10.1 series.I think that removing the scanned=1 was a good idea, because it allows us to avoid accessing a global data structure (the monitor counter) in some cases. But, indeed, if a block is freed on the first iteration, we fail to update the counter. Would this patch (against 10.5) work for you?
diff --git a/storage/innobase/buf/buf0lru.cc b/storage/innobase/buf/buf0lru.cc
index b282eb17dae..8acb0aa203f 100644
--- a/storage/innobase/buf/buf0lru.cc
+++ b/storage/innobase/buf/buf0lru.cc
@@ -208,6 +208,7 @@ static bool buf_LRU_free_from_unzip_LRU_list(ulint limit)
freed = buf_LRU_free_page(&block->page, false);
if (freed) {
+ scanned++;
break;
}
@@ -252,6 +253,7 @@ static bool buf_LRU_free_from_common_LRU_list(ulint limit)
}
freed = true;
+ scanned++;
break;
}
I had expected that this code could have been changed in MariaDB Server 10.5 by
MDEV-23399orMDEV-23855, but that is not the case.Note: I would like to replace this counter interface with something simpler (status variables). See MDEV-15706.