[MDEV-28724] innodb_stats_persistent_sample_pages accepting percentage value Created: 2022-06-01  Updated: 2024-01-22

Status: Open
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Fix Version/s: N/A

Type: New Feature Priority: Minor
Reporter: Martin Reinhardt Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: beginner-friendly, innodb, performance


 Description   

We have a feature request from a customer for innodb_stats_persistent_sample_pages also accepting a percentage rather than only an absolute number of index pages.



 Comments   
Comment by Haidong Ji [ 2022-06-01 ]

I'd like to work on this.

Comment by Haidong Ji [ 2022-06-06 ]

Hi Martin, I don't see any support case number links, perhaps it's a permission issue. If I couldn't see the support case, could you provide a summary here? It will help me in coming up with ideas to implement this feature request.

Comment by Haidong Ji [ 2022-06-06 ]

Unfortunately I don't have access to the support case, Martin. Would it be OK to share its content here? Or a quick summary will be good.

Comment by Martin Reinhardt [ 2022-06-06 ]

Description from case: We have persistent statistics enabled. However the innodb_stats_persistent_sample_pages is set to 20 pages by default. Our tables differ in size considerably - anywhere from few pages to few thousands or millions Is there a way to specify this parameter in percent rather than an absolute value in pages? If I mention, 10,000 pages for ex, many tables will fall under this category but will cause spike in IO and take longer when analyze table is run. Mentioning this as a percentage would make sense here, but not sure if there's a way to do it. If there's not, can this be added as a feature going forward?

Comment by Haidong Ji [ 2022-06-07 ]

Below is my research so far and proposed solution.

Here is how 3 other competing database products implement this feature:

Here is my proposal of implementation:

  • Introducing 2 additional settings
    • innodb_stats_persistent_percent_sampling
      This is a boolean setting, true means the sampling is based on percentage. Its default value should be false, which is consistent with the current state of not having this feature and won't break things;
    • innodb_stats_persistent_sample_percentage
      This is an int, between 0 and 100.
  • During ANALYZE TABLE, we'll add code to check innodb_stats_persistent_percent_sampling boolean setting. If true, we'll calculate the number of pages for sampling based on innodb_stats_persistent_sample_percentage. Otherwise, we'll proceed like we currently do, using the innodb_stats_persistent_sample_pages.

In both Oracle and SQL Server, the unit to sample is number of rows, not number of pages like Innodb. In this new feature, I plan to continue using number of pages, instead of rows like Oracle or SQL Server. I suspect that is what the customer ask for anyway, but want to mention it in the off chance that's not the case.

Comment by Marko Mäkelä [ 2022-06-28 ]

haidong, thank you. As I just posted in the pull request, I think that a single parameter would be better, say, innodb_stats_percentage and the table attribute stats_percentage, defined as HA_TOPTION_SYSVAR. The default value 0 would mean that the percentage-based sampling is disabled. Let us continue the discussion in the pull request.

Generated at Thu Feb 08 10:02:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.