Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28724

innodb_stats_persistent_sample_pages accepting percentage value

Details

    Description

      We have a feature request from a customer for innodb_stats_persistent_sample_pages also accepting a percentage rather than only an absolute number of index pages.

      Attachments

        Issue Links

          Activity

            haidong Haidong Ji added a comment -

            I'd like to work on this.

            haidong Haidong Ji added a comment - I'd like to work on this.
            haidong Haidong Ji added a comment -

            Hi Martin, I don't see any support case number links, perhaps it's a permission issue. If I couldn't see the support case, could you provide a summary here? It will help me in coming up with ideas to implement this feature request.

            haidong Haidong Ji added a comment - Hi Martin, I don't see any support case number links, perhaps it's a permission issue. If I couldn't see the support case, could you provide a summary here? It will help me in coming up with ideas to implement this feature request.
            haidong Haidong Ji added a comment -

            Unfortunately I don't have access to the support case, Martin. Would it be OK to share its content here? Or a quick summary will be good.

            haidong Haidong Ji added a comment - Unfortunately I don't have access to the support case, Martin. Would it be OK to share its content here? Or a quick summary will be good.
            martin.reinhardt@mariadb.com Martin Reinhardt added a comment - - edited

            Description from case: We have persistent statistics enabled. However the innodb_stats_persistent_sample_pages is set to 20 pages by default. Our tables differ in size considerably - anywhere from few pages to few thousands or millions Is there a way to specify this parameter in percent rather than an absolute value in pages? If I mention, 10,000 pages for ex, many tables will fall under this category but will cause spike in IO and take longer when analyze table is run. Mentioning this as a percentage would make sense here, but not sure if there's a way to do it. If there's not, can this be added as a feature going forward?

            martin.reinhardt@mariadb.com Martin Reinhardt added a comment - - edited Description from case: We have persistent statistics enabled. However the innodb_stats_persistent_sample_pages is set to 20 pages by default. Our tables differ in size considerably - anywhere from few pages to few thousands or millions Is there a way to specify this parameter in percent rather than an absolute value in pages? If I mention, 10,000 pages for ex, many tables will fall under this category but will cause spike in IO and take longer when analyze table is run. Mentioning this as a percentage would make sense here, but not sure if there's a way to do it. If there's not, can this be added as a feature going forward?
            haidong Haidong Ji added a comment -

            Below is my research so far and proposed solution.

            Here is how 3 other competing database products implement this feature:

            Here is my proposal of implementation:

            • Introducing 2 additional settings
              • innodb_stats_persistent_percent_sampling
                This is a boolean setting, true means the sampling is based on percentage. Its default value should be false, which is consistent with the current state of not having this feature and won't break things;
              • innodb_stats_persistent_sample_percentage
                This is an int, between 0 and 100.
            • During ANALYZE TABLE, we'll add code to check innodb_stats_persistent_percent_sampling boolean setting. If true, we'll calculate the number of pages for sampling based on innodb_stats_persistent_sample_percentage. Otherwise, we'll proceed like we currently do, using the innodb_stats_persistent_sample_pages.

            In both Oracle and SQL Server, the unit to sample is number of rows, not number of pages like Innodb. In this new feature, I plan to continue using number of pages, instead of rows like Oracle or SQL Server. I suspect that is what the customer ask for anyway, but want to mention it in the off chance that's not the case.

            haidong Haidong Ji added a comment - Below is my research so far and proposed solution. Here is how 3 other competing database products implement this feature: PostgreSQL As far as I can tell, PostgreSQL doesn't have a similar feature. I checked CREATE STATISTICS and ANALYZE https://www.postgresql.org/docs/current/sql-analyze.html https://www.postgresql.org/docs/current/sql-createstatistics.html SQL Server https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql?view=sql-server-ver16 It has the setting of PERSIST_SAMPLE_PERCENT. This name may be the most relevant if/when we decide to add new setting Oracle https://docs.oracle.com/en/database/oracle/oracle-database/21/tgsql/gathering-optimizer-statistics.html Here is my proposal of implementation: Introducing 2 additional settings innodb_stats_persistent_percent_sampling This is a boolean setting, true means the sampling is based on percentage. Its default value should be false, which is consistent with the current state of not having this feature and won't break things; innodb_stats_persistent_sample_percentage This is an int, between 0 and 100. During ANALYZE TABLE, we'll add code to check innodb_stats_persistent_percent_sampling boolean setting. If true, we'll calculate the number of pages for sampling based on innodb_stats_persistent_sample_percentage. Otherwise, we'll proceed like we currently do, using the innodb_stats_persistent_sample_pages. In both Oracle and SQL Server, the unit to sample is number of rows, not number of pages like Innodb. In this new feature, I plan to continue using number of pages, instead of rows like Oracle or SQL Server. I suspect that is what the customer ask for anyway, but want to mention it in the off chance that's not the case.

            haidong, thank you. As I just posted in the pull request, I think that a single parameter would be better, say, innodb_stats_percentage and the table attribute stats_percentage, defined as HA_TOPTION_SYSVAR. The default value 0 would mean that the percentage-based sampling is disabled. Let us continue the discussion in the pull request.

            marko Marko Mäkelä added a comment - haidong , thank you. As I just posted in the pull request , I think that a single parameter would be better, say, innodb_stats_percentage and the table attribute stats_percentage , defined as HA_TOPTION_SYSVAR . The default value 0 would mean that the percentage-based sampling is disabled. Let us continue the discussion in the pull request.

            marko I have been informed that the PR developer is no longer contributing to MariaDB.

            TheLinuxJedi Andrew Hutchings (Inactive) added a comment - marko I have been informed that the PR developer is no longer contributing to MariaDB.

            People

              marko Marko Mäkelä
              martin.reinhardt@mariadb.com Martin Reinhardt
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.