Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33881

Userstat skips system tables inconsistently

Details

    Description

      As a sub-product of MDEV-33152 (Add QUERIES to INDEX_STATISTICS), INDEX_STATISTICS was made to ignore all tables other than those of "USER" category.
      In addition to statistical tables and information_schema/performance_schema, it now ignores tables of "SYSTEM" category.

      However, from the user perspective, the difference between "USER" and "SYSTEM" tables in MariaDB isn't obvious. Most tables in mysql schema are not of "SYSTEM" category, only a small subset of them are. For example, mysql.proc and mysql.help_topics are "SYSTEM" tables, so they are now skipped by INDEX_STATISTICS, while mysql.global_priv and mysql.db are not – even though all of them are Aria tables with indexes, which can be read from or altered by a user, etc.

      Moreover, unlike INDEX_STATISTICS, TABLE_STATISTICS still takes into account all such tables, which makes the resulting statistics even more confusing.

      I suggest to allow CATEGORY_SYSTEM tables remain in INDEX_STATISTICS, they shouldn't normally create huge additional load on statistics collection, but if the user's workflow is such that they are used heavily, the user would probably want to have statistics for them.

      set global userstat=1;
      select count(*) from mysql.proc where db = 'mysql';
      select count(*) from mysql.global_priv where user = 'mysql';
      select * from information_schema.table_statistics;
      select * from information_schema.index_statistics;
      set global userstat=0;
      

      bb-11.5-MDEV-33152-index-statistics 802e89b176efd8ef995cdd456a04ac6452f0e284

      select * from information_schema.table_statistics;
      TABLE_SCHEMA	TABLE_NAME	ROWS_READ	ROWS_CHANGED	ROWS_CHANGED_X_INDEXES
      mysql	global_priv	5	0	0
      mysql	proc	2	0	0
      select * from information_schema.index_statistics;
      TABLE_SCHEMA	TABLE_NAME	INDEX_NAME	ROWS_READ	QUERIES
      mysql	global_priv	PRIMARY	5	1
      

      compare to 11.5 main:

      select * from information_schema.table_statistics;
      TABLE_SCHEMA	TABLE_NAME	ROWS_READ	ROWS_CHANGED	ROWS_CHANGED_X_INDEXES
      mysql	global_priv	5	0	0
      mysql	proc	2	0	0
      select * from information_schema.index_statistics;
      TABLE_SCHEMA	TABLE_NAME	INDEX_NAME	ROWS_READ
      mysql	global_priv	PRIMARY	5
      mysql	proc	PRIMARY	2
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            elenst Elena Stepanova made changes -
            Summary INDEX_STATISTICS skip system tables inconsistently Userstat skips system tables inconsistently
            elenst Elena Stepanova made changes -
            serg Sergei Golubchik made changes -
            monty Michael Widenius added a comment - - edited

            One of the main points of INDEX statistics is to find indexes that are not used.
            Because of this collecting index information for tables the user has no control of is meaningless.
            Another problem with collecting things for CATEGORY_SYSTEM is that mtr results becomes dependent of things like
            the setting of use_stat_tables or even background updates by innodb persistent statistics.
            I agree that INDEX_STATISTICS and TABLE_STATISTICS should collect information from the same tables.

            I other words, I am not worried about the performance. I am worried about presenting information that is different based on settings, if data is cached or not. Having system tables not be part of index statistics makes things more predictable and less confusing for the user (like why is this index sometimes used and not used in other cases etc.)

            I agree that privilege tables like global_priv and mysql should also be skipped.
            This was missed because we never had a table category for privilege tables. Should probable be added. I checked the code how TABLE_CATEGORY_USER is used and it looks like we have bugs in the code where we assume that 'mysql' tables, like mysql.db is classified as CATEGORY_USER.

            monty Michael Widenius added a comment - - edited One of the main points of INDEX statistics is to find indexes that are not used. Because of this collecting index information for tables the user has no control of is meaningless. Another problem with collecting things for CATEGORY_SYSTEM is that mtr results becomes dependent of things like the setting of use_stat_tables or even background updates by innodb persistent statistics. I agree that INDEX_STATISTICS and TABLE_STATISTICS should collect information from the same tables. I other words, I am not worried about the performance. I am worried about presenting information that is different based on settings, if data is cached or not. Having system tables not be part of index statistics makes things more predictable and less confusing for the user (like why is this index sometimes used and not used in other cases etc.) I agree that privilege tables like global_priv and mysql should also be skipped. This was missed because we never had a table category for privilege tables. Should probable be added. I checked the code how TABLE_CATEGORY_USER is used and it looks like we have bugs in the code where we assume that 'mysql' tables, like mysql.db is classified as CATEGORY_USER.
            monty Michael Widenius added a comment - - edited

            I have now assigned CATEGORY_MYSQL for all the not assigned tables in the mysql schema. This will remove these from INDEX_STATS.
            I checked the code for TABLE_STATISTICS and it also ignores tables that are not of category TABLE_CATEGORY_USER.
            Will do some testing to verify that:

            Running your test after fixing the above:
            MariaDB [mysql]> select * from information_schema.table_statistics;
            MariaDB [mysql]> select * from information_schema.index_statistics;
            (empty set for both)

            monty Michael Widenius added a comment - - edited I have now assigned CATEGORY_MYSQL for all the not assigned tables in the mysql schema. This will remove these from INDEX_STATS. I checked the code for TABLE_STATISTICS and it also ignores tables that are not of category TABLE_CATEGORY_USER. Will do some testing to verify that: Running your test after fixing the above: MariaDB [mysql] > select * from information_schema.table_statistics; MariaDB [mysql] > select * from information_schema.index_statistics; (empty set for both)

            To clarify the confusion which was raised, this part of the complaint in the description

            Moreover, unlike INDEX_STATISTICS, TABLE_STATISTICS still takes into account all such tables

            was specific to bb-11.5-MDEV-33152-index-statistics, in other words, to changes made in the scope of MDEV-33152. TABLE_STATISTICS was changed separately in MDEV-33151, so as long as MDEV-33151 and MDEV-33152 are released (or not released) together, this shouldn't be a problem. The rest of the description, a different treatment of different tables in mysql schema, was true for both INDEX_STATISTICS and TABLE_STATISTICS.

            elenst Elena Stepanova added a comment - To clarify the confusion which was raised, this part of the complaint in the description Moreover, unlike INDEX_STATISTICS, TABLE_STATISTICS still takes into account all such tables was specific to bb-11.5- MDEV-33152 -index-statistics, in other words, to changes made in the scope of MDEV-33152 . TABLE_STATISTICS was changed separately in MDEV-33151 , so as long as MDEV-33151 and MDEV-33152 are released (or not released) together, this shouldn't be a problem. The rest of the description, a different treatment of different tables in mysql schema, was true for both INDEX_STATISTICS and TABLE_STATISTICS.
            monty Michael Widenius made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            Fixed in bb-11.5-monty
            This patch removes all usage of tables in the mysql schema from userstat, which I think is correct as these are in most cases used invisible from users.
            The patch also makes all tables in the mysql schema uniform. One cannot create table stats for any of them and there is no need to read stat tables for them.

            If needed, we can add a variable that we do also do stats for all tables in the mysql schema.

            monty Michael Widenius added a comment - Fixed in bb-11.5-monty This patch removes all usage of tables in the mysql schema from userstat, which I think is correct as these are in most cases used invisible from users. The patch also makes all tables in the mysql schema uniform. One cannot create table stats for any of them and there is no need to read stat tables for them. If needed, we can add a variable that we do also do stats for all tables in the mysql schema.
            monty Michael Widenius made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]

            Pushed to bb-11.5-monty

            monty Michael Widenius added a comment - Pushed to bb-11.5-monty
            monty Michael Widenius made changes -
            issue.field.resolutiondate 2024-04-18 09:47:44.0 2024-04-18 09:47:43.735
            monty Michael Widenius made changes -
            Fix Version/s 11.5.1 [ 29634 ]
            Fix Version/s 11.5 [ 29506 ]
            Resolution Fixed [ 1 ]
            Status In Testing [ 10301 ] Closed [ 6 ]

            People

              monty Michael Widenius
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.