[MDEV-33151] Add more columns to TABLE_STATISTICS and USER STATS - Jira

Details

Type: New Feature
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 11.5.1
Component/s: Plugin - userstat
Labels:
- Preview_11.5

Description

Following changes are for the userstat plugin:

Add the following columns to TABLE_STATISTICS for MariaDB monitors:

ROWS_INSERTED, ROWS_DELETED, ROWS_UPDATED, KEY_READ_HITS and KEY_READ_MISSES

Add the following statistics to CLIENT_STATISTICS and USER_STATISTICS:

KEY_READ_HITS and KEY_READ_MISSES

Attachments

Issue Links

causes

MDEV-33881 Userstat skips system tables inconsistently

Closed

MDEV-33913 TABLE_STATISTICS increments ROWS_CHANGED twice upon UPDATE and does not count DELETE

Closed

MDEV-33916 Key_read_misses in USER_STATISTICS and CLIENT_STATISTICS is unusable

Closed

MDEV-34153 Userstat Key_read_hits/Key_read_misses do not work for descending keys

Open

MDEV-34154 Userstat: Updates to system tables via DDL are reflected in client/user statistics

Open

Activity

Ascending order - Click to sort in descending order

Michael Widenius added a comment - 2024-01-02 11:53

Commit message:

~~MDEV-33151~~ Add more columns to TABLE_STATISTICS and USER STATS

Columns added to TABLE_STATISTICS

ROWS_INSERTED, ROWS_DELETED, ROWS_UPDATED, KEY_READ_HITS and
KEY_READ_MISSES.

Columns added to CLIENT_STATISTICS and USER_STATISTICS:

KEY_READ_HITS and KEY_READ_MISSES.

User visible changes (except new columns):

CLIENT_STATISTICS and USER_STATISTICS has columns KEY_READ_HITS and
KEY_READ_MISSES added after column ROWS_UPDATED before SELECT_COMMANDS.

Other changes:

All row variables that are part of user_stats moved to
'struct rows_stats' to make it easy to clear all of them at once.
ha_read_key_misses added to STATUS_VAR

Notes:

userstat.result has a change of handler_read_key. This is because
use-stat-tables is now disabled for the test.

Michael Widenius added a comment - 2024-01-02 11:53 Commit message: MDEV-33151 Add more columns to TABLE_STATISTICS and USER STATS Columns added to TABLE_STATISTICS ROWS_INSERTED, ROWS_DELETED, ROWS_UPDATED, KEY_READ_HITS and KEY_READ_MISSES. Columns added to CLIENT_STATISTICS and USER_STATISTICS: KEY_READ_HITS and KEY_READ_MISSES. User visible changes (except new columns): CLIENT_STATISTICS and USER_STATISTICS has columns KEY_READ_HITS and KEY_READ_MISSES added after column ROWS_UPDATED before SELECT_COMMANDS. Other changes: All row variables that are part of user_stats moved to 'struct rows_stats' to make it easy to clear all of them at once. ha_read_key_misses added to STATUS_VAR Notes: userstat.result has a change of handler_read_key. This is because use-stat-tables is now disabled for the test.

Michael Widenius added a comment - 2024-01-02 11:55

Code can be found in bb-11.5-monty

Michael Widenius added a comment - 2024-01-02 11:55 Code can be found in bb-11.5-monty

Sergei Golubchik added a comment - 2024-03-23 13:08

to test this feature separately please use bb-11.5-~~MDEV-33151~~-userstat

Sergei Golubchik added a comment - 2024-03-23 13:08 to test this feature separately please use bb-11.5- MDEV-33151 -userstat

Elena Stepanova added a comment - 2024-04-16 12:53

Current status:

~~MDEV-33913~~ is a simple bug which needs to be fixed, a link to the proposed fix is in a comment.

In regard to KEY_READ_HITS and KEY_READ_MISSES, I find the functionality confusing. First, the counters are located strangely. I would expect that key_read_hits and key_read_misses in their implemented semantics (number of rows the key has found / hasn't found) to make sense for per-index statistics, while for client/user/table it makes more sense to count the number of queries which used or didn't use the index. I'm not sure how much use is in counting total number of key hits/misses per client, without detalisation per index.

Secondly, ~~MDEV-33916~~ was filed for KEY_READ_MISSES at least in CLIENT/USER statistics not being meaningful .

Further, I find the KEY_READ_HITS results strange even for simple queries, like

MariaDB [test]> create or replace table t (a int primary key);

Query OK, 0 rows affected (0.901 sec)

MariaDB [test]> insert into t values (1),(2),(3),(4),(5);

Query OK, 5 rows affected (0.014 sec)

Records: 5  Duplicates: 0  Warnings: 0

MariaDB [test]> flush table_statistics;

Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> select a from t where a > 3;

+---+

| a |

+---+

| 4 |

| 5 |

+---+

2 rows in set (0.001 sec)

MariaDB [test]> show table_statistics \G

*************************** 1. row ***************************

           Table_schema: test

             Table_name: t

              Rows_read: 2

           Rows_changed: 0

Rows_changed_x_#indexes: 0

          Rows_inserted: 0

           Rows_updated: 0

           Rows_deleted: 0

          Key_read_hits: 1

        Key_read_misses: 0

1 row in set (0.001 sec)

MariaDB [test]> select a from t where a in (4,5,6,7);

+---+

| a |

+---+

| 4 |

| 5 |

+---+

2 rows in set (0.001 sec)

MariaDB [test]> show table_statistics \G

*************************** 1. row ***************************

           Table_schema: test

             Table_name: t

              Rows_read: 4

           Rows_changed: 0

Rows_changed_x_#indexes: 0

          Rows_inserted: 0

           Rows_updated: 0

           Rows_deleted: 0

          Key_read_hits: 3

        Key_read_misses: 2

1 row in set (0.001 sec)

That is, for the first query the hits are "1", for the second query "2". It can probably be justified from the internal implementation perspective, but would be hard to explain to users.

Given the above, I think it would be useful to have the feature reviewed by an optimizer expert (it doesn't look like it was reviewed at all). psergei agreed to do this, so I'm assigning it to him.

Elena Stepanova added a comment - 2024-04-16 12:53 Current status: MDEV-33913 is a simple bug which needs to be fixed, a link to the proposed fix is in a comment. In regard to KEY_READ_HITS and KEY_READ_MISSES , I find the functionality confusing. First, the counters are located strangely. I would expect that key_read_hits and key_read_misses in their implemented semantics (number of rows the key has found / hasn't found) to make sense for per-index statistics, while for client/user/table it makes more sense to count the number of queries which used or didn't use the index. I'm not sure how much use is in counting total number of key hits/misses per client, without detalisation per index. Secondly, MDEV-33916 was filed for KEY_READ_MISSES at least in CLIENT/USER statistics not being meaningful . Further, I find the KEY_READ_HITS results strange even for simple queries, like MariaDB [test]> create or replace table t (a int primary key ); Query OK, 0 rows affected (0.901 sec) MariaDB [test]> insert into t values (1),(2),(3),(4),(5); Query OK, 5 rows affected (0.014 sec) Records: 5 Duplicates: 0 Warnings: 0 MariaDB [test]> flush table_statistics; Query OK, 0 rows affected (0.001 sec) MariaDB [test]> select a from t where a > 3; + ---+ | a | + ---+ | 4 | | 5 | + ---+ 2 rows in set (0.001 sec) MariaDB [test]> show table_statistics \G *************************** 1. row *************************** Table_schema: test Table_name: t Rows_read: 2 Rows_changed: 0 Rows_changed_x_#indexes: 0 Rows_inserted: 0 Rows_updated: 0 Rows_deleted: 0 Key_read_hits: 1 Key_read_misses: 0 1 row in set (0.001 sec) MariaDB [test]> select a from t where a in (4,5,6,7); + ---+ | a | + ---+ | 4 | | 5 | + ---+ 2 rows in set (0.001 sec) MariaDB [test]> show table_statistics \G *************************** 1. row *************************** Table_schema: test Table_name: t Rows_read: 4 Rows_changed: 0 Rows_changed_x_#indexes: 0 Rows_inserted: 0 Rows_updated: 0 Rows_deleted: 0 Key_read_hits: 3 Key_read_misses: 2 1 row in set (0.001 sec) That is, for the first query the hits are "1", for the second query "2". It can probably be justified from the internal implementation perspective, but would be hard to explain to users. Given the above, I think it would be useful to have the feature reviewed by an optimizer expert (it doesn't look like it was reviewed at all). psergei agreed to do this, so I'm assigning it to him.

Sergei Petrunia added a comment - 2024-05-13 14:22 - edited

That is, for the first query the hits are "1", for the second query "2". It can probably be justified from the internal implementation perspective, but would be hard to explain to users.

The query

MariaDB [test]> select a from t where a > 3;

has made one "hit" where it was looking for row with a right after "3".
The row that it has read after this was not counted.

MariaDB [test]> select a from t where a in (4,5,6,7);

Here we found rows with a=4 and a=5 (two hits) and haven't found anything with a=6 and a=7 (two misses)...

I'm still thinking about how one interprets that...
KEY_READ_HITS and KEY_READ_MISSES are the lookups that were made into indexes.
Index reads normally are:
1. a lookup which can either find row or not. Lookup is relatively expensive.
2. possibly, reads of one or more subsequent/previous index records. These should be relatively cheaper.

This patch counts #1, it counts both the case when the row is found and when it is not. It looks somewhat incomplete that #2 are not counted...
and I agree that one would expect to see these statistics collected on per-index basis, too.

Another question is: should lookups to find the first / last index records be counted as KEY_READ_HITS... They are also index dives so it would be logical to count them...

Sergei Petrunia added a comment - 2024-05-13 14:22 - edited That is, for the first query the hits are "1", for the second query "2". It can probably be justified from the internal implementation perspective, but would be hard to explain to users. The query MariaDB [test]> select a from t where a > 3; has made one "hit" where it was looking for row with a right after "3". The row that it has read after this was not counted. MariaDB [test]> select a from t where a in (4,5,6,7); Here we found rows with a=4 and a=5 (two hits) and haven't found anything with a=6 and a=7 (two misses)... I'm still thinking about how one interprets that... KEY_READ_HITS and KEY_READ_MISSES are the lookups that were made into indexes. Index reads normally are: 1. a lookup which can either find row or not. Lookup is relatively expensive. 2. possibly, reads of one or more subsequent/previous index records. These should be relatively cheaper. This patch counts #1, it counts both the case when the row is found and when it is not. It looks somewhat incomplete that #2 are not counted... and I agree that one would expect to see these statistics collected on per-index basis, too. Another question is: should lookups to find the first / last index records be counted as KEY_READ_HITS ... They are also index dives so it would be logical to count them...

Sergei Petrunia added a comment - 2024-05-20 11:10

Hitting Review Done as the patch does what it says it will do...

I have the following comments (also elaborated above), but it's a matter of opinion whether to consider them to be within the scope of this MDEV or outside:

Should we have a counter like KEY_READ_NEXT to count rows that we read from index in next[_same] and prev[_same] calls?
Should we count index_first()/index_last() as lookups also?
Should we have KEY_READ_HITS/MISSES in INDEX statistics?

Sergei Petrunia added a comment - 2024-05-20 11:10 Hitting Review Done as the patch does what it says it will do... I have the following comments (also elaborated above), but it's a matter of opinion whether to consider them to be within the scope of this MDEV or outside: Should we have a counter like KEY_READ_NEXT to count rows that we read from index in next [_same] and prev [_same] calls? Should we count index_first()/index_last() as lookups also? Should we have KEY_READ_HITS/MISSES in INDEX statistics?

People

Assignee:: Sergei Golubchik

Reporter:: Michael Widenius

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2024-01-02 11:51

Updated:: 2024-06-15 13:08

Resolved:: 2024-05-27 11:23

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server