[MDEV-18188] Maintain persistent COUNT(*) in InnoDB - Jira

Details

Type: New Feature
Status: Stalled
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Storage Engine - InnoDB
Labels:
- performance
- statistics

Description

Query planning needs to know the number of records in a table. Currently, InnoDB only provides an estimate of this.

If InnoDB kept an accurate count of the records in a table, this would benefit not only statistics but also some specific queries, such as the following:

SELECT COUNT(*) FROM t LOCK IN SHARE MODE;

SELECT COUNT(*) FROM t FOR UPDATE;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT COUNT(*) FROM t;

At the moment, InnoDB scans the entire table in order to count the records. If we maintained a durable count of committed records, we could return the result for the above cases instantly. The locking variants would simply lock the table. For the default REPEATABLE READ isolation level, we would still have to count the records.

However, if we additionally maintained a count of uncommitted rows, then we could have instant COUNT in most cases:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT COUNT(*) FROM t;

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- instantaneous if committed count = uncommitted count

SELECT COUNT(*) FROM t;

These counts could possibly be maintained in the clustered index root page. If this hurts concurrency too much (in MDEV-6076, storing the persistent AUTO_INCREMENT there did not seem to hurt), then we could partition the two counters into multiple pages.
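The two-counter idea described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not InnoDB code: the names `TableCounters`, `Txn`, and `instant_count` are hypothetical, the "uncommitted" counter is taken to mean the row count including changes made by open transactions, and the locking case assumes the caller already holds the table lock.

```python
from enum import Enum, auto
from typing import Optional


class Isolation(Enum):
    READ_UNCOMMITTED = auto()
    READ_COMMITTED = auto()
    REPEATABLE_READ = auto()


class Txn:
    """Hypothetical transaction that tracks its own net row-count change."""

    def __init__(self) -> None:
        self.delta = 0


class TableCounters:
    """Hypothetical per-table counters; not InnoDB's actual data structure."""

    def __init__(self) -> None:
        self.committed = 0    # rows as of all commits so far
        self.uncommitted = 0  # rows including changes of open transactions

    def insert(self, txn: Txn) -> None:
        self.uncommitted += 1
        txn.delta += 1

    def delete(self, txn: Txn) -> None:
        self.uncommitted -= 1
        txn.delta -= 1

    def commit(self, txn: Txn) -> None:
        # The transaction's net change becomes part of the committed count.
        self.committed += txn.delta
        txn.delta = 0

    def rollback(self, txn: Txn) -> None:
        # Undo the transaction's contribution to the uncommitted count.
        self.uncommitted -= txn.delta
        txn.delta = 0

    def instant_count(self, isolation: Isolation,
                      locking: bool = False) -> Optional[int]:
        """Return a count without scanning, or None if a scan is needed."""
        if locking or isolation is Isolation.READ_COMMITTED:
            # Assumption: a locking read has already acquired the table lock.
            return self.committed
        if isolation is Isolation.READ_UNCOMMITTED:
            return self.uncommitted
        # REPEATABLE READ: instant only when the two counters agree,
        # which is the condition stated in the description.
        if self.committed == self.uncommitted:
            return self.committed
        return None
```

The sketch deliberately ignores durability, crash recovery, and the age of a REPEATABLE READ read view; it only shows which counter would answer which of the queries listed above.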

Issue Links

is duplicated by

MDEV-36200 Slow Performance with COUNT() on InnoDB Tables - Inquiry for Faster Alternatives (Closed)

relates to

MDEV-10128 Port MySQL 5.7 select count() optimization (Closed)

MDEV-17605 Statistics for InnoDB table is wrong if persistent statistics is used (Closed)

Activity

Marko Mäkelä added a comment - 2019-02-24 20:00

While doing this, we could also introduce a persistent checksum on the data, synchronized with every update of the clustered index. The checksum would necessarily be something else than what is currently returned by CHECKSUM TABLE, because it would depend on the page contents and would not be updated on instant ALTER TABLE operations that would affect the logical data format.

Marko Mäkelä added a comment - 2020-12-03 06:45

Based on some technical difficulties that we have encountered during the development of MDEV-515, I think we’d better store the persistent count information in the clustered index root page, in place of the "infimum" and "supremum" strings. MDEV-15562 already repurposed some of these bytes in 10.4. The original idea was to extend the MDEV-15562 file format change in a different way, that is, storing the information in the metadata BLOB. Using the metadata BLOB would break IMPORT TABLESPACE (MDEV-18543) and if we go with the planned limitation, disable MDEV-515.

Neither way of extending the file format would support persistent COUNT(*) for ROW_FORMAT=COMPRESSED tables, which we are hoping to phase out, as noted in MDEV-23497.

We could protect the updates of the persistent count in a similar way as we protect the changes of the persistent AUTO_INCREMENT (MDEV-6076). Similar to that field, there should be no reader-writer conflicts even if we allow concurrent reads on the page, because both the persistent AUTO_INCREMENT and COUNT(*) would only be read when the table is added to the data dictionary cache.
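The access pattern in the last paragraph (read the persistent field once when the table enters the dictionary cache, then serve reads from memory and write updates through) might look roughly like the following. It is a sketch under assumptions, not the MDEV-6076 mechanism: `RootPage`, `CachedTable`, and the use of a plain mutex are all hypothetical stand-ins.

```python
import threading


class RootPage:
    """Hypothetical stand-in for the clustered index root page."""

    def __init__(self, persistent_count: int = 0) -> None:
        self.persistent_count = persistent_count
        self.reads = 0  # how often the persistent field was read

    def read_count(self) -> int:
        self.reads += 1
        return self.persistent_count

    def write_count(self, value: int) -> None:
        self.persistent_count = value


class CachedTable:
    """Hypothetical dictionary cache entry holding the in-memory count."""

    def __init__(self, page: RootPage) -> None:
        self._page = page
        self._mutex = threading.Lock()  # assumption: any writer-side latch
        # The only read of the persistent field: when the table is cached.
        self.count = page.read_count()

    def apply_committed_delta(self, delta: int) -> None:
        # Writers serialize among themselves and write through to the page.
        with self._mutex:
            self.count += delta
            self._page.write_count(self.count)

    def committed_count(self) -> int:
        # Queries are answered from memory; the page field is not re-read.
        return self.count
```

Because queries never re-read the page field after the table is cached, concurrent page readers and counter writers do not conflict over it in this model.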

People

Assignee: Thirunarayanan Balathandayuthapani

Reporter: Marko Mäkelä

Votes: 5

Watchers: 14

Dates

Created: 2019-01-09 15:27

Updated: 2025-02-28 08:58

MariaDB Server