MariaDB Server / MDEV-18188

Maintain persistent COUNT(*) in InnoDB

Details

    Description

      Query planning needs to know the number of records in a table. Currently, InnoDB only provides an estimate of this.

      If InnoDB kept an accurate count of the records in a table, this would benefit not only statistics but also some specific queries, such as the following:

      SELECT COUNT(*) FROM t LOCK IN SHARE MODE;
      SELECT COUNT(*) FROM t FOR UPDATE;
      SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
      SELECT COUNT(*) FROM t;

      Currently, InnoDB scans the entire table to count the records. If we maintained a durable count of committed records, we could return the results for the above cases instantly. The locking variants would simply lock the table. For the default REPEATABLE READ isolation level, we would still have to count the records, because the transaction's read view may not include all committed changes.
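The following C++ sketch (InnoDB is written in C++) shows which of the queries above could be answered from a committed-rows counter alone. The `ReadMode` enum, `TableCount` struct and `instant_count()` function are hypothetical names, not InnoDB code. The sketch also assumes that the reading transaction has no uncommitted changes of its own, since those would otherwise have to be added to the count.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical read modes for SELECT COUNT(*); not InnoDB identifiers.
enum class ReadMode {
  kReadUncommitted,
  kReadCommitted,
  kRepeatableRead,
  kLockInShareMode,  // SELECT ... LOCK IN SHARE MODE
  kForUpdate,        // SELECT ... FOR UPDATE
};

// Hypothetical per-table state: a durable count of committed rows only.
struct TableCount {
  uint64_t committed_rows = 0;
};

// Returns the count if it can be answered without scanning, else nullopt.
// Assumes the reading transaction has no uncommitted changes of its own.
std::optional<uint64_t> instant_count(const TableCount& t, ReadMode mode) {
  switch (mode) {
    case ReadMode::kLockInShareMode:
    case ReadMode::kForUpdate:
      // The table lock blocks writers, so the committed count is exact.
      return t.committed_rows;
    case ReadMode::kReadCommitted:
      // Each statement sees the latest committed state.
      return t.committed_rows;
    case ReadMode::kRepeatableRead:
      // The read view may predate recent commits: must count the rows.
      return std::nullopt;
    case ReadMode::kReadUncommitted:
      // Would also need other transactions' uncommitted rows.
      return std::nullopt;
  }
  return std::nullopt;
}
```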

      However, if we additionally maintained a count that also includes uncommitted rows (the "uncommitted count"), then COUNT(*) could be instant in most cases:

      SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
      SELECT COUNT(*) FROM t;
      SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      -- instantaneous if committed count = uncommitted count
      SELECT COUNT(*) FROM t;
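A minimal C++ sketch of maintaining both counters over a transaction's lifetime follows; all names are hypothetical. The equality test between the committed and uncommitted counts comes from the text above. The `view_is_current` flag is an added assumption: a REPEATABLE READ snapshot created before the most recent count-changing commit could see a different number of rows than the committed count.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>

// Hypothetical per-table counters; names are illustrative, not InnoDB's.
class TableCounts {
 public:
  // Each row insert/delete updates the uncommitted count immediately and
  // is also tracked in the caller's per-transaction delta.
  void on_insert(int64_t& trx_delta) { add_latest(+1); ++trx_delta; }
  void on_delete(int64_t& trx_delta) { add_latest(-1); --trx_delta; }

  // Commit: the transaction's net change becomes part of the committed count.
  void on_commit(int64_t trx_delta) {
    std::lock_guard<std::mutex> g(m_);
    committed_ += trx_delta;
  }

  // Rollback: the transaction's changes disappear from the uncommitted count.
  void on_rollback(int64_t trx_delta) { add_latest(-trx_delta); }

  // READ UNCOMMITTED sees every change, committed or not.
  int64_t read_uncommitted() const {
    std::lock_guard<std::mutex> g(m_);
    return latest_;
  }

  // REPEATABLE READ shortcut: equal counts mean no transaction has pending
  // changes to the row count. view_is_current (assumed input) must be true
  // if the read view was created after the last count-changing commit.
  // On nullopt the caller falls back to counting the records.
  std::optional<int64_t> repeatable_read(bool view_is_current) const {
    std::lock_guard<std::mutex> g(m_);
    if (committed_ != latest_ || !view_is_current) return std::nullopt;
    return committed_;
  }

 private:
  void add_latest(int64_t d) {
    std::lock_guard<std::mutex> g(m_);
    latest_ += d;
  }

  mutable std::mutex m_;
  int64_t committed_ = 0;  // rows as of the last commit
  int64_t latest_ = 0;     // rows including uncommitted changes
};
```

A single mutex keeps the sketch short. Stored in a page, the counters would instead be protected by that page's latch.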

      These counts could be maintained in the clustered index root page. If this hurts concurrency too much (in MDEV-6076, storing the persistent AUTO_INCREMENT there did not seem to), then we could partition the two counters across multiple pages.
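The partitioning fallback could look like the following sketch. Each slot stands in for one page holding a partial count, and a `std::mutex` stands in for the page latch. The slot-selection rule (transaction id modulo the number of slots) is an assumption, because the text does not say how transactions would be mapped to pages.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical sharded counter: each slot models one page holding a partial
// count, so concurrent writers rarely latch the same slot.
template <std::size_t N>
class ShardedCount {
 public:
  // Writers pick a slot from a per-transaction key (assumed: trx id).
  void add(uint64_t trx_id, int64_t delta) {
    Slot& s = slots_[trx_id % N];
    std::lock_guard<std::mutex> g(s.latch);
    s.value += delta;
  }

  // Readers sum all slots, latching each briefly in turn. The sum is only a
  // consistent snapshot if writers are excluded, e.g. by a table lock.
  int64_t total() {
    int64_t sum = 0;
    for (Slot& s : slots_) {
      std::lock_guard<std::mutex> g(s.latch);
      sum += s.value;
    }
    return sum;
  }

 private:
  struct Slot {
    std::mutex latch;
    int64_t value = 0;
  };
  std::array<Slot, N> slots_;
};
```

The trade-off is that writers contend less while every reader must visit all N slots. This is acceptable if the counts are read rarely.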

          Activity

            Marko Mäkelä added a comment:

            While doing this, we could also introduce a persistent checksum on the data, synchronized with every update of the clustered index. This checksum would necessarily differ from what CHECKSUM TABLE currently returns, because it would depend on the page contents and would not be updated by instant ALTER TABLE operations that affect the logical data format.
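The comment does not say which checksum function to use. One way to keep a checksum synchronized with every clustered index change is to make it order-independent by summing a per-record hash modulo 2^64. An insert then adds the record's hash, a delete subtracts it, and an update does both. The C++ sketch below assumes FNV-1a as the per-record hash and uses hypothetical names throughout.

```cpp
#include <cstdint>
#include <string_view>

// FNV-1a 64-bit hash of a record's physical bytes (an assumed choice).
uint64_t record_hash(std::string_view rec) {
  uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : rec) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical table checksum: sum of record hashes modulo 2^64 (unsigned
// overflow wraps). Addition is commutative, so the value does not depend on
// the order of changes, and each change costs O(record size).
struct TableChecksum {
  uint64_t value = 0;
  void on_insert(std::string_view rec) { value += record_hash(rec); }
  void on_delete(std::string_view rec) { value -= record_hash(rec); }
  // An update is a delete of the old record image plus an insert of the new.
  void on_update(std::string_view old_rec, std::string_view new_rec) {
    on_delete(old_rec);
    on_insert(new_rec);
  }
};
```

Because the hash covers the physical record bytes, an instant ALTER TABLE that changes only how existing records are interpreted would leave the value unchanged. For that reason it could not match the result of CHECKSUM TABLE.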

            Marko Mäkelä added a comment:

            Based on some technical difficulties that we encountered during the development of MDEV-515, I think it is better to store the persistent count information in the clustered index root page, in place of the "infimum" and "supremum" strings. MDEV-15562 already repurposed some of these bytes in 10.4. The original idea was to extend the MDEV-15562 file format change differently, by storing the information in the metadata BLOB. However, using the metadata BLOB would break IMPORT TABLESPACE (MDEV-18543) and, if we go with the planned limitation, disable MDEV-515.

            Neither way of extending the file format would support persistent COUNT(*) for ROW_FORMAT=COMPRESSED tables, which we hope to phase out, as noted in MDEV-23497.

            We could protect updates of the persistent count in the same way as we protect changes to the persistent AUTO_INCREMENT (MDEV-6076). As with that field, there should be no reader-writer conflicts even if we allow concurrent reads of the page, because both the persistent AUTO_INCREMENT and COUNT(*) would only be read when the table is loaded into the data dictionary cache.
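The sketch below illustrates the proposed placement. Two 8-byte big-endian counters are written over the 8-byte "infimum" and "supremum" strings of the root page. The write is skipped for ROW_FORMAT=COMPRESSED, and the counters are read only when the table is loaded into the cache. The offsets 99 and 112 are those of the infimum and supremum record payloads in the COMPACT page format as we understand it. Treat them, the 16 KiB page size, and all names as assumptions. In particular, the sketch ignores the bytes that MDEV-15562 already repurposes, so an actual format would have to place the counters differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed values: 16 KiB pages; offsets of the 8-byte "infimum" and
// "supremum" payloads in a COMPACT-format index page.
constexpr std::size_t kPageSize = 16384;
constexpr std::size_t kInfimumData = 99;
constexpr std::size_t kSupremumData = 112;

using Page = std::array<uint8_t, kPageSize>;

// Big-endian 8-byte integer encoding, most significant byte first.
void write_u64(uint8_t* p, uint64_t v) {
  for (int i = 7; i >= 0; --i) {
    p[i] = static_cast<uint8_t>(v);
    v >>= 8;
  }
}
uint64_t read_u64(const uint8_t* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
  return v;
}

// Hypothetical writer: the caller is assumed to hold an exclusive latch on
// the root page, as for the persistent AUTO_INCREMENT (MDEV-6076).
bool persist_count(Page& root, bool compressed, uint64_t committed,
                   uint64_t uncommitted) {
  if (compressed) return false;  // no persistent count for COMPRESSED
  write_u64(root.data() + kInfimumData, committed);
  write_u64(root.data() + kSupremumData, uncommitted);
  return true;
}

// Hypothetical cached table: the count is read from the root page only once,
// when the table is loaded into the dictionary cache.
struct CachedTable {
  bool compressed = false;
  std::optional<uint64_t> committed;

  void load_from_root(const Page& root) {
    if (!compressed) committed = read_u64(root.data() + kInfimumData);
  }
};
```

Because readers touch these bytes only at load time, concurrent readers of the root page never interpret them, which matches the argument about reader-writer conflicts above.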

            People

              Thirunarayanan Balathandayuthapani (thiru)
              Marko Mäkelä (marko)
              Votes: 5
              Watchers: 14
