Details

    Description

      I compared the "explain" output of two select count(*) from table, identical tables, one mariadb and the other "mysql 8.3", and they are different. The tables have one single field, varchar(10) and it's the primary key. I need to understand the difference:

      Mariadb:

      explain extended select count(*) from goodnumber force index(primary);
       
      +------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+
      | id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows      | filtered | Extra |
      +------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+
      |    1 | SIMPLE      | goodnumber | ALL  | NULL          | NULL | NULL    | NULL | 315011080 |   100.00 |       |
      +------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+, 
      Mysql, explain select count(*) from goodnumber;
      +----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
      | id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows      | filtered | Extra       |
      +----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
      |  1 | SIMPLE      | goodnumber | NULL       | index | NULL          | PRIMARY | 42      | NULL | 315054340 |   100.00 | Using index |
      +----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+.
      

      The fastest result is the latter, like 10 times. Does it mean that the commercial version of MySQL by Oracle has a better optimizer and it counts the records on a table about 10 times faster?

      Attachments

        Issue Links

          Activity

            I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/

            Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in MDEV-33213, resolving the purge lag will require the MVCC checks to be executed in secondary indexes.

            marko Marko Mäkelä added a comment - I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in MDEV-33213 , resolving the purge lag will require the MVCC checks to be executed in secondary indexes.
            philip_38 Philip orleans added a comment - - edited

            select count(*) from dno;                     
            +------------+
            | count(*)   |
            +------------+
            | 1034686910 |
            +------------+
            1 row in set (8 min 5.076 sec)
            

            The same operation takes less than 60 seconds with Oracle's MySQL.
            It's time to do parallel scans.

            philip_38 Philip orleans added a comment - - edited select count(*) from dno; +------------+ | count(*) | +------------+ | 1034686910 | +------------+ 1 row in set (8 min 5.076 sec) The same operation takes less than 60 seconds with Oracle's MySQL. It's time to do parallel scans.
            lstartseva Lena Startseva added a comment - - edited

            Check on my laptop (https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972) innodb_parallel_read_threads for MySQL 8.3. By default it was

            +------------------------------+-------+
            | Variable_name                | Value |
            +------------------------------+-------+
            | innodb_parallel_read_threads | 4     |
            +------------------------------+-------+
            

            and time for execution was 6 min 33,13 sec:

            mysql> select count(*) from dnosimple;
            +------------+
            | count(*)   |
            +------------+
            | 1062468947 |
            +------------+
            1 row in set (6 min 33,13 sec)
            

            For:

            +------------------------------+-------+
            | Variable_name                | Value |
            +------------------------------+-------+
            | innodb_parallel_read_threads | 1     |
            +------------------------------+-------+
            

            Time increased 10 times and was 1 hour 4 min 17,21 sec:

            mysql> select count(*) from dnosimple;
            +------------+
            | count(*)   |
            +------------+
            | 1062468947 |
            +------------+
            1 row in set (1 hour 4 min 17,21 sec)
            

            lstartseva Lena Startseva added a comment - - edited Check on my laptop ( https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972 ) innodb_parallel_read_threads for MySQL 8.3 . By default it was + ------------------------------+-------+ | Variable_name | Value | + ------------------------------+-------+ | innodb_parallel_read_threads | 4 | + ------------------------------+-------+ and time for execution was 6 min 33,13 sec : mysql> select count (*) from dnosimple; + ------------+ | count (*) | + ------------+ | 1062468947 | + ------------+ 1 row in set (6 min 33,13 sec) For: + ------------------------------+-------+ | Variable_name | Value | + ------------------------------+-------+ | innodb_parallel_read_threads | 1 | + ------------------------------+-------+ Time increased 10 times and was 1 hour 4 min 17,21 sec : mysql> select count (*) from dnosimple; + ------------+ | count (*) | + ------------+ | 1062468947 | + ------------+ 1 row in set (1 hour 4 min 17,21 sec)

            Care to explain?

            philip_38 Philip orleans added a comment - Care to explain?

            This is very interesting. Both versions should provide similar paths.

            philip_38 Philip orleans added a comment - This is very interesting. Both versions should provide similar paths.

            People

              alice Alice Sherepa
              philip_38 Philip orleans
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.