I compared the "explain" output of two select count(*) from table, identical tables, one mariadb and the other "mysql 8.3", and they are different. The tables have one single field, varchar(10) and it's the primary key. I need to understand the difference:
Mariadb:
explain extended select count(*) from goodnumber force index(primary);
The fastest result is the latter, like 10 times. Does it mean that the commercial version of MySQL by Oracle has a better optimizer and it counts the records on a table about 10 times faster?
I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/
Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in MDEV-33213, resolving the purge lag will require the MVCC checks to be executed in secondary indexes.
Marko Mäkelä
added a comment - I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/
Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in MDEV-33213 , resolving the purge lag will require the MVCC checks to be executed in secondary indexes.
The same operation takes less than 60 seconds with Oracle's MySQL.
It's time to do parallel scans.
Philip orleans
added a comment - - edited
select count(*) from dno;
+------------+
| count(*) |
+------------+
| 1034686910 |
+------------+
1 row in set (8 min 5.076 sec)
The same operation takes less than 60 seconds with Oracle's MySQL.
It's time to do parallel scans.
Time increased 10 times and was 1 hour 4 min 17,21 sec:
mysql> selectcount(*) from dnosimple;
+------------+
| count(*) |
+------------+
| 1062468947 |
+------------+
1 row inset (1 hour 4 min 17,21 sec)
Lena Startseva
added a comment - - edited Check on my laptop ( https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972 ) innodb_parallel_read_threads for MySQL 8.3 . By default it was
+ ------------------------------+-------+
| Variable_name | Value |
+ ------------------------------+-------+
| innodb_parallel_read_threads | 4 |
+ ------------------------------+-------+
and time for execution was 6 min 33,13 sec :
mysql> select count (*) from dnosimple;
+ ------------+
| count (*) |
+ ------------+
| 1062468947 |
+ ------------+
1 row in set (6 min 33,13 sec)
For:
+ ------------------------------+-------+
| Variable_name | Value |
+ ------------------------------+-------+
| innodb_parallel_read_threads | 1 |
+ ------------------------------+-------+
Time increased 10 times and was 1 hour 4 min 17,21 sec :
mysql> select count (*) from dnosimple;
+ ------------+
| count (*) |
+ ------------+
| 1062468947 |
+ ------------+
1 row in set (1 hour 4 min 17,21 sec)
I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/
Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in
MDEV-33213, resolving the purge lag will require the MVCC checks to be executed in secondary indexes.