Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
11.4.0, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
-
Linux
Description
I compared the "explain" output of two select count(*) from table, identical tables, one mariadb and the other "mysql 8.3", and they are different. The tables have one single field, varchar(10) and it's the primary key. I need to understand the difference:
Mariadb:
explain extended select count(*) from goodnumber force index(primary);
|
|
+------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+
|
| 1 | SIMPLE | goodnumber | ALL | NULL | NULL | NULL | NULL | 315011080 | 100.00 | |
|
+------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+,
|
Mysql, explain select count(*) from goodnumber;
|
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
|
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
|
| 1 | SIMPLE | goodnumber | NULL | index | NULL | PRIMARY | 42 | NULL | 315054340 | 100.00 | Using index |
|
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+.
|
The fastest result is the latter, like 10 times. Does it mean that the commercial version of MySQL by Oracle has a better optimizer and it counts the records on a table about 10 times faster?
Attachments
Issue Links
- relates to
-
MDEV-5004 Support parallel read transactions on the same snapshot
-
- Open
-
-
MDEV-6096 Ideas about parallel query execution
-
- Open
-
-
MDEV-34125 ANALYZE FORMAT=JSON: r_engine_stats.pages_read_time_ms has wrong scale
-
- Closed
-
Activity
Just out of curiosity, why do we have the "force index" option, if the optimizer disregards it? It makes no sense. KIndly remove the option or make it work.
Server version: 11.3.2-MariaDB-1:11.3.2+maria~ubu2004 mariadb.org binary distribution
The issue with the optimizer or with some other part of the Mariadb software is deeper
Consider a table like
show create table dnc.dnosimple;
|
+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| Table | Create Table |
|
+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| dnosimple | CREATE TABLE `dnosimple` (
|
`ani` bigint(20) unsigned NOT NULL,
|
PRIMARY KEY (`ani`)
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci |
|
then, inside a stored procedure I have two variables,
anilongv bigint unsigned
isdnov tinyint
this is from INSIDE the stored procedure
EXPLAIN SELECT count(*) INTO isdnov FROM dnc.dnosimple FORCE INDEX (PRIMARY) WHERE ani = anilongv;
|
|
+------+-------------+-----------+------+---------------+------+---------+------+------------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-----------+------+---------------+------+---------+------+------------+-------+
|
| 1 | SIMPLE | dnosimple | ALL | NULL | NULL | NULL | NULL | 1057261209 | |
|
+------+-------------+-----------+------+---------------+------+---------+------+------------+-------+
|
which is dead wrong, in fact, the query has to be stopped because it scans the table while having a primary key of the same data type.
however, this is the weird thing: I created a new stored procedure
SET NAMES 'latin1';
|
USE asterisk;
|
DELIMITER $$
|
CREATE
|
DEFINER = 'root'@'%'
|
PROCEDURE search_dno (IN aniv bigint UNSIGNED, INOUT kount tinyint)
|
BEGIN
|
SELECT
|
COUNT(*) INTO kount
|
FROM dnc.dnosimple FORCE INDEX (PRIMARY)
|
WHERE ani = aniv;
|
END
|
$$
|
DELIMITER ;
|
and while the "explain" is still wrong, the query does work as expected. This is a table with over 1BN unique numbers.
Naturally, I would like to run the query directly, not via a second stored procedure.
philip_38, it looks very odd that a simple query with WHERE primary_key=const decides to use a full table (or index) scan.
I see in your pasete:
USE asterisk; |
SELECT
|
COUNT INTO kount |
FROM dnc.dnosimple FORCE INDEX (PRIMARY) |
is the table dnosimple from the right database?
I try to reproduce and the query uses const access and reads one row, as expected:
MariaDB [test]> CREATE TABLE `dnosimple` (
|
-> `ani` bigint(20) unsigned NOT NULL,
|
-> PRIMARY KEY (`ani`)
|
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
|
-> ;
|
Query OK, 0 rows affected (0.001 sec)
|
|
MariaDB [test]> insert into dnosimple select seq from seq_1_to_100000;
|
Query OK, 100000 rows affected (0.112 sec)
|
Records: 100000 Duplicates: 0 Warnings: 0
|
|
MariaDB [test]> SET NAMES 'latin1';
|
Query OK, 0 rows affected (0.000 sec)
|
|
MariaDB [test]> DELIMITER $$
|
MariaDB [test]> CREATE
|
-> PROCEDURE search_dno (IN aniv bigint UNSIGNED, INOUT kount tinyint)
|
-> BEGIN
|
-> EXPLAIN SELECT
|
-> COUNT(*) INTO kount
|
-> FROM dnosimple
|
-> WHERE ani = aniv;
|
-> END
|
-> $$
|
Query OK, 0 rows affected (0.000 sec)
|
|
MariaDB [test]> CALL search_dno(10, @var);
|
-> $$
|
+------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
|
| 1 | SIMPLE | dnosimple | const | PRIMARY | PRIMARY | 8 | const | 1 | |
|
+------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+
|
1 row in set (0.000 sec)
|
|
Query OK, 0 rows affected (0.000 sec)
|
(This is from 11.0 but I'm sure other versions would behave the same for this query).
Are you sure you don't have another dnosimple table in another database with ani column defined as VARCHAR? That would explain the use of full index scan...
you've mentioned in the report:
The tables have one single field, varchar(10) and it's the primary key.
VARCHAR...
if the column in the table is VARCHAR, be sure to compare it with a VARCHAR SP parameter or a string constant...
As for the speed of full table vs full index scan.
The tables have one single field, varchar(10) and it's the primary key.
In this case, if the table is InnoDB, then full table scan and full index scan are essentially the same thing. full table scan is a full index scan on PK. It is very surprising to see any difference in performance.
for comparing across versions, I am trying this:
create or replace table t(i int primary key); |
insert into t select seq from seq_1_to_10000000; |
|
explain extended select count(*) from t ; |
select count(*) from t; |
select count(*) from t; |
select count(*) from t; |
|
explain extended select count(*) from t force index (primary); |
select count(*) from t force index (primary) ; |
select count(*) from t force index (primary) ; |
select count(*) from t force index (primary) ; |
select count(*) from t force index (primary) ; |
The results are :
mdev33446-log-10.11.txt mdev33446-log-11.0.txt
11.0 is slightly faster.
The same table
CREATE TABLE `dnosimple` ( |
`ani` bigint unsigned NOT NULL, |
PRIMARY KEY (`ani`) |
) ENGINE=InnoDB
|
with the same data, 1.062 BN records, look the difference
MySQL 8.3
select count(*) from dnosimple;
|
+------------+
|
| count(*) |
|
+------------+
|
| 1062468947 |
|
+------------+
|
1 row in set (19.63 sec)
|
Mariadb 11.4.1
select count(*) from dnosimple;
|
+------------+
|
| count(*) |
|
+------------+
|
| 1062468947 |
|
+------------+
|
1 row in set (2 min 15.086 sec)
|
I know the hardware is different, but there is something else here.
You may download my data from "wget https://fcc.x1.uy/static/dnosimple.zip". Please load this data, not random or sequential data. "time mysqlimport testdb --local dnosimple.csv". It takes 30 min on both platforms
On my laptop with 64 Gb of RAM I got the next result with the data and query from comment above:
Query execution time | Size of dnosimple.ibd | |
---|---|---|
MySQL 8.3 | 6 min 26,52 sec | 30,7 GB (30 681 333 760 bytes) |
Mariadb 11.4.1 | 30 min 15,338 sec | 34,8 GB (34 837 889 024 bytes) |
ANALYZE FORMAT=JSON select count(*) from dnosimple;
|
gives the next result:
| {
|
"query_optimization": { |
"r_total_time_ms": 0.242982647 |
},
|
"query_block": { |
"select_id": 1, |
"cost": 174691.1825, |
"r_loops": 1, |
"r_total_time_ms": 1822554.621, |
"nested_loop": [ |
{
|
"table": { |
"table_name": "dnosimple", |
"access_type": "ALL", |
"loops": 1, |
"r_loops": 1, |
"rows": 1060618416, |
"r_rows": 1062468947, |
"cost": 174691.1825, |
"r_table_time_ms": 1425235.95, |
"r_other_time_ms": 397318.6493, |
"r_engine_stats": { |
"pages_accessed": 1850995, |
"pages_read_count": 35272, |
"pages_read_time_ms": 14657261.27 |
},
|
"filtered": 100, |
"r_filtered": 100 |
}
|
}
|
]
|
}
|
} |
|
The large pages_read_time_ms caught my eye. This could possibly be explained by MDEV-31227. You might also want to try SET GLOBAL innodb_random_read_ahead=ON.
I set innodb_random_read_ahead=ON and is marginally faster, but still very far away from MySQL (Oracle).
For an apples-to-apples comparison, you should either disable or enable the use of the file system cache (innodb_flush_method=O_DIRECT or innodb_flush_method=fsync) on versions, and use the same innodb_buffer_pool_size in both. The motivation of MDEV-24854 was to improve write performance.
I did match exactly the settings, as above. The difference is still huge.
InnoDB buffer_pool_size= 128M, MariaDB 10.6 compiled with --debug (should be >> 2x slower than optimized binary)
create table foo (a varchar(10)) engine=innodb select left(md5(seq),10) as a from seq_1_to_31011080;
|
31011080 rows affected (18 min 37.844 sec)
|
MariaDB [test]> select count(*) from foo;
|
+----------+
|
| count(*) |
|
+----------+
|
| 31011080 |
|
+----------+
|
1 row in set (2 min 34.729 sec)
|
|
I tried to run with 10.6-debug and a bigger buffer pool:
|
sql/mariadbd --innodb-buffer-pool-size=40G
|
but this did not work as all time is spent in innodb buffer pool checks.
|
Perf shows:
|
24.86% mariadbd [.] ut_list_validate<ut_list_base<buf_pag
|
20.40% mariadbd [.] CheckInFreeList::operator()
|
16.52% mariadbd [.] Atomic_relaxed<unsigned int>::operato
|
11.02% mariadbd [.] buf_pool_t::validate
|
|
Testing 11.4 compiled without debug and 40G buffer pool:
|
create or replace table foo (a varchar(10)) engine=innodb select left(md5(seq),10) as a from seq_1_to_31011080;
|
|
MariaDB [test]> select count(*) from foo;
|
+----------+
|
| count(*) |
|
+----------+
|
| 31011080 |
|
+----------+
|
1 row in set (5.133 sec)
|
1 row in set (4.999 sec) (second run)
|
|
Restarting with empty buffer pool:
|
+----------+
|
| count(*) |
|
+----------+
|
| 31011080 |
|
+----------+
|
1 row in set (6.171 sec)
|
|
Restarting with empty buffer pool of 128M
|
+----------+
|
| count(*) |
|
+----------+
|
| 31011080 |
|
+----------+
|
1 row in set (6.638 sec)
|
|
The table here explains some of the issues:
https://www.percona.com/blog/why-select-count-from-table-is-sometimes-very-slow-in-mysql-or-mariadb/
One major difference in newer MySQL versions is that it has parallel scan of InnoDB. This can have a big impact for single user usage of MySQL.
We should try to find out why 'Trx/Same session is slower than 'Alone'. This should not be the cases. Even for Separate session, the slowdown should only happen for pages that has changed since the transaction started.
I did a check with an identical table as philip_38 described with a InnoDB buffer pools size of 128M on MariaDB 11.4:
create or replace table foo2 (a bigint primary key) engine=innodb select seq as a from seq_1_to_1062468947;
|
MariaDB [test]> select count(*) from foo2;
|
+------------+
|
| count(*) |
|
+------------+
|
| 1062468947 |
|
+------------+
|
1 row in set (1 min 55.127 sec)
|
I tested also with set global innodb_random_read_ahead=ON, but there was no notable difference.
MySQL is probably faster for this particular query thanks to parallel full table scan. (Something we should look at implementing).
philip_38, what is your value for innodb_parallel_read_threads ?
MariaDB has a feature, not yet enabled for InnoDB, that would allow sum queries over the whole table to be much faster.
We should look at enable this one at least for count for InnoDB.
This is a vital feature. Virtually all my business uses cont to confirm that a 1.2BN records table matches a flat file. I do this every AM. This is the list of all the ported phone numbers in North America. The FCC releases the data and then I need to update my table. If we can make this work faster, I can have the table ready for clients.
I cannot understand why nobody mentioned this parallel table scan until today.
(The observation about odd value of r_engine_stats.pages_read_time_ms is addressed separately in MDEV-34125)
I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/
Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in MDEV-33213, resolving the purge lag will require the MVCC checks to be executed in secondary indexes.
select count(*) from dno;
|
+------------+
|
| count(*) |
|
+------------+
|
| 1034686910 |
|
+------------+
|
1 row in set (8 min 5.076 sec)
|
The same operation takes less than 60 seconds with Oracle's MySQL.
It's time to do parallel scans.
Check on my laptop (https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972) innodb_parallel_read_threads for MySQL 8.3. By default it was
+------------------------------+-------+ |
| Variable_name | Value |
|
+------------------------------+-------+ |
| innodb_parallel_read_threads | 4 |
|
+------------------------------+-------+ |
and time for execution was 6 min 33,13 sec:
mysql> select count(*) from dnosimple; |
+------------+ |
| count(*) | |
+------------+ |
| 1062468947 |
|
+------------+ |
1 row in set (6 min 33,13 sec) |
For:
+------------------------------+-------+ |
| Variable_name | Value |
|
+------------------------------+-------+ |
| innodb_parallel_read_threads | 1 |
|
+------------------------------+-------+ |
Time increased 10 times and was 1 hour 4 min 17,21 sec:
mysql> select count(*) from dnosimple; |
+------------+ |
| count(*) | |
+------------+ |
| 1062468947 |
|
+------------+ |
1 row in set (1 hour 4 min 17,21 sec) |
This is very interesting. Both versions should provide similar paths.
Thank you for the report!
Indeed 11.0-11.4 return results much slower, than 10.4-10.11 (they also used primary key, as in Mysql), FORCE index does not help:
MariaDB [test]> create or replace table t(i int primary key);
Query OK, 0 rows affected (0,058 sec)
MariaDB [test]> insert into t select seq from seq_1_to_10000;
Query OK, 10000 rows affected (0,724 sec)
Records: 10000 Duplicates: 0 Warnings: 0
MariaDB [test]> explain extended select count(*) from t ;
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+
| 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+
1 row in set, 1 warning (0,002 sec)
Note (Code 1003): select count(0) AS `count(*)` from `test`.`t`
MariaDB [test]> explain extended select count(*) from t force index (primary);
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+
| 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+
1 row in set, 1 warning (0,002 sec)
MariaDB [test]> SET optimizer_trace='enabled=on';
Query OK, 0 rows affected (0,001 sec)
MariaDB [test]> select count(*) from t force index (primary) ;
+----------+
| count(*) |
+----------+
| 10000 |
+----------+
1 row in set (0,050 sec)
MariaDB [test]> select * from information_schema.optimizer_trace limit 1\G
*************************** 1. row ***************************
QUERY: select count(*) from t force index (primary)
TRACE: {
"steps": [
{
"join_preparation": {
"select_id": 1,
"steps": [
{
"expanded_query": "select count(0) AS `count(*)` from t FORCE INDEX (PRIMARY)"
}
]
}
},
{
"join_optimization": {
"select_id": 1,
"steps": [
{
"table_dependencies": [
{
"table": "t",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": []
}
]
},
{
"rows_estimation": [
{
"table": "t",
"table_scan": {
"rows": 10000,
"read_cost": 1.3345316,
"read_and_compare_cost": 1.6545316
}
}
]
},
{
"considered_execution_plans": [
{
"plan_prefix": "",
"get_costs_for_tables": [
{
"best_access_path": {
"table": "t",
"plan_details": {
"record_count": 1
},
"considered_access_paths": [
{
"access_type": "scan",
"rows": 10000,
"rows_after_filter": 10000,
"rows_out": 10000,
"cost": 1.6545316,
"index_only": false,
"chosen": true
}
],
"chosen_access_method": {
"type": "scan",
"rows_read": 10000,
"rows_out": 10000,
"cost": 1.6545316,
"uses_join_buffering": false
}
}
}
]
},
{
"plan_prefix": "",
"table": "t",
"rows_for_plan": 10000,
"cost_for_plan": 1.6545316
}
]
},
{
"best_join_order": ["t"],
"rows": 10000,
"cost": 1.6545316
},
{
"attaching_conditions_to_tables": {
"attached_conditions_computation": [],
"attached_conditions_summary": [
{
"table": "t",
"attached_condition": null
}
]
}
},
{
"make_join_readinfo": []
}
]
}
},
{
"join_execution": {
"select_id": 1,
"steps": []
}
}
]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
INSUFFICIENT_PRIVILEGES: 0
1 row in set (0,004 sec)