[MDEV-33446] optimizer is wrong - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 11.4.0, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
Fix Version/s: 11.4
Component/s: Storage Engine - InnoDB
Labels:
- regression
Environment:
Linux

Description

I compared the "explain" output of two select count(*) from table, identical tables, one mariadb and the other "mysql 8.3", and they are different. The tables have one single field, varchar(10) and it's the primary key. I need to understand the difference:

Mariadb:

explain extended select count(*) from goodnumber force index(primary);

+------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+

| id   | select_type | table      | type | possible_keys | key  | key_len | ref  | rows      | filtered | Extra |

+------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+

|    1 | SIMPLE      | goodnumber | ALL  | NULL          | NULL | NULL    | NULL | 315011080 |   100.00 |       |

+------+-------------+------------+------+---------------+------+---------+------+-----------+----------+-------+,

Mysql, explain select count(*) from goodnumber;

+----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+

| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows      | filtered | Extra       |

+----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+

|  1 | SIMPLE      | goodnumber | NULL       | index | NULL          | PRIMARY | 42      | NULL | 315054340 |   100.00 | Using index |

+----+-------------+------------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+.

The fastest result is the latter, like 10 times. Does it mean that the commercial version of MySQL by Oracle has a better optimizer and it counts the records on a table about 10 times faster?

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mdev33446-log-10.11.txt
3 kB
2024-03-26 10:49
mdev33446-log-11.0.txt
3 kB
2024-03-26 10:49

Issue Links

relates to

MDEV-5004 Support parallel read transactions on the same snapshot

Open

MDEV-6096 Ideas about parallel query execution

Open

MDEV-34125 ANALYZE FORMAT=JSON: r_engine_stats.pages_read_time_ms has wrong scale

Closed

Activity

Ascending order - Click to sort in descending order

Alice Sherepa added a comment - 2024-02-12 13:35

Thank you for the report!
Indeed 11.0-11.4 return results much slower, than 10.4-10.11 (they also used primary key, as in Mysql), FORCE index does not help:

MariaDB [test]> create or replace table t(i int primary key);

Query OK, 0 rows affected (0,058 sec)

MariaDB [test]> insert into t select seq from seq_1_to_10000;

Query OK, 10000 rows affected (0,724 sec)

Records: 10000  Duplicates: 0  Warnings: 0

MariaDB [test]> explain extended select count(*) from t ;

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+

|    1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | 10000 |   100.00 |       |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+

1 row in set, 1 warning (0,002 sec)

Note (Code 1003): select count(0) AS `count(*)` from `test`.`t`

MariaDB [test]> explain extended select count(*) from t force index (primary);

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+

|    1 | SIMPLE      | t     | ALL  | NULL          | NULL | NULL    | NULL | 10000 |   100.00 |       |

+------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+

1 row in set, 1 warning (0,002 sec)

MariaDB [test]> SET optimizer_trace='enabled=on';

Query OK, 0 rows affected (0,001 sec)

MariaDB [test]> select count(*) from t force index (primary) ;

+----------+

| count(*) |

+----------+

|    10000 |

+----------+

1 row in set (0,050 sec)

MariaDB [test]>  select * from information_schema.optimizer_trace limit 1\G

*************************** 1. row ***************************

                            QUERY: select count(*) from t force index (primary)

                            TRACE: {

  "steps": [

      "join_preparation": {

        "select_id": 1,

        "steps": [

            "expanded_query": "select count(0) AS `count(*)` from t FORCE INDEX (PRIMARY)"

},

      "join_optimization": {

        "select_id": 1,

        "steps": [

            "table_dependencies": [

                "table": "t",

                "row_may_be_null": false,

                "map_bit": 0,

                "depends_on_map_bits": []

},

            "rows_estimation": [

                "table": "t",

                "table_scan": {

                  "rows": 10000,

                  "read_cost": 1.3345316,

                  "read_and_compare_cost": 1.6545316

},

            "considered_execution_plans": [

                "plan_prefix": "",

                "get_costs_for_tables": [

                    "best_access_path": {

                      "table": "t",

                      "plan_details": {

                        "record_count": 1

},

                      "considered_access_paths": [

                          "access_type": "scan",

                          "rows": 10000,

                          "rows_after_filter": 10000,

                          "rows_out": 10000,

                          "cost": 1.6545316,

                          "index_only": false,

                          "chosen": true

],

                      "chosen_access_method": {

                        "type": "scan",

                        "rows_read": 10000,

                        "rows_out": 10000,

                        "cost": 1.6545316,

                        "uses_join_buffering": false

},

                "plan_prefix": "",

                "table": "t",

                "rows_for_plan": 10000,

                "cost_for_plan": 1.6545316

},

            "best_join_order": ["t"],

            "rows": 10000,

            "cost": 1.6545316

},

            "attaching_conditions_to_tables": {

              "attached_conditions_computation": [],

              "attached_conditions_summary": [

                  "table": "t",

                  "attached_condition": null

},

            "make_join_readinfo": []

},

      "join_execution": {

        "select_id": 1,

        "steps": []

MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0

          INSUFFICIENT_PRIVILEGES: 0

1 row in set (0,004 sec)

Alice Sherepa added a comment - 2024-02-12 13:35 Thank you for the report! Indeed 11.0-11.4 return results much slower, than 10.4-10.11 (they also used primary key, as in Mysql), FORCE index does not help: MariaDB [test]> create or replace table t(i int primary key); Query OK, 0 rows affected (0,058 sec) MariaDB [test]> insert into t select seq from seq_1_to_10000; Query OK, 10000 rows affected (0,724 sec) Records: 10000 Duplicates: 0 Warnings: 0 MariaDB [test]> explain extended select count(*) from t ; +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+ | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+ 1 row in set, 1 warning (0,002 sec) Note (Code 1003): select count(0) AS `count(*)` from `test`.`t` MariaDB [test]> explain extended select count(*) from t force index (primary); +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+ | 1 | SIMPLE | t | ALL | NULL | NULL | NULL | NULL | 10000 | 100.00 | | +------+-------------+-------+------+---------------+------+---------+------+-------+----------+-------+ 1 row in set, 1 warning (0,002 sec) MariaDB [test]> SET optimizer_trace='enabled=on'; Query OK, 0 rows affected (0,001 sec) MariaDB [test]> select count(*) from t force index (primary) ; +----------+ | count(*) | +----------+ | 10000 | +----------+ 1 row in set (0,050 sec) MariaDB [test]> select * from information_schema.optimizer_trace limit 1\G *************************** 1. row *************************** QUERY: select count(*) from t force index (primary) TRACE: { "steps": [ { "join_preparation": { "select_id": 1, "steps": [ { "expanded_query": "select count(0) AS `count(*)` from t FORCE INDEX (PRIMARY)" } ] } }, { "join_optimization": { "select_id": 1, "steps": [ { "table_dependencies": [ { "table": "t", "row_may_be_null": false, "map_bit": 0, "depends_on_map_bits": [] } ] }, { "rows_estimation": [ { "table": "t", "table_scan": { "rows": 10000, "read_cost": 1.3345316, "read_and_compare_cost": 1.6545316 } } ] }, { "considered_execution_plans": [ { "plan_prefix": "", "get_costs_for_tables": [ { "best_access_path": { "table": "t", "plan_details": { "record_count": 1 }, "considered_access_paths": [ { "access_type": "scan", "rows": 10000, "rows_after_filter": 10000, "rows_out": 10000, "cost": 1.6545316, "index_only": false, "chosen": true } ], "chosen_access_method": { "type": "scan", "rows_read": 10000, "rows_out": 10000, "cost": 1.6545316, "uses_join_buffering": false } } } ] }, { "plan_prefix": "", "table": "t", "rows_for_plan": 10000, "cost_for_plan": 1.6545316 } ] }, { "best_join_order": ["t"], "rows": 10000, "cost": 1.6545316 }, { "attaching_conditions_to_tables": { "attached_conditions_computation": [], "attached_conditions_summary": [ { "table": "t", "attached_condition": null } ] } }, { "make_join_readinfo": [] } ] } }, { "join_execution": { "select_id": 1, "steps": [] } } ] } MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0 INSUFFICIENT_PRIVILEGES: 0 1 row in set (0,004 sec)

Philip orleans added a comment - 2024-02-12 17:26

Just out of curiosity, why do we have the "force index" option, if the optimizer disregards it? It makes no sense. KIndly remove the option or make it work.

Philip orleans added a comment - 2024-02-12 17:26 Just out of curiosity, why do we have the "force index" option, if the optimizer disregards it? It makes no sense. KIndly remove the option or make it work.

Philip orleans added a comment - 2024-03-02 16:09 - edited

Server version: 11.3.2-MariaDB-1:11.3.2+maria~ubu2004 mariadb.org binary distribution
The issue with the optimizer or with some other part of the Mariadb software is deeper
Consider a table like

show create table dnc.dnosimple;

+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------+

| Table     | Create Table                                                                                                                                            |

+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------+

| dnosimple | CREATE TABLE `dnosimple` (

  `ani` bigint(20) unsigned NOT NULL,

  PRIMARY KEY (`ani`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci |

then, inside a stored procedure I have two variables,
anilongv bigint unsigned
isdnov tinyint
this is from INSIDE the stored procedure

EXPLAIN SELECT count(*) INTO isdnov FROM dnc.dnosimple FORCE INDEX (PRIMARY) WHERE ani = anilongv;

+------+-------------+-----------+------+---------------+------+---------+------+------------+-------+

| id   | select_type | table     | type | possible_keys | key  | key_len | ref  | rows       | Extra |

+------+-------------+-----------+------+---------------+------+---------+------+------------+-------+

|    1 | SIMPLE      | dnosimple | ALL  | NULL          | NULL | NULL    | NULL | 1057261209 |       |

+------+-------------+-----------+------+---------------+------+---------+------+------------+-------+

which is dead wrong, in fact, the query has to be stopped because it scans the table while having a primary key of the same data type.

however, this is the weird thing: I created a new stored procedure

SET NAMES 'latin1';

USE asterisk;

DELIMITER $$

CREATE

DEFINER = 'root'@'%'

PROCEDURE search_dno (IN aniv bigint UNSIGNED, INOUT kount tinyint)

BEGIN

SELECT

    COUNT(*) INTO kount

  FROM dnc.dnosimple FORCE INDEX (PRIMARY)

  WHERE ani = aniv;

END

$$

DELIMITER ;

and while the "explain" is still wrong, the query does work as expected. This is a table with over 1BN unique numbers.
Naturally, I would like to run the query directly, not via a second stored procedure.

Philip orleans added a comment - 2024-03-02 16:09 - edited Server version: 11.3.2-MariaDB-1:11.3.2+maria~ubu2004 mariadb.org binary distribution The issue with the optimizer or with some other part of the Mariadb software is deeper Consider a table like show create table dnc.dnosimple; +-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-----------+---------------------------------------------------------------------------------------------------------------------------------------------------------+ | dnosimple | CREATE TABLE `dnosimple` ( `ani` bigint(20) unsigned NOT NULL, PRIMARY KEY (`ani`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci | then, inside a stored procedure I have two variables, anilongv bigint unsigned isdnov tinyint this is from INSIDE the stored procedure EXPLAIN SELECT count(*) INTO isdnov FROM dnc.dnosimple FORCE INDEX (PRIMARY) WHERE ani = anilongv; +------+-------------+-----------+------+---------------+------+---------+------+------------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-----------+------+---------------+------+---------+------+------------+-------+ | 1 | SIMPLE | dnosimple | ALL | NULL | NULL | NULL | NULL | 1057261209 | | +------+-------------+-----------+------+---------------+------+---------+------+------------+-------+ which is dead wrong, in fact, the query has to be stopped because it scans the table while having a primary key of the same data type. however, this is the weird thing: I created a new stored procedure SET NAMES 'latin1'; USE asterisk; DELIMITER $$ CREATE DEFINER = 'root'@'%' PROCEDURE search_dno (IN aniv bigint UNSIGNED, INOUT kount tinyint) BEGIN SELECT COUNT(*) INTO kount FROM dnc.dnosimple FORCE INDEX (PRIMARY) WHERE ani = aniv; END $$ DELIMITER ; and while the "explain" is still wrong, the query does work as expected. This is a table with over 1BN unique numbers. Naturally, I would like to run the query directly, not via a second stored procedure.

Sergei Petrunia added a comment - 2024-03-26 09:44 - edited

philip_38, it looks very odd that a simple query with WHERE primary_key=const decides to use a full table (or index) scan.
I see in your pasete:

USE asterisk;

SELECT

COUNT INTO kount

FROM dnc.dnosimple FORCE INDEX (PRIMARY)

is the table dnosimple from the right database?

I try to reproduce and the query uses const access and reads one row, as expected:

MariaDB [test]> CREATE TABLE `dnosimple` (

    ->   `ani` bigint(20) unsigned NOT NULL,

    ->   PRIMARY KEY (`ani`)

    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci

    -> ;

Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> insert into dnosimple select seq from seq_1_to_100000;

Query OK, 100000 rows affected (0.112 sec)

Records: 100000  Duplicates: 0  Warnings: 0

MariaDB [test]> SET NAMES 'latin1';

Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> DELIMITER $$

MariaDB [test]> CREATE

    -> PROCEDURE search_dno (IN aniv bigint UNSIGNED, INOUT kount tinyint)

    -> BEGIN

    -> EXPLAIN SELECT

    -> COUNT(*) INTO kount

    -> FROM dnosimple

    -> WHERE ani = aniv;

    -> END

    -> $$

Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> CALL search_dno(10, @var);

    -> $$

+------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+

| id   | select_type | table     | type  | possible_keys | key     | key_len | ref   | rows | Extra |

+------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+

|    1 | SIMPLE      | dnosimple | const | PRIMARY       | PRIMARY | 8       | const | 1    |       |

+------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+

1 row in set (0.000 sec)

Query OK, 0 rows affected (0.000 sec)

(This is from 11.0 but I'm sure other versions would behave the same for this query).

Are you sure you don't have another dnosimple table in another database with ani column defined as VARCHAR? That would explain the use of full index scan...

you've mentioned in the report:

The tables have one single field, varchar(10) and it's the primary key.

VARCHAR...

if the column in the table is VARCHAR, be sure to compare it with a VARCHAR SP parameter or a string constant...

Sergei Petrunia added a comment - 2024-03-26 09:44 - edited philip_38 , it looks very odd that a simple query with WHERE primary_key=const decides to use a full table (or index) scan. I see in your pasete: USE asterisk; SELECT COUNT INTO kount FROM dnc.dnosimple FORCE INDEX ( PRIMARY ) is the table dnosimple from the right database? I try to reproduce and the query uses const access and reads one row, as expected: MariaDB [test]> CREATE TABLE `dnosimple` ( -> `ani` bigint(20) unsigned NOT NULL, -> PRIMARY KEY (`ani`) -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci -> ; Query OK, 0 rows affected (0.001 sec) MariaDB [test]> insert into dnosimple select seq from seq_1_to_100000; Query OK, 100000 rows affected (0.112 sec) Records: 100000 Duplicates: 0 Warnings: 0 MariaDB [test]> SET NAMES 'latin1'; Query OK, 0 rows affected (0.000 sec) MariaDB [test]> DELIMITER $$ MariaDB [test]> CREATE -> PROCEDURE search_dno (IN aniv bigint UNSIGNED, INOUT kount tinyint) -> BEGIN -> EXPLAIN SELECT -> COUNT(*) INTO kount -> FROM dnosimple -> WHERE ani = aniv; -> END -> $$ Query OK, 0 rows affected (0.000 sec) MariaDB [test]> CALL search_dno(10, @var); -> $$ +------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | dnosimple | const | PRIMARY | PRIMARY | 8 | const | 1 | | +------+-------------+-----------+-------+---------------+---------+---------+-------+------+-------+ 1 row in set (0.000 sec) Query OK, 0 rows affected (0.000 sec) (This is from 11.0 but I'm sure other versions would behave the same for this query). Are you sure you don't have another dnosimple table in another database with ani column defined as VARCHAR? That would explain the use of full index scan... you've mentioned in the report: The tables have one single field, varchar(10) and it's the primary key. VARCHAR... if the column in the table is VARCHAR, be sure to compare it with a VARCHAR SP parameter or a string constant...

Sergei Petrunia added a comment - 2024-03-26 10:35 - edited

As for the speed of full table vs full index scan.

The tables have one single field, varchar(10) and it's the primary key.

In this case, if the table is InnoDB, then full table scan and full index scan are essentially the same thing. full table scan is a full index scan on PK. It is very surprising to see any difference in performance.

Sergei Petrunia added a comment - 2024-03-26 10:35 - edited As for the speed of full table vs full index scan. The tables have one single field, varchar(10) and it's the primary key. In this case, if the table is InnoDB, then full table scan and full index scan are essentially the same thing. full table scan is a full index scan on PK. It is very surprising to see any difference in performance.

Sergei Petrunia added a comment - 2024-03-26 10:50

for comparing across versions, I am trying this:

create or replace table t(i int primary key);

insert into t select seq from seq_1_to_10000000;

explain extended select count(*) from t ;

select count(*) from t;

select count(*) from t;

select count(*) from t;

explain extended select count(*) from t force index (primary);

select count(*) from t force index (primary) ;

select count(*) from t force index (primary) ;

select count(*) from t force index (primary) ;

select count(*) from t force index (primary) ;

The results are :
mdev33446-log-10.11.txt mdev33446-log-11.0.txt

11.0 is slightly faster.

Sergei Petrunia added a comment - 2024-03-26 10:50 for comparing across versions, I am trying this: create or replace table t(i int primary key ); insert into t select seq from seq_1_to_10000000; explain extended select count (*) from t ; select count (*) from t; select count (*) from t; select count (*) from t; explain extended select count (*) from t force index ( primary ); select count (*) from t force index ( primary ) ; select count (*) from t force index ( primary ) ; select count (*) from t force index ( primary ) ; select count (*) from t force index ( primary ) ; The results are : mdev33446-log-10.11.txt mdev33446-log-11.0.txt 11.0 is slightly faster.

Sergei Petrunia added a comment - 2024-03-26 10:51

alice, can you provide steps to reproduce the slowdown?

Sergei Petrunia added a comment - 2024-03-26 10:51 alice , can you provide steps to reproduce the slowdown?

Philip orleans added a comment - 2024-03-27 06:01 - edited

The same table

CREATE TABLE `dnosimple` (

  `ani` bigint unsigned NOT NULL,

  PRIMARY KEY (`ani`)

) ENGINE=InnoDB

with the same data, 1.062 BN records, look the difference
MySQL 8.3

select count(*) from dnosimple;

+------------+

| count(*)   |

+------------+

| 1062468947 |

+------------+

1 row in set (19.63 sec)

Mariadb 11.4.1

select count(*) from dnosimple;

+------------+

| count(*)   |

+------------+

| 1062468947 |

+------------+

1 row in set (2 min 15.086 sec)

I know the hardware is different, but there is something else here.
You may download my data from "wget https://fcc.x1.uy/static/dnosimple.zip". Please load this data, not random or sequential data. "time mysqlimport testdb --local dnosimple.csv". It takes 30 min on both platforms

Philip orleans added a comment - 2024-03-27 06:01 - edited The same table CREATE TABLE `dnosimple` ( `ani` bigint unsigned NOT NULL , PRIMARY KEY (`ani`) ) ENGINE=InnoDB with the same data, 1.062 BN records, look the difference MySQL 8.3 select count(*) from dnosimple; +------------+ | count(*) | +------------+ | 1062468947 | +------------+ 1 row in set (19.63 sec) Mariadb 11.4.1 select count(*) from dnosimple; +------------+ | count(*) | +------------+ | 1062468947 | +------------+ 1 row in set (2 min 15.086 sec) I know the hardware is different, but there is something else here. You may download my data from "wget https://fcc.x1.uy/static/dnosimple.zip ". Please load this data, not random or sequential data. "time mysqlimport testdb --local dnosimple.csv". It takes 30 min on both platforms

Lena Startseva added a comment - 2024-04-05 13:30

On my laptop with 64 Gb of RAM I got the next result with the data and query from comment above:

	Query execution time	Size of dnosimple.ibd
MySQL 8.3	6 min 26,52 sec	30,7 GB (30 681 333 760 bytes)
Mariadb 11.4.1	30 min 15,338 sec	34,8 GB (34 837 889 024 bytes)

ANALYZE FORMAT=JSON select count(*) from dnosimple;

gives the next result:

| {

  "query_optimization": {

    "r_total_time_ms": 0.242982647

},

  "query_block": {

    "select_id": 1,

    "cost": 174691.1825,

    "r_loops": 1,

    "r_total_time_ms": 1822554.621,

    "nested_loop": [

        "table": {

          "table_name": "dnosimple",

          "access_type": "ALL",

          "loops": 1,

          "r_loops": 1,

          "rows": 1060618416,

          "r_rows": 1062468947,

          "cost": 174691.1825,

          "r_table_time_ms": 1425235.95,

          "r_other_time_ms": 397318.6493,

          "r_engine_stats": {

            "pages_accessed": 1850995,

            "pages_read_count": 35272,

            "pages_read_time_ms": 14657261.27

},

          "filtered": 100,

          "r_filtered": 100

} |

Lena Startseva added a comment - 2024-04-05 13:30 On my laptop with 64 Gb of RAM I got the next result with the data and query from comment above: Query execution time Size of dnosimple.ibd MySQL 8.3 6 min 26,52 sec 30,7 GB (30 681 333 760 bytes) Mariadb 11.4.1 30 min 15,338 sec 34,8 GB (34 837 889 024 bytes) ANALYZE FORMAT=JSON select count(*) from dnosimple; gives the next result: | { "query_optimization" : { "r_total_time_ms" : 0.242982647 }, "query_block" : { "select_id" : 1, "cost" : 174691.1825, "r_loops" : 1, "r_total_time_ms" : 1822554.621, "nested_loop" : [ { "table" : { "table_name" : "dnosimple" , "access_type" : "ALL" , "loops" : 1, "r_loops" : 1, "rows" : 1060618416, "r_rows" : 1062468947, "cost" : 174691.1825, "r_table_time_ms" : 1425235.95, "r_other_time_ms" : 397318.6493, "r_engine_stats" : { "pages_accessed" : 1850995, "pages_read_count" : 35272, "pages_read_time_ms" : 14657261.27 }, "filtered" : 100, "r_filtered" : 100 } } ] } } |

Marko Mäkelä added a comment - 2024-04-05 14:23

The large pages_read_time_ms caught my eye. This could possibly be explained by ~~MDEV-31227~~. You might also want to try SET GLOBAL innodb_random_read_ahead=ON.

Marko Mäkelä added a comment - 2024-04-05 14:23 The large pages_read_time_ms caught my eye. This could possibly be explained by MDEV-31227 . You might also want to try SET GLOBAL innodb_random_read_ahead=ON .

Philip orleans added a comment - 2024-04-05 15:19

I set innodb_random_read_ahead=ON and is marginally faster, but still very far away from MySQL (Oracle).

Philip orleans added a comment - 2024-04-05 15:19 I set innodb_random_read_ahead=ON and is marginally faster, but still very far away from MySQL (Oracle).

Marko Mäkelä added a comment - 2024-04-08 06:39

For an apples-to-apples comparison, you should either disable or enable the use of the file system cache (innodb_flush_method=O_DIRECT or innodb_flush_method=fsync) on versions, and use the same innodb_buffer_pool_size in both. The motivation of ~~MDEV-24854~~ was to improve write performance.

Marko Mäkelä added a comment - 2024-04-08 06:39 For an apples-to-apples comparison, you should either disable or enable the use of the file system cache ( innodb_flush_method=O_DIRECT or innodb_flush_method=fsync ) on versions, and use the same innodb_buffer_pool_size in both. The motivation of MDEV-24854 was to improve write performance.

Philip orleans added a comment - 2024-04-09 05:46

I did match exactly the settings, as above. The difference is still huge.

Philip orleans added a comment - 2024-04-09 05:46 I did match exactly the settings, as above. The difference is still huge.

Marko Mäkelä added a comment - 2024-04-09 08:51

Thank you, I just wanted to exclude an obvious explanation.

Marko Mäkelä added a comment - 2024-04-09 08:51 Thank you, I just wanted to exclude an obvious explanation.

Michael Widenius added a comment - 2024-05-07 12:18 - edited

InnoDB buffer_pool_size= 128M, MariaDB 10.6 compiled with --debug (should be >> 2x slower than optimized binary)

create table foo (a varchar(10)) engine=innodb select left(md5(seq),10) as a from seq_1_to_31011080;

31011080 rows affected (18 min 37.844 sec)

MariaDB [test]> select count(*) from foo;

+----------+

| count(*) |

+----------+

| 31011080 |

+----------+

1 row in set (2 min 34.729 sec)

I tried to run with 10.6-debug and a bigger buffer pool:

 sql/mariadbd --innodb-buffer-pool-size=40G

but this did not work as all time is spent in innodb buffer pool checks.

Perf shows:

24.86% mariadbd                        [.] ut_list_validate<ut_list_base<buf_pag

 20.40% mariadbd                        [.] CheckInFreeList::operator()

 16.52% mariadbd                        [.] Atomic_relaxed<unsigned int>::operato

 11.02% mariadbd                        [.] buf_pool_t::validate

Testing 11.4 compiled without debug and 40G buffer pool:

create or replace table foo (a varchar(10)) engine=innodb select left(md5(seq),10) as a from seq_1_to_31011080;

MariaDB [test]> select count(*) from foo;

+----------+

| count(*) |

+----------+

| 31011080 |

+----------+

1 row in set (5.133 sec)

1 row in set (4.999 sec)  (second run)

Restarting with empty buffer pool:

+----------+

| count(*) |

+----------+

| 31011080 |

+----------+

1 row in set (6.171 sec)

Restarting with empty buffer pool of 128M

+----------+

| count(*) |

+----------+

| 31011080 |

+----------+

1 row in set (6.638 sec)

Michael Widenius added a comment - 2024-05-07 12:18 - edited InnoDB buffer_pool_size= 128M, MariaDB 10.6 compiled with --debug (should be >> 2x slower than optimized binary) create table foo (a varchar(10)) engine=innodb select left(md5(seq),10) as a from seq_1_to_31011080; 31011080 rows affected (18 min 37.844 sec) MariaDB [test]> select count(*) from foo; +----------+ | count(*) | +----------+ | 31011080 | +----------+ 1 row in set (2 min 34.729 sec) I tried to run with 10.6-debug and a bigger buffer pool: sql/mariadbd --innodb-buffer-pool-size=40G but this did not work as all time is spent in innodb buffer pool checks. Perf shows: 24.86% mariadbd [.] ut_list_validate<ut_list_base<buf_pag 20.40% mariadbd [.] CheckInFreeList::operator() 16.52% mariadbd [.] Atomic_relaxed<unsigned int>::operato 11.02% mariadbd [.] buf_pool_t::validate Testing 11.4 compiled without debug and 40G buffer pool: create or replace table foo (a varchar(10)) engine=innodb select left(md5(seq),10) as a from seq_1_to_31011080; MariaDB [test]> select count(*) from foo; +----------+ | count(*) | +----------+ | 31011080 | +----------+ 1 row in set (5.133 sec) 1 row in set (4.999 sec) (second run) Restarting with empty buffer pool: +----------+ | count(*) | +----------+ | 31011080 | +----------+ 1 row in set (6.171 sec) Restarting with empty buffer pool of 128M +----------+ | count(*) | +----------+ | 31011080 | +----------+ 1 row in set (6.638 sec)

Michael Widenius added a comment - 2024-05-07 13:52

The table here explains some of the issues:
https://www.percona.com/blog/why-select-count-from-table-is-sometimes-very-slow-in-mysql-or-mariadb/
One major difference in newer MySQL versions is that it has parallel scan of InnoDB. This can have a big impact for single user usage of MySQL.
We should try to find out why 'Trx/Same session is slower than 'Alone'. This should not be the cases. Even for Separate session, the slowdown should only happen for pages that has changed since the transaction started.

Michael Widenius added a comment - 2024-05-07 13:52 The table here explains some of the issues: https://www.percona.com/blog/why-select-count-from-table-is-sometimes-very-slow-in-mysql-or-mariadb/ One major difference in newer MySQL versions is that it has parallel scan of InnoDB. This can have a big impact for single user usage of MySQL. We should try to find out why 'Trx/Same session is slower than 'Alone'. This should not be the cases. Even for Separate session, the slowdown should only happen for pages that has changed since the transaction started.

Michael Widenius added a comment - 2024-05-07 14:36 - edited

I did a check with an identical table as philip_38 described with a InnoDB buffer pools size of 128M on MariaDB 11.4:

create or replace table foo2 (a bigint primary key) engine=innodb select seq as a from seq_1_to_1062468947;

MariaDB [test]> select count(*) from foo2;

+------------+

| count(*)   |

+------------+

| 1062468947 |

+------------+

1 row in set (1 min 55.127 sec)

I tested also with set global innodb_random_read_ahead=ON, but there was no notable difference.

MySQL is probably faster for this particular query thanks to parallel full table scan. (Something we should look at implementing).
philip_38, what is your value for innodb_parallel_read_threads ?
MariaDB has a feature, not yet enabled for InnoDB, that would allow sum queries over the whole table to be much faster.
We should look at enable this one at least for count for InnoDB.

Michael Widenius added a comment - 2024-05-07 14:36 - edited I did a check with an identical table as philip_38 described with a InnoDB buffer pools size of 128M on MariaDB 11.4: create or replace table foo2 (a bigint primary key) engine=innodb select seq as a from seq_1_to_1062468947; MariaDB [test]> select count(*) from foo2; +------------+ | count(*) | +------------+ | 1062468947 | +------------+ 1 row in set (1 min 55.127 sec) I tested also with set global innodb_random_read_ahead=ON, but there was no notable difference. MySQL is probably faster for this particular query thanks to parallel full table scan. (Something we should look at implementing). philip_38 , what is your value for innodb_parallel_read_threads ? MariaDB has a feature, not yet enabled for InnoDB, that would allow sum queries over the whole table to be much faster. We should look at enable this one at least for count for InnoDB.

Philip orleans added a comment - 2024-05-07 23:24

This is a vital feature. Virtually all my business uses cont to confirm that a 1.2BN records table matches a flat file. I do this every AM. This is the list of all the ported phone numbers in North America. The FCC releases the data and then I need to update my table. If we can make this work faster, I can have the table ready for clients.
I cannot understand why nobody mentioned this parallel table scan until today.

Philip orleans added a comment - 2024-05-07 23:24 This is a vital feature. Virtually all my business uses cont to confirm that a 1.2BN records table matches a flat file. I do this every AM. This is the list of all the ported phone numbers in North America. The FCC releases the data and then I need to update my table. If we can make this work faster, I can have the table ready for clients. I cannot understand why nobody mentioned this parallel table scan until today.

Sergei Petrunia added a comment - 2024-05-09 08:38

(The observation about odd value of r_engine_stats.pages_read_time_ms is addressed separately in ~~MDEV-34125~~)

Sergei Petrunia added a comment - 2024-05-09 08:38 (The observation about odd value of r_engine_stats.pages_read_time_ms is addressed separately in MDEV-34125 )

Marko Mäkelä added a comment - 2024-05-10 12:36

I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/

Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in ~~MDEV-33213~~, resolving the purge lag will require the MVCC checks to be executed in secondary indexes.

Marko Mäkelä added a comment - 2024-05-10 12:36 I wonder if we should default to using the PRIMARY KEY or clustered index for COUNT( * ) queries on InnoDB tables. If a secondary index is chosen where there is a lot of history available (say, a column that is covered by that index has been frequently updated, or there have been massive DELETE in the table), then the MVCC access to the secondary index records could be extremely slow until MDEV-17598 has been implemented. I covered this in a presentation some time ago: https://mariadb.org/fest2022/how-innodb-multi-version-concurrency-control-mvcc-works/ Evaluating COUNT by a table scan can be more costly if the records are wide (only a few clustered index records per leaf page), but the worst-case performance should be rather limited, unless the read view is very old and many old versions of records need to be retrieved. In secondary indexes, the MVCC overhead may be paid for any read view. It is a kind of death spiral to performance, because as noted in MDEV-33213 , resolving the purge lag will require the MVCC checks to be executed in secondary indexes.

Philip orleans added a comment - 2024-05-25 04:55 - edited

select count(*) from dno;

+------------+

| count(*)   |

+------------+

| 1034686910 |

+------------+

1 row in set (8 min 5.076 sec)

The same operation takes less than 60 seconds with Oracle's MySQL.
It's time to do parallel scans.

Philip orleans added a comment - 2024-05-25 04:55 - edited select count(*) from dno; +------------+ | count(*) | +------------+ | 1034686910 | +------------+ 1 row in set (8 min 5.076 sec) The same operation takes less than 60 seconds with Oracle's MySQL. It's time to do parallel scans.

Lena Startseva added a comment - 2024-05-27 05:14 - edited

Check on my laptop (https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972) innodb_parallel_read_threads for MySQL 8.3. By default it was

+------------------------------+-------+

| Variable_name                | Value |

+------------------------------+-------+

| innodb_parallel_read_threads | 4     |

+------------------------------+-------+

and time for execution was 6 min 33,13 sec:

mysql> select count(*) from dnosimple;

+------------+

| count(*)   |

+------------+

| 1062468947 |

+------------+

1 row in set (6 min 33,13 sec)

For:

+------------------------------+-------+

| Variable_name                | Value |

+------------------------------+-------+

| innodb_parallel_read_threads | 1     |

+------------------------------+-------+

Time increased 10 times and was 1 hour 4 min 17,21 sec:

mysql> select count(*) from dnosimple;

+------------+

| count(*)   |

+------------+

| 1062468947 |

+------------+

1 row in set (1 hour 4 min 17,21 sec)

Lena Startseva added a comment - 2024-05-27 05:14 - edited Check on my laptop ( https://jira.mariadb.org/browse/MDEV-33446?focusedCommentId=282972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-282972 ) innodb_parallel_read_threads for MySQL 8.3 . By default it was + ------------------------------+-------+ | Variable_name | Value | + ------------------------------+-------+ | innodb_parallel_read_threads | 4 | + ------------------------------+-------+ and time for execution was 6 min 33,13 sec : mysql> select count (*) from dnosimple; + ------------+ | count (*) | + ------------+ | 1062468947 | + ------------+ 1 row in set (6 min 33,13 sec) For: + ------------------------------+-------+ | Variable_name | Value | + ------------------------------+-------+ | innodb_parallel_read_threads | 1 | + ------------------------------+-------+ Time increased 10 times and was 1 hour 4 min 17,21 sec : mysql> select count (*) from dnosimple; + ------------+ | count (*) | + ------------+ | 1062468947 | + ------------+ 1 row in set (1 hour 4 min 17,21 sec)

Philip orleans added a comment - 2024-05-27 23:49

Care to explain?

Philip orleans added a comment - 2024-05-27 23:49 Care to explain?

Philip orleans added a comment - 2024-09-05 17:16

This is very interesting. Both versions should provide similar paths.

Philip orleans added a comment - 2024-09-05 17:16 This is very interesting. Both versions should provide similar paths.

People

Assignee:: Alice Sherepa

Reporter:: Philip orleans

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2024-02-12 06:00

Updated:: 2024-11-22 17:13

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration