[MDEV-10345] Delete query performance degradation when using a composite primary key Created: 2016-07-07  Updated: 2023-08-17

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.14
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Jizo Zef Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS 7



 Description   

We have an old MySQL 5.0 database running on CentOS 5 server which we want to move to CentOS 7 and MariaDB 10.1.

We have successfully installed and configured the new system, which currently runs as a slave of the old server (we will make it the master once we are sure everything is working smoothly), but there were some problems.

We have a table like this one with ~70,000,000 records:

CREATE TABLE `calculated_data` (
  `object_id` int(10) unsigned NOT NULL,
  `d` date NOT NULL,
  `hour` tinyint(4) NOT NULL default '-1',
  `kms` double unsigned NOT NULL,
  PRIMARY KEY (`object_id`, `d`, `hour`),
  CONSTRAINT `my_table__object_id__f_ix` FOREIGN KEY (`object_id`) REFERENCES `objects` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

We purge the old data from the table daily using this query:

DELETE FROM `calculated_data` WHERE `object_id` > 0 AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY);

We are not interested in `object_id` but we include it in the query so that the engine uses the primary key.

Our old server executes the query in 81 seconds, which is perfectly acceptable; the shiny new one takes 5314 seconds. What could be the problem?
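For context, a common way to keep such a purge cheap is to delete in bounded batches, repeating until no rows are affected; the LIMIT value below is illustrative, not from the original report:

```sql
-- Illustrative sketch: repeat until ROW_COUNT() returns 0.
-- Each DELETE touches at most 10000 rows, so each transaction
-- stays small and the slave can keep up between batches.
DELETE FROM `calculated_data`
 WHERE `object_id` > 0
   AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY)
 LIMIT 10000;
```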

There was another issue. New data is inserted into the old server at ~100 records per second and it works just fine, but the new server was replicating at ~20 records per second at most. We tried tuning the flush methods and slave parallelism settings without success (see the commented-out settings below). We somewhat solved the issue by changing innodb_flush_log_at_trx_commit from 1 to 2, but I'm not sure that was the right thing to do.

Settings:

[mysqld]
 
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
 
bind-address = 127.0.0.1
 
max_allowed_packet = 16M
 
#This is MyISAM-only
key_buffer_size = 32M
read_buffer_size = 1M
 
sort_buffer_size = 4M
join_buffer_size = 2M
thread_cache_size = 12
thread_pool_size = 16
query_cache_limit = 0
query_cache_size = 0
query_cache_type = 0
table_open_cache = 5000
open_files_limit = 10000
max_connections = 100
 
character_set_server = utf8
collation_server = utf8_general_ci
 
innodb_buffer_pool_size = 5G
innodb_buffer_pool_instances = 5
innodb_log_buffer_size = 32M
innodb_log_file_size = 256M
innodb_log_files_in_group = 2
innodb_thread_concurrency = 10
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
#this is not safe (next line) but without it the slave syncs max at 20 inserts per second, very slow
innodb_flush_log_at_trx_commit = 2
#innodb_flush_log_at_trx_commit = 1
 
sync_binlog = 1
 
max_relay_log_size = 256M
relay_log_purge = 1
relay_log = db-relay-bin
relay_log_index = db-relay-bin.index
#slave_parallel_threads = 6
#slave_domain_parallel_threads = 4
#slave_parallel_max_queued = 1M
log_slow_slave_statements = 1
replicate_do_db = db
server_id = 2
 
slow_query_log = 1
slow_query_log_file = /var/log/mariadb/mariadbs.log



 Comments   
Comment by Elena Stepanova [ 2016-07-07 ]

For the DELETE problem, please run EXPLAIN DELETE ... on the slow slave and paste the output.
For the INSERT problem, if you can modify the logic on the master, you can also try doing your INSERTs in bigger transactions (e.g. BEGIN; <100 inserts>; COMMIT); that should help as well.
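A minimal sketch of that grouping (table and column layout are from the report above; the values and batch size are illustrative):

```sql
-- Group many small inserts into one transaction so the InnoDB log
-- is flushed once per batch instead of once per row.
START TRANSACTION;
INSERT INTO `calculated_data` VALUES (1, '2016-07-08', 0, 12.3);
INSERT INTO `calculated_data` VALUES (2, '2016-07-08', 0, 45.6);
-- ... up to ~100 inserts ...
COMMIT;
```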

Comment by Jizo Zef [ 2016-07-08 ]

Here is the result of the EXPLAIN DELETE...

EXPLAIN DELETE FROM `calculated_data` WHERE `object_id` > 0 AND  `d` < DATE_SUB(NOW(), INTERVAL 365 DAY)

id  select_type  table            type   possible_keys  key      key_len  ref   rows      Extra
1   SIMPLE       calculated_data  range  PRIMARY        PRIMARY  4        NULL  39603342  Using where

There are ~70,000,000 records (edit: as it turns out, ~80,000,000) in the table, evenly distributed by the indexed `d` date field over one whole year (and `object_id` is always greater than 0). I don't know why the rows field of the EXPLAIN output is so big (39603342). When I remove the bogus `object_id` > 0 condition I get this (as expected, the index is not used):

EXPLAIN DELETE FROM `calculated_data` WHERE `d` < DATE_SUB(NOW(), INTERVAL 365 DAY)

id  select_type  table            type  possible_keys  key   key_len  ref   rows      Extra
1   SIMPLE       calculated_data  ALL   NULL           NULL  NULL     NULL  79206684  Using where

As for doing the inserts in one transaction, unfortunately I can't, because the inserts are made by different threads of one daemon process (which has a pool of 10 connections), where every thread inserts data coming from a remote RTU. There are thousands of RTUs sending data at different frequencies, from every 5 seconds to every 5 minutes, which on average results in ~100 inserts per second.

Thank you for your interest.

Comment by Elena Stepanova [ 2016-07-08 ]

For the DELETE problem, did you try to run ANALYZE on the table?
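For reference, that is a single statement; it rebuilds the index statistics the optimizer uses for its row estimates:

```sql
-- Recompute index cardinality statistics for the table.
ANALYZE TABLE `calculated_data`;
```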

Comment by Jizo Zef [ 2016-07-09 ]

No, but the table was created only a couple of days ago from a mysqldump dump file, so it shouldn't be fragmented. Anyway, if you think it may help, I can try it.

Can you please try to create the table, insert e.g. 10,000 rows, then run EXPLAIN on the DELETE query and see if "rows" is ~5000? In my case, for some reason, the rows value is half the number of records in the table. Is this normal, considering that we are using the primary key's index? Here are the queries and a PHP script for your testing.

CREATE TABLE `objects` (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(32),
  PRIMARY KEY(`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
CREATE TABLE `calculated_data` (  
  `object_id` int(10) unsigned NOT NULL,
  `d` date NOT NULL,
  `hour` tinyint(4) NOT NULL default '-1',
  `kms` double unsigned NOT NULL,
  PRIMARY KEY (`object_id`, `d`, `hour`),
  CONSTRAINT `my_table__object_id__f_ix` FOREIGN KEY (`object_id`) REFERENCES `objects` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
)  ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
 
INSERT INTO `objects` VALUES (1, 'Test1');
INSERT INTO `objects` VALUES (2, 'Test2');
INSERT INTO `objects` VALUES (3, 'Test3');

<?php
 
$con = new mysqli(
    'localhost',
    'root',
    'password',
    'test'
);
 
if ($con->connect_errno) {
    die('can not connect to the db');
}
 
$con->set_charset('utf8');
 
$ts = strtotime('1 January 2000');
$object_id = 1;
$date = date('Y-m-d', $ts);
$hour = 0;
$kms = rand();
$stmt = $con->prepare(
    'INSERT INTO
      `calculated_data`
    VALUES (?, ?, ?, ?)'
);
if (!$stmt || !$stmt->bind_param('isid', $object_id, $date, $hour, $kms)) {
    die('can not create/prepare the statement');
}
 
$con->begin_transaction();
 
for ($i = 0; $i < 10000; $i++) {
    $stmt->execute();
 
    $object_id++;
    if ($object_id > 3) {
        $object_id = 1;
 
        $hour++;
 
        if ($hour > 23) {
            $hour = 0;
 
            $ts += 24 * 3600;
        }
    }
 
    $date = date('Y-m-d', $ts);
    $kms = rand();
}
 
$con->commit();

Comment by Elena Stepanova [ 2016-07-09 ]

Can you please try to create the table and insert there i.e. 10,000 values then try the EXPLAIN query on the DELETE query and see if the "rows" is ~5000?

Yes, I do see ~5000 in the rows field. However, I doubt it's related to the performance degradation, since apparently it is the same on all available versions of MariaDB and MySQL, including MySQL 5.0 – well, there is no EXPLAIN for DELETE in old versions, but it's all the same for SELECT:

MySQL [test]> EXPLAIN EXTENDED SELECT * FROM calculated_data WHERE `object_id` > 0 AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY);
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | calculated_data | range | PRIMARY       | PRIMARY | 4       | NULL | 5282 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set, 1 warning (0.00 sec)
 
MySQL [test]> SELECT COUNT(*) FROM calculated_data WHERE `object_id` > 0 AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY);
+----------+
| COUNT(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.01 sec)
 
MySQL [test]> SELECT @@version, @@version_comment;
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 5.0.95    | MySQL Community Server (GPL) |
+-----------+------------------------------+
1 row in set (0.00 sec)

Normally I wouldn't put much trust in InnoDB-related estimates, but I'll assign it to psergey to see if there is anything to improve here, and to look at the reported degradation in general.

Below is a plain SQL version of the alleged '5000 rows' problem above, for convenience. It is also suitable for MTR if you replace delimiter with --delimiter.

--source include/have_innodb.inc
 
DROP TABLE IF EXISTS calculated_data;
DROP PROCEDURE IF EXISTS fill_table;
 
CREATE TABLE `calculated_data` (  
  `object_id` int(10) unsigned NOT NULL,
  `d` date NOT NULL,
  `hour` tinyint(4) NOT NULL default '-1',
  `kms` double unsigned NOT NULL,
  PRIMARY KEY (`object_id`, `d`, `hour`)
)  ENGINE=InnoDB DEFAULT CHARSET=utf8;
 
delimiter ||
 
CREATE PROCEDURE fill_table (n INT)
BEGIN
    DECLARE obj INT DEFAULT 1;
    DECLARE h INT DEFAULT 0;
    DECLARE i INT DEFAULT 0;
    DECLARE ts INT DEFAULT 0;
    DECLARE dt DATE DEFAULT '2000-01-01';
    
    START TRANSACTION;
    WHILE i < n DO
        INSERT INTO calculated_data VALUES (obj, dt, h, ROUND(RAND()*10000000));
        SET obj = IF(obj=3, 1, obj+1);
        SET dt = IF(h=23,DATE_ADD(dt, INTERVAL 86400 SECOND),dt);
        SET h = IF(h=23, 0, h+1);
        SET i = i + 1;
    END WHILE;
    COMMIT;
END ||
 
delimiter ;
 
CALL fill_table(10000);
EXPLAIN EXTENDED 
    SELECT * FROM calculated_data WHERE `object_id` > 0 AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY);

Comment by Jizo Zef [ 2016-07-11 ]

Thank you!

P.S.
I just realized that there is a big difference between my real data and the test data, related to the date field. With the real data, when I execute the DELETE query daily, the `d` < DATE_SUB(NOW(), INTERVAL 365 DAY) condition matches about 1/366 of all records. With the test data it matches 100% of the records, because we started from 2000-01-01.

Comment by Jizo Zef [ 2016-07-11 ]

I'd like to add the slow query logs related to the issue from both the MySQL 5.0.95 and MariaDB 10.1:

MySQL 5.0.95

# Time: 160711  4:07:29
# User@Host: root[root] @ localhost []
# Query_time: 48  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SET timestamp=1468195649;
DELETE FROM `calculated_data` WHERE `object_id` > 0 AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY);

MariaDB 10.1

# Time: 160711  5:37:47
# User@Host: [SQL_SLAVE] @  []
# Thread_id: 3  Schema: xxxdb  QC_hit: No
# Query_time: 5466.650916  Lock_time: 0.000023  Rows_sent: 0  Rows_examined: 66560392
# Rows_affected: 119789
SET timestamp=1468201067;
DELETE FROM `calculated_data` WHERE `object_id` > 0 AND `d` < DATE_SUB(NOW(), INTERVAL 365 DAY);

Comment by Jizo Zef [ 2016-08-11 ]

Is there any news about this issue?

Comment by Ryan Blackledge [ 2023-08-17 ]

We're seeing this optimizer performance regression in MariaDB 10.6 as well, after migrating from MySQL 5.7.
DDL:

CREATE TABLE `metric_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `container_id` char(36) NOT NULL,
  `api_name` varchar(43) NOT NULL,
  `start_period` bigint(20) DEFAULT NULL,
  `value` decimal(33,15) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `metric_data_container_api_name` (`container_id`,`api_name`),
  CONSTRAINT `metric_data_ibfk_2` FOREIGN KEY (`container_id`) REFERENCES `container` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

Delete query:

delete from metric_data where (container_id, api_name) = ('f0a89a51-024f-41cf-9db4-5d3805956509', 'metric_abcd__m');

Explain Mysql:

id  select_type  table        partitions  type  possible_keys                   key                             key_len  ref          rows  filtered  Extra
1   SIMPLE       metric_data              ref   metric_data_container_api_name  metric_data_container_api_name  81       const,const  1     100.00

Explain Mariadb:

id  select_type  table        type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE       metric_data  ALL                                     5315  Using where

As you can see, this delete used to use the index (and a SELECT with the same syntax still uses the metric_data_container_api_name index); now the delete requires a full table scan. We're able to work around it by fooling the optimizer with subqueries, but it's not ideal.
Workaround:

delete md.*
 from metric_data md,
(select md2.id from metric_data md2 where (md2.container_id, md2.api_name) = ('f0a89a51-024f-41cf-9db4-5d3805956509', 'metric_abcd__m')) t
 where md.id = t.id;
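If the row-constructor comparison is what defeats the optimizer here, a simpler rewrite may also restore index use: expanding it into plain column equalities, which the optimizer can match directly against the index. An untested sketch:

```sql
-- Untested: the same predicate written as plain equalities, which
-- should be matchable against metric_data_container_api_name.
DELETE FROM metric_data
 WHERE container_id = 'f0a89a51-024f-41cf-9db4-5d3805956509'
   AND api_name = 'metric_abcd__m';
```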
