MariaDB Server / MDEV-25036

Use the bulk insert optimization for multiple INSERT statements

Details

    Description

      After MDEV-515, InnoDB can handle bulk inserts better, but the optimization still only applies to the first INSERT into an empty table. It works well for INSERT...SELECT and LOAD DATA; however, mysqldump-generated dumps contain a series of INSERT statements, and that usage is still poorly optimized.

      The problem is that in bulk insert mode InnoDB cannot roll back individual statements; all the work done in bulk insert mode is atomic as a whole.

      There should either be a way to express this in SQL, or a guarantee that no statement-level rollbacks will be needed.

      Another complication is that in bulk insert mode InnoDB locks the whole table rather than individual rows.

      A typical SQL dump looks like this:

      /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
      /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
      LOCK TABLES `t1` WRITE;
      /*!40000 ALTER TABLE `t1` DISABLE KEYS */;
      INSERT INTO `t1` VALUES (...), (...), ...
      /*!40000 ALTER TABLE `t1` ENABLE KEYS */;
      UNLOCK TABLES;
      /*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
      /*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
      

      It seems that with UNIQUE_CHECKS=0 and FOREIGN_KEY_CHECKS=0 there should be no rollbacks, although with CHECK constraints there still could be. And as the inserts are done under LOCK TABLES, InnoDB's fine-grained row-level locking is wasted here. So it seems possible to get MDEV-515 to work with no syntax extensions in the specific case of mysqldump output.
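      To illustrate, here is a minimal sketch of the mysqldump special case, assuming InnoDB were taught to treat consecutive INSERTs under these settings as one bulk load. This is the proposal, not current MariaDB behavior, and the table name is illustrative:

```sql
-- Hypothetical: not current MariaDB behavior. With both checks disabled
-- and the table write-locked, consecutive INSERTs into an empty table
-- could be treated as a single atomic bulk load.
SET UNIQUE_CHECKS=0, FOREIGN_KEY_CHECKS=0;
LOCK TABLES t1 WRITE;
INSERT INTO t1 VALUES (1,'a'),(2,'b');
INSERT INTO t1 VALUES (3,'c'),(4,'d');  -- continues the same bulk load
UNLOCK TABLES;                          -- the bulk load would end here
```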


      To make it work in the general case, we will need new syntax that means "this block of work can only be rolled back as a whole".

      A standard approach would be something like

      BEGIN ATOMIC;
      DECLARE UNDO HANDLER FOR SQLEXCEPTION RESIGNAL;
      

      which means pretty much exactly what we want to do. But the syntax is, perhaps, too awkward?
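      As a sketch of how the standard syntax might look if it were supported here (BEGIN ATOMIC ... END is standard SQL/PSM; MariaDB does not currently accept it in this role):

```sql
-- Hypothetical sketch of standard SQL/PSM syntax; not accepted by
-- MariaDB for this purpose today. Any exception would undo the whole
-- block and be re-raised, matching InnoDB's all-or-nothing bulk
-- insert semantics.
BEGIN ATOMIC
  DECLARE UNDO HANDLER FOR SQLEXCEPTION RESIGNAL;
  INSERT INTO t1 VALUES (1,'a'),(2,'b');
  INSERT INTO t1 VALUES (3,'c'),(4,'d');
END;
```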

      Non-standard suggestions, copied from MDEV-24818:

      START TRANSACTION ATOMIC;
      --
      START TRANSACTION WITH ATOMIC ROLLBACK;
      --
      START TRANSACTION FOR BULK LOAD;
      

      Activity

            The setting unique_checks=0 only has an effect on the InnoDB change buffer. If InnoDB decides to buffer an insert into a unique secondary index (with the default unique_checks=1 that can never even be attempted), then a unique key violation would go unnoticed until some later point in time when the buffered change is actually merged and the corruption of the index might be noticed, possibly leading to a crash of the server, or some funny symptoms like MDEV-9663. I think that for exactly this reason, unique_checks should have been deprecated and removed years ago, or at least ignored by InnoDB.

            My guess is that the motivation of the setting unique_checks=0 was to allow change buffering to speed up large data loads, in case the user really knows that no duplicate keys can occur. A much better speedup would be provided by MDEV-24621.

            That said, InnoDB could theoretically interpret the combination of unique_checks=0 and foreign_key_checks=0 to enable the bulk insert optimization. After all, the user swore that there cannot be any errors, hence we should be free to do what we like if those errors are present.

            Comment by Marko Mäkelä (marko)

            thiru pointed out that if MDEV-24621 is to support multiple INSERT statements per transaction (which is currently possible thanks to the MDEV-24818 hack of SET unique_checks=0, foreign_key_checks=0), we would have to report duplicate key errors after the INSERT statement has already been executed, for example, during a subsequent non-INSERT statement. Because reporting a duplicate key error for a previous statement could be extremely confusing, I think it would be easiest to allow the multi-statement INSERT operation of MDEV-24621 only when the transaction has been started in a special mode.
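            An illustration of the confusion this deferred check would cause (hypothetical behavior; exactly which later statement would surface the error is unspecified):

```sql
-- Hypothetical: duplicate key checking deferred until after the INSERT.
SET unique_checks=0, foreign_key_checks=0;
BEGIN;
INSERT INTO t VALUES (1),(2);
INSERT INTO t VALUES (2),(3);  -- duplicate silently accepted for now
SELECT COUNT(*) FROM t;        -- a duplicate-key error could be raised
                               -- here, long after the offending INSERT
```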

            Comment by Marko Mäkelä (marko)

            When this is being implemented, please also improve the cryptic error message currently seen when executing:

            SET autocommit=0,foreign_key_checks=0,unique_checks=0;
            CREATE TABLE t (c1 INT KEY,c2 INT UNIQUE) ENGINE=InnoDB;
            BEGIN;
            INSERT INTO t VALUES (1,0),(2,0);
            

            Leading to:

            10.8.0 ccb345e2a3616590ea741830ded1b7f645639de0 (Optimized)

            10.8.0-opt>SET autocommit=0,foreign_key_checks=0,unique_checks=0;
            Query OK, 0 rows affected (0.000 sec)
             
            10.8.0-opt>CREATE TABLE t (c1 INT KEY,c2 INT UNIQUE) ENGINE=InnoDB;
            Query OK, 0 rows affected (0.018 sec)
             
            10.8.0-opt>BEGIN;
            Query OK, 0 rows affected (0.000 sec)
             
            10.8.0-opt>INSERT INTO t VALUES (1,0),(2,0);
            ERROR 1180 (HY000): Got error 1 "Operation not permitted" during COMMIT
            

            History: see MDEV-26947

            Comment by Roel Van de Paar (Roel)

            We want something that is trivial to use and highly compatible with old applications.
            There is no benefit for the user (or the server) in setting unique_checks=0.

            • Users do not want to disable unique checks.
            • There is almost zero cost to checking for duplicate keys as part of creating a unique index.
              (MyISAM has done that for 40 years...)

            I suggest the following 'easy solution' that would allow new users and old applications to benefit
            from the new optimization:

            • Add an option to START TRANSACTION, such as ATOMIC, that enables the optimization for empty tables.
            • Add a session variable 'innodb-optimize-insert-into-empty-table' (or similar) which enables the optimization of inserts into empty tables.
            • The optimization should happen if either of the two options is used.

            I would also like to suggest that the first auto-commit insert into a table should also use this optimization.
            This would make CREATE TABLE; LOAD DATA INFILE fast!

            I would expect that most users would use innodb-optimize-insert-into-empty-table=1 because of its great benefits, in spite of
            the problem outlined in MDEV-24818 (Concurrent use of an InnoDB table is impossible until the first transaction is finished).
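            The two proposed interfaces could be used like this (names are the suggestions from this comment; neither exists in MariaDB at the time of writing):

```sql
-- Sketch of the proposals above; the ATOMIC option and the variable
-- are hypothetical, not existing syntax.
SET SESSION innodb_optimize_insert_into_empty_table=1;  -- proposed variable
-- or, per transaction:
START TRANSACTION ATOMIC;                               -- proposed option
INSERT INTO t1 VALUES (1,'a');
INSERT INTO t1 VALUES (2,'b');  -- still part of the same bulk load
COMMIT;
```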

            Comment by Michael Widenius (monty)

            People

              Assignee: Unassigned
              Reporter: Sergei Golubchik (serg)