[MDEV-12928] PERFORMANCE issue with INSERT and out of order PKs Created: 2017-05-26  Updated: 2017-06-04

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: None

Type: Task Priority: Major
Reporter: Robert Dyas Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Microsoft Word BOM 2017 2.0.csv    

 Description   

If you have a medium size table, say 100,000 rows, and you issue a single INSERT statement with a VALUES clause containing 5,000 or so short rows (3 columns of bigint) and the primary keys are in ascending order it might take X time to insert. If the keys are out of order, it might take 10x+ the time to insert.

However, if you insert out of order keys using INSERT INTO t1 (x,y,z) SELECT a,b,c FROM t2 you can process 100,000 rows with out of order keys in <x time (more than 10x as fast).

It seems that when inserting with INSERT INTO t1 (x,y,z) SELECT a,b,c FROM t2 all the unique checks and index building is done once where as if you insert multiple rows in a single INSERT INTO t1 (x,y,z) VALUES (1,2,3), (7,8,9), (4,5,6) the rows are processed MUCH more slowly.



 Comments   
Comment by Sergei Golubchik [ 2017-05-26 ]

It's normal for B-trees to be much faster when keys are inserted in order, it's inherent property of this data structure.

Perhaps your INSERT ... SELECT inserts keys in order?

Comment by Robert Dyas [ 2017-05-26 ]

Yes, when the keys are in order there is not a problem with INSERT ... VALUES (1,2,3),(4,5,6) the problem is only when they are out of order.

However, the INSERT ... SELECT is inserting them OUT of order and is still very fast. This is why I thought it would be possible to optimize. Would it be legitimate for MariaDB to presort VALUES (1,2,3),(4,5,6) before inserting?

Our use case is interactive web import, so we don't control the order of the keys.

Comment by Marko Mäkelä [ 2017-05-30 ]

rdyas, could you construct and upload a script that demonstrates the difference between multi-row `INSERT…VALUES` and `INSERT…SELECT`?
I cannot think of anything that would explain such a difference in InnoDB.
Note: If there are multiple INSERT statements, you probably want to start the batch with BEGIN and end with COMMIT, so that InnoDB transaction commit overhead (innodb_flush_log_at_trx_commit=1) is paid only once.

Also, it could be advisable to try the following:

CREATE TABLE t …; # without any secondary indexes
BEGIN;
INSERT INTO t VALUES …;
COMMIT;
ALTER TABLE t ADD KEY (a), ADD KEY (b), …, ALGORITHM=INPLACE;

See also MDEV-515 (InnoDB bulk insert).

Comment by Robert Dyas [ 2017-05-30 ]

Actually, upon review, my previous statement is not exactly correct. Although the original rows used with INSERT...SELECT are out of order, there is an ORDER BY on that SELECT that puts them in order before INSERT gets a swipe at them.

Would it be valid to order the rows in a single INSERT...VALUES with multiple rows by the PK values before inserting? This would fix the performance issue, but I'm not sure that is valid. For example, if the row order is changed then triggers might fire in an unexpected way, correct?

Comment by Robert Dyas [ 2017-06-04 ]

Note that INSERT...VALUES with a few thousand rows in the values clause (say chunks of 6,000) will essentially error out (long stall at about 60,000 out of 130,000 records) but if you use INSERT...VALUES...ON DUPLICATE KEY... it runs just fine. This is on a table that has zero rows to start.

I've attached a sample CSV to import that demonstrates the issue.
BOM 2017 2.0.csv

Just create a simple table with 4 bigint columns with id being the PK.

The problem now appears to have something to do with reporting of duplicate rows when the VALUES clause is long. For example, purposely trying to INSERT...VALUES with duplicates produces a greater than 60 second wait to report back regarding even the first row being a duplicate.

So this is really a bug, not a performance issue, correct?

Generated at Thu Feb 08 08:01:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.