[MDEV-12928] PERFORMANCE issue with INSERT and out of order PKs Created: 2017-05-26 Updated: 2017-06-04 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Robert Dyas | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
If you have a medium size table, say 100,000 rows, and you issue a single INSERT statement with a VALUES clause containing 5,000 or so short rows (3 columns of bigint) and the primary keys are in ascending order it might take X time to insert. If the keys are out of order, it might take 10x+ the time to insert. However, if you insert out of order keys using INSERT INTO t1 (x,y,z) SELECT a,b,c FROM t2 you can process 100,000 rows with out of order keys in <x time (more than 10x as fast). It seems that when inserting with INSERT INTO t1 (x,y,z) SELECT a,b,c FROM t2 all the unique checks and index building is done once where as if you insert multiple rows in a single INSERT INTO t1 (x,y,z) VALUES (1,2,3), (7,8,9), (4,5,6) the rows are processed MUCH more slowly. |
| Comments |
| Comment by Sergei Golubchik [ 2017-05-26 ] | ||||||
|
It's normal for B-trees to be much faster when keys are inserted in order, it's inherent property of this data structure. Perhaps your INSERT ... SELECT inserts keys in order? | ||||||
| Comment by Robert Dyas [ 2017-05-26 ] | ||||||
|
Yes, when the keys are in order there is not a problem with INSERT ... VALUES (1,2,3),(4,5,6) the problem is only when they are out of order. However, the INSERT ... SELECT is inserting them OUT of order and is still very fast. This is why I thought it would be possible to optimize. Would it be legitimate for MariaDB to presort VALUES (1,2,3),(4,5,6) before inserting? Our use case is interactive web import, so we don't control the order of the keys. | ||||||
| Comment by Marko Mäkelä [ 2017-05-30 ] | ||||||
|
rdyas, could you construct and upload a script that demonstrates the difference between multi-row `INSERT…VALUES` and `INSERT…SELECT`? Also, it could be advisable to try the following:
See also | ||||||
| Comment by Robert Dyas [ 2017-05-30 ] | ||||||
|
Actually, upon review, my previous statement is not exactly correct. Although the original rows used with INSERT...SELECT are out of order, there is an ORDER BY on that SELECT that puts them in order before INSERT gets a swipe at them. Would it be valid to order the rows in a single INSERT...VALUES with multiple rows by the PK values before inserting? This would fix the performance issue, but I'm not sure that is valid. For example, if the row order is changed then triggers might fire in an unexpected way, correct? | ||||||
| Comment by Robert Dyas [ 2017-06-04 ] | ||||||
|
Note that INSERT...VALUES with a few thousand rows in the values clause (say chunks of 6,000) will essentially error out (long stall at about 60,000 out of 130,000 records) but if you use INSERT...VALUES...ON DUPLICATE KEY... it runs just fine. This is on a table that has zero rows to start. I've attached a sample CSV to import that demonstrates the issue. Just create a simple table with 4 bigint columns with id being the PK. The problem now appears to have something to do with reporting of duplicate rows when the VALUES clause is long. For example, purposely trying to INSERT...VALUES with duplicates produces a greater than 60 second wait to report back regarding even the first row being a duplicate. So this is really a bug, not a performance issue, correct? |