[MDEV-515] innodb bulk insert - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.6.0
Component/s: Storage Engine - InnoDB
Labels:
- performance

Description

The original description of this task read as follows:

as a solution for bulk-inserting huge amount of data into innodb, we consider an utility that creates exported innodb tablespaces.

Normally one needs to create a table in InnoDB or XtraDB and then "export" it. But we could try to create an exported tablespace with a table not by actually exporting it but directly from the data, say, from XML or CSV.

In the end, we decided to optimize the INSERT code path inside InnoDB. There are many different InnoDB ROW_FORMAT, and we do not have instant import until MDEV-11658 gets some thought. Besides, if we make INSERT faster, also some normal workloads will benefit.

We implement an idea that was suggested by monty in October 2017: When InnoDB is inserting into an empty table or partition, we can write a single undo log record TRX_UNDO_EMPTY to cover the operation. ROLLBACK would clear the table when undoing this record.

For this to work, the insert into an empty table or partition must be covered by an exclusive table lock that will be held until the transaction has been committed or rolled back, or until a partial rollback that affects the INSERT operation (so that the table is empty again).

Concurrent non-locking reads must be adjusted: If the read view was created before the INSERT into an empty table was committed, then we must continue to imagine that the table is empty, and not try to read any records. If the read view was created after the COMMIT of the INSERT, then all records must be visible normally.

This special handling only applies to the very first INSERT statement of a transaction for the empty table or partition. If a subsequent statement in the transaction is modifying the initially empty table again, we must enable row-level undo logging, so that we will be able to roll back to the start of the statement in case of an error (such as duplicate key).

INSERT IGNORE will continue to use row-level logging and locking, because implementing it would require the ability to roll back the latest row, while our table-level logging only allows the entire statement to be rolled back.

Future work:

~~MDEV-24621~~ will make such INSERT even faster, by sorting the records for each index separately and by building indexes one page at a time.
~~MDEV-24622~~ will enable this feature on replicas (slaves).
~~MDEV-24623~~ will enable this feature on Galera cluster.

Attachments

Issue Links

blocks

MDEV-23484 Rollback unnecessarily acquires dict_operation_lock for every row

Closed

MDEV-24621 In bulk insert, pre-sort and build indexes one page at a time

Closed

causes

MDEV-24622 Replication does not support bulk insert into empty table

Closed

MDEV-24623 Galera does not support MDEV-515 bulk insert

Closed

MDEV-24700 Assertion `"lock not found" == 0' failed in lock_table_x_unlock

Closed

MDEV-24715 Assertion `!node->table->skip_alter_undo' failed in row_upd_clust_rec on CREATE

Closed

MDEV-24781 Assertion `mode == 16 || mode == 12 || fix_block->page.status != buf_page_t::FREED' failed in buf_page_get_low

Closed

MDEV-24818 Concurrent use of InnoDB table is impossible until the first transaction is finished

Closed

MDEV-24832 Root page AHI Removal fails fails during bulk index rollback

Closed

MDEV-24905 Assertion `!m.first->second.is_bulk_insert()' failed in trx_undo_report_row_operation

Closed

MDEV-24993 pymysql - executemany test failing - lock timeout - insert wait on non-indexed table

Closed

MDEV-25010 Assertion `!lock_sys_t::get_first(receiver_cell, receiver_id, receiver_heap_no)' failed in lock_rec_move

Closed

MDEV-25297 InnoDB: Failing assertion: trx->roll_limit <= trx->undo_no in trx_rollback_start

Closed

MDEV-25315 Assertion `ptr' failed in ut_align_down | SIGSEGV in dict_table_t::not_redundant | Assertion `!lock_sys_t::get_first(receiver_cell, receiver_id, receiver_heap_no)' failed in lock_rec_move

Closed

MDEV-25401 Assertion `!is_set() || (m_status == DA_OK_BULK && is_bulk_op())' failed in Diagnostics_area::set_ok_status on SAVEPOINT

Closed

MDEV-25487 Assertion `!lock_sys_t::get_first(receiver_cell, receiver_id, receiver_heap_no)' failed in lock_rec_move

Closed

MDEV-25496 Assertion `trx->bulk_insert' failed in trx_undo_report_row_operation on INSERT

Closed

MDEV-25534 Assertion `lock_table_has(trx, index->table, LOCK_IX)' failed in lock_rec_insert_check_and_lock on INSERT

Closed

MDEV-26121 [Note] InnoDB: Resetting invalid page [page id: space=5, page number=3] type 17855 to 6.

Closed

MDEV-26623 Possible race condition between statistics and bulk insert

Closed

MDEV-28037 Assertion `trx->bulk_insert' failed in innodb_prepare_commit_versioned

Closed

MDEV-28122 OPTIMIZE TABLE crash

Closed

MDEV-31835 LOAD DATA into InnoDB w/partitions: huge performance loss, affected 10.6+

Closed

MDEV-31861 Empty INSERT crashes with innodb_force_recovery=6 or innodb_read_only=ON

Closed

duplicates

MDEV-5835 ibulkload

Closed

includes

MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY

Closed

relates to

MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC

Closed

MDEV-16329 Engine-independent online ALTER TABLE

Closed

MDEV-20804 Use the sequence engine in main.index_merge_innodb and main.index_merge_myisam

Closed

MDEV-23805 Make Online DDL to Instant DDL when table is empty

Closed

MDEV-24859 Assertion `static_cast<ins_node_t*>(thr->run_node)->bulk_insert' failed in trx_undo_report_row_operation

Closed

MDEV-25036 use bulk insert optimization for multiple insert statements

Open

MDEV-28327 InnoDB persistent statistics fail to update after bulk insert

Closed

MDEV-29545 InnoDB replace statement returns "can't find record" error during bulk insert operation.

Closed

MDEV-30063 InnoDB: Failing assertion: ib_table->stat_initialized in ha_innobase::info_low()

Closed

MDEV-32648 InnoDB: Failing assertion: old_rec in storage/innobase/btr/btr0pcur.cc line 366

Open

MDEV-4299 Portable Tablespace

Closed

MDEV-5171 Add support for --innodb-optimize-keys to mysqldump.

Closed

MDEV-9459 Truncate table causes innodb stalls

Closed

MDEV-11634 Improve the InnoDB change buffer

Closed

MDEV-14094 benchmark effects of innodb change buffer

Closed

MDEV-14602 Reduce malloc()/free() usage in InnoDB

Confirmed

MDEV-23017 range query performance regression in 10.5.4

Closed

MDEV-24716 Assertion `thr->graph->trx->id == trx_read_trx_id( static_cast<const byte*>( trx_id->data)) || static_cast<ins_node_t*>( thr->run_node)->bulk_insert' failed in btr_cur_optimistic_insert

Closed

MDEV-25064 rpl.rpl_parallel_temptable failed in bb, ASAN heap-use-after-free in std::__atomic_base<long>::store

Closed

MDEV-27214 Import with disabled keys corrupts meta-data like rows, indexes, ...

Closed

MDEV-29975 Assertion `m_status == DA_ERROR' failed in Diagnostics_area::sql_errno on SAVEPOINT, Got error 1, Got error 153, Assertion `!is_set() || (m_status == DA_OK_BULK && is_bulk_op())' failed.

Closed

MDEV-32250 To benefit from MDEV-515 , please make option --no-autocommit default in mysqldump from 10.6 onwards

Closed

MDEV-34719 Disable purge for LOAD DATA INFILE into empty table

Closed

mentioned in: Page Loading...

(19 causes, 1 duplicates, 1 includes, 23 relates to, 1 mentioned in)

Activity

Ascending order - Click to sort in descending order

View 40 older comments

Vladislav Vaintroub added a comment - 2021-01-26 20:57

Note, that size of single INSERT statement is limited by the max_allowed_packet, which is 16MB by default.

Vladislav Vaintroub added a comment - 2021-01-26 20:57 Note, that size of single INSERT statement is limited by the max_allowed_packet, which is 16MB by default.

Noel Diaz added a comment - 2021-03-03 20:23 - edited

If you keep the title, this super downgraded scope becomes highly misleading. Please update the title accordingly there's not a thing 'bulk' about this.

Noel Diaz added a comment - 2021-03-03 20:23 - edited If you keep the title, this super downgraded scope becomes highly misleading. Please update the title accordingly there's not a thing 'bulk' about this.

Marko Mäkelä added a comment - 2021-03-05 15:58

In ~~MDEV-24818~~, my plan is to implement an opt-in mechanism for this, to allow faster processing of mysqldump output:

SET foreign_key_checks=0, unique_checks=0;

A nicer mechanism will be provided in MDEV-25036.

When we require the user to explicitly enable the table-level locking and undo logging, we can cover multiple INSERT into an initially empty table.

In my opinion, the fact that we disable row-level undo logging and thus reduce the purge of history does make this a ‘bulk’ operation, even though the originally planned scope was reduced by filing ~~MDEV-24621~~ for the final part.

Marko Mäkelä added a comment - 2021-03-05 15:58 In MDEV-24818 , my plan is to implement an opt-in mechanism for this, to allow faster processing of mysqldump output: SET foreign_key_checks=0, unique_checks=0; A nicer mechanism will be provided in MDEV-25036 . When we require the user to explicitly enable the table-level locking and undo logging, we can cover multiple INSERT into an initially empty table. In my opinion, the fact that we disable row-level undo logging and thus reduce the purge of history does make this a ‘bulk’ operation, even though the originally planned scope was reduced by filing MDEV-24621 for the final part.

Noel Diaz added a comment - 2021-03-05 18:11

Although I appreciate the intent. The original description talked about "generating" transportable tablespaces and that is a huge leap comparatively.
Imagine having to import 4TB (as we do today) it would have brought import speeds to another level.

Noel Diaz added a comment - 2021-03-05 18:11 Although I appreciate the intent. The original description talked about "generating" transportable tablespaces and that is a huge leap comparatively. Imagine having to import 4TB (as we do today) it would have brought import speeds to another level.

Marko Mäkelä added a comment - 2021-05-26 16:32

noel.diaz, once MDEV-11658 changes the .ibd file format so that the combination of .frm and .ibd files will be self-descriptive and allows the files to be ‘attached’ instantly, we could implement the originally envisioned tool. Changing the file format requires significant engineering effort.

Marko Mäkelä added a comment - 2021-05-26 16:32 noel.diaz , once MDEV-11658 changes the .ibd file format so that the combination of .frm and .ibd files will be self-descriptive and allows the files to be ‘attached’ instantly, we could implement the originally envisioned tool. Changing the file format requires significant engineering effort.

People

Assignee:: Marko Mäkelä

Reporter:: Sergei Golubchik

Votes:: 10 Vote for this issue

Watchers:: 22 Start watching this issue

Dates

Created:: 2012-09-07 18:44

Updated:: 2024-08-07 12:08

Resolved:: 2021-01-25 16:44

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server