
MDEV-24621: In bulk insert, pre-sort and build indexes one page at a time

Details

    Description

      In MDEV-515, the original intention was to do two things to speed up INSERT into an empty table or partition:

      1. Stop writing undo log records for each inserted record, and instead write a single record saying that the table was empty.
      2. When the table was empty, pre-sort the records for each index and build the indexes one page at a time, instead of the current approach of inserting one unsorted record at a time.

      The undo logging change turned out to be rather challenging on its own, because it affects MVCC and locking as well. Preventing the useless undo logging should speed up ROLLBACK and recovery, as well as the purge of history, compensating for MDEV-12288. Hence, it made sense to complete MDEV-515 in that limited form.
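
      As a rough illustration of the undo logging point, here is a minimal sketch (the table names t and src are hypothetical, and whether the bulk path is actually taken depends on the conditions listed further below):

      START TRANSACTION;
      -- t is assumed to be empty at this point; src is some populated source table.
      SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
        INSERT INTO t SELECT * FROM src;
      -- With a single "table was empty" undo record, this ROLLBACK can simply
      -- re-empty t instead of undoing each inserted row one by one.
      ROLLBACK;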

      The purpose of this task is to make INSERT into an empty table or partition even more efficient by making use of sorting. We expect data loads into huge tables with many secondary indexes to become faster. The change buffer (MDEV-11634) should no longer come into play in this case. (Huge loads into nonempty tables will be unaffected by this enhancement.)

      Notes added after implementation:
      The following queries will use the new optimization:

      • CREATE ... SELECT
      • Inserting into an empty table together with SET STATEMENT unique_checks=0,foreign_key_checks=0 FOR ...
        • INSERT statement
        • INSERT ... SELECT
        • LOAD DATA INFILE

      An INSERT into an empty table without unique_checks=0 and foreign_key_checks=0 will NOT be optimized; see the examples below.
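
      As a sketch of the optimized and non-optimized cases (the table names t1 through t5 and the file name data.csv are hypothetical, and each target table is assumed to be empty before its statement runs):

      -- Optimized: CREATE ... SELECT into a new (and therefore empty) table.
      CREATE TABLE t2 ENGINE=InnoDB SELECT * FROM t1;

      -- Optimized: INSERT into an empty table with both checks disabled for the statement.
      SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
        INSERT INTO t3 SELECT * FROM t1;

      -- Optimized: LOAD DATA INFILE into an empty table with both checks disabled.
      SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
        LOAD DATA INFILE 'data.csv' INTO TABLE t4;

      -- NOT optimized: a plain INSERT into an empty table, without the SET STATEMENT.
      INSERT INTO t5 SELECT * FROM t1;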

          Activity

            marko Marko Mäkelä added a comment:

            I must say that it is encouraging to see such time savings for a simple benchmark that did not even include secondary indexes. With secondary indexes, I would expect the performance difference to be even bigger, between row-by-row inserts into each index and the merge sort and bulk insert of one index at a time.
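
            (For context, a hypothetical benchmark schema of the kind discussed above: a table with secondary indexes that is loaded while empty, so the sorted index-at-a-time build has more work to save. Names and data volume are made up; seq_1_to_1000000 comes from the sequence engine used elsewhere in this issue.)

            CREATE TABLE bench (
              id INT NOT NULL PRIMARY KEY,
              a INT NOT NULL,
              b CHAR(100),
              KEY idx_a (a),
              KEY idx_b (b)
            ) ENGINE=InnoDB;

            -- Bulk load into the empty table; with this optimization, each index
            -- can be built from sorted input rather than by row-by-row insertion.
            SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
              INSERT INTO bench SELECT seq, seq % 1000, REPEAT('x', 100) FROM seq_1_to_1000000;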

            thiru Thirunarayanan Balathandayuthapani added a comment:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            create table t1(f1 int not null primary key, b char(255) CHARACTER SET utf8) engine=innodb;
            INSERT INTO t1(f1) SELECT * FROM seq_1_to_1000000;
            --source include/restart_mysqld.inc
            # Here t1 size is 32 MB
            alter table t1 force, algorithm=inplace;
            --source include/restart_mysqld.inc
            # Here t1 size is 36 MB
            drop table t1;

            The inplace ALTER TABLE code increases the file size of the table by 4 MB. The bulk insert code uses the same code path as the inplace ALTER code, so this file size increase should affect 10.2 onwards.
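
            (One way to observe the file sizes mentioned above is to check the size of the t1.ibd file in the datadir. The query below is a hedged alternative; it assumes the FILE_SIZE and ALLOCATED_SIZE columns of INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES are available on the version being tested.)

            -- Assumption: FILE_SIZE/ALLOCATED_SIZE exist in this view on the tested version.
            SELECT NAME, FILE_SIZE, ALLOCATED_SIZE
            FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES
            WHERE NAME = 'test/t1';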

            marko Marko Mäkelä added a comment:

            thiru, I got some data from axel regarding a performance regression, but it did not have good stack traces. I reread the code changes, and I suspect that the performance regression is due to the SQL layer change that serg also asked about. Could we replace innodb_bulk_insert_write() and its caller with a small addition to ha_innobase::reset()?

            marko Marko Mäkelä added a comment:

            This is still causing some performance regression for a normal DML workload in cases where bulk loading is not used at all.

            marko Marko Mäkelä added a comment:

            Performance seems to be acceptable now.

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes: 0
              Watchers: 12

