MariaDB Server
MDEV-24621

In bulk insert, pre-sort and build indexes one page at a time

Details

    Description

      In MDEV-515, the original intention was to do two things to speed up INSERT into an empty table or partition:

      1. Stop writing undo log records for each inserted record, and instead write just one record saying that the table was empty.
      2. When the table was empty, pre-sort the records for each index, and build the indexes one page at a time, instead of inserting one unsorted record at a time as is done currently.

      The undo logging change turned out to be rather challenging on its own, because it affects MVCC and locking as well. Preventing the useless undo logging should speed up ROLLBACK and recovery, as well as the purge of history, compensating for MDEV-12288. Hence, it makes sense to complete MDEV-515 in this limited form.

      The purpose of this task is to make INSERT into an empty table or partition even more efficient by making use of sorting. We expect data loads into huge tables with many secondary indexes to become faster. The change buffer (MDEV-11634) should no longer come into play in this case. (Huge loads into nonempty tables will be unaffected by this enhancement.)

      Post-implementation notes:
      The following queries will use the new optimization (see the sketch below):

      • CREATE ... SELECT
      • Inserting into an empty table together with SET STATEMENT unique_checks=0,foreign_key_checks=0 FOR ...
        • INSERT statement
        • INSERT ... SELECT
        • LOAD DATA INFILE

      Insert into an empty table without unique_checks=0,foreign_key_checks=0 will NOT be optimized.
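
      For illustration, here is a minimal sketch of qualifying and non-qualifying statements (the table and the row count are made up for this example; the seq_1_to_N tables assume the SEQUENCE engine):

      CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;

      -- Optimized: INSERT into an empty table, with both checks disabled for the statement
      SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
      INSERT INTO t1 SELECT seq, seq FROM seq_1_to_1000000;

      -- Optimized: CREATE ... SELECT
      CREATE TABLE t2 ENGINE=InnoDB SELECT * FROM t1;

      -- NOT optimized: unique_checks and foreign_key_checks remain enabled
      CREATE TABLE t3 LIKE t1;
      INSERT INTO t3 SELECT * FROM t1;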

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            ralf.gebhardt Ralf Gebhardt made changes -
            Priority: Major [ 3 ] -> Critical [ 2 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status: Open [ 1 ] -> In Progress [ 3 ]

            Before MDEV-25036 has been implemented, I think that we must disable this optimization for subsequent INSERT statements into an initially empty table. That is, for a multi-INSERT transaction into an initially empty table, we will retain the table-level undo logging as per MDEV-515, but we will apply this optimization only to the first INSERT: at the end of the first statement that inserts into an empty table, we will complete the sort and report any duplicate key errors (see the sketch below).

            After MDEV-25036, we could improve things in subsequent task(s) as follows:

            1. We would keep buffering rows from potentially multiple INSERT statements until we receive a statement that forces us to apply the changes.
              • Examples would include XA PREPARE, COMMIT, and anything that reads from the table (including UPDATE or DELETE).
              • Duplicate key errors or other constraint violations would be reported at this point only.
            2. If the special START TRANSACTION statement indicates so, we may assume that the data is already sorted by PRIMARY KEY and skip the sorting altogether. If we find that the order is violated, we would report an error and roll back the transaction.
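
            A minimal sketch of this interim behaviour (the table and the row counts are illustrative):

            SET unique_checks=0, foreign_key_checks=0;
            CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
            BEGIN;
            -- First statement to insert into the empty table: rows are buffered and
            -- sorted, and any duplicate key error is reported when the statement ends.
            INSERT INTO t SELECT seq FROM seq_1_to_1000;
            -- Subsequent statements in the same transaction fall back to ordinary
            -- row-by-row inserts; only the table-level undo logging of MDEV-515 remains.
            INSERT INTO t SELECT seq FROM seq_1001_to_2000;
            COMMIT;
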
            marko Marko Mäkelä added a comment -

            Patch is in bb-10.7-MDEV-24621

            thiru Thirunarayanan Balathandayuthapani added a comment - edited
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee: Thirunarayanan Balathandayuthapani [ thiru ] -> Marko Mäkelä [ marko ]
            Status: In Progress [ 3 ] -> In Review [ 10002 ]

            I posted the first batch of review comments.

            I think that the interfaces are roughly correct, but we must be careful about overhead during DML operations. Error handling needs to be improved too.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee: Marko Mäkelä [ marko ] -> Thirunarayanan Balathandayuthapani [ thiru ]
            Status: In Review [ 10002 ] -> Stalled [ 10000 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status: Stalled [ 10000 ] -> In Progress [ 3 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee: Thirunarayanan Balathandayuthapani [ thiru ] -> Marko Mäkelä [ marko ]
            Status: In Progress [ 3 ] -> In Review [ 10002 ]

            Does it really have to add a new handlerton method and extend the server's commit code? It seems that InnoDB could do this internally at commit time.

            serg Sergei Golubchik added a comment -

            serg, until MDEV-25036 has been implemented, we must finalize bulk insert at the end of each statement that was the first one to insert into an empty table.

            Even after MDEV-25036 is implemented, the finalization must take place before either handlerton::prepare_commit_versioned or handlerton::prepare are invoked. It would be cleaner to have a "pre-prepare" step. That could be handy not only for handlerton::prepare_commit_versioned calls, but also for a new one that we might need in MDEV-24608.
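
            To illustrate with SQL (a sketch only; the finalization itself is internal to InnoDB):

            SET unique_checks=0, foreign_key_checks=0;
            CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
            XA START 'x1';
            INSERT INTO t SELECT seq FROM seq_1_to_1000;
            XA END 'x1';
            -- The buffered bulk-insert rows must be applied (and any constraint
            -- violations reported) before this statement reaches handlerton::prepare.
            XA PREPARE 'x1';
            XA COMMIT 'x1';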

            marko Marko Mäkelä added a comment -
            serg Sergei Golubchik made changes -
            Priority: Critical [ 2 ] -> Major [ 3 ]

            An incomplete workaround for MDEV-26623 was included in this branch. I plan to merge the MDEV-26623 fix and revert the workaround as soon as possible.

            marko Marko Mäkelä added a comment -

            The MDEV-26623 fix has now been merged to the preview branch.

            marko Marko Mäkelä added a comment -

            I wanted to perform a small performance test, similar to what I conducted for MDEV-515 earlier. Unfortunately, this test will crash the preview branch if the first SET statement is present:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            --source include/not_debug.inc
             
            SET unique_checks=0, foreign_key_checks=0;
             
            SET @freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
             
            CREATE TABLE t (a INT PRIMARY KEY, b CHAR(255) CHARACTER SET utf8)
            ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
             
            let $n=1000000;
             
            --source include/wait_all_purged.inc
            SET profiling_history_size=100;
            SET profiling = 1;
            SELECT variable_value INTO @old_lsn FROM INFORMATION_SCHEMA.GLOBAL_STATUS
            WHERE variable_name = 'INNODB_LSN_CURRENT';
             
            BEGIN;
            eval INSERT INTO t(a) SELECT * FROM seq_1_to_$n;
            SELECT (@lsn:= variable_value)-@old_lsn
            FROM INFORMATION_SCHEMA.GLOBAL_STATUS
            WHERE variable_name = 'INNODB_LSN_CURRENT';
            ROLLBACK;
            SET @old_lsn=@lsn;
            SELECT (@lsn:= variable_value)-@old_lsn
            FROM INFORMATION_SCHEMA.GLOBAL_STATUS
            WHERE variable_name = 'INNODB_LSN_CURRENT';
             
            SELECT * FROM t;
             
            eval INSERT INTO t(a) SELECT * FROM seq_1_to_$n;
            --source include/wait_all_purged.inc
            SET @old_lsn=@lsn;
            SELECT (@lsn:= variable_value)-@old_lsn
            FROM INFORMATION_SCHEMA.GLOBAL_STATUS
            WHERE variable_name = 'INNODB_LSN_CURRENT';
             
            TRUNCATE TABLE t;
            --source include/wait_all_purged.inc
            SET @old_lsn=@lsn;
            SELECT (@lsn:= variable_value)-@old_lsn
            FROM INFORMATION_SCHEMA.GLOBAL_STATUS
            WHERE variable_name = 'INNODB_LSN_CURRENT';
             
            BEGIN;
            eval INSERT INTO t SET a=$n;
            --error ER_DUP_ENTRY
            eval INSERT INTO t(a) SELECT * FROM seq_1_to_$n;
            SELECT * FROM t;
            ROLLBACK;
            SELECT * FROM t;
            SET @old_lsn=@lsn;
            SELECT (@lsn:= variable_value)-@old_lsn
            FROM INFORMATION_SCHEMA.GLOBAL_STATUS
            WHERE variable_name = 'INNODB_LSN_CURRENT';
             
            SHOW PROFILES;
            SET profiling = 0;
            DROP TABLE t;
            SET GLOBAL innodb_purge_rseg_truncate_frequency=@freq;
            

            Without that first SET statement, we will be using a normal row-logged INSERT. For me, that consumed roughly the same amount of time and produced the same amount of redo log as the 10.7 base revision that was last merged into the branch.

            preview-10.7-MDEV-24621-innodb-bulk-insert e62107a1adbea2af5b514bcc43a1c2e54da687fa

            #2  0x0000555555b9d26d in ib::fatal::~fatal (this=<optimized out>, 
                this@entry=0x7ffff04d0360, __in_chrg=<optimized out>)
                at /mariadb/10.7/storage/innobase/ut/ut0ut.cc:519
            #3  0x0000555555b8fc66 in row_mysql_handle_errors (new_err=0x7ffff04d057c, 
                trx=0x7ffff1785130, thr=<optimized out>, savept=0x7ffff04d0580)
                at /mariadb/10.7/storage/innobase/row/row0mysql.cc:727
            #4  0x00005555562191a5 in row_insert_for_mysql (mysql_rec=<optimized out>, 
                prebuilt=0x7fff8c0286a0, ins_mode=ROW_INS_NORMAL)
                at /mariadb/10.7/storage/innobase/row/row0mysql.cc:1329
            …
            (gdb) f 3
            #3  0x0000555555b8fc66 in row_mysql_handle_errors (new_err=0x7ffff04d057c, 
                trx=0x7ffff1785130, thr=<optimized out>, savept=0x7ffff04d0580)
                at /mariadb/10.7/storage/innobase/row/row0mysql.cc:727
            727			ib::fatal() << "Unknown error " << err;
            (gdb) p err
            $1 = DB_IO_ERROR
            

            I repeated the same with rr, and I determined that the DB_IO_ERROR was set here:

            preview-10.7-MDEV-24621-innodb-bulk-insert e62107a1adbea2af5b514bcc43a1c2e54da687fa

            #0  0x0000563fadc90434 in os_file_io (in_type=<optimized out>, 
                file=<optimized out>, buf=<optimized out>, n=<optimized out>, offset=0, 
                err=0x7f723c9a4074) at /mariadb/10.7/storage/innobase/os/os0file.cc:2896
            #1  0x0000563fadc905c5 in os_file_pwrite (err=0x7f723c9a4074, 
                offset=<optimized out>, n=<optimized out>, buf=0x0, file=52, type=
                  @0x563fae06eb10: {bpage = 0x0, node = 0x0, type = IORequest::WRITE_SYNC})
                at /mariadb/10.7/storage/innobase/os/os0file.cc:2932
            #2  os_file_write_func (type=
                  @0x563fae06eb10: {bpage = 0x0, node = 0x0, type = IORequest::WRITE_SYNC}, name=<optimized out>, name@entry=0x563fae06e154 "(bulk insert)", 
                file=file@entry=52, buf=buf@entry=0x0, offset=<optimized out>, 
                offset@entry=0, n=<optimized out>, n@entry=4294967295)
                at /mariadb/10.7/storage/innobase/os/os0file.cc:2963
            #3  0x0000563fadcc9827 in pfs_os_file_write_func (type=
                  @0x563fae06eb10: {bpage = 0x0, node = 0x0, type = IORequest::WRITE_SYNC}, src_file=0x563fae06dbf0 "/mariadb/10.7/storage/innobase/row/row0merge.cc", 
                src_line=1081, n=4294967295, offset=0, buf=<optimized out>, 
                file=<optimized out>, name=0x563fae06e154 "(bulk insert)")
                at /mariadb/10.7/storage/innobase/include/os0file.ic:317
            #4  row_merge_buf_blob (blob_file=0x7f721c939b38, heap=<synthetic pointer>, 
                n_fields=4, entry=0x7f721c939c18)
                at /mariadb/10.7/storage/innobase/row/row0merge.cc:1081
            #5  row_merge_buf_write (buf=0x7f721c939b78, of=<optimized out>, 
                5c000 "", blob_file=0x7f721c939b38) at /mariadb/10.7/storage/innobase/row/row0merge.cc:1126
            #6  0x0000563fadcc9d5e in row_merge_bulk_t::write_to_tmp_file (index_no=0, this=0x7f721c939af0) at /mariadb/10.7/storage/innobase/row/row0merge.cc:5075
            #7  row_merge_bulk_t::write_to_tmp_file (this=0x7f721c939af0, index_no=0) at /mariadb/10.7/storage/innobase/row/row0merge.cc:5066
            #8  0x0000563fadcca154 in row_merge_bulk_t::bulk_insert_buffered (this=0x7f721c939af0, row=row@entry=0x7f721c026b40, ind=<optimized out>, trx=<optimized out>)
                at /mariadb/10.7/storage/innobase/row/row0merge.cc:5123
            #9  0x0000563fadcc4975 in trx_mod_table_time_t::bulk_insert_buffered (trx=<optimized out>, index=<optimized out>, entry=<optimized out>, this=<optimized out>)
                at /mariadb/10.7/storage/innobase/include/trx0trx.h:488
            #10 row_ins_index_entry (thr=0x7f721c02eb08, entry=<optimized out>, index=<optimized out>) at /mariadb/10.7/storage/innobase/row/row0ins.cc:3373
            #11 row_ins_index_entry_step (thr=0x7f721c02eb08, node=<optimized out>) at /mariadb/10.7/storage/innobase/row/row0ins.cc:3546
            #12 row_ins (thr=0x7f721c02eb08, node=<optimized out>) at /mariadb/10.7/storage/innobase/row/row0ins.cc:3692
            #13 row_ins_step (thr=thr@entry=0x7f721c02eb08) at /mariadb/10.7/storage/innobase/row/row0ins.cc:3838
            

            I did not debug this further. One idea would be to use normal file I/O instead of the InnoDB page I/O wrapper. We do not want O_DIRECT for these writes, since we will be writing arbitrary-sized blocks without any alignment guarantees.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee: Marko Mäkelä [ marko ] -> Thirunarayanan Balathandayuthapani [ thiru ]
            Status: In Review [ 10002 ] -> Stalled [ 10000 ]

            The issue was that we were incorrectly attempting to write NULL values to the BLOB file. With that fixed, I am seeing the following amounts of redo log written, as reported by the test. (Note that these numbers are not entirely stable.)

            operation                         | baseline    | baseline-nopurge | patched     | patched-nopurge
            bulk INSERT 1M rows               | 823,539,085 | 823,536,600      | 842,825,019 | 842,823,916
            ROLLBACK                          | 71,347      | 71,353           | 82,611      | 82,617
            bulk INSERT 1M rows               | 823,489,560 | 823,487,932      | 842,768,033 | 842,771,515
            TRUNCATE                          | 2,880       | 2,616            | 2,973       | 2,632
            normal INSERT 1+1M rows, rollback | 871,863,611 | 871,860,836      | 871,864,475 | 871,861,868

            For the last step, the numbers should be identical, and they are close to identical. Somewhat surprisingly, for ROLLBACK and TRUNCATE we see more log written, even though those were supposed to be roughly identical as well. To remove randomness caused by the purge of transaction history, I repeated the test with innodb_force_recovery=2 and disabled the wait_all_purged.inc lines. Every second column is from that run.
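
            A sketch of one way to set up such a nopurge run in an mtr test (the restart mechanism shown is an assumption; the comment above only specifies the innodb_force_recovery=2 setting):

            # Restart with purge effectively disabled; also comment out the
            # wait_all_purged.inc lines for the nopurge runs.
            --let $restart_parameters=--innodb-force-recovery=2
            --source include/restart_mysqld.inc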

            As we can see, the nopurge columns for TRUNCATE are virtually identical, but for ROLLBACK they are not. I think that the reason is a different index tree layout. After the first INSERT of the 1M rows, I am seeing the following t.ibd file sizes:

            baseline    | patched
            884,998,144 | 1,006,632,960

            thiru, I think that such a huge change of file size is something that we must investigate. With the same MERGE_THRESHOLD and other parameters that affect the B-tree page fill factor, we would want to have a similar file layout. I would expect the file size change to explain the different amount of redo log written for ROLLBACK as well.

            Note: Performance-wise we seem to be doing fine. Because I had some concurrent load running on the machine, I cannot quote reliable numbers. The test completed in about 31 seconds on the baseline 10.7 revision and in 25 seconds with the patch.

            marko Marko Mäkelä added a comment -

            I must say that it is encouraging to see such time savings for a simple benchmark that did not even include secondary indexes. With secondary indexes, I would expect the performance difference to be even bigger, between row-by-row inserts into each index, and the merge sort and bulk insert of one index at a time.
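
            For example, with a hypothetical load like the following (sizes illustrative), the baseline inserts every row into each of the three indexes in arrival order, while the patched code merge-sorts the rows and bulk-inserts them into one index at a time:

            CREATE TABLE t (a INT PRIMARY KEY, b INT, c INT, KEY(b), KEY(c)) ENGINE=InnoDB;
            SET STATEMENT unique_checks=0, foreign_key_checks=0 FOR
            INSERT INTO t SELECT seq, seq*2, seq*3 FROM seq_1_to_1000000;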

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee: Thirunarayanan Balathandayuthapani [ thiru ] -> Axel Schwenke [ axel ]
            Priority: Major [ 3 ] -> Critical [ 2 ]

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            create table t1(f1 int not null primary key, b char(255) CHARACTER SET utf8) engine=innodb;
            INSERT INTO t1(f1) SELECT * FROM seq_1_to_1000000;
            --source include/restart_mysqld.inc
            # Here the t1 file size is 32 MB
            alter table t1 force, algorithm=inplace;
            --source include/restart_mysqld.inc
            # Here the t1 file size is 36 MB
            drop table t1;
            

            The inplace ALTER TABLE code increases the file size of the table by 4 MB. The bulk insert code uses the same code path as the inplace ALTER code, so this file size increase should affect 10.2 and later as well.

            thiru Thirunarayanan Balathandayuthapani added a comment -

            thiru, I got some data from axel regarding a performance regression, but it did not have good stack traces. I reread the code changes, and I suspect that the performance regression is due to the SQL layer change that also serg asked about. Could we replace innodb_bulk_insert_write() and its caller with a small addition to ha_innobase::reset()?

            marko Marko Mäkelä added a comment -

            This is still causing some performance regression for normal DML workloads in cases where bulk loading is not used at all.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee: Axel Schwenke [ axel ] -> Thirunarayanan Balathandayuthapani [ thiru ]

            Performance seems to be acceptable now.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Fix Version/s: 10.7.1 [ 26120 ]
            Fix Version/s: 10.7 [ 24805 ]
            Resolution: Fixed [ 1 ]
            Status: Stalled [ 10000 ] -> Closed [ 6 ]

            People

              Assignee: thiru Thirunarayanan Balathandayuthapani
              Reporter: marko Marko Mäkelä
              Votes: 0
              Watchers: 12
