[MDEV-15914] performance regression for mass update - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3.3
Fix Version/s: 10.3.7
Component/s: Storage Engine - InnoDB
Labels:
None

Description

Updating a column in all rows of an InnoDB table has become slower in MariaDB 10.3, compared to 10.2. Test case

drop table if exists t1;

create table t1 (id serial, c1 int, c2 char(10) default 'foobar');

insert into t1 (c1) values (rand()*1000);

insert into t1 (c1) select rand()*1000 from t1;

... (19 more times)

update t1 set c2='x';

The final UPDATE touches 1 mio rows. There is no index involved. The update can be done in place. In 10.2.14 the updates takes 2 seconds on average, in 10.3.5 it takes 3 seconds on average. More numbers in related ~~MDEV-15213~~.

Attachments

Issue Links

relates to

MDEV-14425 Change the InnoDB redo log format to reduce write amplification

Closed

MDEV-16041 Do not write for null update (properly fix MySQL Bug#29157)

Closed

MDEV-15213 UPDATEs are slow after instant ADD COLUMN

Closed

Activity

Ascending order - Click to sort in descending order

Axel Schwenke created issue - 2018-04-18 09:19

Axel Schwenke made changes - 2018-04-18 09:19

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-15213~~ [ ~~MDEV-15213~~ ]

Elena Stepanova made changes - 2018-04-18 10:54

Fix Version/s

10.3 [ 22126 ]

Marko Mäkelä added a comment - 2018-04-24 10:59

To rule out the effect of ~~MDEV-12288~~, I rewrote the test to use a slow shutdown, so that it is an apples-to-apples comparison. In 10.3, the purge (resetting DB_TRX_ID) after the INSERT could run concurrently with the UPDATE.

--source include/have_innodb.inc

--source include/have_sequence.inc

create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb;

select now();

insert into t1(id) select * from seq_1_to_1000000;

set global innodb_fast_shutdown=0;

select now();

--source include/restart_mysqld.inc

select now();

update t1 set c2='x';

select now();

drop table t1;

With this, a non-debug build of 10.2 shows me the following when running

./mtr --mem innodb.huge_update

10.2 8346acaf807587da751fdded5eb87fd72f5268a5
create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb;
select now();
now()
2018-04-24 13:50:48
insert into t1(id) select * from seq_1_to_1000000;
set global innodb_fast_shutdown=0;
select now();
now()
2018-04-24 13:50:59
select now();
now()
2018-04-24 13:51:02
update t1 set c2='x';
select now();
now()
2018-04-24 13:51:06
drop table t1;
innodb.huge_update 'innodb' [ pass ] 17726

For 10.3 with the above revision merged, I get the following:

bb-10.3-marko fc106dadb27d9de19c6d17ff328bba2f37e5072d (based on 10.3 f79c5a658cc33a10d7744a748a4328254e2cbaf7)
create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb;
select now();
now()
2018-04-24 13:49:18
insert into t1(id) select * from seq_1_to_1000000;
set global innodb_fast_shutdown=0;
select now();
now()
2018-04-24 13:49:40
select now();
now()
2018-04-24 13:49:53
update t1 set c2='x';
select now();
now()
2018-04-24 13:50:09
drop table t1;
innodb.huge_update 'innodb' [ pass ] 50824

The total test time is much longer (50.8 instead of 17.7 seconds; 33.1 seconds increase), divided as follows:

Stage	10.2 Time	10.3 Time
INSERT	11s	22s
Restart	3s	13s
UPDATE	5s	16s
Total	17.7s	50.8s

The increased restart time is expected due to the purge (~~MDEV-12288~~ is resetting the DB_TRX_ID of all inserted records), but the performance regressions for INSERT (2×) and UPDATE (3×) are not.

Marko Mäkelä added a comment - 2018-04-24 10:59 To rule out the effect of MDEV-12288 , I rewrote the test to use a slow shutdown, so that it is an apples-to-apples comparison. In 10.3, the purge (resetting DB_TRX_ID) after the INSERT could run concurrently with the UPDATE. --source include/have_innodb.inc --source include/have_sequence.inc create table t1 (id serial, c2 char (10) default 'foobar' ) engine=innodb; select now(); insert into t1(id) select * from seq_1_to_1000000; set global innodb_fast_shutdown=0; select now(); --source include/restart_mysqld.inc select now(); update t1 set c2= 'x' ; select now(); drop table t1; With this, a non-debug build of 10.2 shows me the following when running ./mtr --mem innodb.huge_update 10.2 8346acaf807587da751fdded5eb87fd72f5268a5 create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb; select now(); now() 2018-04-24 13:50:48 insert into t1(id) select * from seq_1_to_1000000; set global innodb_fast_shutdown=0; select now(); now() 2018-04-24 13:50:59 select now(); now() 2018-04-24 13:51:02 update t1 set c2='x'; select now(); now() 2018-04-24 13:51:06 drop table t1; innodb.huge_update 'innodb' [ pass ] 17726 For 10.3 with the above revision merged, I get the following: bb-10.3-marko fc106dadb27d9de19c6d17ff328bba2f37e5072d (based on 10.3 f79c5a658cc33a10d7744a748a4328254e2cbaf7) create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb; select now(); now() 2018-04-24 13:49:18 insert into t1(id) select * from seq_1_to_1000000; set global innodb_fast_shutdown=0; select now(); now() 2018-04-24 13:49:40 select now(); now() 2018-04-24 13:49:53 update t1 set c2='x'; select now(); now() 2018-04-24 13:50:09 drop table t1; innodb.huge_update 'innodb' [ pass ] 50824 The total test time is much longer (50.8 instead of 17.7 seconds; 33.1 seconds increase), divided as follows: Stage 10.2 Time 10.3 Time INSERT 11s 22s Restart 3s 13s UPDATE 5s 16s Total 17.7s 50.8s The increased restart time is expected due to the purge ( MDEV-12288 is resetting the DB_TRX_ID of all inserted records), but the performance regressions for INSERT (2×) and UPDATE (3×) are not.

Marko Mäkelä made changes - 2018-04-24 10:59

Status

Open [ 1 ]

Confirmed [ 10101 ]

Jan Lindström (Inactive) added a comment - 2018-04-24 11:21

While you are at it, maybe you should also time DELETE and drop table performance.

Jan Lindström (Inactive) added a comment - 2018-04-24 11:21 While you are at it, maybe you should also time DELETE and drop table performance.

Marko Mäkelä added a comment - 2018-04-24 11:34

Simpler test (no restart; purge disabled):

--source include/have_innodb.inc

--source include/have_sequence.inc

create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb;

connect stop_purge, localhost, root;

START TRANSACTION WITH CONSISTENT SNAPSHOT;

connection default;

select now();

insert into t1(id) select * from seq_1_to_1000000;

select now();

update t1 set c2='x';

select now();

drop table t1;

disconnect stop_purge;

Stage	10.2 Time	10.3 Time
INSERT	10s	24s
UPDATE	3s	12s
Total	13.371s	36.325s

Marko Mäkelä added a comment - 2018-04-24 11:34 Simpler test (no restart; purge disabled): --source include/have_innodb.inc --source include/have_sequence.inc create table t1 (id serial, c2 char (10) default 'foobar' ) engine=innodb; connect stop_purge, localhost, root; START TRANSACTION WITH CONSISTENT SNAPSHOT; connection default ; select now(); insert into t1(id) select * from seq_1_to_1000000; select now(); update t1 set c2= 'x' ; select now(); drop table t1; disconnect stop_purge; Stage 10.2 Time 10.3 Time INSERT 10s 24s UPDATE 3s 12s Total 13.371s 36.325s

Marko Mäkelä added a comment - 2018-04-24 12:58

The above results were obtained when compiling with clang-6.0. At least in one function, clang emitted a long sequence of jmpq functions jumping forward in the function, so the code did not look optimal.

Using gcc 8.0.1 the performance difference is smaller:

Stage	10.2 gcc-8	10.3 gcc-8
INSERT	10s	11s
UPDATE	5s	13s
Total	14.5s	23.9s

Marko Mäkelä added a comment - 2018-04-24 12:58 The above results were obtained when compiling with clang-6.0. At least in one function, clang emitted a long sequence of jmpq functions jumping forward in the function, so the code did not look optimal. Using gcc 8.0.1 the performance difference is smaller: Stage 10.2 gcc-8 10.3 gcc-8 INSERT 10s 11s UPDATE 5s 13s Total 14.5s 23.9s

Marko Mäkelä made changes - 2018-04-24 13:24

Status

Confirmed [ 10101 ]

In Progress [ 3 ]

Marko Mäkelä added a comment - 2018-04-24 21:11

Here is a more suitable test for running sudo perf top while the UPDATE are executing:

--source include/have_innodb.inc

--source include/have_sequence.inc

SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;

SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;

create table t1 (id serial, c2 char(10) default 'foobar') engine=innodb;

#connect stop_purge, localhost, root;

#START TRANSACTION WITH CONSISTENT SNAPSHOT;

#connection default;

select now();

insert into t1(id) select * from seq_1_to_1000000;

select now();

# alternative to stop_purge, let the MDEV-12288 purge for INSERT run

--source include/wait_all_purged.inc

select now();

begin;

let $n=10;

while ($n) {

update t1 set c2='x';

update t1 set c2='y';

update t1 set c2='z';

dec $n;

commit;

select now();

drop table t1;

#disconnect stop_purge;

SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;

The results (CC=gcc-8 cmake -DPLUGIN_PERFSCHEMA:STRING=NO, with skip_innodb_adaptive_hash_index):

Stage	10.2	10.3
INSERT	9s	12s
purge	0s	1s
30 UPDATE	60s	66s

Marko Mäkelä added a comment - 2018-04-24 21:11 Here is a more suitable test for running sudo perf top while the UPDATE are executing: --source include/have_innodb.inc --source include/have_sequence.inc SET @saved_frequency = @@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET GLOBAL innodb_purge_rseg_truncate_frequency = 1; create table t1 (id serial, c2 char (10) default 'foobar' ) engine=innodb; # connect stop_purge, localhost, root; #START TRANSACTION WITH CONSISTENT SNAPSHOT; # connection default ; select now(); insert into t1(id) select * from seq_1_to_1000000; select now(); # alternative to stop_purge, let the MDEV-12288 purge for INSERT run --source include/wait_all_purged.inc select now(); begin ; let $n=10; while ($n) { update t1 set c2= 'x' ; update t1 set c2= 'y' ; update t1 set c2= 'z' ; dec $n; } commit ; select now(); drop table t1; #disconnect stop_purge; SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency; The results ( CC=gcc-8 cmake -DPLUGIN_PERFSCHEMA:STRING=NO , with skip_innodb_adaptive_hash_index ): Stage 10.2 10.3 INSERT 9s 12s purge 0s 1s 30 UPDATE 60s 66s

Marko Mäkelä added a comment - 2018-04-25 10:26 - edited

I tried various different optimizations until I accidentally found the cause of the regression: In MariaDB 10.3.3 I replaced the high-level redo log record MLOG_UNDO_INSERT with the low-level records MLOG_2BYTES and MLOG_WRITE_STRING. It turns out that the small amount of added redo log volume (4 more bytes of payload and writing the page identifier twice instead of once) accounted for all of the increased execution time on my system.

Edit: In ~~MDEV-12353~~ we introduced a more compact redo log format that allows a mini-transaction to write multiple byte strings to be same page without repeating the page identifier. But, the few bytes saved by a special high-level record UNDO_APPEND turned out to have significant performance impact. Maybe ~~MDEV-14425~~ will reduce such impact in the future.

Marko Mäkelä added a comment - 2018-04-25 10:26 - edited I tried various different optimizations until I accidentally found the cause of the regression: In MariaDB 10.3.3 I replaced the high-level redo log record MLOG_UNDO_INSERT with the low-level records MLOG_2BYTES and MLOG_WRITE_STRING . It turns out that the small amount of added redo log volume (4 more bytes of payload and writing the page identifier twice instead of once) accounted for all of the increased execution time on my system. Edit: In MDEV-12353 we introduced a more compact redo log format that allows a mini-transaction to write multiple byte strings to be same page without repeating the page identifier. But, the few bytes saved by a special high-level record UNDO_APPEND turned out to have significant performance impact. Maybe MDEV-14425 will reduce such impact in the future.

Marko Mäkelä made changes - 2018-04-25 10:26

Link

This issue relates to ~~MDEV-14425~~ [ ~~MDEV-14425~~ ]

Marko Mäkelä added a comment - 2018-04-25 10:57

bb-10.3-marko contains many small changes that were motivated by checking sudo perf top while running the test. What fixed the performance for me was reverting a slight redo log volume increase for undo logging.

Marko Mäkelä added a comment - 2018-04-25 10:57 bb-10.3-marko contains many small changes that were motivated by checking sudo perf top while running the test. What fixed the performance for me was reverting a slight redo log volume increase for undo logging .

Marko Mäkelä made changes - 2018-04-25 10:57

Assignee	Marko Mäkelä [ marko ]	Thirunarayanan B [ thiru ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Marko Mäkelä made changes - 2018-04-26 17:52

Link

This issue relates to ~~MDEV-16041~~ [ ~~MDEV-16041~~ ]

Thirunarayanan Balathandayuthapani made changes - 2018-04-26 18:33

Assignee	Thirunarayanan B [ thiru ]	Marko Mäkelä [ marko ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Marko Mäkelä added a comment - 2018-04-26 20:06

Pushed to 10.3, along with a merge of ~~MDEV-16041~~ from 10.2.

Marko Mäkelä added a comment - 2018-04-26 20:06 Pushed to 10.3, along with a merge of MDEV-16041 from 10.2.

Marko Mäkelä made changes - 2018-04-26 20:06

issue.field.resolutiondate

2018-04-26 20:06:59.0

2018-04-26 20:06:59.515

Marko Mäkelä made changes - 2018-04-26 20:06

Fix Version/s		10.3.7 [ 23005 ]
Fix Version/s	10.3 [ 22126 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Marko Mäkelä made changes - 2018-04-26 20:07

Affects Version/s		10.3.3 [ 22644 ]
Affects Version/s	10.3 [ 22126 ]

Sergei Golubchik made changes - 2021-12-06 21:47

Workflow

MariaDB v3 [ 86626 ]

MariaDB v4 [ 154181 ]

People

Assignee:: Marko Mäkelä

Reporter:: Axel Schwenke

Votes:: 1 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2018-04-18 09:19

Updated:: 2020-09-16 05:52

Resolved:: 2018-04-26 20:06

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server