[MDEV-787] LP:692669 - Bulk Update statement Created: 2010-12-20  Updated: 2018-08-20

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Olaf van der Spek (Inactive) Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug692669.xml    

 Description   

Assume strict mode is enabled and I have to update 10000 rows in a table with 100000 rows. The updates are independent of each other (so a single update xbt_files set leechers = 1 where fid in (1,2) doesn't cover it).
Executing 10000 separate queries isn't efficient. There's a nice trick via insert ... on duplicate key update, but it breaks in strict mode (and strict mode is good, so I don't want to disable it).

Another way is to execute the updates via update ... set col = case ... when ... then ... else ... end. However, this can be 20x slower (for 10000 updates) and the syntax is worse.
So I'd like to request support for bulk updates. I think batch/bulk updates are a common operation that deserves proper, efficient support.

Possible syntax:

update2 xbt_files (fid, leechers, seeders, completed) values (3,1,2,3),(2,4,5,6) set
leechers = values(leechers), seeders = values(seeders), completed = values(completed);

The syntax is just a possibility, the semantics are the most important part.

How to repeat:

set sql_mode = '';
 
drop table if exists xbt_files;
 
create table xbt_files
(
 fid int not null auto_increment,
 info_hash blob not null,
 leechers int not null default 0,
 seeders int not null default 0,
 completed int not null default 0,
 flags int not null default 0,
 mtime int not null,
 ctime int not null,
 primary key (fid),
 unique key (info_hash(20))
);
 
insert into xbt_files (info_hash, mtime, ctime) values ('oRUImUWFyXvKItw3tVfB',
unix_timestamp(), unix_timestamp()), ('7XnB2eksvUgHvavG3YPJ', unix_timestamp(),
unix_timestamp()), ('9EsYv9y72FuHsL5yQYf8', unix_timestamp(), unix_timestamp());
 
insert into xbt_files (fid, leechers, seeders, completed) values (2,6268,4510,38453) on
duplicate key update leechers = values(leechers), seeders = values(seeders), completed =
values(completed);
 
update xbt_files set leechers = case fid when 2 then 4 else leechers end, seeders = case
fid when 2 then 5 else seeders end, completed = case fid when 2 then 6 else completed end
where fid in (2);
 
set sql_mode = 'strict_all_tables';
 
insert into xbt_files (fid, leechers, seeders, completed) values (3,6268,4510,38453) on
duplicate key update leechers = values(leechers), seeders = values(seeders), completed =
values(completed);
# ERROR 1364 (HY000): Field 'info_hash' doesn't have a default value
 
update xbt_files set leechers = case fid when 3 then 7 else leechers end, seeders = case
fid when 3 then 8 else seeders end, completed = case fid when 3 then 9 else completed end
where fid in (3);
 
select * from xbt_files;



 Comments   
Comment by Kristian Nielsen [ 2010-12-20 ]

Re: Bulk Update statement
Have you tried sending multiple UPDATE statements in a single request?

I think you would be able to achieve most of the needed efficiency this way.

Comment by Olaf van der Spek (Inactive) [ 2010-12-20 ]

TBH I haven't. Why do you think this would achieve comparable performance?

Comment by Kristian Nielsen [ 2010-12-22 ]

> TBH I haven't. Why do you think this would achieve comparable performance?

The main overhead in executing 10000 statements one after the other is in round-trips between the client and server.

By sending multiple statements in batches, this overhead is eliminated.
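
The batching idea can be sketched in Python, using the standard-library sqlite3 module as a stand-in for a MariaDB connector (an assumption for illustration; a real MariaDB client would enable multi-statement support, e.g. CLIENT_MULTI_STATEMENTS, and send the whole batch string in one round-trip):

```python
import sqlite3

# sqlite3 stands in for a MariaDB connector here, purely to make the
# sketch runnable; the point is sending many UPDATEs in one call.
conn = sqlite3.connect(":memory:")
conn.execute("create table xbt_files (fid integer primary key, leechers int, seeders int)")
conn.executemany("insert into xbt_files values (?, 0, 0)", [(i,) for i in range(1, 4)])

# Build one batch of independent UPDATE statements...
updates = [(2, 4, 5), (3, 7, 8)]  # (fid, new_leechers, new_seeders)
batch = "".join(
    # string interpolation only for the demo; real code should use placeholders
    f"update xbt_files set leechers = {l}, seeders = {s} where fid = {fid};\n"
    for fid, l, s in updates
)
conn.executescript(batch)  # ...and execute it in a single call

print(conn.execute("select leechers, seeders from xbt_files where fid = 2").fetchone())
# (4, 5)
```

The per-statement parse and lock costs remain, but the per-statement round-trip disappears, which is the dominant cost being discussed.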

Comment by Olaf van der Spek (Inactive) [ 2010-12-22 ]

That way, the total query size is 86 bytes per row vs. 20 bytes per row. That's no good; queries are already 1+ MB.
All of this has to be sent, transferred, received and parsed. Then you need to lock 10000 times, and you might need to commit 10000 times.

Hmm, I guess benchmarking settles this dispute faster.

Comment by Sergei Petrunia [ 2010-12-24 ]

How about using a temporary table and a multi table UPDATE statement:

-- Create a temporary table with update 'specifications':
--   fid=3: change (leechers, seeders, completed) to (7,8,9)
--   fid=2: change (leechers, seeders, completed) to (4,5,6)
create temporary table updates (
  fid int,
  new_leechers int,
  new_seeders int,
  new_completed int
);
insert into updates values (3, 7, 8, 9), (2, 4, 5, 6);

update xbt_files, updates
set
xbt_files.leechers = updates.new_leechers,
xbt_files.seeders = updates.new_seeders,
xbt_files.completed = updates.new_completed
where
xbt_files.fid=updates.fid;
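
The same idea can be tried end to end in Python with the standard-library sqlite3 module (SQLite lacks MySQL's multi-table UPDATE syntax, so correlated subqueries stand in for the join; this is a sketch of the technique, not MariaDB-specific code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table xbt_files (fid integer primary key, leechers int, seeders int, completed int)")
conn.executemany("insert into xbt_files values (?, 0, 0, 0)", [(1,), (2,), (3,)])

# The temp table holds one row per update 'specification'.
conn.execute("create temporary table updates (fid int, new_leechers int, new_seeders int, new_completed int)")
conn.executemany("insert into updates values (?, ?, ?, ?)", [(3, 7, 8, 9), (2, 4, 5, 6)])

# Correlated-subquery equivalent of the multi-table UPDATE join.
conn.execute("""
    update xbt_files set
      leechers  = (select new_leechers  from updates where updates.fid = xbt_files.fid),
      seeders   = (select new_seeders   from updates where updates.fid = xbt_files.fid),
      completed = (select new_completed from updates where updates.fid = xbt_files.fid)
    where fid in (select fid from updates)
""")
print(conn.execute("select * from xbt_files order by fid").fetchall())
# [(1, 0, 0, 0), (2, 4, 5, 6), (3, 7, 8, 9)]
```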

Comment by Olaf van der Spek (Inactive) [ 2010-12-24 ]

Doesn't work because the same row might be updated multiple times. It's also slower and requires 3 queries instead of 1.

Comment by Rasmus Johansson (Inactive) [ 2010-12-24 ]

Launchpad bug id: 692669

Comment by Sergei Golubchik [ 2015-03-05 ]

I suppose the simplest workaround would indeed be to use insert ... on duplicate key update, and specify values for all columns as required. The values don't matter if they'll never be used anyway.

It's a workaround, I agree. A nicer solution would be not to issue “ERROR 1364 (HY000): Field 'info_hash' doesn't have a default value” error if there's a unique key conflict. But it's not easy to implement in a reasonable way.
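
The workaround can be sketched in Python with the standard-library sqlite3 module (an illustrative stand-in: sqlite's ON CONFLICT ... DO UPDATE plays the role of MariaDB's ON DUPLICATE KEY UPDATE, and a NOT NULL column without a default plays the role of strict mode rejecting the missing info_hash):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table xbt_files (
      fid integer primary key,
      info_hash blob not null,   -- no default: mirrors the strict-mode situation
      leechers int not null default 0,
      seeders int not null default 0,
      completed int not null default 0
    )
""")
conn.execute("insert into xbt_files (fid, info_hash) values (2, 'oRUImUWFyXvKItw3tVfB')")

# Supplying a dummy info_hash satisfies the no-default check; since fid 2
# already exists, only the DO UPDATE branch runs and the dummy value is
# never stored (in MariaDB: INSERT ... ON DUPLICATE KEY UPDATE).
conn.execute("""
    insert into xbt_files (fid, info_hash, leechers, seeders, completed)
    values (2, 'x', 6268, 4510, 38453)
    on conflict (fid) do update set
      leechers = excluded.leechers,
      seeders = excluded.seeders,
      completed = excluded.completed
""")
print(conn.execute("select info_hash, leechers from xbt_files where fid = 2").fetchone())
# ('oRUImUWFyXvKItw3tVfB', 6268)
```

The downside, as noted below, is that if the row does not exist, a bogus row with the dummy values gets inserted.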

Comment by Olaf van der Spek [ 2018-08-20 ]

> And specify values for all columns as required. The values don't matter, if they'll never be used anyway.

That breaks every time I add a field to the table; adding defaults to all fields would be simpler.

> It's a workaround, I agree. A nicer solution would be not to issue “ERROR 1364 (HY000): Field 'info_hash' doesn't have a default value” error if there's a unique key conflict.

Actually I don't want the insert behavior at all, a dedicated bulk update statement would be ideal.

Generated at Thu Feb 08 06:31:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.