[MCOL-3327] Optimize UPDATE operation that touches many rows at once. Created: 2019-05-23  Updated: 2023-11-10

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2
Fix Version/s: 23.10

Type: New Feature Priority: Minor
Reporter: Roman Assignee: Roman
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Duplicate
duplicates MCOL-3277 slow update statements Closed
PartOf
is part of MCOL-4343 umbrella for tech debt issues Open

 Description   

This issue is about optimizing UPDATE statements that touch billions of rows. perf output shows that FileBufferMgr::flushManyAllversion consumes a large share of CPU time. The function purges all versions of the updated blocks, supplied as a list of blocks from the version buffer.

Lowering the writeengine disk cache (aka numblockpct) to 10% as a workaround didn't help.

The UPDATE writes int columns, matching on (varchar, int) columns against another, smaller table. The updates were run sequentially with no other queries running.

The algorithm in FileBufferMgr::flushManyAllversion traverses the whole tr1::unordered_set fbSet, checking each cached record against the given tr1::unordered_set of blocks to remove.

The suggested solution is to migrate to a std::unordered_map keyed (hashed) by block id. Given that the number of versions of any block is lower than 10, this could even speed up general processing.
Another approach is to find all versions of each block from the list given as an argument using linear search: given block A, find the N-th version of A, then the (N-1)-th, and so on. This approach could be practical even though it has some limitations.


Generated at Thu Feb 08 02:41:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.