MariaDB ColumnStore / MCOL-3327

Optimize UPDATE operation that touches many rows at once.



    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.2
    • Fix Version/s: 23.10
    • Component/s: None
    • Labels: None


      This issue concerns optimizing UPDATE on billions of rows. perf output shows that FileBufferMgr::flushManyAllversion consumes a lot of CPU time. The function purges all versions of the updated blocks, provided as a list of blocks, from the version buffer.

      A workaround of lowering the writeengine disk cache (numblockpct) to 10% didn't help.

      The UPDATE sets int columns, matching on (varchar, int) columns against another, smaller table. The updates were run sequentially with no other queries running.

      The algorithm in FileBufferMgr::flushManyAllversion traverses the whole tr1::unordered_set fbSet, checking each record against the given tr1::unordered_set of blocks to remove.

      The suggested solution is to migrate to std::unordered_map keyed by block id. Given that the number of versions for any block is lower than 10, this could even speed up general processing.
      Another approach is to find all versions of each block in the given list using linear search: given block A, try to find the N-th version of A, then the (N-1)-th, and so on. This approach could be practical even though it has some limitations.


        Issue Links

              Assignee: drrtuy Roman
              Reporter: drrtuy Roman
              Votes: 1
              Watchers: 2


