MariaDB ColumnStore / MCOL-3327

Optimize UPDATE operation that touches many rows at once.


Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • 1.2
    • 23.10
    • None
    • None

    Description

      This issue is about optimizing UPDATE statements that touch billions of rows. Profiling with perf shows that FileBufferMgr::flushManyAllversion consumes a lot of CPU time. The function purges all versions of the updated blocks, which are passed to it as a list of blocks from the version buffer.

      Lowering the writeengine disk cache (aka numblockpct) to 10% as a workaround did not help.

      The UPDATE sets int columns, matching rows on a (varchar, int) key against another, smaller table. These updates were run sequentially, with no other queries running.

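      The algorithm in FileBufferMgr::flushManyAllversion traverses the whole tr1::unordered_set fbSet and, for each record, checks whether it is in the given tr1::unordered_set of blocks to remove. Each call therefore costs time proportional to the size of the entire cache, not to the number of blocks being flushed.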
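      To make that cost concrete, here is a minimal C++ sketch of the full-scan pattern described above. It is not the actual FileBufferMgr code: CachedBlock, CachedBlockHash, BlockCache and flushAllVersionsFullScan are hypothetical names, std::unordered_set stands in for tr1::unordered_set, and a cache entry is reduced to a (block id, version) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical cache key: a block id plus a version number.
struct CachedBlock {
    uint64_t blockId;
    uint32_t version;
    bool operator==(const CachedBlock& o) const {
        return blockId == o.blockId && version == o.version;
    }
};

struct CachedBlockHash {
    std::size_t operator()(const CachedBlock& b) const {
        return std::hash<uint64_t>()(b.blockId) ^ (std::hash<uint32_t>()(b.version) << 1);
    }
};

using BlockCache = std::unordered_set<CachedBlock, CachedBlockHash>;

// Full-scan flush: visits every cached entry and erases those whose block id
// is in toFlush. The loop runs over the whole cache, so the cost is
// O(cache size) per call even when only a few block ids are flushed.
// Returns the number of entries removed.
std::size_t flushAllVersionsFullScan(BlockCache& cache,
                                     const std::unordered_set<uint64_t>& toFlush) {
    std::size_t removed = 0;
    for (auto it = cache.begin(); it != cache.end();) {
        if (toFlush.count(it->blockId) != 0) {
            it = cache.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

With a cache of billions of rows' worth of blocks and an UPDATE that calls the flush repeatedly, the scan over every entry dominates even if each call removes only a handful of blocks.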

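      The suggested solution is to migrate to a std::unordered_map keyed by block id. Flushing all versions of a block then becomes a single keyed lookup, and given that any block has fewer than 10 versions, finding a particular version stays cheap, so this could even speed up general processing.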
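      A minimal sketch of the suggested layout, assuming the map value is a small vector of the cached versions of each block id. VersionEntry, VersionedCache, insertBlock, containsBlock and flushAllVersions are hypothetical names; real cache entries would also carry the block data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-version entry; a real cache would also hold the block payload.
struct VersionEntry {
    uint32_t version;
};

// Index keyed by block id; each value holds the (few) cached versions of that block.
using VersionedCache = std::unordered_map<uint64_t, std::vector<VersionEntry>>;

void insertBlock(VersionedCache& cache, uint64_t blockId, uint32_t version) {
    cache[blockId].push_back(VersionEntry{version});
}

// Lookup hashes on the block id, then scans the short version list
// (assumed to hold fewer than 10 entries per block).
bool containsBlock(const VersionedCache& cache, uint64_t blockId, uint32_t version) {
    auto it = cache.find(blockId);
    if (it == cache.end()) return false;
    for (const VersionEntry& e : it->second)
        if (e.version == version) return true;
    return false;
}

// Flushing all versions of the listed blocks costs O(list size):
// one hash lookup and erase per block id, independent of total cache size.
// Returns the number of versions removed.
std::size_t flushAllVersions(VersionedCache& cache, const std::vector<uint64_t>& blockIds) {
    std::size_t removed = 0;
    for (uint64_t id : blockIds) {
        auto it = cache.find(id);
        if (it != cache.end()) {
            removed += it->second.size();
            cache.erase(it);
        }
    }
    return removed;
}
```

The trade-off is that looking up one specific (block id, version) pair now scans a short vector after the hash lookup; with fewer than 10 versions per block that scan stays cheap.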
      Another approach is to find all versions of each block in the argument list with a linear search over versions: given block A, try to find the N-th version of A, then the (N-1)-th, and so on. This approach could be practical, although it has some limitations.
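      The version-probing alternative can be sketched against the existing set keyed by (block id, version). This sketch assumes the caller knows, for every block in the list, the highest version N that might be cached; FlushRequest, maxVersion and flushByVersionProbe are hypothetical names, and std::unordered_set again stands in for tr1::unordered_set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical cache key: a block id plus a version number.
struct CachedBlock {
    uint64_t blockId;
    uint32_t version;
    bool operator==(const CachedBlock& o) const {
        return blockId == o.blockId && version == o.version;
    }
};

struct CachedBlockHash {
    std::size_t operator()(const CachedBlock& b) const {
        return std::hash<uint64_t>()(b.blockId) ^ (std::hash<uint32_t>()(b.version) << 1);
    }
};

using BlockCache = std::unordered_set<CachedBlock, CachedBlockHash>;

// Hypothetical request: a block id and the highest version (N) that may be cached.
struct FlushRequest {
    uint64_t blockId;
    uint32_t maxVersion;
};

// Probes versions N, N-1, ..., 0 of each requested block with direct hash
// erases; cost is sum of (N + 1) lookups, independent of the cache size.
// Returns the number of entries removed.
std::size_t flushByVersionProbe(BlockCache& cache, const std::vector<FlushRequest>& requests) {
    std::size_t removed = 0;
    for (const FlushRequest& r : requests) {
        uint32_t v = r.maxVersion;
        while (true) {
            removed += cache.erase(CachedBlock{r.blockId, v});  // 0 or 1
            if (v == 0) break;
            --v;
        }
    }
    return removed;
}
```

Two limitations are visible here: if N is underestimated, newer versions of the block stay in the cache, and each block costs N + 1 lookups even when only one of its versions is cached.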

            People

              drrtuy Roman
              drrtuy Roman
              Votes: 1
              Watchers: 2
