MariaDB ColumnStore / MCOL-3327

Optimize UPDATE operation that touches many rows at once.



    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.2
    • Fix Version/s: 23.10
    • Component/s: None
    • Labels: None


      This issue concerns optimizing UPDATE on billions of rows. perf output shows that FileBufferMgr::flushManyAllversion consumes a lot of CPU time. The function purges all versions of the updated blocks, provided as a list of blocks, from the version buffer.

      A workaround of lowering the writeengine disk cache (numblockpct) to 10% didn't help.

      The UPDATE sets int columns, matching on (varchar, int) columns against another, smaller table. The updates were run sequentially with no other queries running.

      The algorithm in FileBufferMgr::flushManyAllversion traverses the whole tr1::unordered_set fbSet, checking each record against the given tr1::unordered_set of blocks to remove.

      The suggested solution is to migrate to std::unordered_map keyed by block id. Given that the number of versions for any block is lower than 10, this could even speed up general processing.
      Another approach is to find all versions of each block in the given list using linear search: given block A, try to find the N-th version of A, then the (N-1)-th, and so on. This approach could be practical even though it has some limitations.


        Issue Links

              Assignee: drrtuy Roman
              Reporter: drrtuy Roman
              Votes: 1
              Watchers: 2


