  1. MariaDB ColumnStore
  2. MCOL-3258

Support for Replicated Tables

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Do
    • None
    • Icebox
    • N/A

    Description

      Based on our experience with actual customers and on the usage of MammothDB (the analytical database that was developed by the Sofia office prior to its acquisition by MariaDB), we believe there is a solid case for implementing replicated tables in ColumnStore. The main driver is to speed up queries with multiple JOIN operations by eliminating the need to redistribute data on each query. It will also help in cases where dimension tables are too big to be redistributed, which pushes the JOIN all the way up to a UM and slows it further (we have seen dimension tables with 50M+ rows and 10+ GB of data). We believe this goal requires several changes, none of which seems too fundamental to be prohibitive.

      1. Addition of a flag to CREATE TABLE to indicate that the new table will be replicated. This can be done easily with a SQL comment (much like Spider uses comments for its own settings). To remain backward compatible, tables without the new flag will be created as distributed and those with it as replicated.

      2. Changes to the Bulk Load API (which, AFAIK, also handles SQL-driven data loads like INSERT and LOAD DATA). It will need to know when a table is replicated and act accordingly. This is probably the most significant change, as the API will have to ensure that all (or a sufficient number of) PMs have written the data. The change does not have to apply to cpimport mode 3, as it is always a local mode and hence the user should be aware of its consequences and implications. The client library used by mcsimport, the Java wrappers etc. will also be affected.

      3. Changes to the query processor, which should know when a table is replicated and skip redistribution for JOIN. Components like ExeMgr and PrimProc will likely be affected, and maybe DMLProc too (if it participates in SELECT statement processing). As we're talking about skipping an existing function in certain cases, this should not be hard to add.
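
      The three changes above can be illustrated with a minimal Python sketch. It is not ColumnStore code and makes several assumptions: the `replicated=1` comment syntax, every function name, and the modulo placement used for distributed rows are all hypothetical stand-ins, since the ticket defines none of them. It only shows the decision each component would have to make.

```python
import re
from typing import List, Optional

# Hypothetical comment flag; the ticket does not define the actual syntax.
_FLAG = re.compile(r"\breplicated\s*=\s*(1|true|yes)\b", re.IGNORECASE)


def is_replicated(table_comment: Optional[str]) -> bool:
    """Change 1: a table is replicated only if its comment carries the flag.

    Tables without the flag stay distributed, which keeps existing DDL valid.
    """
    return bool(_FLAG.search(table_comment or ""))


def target_pms(replicated: bool, all_pms: List[str], row_key: int) -> List[str]:
    """Change 2: replicated rows go to every PM, distributed rows to one.

    The modulo placement is a stand-in for whatever placement the real
    loader uses; it is not how ColumnStore distributes data.
    """
    if replicated:
        return list(all_pms)
    return [all_pms[row_key % len(all_pms)]]


def write_succeeded(acks: int, pm_count: int, required: Optional[int] = None) -> bool:
    """Change 2: the load succeeds once enough PMs confirm the write.

    By default all PMs must confirm; `required` models the
    "sufficient number of PMs" variant mentioned in the ticket.
    """
    needed = pm_count if required is None else required
    return acks >= needed


def needs_redistribution(joined_table_replicated: bool) -> bool:
    """Change 3: skip redistribution when the joined table is on every PM."""
    return not joined_table_replicated
```

      For example, with three PMs a replicated table would have each row sent to all three, and `write_succeeded(2, 3)` would report failure unless a lower threshold such as `required=2` is accepted.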

          People

            toddstoffel Todd Stoffel (Inactive)
            assen.totin Assen Totin (Inactive)
            Votes: 1
            Watchers: 5
