Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4713

Optimizer statistics

    XMLWordPrintable

    Details

    • Type: New Feature
    • Status: Closed (View Workflow)
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1.1
    • Component/s: None
    • Labels:
    • Sprint:
      2021-7, 2021-8, 2021-9

      Description

      Optimizer statistics

      Most engines feeds their query optimizers with different types of statistics, e.g histograms of most common values, fractions of NULL values, Number of Distinct Values, relation cardinality. The statistics allows to make educated guesses regarding cardinality of intermediate and final result of a query, e.g. optimizer finds the optimal join order using this statistics. MCS now has a rudimentary statistics that consist of a relation cardinality only.

      The goal

      We need to design and build facilities that allows MCS to make educated guesses about cardinality of intermediate and final result of a query and use this knowledge to produce an optimal plan.

      What is statistics

      The simplest version of statistics must have the list of attributes listed earlier. The improved version should have an extensible format to add attributes(e.g. correlated or dependant histograms on functionaly dependant columns) w/o lots of changes.

      What triggers stats collection

      There must be a user knob in the form of ANALYZE SQL statement. Bulk insert operation inserting more then N values must also trigger ANALYZE to regresh the stats.

      Consumers

      ExeMgr or whatever the optimizer goes into is the most important consumer. The fact that there might be more then one EM in the cluster must be taken into account.

      Where to put stats and how to save it b/w restarts

      EMs in the cluster must have direct access to the stats so stats in-memory representation must reside in the RAM of the primary EM. Other EMs must have a synchronized version of the stats.

      Producers

      EM must initiate statistics collection. The statistics collection process must not differ much from SELECT * FROM query. The sampling method isn't yet choosen.

      Please see the design doc for more details.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              dleeyh Daniel Lee
              Reporter:
              gdorman Gregory Dorman
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Git Integration