Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26519

JSON Histograms: improve histogram collection

    XMLWordPrintable

Details

    Description

      The histograms as defined in MDEV-21130 follow the approach used by SINGLE_PREC_HB/DOUBLE_PREC_HB histograms and put the bucket bound exactly after
      a certain fraction of rows.

      1. Popular values should have their own buckets

      For example, consider a table with these values (each character is one row):

        aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbccdefghijkilmnopq
        |     |     |     |     |     |     |     |     |
        0     1     2     3     4     5     6     7     8
      

      A histogram with 8 buckets will have these buckets:

      1. a-a
      2. a-a
      3. a-a
      3. a-a
      5. a-b
      6. b-f
      7. f-i
      8. i-q
      

      The estimation function will be able to see that 'a' is a popular value, it will determine its frequency with bucket_size precision.

      However, a better approach would be to put all 'a' values in one bucket:

        aaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbccdefghijkilmnopq
        |                           |  |  |  |  |  |  | |
        0                           1  2  3  4  5  6  7 8
      

      This will give a more precise number for table 'a'.
      It will also provide more buckets for other values.

      Other databases

      • MySQL does something like this.
      • PostgreSQL collects MCVs (Most Common Values) and removes them from consideration before building the histogram.

      2. Store the number of distinct values in each bucket

      Again, following MySQL here: store number of distinct values in each bucket.

      For strings, store only prefix

      Store only a 40-character prefix.

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.