MDEV-26710 (MariaDB Server)

Histogram field in mysql.column_stats is too short, JSON histograms get corrupt

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: N/A
    • Fix Version/s: 10.8.1
    • Component/s: Optimizer
    • Labels: None

    Description

      The mysql.column_stats.histogram field is a standard BLOB. A few long values easily overfill it; the JSON then gets truncated and becomes invalid.

      create or replace table t (a varchar(8192));
      insert into t values
        (repeat('A',8192)),
        (repeat('B',8192)),
        (repeat('C',8192)),
        (repeat('D',8192)),
        (repeat('E',8192)),
        (repeat('F',8192)),
        (repeat('G',8192)),
        (repeat('H',8192)),
        ('I');
       
      set histogram_type= JSON_HB;
      analyze table t persistent for all;
      select * from t where a = 'foo';
       
      # Cleanup
      drop table t;
      

      preview-10.7-MDEV-26519-json-histograms da8bb4b4

      MariaDB [test]> select * from t where a = 'foo';
      ERROR 4183 (HY000): Failed to parse histogram: Root JSON element must be a JSON object at offset 0.
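
      To see what was actually stored, the statistics row can be inspected before the cleanup step. This is a minimal sketch, assuming the test case above was run in the current database and that JSON_VALID() accepts the stored BLOB value directly. The aliases stored_bytes and is_valid_json are made up for illustration.

      ```sql
      -- Inspect the persisted histogram for column t.a (run before "drop table t").
      select db_name, table_name, column_name, hist_type,
             length(histogram)     as stored_bytes,   -- a BLOB holds at most 65535 bytes
             json_valid(histogram) as is_valid_json   -- 0 means the stored text is not valid JSON
      from mysql.column_stats
      where db_name = database()
        and table_name = 't'
        and column_name = 'a';
      ```

      The query only reads mysql.column_stats, so it does not change the outcome of the failing select.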
      

          Activity

            Sergei Petrunia (psergei) added a comment:

            This is because mysql.column_stats.histogram is defined as

              `histogram` blob DEFAULT NULL,

            which has a maximum length of 64K (65,535 bytes).

            If the maximum number of buckets is 255, this leaves 257 bytes to represent one bucket.

            In utf8mb4, that is 64 four-byte characters.

            One could argue that JSON syntax parts like field names, quotes, etc. are not 4-byte characters.
            Also, we wanted to truncate values that are too long.

            Still, the size limit is close enough. I don't see any argument against raising it.
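
            The per-bucket budget in the comment can be checked with a quick calculation. This is only a sketch of the arithmetic, with illustrative column aliases; it does not reflect how the server sizes buckets internally.

            ```sql
            -- Budget per bucket when 255 buckets share one 64K BLOB.
            select 65535                 as blob_max_bytes,            -- maximum length of a BLOB
                   65535 div 255         as bytes_per_bucket,          -- 257
                   (65535 div 255) div 4 as utf8mb4_chars_per_bucket;  -- 64 four-byte characters
            ```

            Compare this with the test case in the description, where each of the eight long values alone is 8192 characters.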

            People

              Assignee: Sergei Petrunia (psergei)
              Reporter: Elena Stepanova (elenst)
              Votes: 0
              Watchers: 2
