  1. MariaDB Server
  2. MDEV-26764

JSON_HB Histograms: handle BINARY and unassigned characters

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.8.0
    • Component/s: Optimizer
    • Labels: None

    Description

      This is a follow-up to the discussion with bar.

      Part #1: unassigned characters

      The UTF8MB4 charset can represent all known characters. Other charsets, however, may have so-called unassigned characters: byte combinations that are not mapped to any Unicode character. (These holes are used, for example, for introducing new characters: the EURO sign was initially absent from many charsets, and newer versions of those charsets added it.)

      As for histogram collection: the histogram collection code tries to convert every value to UTF-8, including values that contain unassigned characters. For those values the conversion fails, and the result is not what we need. This needs to be fixed.
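
To make the failure mode concrete, here is a minimal sketch in Python. It is an illustration only: Python's codec tables stand in for the server's charset conversion, and the helper name `to_utf8_or_none` is hypothetical. In Python's cp1252 table, byte 0x80 is the EURO sign, while byte 0x81 is one of the unassigned holes.

```python
# Illustration only: Python codecs stand in for the server's charset code.
# to_utf8_or_none is a hypothetical helper, not a MariaDB function.

def to_utf8_or_none(raw: bytes, charset: str):
    """Convert raw bytes from charset to UTF-8.

    Returns None when raw contains a byte sequence that the charset
    does not map to any Unicode character (an unassigned character).
    """
    try:
        return raw.decode(charset).encode("utf-8")
    except UnicodeDecodeError:
        return None


# 0x80 is assigned in cp1252: it is the EURO sign (U+20AC).
euro = to_utf8_or_none(b"\x80", "cp1252")

# 0x81 is unassigned in Python's cp1252 table, so the conversion fails.
hole = to_utf8_or_none(b"\x81", "cp1252")

print(euro, hole)
```

Any code that assumes the conversion always succeeds has to decide what to do with the second case, which is the situation this part of the issue describes.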

      Part #2: [VAR]BINARY

      Conceptually, [VAR]BINARY data does not represent UTF-8 characters.

      Technically, one can still store it in UTF-8, because UTF-8 can encode the character (my_wc_t)NUM for every NUM in the [0x00, 0xFF] range.
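
The byte-to-code-point mapping described above can be sketched as follows. This is a hedged Python illustration, not the server's implementation: the function names are hypothetical, and the `latin-1` codec is used only because it maps byte N to code point U+00NN for every N in [0x00, 0xFF].

```python
# Illustration only: hypothetical helpers, not MariaDB functions.

def binary_to_utf8(raw: bytes) -> bytes:
    """Treat each byte N as code point U+00NN and encode the result as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def utf8_to_binary(encoded: bytes) -> bytes:
    """Inverse mapping: decode UTF-8, then turn each code point back into a byte."""
    return encoded.decode("utf-8").encode("latin-1")


all_bytes = bytes(range(256))
encoded = binary_to_utf8(all_bytes)

# Bytes 0x00..0x7F stay one byte each; 0x80..0xFF take two bytes each in UTF-8.
print(len(encoded))
print(utf8_to_binary(encoded) == all_bytes)
```

The mapping is lossless, but the encoded form is longer than the original for bytes at or above 0x80, and the result is a sequence of Unicode characters only in a technical sense, which matches the conceptual caveat above.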

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia