  1. MariaDB Server
  2. MDEV-21581

Helper functions and methods for CHARSET_INFO

      Description

      The current call notation for CHARSET_INFO routines has two disadvantages:

      • it looks cumbersome
      • it exposes the internal structure of CHARSET_INFO to the caller

      Examples:

        // An example from storage/innodb/
        tmp_length = charset->coll->strnxfrm(charset, str, str_length,
                                             str_length, tmp_str,
                                             tmp_length, 0);
      

        // An example from storage/innodb/
        mbl = cs->cset->ctype(cs, &ctype, (uchar*) doc, (uchar*) end);
      

        // An example from storage/myisam/
        mbl= cs->cset->ctype(cs, &ctype, (uchar*)doc, (uchar*)end);
      

        // An example from storage/myisam/
        keyseg->charset->cset->fill(keyseg->charset,
                                    (char*) pos + length,
                                    keyseg->length - length,
                                    ' ');
      

      To make the call notation simple and resilient to changes in CHARSET_INFO, let's do the following:

      • Add pure C wrappers for all virtual functions in MY_CHARSET_HANDLER and MY_COLLATION_HANDLER, e.g.

        static inline void
        my_ci_fill(CHARSET_INFO *cs, char *to, size_t len, int ch)
        {
          (cs->cset->fill)(cs, to, len, ch);
        }
        

        Let's name all new functions with the my_ci_ prefix, to make it clear that the first argument is a CHARSET_INFO pointer.

      • Add C++ methods into struct charset_info_st, like this:

        struct charset_info_st
        {
        #ifdef __cplusplus
        ...
          void fill(char *to, size_t len, int ch) const
          {
            (cset->fill)(this, to, len, ch);
          }
        ...
          size_t strnxfrm(uchar *dst, size_t dstlen, uint nweights,
                          const uchar *src, size_t srclen, uint flags) const
          {
            return (coll->strnxfrm)(this,
                                    dst, dstlen, nweights,
                                    src, srclen, flags);
          }
        ...
        #endif
        };
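
      The two wrapper styles above can be tried out in isolation. The following is a minimal sketch, not the real MariaDB code: toy_charset_info, toy_charset_handler, toy_ci_fill and toy_demo are hypothetical stand-ins invented for illustration, and only the fill slot is modelled. It shows that the old notation, the free-function wrapper and the C++ method all reach the same handler function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy stand-ins for illustration only; these are NOT the real MariaDB
// definitions of CHARSET_INFO / MY_CHARSET_HANDLER.
struct toy_charset_info;

struct toy_charset_handler
{
  // "Virtual function" slot, called through a function pointer.
  void (*fill)(const toy_charset_info *cs, char *to, std::size_t len, int ch);
};

struct toy_charset_info
{
  const toy_charset_handler *cset;

  // C++ method: hides the cs->cset-> hop from the caller.
  void fill(char *to, std::size_t len, int ch) const
  {
    (cset->fill)(this, to, len, ch);
  }
};

// Free-function wrapper in the my_ci_ style. In a real header it would be
// plain C; here it is compiled as C++ to keep the sketch in one file.
static inline void
toy_ci_fill(const toy_charset_info *cs, char *to, std::size_t len, int ch)
{
  (cs->cset->fill)(cs, to, len, ch);
}

// A trivial single-byte implementation of the fill slot.
static void toy_fill_8bit(const toy_charset_info *, char *to,
                          std::size_t len, int ch)
{
  assert(to != nullptr);
  std::memset(to, ch, len);
}

static const toy_charset_handler toy_handler_8bit= { toy_fill_8bit };
static const toy_charset_info toy_cs= { &toy_handler_8bit };

// Pads the tail of three identical buffers, one per notation, and
// reports whether the results agree.
static bool toy_demo()
{
  char a[8]= "ab", b[8]= "ab", c[8]= "ab";
  toy_cs.cset->fill(&toy_cs, a + 2, sizeof(a) - 2, ' ');  // old notation
  toy_ci_fill(&toy_cs, b + 2, sizeof(b) - 2, ' ');        // C-style wrapper
  toy_cs.fill(c + 2, sizeof(c) - 2, ' ');                 // C++ method
  return std::memcmp(a, b, sizeof(a)) == 0 &&
         std::memcmp(a, c, sizeof(a)) == 0;
}
```

      Note that only the first call in toy_demo names the charset object twice and spells out the cset member; the other two calls do neither.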
        

      With these wrappers and methods, the code in the examples above becomes:

        // C++ code
        tmp_length = charset->strnxfrm(str, str_length,
                                       str_length, tmp_str,
                                       tmp_length, 0);
      

        // C++ code
        mbl = cs->ctype(&ctype, (uchar*) doc, (uchar*) end);
      

        /* Pure C code */
        mbl= my_ci_ctype(cs, &ctype, (uchar*)doc, (uchar*)end);
      

        /* Pure C code */
        my_ci_fill(keyseg->charset, (char*) pos + length,
                   keyseg->length - length, ' ');
      

      The new notation is better: it contains no sequences like cs->cset-> and cs->coll->, and the CHARSET_INFO argument appears once instead of twice. As a result, the caller code:

      • is shorter
      • is less bug-prone
      • is future-proof: it won't change if we change the structure of CHARSET_INFO, e.g. by decomposing it into smaller pieces responsible for character set and collation properties. Only the wrapper functions and methods will change; the caller code will remain the same.
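
      The future-proofing argument can be illustrated with a small sketch. Everything in it is an assumption made for illustration: layout_v1 and layout_v2 are two invented toy layouts (neither matches the real CHARSET_INFO, and the split shown is only one conceivable decomposition), and toy_ci_fill and pad_key are hypothetical names. The point is that pad_key, the "caller", is written once and compiles unchanged against both layouts, because only the wrapper body knows where the handler lives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Both layouts are hypothetical toys, not the real MariaDB structures.
namespace layout_v1  // one structure pointing at a handler table
{
  struct charset_info;
  struct handler
  {
    void (*fill)(const charset_info *, char *, std::size_t, int);
  };
  struct charset_info { const handler *cset; };

  inline void toy_ci_fill(const charset_info *cs, char *to,
                          std::size_t len, int ch)
  {
    (cs->cset->fill)(cs, to, len, ch);
  }

  inline void fill_8bit(const charset_info *, char *to,
                        std::size_t len, int ch)
  { std::memset(to, ch, len); }

  const handler handler_8bit= { fill_8bit };
  const charset_info cs_8bit= { &handler_8bit };
}

namespace layout_v2  // split into character-set and collation pieces
{
  struct charset_part
  {
    void (*fill)(const charset_part *, char *, std::size_t, int);
  };
  struct collation_part {};  // left empty in this sketch
  struct charset_info
  {
    const charset_part *chars;
    const collation_part *coll;
  };

  // Only the wrapper body differs from layout_v1.
  inline void toy_ci_fill(const charset_info *cs, char *to,
                          std::size_t len, int ch)
  {
    (cs->chars->fill)(cs->chars, to, len, ch);
  }

  inline void fill_8bit(const charset_part *, char *to,
                        std::size_t len, int ch)
  { std::memset(to, ch, len); }

  const charset_part part_8bit= { fill_8bit };
  const charset_info cs_8bit= { &part_8bit, nullptr };
}

// Caller code: identical source for both layouts. The matching
// toy_ci_fill is found by argument-dependent lookup.
template <class CharsetInfo>
void pad_key(const CharsetInfo *cs, char *key,
             std::size_t used, std::size_t total)
{
  assert(used <= total);
  toy_ci_fill(cs, key + used, total - used, ' ');
}

// Runs the same caller against both layouts and compares the results.
static bool toy_layouts_agree()
{
  char a[6]= "ab", b[6]= "ab";
  pad_key(&layout_v1::cs_8bit, a, 2, sizeof(a));
  pad_key(&layout_v2::cs_8bit, b, 2, sizeof(b));
  return std::memcmp(a, b, sizeof(a)) == 0 && a[5] == ' ';
}
```

      The template is used here only so that a single caller body can be compiled against both toy layouts in one file; in a real code base the caller would simply be recompiled after the structure changes.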

              People

              Assignee:
              bar Alexander Barkov
              Reporter:
              bar Alexander Barkov