[MDEV-21581] Helper functions and methods for CHARSET_INFO

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.5.1
Component/s: Character Sets
Labels: None

Description

The call notation of CHARSET_INFO routines has some disadvantages:

it looks cumbersome
it exposes internal structure of CHARSET_INFO to the caller

Examples:

  // An example from storage/innodb/
  tmp_length = charset->coll->strnxfrm(charset, str, str_length,
                                       str_length, tmp_str,
                                       tmp_length, 0);

  // An example from storage/innodb/
  mbl = cs->cset->ctype(cs, &ctype, (uchar*) doc, (uchar*) end);

  // An example from storage/myisam/
  mbl= cs->cset->ctype(cs, &ctype, (uchar*)doc, (uchar*)end);

  // An example from storage/myisam/
  keyseg->charset->cset->fill(keyseg->charset,
                              (char*) pos + length,
                              keyseg->length - length,
                              ' ');
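To make both disadvantages concrete, here is a minimal, compilable model of the current call path. It is a sketch under stated assumptions: the structures are heavily simplified stand-ins for those in include/m_ctype.h, and `fill_generic`, `latin1_sketch`, `ucs2_sketch` and `pad_old` are hypothetical names used only for illustration. Because the CHARSET_INFO pointer is written twice, nothing forces the charset passed as the first argument to be the one whose handler was dereferenced.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the definitions in include/m_ctype.h. */
struct charset_info_st;
typedef const struct charset_info_st CHARSET_INFO;

typedef struct
{
  /* "Virtual function": fill len bytes at 'to' with character ch. */
  void (*fill)(CHARSET_INFO *cs, char *to, size_t len, int ch);
} MY_CHARSET_HANDLER;

struct charset_info_st
{
  unsigned int mbminlen;      /* bytes per character in this sketch */
  MY_CHARSET_HANDLER *cset;
};

/* Hypothetical fill shared by a 1-byte and a 2-byte charset; it relies on
   its cs argument to know how wide a character is. */
static void fill_generic(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  size_t i;
  if (cs->mbminlen == 1)
  {
    memset(to, ch, len);
    return;
  }
  assert(len % 2 == 0);
  for (i = 0; i < len; i += 2)  /* big-endian 2-byte code units */
  {
    to[i] = (char) ((ch >> 8) & 0xFF);
    to[i + 1] = (char) (ch & 0xFF);
  }
}

static MY_CHARSET_HANDLER handler_generic = { fill_generic };
static CHARSET_INFO latin1_sketch = { 1, &handler_generic };
static CHARSET_INFO ucs2_sketch = { 2, &handler_generic };

/* Old notation: the caller walks cs->cset-> itself and names cs twice. */
static void pad_old(CHARSET_INFO *cs, char *buf, size_t used, size_t total)
{
  assert(used <= total);
  cs->cset->fill(cs, buf + used, total - used, ' ');
}
```

In this model, `latin1_sketch.cset->fill(&ucs2_sketch, buf, 4, ' ')` compiles cleanly but writes the bytes 00 20 00 20 instead of four 0x20 bytes, which is the kind of mismatch a call that names the charset only once rules out.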

To make the call notation simpler and resistant to changes in CHARSET_INFO, let's do the following:

Add pure C wrappers for all virtual functions in MY_CHARSET_HANDLER and MY_COLLATION_HANDLER, e.g.

static inline void
my_ci_fill(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (cs->cset->fill)(cs, to, len, ch);
}

Let's give all new functions the my_ci_ prefix, to make it clear that their first argument is a CHARSET_INFO pointer.
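A self-contained sketch of this wrapper layer, assuming simplified handler tables: only `fill` and a `strnncoll` with a reduced signature are modeled (the real collation handler entry takes more arguments), and `my_ci_strnncoll` here is an illustrative wrapper that merely follows the naming convention above. `fill_8bit`, `strnncoll_bin`, `charset_sketch` and `pad_key_segment` are hypothetical. The code uses only C headers and C constructs, so it compiles both as C and as C++.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the definitions in include/m_ctype.h. */
typedef unsigned char uchar;
struct charset_info_st;
typedef const struct charset_info_st CHARSET_INFO;

typedef struct
{
  void (*fill)(CHARSET_INFO *cs, char *to, size_t len, int ch);
} MY_CHARSET_HANDLER;

typedef struct
{
  /* Reduced signature; the real strnncoll() takes more arguments. */
  int (*strnncoll)(CHARSET_INFO *cs,
                   const uchar *a, size_t alen,
                   const uchar *b, size_t blen);
} MY_COLLATION_HANDLER;

struct charset_info_st
{
  MY_CHARSET_HANDLER *cset;
  MY_COLLATION_HANDLER *coll;
};

/* One my_ci_ wrapper per handler entry: cs is passed once, and the
   caller never touches cset or coll directly. */
static inline void
my_ci_fill(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (cs->cset->fill)(cs, to, len, ch);
}

static inline int
my_ci_strnncoll(CHARSET_INFO *cs,
                const uchar *a, size_t alen,
                const uchar *b, size_t blen)
{
  return (cs->coll->strnncoll)(cs, a, alen, b, blen);
}

/* Hypothetical implementations for a single-byte, binary-ordered charset. */
static void fill_8bit(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (void) cs;
  memset(to, ch, len);
}

static int strnncoll_bin(CHARSET_INFO *cs,
                         const uchar *a, size_t alen,
                         const uchar *b, size_t blen)
{
  size_t len = alen < blen ? alen : blen;
  int res = memcmp(a, b, len);
  (void) cs;
  if (res != 0)
    return res;
  return alen < blen ? -1 : (alen > blen ? 1 : 0);
}

static MY_CHARSET_HANDLER cset_8bit = { fill_8bit };
static MY_COLLATION_HANDLER coll_bin = { strnncoll_bin };
static CHARSET_INFO charset_sketch = { &cset_8bit, &coll_bin };

/* Caller in the new notation, modeled on the MyISAM padding example. */
static void pad_key_segment(CHARSET_INFO *cs, char *pos,
                            size_t length, size_t seglen)
{
  assert(length <= seglen);
  my_ci_fill(cs, pos + length, seglen - length, ' ');
}
```

The caller, `pad_key_segment`, mentions the charset once and does not depend on where `fill` lives inside CHARSET_INFO.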

Add C++ methods to struct charset_info_st, like this:

struct charset_info_st
{
#ifdef __cplusplus
  ...
  void fill(char *to, size_t len, int ch) const
  {
    (cset->fill)(this, to, len, ch);
  }
  ...
  size_t strnxfrm(uchar *dst, size_t dstlen, uint nweights,
                  const uchar *src, size_t srclen, uint flags) const
  {
    return (coll->strnxfrm)(this,
                            dst, dstlen, nweights,
                            src, srclen, flags);
  }
  ...
#endif
};

So the code in the examples above will turn into:

  // C++ code
  tmp_length = charset->strnxfrm(str, str_length,
                                 str_length, tmp_str,
                                 tmp_length, 0);

  // C++ code
  mbl = cs->ctype(&ctype, (uchar*) doc, (uchar*) end);

  /* Pure C code */
  mbl= my_ci_ctype(cs, &ctype, (uchar*)doc, (uchar*)end);

  /* Pure C code */
  my_ci_fill(keyseg->charset, (char*) pos + length,
             keyseg->length - length,
             ' ');
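The C++ side can be checked the same way. The sketch below again uses simplified stand-in structures and hypothetical implementations (`fill_8bit`, `strnxfrm_ascii_ci`, `ascii_ci_sketch`) and adds the two methods from this ticket under `#ifdef __cplusplus`. The toy strnxfrm just upper-cases ASCII bytes and pads with spaces; it is not a real collation's weight algorithm. Note that its implementation can itself use the new `cs->fill(...)` notation.

```cpp
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the definitions in include/m_ctype.h. */
typedef unsigned char uchar;
typedef unsigned int uint;
struct charset_info_st;
typedef const struct charset_info_st CHARSET_INFO;

typedef struct
{
  void (*fill)(CHARSET_INFO *cs, char *to, size_t len, int ch);
} MY_CHARSET_HANDLER;

typedef struct
{
  size_t (*strnxfrm)(CHARSET_INFO *cs,
                     uchar *dst, size_t dstlen, uint nweights,
                     const uchar *src, size_t srclen, uint flags);
} MY_COLLATION_HANDLER;

struct charset_info_st
{
  MY_CHARSET_HANDLER *cset;
  MY_COLLATION_HANDLER *coll;

#ifdef __cplusplus
  /* Forward to the charset handler; "this" is passed exactly once. */
  void fill(char *to, size_t len, int ch) const
  {
    (cset->fill)(this, to, len, ch);
  }

  /* Forward to the collation handler. */
  size_t strnxfrm(uchar *dst, size_t dstlen, uint nweights,
                  const uchar *src, size_t srclen, uint flags) const
  {
    return (coll->strnxfrm)(this,
                            dst, dstlen, nweights,
                            src, srclen, flags);
  }
#endif
};

/* Hypothetical single-byte fill. */
static void fill_8bit(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (void) cs;
  memset(to, ch, len);
}

/* Hypothetical toy strnxfrm: one weight per byte (ASCII upper-cased),
   padded with spaces up to min(nweights, dstlen); flags are ignored. */
static size_t strnxfrm_ascii_ci(CHARSET_INFO *cs,
                                uchar *dst, size_t dstlen, uint nweights,
                                const uchar *src, size_t srclen, uint flags)
{
  size_t n = nweights < dstlen ? nweights : dstlen;
  size_t i;
  (void) flags;
  for (i = 0; i < n && i < srclen; i++)
    dst[i] = (uchar) toupper(src[i]);
  cs->fill((char *) dst + i, n - i, ' ');  /* new-style call */
  return n;
}

static MY_CHARSET_HANDLER cset_8bit = { fill_8bit };
static MY_COLLATION_HANDLER coll_ascii_ci = { strnxfrm_ascii_ci };
static CHARSET_INFO ascii_ci_sketch = { &cset_8bit, &coll_ascii_ci };
```

Since the methods add no data members, constructors or virtual functions, charset_info_st stays an aggregate, so C-style brace initialization of charset definitions keeps working.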

The new notation is better: it does not contain sequences like cs->cset-> and cs->coll->, and the CHARSET_INFO parameter is mentioned only once (instead of twice). So the new style of caller code:

is shorter
is less bug-prone
is future-proof: it won't change if we change the structure of CHARSET_INFO, e.g. decompose CHARSET_INFO into smaller pieces responsible for character set and collation properties. Only the wrapper functions and methods will change; the caller code will remain the same.
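To illustrate the future-proofing argument, here is a hypothetical decomposed layout (an assumption for this sketch, not MariaDB's actual design): character set properties move into a separate `MY_CHARSET` structure, and the handler's `fill` now takes that structure instead of CHARSET_INFO. Only `my_ci_fill` changes; the caller `pad_key_segment` is written exactly as it would be against the current layout. `MY_CHARSET`, `fill_8bit`, `charset_8bit`, `latin1_sketch` and `pad_key_segment` are illustrative names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decomposed layout; not MariaDB's actual structures. */
struct charset_info_st;
typedef const struct charset_info_st CHARSET_INFO;
typedef struct my_charset_st MY_CHARSET;      /* character set part */

typedef struct
{
  /* fill() now needs only character set data, not the whole CHARSET_INFO. */
  void (*fill)(const MY_CHARSET *cs, char *to, size_t len, int ch);
} MY_CHARSET_HANDLER;

struct my_charset_st
{
  MY_CHARSET_HANDLER *cset;
};

struct charset_info_st
{
  const MY_CHARSET *charset;  /* collation-specific members would follow */
};

/* Only the wrapper knows about the new layout and the new signature. */
static inline void
my_ci_fill(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (cs->charset->cset->fill)(cs->charset, to, len, ch);
}

static void fill_8bit(const MY_CHARSET *cs, char *to, size_t len, int ch)
{
  (void) cs;
  memset(to, ch, len);
}

static MY_CHARSET_HANDLER cset_8bit = { fill_8bit };
static const MY_CHARSET charset_8bit = { &cset_8bit };
static CHARSET_INFO latin1_sketch = { &charset_8bit };

/* Caller code: the same text as with the current layout. */
static void pad_key_segment(CHARSET_INFO *cs, char *pos,
                            size_t length, size_t seglen)
{
  assert(length <= seglen);
  my_ci_fill(cs, pos + length, seglen - length, ' ');
}
```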

Issue Links

blocks:
MDEV-8334 Rename utf8 to utf8mb3 (Closed)
MDEV-21504 Collation: Create shared library for engines to use (Closed)

People

Assignee: Alexander Barkov

Reporter: Alexander Barkov

Votes: 0

Watchers: 1

Dates

Created: 2020-01-28 07:11

Updated: 2020-01-28 11:29

Resolved: 2020-01-28 09:59
