Details
Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
The call notation of CHARSET_INFO routines has some disadvantages:
- it looks cumbersome
- it exposes internal structure of CHARSET_INFO to the caller
Examples:
// An example from storage/innodb/
tmp_length = charset->coll->strnxfrm(charset, str, str_length,
                                     str_length, tmp_str,
                                     tmp_length, 0);
// An example from storage/innodb/
mbl = cs->cset->ctype(cs, &ctype, (uchar*) doc, (uchar*) end);
// An example from storage/myisam/
mbl= cs->cset->ctype(cs, &ctype, (uchar*)doc, (uchar*)end);
// An example from storage/myisam/
keyseg->charset->cset->fill(keyseg->charset,
                            (char*) pos + length,
                            keyseg->length - length,
                            ' ');
To make the call notation simple and robust against changes in CHARSET_INFO, let's do the following:
- Add pure C wrappers for all virtual functions in MY_CHARSET_HANDLER and MY_COLLATION_HANDLER, e.g.
static inline void
my_ci_fill(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (cs->cset->fill)(cs, to, len, ch);
}
Let's give all new functions the my_ci_ prefix, to make it clear that the first argument is a CHARSET_INFO pointer.
- Add C++ methods into struct charset_info_st, like this:
struct charset_info_st
{
#ifdef __cplusplus
  ...
  void fill(char *to, size_t len, int ch) const
  {
    (cset->fill)(this, to, len, ch);
  }
  ...
  size_t strnxfrm(uchar *dst, size_t dstlen, uint nweights,
                  const uchar *src, size_t srclen, uint flags) const
  {
    return (coll->strnxfrm)(this,
                            dst, dstlen, nweights,
                            src, srclen, flags);
  }
  ...
#endif
};
so the code in the above examples will turn into:
// C++ code
tmp_length = charset->strnxfrm(str, str_length,
                               str_length, tmp_str,
                               tmp_length, 0);
// C++ code
mbl = cs->ctype(&ctype, (uchar*) doc, (uchar*) end);
/* Pure C code */
mbl= my_ci_ctype(cs, &ctype, (uchar*)doc, (uchar*)end);
/* Pure C code */
my_ci_fill(keyseg->charset, (char*) pos + length,
           keyseg->length - length,
           ' ');
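The three notations can be compared side by side in a small self-contained sketch. It assumes heavily simplified stand-ins for CHARSET_INFO and MY_CHARSET_HANDLER (the real structures have many more members), and the names demo_fill_8bit, demo_handler, demo_charset, call_style and pad_with are hypothetical, introduced only for this illustration:

```cpp
#include <stddef.h>
#include <string.h>
#include <string>

// Simplified stand-ins: the real structures have many more members.
struct charset_info_st;
typedef const struct charset_info_st CHARSET_INFO;

struct MY_CHARSET_HANDLER
{
  void (*fill)(CHARSET_INFO *cs, char *to, size_t len, int ch);
};

struct charset_info_st
{
  const MY_CHARSET_HANDLER *cset;

  // C++ method: hides cset and passes "this" as the first argument.
  void fill(char *to, size_t len, int ch) const
  {
    (cset->fill)(this, to, len, ch);
  }
};

// Pure C style wrapper: hides cset, names the CHARSET_INFO only once.
static inline void
my_ci_fill(CHARSET_INFO *cs, char *to, size_t len, int ch)
{
  (cs->cset->fill)(cs, to, len, ch);
}

// Hypothetical handler for a single-byte character set.
static void demo_fill_8bit(CHARSET_INFO *, char *to, size_t len, int ch)
{
  memset(to, ch, len);
}

static const MY_CHARSET_HANDLER demo_handler= { demo_fill_8bit };
static const charset_info_st demo_charset= { &demo_handler };

enum call_style { OLD_NOTATION, C_WRAPPER, CXX_METHOD };

// Pads key with spaces up to width bytes, using the requested notation.
std::string pad_with(call_style style, std::string key, size_t width)
{
  CHARSET_INFO *cs= &demo_charset;
  size_t length= key.size();
  if (width <= length)
    return key;                       // nothing to pad
  key.resize(width);
  char *pos= &key[0];
  switch (style)
  {
  case OLD_NOTATION:
    cs->cset->fill(cs, pos + length, width - length, ' ');
    break;
  case C_WRAPPER:
    my_ci_fill(cs, pos + length, width - length, ' ');
    break;
  case CXX_METHOD:
    cs->fill(pos + length, width - length, ' ');
    break;
  }
  return key;
}
```

All three branches end up in the same handler, so they should produce identical output. Only the first one has to know that fill lives behind cset.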
The new notation is better: it does not contain sequences like cs->cset-> and cs->coll->, and the CHARSET_INFO parameter is mentioned only once instead of twice. As a result, the new style of the caller code:
- is shorter
- is less bug-prone
- is future-proof: it won't change if we change the structure of CHARSET_INFO, e.g. by decomposing it into smaller pieces responsible for character set and collation properties. Only the wrapper functions and methods will change; the caller code will remain the same.
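The future-proofing argument can be illustrated with a rough sketch. It assumes two simplified layouts: layout_v1 mirrors the current cset pointer, while layout_v2 is a purely hypothetical decomposition that moves the handler behind an extra charset_part_st level. None of these names, nor demo_fill and pad_key, come from the actual source. The caller, pad_key, is written once and compiles unchanged against both layouts:

```cpp
#include <stddef.h>
#include <string.h>
#include <string>

// Shared hypothetical implementation: fills len bytes with ch.
static void demo_fill(char *to, size_t len, int ch) { memset(to, ch, len); }

namespace layout_v1   // handler reached directly through cset
{
  struct charset_info_st;
  struct MY_CHARSET_HANDLER
  {
    void (*fill)(const charset_info_st *cs, char *to, size_t len, int ch);
  };
  struct charset_info_st
  {
    const MY_CHARSET_HANDLER *cset;
    void fill(char *to, size_t len, int ch) const
    {
      (cset->fill)(this, to, len, ch);
    }
  };
  static void fill_impl(const charset_info_st *, char *to, size_t len, int ch)
  {
    demo_fill(to, len, ch);
  }
  static const MY_CHARSET_HANDLER handler= { fill_impl };
  static const charset_info_st cs= { &handler };
}

namespace layout_v2   // hypothetical decomposition: one more indirection
{
  struct charset_info_st;
  struct MY_CHARSET_HANDLER
  {
    void (*fill)(const charset_info_st *cs, char *to, size_t len, int ch);
  };
  struct charset_part_st              // hypothetical "character set" piece
  {
    const MY_CHARSET_HANDLER *cset;
  };
  struct charset_info_st
  {
    const charset_part_st *charset_part;
    // Only the method body changes, not its signature.
    void fill(char *to, size_t len, int ch) const
    {
      (charset_part->cset->fill)(this, to, len, ch);
    }
  };
  static void fill_impl(const charset_info_st *, char *to, size_t len, int ch)
  {
    demo_fill(to, len, ch);
  }
  static const MY_CHARSET_HANDLER handler= { fill_impl };
  static const charset_part_st part= { &handler };
  static const charset_info_st cs= { &part };
}

// Caller code: identical source for both layouts.
template <class CharsetInfo>
std::string pad_key(const CharsetInfo *cs, std::string key, size_t width)
{
  size_t length= key.size();
  if (width <= length)
    return key;                       // nothing to pad
  key.resize(width);
  cs->fill(&key[0] + length, width - length, ' ');
  return key;
}
```

A caller written in the old cs->cset->fill(cs, ...) notation would not compile against layout_v2, whereas pad_key needs no change.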
Issue Links
- blocks
  - MDEV-8334 Rename utf8 to utf8mb3 (Closed)
  - MDEV-21504 Collation: Create shared library for engines to use (Closed)