[MDEV-30695] Refactor case folding data types in Asian collation - Jira

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.11.3, 11.0.1, 10.10.4
Component/s: Character Sets
Labels: None

Description

Case folding tables are stored in the following structures:

typedef struct unicase_info_char_st
{
  uint32 toupper;
  uint32 tolower;
  uint32 sort;
} MY_UNICASE_CHARACTER;

struct unicase_info_st
{
  my_wc_t maxchar;
  MY_UNICASE_CHARACTER **page;
};

The member MY_UNICASE_CHARACTER::sort is not used by the underlying code in Asian collations.

This member is only used by Unicode _general_ci collations. For other collations (Asian collations, Unicode UCA collations, Unicode _bin collations) the member MY_UNICASE_CHARACTER::sort only wastes memory.

It would be good to refactor the code so that those tables do not waste space.
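To make the wasted space concrete, here is a minimal sketch that compares the two entry layouts. It is not taken from the server source: uint32_t stands in for uint32, the names ending in _sketch are hypothetical, and the 256-entry page size is an assumption that this issue does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the three-member entry quoted in the description. */
typedef struct
{
  uint32_t toupper;
  uint32_t tolower;
  uint32_t sort;      /* only needed by Unicode _general_ci collations */
} unicase_char_sketch;

/* Stand-in for an entry without the sort member. */
typedef struct
{
  uint32_t toupper;
  uint32_t tolower;
} casefold_char_sketch;

/* Assumption: every page holds 256 entries. */
enum { SKETCH_PAGE_ENTRIES = 256 };

/* Bytes saved per page when the sort member is dropped. */
size_t sketch_bytes_saved_per_page(void)
{
  assert(sizeof(unicase_char_sketch) > sizeof(casefold_char_sketch));
  return SKETCH_PAGE_ENTRIES *
         (sizeof(unicase_char_sketch) - sizeof(casefold_char_sketch));
}
```

Under these assumptions each entry shrinks from 12 to 8 bytes, so a collation that never reads sort saves one third of every page it stores.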

In MDEV-30577 we are going to introduce new case folding tables for Unicode-14.0.0 collations soon, so it would be good to do this refactoring before MDEV-30577.

Let's add new data types to store case folding information:

typedef struct casefold_info_char_t
{
  uint32 toupper;
  uint32 tolower;
} MY_CASEFOLD_CHARACTER;

struct casefold_info_st
{
  my_wc_t maxchar;
  MY_CASEFOLD_CHARACTER **page;
};

and change all Asian collations to store their case folding tables using the new data types.
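As an illustration of how a two-level table of this shape can be consulted, the sketch below looks up the upper and lower case form of a code point. It is an assumption-laden example, not the server's code: all SKETCH_ and sketch_ names are hypothetical, uint32_t stands in for both uint32 and my_wc_t, pages are assumed to hold 256 entries selected by the high bits of the code point, and the single demo page maps ASCII letters only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch compiles outside the server tree. */
typedef uint32_t sketch_wc_t;                 /* stands in for my_wc_t */

typedef struct
{
  uint32_t toupper;
  uint32_t tolower;
} SKETCH_CASEFOLD_CHARACTER;

typedef struct
{
  sketch_wc_t maxchar;                        /* highest code point covered */
  const SKETCH_CASEFOLD_CHARACTER **page;     /* NULL page: no case mapping */
} SKETCH_CASEFOLD_INFO;

/* Assumption: the high bits of the code point select a 256-entry page. */
static const SKETCH_CASEFOLD_CHARACTER *
sketch_find(const SKETCH_CASEFOLD_INFO *cf, sketch_wc_t wc)
{
  const SKETCH_CASEFOLD_CHARACTER *page;
  assert(cf != NULL && cf->page != NULL);
  if (wc > cf->maxchar)
    return NULL;
  page = cf->page[wc >> 8];
  return page ? &page[wc & 0xFF] : NULL;
}

/* Characters without an entry map to themselves. */
sketch_wc_t sketch_toupper(const SKETCH_CASEFOLD_INFO *cf, sketch_wc_t wc)
{
  const SKETCH_CASEFOLD_CHARACTER *ch = sketch_find(cf, wc);
  return ch ? ch->toupper : wc;
}

sketch_wc_t sketch_tolower(const SKETCH_CASEFOLD_INFO *cf, sketch_wc_t wc)
{
  const SKETCH_CASEFOLD_CHARACTER *ch = sketch_find(cf, wc);
  return ch ? ch->tolower : wc;
}

/* Demo data only: one page for U+0000..U+00FF with ASCII letter mappings. */
static SKETCH_CASEFOLD_CHARACTER sketch_page0[256];
static const SKETCH_CASEFOLD_CHARACTER *sketch_pages[1] = { sketch_page0 };

const SKETCH_CASEFOLD_INFO *sketch_ascii_info(void)
{
  static const SKETCH_CASEFOLD_INFO info = { 0xFF, sketch_pages };
  uint32_t i;
  for (i = 0; i < 256; i++)
  {
    sketch_page0[i].toupper = (i >= 'a' && i <= 'z') ? i - 32 : i;
    sketch_page0[i].tolower = (i >= 'A' && i <= 'Z') ? i + 32 : i;
  }
  return &info;
}
```

Calling sketch_toupper(sketch_ascii_info(), 'a') returns 'A', while a code point above maxchar is returned unchanged. No sort member is involved anywhere in the lookup, which is why dropping it does not affect case folding.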

Note that some or all Unicode collations will also be changed to use the new data types, but later, under a separate task.

Issue Links

blocks: MDEV-30577 Case folding for uca1400 collations is not up to date (Closed)

People

Assignee: Alexander Barkov

Reporter: Alexander Barkov

Votes: 0

Watchers: 1

Dates

Created: 2023-02-21 10:02

Updated: 2023-04-18 07:29

Resolved: 2023-02-21 10:40

MariaDB Server