[MDEV-30695] Refactor case folding data types in Asian collation Created: 2023-02-21  Updated: 2023-04-18  Resolved: 2023-02-21

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 10.11.3, 11.0.1, 10.10.4

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-30577 Case folding for uca1400 collations i... Closed

 Description   

Case folding tables are stored in the following structures:

typedef struct unicase_info_char_st
{
  uint32 toupper;
  uint32 tolower;
  uint32 sort;
} MY_UNICASE_CHARACTER;
 
struct unicase_info_st
{
  my_wc_t maxchar;
  MY_UNICASE_CHARACTER **page;
};
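To make the two-level layout concrete, here is a minimal sketch of how such a table could be consulted for case conversion. It assumes, as an illustration rather than a quote of the server code, that `page` holds 256 pointers indexed by the high byte of the code point (`wc >> 8`), that each non-NULL page holds 256 entries indexed by the low byte (`wc & 0xFF`), and that a NULL page or a code point above `maxchar` means "no case mapping". The `uint32`/`my_wc_t` typedefs, the `sketch_*` functions and the sample data are hypothetical stand-ins. Note that the lookups read only `toupper` and `tolower`; `sort` is filled in the sample but never read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the server's own typedefs. */
typedef uint32_t uint32;
typedef unsigned long my_wc_t;

typedef struct unicase_info_char_st
{
  uint32 toupper;
  uint32 tolower;
  uint32 sort;
} MY_UNICASE_CHARACTER;

struct unicase_info_st
{
  my_wc_t maxchar;
  MY_UNICASE_CHARACTER **page;
};

/*
  Assumed lookup scheme: page[wc >> 8] is either NULL (no mappings on
  that page) or points to 256 entries indexed by wc & 0xFF.
*/
static const MY_UNICASE_CHARACTER *
unicase_entry(const struct unicase_info_st *uni, my_wc_t wc)
{
  const MY_UNICASE_CHARACTER *p;
  if (wc > uni->maxchar)
    return NULL;                      /* beyond the table */
  if ((p= uni->page[wc >> 8]) == NULL)
    return NULL;                      /* page without case mappings */
  return &p[wc & 0xFF];
}

static my_wc_t sketch_toupper(const struct unicase_info_st *uni, my_wc_t wc)
{
  const MY_UNICASE_CHARACTER *e= unicase_entry(uni, wc);
  return e ? e->toupper : wc;         /* unmapped characters stay as-is */
}

static my_wc_t sketch_tolower(const struct unicase_info_st *uni, my_wc_t wc)
{
  const MY_UNICASE_CHARACTER *e= unicase_entry(uni, wc);
  return e ? e->tolower : wc;
}

/* Hypothetical sample data: only page 0 exists, with ASCII a-z/A-Z. */
static MY_UNICASE_CHARACTER sample_page00[256];
static MY_UNICASE_CHARACTER *sample_pages[256];  /* pages 1..255 stay NULL */

static struct unicase_info_st sample_unicase_init(void)
{
  struct unicase_info_st uni= { 0xFFFF, sample_pages };
  /* Identity mappings first; 'sort' is filled in but never read above. */
  for (uint32 c= 0; c < 256; c++)
    sample_page00[c].toupper= sample_page00[c].tolower= sample_page00[c].sort= c;
  for (uint32 c= 'a'; c <= 'z'; c++)
  {
    sample_page00[c].toupper= c - ('a' - 'A');
    sample_page00[c - ('a' - 'A')].tolower= c;
  }
  sample_pages[0]= sample_page00;
  assert(uni.maxchar < 256 * 256);    /* two-level index covers maxchar */
  return uni;
}
```

Code points on missing pages (for example U+4E00 here) and code points above `maxchar` are returned unchanged.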

The member MY_UNICASE_CHARACTER::sort is not used by the underlying code in Asian collations.

This member is used only by the Unicode _general_ci collations. For all other collations (Asian collations, Unicode UCA collations, Unicode _bin collations), MY_UNICASE_CHARACTER::sort just wastes memory.

It would be good to refactor the code so that these tables do not waste space.
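The size of the waste can be estimated with simple arithmetic. Assuming 256 entries per page (an assumption about the page size, not stated in this issue) and no struct padding, each populated page shrinks from 256 × 12 = 3072 bytes to 256 × 8 = 2048 bytes, saving 1024 bytes, or one third. The sketch below computes this; `PAGE_ENTRIES` and `bytes_saved` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uint32;   /* hypothetical stand-in for the server typedef */

/* Current entry: three 32-bit fields. */
typedef struct unicase_info_char_st
{
  uint32 toupper;
  uint32 tolower;
  uint32 sort;
} MY_UNICASE_CHARACTER;

/* Proposed entry: the 'sort' field is dropped. */
typedef struct casefold_info_char_t
{
  uint32 toupper;
  uint32 tolower;
} MY_CASEFOLD_CHARACTER;

static_assert(sizeof(MY_CASEFOLD_CHARACTER) < sizeof(MY_UNICASE_CHARACTER),
              "dropping 'sort' must shrink each entry");

/* Assumption: each page holds 256 entries, one per low byte. */
#define PAGE_ENTRIES 256

/* Bytes saved for a table with 'npages' populated (non-NULL) pages. */
static size_t bytes_saved(size_t npages)
{
  return npages * PAGE_ENTRIES *
         (sizeof(MY_UNICASE_CHARACTER) - sizeof(MY_CASEFOLD_CHARACTER));
}
```

Only populated pages count, since NULL page pointers cost the same with either entry type.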

MDEV-30577 will soon introduce new case folding tables for the Unicode 14.0.0 collations, so it is better to do this refactoring before MDEV-30577.

Let's add new data types to store case folding information:

typedef struct casefold_info_char_t
{
  uint32 toupper;
  uint32 tolower;
} MY_CASEFOLD_CHARACTER;
 
struct casefold_info_st
{
  my_wc_t maxchar;
  MY_CASEFOLD_CHARACTER **page;
};

and change all Asian collations to store their case folding tables using the new data types.
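As an illustration of the field mapping only (this issue does not say how the existing tables are actually rewritten), here is a hypothetical helper that builds a new-style page from an old-style one by copying `toupper` and `tolower` and dropping `sort`, plus a case conversion lookup that works directly on the new types. The names `casefold_page_from_unicase` and `casefold_toupper`, the 256-entry page size and the `uint32`/`my_wc_t` typedefs are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the server's own typedefs. */
typedef uint32_t uint32;
typedef unsigned long my_wc_t;

typedef struct unicase_info_char_st
{
  uint32 toupper;
  uint32 tolower;
  uint32 sort;
} MY_UNICASE_CHARACTER;

typedef struct casefold_info_char_t
{
  uint32 toupper;
  uint32 tolower;
} MY_CASEFOLD_CHARACTER;

struct casefold_info_st
{
  my_wc_t maxchar;
  MY_CASEFOLD_CHARACTER **page;
};

/* Assumption: 256 entries per page, indexed by the low byte of wc. */
#define PAGE_ENTRIES 256

/* Hypothetical: copy one page's case mappings, dropping 'sort'. */
static void casefold_page_from_unicase(MY_CASEFOLD_CHARACTER *dst,
                                       const MY_UNICASE_CHARACTER *src)
{
  assert(dst != NULL && src != NULL);
  for (size_t i= 0; i < PAGE_ENTRIES; i++)
  {
    dst[i].toupper= src[i].toupper;
    dst[i].tolower= src[i].tolower;
    /* src[i].sort is intentionally not copied */
  }
}

/* Case conversion needs only toupper/tolower, so the new type suffices. */
static my_wc_t casefold_toupper(const struct casefold_info_st *cf, my_wc_t wc)
{
  const MY_CASEFOLD_CHARACTER *p;
  if (wc > cf->maxchar || (p= cf->page[wc >> 8]) == NULL)
    return wc;                        /* no mapping: unchanged */
  return p[wc & 0xFF].toupper;
}
```

Because the lookup never touched `sort`, code that only converts case keeps the same shape after the switch; only the element type of the pages changes.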

Note: some or all Unicode collations will also be modified to use the new data types later, as part of a separate task.


Generated at Thu Feb 08 10:18:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.