[MDEV-23301] my_tosort_unicode: unnecessary max_char check while utf8-mb3 processing Created: 2020-07-27  Updated: 2021-05-11

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Georgy Kirichenko Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, unicode


 Description   

The my_tosort_unicode function checks every input character against the uni_plane maxchar value (strings/ctype-utf8.c):

static inline void
my_tosort_unicode(MY_UNICASE_INFO *uni_plane, my_wc_t *wc, uint flags)
{
  if (*wc <= uni_plane->maxchar)
  {
    MY_UNICASE_CHARACTER *page;
    if ((page= uni_plane->page[*wc >> 8]))
      *wc= (flags & MY_CS_LOWER_SORT) ?
           page[*wc & 0xFF].tolower :
           page[*wc & 0xFF].sort;
  }
  else
  {
    *wc= MY_CS_REPLACEMENT_CHARACTER;
  }
}

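But utf8-mb3 can only produce code points that fit in 2 bytes: its longest sequence is 3 bytes, which carries 4 + 6 + 6 = 16 payload bits, so a decoded character never exceeds 0xFFFF (65535). There are no uni planes with maxchar less than 65535, so the else branch is unreachable and the check is not required for utf8-mb3.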

Removing the check results in a small performance gain (tested on amd64 and aarch64 with the sysbench read-only test).
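
A minimal sketch of the idea, not the actual patch: the types and names below (wc_t, unicase_character, unicase_info, tosort_checked, tosort_bmp, demo_plane) are simplified, hypothetical stand-ins for my_wc_t, MY_UNICASE_CHARACTER, MY_UNICASE_INFO and my_tosort_unicode, and the demo table covers only ASCII letters. It assumes the caller guarantees that the character came from a utf8-mb3 decoder, and states that assumption as a debug-only assert.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the server types. */
typedef uint32_t wc_t;                      /* stands in for my_wc_t */

typedef struct {
  uint32_t toupper, tolower, sort;
} unicase_character;                        /* stands in for MY_UNICASE_CHARACTER */

typedef struct {
  wc_t maxchar;
  const unicase_character **page;           /* 256 pages, NULL means "no mapping" */
} unicase_info;                             /* stands in for MY_UNICASE_INFO */

#define LOWER_SORT 1u                       /* stands in for MY_CS_LOWER_SORT */
#define REPLACEMENT_CHARACTER 0xFFFDu       /* assumed replacement code point */

/* Same shape as the function quoted above: range check first. */
static inline void
tosort_checked(const unicase_info *uni, wc_t *wc, unsigned flags)
{
  if (*wc <= uni->maxchar)
  {
    const unicase_character *page= uni->page[*wc >> 8];
    if (page)
      *wc= (flags & LOWER_SORT) ? page[*wc & 0xFF].tolower
                                : page[*wc & 0xFF].sort;
  }
  else
    *wc= REPLACEMENT_CHARACTER;
}

/* Variant without the range check. Valid only if *wc <= 0xFFFF and
   maxchar >= 0xFFFF, so *wc >> 8 always indexes one of the 256 pages. */
static inline void
tosort_bmp(const unicase_info *uni, wc_t *wc, unsigned flags)
{
  const unicase_character *page;
  assert(*wc <= 0xFFFF && uni->maxchar >= 0xFFFF); /* compiled out with NDEBUG */
  page= uni->page[*wc >> 8];
  if (page)
    *wc= (flags & LOWER_SORT) ? page[*wc & 0xFF].tolower
                              : page[*wc & 0xFF].sort;
}

/* Tiny demo plane: page 0 maps ASCII letters, all other pages are NULL. */
static const unicase_info *
demo_plane(void)
{
  static unicase_character page0[256];
  static const unicase_character *pages[256];
  static unicase_info info;
  uint32_t c;
  for (c= 0; c < 256; c++)
    page0[c].toupper= page0[c].tolower= page0[c].sort= c;
  for (c= 'a'; c <= 'z'; c++)
  {
    page0[c].toupper= page0[c].sort= c - 32;  /* 'a' sorts as 'A' */
    page0[c - 32].tolower= c;                 /* 'A' lowers to 'a' */
  }
  pages[0]= page0;
  info.maxchar= 0xFFFF;
  info.page= pages;
  return &info;
}
```

For any input up to 0xFFFF the two variants give the same result; they differ only for larger values, which the sketch assumes a utf8-mb3 decoder cannot produce.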

