[MDEV-21816] Suboptimal implementation of my_convert() for ARM64 - Jira

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Character Sets
Labels:
- ARMv8
- performance
Environment:
ARM64

Description

my_convert() in strings/ctype.c has this special optimization for i386 and x86_64:

#if defined(__i386__) || defined(__x86_64__)
  /*
    Special loop for i386, it allows to refer to a
    non-aligned memory block as UINT32, which makes
    it possible to copy four bytes at once. This
    gives about 10% performance improvement comparing
    to byte-by-byte loop.
  */
  for ( ; length >= 4; length-= 4, from+= 4, to+= 4)
  {
    if ((*(uint32*)from) & 0x80808080)
      break;
    *((uint32*) to)= *((const uint32*) from);
  }
#endif /* __i386__ */

... /* Unoptimized bytewise processing goes here */

Two things can be improved about that code:

1. 64-bit architectures like x86_64 could be optimized even further by processing 8 bytes at a time;
2. Other 64-bit architectures like aarch64 could also benefit from the same optimization, rather than processing the input byte by byte.

In our case we see a few percent improvement in CPU-bound sysbench OLTP RO on AArch64, which is not too bad for such a simple optimization.
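
The 8-byte variant proposed in point 1 could look roughly like the sketch below. It is only an illustration, not the patch used for the benchmark: copy_ascii_prefix8 is a hypothetical helper name, and memcpy() is used for the loads and stores as an assumption, so that the code does not rely on the target permitting unaligned 64-bit access (compilers typically turn such fixed-size memcpy() calls into single load/store instructions).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
  Hypothetical helper, not part of strings/ctype.c: copies the leading
  run of 7-bit ASCII bytes from 'from' to 'to', eight bytes per step,
  and returns the number of bytes copied.
*/
static size_t copy_ascii_prefix8(char *to, const char *from, size_t length)
{
  size_t done= 0;
  for ( ; length - done >= 8; done+= 8)
  {
    uint64_t chunk;
    memcpy(&chunk, from + done, sizeof(chunk));  /* unaligned-safe load */
    if (chunk & UINT64_C(0x8080808080808080))
      break;                                     /* some byte has bit 7 set */
    memcpy(to + done, &chunk, sizeof(chunk));    /* unaligned-safe store */
  }
  /* Bytewise tail: copy ASCII bytes up to the first high-bit byte */
  for ( ; done < length && !((unsigned char) from[done] & 0x80); done++)
    to[done]= from[done];
  return done;
}
```

In my_convert() the caller would continue with the existing bytewise conversion loop from the returned offset, just as it does today after the 4-byte loop.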

Activity

Alexey Kopytov added a comment - 2020-02-25 15:00

It's worth mentioning that improvements in benchmark numbers are seen with default-character-set=utf8.

Daniel Black added a comment - 2020-10-02 04:39

hmm. pretty sure everything allows unaligned access now. I remember it being a problem on ppc64le too.

I'm thinking a roll to 8 bytes, maybe with a preloop to get it aligned, because afaik aligned access is still a little bit faster and on a string this could be biggish. Then let the compiler handle the 32-bit case.

FYI krunalbauskar
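
A rough sketch of the preloop idea from the comment above, under stated assumptions: copy_ascii_aligned8 is a hypothetical name, only the source pointer is aligned (the destination may still be unaligned), and memcpy() is kept for the 8-byte accesses so the code stays valid C regardless of alignment. Whether the preloop actually pays off would need to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper; illustrates the alignment preloop only. */
static size_t copy_ascii_aligned8(char *to, const char *from, size_t length)
{
  size_t done= 0;
  /* Preloop: go bytewise until the source address is a multiple of 8 */
  while (done < length && ((uintptr_t) (from + done) & 7) != 0)
  {
    if ((unsigned char) from[done] & 0x80)
      return done;                               /* non-ASCII byte: stop */
    to[done]= from[done];
    done++;
  }
  /* Main loop: the source loads are now 8-byte aligned */
  for ( ; length - done >= 8; done+= 8)
  {
    uint64_t chunk;
    memcpy(&chunk, from + done, sizeof(chunk));
    if (chunk & UINT64_C(0x8080808080808080))
      break;
    memcpy(to + done, &chunk, sizeof(chunk));
  }
  /* Tail: remaining ASCII bytes, one at a time */
  for ( ; done < length && !((unsigned char) from[done] & 0x80); done++)
    to[done]= from[done];
  return done;
}
```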

People

Assignee: Unassigned

Reporter: Alexey Kopytov

Votes: 0

Watchers: 4

Dates

Created: 2020-02-25 11:49

Updated: 2020-10-02 04:40

MariaDB Server