Details
Bug
Status: Open
Minor
Resolution: Unresolved
None
None
ARM64
Description
my_convert() in strings/ctype.c has this special optimization for i386 and x86_64:
#if defined(__i386__) || defined(__x86_64__)
  /*
    Special loop for i386, it allows to refer to a
    non-aligned memory block as UINT32, which makes
    it possible to copy four bytes at once. This
    gives about 10% performance improvement comparing
    to byte-by-byte loop.
  */
  for ( ; length >= 4; length-= 4, from+= 4, to+= 4)
  {
    if ((*(uint32*)from) & 0x80808080)
      break;
    *((uint32*) to)= *((const uint32*) from);
  }
#endif /* __i386__ */

  ... /* Unoptimized bytewise processing goes here */
Two things can be improved about that code:
1. 64-bit architectures like x86_64 could be optimized even further by processing 8 bytes at a time;
2. Other 64-bit architectures like aarch64 could also benefit from the same optimization instead of processing the input byte by byte.
In our case we see a few percent improvement in CPU-bound sysbench OLTP RO on AArch64, which is not too bad for such a simple optimization.
It's worth mentioning that improvements in benchmark numbers are seen with default-character-set=utf8.
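For illustration, the two suggestions above could be combined roughly as in the sketch below. This is not the actual patch: copy_ascii_prefix() is a hypothetical standalone helper, not a function in strings/ctype.c, and it uses memcpy() for the 8-byte loads and stores instead of the pointer cast from the existing code, on the assumption that the compiler turns a fixed-size memcpy() into a single load or store on targets that allow unaligned access. The mask 0x8080808080808080 is the 8-byte analogue of 0x80808080: a set high bit in any byte means a non-ASCII byte is present.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
  Hypothetical helper (not part of strings/ctype.c): copy the leading
  pure-ASCII part of "from" into "to" and return the number of bytes copied.
  The caller would continue with the regular conversion from that offset.
*/
static size_t copy_ascii_prefix(char *to, const char *from, size_t length)
{
  size_t copied= 0;

  /* Fast path: eight bytes per iteration, no architecture #if needed */
  for ( ; length - copied >= 8; copied+= 8)
  {
    uint64_t word;
    memcpy(&word, from + copied, sizeof(word));   /* unaligned-safe load */
    if (word & UINT64_C(0x8080808080808080))      /* any byte >= 0x80? */
      break;
    memcpy(to + copied, &word, sizeof(word));     /* unaligned-safe store */
  }

  /* Tail: bytewise until the first non-ASCII byte or the end of input */
  for ( ; copied < length && (unsigned char) from[copied] < 0x80; copied++)
    to[copied]= from[copied];

  return copied;
}
```

Because memcpy() avoids the alignment and aliasing assumptions of the cast, the same loop could in principle be compiled for any architecture; whether it is actually faster on a given target would still have to be measured.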