my_convert() in strings/ctype.c has this special optimization for i386 and x86_64:
Two things can be improved about that code:
1. 64-bit architectures like x86_64 could be optimized even further by processing 8 bytes at a time;
2. Other 64-bit architectures like AArch64 could also benefit from the same optimization, rather than processing the input byte by byte.
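
To illustrate both points, here is a minimal, architecture-independent sketch of an 8-bytes-at-a-time ASCII fast path. It is not the actual my_convert() code: the helper name copy_ascii_prefix() and its signature are made up for this example, and it assumes that a chunk with no high bit set in any byte can be copied verbatim. memcpy() is used for the loads and stores to avoid unaligned access problems; compilers typically turn it into a single 64-bit load/store on x86_64 and AArch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
  Hypothetical helper, not the actual my_convert() code: copies the leading
  run of 7-bit ASCII bytes from `from` to `to` and returns how many bytes
  were copied. A caller would convert the rest character by character.
*/
static size_t copy_ascii_prefix(char *to, const char *from, size_t n)
{
  size_t done= 0;
  assert(to != NULL && from != NULL);

  /* Fast path: one 8-byte load/store while no byte has its high bit set */
  while (n - done >= 8)
  {
    uint64_t chunk;
    memcpy(&chunk, from + done, sizeof(chunk));  /* alignment-safe load */
    if (chunk & UINT64_C(0x8080808080808080))
      break;                                     /* non-ASCII byte in chunk */
    memcpy(to + done, &chunk, sizeof(chunk));    /* alignment-safe store */
    done+= 8;
  }

  /* Tail: copy the remaining ASCII bytes one at a time */
  while (done < n && !((unsigned char) from[done] & 0x80))
  {
    to[done]= from[done];
    done++;
  }
  return done;
}
```

Since nothing in the sketch is specific to x86, the same loop could be enabled on any 64-bit target, with the existing 4-byte variant presumably kept for 32-bit builds.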
In our case we see a few percent improvement in a CPU-bound sysbench OLTP read-only workload on AArch64, which is not too bad for such a simple optimization.