[MDEV-21698] optimize implementations of byte load/store macros for non-X86 little-endian architectures Created: 2020-02-10 Updated: 2021-03-08 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Compiling |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Alexey Kopytov | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | ARMv8 | ||
| Description |
|
The integer load/store macros defined in include/my_byteorder.h have different implementations for x86[_64] (see include/byte_order_generic_x86.h and include/byte_order_generic_x86_64.h) and for all other architectures (include/byte_order_generic.h). This is unfortunate, because it discriminates against little-endian architectures other than x86, in particular ARM64. The distinction should really be between big-endian and little-endian architectures. That is how it is currently implemented in MySQL 8.0, where most legacy *int*korr() and int*store() macros have been replaced with memcpy()-based inline functions (see https://github.com/mysql/mysql-server/commit/536ea313a6a71f9ed87f14d95e03e04e40ff5605 ). The rationale for using memcpy() looks a little inconclusive to me, but it works almost fine, i.e. the compiler is usually smart enough to convert memcpy() into the most efficient implementation on all little-endian architectures. This is a request to:
I'm not yet sure if I'm allowed to share benchmark results and contribute code. But I'm attaching a Godbolt link as a testcase that demonstrates the optimization opportunities. Just build it and run on any available ARM64 and X86 machines: |
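For readers without access to the attached testcase, here is a minimal sketch of the two styles contrasted in the description. It is not the Godbolt testcase and not the MariaDB or MySQL code: the names load_u32_bytewise, store_u32_bytewise, load_u32 and store_u32 are hypothetical. It assumes a GCC- or Clang-compatible compiler that predefines __BYTE_ORDER__. The point it illustrates is that the fast path is selected by endianness rather than by CPU family.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names, for illustration only. */

/* Portable byte-by-byte little-endian load: correct on any host. */
static inline uint32_t load_u32_bytewise(const unsigned char *p)
{
  return (uint32_t) p[0] |
         ((uint32_t) p[1] << 8) |
         ((uint32_t) p[2] << 16) |
         ((uint32_t) p[3] << 24);
}

/* Portable byte-by-byte little-endian store: correct on any host. */
static inline void store_u32_bytewise(unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v & 0xFF);
  p[1] = (unsigned char) ((v >> 8) & 0xFF);
  p[2] = (unsigned char) ((v >> 16) & 0xFF);
  p[3] = (unsigned char) ((v >> 24) & 0xFF);
}

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
/* Any little-endian host (x86, ARM64, ...): the in-memory layout already
   matches, so a fixed-size memcpy() is enough. Compilers typically lower
   it to a single, possibly unaligned, load or store. */
static inline uint32_t load_u32(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

static inline void store_u32(unsigned char *p, uint32_t v)
{
  memcpy(p, &v, sizeof(v));
}
#else
/* Big-endian or unknown byte order: fall back to the byte-wise versions. */
static inline uint32_t load_u32(const unsigned char *p)
{
  return load_u32_bytewise(p);
}

static inline void store_u32(unsigned char *p, uint32_t v)
{
  store_u32_bytewise(p, v);
}
#endif
```

Comparing the generated assembly of load_u32() and load_u32_bytewise() at -O2 on ARM64 and x86_64 is one way to check whether a given compiler already folds the byte-wise form into a single load.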
| Comments |
| Comment by Sergey Vojtovich [ 2020-02-10 ] |
|
I think we had some PR by Eugene Kosov (https://github.com/MariaDB/server/pull/1232), which was declined by Monty. We should definitely do some extra research here. |