[MDEV-19709] Bitmap<128>::merge etc may crash on older GCC versions Created: 2019-06-07  Updated: 2019-06-19  Resolved: 2019-06-11

Status: Closed
Project: MariaDB Server
Component/s: Compiling
Affects Version/s: 10.4
Fix Version/s: 10.4.6

Type: Bug Priority: Major
Reporter: Vladislav Vaintroub Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-19734 investigate the performance effects o... Open

 Description   

Older GCC incorrectly optimizes Bitmap<128> code by using SSE instructions on unaligned data.
Analysis, by vlad.lesin

The compiler error was found on Ubuntu 16.04 and CentOS 6/7.
To optimize access to Bitmap<128>::buffer[], the compiler uses SSE instructions.
For example, the following C++ code:

-------------
void intersect(Bitmap & map2)
{
  for (uint i = 0; i < array_elements(buffer); i++)
    buffer[i] &= map2.buffer[i];
}

-------------

is compiled into the following instructions:

-------------
movdqu xmm0,XMMWORD PTR [r12+0x28]
pand xmm0,XMMWORD PTR [rbx+0x28]
-------------

The second (memory) operand of the 'pand' instruction must be aligned to
16 bytes, otherwise an exception occurs. But the compiler generates the above
instruction with a non-aligned second operand:

--------------
p ($rbx+0x28)%16
$20 = 8
--------------
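
The same check can be expressed in C++ as a rough sketch. The helper names `misalignment` and `is_aligned` are made up for this illustration and are not part of the MariaDB sources; the function mirrors the `% 16` expression evaluated in the debugger above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many bytes 'p' lies past the previous
// 'boundary'-byte boundary. 'boundary' must be a power of two.
inline std::size_t misalignment(const void *p, std::size_t boundary)
{
  assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(p) % boundary;
}

// Hypothetical helper: true if 'p' could safely be used as the memory
// operand of an instruction that requires 'boundary'-byte alignment.
inline bool is_aligned(const void *p, std::size_t boundary)
{
  return misalignment(p, boundary) == 0;
}
```

For the address shown above, `misalignment(addr, 16)` would return 8, so `is_aligned(addr, 16)` would be false.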

There were also other instructions with a non-aligned memory operand, for
example, 'por'.



 Comments   
Comment by Sergei Golubchik [ 2019-06-07 ]

As far as I understand, there may be different ways to fix it: disable SSE (locally) with a pragma, or force alignment (with a pragma, an attribute, or by moving from uint32 to ulonglong, as in MDEV-19702).

I personally would try MDEV-19702 first, as it has value on its own, so if it helps here, so much the better.
If it doesn't, I'd try to see what a newer compiler does: alignment, no SSE, or something else. And I'd try to force the older compiler to do the same.

Comment by Vladislav Vaintroub [ 2019-06-07 ]

Serg, moving to ulonglong is not going to help much here, because the alignment required for SSE instruction operands (such as pand or por, which were the culprits) is not 64 bits but 128 bits.
We tried MY_ALIGNED(16) for the array, but this did not work out either, since the array may be inside some struct used by the optimizer that is not allocated on the stack, and allocators (whether malloc or custom ones) generally do not force 16-byte alignment. aligned_malloc() and friends could help.
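
A minimal sketch of why an alignment attribute on the array alone is not enough. It assumes MY_ALIGNED(16) behaves like standard `alignas(16)`; the types `AlignedBitmap` and `Holder` and the function `buffer_misalignment` are hypothetical stand-ins, not MariaDB code. The attribute fixes the offset of the array inside its enclosing struct, but the final address still depends on where the allocator places the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Bitmap<128> with a 16-byte aligned array.
struct AlignedBitmap
{
  alignas(16) std::uint32_t buffer[4];
};

// Hypothetical enclosing struct, e.g. something the optimizer allocates.
struct Holder
{
  std::uint64_t header;
  AlignedBitmap map;
};

// The compiler does its part: the type asks for 16-byte alignment and
// the bitmap sits at a multiple of 16 within the struct.
static_assert(alignof(Holder) == 16, "type requests 16-byte alignment");
static_assert(offsetof(Holder, map) % 16 == 0, "offset is a multiple of 16");

// Misalignment of Holder::map.buffer if the object is placed at
// 'object_address' by an allocator that only guarantees 8-byte alignment
// (an assumption made for this illustration).
inline std::size_t buffer_misalignment(std::uintptr_t object_address)
{
  assert(object_address % 8 == 0);
  return (object_address + offsetof(Holder, map)) % 16;
}
```

With an object placed at an address that is a multiple of 8 but not of 16, the buffer ends up 8 bytes off, which matches the misalignment seen in the debugger output in the description.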

In the fix I apply the pragma only when __GNUC__ < 6, because gcc 6.3 worked fine, but gcc 5.4 did not.
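
The ticket does not quote the pragma itself, so the following is only a sketch of what a version-guarded workaround could look like. The macro name `BITMAP_NO_SSE_WORKAROUND`, the `Bitmap128` type, the `!defined(__clang__)` check and the choice of `#pragma GCC target("no-sse")` are assumptions for illustration; the actual fix may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: enable the workaround only for real GCC older than 6.
// Clang also defines __GNUC__, so it is excluded explicitly.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 6
#define BITMAP_NO_SSE_WORKAROUND 1   /* hypothetical macro name */
#pragma GCC push_options
#pragma GCC target("no-sse")         /* no SSE code in the functions below */
#endif

// Hypothetical stand-in for Bitmap<128>.
struct Bitmap128
{
  std::uint32_t buffer[4];

  void intersect(const Bitmap128 &map2)
  {
    for (std::size_t i = 0; i < sizeof(buffer) / sizeof(buffer[0]); i++)
      buffer[i] &= map2.buffer[i];
  }

  void merge(const Bitmap128 &map2)
  {
    for (std::size_t i = 0; i < sizeof(buffer) / sizeof(buffer[0]); i++)
      buffer[i] |= map2.buffer[i];
  }
};

#ifdef BITMAP_NO_SSE_WORKAROUND
#pragma GCC pop_options              /* restore the original target options */
#endif

// Small usage example: the results do not depend on the workaround.
inline void bitmap_workaround_demo()
{
  Bitmap128 a = {{0xF0u, 0xFFu, 0x00u, 0x01u}};
  Bitmap128 b = {{0x3Cu, 0x0Fu, 0xFFu, 0x01u}};
  a.intersect(b);
  assert(a.buffer[0] == 0x30u && a.buffer[1] == 0x0Fu);
  a.merge(b);
  assert(a.buffer[0] == 0x3Cu && a.buffer[2] == 0xFFu);
}
```

Scoping the pragma with push_options/pop_options keeps the restriction local to the bitmap code, so the rest of the translation unit can still be vectorized.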

Comment by Vladislav Vaintroub [ 2019-06-07 ]

As a side note, a month or so ago there was no sign of this compiler bug on the same compilers.
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.4-wlad-MDEV-19235

Something has changed since then: maybe the optimizer folks extended some structures, revealing the bug, or maybe something else caused the compiler to optimize too aggressively.

Comment by Vladislav Lesin [ 2019-06-07 ]

wlad, according to this documentation, https://mudongliang.github.io/x86/html/file_module_x86_id_230.html, the 'pand' memory operand must be aligned to 16, not to 128.

Comment by Vladislav Vaintroub [ 2019-06-07 ]

vlad.lesin Yes, this is what I'm saying, counting in bits.

Generated at Thu Feb 08 08:53:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.