[MDEV-12796] Bad DOUBLE returned when built with gcc -ffast-math [strings/dtoa.c issue?] Created: 2017-05-15  Updated: 2017-05-29  Resolved: 2017-05-29

Status: Closed
Project: MariaDB Server
Component/s: Compiling
Affects Version/s: 10.1.19, 10.1.22, 10.1.23
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Tech Magos
Assignee: Sergei Golubchik
Resolution: Won't Fix
Votes: 0
Labels: -ffast-math
Environment:

Linux RHEL 7.x ; also 6.x
HP 370 server
Cpu: Xeon E5 4620 ; also happens with many others

All versions of mariadb since 10.0 to current 10.1



 Description   

A Linux/gcc (4.4, 4.8 or 5.3) optimized build fails to return correct values for DOUBLE cells.

To repro:

1. make a gcc optimized build; flags -march=native -O3 -mavx -mfpmath=sse -msse2 -msse4 -fomit-frame-pointer -ffast-math

2. Create a table with a single DOUBLE field.

3. Insert a number like 132 billion (132000000000). Select it back and you will get non-numeric output (a ":" or other character will appear in the number returned).

The problem goes away when you build without the -ffast-math flag.

The problem appears to be in dtoa.c in strings/

I believe the code should work properly even when -ffast-math is passed.

As an aside, there are faster publicly available atod and dtoa implementations (as well as itoa and atoi) compared to the ones mariadb uses. Those could give a notable speedup for both server and client code and should be reviewed and considered. See e.g.:
https://github.com/miloyip/dtoa-benchmark



 Comments   
Comment by Elena Stepanova [ 2017-05-20 ]

MariaDB [test]> create table t1 (d double);
Query OK, 0 rows affected (0.54 sec)
 
MariaDB [test]> insert into t1 values (132000000000);
Query OK, 1 row affected (0.05 sec)
 
MariaDB [test]> select * from t1;
+--------------+
| d            |
+--------------+
| 131:00000000 |
+--------------+
1 row in set (0.00 sec)

cmake . -DCMAKE_C_FLAGS="-march=native -O3 -mavx -mfpmath=sse -msse2 -msse4 -fomit-frame-pointer --fast-math" -DCMAKE_CXX_FLAGS="-march=native -O3 -mavx -mfpmath=sse -msse2 -msse4 -fomit-frame-pointer --fast-math" && make -j6

$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
 
$ cmake --version
cmake version 3.0.2

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.7 (jessie)
Release:	8.7
Codename:	jessie

Note: I didn't check other versions.

Comment by Sergei Golubchik [ 2017-05-23 ]

I tend to say that it is not a bug. -ffast-math enables -funsafe-math-optimizations, which is documented as

Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards.
This option ... can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

And -funsafe-math-optimizations enables -freciprocal-math:

Allow the reciprocal of a value to be used instead of dividing by the value if this enables optimizations. For example `x / y' can be replaced with `x * (1/y)', which is useful if `(1/y)' is subject to common subexpression elimination. Note that this loses precision and increases the number of flops operating on the value.

And this is exactly what is happening. When dtoa is compiled without reciprocal math, the code is

	divsd	%xmm4, %xmm2    ; %xmm2 /= %xmm4

During execution, %xmm2 is 320000000000 and %xmm4 is 100000000000, so the result is 3.2.
When dtoa is compiled with reciprocal math, the code is

        movsd  0x35a41b(%rip),%xmm4     ; %xmm4 = 1.0
        divsd  %xmm6,%xmm4              ; %xmm4 /= %xmm6
        mulsd  %xmm4,%xmm2              ; %xmm2 *= %xmm4

where %xmm2 is 320000000000, and %xmm6 is 100000000000. This trick loses precision, as documented, and the result is 3.1999999999999997. While the digit 3 is still correctly appended to the result string, the next two digits are not 2 and 0, but 1 and 10 (that is, ':').

So, apparently, dtoa relies on the correct IEEE standard behavior and does not tolerate precision loss caused by reciprocal math optimization.
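
The two code paths above can be imitated in plain C without any special compiler flags. The following is a minimal sketch, not the actual strings/dtoa.c code: emit_digits and its parameters are hypothetical names, and the loop only mimics the divide, subtract, multiply-by-ten pattern described in this comment. It assumes IEEE 754 doubles evaluated in double precision (for example x86-64 with SSE2).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from dtoa.c): peel off one decimal digit
   per iteration by dividing by a power of ten. When use_reciprocal is
   nonzero, x / ds is replaced by x * (1 / ds), which is the rewrite that
   -freciprocal-math permits the compiler to make. */
static void emit_digits(double x, double ds, int ndigits,
                        int use_reciprocal, char *out, size_t outsize)
{
    /* volatile: force the reciprocal to be stored as a rounded double */
    volatile double recip = 1.0 / ds;
    int i = 0;

    assert(ndigits >= 0 && (size_t)ndigits < outsize);

    for (; i < ndigits && x != 0.0; i++) {
        double q = use_reciprocal ? x * recip : x / ds;
        long digit = (long)q;            /* truncates toward zero */
        out[i] = (char)('0' + digit);    /* a "digit" of 10 becomes ':' in ASCII */
        x -= (double)digit * ds;         /* remove the digit just emitted */
        x *= 10.0;                       /* shift the next digit into place */
    }
    for (; i < ndigits; i++)             /* pad with trailing zeros */
        out[i] = '0';
    out[ndigits] = '\0';
}
```

Under those assumptions, calling emit_digits(132000000000.0, 1e11, 12, 0, buf, sizeof buf) produces 132000000000, while the same call with use_reciprocal set to 1 produces 131:00000000, the value seen in the report.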

Comment by Tech Magos [ 2017-05-27 ]

mariadb is built and will be built with the -ffast-math flag (online forums reveal so too).
A number of packages recommend building with this flag (and are checked to work properly with it), as it provides a slight performance boost.

To avoid people hitting problems in the future, you could perhaps disable the relevant gcc optimization in dtoa (with a pragma)? Or, more aggressively, you could #error if the flag is passed (GCC sets a macro if this is enabled).


Unrelated to this bug, but related to dtoa.c: it may be worth spending some time researching an upgrade of the double<->string and int<->string conversion functions in mariadb. There is a large hidden optimization possible here for both server and client: number<->text conversions happen a lot across the text protocol.

Example code (among others on github): https://github.com/miloyip/dtoa-benchmark (which may also not have the -ffast-math issue)

Comment by Sergei Golubchik [ 2017-05-28 ]

What do you mean by “mariadb is built and will be built with the -ffast-math”? Our packages are not built with -ffast-math. Do you mean RHEL packages?

Checking __FAST_MATH__ is rather imprecise. dtoa cares about -freciprocal-math. One can enable only the latter, and __FAST_MATH__ won't be set. Or one can enable fast math, but disable reciprocal math.

I'll see if I can disable reciprocal math for dtoa.
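
One way this could look is sketched below. It is only an illustration under stated assumptions, not the change that was tried for MariaDB: leading_digit is a hypothetical stand-in for the division inside dtoa, the pragmas are GCC-specific (the guard skips them for clang), and GCC's documentation cautions against relying on per-function optimize settings in production code. Passing -fno-reciprocal-math for the one source file from the build system is likely the more robust variant.

```c
#include <assert.h>

/* Assumption: GCC only. Turn reciprocal math off for the functions
   between push_options and pop_options, whatever the command line says. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("no-reciprocal-math")
#endif

/* Hypothetical stand-in for the division in the digit loop. */
static long leading_digit(double x, double ds)
{
    assert(ds > 0.0);
    return (long)(x / ds);   /* must stay a true division */
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif

/* Coarse diagnostic only: __FAST_MATH__ is set by -ffast-math, but not
   when -freciprocal-math is enabled on its own. */
#ifdef __FAST_MATH__
#warning "built with -ffast-math; double-to-string output may be wrong"
#endif
```

The __FAST_MATH__ check is kept as a warning rather than an error because, as noted above, it neither catches every problematic configuration nor excludes every safe one.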

Yes, other dtoa implementations are certainly worth exploring. Just not within the scope of this issue.

Comment by Sergei Golubchik [ 2017-05-28 ]

I've found three issues so far. One is the one you reported:

  • dtoa.c needs to be compiled with -ftrapping-math

But there are more:

  • item_strfunc.cc needs to be compiled with -fsignaling-nans

And even that is not enough. It seems that isinf() does not work properly with -ffast-math, so MariaDB crashes on some tests, being unable to detect Inf correctly.
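
For context, -ffast-math also turns on -ffinite-math-only, which lets the compiler assume that Inf and NaN never occur, so a floating-point infinity test may be optimized away. A check on the raw bit pattern does not depend on those assumptions. The sketch below is illustrative only: bits_is_inf is a hypothetical name, it assumes a 64-bit IEEE 754 double, and it is not something MariaDB is known to use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: detect +Inf or -Inf from the bit pattern alone,
   assuming double is a 64-bit IEEE 754 binary64 value. */
static int bits_is_inf(double d)
{
    uint64_t u;

    assert(sizeof u == sizeof d);
    memcpy(&u, &d, sizeof u);                   /* well-defined type punning */
    u &= UINT64_C(0x7FFFFFFFFFFFFFFF);          /* drop the sign bit */
    return u == UINT64_C(0x7FF0000000000000);   /* exponent all ones, mantissa zero */
}
```

Because the comparison is done on integers, it keeps working even if the surrounding code is compiled with finite-math assumptions. It addresses only the Inf detection mentioned here, not the other issues listed in this comment.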

There may be more issues; I've run only a single test out of the more than a thousand tests that we have.

I think it just means — don't use -ffast-math for MariaDB. It doesn't work.

Comment by Sergei Golubchik [ 2017-05-29 ]

I don't see what we can do here, sorry.

Considering other dtoa implementations — yes, duly noted, thanks.
It's unrelated to closing this issue, though.
