[MDEV-15430] type_float.test floating point error clang-4 Created: 2018-02-27  Updated: 2020-10-11  Resolved: 2018-06-29

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.2, 10.3
Fix Version/s: 10.2.17, 10.2.18

Type: Bug Priority: Critical
Reporter: Vicențiu Ciorbaru Assignee: Teodor Mircea Ionita (Inactive)
Resolution: Not a Bug Votes: 1
Labels: None

Attachments: PNG File travis failing build.png    
Issue Links:
Blocks
blocks MDEV-15814 Travis-CI (consistent) failures for 1... Closed

 Description   

When compiling with clang 4.0.1 in Travis CI, we get the following errors:

main.type_datetime_hires 'innodb'        w5 [ fail ]
        Test ended at 2018-02-26 16:38:32
 
CURRENT_TEST: main.type_datetime_hires
--- /home/travis/build/ottok/mariadb/mysql-test/r/type_datetime_hires.result	2018-02-26 16:10:37.771159137 +0000
+++ /home/travis/build/ottok/mariadb/mysql-test/r/type_datetime_hires.reject	2018-02-26 16:38:32.047346337 +0000
@@ -15,14 +15,14 @@
 0000-00-00 00:00:00.000
 2010-12-11 00:20:03.123
 2010-12-11 01:02:03.456
-2010-12-11 03:04:05.789
+2010-12-11 03:04:05.785
 2010-12-11 15:47:11.123
 select truncate(a, 6) from t1;
 truncate(a, 6)
 0.000000
 20101211002003.120000
 20101211010203.457031
-20101211030405.790000
+20101211030405.785000
 20101211154711.120000
 select a DIV 1 from t1;
 a DIV 1
@@ -33,21 +33,21 @@
 20101211154711
 select group_concat(distinct a) from t1;
 group_concat(distinct a)
-0000-00-00 00:00:00.000,2010-12-11 00:20:03.123,2010-12-11 01:02:03.456,2010-12-11 03:04:05.789,2010-12-11 15:47:11.123
+0000-00-00 00:00:00.000,2010-12-11 00:20:03.123,2010-12-11 01:02:03.456,2010-12-11 03:04:05.785,2010-12-11 15:47:11.123
 alter table t1 engine=innodb;
 select * from t1 order by a;
 a
 0000-00-00 00:00:00.000
 2010-12-11 00:20:03.123
 2010-12-11 01:02:03.456
-2010-12-11 03:04:05.789
+2010-12-11 03:04:05.785
 2010-12-11 15:47:11.123
 select * from t1 order by a+0;
 a
 0000-00-00 00:00:00.000
 2010-12-11 00:20:03.123
 2010-12-11 01:02:03.456
-2010-12-11 03:04:05.789
+2010-12-11 03:04:05.785
 2010-12-11 15:47:11.123
 drop table t1;
 create table t1 (a datetime(4)) engine=innodb;
 
select_pkeycache                    w4 [ fail ]
        Test ended at 2018-02-26 16:39:27
 
CURRENT_TEST: main.select_pkeycache
--- /home/travis/build/ottok/mariadb/mysql-test/r/select_pkeycache.result	2018-02-26 16:10:37.735158872 +0000
+++ /home/travis/build/ottok/mariadb/mysql-test/r/select_pkeycache.reject	2018-02-26 16:39:27.235746730 +0000
@@ -2133,7 +2133,6 @@
 wss_type
 select wss_type from t1 where wss_type ='102935229216544093';
 wss_type
-102935229216544093
 select wss_type from t1 where wss_type =102935229216544093;
 wss_type
 102935229216544093
 
select                              w1 [ fail ]
        Test ended at 2018-02-26 16:41:12
 
CURRENT_TEST: main.select
--- /home/travis/build/ottok/mariadb/mysql-test/r/select.result	2018-02-26 16:10:37.735158872 +0000
+++ /home/travis/build/ottok/mariadb/mysql-test/r/select.reject	2018-02-26 16:41:12.140507768 +0000
@@ -2133,7 +2133,6 @@
 wss_type
 select wss_type from t1 where wss_type ='102935229216544093';
 wss_type
-102935229216544093
 select wss_type from t1 where wss_type =102935229216544093;
 wss_type
 102935229216544093

These are all related failures in mysys/dtoa.c when converting from a string to a floating point number. For large values there seems to be a loss of precision.
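
As a rough illustration of why the value in the select failures is sensitive to rounding, the sketch below shows that 102935229216544093 has no exact double representation. It uses Python's float() as a stand-in for a correctly rounded string-to-double conversion; it is not the server's dtoa.c code, and the variable names are only for illustration.

```python
# Illustrative only: Python's float() stands in for a correctly rounded
# string-to-double conversion; this is not the code in dtoa.c.
literal = "102935229216544093"   # value from the failing select tests

exact = int(literal)             # arbitrary-precision integer, no rounding
as_double = float(literal)       # nearest IEEE 754 double

# Doubles in [2**56, 2**57) are spaced 16 apart, so the exact value
# cannot be stored and is rounded to a neighbouring multiple of 16.
assert 2**56 <= exact < 2**57
nearest = int(as_double)

print(exact)            # 102935229216544093
print(nearest)          # 102935229216544096
print(nearest - exact)  # 3

# A comparison done in double precision only matches if both sides
# end up on the same double.
lower_neighbour = nearest - 16
print(float(exact) == float(nearest))          # True
print(float(exact) == float(lower_neighbour))  # False
```

If one side of a comparison is converted with a different rounding result than the other, an equality test like the one in the diff can stop matching, which would be consistent with the missing row.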

The full list of test failures is:
main.type_datetime_hires main.select_pkeycache main.select main.select_jcl6 main.type_float main.func_str main.type_time_hires main.type_timestamp_hires
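
The .789 versus .785 difference in the main.type_datetime_hires diff is the size of one double-precision step at that magnitude. The sketch below checks this with Python 3.9+ (math.ulp and math.nextafter). That the failing build actually produced a value exactly one step low is an assumption made for illustration, not something confirmed in this report.

```python
import math

# Illustrative only: the fourth row of the test table in numeric form.
x = 20101211030405.789

spacing = math.ulp(x)            # gap between adjacent doubles near x
below = math.nextafter(x, 0.0)   # next representable double towards zero

print(spacing == 2**-8)   # True: the gap is 0.00390625
print("%.3f" % x)         # 20101211030405.789
print("%.3f" % below)     # 20101211030405.785
```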



 Comments   
Comment by Daniel Black [ 2018-02-28 ]

I have a feeling the cause of this is the same as what I was investigating in MDEV-14419.

Comment by Teodor Mircea Ionita (Inactive) [ 2018-04-24 ]

I ran the following:

time watch -bde "./mtr --force --max-test-fail=10 --parallel=4 --mysqld="--thread_stack=500000" main.type_datetime_hires main.select_pkeycache main.select main.select_jcl6 main.type_float main.func_str main.type_time_hires main.type_timestamp_hires"
 
real	244m49.072s
user	198m23.313s

This was on the latest 10.3 release built with -O3, with each full run taking around 9 seconds, which amounts to approximately 1632 runs. No failures so far. Clang is Apple LLVM version 9.1.0 (clang-902.0.39.1).

Maybe try on 10.2? Should I be using some extra flags? Maybe also try with Clang on Linux?

Comment by Teodor Mircea Ionita (Inactive) [ 2018-04-24 ]

No-show for branch 10.2 too:

[10.2|CHERRY-PICKING|●11] $ time watch -bde "./mtr --force --max-test-fail=10 --parallel=4 --mysqld="--thread_stack=500000" main.type_datetime_hires main.select_pkeycache main.select main.select_jcl6 main.type_float main.func_str main.type_time_hires main.type_timestamp_hires"
 
real	220m51.917s
user	140m0.388s
sys	145m17.313s

Going to set up a build on Ubuntu 16.04 LTS next with Clang5 (what Travis uses) and test on that in the same manner. Unfortunately, the macOS builds on Travis haven't been succeeding for a long while now; it may be worth looking at MDEV-15778 sometime before 10.3 GA.

Comment by Otto Kekäläinen [ 2018-04-24 ]

teodor You could try running an Ubuntu 14.04 virtual machine and reproducing it there. It should be very easy to reproduce what Travis-CI does, as the setup and logs are fully public and define the entire environment.

Comment by Teodor Mircea Ionita (Inactive) [ 2018-04-24 ]

otto I was just doing that, replicating the environment from Travis, only with 16.04 since I have it readily available and it only needed an apt upgrade. If I get a no-show on that too, I will set up a 14.04 then.

Comment by Vicențiu Ciorbaru [ 2018-04-24 ]

teodor What cmake configure line are you running? Make sure to mimic the one from Travis. I recall reproducing this required -DCMAKE_BUILD_TYPE=RelWithDebInfo.

Also, this is not a race condition; it was always reproducible, so there's no point in repeating a test if it doesn't reproduce on the first run.

Comment by Otto Kekäläinen [ 2018-04-24 ]

See screenshot - this is the build that contains this permanently failing test.

Comment by Otto Kekäläinen [ 2018-04-24 ]

@teodor I have protected branches enabled in my own repository, so you can always compare the code to the 10.3 I have to see what the code looked like before it started to error (assuming the problem is the code, not an underlying dependency that updated and introduced this): https://travis-ci.org/ottok/mariadb/branches

(Protected branch: My own 10.3 branch will always be green, as I cannot push to it any commits that did not pass Travis. My work happens in ok-* branches that may fail occasionally, and because of this bug currently always fail.)

Comment by Teodor Mircea Ionita (Inactive) [ 2018-05-22 ]

Dumping the results I have so far with cross-compiler testing:

    * Only repro with clang4 on Ubuntu 14.04 and 16.04
        * clang version 4.0.1-svn305264-1~exp1 (branches/release_40); 14.04
        * clang version 4.0.0-1ubuntu1~16.04.2 (tags/RELEASE_400/rc1)
    * No show:
        * macOS 10.13 clang-9.1
        * On 14.04:
            * clang3.3-9,5
            * gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4
            * gcc 4.4
        * On 16.04:
            * gcc-5.4 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
            * gcc-4.7 (Ubuntu/Linaro 4.7.4-3ubuntu12) 4.7.4
            * clang version 5.0.0-3~16.04.1 (tags/RELEASE_500/final)
        * On 17.10:
            * gcc-7 (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
            * gcc-6 (Ubuntu 6.4.0-8ubuntu1) 6.4.0 20171010

Working on disabling affected tests for affected clang version in Travis. Another option suggested by otto is to just drop support for this version.

Comment by Daniel Black [ 2018-05-22 ]

If you want to try clang for the main branches there's a repo and packages: http://apt.llvm.org/.

Is https://github.com/MariaDB/server/pull/505/commits/9fce41be75a1620f11bdcbfd305c4ede1919ae16 the workaround you need?

Note that "cross-compiler" implies a build/target architecture difference, which isn't the same as what you are doing.

Comment by Teodor Mircea Ionita (Inactive) [ 2018-05-23 ]

That could work, however, here is the alternative otto mentioned:

https://travis-ci.org/shinnok/server/builds/382495667
https://github.com/shinnok/server/commit/1b4fc3985dd368e2fab92c930f2a97c7d3c5837d
http://apt.llvm.org/trusty/pool/main/l/ - has clang6 too

Dropping clang4 makes the config a tad cleaner and VERSION no. can be the same to keep gcc on par with clang (we have the issue recorded in Jira after all). I would even go one step further and do this also:

https://github.com/MariaDB/server/commit/8e6f1b9f1e555cec2faaa14c950984de4e1be5bc

Just to make the build setup less confusing. See example here:

https://travis-ci.org/shinnok/server/builds/382504284

Comment by Vicențiu Ciorbaru [ 2018-06-29 ]

Compiler bug within clang 4

Comment by Otto Kekäläinen [ 2020-10-11 ]

I removed the skiplists for these tests in https://salsa.debian.org/mariadb-team/mariadb-10.5/-/commit/bde2cf481fa48a0dd85b9ad40e27ad5005ad1122 as we nowadays only run the main test suite as part of the builds (see debian/rules).

Generated at Thu Feb 08 08:21:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.