[MDEV-135] failures in buildbot in 5.5 on kvm-deb-debian5-amd64
Created: 2012-02-02  Updated: 2012-05-08  Resolved: 2012-02-06

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.20

Type: Bug
Priority: Major
Reporter: Kristian Nielsen
Assignee: Kristian Nielsen
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-31 make buildbot green for 5.5 Closed

 Description   

Failing test(s): rpl.rpl_checksum_cache rpl.rpl_heartbeat_basic main.ps_3innodb main.ps main.subselect_mat_cost main.select_pkeycache main.multi_update main.union



 Comments   
Comment by Kristian Nielsen [ 2012-02-02 ]

Simple test case:

CREATE TABLE t1 (i INT, INDEX(i));
INSERT INTO t1 VALUES (1);
SELECT AVG(i) FROM t1;
DROP TABLE t1;

The problem seems to be in my_decimal_div(). This dump is from
Item_sum_avg::val_decimal():

XXX3: SQLCOM_SELECT: SELECT AVG FROM t1
XXX12: Item_sum_avg::val_str()
XXX11: Item_sum_avg::val_decimal()
XXX11: using decimal ...
XXX11: values: 1 / 1
XXX11: sum_dec=9.0: 1 0 0 0 0 0 0 0 0
XXX11: count=9.0: 1 0 54436864 0 1609087657 32688 0 0 31
XXX11: sum/count=9.9: 1 999999999 1 0 11794296 0 6144224 0 1608708904
XXX12: decimal -> 2.0000
XXX13 Item::send(Protocol *, ...) buffer=2.0000

This means that Item_sum_avg::val_decimal() is computing 1/1 with
my_decimal_div(). The result becomes 1.999999999.
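To make the dump easier to read, here is a minimal sketch of how the first two words of the quotient decode. It assumes that each word printed after the colon is one limb of the decimal buffer holding nine decimal digits (base 10^9), and that the first limb is the integer part. The helper name format_limbs is hypothetical and is not part of strings/decimal.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of strings/decimal.c: renders one integer
   limb followed by nfrac fractional limbs. Assumes each limb is
   non-negative and holds nine decimal digits (base 10^9). Returns the
   string length, or -1 if the output buffer is too small. */
static int format_limbs(char *out, size_t outlen, int32_t intpart,
                        const int32_t *frac, int nfrac)
{
  int i;
  int n;
  size_t used;

  assert(out != NULL && outlen > 0 && nfrac >= 0);

  /* Integer limb, plus a decimal point if any fractional limbs follow. */
  n= snprintf(out, outlen, "%d%s", (int) intpart, nfrac > 0 ? "." : "");
  if (n < 0 || (size_t) n >= outlen)
    return -1;

  for (i= 0; i < nfrac; i++)
  {
    used= strlen(out);
    /* Each fractional limb contributes exactly nine zero-padded digits. */
    n= snprintf(out + used, outlen - used, "%09d", (int) frac[i]);
    if (n < 0 || (size_t) n >= outlen - used)
      return -1;
  }
  return (int) strlen(out);
}
```

Called with intpart 1 and the single fractional limb 999999999, this produces the string 1.999999999, which is the value that then gets rounded to 2.0000 in the dump.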

Unfortunately, the bug occurrence is extremely fragile.

I can repeat on VM vm-debian5-amd64-build by copying in source tarball and
running debian/autobake-deb.sh. If I then add a single line fprintf() in
do_div_mod() and `make -j2`, the problem disappears. If I remove the single
line again and `make -j2`, the problem is still gone ...

weird ...

Comment by Kristian Nielsen [ 2012-02-02 ]

I discovered that the problem occurs when strings/decimal.c is built with DEB_BUILD_HARDENING=1.
The problem disappears when that file is compiled with that variable not set.

Comment by Kristian Nielsen [ 2012-02-02 ]

Bug is triggered when strings/decimal.c is compiled with -D_FORTIFY_SOURCE=2 (or =1).

Comment by Kristian Nielsen [ 2012-02-06 ]

Ok, I analysed this in detail. My conclusion is that this is a bug in the old
GCC version on Debian 5 "lenny" (4.3.2).

The code does this:

if (unlikely(dcarry == 0 && *start1 < *start2))
...
buf1=start1+len2;
...
SUB2(*buf1, *buf1, lo, carry);
...
dcarry= *start1;

len2 can be zero (and is, when I see the failure). SUB2 assigns to *buf1.

Checking the disassembled GCC output, what it does is cache the value of
*start1 from the top in register %r15d:

a92390: 44 8b 7e fc mov -0x4(%rsi),%r15d # *start1

and it uses this register to assign to dcarry:

a92512: 44 89 fb mov %r15d,%ebx # dcarry=*start1

This is wrong, as the value in %r15d is stale. *start1 has a new value from the SUB2().

I do not see any problems with the code in terms of violation of strict
aliasing or other issues. My conclusion is that GCC is doing the wrong thing
here.
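The pattern can be reduced to the sketch below. It is a simplified stand-in, not the code from do_div_mod(): the function name reload_after_store is hypothetical, the control flow is flattened, and a plain subtraction replaces SUB2(). Since buf1 and start1 point to the same type, a conforming compiler has to assume they may alias and reload *start1 after the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t dec1;

/* Hypothetical reduction of the pattern described in the ticket. */
static dec1 reload_after_store(dec1 *start1, const dec1 *start2,
                               size_t len2, dec1 lo)
{
  dec1 dcarry= 0;
  dec1 *buf1;

  assert(start1 != NULL && start2 != NULL);

  /* First read of *start1. The miscompiled build kept this value in a
     register and reused it further down. */
  if (dcarry == 0 && *start1 < *start2)
    return 0;

  buf1= start1 + len2;  /* points at the same word as start1 if len2 == 0 */
  *buf1-= lo;           /* stand-in for SUB2(*buf1, *buf1, lo, carry) */

  /* Second read: must see the store above whenever buf1 == start1. */
  dcarry= *start1;
  return dcarry;
}
```

With *start1 equal to 5, *start2 equal to 3, len2 equal to 0 and lo equal to 2, a correct build returns 3. Returning 5 would correspond to the stale register value described above.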

I do not think there is a point in trying to report this as a GCC bug. This is
in a very old version of the compiler, and we do not see this problem on any
other host/gcc version. It is probably already fixed long ago.

I will add an #ifdef so that the debian package build can work around the
problem on Debian 5.

Comment by Kristian Nielsen [ 2012-02-06 ]

Buildbot confirms that the workaround eliminates the failure.
