[MDEV-8859] rpl.rpl_mdev382 sporadically fails to finish due to disappeared expect file Created: 2015-09-28  Updated: 2016-06-12  Resolved: 2016-06-12

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 5.5
Fix Version/s: 5.5.50

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
Sprint: 5.5.50

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-deb-precise-x86/builds/2982/steps/test_4/logs/mysqld.1.err.3
http://buildbot.askmonty.org/buildbot/builders/kvm-deb-precise-x86/builds/2982/steps/test_4/logs/stdio

In the test output, it looks like this:

rpl.rpl_loaddata 'stmt,xtradb'           w3 [ pass ]    901
worker[3] mysql-test-run: *** ERROR: can't open file "/run/shm/var/3/tmp/mysqld.1.expect": No such file or directory

After that, the worker does not do anything anymore, and at the end MTR complains

mysql-test-run: *** ERROR: Not all tests completed (only 3093 of 3094)

We know it's rpl.rpl_mdev382 because it's the last test in the corresponding server error log:

Version: '5.5.46-MariaDB-1~precise-log'  socket: '/run/shm/var/tmp/3/mysqld.1.sock'  port: 16040  mariadb.org binary distribution
CURRENT_TEST: rpl.rpl_mdev382
150927 19:39:14 [Note] /usr/sbin/mysqld: Normal shutdown
 
150927 19:39:14 [Note] Event Scheduler: Purging the queue. 0 events
150927 19:39:14  InnoDB: Starting shutdown...
150927 19:39:15  InnoDB: Shutdown completed; log sequence number 1621017
150927 19:39:15 [Note] /usr/sbin/mysqld: Shutdown complete

The test only does restart once, and does it seemingly accurate: it writes "wait", then shuts down the server, then removes the file, then writes "restart".
And removing the file fails claiming that it does not exist.
I could not reproduce it, but it's not the first time I see it (not so easy to find since it won't show up in the cross-reference report).

For now, I don't see why it's happening.



 Comments   
Comment by Elena Stepanova [ 2016-06-11 ]

Relatively fresh occasion on 5.5-galera (Feb 17, 2016):
http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos5-amd64/builds/2356/steps/test/logs/mysqld.1.err.1
http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos5-amd64/builds/2356/steps/test/logs/stdio

So, it's not specific either to 32-bit platform or the particular builder.

Comment by Elena Stepanova [ 2016-06-11 ]

There are 5 occasions on labrador, and it does not even run MTR with parallel workers. So, the theory that a parallel test somehow interferes can be safely abandoned.

http://buildbot.askmonty.org/buildbot/builders/labrador/builds/5554 (bb-10.0-serg)
http://buildbot.askmonty.org/buildbot/builders/labrador/builds/5566
http://buildbot.askmonty.org/buildbot/builders/labrador/builds/5567
http://buildbot.askmonty.org/buildbot/builders/labrador/builds/5582
http://buildbot.askmonty.org/buildbot/builders/labrador/builds/5818

All but one are on 5.5 trees. It starts getting strange – given that 10.x have many more pushes and test runs, it should hit the problem more frequently.

Comment by Elena Stepanova [ 2016-06-11 ]

Occasions on p8:

http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar/builds/304
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar/builds/586

http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/1036
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/109
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/1164
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/140
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/196
http://buildbot.askmonty.org/buildbot/builders/p8-rhel7-bintar-debug/builds/820

(maybe there are more, didn't check vlp04-06).

Comment by Elena Stepanova [ 2016-06-11 ]

Reproduced locally on 5.5.44 release build by running perl ./mtr rpl.rpl_mdev382 --repeat=100

Comment by Elena Stepanova [ 2016-06-12 ]

The failure is very easily and reliably reproducible with this change:

diff --git a/mysql-test/suite/rpl/t/rpl_mdev382.test b/mysql-test/suite/rpl/t/rpl_mdev382.test
index 21925e0..c26e9e8 100644
--- a/mysql-test/suite/rpl/t/rpl_mdev382.test
+++ b/mysql-test/suite/rpl/t/rpl_mdev382.test
@@ -210,6 +210,7 @@ EOF
 --shutdown_server 30
 
 --remove_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
+--sleep 1
 --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
 restart-rpl_mdev382.test
 EOF

The problem is that after finding mysqld.1.expect file containing wait, mysql-test-run.pl keeps reading it in a loop. So, there is a short window between remove and write when the file does not exist, if MTR happens to attempt to read at this moment, the failure occurs.

The fix is to use append_file instead.
It has already been fixed in 10.x, hidden among other fixes in this commit:

commit 6b97512b2170fc6a77288cef6e5a85cad773ae4e
Author: unknown <knielsen@knielsen-hq.org>
Date:   Wed Apr 24 13:05:40 2013 +0200
 
    Add missing check for thd->killed in mysql_binlog_send().
    
    The slave dump thread running on the master only checked thd->killed whenever
    it reached the end of a binlog file, not between events. This could
    unnecessarily delay server shutdown.
    
    This was found by code inspection while tracking down some occasional "forcing
    close of thread..." errors in Buildbot. Hopefully this will fix the failures,
    but the fix is correct in any case.
    
    Also increase the wait during server shutdown, 2 seconds is a bit tight in
    case of heavy I/O stall, and it seems better to delay shutdown a bit than
    force-kill threads unnecessarily.
    
    Also fix some races in test cases that restart the mysqld server. The .expect
    file should be changed with --append_file, --remove_file + --write_file
    creates a short window where mysqld can error out due to .expect file missing.

So it's a 5.5-only fix.

Comment by Elena Stepanova [ 2016-06-12 ]

https://github.com/MariaDB/server/commit/67b4a6f576db102d652b401c3d0150556826f641

Generated at Thu Feb 08 07:30:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.