[MDEV-8518] rpl.sec_behind_master-5114 fails sporadically in buildbot Created: 2015-07-21 Updated: 2020-11-04 Resolved: 2017-01-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Tests |
| Affects Version/s: | 10.0, 10.1, 10.2 |
| Fix Version/s: | 10.0.29, 10.1.21, 10.2.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Elena Stepanova |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Sprint: | 10.2.1-5, 10.0.29 |
| Description |
|
http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar/builds/494/steps/test/logs/stdio
|
| Comments |
| Comment by Elena Stepanova [ 2015-09-07 ] |
|
Newer link: http://buildbot.askmonty.org/buildbot/builders/kvm-deb-utopic-x86/builds/762/steps/test_4/logs/stdio
| Comment by Elena Stepanova [ 2017-01-01 ] |
|
The failure is caused by a simple race condition in the test. But fixing the race is not enough: the test itself doesn't really test the bugfix.
The problem is that if the test runs without delays, since the event e1 is fast, we will normally have e2_start_time == e1_start_time. The test is very similar to the draft suggested in
Now, the race condition which makes the test fail as pasted in the description is simply this: on a slow server, the first SHOW SLAVE STATUS that the test uses to monitor Seconds_Behind_Master might be executed when the slave is already more than 1 second into executing the slow event (sleep). The value is correct, it's just not expected.
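To illustrate the race, here is a minimal Python sketch. It is a simplified model, not the actual test or server logic: the function names, the 3-second event duration, and the assumption that Seconds_Behind_Master equals the whole seconds spent inside the slow event are all hypothetical.

```python
import math


def seconds_behind_master(elapsed: float, event_duration: float) -> int:
    """Simplified model (assumption): while the slave executes the slow
    event, the reported lag is the whole number of seconds elapsed since
    the event started; outside the event it is 0."""
    if elapsed < 0 or elapsed >= event_duration:
        return 0
    return math.floor(elapsed)


def first_poll_is_expected(first_poll_delay: float,
                           event_duration: float = 3.0,
                           max_expected: int = 1) -> bool:
    """A strict check that assumes the first observed value is at most
    max_expected. first_poll_delay is how late the first status query
    runs after the slave started the slow event."""
    observed = seconds_behind_master(first_poll_delay, event_duration)
    return observed <= max_expected


# Fast builder: the first poll lands early in the event, the check passes.
fast_ok = first_poll_is_expected(0.4)
# Slow builder: the first poll lands 2.3 s into the event, the value is
# correct for that moment but the strict check rejects it.
slow_ok = first_poll_is_expected(2.3)
```

In this model the slow-builder case fails only because of when the first poll happens, which matches the "correct but not expected" value described above.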
| Comment by Elena Stepanova [ 2017-01-01 ] |
|
https://github.com/MariaDB/server/commit/c4c29523bbb475d49aab9d1f5afe52a049b7b501
| Comment by Elena Stepanova [ 2017-01-02 ] |
|
serg,
| Comment by Sergei Golubchik [ 2017-01-04 ] |
|
As far as I can see, the test does test the bug fix. The test succeeds with the bug fix and fails without it. If the test misses 1 on a slow server... The simple way to fix it and preserve the logic of the test would be to add some tolerance for slow servers and expect, say, 1 or 2. In that case the event should take longer than that, like SLEEP(3) at least. I don't like increasing the test execution time, though, so let's use your fix, it seems ok. Please verify that it fails without the bug fix, though.
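The tolerance idea above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the approach that was committed; the helper names and the default accepted values (1, 2) are hypothetical.

```python
def min_sleep_seconds(accepted_values) -> int:
    """If the check accepts any of accepted_values as the observed lag,
    the slow event must run longer than the largest accepted value, so
    that the value can actually be observed before the event ends."""
    return max(accepted_values) + 1


def is_acceptable(observed: int, accepted_values=(1, 2)) -> bool:
    """Tolerant check: pass if the observed lag is any accepted value."""
    return observed in accepted_values


# Accepting 1 or 2 implies the event should sleep for at least 3 seconds.
required_sleep = min_sleep_seconds((1, 2))
```

The cost is visible in the sketch: widening the accepted range directly lengthens the required sleep, and with it the test run time.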
| Comment by Elena Stepanova [ 2017-01-04 ] |
|
Okay, sorry, I was wrong about the existing test not testing the fix, it is just prone to false negatives. Here is how the normal execution goes on 10.0.17 (before
That's what the test expects, and so, it would indeed fail. But every other time, it goes like this:
So, the test would pass, even though the bug is not fixed yet. The first rise of Seconds_Behind_Master apparently comes from CREATE TABLE, which the proposed test change also addresses by synchronizing with master after CREATE TABLE. So yes, I'll stick with it, because increasing the sleep time is not very reliable anyway: some builders can be extremely slow.
| Comment by Elena Stepanova [ 2017-01-04 ] |
|
The new version of the test fails before the patch like this:
| Comment by Elena Stepanova [ 2017-01-04 ] |
|
https://github.com/MariaDB/server/commit/9bf92706d19761722b46d66a671734466cb6e98e |