[MDEV-22513] main.processlist_notembedded fails in buildbot with Timeout in wait_until_count_sessions
Created: 2020-05-09 Updated: 2020-05-27 Resolved: 2020-05-26
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Tests |
| Affects Version/s: | 10.2, 10.5 |
| Fix Version/s: | 10.5.4, 10.1.46, 10.2.33, 10.3.24, 10.4.14 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Description |
http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/4005
| Comments |
| Comment by Marko Mäkelä [ 2020-05-11 ] |
As reported in …

My change was merged upward by the original test author, sanja. The bb-10.2-release branch, where the test failed, is a merge from bb-10.1-release. On bb-10.1-release the test did not fail anywhere, but notably, kvm-rpm-centos74-amd64 did not run at all on that branch.

I tried and failed to reproduce the failure locally, despite 1,200 repeats. The [ retry-fail ] in the buildbot log (the test also failed when it was retried) looks promising for reproducing the problem. I did not find anything useful in the server error log.

I am reassigning this back to the original test author, who also did the merge to bb-10.2-release. I hope that this is easily repeatable on the buildbot VM. I only fixed …
| Comment by Marko Mäkelä [ 2020-05-12 ] |
I repeated the hang once locally:
I tried to reproduce it by running the same sequence of tests that the hung worker had run since the latest server restart. However, the test would not time out:
A possibly related note: the tests main.partition_debug_sync and main.query_cache_debug occasionally take an extra 300 seconds (5 minutes) of real time, apparently due to something related to DEBUG_SYNC. The tests do not fail, but they run for 300xxx milliseconds instead of some tens or hundreds of milliseconds. I have seen this starting with 10.2, although it may be that I have not run tests on 10.1 frequently enough to witness such a ‘hang’.
| Comment by Marko Mäkelä [ 2020-05-26 ] |
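I think that the signal can be lost because we blindly disconnect the paused connection and then discard the pending signals with SET DEBUG_SYNC = 'RESET'. If the RESET runs before the paused connection has consumed its signal, that connection keeps waiting at its sync point, so its session lingers and wait_until_count_sessions times out. A reap (waiting for the result of the query that was sent on that connection) before the disconnect should fix this.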
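The race described above can be illustrated with a toy Python model. This is not MariaDB code and not the actual test: it assumes a DEBUG_SYNC-like facility with a single global signal slot that a waiter consumes and that RESET clears. The names DebugSync, run, and the reached_sync_point gate are hypothetical; the gate only exists to force the unlucky interleaving deterministically, and wait_timeout stands in for the sync-point timeout.

```python
import threading


class DebugSync:
    """Toy model of a DEBUG_SYNC-like facility (hypothetical, not MariaDB code):
    a single global signal slot that a waiter consumes and RESET clears."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._signal = None

    def signal(self, name: str) -> None:
        # Models: SET DEBUG_SYNC = 'now SIGNAL <name>'
        with self._cond:
            self._signal = name
            self._cond.notify_all()

    def wait_for(self, name: str, timeout: float) -> bool:
        # Models a sync point with WAIT_FOR <name>; False means it timed out.
        with self._cond:
            ok = self._cond.wait_for(lambda: self._signal == name, timeout)
            if ok:
                self._signal = None  # the waiter consumes the signal
            return ok

    def reset(self) -> None:
        # Models: SET DEBUG_SYNC = 'RESET' (a pending signal is discarded)
        with self._cond:
            self._signal = None


def run(reap_before_reset: bool, wait_timeout: float = 0.2) -> bool:
    """Return True if the paused session consumed its signal and finished,
    False if the signal was lost and the session's wait timed out."""
    sync = DebugSync()
    # Hypothetical gate forcing the unlucky interleaving: the session's query
    # has not yet reached its WAIT_FOR when the test carries on.
    reached_sync_point = threading.Event()
    result = {}

    def session() -> None:
        reached_sync_point.wait()
        result["ok"] = sync.wait_for("go", wait_timeout)

    t = threading.Thread(target=session)
    t.start()
    sync.signal("go")
    if reap_before_reset:
        reached_sync_point.set()
        t.join()        # "reap": wait for the sent query to finish first
        sync.reset()    # RESET can no longer discard an unconsumed signal
    else:
        sync.reset()    # "disconnect" without reaping, then RESET at once
        reached_sync_point.set()
        t.join()        # the session lingers until its wait times out
    return result["ok"]


if __name__ == "__main__":
    print("reap before RESET:", run(True))
    print("no reap:", run(False))
```

In this model, run(True) returns True immediately, while run(False) returns False only after wait_timeout has elapsed, which corresponds to the lingering session that makes a session-count wait give up.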