[MDEV-15792] Fix mtr to be able to wait for >1 exited mysqld Created: 2018-04-06  Updated: 2018-09-04  Resolved: 2018-09-04

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: 10.1.36, 10.2.18, 10.3.10

Type: Bug Priority: Critical
Reporter: Jan Lindström (Inactive) Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: contribution, galera

Issue Links:
Blocks
blocks MDEV-13549 Galera 3 test failures Closed
Duplicate
is duplicated by MDEV-16053 Unable to run mtr test suite with opt... Closed
Relates
relates to MDEV-16053 Unable to run mtr test suite with opt... Closed

 Description   

https://github.com/MariaDB/server/pull/665

Tests affected:



 Comments   
Comment by Elena Stepanova [ 2018-04-12 ]

I have no objections to the patch, but please push into a development tree first.

Comment by Elena Stepanova [ 2018-04-22 ]

As it turns out, the patch requires amendments.

First, it causes "ERROR: wait_any failed" when tests are run with testcase-timeout > 20. This is currently being fixed in https://github.com/MariaDB/server/pull/709#issuecomment-383030848.

Another, trickier problem is a race condition / non-determinism in how actual crashes are processed.
It manifests as follows (the test case from MDEV-15878, for example, can be used to reproduce it):

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
CREATE TABLE t1 (f INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1),(2);
ALTER TABLE t1 ORDER BY unknown_column;
ERROR 42S22: Unknown column 'unknown_column' in 'order clause'
CREATE TABLE t2 ENGINE=Aria SELECT * FROM t1;
SELECT * FROM t2;
worker[1] Trying to dump core for [mysqltest - pid: 2476, winpid: 2476, exit: 256]
worker[1] Trying to dump core for [mysqld.1 - pid: 2444, winpid: 2444, exit: 256]
worker[1] mysql-test-run: *** ERROR: Unhandled process [mysqltest - pid: 2476, winpid: 2476, exit: 256] exited
mysql-test-run: *** ERROR: Test suite aborted

A possible reason is that the SRVDIED logic lies outside the foreach $proc (keys(%keep_waiting_proc)) loop, so whenever the process left in $proc is not the server process, the crash is not handled properly.
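To illustrate the suspected ordering problem, here is a simplified Python model. It is not the actual Perl code in mysql-test-run, and it is not necessarily how the fix in the pull request works. The names Proc, handle_exits_buggy and handle_exits_fixed are hypothetical. The model assumes that a non-zero exit status from a server process means the server died. Because Perl hash key order is not deterministic, iterating over keys(%keep_waiting_proc) can visit the exited processes in either order. If the server-died check runs only after the loop, it sees only the last process visited:

```python
from dataclasses import dataclass


@dataclass
class Proc:
    # Hypothetical stand-in for an mtr process object.
    name: str
    exit_status: int
    is_server: bool


def handle_exits_buggy(exited):
    """Suspected bug: the server-died check runs after the loop, so it only
    inspects whichever process happened to be iterated last."""
    last = None
    for proc in exited:
        last = proc  # per-process bookkeeping would happen here
    if last is not None and last.is_server and last.exit_status != 0:
        return ("SRVDIED", last.name)
    return ("UNHANDLED", last.name if last else None)


def handle_exits_fixed(exited):
    """Check every exited process inside the loop, so a crashed server is
    detected regardless of iteration order."""
    for proc in exited:
        if proc.is_server and proc.exit_status != 0:
            return ("SRVDIED", proc.name)
    # No server crashed; non-server exits go through normal handling.
    return ("NO_SERVER_CRASH", None)


if __name__ == "__main__":
    # Exit values taken from the log in this report.
    crash = [Proc("mysqld.1", 256, True), Proc("mysqltest", 256, False)]
    for order in (crash, list(reversed(crash))):
        print(handle_exits_buggy(order), handle_exits_fixed(order))
```

With the server visited first, the buggy version reports an unhandled mysqltest exit, which matches the failure in the log. With the opposite order, it detects the crash. The fixed version reports SRVDIED in both cases.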

Comment by Elena Stepanova [ 2018-04-27 ]

The version of pull request #709 of Apr 25 (with commit 9f0d9012) seems to fix the problems observed locally and in buildbot; this was verified in buildbot on the bb-10.2-mtr tree. However, please note the review comments by serg: some changes have been requested.

Comment by Jan Lindström (Inactive) [ 2018-06-28 ]

https://github.com/MariaDB/server/pull/709

Generated at Thu Feb 08 08:24:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.