[MDEV-29979] rpl tests with mixing_engines sometimes time out in MSAN Created: 2022-11-08 Updated: 2023-08-01 Resolved: 2023-07-31 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Tests |
| Affects Version/s: | 10.10, 10.11 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Angelique Sklavounos (Inactive) | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
Happens with rpl.rpl_mixed_mixing_engines, rpl.rpl_non_direct_row_mixing_engines, rpl.rpl_non_direct_stm_mixing_engines.
|
| Comments |
| Comment by Andrei Elkin [ 2023-04-17 ] |
|
angelique.sklavounos, rpl.rpl_mixed_mixing_engines and all tests that invoke the huge rpl/include/rpl_mixing_engines.test may need splitting into smaller ones. The latter invokes rpl_mixing_engines.inc 370 times. Could you please consider partitioning the affected tests into smaller ones? |
| Comment by Angelique Sklavounos (Inactive) [ 2023-05-01 ] |
|
The tests have been passing for the past month, so there might not be a need anymore to split the tests up. |
| Comment by Roel Van de Paar [ 2023-07-31 ] |
|
This ticket duplicates MDEV-31790, a more recent and fuller report, as far as test slowness in the MSAN builder goes. However, that aspect is not otherwise specifically discussed in the report: the title (and MTR) just use the term "time out", but that was the result of a crash. There is no real timeout here; the issue is a crashed server. Note "found 'core' (5/5)", "mysqld.2/data/core" and "2002 Can't connect to local server through socket". However, as GDB was not available at the time, no reliable stack is present. Additionally, the description says "A failure is shown below, but the output appears to vary random." There is nothing further to analyse or fix here (except to ensure that GDB is now available on the builder in question; vladbogo, FYI, to check); closing. If other crashes are observed on any builder, they should be debugged individually. |
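The crash indicators quoted in the comment above can be checked for mechanically. The following is only a minimal sketch: the function name `crash_markers_in` and the sample log text are hypothetical, and the marker strings are simply the three fragments quoted in this ticket, not an exhaustive or official list of MTR messages.

```python
# Hypothetical helper: report which of the crash indicators quoted in this
# ticket appear in a test log. The marker list is an assumption taken from
# the comment text only.
CRASH_MARKERS = (
    "found 'core'",
    "/data/core",
    "2002 Can't connect to local server through socket",
)


def crash_markers_in(log_text):
    """Return the quoted crash markers that occur in log_text, in list order."""
    return [marker for marker in CRASH_MARKERS if marker in log_text]


if __name__ == "__main__":
    # Made-up sample text, assembled from the fragments quoted in the ticket.
    sample = (
        "found 'core' (5/5)\n"
        "mysqld.2/data/core\n"
        "2002 Can't connect to local server through socket\n"
    )
    hits = crash_markers_in(sample)
    print("possible server crash" if hits else "no crash markers found")
```

A non-empty result would suggest that a reported "time out" deserves a look for a core file before it is treated as pure slowness.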
| Comment by Marko Mäkelä [ 2023-07-31 ] |
|
Roel, I believe that as a result of a non-default 480-second (8-minute) timeout, the processes were killed externally by sending SIGABRT to them. Stack traces or core dumps produced from such a timeout-related killing could be useful for analyzing the root cause, in case it is a deadlock of threads. In this particular case, based on the findings in MDEV-31790, it might be that these tests simply run extremely slowly, with no actual hang. |
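The mechanism described above, killing a process with SIGABRT once a timeout expires so that a core dump can be collected, can be illustrated as follows. This is a minimal sketch for POSIX systems and not the actual MTR or buildbot implementation; the function name `run_with_abort_on_timeout` is hypothetical.

```python
import signal
import subprocess
import sys


def run_with_abort_on_timeout(cmd, timeout_s):
    """Run cmd; if it is still running after timeout_s seconds, send SIGABRT.

    Returns (returncode, timed_out). On POSIX, a process terminated by a
    signal has a negative returncode equal to minus the signal number.
    Whether a core file is actually written depends on system settings
    (for example the core size limit), which this sketch does not control.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        # SIGABRT's default action is to terminate the process and dump core,
        # so a stack can be extracted from the core file afterwards.
        proc.send_signal(signal.SIGABRT)
        return proc.wait(), True


if __name__ == "__main__":
    # Stand-in for a slow or hung test: a child process that sleeps 30 seconds.
    slow_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    returncode, timed_out = run_with_abort_on_timeout(slow_cmd, 1.0)
    print(returncode, timed_out)
```

Note that a process killed this way looks the same from the outside whether it was deadlocked or merely slow; only the resulting stack trace can tell the two apart.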
| Comment by Vlad Bogolin [ 2023-07-31 ] |
|
Roel, hopefully https://github.com/MariaDB/buildbot/pull/148 will solve the GDB issue.