[MDBF-406] Disable valgrind for 10.5+ branches Created: 2022-05-04 Updated: 2023-11-07 Resolved: 2022-10-18 |
|
| Status: | Closed |
| Project: | MariaDB Foundation Development |
| Component/s: | Buildbot |
| Affects Version/s: | N/A |
| Fix Version/s: | N/A |
| Type: | Task | Priority: | Major |
| Reporter: | Vlad Bogolin | Assignee: | Michael Widenius |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | 0d | ||
| Time Spent: | 0.25h | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
Marko suggested disabling the Valgrind builder for 10.5+ branches, because we have working ASAN and MSAN on these branches, as well as UBSAN (on the old buildbot only). |
| Comments |
| Comment by Marko Mäkelä [ 2022-05-16 ] |
|
ASAN shadow bytes are roughly equivalent to the A bits of Valgrind memcheck. One ASAN shadow byte covers 8 bytes, while the Valgrind A bits have 1-byte granularity. Before 10.5, the code is too broken for MSAN, mainly because before […]. For branches up to 10.4, a Valgrind builder might still be useful. |
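To make the granularity difference concrete, here is a toy Python model (not ASan's or Valgrind's actual code) that keeps one shadow value per 8-byte granule next to one addressability flag per byte. The shadow encoding assumed here follows the published ASan scheme: 0 means all 8 bytes of the granule are addressable, k in 1..7 means only the first k bytes are, and a negative value means none are. Because a granule can only say "the first k bytes are addressable", a small hole in the middle of an object cannot be recorded in the shadow, while per-byte A bits represent it exactly. The class `ToyShadow`, the 32-byte region and the `POISONED` marker are hypothetical.

```python
GRANULE = 8     # ASan maps each 8 bytes of application memory to 1 shadow byte
REGION = 32     # size of the toy memory region (hypothetical)
POISONED = -1   # toy marker: any negative shadow value means "not addressable"


class ToyShadow:
    """Toy comparison of ASan-style shadow bytes and Valgrind-style A bits."""

    def __init__(self, size):
        # The object occupies bytes [0, size); the rest of the region is unaddressable.
        # Shadow: 0 = whole granule addressable, k in 1..7 = only the first
        # k bytes addressable, negative = no byte addressable.
        self.shadow = []
        for start in range(0, REGION, GRANULE):
            if start + GRANULE <= size:
                self.shadow.append(0)
            elif start < size:
                self.shadow.append(size - start)
            else:
                self.shadow.append(POISONED)
        # A bits: one addressability flag per byte.
        self.a_bits = [i < size for i in range(REGION)]

    def poison(self, lo, hi):
        """Mark [lo, hi) unaddressable, as far as each representation allows."""
        for i in range(lo, min(hi, REGION)):
            self.a_bits[i] = False          # per-byte flags can express any range
        for g in range(lo // GRANULE, len(self.shadow)):
            start, end = g * GRANULE, (g + 1) * GRANULE
            if start >= hi:
                break
            if lo <= start and hi >= end:
                self.shadow[g] = POISONED   # the whole granule is covered
            elif hi >= end and self.shadow[g] >= 0:
                k = lo - start              # poison a suffix, keep the first k bytes
                if self.shadow[g] == 0 or k < self.shadow[g]:
                    self.shadow[g] = k
            # Otherwise addressable bytes would remain after the hole in this
            # granule; "first k bytes" cannot express that, so it is skipped.

    def asan_ok(self, addr, n=1):
        """ASan-style check for an n-byte access (1..8) within one granule."""
        k = self.shadow[addr // GRANULE]
        return k == 0 or addr % GRANULE + n - 1 < k

    def a_bits_ok(self, addr, n=1):
        """Valgrind-style check: every accessed byte must be addressable."""
        return all(0 <= i < REGION and self.a_bits[i] for i in range(addr, addr + n))


m = ToyShadow(13)                       # shadow: [0, 5, -1, -1]
print(m.asan_ok(12), m.a_bits_ok(12))   # True True
print(m.asan_ok(13), m.a_bits_ok(13))   # False False
m.poison(2, 4)                          # 2-byte hole inside the first granule
print(m.asan_ok(2), m.a_bits_ok(2))     # True False: the shadow cannot record it
```

Both models agree at the end of a 13-byte object, but only the per-byte A bits catch an access to a hole inside a granule. ASan's own manual-poisoning interface notes a similar limitation: poisoning a region may cover only a subregion of it because of alignment restrictions.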
| Comment by Marko Mäkelä [ 2022-10-06 ] |
|
I made some observations while addressing what I consider bogus failures with Valgrind (…).

When run under Valgrind, the server can be a lot slower, because Valgrind is a single-threaded, JIT-based CPU emulator that emulates multiple threads by interleaving them. InnoDB is a multi-threaded storage engine, so shutdown or crash recovery can take very long under Valgrind when it chooses an unfortunate scheduling. When that happens, the mtr framework may silently and forcibly kill the server in the middle of STOP SLAVE or while InnoDB is shutting down, and the test can subsequently fail in surprising ways.

Moreover, as lock-free or std::atomic-based performance improvements have been added to InnoDB, the Valgrind tests may be taking longer to run. New tests are also being added to later branches. 10.3 typically completes tests in less than 1 hour, while later branches require 1 to 2 hours, depending on the assigned worker. MDEV-29508 was causing additional 2-hour timeouts in 10.5 and later until I disabled that test under Valgrind.

I think that running Valgrind for 10.5 or later is a serious waste of resources. On the latest available 10.5 push, the ASAN build+test completes in 36 minutes, and the MSAN build+test in 22 minutes. Unlike Valgrind, ASAN and MSAN cover all code (not just the server), and they are much more likely to catch errors that involve race conditions. Also unlike Valgrind, ASAN and MSAN have fairly low overhead for multi-threaded programs (I think about 250% or 350%), whereas Valgrind can make many things run hundreds or thousands of times slower. |
| Comment by Marko Mäkelä [ 2022-10-06 ] |
|
Here is an example of a test that would run in 4 seconds without Valgrind:
This suggests that any DEBUG_SYNC test could occasionally fail in the same way when run under Valgrind. I believe the reason is that Valgrind’s built-in scheduler, which emulates multiple threads, may choose very unfair schedulings, analogous to rr record --chaos of the rr tool, which uses ptrace() to run the tracee one thread at a time. |
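The effect of serializing threads on a single emulated CPU can be sketched with a toy Python model. It is an illustration under simplifying assumptions, not a model of Valgrind's real scheduler: all threads share one CPU, each unit of work costs one clock tick, a blocked waiter consumes no ticks, and a DEBUG_SYNC-style wait times out if the thread that sends the signal has not finished its work within a fixed number of ticks. The functions `run_serialized` and `wait_for_signal`, the thread names and the tick counts are all hypothetical. A fair round-robin schedule lets the signaler finish early, while a schedule that gives busy threads long slices delays the signal past the timeout, even though the total work is identical.

```python
from collections import deque


def run_serialized(threads, quantum):
    """Run toy threads one at a time on a single emulated CPU.

    threads: list of (name, work_steps); every step costs one clock tick.
    quantum: how many consecutive steps a thread may run before the
             scheduler rotates to the next runnable thread.
    Returns {name: tick at which that thread finished}.
    """
    clock = 0
    finished = {}
    ready = deque([name, steps] for name, steps in threads)
    while ready:
        entry = ready.popleft()
        step = min(quantum, entry[1])
        clock += step                    # only one thread runs at any time
        entry[1] -= step
        if entry[1] == 0:
            finished[entry[0]] = clock
        else:
            ready.append(entry)          # back of the queue for its next slice
    return finished


def wait_for_signal(threads, quantum, timeout):
    """Hypothetical DEBUG_SYNC-style wait: a blocked waiter (not in `threads`)
    succeeds only if the thread named "signaler" finishes within `timeout` ticks."""
    signal_tick = run_serialized(threads, quantum)["signaler"]
    return "signaled" if signal_tick <= timeout else "timeout"


threads = [("worker1", 1000), ("worker2", 1000), ("signaler", 10)]
print(run_serialized(threads, quantum=1)["signaler"])     # 30: fair round-robin
print(run_serialized(threads, quantum=1000)["signaler"])  # 2010: long, unfair slices
print(wait_for_signal(threads, 1, timeout=300))           # signaled
print(wait_for_signal(threads, 1000, timeout=300))        # timeout
```

Under either schedule the last thread finishes at tick 2010, the sum of all work, whereas on three real cores the same work would finish after 1000 ticks. This mirrors the observation that shutdown and recovery of a multi-threaded engine can take much longer under Valgrind.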
| Comment by Michael Widenius [ 2022-10-18 ] |
|
I personally find Valgrind easier and better to use at home than ASAN and MSAN (which require specific builds). Conclusion: there is no reason to disable Valgrind support or Valgrind builds in buildbot. |
| Comment by Marko Mäkelä [ 2022-10-18 ] |

I would be curious to see one example of a genuine issue that Valgrind finds and ASAN and MSAN do not. |
| Comment by Marko Mäkelä [ 2023-03-28 ] |
|
I just found an example of something that is unlikely to be implemented in Valgrind: MSAN_OPTIONS=poison_in_dtor=1, which was recently enabled by default, is flagging some code in […] |
| Comment by Marko Mäkelä [ 2023-11-07 ] |
|
For what it is worth, Valgrind used to issue bogus warnings for Clang-generated code. Now it has been "improved". I get (among others) the following error when trying to invoke Valgrind on a WITH_VALGRIND=ON executable that was built with Clang 16.0.6.
When I omit --valgrind from the mtr options, the tests pass. |