Upd: Here is a new test case, which appears to be deterministic, although still fairly fragile. I keep the initial concurrent one as well in the "Initial report" section, it needs to be re-checked after a fix.
The test is highly non-deterministic and fragile. Run with big enough --repeat=N. It usually failed for me within 50 attempts, but it can vary on different machines/builds.
No visible effect on a non-debug build.