we do trylock() and then lock() rather often: up to 30% of cases (performance concern)
You increase the number of instances when the lock/trylock ratio reaches 50%. Maybe you should do it earlier? At 30%, maybe?
additional code on rather a hot path (performance concern)
That should normally be just ++mutex_nowaits, shouldn't it?
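To make the two points above concrete, here is a minimal sketch of the trylock-then-lock pattern with counters, written in C++ rather than copied from the patch. Everything in it is an assumption for illustration: the `CountedMutex` type, the `nowaits`/`waits` fields and `should_grow()` are hypothetical names, and the "lock/trylock ratio" is interpreted as waits divided by all acquisitions. On the uncontended path the only extra work is a single increment.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical wrapper; names are illustrative, not taken from the patch.
struct CountedMutex {
  std::mutex m;
  std::uint64_t nowaits = 0;  // acquisitions where try_lock() succeeded; guarded by m
  std::uint64_t waits = 0;    // acquisitions that had to block in lock(); guarded by m

  void lock() {
    if (m.try_lock()) {
      ++nowaits;  // fast path: one increment on top of the lock itself
      return;
    }
    m.lock();     // slow path: the mutex was contended
    ++waits;
  }

  void unlock() { m.unlock(); }

  // Returns true when the share of contended acquisitions has reached
  // threshold_percent (e.g. 30 or 50). Call with m held, or from a single
  // thread, because the counters are plain integers.
  bool should_grow(unsigned threshold_percent) const {
    assert(threshold_percent <= 100);
    const std::uint64_t total = waits + nowaits;
    if (total == 0) return false;  // no data yet, e.g. before warm-up
    return waits * 100 >= total * threshold_percent;
  }
};
```

With counters like these, lowering the trigger from 50% to 30% is a one-argument change in the caller, so trying both thresholds in a benchmark should be cheap.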
I couldn't get the perfect 3 instances for my host with autosizing: it either stays at 2 or keeps raising the number of instances up to the limit (everything under 480 for waits)
Interesting. Why do you think that is? What did you do in your benchmarks? You've never had only 1 instance?
we can't avoid warm-up (bad for benchmarks)
True. How long a warm-up is needed? What was your impression?
Any proper benchmark does a warm-up anyway, so if your warm-up is shorter than what benchmarks typically do, it should be fine.
serg, please review the patch for this task.