Every TupleJoiner thread pre-allocates 64MB using our STLPoolAllocator class. Because our thread pool can spawn many threads very quickly, this pre-allocates a lot of memory in a short time. On some systems this will easily exceed the configured overcommit limit. In my tests, running 20 simple simultaneous queries hit 32GB of allocation with only about 1% of that actually used.
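As a rough sanity check on those numbers, the arithmetic can be written out as below. This is only a back-of-the-envelope sketch. It assumes that "MB" and "GB" are binary units and that the whole 32GB came from the 64MB pre-allocations, neither of which is confirmed above. The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: "MB" and "GB" in the report mean MiB and GiB.
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;
constexpr std::uint64_t kGiB = 1024ULL * kMiB;

constexpr std::uint64_t kPreallocPerThread = 64 * kMiB;  // figure from the report
constexpr std::uint64_t kObservedTotal = 32 * kGiB;      // figure from the report

// Number of 64 MiB pre-allocations needed to reach the observed total,
// assuming nothing else contributed to it.
constexpr std::uint64_t impliedPreallocations()
{
    return kObservedTotal / kPreallocPerThread;
}

// About 1% of the total was actually used; expressed in whole MiB.
constexpr std::uint64_t approxUsedMiB()
{
    return kObservedTotal / 100 / kMiB;
}

static_assert(impliedPreallocations() == 512, "32 GiB / 64 MiB");
```

Under those assumptions, 20 queries account for roughly 512 pre-allocations (about 25 or 26 per query), while only around 327 MiB was really in use.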
Our STLPoolAllocator class also isn't a pool (it isn't even thread safe, so it can't be one). We would likely be better off using boost::pool_allocator or another off-the-shelf singleton pool allocator.