I found several query failures while trying to get baseline results to test against our work on S3 so far. Roman & I looked into a couple, and at least the ones we investigated are legitimate bugs, not environmental differences.
One thing needs to happen right away: turn on result comparisons in our regression test suite. It's really important to know when we've broken something.
A little more digging makes me think some reference results are bad and some are missing, so not every failure is necessarily a real bug. But we need to get back to a state where all of the tests we run actually pass.
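To separate the three cases above (real mismatches, bad references, missing references), a result comparison could report them in distinct buckets. This is only a rough sketch in Python, not how our suite actually does it: the `classify` and `normalize` names, the dict-based inputs, and the whitespace normalization are all assumptions made for illustration.

```python
def normalize(text):
    """Strip trailing whitespace per line and drop trailing blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return lines


def classify(actual, references):
    """Sort each query into pass / mismatch / missing_reference.

    Hypothetical inputs: both arguments map a query name to its
    result text (actual output vs. stored reference output).
    """
    summary = {"pass": [], "mismatch": [], "missing_reference": []}
    for name in sorted(actual):
        if name not in references:
            # No reference to compare against: a test-suite gap, not a query bug.
            summary["missing_reference"].append(name)
        elif normalize(actual[name]) == normalize(references[name]):
            summary["pass"].append(name)
        else:
            # Either the query is broken or the reference is stale.
            summary["mismatch"].append(name)
    return summary


if __name__ == "__main__":
    actual = {"q1": "1 2\n", "q2": "3\n", "q3": "4\n"}
    references = {"q1": "1 2  \n\n", "q2": "5\n"}
    print(classify(actual, references))
```

Keeping "missing_reference" apart from "mismatch" means a missing reference file shows up as a gap to fill rather than as a query failure, while every mismatch still has to be checked by hand to decide whether the query or the reference is wrong.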
I tested the code at the following commits in the develop-1.2 branches:
I'll attach the results I got running a baseline test. Failures are listed at the bottom of each file.