[MDBF-737] Statistically optimize mysql-test runs by running less tests - Jira

Details

Type: Technical task
Status: Stalled
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: N/A
Component/s: Buildbot
Labels:
- gsoc14

Description

Every default run of the mysql-test-run script takes a lot of time (tens of minutes to many hours, depending on the build and computer configuration).

But most of that time is wasted on running the wrong tests. Some tests are related: if one of them fails, the others will fail too. Some tests almost never fail, while others fail much more often. Some tests exercise a specific part of the source code, and if that part of the code isn't changed in a particular revision, these tests will certainly not fail.

I would like to know which tests are most useful to run for every particular revision on every particular test platform. In my experiments one can catch 90% of the problems by running only 10% of the tests.

This doesn't mean we will always run only 10% of the tests. It would make sense to run the complete big test suite before releases or on specific builders. But many builders can test much faster with only a small reduction of the test coverage.

How to experiment

There is no need to run tests on many platforms for this. We have historical data from the buildbot for many years. It records which revisions were tested on which builders, which files were modified in each revision, which tests failed where, and so on. One can use this data to analyze and select the best test running strategy.

The goal is to run as few tests as possible, while still being able to detect as many test failures as possible.
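A minimal sketch of how a candidate strategy could be scored against the historical data, assuming each past run is available as a dict with a "failed" set of test names. The record layout, the function name evaluate_strategy and the select callback are hypothetical, not part of the buildbot schema.

```python
def evaluate_strategy(runs, select, all_tests):
    """Replay historical runs and score a test-selection strategy.

    runs      -- list of dicts, each with a "failed" set of test names
                 (hypothetical layout, not the real buildbot schema)
    select    -- callable taking a run and returning the set of tests
                 the strategy would have executed for it
    all_tests -- set of every known test name

    Returns (recall, fraction_run): the share of historical failures the
    strategy would have caught, and the share of tests it would have run.
    """
    caught = total_failures = executed = 0
    for run in runs:
        chosen = select(run)
        failed = run["failed"]
        total_failures += len(failed)
        caught += len(failed & chosen)
        executed += len(chosen)
    recall = caught / total_failures if total_failures else 1.0
    possible = len(runs) * len(all_tests)
    fraction_run = executed / possible if possible else 0.0
    return recall, fraction_run
```

With this kind of replay, the "90% of the problems with 10% of the tests" claim corresponds to a recall of about 0.9 at a fraction_run of about 0.1.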

What to take into account (ideas):

probability of a test to fail
depending on the builder, on the combination
depending on the changed files, changed lines/functions/etc
inter-test correlations
individual tests within a big test file
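The first two ideas can be sketched as simple counting over the history. The run layout ("builder", "executed", "failed", "changed_files") and both function names are assumptions made for illustration; the real buildbot data would need to be mapped into this shape first.

```python
from collections import defaultdict


def failure_rates(runs):
    """Estimate the failure rate of each test on each builder.

    Each run is a dict with "builder", "executed" (set of tests) and
    "failed" (set of tests); this layout is hypothetical.
    Returns {(test, builder): failures / executions}.
    """
    ran = defaultdict(int)
    failed = defaultdict(int)
    for run in runs:
        for test in run["executed"]:
            key = (test, run["builder"])
            ran[key] += 1
            if test in run["failed"]:
                failed[key] += 1
    return {key: failed[key] / ran[key] for key in ran}


def failures_by_changed_file(runs):
    """Count how often a test failed in a run whose revision touched a file.

    Returns {(path, test): count}; high counts hint at tests worth running
    when that file changes.
    """
    counts = defaultdict(int)
    for run in runs:
        for path in run["changed_files"]:
            for test in run["failed"]:
                counts[(path, test)] += 1
    return dict(counts)
```

These raw counts ignore inter-test correlations and sample size; they are only a starting point for ranking tests.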

Assorted thoughts

what to do when a new builder/test/combination is added? we don't have prior probabilities yet
don't use all the data, instead use a sliding window — the failure rates may change over time
average over different combinations or builders
or don't average and treat triplets (test,combination,builder) as individual "tests"
optimize for time, not for the number of tests: different builders run at different speeds, and different tests take different amounts of time too
emulate the filter bubble (ignore failures that were not predicted, since those tests would not have run), and have a way to break out of it
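Several of these thoughts can be combined in one rough sketch: a sliding window over recent runs, ranking by failure rate per second instead of per test, and a reserved slice of the time budget for randomly chosen tests. Random exploration is only one possible way to break the filter bubble, and the function name, parameters and data layout here are all hypothetical.

```python
import random


def select_tests(history, durations, budget_seconds,
                 window=100, explore=0.1, rng=None):
    """Pick tests for one run under a time budget.

    history        -- list of sets of failed tests, oldest first (assumed layout)
    durations      -- {test: seconds}, every value must be > 0
    budget_seconds -- total time allowed for this run
    window         -- how many of the most recent runs to consider
    explore        -- share of the budget reserved for random tests
    """
    rng = rng or random.Random()
    recent = history[-window:]
    runs_seen = max(len(recent), 1)

    # Failure counts within the sliding window only.
    fail_count = {test: 0 for test in durations}
    for failed in recent:
        for test in failed:
            if test in fail_count:
                fail_count[test] += 1

    # Rank by expected failures per second, not per test.
    ranked = sorted(
        durations,
        key=lambda t: (fail_count[t] / runs_seen) / durations[t],
        reverse=True,
    )

    chosen, spent = [], 0.0
    main_budget = budget_seconds * (1.0 - explore)
    for test in ranked:
        if spent + durations[test] <= main_budget:
            chosen.append(test)
            spent += durations[test]

    # Spend what is left on random unselected tests, so tests that are
    # never predicted to fail still get observed occasionally.
    rest = [t for t in durations if t not in chosen]
    rng.shuffle(rest)
    for test in rest:
        if spent + durations[test] <= budget_seconds:
            chosen.append(test)
            spent += durations[test]
    return chosen
```

When replaying history with such a selector, failures of tests that were not chosen should be hidden from later iterations, because a real builder would never have seen them.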

People

Assignee: Elena Stepanova

Reporter: Sergei Golubchik

Votes: 0

Watchers: 6

Dates

Created: 2014-03-03 18:25

Updated: 2024-07-02 02:19