NewRegressionFramework

We'd like to revamp the regression tests by moving to a new framework. This page is intended to host a discussion of features and design for the new framework.

Desirable features

Ability to add regressions via EXTRAS
- For example, move eio tests into eio module so we don't try to run them when it's not compiled in
Ability to not run regressions for which binaries or other inputs aren't available
- Ideally with a semi-automated way of downloading binaries when they're publicly available
Better categorization of tests, and ability to run tests by category, e.g.:
- by CPU model
- by ISA
- by Ruby protocol
- by length
More directed tests that cover specific functionality and complete faster. Running SPEC benchmarks is important, but it spends a lot of time doing the same thing over and over. Those benchmarks should be only one component of our testing, not almost all of it as they are now. This is a goal for our testing strategy, not necessarily something that affects the regression framework.
Better checkpoint testing
- some of this doesn't really depend on the regression framework, just needs new tests
- e.g., integrating util/checkpoint-tester.py
Support for random testing (e.g., for background testing processes)
- Random latencies?
- Random testing a la memory testers but with different seeds, longer intervals
Decouple from SCons somewhat
- Avoid having SCons dependency bugs force unnecessary re-running of tests, particularly for update-refs
Easy support for running separate tests where only the input parameters differ
- For example, several protocols utilize different state transitions depending on configuration flags. It would be great if we could test these without having to create new directories and tests.
- Similarly, we could/should test topologies this way as well.
Automated way to use nightly regressions as a basis for updating "m5-stable"
- How do you identify the last working revision? (from Ali)
- We may need a bug-tracking system so we could record facts like "changeset Y fixes a bug introduced in changeset X"; then we could automatically exclude changesets between X and Y. We don't have that today. (from stever)
Better definitions of success criteria.
- E.g., distinguish "stats changed, but output is all still correct" from a simple pass or fail, giving three outcomes: passed, stats diffs, failed.
- For example, you could say that a change in the terminal output, or in stdout and the SPEC binary outputs, counts as failed, but a 1% difference in stats counts as a stats difference, which needs to be addressed.
- I envision this as providing reasonable certainty: if you create a change you know will modify the stats, you get a quick verification that nothing broke horribly before updating the stats.
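
The three-way outcome described above (passed, stats diffs, failed) could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Outcome`, `changed_stats`, and `classify` are hypothetical, stats are assumed to be available as plain name-to-value dictionaries, and any change in program-visible output is treated as a failure.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical result categories, named after the list above."""
    PASSED = "passed"
    STATS_DIFF = "stats diffs"
    FAILED = "failed"


def changed_stats(ref_stats, new_stats):
    """Return {name: (ref, new)} for every stat whose value differs.

    A stat present on only one side shows up with None for the other side.
    """
    names = ref_stats.keys() | new_stats.keys()
    return {
        name: (ref_stats.get(name), new_stats.get(name))
        for name in sorted(names)
        if ref_stats.get(name) != new_stats.get(name)
    }


def classify(ref_output, new_output, ref_stats, new_stats):
    """Classify one run against its reference.

    ref_output / new_output: program-visible output (e.g. terminal text).
    ref_stats / new_stats: dicts mapping stat name to a numeric value.
    Returns (Outcome, dict of changed stats).
    """
    # Assumption: any change in program-visible output is a hard failure.
    if ref_output != new_output:
        return Outcome.FAILED, {}
    # Output is correct; report stats changes separately from failures.
    diffs = changed_stats(ref_stats, new_stats)
    if diffs:
        return Outcome.STATS_DIFF, diffs
    return Outcome.PASSED, {}
```

With a split like this, a change that is expected to alter the stats can be checked quickly: a `STATS_DIFF` result means the output is still correct and only the reference stats need review, while `FAILED` means something broke.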

Implementation ideas

Just ideas... no definitive decisions have been made yet.

Use Python's unittest module, or something that extends it such as nose
Use SCons to manage dependencies between binaries/test inputs and test results, but in a different SCons invocation (i.e., in its own SConstruct/SConscript)
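
As a rough sketch of how the unittest idea might cover two of the desired features (skipping tests whose binaries aren't available, and running tests that differ only in input parameters without creating new directories), consider the following. Everything specific here is an assumption made for illustration: the binary path, the protocol names, the flag values, and the `run_simulation` stand-in are all hypothetical and do not reflect any existing code.

```python
import os
import unittest

# Hypothetical location of a test binary; not a real path in the source tree.
HELLO_BINARY = "/path/to/test-binaries/hello"


def run_simulation(protocol, flag):
    """Stand-in for launching the simulator; always reports success (0)."""
    return 0


class ProtocolConfigTest(unittest.TestCase):
    # Hypothetical (protocol, configuration flag) combinations to exercise.
    VARIANTS = [
        ("protocol_a", False),
        ("protocol_a", True),
        ("protocol_b", False),
    ]

    @unittest.skipUnless(os.path.exists(HELLO_BINARY),
                         "test binary not available")
    def test_needs_binary(self):
        # Reported as skipped, not failed, when the binary is missing.
        self.assertEqual(run_simulation("protocol_a", False), 0)

    def test_variants(self):
        # One test method covers every variant; each subTest is reported
        # separately on failure, so no per-variant directory is needed.
        for protocol, flag in self.VARIANTS:
            with self.subTest(protocol=protocol, flag=flag):
                self.assertEqual(run_simulation(protocol, flag), 0)


def run_suite():
    """Load and run the sketch's tests, returning the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProtocolConfigTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs both test methods; on a machine without the hypothetical binary, `test_needs_binary` is reported as skipped and the run still counts as successful.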
