NewRegressionFramework

From gem5
Revision as of 12:27, 12 April 2011

We'd like to revamp the regression tests by moving to a new framework. This page is intended to host a discussion of features and design for the new framework.

Desirable features

  • Ability to add regressions via EXTRAS
    • For example, move eio tests into eio module so we don't try to run them when it's not compiled in
  • Ability to not run regressions for which binaries or other inputs aren't available
    • With maybe some nice semi-automated way of downloading binaries when they're publicly available
  • Better categorization of tests, and the ability to run tests by category (see the tag-selection sketch after this list), e.g.:
    • by CPU model
    • by ISA
    • by Ruby protocol
    • by length
  • More directed tests that cover specific functionality and complete faster. Running SPEC benchmarks is important, but it spends a lot of time doing the same thing over and over; they should be only one component of our testing, not nearly all of it, as they are now. This is a desirable feature of our testing strategy, not necessarily something that impacts the regression framework.
  • Better checkpoint testing
    • some of this doesn't really depend on the regression framework, just needs new tests
    • e.g., integrating util/checkpoint-tester.py
  • Support for random testing (e.g., for background testing processes)
    • Random latencies?
    • Random testing à la memory testers, but with different seeds and longer intervals
  • Decouple from SCons
    • Avoid having scons dependency bugs force unnecessary re-running of tests, particularly for update-refs
    • Don't rely on scons to run jobs: running scons -j8 with a bunch of tests under a batch queuing system means that 8 CPUs are consumed even if only one job is running.
    • Either make scons able to submit the jobs, or have something else manage the jobs and their completion status.
  • Easy support for running separate tests where only the input parameters differ (see the parameter-sweep sketch after this list)
    • For example, several protocols utilize different state transitions depending on configuration flags. It would be great if we could test these without having to create new directories and tests.
    • Similarly, we could/should test topologies this way as well.
  • Automated way to use nightly regressions as a basis for updating "m5-stable"
    • How do you identify the last working revision? (from Ali)
    • We may need a bug-tracking system so we could record facts like "changeset Y fixes a bug introduced in changeset X"; then we could automatically exclude changesets between X and Y. We don't have that today, though. (from stever)
  • Better definitions of success criteria (see the classification sketch after this list).
    • E.g., distinguish "stats changed, but the output is still correct" from a simple pass/fail, giving three outcomes: passed, stats diff, failed.
    • For example, you could say that a change in the terminal output, or in stdout or the SPEC binary outputs, is a failure, while a 1% difference in stats is a stats difference, which still needs to be addressed.
    • I envision this as providing reasonable certainty that, when you make a change you know will modify the stats, you can quickly verify that nothing broke horribly before updating the stats.
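
A rough illustration of the tag-selection idea above: the Test class, the tag names, and the select() helper are hypothetical, sketched here only to show how running tests by category might look, not an existing gem5 interface.

  # Hypothetical tag-based test selection; all names are illustrative only.
  class Test(object):
      def __init__(self, name, tags):
          self.name = name
          self.tags = set(tags)   # e.g. {'cpu:o3', 'isa:arm', 'length:quick'}

      def run(self):
          print('running %s' % self.name)

  tests = [
      Test('o3-arm-hello',  ['cpu:o3',     'isa:arm', 'length:quick']),
      Test('timing-x86-fs', ['cpu:timing', 'isa:x86', 'length:long']),
      Test('moesi-hammer',  ['ruby:MOESI_hammer', 'isa:x86', 'length:quick']),
  ]

  def select(tests, *wanted):
      """Return the tests that carry all of the requested tags."""
      return [t for t in tests if set(wanted) <= t.tags]

  # e.g. run only the quick x86 tests
  for t in select(tests, 'isa:x86', 'length:quick'):
      t.run()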
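
For the parameter-sweep case (tests that differ only in input parameters), one possibility is to generate the variants from a single definition plus a parameter matrix instead of separate directories. A minimal sketch, where the config path, protocol, topology, and flag names are made up purely for illustration:

  # Hypothetical generation of test variants from one base definition.
  import itertools

  base = {
      'config': 'configs/example/ruby_random_test.py',   # illustrative path
      'protocol': 'MOESI_hammer',
  }

  # Parameters to sweep without creating new test directories.
  matrix = {
      'topology': ['Mesh', 'Crossbar'],
      'flag':     [0, 1],     # e.g. a protocol configuration flag
  }

  def variants(base, matrix):
      keys = sorted(matrix)
      for values in itertools.product(*(matrix[k] for k in keys)):
          params = dict(base)
          params.update(zip(keys, values))
          name = '-'.join('%s=%s' % (k, v) for k, v in zip(keys, values))
          yield name, params

  for name, params in variants(base, matrix):
      print('%s: %r' % (name, params))   # the real framework would launch gem5 here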
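
The three-way success criterion could amount to a small classification step after each run. The sketch below is only one way to read it: the 1% tolerance, the choice that terminal/stdout/benchmark output mismatches are hard failures, and the helper names are assumptions, not settled policy.

  # Hypothetical passed / stats-diff / failed verdict after a run.
  PASSED, STATS_DIFF, FAILED = 'passed', 'stats-diff', 'failed'

  def relative_diff(old, new):
      if old == new:
          return 0.0
      return abs(old - new) / max(abs(old), abs(new), 1e-12)

  def classify(outputs_match, ref_stats, new_stats, tol=0.01):
      """outputs_match: did terminal/stdout/benchmark output match the refs?
      ref_stats, new_stats: dicts mapping stat name -> value."""
      if not outputs_match:
          return FAILED           # functional behavior changed: hard failure
      for key in ref_stats:
          if key not in new_stats or \
             relative_diff(ref_stats[key], new_stats[key]) > tol:
              return STATS_DIFF   # stats moved, but output is still correct
      return PASSED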

Implementation ideas

Just ideas... no definitive decisions have been made yet.

  • Use Python's unittest module, or something that extends it such as nose (a rough sketch follows below)
  • Use SCons to manage dependencies between binaries/test inputs and test results, but in a different SCons invocation (i.e., in its own SConstruct/SConscript)
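
If the unittest route were taken, a single regression might be wrapped roughly as below. The run_gem5() helper, the gem5 binary and test-program paths, and the category tags are assumptions for illustration, not an existing interface.

  # Minimal sketch of one regression expressed as a unittest case.
  import subprocess
  import unittest

  def run_gem5(config, options=(), gem5='build/ALPHA_SE/gem5.opt'):
      """Run gem5 on a config script and return its exit status (hypothetical helper)."""
      return subprocess.call([gem5, config] + list(options))

  class SimpleAtomicHello(unittest.TestCase):
      # Tags like these could feed the category-based selection sketched earlier.
      categories = ('cpu:atomic', 'isa:alpha', 'length:quick')

      def test_runs_to_completion(self):
          status = run_gem5('configs/example/se.py',
                            ['--cmd=tests/test-progs/hello/bin/alpha/linux/hello'])
          self.assertEqual(status, 0)

      # A second method could diff m5out/stats.txt against stored references
      # and report the passed / stats-diff / failed verdict described above.

  if __name__ == '__main__':
      unittest.main()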