NewRegressionFramework
Revision as of 23:32, 17 August 2011
We'd like to revamp the regression tests by moving to a new framework. This page is intended to host a discussion of features and design for the new framework.
Ali's plan for a new implementation
- Use pytest
- It has, by far, the best documentation of any of the python testing frameworks and seems to be the most active
- Seems to be completely extensible via python plugins and hooks
- Supports outputting JUnit XML in case we want to use a continuous integration solution such as Jenkins or Hudson
- The pytest-xdist plugin supports running tests on multiple CPUs or multiple machines
- Good collection of talks and tutorials
How things would work
- Marks may be assigned to tests either with Python decorators or with a class attribute if we want to stay Python 2.5 compatible
- The decorators would probably include CPU model, memory system, ISA, mode, and run length.
- We might want to use pytest_addoption to pass a list of values for each of the decorators, then generate the matching tests with pytest's parametrization support
- Alternatively we could use pytest-markfiltration although the syntax can be rather contrived
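To make the mark idea concrete, here is a minimal sketch of the decorator form. The mark names (cpu_model, isa, length) and their values are illustrative, not an agreed-upon schema:

```python
import pytest

# Hypothetical marks describing a test's configuration; a real test
# would launch gem5 and compare its outputs against references.
@pytest.mark.cpu_model("o3")
@pytest.mark.isa("arm")
@pytest.mark.length("long")
def test_linux_boot():
    assert True  # placeholder body
```

Applied marks end up on the function's pytestmark attribute, so a collection hook (or a tool like pytest-markfiltration) can filter on them.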
Outstanding Questions
- How would we do test discovery?
- pytest will search .py files looking for tests
- Files, classes within files, and functions can each match a configurable pattern
- or discovery can be limited to classes that inherit from Python's unittest.TestCase
- Should we use xunit style or func args style setups?
- Should we have a class that inherits from unittest.TestCase and does the heavy lifting, or should we have a completely separate class that does the heavy lifting and use a factory class to create a bunch of instances of that separate class?
- Should gem5 be called as a library or on the command line?
- How should we store output files? The same way we do now? Should each directory just have an __init__.py so that tests can be referred to as long.linux_boot.arm.linux.o3?
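The dotted-name idea above is cheap to derive from the directory layout itself. A hypothetical helper (not an existing gem5 utility) might look like:

```python
import os

def dotted_test_id(path):
    """Turn a test directory path such as 'long/linux_boot/arm/linux/o3'
    into the dotted id 'long.linux_boot.arm.linux.o3'.

    Assumes the per-directory __init__.py layout floated above."""
    return ".".join(os.path.normpath(path).split(os.sep))
```

With ids like these, selecting or reporting on a single test configuration from the command line becomes straightforward.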
Desirable features
- Ability to add regressions via EXTRAS
- For example, move the eio tests into the eio module so we don't try to run them when it isn't compiled in
- Ability to not run regressions for which binaries or other inputs aren't available
- With maybe some nice semi-automated way of downloading binaries when they're publicly available
- Better categorization of tests, and ability to run tests by category, e.g.:
- by CPU model
- by ISA
- by Ruby protocol
- by length
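Category selection like the above could be wired up with pytest's hooks. This conftest.py sketch assumes the option names (--cpu-model, --isa) and the cpu_model mark; none of this is settled, and newer pytest hook APIs are used for illustration:

```python
# conftest.py sketch: skip tests whose marks don't match the requested
# category, e.g.  py.test --cpu-model=o3
import pytest

def pytest_addoption(parser):
    parser.addoption("--cpu-model", action="store", default=None,
                     help="only run tests marked with this CPU model")
    parser.addoption("--isa", action="store", default=None,
                     help="only run tests marked with this ISA")

def pytest_collection_modifyitems(config, items):
    wanted = config.getoption("--cpu-model")
    if wanted is None:
        return
    skip = pytest.mark.skip(reason="does not match --cpu-model")
    for item in items:
        mark = item.get_closest_marker("cpu_model")
        if mark is None or wanted not in mark.args:
            item.add_marker(skip)
```

The same pattern extends to ISA, Ruby protocol, and length with one option per category.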
- More directed tests that cover specific functionality and complete faster. Running SPEC benchmarks is important, but it spends a lot of time doing the same thing over and over. Those should be only one component of our testing, not nearly all of it as they are now. This is a desirable feature of our testing strategy, not necessarily something that impacts the regression framework.
- Better checkpoint testing
- some of this doesn't really depend on the regression framework, just needs new tests
- e.g., integrating util/checkpoint-tester.py
- Support for random testing (e.g., for background testing processes)
- Random latencies?
- Random testing a la memory testers but with different seeds, longer intervals
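A memory-tester style random test with different seeds maps naturally onto pytest parametrization. In this sketch the seed list is fixed for illustration (in practice it would come from an option or the nightly driver), and the body is a stand-in for launching a gem5 memory tester:

```python
import random
import pytest

SEEDS = [1, 2, 3]  # would normally be supplied by the test driver

@pytest.mark.parametrize("seed", SEEDS)
def test_memtest_random(seed):
    rng = random.Random(seed)
    # Stand-in for running a memory tester with this seed and a
    # randomized interval; only the seeding pattern is shown.
    interval = rng.randint(1000, 10000)
    assert 1000 <= interval <= 10000
```

Each seed shows up as its own test case in the report, so a failing seed is reproducible directly from the test id.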
- Decouple from SCons
- Avoid having scons dependency bugs force unnecessary re-running of tests, particularly for update-refs
- Don't rely on scons to run jobs... running scons -j8 with a bunch of tests and a batch queuing system means that 8 CPUs are consumed even if there is only one job running.
- Either make scons be able to submit the jobs or have something else that manages the jobs and their completion status
- Easy support for running separate tests where only the input parameters differ
- For example, several protocols utilize different state transitions depending on configuration flags. It would be great if we could test these without having to create new directories and tests.
- Similarly, we could/should test topologies this way as well.
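Stacked parametrize decorators would cover exactly this "only the inputs differ" case without new directories. The protocol and topology values below are illustrative, and the body is a placeholder for invoking gem5 with the corresponding flags:

```python
import pytest

# Hypothetical configuration axes; real values would map to gem5's
# Ruby protocol and topology options.
@pytest.mark.parametrize("protocol", ["MOESI_hammer", "MESI_CMP_directory"])
@pytest.mark.parametrize("topology", ["Mesh", "Crossbar"])
def test_ruby_config(protocol, topology):
    # A real test would run gem5 with these flags and check its output.
    args = ["--ruby", "--protocol=%s" % protocol,
            "--topology=%s" % topology]
    assert "--protocol=%s" % protocol in args
```

pytest generates the cross product automatically, so two decorators here yield four distinct test cases.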
- Automated way to use nightly regressions as a basis for updating "m5-stable"
- How do you identify the last working revision? (from Ali)
- Maybe need a bug-tracking system so we could record facts like "changeset Y fixes a bug introduced in changeset X" then we could automatically exclude changesets between X and Y, but we don't have that. (from stever)
- Better definitions of success criteria.
- E.g., distinguish "stats changed but output is still correct" from a simple pass/fail, giving three outcomes: passed, stats diff, failed.
- For example, the terminal output changing could be a failure, or the stdout and SPEC binary outputs changing could be failures, but a 1% difference in stats is a stats diff, which needs to be addressed
- I envision this as providing reasonable certainty that if you create a change you know will modify the stats, you have a quick verification that nothing broke horribly before updating the stats.
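The three-way verdict described above could be boiled down to a small helper. The function and its inputs are hypothetical; the 1% threshold is the figure from the discussion:

```python
def classify_result(output_matches, max_stat_diff_pct, threshold=1.0):
    """Return 'failed' on any hard output mismatch, 'stats-diff' when
    stats moved by at least the threshold percentage, 'passed' otherwise.

    Sketch only: what counts as an output mismatch (terminal, stdout,
    SPEC binary outputs) is still to be decided."""
    if not output_matches:
        return "failed"
    if max_stat_diff_pct >= threshold:
        return "stats-diff"
    return "passed"
```

A "stats-diff" verdict would then be the quick signal that nothing broke horribly before updating the reference stats.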
Implementation ideas
Just ideas... no definitive decisions have been made yet.
- Use Python's unittest module, or something that extends it such as nose
- Use SCons to manage dependencies between binaries/test inputs and test results, but in a different SCons invocation (i.e., in its own SConstruct/SConscript)