Configuration / Simulation Scripts

Revision as of 01:30, 20 August 2006 by Stever (talk | contribs) (Start out at a higher level; add "Simulation" section)

Simulation scripts control the configuration and execution of M5 simulations. The M5 simulator itself is passive: when invoked, it executes the user's simulation script and performs actions only when the script calls for them.

Simulation scripts are written in Python and executed by the Python interpreter. Currently, the interpreter is linked into the M5 executable, but for most purposes the script's execution should be indistinguishable from invoking the Python interpreter directly.

Configuration

The simulated system is built from a collection of simulator objects, or SimObjects. The configuration file describes the simulated system by describing the SimObjects to be instantiated, their parameters, and their relationships. To provide maximum power and flexibility, configurations are specified using the Python programming language. The program contained in the configuration file is executed to create a hierarchy of Python objects that mirror the SimObjects to be created for the simulation. The easiest way to get started is to use an existing script file. Several examples are provided in the src/configs directory.

Python classes

M5 provides a collection of Python object classes that correspond to its C++ simulation object classes. These Python classes are defined in a Python module called "m5". (The Python class definitions for these objects can be found in the source tree in the src/python/m5/objects directory.)

The first step in specifying a SimObject is to instantiate a Python object of the corresponding class. To make the Python classes visible, the configuration file must first import the class definitions from the m5 module as follows:

from m5.objects import *

A Python object is instantiated by writing the class name (which, by our convention, starts with an uppercase letter) followed by a pair of parentheses. Thus the following code instantiates a SimpleCPU object and assigns it to the Python variable cpu:

cpu = SimpleCPU()

SimObject parameters are specified using Python attributes (Python terminology for object fields or members). These attributes can be set at any time using direct Python assignments or at instantiation time using keywords inside the parentheses. The following instantiation:

cpu = SimpleCPU(clock = '2GHz', width = 2)

is thus equivalent to:

cpu = SimpleCPU()
cpu.clock = '2GHz'
cpu.width = 2

Parameter assignments are partially validated at the time the assignment is performed. The attribute name (e.g., clock) must be a defined parameter for the SimObject class, and the right-hand side of the assignment must be of (or convertible to) the correct type for that parameter. The m5 module defines a large number of domain-specific string-to-value conversions, allowing expressions such as '2GHz' and '64KB' for clock rates and memory sizes, respectively. A complete list of valid units can be found in src/python/m5/convert.py.
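To make the idea of string-to-value conversion concrete, here is a minimal sketch of how unit strings such as '2GHz' and '64KB' could be turned into numbers. It is not the code in convert.py: the function names, the unit tables, and the choice of 1024 bytes per KB are assumptions made for illustration only.

```python
# Illustrative only: a toy unit-string converter, not m5's convert.py.
# The names and unit tables below are assumptions for this sketch.

FREQUENCY_UNITS = {"GHz": 1e9, "MHz": 1e6, "kHz": 1e3, "Hz": 1.0}
MEMORY_UNITS = {"GB": 1024 ** 3, "MB": 1024 ** 2, "KB": 1024, "B": 1}


def _convert(text, units):
    """Split a string such as '2GHz' into a number and a unit suffix."""
    # Try longer suffixes first so that 'GHz' is matched before 'Hz'.
    for suffix in sorted(units, key=len, reverse=True):
        if text.endswith(suffix):
            number = text[: -len(suffix)]
            return float(number) * units[suffix]
    raise ValueError("cannot convert %r: unknown unit" % text)


def to_frequency(text):
    """Convert a clock-rate string to hertz."""
    return _convert(text, FREQUENCY_UNITS)


def to_memory_size(text):
    """Convert a memory-size string to bytes."""
    return int(_convert(text, MEMORY_UNITS))


print(to_frequency("2GHz"))
print(to_memory_size("64KB"))
```

A string with no recognized unit raises ValueError in this sketch, which mirrors the point above that an assignment fails when the value cannot be converted to the parameter's type.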

The complete list of parameters for a given SimObject class (along with their types, default values, and brief descriptions) can be found by looking at the files in the src/python/m5/objects directory. An attribute labeled Param.X defines a parameter of type X, while an attribute labeled VectorParam.X defines a parameter requiring a vector (Python list) whose elements must be of type X. Note that parameters are inherited: for example, the clock parameter for the SimpleCPU object above is not specified in the SimpleCPU class definition in SimpleCPU.py, but is inherited from SimpleCPU's Python base class BaseCPU (specified in BaseCPU.py).

Connections among SimObjects are formed by using a reference to one SimObject as a parameter value in the construction of a second SimObject. For example, a CPU's instruction and data caches are specified by naming cache SimObjects as the values of the CPU's icache and dcache parameters, respectively.

The configuration hierarchy

To simplify the description of large systems, the overall simulation target specification is organized as a hierarchy (tree). Each node in the tree is a SimObject instance. Even if a SimObject is instantiated in Python, it will not be constructed for the simulation unless it is part of this hierarchy. The program must create a special object root of class Root to identify the root of the hierarchy. When the configuration program completes execution, the tree rooted at root is walked recursively to identify objects to construct. Children are added to SimObjects using the same syntax as setting parameters, i.e., by assigning to Python object attributes. The SimpleCPU object created above can be instantiated by making it a child of the root object as follows:

root = Root()
root.cpu = cpu

As with parameters, children can be assigned at instantiation time using keyword assignment within parentheses. As a result, the instantiation of the CPU and attaching it to the root node can be done in a single line:


root = Root(cpu = SimpleCPU(clock = '2GHz', width = 2))

SimObjects may also become children when they are assigned to a parameter of another SimObject. For example, creating a cache object and assigning it to a CPU's icache or dcache parameter makes the cache object a child of the CPU object in the configuration hierarchy. This effect only occurs for SimObjects that are not in the hierarchy; a SimObject that is already part of the hierarchy is not re-parented when it is assigned to another SimObject's parameter.

The configuration hierarchy determines the final name of each instantiated object. The name is formed from the path from root to the particular object (not including root itself), joining elements with '.'. For example, consider the following configuration:

my_cpu = SimpleCPU(clock = '2GHz', width = 2)
my_cpu.icache = BaseCache(size = '32KB', assoc = 2)
my_cpu.dcache = BaseCache(size = '64KB', assoc = 2)
my_system = LinuxSystem(cpu = my_cpu)
root = Root(system = my_system)

In this case, the resulting SimObjects will have the internal names system, system.cpu, system.cpu.icache, and system.cpu.dcache. These names will be used in statistics output, etc. The names my_cpu and my_system are simply Python variables; they can be used within Python to set attributes, add children, etc., but they are not visible to the C++ portion of the simulation. Note that a Python object can also be accessed from within Python by its configuration hierarchy path, prefixed with root. (e.g., root.system.cpu).

A child attribute can also accept a vector of SimObjects. As with vector-valued parameters, these vectors are expressed as Python lists, for example, system.cpu = [ SimpleCPU(), SimpleCPU() ]. In Python, these objects can be accessed using standard list index notation (e.g., system.cpu[0]). The internal names for the objects are formed by directly appending the index to the attribute name (e.g., system.cpu0).
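The naming rules described above can be sketched with a toy hierarchy in plain Python. The Node class and the full_names helper are hypothetical and stand in for the real SimObject machinery; only the naming scheme (dotted paths that exclude root, with vector indices appended directly to the attribute name) is taken from the text.

```python
# Illustrative only: a toy hierarchy, not m5's SimObject implementation.
# Node and full_names are hypothetical names used for this sketch.

class Node:
    def __init__(self, **children):
        # Each keyword is a child attribute: a Node or a list of Nodes.
        self.children = dict(children)


def full_names(node, prefix=""):
    """Return the dotted names of all descendants of node, depth first."""
    names = []
    for attr, child in node.children.items():
        if isinstance(child, list):
            # Vector children: append the index directly to the attribute name.
            members = [(attr + str(i), c) for i, c in enumerate(child)]
        else:
            members = [(attr, child)]
        for name, member in members:
            path = prefix + name
            names.append(path)
            names.extend(full_names(member, path + "."))
    return names


cpus = [Node(icache=Node(), dcache=Node()), Node()]
root = Node(system=Node(cpu=cpus))
print(full_names(root))
# ['system', 'system.cpu0', 'system.cpu0.icache', 'system.cpu0.dcache', 'system.cpu1']
```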

In detail, the semantics of assigning to SimObject attributes are as follows:

If the attribute name identifies one of the SimObject's formal parameters, then the value on the right-hand side is converted to the parameter's type. An error is raised if this conversion cannot be performed. If the value is a SimObject that is not associated with the configuration hierarchy, that SimObject also becomes a child of the SimObject whose attribute is being assigned. The parameter name is used as the final element in the assigned SimObject's name.

If the attribute name does not correspond to a formal parameter and the right-hand value is a SimObject or a list of SimObjects, those SimObject(s) become children of the SimObject whose attribute is being assigned. The attribute name is used as the final element in the assigned SimObject's name.

If the attribute name does not correspond to a formal parameter and the right-hand value is not a SimObject, an error is raised.
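The three rules above can be sketched with a small Python class that overrides __setattr__. This is a simplified model, not M5's implementation: ToyObject, as_object, and the params tables are hypothetical, and "associated with the configuration hierarchy" is approximated here as "already has a parent".

```python
# Illustrative only: a toy model of the assignment rules, not m5 code.

class ToyObject:
    """Hypothetical stand-in for a SimObject; params maps names to converters."""
    params = {}

    def __init__(self):
        # Bypass our own __setattr__ for the bookkeeping fields.
        object.__setattr__(self, "parent", None)
        object.__setattr__(self, "values", {})
        object.__setattr__(self, "children", {})

    def _adopt(self, name, child):
        for member in child if isinstance(child, list) else [child]:
            object.__setattr__(member, "parent", self)
        self.children[name] = child

    def __setattr__(self, name, value):
        if name in self.params:
            # Rule 1: convert to the parameter's type; failures propagate.
            converted = self.params[name](value)
            self.values[name] = converted
            if isinstance(converted, ToyObject) and converted.parent is None:
                self._adopt(name, converted)
        elif _is_object_or_list(value):
            # Rule 2: not a parameter, but an object (or list of objects).
            self._adopt(name, value)
        else:
            # Rule 3: anything else is an error.
            raise AttributeError("%r is not a parameter" % name)


def _is_object_or_list(value):
    if isinstance(value, list):
        return bool(value) and all(isinstance(v, ToyObject) for v in value)
    return isinstance(value, ToyObject)


def as_object(value):
    """Converter for object-valued parameters."""
    if not isinstance(value, ToyObject):
        raise TypeError("expected a ToyObject")
    return value


class ToyCache(ToyObject):
    params = {"assoc": int}


class ToyCPU(ToyObject):
    params = {"width": int, "icache": as_object}


cpu = ToyCPU()
cpu.width = "2"            # rule 1: converted to the int 2
cache = ToyCache()
cpu.icache = cache         # rule 1: unparented object also becomes a child
other = ToyCPU()
other.icache = cache       # already parented, so it is not re-parented
cpu.l2 = ToyCache()        # rule 2: not a parameter, so it becomes a child
try:
    cpu.bogus = 5          # rule 3: neither a parameter nor an object
except AttributeError as err:
    print("rejected:", err)
```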

Inheritance and late binding

SimObject instances inherit both parameters and values from the classes they instantiate. A key feature of the configuration system is that value inheritance is largely late binding, that is, values are propagated to instances when the hierarchy is instantiated, not when the instance is created. As a result, a value can be set on a class parameter after instances have been created, and the instances will receive the more recent parameter value (as long as the parameter has not been explicitly overridden on those instances). The following example demonstrates this behavior:

# Instantiate some CPUs.
scpu1 = SimpleCPU()
scpu2 = SimpleCPU()
fcpu1 = FullCPU()
fcpu2 = FullCPU()

# Since BaseCPU is a common base class for SimpleCPU and FullCPU, the
# following statement will cause all of the above CPUs to have a clock
# rate of 1GHz.
BaseCPU.clock = '1GHz' 

# The following statement sets the width of both scpu1 and scpu2 to 4.
SimpleCPU.width = 4

# We can override the clock rate for a specific CPU.  Note that this
# assignment will have the same effect whether it is before or after
# the BaseCPU.clock assignment above.
fcpu2.clock = '2GHz'
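The behavior in the example above can be tried without M5, because ordinary Python class attributes happen to show an analogous effect: an instance with no override of its own sees whatever value its class holds at lookup time. The sketch below uses hypothetical Toy classes and made-up default values; it illustrates the observable behavior only and says nothing about how M5 propagates values internally.

```python
# Illustrative only: plain Python classes standing in for the CPU classes.
# The class names and default values are assumptions for this sketch.

class ToyBaseCPU:
    clock = "500MHz"
    width = 1


class ToySimpleCPU(ToyBaseCPU):
    pass


class ToyFullCPU(ToyBaseCPU):
    pass


scpu1 = ToySimpleCPU()
fcpu1 = ToyFullCPU()
fcpu2 = ToyFullCPU()

# Override one instance before the class-level assignment is made.
fcpu2.clock = "2GHz"

# Class-level assignments made after the instances already exist.
ToyBaseCPU.clock = "1GHz"
ToySimpleCPU.width = 4

print(scpu1.clock, fcpu1.clock, fcpu2.clock)  # 1GHz 1GHz 2GHz
print(scpu1.width, fcpu1.width)               # 4 1
```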

Subclassing

Users can define new SimObject classes by deriving from existing M5 classes. This feature can be useful for providing classes with differing sets of parameter values. These subclasses are defined using standard Python class syntax:

class CrazyFastCPU(FullCPU):
    rob_size = 10000
    width = 100
    clock = '10GHz'

Users can also subclass or instantiate the SimObject class directly, e.g., obj = SimObject(). These Python objects will not generate C++ SimObjects, but can be assigned children. They can be useful to create internal nodes in the configuration hierarchy that represent collections of SimObjects but do not correspond to C++ SimObjects themselves.

Relative references

In many situations, SimObject parameters have obvious default values that cannot be explicitly named in the general case. For example, many I/O devices need a pointer to the enclosing system's physical memory object or to the enclosing system object itself. Similarly, a cache's default latency might be expressed most conveniently in terms of the clock period of the attached CPU. However, the path names of those objects will vary from configuration to configuration. M5's configuration system solves this problem by providing relative reference objects. These are "proxy" objects that stand in for real objects and are resolved only after the entire hierarchy is constructed.

The m5 module provides two relative reference objects: Self and Parent. An attribute reference relative to Self resolves to the referencing object, while Parent is resolved by iteratively traversing up the hierarchy (towards root), starting at the parent of the referencing object, until a suitable match is found. A key feature of these objects is that resolution is relative to the final referencing object instance, not where the assignment is performed. Thus they can be assigned as default values to parameters in a SimObject class definition, and will be resolved independently for each instance that derives from that class.

For example, it is convenient to set the default clock speed of a CPU object to be the clock speed of the enclosing system; thus in a homogeneous multiprocessor there is no need to explicitly set the clock rate on each CPU. We achieve this by setting the default value for the CPU's clock parameter to be Parent.clock. During the final instantiation phase, an access to the CPU's clock parameter will be resolved by iterating up the hierarchy, starting at the CPU's parent, until an object with a clock parameter is found. The value of this parameter (which will be recursively resolved if it is also a relative reference) will be assigned to the CPU's clock parameter.
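The upward search described above can be sketched as follows. ParentRef and ToyNode are hypothetical names, and the sketch covers only attribute references relative to the parent; it is a simplified model of the resolution order, not M5's proxy implementation.

```python
# Illustrative only: a toy model of Parent-relative resolution, not m5 code.

class ParentRef:
    """Hypothetical proxy meaning 'the value of attr on the nearest ancestor'."""

    def __init__(self, attr):
        self.attr = attr


class ToyNode:
    def __init__(self, **params):
        self.parent = None
        self.params = dict(params)

    def add_child(self, child):
        child.parent = self
        return child

    def resolve(self, name):
        """Return the final value of a parameter, following references."""
        value = self.params[name]
        if not isinstance(value, ParentRef):
            return value
        # Walk towards the top of the tree, starting at our own parent.
        ancestor = self.parent
        while ancestor is not None:
            if value.attr in ancestor.params:
                # Recurse in case the ancestor's value is itself a reference.
                return ancestor.resolve(value.attr)
            ancestor = ancestor.parent
        raise LookupError("no ancestor defines %r" % value.attr)


top = ToyNode(clock="1GHz")
system = top.add_child(ToyNode(clock=ParentRef("clock")))
cpu0 = system.add_child(ToyNode(clock=ParentRef("clock")))
cpu1 = system.add_child(ToyNode(clock="2GHz"))

print(cpu0.resolve("clock"))  # 1GHz, found via system and then top
print(cpu1.resolve("clock"))  # 2GHz, set explicitly
```

Because the reference is stored unresolved and looked up only when resolve is called, the same ParentRef default yields different values for nodes placed in different parts of the tree.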

For further flexibility, Self and Parent can take a special attribute, any, which instructs the resolution mechanism to find any value of the appropriate type, either a hierarchy node itself or a parameter of a node. The most common usage of this feature is to use Parent.any for a SimObject-valued parameter. For example, many devices use Parent.any as a default value to locate the enclosing system object or its physical memory object. To avoid ambiguity, an error will be raised if Parent.any could resolve to multiple values at the same level of the hierarchy.

Simulation

Options

The options given on the command line after the script name (see Running M5) are passed to the simulation script in the same manner that command-line arguments are passed to standard Python scripts (i.e., via sys.argv). These options allow a single script to be configurable in user-defined ways. Our example scripts in configs/example use script options to select the CPU model (simple vs. detailed) used within an otherwise similar configuration.

Because options are passed to the script in the standard Python fashion, script files can use standard Python tools to parse options. We generally use the optparse module from the Python standard library.

Here is an example snippet of a simulation script that does option parsing using optparse, then uses the parsed option flags to select a CPU model:

import optparse

parser = optparse.OptionParser()

parser.add_option("-d", "--detailed", action="store_true")
parser.add_option("-t", "--timing", action="store_true")

(options, args) = parser.parse_args()

if options.timing:
    cpu = TimingSimpleCPU()
elif options.detailed:
    cpu = DetailedO3CPU()
else:
    cpu = AtomicSimpleCPU()
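The selection logic above can be tried outside the simulator by passing an explicit argument list to parse_args instead of relying on sys.argv. In this sketch the helper choose_cpu_model is a hypothetical name, and it returns the class name as a string rather than instantiating an M5 class, so that it runs under a plain Python interpreter.

```python
import optparse


def choose_cpu_model(argv):
    """Return the name of the CPU model selected by the given option list."""
    parser = optparse.OptionParser()
    parser.add_option("-d", "--detailed", action="store_true")
    parser.add_option("-t", "--timing", action="store_true")

    # argv holds only the options, without the program name.
    options, args = parser.parse_args(argv)

    if options.timing:
        return "TimingSimpleCPU"
    elif options.detailed:
        return "DetailedO3CPU"
    return "AtomicSimpleCPU"


print(choose_cpu_model([]))
print(choose_cpu_model(["--timing"]))
print(choose_cpu_model(["-d"]))
```

As in the snippet above, the timing flag takes precedence when both flags are given, because it is tested first.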

A secondary benefit of using the optparse module is that all available script options can be listed by using the "-h" flag, e.g.:

m5.opt <script> -h

just as m5.opt -h lists all available M5 options (see Running M5).