Difference between revisions of "General Memory System"
Revision as of 01:28, 8 March 2011
gem5's memory system was designed with the following goals:
- Unify timing and functional accesses in timing mode.
- Simplify the memory system code -- remove the huge amount of templating and duplicate code.
- Make changes easier, specifically to allow other memory interconnects besides a shared bus.
Ports system
MemObjects
All objects that connect to the memory system inherit from MemObject. This class adds the pure virtual function getPort(const std::string &name), which returns a port corresponding to the given name. This interface is used to connect memory objects together with the help of a connector (see below).
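As a rough illustration of the getPort() idea, the sketch below shows an abstract base class with a name-based port lookup and one object that implements it. It is not gem5's code: SketchPort, SketchMemObject, SketchCache and the port names "cpu_side" and "mem_side" are all assumptions made up for this example, as is the choice to throw on an unknown name.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a port; only its name matters in this sketch.
struct SketchPort {
    std::string name;
    explicit SketchPort(std::string n) : name(std::move(n)) {}
};

// Plays the role the text gives to MemObject: anything attached to the
// memory system must be able to hand out a port by name.
class SketchMemObject {
  public:
    virtual ~SketchMemObject() = default;
    virtual SketchPort *getPort(const std::string &name) = 0;
};

// Hypothetical cache with one CPU-facing and one memory-facing port.
class SketchCache : public SketchMemObject {
  public:
    SketchCache() : cpuSide("cpu_side"), memSide("mem_side") {}

    SketchPort *getPort(const std::string &name) override {
        if (name == "cpu_side") return &cpuSide;
        if (name == "mem_side") return &memSide;
        // Assumption of this sketch: unknown names are reported by throwing.
        throw std::invalid_argument("unknown port: " + name);
    }

  private:
    SketchPort cpuSide;
    SketchPort memSide;
};
```

Code that wires objects together only needs the SketchMemObject interface and a port name; it never has to know the concrete type of the object it is connecting.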
Ports
The next large part of the memory system is the idea of ports. Ports are used to interface memory objects to each other. They always come in pairs, and each port refers to the other port of its pair as its peer. Ports make the design more modular: with them, a specific interface does not have to be created for every pair of object types. Every memory object has to have at least one port to be useful.
There are two groups of functions in the port object. The send* functions are called on the port by the object that owns that port. For example, to send a packet in the memory system, a CPU would call myPort->sendTiming(pkt). Each send function has a corresponding recv function that is called on the port's peer, so the implementation of the sendTiming() call above would simply be peer->recvTiming(pkt). Using this method we pay only one virtual function call penalty but keep generic ports that can connect any memory system objects together.
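The send/recv pairing can be sketched as follows. This is a simplified model under stated assumptions, not gem5's Port class: SketchPort, SketchPacket and RecordingPort are hypothetical names, and the packet carries only an address. The point it shows is that sendTiming() is a plain forwarding call, and the only virtual dispatch happens in recvTiming() on the peer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical packet carrying only an address.
struct SketchPacket {
    std::uint64_t addr;
};

class SketchPort {
  public:
    virtual ~SketchPort() = default;

    void setPeer(SketchPort *p) { peer = p; }

    // Called by the object that owns this port. Non-virtual: it simply
    // hands the packet to the peer. Assumes setPeer() was called first.
    bool sendTiming(SketchPacket *pkt) { return peer->recvTiming(pkt); }

  protected:
    // Called on the receiving port; this is the single virtual call.
    virtual bool recvTiming(SketchPacket *pkt) = 0;

  private:
    SketchPort *peer = nullptr;
};

// Hypothetical port that just records the addresses it receives.
class RecordingPort : public SketchPort {
  public:
    std::vector<std::uint64_t> received;

  protected:
    bool recvTiming(SketchPacket *pkt) override {
        received.push_back(pkt->addr);
        return true;
    }
};
```

Because the sender only ever sees a SketchPort pointer, any two objects whose ports derive from it can be connected without knowing about each other's types.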
Connections
In Python, Ports are first-class attributes of simulation objects, much like Params. Two objects can specify that their ports should be connected using the assignment operator. Unlike a normal variable or parameter assignment, port connections are symmetric: A.port1 = B.port2 has the same meaning as B.port2 = A.port1.
Objects such as busses that have a potentially unlimited number of ports use "vector ports". An assignment to a vector port appends the peer to a list of connections rather than overwriting a previous connection.
In C++, memory ports are connected by the Python code doing something like p1 = obj1->getPort(); p2 = obj2->getPort(); p1->setPeer(p2); p2->setPeer(p1); with the appropriate port names, if any. This is done after all objects are instantiated.
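The C++ side of a connection, including the append behaviour of vector ports, might look roughly like the sketch below. The names SketchPort, connectPorts and SketchBus are assumptions for illustration only; in particular, modelling a vector port as a bus whose getPort() creates a fresh port on every call is a choice of this sketch.

```cpp
#include <cstddef>
#include <deque>

struct SketchPort {
    SketchPort *peer = nullptr;
    void setPeer(SketchPort *p) { peer = p; }
};

// Symmetric connection: swapping the arguments gives the same result,
// mirroring A.port1 = B.port2 versus B.port2 = A.port1.
inline void connectPorts(SketchPort *p1, SketchPort *p2) {
    p1->setPeer(p2);
    p2->setPeer(p1);
}

// Hypothetical bus with a "vector port": every request for a port appends
// a new one instead of overwriting an earlier connection.
class SketchBus {
  public:
    SketchPort *getPort() {
        ports.emplace_back();
        return &ports.back();
    }

    std::size_t numPorts() const { return ports.size(); }

  private:
    // std::deque keeps pointers to existing elements valid when appending.
    std::deque<SketchPort> ports;
};
```

Connecting a CPU and a DMA device to the same bus is then two calls to connectPorts(), each consuming a new bus port.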
Port Descendants
There are several types of ports that inherit from Port and are very generic.
- FunctionalPort provides easy-to-use methods for writing and reading physical addresses. It is only meant to load data into memory and update constants before the simulation begins.
- VirtualPort provides the same methods as FunctionalPort, but the addresses passed to it are virtual addresses, and a translation is done to get the physical address. If no ThreadContext is passed to the constructor, the virtual->physical translation must be static (e.g. Alpha Superpage accesses); otherwise a ThreadContext is required to do the translation.
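The relationship between the two port types can be sketched as a physical-address accessor plus a wrapper that translates first. Everything here is an assumption for illustration: the class names, the read()/write() method names, the flat byte vector standing in for memory, and the translation callback standing in for either a static mapping or a ThreadContext-based lookup. The sketch also assumes an access never crosses a translation boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical functional port: direct reads and writes of physical memory.
class SketchFunctionalPort {
  public:
    explicit SketchFunctionalPort(std::vector<std::uint8_t> &mem) : memory(mem) {}

    void write(Addr paddr, const void *src, std::size_t len) {
        checkRange(paddr, len);
        std::memcpy(memory.data() + paddr, src, len);
    }

    void read(Addr paddr, void *dst, std::size_t len) const {
        checkRange(paddr, len);
        std::memcpy(dst, memory.data() + paddr, len);
    }

  private:
    void checkRange(Addr paddr, std::size_t len) const {
        if (paddr > memory.size() || len > memory.size() - paddr)
            throw std::out_of_range("access outside simulated memory");
    }

    std::vector<std::uint8_t> &memory;
};

// Hypothetical virtual port: same methods, but addresses are translated
// to physical addresses before the access is performed.
class SketchVirtualPort {
  public:
    using Translator = std::function<Addr(Addr)>;

    SketchVirtualPort(SketchFunctionalPort &p, Translator t)
        : phys(p), translate(std::move(t)) {}

    void write(Addr vaddr, const void *src, std::size_t len) {
        phys.write(translate(vaddr), src, len);
    }

    void read(Addr vaddr, void *dst, std::size_t len) const {
        phys.read(translate(vaddr), dst, len);
    }

  private:
    SketchFunctionalPort &phys;
    Translator translate;
};
```

A static mapping corresponds to passing a fixed function such as "subtract a base address"; a context-dependent mapping would pass a callback that consults per-thread translation state instead.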
Packets
A Packet is used to encapsulate a transfer between two objects in the memory system (e.g., the L1 and L2 cache). This is in contrast to a Request where a single Request travels all the way from the requester to the ultimate destination and back, possibly being conveyed by several different Packets along the way.
Read access to many packet fields is provided via accessor methods which verify that the data in the field being read is valid.
A packet contains the following, all of which are read through accessors that verify the data is valid:
- The address. This is the address that will be used to route the packet to its target (if the destination is not explicitly set) and to process the packet at the target. It is typically derived from the request object's physical address, but may be derived from the virtual address in some situations (e.g., for accessing a fully virtual cache before address translation has been performed). It may not be identical to the original request address: for example, on a cache miss, the packet address may be the address of the block to fetch and not the request address.
- The size. Again, this size may not be the same as that of the original request, as in the cache miss scenario.
- A pointer to the data being manipulated.
  - Set by dataStatic(), dataDynamic(), and dataDynamicArray(), which control whether the data associated with the packet is, respectively, not freed, freed with delete, or freed with delete [] when the packet is freed.
  - If not set by one of the above methods, allocated by allocate(), in which case the data is freed when the packet is destroyed. (Always safe to call.)
  - A pointer can be retrieved by calling getPtr().
  - get() and set() can be used to manipulate the data in the packet. The get() method does a guest-to-host endian conversion and the set() method does a host-to-guest endian conversion.
- A status indicating Success, BadAddress, Not Acknowledged, or Unknown.
- A list of command attributes associated with the packet.
  - Note: There is some overlap in the data in the status field and the command attributes. This is largely so that a packet can be easily reinitialized when nacked or easily reused with atomic or functional accesses.
- A SenderState pointer, which is a virtual base opaque structure used to hold state associated with the packet but specific to the sending device (e.g., an MSHR). A pointer to this state is returned in the packet's response so that the sender can quickly look up the state needed to process it. A specific subclass would be derived from this to carry state specific to a particular sending device.
- A CoherenceState pointer, which is a virtual base opaque structure used to hold coherence-related state. A specific subclass would be derived from this to carry state specific to a particular coherence protocol.
- A pointer to the request.
Requests
A request object encapsulates the original request issued by a CPU or I/O device. The parameters of this request are persistent throughout the transaction, so a request object's fields are intended to be written at most once for a given request. There are a handful of constructors and update methods that allow subsets of the object's fields to be written at different times (or not at all). Read access to all request fields is provided via accessor methods which verify that the data in the field being read is valid.
The fields in the request object are typically not available to devices in a real system, so they should normally be used only for statistics or debugging and not as architectural values.
Request object fields include:
- Virtual address. This field may be invalid if the request was issued directly on a physical address (e.g., by a DMA I/O device).
- Physical address.
- Data size.
- Time the request was created.
- The ID of the CPU/thread that caused this request. May be invalid if the request was not issued by a CPU (e.g., a device access or a cache writeback).
- The PC that caused this request. Also may be invalid if the request was not issued by a CPU.
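One way to picture "written at most once, read only when valid" is a small wrapper around each field, as in the sketch below. It is an illustration under assumptions, not how gem5 implements Request: WriteOnce, SketchRequest and makePhysRequest are hypothetical names, only three of the listed fields are modelled, and reporting misuse by throwing is a choice of this sketch.

```cpp
#include <cstdint>
#include <stdexcept>

using Addr = std::uint64_t;

// Hypothetical field wrapper: one write allowed, reads checked for validity.
template <typename T>
class WriteOnce {
  public:
    void set(const T &v) {
        if (valid_) throw std::logic_error("field already written");
        value_ = v;
        valid_ = true;
    }

    const T &get() const {
        if (!valid_) throw std::logic_error("field read before being written");
        return value_;
    }

    bool valid() const { return valid_; }

  private:
    T value_{};
    bool valid_ = false;
};

// Subset of the request fields listed above.
struct SketchRequest {
    WriteOnce<Addr> vaddr;
    WriteOnce<Addr> paddr;
    WriteOnce<unsigned> size;
};

// A request issued directly on a physical address (e.g., by a DMA device):
// the virtual address is never written and therefore stays invalid.
inline SketchRequest makePhysRequest(Addr paddr, unsigned size) {
    SketchRequest req;
    req.paddr.set(paddr);
    req.size.set(size);
    return req;
}
```

A CPU-issued request would instead write the virtual address first and fill in the physical address later, after translation, which is the kind of staged construction the text describes.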
Atomic/Timing/Functional accesses
There are three types of accesses supported by the ports.
- Timing - Timing accesses are the most detailed access. They reflect our best effort for realistic timing and include the modeling of queuing delay and resource contention. Once a timing request is successfully sent, at some point in the future the device that sent the request will get either the response or a NACK if the request could not be completed (more below). Timing and atomic accesses cannot coexist in the memory system.
- Atomic - Atomic accesses are faster than detailed (timing) accesses. They are used for fast forwarding and warming up caches, and return an approximate time to complete the request without any resource contention or queuing delay. When an atomic access is sent, the response is provided when the function returns. Atomic and timing accesses cannot coexist in the memory system.
- Functional - Like atomic accesses, functional accesses happen instantaneously, but unlike atomic accesses they can coexist in the memory system with atomic or timing accesses. Functional accesses are used for things such as loading binaries, examining or changing variables in the simulated system, and allowing a remote debugger to be attached to the simulator. The important point is that when a functional access is received by a device that contains a queue of packets, all the packets must be searched for requests or responses that the functional access is affecting, and they must be updated as appropriate. The Packet::intersect() and fixPacket() methods can help with this.
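The queue-searching requirement for functional accesses can be sketched as below. This is not the behaviour of Packet::intersect() or fixPacket() themselves; intersects(), functionalWrite() and QueuedPacket are hypothetical helpers written for this example, and only the functional-write case is shown: every queued packet whose address range overlaps the write has the overlapping bytes patched.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical in-flight packet sitting in a device's queue.
struct QueuedPacket {
    Addr addr;
    std::vector<std::uint8_t> data;
};

// True if [a, a + aLen) and [b, b + bLen) share at least one byte.
inline bool intersects(Addr a, std::size_t aLen, Addr b, std::size_t bLen) {
    return a < b + bLen && b < a + aLen;
}

// Applies a functional write to every queued packet it overlaps and
// returns how many packets were updated.
inline std::size_t functionalWrite(std::deque<QueuedPacket> &queue, Addr addr,
                                   const std::vector<std::uint8_t> &bytes) {
    std::size_t updated = 0;
    for (QueuedPacket &pkt : queue) {
        if (!intersects(pkt.addr, pkt.data.size(), addr, bytes.size()))
            continue;
        // Copy only the bytes in the overlapping address range.
        const Addr lo = std::max<Addr>(pkt.addr, addr);
        const Addr hi = std::min<Addr>(pkt.addr + pkt.data.size(),
                                       addr + bytes.size());
        for (Addr a = lo; a < hi; ++a)
            pkt.data[a - pkt.addr] = bytes[a - addr];
        ++updated;
    }
    return updated;
}
```

A functional read would walk the same queue in the other direction, copying overlapping bytes out of queued packets so that it observes data that has not yet reached memory.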
Packet allocation protocol
The protocol for allocation and deallocation of Packet objects varies depending on the access type. (We're talking about low-level C++ new/delete issues here, not anything related to the coherence protocol.)
- Atomic and Functional
- The Packet object is owned by the requester. The responder must overwrite the request packet with the response (typically using the Packet::makeResponse() method). There is no provision for having multiple responders to a single request. Since the response is always generated before sendAtomic() or sendFunctional() returns, the requester can allocate the Packet object statically or on the stack.
- Timing
- Timing transactions are composed of two one-way messages, a request and a response. In both cases, the Packet object must be dynamically allocated by the sender. Deallocation is the responsibility of the receiver (or, for broadcast coherence packets, the target device, typically memory). In the case where the receiver of a request is generating a response, it may choose to reuse the request packet for its response to save the overhead of calling delete and then new (and gain the convenience of using makeResponse()). However, this optimization is optional, and the requester must not rely on receiving the same Packet object back in response to a request. Note that when the responder is not the target device (as in a cache-to-cache transfer), the target device will still delete the request packet, and thus the responding cache must allocate a new Packet object for its response. Also, because the target device may delete the request packet immediately on delivery, any other memory device wishing to reference a broadcast packet past the point where the packet is delivered must make a copy of that packet, as the pointer to the delivered packet cannot be relied upon to stay valid.
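The two ownership rules can be contrasted in a small sketch. It makes several simplifying assumptions: SketchPacket and the helper functions are hypothetical, the timing response is returned directly instead of being delivered by a later event, and fakeMemoryValue() is an invented stand-in for whatever the responder would actually read.

```cpp
#include <cstdint>

using Addr = std::uint64_t;

struct SketchPacket {
    enum class Cmd { ReadReq, ReadResp };

    Cmd cmd;
    Addr addr;
    std::uint64_t data;

    SketchPacket(Cmd c, Addr a) : cmd(c), addr(a), data(0) {}

    // Turns the request into its own response, in place.
    void makeResponse() { cmd = Cmd::ReadResp; }
};

// Invented stand-in for the contents of memory at an address.
inline std::uint64_t fakeMemoryValue(Addr addr) { return addr + 1; }

// Atomic/functional style: the requester owns the packet, so the responder
// only overwrites it and never frees it.
inline void respondAtomic(SketchPacket &pkt) {
    pkt.data = fakeMemoryValue(pkt.addr);
    pkt.makeResponse();
}

// Timing style, responder side: as receiver it owns the request packet.
// It may reuse it for the response, or delete it and allocate a new one.
inline SketchPacket *respondTiming(SketchPacket *req, bool reuseRequest) {
    if (reuseRequest) {
        req->data = fakeMemoryValue(req->addr);
        req->makeResponse();
        return req;
    }
    SketchPacket *resp =
        new SketchPacket(SketchPacket::Cmd::ReadResp, req->addr);
    resp->data = fakeMemoryValue(req->addr);
    delete req;
    return resp;
}

// Timing style, requester side: as receiver of the response it deletes the
// packet, without assuming it is the same object that was sent.
inline std::uint64_t consumeTimingResponse(SketchPacket *resp) {
    const std::uint64_t value = resp->data;
    delete resp;
    return value;
}
```

In the atomic case the requester can keep the packet on the stack; in the timing case the requester must not touch its request pointer after sending, since the responder may already have deleted it.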
Two memory systems: Classic and Ruby
Classic memory system
Summarize functionality and highlight advantages and disadvantages. Uses ports to connect all components.
Ruby memory system
Summarize functionality and highlight advantages and disadvantages. Uses ports to connect cpus to the Ruby memory system. Uses message buffers to connect components within the memory system.