Difference between revisions of "Modular Coherence Protocols"

From gem5
Jump to: navigation, search
(Implicit hit path)
(Implicit hit path)
Line 15: Line 15:
 
The coherence protocol should not be involved on cache hits.  Requiring a call into the protocol object on hits will drastically reduce our opportunities for performance optimization of this critical path in the simulator.  As is often the case, simulation reflects reality, and no sane coherence protocol would intrude on the hit path of a real implementation either.  The Tag object is currently consulted on hits (to enable replacement algorithm bookkeeping), so if someone desperately needs to collect information on hits they can implement a new Tag object to do that.
 
The coherence protocol should not be involved on cache hits.  Requiring a call into the protocol object on hits will drastically reduce our opportunities for performance optimization of this critical path in the simulator.  As is often the case, simulation reflects reality, and no sane coherence protocol would intrude on the hit path of a real implementation either.  The Tag object is currently consulted on hits (to enable replacement algorithm bookkeeping), so if someone desperately needs to collect information on hits they can implement a new Tag object to do that.
  
While the coherence protocol should not be involved on cache hits, there are a few situations we need to ensure are handled correctly.  First, the cache hit path should differentiate read premission from read/write permission.  Second, the cache hit path should handle sighlently upgraded cache blocks from exclusive to modified state.
+
While the coherence protocol should not be involved on cache hits, there are a few situations we must handle correctly.  First, the cache hit path should differentiate read permission from read/write permission.  Second, the cache hit path should handle silently upgraded cache blocks from exclusive to modified state.
  
 
=== Support internal cache timing ===
 
=== Support internal cache timing ===

Revision as of 15:43, 11 July 2008

Introduction

We need a way to add/extend/replace cache coherence protocols in a modular fashion. The original cache module had coherence factored out into a separate object, but the interface was totally inadequate for doing anything interesting. The redesigned memory system from 2.0 integrated coherence back in to the cache module as a temporary step to eliminate the useless bloat caused by the original separation. It's now time to go back and re-introduce this modularity, but to do it right this time.

We are hugely inspired by SLICC, the coherence protocol description language/compiler that's part of the GEMS simulator from Wisconsin. Following their example, we plan on developing a similar domain-specific language (DSL) and compiler for coherence protocols. SLICC is fairly closely tied to the GEMS framwork, so it's not possible to adopt it directly for M5. In addition, there are a number of aspects of SLICC that we would like to change, partly based on personal bias and partly based on areas where we think we can improve on their design. To try and come up with the best solution we can, we will approach this design as a clean sheet, and not automatically adopt any part of SLICC by default. At the same time, SLICC is a terrific working example, so many of our design decisions will probably end up being "let's do this like SLICC, except...". Correspondingly, most of the initial design requirements and preferences listed below are "things we should do differently than SLICC", with the implicit assumption that most of the issues not listed below will be handled similarly.

Design Requirements

Protocol inheritance

This is Brad's #1 request for improving on SLICC. Related protocols should be able to share common elements. New protocols that are minor modifications or extensions of existing protocols should be describable by inheriting from the existing protocol and describing only the differences.

Implicit hit path

The coherence protocol should not be involved on cache hits. Requiring a call into the protocol object on hits will drastically reduce our opportunities for performance optimization of this critical path in the simulator. As is often the case, simulation reflects reality, and no sane coherence protocol would intrude on the hit path of a real implementation either. The Tag object is currently consulted on hits (to enable replacement algorithm bookkeeping), so if someone desperately needs to collect information on hits they can implement a new Tag object to do that.

While the coherence protocol should not be involved on cache hits, there are a few situations we must handle correctly. First, the cache hit path should differentiate read permission from read/write permission. Second, the cache hit path should handle silently upgraded cache blocks from exclusive to modified state.

Support internal cache timing

Though we currently don't model any internal cache timing, we probably should eventually, and we want to make sure the language is up to the task. For example, having a getState() function that we expect to return immediately means that we can't explicitly model the tag array access latency. It might be better to assume that the tag lookup happens explicitly when a message is first processed and then assume that the controller (and thus the coherence code) is handed both the message and the tag state right from the start.

Design Preferences

Leverage state and message attributes

The current coherence protocol uses memory command attributes to make the code a lot cleaner; see the MemCmd class in src/mem/packet.{hh,cc}. The cache block state has similar properties, though the implementation is even simpler because the current protocol doesn't need any non-transient state beyond the basics (valid, dirty, exclusive). See src/mem/cache/blk.hh. We should keep this basic model. The MemCmd structure is pretty good as is, but needs to be extended to support inheritance. The block state model needs to be expanded to allow protocol-specific state, but we should consider continuing to use encoding tricks to avoid redundantly storing both the block access bits and the full coherence state separately.

In contrast, SLICC doesn't directly associate semantics with requests or states, and you end up with code like this:

     // Set permission
     if ((state == State:I) || (state == State:MI_A) || (state == State:II_A)) {
       changePermission(addr, AccessPermission:Invalid);
     } else if (state == State:S || state == State:O) {
       changePermission(addr, AccessPermission:Read_Only);
     } else if (state == State:M) {
       changePermission(addr, AccessPermission:Read_Write);
     } else {
       changePermission(addr, AccessPermission:Busy);
     }

No new imperative language

Let's not invent a new language for imperative computation. I admit that the "treat C++ code blocks as strings" approach of the ISA description language is a bit cheesy, but for the most part it works. The main drawback is that you can't parse the code to analyze its behavior (though the ISA parser extracts a lot of information in an imperfect way using regexps; see Code parsing), but I think we'll have even less need of that here than the ISA parser does.

Even if we do feel the need to fully parse imperative computations, or to restrict what people can do in them, I'd rather see us do a subset of C++ than invent something new. (That might actually be useful if we can reuse the C++-subset parser for the ISA description language.) Unfortunately, this is easier said than done; one big challenge is getting the context set up right so the parser can figure out which symbols are types, preprocessor macros are expanded, etc.

Note that some of the SLICC design notes talk about wanting to make the language synthesizable to hardware, which is a reasonable motivation for them to have a limited imperative language. We don't have that goal, so there's much less reason to be restrictive.

Design Challenges

Interfacing generic with protocol-specific bits

Though it's reasonable to specialize caches and interconnects based on the coherence protocol, we don't want to recompile every object that connects to the memory system (CPUs, devices, etc.) for the protocol too. Yet these devices do need to send out and respond to basic memory requests (read, write, instruction fetch, etc.). One possible solution is to leverage inheritance, and have a base coherence protocol that defines the set of messages which all CPUs and devices can use. This base protocol will probably be abstract (if only by having all the required functions call fatal()). Then generic memory system clients can depend only on the base protocol, and as long as the derived protocols extend it in a compatible fashion, everything will work.

Note that this solution will constrain how we implement inheritance: we can't do it entirely in the DSL and have it spit out flattened C++, as we'll need this base protocol to be separate in C++ so that the compilation of CPU and device models can be independent of the specific protocol.

Basic Design

There's not much here yet, unfortunately.

Simulator implementation

The coherence protocol implementation (which will eventually be auto-generated from the DSL description) should consist of one or more C++ objects. The cache and interconnect models will be modified to take a coherence protocol object as a template parameter. The protocol object(s) will define types used to define the allowable commands, define the protocol-specific state included in Packet and CacheBlk classes, etc.

  • Should the C++ protocol object(s) form an inheritance hierarchy that mirrors the inheritance relationship of the protocols in the DSL?

Domain-specific language

We should write the parser in PLY and the compiler in Python because we know them, and to avoid introducing any new tool dependencies.

Development Plan

  1. Separate the current coherence protocol from the cache module into a separate policy object (or objects). This will give us an initial interface definition and an idea of what kind of code the description language will need to generate.
  2. Implement a few different protocols using this interface, expanding and modifying it as needed. Ideally these protocols will be significantly different from the base protocol, e.g., directory based, point-to-point, etc. This will help us flesh out the interface, and get started on implementing features that we'll need for more advanced protocols (such as virtual channels) that will be reflected in the description language but are largely orthogonal to it. Some candidates would be Geoff Blake's LogTM implementation or a manual translation of an existing SLICC protocol from the GEMS distribution. A side benefit will be that those of us who need a particular protocol can implement it at this stage and not have the description language be on their critical path. There is the down side that there will be some extra effort to reimplement these protocols in the description language, but if our language is as concise as it should be then this effort should be minimal.
  3. Design and implement the description language and associated compiler. This work can overlap with the previous step somewhat, but we should avoid getting too far ahead of ourselves in case the previous step uncovers some features that require significant modifications to the language structure.

--Steve 12:39, 2 July 2008 (EDT)