Difference between revisions of "Cache Coherence Protocols"

=== Common Notations and Data Structures ===
  
==== '''Coherence Messages''' ====
  
 
These are described in the <''protocol-name''>-msg.sm file for each protocol.
 
 
  |}
 
  
==== '''AccessPermissions''' ====
  
 
These are associated with each cache block and determine what operations are permitted on that block. They are closely correlated with coherence protocol states.
 
 
  |}
 
  
==== Data Structures ====
  
 
* '''Message Buffers''':TODO
 
 
::::  src/mem/ruby/system/TimerTable.cc: Implementation of the methods of the TimerTable class, which deal with setting addresses and timeouts and with scheduling events using the event queue.
 
  
==== Coherence controller FSM Diagrams ====
  
 
* The Finite State Machines show only the stable states
 
 
* Transitions are annotated using the notation "'''Event list'''" or "'''Event list : Action list'''" or "'''Event list : Action list : Event list'''". For example, Store : GETX indicates that on a Store event, a GETX message was sent whereas GETX : Mem Read indicates that on receiving a GETX message, a memory read request was sent. Only the main triggers and actions are listed.
 
* Optional actions (e.g. writebacks depending on whether or not the block is dirty) are enclosed within '''[ ]'''
 
* In the diagrams, the  transition labels are associated with the arc that cuts across the transition label or the closest arc.
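
The label notation above can be made concrete with a small parser. This is only an illustrative sketch: the function name <code>parse_transition_label</code> and the field names <code>events</code>, <code>actions</code> and <code>resulting_events</code> are assumptions made for this example and do not appear in SLICC or gem5.

```python
def parse_transition_label(label):
    """Split a diagram label of the form 'Event list : Action list : Event list'."""
    parts = [part.strip() for part in label.split(":")]
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"malformed transition label: {label!r}")
    # Hypothetical field names; the label has at most three colon-separated parts.
    names = ("events", "actions", "resulting_events")
    parsed = dict(zip(names, parts))
    if "actions" in parsed:
        # Actions enclosed in [ ] are optional (e.g. a writeback of a dirty block).
        text = parsed["actions"]
        parsed["optional"] = text.startswith("[") and text.endswith("]")
    return parsed


store = parse_transition_label("Store : GETX")
getx = parse_transition_label("GETX : Mem Read")
```

Here <code>store</code> records that a Store event triggers a GETX message, and <code>getx</code> records that a received GETX triggers a memory read.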
 
==== MI example ====
 
 
===== Protocol Overview =====
 
 
* This is a simple cache coherence protocol that is used to illustrate protocol specification using SLICC.
 
* This protocol assumes a 1-level cache hierarchy. The cache is private to each node. The caches are kept coherent by a directory controller. Since the hierarchy is only 1-level, there is no inclusion/exclusion requirement.
 
* This protocol does not differentiate between loads and stores.
 
* This protocol cannot implement the semantics of LL/SC instructions, because external GETS requests that hit a block within an LL/SC sequence steal exclusive permissions, thus causing the SC instruction to fail.
 
===== Related Files =====
 
 
* '''src/mem/protocols'''
 
** '''MI_example-cache.sm''': cache controller specification
 
** '''MI_example-dir.sm''': directory controller specification
 
** '''MI_example-dma.sm''': dma controller specification
 
** '''MI_example-msg.sm''': message type specification
 
** '''MI_example.slicc''': container file
 
 
===== Stable States and Invariants =====
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''M''' || The cache block has been accessed (read/written) by this node. No other node holds a copy of the cache block
 
|-
 
| '''I''' || The cache block at this node is invalid
 
|}
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
===== Cache controller =====
 
 
* Requests, Responses, Triggers:
 
** Load, Instruction fetch, Store from the core
 
** Replacement from self
 
** Data from the directory controller
 
** Forwarded request (intervention) from the directory controller
 
** Writeback acknowledgement from the directory controller
 
** Invalidations from the directory controller (on DMA activity)
 
 
[[File:MI_example_cache_FSM.jpg|450px|right]]
 
 
* Main Operation:
 
** On a '''load/Instruction fetch/Store''' request from the core:
 
*** it checks whether the corresponding block is present in the M state. If so, it returns a hit
 
*** otherwise, if in I state, it issues a GETX request to the directory controller
 
 
** On a '''replacement''' trigger from self:
 
*** it evicts the block, issues a writeback request to the directory controller
 
*** it waits for acknowledgement from the directory controller (to prevent races)
 
 
** On a '''forwarded request''' from the directory controller:
 
*** This means that the block was in M state at this node when the request was generated by some other node
 
*** It sends the block directly to the requesting node (cache-to-cache transfer)
 
*** It evicts the block from this node
 
 
** '''Invalidations''' are similar to replacements
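
The cache controller behaviour listed above can be sketched as a small state machine. This is an illustrative Python model, not the SLICC specification in MI_example-cache.sm: the class name <code>MICacheBlock</code>, the method names and the message strings (including <code>PUTX</code> for the writeback request) are assumptions made for the example, and transient states and acknowledgements are omitted.

```python
from enum import Enum


class State(Enum):
    I = "I"  # invalid at this node
    M = "M"  # only copy in the system; loads and stores both hit


class MICacheBlock:
    """Toy model of one cache block under MI (stable states only)."""

    def __init__(self):
        self.state = State.I
        self.sent = []  # (message, destination) pairs issued by this cache

    def access(self):
        """Load, instruction fetch or store from the core: all treated alike."""
        if self.state is State.M:
            return "hit"
        self.sent.append(("GETX", "directory"))
        return "miss"

    def data_from_directory(self):
        """Data response to an earlier GETX: the block becomes M."""
        self.state = State.M

    def replacement(self):
        """Evict the block and issue a writeback (the ack is not modelled)."""
        if self.state is State.M:
            self.sent.append(("PUTX", "directory"))  # assumed message name
            self.state = State.I

    def forwarded_getx(self, requestor):
        """Another node wants the block: send it cache-to-cache, then evict."""
        if self.state is State.M:
            self.sent.append(("DATA", requestor))
            self.state = State.I


block = MICacheBlock()
first = block.access()          # miss: a GETX goes to the directory
block.data_from_directory()
second = block.access()         # hit in M
block.forwarded_getx("cache1")  # block leaves this node
```

Because loads and stores are not distinguished, a single <code>access</code> method is enough in this sketch.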
 
 
===== Directory controller =====
 
 
* Requests, Responses, Triggers:
 
** GETX from the cores, Forwarded GETX to the cores
 
** Data from memory, Data to the cores
 
** Writeback requests from the cores, Writeback acknowledgements to the cores
 
** DMA read, write requests from the DMA controllers
 
 
[[File:MI_example_dir_FSM.jpg|450px|right]]
 
 
* Main Operation:
 
** The directory keeps track of which core has a block in the M state. It designates this core as the owner of the block.
 
** On a '''GETX''' request from a core:
 
*** If the block is not present, a memory fetch request is initiated
 
*** If the block is already present at some core, the request must have come from a different core
 
**** In this case, a forwarded request is sent to the original owner
 
**** Ownership of the block is transferred to the requestor
 
** On a '''writeback''' request from a core:
 
*** If the core is owner, the data is written to memory and acknowledgement is sent back to the core
 
*** If the core is not owner, a NACK is sent back
 
**** This can happen in a race condition
 
**** The core evicted the block while a forwarded request from some other core was on the way, and the directory had already transferred ownership away from the evicting core
 
**** The evicting core holds the data till the forwarded request arrives
 
** On '''DMA''' accesses (read/write)
 
*** Invalidation is sent to the owner node (if any). Otherwise data is fetched from memory.
 
*** This ensures that the most recent data is available.
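
The directory behaviour above, including the NACK on a writeback that lost a race, can be sketched as follows. This is a simplified illustration rather than the logic of MI_example-dir.sm: the class <code>MIDirectoryEntry</code>, its methods and the returned action strings are hypothetical names, and memory and network latencies are ignored.

```python
class MIDirectoryEntry:
    """Toy model of one directory entry: it tracks only the current owner."""

    def __init__(self):
        self.owner = None  # node holding the block in M, if any

    def getx(self, requestor):
        """Return the action taken for a GETX from `requestor`."""
        previous = self.owner
        self.owner = requestor  # ownership moves to the requestor either way
        if previous is None:
            return ("fetch_from_memory", requestor)
        return ("forward_to_owner", previous)

    def writeback(self, node):
        """Accept a writeback only from the current owner; otherwise NACK."""
        if node == self.owner:
            self.owner = None
            return "ack"
        return "nack"

    def dma_access(self):
        """DMA read/write: invalidate the owner if one exists, else use memory."""
        if self.owner is not None:
            previous, self.owner = self.owner, None
            return ("invalidate", previous)
        return ("fetch_from_memory", "dma")


entry = MIDirectoryEntry()
a = entry.getx("c0")       # no owner yet: data comes from memory
b = entry.getx("c1")       # c0 owns the block: request is forwarded to c0
c = entry.writeback("c0")  # c0 evicted concurrently but is no longer owner
d = entry.writeback("c1")  # c1 is the owner: writeback accepted
e = entry.dma_access()     # no owner left: data comes from memory
```

The third call shows the race described above: <code>c0</code> is NACKed because ownership has already moved to <code>c1</code>.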
 
 
===== Other features =====
 
 
* MI protocols don't support LL/SC semantics. A load from a remote core will invalidate the cache block.

* This protocol has no timeout mechanisms.
 
 
==== MOESI_hammer ====
 
 
This is an implementation of AMD's Hammer protocol, which is used in AMD's Hammer chip (also known as the Opteron or Athlon 64). The protocol implements both the original HyperTransport protocol and the more recent ProbeFilter protocol. The protocol also includes a full-bit directory mode.
 
 
===== Related Files =====
 
 
* '''src/mem/protocols'''
 
** '''MOESI_hammer-cache.sm''': cache controller specification
 
** '''MOESI_hammer-dir.sm''': directory controller specification
 
** '''MOESI_hammer-dma.sm''': dma controller specification
 
** '''MOESI_hammer-msg.sm''': message type specification
 
** '''MOESI_hammer.slicc''': container file
 
 
===== Cache Hierarchy =====
 
 
This protocol implements a 2-level private cache hierarchy. It assigns separate Instruction and Data L1 caches, and a unified L2 cache to each core. These caches are private to each core and are controlled with one shared cache controller. This protocol enforces exclusion between the L1 and L2 caches.
 
 
===== Stable States and Invariants =====
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''MM''' || The cache block is held exclusively by this node and is potentially locally modified (similar to conventional "M" state).
 
|-
 
| '''O''' || The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
 
|-
 
| '''M''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
 
|-
 
| '''S''' || The cache line holds the most recent, correct copy of the data. Other processors in the system may hold copies of the data in the shared state, as well. The cache line can be read, but not written in this state.
 
|-
 
| '''I''' || The cache line is invalid and does not hold a valid copy of the data.
 
|}
 
 
===== Cache controller =====
 
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
MOESI_hammer supports cache flushing. To flush a cache line, the cache controller first issues a GETF request to the directory to block the line until the flushing is completed. It then issues a PUTF and writes back the cache line.
 
 
[[File:MOESI_hammer_cache_FSM.jpg|center]]
 
 
===== Directory controller =====
 
 
The MOESI_hammer memory module, unlike a typical directory protocol, does not contain any directory state and instead broadcasts requests to all the processors in the system. In parallel, it fetches the data from DRAM and forwards the response to the requesters.
 
 
probe filter: TODO
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''NX''' || Not Owner, probe filter entry exists, block in O at Owner.
 
|-
 
| '''NO''' || Not Owner, probe filter entry exists, block in E/M at Owner.
 
|-
 
| '''S''' || Data clean, probe filter entry exists pointing to the current owner.
 
|-
 
| '''O''' || Data clean, probe filter entry exists.
 
|-
 
| '''E''' || Exclusive Owner, no probe filter entry.
 
|}
 
 
* '''Controller'''
 
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_hammer_dir_FSM.jpg|center]]
 
 
==== MOESI_CMP_token ====
 
 
===== Protocol Overview =====
 
 
* This protocol also models a 2-level cache hierarchy.
 
 
* It maintains coherence permission by explicitly exchanging and counting tokens.
 
 
* A fixed number of tokens is assigned to each cache block at the beginning; the number of tokens remains unchanged.
 
 
* To write a block, the processor must hold all the tokens for that block. To read it, at least one token is required.
 
 
* The protocol also supports persistent messages to avoid starvation.
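
The token-counting rule above can be illustrated with a short sketch. The constant <code>TOTAL_TOKENS</code>, the class <code>TokenHolder</code> and the function <code>transfer</code> are hypothetical names chosen for this example, and the value 4 is an arbitrary assumption; persistent messages are not modelled.

```python
TOTAL_TOKENS = 4  # assumed fixed number of tokens per cache block


class TokenHolder:
    """A cache (or memory) holding some of the tokens of one block."""

    def __init__(self, tokens=0):
        self.tokens = tokens

    def can_read(self):
        return self.tokens >= 1  # at least one token is needed to read

    def can_write(self):
        return self.tokens == TOTAL_TOKENS  # all tokens are needed to write


def transfer(src, dst, count):
    """Move tokens between holders; the total is conserved."""
    if count < 0 or count > src.tokens:
        raise ValueError("cannot transfer more tokens than held")
    src.tokens -= count
    dst.tokens += count


memory = TokenHolder(TOTAL_TOKENS)  # all tokens start at memory
l1_a = TokenHolder()
l1_b = TokenHolder()
transfer(memory, l1_a, 1)  # l1_a may now read
transfer(memory, l1_b, 3)  # l1_b may read, but not write
transfer(l1_a, l1_b, 1)    # l1_b now holds every token and may write
```

Since tokens are only moved and never created or destroyed, at most one holder can satisfy <code>can_write</code> at any time.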
 
 
===== Related Files =====
 
 
* '''src/mem/protocols'''
 
** '''MOESI_CMP_token-L1cache.sm''': L1 cache controller specification
 
** '''MOESI_CMP_token-L2cache.sm''': L2 cache controller specification
 
** '''MOESI_CMP_token-dir.sm''': directory controller specification
 
** '''MOESI_CMP_token-dma.sm''': dma controller specification
 
** '''MOESI_CMP_token-msg.sm''': message type specification
 
** '''MOESI_CMP_token.slicc''': container file
 
 
===== Controller Description =====
 
 
* '''L1 Cache'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''MM''' || The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state).
 
|-
 
| '''MM_W''' || The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state). Replacements and DMA accesses are not allowed in this state. The block automatically transitions to MM state after a timeout.
 
|-
 
| '''O''' || The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
 
|-
 
| '''M''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
 
|-
 
| '''M_W''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Only loads and stores are allowed. Silent upgrade happens to MM_W state on store. Replacements and DMA accesses are not allowed in this state. The block automatically transitions to M state after a timeout.
 
|-
 
| '''S''' ||  The cache block is held in shared state by 1 or more nodes. Stores are not allowed in this state.
 
|-
 
| '''I''' || The cache block is invalid.
 
|}
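
The timed states '''M_W''' and '''MM_W''' in the table above can be sketched as follows. This is a minimal illustration under stated assumptions: the class <code>TimedExclusiveBlock</code>, its methods and the integer notion of time are hypothetical and are not taken from MOESI_CMP_token-L1cache.sm.

```python
class TimedExclusiveBlock:
    """Toy model of a block that enters M_W or MM_W and later times out."""

    def __init__(self, state, now, timeout):
        if state not in ("M_W", "MM_W"):
            raise ValueError("block must start in M_W or MM_W")
        self.state = state
        self.deadline = now + timeout  # time at which the _W window closes

    def store(self):
        """Return True if the store can be performed locally."""
        if self.state == "M_W":
            self.state = "MM_W"  # silent upgrade on a store
        return self.state in ("MM_W", "MM")

    def tick(self, now):
        """After the timeout, M_W becomes M and MM_W becomes MM."""
        if now >= self.deadline and self.state.endswith("_W"):
            self.state = self.state[:-2]

    def can_replace(self):
        """Replacements are not allowed while in a _W state."""
        return not self.state.endswith("_W")


block = TimedExclusiveBlock("M_W", now=0, timeout=10)
before = block.can_replace()  # still inside the timed window
block.store()                 # M_W -> MM_W
block.tick(5)                 # too early: state unchanged
block.tick(10)                # timeout reached: MM_W -> MM
after = block.can_replace()
```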
 
 
*'''L2 cache'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''NP''' || The cache block is not present at this node (NP stands for "not present").
 
|-
 
| '''O''' || The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
 
|-
 
| '''M''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
 
|-
 
| '''S''' || The cache line holds the most recent, correct copy of the data. Other processors in the system may hold copies of the data in the shared state, as well. The cache line can be read, but not written in this state.
 
|-
 
| '''I''' || The cache line is invalid and does not hold a valid copy of the data.
 
|}
 
 
* '''Directory controller'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''O''' || Owner.
 
|-
 
| '''NO''' || Not Owner.
 
|-
 
| '''L''' || Locked.
 
 
|}
 
 
==== MOESI_CMP_directory ====
 
 
'''Editing in progress.'''
 
 
===== Protocol Overview =====
 
 
* TODO: cache hierarchy
 
 
* In contrast with the MESI protocol, the MOESI protocol introduces an additional '''Owned''' state.
 
* The MOESI protocol also includes many coalescing optimizations not available in the MESI protocol.
 
 
===== Related Files =====
 
 
* '''src/mem/protocols'''
 
** '''MOESI_CMP_directory-L1cache.sm''': L1 cache controller specification
 
** '''MOESI_CMP_directory-L2cache.sm''': L2 cache controller specification
 
** '''MOESI_CMP_directory-dir.sm''': directory controller specification
 
** '''MOESI_CMP_directory-dma.sm''': dma controller specification
 
** '''MOESI_CMP_directory-msg.sm''': message type specification
 
** '''MOESI_CMP_directory.slicc''': container file
 
 
===== L1 Cache Controller =====
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''MM''' || The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state).
 
|-
 
| '''MM_W''' || The cache block is held exclusively by this node and is potentially modified (similar to conventional "M" state). Replacements and DMA accesses are not allowed in this state. The block automatically transitions to MM state after a timeout.
 
|-
 
| '''O''' || The cache block is owned by this node. It has not been modified by this node. No other node holds this block in exclusive mode, but sharers potentially exist.
 
|-
 
| '''M''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Stores are not allowed in this state.
 
|-
 
| '''M_W''' || The cache block is held in exclusive mode, but not written to (similar to conventional "E" state). No other node holds a copy of this block. Only loads and stores are allowed. Silent upgrade happens to MM_W state on store. Replacements and DMA accesses are not allowed in this state. The block automatically transitions to M state after a timeout.
 
|-
 
| '''S''' ||  The cache block is held in shared state by 1 or more nodes. Stores are not allowed in this state.
 
|-
 
| '''I''' || The cache block is invalid.
 
|}
 
 
* '''FSM Abstraction'''
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_CMP_directory_L1cache_FSM.jpg|center]]
 
 
** '''Optimizations'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Description
 
|-
 
| '''SM''' || A GETX has been issued to get exclusive permissions for an impending store to the cache block, but an old copy of the block is still present. Stores and Replacements are not allowed in this state.
 
|-
 
| '''OM''' || A GETX has been issued to get exclusive permissions for an impending store to the cache block, the data has been received, but not all expected acknowledgments have arrived yet. Stores and Replacements are not allowed in this state.
 
|}
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_CMP_directory_L1cache_optim_FSM.jpg|center]]
 
 
===== L2 Cache Controller =====
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! Intra-chip Inclusion !! Inter-chip Exclusion !! States !! Description
 
|-
 
| '''<span style="color:#808080">Not in any L1 or L2 at this chip</span>''' || '''May be present at other chips''' || '''NP/I''' || The cache block at this chip is invalid.
 
|-
 
| rowspan="6"| '''<span style="color:#00CC99">Not in L2, but in 1 or more L1s at this chip</span>''' || rowspan="3"|'''May be present at other chips''' || '''ILS''' || The cache block is not present at L2 on this chip. It is shared locally by L1 nodes in this chip.
 
|-
 
| '''ILO''' || The cache block is not present at L2 on this chip. Some L1 node in this chip is an owner of this cache block.
 
|-
 
| '''ILOS''' || The cache block is not present at L2 on this chip. Some L1 node in this chip is an owner of this cache block. There are also L1 sharers of this cache block in this chip.
 
|-
 
| rowspan="3"|'''Not present at any other chip''' || '''ILX''' || The cache block is not present at L2 on this chip. It is held in exclusive mode by some L1 node in this chip.
 
|-
 
| '''ILOX''' || The cache block is not present at L2 on this chip. It is held exclusively by this chip and some L1 node in this chip is an owner of the block.
 
|-
 
| '''ILOSX''' || The cache block is not present at L2 on this chip. It is held exclusively by this chip. Some L1 node in this chip is an owner of the block. There are also L1 sharers of this cache block in this chip.
 
|-
 
| rowspan="3"| '''<span style="color:#99CCFF">In L2, but not in any L1 at this chip</span>''' || rowspan="2"|'''May be present at other chips''' || '''S''' || The cache block is not present at L1 on this chip. It is held in shared mode at L2 on this chip and is also potentially shared across chips.
 
|-
 
| '''O''' || The cache block is not present at L1 on this chip. It is held in owned mode at L2 on this chip. It is also potentially shared across chips.
 
|-
 
| '''Not present at any other chip''' || '''M''' || The cache block is not present at L1 on this chip. It is present at L2 on this chip and is potentially modified.
 
|- 
 
| rowspan="3"| '''<span style="color:#CC99FF">Both in L2, and 1 or more L1s at this chip</span>''' || rowspan="2"|'''May be present at other chips''' || '''SLS''' || The cache block is present at L2 in shared mode on this chip. There exist local L1 sharers of the block on this chip. It is also potentially shared across chips.
 
|-
 
| '''OLS''' || The cache block is present at L2 in owned mode on this chip. There exist local L1 sharers of the block on this chip. It is also potentially shared across chips.
 
|-
 
| '''Not present at any other chip''' || '''OLSX''' || The cache block is present at L2 in owned mode on this chip. There exist local L1 sharers of the block on this chip. It is held exclusively by this chip.
 
|}
 
 
 
* '''FSM Abstraction'''
 
 
The controller is described in 2 parts. The first picture shows transitions between all "intra-chip inclusion" categories and within categories 1, 3, 4. Transitions within category 2 (Not in L2, but in 1 or more L1s at this chip) are shown in the second picture.
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]]. Transitions involving other chips are annotated in <span style="color:#CC3300">brown</span>.'''
 
 
[[File:MOESI_CMP_directory_L2cache_FSM_part_1.jpg|center]]
 
 
The second picture below expands the central hexagonal portion of the above picture to show transitions within category 2 (Not in L2, but in 1 or more L1s at this chip).
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]]. Transitions involving other chips are annotated in <span style="color:#CC3300">brown</span>.'''
 
 
[[File:MOESI_CMP_directory_L2cache_FSM_part_2.jpg|center]]
 
 
===== Directory Controller =====
 
 
* '''Stable States and Invariants'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''M''' || The cache block is held in exclusive state by only 1 node (which is also the owner). There are no sharers of this block. The data is potentially different from that in memory.
 
|-
 
| '''O''' || The cache block is owned by exactly 1 node. There may be sharers of this block. The data is potentially different from that in memory.
 
|-
 
| '''S''' || The cache block is held in shared state by 1 or more nodes. No node has ownership of the block. The data is consistent with that in memory (Check).
 
|-
 
| '''I''' || The cache block is invalid.
 
|}
 
 
* '''FSM Abstraction'''
 
 
'''The notation used in the controller FSM diagrams is described [[#Coherence_controller_FSM_Diagrams|here]].'''
 
 
[[File:MOESI_CMP_directory_dir_FSM.jpg|center]]
 
 
===== Other features =====
 
 
* '''Timeouts''':
 
 
''Rathijit will do it''
 
 
==== MESI_CMP_directory ====
 
 
===== '''Protocol Overview''' =====
 
 
* This protocol models a '''two-level cache hierarchy'''. The L1 cache is private to a core, while the L2 cache is shared among the cores. The L1 cache is split into instruction and data caches.

* '''Inclusion''' is maintained between the L1 and L2 cache.

* At a high level, the protocol has four stable states: '''M''', '''E''', '''S''' and '''I'''. A block in the '''M''' state is writable (i.e. held with exclusive permission) and has been dirtied (i.e. it is the only valid copy on-chip). The '''E''' state represents a cache block held with exclusive permission (i.e. writable) that has not yet been written. The '''S''' state means the cache block is only readable and that multiple copies of it may exist in several private caches as well as in the shared cache. '''I''' means that the cache block is invalid.

* On-chip cache coherence is maintained through a '''Directory Coherence''' scheme, in which the directory information is co-located with the corresponding cache blocks in the shared L2 cache.

* The protocol has four types of controllers: '''L1 cache controller, L2 cache controller, Directory controller''' and '''DMA controller'''. The L1 cache controller manages the L1 instruction and L1 data caches. The number of L1 cache controller instances equals the number of cores in the simulated system. The L2 cache controller manages the shared L2 cache and maintains coherence of on-chip data through the directory coherence scheme. The Directory controller acts as the interface to the memory controller/off-chip main memory and is also responsible for coherence across multiple chips and for external coherence requests from the DMA controller. The DMA controller is responsible for satisfying coherent DMA requests.

* One of the primary optimizations in this protocol is that when an L1 cache requests a data block, even if only for read permission, the L2 cache controller returns the block with exclusive permission if it finds that no other core has the block. This anticipates that a block that has been read will soon be written by the same core, which saves an extra request. This is exactly why the '''E''' state exists (i.e. the cache block is writable but not yet written).

* The protocol supports ''silent eviction'' of ''clean'' cache blocks from the private L1 caches. This means that an L1 cache can drop a cache block that has not been written to and is held with read-only permission without informing the L2 cache. This optimization helps reduce write-back traffic to the L2 cache controller.
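
The exclusive-on-read optimization described above amounts to a small decision at the L2 cache controller. The sketch below is only an illustration: the function <code>grant_for_read</code> and the returned strings are hypothetical, and the real controller in MESI_CMP_directory-L2cache.sm also passes through transient states that are not modelled here.

```python
def grant_for_read(sharers, owner):
    """Decide how the L2 answers a GETS, given its directory information.

    `sharers` is the set of L1 caches that may hold the block read-only and
    `owner` is the L1 cache holding it with exclusive permission, or None.
    """
    if owner is not None:
        # An L1 holds the block exclusively: the L2 cannot answer by itself.
        return "forward_to_owner"
    if not sharers:
        # No other core has the block: grant exclusive permission on a read.
        return "E"
    # With silent eviction, `sharers` may list caches that already dropped
    # the block, so this answer is conservative.
    return "S"


alone = grant_for_read(set(), None)
shared = grant_for_read({"l1_0"}, None)
owned = grant_for_read(set(), "l1_1")
```

Because clean blocks can be evicted silently, the sharer information is an over-approximation, which is why the table entries for the L2 speak of caches that "possibly" hold the block.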
 
 
===== '''Related Files''' =====
 
 
* '''src/mem/protocols'''
 
** '''MESI_CMP_directory-L1cache.sm''': L1 cache controller specification
 
** '''MESI_CMP_directory-L2cache.sm''': L2 cache controller specification
 
** '''MESI_CMP_directory-dir.sm''': directory controller specification
 
** '''MESI_CMP_directory-dma.sm''': dma controller specification
 
** '''MESI_CMP_directory-msg.sm''': coherence message type specification. This defines the fields of the different message types used by the protocol
 
** '''MESI_CMP_directory.slicc''': container file
 
 
===== '''Controller Description''' =====
 
 
* '''L1 cache controller'''
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants and Semantic/Purpose of the state
 
|-
 
| '''M''' || The cache block is held in exclusive state by '''only one L1 cache'''. There are no sharers of this block. This is potentially the only valid copy of the data in the system. The copy of the cache block is '''writable''' as well as '''readable'''.

|-

| '''E''' || The cache block is held with exclusive permission by '''exactly one L1 cache'''. The difference from the '''M''' state is that the cache block is writable (and readable) but has not yet been written.
 
|-
 
| '''S''' || The cache block is held in shared state by 1 or more L1 caches and/or by the L2 cache. The block is only '''readable'''. No cache can have the cache block with exclusive permission.
 
|-
 
| '''I / NP''' || The cache block is invalid.
 
|-
 
| '''IS''' || This is a transient state. A '''GETS (Read)''' request has been issued for the cache block and the response is awaited. The cache block is neither readable nor writable.

|-

| '''IM''' || This is a transient state. A '''GETX (Write)''' request has been issued for the cache block and the response is awaited. The cache block is neither readable nor writable.

|-

| '''SM''' || This is a transient state. The cache block was originally in the S state, and an '''UPGRADE (Write)''' request has been issued to get exclusive permission for the block; the response is awaited. The cache block is '''readable'''.

|-

| '''IS_I''' || This is a transient state. While in the IS state, the cache controller received an Invalidation from the L2 cache's directory. This happens because of a race: another core wrote to the same cache block while this core was trying to obtain it for reading. The cache block is neither readable nor writable.

|-

| '''M_I''' || This is a transient state. The cache is trying to replace a cache block in the '''M''' state, and the write-back (PUTX) to the L2 cache's directory has been issued, but the write-back acknowledgement has not yet arrived.

|-

| '''SINK_WB_ACK''' || This is a transient state. It is reached when the L1 cache, while waiting for a write-back acknowledgement from the L2 cache's directory, receives an intervention (a forwarded request from another core). This indicates a race between the issued write-back and a request from another cache, and that the write-back has lost the race (i.e. another core's request reached the L2 before the write-back reached the L2 cache's directory). This state is essential to avoid the complicated race conditions that can arise if write-backs are silently dropped at the directory.
 
|-
 
 
|}
 
 
* '''L2 cache controller'''
 
 
Recall that the on-chip directory is co-located with the corresponding cache blocks in the L2 cache. The following L2 cache block states therefore encode both the status and permissions of the cache block in the L2 cache and the coherence status of the cache block in the private L1 caches, where it may be present in one or more of them. Beyond the coherence state, two more fields per cache block help the controller take the proper coherence actions. The '''Sharers''' field can be thought of as a bit-vector indicating which of the private L1 caches potentially hold the cache block. The '''Owner''' field identifies the private L1 cache that holds the cache block with exclusive permission, if there is one.
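
The '''Sharers''' bit-vector and the '''Owner''' field can be pictured with the following sketch. It is an illustration only: the class <code>DirectoryInfo</code> and its methods are hypothetical names, and the actual data structures used by the L2 cache controller may differ.

```python
class DirectoryInfo:
    """Per-block directory fields kept alongside an L2 cache block."""

    def __init__(self, num_l1_caches):
        self.num_l1_caches = num_l1_caches
        self.sharers = 0   # bit i set => L1 cache i may hold the block
        self.owner = None  # id of the L1 holding the block exclusively

    def _check(self, l1_id):
        if not 0 <= l1_id < self.num_l1_caches:
            raise ValueError(f"no such L1 cache: {l1_id}")

    def add_sharer(self, l1_id):
        self._check(l1_id)
        self.sharers |= 1 << l1_id

    def sharer_ids(self):
        """List the L1 caches whose bit is set in the Sharers vector."""
        return [i for i in range(self.num_l1_caches) if (self.sharers >> i) & 1]

    def invalidations_for(self, requestor):
        """L1 caches to invalidate before `requestor` can get exclusive access."""
        self._check(requestor)
        return [i for i in self.sharer_ids() if i != requestor]


info = DirectoryInfo(num_l1_caches=4)
for l1 in (0, 2, 3):
    info.add_sharer(l1)
targets = info.invalidations_for(2)  # every potential sharer except L1 2
acks_needed = len(targets)           # acknowledgements the requestor awaits
```

In this sketch an exclusive request from L1 cache 2 leads to invalidations for L1 caches 0 and 3, and the requestor waits for two acknowledgements.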
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants and Semantic/Purpose of the state
 
|-
 
| '''NP''' || The cache block is not present in the on-chip cache hierarchy.

|-

| '''SS''' || The cache block is present in potentially multiple private caches in read-only mode (i.e. in the "S" state in the private caches). The "Sharers" vector associated with the block gives the identity of the private caches that possibly hold the block. The cache block in the L2 cache is valid and '''readable'''.

|-

| '''M''' || The cache block is present ONLY in the L2 cache and has exclusive permission. L1 Cache's read/write requests (GETS/GETX) can be satisfied directly from the L2 cache.

|-

| '''MT''' || The cache block is in ONE of the private L1 caches with exclusive permission. The data in the L2 cache is potentially stale. The identity of the L1 cache which has the block can be found in the "Owner" field associated with the cache block. Any request for read/write (GETS/GETX) from other cores/private L1 caches needs to be forwarded to the owner of the cache block. The L2 cannot service such requests itself.
 
|-
 
| '''M_I''' ||  Its a transient state. This state indicates that the cache is trying to replace the cache block from its cache and the write-back (PUTX/PUTS) to the Directory controller (which act as interface to Main memory) has been issued but awaiting write-back acknowledgement. The data is neither readable nor writable.
 
|-
 
| '''MT_I''' ||  Its a transient state. This state indicates that the cache is trying to replace a cache block in '''MT''' state from its cache. Invalidation to the current owner (private L1 cache) of the cache block has been issued and awaiting write-back from the Owner L1 cache. Note that the this Invalidation (called back-invalidation) is instrumental in making sure that the inclusion is maintained between L1 and L2 caches. The data is neither readable nor writable.
 
|-
 
| '''MCT_I''' || Its a transient state.This state is same as '''MT_I''', except that it is known that the data in the L2 cache is in ''clean'' state. The data is neither readable nor writable.
 
|-
 
| '''I_I''' || Its a transient state. The L2 cache is trying to replace a cache block in the '''SS''' state and the cache block in the L2 is in ''clean'' state. Invalidations has been sent to all potential sharers (L1 caches) of the cache block. The L2 cache's directory is waiting for all the required Acknowledgements to arrive from the L1 caches. Note that the this Invalidation (called back-invalidation) is instrumental in making sure that the inclusion is maintained between L1 and L2 caches. The data is neither readable nor writable.
 
|-
 
| '''S_I''' || This is a transient state. It is the same as '''I_I''', except that the L2's copy of the cache block is ''dirty''. Unlike in '''I_I''', the data therefore needs to be written back to Main memory. The cache block is neither readable nor writable.
 
|-
 
| '''ISS''' || This is a transient state. The L2 has received a '''GETS (read)''' request from one of the private L1 caches for a cache block that is not present in the on-chip caches. A read request has been sent to Main Memory (the Directory controller) and the L2 is waiting for the response. This state is reached only when the request is for a data cache block (not an instruction cache block). The purpose of this state is that, if only one L1 cache turns out to have requested the cache block, the block is returned to the requester with exclusive permission (although only read permission was requested). The cache block is neither readable nor writable.
 
|-
 
| '''IS''' || This is a transient state. It is similar to '''ISS''', but is reached instead of '''ISS''' if the requested cache block is an instruction cache block, or if more than one core requests the same cache block while the response from memory is pending. Once the requested cache block arrives from Main Memory, it is sent to the requester(s) with read-only permission. The cache block is neither readable nor writable in this state.
 
|-
 
| '''IM''' || This is a transient state. It is reached when the L2 cache receives an L1 GETX (write) request for a cache block that is not present in the on-chip cache hierarchy. The request for the cache block in exclusive mode has been issued to main memory, but the response has yet to arrive. The cache block is neither readable nor writable in this state.
 
|-
 
| '''SS_MB''' || This is a transient state. In general, any state whose name ends with "B" (like this one) is also a ''blocking'' coherence state. This means the directory is waiting for some response from a private L1 cache, and until it receives that response no other request for the block is serviced (i.e. requests are effectively serialized). This particular state is reached when an L1 cache requests a cache block with exclusive permission (i.e. GETX or UPGRADE) while the block is in the '''SS''' state, which means that readable copies of the block may exist in the private L1 caches. Before exclusive permission can be granted to the requester, all readable copies in the L1 caches need to be invalidated. This state indicates that the required invalidations have been sent to the potential sharers (L1 caches) and that the requester has been told how many Invalidation Acknowledgements it needs before it can take exclusive permission for the cache block. Once the requester L1 cache has collected the required number of Invalidation Acknowledgements, it informs the directory with an ''UNBLOCK'' message, which allows the directory to leave this blocking state and resume servicing other requests for the cache block. The cache block is neither readable nor writable in this state.
 
|-
 
| '''MT_MB''' || This is a transient state and also a ''blocking'' state. It is reached when the L2 cache's directory has sent out a cache block with exclusive permission to a requester L1 cache but has yet to receive the ''UNBLOCK'' from that L1 cache acknowledging receipt of exclusive permission. The cache block is neither readable nor writable in this state.
 
|-
 
| '''MT_IIB''' || This is a transient state and also a ''blocking'' state. It is reached when a read request (GETS) is received for a cache block that is currently held with exclusive permission in another private L1 cache (i.e. the directory state is '''MT'''). On such a request, the L2 cache's directory forwards the request to the current owner L1 cache and transitions to this state. Two events need to happen before this cache block can be unblocked (and further requests for it can be serviced). The current owner L1 cache needs to send a write-back to the L2 cache to update the L2's copy with the latest value. The requester L1 cache also needs to send an ''UNBLOCK'' to the L2 cache, indicating that it has received the requested cache block with the desired coherence permissions. The cache block is neither readable nor writable in this state in the L2 cache.
 
|-
 
| '''MT_IB''' || This is a transient state and also a ''blocking'' state. It is reached when, in the '''MT_IIB''' state, the L2 cache controller has received the ''UNBLOCK'' from the requester L1 cache but has yet to receive the write-back from the previous owner L1 cache of the block. The cache block is neither readable nor writable in this state in the L2 cache.
 
|-
 
| '''MT_SB''' || This is a transient state and also a ''blocking'' state. It is reached when, in the '''MT_IIB''' state, the L2 cache controller has received the write-back from the previous owner L1 cache of the block but has yet to receive the ''UNBLOCK'' from the current requester. The cache block is neither readable nor writable in this state in the L2 cache.
 
|}
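The blocking states that handle a GETS to a block in '''MT''' can be read as a small state machine. The following Python sketch is illustrative only and is not taken from the SLICC sources: the event names (<code>L1_GETS</code>, <code>UNBLOCK</code>, <code>WRITEBACK</code>) are simplified, the helper names are hypothetical, and the final '''SS''' state after both responses arrive is an assumption, since the table above does not name it.

```python
from enum import Enum, auto


class L2State(Enum):
    MT = auto()
    MT_IIB = auto()   # waiting for both UNBLOCK and write-back
    MT_IB = auto()    # got UNBLOCK, waiting for write-back
    MT_SB = auto()    # got write-back, waiting for UNBLOCK
    SS = auto()       # assumed resulting stable state (not stated in the table)


class Event(Enum):
    L1_GETS = auto()
    UNBLOCK = auto()
    WRITEBACK = auto()


# (current state, event) -> next state
TRANSITIONS = {
    (L2State.MT, Event.L1_GETS): L2State.MT_IIB,
    (L2State.MT_IIB, Event.UNBLOCK): L2State.MT_IB,
    (L2State.MT_IIB, Event.WRITEBACK): L2State.MT_SB,
    (L2State.MT_IB, Event.WRITEBACK): L2State.SS,
    (L2State.MT_SB, Event.UNBLOCK): L2State.SS,
}


def step(state, event):
    """Return the next state, or None if the event cannot be serviced yet."""
    return TRANSITIONS.get((state, event))


def run(events, state=L2State.MT):
    """Apply events in order; fail on an event the current state does not accept."""
    for event in events:
        next_state = step(state, event)
        if next_state is None:
            raise ValueError(f"{event.name} is not accepted in {state.name}")
        state = next_state
    return state
```

Either arrival order of the two responses leads to the same final state, and a new GETS arriving while the block is in a blocking state has no transition in this sketch, which models the request being held back.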
 
 
==== Network_test ====
 
This is a dummy cache coherence protocol that is used to operate the ruby network tester. The details about running the network tester can be found [[networktest|here]].
 
 
===== Related Files =====
 
 
* '''src/mem/protocols'''
 
** '''Network_test-cache.sm''': cache controller specification
 
** '''Network_test-dir.sm''': directory controller specification
 
** '''Network_test-msg.sm''': message type specification
 
** '''Network_test.slicc''': container file
 
 
===== Cache Hierarchy =====
 
 
This protocol assumes a 1-level cache hierarchy. The role of the cache is simply to send messages from the CPU to the appropriate directory (based on the address), in the appropriate virtual network (based on the message type). It does not track any state. In fact, unlike in other protocols, no CacheMemory is created. The directory receives the messages from the caches, but does not send any back. The goal of this protocol is to enable simulation/testing of just the interconnection network.
 
 
===== Stable States and Invariants =====
 
 
{| border="1" cellpadding="10" class="wikitable"
 
! States !! Invariants
 
|-
 
| '''I''' || Default state of all cache blocks
 
|}
 
 
===== Cache controller =====
 
 
* Requests, Responses, Triggers:
 
** Load, Instruction fetch, Store from the core.
 
The network tester (in src/cpu/testers/networktest/networktest.cc) generates packets of the type '''ReadReq''', '''INST_FETCH''', and '''WriteReq''', which are converted into '''RubyRequestType:LD''', '''RubyRequestType:IFETCH''', and '''RubyRequestType:ST''', respectively, by the RubyPort (in src/mem/ruby/system/RubyPort.hh/cc). These messages reach the cache controller via the Sequencer. The destination for these messages is determined by the traffic type, and embedded in the address. More details can be found [[networktest|here]].
 
 
* Main Operation:
 
** The goal of the cache is only to act as a source node in the underlying interconnection network. It does not track any states.
 
** On a '''LD''' from the core:
 
*** it returns a hit, and
 
*** maps the address to a directory, and issues a message for it of type '''MSG''', and size '''Control''' (8 bytes) in the request vnet (0).
 
*** Note: vnet 0 could also be made to broadcast, instead of sending a directed message to a particular directory, by uncommenting the appropriate line in the ''a_issueRequest'' action in Network_test-cache.sm
 
** On an '''IFETCH''' from the core:
 
*** it returns a hit, and
 
*** maps the address to a directory, and issues a message for it of type '''MSG''', and size '''Control''' (8 bytes) in the forward vnet (1).
 
** On a '''ST''' from the core:
 
*** it returns a hit, and
 
*** maps the address to a directory, and issues a message for it of type '''MSG''', and size '''Data''' (72 bytes) in the response vnet (2).
 
** Note: request, forward and response are just used to differentiate the vnets, but do not have any physical significance in this protocol.
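The mapping described above can be summarised as a lookup table. The Python sketch below is a hedged illustration, not the SLICC code in Network_test-cache.sm: the function and field names are hypothetical, and the address-to-directory mapping (block number modulo the number of directories, with 64-byte blocks) is an assumption made only so the example is self-contained.

```python
# Request type -> (vnet, size class, size in bytes), as listed above.
ISSUE_TABLE = {
    "LD": (0, "Control", 8),      # request vnet
    "IFETCH": (1, "Control", 8),  # forward vnet
    "ST": (2, "Data", 72),        # response vnet
}


def issue_request(request_type, address, num_directories, block_offset_bits=6):
    """Model the cache controller: report a hit and build one MSG for a directory."""
    vnet, size_class, size_bytes = ISSUE_TABLE[request_type]
    # ASSUMPTION: directory chosen by block number modulo directory count.
    directory = (address >> block_offset_bits) % num_directories
    return {
        "hit": True,            # every request is treated as a hit
        "msg_type": "MSG",
        "directory": directory,
        "vnet": vnet,
        "size_class": size_class,
        "size_bytes": size_bytes,
    }
```

For example, <code>issue_request("ST", 0x0, 4)</code> yields a 72-byte '''MSG''' on vnet 2.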
 
 
===== Directory controller =====
 
 
* Requests, Responses, Triggers:
 
** '''MSG''' from the cores
 
 
* Main Operation:
 
** The goal of the directory is only to act as a destination node in the underlying interconnection network. It does not track any states.
 
** The directory simply pops its incoming queue upon receiving the message.
 
 
===== Other features =====
 
 
* This protocol assumes only 3 vnets.
 
* It should only be used when running the ruby network test.
 

Latest revision as of 00:26, 9 July 2013

Common Notations and Data Structures

Coherence Messages

These are described in the <protocol-name>-msg.sm file for each protocol.

Message Description
ACK/NACK positive/negative acknowledgement for requests that wait for the direction of resolution before deciding on the next action. Examples are writeback requests, exclusive requests.
GETS request for shared permissions to satisfy a CPU's load or IFetch.
GETX request for exclusive access.
INV invalidation request. This can be triggered by the coherence protocol itself, or by the next cache level/directory to enforce inclusion or to trigger a writeback for a DMA access so that the latest copy of data is obtained.
PUTX request for writeback of cache block. Some protocols (e.g. MOESI_CMP_directory) may use this only for writeback requests of exclusive data.
PUTS request for writeback of cache block in shared state.
PUTO request for writeback of cache block in owned state.
PUTO_Sharers request for writeback of cache block in owned state but other sharers of the block exist.
UNBLOCK message to unblock next cache level/directory for blocking protocols.

AccessPermissions

These are associated with each cache block and determine what operations are permitted on that block. It is closely correlated with coherence protocol states.

Permissions Description
Invalid The cache block is invalid. The block must first be obtained (from elsewhere in the memory hierarchy) before loads/stores can be performed. No action on invalidates (except maybe sending an ACK). No action on replacements. The associated coherence protocol states are I or NP and are stable states in every protocol.
Busy TODO
Read_Only Only operations permitted are loads, writebacks, invalidates. Stores cannot be performed before transitioning to some other state.
Read_Write Loads, stores, writebacks, invalidations are allowed. Usually indicates that the block is dirty.

Data Structures

  • Message Buffers:TODO
  • TBE Table: TODO
  • Timer Table: This maintains a map of address-based timers. For each target address, a timeout value can be associated and added to the Timer table. This data structure is used, for example, by the L1 cache controller implementation of the MOESI_CMP_directory protocol to trigger separate timeouts for cache blocks. Internally, the Timer Table uses the event queue to schedule the timeouts. The TimerTable supports a polling-based interface, isReady() to check if a timeout has occurred. Timeouts on addresses can be set using the set() method and removed using the unset() method.
Related Files:
src/mem/ruby/system/TimerTable.hh: Declares the TimerTable class
src/mem/ruby/system/TimerTable.cc: Implementation of the methods of the TimerTable class, that deals with setting addresses & timeouts, scheduling events using the event queue.
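
The polling interface described above can be sketched in a few lines of Python. This is a simplified illustration and not the gem5 TimerTable implementation: it takes the current time as an explicit argument instead of scheduling on the event queue, the class name and the ready_address() helper are hypothetical, and is_ready() only mirrors the isReady() method named above.

```python
class TimerTableSketch:
    """Address-based timers with a polling interface (illustrative only)."""

    def __init__(self):
        self._deadlines = {}  # address -> absolute expiry time

    def set(self, address, timeout, now):
        """Arm (or re-arm) a timer for an address, expiring at now + timeout."""
        self._deadlines[address] = now + timeout

    def unset(self, address):
        """Remove the timer for an address, if one is set."""
        self._deadlines.pop(address, None)

    def is_ready(self, now):
        """Return True if at least one timer has expired at time 'now'."""
        return any(deadline <= now for deadline in self._deadlines.values())

    def ready_address(self, now):
        """Return the address with the earliest expired timer, or None."""
        expired = [(d, a) for a, d in self._deadlines.items() if d <= now]
        return min(expired)[1] if expired else None
```

A controller would poll is_ready() each time it wakes up and call unset() once it has handled the timeout for the returned address.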

Coherence controller FSM Diagrams

  • The Finite State Machines show only the stable states
  • Transitions are annotated using the notation "Event list" or "Event list : Action list" or "Event list : Action list : Event list". For example, Store : GETX indicates that on a Store event, a GETX message was sent whereas GETX : Mem Read indicates that on receiving a GETX message, a memory read request was sent. Only the main triggers and actions are listed.
  • Optional actions (e.g. writebacks depending on whether or not the block is dirty) are enclosed within [ ]
  • In the diagrams, the transition labels are associated with the arc that cuts across the transition label or the closest arc.