Difference between revisions of "Google Summer of Code"

Latest revision as of 14:01, 12 March 2008

Introduction

The Google Summer of Code (SoC) is a great opportunity for students to contribute to open source software projects. The open source projects get additional contributions and active developers while the students get some money and gain experience in large distributed software development.

About M5

M5 is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. At its core, M5 provides a generic, object-oriented, discrete-event simulation framework, including a foundation for defining, parameterizing, configuring, and marshaling simulation objects. This foundation, along with various pre-made object models, allows M5 to simulate both single systems and multiple networked systems deterministically. Simulations can run a single binary (syscall emulation) or boot an entire operating system such as Linux or Solaris (full-system) on most major ISAs (SPARC, MIPS, Alpha, ARM, x86/64). The simulator is written in a combination of C++ and Python and is pervasively object oriented. Python is used for configuration and non-performance-critical parts, while C++ is used for the core of the simulation framework. Using M5, computer architecture researchers around the world have modeled their systems and published their work in magazines, conferences, and academic journals. So far, the Publications list has more than 50 entries, and it grows every year.
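To make the idea of a discrete-event framework concrete, here is a minimal sketch in Python. It is not M5's actual API: the EventQueue and Ticker classes and their methods are hypothetical names chosen for illustration. It shows objects scheduling their own events, with ties at the same tick broken by insertion order so that every run is deterministic.

```python
import heapq
import itertools


class EventQueue:
    """Toy discrete-event queue: events fire in (tick, insertion order)."""

    def __init__(self):
        self.now = 0
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps runs deterministic

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, next(self._order), callback))

    def run(self):
        # Pop events in time order; each callback may schedule further events.
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()


class Ticker:
    """Hypothetical simulation object that reschedules itself every `period` ticks."""

    def __init__(self, queue, name, period, count, log):
        self.queue, self.name, self.period = queue, name, period
        self.remaining, self.log = count, log
        queue.schedule(period, self.tick)

    def tick(self):
        self.log.append((self.queue.now, self.name))
        self.remaining -= 1
        if self.remaining > 0:
            self.queue.schedule(self.period, self.tick)


log = []
queue = EventQueue()
Ticker(queue, "cpu", period=2, count=3, log=log)
Ticker(queue, "nic", period=3, count=2, log=log)
queue.run()
```

After the run, log holds the (tick, name) pairs in the order the events fired. Because the ordering depends only on ticks and scheduling order, repeating the run gives the same log.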

Project Ideas

Below is a list of possible project ideas and starting points; however, we're open to other ideas students may have. All the ideas listed here require some familiarity with Python and a good grasp of advanced C++ concepts.

Direct Execution model

Direct execution is a well-known technique for speeding up simulation, employed by a number of simulators. A direct-execution simulator uses the native machine to execute guest instructions without interpretation. Methods of direct execution include static code instrumentation, dynamic code instrumentation, full OS virtualization, and application virtualization. There are several mechanisms for implementing direct execution, each with different pros and cons.

The Linux Kernel-based Virtual Machine (KVM)
- PRO: Could be brought up quickly and can leverage an existing virtualization system
- CON: Can only be used for fast-forwarding since instructions cannot be trapped
- http://kvm.qumranet.com/kvmwiki
Pin-based application virtualization
- PRO: Capable of dynamic instrumentation, so can be used for real simulation
- CON: May be difficult or impossible to use for full-system simulation
- http://rogue.colorado.edu/Wikipin/index.php/Main_Page
Custom implementation
- PRO: Can do exactly what we want
- CON: Significant effort

Parallelization

As the industry moves toward multicore systems, software will need to become parallel if it is to benefit from successive generations of chips. Because there are a limited number of ways objects can interact with each other in M5, the scope of this problem is not as vast as it might seem at first. Objects schedule their own events, and thus reasonably long chains of independent events are nearly "ready-made" to be parallelized. Previous simulators, such as the Wisconsin Wind Tunnel (of which one of our mentors was a co-author), have been parallel. This task can largely be divided into three parts: (1) identify blocks of code that share global caching structures and make them per-thread (through either __thread or pthread_setspecific()/pthread_getspecific()); (2) assign each SimObject to a thread (and make sure that any events that SimObject generates are bound to that thread); (3) define objects that can be bound to threads. This is the only place where intra-thread scheduling is required.
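The first two parts can be sketched in Python, with the caveat that the real work would be in M5's C++ core. Here threading.local stands in for a __thread variable, and the object names, decode_cache(), and the round-robin assignment policy are all assumptions made for illustration, not M5 code.

```python
import threading
from collections import defaultdict

_per_thread = threading.local()  # Python analogue of a C/C++ __thread variable


def decode_cache():
    """Part 1: return the calling thread's private cache, created on first use."""
    if not hasattr(_per_thread, "cache"):
        _per_thread.cache = {}
    return _per_thread.cache


def assign_round_robin(object_names, num_threads):
    """Part 2: bind each simulation object to exactly one worker thread."""
    return {name: i % num_threads for i, name in enumerate(object_names)}


def run_partition(thread_id, owned, results):
    # Each worker touches only the objects bound to it and its own cache.
    cache = decode_cache()
    for name in owned:
        cache[name] = len(name)  # stand-in for real per-object work
    results[thread_id] = dict(cache)


objects = ["cpu0", "cpu1", "l2cache", "membus"]
binding = assign_round_robin(objects, num_threads=2)

owned_by = defaultdict(list)
for name, tid in binding.items():
    owned_by[tid].append(name)

results = {}
workers = [
    threading.Thread(target=run_partition, args=(tid, owned_by[tid], results))
    for tid in sorted(owned_by)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Each worker ends up with a cache containing only its own objects, and the main thread's cache stays empty, which is the isolation property the per-thread conversion is meant to provide. A real partitioning would likely group objects by how often they communicate rather than round-robin.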

Memory Network Models

Interconnection networks are becoming important for multicore research. Having various models for on-chip networks would be very useful.

Mesh models
Crossbar models
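As a rough illustration of why the choice of network model matters, the sketch below compares hop counts on a 2D mesh against a crossbar. It assumes dimension-ordered (XY) routing, row-major node numbering, and a single switch traversal per crossbar transfer; the function names are hypothetical and no contention or link latency is modeled.

```python
def mesh_hops(src, dst, width):
    """Hops between node ids on a `width`-wide 2D mesh using XY routing."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def crossbar_hops(src, dst):
    """A crossbar connects any two distinct ports through one switch traversal."""
    return 0 if src == dst else 1


def average_hops(num_nodes, hop_fn):
    """Average hop count over all ordered pairs of distinct nodes."""
    pairs = [(s, d) for s in range(num_nodes) for d in range(num_nodes) if s != d]
    return sum(hop_fn(s, d) for s, d in pairs) / len(pairs)


mesh_avg = average_hops(16, lambda s, d: mesh_hops(s, d, width=4))
xbar_avg = average_hops(16, crossbar_hops)
```

For 16 nodes, the 4x4 mesh averages 8/3 hops per transfer while the crossbar always takes one. A useful model would add what this sketch leaves out: per-hop latency, buffering, and contention.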

Directory Coherence Protocol

As the number of cores increases, coherence traffic will consume an increasing proportion of system resources. Directory-based cache coherence protocols drastically reduce the resources required to maintain coherence across a large number of cores. This project can go hand-in-hand with an effort to implement a new network.
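The resource saving comes from sending invalidations only to cores known to hold a block, instead of to every core. The toy sketch below counts invalidation messages under that assumption. The Directory class is hypothetical, and it ignores states, acknowledgements, and data transfers that a real protocol needs.

```python
class Directory:
    """Toy directory: tracks, per block address, which cores hold a copy."""

    def __init__(self):
        self.sharers = {}  # address -> set of core ids
        self.messages = 0  # invalidations sent so far

    def read(self, core, addr):
        self.sharers.setdefault(addr, set()).add(core)

    def write(self, core, addr):
        holders = self.sharers.setdefault(addr, set())
        # Invalidate only the cores the directory knows hold the block.
        self.messages += len(holders - {core})
        self.sharers[addr] = {core}


def broadcast_invalidations(num_cores):
    """A broadcast-style write must notify every other core."""
    return num_cores - 1


directory = Directory()
directory.read(core=1, addr=0x40)
directory.read(core=2, addr=0x40)
directory.write(core=0, addr=0x40)  # invalidates cores 1 and 2 only
directory_msgs = directory.messages
broadcast_msgs = broadcast_invalidations(num_cores=64)
```

With two sharers in a 64-core system, the directory sends 2 invalidations where a broadcast would send 63.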

Detailed In-Order core model

There is currently no detailed in-order CPU model in M5; there is code to start with, but nothing that is fully fleshed out. Cores will likely become lower power and have reduced complexity in the future, making in-order cores attractive for such systems.

Graphics Processing Unit (GPU) model

Graphics Processing Units interface with general purpose CPUs to accelerate common graphics operations such as texture mapping and shading. Recently, multi-core GPUs such as Intel's Larrabee project have been proposed in an attempt to leverage the power of many-core systems for graphics processors. Creating a flexible GPU model would allow researchers to more realistically build systems that include one or more GPUs in their framework.

Interface to an HDL

People often write Verilog (or some other HDL) code for future chip designs and want the capability to simulate those designs before sending them to production. By adding a PLI interface to M5, these Verilog descriptions could be simulated as components of M5 alongside well-tested models written in C++. M5 can be used to estimate the performance of the part without having to physically build it (and the system to contain it), and M5 can flexibly and robustly verify its functionality without having to build another complex testing infrastructure.

Interface to a Power Analysis Tool

Power consumption has become a critical concern for computer architects in recent years. Interfacing M5 to a power analysis tool such as Wattch would allow researchers to simulate with the flexibility of M5 and simultaneously track the power consumed by important system components.
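One common way such an interface works is activity-based accounting: the simulator counts events per component and the power tool supplies an energy cost per event. The sketch below assumes that style. It does not use Wattch's actual interface, and the event names and energy numbers are made up for illustration.

```python
def total_energy_nj(event_counts, energy_per_event_nj):
    """Sum activity counts weighted by a per-event energy cost (nanojoules)."""
    return sum(count * energy_per_event_nj[name] for name, count in event_counts.items())


def average_power_w(energy_nj, cycles, clock_hz):
    """Average power in watts: energy divided by simulated time."""
    seconds = cycles / clock_hz
    return (energy_nj * 1e-9) / seconds


counts = {"icache_read": 1000, "alu_op": 2000}    # hypothetical simulator statistics
costs_nj = {"icache_read": 0.5, "alu_op": 0.25}   # made-up per-event energies
energy = total_energy_nj(counts, costs_nj)
power = average_power_w(energy, cycles=1000, clock_hz=1e9)
```

Here 1000 nJ spent over 1000 cycles at 1 GHz works out to about 1 W. The interesting part of the project is producing the per-event costs and mapping them onto M5's statistics, which this sketch takes as given.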

Sampling/fast-forwarding techniques

Since simulators are very slow, fast-forwarding to an interesting point in an execution and sampling portions of the execution stream with detailed execution can help improve simulation performance.

The techniques from the SMARTS work would be a good guide.
Coupling direct execution with this would have the most benefit.
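A simple form of this idea is periodic sampling: fast-forward through most of each period and simulate only a short window in detail. The sketch below builds such a schedule and estimates CPI from the detailed windows. It is only in the spirit of systematic sampling, not a reproduction of the SMARTS methodology (it omits warm-up and confidence estimation), and the function names and parameters are assumptions.

```python
def sampling_schedule(total_insts, period, detail_len):
    """Yield (mode, start, length) chunks covering `total_insts` instructions."""
    if not 0 < detail_len <= period:
        raise ValueError("detail_len must be in (0, period]")
    pos = 0
    while pos < total_insts:
        # Fast-forward through the cheap part of the period first.
        ff = min(period - detail_len, total_insts - pos)
        if ff:
            yield ("fast_forward", pos, ff)
            pos += ff
        # Then run a short detailed window, if any instructions remain.
        det = min(detail_len, total_insts - pos)
        if det:
            yield ("detailed", pos, det)
            pos += det


def estimate_cpi(sample_cycles, sample_insts):
    """Estimate whole-run CPI from the detailed samples only."""
    return sum(sample_cycles) / sum(sample_insts)


schedule = list(sampling_schedule(total_insts=2500, period=1000, detail_len=100))
detailed = sum(n for mode, _, n in schedule if mode == "detailed")
cpi = estimate_cpi(sample_cycles=[150, 250], sample_insts=[100, 100])
```

In this example only 200 of 2500 instructions are simulated in detail. The fast-forward chunks are where a direct-execution mode would pay off most.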

Flash Memory Model

Flash memory is becoming more popular these days and is seeing serious consideration as another level in the memory hierarchy between DRAM and disks, or, as is the case for laptops, as a potential replacement for disks. Adding performance models for flash memory parts could significantly improve research in this area.

Heterogeneous ISA Systems

Multi-ISA, multicore systems (such as the Cell Broadband Engine) give designers the advantage of connecting specialized units (DSPs, GPUs) with their own ISAs to their general purpose counterparts using general purpose ISAs. Implementing this capability in M5 would allow researchers to study these systems in a more realistic fashion. Currently, M5 only supports one ISA at a time. This project would involve allowing more than one ISA to be compiled in, making sure none of the code in M5 assumes a single ISA is being used, and implementing a mechanism to select which ISA should be used for each component in a simulation. It will be important to only minimally impact M5's simulation speed. This will be a very challenging project. By successfully completing it, however, you'll learn M5 inside and out and give it some impressive new capabilities.

Other Information

The most successful project is one that is going to be interesting to you. We've got some suggested projects above, but the suggestions are just that. If there is something related that you would rather do, please put that in your proposal.

Please describe who you are and what you've done in your application. In particular, we would like to know about other projects you've worked on and your familiarity with Python and C++. The M5 code base tends to exercise most of the C++ standard (and the non-standard). A good familiarity with C++ and object-oriented programming is necessary for a successful M5 project.

Additionally, we would like to see a set of goals/milestones in your proposal. We don't expect the list to be etched in stone; however, stepping back and figuring out how you're planning to get from point A to point B is a good way for you and your mentors to track your progress and evaluate how reasonable your goals are. Finally, we expect that working on M5 would be your main summer activity.

Mentors / M5 Simulation Team

Steve Reinhardt - Simulator Infrastructure; Parallel Simulation; ISA description; Full System Simulation; Memory Modeling
Nate Binkert - Simulator Infrastructure; Parallel Simulation; Python Integration; Full System Simulation; Networking Models; Configuration Scripts
Ali Saidi - Networking Models; Device Modeling; Full System Simulation; Memory Modeling not including caches
Lisa Hsu - Full System Workloads; Memory Modeling; Checkpointing Simulations
Kevin Lim - CPU Modeling (Out-of-Order, SimpleCPU); Full-System Simulation
Gabe Black - ISA description (SPARC, x86); Full System Simulation
Korey Sewell - ISA description (MIPS); Out-of-Order CPU Modeling; SMT, Syscall-Emulation Simulation
Ron Dreslinski - Memory Modeling; Power Modeling

@@ Line 3: / Line 3: @@
 ==About M5==
-M5 is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. At its core M5 provides a generic, object-oriented discrete-event simulation framework. This includes a foundation for: defining, parameterizing, configuring, and marshaling simulation objects. The foundation along with various pre-made object models allow M5 simulate both single systems and multiple networked systems deterministically. Simulations can be run using one binary (syscall emulation) or booting an entire operating system such as Linux or Solaris (full-system) on most major ISAs (SPARC, MIPS, ALPHA, ARM, x86/64). The simulator is written in a combination of C++ and Python and is pervasively object oriented. Python is used for configuration and not-performance critical parts, while C++ is used for the core of the simulation framework.  Using the M5 simulator, computer architecture researchers around the world have been able to successfully model their systems and publish their work in magazines, conferences, and academic journals. So far, the [[Publications]] list has reached more than 50 and it grows every year.
+M5 is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. At its core M5 provides a generic, object-oriented discrete-event simulation framework. This includes a foundation for: defining, parameterizing, configuring, and marshaling simulation objects. The foundation along with various pre-made object models allow M5 to simulate both single systems and multiple networked systems deterministically. Simulations can be run using one binary (syscall emulation) or booting an entire operating system such as Linux or Solaris (full-system) on most major ISAs (SPARC, MIPS, ALPHA, ARM, x86/64). The simulator is written in a combination of C++ and Python and is pervasively object oriented. Python is used for configuration and not-performance critical parts, while C++ is used for the core of the simulation framework.  Using the M5 simulator, computer architecture researchers around the world have been able to successfully model their systems and publish their work in magazines, conferences, and academic journals. So far, the [[Publications]] list has reached more than 50 and it grows every year.
 == Project Ideas ==
@@ Line 24: / Line 23: @@
 ====Parallelization====
-As the industry moves toward multicore systems, software will need to become parallel if it is to benefit from successive generations of chips.
+As the industry moves toward multicore systems, software will need to become parallel if it is to benefit from successive generations of chips. There are a limited
-* Use the Wisconsin Wind Tunnel as a guide
+number of ways objects can interact with each other in M5, the scope of this problem is not as vast as it might seem at first.  Objects schedule their own events and thus reasonably long chains of independent events are nearly "ready-made" to be parallelized. Previous simulators such as the Wisconsin Wind-Tunnel (which one of our mentors was a co-author of) have been parallel. This task can largely be divided into 3 parts:(1)  Identify blocks of code that share global caching structures and make them per thread (through either __thread or setspecific() getspecefic()); (2) Assign each simobject to a thread (and make sure that any events that that simobject generates are bound to that thread); (3) define objects that can be bound threads. This is the only place where intra-thread scheduling is required.
-* This actually isn't as bad as it sounds as all objects schedule their own events and there are limited ways they can interact with other objects in the system.
 ====Memory Network Models====
@@ Line 39: / Line 37: @@
 There is currently no detailed In-Order CPU model in M5; there is code to start with but nothing that is fully fleshed out.  Cores will likely become lower power and have reduced complexity in the future making in-order cores attractive for such systems.
+====Graphics Processing Unit (GPU) model ====
+Graphics Processing Units interface with general purpose CPUs to accelerate common graphics operations such as texture mapping and shading. Recently, Multi-core GPUs such as [http://en.wikipedia.org/wiki/Larrabee_%28GPU%29 Intel's Larabee project] have been proposed in an attempt to leverage the power of many-core systems for graphics processors. Creating a flexible, graphics processing CPU Model would allow researchers to more realistically build systems that include one or more GPUs in there framework.
 ====Interface to an HDL====
-# Write a PLI interface to connect Verilog CPUs to the memory system.
+People often write Verilog (or some other HDL) code for future chip designs and want the capability to simulate those designs before sending them to production.  By adding a PLI interface to M5, these Verilog descriptions could be simulated as components of M5 alongside well tested models written in C++. M5 can be used to estimate performance of the part without having to physically build it (and the system to contain it), and M5 can flexibly and robustly verify it's functionality without having to build another complex testing infrastructure.
+==== Interface to a Power Analysis Tool ====
+Power consumption has become a critical concern for computer architects in recent years. Interfacing M5 to a Power Analysis Tool such as [http://www.eecs.harvard.edu/~dbrooks/wattch-form.html Wattch] would allow researchers to simulate with the flexibility of M5 and simultaneously track the power for important system components.
 ====Sampling/fast-forwarding techniques====
-# Sampling/fast-forwarding techniques
+Since simulators are very slow, fast-forwarding to an interesting point in an execution and sampling portions of the execution stream with detailed execution can help improve simulation performance.
-#* This would have the most impact if it was coupled with (1)
+* Using the techniques learned in the SMARTS work would be a good guide
-#* Using SMARTS work would be a good guide
+* Coupling direct-execution with this would have the most benefit.
+====Flash Memory Model====
+Flash memory is becoming more popular these days and is seeing serious consideration as another level in the memory hierarchy between DRAM and disks, or, as is the case for laptops, as a potential replacement for disks.  Adding performance models for flash memory parts could significantly improve research in this area.
-====Other device models====
+====Heterogeneous ISA Systems====
-# Flash memory device model (seems popular nowadays)
+Multi-ISA, multicore systems (such as the [http://en.wikipedia.org/wiki/Cell_microprocessor CELL broadband engine]) gives designers the advantage of connecting specialized units (DSPs, GPUs) with their own ISAs to their general purpose counterparts using general purpose ISAs.  Implementing this capability in M5 would allow researchers to study these systems in a more realistic fashion. Currently, M5 only supports one ISA at a time. This project would involve allowing more than one ISA to be compiled in, making sure none of the code in M5 assumes a single ISA is being used, and implementing a mechanism to select which ISA should be used for each component in a simulation. It will be important to only minimally impact M5's simulation speed. This will be a very challenging project. By successfully completing it, however, you'll learn M5 inside and out and give it some impressive new capabilities.
-#* This could be a hard drive based model like we're seeing in laptops now or a memory device model like several research papers have suggested as storage in between DRAM and disk.
 ==Other Information==
@@ Line 65: / Line 71: @@
 * Kevin Lim - CPU Modeling (Out-of-Order, SimpleCPU) ; Full-System Simulation;
 * Gabe Black  - ISA description (SPARC, x86); Full System Simulation
-* Korey Sewell  - ISA description (MIPS); Out-of-Order CPU Modeling; SMT Simulation
+* Korey Sewell  - ISA description (MIPS); Out-of-Order CPU Modeling; SMT, Syscall-Emulation Simulation
-* Ron Dreslinski - Memory Modeling
+* Ron Dreslinski - Memory Modeling; Power Modeling
 __NOTOC__