Google Summer of Code
Latest revision as of 15:01, 12 March 2008
Introduction
The Google Summer of Code (SoC) is a great opportunity for students to contribute to open source software projects. The open source projects get additional contributions and active developers while the students get some money and gain experience in large distributed software development.
About M5
M5 is a modular platform for computer system architecture research, encompassing system-level architecture as well as processor microarchitecture. At its core, M5 provides a generic, object-oriented discrete-event simulation framework. This includes a foundation for defining, parameterizing, configuring, and marshaling simulation objects. The foundation, along with various pre-made object models, allows M5 to simulate both single systems and multiple networked systems deterministically. Simulations can be run using one binary (syscall emulation) or by booting an entire operating system such as Linux or Solaris (full-system) on most major ISAs (SPARC, MIPS, Alpha, ARM, x86/64). The simulator is written in a combination of C++ and Python and is pervasively object-oriented. Python is used for configuration and non-performance-critical parts, while C++ is used for the core of the simulation framework. Using the M5 simulator, computer architecture researchers around the world have been able to successfully model their systems and publish their work in magazines, conferences, and academic journals. So far, the Publications list has reached more than 50 entries, and it grows every year.
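To give a feel for what a discrete-event core does, here is a minimal sketch in Python. It keeps pending events in a priority queue ordered by tick and runs them in order, with a sequence counter as a tie-breaker so that runs stay deterministic. The names `EventQueue`, `schedule`, and `run` are hypothetical and chosen for illustration; they are not M5's actual classes or API.

```python
import heapq
import itertools


class EventQueue:
    """Toy discrete-event queue (hypothetical, not M5's real classes)."""

    def __init__(self):
        self.now = 0                    # current simulated tick
        self._heap = []                 # entries: (tick, sequence, callback)
        self._seq = itertools.count()   # tie-breaker keeps ordering deterministic

    def schedule(self, delay, callback):
        """Run `callback` `delay` ticks after the current tick."""
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback))

    def run(self):
        """Process events in tick order until none remain."""
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()


log = []
eq = EventQueue()


def cache_miss():
    log.append((eq.now, "miss"))
    # An object schedules its own follow-up event.
    eq.schedule(20, lambda: log.append((eq.now, "fill")))


eq.schedule(5, cache_miss)
eq.schedule(1, lambda: log.append((eq.now, "fetch")))
eq.run()
```

After `eq.run()` returns, `log` holds the events in the order they fired, regardless of the order in which they were scheduled.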
Project Ideas
Below is a list of possible project ideas and starting points; however, we're open to other ideas students may have. All the ideas listed here will require some familiarity with Python and a good grasp of advanced C++ concepts.
Direct Execution model
Direct execution is a well-known technique for speeding up simulation that is employed by a number of simulators. A direct execution simulator uses the native machine to execute guest instructions without interpretation. Methods of direct execution include static code instrumentation, dynamic code instrumentation, full OS virtualization, and application virtualization. There are several mechanisms for implementing direct execution, each with different pros and cons.
-  The Linux Kernel-based Virtual Machine (KVM)
- PRO: Could be brought up quickly and can leverage an existing virtualization system
- CON: Can only be used for fast forward since instructions cannot be trapped
- http://kvm.qumranet.com/kvmwiki
 
-  PIN based application virtualization
- PRO: Capable of dynamic instrumentation, so can be used for real simulation
- CON: May be difficult or impossible to use for full-system simulation
- http://rogue.colorado.edu/Wikipin/index.php/Main_Page
 
-  Custom implementation
- PRO: Can do exactly what we want
- CON: Significant effort
 
Parallelization
As the industry moves toward multicore systems, software will need to become parallel if it is to benefit from successive generations of chips. Because there are a limited number of ways objects can interact with each other in M5, the scope of this problem is not as vast as it might seem at first. Objects schedule their own events, and thus reasonably long chains of independent events are nearly "ready-made" to be parallelized. Previous simulators such as the Wisconsin Wind-Tunnel (of which one of our mentors was a co-author) have been parallel. This task can largely be divided into three parts: (1) identify blocks of code that share global caching structures and make them per-thread (through either __thread or pthread_setspecific()/pthread_getspecific()); (2) assign each simobject to a thread (and make sure that any events that the simobject generates are bound to that thread); (3) define objects that can be bound threads. This is the only place where intra-thread scheduling is required.
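Parts (1) and (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: `threading.local` stands in for the C/C++ `__thread` storage class, and `SimObject`, `decode_cache`, and `run_worker` are hypothetical names, not M5 code.

```python
import threading

_tls = threading.local()   # part (1): per-thread state replaces a shared global


def decode_cache():
    """Return this thread's private cache, creating it on first use."""
    if not hasattr(_tls, "cache"):
        _tls.cache = {}
    return _tls.cache


class SimObject:
    """Part (2): each object is bound to exactly one worker (hypothetical class)."""

    def __init__(self, name, worker_id):
        self.name = name
        self.worker_id = worker_id


def run_worker(worker_id, objects, results):
    # Only touch objects bound to this worker; the cache is thread-private.
    cache = decode_cache()
    for obj in objects:
        if obj.worker_id == worker_id:
            cache[obj.name] = worker_id
    results[worker_id] = sorted(cache)


objects = [SimObject("cpu0", 0), SimObject("l1_0", 0), SimObject("cpu1", 1)]
results = {}
threads = [
    threading.Thread(target=run_worker, args=(i, objects, results))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker sees only the entries it created itself, and the main thread's cache stays empty, which is the isolation property that part (1) is after.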
Memory Network Models
Interconnection networks are becoming important for multicore research. Having various models for on-chip networks would be very useful.
- Mesh models
- Crossbar models
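As a starting point, the difference between the two topologies can be captured with a very small latency model. The sketch below assumes a 2D mesh with dimension-order (XY) routing and a crossbar with a single fixed traversal time; the function names and the cycle counts are placeholders chosen for illustration, not measured values or M5 parameters.

```python
def mesh_hops(src, dst, width):
    """Hop count between two node ids on a 2D mesh with XY routing."""
    sx, sy = src % width, src // width
    tx, ty = dst % width, dst // width
    return abs(sx - tx) + abs(sy - ty)


def mesh_latency(src, dst, width, hop_cycles=2):
    """Latency grows with distance; `hop_cycles` is a placeholder value."""
    return mesh_hops(src, dst, width) * hop_cycles


def crossbar_latency(src, dst, traversal_cycles=3):
    """A crossbar connects every pair directly: one fixed traversal."""
    return 0 if src == dst else traversal_cycles
```

For example, on a 4x4 mesh, `mesh_hops(0, 15, 4)` is the corner-to-corner distance, while the crossbar latency is the same for every pair of distinct nodes.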
Directory Coherence Protocol
As the number of cores increases, coherence traffic will consume an increasing proportion of system resources. Directory-based cache coherence protocols drastically reduce the resources required to maintain coherence across a large number of cores. This project can go hand-in-hand with an effort to implement a new network.
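The saving comes from sending messages only to the cores that the directory knows hold a copy, rather than broadcasting to every core. A heavily simplified sketch of that idea, loosely MSI-like and with a hypothetical `Directory` class that counts messages instead of sending them:

```python
class Directory:
    """Toy directory tracking which cores hold each block (simplified)."""

    def __init__(self):
        self.sharers = {}   # block address -> set of core ids with a copy
        self.owner = {}     # block address -> core id holding it modified

    def read(self, core, addr):
        """Handle a read; return the number of coherence messages sent."""
        messages = 0
        owner = self.owner.get(addr)
        if owner is not None and owner != core:
            messages += 1            # ask the owner to downgrade to shared
            del self.owner[addr]
        self.sharers.setdefault(addr, set()).add(core)
        return messages

    def write(self, core, addr):
        """Handle a write; invalidate only the other recorded sharers."""
        others = self.sharers.get(addr, set()) - {core}
        messages = len(others)       # one invalidation per other sharer
        self.sharers[addr] = {core}
        self.owner[addr] = core
        return messages


d = Directory()
reads = [d.read(core, 0x40) for core in (0, 1, 2)]
invalidations = d.write(3, 0x40)
```

A write to a block nobody else holds costs no invalidations at all, however many cores the system has. A real protocol would also need transient states, acknowledgements, and data transfers, all of which are omitted here.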
Detailed In-Order core model
There is currently no detailed In-Order CPU model in M5; there is code to start with but nothing that is fully fleshed out. Cores will likely become lower power and have reduced complexity in the future, making in-order cores attractive for such systems.
Graphics Processing Unit (GPU) model
Graphics Processing Units interface with general-purpose CPUs to accelerate common graphics operations such as texture mapping and shading. Recently, multi-core GPUs such as Intel's Larrabee project (http://en.wikipedia.org/wiki/Larrabee_%28GPU%29) have been proposed in an attempt to leverage the power of many-core systems for graphics processors. Creating a flexible GPU model would allow researchers to more realistically build systems that include one or more GPUs in their framework.
Interface to an HDL
People often write Verilog (or some other HDL) code for future chip designs and want the capability to simulate those designs before sending them to production. By adding a PLI interface to M5, these Verilog descriptions could be simulated as components of M5 alongside well-tested models written in C++. M5 can be used to estimate the performance of the part without having to physically build it (and the system to contain it), and M5 can flexibly and robustly verify its functionality without having to build another complex testing infrastructure.
Interface to a Power Analysis Tool
Power consumption has become a critical concern for computer architects in recent years. Interfacing M5 to a power analysis tool such as Wattch (http://www.eecs.harvard.edu/~dbrooks/wattch-form.html) would allow researchers to simulate with the flexibility of M5 and simultaneously track the power consumption of important system components.
Sampling/fast-forwarding techniques
Since simulators are very slow, fast-forwarding to an interesting point in an execution and sampling portions of the execution stream with detailed execution can help improve simulation performance.
- The techniques from the SMARTS work would be a good guide
- Coupling direct execution with this would have the most benefit
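One simple way to picture the idea is a periodic schedule that fast-forwards through most of each period and switches to the detailed model only for a short warm-up and measurement window at the end. The sketch below is an illustration in that spirit, not the SMARTS methodology itself; the function names and all parameter values are assumptions.

```python
def sampling_schedule(total_insts, period, warmup, detailed):
    """Split a run into fast-forward, warm-up, and measured segments.

    In each full `period`, the last `warmup + detailed` instructions use
    the detailed model, and only the `detailed` part is measured.
    Returns a list of (mode, start_instruction, length) tuples.
    """
    if detailed < 1 or warmup < 0 or warmup + detailed > period:
        raise ValueError("warmup + detailed must fit within one period")
    segments = []
    start = 0
    while start + period <= total_insts:
        fast = period - warmup - detailed
        if fast:
            segments.append(("fast_forward", start, fast))
        if warmup:
            segments.append(("warmup", start + fast, warmup))
        segments.append(("measure", start + fast + warmup, detailed))
        start += period
    if start < total_insts:
        # Leftover instructions after the last full period are fast-forwarded.
        segments.append(("fast_forward", start, total_insts - start))
    return segments


def detailed_fraction(segments):
    """Fraction of instructions that run under the slow, detailed model."""
    slow = sum(n for mode, _, n in segments if mode != "fast_forward")
    total = sum(n for _, _, n in segments)
    return slow / total


segs = sampling_schedule(total_insts=2500, period=1000, warmup=20, detailed=10)
```

With these placeholder numbers, only a small fraction of the instruction stream is simulated in detail, which is where the speedup comes from; the fast-forward segments are the natural place to plug in direct execution.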
Flash Memory Model
Flash memory is becoming more popular these days and is seeing serious consideration as another level in the memory hierarchy between DRAM and disks, or, as is the case for laptops, as a potential replacement for disks. Adding performance models for flash memory parts could significantly improve research in this area.
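A first-cut performance model might look like the sketch below. It assumes NAND-style behavior, where a page cannot be overwritten in place and its whole block must be erased first; the `FlashModel` class and every latency value in it are placeholders for illustration, not figures for any real part.

```python
class FlashModel:
    """Toy NAND-style latency model; all latencies are placeholder values."""

    def __init__(self, pages_per_block=64, read_us=25, program_us=200,
                 erase_us=1500):
        self.pages_per_block = pages_per_block
        self.read_us = read_us
        self.program_us = program_us
        self.erase_us = erase_us
        self.written = set()   # pages programmed since their block was erased

    def read(self, page):
        """Reads cost the same regardless of page state in this model."""
        return self.read_us

    def write(self, page):
        """Return the latency of writing `page`, erasing its block if needed."""
        latency = self.program_us
        if page in self.written:
            # Overwrite: erase the whole block first. This sketch ignores
            # the cost of copying the block's other valid pages elsewhere.
            first = (page // self.pages_per_block) * self.pages_per_block
            self.written -= set(range(first, first + self.pages_per_block))
            latency += self.erase_us
        self.written.add(page)
        return latency


flash = FlashModel()
first_write = flash.write(3)
overwrite = flash.write(3)
```

Even this crude model shows why overwrites are far more expensive than first-time writes, which is the kind of asymmetry a memory-hierarchy study would need to capture.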
Heterogeneous ISA Systems
Multi-ISA, multicore systems (such as the CELL broadband engine, http://en.wikipedia.org/wiki/Cell_microprocessor) give designers the advantage of connecting specialized units (DSPs, GPUs) with their own ISAs to their general-purpose counterparts using general-purpose ISAs. Implementing this capability in M5 would allow researchers to study these systems in a more realistic fashion. Currently, M5 only supports one ISA at a time. This project would involve allowing more than one ISA to be compiled in, making sure none of the code in M5 assumes a single ISA is being used, and implementing a mechanism to select which ISA should be used for each component in a simulation. It will be important to only minimally impact M5's simulation speed. This will be a very challenging project. By successfully completing it, however, you'll learn M5 inside and out and give it some impressive new capabilities.
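The selection mechanism could take many forms. One possibility, sketched here in Python, is a registry that maps an ISA name to its implementation so that each component picks its ISA at configuration time instead of relying on a single global choice. Everything in this sketch (`ISA_REGISTRY`, `register_isa`, `Core`, and the placeholder decoders) is hypothetical and does not reflect how M5 is structured.

```python
ISA_REGISTRY = {}   # ISA name -> class; hypothetical, not an M5 structure


def register_isa(name):
    """Class decorator that makes an ISA selectable by name."""
    def wrap(cls):
        ISA_REGISTRY[name] = cls
        return cls
    return wrap


@register_isa("sparc")
class SparcISA:
    def decode(self, word):
        return ("sparc", word)   # placeholder for a real decoder


@register_isa("mips")
class MipsISA:
    def decode(self, word):
        return ("mips", word)    # placeholder for a real decoder


class Core:
    """A component that selects its own ISA rather than using a global one."""

    def __init__(self, name, isa_name):
        self.name = name
        self.isa = ISA_REGISTRY[isa_name]()

    def fetch_decode(self, word):
        return self.isa.decode(word)


cores = [Core("cpu0", "sparc"), Core("dsp0", "mips")]
decoded = [core.fetch_decode(0x01000000) for core in cores]
```

Two cores with different ISAs coexist in one simulation here. In the C++ core, a lookup like this on every instruction would likely be too slow, so keeping the per-component indirection cheap is where much of the real design effort would go.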
Other Information
The most successful project is one that is going to be interesting to you. We've got some suggested projects above, but the suggestions are just that. If there is something related that you would rather do, please put that in your proposal.
Please describe who you are and what you've done in your application. In particular, we would like to know about other projects you've worked on and your familiarity with Python and C++. The M5 code base tends to exercise most of the C++ standard (and the non-standard). A good familiarity with C++ and object-oriented programming is necessary for a successful M5 project.
Additionally, we would like to see a set of goals/milestones in your proposal. We don't expect the list to be etched in stone; however, stepping back and figuring out how you're planning to get from point A to point B is a good way for you and your mentors to track your progress and evaluate how reasonable your goals are. Finally, we expect that working on M5 would be your main summer activity.
Mentors / M5 Simulation Team
- Steve Reinhardt - Simulator Infrastructure; Parallel Simulation; ISA description; Full System Simulation; Memory Modeling
- Nate Binkert - Simulator Infrastructure; Parallel Simulation; Python Integration; Full System Simulation; Networking Models; Configuration Scripts
- Ali Saidi - Networking Models; Device Modeling; Full System Simulation; Memory Modeling not including caches
- Lisa Hsu - Full System Workloads; Memory Modeling; Checkpointing Simulations
- Kevin Lim - CPU Modeling (Out-of-Order, SimpleCPU); Full System Simulation
- Gabe Black - ISA description (SPARC, x86); Full System Simulation
- Korey Sewell - ISA description (MIPS); Out-of-Order CPU Modeling; SMT; Syscall-Emulation Simulation
- Ron Dreslinski - Memory Modeling; Power Modeling
