Difference between revisions of "ISCA 2018 Tutorial"

Latest revision as of 18:18, 24 July 2018


AMD gem5 APU Simulator: Modeling GPUs Using the Machine ISA

Held in conjunction with ISCA 2018. June 2nd, 2018.

Important Dates

The tutorial will be held on day one of the conference - June 2nd, 2018

ISCA 2018 early registration and hotel reservation deadline - April 16th, 2018

Abstract

AMD Research has developed an APU (Accelerated Processing Unit) model that extends gem5 [3] with a GPU timing model that executes the GCN (Graphics Core Next) generation 3 machine ISA [2, 4]. In addition to supporting a modern machine ISA, the model runs the open-source Radeon Open Compute platform (ROCm) [1] stack without modification. Users can therefore run a wide variety of applications written in several high-level languages, including C++, HIP, OpenMP, and OpenCL, and researchers can evaluate many different types of workloads, from traditional compute applications to emerging GPU workloads such as task-parallel and machine learning applications. The resulting AMD gem5 APU simulator is a flexible, cycle-level research model capable of representing many different APU configurations, on-chip cache hierarchies, and system designs. Our APU extensions allow researchers to model both CPU and GPU memory requests and the interactions between them. In particular, the model uses SLICC and Ruby to implement a wide variety of coherence and synchronization solutions, a critical research area in heterogeneous computing. The model has been used in several top-tier computer architecture publications over the last several years [MICRO 2013, HPCA 2014, ASPLOS 2014, ISCA 2014, HPCA 2015, ASPLOS 2015, MICRO 2016, HPCA 2017, ISCA 2017, HPCA 2018].

In this tutorial, we will describe the capabilities of the AMD gem5 APU simulator, which will be publicly released under a liberal BSD license before ISCA 2018. We will detail the simulated APU architecture, review the execution flow, and describe how the simulator has been used. The presentation will also discuss key design decisions and tradeoffs. For example, we use the system-call emulation mode to avoid running a full OS and kernel driver, so we will describe the simulator’s system-call emulation interface and how the ROCm runtime and user-space drivers interact with it. Also, our GPU model now directly executes native machine ISA instructions rather than the HSAIL intermediate language representation. Executing the intermediate language, as the model previously did, simplified workload compilation but was less accurate when modeling hardware behavior. In this tutorial, we will highlight many of the improvements enabled by executing the GCN3 ISA.

References
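  1. AMD. ROCm. https://rocm.github.io/
  2. AMD. AMD GCN3 ISA Architecture Manual. https://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/
  3. Nathan Binkert et al. The gem5 Simulator. In SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1-7, Aug. 2011. https://doi.org/10.1145/2024716.2024718
  4. Anthony Gutierrez et al. Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level. In HPCA 2018. https://doi.org/10.1109/HPCA.2018.00058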

Slides

Slides from our tutorial, as well as additional documentation, may be found on the GPU Models page.

Schedule

Topic                              Presenter        Time
Background                         Tony             8:00-8:15 am
ROCm Stack, GCN3 ISA, and uArch    Tony             8:15-9:15 am
HSA Queuing                        Sooraj           9:15-10:00 am
Break                                               10:00-10:30 am
Ruby and GPU Protocol Tester       Tuan             10:30-11:15 am
Demo/Workloads and Q+A             Matt Sinclair    11:15 am-12:00 pm

Presenters

Tony Gutierrez (AMD Research)

Sooraj Puthoor (AMD Research)

Brad Beckmann (AMD Research)

Tuan Ta (Cornell)

Matt Sinclair (AMD Research)