Difference between revisions of "GPU Models"

 
(35 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
== AMD's Compute-GPU Model ==
 
== AMD's Compute-GPU Model ==
 +
=== GCN3 Based Simulation ===
 +
[[Media:gutierrez_hpca_2018_lost_in_abstraction.pdf|An HPCA paper]] was published in 2018 that describes the GCN3 model.
 +
==== ISCA 2018 tutorial ====
 +
A tutorial was held on June 2nd, 2018, in conjunction with the 45th International Symposium on Computer Architecture (ISCA).
 +
Our presentation can be found here: [[Media:AMD_gem5_APU_simulator_isca_2018_gem5_wiki.pdf | The AMD gem5 APU Simulator: Modeling GPUs Using the Machine ISA]].
 +
The GCN3 ISA is supported in [https://gem5.googlesource.com/amd/gem5/+/agutierr/master-gcn3-staging AMD's public pre-release gem5 repo]
 +
on the branch agutierr/master-gcn3-staging.
  
== Compute GPU Workloads ==
+
==== Cloning the repository ====
 +
To clone the repo with GCN3 support use the following command:
  
=== Emulated CL Runtime ===
+
<code>git clone https://gem5.googlesource.com/amd/gem5 -b agutierr/master-gcn3-staging</code>
* Download the [http://www.gem5.org/dist/current/gpu/cl-runtime.xz emulated OpenCL runtime].
 
  
=== Rodinia Benchmark Suite ===
+
==== Building the simulator with GPU and GCN3 support ====
 +
Currently, the GPU model only works with X86 and the VIPER protocol, which you can read about in the slides
 +
from AMD's 2018 ISCA tutorial. To build gem5 with a GCN3-based GPU model included, use the following command:
 +
 
 +
<code>scons -sQ -jN ./build/GCN3_X86/gem5.opt</code>
 +
 
 +
==== Simulation support for ROCm ====
 +
In contrast to HSAIL execution, the GCN3 model does not rely on an emulated runtime (i.e., a simulator-specific
 +
implementation of the GPU runtime API). Instead, the model was designed with enough fidelity to run the userspace
 +
components of an off-the-shelf version of the Radeon Open Compute platform (ROCm). ROCm is an open platform from
 +
AMD that implements [http://www.hsafoundation.com/ Heterogeneous Systems Architecture (HSA)] principles. More
 +
information about the HSA standard can be found on the HSA Foundation's website.
 +
 
 +
The model currently works only
 +
in system-call emulation (SE) mode, so all kernel-level driver functionality is modeled entirely within the
 +
SE mode layer of gem5. In particular, the emulated GPU driver supports the necessary <code>ioctl()</code> commands
 +
it receives from the userspace code. The source for the emulated GPU driver can be found in:
 +
 
 +
* The GPU compute driver: <code>src/gpu-compute/gpu_compute_driver.[hh|cc]</code>
 +
 
 +
* The HSA device driver: <code>src/dev/hsa/hsa_driver.[hh|cc]</code>
 +
 
 +
The HSA driver code models the basic functionality for an HSA agent, which is any device that can be targeted by the HSA runtime and accepts Architected Queuing Language (AQL) packets. AQL packets are a standard format for all HSA agents, and are used primarily to initiate kernel launches on the GPU. The base <code>HSADriver</code> class holds a pointer to the HSA packet processor for the device, and defines the interface for any HSA device. An HSA agent does not have to be a GPU; it could be a generic accelerator, CPU, NIC, etc.
 +
 
 +
The <code>GPUComputeDriver</code> derives from <code>HSADriver</code> and is a device-specific implementation of an <code>HSADriver</code>. It provides the implementation for GPU-specific <code>ioctl()</code> calls.
 +
 
 +
===== ROCm tool chain and software stack =====
 +
In order to build and run applications for ROCm and GCN3 you need several ROCm components. These are:
 +
 
 +
* [https://github.com/RadeonOpenCompute/hcc Heterogeneous Compute Compiler (HCC)]
 +
* [https://github.com/RadeonOpenCompute/ROCR-Runtime Radeon Open Compute runtime (ROCr)]
 +
* [https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface Radeon Open Compute thunk (ROCt)]
 +
* [https://github.com/ROCm-Developer-Tools/HIP HIP] (optional)
 +
 
 +
Only the roc-1.6.x branch of the necessary ROCm components is supported, so be sure to include <code>-b roc-1.6.x</code> when cloning.
 +
The recommended compiler to build these components is gcc 5.4.0.
 +
 
 +
Alternatively, there are deb and yum packages for ROCm archived here: [http://repo.radeon.com/rocm/archive/ ROCm archive].
 +
 
 +
When building gem5's GPU model you must make sure that the <code>src/dev/hsa/kfd_ioctl.h</code> header matches the <code>kfd_ioctl.h</code> header that comes with ROCt.
 +
The emulated driver, <code>src/gpu-compute/gpu_compute_driver.[hh|cc]</code>, relies on this file to interpret the <code>ioctl()</code> codes that the thunk
 +
uses.
 +
 
 +
Before running any ROCm-based GPU applications in gem5, you need to make sure that the simulated environment is set properly in <code>configs/example/apu_se.py</code>.
 +
The <code>LD_LIBRARY_PATH</code> in particular must point to the ROCm installation on your local machine.
 +
 
 +
===== Example HC applications =====
 +
No GPU applications have been included with the release of GCN3/ROCm support in gem5; however, there are several public repositories with HC/AMP applications
 +
that are known to work with gem5's GCN3-based GPU model:
 +
 
 +
* [https://github.com/ROCm-Developer-Tools/HCC-Example-Application Example HCC applications]
 +
* [https://github.com/AMDComputeLibraries/ComputeApps Example compute proxy applications]
 +
 
 +
These applications may contain host-to-device copies, which are not necessary in the APU systems modeled in gem5. It is advised that you remove all copies in these applications and instead pass pointers to host-allocated data structures directly to the GPU kernels. This is fine for an APU because a "pointer is a pointer" and may be shared across the host and device.
 +
 
 +
===== HIPifying applications =====
 +
TODO
 +
 
 +
=== HSAIL Based Simulation ===
 +
The HSAIL-based GPU model is still the model that is included in the mainline gem5 repository; however, it is no longer supported and will be deprecated once GCN3 support is fully merged into the mainline. It is recommended that users start with the GCN3 model in AMD's public pre-release repository.
 +
 
 +
==== MICRO-48 Tutorial ====
 +
A tutorial was held in conjunction with MICRO-48. We have made the slides available from our 2015 tutorial titled: [[Media:AMD_gem5_APU_simulator_micro_2015_final.pptx | The AMD gem5 APU Simulator: Modeling Heterogeneous Systems in gem5]].
 +
 
 +
==== Emulated CL Runtime ====
 +
* Download the [http://www.gem5.org/dist/current/gpu/cl-runtime.tar.xz emulated OpenCL runtime].
 +
 
 +
==== OpenCL Compiler ====
 +
[https://github.com/HSAFoundation/CLOC CLOC] is used to compile OpenCL kernels for use with gem5's GPU compute model. The most recent revision of CLOC that is known to work with gem5 is:
 +
 
 +
commit cf777856cfce86d11ea97c245992971159b85a4d
 +
 
 +
== ARM's NoMali GPU Model ==
 +
 
 +
The NoMali GPU model models the interface used by ARM Mali GPUs. The model does not render or compute anything, but can be used to fake a GPU. This enables Android and ChromeOS experiments without software rendering, which would otherwise make simulation results extremely misleading. It was [[media:2015_ws_04_ISCA_2015_NoMali.pdf|presented]] in the [[User_workshop_2015|2015 gem5 User Workshop]].
 +
 
 +
Getting started instructions are currently available for [[Android_KitKat|Android 4.4 (KitKat)]].

Latest revision as of 12:10, 17 September 2019

AMD's Compute-GPU Model

GCN3 Based Simulation

An HPCA paper was published in 2018 that describes the GCN3 model.

ISCA 2018 tutorial

A tutorial was held on June 2nd, 2018, in conjunction with the 45th International Symposium on Computer Architecture (ISCA). Our presentation can be found here: The AMD gem5 APU Simulator: Modeling GPUs Using the Machine ISA. The GCN3 ISA is supported in AMD's public pre-release gem5 repo on the branch agutierr/master-gcn3-staging.

Cloning the repository

To clone the repo with GCN3 support use the following command:

git clone https://gem5.googlesource.com/amd/gem5 -b agutierr/master-gcn3-staging

Building the simulator with GPU and GCN3 support

Currently, the GPU model only works with X86 and the VIPER protocol, which you can read about in the slides from AMD's 2018 ISCA tutorial. To build gem5 with a GCN3-based GPU model included, use the following command:

scons -sQ -jN ./build/GCN3_X86/gem5.opt
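The `N` in `-jN` is a placeholder for the number of parallel build jobs. The clone and build steps above can be scripted roughly as follows. This is a minimal sketch: the helper names `clone_command` and `build_command` are hypothetical, and defaulting the job count to `os.cpu_count()` is an assumption rather than a recommendation from the gem5 authors.

```python
import os

REPO_URL = "https://gem5.googlesource.com/amd/gem5"
BRANCH = "agutierr/master-gcn3-staging"
TARGET = "./build/GCN3_X86/gem5.opt"


def clone_command(repo_url=REPO_URL, branch=BRANCH):
    """Return the git command that clones the GCN3 staging branch."""
    return ["git", "clone", repo_url, "-b", branch]


def build_command(jobs=None, target=TARGET):
    """Return the scons command, with N in -jN replaced by a job count."""
    if jobs is None:
        # Assumption: one build job per host core is a reasonable default.
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["scons", "-sQ", "-j%d" % jobs, target]


if __name__ == "__main__":
    # Only print the commands, so the sketch has no side effects.
    print(" ".join(clone_command()))
    print(" ".join(build_command(jobs=4)))
```

Each returned list can be passed to `subprocess.run(..., check=True)` to actually run the step.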

Simulation support for ROCm

In contrast to HSAIL execution, the GCN3 model does not rely on an emulated runtime (i.e., a simulator-specific implementation of the GPU runtime API). Instead, the model was designed with enough fidelity to run the userspace components of an off-the-shelf version of the Radeon Open Compute platform (ROCm). ROCm is an open platform from AMD that implements Heterogeneous Systems Architecture (HSA) principles. More information about the HSA standard can be found on the HSA Foundation's website.

The model currently works only in system-call emulation (SE) mode, so all kernel-level driver functionality is modeled entirely within the SE mode layer of gem5. In particular, the emulated GPU driver supports the necessary ioctl() commands it receives from the userspace code. The source for the emulated GPU driver can be found in:

  • The GPU compute driver: src/gpu-compute/gpu_compute_driver.[hh|cc]
  • The HSA device driver: src/dev/hsa/hsa_driver.[hh|cc]

The HSA driver code models the basic functionality for an HSA agent, which is any device that can be targeted by the HSA runtime and accepts Architected Queuing Language (AQL) packets. AQL packets are a standard format for all HSA agents, and are used primarily to initiate kernel launches on the GPU. The base HSADriver class holds a pointer to the HSA packet processor for the device, and defines the interface for any HSA device. An HSA agent does not have to be a GPU; it could be a generic accelerator, CPU, NIC, etc.

The GPUComputeDriver derives from HSADriver and is a device-specific implementation of an HSADriver. It provides the implementation for GPU-specific ioctl() calls.
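The relationship between the two driver classes can be pictured with a toy model. The sketch below is not gem5 code (the real classes are C++ in the files listed above). Only the names `HSADriver` and `GPUComputeDriver` come from the text; the packet-processor stand-in, the `GET_VERSION` request code, and the reply it returns are invented for illustration.

```python
class HSAPacketProcessor:
    """Stand-in for the device's packet processor; it only records AQL packets."""

    def __init__(self):
        self.submitted = []

    def submit(self, aql_packet):
        self.submitted.append(aql_packet)


class HSADriver:
    """Base driver: holds the packet processor and defines the device interface."""

    def __init__(self, packet_processor):
        self.packet_processor = packet_processor

    def ioctl(self, request, arg):
        raise NotImplementedError("device-specific drivers implement ioctl()")


class GPUComputeDriver(HSADriver):
    """Device-specific driver that handles GPU ioctl() requests."""

    # Hypothetical request code, not a value taken from kfd_ioctl.h.
    GET_VERSION = 1

    def ioctl(self, request, arg):
        if request == self.GET_VERSION:
            return {"major": 1, "minor": 0}  # made-up reply for the sketch
        raise ValueError("unsupported ioctl request: %r" % (request,))
```

The point of the split is that a non-GPU HSA agent could reuse the base class and supply its own `ioctl()` handling.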

ROCm tool chain and software stack

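In order to build and run applications for ROCm and GCN3 you need several ROCm components. These are:

  • Heterogeneous Compute Compiler (HCC)
  • Radeon Open Compute runtime (ROCr)
  • Radeon Open Compute thunk (ROCt)
  • HIP (optional)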

Only the roc-1.6.x branch of the necessary ROCm components is supported, so be sure to include -b roc-1.6.x when cloning. The recommended compiler to build these components is gcc 5.4.0.
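As an illustration of the `-b roc-1.6.x` requirement, the following sketch generates one clone command per component. The repository URLs are the ones this page links for HCC, ROCr, ROCt, and HIP. The helper name `rocm_clone_commands` is hypothetical, and applying the same branch name to the optional HIP repository is an assumption.

```python
ROCM_BRANCH = "roc-1.6.x"

# Required components, using the repository URLs linked from this page.
REQUIRED_REPOS = [
    "https://github.com/RadeonOpenCompute/hcc",
    "https://github.com/RadeonOpenCompute/ROCR-Runtime",
    "https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface",
]

# Optional component.
HIP_REPO = "https://github.com/ROCm-Developer-Tools/HIP"


def rocm_clone_commands(include_hip=False, branch=ROCM_BRANCH):
    """Return a git clone command (as an argument list) for each component."""
    repos = list(REQUIRED_REPOS)
    if include_hip:
        # Assumption: HIP is cloned from the same branch as the others.
        repos.append(HIP_REPO)
    return [["git", "clone", "-b", branch, url] for url in repos]


if __name__ == "__main__":
    for cmd in rocm_clone_commands(include_hip=True):
        print(" ".join(cmd))
```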

Alternatively, there are deb and yum packages for ROCm archived here: ROCm archive.

When building gem5's GPU model you must make sure that the src/dev/hsa/kfd_ioctl.h header matches the kfd_ioctl.h header that comes with ROCt. The emulated driver, src/gpu-compute/gpu_compute_driver.[hh|cc], relies on this file to interpret the ioctl() codes that the thunk uses.
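One way to check that the two headers match before building is to compare their contents. The sketch below treats "matches" as byte-for-byte identical, which is a strict assumption: headers that differ only in whitespace or comments would be reported as different. The function names are hypothetical.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def headers_match(gem5_header, roct_header):
    """Return True when both kfd_ioctl.h copies are byte-for-byte identical."""
    return file_digest(gem5_header) == file_digest(roct_header)
```

For example, `headers_match("src/dev/hsa/kfd_ioctl.h", roct_copy)` could be called with `roct_copy` set to wherever the header lives in your ROCt checkout; that location is not specified here.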

Before running any ROCm-based GPU applications in gem5, you need to make sure that the simulated environment is set properly in configs/example/apu_se.py. The LD_LIBRARY_PATH in particular must point to the ROCm installation on your local machine.
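A minimal sketch of how such an environment entry might be assembled is shown below. It assumes the simulated process environment is expressed as a list of `NAME=value` strings, that ROCm libraries live in a `lib` directory under the installation root, and that the root is something like `/opt/rocm`. All three are assumptions, so check `configs/example/apu_se.py` and your local installation for the actual layout. The helper name `rocm_env` is hypothetical.

```python
import posixpath


def rocm_env(rocm_root, extra_lib_dirs=()):
    """Build NAME=value strings for the simulated process environment."""
    # Assumption: the main ROCm libraries are under <rocm_root>/lib.
    lib_dirs = [posixpath.join(rocm_root, "lib")]
    lib_dirs.extend(extra_lib_dirs)
    return ["LD_LIBRARY_PATH=" + ":".join(lib_dirs)]


if __name__ == "__main__":
    # "/opt/rocm" is only an example location.
    print(rocm_env("/opt/rocm"))
```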

Example HC applications

No GPU applications have been included with the release of GCN3/ROCm support in gem5; however, there are several public repositories with HC/AMP applications that are known to work with gem5's GCN3-based GPU model:

  • Example HCC applications
  • Example compute proxy applications

These applications may contain host-to-device copies, which are not necessary in the APU systems modeled in gem5. It is advised that you remove all copies in these applications and instead pass pointers to host-allocated data structures directly to the GPU kernels. This is fine for an APU because a "pointer is a pointer" and may be shared across the host and device.

HIPifying applications

TODO

HSAIL Based Simulation

The HSAIL-based GPU model is still the model that is included in the mainline gem5 repository; however, it is no longer supported and will be deprecated once GCN3 support is fully merged into the mainline. It is recommended that users start with the GCN3 model in AMD's public pre-release repository.

MICRO-48 Tutorial

A tutorial was held in conjunction with MICRO-48. We have made the slides available from our 2015 tutorial titled: The AMD gem5 APU Simulator: Modeling Heterogeneous Systems in gem5.

Emulated CL Runtime

  • Download the emulated OpenCL runtime.

OpenCL Compiler

CLOC is used to compile OpenCL kernels for use with gem5's GPU compute model. The most recent revision of CLOC that is known to work with gem5 is:

commit cf777856cfce86d11ea97c245992971159b85a4d

ARM's NoMali GPU Model

The NoMali GPU model models the interface used by ARM Mali GPUs. The model does not render or compute anything, but can be used to fake a GPU. This enables Android and ChromeOS experiments without software rendering, which would otherwise make simulation results extremely misleading. It was presented in the 2015 gem5 User Workshop.

Getting started instructions are currently available for Android 4.4 (KitKat).