Splash benchmarks
It is possible to run the SPLASH-2 benchmarks on M5 in two different ways, each with its own caveats.
Using full-system mode
The most robust approach is to compile the benchmarks using a Pthreads implementation of the PARMACS macros, then link with the standard Linux Pthreads library and run this binary on M5 under full-system mode.
Advantages:
- Realistic: you're getting the actual Linux thread scheduler to schedule your threads
- Robust (in contrast to current SE-mode approaches... see below)
- You can build a cross-compiler to compile the binaries on non-Alpha platforms (see Using linux-dist to Create Disk Images and Kernels for M5... note that you don't need to build a kernel, just the cross-compiler).
Disadvantages:
- CPU limits: the Tsunami platform we model only supports 4 CPUs, though we have patches to make that scale to 64 (see Frequently Asked Questions#How many CPUs can M5 run?).
- Overhead: you've got to download a disk image, get the binaries onto the disk image, boot Linux under M5, etc. This isn't nearly as bad as it sounds, but it's still extra work.
For a step-by-step guide on running the benchmarks in full-system mode, see this document.
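To make the idea of "a Pthreads implementation of the PARMACS macros" more concrete, here is a minimal sketch of how a few of the macros could map onto Pthreads calls. This is an illustration only, not the implementation referred to above: the real SPLASH-2 sources use m4 macros rather than the C preprocessor, and the macro bodies, MAX_PROCS, the thread table, and run_parmacs_demo are all assumptions made for this example. Only the macro names (CREATE, WAIT_FOR_END, LOCK, BARRIER, and so on) follow SPLASH-2 usage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_PROCS 64   /* hypothetical upper bound for this sketch */

/* Hypothetical C-preprocessor stand-ins for a few PARMACS macros. */
#define LOCKDEC(l)      pthread_mutex_t l
#define LOCKINIT(l)     pthread_mutex_init(&(l), NULL)
#define LOCK(l)         pthread_mutex_lock(&(l))
#define UNLOCK(l)       pthread_mutex_unlock(&(l))
#define BARDEC(b)       pthread_barrier_t b
#define BARINIT(b, n)   pthread_barrier_init(&(b), NULL, (unsigned)(n))
#define BARRIER(b, n)   pthread_barrier_wait(&(b))

/* Thread table used by CREATE / WAIT_FOR_END (an assumption of this sketch). */
static pthread_t threads[MAX_PROCS];
static int nthreads_created;

#define CREATE(fn) \
    pthread_create(&threads[nthreads_created++], NULL, (fn), NULL)
#define WAIT_FOR_END(n)                              \
    do {                                             \
        for (int i_ = 0; i_ < (n); i_++)             \
            pthread_join(threads[i_], NULL);         \
        nthreads_created = 0;                        \
    } while (0)

static LOCKDEC(counter_lock);
static BARDEC(start_bar);
static long counter;
static int nprocs;

/* Worker body: wait for everyone, then bump a shared counter under a lock. */
static void *slave(void *unused)
{
    (void)unused;
    BARRIER(start_bar, nprocs);
    for (int i = 0; i < 1000; i++) {
        LOCK(counter_lock);
        counter++;
        UNLOCK(counter_lock);
    }
    return NULL;
}

/* Runs p workers in total (p - 1 created threads plus the caller). */
long run_parmacs_demo(int p)
{
    assert(p >= 1 && p <= MAX_PROCS);
    nprocs = p;
    counter = 0;
    LOCKINIT(counter_lock);
    BARINIT(start_bar, p);
    for (int i = 0; i < p - 1; i++)
        CREATE(slave);
    slave(NULL);            /* the creating thread also does work */
    WAIT_FOR_END(p - 1);
    pthread_barrier_destroy(&start_bar);
    pthread_mutex_destroy(&counter_lock);
    return counter;
}
```

Calling run_parmacs_demo(4) runs four workers, each of which increments the counter 1000 times. Because every macro bottoms out in an ordinary Pthreads call, a binary built this way depends on whatever the host Pthreads library needs from the kernel, which is why it is a natural fit for full-system mode.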
Using syscall-emulation mode
There are two approaches to support SPLASH benchmarks in SE mode:
- Use the same Pthreads implementation of PARMACS that you can run in FS mode, and enhance SE mode to handle the necessary syscalls.
- Use a custom M5-specific PARMACS library, possibly coupled with M5-specific syscalls, to support only the thread management features needed by SPLASH.
Unfortunately for option 1, supporting general Pthreads applications in SE mode is extremely difficult. Under both Tru64 and Linux, Pthreads uses an extra "management thread" to perform some tasks. This means an N-thread SPLASH application (which is what you'd typically want to run on an N-CPU machine) really has N+1 threads, and suddenly you need a thread scheduler in M5 to figure out which threads are runnable, assign them to CPUs, maybe preempt one of them if all N+1 are runnable, etc. Worse yet, the Linux Pthreads library uses a pipe to communicate from the application threads to the management thread, requiring you to implement pipes and poll(), add signal support (so you can deliver SIGIO to threads), and lots of other nasty stuff. Frankly it's just not worth the effort, given that the Linux kernel already has excellent implementations of pipes, poll(), and signals, and you can just run that under FS mode (see above).
Option 2 is arguably the "right" way to support SPLASH applications under SE mode. Your custom PARMACS macro implementation can assume you'll never allocate more threads than CPUs, so you don't need any thread scheduling in M5. This implementation could call existing syscalls where appropriate, or call new M5-specific syscalls that are added specifically for this purpose. This is what most existing simulators that support SPLASH do.
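As an illustration of why the "never more threads than CPUs" assumption simplifies things, here is a minimal sketch of a barrier that a custom PARMACS library might use. It is a sense-reversing spin barrier built on C11 atomics: since every thread is assumed to own a CPU, waiters can simply spin, and no scheduler or blocking syscall is needed. This is a hypothetical example, not code from M5 or from any existing PARMACS library; the type and function names are invented here, and host Pthreads is used only as a test harness to create the threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 8   /* hypothetical limit for this sketch */

/* Hypothetical sense-reversing spin barrier. */
typedef struct {
    atomic_int  remaining;  /* threads still to arrive in this episode */
    atomic_bool sense;      /* flips each time the barrier opens */
    int         nthreads;
} spin_barrier_t;

static void spin_barrier_init(spin_barrier_t *b, int nthreads)
{
    atomic_store(&b->remaining, nthreads);
    atomic_store(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread keeps its own local_sense, initially false. */
static void spin_barrier_wait(spin_barrier_t *b, bool *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->remaining, 1) == 1) {
        /* Last arrival: reset the count, then release the waiters. */
        atomic_store(&b->remaining, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Spinning is safe only because each thread has its own CPU. */
        while (atomic_load(&b->sense) != *local_sense) {
            /* busy-wait */
        }
    }
}

/* Test harness: each thread publishes a value, then sums all values. */
static spin_barrier_t demo_barrier;
static int demo_n;
static int slots[MAX_THREADS];
static int sums[MAX_THREADS];

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;
    bool local_sense = false;
    slots[id] = id + 1;
    spin_barrier_wait(&demo_barrier, &local_sense);
    int s = 0;
    for (int i = 0; i < demo_n; i++)
        s += slots[i];
    sums[id] = s;
    return NULL;
}

/* Returns 1 if every thread saw all values after the barrier, else 0. */
int run_spin_barrier_demo(int n)
{
    assert(n >= 1 && n <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    demo_n = n;
    spin_barrier_init(&demo_barrier, n);
    for (int i = 0; i < n; i++) {
        slots[i] = 0;
        sums[i] = 0;
    }
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    int expected = n * (n + 1) / 2;
    for (int i = 0; i < n; i++)
        if (sums[i] != expected)
            return 0;
    return 1;
}
```

If more threads than CPUs were runnable, a spinning waiter could occupy a CPU that the last arriving thread needs, which is exactly the situation that would force a scheduler into the picture. An M5-specific syscall could replace the spin loop, at the cost of less realistic synchronization overheads.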
Unfortunately neither of these environments currently exists in a clean form. What does exist is a historical artifact that resulted from me (Steve) trying to support option 1 for Tru64, i.e. SPLASH applications compiled to the Tru64 Pthreads library. (This code actually predates M5's support for Alpha Linux.) There is a lot of complex code in src/kern/tru64 that attempts to do this. Partway through I came to the realization described above: doing a complete job would end up with me writing a full thread scheduler inside M5. At that point I gave up and finished the job by switching to option 2, adding M5-specific implementations of the remaining PARMACS macros that were giving me trouble, using special M5 syscalls I added for that purpose.
This code (including binaries) is available here. Feel free to use it, but be aware of the following caveats:
- Because this code is based on Tru64, and it's not possible (to our knowledge) to build a gcc cross-compiler that targets Tru64, you can't compile new binaries without a native Alpha Tru64 system.
- Because the code partly tries to support the N+1 thread Tru64 Pthreads model, there may be situations where it doesn't work (e.g., if you use large numbers of processors, or unusual inputs).
- Some of the synchronization primitives use "magic" M5 system calls, so the synchronization overheads may not be realistic.
If you have trouble, see this email message for more information.
In summary, your best bet is to use FS mode. If someone would like to do a Linux-based "option 2" implementation for SE mode, that would be terrific, and we would be happy to redistribute that with M5. However, to date, that has not happened. Meanwhile, you're welcome to use the Tru64 code, but be aware that it's got some issues.
