• Tutorial 1: Performance analysis for High Performance Systems
• Tutorial 2: Understanding and managing hardware affinities with hwloc
• Tutorial 3: Insightful Automatic Performance Modeling

Tutorial 1

Assistant Prof. François Trahay
Telecom SudParis

Title: Performance analysis for High Performance Systems

Abstract: This tutorial presents the EZTrace framework for performance analysis. We show how to analyze the performance of MPI, OpenMP and hybrid applications with EZTrace. We first introduce the general performance analysis workflow and show how to trace a simple application. Then, we present how combining EZTrace plugins allows users to analyze hybrid applications or to gather hardware counters with PAPI. The last part of the tutorial focuses on how to write an EZTrace plugin in order to analyze a particular application or library precisely and to collect performance data.

Content level: Introductory: 40 %, Intermediate: 30 %, Advanced: 30 %

Attendee Requirements: The exercises require connecting to a cluster, so a laptop will be necessary. A temporary account on a cluster where EZTrace is installed will be provided for this tutorial.

Audience Prerequisites: The level of the presentations and particularly the hands-on exercises requires a general understanding of HPC applications and parallel programming with MPI and/or OpenMP. Familiarity with any form of mixed-mode parallel programming is advantageous but not necessary.

Targeted Audience:

– Application developers striving for the best application performance on HPC systems
– Runtime system developers who want to understand the performance of their libraries
– Others interested in programming tool environments and application tuning

Tutorial 2

Research Scientist Brice Goglin
Inria Bordeaux Sud-Ouest

Title: Understanding and managing hardware affinities with Hardware Locality (hwloc)

Abstract: This tutorial will walk the audience through the complexity of modern computing servers. We will detail why the hardware characteristics of these servers are important to HPC application developers and why they are difficult to manage manually. We will then introduce the Hardware Locality software (hwloc), which was developed to make developers’ lives easier by abstracting hardware topologies in a portable way.

Content level: 33% Introductory, 67% Intermediate

Attendee Requirements: Some of the exercises will require computer access, so a laptop will be necessary. SSH access to a Linux cluster is recommended. The tutorial will also explain how to build hwloc from the tarballs available at http://www.open-mpi.org/projects/hwloc/.

Audience Prerequisites: Basic knowledge of parallel architectures and of widespread programming models and HPC runtimes (MPI, OpenMP, etc.).

Targeted Audience: The tutorial is designed for HPC developers who want to optimize their applications or libraries according to hardware affinity.

Tutorial 3

Alexandru Calotoiu, Torsten Hoefler, Martin Schulz, Felix Wolf

Title: Insightful Automatic Performance Modeling

Abstract: Many applications suffer from latent performance limitations that may cause them to consume too many resources under certain conditions. Examples include an unexpected growth of the execution time as the number of processes or the size of the input problem is increased. Solving this problem requires the ability to properly model the performance of a program to understand its optimization potential in different scenarios. In this tutorial, we will present a method to automatically generate such models for individual parts of a program from a small set of measurements. We will further introduce a tool that implements our method and teach how to use it in practice. The learning objective of this tutorial is to familiarize the attendees with the ideas behind our modeling approach and to enable them to repeat experiments at home.

Content level: Introductory: 75 %, Intermediate: 25 %, Advanced: 5 %

Attendee Requirements: Participants who wish to follow the demo themselves, potentially using their own applications, should bring their own laptops.

Audience Prerequisites: The content of this tutorial is aimed at participants with knowledge of parallel computing but will not require prior knowledge of performance modeling.

Targeted Audience: Two groups of people can benefit most from our tool:

– Application developers who are new to performance modeling, who can gain insights into the behavior of their application and understand the performance of the different regions of their code
– Experts who already have detailed expectations of the performance of certain regions of an application, who can validate their applications and extend the coverage of their analysis using the automation our method provides

Our approach focuses on parallel SPMD-style applications (e.g., based on MPI).