S1
Sunday, Full Day
Room A102/104/106
Title: Introduction to Clusters: Build Yourself a PC Cluster NOW!
Presenters: Kwai L. Wong and Christian Halloy, University of Tennessee/Oak Ridge National Laboratory
Level: 50% Introductory | 30% Intermediate | 20% Advanced
Abstract:
This will be a well-paced,
information-rich full-day tutorial describing step-by-step, with live
demos, how to build and utilize a Linux PC cluster for High Performance
Computing. PC clusters are being used for parallel computing applications
at many sites around the world and they offer by far the best price/performance
ratio. Linux-based PC clusters are receiving increasing attention,
but putting them together remains a challenge for most scientists and researchers.
This tutorial offers a unique opportunity for participants to learn how
to effectively build a PC cluster on-site from scratch. Several computers
will be used to demonstrate in detail how to install, configure, customize,
and eventually compute on the newly built cluster.
This tutorial will have three
parts. The first part will cover basic Linux installation on an individual
workstation. The second part will focus on integrating additional PCs
within a cluster through network configuration using a switch. The final
part will emphasize the installation of several freely available and useful
scientific libraries such as PVM, MPI, ScaLAPACK, PETSc, Aztec, etc. Benchmark
computations will be carried out to demonstrate the functionality of the
finished PC cluster.
The authors
have conducted similar training courses and have constructed a research
production PC cluster for ORNL's Solid State Division in Spring 2000.
See http://www.jics.utk.edu/SSDcluster.html
for additional information.
Related Tutorial: M1 - Advanced Topics in HPC Linux Cluster Design and Administration
S2
Sunday, Full Day
Room A101/103/105
Title: Introduction to Effective Parallel Computing
Presenters: Quentin F. Stout and Christiane Jablonowski, University of Michigan
Level: 75% Introductory | 25% Intermediate
Abstract:
This tutorial will provide
a comprehensive overview of parallel computing, emphasizing those aspects
most relevant to the user. It is suitable for new users, managers, students
and anyone needing a general overview of parallel computing. It discusses
software and hardware, with an emphasis on standards, portability, and
systems that are now (or soon will be) commercially or freely available.
Computers examined range from low-cost clusters to highly integrated supercomputers.
The tutorial surveys basic concepts and terminology, and gives parallelization
examples selected from engineering, scientific, and data intensive applications.
These real-world examples are targeted at distributed memory systems using
MPI, and shared memory systems using OpenMP. The tutorial shows basic
parallelization approaches and discusses some of the software engineering
aspects of the parallelization process, including the use of tools. It
also discusses techniques for improving parallel performance. It helps
attendees make intelligent planning decisions by covering the primary
options that are available, explaining how they are used and what they
are most suitable for. The tutorial also provides pointers to the literature
and web-based resources.
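As a taste of the distributed-memory style surveyed in this tutorial, the sketch below splits a simple sum across MPI processes and combines the partial results with a reduction. It is illustrative only; the problem size N and the cyclic distribution of iterations are assumptions, not material from the tutorial itself.

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000000   /* problem size, chosen only for illustration */

int main(int argc, char **argv)
{
    int rank, size;
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process sums its own subset of the indices. */
    for (int i = rank; i < N; i += size)
        local += (double)i;

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f\n", total);

    MPI_Finalize();
    return 0;
}
```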
S3
Sunday, Full Day
Room A107
Title: Practical Automatic Performance Analysis
Presenters: Michael Gerndt, University of Technology Munich; Barton P. Miller, University of Wisconsin; Tomàs Margalef, Autonomous University of Barcelona; Bernd Mohr, Research Centre Juelich
Level: 10% Introductory | 50% Intermediate | 40% Advanced
Abstract:
Efficient usage of today's hierarchical clustered machines promises scalable
high performance at low cost, but often demands the use of more than one
parallel programming model in the same application. As a consequence,
performance analysis and tuning become more difficult, creating a need for
advanced tools.
In recent years, progress has been made towards the design and implementation
of automatic performance analysis tools. The first research tools are already
available: some automatically analyze program traces (e.g., Kappa-Pi and KOJAK),
while others go further and implement a fully automatic on-line search (e.g., Paradyn).
These tools will be presented in this tutorial. In addition, the tutorial
will give an overview of standard and new performance analysis techniques,
a concise presentation of performance properties for MPI and OpenMP and
an overview of other automatic performance analysis tools not presented
in this tutorial.
The tutorial
will be a combination of presentation and online demonstrations. It will
cover information which application people as well as tool developers
will find most useful.
The tutorial is given by members
of the APART working group (Automatic Performance Analysis: Resources and
Tools), which is funded by the European Commission. APART is a collaborative
effort of more than twenty partners from the United States and Europe. Over
the next few years, APART will coordinate several development projects for
automatic performance analysis tools in Europe and the United States.
Related Tutorials: M8 - Performance Tuning Using Hardware Counter Data; M11 - Performance Technology for Complex Parallel Systems
S4
Sunday, Full Day
Room A108
Title: Java for High Performance Computing: Performance and Parallelisation
Presenters: Lorna Smith, Mark Bull, and David Henty, Edinburgh Parallel Computing Centre, The University of Edinburgh
Level: 20% Introductory | 60% Intermediate | 20% Advanced
Abstract:
Java offers a number of benefits
as a language for High Performance Computing (HPC), especially in the
context of the Computational Grid. For example, Java offers a high level
of platform independence not observed with traditional HPC languages.
This is crucial in an area where the lifetime of application codes exceeds
that of most machines. In addition, the object-oriented nature of Java
facilitates code re-use and reduces development time.
There are, however, a number of
issues surrounding the use of Java for HPC, principally performance, numerics,
and parallelism. EPCC is leading the Benchmarking initiative of the Java
Grande Forum, which is specifically concerned with performance. The tutorial
will focus on this work, examining performance issues relevant to HPC applications.
It will consider benchmarks for evaluating different Java environments,
for inter-language comparisons and for testing the performance and scalability
of different Java parallel models (native threads, message passing and OpenMP).
The aim is to demonstrate that
performance no longer prohibits Java as a base language for HPC and that
the available parallel models offer realistic mechanisms for the development
of parallel applications. The tutorial will include a number of practical
coding sessions which reinforce the concepts described in the lectures.
S5
Sunday, Full Day
Room A110
Title: High-Performance Numerical Linear Algebra: Fast and Robust Kernels for Scientific Computing
Presenters: Jack Dongarra, University of Tennessee; Iain Duff, Rutherford Appleton Laboratory; Danny Sorensen, Rice University; Henk van der Vorst, Utrecht University
Level: 20% Introductory | 50% Intermediate | 30% Advanced
Abstract:
Present computers, even workstations
and personal computers, allow the solution of very large-scale problems
in science and engineering. A major part of the computational effort goes
into solving linear algebra subproblems. We will discuss a variety of algorithms
for these problems, indicating where each is appropriate and emphasizing
their efficient implementation. Many of the sequential algorithms used satisfactorily
on traditional machines fail to exploit the architecture of modern computers.
We will consider techniques devised to utilize modern architectures more
fully, especially the design of the Level 1, 2, and 3 BLAS, LAPACK, and ScaLAPACK.
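Much of the performance benefit on cached and parallel machines comes from casting work as Level 3 BLAS operations such as matrix-matrix multiply. The minimal sketch below calls the C interface to DGEMM to compute C = A*B; the square matrix sizes and the cblas.h header name are assumptions, since CBLAS packaging varies by vendor.

```c
#include <cblas.h>   /* C interface to the BLAS; header/link details vary by vendor */

/* C <- 1.0*A*B + 0.0*C for n-by-n matrices stored in row-major order. */
void multiply(int n, const double *A, const double *B, double *C)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A, n,
                     B, n,
                0.0, C, n);
}
```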
For large sparse linear systems
we will give an introduction to this field and guidelines on the selection
of appropriate software. We will consider both direct methods and iterative
methods of solution. In the case of direct methods, we will emphasize frontal
and multifrontal methods including variants performing well on parallel
machines. For iterative methods, our discussion will include CG, MINRES,
SYMMLQ, BiCG, QMR, CGS, BiCGSTAB, GMRES, and LSQR. For large sparse eigenproblems
we will discuss some of the most widely used methods such as Lanczos, Arnoldi,
and Jacobi-Davidson. The Implicitly Restarted Arnoldi Method will be introduced
along with the software ARPACK, which is based upon that method.
S6
Sunday, Full Day
Room A201
Title: Sharable and Scalable I/O Solutions for High Performance Computing Applications
Presenters: Larry Schoof, Sandia National Laboratories; Mark Miller, Lawrence Livermore National Laboratory; Mike Folk and Albert Cheng, National Center for Supercomputing Applications
Level: 30% Introductory | 50% Intermediate | 20% Advanced
Abstract:
Two challenges facing HPC applications
are the need to improve I/O performance and the ability to share complex
scientific data and data analysis software. The computational times for
HPC applications have decreased in recent years by 2-3 orders of magnitude,
but unfortunately I/O performance has not kept pace with these impressive
increases in raw compute power. Scientific data and tools for working with
scientific data have also evolved, from application stovepipes
(little/no data interoperability) to a recognition of the value of large-scale
integration (full data interoperability) that facilitates sharing data and
tools among applications and across a varied and changing landscape of computing
environments. In this tutorial we discuss two complementary I/O libraries
that address these issues. The first, Hierarchical Data Format Version 5
(HDF5), represents and operates on scientific data as concrete arrays. The
second, the Sets and Fields (SAF) data modeling system, represents and operates
on scientific data as abstract fields. Building upon HDF5 as a foundation,
SAF encapsulates parallel and scientific constructs intrinsically to provide
greater sharability of data and interoperability of software.
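As a minimal illustration of the HDF5 style of I/O, the sketch below writes a small two-dimensional array of doubles to a file. It uses the classic serial HDF5 C API (function signatures, notably H5Dcreate, differ slightly across HDF5 releases), and the file name, dataset name, and array dimensions are placeholders.

```c
#include <hdf5.h>

int main(void)
{
    double data[4][6] = {{0}};               /* fill with application data */
    hsize_t dims[2] = {4, 6};

    /* Create a file, a dataspace describing the array shape, and a dataset. */
    hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dset  = H5Dcreate(file, "/temperature", H5T_NATIVE_DOUBLE, space, H5P_DEFAULT);

    /* Write the whole array in one call. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
```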
S15
Sunday, Full Day
Room C102 - C104
Title: High Performance Computing: What Role for the Individual Microprocessor, if any
Presenters: Yale N. Patt, The University of Texas at Austin
Level: 30% Introductory | 40% Intermediate | 30% Advanced
Abstract:
High performance computing applications
continue to demand more and more performance capability. Where does the individual
microprocessor fit? Process technology promises one billion transistors
on each silicon die, running at 10 GHz, within a few years. Can that technology
be harnessed, or are the nay-sayers right that Moore's Law is dead and the
problems of increasing single-chip performance are just too hard? This tutorial
will try to do several things. We will look at the arguments of the nay-sayers
and point out why they should not deter us. We will explore the bottlenecks
of a microarchitecture vis-a-vis high performance, and describe how we are
overcoming them. We will examine the relevant characteristics of some
state-of-the-art microprocessors. Finally, we will discuss what we might
see on a chip five years from now. More on the presenter can be found at
http://www.ece.utexas.edu/~patt.
S7
Sunday, Half Day, AM
Room A205
Title: Introduction to Parallel Programming with OpenMP
Presenters: Tim Mattson, Intel Corporation; Rudolf Eigenmann, Purdue University
Level: 75% Introductory | 20% Intermediate | 5% Advanced
Abstract:
OpenMP is an Application Programming
Interface for directive-driven parallel programming of shared memory computers.
Fortran, C and C++ compilers supporting OpenMP are available for Unix and
Windows workstations. Most vendors of shared memory computers are committed
to OpenMP, making it the de facto standard for writing portable shared-memory
parallel programs. This tutorial will provide a comprehensive introduction
to OpenMP. We will start with basic concepts to bring the novice up to speed.
We will then present a few more advanced examples to give some insight into
the issues that come up for experienced OpenMP programmers.
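To give a flavor of the directive style the tutorial starts from, the sketch below parallelizes a simple loop with an OpenMP work-sharing directive and a reduction clause. The loop body and problem size are placeholders, not examples from the tutorial.

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    /* The iterations are divided among the team of threads; the
       reduction clause gives each thread a private partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
```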
Related Tutorial: S11 - Advanced Parallel Programming with OpenMP
S8
Sunday, Half Day, AM
Room A207
Title: Understanding Network Performance
Presenters: Phillip Dykstra, WareOnEarth Communications Inc.
Level: 30% Introductory | 50% Intermediate | 20% Advanced
Abstract:
Supercomputers today are usually
connected to local and wide area networks capable of transferring data at
hundreds of megabits per second or more. Most remote users, however, see
only a fraction of that potential. Wide area transfer rates of less than ten
million bits per second are still commonplace. Why is this, and what can
users and administrators of networks and systems do to improve the situation?
This tutorial will introduce the
environment of high performance networking today. The behavior of TCP will
be explained in detail along with factors that limit performance. Included
are well-known tuning issues such as "window sizes" and lesser-known
factors such as packet size, loss rates, and delay. Recent and proposed
performance related protocol changes will be discussed.
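The window-size issue mentioned above comes down to the bandwidth-delay product: a TCP connection cannot run faster than its window divided by the round-trip time. The short sketch below computes the window needed to fill a hypothetical path; the 100 Mbit/s rate and 70 ms round-trip time are illustrative numbers, not measurements from the tutorial.

```c
#include <stdio.h>

int main(void)
{
    double bandwidth_bps = 100e6;   /* hypothetical path rate: 100 Mbit/s  */
    double rtt_s         = 0.070;   /* hypothetical round-trip time: 70 ms */

    /* Bandwidth-delay product: bytes that must be "in flight" to keep the
       pipe full, i.e. the minimum TCP window for full throughput.         */
    double window_bytes = bandwidth_bps * rtt_s / 8.0;

    printf("required window: %.0f bytes (about %.0f KB)\n",
           window_bytes, window_bytes / 1024.0);

    /* Conversely, a default 64 KB window caps throughput at
       window / RTT = 65536 * 8 / 0.070, i.e. about 7.5 Mbit/s.            */
    return 0;
}
```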
Numerous tools are introduced
that can be used to measure, debug, and tune end systems and networks. You
will learn how these tools work and what they tell you. The attendee should
come away with a better understanding of what is happening to their data
on the network and what is required to achieve higher performance.
Sample material: http://sd.wareonearth.com/~phil/sc2001
Related Tutorial: S12 - Achieving Network Performance
S9
Sunday, Half Day, AM
Room A209
Title: The Emerging Grid: Introduction, Tools, Applications
Presenters: Ian Foster, Argonne National Laboratory; Ed Seidel, The Max Planck Institute for Gravitational Physics, Albert Einstein Institute
Level: 85% Introductory | 15% Intermediate | 0% Advanced
Abstract:
The paradigm of grid computing
is currently being deployed to solve some of our most challenging computing
problems. This tutorial is for people interested in getting acquainted with
grid computing technologies and approaches and in exploring how to apply
grid technologies to their own large-scale computing problems. It will also
be of interest to those with previous exposure to the concept who seek an
update on how grid techniques have matured and solidified in the past year.
The tutorial will explore, largely
through first-hand examples, how to apply grid techniques to complex
problems in scientific and engineering computation. The tutorial provides
a pragmatic overview of the grid concept, based on the latest models of
grid architecture. It surveys several technologies that can be used to construct
grids, focusing on the Globus Toolkit, Condor, GridPort, and Cactus. It
illustrates the accomplishments, plans, and challenges faced by large Grid
projects including the Grid Physics Network, the Particle Physics Data Grid,
the NASA Information Power Grid, the Network for Earthquake Engineering
Simulation Grid, and the Earth Systems Grid. It also includes a brief review
of current research efforts to extend the scope, utility, and ease of grid
computing.
Related Tutorial: S13 - Data Grids: Drivers, Technologies, Opportunities
S10
Sunday, Half Day, AM
Room A112
Title: Mixed-Mode Programming Introduction
Presenters: Daniel Duffy and Mark R. Fahey, Computer Sciences Corporation - Engineer Research and Development Center Major Shared Resource Center
Level: 40% Introductory | 40% Intermediate | 20% Advanced
Abstract:
This tutorial will discuss the
benefits and pitfalls of multilevel parallelism (MP) using the Message Passing
Interface (MPI) combined with threads. Examples from the authors' experiences
will be discussed to motivate why multilevel parallelism is beneficial.
While a general knowledge of MPI is assumed, the presentation will introduce
both OpenMP directives and Pthreads. Furthermore, starting from the context
of multithreading an existing MPI application, a general method for adding
threads will be discussed.
Included will be discussions of
the pros and cons of various tools that can be used across platforms to
help the application developer optimize and debug an MP program. Sample
codes comparing different MP methods will also be shown and their resulting
speedups presented. Finally, lessons learned from the authors' unsuccessful
experiences will be presented.
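A minimal mixed-mode skeleton combining MPI across nodes with OpenMP threads within each process is sketched below. It shows only the structure (a thread-support query plus a threaded loop inside each MPI process); the loop body and problem size are placeholders, not code from the tutorial.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Ask MPI for thread support; FUNNELED means only the master
       thread of each process makes MPI calls.                     */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0, global = 0.0;

    /* Thread-level parallelism inside each MPI process. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0;                /* placeholder work */

    /* Process-level combination across the machine. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", global);

    MPI_Finalize();
    return 0;
}
```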
S11
Sunday, Half Day, PM
Room A205
Title: Advanced Parallel Programming with OpenMP
Presenters: Tim Mattson, Intel Corporation; Rudolf Eigenmann, Purdue University
Level: 10% Introductory | 50% Intermediate | 40% Advanced
Abstract:
OpenMP is rapidly becoming the
programming model of choice for shared-memory machines. After a very brief
overview of OpenMP basics we will move on to intermediate and advanced topics,
such as advanced OpenMP language features, traps that programmers may fall
into, and a more extensive outlook on future OpenMP developments. We will
also briefly discuss mixing OpenMP with message passing applications written
in MPI. We will present many examples of OpenMP programs and discuss their
performance behavior.
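One classic trap of the kind discussed in such material is a temporary variable that is unintentionally shared between threads; the sketch below shows the race and the one-word fix. The specific loop is an illustration, not an example from the tutorial itself.

```c
void scale(int n, double *a, const double *b)
{
    double t;   /* declared outside the parallel region */

    /* BUG: t is shared by default, so threads overwrite each other's
       value between the two statements -- a data race.              */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        t = 2.0 * b[i];
        a[i] = t * t;
    }

    /* FIX: make the temporary private to each thread
       (or simply declare it inside the loop body).   */
    #pragma omp parallel for private(t)
    for (int i = 0; i < n; i++) {
        t = 2.0 * b[i];
        a[i] = t * t;
    }
}
```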
Related Tutorial: S7 - Introduction to Parallel Programming with OpenMP
S12
Sunday, Half Day, PM
Room A207
Title: Achieving Network Performance
Presenters: John Estabrook and Jim Ferguson, National Laboratory for Applied Network Research and the National Center for Supercomputing Applications
Level: 20% Introductory | 60% Intermediate | 20% Advanced
Abstract:
High-bandwidth Wide Area Networks
(WANs) deployed in recent years by various Federal agencies and others have
brought sky-high expectations and an equal amount of disappointment to many
who have used them. The problems with poor end-to-end performance on what
should be a fast network connection mostly lie closer to the ends of the
network than the well-engineered backbones. Application specialists and
engineers with the National Laboratory for Applied Network Research (NLANR,
www.nlanr.net) have developed tools and collected knowledge that can assist
both applications developers and their local network support staff. This
tutorial will specifically address typical problems encountered when applications
that run successfully in a Local Area Network are ported to run on a Wide
Area Network. This tutorial will complement the tutorial "Understanding
Network Performance," which will focus on the underlying issues of TCP.
This tutorial will focus on application level issues, resources for network
monitoring, and the "state of the backbone".
Related Tutorial: S8 - Understanding Network Performance
S13
Sunday, Half Day, PM
Room A209
Title: Data Grids: Drivers, Technologies, Opportunities
Presenters: Ann Chervenak, USC Information Sciences Institute; Michael Wilde, Argonne National Laboratory
Abstract:
In numerous scientific, engineering,
and business disciplines, terabyte- and petabyte-scale data collections
are emerging as critical resources. These data sets must be shared by large
communities of users that pool their resources from a large number of institutions.
This 2-part tutorial shows how to design and implement new information infrastructures
called "Data Grids" to access and analyze the enormous distributed
datasets employed by these communities. Part 1 surveys the current body
of data grid concepts and techniques. It details the goals, requirements,
and architectures of both deployed and proposed data grids. Examples will
be drawn from case studies and detailed requirements analyses from physics,
climate science, and engineering communities. Part 2 presents data grid
implementation tools and techniques. We start by examining how to use Grid-enabled
data transport and file replication components in application environments.
We then focus on Grid-enabling applications directly with Data Grid toolkit
components, and conclude with illustrations of integrating components of
the Globus Toolkit for security, policy management, and resource monitoring
with data management capabilities.
Related Tutorial: S9 - The Emerging Grid: Introduction, Tools, Applications
S14
Sunday, Half Day, PM
Room A112
Title: An Introduction to the TotalView Debugger
Presenters: Blaise M. Barney, Lawrence Livermore National Laboratory
Level: 60% Introductory | 20% Intermediate | 20% Advanced
Abstract:
The TotalView debugger has become
a "de facto standard" tool within the High Performance Computing
industry for debugging cross-platform, cross-language, multi-model parallel
applications. TotalView's easy-to-use graphical user interface provides
the means to see what an application is "really" doing at the
deepest level. TotalView has been selected by the U.S. Department of Energy
as the debugger software of choice for its Accelerated Strategic Computing
Initiative (ASCI) program. TotalView has likewise been selected by a growing
number of telco, petroleum, aerospace, university and HPC organizations
as their debugger of choice.
This tutorial will begin by covering
all of the essentials for using TotalView in a general programming environment.
After covering these essentials, an emphasis will be placed upon debugging
parallel programs, including threaded, MPI, OpenMP and hybrid programs.
Though this tutorial would be best accompanied by hands-on exercises, the
attendee will benefit from the many graphical examples and "screen
captures" of TotalView debug sessions. This tutorial will conclude
with examples and suggestions for the sometimes challenging task of debugging
programs while they are executing in a batch system.
M1
Monday, Full Day
Room A102
Title: Advanced Topics in HPC Linux Cluster Design and Administration
Presenters: Troy Baer and Doug Johnson, Ohio Supercomputer Center
Level: 10% Introductory | 60% Intermediate | 30% Advanced
Abstract:
This tutorial describes a methodology
for designing, installing, and administering a cluster of commodity computers
as a resource for high performance computing in a production environment.
This methodology is a result of past experience in cluster computing at
OSC and elsewhere, and is used on both OSC's production cluster systems
and on the distributed set of clusters deployed by OSC's Cluster Ohio project.
The tutorial discusses system
design and installation, software configuration, high performance networks,
resource management, scheduling, and performance monitoring. Wherever possible,
currently available technologies and best current practice are described
and related back to a common configuration, that of a cluster of dual-processor
IA32 nodes interconnected by Myrinet. Other architectures and interconnect
technologies are also discussed.
Related Tutorial: S1 - Introduction to Clusters: Build Yourself a PC Cluster NOW!
M3
Monday, Full Day
Room A201
Title: Securing Your Network
Presenters: Paula C. Albrecht, Crystal Clear Computing, Inc.
Level: 40% Introductory | 50% Intermediate | 10% Advanced
Abstract:
Pick up a newspaper, turn on the news, or read a magazine:
security is a hot topic. Each year, the number of system and network
attacks from the Internet continues to rise. Organizations need to be aware
of these attacks and of how to protect their systems and networks. This tutorial
provides an overview of common attacks and the technologies available
to protect your network against them. We will first take a look at who
is attacking networks and the types of attacks they are using. Then we
will examine techniques available for securing your systems and networks.
These include cryptography, public key infrastructure (PKI), firewalls,
virtual private networks (VPNs), and intrusion detection. We will introduce
security terminology and network security concepts. Some detailed information
on how the security technologies work and example uses will also be provided.
If you are concerned about the security of your systems and networks,
or have ever wondered what network security is all about, this tutorial
is for you.
M4
Monday, Full Day
Room 205
Title: Using MPI-2: Advanced Features of the Message-Passing Interface
Presenters: William Gropp, Ewing (Rusty) Lusk, and Rob Ross, Argonne National Laboratory; Rajeev Thakur, PRISMedia Networks, Inc.
Level: 20% Introductory | 40% Intermediate | 40% Advanced
Abstract:
This tutorial is about how to
use MPI-2, the collection of advanced features that were added to MPI (Message-Passing
Interface) by the second MPI Forum. These features include parallel I/O,
one-sided communication, dynamic process management, language interoperability,
and some miscellaneous features. Implementations of MPI-2 are beginning
to appear: a few vendors have complete implementations; other vendors and
research groups have implemented subsets of MPI-2, with plans for complete
implementations.
This tutorial explains how to
use MPI-2 in practice, particularly, how to use MPI-2 in a way that results
in high performance. We present each feature of MPI-2 in the form of a series
of examples (in C, Fortran, and C++), starting with simple programs and
moving on to more complex ones. We also discuss how to combine MPI with
OpenMP. We assume that attendees are familiar with the basic message-passing
concepts of MPI-1.
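As a taste of the one-sided model covered here, the sketch below exposes a window of memory on every process and lets rank 0 write directly into rank 1's window with MPI_Put. The buffer size and fence-based synchronization are illustrative choices, not examples taken from the tutorial, and the program assumes at least two processes.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf[10] = {0}, data[10];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every process exposes buf as a window that others may access. */
    MPI_Win_create(buf, 10 * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                  /* open an access epoch */
    if (rank == 0) {
        for (int i = 0; i < 10; i++) data[i] = i;
        /* Write 10 ints directly into rank 1's window; no receive needed. */
        MPI_Put(data, 10, MPI_INT, 1, 0, 10, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                  /* complete the epoch */

    if (rank == 1)
        printf("rank 1 received %d ... %d\n", buf[0], buf[9]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```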
M5
Monday, Full Day
Room A207
Title: Extreme! Scientific Parallel Computing
Presenters: Alice E. Koniges, David C. Eder, and David E. Keyes, Lawrence Livermore National Laboratory; Rolf Rabenseifner, High Performance Computing Center Stuttgart
Level: 25% Introductory | 40% Intermediate | 35% Advanced
Abstract:
Teraflop performance is no longer
a thing of the future. Indeed, advances in application computing continue
to boggle the mind. What does it really take to get a major application
performing at the "extreme" level? How do the challenges vary
from cluster computing to the largest architectures? In the introductory
material, we provide an overview of terminology, hardware, performance issues
and software tools. Then, we draw from a series of large-scale application
codes and discuss specific challenges and problems encountered in parallelizing
these applications. The applications, some of which are drawn from a new
book ("Industrial Strength Parallel Computing," Morgan Kaufmann
Publishers, 2000), are a mix of industrial and government applications including
aerospace, biomedical sciences, materials processing and design, and plasma
and fluid dynamics. We also consider applications that were winners of Gordon
Bell prizes for parallel performance. Advanced topics cover parallel I/O
and file systems and combining MPI with Pthreads and/or OpenMP.
M6
Monday, Full Day
Room A209
Title: Programming with the Distributed Shared-Memory Model
Presenters: William Carlson, IDA Center for Computing Sciences; Tarek El-Ghazawi, The George Washington University; Bob Numrich, Cray Inc.; Kathy Yelick, University of California at Berkeley
Level: 30% Beginner | 50% Intermediate | 20% Advanced
Abstract:
The distributed shared-memory
programming paradigm has been receiving rising attention. Recent developments
have resulted in viable distributed shared-memory languages that are gaining
vendor support, and several early compilers have been developed.
This programming model has the potential to achieve a balance between
ease of programming and performance. As in the shared-memory model, programmers
need not explicitly specify data accesses. Meanwhile, programmers can
exploit data locality using a model that enables the placement of data close
to the threads that process it, reducing remote memory accesses.
In this tutorial, we present the
fundamental concepts associated with this programming model. These include
execution models, synchronization, workload distribution, and memory consistency.
We then introduce the syntax and semantics of three parallel programming
languages of growing interest: Unified Parallel C (UPC), a parallel extension
to ANSI C developed by a consortium of academia, industry, and government;
Co-Array Fortran, developed at Cray; and Titanium, a Java dialect developed
at UC Berkeley. It will be shown
through experimental case studies that optimized distributed shared-memory
programs can be competitive with message-passing codes, without significant
departure from the ease of programming of the shared-memory model.
M7
Monday, Full Day
Room A107
Title: Data Mining for Scientific and Engineering Applications
Presenters: Robert Grossman, University of Illinois at Chicago & Magnify, Inc.; Chandrika Kamath, Lawrence Livermore National Laboratory; Vipin Kumar, Army High Performance Computing Research Center, University of Minnesota
Level: 50% Introductory | 30% Intermediate | 20% Advanced
Abstract:
Due to advances in information
technology and high performance computing, very large data sets are becoming
available in many scientific disciplines. The rate of production of such
data far outstrips our ability to analyze it manually. For example, a
computational simulation can generate terabytes of data within a few hours,
whereas human analysts may take several weeks to analyze these data sets.
Other examples include several digital sky surveys, and data sets from the
fields of medical imaging, bioinformatics, and remote sensing. As a result,
there is an increasing interest in various scientific communities to explore
the use of emerging data mining techniques for the analysis of these large
data sets.
Data mining is the semi-automatic
discovery of patterns, associations, changes, anomalies, and statistically
significant structures and events in data. Traditional data analysis is
assumption driven as a hypothesis is formed and validated against the data.
Data mining, in contrast, is discovery driven as the patterns are automatically
extracted from data. The goal of the tutorial is to provide researchers
and practitioners in the area of Supercomputing with an introduction to
data mining and its application to several scientific and engineering domains,
including astrophysics, medical imaging, computational fluid dynamics, structural
mechanics, and ecology.
M8
Monday, Half Day, AM
Room A108
Title: Performance Tuning Using Hardware Counter Data
Presenters: Shirley Moore, University of Tennessee; Nils Smeds, Parallelldatorcentrum
Level: 30% Introductory | 40% Intermediate | 30% Advanced
Abstract:
This tutorial concerns the Performance
Application Programming Interface (PAPI) to hardware counters and its use by HPC application
developers as well as HPC tool developers. PAPI is a specification and reference
implementation of a cross-platform interface to hardware performance counters.
Using PAPI, a developer need not re-adapt his/her measurement techniques
for each new hardware platform the application is to run on. Furthermore,
PAPI implements abstractions needed for third-party performance evaluation
tools. With a mature platform-independent interface to the hardware counters,
tool developers can dedicate their efforts to enhancing the functionality
of their tools without the need for re-adapting the tool for new hardware
platforms. The tutorial will cover the use of PAPI directly by an application
developer as well as discuss the mechanisms by which PAPI provides a platform
independent abstraction for tool developers. See http://icl.cs.utk.edu/projects/papi/
for more information about PAPI.
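A minimal sketch of the low-level PAPI calls an application developer might use to count floating-point operations and total cycles around a code region is shown below. Error checking is omitted, and the two preset events are assumptions about what a given platform supports.

```c
#include <papi.h>
#include <stdio.h>

int main(void)
{
    int events[2]  = { PAPI_FP_OPS, PAPI_TOT_CYC };  /* preset events, if supported */
    int eventset   = PAPI_NULL;
    long_long counts[2];

    /* Initialize the library and build an event set. */
    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&eventset);
    PAPI_add_events(eventset, events, 2);

    PAPI_start(eventset);
    /* ... region of interest: the computation being measured ... */
    PAPI_stop(eventset, counts);

    printf("floating-point ops: %lld, cycles: %lld\n", counts[0], counts[1]);
    return 0;
}
```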
Related Tutorials: S3 - Practical Automatic Performance Analysis; M11 - Performance Technology for Complex Parallel Systems
M9
Monday, Half Day, AM
Room A110
Title: Benchmarks, Results, and Tricks the Vendors Don't Tell You
Presenters: Robb Graham and Henry Newman, Instrumental Inc.
Level: 10% Beginner | 50% Intermediate | 40% Advanced
Abstract:
Benchmarks are often performed
to determine which vendor's machine is best suited to the customer's
needs. These benchmarks must be constructed in a fashion that ensures
the results are an accurate representation of the customer's needs. Vendors,
in their efforts to enhance the benchmarks, may over-optimize the code and
system. This over-optimization will inflate the customer's timeline
performance expectations. Solid benchmarking techniques can help mitigate
this problem. This tutorial will cover how to create rules and benchmarks
for accurate performance predictions, and how to use these benchmarks for
timeline performance modeling and prediction.
M10
Monday, Half Day, AM
Room A112
Title: Parallel Partitioning Software for Static, Adaptive, and Multi-phase Computations
Presenters: George Karypis, University of Minnesota; Karen Devine, Sandia National Laboratories
Level: 25% Introductory | 50% Intermediate | 25% Advanced
Abstract:
In recent years, a number of scalable
and high quality partitioning algorithms have been developed that are used
extensively for decomposing scientific computations on parallel computers.
The goal of this tutorial is to provide an overview of stand-alone graph
partitioning packages (ParMetis & Jostle), and of higher-level tools
for load balancing adaptive computations (Zoltan & Drama). The tutorial
will cover both static and dynamic computations as well as recently developed
algorithms and software packages suited for emerging multi-physics and multi-phase
computations.
M11
Monday, Half Day, PM
Room A108
Title: Performance Technology for Complex Parallel Systems
Presenters: Allen D. Malony and Sameer Shende, University of Oregon; Bernd Mohr, Research Centre Juelich
Level: 10% Introductory | 50% Intermediate | 40% Advanced
Abstract:
Fundamental to the development
and use of parallel systems is the ability to observe, analyze, and understand
their performance. However, the growing complexity of parallel systems challenges
performance technologists to produce tools and methods that are at once
robust (scalable, extensible, configurable) and ubiquitous (cross-platform,
cross-language). This half-day tutorial will focus on performance analysis
in complex parallel systems which include multi-threading, clusters of SMPs,
mixed-language programming, and hybrid parallelism. Several representative
complexity scenarios will be presented to highlight two fundamental performance
analysis concerns: 1) the need for tight integration of performance observation
(instrumentation and measurement) technology with sophisticated programming
environments and system platforms, and 2) the ability to map execution performance
data to high-level programming abstractions
implemented on layered, hierarchical
software systems. The tutorial will describe the TAU performance system
in detail and demonstrate how it is used to successfully address the performance
analysis concerns in each complexity scenario discussed. Tutorial attendees
will be introduced to TAU's instrumentation, measurement, and analysis
tools, and shown how to configure the TAU performance system for specific
needs. A description of future enhancements of the TAU performance framework,
including a demonstration of a prototype for automatic bottleneck analysis,
will conclude the tutorial.
Related Tutorials: S3 - Practical Automatic Performance Analysis; M8 - Performance Tuning Using Hardware Counter Data
M12
Monday, Half Day, PM
Room A110
Title: InfiniBand Architecture and What Does It Bring to High Performance Computing?
Presenter: Dhabaleswar K. Panda, The Ohio State University
Level: 20% Introductory | 40% Intermediate | 40% Advanced
Abstract:
The emerging InfiniBand Architecture
(IBA) standard is generating a lot of excitement about building next-generation
computing systems in a radically different manner. This is leading to the
following common questions among many scientists, engineers, managers, developers,
and users associated with High-Performance Computing (HPC): 1) What is IBA?
2) How is it different from other on-going developments and standardization
efforts such as Virtual Interface Architecture (VIA), PCI-X, Rapid I/O, etc.?
and 3) What unique features and benefits does IBA bring to HPC?
This tutorial is designed to provide
answers to the above questions. We will start with the background behind
the origin of the IBA standard. Then we will make the attendees familiar
with the novel features of IBA (such as elimination of the standard PCI-bus
based architecture, provision for multiple transport services, mechanisms
to support QoS and protection in the network, uniform treatment of interprocessor
communication and I/O, and support for low latency communication with Virtual
Interface). We will compare and contrast the IBA standard with other on-going
developments/standards. We will show how the IBA standard enables
next-generation computing systems to be designed to deliver not only high
performance but also RAS (Reliability, Availability, and Serviceability).
Open research challenges in designing IBA-based HPC systems will be outlined.
The tutorial will conclude with an overview of on-going IBA related research
projects and products.
More information on this tutorial and the speaker
can be obtained from:
http://www.cis.ohio-state.edu/~panda/sc01_tut.html
M13
Monday, Half Day, PM
Room A112
Title: Cache-Based Iterative Algorithms
Presenters: Ulrich J. Ruede, University of Erlangen; Craig Douglas, Center for Computational Sciences, University of Kentucky
Level: 30% Introductory | 50% Intermediate | 20% Advanced
Abstract:
In order to mitigate the effect
of the gap between the high execution speed of modern RISC CPUs and the
comparatively poor main memory performance, computer architectures nowadays
comprise several additional levels of smaller and faster cache memories
which are located physically between the processor and main memory. Efficient
program execution, i.e. high MFLOPS rates, can only be achieved if the codes
respect this hierarchical memory design. Unfortunately, today's compilers
are still far away from automatically performing code transformations like
the ones we apply to achieve remarkable speedups. As a consequence, much
of this optimization effort is left to the programmer.
In this tutorial, we will first
discuss the underlying hardware properties and then present both data layout
optimizations, such as array padding, and data access optimizations, such
as loop blocking. The application of these techniques to iterative
numerical schemes such as Gauss-Seidel and multigrid can significantly
enhance their cache performance and thus reduce their execution times on
a variety of machines. We will consider both structured and unstructured
grid computations.
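As a small illustration of the loop-blocking idea, the sketch below tiles a matrix transpose so that the portions of the source and destination touched at any moment fit in cache. The block size B is a tunable assumption that depends on the cache level being targeted; the example is not taken from the tutorial or from DiMEPACK.

```c
#define B 64   /* block (tile) size; tune to the cache level being targeted */

/* Transpose an n-by-n matrix, visiting it in BxB tiles so that the rows of
   'in' and the columns of 'out' in use at any moment stay resident in cache. */
void transpose_blocked(int n, const double *in, double *out)
{
    for (int ii = 0; ii < n; ii += B)
        for (int jj = 0; jj < n; jj += B)
            for (int i = ii; i < ii + B && i < n; i++)
                for (int j = jj; j < jj + B && j < n; j++)
                    out[j * n + i] = in[i * n + j];
}
```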
These techniques have been implemented
in our multigrid library DiMEPACK which is freely available on the web.
The use of this library will also be discussed.