NPACI Archive Page
The NPACI program ended on September 30, 2004. This site is presented for archival purposes only.
For current resources at each of the partner sites, please refer to the appropriate institution site.
Project Leader: Joel Saltz, Ohio State University
Project URL: http://www.datacutter.org
Analysis of Coupled Subsurface Flow and Seismic
Models
Collaboration with Mary Wheeler
Implementing optimal production schedules in
oil reservoirs requires coupling of simulations, sophisticated
optimization procedures, and field measurements. These studies
are data-intensive because searching for optimal schedules
requires generating and analyzing very large datasets characterizing
subsurface flow and rock properties in an oil reservoir. The
objective of this project is to develop the data management,
transformation, and analysis support for large scale subsurface
flow and reservoir management studies.
In collaboration with Wheeler’s group, we have generated
a 5TB dataset from IPARS oil reservoir simulation models based
on different geostatistical input parameters. This ensemble
of datasets was generated and stored at SDSC, the University
of Maryland, and Ohio State University. We have implemented
a system using the DataCutter and SRB components of NPACKage
to support remote querying and analysis of these datasets.
In this system, SRB provides remote access to dataset files
stored on storage systems (SAN, GPFS) at SDSC. A number of
data analysis scenarios were implemented using DataCutter for distributed
execution of data filtering and processing operations. These
analysis scenarios involve user-defined queries for economic
evaluation as well as technical evaluation, such as determination
of representative realizations and identification of areas
of bypassed oil. A demonstration of the system was given at
SC2002. The demo showed that NPACKage components can help
geoscientists analyze very large datasets that are too big
to download to local machines.
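The filtering-and-processing style of query described above can be illustrated with a minimal sketch. It is not the DataCutter API: it chains plain Python generators to mimic a pipeline of filters that reads cells, keeps candidate bypassed-oil cells, and aggregates per realization. The field names (`oil_saturation`, `oil_velocity`, `realization`), the thresholds, and the rule that a cell counts as bypassed oil when saturation is high and velocity is low are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Cell = Dict[str, float]  # hypothetical record layout for one grid cell


def read_filter(chunks: Iterable[List[Cell]]) -> Iterator[Cell]:
    """Source filter: stream cells out of dataset chunks one at a time."""
    for chunk in chunks:
        for cell in chunk:
            yield cell


def bypassed_oil_filter(cells: Iterable[Cell],
                        min_saturation: float,
                        max_velocity: float) -> Iterator[Cell]:
    """Keep cells with high oil saturation and low oil velocity (assumed rule)."""
    for cell in cells:
        if (cell["oil_saturation"] > min_saturation
                and cell["oil_velocity"] < max_velocity):
            yield cell


def count_by_realization(cells: Iterable[Cell]) -> Dict[int, int]:
    """Sink filter: count the surviving cells for each realization."""
    counts: Dict[int, int] = {}
    for cell in cells:
        key = int(cell["realization"])
        counts[key] = counts.get(key, 0) + 1
    return counts


def run_query(chunks: Iterable[List[Cell]],
              min_saturation: float = 0.7,
              max_velocity: float = 0.1) -> Dict[int, int]:
    """Compose the three filters into one pipeline."""
    selected = bypassed_oil_filter(read_filter(chunks),
                                   min_saturation, max_velocity)
    return count_by_realization(selected)
```

Because each stage consumes and produces a stream, only the small aggregated result has to reach the client, which is the property that matters when the datasets are too large to download.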
We are in the process of generating larger datasets
(20-40TB in size) from seismic and oil reservoir simulations
that will be stored on storage systems at multiple NPACI sites.
We plan to incorporate extensions to the current system for
coupling of simulation models and optimization procedures.
The coupling will involve querying, subsetting, and analysis
of large ensembles of distributed datasets and integration
of data from seismic and oil reservoir simulations. We will
harden and generalize the database and data analysis runtime
support using NPACKage, so that it can be used not only in
this project but also in the joint project with Scott Baden
and Phil Colella involving management and manipulation of
large datasets in adaptive mesh refinement (AMR) applications
and the joint project with Kathy Yelick involving analysis
and comparison of datasets in adaptive computation for biological
systems. The generalized support will use the DataCutter,
NWS, Globus, and SRB components of NPACKage. Globus will be
used for resource allocation and authentication; SRB will
be used for remote file access across multiple sites; NWS
will be used for resource monitoring for effective placement
of data processing operations on NPACI storage and compute
clusters; and DataCutter will be used to implement distributed
data querying, filtering, and user-defined processing of data.
Support for Data Subsetting and Data Analysis
in Telescience
Collaboration with Mark Ellisman
This project targets the development of the
middleware infrastructure for subsetting and distributed processing
of image datasets stored in the Telescience system. The middleware
infrastructure is being implemented using the DataCutter,
SRB, and Globus components of NPACKage. We have already developed
a prototype system for querying and processing of large microscopy
images stored in Telescience SRB repositories. The system
involves the use of Globus for authentication, DataCutter
for data subsetting and distributed execution of data processing
operations, SRB for remote file access, and the Telescience
portal as the user interface for query formulation and viewing
of results. A demonstration of the system was given at SC2002.
The demonstration involved the use of the Telescience portal
to formulate a client query, the DataCutter/SRB system to
retrieve the subsets of the microscopy images from SRB repositories
at SDSC, and Globus/DataCutter to process the data on a cluster
of PCs at OSU.
We have set up a small cluster at OSU that will
be used as a node in the BIRN/Telescience system and installed
the SRB and DataCutter components on this cluster. This node
will be used to serve additional datasets of digitized pathology
microscopy slides. The current middleware system will be extended
for 1) enhanced database and data subsetting support for querying
of datasets at multiple sites and 2) improved image processing
by integrating the Insight Segmentation and Registration Toolkit.
Using this system, a client can query the microscopy datasets
stored in SRB through the Telescience portal, extract the
subset of the microscopy images requested by the query, and
process it on cluster systems at SDSC and OSU.
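One way to picture the subsetting step is a sketch that maps a rectangular query onto the tiles of a large image, so that only the overlapping tiles need to be fetched. It assumes the image is stored as fixed-size square tiles and that the query is a half-open pixel rectangle; the function name and parameters are hypothetical and are not part of DataCutter, SRB, or the Telescience portal.

```python
from typing import List, Tuple


def tiles_for_query(x0: int, y0: int, x1: int, y1: int,
                    tile_size: int,
                    image_width: int, image_height: int) -> List[Tuple[int, int]]:
    """Return (column, row) indices of tiles overlapping [x0, x1) x [y0, y1)."""
    # Clip the query rectangle to the image bounds.
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(image_width, x1), min(image_height, y1)
    if x0 >= x1 or y0 >= y1:
        return []  # the query lies entirely outside the image

    # The last pixel covered is x1 - 1 (and y1 - 1), hence the "- 1".
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size

    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

For example, with 256-pixel tiles a query covering x in [100, 300) and y in [100, 200) touches only two tiles, so two tile reads replace a transfer of the whole image.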
Terascale Visualization Project
Collaboration with Art Olson
Visualization of output from large-scale simulations
is an important part of data analysis in many science
and engineering fields, and efficient extraction of data of
interest from distributed datasets is a key step in this process.
The goal of this alpha project is to develop the database
support to enable rapid access to subsets of remote, large,
and distributed datasets generated by simulations for visualization
purposes. The database support is being layered on top of
NPACKage. This project is using the DataCutter component of
the NPACKage suite to support data declustering on parallel
storage clusters and efficient data subsetting and data extraction
operations. A prototype system has been developed that allows
a user to browse a 3D volume stored on a remote cluster. The
prototype consists of a client graphical user interface that
can perform texture-based volume rendering of a 3D volume
and a parallel backend server, implemented using DataCutter,
that performs user-specified subsetting and subsampling of
large disk-resident 3D grids and transfer of data to the client.
The prototype system was demonstrated at SC2002 by accessing
a collection of datasets on a cluster at Ohio State University.
A system extending the prototype implementation will be deployed
to visualize large datasets from oil reservoir simulations
(currently 5 Terabytes of data) stored at Ohio State University,
University of Maryland, and SDSC.
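The subsetting and subsampling performed by the backend server can be sketched as follows. This is a single-process illustration, not the parallel DataCutter implementation: it assumes the 3D grid is held as a flat, row-major list with x varying fastest, and the function name, argument layout, and uniform stride are hypothetical.

```python
from typing import List, Sequence, Tuple


def subset_and_subsample(grid: Sequence[float],
                         dims: Tuple[int, int, int],
                         lo: Tuple[int, int, int],
                         hi: Tuple[int, int, int],
                         stride: int) -> Tuple[List[float], Tuple[int, int, int]]:
    """Extract the box [lo, hi) from a flat 3D grid, keeping every stride-th point.

    dims is (nx, ny, nz); lo and hi are (x, y, z) corners, hi exclusive.
    Returns the extracted values and the dimensions of the reduced grid.
    """
    nx, ny, _nz = dims
    values: List[float] = []
    for z in range(lo[2], hi[2], stride):
        for y in range(lo[1], hi[1], stride):
            for x in range(lo[0], hi[0], stride):
                # Row-major flat index with x varying fastest.
                values.append(grid[(z * ny + y) * nx + x])
    out_dims = (len(range(lo[0], hi[0], stride)),
                len(range(lo[1], hi[1], stride)),
                len(range(lo[2], hi[2], stride)))
    return values, out_dims
```

A stride of 2 in each dimension reduces the data sent to the client by roughly a factor of eight, which is why a coarse overview can be browsed interactively before a full-resolution region is requested.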
Data Analysis Software for Distributed-Memory
Adaptive Computations
Collaboration with Kathy Yelick
This project is developing the software support
for the developers and users of adaptive computational simulation
codes in biological systems research to carry out comparative
analysis operations on large datasets. The software support
will allow application developers to identify spatial regions
characterized by divergence in data values and visualize differences
in those data values. We have implemented a prototype system
using the DataCutter component of NPACKage to query and transform
datasets generated by the Titanium codes for visualization.
We will develop an integrated suite of tools that provides database
support for querying and data subsetting, and for executing data comparison
and interpolation operations on very large datasets for studies
of heart models. This suite will allow efficient debugging
and validation of large scale adaptive mesh simulation codes
that are used in many scientific, biomedical, and engineering
applications. The software suite will use the DataCutter,
NWS, Globus, and SRB components of NPACKage.
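The comparison step, identifying spatial regions where two datasets diverge, can be illustrated with a small sketch. It assumes both datasets have already been interpolated onto the same uniform 2D grid, which sidesteps the adaptive-mesh handling the project targets; the block size, tolerance, and function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def divergent_blocks(a: Sequence[Sequence[float]],
                     b: Sequence[Sequence[float]],
                     block: int,
                     tol: float) -> List[Tuple[int, int, float]]:
    """Report blocks whose maximum absolute difference exceeds tol.

    a and b are equally sized 2D grids. Each result is
    (first_row, first_col, max_abs_difference) for one block.
    """
    rows, cols = len(a), len(a[0])
    flagged: List[Tuple[int, int, float]] = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Largest pointwise difference inside this block (clipped at edges).
            max_diff = max(abs(a[r][c] - b[r][c])
                           for r in range(r0, min(r0 + block, rows))
                           for c in range(c0, min(c0 + block, cols)))
            if max_diff > tol:
                flagged.append((r0, c0, max_diff))
    return flagged
```

Only the flagged blocks then need to be extracted and visualized, so a developer validating a code change can go straight to the regions where two runs disagree.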