National Partnership for Advanced Computational Infrastructure: Archives
These pages are a copy of the original www.npaci.edu website, and should be used for historical reference only.
NPACI Archive Page

The NPACI program ended on September 30, 2004. This site is presented for archival purposes only. For current resources at each of the partner sites, please refer to the appropriate institution site.

* Texas Advanced Computing Center (TACC)
* San Diego Supercomputer Center (SDSC)
* University of Michigan, Ann Arbor

Case Study - Data Intensive Grid Computing and Exploration in the Geosciences

Project Leader: Joel Saltz, Ohio State University
Project URL: http://www.datacutter.org

Analysis of Coupled Subsurface Flow and Seismic Models
Collaboration with Mary Wheeler

Implementing optimal production schedules in oil reservoirs requires coupling of simulations, sophisticated optimization procedures, and field measurements.
These studies are data-driven: searching for optimal schedules requires generating and analyzing very large datasets that characterize subsurface flow and rock properties in an oil reservoir. The objective of this project is to develop the data management, transformation, and analysis support for large-scale subsurface flow and reservoir management studies.

In collaboration with Wheeler's group, we have generated a 5TB dataset from IPARS oil reservoir simulation models based on different geostatistical input parameters. This ensemble of datasets was generated and stored at SDSC, University of Maryland, and Ohio State University. We have implemented a system using the DataCutter and SRB components of NPACKage to support remote querying and analysis of these datasets. In this system, SRB provides remote access to dataset files stored on storage systems (SAN, GPFS) at SDSC. A number of data analysis scenarios have been implemented using DataCutter for distributed execution of data filtering and processing operations. These analysis scenarios involve user-defined queries for economic evaluation as well as technical evaluation, such as determination of representative realizations and identification of areas of bypassed oil. A demonstration of the system was given at SC2002. The demo showed that, by using NPACKage components, we can help geoscientists analyze very large datasets that are too big to download to local machines.

We are in the process of generating larger datasets (20-40TB in size) from seismic and oil reservoir simulations that will be stored on storage systems at multiple NPACI sites. We plan to extend the current system to couple simulation models with optimization procedures. The coupling will involve querying, subsetting, and analysis of large ensembles of distributed datasets, and integration of data from seismic and oil reservoir simulations.
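As an illustration of the user-defined queries described above, the following minimal sketch chains a few filters over an ensemble of realizations to count cells that may hold bypassed oil. It is written in plain Python and is not the DataCutter API. The record layout, the function names, and the rule "oil saturation above a threshold means bypassed oil" are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Tuple

Coord = Tuple[int, int, int]
# One grid cell of one realization: (realization id, (i, j, k), oil saturation).
# This record layout is hypothetical; it is not the IPARS output format.
Cell = Tuple[int, Coord, float]


def read_cells(realizations: Dict[int, List[Tuple[Coord, float]]]) -> Iterator[Cell]:
    """Source filter: stream the cells of every stored realization."""
    for rid, cells in realizations.items():
        for coord, soil in cells:
            yield rid, coord, soil


def subset_region(cells: Iterator[Cell], lo: Coord, hi: Coord) -> Iterator[Cell]:
    """Subsetting filter: keep cells inside the inclusive box [lo, hi]."""
    for rid, coord, soil in cells:
        if all(l <= c <= h for l, c, h in zip(lo, coord, hi)):
            yield rid, coord, soil


def bypassed_oil(cells: Iterator[Cell], threshold: float) -> Iterator[Cell]:
    """Analysis filter: keep cells whose oil saturation exceeds the threshold
    (an assumed, simplified stand-in for a real bypassed-oil criterion)."""
    for rid, coord, soil in cells:
        if soil > threshold:
            yield rid, coord, soil


def count_by_realization(cells: Iterator[Cell]) -> Dict[int, int]:
    """Sink: count the surviving cells per realization."""
    counts: Dict[int, int] = {}
    for rid, _, _ in cells:
        counts[rid] = counts.get(rid, 0) + 1
    return counts


def run_query(realizations, lo: Coord, hi: Coord, threshold: float) -> Dict[int, int]:
    """Compose the filters; each stage consumes the previous stage's stream."""
    stream = read_cells(realizations)
    stream = subset_region(stream, lo, hi)
    stream = bypassed_oil(stream, threshold)
    return count_by_realization(stream)


if __name__ == "__main__":
    demo = {
        0: [((0, 0, 0), 0.8), ((1, 0, 0), 0.2), ((5, 5, 5), 0.9)],
        1: [((0, 0, 0), 0.3), ((1, 1, 0), 0.7)],
    }
    print(run_query(demo, (0, 0, 0), (2, 2, 2), 0.5))
```

Because each stage only consumes a stream and produces a stream, the stages could in principle be placed on different machines, with the subsetting stage next to the storage so that only the reduced data crosses the network.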
We will harden and generalize the database and data analysis runtime support using NPACKage so that it can be used not only in this project but also in two joint projects: one with Scott Baden and Phil Colella on management and manipulation of large datasets in adaptive mesh refinement (AMR) applications, and one with Kathy Yelick on analysis and comparison of datasets in adaptive computation for biological systems. The generalized support will use the DataCutter, NWS, Globus, and SRB components of NPACKage. Globus will be used for resource allocation and authentication; SRB will be used for remote file access across multiple sites; NWS will be used for resource monitoring, to place data processing operations effectively on NPACI storage and compute clusters; and DataCutter will be used to implement distributed data querying, filtering, and user-defined processing of data.

Support for Data Subsetting and Data Analysis in Telescience
Collaboration with Mark Ellisman

This project targets the development of the middleware infrastructure for subsetting and distributed processing of image datasets stored in the Telescience system. The middleware infrastructure is being implemented using the DataCutter, SRB, and Globus components of NPACKage. We have already developed a prototype system for querying and processing large microscopy images stored in Telescience SRB repositories. The system uses Globus for authentication, DataCutter for data subsetting and distributed execution of data processing operations, SRB for remote file access, and the Telescience portal as the user interface for query formulation and viewing of results. A demonstration of the system was given at SC2002.
The demonstration involved the use of the Telescience portal to formulate a client query, the DataCutter/SRB system to retrieve the subsets of the microscopy images from SRB repositories at SDSC, and Globus/DataCutter to process the data on a cluster of PCs at OSU. We have set up a small cluster at OSU that will be used as a node in the BIRN/Telescience system, and have installed the SRB and DataCutter components on this cluster. This node will be used to serve additional datasets from digitized pathology microscopy slides. The current middleware system will be extended for 1) enhanced database and data subsetting support for querying of datasets at multiple sites and 2) improved image processing by integrating the Insight Segmentation and Registration Toolkit. Using this system, a client can query the microscopy datasets stored in SRB through the Telescience portal, extract the subset of the microscopy images requested by the query, and process it on cluster systems at SDSC and OSU.

Terascale Visualization Project
Collaboration with Art Olson

Visualization of output from large-scale simulations is an important part of data analysis in many science and engineering fields, and efficient extraction of the data of interest from distributed datasets is a key step in this process. The goal of this alpha project is to develop the database support to enable rapid access to subsets of remote, large, and distributed datasets generated by simulations for visualization purposes. The database support is being layered on top of NPACKage. This project is using the DataCutter component of the NPACKage suite to support data declustering on parallel storage clusters and efficient data subsetting and data extraction operations. A prototype system has been developed that allows a user to browse a 3D volume stored on a remote cluster.
The prototype consists of two parts: a client graphical user interface that can perform texture-based volume rendering of a 3D volume, and a parallel backend server, implemented using DataCutter, that performs user-specified subsetting and subsampling of large disk-resident 3D grids and transfers the data to the client. The prototype system was demonstrated at SC2002 by accessing a collection of datasets on a cluster at Ohio State University. A system extending the prototype implementation will be deployed to visualize large datasets from oil reservoir simulations (currently 5 terabytes of data) stored at Ohio State University, University of Maryland, and SDSC.

Data Analysis Software for Distributed-Memory Adaptive Computations
Collaboration with Kathy Yelick

This project is developing software support that lets developers and users of adaptive computational simulation codes in biological systems research carry out comparative analysis operations on large datasets. The software support will allow application developers to identify spatial regions characterized by divergence in data values and to visualize differences in those data values. We have implemented a prototype system using the DataCutter component of NPACKage to query and transform datasets generated by the Titanium codes for visualization. We will develop an integrated suite of tools for database support for querying and data subsetting, and for execution of data comparison and interpolation operations on very large datasets for studies in heart models. This suite will allow efficient debugging and validation of large-scale adaptive mesh simulation codes that are used in many scientific, biomedical, and engineering applications. The software suite will use the DataCutter, NWS, Globus, and SRB components of NPACKage.
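The comparison step described above, identifying spatial regions where two datasets diverge, can be sketched in a few lines. This is a simplified illustration and not the project's implementation: it assumes that both datasets have already been interpolated onto the same uniform 2D grid, and the names `divergent_cells`, `bounding_box`, and the tolerance parameter `tol` are hypothetical.

```python
from typing import List, Optional, Tuple

# A 2D grid indexed as grid[row][col]; this in-memory layout is an assumption.
Grid = List[List[float]]
Index = Tuple[int, int]


def divergent_cells(a: Grid, b: Grid, tol: float) -> List[Index]:
    """Return (row, col) indices where the two grids differ by more than tol."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(a, b))
        for c, (va, vb) in enumerate(zip(row_a, row_b))
        if abs(va - vb) > tol
    ]


def bounding_box(cells: List[Index]) -> Optional[Tuple[Index, Index]]:
    """Smallest axis-aligned box (inclusive corners) enclosing the given cells,
    or None when no cell diverges."""
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols)), (max(rows), max(cols))


if __name__ == "__main__":
    run_a = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
    run_b = [[1.0, 1.0, 1.0], [1.0, 2.5, 1.6], [1.0, 1.0, 1.0]]
    cells = divergent_cells(run_a, run_b, tol=0.25)
    print(cells, bounding_box(cells))
```

The bounding box is one simple way to turn a set of divergent cells into a region that a subsetting query could then fetch for visualization; a real tool would likely need to handle several disjoint regions and multiple refinement levels.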
©2003. Funded by the National Science Foundation through the National Partnership for Advanced Computational Infrastructure