Concurrency and Computation: Practice and Experience

Published Papers for 2002



2002 Volume 14 Articles

Article ID: CPE614

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Java for high-performance network-based computing: a survey
Volume ID 14
Issue ID 1
Date Jan 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.614
Article ID CPE614
Author Name(s) M. Lobosco 1, C. Amorim 2, O. Loques 3
Author Email(s) lobosco@cos.ufrj.br 1, amorim@cos.ufrj.br 2, loques@ic.uff.br 3
Affiliation(s) COPPE, Engenharia de Sistemas, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil 1 2; Instituto de Computação, Universidade Federal Fluminense, Brazil 3
Keyword(s) Java, parallel JVM implementation, high-performance computing, network-based computing,
Abstract
There has been an increasing research interest in extending the use of Java towards high-performance demanding applications such as scalable Web servers, distributed multimedia applications, and large-scale scientific applications. However, extending Java to a multicomputer environment and improving the low performance of current Java implementations pose great challenges to both the systems developer and application designer. In this survey, we describe and classify 14 relevant proposals and environments that tackle Java's performance bottlenecks in order to make the language an effective option for high-performance network-based computing. We further survey significant performance issues while exposing the potential benefits and limitations of current solutions in such a way that a framework for future research efforts can be established. Most of the proposed solutions can be classified according to some combination of three basic parameters: the model adopted for inter-process communication, language extensions, and the implementation strategy. In addition, where appropriate to each individual proposal, we examine other relevant issues, such as interoperability, portability, and garbage collection. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE615

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A flexible framework for consistency management
Volume ID 14
Issue ID 1
Date Jan 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.615
Article ID CPE615
Author Name(s) S. Weber 1, P. A. Nixon 2, B. Tangney 3
Author Email(s)
Affiliation(s) Distributed Systems Group, Department of Computer Science, Trinity College, Dublin 2, Ireland 1 2 3
Keyword(s) distributed shared memory, consistency model, coherency protocol, flexibility, customizability
Abstract
Recent distributed shared memory (DSM) systems provide increasingly more support for the sharing of objects rather than portions of memory. However, like earlier DSM systems these distributed shared object (DSO) systems still force developers to use a single protocol, or a small set of given protocols, for the sharing of application objects. This limitation prevents the applications from optimizing their communication behaviour and results in unnecessary overhead. A current general trend in software systems development is towards customizable systems; for example, frameworks, reflection, and aspect-oriented programming all aim to give the developer greater flexibility and control over the functionality and performance of their code. This paper describes a novel object-oriented framework that defines a DSM system in terms of a consistency model and an underlying coherency protocol. Different consistency models and coherency protocols can be used within a single application because they can be customized, by the application programmer, on a per-object basis. This allows application specific semantics to be exploited at a very fine level of granularity and with a resulting improvement in performance. The framework is implemented in Java and the speed-up obtained by a number of applications that use the framework is reported. Copyright © 2002 John Wiley & Sons, Ltd.
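The per-object customization the abstract describes can be pictured with a small sketch (a hypothetical illustration of the idea only, not the paper's actual framework API; every class and method name below is invented): each shared object is bound to its own coherency protocol behind a common interface, so different objects in one application can be kept consistent by different protocols.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a protocol interface that a DSM framework could let
// the application programmer implement and select on a per-object basis.
interface CoherencyProtocol {
    Object read(String key, Map<String, Object> store);
    void write(String key, Object value, Map<String, Object> store);
}

// A trivial write-through protocol; a real DSM protocol would also
// invalidate or update replicas on remote nodes here.
class WriteThrough implements CoherencyProtocol {
    public Object read(String key, Map<String, Object> store) {
        return store.get(key);
    }
    public void write(String key, Object value, Map<String, Object> store) {
        store.put(key, value);
    }
}

class SharedObjectSpace {
    private final Map<String, Object> store = new HashMap<>();
    private final Map<String, CoherencyProtocol> protocols = new HashMap<>();

    // The protocol is chosen per object at sharing time; this is the
    // fine-grained customization point the framework argues for.
    public void share(String key, Object initial, CoherencyProtocol p) {
        protocols.put(key, p);
        store.put(key, initial);
    }
    public Object read(String key) {
        return protocols.get(key).read(key, store);
    }
    public void write(String key, Object value) {
        protocols.get(key).write(key, value, store);
    }
}
```

Because every access is routed through the object's own protocol, an application could, for instance, give a rarely written configuration object a cheap invalidation protocol while a frequently updated counter uses an update-based one.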

Article ID: CPE616

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title An analysis of VI Architecture primitives in support of parallel and distributed communication
Volume ID 14
Issue ID 1
Date Jan 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.616
Article ID CPE616
Author Name(s) Andrew Begel 1, Philip Buonadonna 2, David E. Culler 3, David Gay 4
Author Email(s) philipb@cs.berkeley.edu 2
Affiliation(s) University of California, Berkeley, Soda Hall, Berkeley, CA 94720-1776, U.S.A. 1 2 3 4
Keyword(s) Active Messages, cluster-based networking, Infiniband, network abstractions, network I/O, VI Architecture,
Abstract
We present the results of a detailed study of the Virtual Interface (VI) paradigm as a communication foundation for a distributed computing environment. Using Active Messages and the Split-C global memory model, we analyze the inherent costs of using VI primitives to implement these high-level communication abstractions. We demonstrate a minimum mapping cost (i.e. the host processing required to map one abstraction to a lower abstraction) of 5.4 μs for both Active Messages and Split-C using four-way 550 MHz Pentium III SMPs and the Myrinet network. We break down this cost into the use of individual VI primitives in supporting flow control, buffer management and event processing, and identify the completion queue as the source of the highest overhead. Bulk transfer performance for both implementations plateaus at 44 Mbytes/s due to the addition of fragmentation requirements. Based on this analysis, we present the implications for the VI successor, Infiniband. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE619

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel implementation of the fluid particle model for simulating complex fluids in the mesoscale
Volume ID 14
Issue ID 2
Date Feb 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.619
Article ID CPE619
Author Name(s) Krzysztof Boryczko 1, Witold Dzwinel 2, David A. Yuen 3
Author Email(s) dwitek@msi.umn.edu 2
Affiliation(s) AGH Institute of Computer Science, al. Mickiewicza 30, 30-059, Kraków, Poland 1 2; Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN 55415-1227, U.S.A. 3
Keyword(s) fluid particles, parallel algorithm, checkerboard periodic boundary conditions, phase separation, dispersion, blood flow simulation,
Abstract
Dissipative particle dynamics (DPD) and its generalization, the fluid particle model (FPM), represent the ‘fluid particle’ approach for simulating fluid-like behavior in the mesoscale. Unlike particles from the molecular dynamics (MD) method, the ‘fluid particle’ can be viewed as a ‘droplet’ consisting of liquid molecules. In the FPM, ‘fluid particles’ interact by both central and non-central, short-range forces with conservative, dissipative and Brownian character. In comparison to MD, the FPM method in three dimensions requires two to three times more memory and a three times greater communication overhead. Computational load per step per particle is comparable to MD due to the shorter interaction range allowed between ‘fluid particles’ than between MD atoms. The classical linked-cells technique and decomposing the computational box into strips allow for rapid modifications of the code and for implementing non-cubic computational boxes. We show that the efficiency of the FPM code depends strongly on the number of particles simulated, the geometry of the box and the computer architecture. We give a few examples from long FPM simulations involving up to 8 million fluid particles and 32 processors. Results from FPM simulations in three dimensions of the phase separation in binary fluid and dispersion of the colloidal slab are presented. A scaling law for symmetric quench in phase separation has been properly reconstructed. We also show that the microstructure of dispersed fluid depends strongly on the contrast between the kinematic viscosities of this fluid phase and the bulk phase. This FPM code can be applied for simulating mesoscopic flow dynamics in capillary pipes or critical flow phenomena in narrow blood vessels. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE617

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Data movement and control substrate for parallel adaptive applications
Volume ID 14
Issue ID 2
Date Feb 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.617
Article ID CPE617
Author Name(s) Kevin Barker 1, Nikos Chrisochoides 2, Jeffrey Dobbelaere 3, Démian Nave 4, Keshav Pingali 5
Author Email(s) nikos@cs.wm.edu 2
Affiliation(s) Computer Science, College of William and Mary, Williamsburg, VA 23187, U.S.A. 1 2 3; Computer Science and Engineering, University of Notre Dame, South Bend, IN 46556, U.S.A. 4; Computer Science, Cornell University, Ithaca, NY 14853-3801, U.S.A. 5
Keyword(s) message passing, runtime system, parallel adaptive applications, mesh generation,
Abstract
In this paper, we present the Data Movement and Control Substrate (DMCS), a library which implements low-latency one-sided communication primitives for use in parallel adaptive and irregular applications. DMCS is built on top of low-level, vendor-specific communication subsystems such as LAPI (Low-level Application Programme Interface) for IBM SP machines, as well as on widely available message-passing libraries like MPI for clusters of workstations and PCs. DMCS adds a small overhead to the communication operations provided by the lower communication system. In return, DMCS provides a flexible and easy to understand application program interface for one-sided communication operations. Furthermore, DMCS is designed so that it can be easily ported and maintained by non-experts. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE618

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title SPiDER: An advanced symbolic debugger for Fortran 90/HPF programs
Volume ID 14
Issue ID 2
Date Feb 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.618
Article ID CPE618
Author Name(s) T. Fahringer 1, K. Sowa-Piekło 2, P. Czerwiński 3, P. Brezany 4, M. Bubak 5, R. Koppler 6, R. Wismüller 7
Author Email(s) tf@par.univie.ac.at 1
Affiliation(s) Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090, Vienna, Austria 1; ABB Corporate Research, ul. Starowiślna 13A, 31-038 Kraków, Poland 2 3 4; Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059 Kraków, Poland 5; GUP Linz, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria 6; Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM), Technische Universität München, D-80290 München, Germany 7
Keyword(s) debugger, data parallel programs, message passing programs,
Abstract
Debuggers play an important role in developing parallel applications. They are used to control the state of many processes, to present distributed information in a concise and clear way, to observe the execution behavior, and to detect and locate programming errors. More sophisticated debugging systems also try to improve understanding of global execution behavior and intricate details of a program. In this paper we describe the design and implementation of SPiDER, an interactive source-level debugging system for both regular and irregular High-Performance Fortran (HPF) programs. SPiDER combines a base debugging system for message-passing programs with a high-level debugger that interfaces with an HPF compiler. In addition to conventional debugging functionality, SPiDER allows a single process of a parallel program to be inspected or the entire program to be examined from a global point of view. A sophisticated visualization system has been developed and included in SPiDER to visualize data distributions, data-to-processor mapping relationships, and array values. SPiDER enables a programmer to dynamically change data distributions as well as array values. For arrays whose distribution can change during program execution, an animated replay displays the distribution sequence together with the associated source code location. Array values can be stored at individual execution points and compared against each other to examine execution behavior (e.g. convergence behavior of a numerical algorithm). Finally, SPiDER also offers limited support to evaluate the performance of parallel programs through a graphical load diagram. SPiDER has been fully implemented and is currently being used for the development of various real-world applications. Several experiments are presented that demonstrate the usefulness of SPiDER. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE603

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Optimizing the distribution of large data sets in theory and practice
Note: The original version of this article was first published as ‘Rauch F, Kurmann C, Stricker TM. Optimizing the distribution of large data sets in theory and practice. Euro-Par 2000-Parallel Processing (Lecture Notes in Computer Science, vol. 1900), Bode A, Ludwig T, Karl W, Wismüller R (eds.). Springer, 2000; 1118-1131’, and is reproduced here by kind permission of the publisher.
Volume ID 14
Issue ID 3
Date March 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.603
Article ID CPE603
Author Name(s) Felix Rauch 1, Christian Kurmann 2, Thomas M. Stricker 3
Author Email(s) rauch@inf.ethz.ch 1
Affiliation(s) Laboratory for Computer Systems, ETH - Swiss Institute of Technology, CH-8092 Zürich, Switzerland 1 2 3
Keyword(s) software installation and maintenance, data streaming, partition management, communication modelling, multicast, input output systems,
Abstract
Multicasting large amounts of data efficiently to all nodes of a PC cluster is an important operation. In the form of a partition cast it can be used to replicate entire software installations by cloning. Optimizing a partition cast for a given cluster of PCs reveals some interesting architectural tradeoffs, since the fastest solution does not only depend on the network speed and topology, but remains highly sensitive to other resources such as the disk speed, the memory system performance and the processing power of the participating nodes. We present an analytical model that guides an implementation towards an optimal configuration for any given PC cluster. The model is validated by measurements on our cluster using Gigabit- and Fast-Ethernet links. The resulting simple software tool, Dolly, can replicate an entire 2 GB Windows NT image onto 24 machines in less than 5 min. Copyright © 2002 John Wiley & Sons, Ltd.
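The resource sensitivity the abstract mentions can be caricatured with a tiny bottleneck calculation (a hypothetical sketch for illustration only, not the paper's analytical model; the names and numbers are invented): a streamed partition cast can go no faster than its slowest participating resource.

```java
// Hypothetical bottleneck sketch: a partition cast streamed through the
// cluster is limited by the slowest of disk, network, and memory-system
// bandwidth, so cloning time is roughly image size over that bottleneck.
class PartitionCastModel {
    static double bottleneckMBps(double diskMBps, double netMBps, double memMBps) {
        return Math.min(diskMBps, Math.min(netMBps, memMBps));
    }
    static double secondsToClone(double imageMB, double bottleneckMBps) {
        return imageMB / bottleneckMBps;
    }
}
```

Under such a model, upgrading the network past the disk's sustained rate buys nothing, which is the kind of architectural tradeoff the paper's measurements explore in earnest.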

Article ID: CPE604

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Implementation and evaluation of a communication intensive application on the EARTH multithreaded system
Note: The original version of this article was first published as ‘Theobald KB, Kumar R, Agrawal G, Heber G, Thulasiram RK, Gao GR. Implementation and evaluation of a communication intensive application on the EARTH multithreaded system. Euro-Par 2000-Parallel Processing (Lecture Notes in Computer Science, vol. 1900), Bode A, Ludwig T, Karl W, Wismüller R (eds.). Springer, 2000; 625-637’, and is reproduced here by kind permission of the publisher.
Volume ID 14
Issue ID 3
Date March 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.604
Article ID CPE604
Author Name(s) Kevin B. Theobald 1, Rishi Kumar 2, Gagan Agrawal 3, Gerd Heber 4, Ruppa K. Thulasiram 5, Guang R. Gao 6
Author Email(s)
Affiliation(s) Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, U.S.A. 1 2; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, U.S.A. 3; Cornell Theory Center, Cornell University, Ithaca, NY 14853, U.S.A. 4 5 6
Keyword(s) conjugate gradient, EARTH, multithreading, parallel computing, sparse matrices,
Abstract
This paper reports a study of sparse matrix-vector multiplication (MVM) on a parallel computing platform based on a fine-grained multithreaded program execution model. Such sparse MVM computations, when parallelized without performing graph partitioning, suffer a very high communication-to-computation ratio and are well known to have very limited scalability on traditional distributed-memory machines. The particular multithreaded system we use is the Efficient Architecture for Running THreads (EARTH) model, which can be implemented from off-the-shelf processors. With the Class B input sparse matrix from the NAS CG benchmark (75 000 rows), we attain an absolute speedup of 90 on 120 nodes of a distributed memory configuration. This is achieved without using inspector/executor or graph partitioning, or any communication minimization phase, which means that similar results can be expected for adaptive problems as well. High scalability is achieved because of a number of characteristics of the EARTH architecture: local synchronizations, low communication overheads, ability to overlap communication and computation, and low context-switching costs. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE605

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel static and dynamic multi-constraint graph partitioning
Note: The original version of this article was first published as ‘Schloegel K, Karypis G, Kumar V. Parallel static and dynamic multi-constraint graph partitioning. Euro-Par 2000-Parallel Processing (Lecture Notes in Computer Science, vol. 1900), Bode A, Ludwig T, Karl W, Wismüller R (eds.). Springer, 2000; 296-310’, and is reproduced here by kind permission of the publisher.
Volume ID 14
Issue ID 3
Date March 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.605
Article ID CPE605
Author Name(s) Kirk Schloegel 1, George Karypis 2, Vipin Kumar 3
Author Email(s) kirk@cs.umn.edu 1
Affiliation(s) Army HPC Research Center, Department of Computer Science and Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union St., Minneapolis, MN 55455, U.S.A. 1 2 3
Keyword(s) multi-constraint graph partitioning, parallel graph partitioning, multilevel graph partitioning, multi-phase scientific simulation,
Abstract
Sequential multi-constraint graph partitioners have been developed to address the static load balancing requirements of multi-phase simulations. These work well when (i) the graph that models the computation fits into the memory of a single processor, and (ii) the simulation does not require dynamic load balancing. The efficient execution of very large or dynamically adapting multi-phase simulations on high-performance parallel computers requires that the multi-constraint partitionings are computed in parallel. This paper presents a parallel formulation of a multi-constraint graph-partitioning algorithm, as well as a new partitioning algorithm for dynamic multi-phase simulations. We describe these algorithms and give experimental results conducted on a 128-processor Cray T3E. These results show that our parallel algorithms are able to efficiently compute partitionings with edge-cuts similar to those produced by serial multi-constraint algorithms, and can scale to very large graphs. Our dynamic multi-constraint algorithm is also able to minimize the data redistribution required to balance the load better than a naive scratch-remap approach. We have shown that both of our parallel multi-constraint graph partitioners are as scalable as the widely-used parallel graph partitioner implemented in PARMETIS. Both of our parallel multi-constraint graph partitioners are very fast, as they are able to compute three-constraint 128-way partitionings of a 7.5 million vertex graph in under 7 s on 128 processors of a Cray T3E. Copyright © 2002 John Wiley & Sons, Ltd.
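The balance criterion behind multi-constraint partitioning can be made concrete with a short sketch (a hedged illustration of the standard definition, not code from the paper or from PARMETIS): each vertex carries a weight vector with one entry per simulation phase, and a partitioning is judged by its worst imbalance over all constraints.

```java
// Sketch of the multi-constraint balance metric: for each constraint c,
// imbalance is the largest partition's share of the total weight in c,
// scaled by the number of partitions. A value of 1.0 means every phase is
// perfectly balanced; larger values mean some phase overloads a processor.
class MultiConstraintBalance {
    static double maxImbalance(double[][] weights, int[] part, int nparts) {
        int ncon = weights[0].length;           // number of constraints
        double[][] partSum = new double[nparts][ncon];
        double[] total = new double[ncon];
        for (int v = 0; v < weights.length; v++) {
            for (int c = 0; c < ncon; c++) {
                partSum[part[v]][c] += weights[v][c];
                total[c] += weights[v][c];
            }
        }
        double worst = 0.0;
        for (int c = 0; c < ncon; c++) {
            for (int p = 0; p < nparts; p++) {
                worst = Math.max(worst, partSum[p][c] * nparts / total[c]);
            }
        }
        return worst;
    }
}
```

A single-constraint partitioner can balance one phase while badly skewing another; minimizing this max-over-constraints figure is what makes the multi-constraint problem harder.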

Article ID: CPE601

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Euro-Par 2000
Volume ID 14
Issue ID 3
Date March 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.601
Article ID CPE601
Author Name(s) Roland Wismüller 1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract
No abstract

Article ID: CPE602

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A callgraph-based search strategy for automated performance diagnosis
Note: The original version of this article was first published as ‘Cain HW, Miller BP, Wylie BJN. A callgraph-based search strategy for automated performance diagnosis. Euro-Par 2000-Parallel Processing (Lecture Notes in Computer Science, vol. 1900), Bode A, Ludwig T, Karl W, Wismüller R (eds.). Springer, 2000; 108-122’, and is reproduced here by kind permission of the publisher.
Volume ID 14
Issue ID 3
Date March 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.602
Article ID CPE602
Author Name(s) Harold W. Cain 1, Barton P. Miller 2, Brian J. N. Wylie 3
Author Email(s) bart@cs.wisc.edu 2
Affiliation(s) Computer Sciences Department, University of Wisconsin, Madison, WI 53706-1685, U.S.A. 1 2 3
Keyword(s) performance diagnosis, dynamic instrumentation, Paradyn,
Abstract
We introduce a new technique for automated performance diagnosis, using the program's callgraph. We discuss our implementation of this diagnosis technique in the Paradyn Performance Consultant. Our implementation includes the new search strategy and new dynamic instrumentation to resolve pointer-based dynamic call sites at run-time. We compare the effectiveness of our new technique to the previous version of the Performance Consultant for several sequential and parallel applications. Our results show that the new search method performs its search while inserting dramatically less instrumentation into the application, resulting in reduced application perturbation and consequently a higher degree of diagnosis accuracy. Copyright © 2002 John Wiley & Sons, Ltd.
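The general shape of a callgraph-directed search can be sketched as follows (a hypothetical illustration of the idea, not Paradyn's actual algorithm or API; the cost map stands in for real measurements): instrumentation starts at the root, and only functions observed to be expensive have their callees examined, so cold subtrees of the callgraph are never instrumented at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch: walk the callgraph from "main", instrumenting each visited
// function, but only expand the search below functions whose (simulated)
// measured cost reaches the threshold.
class CallgraphSearch {
    static List<String> instrumented(Map<String, List<String>> callgraph,
                                     Map<String, Double> cost,
                                     double threshold) {
        List<String> result = new ArrayList<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.push("main");
        while (!frontier.isEmpty()) {
            String f = frontier.pop();
            result.add(f); // insert instrumentation for this function
            if (cost.getOrDefault(f, 0.0) >= threshold) {
                // Only expensive functions have their callees examined.
                for (String callee : callgraph.getOrDefault(f, List.of())) {
                    frontier.push(callee);
                }
            }
        }
        return result;
    }
}
```

The payoff matches the abstract's claim: instrumentation volume tracks the hot paths of the program rather than its full callgraph.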

Article ID: CPE634

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A quality of service driven concurrency framework for object-based middleware
Volume ID 14
Issue ID 4
Date Apr 10 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.634
Article ID CPE634
Author Name(s) Geoff Coulson 1, Oveeyen Moonian 2
Author Email(s) geoff@comp.lancs.ac.uk 1
Affiliation(s) Distributed Multimedia Research Group, Computing Department, Lancaster University, Lancaster LA1 4YR, U.K. 1 2
Keyword(s) quality of service, concurrency framework, CPU scheduling, object-based middleware,
Abstract
Threads play a key role in object-based middleware platforms. Implementers of such platforms can select either kernel or user-level threads, but neither of these options are ideal. In this paper we introduce Application Scheduler Contexts (ASCs) which flexibly combine both types of thread and thereby attempt to exploit the advantages of each. Multiple ASCs can co-exist, each with their own concurrency semantics and scheduling policy. ASCs also support quality of service (QoS) configurability, and define their own QoS schema. We show how ASCs can be efficiently implemented and how they can usefully be exploited in middleware environments. We also provide a quantitative evaluation that demonstrates the feasibility of the ASC concept in performance terms. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE635

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Advanced concurrency control in Java
Volume ID 14
Issue ID 4
Date Apr 10 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.635
Article ID CPE635
Author Name(s) Pascal Felber 1, Michael K. Reiter 2
Author Email(s) pascal@research.bell-labs.com 1
Affiliation(s) Bell Laboratories, Murray Hill, NJ 07974, U.S.A. 1; Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A. 2
Keyword(s) concurrency control, isolation, transactions, Java,
Abstract
Developing concurrent applications is not a trivial task. As programs grow larger and become more complex, advanced concurrency control mechanisms are needed to ensure that application consistency is not compromised. Managing mutual exclusion on a per-object basis is not sufficient to guarantee isolation of sets of semantically-related actions. In this paper, we consider ‘atomic blocks’, a simple and lightweight concurrency control paradigm that enables arbitrary blocks of code to access multiple shared objects in isolation. We evaluate various strategies for implementing atomic blocks in Java, in such a way that concurrency control is transparent to the programmer, isolation is preserved, and concurrency is maximized. We discuss these concurrency control strategies and evaluate them in terms of complexity and performance. Copyright © 2002 John Wiley & Sons, Ltd.
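One well-known way to obtain this kind of multi-object isolation (a hedged sketch of the general idea only, not one of the specific strategies the paper evaluates) is to acquire the locks of all objects the block touches in a canonical global order before running the body, which prevents deadlock between overlapping blocks:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of an "atomic block" over several shared objects: every lock is
// acquired in a canonical order (identity hash) before the body runs, so
// two blocks touching overlapping object sets cannot deadlock. Identity-
// hash ties are a theoretical corner case ignored in this sketch.
public class AtomicBlock {
    public static void runAtomically(Runnable body, ReentrantLock... locks) {
        ReentrantLock[] ordered = locks.clone();
        Arrays.sort(ordered, Comparator.comparingInt(System::identityHashCode));
        for (ReentrantLock l : ordered) {
            l.lock();
        }
        try {
            body.run(); // executes with every participating lock held
        } finally {
            // Release in reverse acquisition order.
            for (int i = ordered.length - 1; i >= 0; i--) {
                ordered[i].unlock();
            }
        }
    }
}
```

Note how this trades concurrency for simplicity: the whole lock set is held for the duration of the block, which is exactly the kind of cost the paper's comparison of strategies weighs.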

Article ID: CPE636

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Virtual Service Grid: an architecture for delivering high-end network services
Volume ID 14
Issue ID 4
Date Apr 10 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.636
Article ID CPE636
Author Name(s) Jon B. Weissman 1, Byoung-Dai Lee 2
Author Email(s) jon@cs.umn.edu 1
Affiliation(s) Department of Computer Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, MN 55455, U.S.A. 1 2
Keyword(s) Grid computing, parallel computing, network services, resource management,
Abstract
This paper presents the design of a new system architecture, Virtual Service Grid (VSG), for delivering high-performance network services. The VSG is based on the concept of the virtual service which provides location, replication, and fault transparency to clients accessing remotely deployed high-end services. One of the novel features of the virtual service is the ability to self-scale in response to client demand. The VSG exploits network and service information to make adaptive dynamic replica selection, creation, and deletion decisions. We describe the VSG architecture, middleware, and replica management policies. We have deployed the VSG on a wide-area Internet testbed to evaluate its performance. The results indicate that the VSG can deliver efficient performance for a wide range of client workloads, both in terms of reduced response time and in the utilization of system resources. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE652

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Managing application complexity in the SAMRAI object-oriented framework
Note: This article is a U.S. Government work and is in the public domain in the U.S.A.
Volume ID 14
Issue ID 5
Date Apr 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.652
Article ID CPE652
Author Name(s) Richard D. Hornung 1, Scott R. Kohn 2
Author Email(s) hornung@llnl.gov 1
Affiliation(s) Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 1 2
Keyword(s) object-oriented programming, design patterns, adaptive mesh refinement,
Abstract
A major challenge facing software libraries for scientific computing is the ability to provide adequate flexibility to meet sophisticated, diverse, and evolving application requirements. Object-oriented design techniques are valuable tools for capturing characteristics of complex applications in a software architecture. In this paper, we describe certain prominent object-oriented features of the SAMRAI software library that have proven to be useful in application development. SAMRAI is used in a variety of applications and has demonstrated a substantial amount of code and design re-use in those applications. This flexibility and extensibility is illustrated with three different application codes. We emphasize two important features of our design. First, we describe the composition of complex numerical algorithms from smaller components which are usable in different applications. Second, we discuss the extension of existing framework components to satisfy new application needs. Published in 2002 by John Wiley & Sons, Ltd.

Article ID: CPE650

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Software architectures for scientific applications
Volume ID 14
Issue ID 5
Date Apr 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.650
Article ID CPE650
Author Name(s) Manish Parashar 1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract
No abstract

Article ID: CPE651

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The CCA core specification in a distributed memory SPMD framework
Volume ID 14
Issue ID 5
Date Apr 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.651
Article ID CPE651
Author Name(s) Benjamin A. Allan 1, Robert C. Armstrong 2, Alicia P. Wolfe 3, Jaideep Ray 4, David E. Bernholdt 5, James A. Kohl 6
Author Email(s) rob@ca.sandia.gov 2
Affiliation(s) Sandia National Laboratories, Livermore, CA, U.S.A. 1 2 3 4; Oak Ridge National Laboratory, Oak Ridge, TN, U.S.A. 5 6
Keyword(s) common component architecture, high-performance computing, CCAFFEINE, peer components, SPMD, framework,
Abstract
We present an overview of the Common Component Architecture (CCA) core specification and CCAFFEINE, a Sandia National Laboratories framework implementation compliant with the draft specification. CCAFFEINE stands for CCA Fast Framework Example In Need of Everything; that is, CCAFFEINE is fast, lightweight, and it aims to provide every framework service by using external, portable components instead of integrating all services into a single, heavy framework core. By fast, we mean that the CCAFFEINE glue does not get between components in a way that slows down their interactions. We present the CCAFFEINE solutions to several fundamental problems in the application of component software approaches to the construction of single program multiple data (SPMD) applications. We demonstrate the integration of components from three organizations, two within Sandia and one at Oak Ridge National Laboratory. We outline some requirements for key enabling facilities needed for a successful component approach to SPMD application building. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE620

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel visualization of gigabyte datasets in GeoFEM
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.620
Article ID CPE620
Author Name(s) Issei Fujishiro 1, Li Chen 2, Yuriko Takeshima 3, Hiroko Nakamura 4, Yasuko Suzuki 5
Author Email(s) chen@tokyo.rist.or.jp 2
Affiliation(s) Ochanomizu University, Tokyo, Japan 1; Research Organization for Information Science and Technology, Tokyo, Japan 2; Tohoku University, Sendai, Japan 3 4 5
Keyword(s) scientific visualization, parallel visualization, large-scale data visualization, volume visualization, flow visualization, polygonal simplification, feature analysis,
Abstract
An initial overview of parallel visualization in the GeoFEM software system is provided. Our visualization subsystem offers many kinds of parallel visualization methods for the users to visualize their huge finite-element analysis datasets for scalar, vector and/or tensor fields at a reasonable cost. A polygonal simplification scheme is developed to make the transmission and rendition of output graphic primitives more efficient. A salient feature of the subsystem lies in its capability in the automatic setting of visualization parameter values based on the analysis of scalar/flow field topology and volumetric coherence, to improve the quality of visualization results with a minimized number of batch re-executions. Representative experimental results illustrate the effectiveness of our subsystem. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE698

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: APEC Cooperation for Earthquake Simulation
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.698
Article ID CPE698
Author Name(s) Geoffrey C. Fox 1
Author Email(s) gcf@indiana.edu 1
Affiliation(s) Department of Computer Science, Indiana University, IN, U.S.A. 1
Keyword(s)
Abstract
No abstract

Article ID: CPE629

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Grid services for earthquake science
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.629
Article ID CPE629
Author Name(s) Geoffrey Fox1Sung-Hoon Ko2Marlon Pierce3Ozgur Balsoy4Jake Kim5Sangmi Lee6Kangseok Kim7Sangyoon Oh8Xi Rao9Mustafa Varank10Hasan Bulut11Gurhan Gunduz12Xiaohong Qiu13Shrideep Pallickara14Ahmet Uyar15Choonhan Youn16
Author Email(s) pierceme@asc.hpc.mil3
Affiliation(s) Departments of Computer Science and Physics, School of Informatics, Community Grids Laboratory, Indiana University, IN, U.S.A. 1School of Computational Science and Information Technology, Florida State University, FL, U.S.A. 2 3 Computer Science Department, Florida State University, FL, U.S.A. 4 5 6 Computer Science Department, Indiana University, IN, U.S.A. 7 8 9 10 Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, U.S.A. 11 12 13 14 15 16
Keyword(s) Web services, collaboration, computational portal, earthquake science,
Abstract
We describe an information system architecture for the ACES (Asia-Pacific Cooperation for Earthquake Simulation) community. It addresses several key features of the field: simulations at multiple scales that need to be coupled together; real-time and archival observational data, which need to be analyzed for patterns and linked to the simulations; a variety of important algorithms, including partial differential equation solvers, particle dynamics, signal processing and data analysis; a natural three-dimensional space (plus time) setting for both visualization and observations; and the linkage of the field to real-time events, both as an aid to crisis management and to scientific discovery. We also address the need to support education and research for a field whose computational sophistication is rapidly increasing and spans a broad range. The information system assumes that all significant data are defined by an XML layer, which could be virtual, but whose existence ensures that all data are object-based and can be accessed and searched in this form. The various capabilities needed by ACES are defined as grid services, which conform to emerging standards and are implemented with different levels of fidelity and performance appropriate to the application. Grid services can be composed hierarchically to address complex problems. The real-time needs of the field are addressed by high-performance implementations of data transfer and simulation services. Further, the environment is linked to real-time collaboration to support interactions between scientists in geographically distant locations. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE628

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel simulation system for earthquake generation: fault analysis modules and parallel coupling analysis
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.628
Article ID CPE628
Author Name(s) Mikio Iizuka1Daigo Sekita2Hisashi Suito3Mamoru Hyodo4Kazuro Hirahara5David Place6Peter Mora7Osamu Hazama8Hiroshi Okuda9
Author Email(s) iizuka@tokyo.rist.or.jp1
Affiliation(s) Research Organization for Information Science and Technology, 2-2-54, Nakameguro, Meguro-ku, Tokyo 153-0061, Japan 1Mitsubishi Research Institute, Inc., 3-6, Otemachi 2-chome, Chiyoda-ku, Tokyo 100-8141, Japan 2Earth and Planetary Sciences, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8602, Japan 3 4 5 QUAKES, Department of Earth Sciences, The University of Queensland, 4072, Australia 6 7 Yokohama National University, 79-5, Tokiwadai, Hodogaya-ku, Yokohama, Kanagawa, 240-8501, Japan 8Department of Quantum Engineering and Systems Science, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan 9
Keyword(s) solid earth simulations, generation and cycle of earthquakes, GeoFEM, parallel finite-element analysis,
Abstract
Solid earth simulations have recently been developed to address issues such as natural disasters, global environmental destruction and the conservation of natural resources. The simulation of solid earth phenomena involves the analysis of complex structures including strata, faults, and heterogeneous material properties. Simulation of the generation and cycle of earthquakes is particularly important, but such simulations require the analysis of complex fault dynamics. GeoFEM is a parallel finite-element analysis system intended for solid earth field phenomena problems. This paper describes recent developments in the GeoFEM project for the simulation of earthquake generation and cycles. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE627

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel multilevel iterative linear solvers with unstructured adaptive grids for simulations in earth science
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.627
Article ID CPE627
Author Name(s) Kengo Nakajima1
Author Email(s) nakajima@tokyo.rist.or.jp1
Affiliation(s) Department of Computational Earth Sciences, Research Organization for Information Science and Technology (RIST), Tokyo, Japan 1
Keyword(s) parallel iterative solvers, preconditioning, multigrid, grid adaption, GeoFEM,
Abstract
A new multigrid-preconditioned conjugate gradient (MGCG) iterative method for parallel computers is presented. Iterative solvers with preconditioning, such as the incomplete Cholesky or incomplete LU factorization methods, represent some of the most powerful tools for large-scale scientific computation. However, the number of iterations required for convergence by these methods increases with the size of the problem. In multigrid solvers, the rate of convergence is independent of problem size, and the number of iterations remains fairly constant. Multigrid is also a good preconditioning algorithm for Krylov iterative solvers. In this study, the MGCG method is applied to Poisson equations in the region between two spherical surfaces on semi-unstructured, adaptively generated prismatic grids, and to grids with local refinement. Computations using this method on a Hitachi SR2201 with up to 128 processors demonstrated good scalability. Copyright © 2002 John Wiley & Sons, Ltd.
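The MGCG idea above, a multigrid cycle used as the preconditioner inside a Krylov solver, can be sketched in a few lines. The following is an illustrative toy only (a 1D Poisson operator, with a simple Jacobi preconditioner standing in where the paper's multigrid V-cycle would go), not the paper's code:

```python
# Preconditioned conjugate gradient skeleton (illustrative sketch).
# The paper plugs a multigrid cycle in as the preconditioner; here a
# Jacobi (diagonal) preconditioner keeps the example self-contained.

def matvec(x):
    # Tridiagonal 1D Poisson operator: (-1, 2, -1) stencil.
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def precond(r):
    # Diagonal of the Poisson operator is 2; a multigrid V-cycle
    # would replace this step in an MGCG solver.
    return [ri / 2.0 for ri in r]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(b, tol=1e-10, max_iter=200):
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # r = b - A*0
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x

b = [1.0] * 8
x = pcg(b)
```

Swapping `precond` for a multigrid cycle is what makes the iteration count roughly independent of problem size, which is the property the abstract emphasizes.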

Article ID: CPE626

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Optimization of GeoFEM for high performance sequential computer architectures
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.626
Article ID CPE626
Author Name(s) Kazuo Minami1Hiroshi Okuda2
Author Email(s) minami@tokyo.rist.or.jp1
Affiliation(s) Department of Computational Earth Sciences, Research Organization for Information Science and Technology (RIST), Tokyo, Japan 1Department of Quantum Engineering and Systems Science, The University of Tokyo, Japan 2
Keyword(s)
Abstract
In this research, we focus on improving the performance of GeoFEM on a single processor using a common data structure and coding approach in order to optimize GeoFEM for implementation on various computer architectures including parallel systems. A new data structure and direct access coding are developed for fluid analysis and they are implemented on scalar, vector, and pseudovector architectures. A 17% increase in peak performance is obtained on pseudovector and scalar architectures, and a 20% peak performance improvement is achieved on vector architecture. By applying a new direct access coding approach, the peak performance of the structure solver is increased by 23% on pseudovector architecture and 28% on vector architecture. Architecture-independent matrix assembly coding is developed and evaluated on vector and scalar machines. A performance of 736.8 Mflops is obtained for the matrix assembly process and 900.7 Mflops for the entire code on an NEC SX-4 supercomputer. An average of 2.06 Gflops performance is obtained on a Fujitsu VPP5000 (peak: 9.6 Gflops), and 124 Mflops is obtained for the matrix assembly process on a 533-MHz 21164 Alpha system. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE625

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Thermal convection analysis in a rotating shell by a parallel finite-element method-development of a thermal-hydraulic subsystem of GeoFEM
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.625
Article ID CPE625
Author Name(s) Hiroaki Matsui1Hiroshi Okuda2
Author Email(s) matsui@tokyo.rist.or.jp1
Affiliation(s) Department of Research for Computational Earth Science, Research Organization for Information Science & Technology (RIST), Tokyo, Japan 1Department of Quantum Engineering and System Science, The University of Tokyo, Tokyo, Japan 2
Keyword(s) parallel finite-element method, thermal convection, rotating spherical shell, Earth's outer core,
Abstract
The purpose of this paper is to propose a method for the numerical simulation of thermally driven convection in a rotating spherical shell modeled on the Earth's outer core using the GeoFEM thermal-hydraulic subsystem, which provides a parallel finite-element method (FEM) platform. This simulation is designed to assist in the understanding of the origin of the geomagnetic field and the dynamics of the fluid in the Earth's outer core. A three-dimensional and time-dependent process of a Boussinesq fluid in a rotating spherical shell is solved under the effects of self-gravity and the Coriolis force. A tri-linear hexahedral element is used for the spatial distribution. A total of $1.26 \times 10^5$ nodes were used on the spherical shell, and the finite-element mesh was divided into 32 domains for parallel computation. The second-order Adams-Bashforth scheme was used for the time integration of temperature and velocity. To satisfy mass conservation, a parallel iterative solver given by GeoFEM was used to solve for the pressure and correction of the velocity fields, and the simulation was performed over $10^5$ steps using four nodes of a Hitachi SR8000. To verify the proposed simulation code, results of the simulation are compared with analysis by the spectral method. The results show that the outline of convection is approximately equal; that is, three pairs of convection columns are formed, and these columns propagate westward in a quasi-steady state. However, the magnitude of kinetic energy averaged over the shell is approximately 93% of that by the spectral method, and the drift frequency of the columns in the GeoFEM simulation is larger than that by the spectral method. Copyright © 2002 John Wiley & Sons, Ltd.
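The second-order Adams-Bashforth scheme mentioned above advances the solution using the current and previous evaluations of the right-hand side. A minimal sketch, applied to a scalar test ODE dy/dt = -y rather than the paper's convection equations, and bootstrapped with one forward Euler step (an assumed choice, not stated in the abstract):

```python
# Second-order Adams-Bashforth (AB2) time stepping, illustrative sketch.
# y_{n+1} = y_n + dt * (3/2 * f_n - 1/2 * f_{n-1})

def adams_bashforth2(f, y0, dt, steps):
    y = y0
    f_prev = f(y)
    # Bootstrap the two-step method with a single forward Euler step.
    y = y + dt * f_prev
    for _ in range(steps - 1):
        f_curr = f(y)
        y = y + dt * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
    return y

# Toy problem: dy/dt = -y, y(0) = 1, integrated to t = 1.
y_end = adams_bashforth2(lambda y: -y, 1.0, 0.01, 100)
```

For this test problem the result should track exp(-1) ≈ 0.3679 to a few decimal places, reflecting the scheme's second-order accuracy.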

Article ID: CPE624

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Effective adaptation technique for hexahedral mesh
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.624
Article ID CPE624
Author Name(s) Yoshitaka Wada1Hiroshi Okuda2
Author Email(s) wada@tokyo.rist.or.jp1
Affiliation(s) Department of Computational Earth Sciences, Research Organization for Information Science and Technology (RIST), 2-2-54 Nakameguro, Meguro-ku, Tokyo, Japan 1Department of Quantum Engineering & Systems Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan 2
Keyword(s)
Abstract
This paper describes a simple and effective adaptive technique for mesh refinement in finite-element method models. The proposed refinement method is based on a modified octree refinement approach, and can be applied to an arbitrary hexahedral unstructured mesh. The implementation, Hex-R, is used in conjunction with the finite-element viewer GPPView, and applied to a complex model. The proposed method is demonstrated to be capable of easily and reliably generating an adaptive mesh. Copyright © 2002 John Wiley & Sons, Ltd.
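The octree-style refinement described above splits a hexahedral cell into eight children. A minimal geometric sketch for an axis-aligned cell (the paper's modified octree approach also handles transition elements between refinement levels, which this toy omits):

```python
# Octree refinement sketch: split an axis-aligned hexahedral cell,
# given as (min corner, max corner), into its eight children.

def refine_hex(cell):
    (x0, y0, z0), (x1, y1, z1) = cell
    xm, ym, zm = (x0 + x1) / 2, (y0 + y1) / 2, (z0 + z1) / 2
    children = []
    for xa, xb in ((x0, xm), (xm, x1)):
        for ya, yb in ((y0, ym), (ym, y1)):
            for za, zb in ((z0, zm), (zm, z1)):
                children.append(((xa, ya, za), (xb, yb, zb)))
    return children

kids = refine_hex(((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
```

Each child has one eighth of the parent's volume, so repeated application gives the familiar octree hierarchy that adaptive refinement drives by an error indicator.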

Article ID: CPE623

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Finite-element modeling of multibody contact and its application to active faults
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.623
Article ID CPE623
Author Name(s) H. L. Xing1A. Makinouchi2
Author Email(s) xing@postman.riken.go.jp1
Affiliation(s) Materials Fabrication Laboratory, The Institute of Physical and Chemical Research (RIKEN), 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan 1Integrated V-CAD Research Program, The Institute of Physical and Chemical Research (RIKEN), 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan 2
Keyword(s) multibody contact, finite-element method, nonlinear frictional contact, parallel sparse solver, active faults, earthquake,
Abstract
Earthquakes have been recognized as resulting from a stick-slip frictional instability along the faults between deformable rocks. An arbitrarily-shaped contact element strategy, named the node-to-point contact element strategy, is proposed and applied, with static-explicit characteristics, to handle the frictional contact between deformable bodies with stick and finite frictional slip, and is extended here to simulate active faults in the crust with a more general nonlinear friction law. An efficient contact search algorithm for contact problems among multiple small and finite deformation bodies is also introduced. Moreover, the efficiency of the parallel sparse solver for the nonlinear friction contact problem is investigated. Finally, a model for the plate movement in the north-east zone of Japan under gravitation is taken as an example to be analyzed with different friction behaviors. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE621

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallelization of a large-scale computational earthquake simulation program
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.621
Article ID CPE621
Author Name(s) K. F. Tiampo1J. B. Rundle2P. Hopper3J. Sá Martins4S. Gross5S. McGinnis6
Author Email(s) kristy@fractal.colorado.edu1
Affiliation(s) CIRES, University of Colorado, Boulder, CO, U.S.A. 1Department of Physics, Colorado Center for Chaos and Complexity, CIRES, University of Colorado, Boulder, CO, 80309, U.S.A. and Distinguished Visiting Scientist, Jet Propulsion Laboratory, Pasadena, CA 91125, U.S.A. 2 3 4 5 6
Keyword(s) parallel computing, genetic algorithm, earthquake fault simulation,
Abstract
Here we detail both the methods and preliminary results of the first efforts to parallelize three General Earthquake Model (GEM)-related codes: (1) a relatively simple data mining procedure based on a genetic algorithm; (2) a mean-field slider block model; and (3) the Virtual California simulation of GEM. These preliminary results, using a simple, heterogeneous system of processors, existing freeware and an extremely low initial cost in both manpower and hardware dollars, motivate us to pursue more ambitious work with considerably larger-scale computer earthquake simulations of southern California. The GEM computational problem, which is essentially a Monte Carlo simulation, is well suited to optimization on parallel computers and we outline how we are proceeding in implementing this new software architecture. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE622

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel iterative solvers for unstructured grids using a directive/MPI hybrid programming model for the GeoFEM platform on SMP cluster architectures
Volume ID 14
Issue ID 6-7
Date May 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.622
Article ID CPE622
Author Name(s) Kengo Nakajima1Hiroshi Okuda2
Author Email(s) nakajima@tokyo.rist.or.jp1
Affiliation(s) Department of Computational Earth Sciences, Research Organization for Information Science and Technology (RIST), Tokyo, Japan 1Department of Quantum Engineering and Systems Science, The University of Tokyo, Tokyo, Japan 2
Keyword(s) parallel iterative solvers, preconditioning, reordering, multicolor, SMP cluster, GeoFEM, Earth Simulator,
Abstract
In this paper, an efficient parallel iterative method for unstructured grids developed by the authors for shared memory symmetric multiprocessor (SMP) cluster architectures on the GeoFEM platform is presented. The method is based on a three-level hybrid parallel programming model, including message passing for inter-SMP node communication, loop directives for intra-SMP node parallelization and vectorization for each processing element (PE). Simple 3D elastic linear problems with more than $10^8$ degrees of freedom have been solved by $3 \times 3$ block ICCG(0) with additive Schwarz domain decomposition and PDJDS/CM-RCM reordering on 16 SMP nodes of a Hitachi SR8000 parallel computer, achieving a performance of 20 Gflops. The PDJDS/CM-RCM reordering method provides excellent vector and parallel performance in SMP nodes, and is essential for parallelization of forward/backward substitution in IC/ILU factorization with global data dependency. The method developed was also tested on an NEC SX-4 and attained 969 Mflops (48.5% of peak performance) using a single processor. The additive Schwarz domain decomposition method provides robustness for the GeoFEM parallel iterative solvers with localized preconditioning. Copyright © 2002 John Wiley & Sons, Ltd.
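Reordering schemes of the kind cited above work by coloring the unknowns so that rows of the same color have no mutual data dependency, which is what lets forward/backward substitution proceed in parallel within each color. A greedy multicoloring sketch on a toy 3x3 grid graph (illustrative only; the paper's PDJDS/CM-RCM method is considerably more structured):

```python
# Greedy multicoloring sketch. Nodes of the same color are mutually
# non-adjacent, so triangular-solve updates within one color carry no
# data dependency and can run in parallel.

def multicolor(adjacency):
    # adjacency: dict mapping node -> list of neighbouring nodes
    colors = {}
    for node in sorted(adjacency):
        used = {colors[n] for n in adjacency[node] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

# A small 3x3 grid graph written as an adjacency list.
grid = {
    0: [1, 3],       1: [0, 2, 4],    2: [1, 5],
    3: [0, 4, 6],    4: [1, 3, 5, 7], 5: [2, 4, 8],
    6: [3, 7],       7: [4, 6, 8],    8: [5, 7],
}
colors = multicolor(grid)
```

After coloring, the solver visits colors sequentially but processes all nodes of one color concurrently, which is the essence of making ILU-type preconditioning vectorizable and parallelizable.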

Article ID: CPE696

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: High Performance Fortran
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.696
Article ID CPE696
Author Name(s) Ken Kennedy1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract
No abstract

Article ID: CPE648

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Techniques for compiling and implementing all NAS parallel benchmarks in HPF
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.648
Article ID CPE648
Author Name(s) Yasunori Nishitani1Kiyoshi Negishi2Hiroshi Ohta3Eiji Nunohiro4
Author Email(s) y_nishitani@itg.hitachi.co.jp1
Affiliation(s) Software Division, Hitachi Ltd, 549-6, Shinano-cho, Totsuka-ku, Yokohama, Kanagawa, 244-0801 Japan 1 2 Information & Computer Systems, Hitachi Ltd, 6-27-18, Minami Oi, Shinagawa-ku, Tokyo, 140-8572 Japan 3 4
Keyword(s) HPF, compiler, NAS parallel benchmarks,
Abstract
The NAS parallel benchmarks (NPB) are a well-known benchmark set for high-performance machines. Much effort has been made to implement them in High-Performance Fortran (HPF). In previous attempts, however, the HPF versions did not include the complete set of benchmarks, and the performance was not always good. In this study, we implement all eight benchmarks of the NPB in HPF, and parallelize them using an HPF compiler that we have developed. This report describes the implementation techniques and compiler features necessary to achieve good performance. We evaluate the HPF version on the Hitachi SR2201, a distributed-memory parallel machine. With 16 processors, the execution time of the HPF version is within a factor of 1.5 of the hand-parallelized version of the NPB 2.3 beta. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE649

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Efficient parallel programming on scalable shared memory systems with High Performance Fortran
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.649
Article ID CPE649
Author Name(s) Siegfried Benkner1Thomas Brandes2
Author Email(s) sigi@ieee.org1
Affiliation(s) Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 1Institute for Algorithms and Scientific Computing (SCAI), Fraunhofer Gesellschaft (FhG), Schloß Birlinghoven, D-53754 St. Augustin, Germany 2
Keyword(s)
Abstract
OpenMP offers a high-level interface for parallel programming on scalable shared memory (SMP) architectures. It provides the user with simple work-sharing directives while it relies on the compiler to generate parallel programs based on thread parallelism. However, the lack of language features for exploiting data locality often results in poor performance since the non-uniform memory access times on scalable SMP machines cannot be neglected. High Performance Fortran (HPF), the de-facto standard for data parallel programming, offers a rich set of data distribution directives in order to exploit data locality, but it has been mainly targeted towards distributed memory machines. In this paper we describe an optimized execution model for HPF programs on SMP machines that avails itself of mechanisms provided by OpenMP for work sharing and thread parallelism, while exploiting data locality based on user-specified distribution directives. Data locality not only ensures that most memory accesses are close to the executing threads and are therefore faster, but also minimizes synchronization overheads, especially in the case of unstructured reductions. The proposed shared memory execution model for HPF relies on a small set of language extensions, which resemble the OpenMP work-sharing features. These extensions, together with an optimized shared memory parallelization and execution model, have been implemented in the ADAPTOR HPF compilation system and experimental results verify the efficiency of the chosen approach. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE647

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Advanced optimization strategies in the Rice dHPF compiler
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.647
Article ID CPE647
Author Name(s) J. Mellor-Crummey1V. Adve2B. Broom3D. Chavarría-Miranda4R. Fowler5G. Jin6K. Kennedy7Q. Yi8
Author Email(s) johnmc@cs.rice.edu1
Affiliation(s) Department of Computer Science-MS 132, Rice University, 6100 Main Street, Houston, TX 77005, U.S.A. 1Computer Science Department-MC-258, University of Illinois at Urbana-Champaign, 1304 West Springfield Avenue, Urbana, IL 61801, U.S.A. 2 3 4 5 6 7 8
Keyword(s) High-Performance Fortran, automatic parallelization, computation partitioning, NAS benchmarks, multipartitioning,
Abstract
High-Performance Fortran (HPF) was envisioned as a vehicle for modernizing legacy Fortran codes to achieve scalable parallel performance. To a large extent, today's commercially available HPF compilers have failed to deliver scalable parallel performance for a broad spectrum of applications because of insufficiently powerful compiler analysis and optimization. Substantial restructuring and hand-optimization can be required to achieve acceptable performance with an HPF port of an existing Fortran application, even for regular data-parallel applications. A key goal of the Rice dHPF compiler project has been to develop optimization techniques that enable a wide range of existing scientific applications to be ported easily to efficient HPF with minimal restructuring. This paper describes the challenges to effective parallelization presented by complex (but regular) data-parallel applications, and then describes how the novel analysis and optimization technologies in the dHPF compiler address these challenges effectively, without major rewriting of the applications. We illustrate the techniques by describing their use for parallelizing the NAS SP and BT benchmarks. The dHPF compiler generates multipartitioned parallelizations of these codes that are approaching the scalability and efficiency of sophisticated hand-coded parallelizations. Copyright © 2002 John Wiley & Sons, Ltd.
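The multipartitioning distribution named in the keywords assigns each processor exactly one tile in every row and every column of the tile grid, so that line sweeps along either axis keep all processors busy at every pipeline step. A minimal sketch of the classic diagonal layout for a 2D p x p tile grid (illustrative; dHPF's generated code is far more involved):

```python
# Diagonal multipartitioning sketch: tile (i, j) of a p x p tile grid
# is owned by processor (j - i) mod p, giving each processor exactly
# one tile per tile-row and one per tile-column.

def multipartition(p):
    return [[(j - i) % p for j in range(p)] for i in range(p)]

tiles = multipartition(4)
for row in tiles:
    print(row)
```

Because every row and column of the ownership matrix is a permutation of the processors, sweeps in either the i or j direction achieve full parallelism, which is why multipartitioning suits ADI-style solvers like NAS SP and BT.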

Article ID: CPE645

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Finding performance bugs with the TNO HPF benchmark suite
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.645
Article ID CPE645
Author Name(s) Will Denissen1Henk J. Sips2
Author Email(s) sips@its.tudelft.nl2
Affiliation(s) TNO-TPD, Delft, The Netherlands 1Delft University of Technology, Delft, The Netherlands 2
Keyword(s) HPF, parallel compilers, benchmarking, compiler optimizations,
Abstract
High-Performance Fortran (HPF) has been designed to provide portable performance on distributed memory machines. An important aspect of portable performance is the behavior of the available HPF compilers. Ideally, a programmer may expect comparable performance between different HPF compilers, given the same program and the same machine. To test performance portability between compilers, we have designed a special benchmark suite, called the TNO HPF benchmark suite. It consists of a set of HPF programs that test various aspects of efficient parallel code generation. The suite is built from template programs that are used to generate test programs with different array sizes, alignments, distributions, and iteration spaces. It ranges from very simple assignments to more complex assignments such as triangular iteration spaces, convex iteration spaces, coupled subscripts, and indirection arrays. We have run the TNO HPF benchmark suite on three compilers: the PREPARE prototype compiler, the PGI-HPF compiler, and the GMD Adaptor HPF compiler. Results show performance differences that can be quite large (up to two orders of magnitude for the same test program). Closer inspection reveals that most of the differences in performance originate in how the compilers enumerate and store the local elements of distributed arrays. Copyright © 2002 John Wiley & Sons, Ltd.
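The local enumeration issue identified above comes down to how a compiler maps a distributed array's global index to an owning processor and a local storage index. A sketch of the two standard HPF mappings, BLOCK and CYCLIC (illustrative helpers, not taken from any of the compilers tested):

```python
# Global-to-local index mappings for HPF-style distributions of an
# n-element array over p processors (illustrative sketch).

def block_local(i, n, p):
    # BLOCK: contiguous chunks of ceil(n/p) elements per processor.
    bs = -(-n // p)                  # ceiling division
    return i // bs, i % bs           # (owner, local index)

def cyclic_local(i, n, p):
    # CYCLIC: element i goes round-robin to processor i mod p.
    return i % p, i // p             # (owner, local index)

owners_block = [block_local(i, 10, 3)[0] for i in range(10)]
owners_cyclic = [cyclic_local(i, 10, 3)[0] for i in range(10)]
```

How efficiently a compiler enumerates exactly the local indices a processor executes for a strided or triangular loop, rather than testing ownership element by element, is the kind of difference the benchmark suite exposes.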

Article ID: CPE646

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Achieving performance under OpenMP on ccNUMA and software distributed shared memory systems
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.646
Article ID CPE646
Author Name(s) B. Chapman1F. Bregier2A. Patil3A. Prabhakar4
Author Email(s) chapman@cs.uh.edu1
Affiliation(s) Department of Computer Science, University of Houston, Houston, TX 77204-3010, U.S.A. 1 2 3 4
Keyword(s) shared memory parallel programming, OpenMP, ccNUMA architectures, restructuring, data locality, data distribution, software distributed shared memory,
Abstract
OpenMP is emerging as a viable high-level programming model for shared memory parallel systems. It was conceived to enable easy, portable application development on this range of systems, and it has also been implemented on cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. Unfortunately, it is hard to obtain high performance on the latter architecture, particularly when large numbers of threads are involved. In this paper, we discuss the difficulties faced when writing OpenMP programs for ccNUMA systems, and explain how the vendors have attempted to overcome them. We focus on one such system, the SGI Origin 2000, and perform a variety of experiments designed to illustrate the impact of the vendor's efforts. We compare codes written in a standard, loop-level parallel style under OpenMP with alternative versions written in a Single Program Multiple Data (SPMD) fashion, also realized via OpenMP, and show that the latter consistently provides superior performance. A carefully chosen set of language extensions can help us translate programs from the former style to the latter (or to compile directly, but in a similar manner). Syntax for these extensions can be borrowed from HPF, and some aspects of HPF compiler technology can help the translation process. It is our expectation that an extended language, if well compiled, would improve the attractiveness of OpenMP as a language for high-performance computation on an important class of modern architectures. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE644

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Compatibility comparison and performance evaluation for Japanese HPF compilers using scientific applications
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.644
Article ID CPE644
Author Name(s) H. Sakagami1T. Mizuno2
Author Email(s) sakagami@comp.eng.himeji-tech.ac.jp1
Affiliation(s) Computer Engineering, Himeji Institute of Technology, 2167 Shosha, Himeji, Hyogo, 671-2201, Japan 1 2
Keyword(s) HPF, source compatibility, benchmarking, fluid code, particle code,
Abstract
The lack of compatibility of High-Performance Fortran (HPF) between vendor implementations has discouraged scientific application users and hindered the development of portable programs. Thus parallel computing is still unpopular in the computational science community, even though parallel programming is common in the computer science community. As users would like to run the same source code on parallel machines with different architectures as fast as possible, we have investigated the source-code compatibility of Japanese HPF compilers (NEC, Fujitsu and Hitachi) with two real-world applications: a 3D fluid code and a 2D particle code. We have found that source-level compatibility between the Japanese HPF compilers is almost preserved, but more effort will be needed to achieve complete compatibility. We have also evaluated parallel performance and found that HPF can achieve good performance for the 3D fluid code with almost the same source code. For the 2D particle code, good results have also been obtained with a small number of processors, but some changes to the original source code and the addition of interface blocks are required. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE643

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title High-performance numerical pricing methods
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.643
Article ID CPE643
Author Name(s) Hans Moritsch1Siegfried Benkner2
Author Email(s) hans.moritsch@univie.ac.at1
Affiliation(s) Department of Business Studies, University of Vienna, Brünner Strasse 72, A-1210 Vienna, Austria 1University of Vienna, Institute for Software Science, Liechtensteinstrasse 22, A-1090, Vienna, Austria 2
Keyword(s) numerical finance, derivative pricing, parallelization, HPF, SMP clusters, MPI, OpenMP,
Abstract
The pricing of financial derivatives is an important field in finance and constitutes a major component of financial management applications. The uncertainty of future events often makes analytic approaches infeasible and, hence, time-consuming numerical simulations are required. In the Aurora Financial Management System, pricing is performed on the basis of lattice representations of stochastic multidimensional scenario processes using the Monte Carlo simulation and Backward Induction methods, the latter allowing for the exploitation of shared-memory parallelism. We present the parallelization of a Backward Induction numerical pricing kernel on a cluster of SMPs using HPF+, an extended version of High-Performance Fortran. Based on language extensions for specifying a hierarchical mapping of data onto an SMP cluster, the compiler generates a hybrid-parallel program combining distributed-memory and shared-memory parallelism. We outline the parallelization strategy adopted by the VFC compiler and present an experimental evaluation of the pricing kernel on an NEC SX-5 vector supercomputer and a Linux SMP cluster, comparing a pure MPI version to a hybrid-parallel MPI/OpenMP version. Copyright © 2002 John Wiley & Sons, Ltd.
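The Backward Induction method named above works from the terminal payoffs of a lattice back to its root, discounting expected values level by level. A minimal sketch on a recombining binomial lattice (a deliberately simplified stand-in for the paper's multidimensional scenario lattices; all parameters below are assumed toy values, not from the paper):

```python
# Backward-induction pricing sketch on a recombining binomial lattice.
# values[j] at a level holds the option value after j up-moves.

def price_european_call(s0, k, up, down, p_up, rate, steps):
    # Terminal payoffs of a European call at each lattice node.
    values = [max(s0 * up ** j * down ** (steps - j) - k, 0.0)
              for j in range(steps + 1)]
    disc = 1.0 / (1.0 + rate)
    # Walk backward: each node is the discounted expectation of its
    # two successors under the up-move probability p_up.
    for step in range(steps, 0, -1):
        values = [disc * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
                  for j in range(step)]
    return values[0]

price = price_european_call(100.0, 100.0, 1.1, 0.9, 0.5, 0.02, 3)
```

Each backward level depends only on the level after it, while all nodes within a level are independent; that per-level independence is what makes the method amenable to the shared-memory parallelism the abstract mentions.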

Article ID: CPE642

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Optimization of element-by-element FEM in HPF 1.1
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.642
Article ID CPE642
Author Name(s) Hiroshi Okuda1Norihisa Anan2
Author Email(s) okuda@q.t.u-tokyo.ac.jp1
Affiliation(s) Department of Quantum Engineering and Systems Science, University of Tokyo, 7-3-1 Hongo, Bunkyou-ku, Tokyo 113-8656, Japan 1Department of Mechanical Engineering and Materials Science, Yokohama National University, 79-5 Tokiwadai, Hodogaya-ku, Yokohama 240-8501, Japan 2
Keyword(s) element-by-element FEM, HPF 1.1, HPF/SX, shrunk and nonshrunk array, MPI,
Abstract
In this study, Poisson's equation is numerically evaluated by the element-by-element (EBE) finite-element method in a parallel environment using HPF 1.1 (High-Performance Fortran). In order to achieve high parallel efficiency, the data structures have been altered to node-based data instead of mixtures of node- and element-based data, representing a node-based EBE finite-element scheme (nEBE). The parallel machine used in this study was the NEC SX-4, and experiments were performed on a single node having 32 processors sharing common memory. The HPF compiler used in the experiments is HPF/SX Rev 2.0 released in 1997 (unofficial), which supports HPF 1.1. Models containing approximately 200 000 and 1 500 000 degrees of freedom were analyzed in order to evaluate the method. The calculation time, parallel efficiency, and memory used were compared. The performance of HPF in the conjugate gradient solver for the large model, using the NEC SX-4 compiler option -noshrunk, was about 85% that of the message passing interface. Copyright © 2002 John Wiley & Sons, Ltd.
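The element-by-element idea, in which the global stiffness matrix is never assembled and each element's small local matrix is applied directly inside the iterative solver, can be sketched for a 1D Poisson model problem. This is plain Python on a deliberately tiny mesh, an illustration of the EBE technique only, not the paper's nEBE data structures or HPF code.

```python
def ebe_matvec(u, n_elem, h):
    """Element-by-element product A*u for 1D Poisson with linear
    elements on a uniform mesh: the global matrix is never assembled;
    each element adds its 2x2 local stiffness contribution."""
    v = [0.0] * (n_elem + 1)
    k = 1.0 / h                      # local stiffness is k * [[1,-1],[-1,1]]
    for e in range(n_elem):          # loop over elements (parallelizable)
        v[e]     += k * (u[e] - u[e + 1])
        v[e + 1] += k * (u[e + 1] - u[e])
    return v

def ebe_cg(f, n_elem, h, tol=1e-12, max_iter=1000):
    """Conjugate gradients driven by the EBE matvec; homogeneous
    Dirichlet boundaries are imposed by treating rows 0 and n as identity."""
    def apply_A(u):
        v = ebe_matvec(u, n_elem, h)
        v[0], v[-1] = u[0], u[-1]
        return v
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    x = [0.0] * (n_elem + 1)
    r = [fi - vi for fi, vi in zip(f, apply_A(x))]
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For -u'' = 1 on [0, 1] with zero boundary values, the nodal solution of this sketch reproduces the exact u(x) = x(1-x)/2.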

Article ID: CPE641

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Three-dimensional global MHD simulation code for the Earth's magnetosphere using HPF/JA
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.641
Article ID CPE641
Author Name(s) Tatsuki Ogino1
Author Email(s) ogino@stelab.nagoya-u.ac.jp1
Affiliation(s) Solar-Terrestrial Environment Laboratory, Nagoya University, 3-13 Honohara, Toyokawa, Aichi 442-8507, Japan 1
Keyword(s) HPF, VPP Fortran, MHD simulation,
Abstract
We have translated a three-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code had already been fully vectorized and fully parallelized in VPP Fortran. The overall performance and capability of the HPF MHD code proved almost comparable to those of the VPP Fortran version. A three-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5%, using 56 processing elements of the Fujitsu VPP5000/56 in vector and parallel computation, which permitted comparison with catalog values. We conclude that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and that a code in HPF/JA can be expected to perform comparably to the same code written in VPP Fortran. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE640

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title VPP Fortran and the design of HPF/JA extensions
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.640
Article ID CPE640
Author Name(s) Hidetoshi Iwashita1Naoki Sueyasu2Sachio Kamiya3Matthijs van Waveren4
Author Email(s) iwashita@lp.nm.fujitsu.co.jp1
Affiliation(s) Strategy and Technology Division, Software Group, Fujitsu Ltd, 140 Miyamoto, Numazu-shi, Shizuoka 410-0396, Japan 1 2 3 Fujitsu European Centre for Information Technology Ltd, Hayes Park Central, Hayes End Road, Hayes, Middlesex UB4 8FE, UK 4
Keyword(s) parallel computing, parallel languages, benchmark, asynchronous communication, data locality,
Abstract
VPP Fortran is a data parallel language that has been designed for the VPP series of supercomputers. In addition to pure data parallelism, it contains certain low-level features that were designed to extract high performance from user programs. A comparison of VPP Fortran and High-Performance Fortran (HPF) 2.0 shows that these low-level features are not available in HPF 2.0. The features include asynchronous inter-processor communication, explicit shadow, and the LOCAL directive. They were shown in VPP Fortran to be very useful in handling real-world applications, and they have been included in the HPF/JA extensions. They are described in the paper. The HPF/JA Language Specification Version 1.0 is an extension of HPF 2.0 to achieve practical performance for real-world applications and is a result of collaboration in the Japan Association for HPF (JAHPF). Some practical programming and tuning procedures with the HPF/JA Language Specification are described, using the NAS Parallel Benchmark BT as an example. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE639

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Implementation and evaluation of HPF/SX V2
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.639
Article ID CPE639
Author Name(s) Hitoshi Murai1Takuya Araki2Yasuharu Hayashi3Kenji Suehiro4Yoshiki Seo5
Author Email(s) murai@es.jamstec.go.jp1
Affiliation(s) 1st Computers Software Division, NEC Solutions, 10 Nisshin-cho 1-chome, Fuchu, Tokyo 183-8501, Japan 1HPC Technology Group, NEC Laboratories, 1-1 Miyazaki 4-chome, Miyamae-ku, Kawasaki, Kanagawa 216-0033, Japan 2 3 4 5
Keyword(s) HPF, compiler, parallelization, benchmark,
Abstract
We are developing HPF/SX V2, a High Performance Fortran (HPF) compiler for vector parallel machines. It provides some unique extensions as well as the features of HPF 2.0 and HPF/JA. In particular, this paper describes four of them: (1) the ON directive of HPF 2.0; (2) the REFLECT and LOCAL directives of HPF/JA; (3) vectorization directives; and (4) automatic parallelization. We evaluate these features through some benchmark programs on the NEC SX-5. The results show that each of them achieved a 5-8 times speedup in 8-CPU parallel execution and that all four features are useful for vector parallel execution. We also evaluate the overall performance of HPF/SX V2 by using over 30 well-known benchmark programs from HPFBench, the APR Benchmarks, the GENESIS Benchmarks, and the NAS Parallel Benchmarks. About half of the programs showed good performance, while the other half revealed weaknesses of the compiler, especially in its runtime routines, which need to be improved before the compiler can be put to practical use. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE638

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Code generator for the HPF Library and Fortran 95 transformational functions
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.638
Article ID CPE638
Author Name(s) Matthijs van Waveren1Cliff Addison2Peter Harrison3Dave Orange4Norman Brown5Hidetoshi Iwashita6
Author Email(s) waveren@fecit.co.uk1
Affiliation(s) Fujitsu European Centre for Information Technology Ltd, Hayes Park Central, Hayes End Road, Hayes, Middlesex UB4 8FE, U.K. 1 2 3 N.A. Software Ltd, Roscoe House, 62 Roscoe Street, Liverpool, Merseyside L1 9DW, U.K. 4 5 Strategic Planning Division, Software Group, Fujitsu Ltd, 140 Miyamoto, Numazu-shi, Shizuoka 410-0396, Japan 6
Keyword(s) parallel computing, parallel languages, code generation, library functions, parameterized templates, matrix multiplication,
Abstract
One of the language features of the core language of HPF 2.0 (High Performance Fortran) is the HPF Library. The HPF Library consists of 55 generic functions. The implementation of this library presents the challenge that all data types, data kinds, array ranks and input distributions need to be supported. For instance, more than 2 billion separate functions are required to support COPY_SCATTER fully. Efficiently supporting these billions of specific functions is one of the outstanding problems of HPF. We have solved this problem by developing a library generator which utilizes the mechanism of parameterized templates. This mechanism allows the procedures to be instantiated at compile time for arguments with a specific type, kind, rank and distribution over a specific processor array. We describe the algorithms used in the different library functions. The implementation makes it easy to generate a large number of library routines from a single template. The templates can be extended with special code for specific combinations of the input arguments. We describe in detail the implementation and performance of the matrix multiplication template for the Fujitsu VPP5000 platform. Copyright © 2002 John Wiley & Sons, Ltd.
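The combinatorial explosion behind the 'more than 2 billion' figure can be sketched with purely illustrative counts; the numbers of types, kinds, ranks and distributions below are assumptions chosen for the example, not the HPF Library's actual parameter space.

```python
# Hypothetical parameter space for one array argument of a generic
# HPF Library function. The real "more than 2 billion" COPY_SCATTER
# figure arises the same way: independent choices of type, kind, rank
# and distribution multiply, and they multiply again per argument.
types_kinds = 10      # e.g. integer/real/complex/logical at several kinds
ranks = 7             # Fortran array ranks 1..7
distributions = 4     # e.g. BLOCK, CYCLIC, CYCLIC(k), collapsed

per_argument = types_kinds * ranks * distributions
print(per_argument)   # variants for a single array argument

# Several array arguments vary independently, so the count is
# per_argument raised to the number of array arguments:
for n_args in (2, 3, 4):
    print(n_args, per_argument ** n_args)
```

Even with these modest per-axis counts, four independent array arguments already exceed the 2 billion mark, which is why instantiating templates at compile time for only the combinations actually used is attractive.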

Article ID: CPE637

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title HPF/JA: extensions of High Performance Fortran for accelerating real-world applications
Volume ID 14
Issue ID 8-9
Date July 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.637
Article ID CPE637
Author Name(s) Yoshiki Seo1Hidetoshi Iwashita2Hiroshi Ohta3Hitoshi Sakagami4
Author Email(s) seo@ccm.cl.nec.co.jp1
Affiliation(s) Internet Systems Research Laboratory, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216-8555, Japan 1Strategy and Technology Division, Software Group, Fujitsu Ltd, 140 Miyamoto, Numazu, Shizuoka 410-0396, Japan 2Information & Telecommunication Systems, Hitachi Ltd, Kanda-Surugadai 4-6, Chiyoda-ku, Tokyo 101-8010, Japan 3Computer Engineering, Himeji Institute of Technology, 2167 Shosha, Himeji, Hyogo 671-2201, Japan 4
Keyword(s) High Performance Fortran, parallel processing, compiler, data parallel language, supercomputer,
Abstract
This paper presents a set of extensions to High Performance Fortran (HPF) to make it more usable for parallelizing real-world production codes. HPF has been effective for programs that a compiler can automatically optimize efficiently; when the compiler cannot, however, users have had no way to parallelize or optimize their programs explicitly. To resolve this situation, we have developed a set of HPF extensions (HPF/JA) that give users more control over sophisticated parallelization and communication optimizations. They include parallelization of loops with complicated reductions, asynchronous communication, user-controllable shadow, and communication pattern reuse for irregular remote data accesses. Preliminary experiments have shown that the extensions are effective at increasing HPF's usability. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE633

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Lesser Bear: A lightweight process library for SMP computers - scheduling mechanism without a lock operation
Volume ID 14
Issue ID 10
Date Aug 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.633
Article ID CPE633
Author Name(s) Hisashi Oguma1Yasuichi Nakayama2
Author Email(s) oguma-h@igo.cs.uec.ac.jp1 yasu@cs.uec.ac.jp2
Affiliation(s) NTT DoCoMo R&D Center, Yokosuka, Kanagawa 239-8536, Japan 1Department of Computer Science, The University of Electro-Communications, Chofu, Tokyo 182-8585, Japan 2
Keyword(s) thread library, SMP computer, parallelism, scheduler design,
Abstract
We have designed and implemented a lightweight process (thread) library called ‘Lesser Bear’ for SMP computers. Lesser Bear provides thread-level parallelism and high portability. It executes threads in parallel by creating UNIX processes as virtual processors and a memory-mapped file as a huge shared-memory space. To schedule threads in parallel, the shared-memory space is divided into working spaces, one for each virtual processor, and the ready queue is distributed among them. The previous version of Lesser Bear, however, sometimes required a lock operation for dequeueing. We therefore propose a scheduling mechanism that requires no lock operation: each divided space forms a link topology through the queues, and a lock-free algorithm is used for the queue operations. This mechanism has been applied to Lesser Bear and evaluated experimentally. Copyright © 2002 John Wiley & Sons, Ltd.
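The distributed ready-queue idea can be sketched as follows. Python stands in for the library's C-level implementation, and CPython's atomic deque operations stand in for the paper's lock-free queue algorithm, so this is an illustration of the ring-of-queues structure, not the actual mechanism.

```python
from collections import deque

class RingScheduler:
    """Sketch of the distributed ready-queue idea: one queue per
    virtual processor, linked in a ring. A VP dequeues from its own
    queue first and, if that is empty, walks the ring to the next
    VP's queue. CPython's deque.append/popleft are atomic, standing
    in for a true lock-free queue operation (a simplification)."""

    def __init__(self, n_vp):
        self.queues = [deque() for _ in range(n_vp)]
        self.n_vp = n_vp

    def enqueue(self, vp, thread):
        self.queues[vp].append(thread)       # producer side, no lock taken

    def dequeue(self, vp):
        # try the local queue, then follow the ring around all VPs
        for i in range(self.n_vp):
            q = self.queues[(vp + i) % self.n_vp]
            try:
                return q.popleft()           # atomic pop, no lock taken
            except IndexError:
                continue
        return None                          # nothing runnable anywhere
```

An idle virtual processor thus finds work queued at any other virtual processor without a global lock, at the cost of walking the ring.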

Article ID: CPE697

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Efficient communication using message prediction for clusters of multiprocessors
Volume ID 14
Issue ID 10
Date Aug 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.697
Article ID CPE697
Author Name(s) Ahmad Afsahi1Nikitas J. Dimopoulos2
Author Email(s) ahmad@ee.queensu.ca1
Affiliation(s) Department of Electrical and Computer Engineering, Queen's University, Kingston, Canada K7L 3N6 1Department of Electrical and Computer Engineering, University of Victoria, P.O. Box 3055, Victoria, Canada V8W 3P6 2
Keyword(s) message prediction, clusters, communication locality, message passing interface (MPI), zero-copy,
Abstract
With the increasing uniprocessor and symmetric multiprocessor computational power available today, interprocessor communication has become an important factor that limits the performance of clusters of workstations/multiprocessors. Many factors, including communication hardware overhead, communication software overhead, and the user environment overhead (multithreading, multiuser), affect the performance of the communication subsystems in such systems. A significant portion of the software communication overhead is due to a number of message copying operations. Ideally, it is desirable to have a true zero-copy protocol, in which the message is moved directly from the send buffer in its user space to the receive buffer in the destination without any intermediate buffering. However, because message-passing applications at the send side do not know the final receive buffer addresses, early-arriving messages have to be buffered in a temporary area. In this paper, we show that message-passing applications exhibit a message reception communication locality. We have utilized this communication locality and devised different message predictors at the receiver sides of communications. In essence, these message predictors can be used efficiently to drain the network and cache the incoming messages even if the corresponding receive calls have not yet been posted. The performance of these predictors, in terms of hit ratio, on some parallel applications is quite promising and suggests that prediction has the potential to eliminate most of the remaining message copies. We also show that the proposed predictors are not sensitive to the starting message reception call, and that they perform better than (or at least equal to) our previously proposed predictors. Copyright © 2002 John Wiley & Sons, Ltd.
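The message reception locality that the abstract exploits can be illustrated with a toy predictor: predict that the message following tag t will be the one that followed t last time, a first-order context model measured by its hit ratio. This is a simplification in the spirit of the paper's predictors, not its actual algorithms.

```python
def hit_ratio(trace):
    """First-order context predictor for message reception: predict
    that the tag following tag t is whatever followed t last time,
    and report the fraction of correct predictions."""
    follows = {}        # tag -> tag observed to follow it last time
    hits = total = 0
    prev = None
    for tag in trace:
        if prev is not None:
            total += 1
            if follows.get(prev) == tag:
                hits += 1               # prediction was correct
            follows[prev] = tag         # update the context table
        prev = tag
    return hits / total if total else 0.0

# A repetitive reception pattern, as in iterative parallel codes: after
# the first cycle is learned, almost every arrival is predicted.
trace = [0, 1, 2, 3] * 50
print(hit_ratio(trace))
```

On a repetitive trace like this the hit ratio approaches 1, which is the sense in which predicted messages could be placed directly in their final buffers.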

Article ID: CPE630

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels
Volume ID 14
Issue ID 10
Date Aug 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.630
Article ID CPE630
Author Name(s) Vinod Valsalam1Anthony Skjellum2
Author Email(s) tony@cs.msstate.edu2
Affiliation(s) High Performance Computing Laboratory, Department of Computer Science, Mississippi State University, MS 39762, U.S.A. 1 2
Keyword(s) matrix multiplication, hierarchical matrix storage, Morton order, polyalgorithms, Strassen's algorithm, kernel interface,
Abstract
Despite extensive research, optimal performance has not easily been available previously for matrix multiplication (especially for large matrices) on most architectures because of the lack of a structured approach and the limitations imposed by matrix storage formats. A simple but effective framework is presented here that lays the foundation for building high-performance matrix-multiplication codes in a structured, portable and efficient manner. The resulting codes are validated on three different representative RISC and CISC architectures on which they significantly outperform highly optimized libraries such as ATLAS and other competing methodologies reported in the literature. The main component of the proposed approach is a hierarchical storage format that efficiently generalizes the applicability of the memory hierarchy friendly Morton ordering to arbitrary-sized matrices. The storage format supports polyalgorithms, which are shown here to be essential for obtaining the best possible performance for a range of problem sizes. Several algorithmic advances are made in this paper, including an oscillating iterative algorithm for matrix multiplication and a variable recursion cutoff criterion for Strassen's algorithm. The authors expose the need to standardize linear algebra kernel interfaces, distinct from the BLAS, for writing portable high-performance code. These kernel routines operate on small blocks that fit in the L1 cache. The performance advantages of the proposed framework can be effectively delivered to new and existing applications through the use of object-oriented or compiler-based approaches. Copyright © 2002 John Wiley & Sons, Ltd.
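The Morton (Z-order) indexing at the heart of the storage format can be sketched by bit interleaving. Note that plain bit interleaving only covers power-of-two block grids; generalizing it to arbitrary-sized matrices is precisely the paper's contribution, which this sketch does not attempt.

```python
def morton_index(row, col):
    """Interleave the bits of (row, col) to get the Morton (Z-order)
    index: blocks that are close in 2-D stay close in memory, which
    is what makes the layout friendly to the memory hierarchy."""
    z = 0
    for bit in range(16):                    # enough for 2^16 x 2^16 blocks
        z |= (row >> bit & 1) << (2 * bit + 1)
        z |= (col >> bit & 1) << (2 * bit)
    return z

# The four quadrants of a 2x2 block grid appear in Z order:
print([morton_index(r, c) for r in range(2) for c in range(2)])  # [0, 1, 2, 3]
```

Storing L1-sized blocks in this order means a recursive multiplication visits blocks in nearly the order they are laid out, at every level of the cache hierarchy at once.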

Article ID: CPE702

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Workload decomposition strategies for hierarchical distributed-shared memory parallel systems and their implementation with integration of high-level parallel languages
Volume ID 14
Issue ID 11
Date Aug 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.702
Article ID CPE702
Author Name(s) Sergio Briguglio1Beniamino Di Martino2Gregorio Vlad3
Author Email(s) briguglio@frascati.enea.it1
Affiliation(s) Associazione Euratom-ENEA sulla Fusione, C.R. Frascati, C.P. 65 - I-00044 - Frascati, Rome, Italy 1Dip. Ingegneria dell'Informazione, Second University of Naples, Italy 2 3
Keyword(s) workload decomposition, distributed-shared memory parallel systems, clusters of symmetric multiprocessors, workload balancing, High-Performance Fortran, OpenMP,
Abstract
In this paper we address the issue of workload decomposition in programming hierarchical distributed-shared memory parallel systems. The workload decomposition we have devised consists of a two-stage procedure: a higher-level decomposition among the computational nodes; and a lower-level one among the processors of each computational node. By focusing on porting of a case study particle-in-cell application, we have implemented the described work decomposition without large programming effort by using and integrating the high-level language extensions High-Performance Fortran and OpenMP. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE701

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Deadlock detection in MPI programs
Volume ID 14
Issue ID 11
Date Aug 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.701
Article ID CPE701
Author Name(s) Glenn R. Luecke1Yan Zou2James Coyle3Jim Hoekstra4Marina Kraeva5
Author Email(s) grl@iastate.edu1
Affiliation(s) High Performance Computing Group, Iowa State University, Ames, IA 50011-2251, U.S.A. 1 2 3 4 5
Keyword(s) Message-Passing Interface, deadlock detection, handshake strategy,
Abstract
The Message-Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines. Copyright © 2002 John Wiley & Sons, Ltd.
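The core idea of detecting an actual deadlock can be illustrated with a wait-for graph: each blocked process points at the process it is waiting on, and a cycle means no process in the cycle can ever proceed. MPI-CHECK itself uses a handshake strategy on instrumented Fortran source, so this small Python sketch is a generic illustration of the principle, not the tool's method.

```python
def find_deadlock(waits):
    """Detect a cycle in a wait-for graph mapping each process to the
    process it is blocked on (None if not blocked). A cycle such as
    0 -> 1 -> 0 corresponds to, e.g., two unbuffered blocking
    MPI_Send calls that can never complete."""
    for start in waits:
        seen = set()
        node = start
        while node is not None and node not in seen:
            seen.add(node)
            node = waits.get(node)
        if node is not None:      # walk revisited a node: cycle found
            return True
    return False

# Two processes each blocked sending to the other (no buffering):
print(find_deadlock({0: 1, 1: 0}))     # True
# A send matched by a posted receive:
print(find_deadlock({0: 1, 1: None}))  # False
```

Potential deadlocks (those that appear only under some buffering or timing) require the stronger send/receive matching analysis the paper describes.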

Article ID: CPE700

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel computation on interval graphs: algorithms and experiments
Volume ID 14
Issue ID 11
Date Aug 25 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.700
Article ID CPE700
Author Name(s) A. Ferreira1I. Guérin Lassous2K. Marcus3A. Rau-Chaplin4
Author Email(s) ferreira@sophia.inria.fr1
Affiliation(s) CNRS, Projet Mascotte, BP 93, F-06902 Sophia Antipolis, France 1INRIA - LIP, ENS Lyon, 46, allée d'Italie, F-69364 Lyon Cedex 07, France 2Eurecom, 2229, route des Cretes, BP 193, 06901 Sophia Antipolis Cedex, France 3Faculty of Computer Science, Dalhousie University, P.O. Box 1000, Halifax NS, Canada B3J 2X4 4
Keyword(s) coarse grain, interval graphs, parallel algorithms, practical experiments,
Abstract
This paper describes efficient coarse-grained parallel algorithms and implementations for a suite of interval graph problems. Included are algorithms requiring only a constant number of communication rounds for connected components, maximum weighted clique, and breadth-first-search and depth-first-search trees, as well as algorithms requiring $O(\log p)$ communication rounds for optimization problems such as minimum interval covering, maximum independent set and minimum dominating set, where $p$ is the number of processors in the parallel system. This implies that the number of communication rounds is independent of the problem size. Implementations of these algorithms are evaluated on parallel clusters, using both Fast Ethernet and Myrinet interconnection networks, and on a CRAY T3E parallel multicomputer, with extensive experimental results being presented and analyzed. Copyright © 2002 John Wiley & Sons, Ltd.
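One of the listed optimization problems, maximum independent set, has a well-known greedy solution on interval graphs that makes a compact sequential reference point; the paper's contribution is solving such problems in $O(\log p)$ communication rounds across $p$ processors, which this one-processor sketch does not attempt.

```python
def max_independent_set(intervals):
    """Maximum independent set on an interval graph: repeatedly take
    the interval with the smallest right endpoint and discard every
    interval overlapping it. Intervals are (left, right) pairs; the
    greedy choice is provably optimal on interval graphs."""
    chosen = []
    last_end = float('-inf')
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_end:          # does not overlap anything chosen
            chosen.append((left, right))
            last_end = right
    return chosen

print(max_independent_set([(1, 3), (2, 5), (4, 7), (6, 9), (8, 10)]))
```

The same sort-then-sweep structure is what makes these problems amenable to coarse-grained parallelization: after a parallel sort, each processor sweeps its own contiguous slab and only boundary information crosses processors.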

Article ID: CPE703

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Simulating multiple inheritance in Java
Volume ID 14
Issue ID 12
Date Oct 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.703
Article ID CPE703
Author Name(s) Douglas Lyon1
Author Email(s) lyon@docjava.com1
Affiliation(s) Computer Engineering Department, Fairfield University, Fairfield, CT 06430, U.S.A. 1
Keyword(s) software reverse engineering, reverse engineering, reflection, delegation, Java, automatic code generation,
Abstract
The CentiJ system automatically generates code that simulates multiple inheritance in Java. The generated code inputs a series of instances and outputs specifications that can be combined using multiple inheritance. The multiple inheritance of implementation is obtained by simple message forwarding. The reflection API of Java is used to reverse engineer the instances, and so the program can generate source code, but does not require source code on its input. Advantages of CentiJ include compile-time type checking, speed of execution, automatic disambiguation (name space collision resolution) and ease of maintenance. Simulation of multiple inheritance was previously available only to Java programmers who performed manual delegation or who made use of dynamic proxies. The technique has been applied at a major aerospace corporation. Copyright © 2002 John Wiley & Sons, Ltd.
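The message-forwarding idea behind CentiJ can be sketched dynamically in Python. CentiJ itself generates static, compile-time-checked forwarding methods in Java source; the `Bridge` class and the example classes below are hypothetical names illustrating only the delegation pattern.

```python
class Bridge:
    """Simulate multiple inheritance by message forwarding: a Bridge
    holds several delegate instances and forwards any attribute
    lookup to the first delegate that provides it (first-match-wins
    is a crude stand-in for CentiJ's name-collision resolution)."""

    def __init__(self, *delegates):
        self._delegates = delegates

    def __getattr__(self, name):
        # only called when normal lookup fails, so _delegates is safe
        for d in self._delegates:
            if hasattr(d, name):
                return getattr(d, name)
        raise AttributeError(name)

class Swimmer:
    def swim(self):
        return "swimming"

class Flyer:
    def fly(self):
        return "flying"

# behaves as if it inherited from both classes
duck = Bridge(Swimmer(), Flyer())
print(duck.swim(), duck.fly())
```

Generating the forwarding methods statically, as CentiJ does via reflection over compiled classes, recovers compile-time type checking and call speed that this dynamic version gives up.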

Article ID: CPE707

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A quality-of-service-based framework for creating distributed heterogeneous software components
Volume ID 14
Issue ID 12
Date Oct 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.707
Article ID CPE707
Author Name(s) Rajeev R. Raje1Barrett R. Bryant2Andrew M. Olson3Mikhail Auguston4Carol Burt5
Author Email(s) rraje@cs.iupui.edu1
Affiliation(s) Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 W. Michigan Street, SL 280, Indianapolis, IN 46202, U.S.A. 1Department of Computer and Information Sciences, The University of Alabama at Birmingham, 1300 University Boulevard, Birmingham, AL 35294-1170, U.S.A. 2 3 Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, U.S.A. 4 5
Keyword(s) distributed systems, quality of service, generative domain models, heterogeneous components, formal methods, two-level grammar,
Abstract
Component-based software development offers a promising solution for taming the complexity found in today's distributed applications. Today's and future distributed software systems will certainly require combining heterogeneous software components that are geographically dispersed. For the successful deployment of such a software system, it is necessary that its realization, based on assembling heterogeneous components, not only meets the functional requirements, but also satisfies the non-functional criteria such as the desired quality of service (QoS). In this paper, a framework based on the notions of a meta-component model, a generative domain model and QoS parameters is described. A formal specification based on two-level grammar is used to represent these notions in a tightly integrated way so that QoS becomes a part of the generative domain model. A simple case study is described in the context of this framework. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE699

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Dynamically adapting to system load and program behavior in multiprogrammed multiprocessor systems
Volume ID 14
Issue ID 12
Date Oct 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.699
Article ID CPE699
Author Name(s) Iffat H. Kazi1David J. Lilja2
Author Email(s) iffat.kazi@sun.com1
Affiliation(s) Sun Microsystems Inc., 901 San Antonio Road, Palo Alto, CA 94303, U.S.A. 1Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, U.S.A. 2
Keyword(s) processor allocation, dynamic adaption, loop-level parallelism, shared memory multiprocessor system, multiprogramming,
Abstract
Parallel execution of application programs on a multiprocessor system may lead to performance degradation if the workload of a parallel region is not large enough to amortize the overheads associated with the parallel execution. Furthermore, if too many processes are running on the system in a multiprogrammed environment, the performance of the parallel application may degrade due to resource contention. This work proposes a comprehensive dynamic processor allocation scheme that takes both program behavior and system load into consideration when dynamically allocating processors. This mechanism was implemented on the Solaris operating system to dynamically control the execution of parallel C and Java application programs. Performance results show the effectiveness of this scheme in dynamically adapting to the current execution environment and program behavior, and that it outperforms a conventional time-shared system. Copyright © 2002 John Wiley & Sons, Ltd.
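The two inputs to the allocation decision, parallel-region workload and current system load, can be combined in a toy decision rule. The thresholds and the function below are invented for illustration; the paper's scheme is a more elaborate runtime mechanism implemented inside Solaris.

```python
def processors_to_allocate(workload, overhead, system_load, max_procs):
    """Toy decision rule in the spirit of dynamic processor allocation
    (not the paper's actual policy): use extra processors only when
    the region's workload amortizes the parallelization overhead, and
    back off when other processes already load the system."""
    if workload < overhead * 2:                    # too small: run serially
        return 1
    free = max(1, max_procs - int(system_load))    # leave room for others
    # never request more processors than amortizable chunks of work
    return min(free, max(1, workload // overhead))
```

A runtime system would reevaluate such a rule at each parallel region, which is what lets it adapt to both program behavior and multiprogrammed load.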

Article ID: CPE690

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Economic models for resource management and scheduling in Grid computing
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.690
Article ID CPE690
Author Name(s) Rajkumar Buyya1David Abramson2Jonathan Giddy3Heinz Stockinger4
Author Email(s) raj@cs.mu.oz.au1
Affiliation(s) Grid Computing and Distributed Systems (GRIDS) Lab., Department of Computer Science and Software Engineering, The University of Melbourne, 221 Bouverie St., Carlton, Melbourne, Australia 1CRC for Enterprise Distributed Systems Technology, School of Computer Science and Software Engineering, Monash University, Melbourne, Australia 2 3 CMS Experiment, Computing Group, CERN, European Organization for Nuclear Research, CH-1211 Geneva 23, Switzerland 4
Keyword(s) world-wide computing, grid economy, resource management, scheduling,
Abstract
The accelerated development in peer-to-peer and Grid computing has positioned them as promising next-generation computing platforms. They enable the creation of virtual enterprises for sharing resources distributed across the world. However, resource management, application development and usage models in these environments are complex undertakings. This is due to the geographic distribution of resources that are owned by different organizations or peers. The owners of each of these resources have different usage or access policies and cost models, and varying loads and availability. In order to address complex resource management issues, we have proposed a computational economy framework for resource allocation and for regulating supply and demand in Grid computing environments. This framework provides mechanisms for optimizing resource provider and consumer objective functions through trading and brokering services. In a real-world market, there exist various economic models for setting the price of services based on supply and demand and their value to the user. These include the commodity market, posted price, tender and auction models. In this paper, we discuss the use of these models for interaction between Grid components in deciding resource service value, and the infrastructure necessary to realize each model. In addition to the usual services offered by Grid computing systems, we need an infrastructure to support interaction protocols, allocation mechanisms, currency, secure banking and enforcement services. We briefly discuss existing technologies that provide some of these services and show their usage in developing the Nimrod-G grid resource broker. 
Furthermore, we demonstrate the effectiveness of some of the economic models in resource trading and scheduling using the Nimrod/G resource broker, with deadline and cost constrained scheduling for two different optimization strategies, on the World-Wide Grid testbed that has resources distributed across five continents. Copyright © 2002 John Wiley & Sons, Ltd.
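A deadline- and cost-constrained schedule of the kind the abstract evaluates can be caricatured in a few lines: visit resources from cheapest to dearest and give each as many jobs as it can finish before the deadline. This greedy sketch and its data model are illustrative assumptions, not Nimrod-G's actual scheduling algorithm.

```python
def cost_optimized_schedule(jobs, deadline, resources):
    """Greedy cost-minimizing schedule under a deadline (a sketch).

    jobs      -- job lengths in seconds on a baseline machine
    resources -- (price_per_second, speed) tuples
    Returns (assignment, total_cost), where assignment pairs each job
    with the price of the resource it ran on, or None if the deadline
    cannot be met with the given resources.
    """
    remaining = sorted(jobs)                    # shortest jobs first
    assignment, total_cost = [], 0.0
    for price, speed in sorted(resources):      # cheapest resources first
        busy = 0.0
        while remaining:
            t = remaining[0] / speed            # runtime of next job here
            if busy + t > deadline:
                break                           # this resource is full
            busy += t
            total_cost += t * price
            assignment.append((remaining.pop(0), price))
        if not remaining:
            return assignment, total_cost
    return None                                 # deadline infeasible
```

Swapping the objective (spend the budget to finish earliest, rather than meet the deadline cheapest) yields the second optimization strategy the abstract compares.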

Article ID: CPE689

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title An event service to support Grid computational environments
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.689
Article ID CPE689
Author Name(s) Geoffrey C. Fox1Shrideep Pallickara2
Author Email(s) sbpallic@ecs.syr.edu2
Affiliation(s) Department of Computer Science, Indiana University, IN, U.S.A. 1Department of Electrical Engineering and Computer Science, Syracuse University, NY, U.S.A. 2
Keyword(s) distributed messaging, publish-subscribe, guaranteed delivery, grid systems, peer-to-peer infrastructures, event distribution systems,
Abstract
We believe that it is interesting to study the system and software architecture of environments which integrate the evolving ideas of computational Grids, distributed objects, Web services, peer-to-peer (P2P) networks and message-oriented middleware. Such P2P Grids should seamlessly integrate users to themselves and to resources which are also linked to each other. We can abstract such environments as a distributed system of ‘clients’ which consist either of ‘users’ or ‘resources’ or proxies thereto. These clients must be linked together in a flexible, fault-tolerant, efficient, high-performance fashion. In this paper, we study the messaging or event system, termed the Grid Event Service (GES), that is appropriate to link the clients (both users and resources, of course) together. For our purposes (registering, transporting and discovering information), events are just messages, typically with time stamps. The messaging system GES must scale over a wide variety of devices, from handheld computers at one extreme to high-performance computers and sensors at the other. We have analyzed the requirements of several Grid services that could be built with this model, including computing and education, and incorporated constraints of collaboration with a shared event model. We suggest that generalizing the well-known publish-subscribe model is an attractive approach, and here we study some of the issues to be addressed if this model is used in GES. Copyright © 2002 John Wiley & Sons, Ltd.
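The publish-subscribe model that GES generalizes can be shown in its minimal form: clients subscribe to topics, publishers emit time-stamped events, and a broker routes each event to every matching subscriber. GES adds distribution, scaling and guaranteed delivery on top of this pattern; the class below is only the base pattern.

```python
import time
from collections import defaultdict

class EventBroker:
    """Minimal publish-subscribe broker: the pattern GES generalizes.
    Subscribers register callbacks per topic; publishers send events,
    which are just messages with time stamps, as in the abstract."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        event = {"topic": topic, "payload": payload, "ts": time.time()}
        for callback in self.subscribers[topic]:
            callback(event)        # deliver to every subscriber of topic

received = []
broker = EventBroker()
broker.subscribe("jobs/done", received.append)
broker.publish("jobs/done", "simulation 42 finished")
print(received[0]["payload"])
```

Decoupling producers from consumers this way is what lets the same event fabric serve handheld clients, collaboration sessions and high-performance resources alike.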

Article ID: CPE688

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Community software development with the Astrophysics Simulation Collaboratory
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.688
Article ID CPE688
Author Name(s) Gregor von Laszewski1Michael Russell2Ian Foster3John Shalf4Gabrielle Allen5Greg Daues6Jason Novotny7Edward Seidel8
Author Email(s) gregor@mcs.anl.gov1
Affiliation(s) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A. 1University of Chicago, Chicago, IL, U.S.A. 2 3 Lawrence Berkeley National Laboratory, Berkeley, CA, U.S.A. 4Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, Golm, Germany 5National Center for Supercomputing Applications, Champaign, IL, U.S.A. 6 7 8
Keyword(s) collaboratory, Grid computing, Globus, Cactus, Java CoG Kit, Astrophysics Simulation Collaboratory,
Abstract
We describe a Grid-based collaboratory that supports the collaborative development and use of advanced simulation codes. Our implementation of this collaboratory uses a mix of Web technologies (for thin-client access) and Grid services (for secure remote access to, and management of, distributed resources). Our collaboratory enables researchers in geographically dispersed locations to share and access compute, storage, and code resources, without regard to institutional boundaries. Specialized Grid services, such as online code repositories, support community code development. We use this framework to construct the Astrophysics Simulation Collaboratory, a domain-specific collaboratory for the astrophysics simulation community. This Grid-based collaboratory enables researchers in the field of numerical relativity to study astrophysical phenomena by using the Cactus computational toolkit. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE687

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Engineering an interoperable computational collaboratory on the Grid
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.687
Article ID CPE687
Author Name(s) Vijay Mann1Manish Parashar2
Author Email(s) manish@caip.rutgers.edu2
Affiliation(s) The Applied Software Systems Laboratory, Department of Electrical and Computer Engineering, Rutgers University, 94 Brett Road, Piscataway, NJ 08854, U.S.A. 1 2
Keyword(s) Grid services, problem solving environments, computational collaboratories, computational interaction and steering, middleware, interoperability,
Abstract
The growth of the Internet and the advent of the computational Grid have made it possible to develop and deploy advanced computational collaboratories. These systems build on high-end computational resources, communication technologies and enabling services underlying the Grid, and provide seamless and collaborative access to resources, applications and data. Combining these focused collaboratories and allowing them to interoperate has many advantages and can lead to truly collaborative, multidisciplinary and multi-institutional problem solving. However, integrating these collaboratories presents significant challenges, as each of these collaboratories has a unique architecture and implementation, and builds on different enabling technologies. This paper investigates the issues involved in integrating collaboratories operating on the Grid. It then presents the design and implementation of a prototype middleware substrate to enable a peer-to-peer integration of and global access to multiple, geographically distributed instances of the DISCOVER computational collaboratory. An experimental evaluation of the middleware substrate is presented. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE686

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A Web services data analysis Grid
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.686
Article ID CPE686
Author Name(s) William A. Watson1Ian Bird2Jie Chen3Bryan Hess4Andy Kowalski5Ying Chen6
Author Email(s) watson@jlab.org1
Affiliation(s) Thomas Jefferson National Accelerator Facility, 12000 Jefferson Avenue, Newport News, VA 23606, U.S.A. 1 2 3 4 5 6
Keyword(s) Web services, XML, Grid, data Grid, meta-center, portal,
Abstract
The trend in large-scale scientific data analysis is to exploit computational, storage and other resources located at multiple sites, and to make those resources accessible to the scientist as if they were a single, coherent system. Web technologies driven by the huge and rapidly growing electronic commerce industry provide valuable components to speed the deployment of such sophisticated systems. Jefferson Lab, where several hundred terabytes of experimental data are acquired each year, is in the process of developing a Web-based distributed system for data analysis and management. The essential aspects of this system are a distributed data Grid (site independent access to experimental, simulation and model data) and a distributed batch system, augmented with various supervisory and management capabilities, and integrated using Java and XML-based Web services. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE685

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A distributed computing environment for interdisciplinary applications
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.685
Article ID CPE685
Author Name(s) Jerry A. Clarke1Raju R. Namburu2
Author Email(s) clarke@arl.army.mil1
Affiliation(s) U.S. Army Research Laboratory, Aberdeen Proving Ground, MD 21005-5066, U.S.A. 1 2
Keyword(s) distributed interdisciplinary computing, data model and format, high-performance computing,
Abstract
Practical applications are generally interdisciplinary in nature. Current technology is well matured for addressing individual disciplines, but not for interdisciplinary applications. Hence, there is a need to couple the capabilities of several different computational disciplines to address these interdisciplinary applications. One approach is to use coupled or multi-physics software, which typically involves developing and validating the entire software spectrum for a specific application. This is extremely time consuming and delays the delivery of crucial capability to the end-user. The other approach is to integrate the software of individual, well-matured computational disciplines, thus taking advantage of existing scalable software, validation investments, and the tremendous developments in computational science. This integrated approach requires a consistent data model, data format, data management, seamless data movement, and robust, modular, scalable coupling algorithms. To address these requirements, we have developed a new flexible data exchange mechanism for high-performance computing (HPC) codes and tools, known as the eXtensible Data Model and Format (XDMF). XDMF is part of a larger effort known as the ‘Interdisciplinary Computing Environment’ (ICE). ICE provides computational engines with the data management, visualization, and user interface tools necessary to exist in a modern computing environment. Instead of imposing a new programming paradigm on HPC codes, XDMF uses the existing concept of file I/O for distributed coordination. XDMF incorporates Network Distributed Global Memory (NDGM), Hierarchical Data Format version 5 (HDF5), and the eXtensible Markup Language (XML) to provide a flexible yet efficient data exchange mechanism. This paper discusses the development and implementation of a distributed computing environment for interdisciplinary applications utilizing the concept of a common data hub. Also, the implementation of XDMF is demonstrated for a typical blast-structure interaction interdisciplinary application. Published in 2002 by John Wiley & Sons, Ltd.
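The XDMF idea of keeping light data (structure, dimensions) in XML while the bulk arrays stay in a heavy-data store such as HDF5 can be sketched as follows. The element and attribute names loosely follow XDMF conventions, but this is an illustrative sketch of the light/heavy split, not the full XDMF schema; the file path and dataset name are invented for the example.

```python
import xml.etree.ElementTree as ET

def make_descriptor(grid_name, dims, heavy_data_ref):
    # Light data (topology, dimensions, attribute names) lives in XML;
    # the bulk arrays stay in a heavy-data store referenced by path,
    # as XDMF does with HDF5.
    root = ET.Element("Xdmf")
    grid = ET.SubElement(root, "Grid", Name=grid_name)
    attr = ET.SubElement(grid, "Attribute", Name="Pressure")
    item = ET.SubElement(attr, "DataItem",
                         Dimensions=" ".join(map(str, dims)),
                         Format="HDF")
    item.text = heavy_data_ref  # hypothetical reference into an HDF5 file
    return ET.tostring(root, encoding="unicode")

xml_text = make_descriptor("BlastGrid", (128, 128), "results.h5:/blast/pressure")
```

A simulation code and a visualization tool can both parse the small XML descriptor to discover what data exists and where, without either imposing its in-memory layout on the other.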

Article ID: CPE683

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title pyGlobus: a Python interface to the Globus Toolkit™
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.683
Article ID CPE683
Author Name(s) Keith R. Jackson1
Author Email(s) KRJackson@lbl.gov1
Affiliation(s) Lawrence Berkeley National Laboratory, 1 Cyclotron Road, MS 50B-2239, U.S.A. 1
Keyword(s)
Abstract
Developing high-performance, problem-solving environments/applications that allow scientists to easily harness the power of the emerging national-scale ‘Grid’ infrastructure is currently a difficult task. Although many of the necessary low-level services, e.g. security, resource discovery, remote access to computation/data resources, etc., are available, it can be a challenge to rapidly integrate them into a new application. To address this difficulty we have begun the development of a Python-based high-level interface to the Grid services provided by the Globus Toolkit. In this paper we explain why rapid application development using Grid services is important, look briefly at a motivating example, and finally examine the design and implementation of the pyGlobus package. Copyright © 2002 John Wiley & Sons, Ltd.
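The value of a high-level scripting interface is that one call can wrap a multi-step low-level sequence (authenticate, then submit). The sketch below illustrates that facade pattern only; both classes and all names are hypothetical stand-ins, not the pyGlobus or Globus Toolkit APIs.

```python
class LowLevelJobService:
    """Hypothetical stand-in for a low-level Grid job service."""

    def authenticate(self, credential):
        # A real service would validate a security credential (e.g. a proxy).
        self._ok = credential == "valid-proxy"

    def submit(self, host, executable):
        if not getattr(self, "_ok", False):
            raise PermissionError("not authenticated")
        return f"{host}:{executable}:job-1"  # toy job identifier

class GridJob:
    """High-level facade: one call wraps the authenticate/submit sequence,
    mirroring the kind of simplification a scripting interface provides."""

    def __init__(self, credential):
        self._svc = LowLevelJobService()
        self._svc.authenticate(credential)

    def run(self, host, executable):
        return self._svc.submit(host, executable)

# One line for the application developer, two service calls underneath.
job_id = GridJob("valid-proxy").run("hpc.example.org", "/bin/simulate")
```

In Python, such a facade also gains interactive use for free: a scientist can experiment with Grid calls at the interpreter prompt before committing them to an application.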

Article ID: CPE684

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title MyPYTHIA: a recommendation portal for scientific software and services
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.684
Article ID CPE684
Author Name(s) E. N. Houstis1A. C. Catlin2N. Dhanjani3J. R. Rice4N. Ramakrishnan5V. Verykios6
Author Email(s) enh@cs.purdue.edu1
Affiliation(s) Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, U.S.A. 1 2 3 4 Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, U.S.A. 5College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, U.S.A. 6
Keyword(s) Grid computing environments, problem solving environments, recommender systems, Web portals, data mining, knowledge discovery,
Abstract
We outline the design of a recommendation system (MyPYTHIA) implemented as a Web portal. MyPYTHIA's design objectives include evaluating the quality and performance of scientific software on Grid platforms, creating knowledge about which software and computational services should be selected for solving particular problems, selecting parameters of software (or of computational services) based on user-specified computational objectives, providing access to performance data and knowledge bases over the Web and enabling recommendations for targeted application domains. MyPYTHIA uses a combination of statistical analysis, pattern extraction techniques and a database of software performance to map feature-based representations of problem instances to appropriate software. MyPYTHIA's open architecture allows the user to customize it for conducting individual case studies. We describe the architecture as well as several scientific domains of knowledge enabled by such case studies. Copyright © 2002 John Wiley & Sons, Ltd.
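Mapping feature-based representations of problem instances to appropriate software can be sketched with a nearest-neighbor rule: pick the solver whose recorded performance profile is closest to the incoming problem's features. The feature semantics, the toy database, and the distance-based rule are all assumptions for illustration, not MyPYTHIA's actual mining pipeline.

```python
def recommend(problem_features, performance_db):
    """Pick the solver whose recorded feature profile is closest
    (Euclidean distance) to the incoming problem's features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(performance_db,
               key=lambda solver: dist(problem_features, performance_db[solver]))

# Hypothetical performance database: solver -> feature profile where it
# performed best (e.g. [operator stiffness, grid irregularity]).
db = {"solverA": [0.9, 0.1], "solverB": [0.2, 0.8]}
choice = recommend([0.85, 0.2], db)
```

A production recommender would replace the raw distance with statistical models mined from many recorded runs, but the input/output contract (problem features in, software choice out) is the same.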

Article ID: CPE682

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A CORBA Commodity Grid Kit
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.682
Article ID CPE682
Author Name(s) Manish Parashar1Gregor von Laszewski2Snigdha Verma3Jarek Gawor4Kate Keahey5Nell Rehn6
Author Email(s) Parashar@caip.rutgers.edu1
Affiliation(s) The Applied Software Systems Laboratory, Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854-8058, U.S.A. 1Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A. 2 3 4 5 6
Keyword(s) Grid computing, CORBA, Globus, DISCOVER, Java CoG Kit,
Abstract
This paper reports on an ongoing research project aimed at designing and deploying a Common Object Request Broker Architecture (CORBA) (www.omg.org) Commodity Grid (CoG) Kit. The overall goal of this project is to enable the development of advanced Grid applications while adhering to state-of-the-art software engineering practices and reusing the existing Grid infrastructure. As part of this activity, we are investigating how CORBA can be used to support the development of Grid applications. In this paper, we outline the design of a CORBA CoG Kit that will provide a software development framework for building a CORBA ‘Grid domain’. We also present our experiences in developing a prototype CORBA CoG Kit that supports the development and deployment of CORBA applications on the Grid by providing them access to the Grid services provided by the Globus Toolkit. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE681

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Gateway computational Web portal
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.681
Article ID CPE681
Author Name(s) Marlon E. Pierce1Choonhan Youn2Geoffrey C. Fox3
Author Email(s) pierceme@asc.hpc.mil1
Affiliation(s) School of Computational Science and Information Technology, Florida State University, Tallahassee, FL 32306-4120, U.S.A. 1Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244-1240, U.S.A. 2Community Grid Lab, Department of Computer Science, School of Informatics, and Department of Physics, Indiana University, Bloomington, IN 47405-7104, U.S.A. 3
Keyword(s) computational portals, computing environments, computational Grids,
Abstract
In this paper we describe the basic services and architecture of Gateway, a commodity-based Web portal that provides secure remote access to unclassified Department of Defense computational resources. The portal consists of a dynamically generated, browser-based user interface supplemented by client applications and a distributed middle tier, WebFlow. WebFlow provides a coarse-grained approach to accessing both stand-alone and Grid-enabled back-end computing resources. We describe in detail the implementation of basic portal features such as job submission, file transfer, and job monitoring and discuss how the portal addresses security requirements of the deployment centers. Finally, we outline future plans, including integration of Gateway with Department of Defense testbed Grids. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE680

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A software development environment for Grid computing
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.680
Article ID CPE680
Author Name(s) M. S. Müller1E. Gabriel2M. M. Resch3
Author Email(s) mueller@hlrs.de1
Affiliation(s) HLRS-High-Performance Computing Center Stuttgart, Allmandring 30, 70550 Stuttgart, Germany 1 2 3
Keyword(s) MPI, Grid, software development,
Abstract
Grid computing has become a popular concept in the last few years. While in the beginning the driving force was metacomputing, the focus has now shifted towards resource management issues and concepts like ubiquitous computing. For the High-Performance Computing Center Stuttgart (HLRS) the key challenges of Grid computing have come from the demands of its users and customers. With high-speed networks in place, programmers expect to be able to exploit the overall performance of several instruments and high-speed systems for their applications. In order to meet these demands, HLRS has set out a research effort to provide these users with the necessary tools to develop and run their codes on clusters of supercomputers. This has resulted in the development of a basic Grid-computing environment for technical and scientific computing. In this paper we describe the building blocks of this software development environment and focus specifically on communication and debugging. We present the Grid-enabled MPI implementation PACX-MPI and the MPI debugger MARMOT. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE679

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Programming environments for multidisciplinary Grid communities
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.679
Article ID CPE679
Author Name(s) N. Ramakrishnan1L. T. Watson2D. G. Kafura3C. J. Ribbens4C. A. Shaffer5
Author Email(s) naren@cs.vt.edu1
Affiliation(s) Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, U.S.A. 1 2 3 4 5
Keyword(s) Grid computing environments, problem solving environments, multidisciplinary Grid communities, compositional modeling,
Abstract
As the power of computational Grids increases, there is a corresponding need for better usability for large and diverse communities. The focus in this paper is on supporting multidisciplinary communities of scientists and engineers. We discuss requirements for Grid computing environments (GCEs) in this context, and describe several core support technologies developed to meet these requirements. Our work extends the notion of a programming environment beyond the compile-schedule-execute paradigm, to include functionality such as collaborative application composition, information services, and data and simulation management. Systems designed for five different applications communities are described. These systems illustrate common needs and characteristics arising in multidisciplinary communities and motivate a high-level design framework for building GCEs that meet those needs. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE677

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Web-based access to the Grid using the Grid Resource Broker portal
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.677
Article ID CPE677
Author Name(s) Giovanni Aloisio1Massimo Cafaro2
Author Email(s) giovanni.aloisio@unile.it1
Affiliation(s) ISUFI High Performance Computing Center, Department of Innovation Engineering, University of Lecce, Italy 1 2
Keyword(s) computational Grids, Web portals,
Abstract
This paper describes the Grid Resource Broker (GRB) portal, a Web gateway to computational Grids in use at the University of Lecce. The portal allows trusted users seamless access to computational resources and Grid services, providing a friendly computing environment that takes advantage of the underlying Globus Toolkit middleware, enhancing its basic services and capabilities. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE678

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Innovations of the NetSolve Grid Computing System
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.678
Article ID CPE678
Author Name(s) Dorian C. Arnold1Henri Casanova2Jack Dongarra3
Author Email(s) darnold@cs.wisc.edu1
Affiliation(s) 1210 West Dayton Street, Computer Science Department, University of Wisconsin, Madison, WI 53706, U.S.A. 19500 Gilman Drive, Computer Science and Engineering Department, University of California, San Diego, La Jolla, CA 92093-0114, U.S.A. 21122 Volunteer Boulevard, Suite 413, Computer Science Department, The University of Tennessee, Knoxville, TN 37996-3450, U.S.A. 3
Keyword(s) Grid computing, distributed computing, heterogeneous network computing, client-server, agent-based computing,
Abstract
The NetSolve Grid Computing System was first developed in the mid 1990s to provide users with seamless access to remote computational hardware and software resources. Since then, the system has benefitted from many enhancements, such as security services, data management facilities and distributed storage infrastructures. This article provides the reader with details regarding the present state of the project, describing the current architecture of the system, its latest innovations, and other systems that make use of the NetSolve infrastructure. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE676

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The integrated simulation environment TENT
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.676
Article ID CPE676
Author Name(s) Andreas Schreiber1
Author Email(s) Andreas.Schreiber@dlr.de1
Affiliation(s) Deutsches Zentrum für Luft- und Raumfahrt e.V., Simulation and Software Technology, Linder Höhe, 51147 Cologne, Germany 1
Keyword(s) Grid computing, component-based software, problem solving environment, CORBA,
Abstract
This paper describes recent development efforts on the integrated simulation environment TENT. TENT is a component-based software integration and workflow management system using the capabilities of CORBA and Java. It is used to integrate the applications required to form complex workflows, which are typical of multidisciplinary simulations in engineering, in which different simulation codes have to be coupled. We present our work in integrating TENT with the Globus Toolkit to create a Grid computing environment. The Java Commodity Grid Toolkit has been especially useful for this work. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE675

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Mississippi Computational Web Portal
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.675
Article ID CPE675
Author Name(s) Tomasz Haupt1Purushotham Bangalore2Gregory Henley3
Author Email(s) haupt@erc.msstate.edu1
Affiliation(s) Engineering Research Center at Mississippi State University, P.O. Box 9627, Mississippi State, MS 39762, U.S.A. 1 2 3
Keyword(s) Web portals, Grid computing environments, access to remote resources, problem-solving environments, Enterprise Java,
Abstract
This paper describes the design and implementation of an open, extensible object-oriented framework that allows the integration of new and legacy components into a single user-friendly Grid computing environment. Thus we extend the researcher's desktop by providing seamless access to remote resources (that is, hardware, software, and data), thereby hiding interfaces and emerging protocols that are currently difficult to comprehend and subject to change. The user, through the familiar Web browser interface, is able to compose complex computational tasks represented as a collection of middle-tier objects serving as proxies for services rendered by the back-end. The proxies, through a Grid resource broker, use the Grid services, as defined by the Global Grid Forum, to access remote computational resources. The middle-tier objects are persistent, and therefore, once configured, a simulation can be reused, shared between users, or undergo transition into operational or educational use. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE674

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Features of the Java Commodity Grid Kit
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.674
Article ID CPE674
Author Name(s) Gregor von Laszewski1Jarek Gawor2Peter Lane3Nell Rehn4Mike Russell5
Author Email(s) gregor@mcs.anl.gov1
Affiliation(s) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A. 1 2 3 4 5
Keyword(s) Grid computing, Globus Toolkit, peer-to-peer information service, portal, Java,
Abstract
In this paper we report on the features of the Java Commodity Grid Kit (Java CoG Kit). The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. The Java CoG Kit middleware is general enough to support the design of a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established via Globus Toolkit protocols, allowing the Java CoG Kit to also communicate with the services distributed as part of the C Globus Toolkit reference implementation. Thus, the Java CoG Kit provides Grid developers with the ability to utilize the Grid, as well as numerous additional libraries and frameworks developed by the Java community to enable network, Internet, enterprise and peer-to-peer computing. A variety of projects have successfully used the client libraries of the Java CoG Kit to access Grids driven by the C Globus Toolkit software. In this paper we also report on the efforts to develop server-side Java CoG Kit components. As part of this research we have implemented a prototype pure Java resource management system that enables one to run Grid jobs on any platform on which a Java virtual machine is supported, including Windows NT machines. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE673

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Ecce: a problem-solving environment's evolution toward Grid services and a Web architecture
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.673
Article ID CPE673
Author Name(s) Karen Schuchardt1Brett Didier2Gary Black3
Author Email(s) Karen.Schuchardt@pnl.gov1
Affiliation(s) Battelle, Pacific Northwest National Laboratory, P.O. Box 999/MS K1-85, Richland, WA 99352, U.S.A. 1 2 3
Keyword(s) problem-solving environment, Grid computing, data management, metadata, information services,
Abstract
The Extensible Computational Chemistry Environment (Ecce), an innovative problem-solving environment, was designed a decade ago, before the emergence of the Web and Grid computing services. In this paper, we briefly examine the original Ecce architecture and discuss how it is evolving to incorporate both Grid services and components of the Web to increase its range of services, reduce deployment and maintenance costs, and reach a wider audience. We show that Ecce operates in both Grid and non-Grid environments, an important consideration given Ecce's broad range of uses and user community, and discuss the strategies for loosely coupled components that make this possible. Both in-progress work and conceptual plans for how Ecce will evolve are presented. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE672

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Legion Grid Portal
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.672
Article ID CPE672
Author Name(s) Anand Natrajan1Anh Nguyen-Tuong2Marty A. Humphrey3Michael Herrick4Brian P. Clarke5Andrew S. Grimshaw6
Author Email(s) brian_clarke@alumni.virginia.edu1
Affiliation(s) Department of Computer Science at the University of Virginia, Charlottesville, VA 22904-4740, U.S.A. 1Avaki Corporation, Charlottesville, VA 22902, U.S.A. 2 3 4 5 6
Keyword(s) Grids, Grid computing, Web portal, Legion,
Abstract
The Legion Grid Portal is an interface to a Grid system. Users interact with the portal, and hence a Grid, through an intuitive interface from which they can view files, submit and monitor runs, and view accounting information. The architecture of the portal is designed to accommodate multiple diverse Grid infrastructures, legacy systems, and application-specific interfaces. The current implementation of the Legion Grid Portal is with familiar Web technologies over the Legion Grid infrastructure. The portal can be extended in a number of directions: additional support for Grid administrators, a greater number of application-specific interfaces, interoperability between Grid infrastructures, and interfaces for programming support. The portal has been in operation since February 2000 on npacinet, a worldwide Grid managed by Legion on NPACI resources. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE670

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title NetBuild: transparent cross-platform access to computational software libraries
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.670
Article ID CPE670
Author Name(s) Keith Moore1Jack Dongarra2
Author Email(s) moore@cs.utk.edu1
Affiliation(s) Innovative Computing Laboratory, University of Tennessee, 1122 Volunteer Blvd., Suite 413, Knoxville, TN 37996-3450, U.S.A. 1 2
Keyword(s) computational software, software libraries, automatic library selection,
Abstract
NetBuild is a suite of tools which automates the process of selecting, locating, downloading, configuring, and installing computational software libraries over the Internet, and which aids in the construction and cataloging of such libraries. Unlike many other tools, NetBuild is designed to work across a wide variety of computing platforms, and to perform fine-grained matching to find the most suitable version of a library for a given target platform. We describe the architecture of NetBuild and its initial implementation. Copyright © 2002 John Wiley & Sons, Ltd.
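Fine-grained matching of a library build to a target platform can be sketched as a scoring rule: reject builds whose platform attributes conflict with the target, and among the rest prefer the most specific match. The catalog fields and the scoring rule below are illustrative assumptions, not NetBuild's actual metadata format.

```python
def best_match(target, catalog):
    """Score each library build by how many platform attributes it matches
    exactly; reject builds that conflict, prefer the most specific match."""
    def score(build):
        s = 0
        for key, value in build["platform"].items():
            if target.get(key) != value:
                return -1  # conflicting attribute: unusable on this target
            s += 1  # each matched attribute makes the build more specific
        return s
    scored = [(score(b), b) for b in catalog]
    scored = [(s, b) for s, b in scored if s >= 0]
    return max(scored, key=lambda sb: sb[0])[1] if scored else None

# Hypothetical catalog: a generic fallback and a platform-specific build.
catalog = [
    {"url": "blas-generic.tgz", "platform": {}},
    {"url": "blas-linux-x86.tgz", "platform": {"os": "linux", "arch": "x86"}},
]
pick = best_match({"os": "linux", "arch": "x86", "compiler": "gcc"}, catalog)
```

Here the generic build matches every target with score 0, so it is chosen only when no more specific build is compatible.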

Article ID: CPE671

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The ASCI Computational Grid: initial deployment
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.671
Article ID CPE671
Author Name(s) Randal Rheinheimer1Judy I. Beiriger2Hugh P. Bivens3Steven L. Humphreys4
Author Email(s) randal@lanl.gov1
Affiliation(s) Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87545, U.S.A. 1Sandia National Laboratories (a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy (DOE) under contract DE-AC04-94AL85000), P.O. Box 5800, Albuquerque, NM 87185-1137, U.S.A. 2 3 4
Keyword(s) grid, globus, kerberos, GALE, XML, GSF, CORBA, ASCI,
Abstract
Grid Services, a Department of Energy Accelerated Strategic Computing Initiative program, has designed, implemented, and deployed a grid-based solution for customer access to large computing resources at DOE weapons labs and plants. Customers can access and monitor diverse, geographically distributed resources using the common Grid Services interfaces. This paper discusses the architecture, security, and user interfaces of the Grid Services infrastructure. Published in 2002 by John Wiley & Sons, Ltd.

Article ID: CPE669

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Grid Portal Development Kit
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.669
Article ID CPE669
Author Name(s) Jason Novotny1
Author Email(s) jdnovotny@lbl.gov1
Affiliation(s) Lawrence Berkeley National Laboratory, Berkeley, CA 94704, U.S.A. 1
Keyword(s) Grid computing, computing environments, Grid portals, science portals, Globus,
Abstract
Computational science portals are emerging as useful and necessary interfaces for performing operations on the Grid. The Grid Portal Development Kit (GPDK) facilitates the development of Grid portals and provides several key reusable components for accessing various Grid services. A Grid portal provides a customizable interface allowing scientists to perform a variety of Grid operations including remote program submission, file staging, and querying of information services from a single, secure gateway. The GPDK leverages existing Globus/Grid middleware infrastructure as well as commodity Web technology, including Java Server Pages and servlets. The design and architecture of the GPDK are presented, together with a discussion of its portal-building capabilities, which allow application developers to build customized portals more effectively by reusing the common core services the GPDK provides. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE693

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Component-based, problem-solving environments for large-scale scientific computing
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.693
Article ID CPE693
Author Name(s) Chris Johnson1Steve Parker2David Weinstein3Sean Heffernan4
Author Email(s) crj@cs.utah.edu1
Affiliation(s) Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112, U.S.A. 1 2 3 4
Keyword(s) problem solving environment, scientific computing, SCIRun, BioPSE, Uintah, steering,
Abstract
In this paper we discuss three scientific computing problem solving environments: SCIRun, BioPSE, and Uintah. We begin with an overview of the systems, describe their underlying software architectures, discuss implementation issues, and give examples of their use in computational science and engineering applications. We conclude by discussing future research and development plans for the three problem solving environments. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE694

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Application portals: practice and experience
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.694
Article ID CPE694
Author Name(s) Mary Thomas1Maytal Dahan2Kurt Mueller3Steve Mock4Cathie Mills5Ray Regno6
Author Email(s) mthomas@tacc.utexas.edu1
Affiliation(s) University of California at San Diego, San Diego Supercomputer Center 1 2 3 4 5 6
Keyword(s)
Abstract
The implementation of multiple Grid computing portals has led us to develop a methodology for Grid portal development that facilitates rapid prototyping and building of portals. Based on the National Partnership for Advanced Computational Infrastructure (NPACI) Grid Portal Toolkit (GridPort) and the NPACI HotPage, all portals inherit interactive Grid services, share a single account and login environment, and share the infrastructure required to support and provide services used by each portal. We have demonstrated that the GridPort software can be used in production application portal environments, and that the software can be configured to extend to multiple sites. In this paper, we describe the experience gained in building Grid portals and developing software for the Grid. We describe the architecture and design of the portal system, the Grid services and systems employed, and the unique features of the system. We present descriptions of several application portals and the design choices that drove them. Finally, we discuss the new and emerging system architecture, based on Web services, that is being studied. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE695

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Perl Commodity Grid Toolkit
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.695
Article ID CPE695
Author Name(s) Stephen Mock1Mary Thomas2Maytal Dahan3Kurt Mueller4Catherine Mills5Gregor von Laszewski6
Author Email(s) mock@sdsc.edu1
Affiliation(s) San Diego Supercomputer Center, University of California at San Diego, MC 0505, 9500 Gilman Drive, La Jolla, CA 92093-0505, U.S.A. 1Texas Advanced Computing Center, University of Texas at Austin, Austin, TX 78758-4497, U.S.A. 2 3 4 5 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A. 6
Keyword(s) Grid, PERL, CoG, commodity, toolkit, Globus, storage resource broker, SRB, module, portal, middleware,
Abstract
The Perl Commodity Grid Toolkit (Perl CoG Kit) is a software project aimed at bringing the complexities and power of the computational Grid to developers of Perl applications. In this paper we describe the history and motivation of the project, the benefits of bringing the Grid to Perl developers, the architecture and status of the Perl CoG Kit and the future plans for the project. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE710

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title GridSim: a toolkit for the modeling and simulation of distributed resource management and scheduling for Grid computing
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.710
Article ID CPE710
Author Name(s) Rajkumar Buyya1Manzur Murshed2
Author Email(s) raj@cs.mu.oz.au1
Affiliation(s) Grid Computing and Distributed Systems (GRIDS) Lab., Department of Computer Science and Software Engineering, The University of Melbourne, 221 Bouverie St., Carlton, Melbourne, Australia 1Gippsland School of Computing and IT, Monash University, Gippsland Campus, Churchill, Vic. 3842, Australia 2
Keyword(s) Grid computing, modelling, simulation, scheduling, performance evaluation,
Abstract
Clusters, Grids, and peer-to-peer (P2P) networks have emerged as popular paradigms for next-generation parallel and distributed computing. They enable aggregation of distributed resources for solving large-scale problems in science, engineering, and commerce. In Grid and P2P computing environments, the resources are usually geographically distributed in multiple administrative domains, managed and owned by different organizations with different policies, and interconnected by wide-area networks or the Internet. This introduces a number of resource management and application scheduling challenges in the domains of security, resource and policy heterogeneity, fault tolerance, continuously changing resource conditions, and politics. The resource management and scheduling systems for Grid computing need to manage resources and application execution depending on either resource consumers' or owners' requirements, and continuously adapt to changes in resource availability. The management of resources and scheduling of applications in such large-scale distributed systems is a complex undertaking. In order to prove the effectiveness of resource brokers and associated scheduling algorithms, their performance needs to be evaluated under different scenarios, such as a varying number of resources and users with different requirements. In a Grid environment, it is hard, and often impossible, to perform scheduler performance evaluation in a repeatable and controllable manner, as resources and users are distributed across multiple organizations with their own policies. To overcome this limitation, we have developed a Java-based discrete-event Grid simulation toolkit called GridSim. The toolkit supports modeling and simulation of heterogeneous Grid resources (both time- and space-shared), users, and application models. It provides primitives for the creation of application tasks, the mapping of tasks to resources, and their management.
To demonstrate the suitability of the GridSim toolkit, we have simulated a Nimrod-G-like Grid resource broker and evaluated the performance of deadline- and budget-constrained cost- and time-minimization scheduling algorithms. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE734

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Editorial: A summary of Grid computing environments
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.734
Article ID CPE734
Author Name(s) Geoffrey C. Fox1
Author Email(s) gcf@indiana.edu1
Affiliation(s) Department of Computer Science, Indiana University, IN, U.S.A. 1
Keyword(s)
Abstract
No abstract

Article ID: CPE691

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title UNICORE-a Grid computing environment
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.691
Article ID CPE691
Author Name(s) Dietmar W. Erwin1
Author Email(s) D.Erwin@fz-juelich.de1
Affiliation(s) Forschungszentrum Jülich GmbH, Zentralinstitut für Mathematik (ZAM), D52425 Jülich, Germany 1
Keyword(s) Grid computing, distributed resources, seamless access,
Abstract
This paper gives an overview of the goals, functions, architecture, and future development of UNICORE. Its primary goal is to give researchers seamless access to distributed resources that are available at remote sites. A graphical interface helps users formulate jobs to be performed in a system- and site-independent fashion. This allows switching between systems without having to change the job. Complex jobs, with individual applications running on different systems at different sites, may be formulated. UNICORE performs synchronization and data transfers as required, without any user intervention. UNICORE uses X.509 certificates to authenticate users, software, and systems, and to provide secure communication over the Internet. Copyright © 2002 John Wiley & Sons, Ltd.

Article ID: CPE692

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Polder Computing Environment: a system for interactive distributed simulation
Volume ID 14
Issue ID 13-15
Date Nov 1 2002
DOI(URI) http://dx.doi.org/10.1002/cpe.692
Article ID CPE692
Author Name(s) K. A. Iskra1R. G. Belleman2G. D. van Albada3J. Santoso4P. M. A. Sloot5H. E. Bal6H. J. W. Spoelder7M. Bubak8
Author Email(s) kamil@science.uva.nl1
Affiliation(s) Section Computational Science, Universiteit van Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands 1 2 3 4 5 Division of Mathematics and Computer Science, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands 6Division of Physics and Astronomy, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands 7Institute of Computer Science, AGH, al. Mickiewicza 30, 30-059 Kraków, Poland 8
Keyword(s) problem solving environments, interactive distributed simulation, immersive environment, grid computing, resource management, dynamic load balancing,
Abstract
The paper provides an overview of an experimental, Grid-like computing environment, Polder, and its components. Polder offers high-performance computing and interactive simulation facilities to computational science. It was successfully implemented on a wide-area cluster system, the Distributed ASCI Supercomputer. An important issue is the efficient management of resources, in particular multi-level scheduling and migration of tasks that use PVM or sockets. The system can be applied to interactive simulation, where a cluster is used for high-performance computations, while a dedicated immersive interactive environment (CAVE) offers visualization and user interaction. Design considerations for the construction of dynamic exploration environments using such a system are discussed, in particular the use of intelligent agents for coordination. A case study of simulated abdominal vascular reconstruction is subsequently presented: the results of computed tomography or magnetic resonance imaging of a patient are displayed in the CAVE, and a surgeon can evaluate the possible treatments by performing the surgeries virtually and analysing the resulting blood flow, which is simulated using the lattice-Boltzmann method. Copyright © 2002 John Wiley & Sons, Ltd.