Concurrency and Computation:Practice and
Experience
- This site has been replaced as of June 2006 by
Manuscript Central with
instructions at my site
- Here are instructions
for Grid Workflow Special Issue due June 1 2004 -- CLOSED
This homepage supports referees.
Standard Referee Form
Journal Vision
WILEY
Journal
Home Page
Editorial Board
Below you will find active abstracts.
Please send email to Geoffrey Fox,
gcf@indiana.edu, if you wish to referee any article.
We will send you link to full text online.
Articles under Consideration for Journal
Submitted in 2002-2005 (top)
- The Grid 2002: Architectures, Environments and Applications
Book and Special Issue home page
- Here are instructions for 2002
Java Grande/ISCOPE Special Issue due January 15 2003 -- CLOSED
- C588: Coordinating Components in Middleware Systems
- Abstract:Configuration and coordination are central issues
in the design and implementation of middleware systems and are one of the
reasons why building such systems is more di.cult and complex than constructing
stand-alone sequential programs. Through configuration, the structure of the
system is established which elements it contains, where they are located
and how they are interconnected. Coordination is concerned with the interaction
of the various components when an interaction takes place, which parties
are involved, what protocols are followed. Its purpose is to coordinate the
behaviour of the various components in a way that meets the overall system
speci.cation. The open and adaptive nature of middleware systems makes the task
of configuration and coordination particularly challenging. We propose a model
that can operate in such an environment and enables the dynamic integration and
coordination of components through observation of their behaviour.
- Matthias Radestock, Susan Eisenbach
- Trans Enterprise Computer Communications, The Lee Valley
Tecnopark, Ashley Road London N17 9LN; Department of Computing, Imperial
College of Science, Technology and Medicine, London SW7 2BZ, UK
- sue@doc.ic.ac.uk
- Received January 8 2002, Comments to Authors April 18 2002,
Accepted August 6 2002
- C589: DEADLOCK DETECTION IN MPI PROGRAMS
- Abstract:Message Passing Interface (MPI) is commonly used
to write parallel programs for distributed memory parallel computers. MPI-CHECK
is a tool developed to aid in the debugging of MPI programs that are written in
free or fixed format Fortran 90 and Fortran 77. This paper presents the methods
used in MPI-CHECK 2.0 to detect many situations where actual and potential
deadlocks occur when using blocking and non-blocking point-to-point routines as
well as when using collective routines.
- Glenn Luecke, Yan Zou, James Coyle, Jim Hoekstra, Marina
Kraeva
- High Peformance Computing Group Iowa State University, Ames, Iowa
50011-2251
- grl@iastate.edu
- Received January 22 2002, Comments to Authors April 18 2002,
accepted May 29 2002
- C590: Research Agenda for the Semantic Grid: A Future
e-Science Infrastructure
- Abstract:e-Science offers a promising vision of how
computer and communication technology can support and enhance the scientific
process. It does this by enabling scientists to generate, analyse, share and
discuss their insights, experiments and results in a more effective manner. The
underlying computer infrastructure that provides these facilities is commonly
referred to as the Grid. At this time, there are a number of grid applications
being developed and there is a whole raft of computer technologies that provide
fragments of the necessary functionality. However there is currently a major
gap between these endeavours and the vision of e-Science in which there is a
high degree of easy-to-use and seamless automation and in which there are
flexible collaborations and computations on a global scale. To bridge this
practiceaspiration divide, this report presents a research agenda whose
aim is to move from the current state of the art in e-Science infrastructure,
to the future infrastructure that is needed to support the full richness of the
e-Science vision. Here the future e-Science research infrastructure is termed
the Semantic Grid (Semantic Grid to Grid is meant to connote a similar
relationship to the one that exists between the Semantic Web and the Web). In
more detail, this document analyses the state of the art and the research
challenges that are involved in developing the computing infrastructure needed
for e-Science. In so doing, a conceptual architecture for the Semantic Grid is
presented. This architecture adopts a service oriented perspective in which
distinct stakeholders in the scientific process provide services to one another
in various forms of marketplace. The view presented in the report is holistic,
considering the requirements of e-Science and the e-Scientist at the
data/computation, information and knowledge layers. The data, computation and
information aspects are discussed from a distributed systems viewpoint and in
the particular context of the Web as an established large scale infrastructure.
A clear characterisation of the knowledge grid is also presented. This
characterisation builds on the emerging metadata infrastructure with knowledge
engineering techniques. These techniques are shown to be the key to working
with heterogeneous information and also to working with experts and
establishing communities of e-Scientists. The underlying fabric of the Grid,
including the physical layer and associated technologies, is outside the scope
of this document. Having completed the analysis, the report then makes a number
of recommendations that aim to ensure the full potential of e-Science is
realised and that the maximum value is obtained from the endeavours associated
with developing the Semantic Grid. These recommendations relate to the
following aspects:
- The research issues associated with the technical and
conceptual infrastructure of the Semantic Grid
- The research issues associated with the content
infrastructure of the Semantic Grid
- The bootstrapping activities that are necessary to ensure the
UKs grid and e-Science infrastructure is widely disseminated and
exemplified
- The human resource issues that need to be considered in order
to make a success of the UK e-Science and grid efforts
- The issues associated with the intrinsic process of
undertaking e-Science
- The future strategic activities that need to be undertaken to
maximise the value from the various Semantic Grid endeavours
- David De Roure, Nicholas Jennings and Nigel Shadbolt
- Department of Electronics and Computer Science, University of
Southampton, Southampton SO17 1BJ, UK
- dder@ecs.soton.ac.uk
- Received January 23 2002, Comments to Authors May 29 2002; paper
split up and accepted in Grid Special issue July 9 2002
- C591: Feature Extraction and Visualization of Clusters with
Parallel Processor from Large-Data Sets from Particle Simulations
- Abstract:Simulating natural phenomena at greater accuracy
results in an explosive growth of data. Large - scale simulations with
particles currently involve ensembles consisting of 10^6-10^9 particles in
10^5- 10^6 timesteps. Thus, the data file produced in a single run ranges from
tens of gigabytes to hundreds of terabytes. This data bank allows one to
reconstruct the spatio-temporal evolution of both the particle system in whole
and each particle separately. Realistically, to look at large data set at full
resolution at all times is not necessary. We show that the agglomerative
clustering technique, based on the mutual nearest neighborhood (MNN) concept,
can be easily adapted for efficient visualization of huge data sets from
large-scale simulations with particles at different resolution level. We
present the parallel algorithm for MNN clustering and its timings on the IBM SP
and SGI/Origin 3800 multiprocessor systems. The high efficiency obtained is
mainly due to the similarity in the algorithmic structure of MNN clustering and
particle methods. We show examples drawn from MNN application in visualization
and analysis on the order of a few hundred gigabytes of data from discrete
particle simulations, using dissipative particle dynamics and fluid particle
model. Besides material sciences, this clustering procedure can be applied to
other fields, such as earthquake events and stellar populations in nebula
clusters.
- Krzysztof Boryczko, Witold Dzwinel, David A.Yuen
- AGH Institute of Computer Science, al. Mickiewicza 30, 30-059,
Kraków, Poland; Minnesota Supercomputer Institute, University of
Minnesota, Minneapolis, Minnesota 55415-1227, USA
- dwitek@msi.umn.edu
- Received January 29 2002, Comments to Authors May 29 2002,
Accepted June 14 2002
- C592: The Real-Time Message Passing Interface Standard
(MPI/RT-1.1)
- Abstract:The MPI/RT standard is the product of the work of
many people working in an open community standards group over a period of six
plus years. The purpose of this archival publication is to preserve the
significant knowledge and experience that was developed in real-time message
passing systems as a consequence of the R&D e®ort as well as in the
specification of the standard. Interestingly, several implementations of MPI/RT
(as well as comprehensive test suites) have been created in industry and
academia over the period during which the standard was created. MPI/RT is
likely to gain adoption interest over time, and this adoption may be driven by
the promulgation of the standard including this publication. We expect that,
when people are interested in understanding options for reliable, QoS-oriented
parallel computing with message passing, MPI/RT will serve as a foundation for
such a study, whether or not its complete formalism is accepted into other
systems or standards. MPI/RT is an o®shoot of MPI-1, and retains many of
the communication patterns of MPI- 1. However, MPI/RT has investigated issues
of fine-grain concurrency and highest achievable performance in many ways that
were evidently inappropriate for MPI-1 to do in the scientific computing space
in which it resides, with its much broader audience. MPI/RT focuses on
earlybinding (planned transfer), concurrent message passing, while integrating
multiple real-time models: time-based, event-driven, and priority-oriented
channels. Group admission control and declarative (deferred early binding)
semantics support the goal of hard-real-time for the message-passing component
of computation. Importantly, MPI/RT emphasizes the decoupling of message
transfer and process/thread scheduling as part of its contribution to parallel
processing with QoS. Buffer management and state transition diagrams are also
integral to the notion of streaming data into and out of processors in a way
that is consistent with QoS, and friendly to zero-copy approaches to
communication. MPI/RT has also made strides in the direction of a parallel
middleware specification by emphasizing an object-oriented design for the API
compared to an object-based API or ad hoc API. The advantages of these are
plain in the standard, in that the functionality has useful polymorphic
adaptations where needed. Furthermore, the concepts that derive from MPI-1
(such as collective operations) appear as objects in MPI/RT. This modification
has allowed for the removal of certain constructs in MPI-1 (such as the
communicator), in favor of a specification and implementation phase for objects
that describe communication in MPI/RT. Overall, a cleaner, more extensible
design exists, which does not utilize more resources per se than those which
are needed to admit the required channels for a program. Both offline and
online admission control is contemplated, and multiple modes are supported,
albeit weakly. MPI/RT-1.1, the standard version described here, does not cover
all possible real-time parallel programming possibilities. It is silent
concerning process/thread scheduling, so in some sense, still has strong
aspects of "best effort," in terms of process scheduling. Leaving process
scheduling as an orthogonal concern was intentional, so that the best concepts
in these areas would be used in concert with MPI/RT, rather than offering a
monolith. Furthermore, MPI/RT-1.1 does not explicitly address mode changes
(with guaranteed modechange QoS) between sets of channels, with invariants and
non-invariants among the resources consumed. This remains important work for
the future. The object-oriented, resource-conscious approach of MPI/RT
naturally extends to multiple modes. Work to realize this in a standard or in
prototypes remains for future work, although it was discussed and prototyped
extensively during standardization. It is interesting to consider whether the
connection-oriented, but limited QoS and resource specification of MPI/RT leads
to a more scalable or less scalable system environment than that posed by MPI
and similar middleware. While connections themselves indicate that resources
will be assigned per connection, only those connections that are
program-mandated are actually built. By way of contrast, in MPI, it is
necessary to offer a virtual all-to-all communication topology, and introduce
overheads associated with either the static realization of such a topology, or
else the dynamic build-up/tear down of connections in constrained environments
seeking to scale. Events over the past six years, involving the evolution of
networking technology make MPI/RT as interesting as it was when started, and
possibly of more ubiquitous application in the long term. Infiniband, Rapid
I/O, 3GIO, and other System Area Network standards are likely to offer
rudimentary QoS over time in real applications. Likewise, the production of
massively concurrent supercomputers (104 nodes or more), is likely to drive the
need for predictable message passing in the runtime aspects of such systems.
These events are likely to cause the ideas and concepts defined in this
standard to have impact in areas far broader than originally anticipated.
- Anthony Skjellum, Arkady Kanevsky, Yoginder S. Dandass,Jerrell
Wattsy, Steve Paavola ,Dennis Cottel, Greg Henleyz, L. Shane Hebertx, Zhenqian
Cui, Anna Rounbehler
- Mississippi State University; MPI Software Technology, Inc.;
Network Appliance; California Institute of Technology; Syracuse University; SKY
Computers; SPAWAR Systems Center, US Navy; Raytheon Company; Real-Time Message
Passing Interface (MPI/RT) Forum http://www.mpirt.org
- tony@hpcl.msstate.edu
- Received February 9 2002, Comments to Authors May 29 2002,
accepted December 31 2002
- C593: DYNAMIC CLASS MODEL IN OBJECT-ORIENTED DESIGN
- Abstract:This paper proposes design with the help of
objet-oriented modelling. The class model of designed object can be dynamically
changed by a system. Thanks to the specificity of the class model we receive
new emergent solutions and a warranty of space sparing. The advantage of the
proposed approach is a well-grounded theoretical base and polynomial time of
operations. It gives a possibility of practical applications and extensions of
proposed model.
- Marcin Skowron
- Institute of Computer Science Jagiellonian University Nawojki 11,
30-072 Kraków, Poland
- Marcin.Skowron.Student@softlab.ii.uj.edu.pl
- Received 12 February 2002, Comments to Author July 4 2002
- Full Paper:
../CCPEwebresource/c593skowron/c593skowron.pdf
- C594: A parallel data assimilation model for oceanographic
observations
- Abstract:In this paper we describe the development of a
program that aims at the optimal integration of observed data in an
oceanographic model describing the water transport phenomena in the Agulhas
area at the tip of South Africa. Two parallel implementations, MPI and OpenMP,
are described and experiments with respect to speed and scalability on a Compaq
AlphaServer SC and an SGI Origin3000 are reported
- Fons van Hees, Aad J. van der Steen; Peter Jan van Leeuwen
- Computational Physics, Utrecht University P.O. Box 80195, 3508 TD
Utrecht; Institute for Marine and Atmospheric Research Utrecht, Utrecht
University P.O. Box 80005, 3508 TA Utrecht
- A.vanderSteen@phys.uu.nl
- Received 13 February 2002, Comments to Author July 4 2002,
accepted August 6 2002
- Full Paper:
../CCPEwebresource/c594vandersteen/c594join.pdf
- C595: A Comparison of Task Pools for Dynamic Load Balancing
of Irregular Algorithms
- Abstract:Since a static data distribution does not give
satisfactory results for parallel irregular algorithms, there is need for a
dynamic distribution of data that can be adapted to the current runtime
behavior of the algorithm. Task pools are data structures which can distribute
data dynamically to different processors.
This paper discusses the
characteristics of task-based algorithms and describes the implementation of
selected types of task pools for shared-memory multiprocessors. Several task
pools have been implemented in C with POSIX threads and in Java. Results of
these implementations measured on three different shared-memory systems are
shown for a synthetic algorithm and the parallel hierarchical radiosity
method.
- Matthias Korch and Thomas Rauber
- Martin-Luther-Universit at Halle-Wittenberg, Fachbereich
Mathematik und Informatik, Institut für Informatik D-06099, Halle (Saale),
Germany
- korch@informatik.uni-halle.de
- Received 26 February 2002, Comments to Author July 4 2002,
accepted December 31 2002
- Full Paper: ../CCPEwebresource/c595korch/c595tplong.pdf
- C596: Towards an Improved Memory Model for Java
- Abstract:The Java Language Specification contains a
collection of rules that describe how a thread's working memory should interact
with the main memory in a multi-threaded, shared variables environment. We
formalize and analyze the Java memory model rules from the original
specification. We are able to remove many unnecessary constraints and show
deficiences. We show that sequential consistency does not hold in the original
memory model and propose a correction. We also add rules for the prevention of
deadlock. Finally we discuss possible compiler optimizations based on our
memory model.
- Vishnu Kotrarajas and Susan Eisenbach
- Department of Computing, Imperial College, 189 Queen's Gate,
London SW7 2BZ, U.K.
- sue@doc.ic.ac.uk
- Received 20 March 2002, Comments to Author July 4 2002
- C597: Multi-Wavelength Image Space: Another Grid-Enabled
Science
- Abstract:We describe how the Grid enables new research
possibilities in astronomy through multi-wavelength images. To see sky images
in the same pixel space, they must be projected to that space, a
compute-intensive process. There is thus a virtual data space induced that is
defined by an image and the applied projection. This virtual data can be
created and replicated with Planners and Replica catalog technology developed
under the GriPhyN project. We plan to deploy our system (MONTAGE) on the US
Teragrid. Grid computing is also needed for ingesting data -- computing
background correction on each image -- which forms a separate virtual data
space. Multi-wavelength images can be used for pushing source detection and
statistics by an order of magnitude from current techniques; for optimization
of multi-wavelength image registration for detection and characterization of
extended sources; and for detection of new classes of essentially
multi-wavelength astronomical phenomena. The paper discusses both the grid
architecture and the scientific goals.
- Roy Williams, Bruce Berriman, Ewa Deelman, John Good, Joseph
Jacob, Carl Kesselman, Carol Lonsdale, Seb Oliver, Thomas A. Prince
- CACR, California Institute of Technology, USA, IPAC, California
Institute of Technology, USA, ISI, University of Southern California, USA, JPL,
California Institute of Technology, USA, Astronomy Centre, University of
Sussex, UK
- roy@cacr.caltech.edu
- Received May 24 2002, Comments to Authors August 6 2002, Accepted
August 27 2002
- C605: Guiding RMI Synthesis with Interactive Graph
Placement
- Abstract:CentiJ is a metacomputing system that message
forwards through an RMI (Remote Method Invocation) layer for distributed
computation by using a bridge pattern code synthesizer. CentiJ reuses original
implementations, without altering the source code. It also provides for a means
of partitioning a sequential program across a network of workstations (NOWs) by
using interactive graph placement. The graph placement algorithms is
force-based, treating nodes as charged bodies and arcs as springs. Interaction
is used to take advantage of the programmers knowledge about a program.
The assumption is that the programmer can do a better job at course-grained
problem-partitioning than any modern parallelizing compiler.
- Douglas Lyon
- Computer Engineering Dept. Fairfield University, Fairfield CT
06430
- lyon@docjava.com
- Received June 12 2002, Comments to Author August 14 2002
- C608: A Unified Peer-to-Peer Database Framework and its
Application for Scalable Service Discovery
- Abstract:In a large distributed system spanning many
administrative domains such as a DataGrid, it is often desirable to maintain
and query dynamic and timely information about active participants such as
services, resources and user communities. However, in such a database system,
the set of information tuples in the universe is partitioned over one or more
distributed nodes, for reasons including autonomy, scalability, availability,
performance and security. It is not obvious how to enable general-purpose
discovery query support and collective collaborative functionality that operate
on the distributed system as a whole, rather than on a given part of it.
Further, it is not obvious how to allow for search results that are fresh,
allowing dynamic content. It appears that a Peer-to-Peer (P2P) database network
may be well suited to support dynamic distributed database search, for example
for service discovery. In this paper, we take the first steps towards unifying
the fields of database management systems and P2P computing, which so far have
received considerable, but separate, attention. We extend database concepts and
practice to cover P2P search. Similarly, we extend P2P concepts and practice to
support powerful general-purpose query languages such as XQuery and SQL. As a
result, we devise the Unified Peer-to-Peer Database Framework (UPDF), which
allows to express specific applications for a wide range of data types, node
topologies (e.g. ring, tree, graph), query languages (e.g. XQuery, SQL), query
response modes (e.g. Routed, Direct and Referral Response), neighbor selection
policies (in the form of an XQuery), pipelining, timeout and other scope
characteristics.
- Wolfgang Hoschek
- CERN IT Division, European Organization for Nuclear Research,
1211 Geneva 23, Switzerland
- wolfgang.hoschek@cern.ch
- Received June 21 2002, Comments to Author 14 2002
- C609: The Web Service Discovery Architecture
- Abstract:An enabling step towards increased Internet
software execution exibility is the web services vision of distributed
computing where programs are no longer configured with static information.
Rather, the promise is that programs are made more flexible and powerful by
querying Internet databases (registries) at runtime in order to discover
information and network attached third-party building blocks. Services can
advertise themselves and related metadata via such databases, enabling the
assembly of distributed higher-level components. In this paper, we propose and
specify a discovery architecture that supports this vision, the so-called Web
Service Discovery Architecture (WSDA). WSDA promotes an interoperable web
service layer by defining appropriate services, interfaces, operations and
protocol bindings for discovery. It embraces and integrates solid industry
standards. It is modular because it defines a small set of orthogonal
multi-purpose communication primitives (building blocks) for discovery. These
primitives cover service identification, service description retrieval, data
publication as well as minimal and powerful query support. Each communication
primitive is designed to avoid any unnecessary complexity. WSDA is open and
flexible because each primitive can be used, implemented, customized and
extended in many ways. It is powerful because the individual primitives can be
combined and plugged together by specific clients and services to yield a wide
range of behaviors and emerging synergies. It is unified because it subsumes an
array of disparate concepts, interfaces and protocols under a single
semi-transparent umbrella. It is non-intrusive because it offers interfaces but
does not mandate that every service must comply to these interfaces.
- Wolfgang Hoschek
- CERN IT Division, European Organization for Nuclear Research,
1211 Geneva 23, Switzerland
- wolfgang.hoschek@cern.ch
- Received June 10 2002, Comments to Author June 28 2002, Accepted
August 14 2002
- C610: Impact of Mixed-Parallelism on Parallel
Implementations of Strassen and Winograd Matrix Multiplication Algorithms
- Abstract:In this paper we study the impact of the
simultaneous exploitation of data- and task-parallelism, so-called mixed
parallelism, on Strassen and Winograd matrix multiplication algorithms. For
each of these algorithms, we propose two mixed-parallel implementations. The
former follows the phases of the original algorithms while the latter has been
designed as the result of a list scheduling algorithm. We give a theoretical
comparison.
- F. Desprez and F. Suter
- LIP, ENS Lyon, 46 allee d'Italie, 69364 Lyon Cedex 07,
France
- Frederic.Desprez@ens-lyon.fr
- Received July 1 2002, Comments to Author October 21 2002,
Accepted July 24 2003
- C614: Smart Data: The Source of - and the Solution to -
Security Risks
- Abstract:Note (from editor) this is a "vision" paper. It
is suggested that online data, as it is annotated in the years to come by
experts, will incrementally become far more usable by all users, including
those who violate security controls. In fact, the main source of security in
data, its natural heterogeneity and the resulting difficulty in making use of
it, will sharply diminish in the years to come. But this "smartness" of data
can also be exploited in order to greatly enhance data security. Further, it is
suggested that the development of four different software technologies, if
guided by an integrated vision, could serve as a broad-based and powerful
platform for the collaborative specification and enforcement of security
controls, and for the maintenance of these controls through multiple cycles of
data transformation, integration, and reuse.
- Roger (Buzz) King
- Dept. of Computer Science, University of Colorado, Boulder CO,
80309
- roger@cs.colorado.edu
- Received July 3 2002, Comments to Author October 21 2002,
Accepted December 31 2002
- C624: Mobile Agents and Web Databases: Models and
Experimentation
- Abstract:In this paper we present practical experiences
gathered from the employment of two popular Java-based mobile-agent platforms,
IBM's Aglets and Mitsubishi's Concordia. We present some basic distributed
computing models and describe their adaptation to the mobile-agent paradigm.
Upon these models we develop a set of frameworks for distributed database
access over the World-Wide Web, using IBM's Aglets and Mitsubishi's Concordia
platforms. We compare the two platforms both quantitatively and qualitatively.
For the quantitative comparison, we propose, employ, and validate an approach
to evaluate and analyze mobile-agent framework performance. For the qualitative
assessment, we present our observations about the programmability and
robustness of, and mobility provided by, the two platforms.
- G. Samaras, M. Dikaiakos, C. Spyrou
- Department of Computer Science, University of Cyprus, P.O. Box
20537, 1678 Nicosia, Cyprus
- cssamara@ucy.ac.cy
- Received July 11 2002, Comments to Authors December 31 2002
- Cluster 2001 Special Issue: 6 papers in final accepted form at
../CCPEwebresource/c635toc640clustersi
- C641: A Comparison of Concurrent Programming and
Cooperative Multithreading under Load Balancing Applications
- Abstract: Two models of thread execution are the general
concurrent programming execution model (CP) and the cooperative multithreading
execution model (CM). CP provides nondeterministic process execution where
context switches occur arbitrarily. CM provides threads that execute one at a
time until they explicitly choose to yield the processor. This paper uses two
classic applications to reveal the advantages and disadvantages of load
balancing during process execution under CP and CM styles. These applications
are programmed in two different languages (SR and Dynamic C) on different
hardware (standard PCs and embedded system controllers). An SR-like run-time
system, DesCaRTeS, was developed to provide inter-process communication for the
Dynamic C implementations. This paper compares load balancing and non- load
balancing implementations; it also compares CP and CM style implementations.
The results show that: in cases of very high or very low workloads, load
balancing slightly hindered performance; and in cases of moderate workload,
both SR and Dynamic C implementations of load balancing generally performed
well. Further, for these applications, CM style programs outperform CP style
programs in some cases, but the opposite occurs in some other cases. This paper
also discusses qualitative tradeoffs between CM style programming and CP style
programming for these applications.
- Justin T. Maris, Aaron W. Keen, Takashi Ishihara, and Ronald A.
Olsson
- Department of Computer Science, University of California, Davis,
CA 95616 USA
- olsson@cs.ucdavis.edu
- Received July 31 2002, Comments to Authors December 31 2002,
Accepted February 14 2003
- C642 (Performance Section): Comparing the Performance of
MPICH with Cray's MPI and with SGI's MPI
- Abstract: The purpose of this paper is to compare the
performance of MPICH with the vendor MPI on a Cray T3E-900 and an SGI Origin
3000. Seven basic communication tests which include basic point-to-point and
collective MPI communication routines were chosen to represent commonly used
communinication patterns. Cray's MPI performed better (and sometimes
significantly better) than MSU's MPICH for small and medium messages. They both
performed about the same for the large messages. However for 3 tests MSU's
MPICH was about 20 percent faster than Cray's MPI. SGI's MPI performed and
scaled better (and sometimes significantly better) than MPICH for all messages,
except for the scatter test where MPICH outperformed SGI's MPI for 1 Kbyte
messages. The poor scalability of MPICH on the Origin 3000 suggests there may
be scalability problems with MPICH.
- Glenn R. Luecke, Marina Kraeva, Lili Ju
- Iowa State University Ames Iowa
- grl@iastate.edu
- Received 16th October 2001, Revised 2nd July 2002, Accepted 1st
Aug 2002
- C643 (Performance Section): A Scalable HPF Implementation
of a Finite Volume CEM Application on a CRAY T3E Parallel System
- Abstract: The time-dependent Maxwell equations are one of
the most important approaches to describing dynamic or wide-band frequency
electromagnetic phenomena. A sequential nite volume, characteristic-based
procedure for solving the time-dependent, three-dimensional Maxwell equations
has been implemented in Fortran successfully before. Due to its large
requirement of memory space and high demand of CPU time, it is impossible to
test the code with large array size. Hence, it is essential to implement the
code on a parallel computing system. In this paper, we discuss an eÆcient
and scalable parallelization of the sequential Fortran time-dependent Maxwell
equations solver using High Performance Fortran (HPF). The background of the
project, the theory behind the efficiency being achieved, the parallelization
methodologies employed, and the experimental results obtained on the Cray T3E
massively parallel computing system will be described in detail. Experimental
runs show that the execution time is reduced drastically through parallel
computing. The code is scalable up to 98 processors on the Cray T3E and has a
similar performance compared to an MPI implementation. Based on the
experimentation carried out in this research, we believe that a high level
parallel programming language such as High Performance Fortran is a fast,
viable and economical approach to parallelizing many existing sequential codes
which exhibit a lot of parallelism.
- Yi Pan, Joseph J.S. Shang and Minyi Guo
- Department of Computer Science, Georgia State University,
Atlanta, GA 30303, USA; Air Vehicle Directorate Air Force Research Laboratory
WPAFB, OH 45433-7521, USA; Department of Computer Software, The University of
Aizu, Aizu-Wakamatsu City, Fukushima 965-8580 Japan
- pan@cs.gsu.edu
- Received 16th October 2001, Revised 1st August 2002, Accepted 8th
Aug 2002
- C650: A Summary of Grid Computing Environments
- Abstract: This short paper summarizes a set of 28 papers
gathered together by the GCE (Grid Computing Environment) group of the Global
Grid Forum. This set is published in 2002 as a special issue of Concurrency and
Computation: Practice and Experience. The papers are listed at the end of this
report together with associated papers in a Grid Computing book
- Geoffrey Fox, Dennis Gannon, Mary Thomas
- Department of Computer Science, 215 Lindley Hall,150 S. Woodlawn
Ave,. Bloomington IN 47405-7104; Texas Advanced Computing Center, The
University of Texas at Austin, 10100 Burnet Road, Austin, Texas 78758
- gcf@indiana.edu
- Accepted August 6 2002
- Full Paper:
../CCPEwebresource/c650gridgcespcissueoverview/c650gcesurvey.pdf
- C651: OpenMP-Oriented Applications for Distributed Shared
Memory Architectures
- Abstract: The fast emergence of OpenMP as the preferable
parallel programming paradigm for small-to-medium scale parallelism could
decline unless OpenMP will show capabilities to be the model-of-choice for
large scale high performance parallel computing of the next decade. The main
stumbling block from adapting OpenMP for distributed shared memory (DSM)
machines, which are based on architecture like cc-NUMA, stems from the absence
of capabilities for data placement among processors and threads for achieving
data locality. The absence of such mechanism causes remote memory accesses and
inefficient cache memory use, both of which lead to poor performance. This
paper presents a simple software programming approach called
Copy-inside-Copy-back (CC) that exploits the privatization mechanism of OpenMP
for data placement and re-placement. This technique enables one to distribute
data without taking the control and the exibility from the programmer, and
thus, is an alternative to the traditional implicit and explicit approaches.
Moreover, the CC approach enables SPMD style of programming that makes the
development process of an OpenMP application more structured, and simply to
modify and debug. The CC technique was tested and analyzed using the NAS
Parallel Benchmarks on SGI Origin 2000 multiprocessors machine. The lesson
learnt from this study shows that OpenMP can deliver the desired large- scale
parallelism although fast copy mechanism is essential.
- Ami Marowka, Zhenying Liu, Barbara Chapman
- Department of Computer Science, The University of Houston,
Texas
- amimar2@yahoo.com
- Received August 9 2002, Comments to Authors December 31 2002,
Accepted February 14 2003
- C652: Parallel Operation of CartaBlanca on Shared and
Distributed Memory Computers
- Abstract:We describe the parallel performance of the pure
Java CartaBlanca code on heat transfer and multiphase fluid flow problems.
CartaBlanca is designed for parallel computations on partitioned unstructured
meshes. It uses Javas thread facility to manage computations on each of
the mesh partitions. Inter-partition communications are handled by two compact
objects for node-by-node communication along partition boundaries and for
global reduction calculations across the entire mesh. For distributed
calculations, the JavaParty package from the University of Karlsruhe is
demonstrated to work with CartaBlanca.
- N. T. Padial-Collins, W. B. VanderHeyden, D. Zhang, E. D. Dendy,
D. Livescu
- Los Alamos National Laboratory Theoretical Division and Los
Alamos Computer Science Institute Los Alamos, NM 87545
- nelylanl@lanl.gov
- Received August 13 2002, Comments to Authors December 31 2002,
Accepted January 12 2003
- C653: Parallel Program Debugging by Specification
- Abstract:Most message passing parallel programs employ
logical process topologies with regular characteristics to support their
computation. Since process topologies define the relationship between
processes, they present an excellent opportunity for debugging. The primary
benefit is that process behaviours can be correlatred, allowing expected
behaviour to be abstracted and identified, and undesirable behaviour reported.
However topology support is inadequate in most message parallel programming
environments, including the popular Message Passing Interface (MPI) and the
Parallel Virtual Machine (PVM). Programmers are forced to implement topology
support themselves, increasing the possibility of introducing errors.
This
paper introduces a trace- and topology-based contract-like approach to parallel
program debugging, driven by four distinct types of specification. Trace
interpretation specifications allow trace data from a variety of sources and
message passing libraries.to be interpreted in an abstract manner, and topology
specifications address the lack of explicit topology knowledge, and facilitate
the construction of user-consistent views of the debugging activity. Loop
specifications express topology-consistent views of the debugging activity,
allowing conformance testing of associated trace data, and error specifications
specify undesirable event interactions, including mismatched message sizes and
mismatched communication pairs. Both loop and error specifications are
simplified by having knowledge of the actual topologies being debugged.
The
wealth of new debugging views and techniques made possible by our contract-like
approach are also discussed.
- Simon Huband and Chris McDonald
- Department of Computer Science and Software Engineering, The
University of Western Australia, Crawley, Western Australia 6009,
Australia.
-
huey@csse.uwa.edu.au
- Received August 16 2002, Comments to Authors December 31 2002,
Accepted March 2 2003
- C655: Improving the official specification of Java bytecode
verification
- Abstract: Bytecode verification is the main mechanism to
ensure type safety in the Java Virtual Machine. Inadequacies in its official
specification may lead to incorrect implementations where security can be
broken and/or certain legal programs are rejected. This paper provides a
comprehensive analysis of the specification along with concrete suggestions for
improvement.
- Alessandro Coglio
- Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304, USA
- coglio@kestrel.edu
- Received 8 January 2002, Revised 4 August 2002, Accepted August
15 2002
- C658: Modular specification of frame properties in JML
- Abstract:We present a modular specification technique for
frame properties. The technique uses modifies clauses and abstract fields with
declared dependencies. Modularity is guaranteed by a programming model that
enforces data abstraction by preventing representation and argument exposure, a
semantics of modifies clauses that uses a notion of ``relevant location,'' and
by modularity rules for dependencies. For concreteness, we adapt this technique
to the Java Modeling Language, JML.
- P. Muller, A. Poetzsch-Heffter, and G. T. Leavens
- Dept. of Computer Science, Iowa State University, 229 Atanasoff
Hall, Ames, Iowa, 50011-1040, USA
-
leavens@cs.iastate.edu
- Received October 1 2001, Revised October 2 2002, Accepted October
5 2002
- C659: Parallel LU Factorization of Sparse Matrices on
FPGA-Based Configurable Computing Engines
- Abstract:Reconfigurable computing has demonstrated in the
last decade capability to significantly improve the performance of a wide range
of computation-intensive applications. With steady advances in silicon
technology, as predicted by Moore's law, Field-Programmable Gate Array (FPGA)
technologies have enabled the implementation of System-On-a Programmable-Chip
(SOPC) computing platforms, which, in turn, have given a significant boost to
the field of reconfigurable computing. It is possible now to implement various
specialized parallel machines within one silicon die. For example, LU
factorization is widely used in engineering and science to solve efficiently
large systems of linear equations. In this paper, we describe our design and
implementation of a parallel machine on an SOPC development board, using
multiple copies of the Altera soft configurable processor, namely Nios(r); we
use this machine for LU factorization. Our implementation facilitates the
efficient solution of linear equations at a cost much lower than that of
supercomputers and networks of workstations. The intricacies of our FPGA-based
design are presented along with tradeoff choices made for the purpose of
illustration. Performance results prove the viability of our approach.
- Xiaofang Wang and Sotirios G. Ziavras
- Department of Electrical and Computer Engineering New Jersey
Institute of Technology University Heights Newark, New Jersey 07102, USA
- ziavras@njit.edu
- Received October 19 2002, Comments to Authors December 31 2002,
Accepted January 17 2003
- C660: Frameworks for Incorporating Semantic Relationships
into Object-Oriented Database Systems
- Abstract: A semantic relationship is a data modeling
construct that connects a pair of classes or categories and has inherent
constraints and other functionalities that precisely reflect the
characteristics of the specific relationship in an application domain. Examples
of semantic relationships include part-whole, ownership, materialization, and
role-of. Such relationships are important in the construction of information
models for advanced applications, whether one is employing traditional data
modeling techniques, knowledge-representation languages, or object-oriented
modeling methodologies. This paper focuses on the issue of providing built-in
support for such constructs in the context of object-oriented database (OODB)
systems. Most of the popular object-oriented modeling approaches include some
semantic relationships in their repertoire of data modeling primitives.
However, commercial OODB systems, which are frequently used as implementation
vehicles, tend not to do the same. We will present two frameworks by which a
semantic relationship can be incorporated into an existing OODB system. The
first only requires that the OODB system support manifest type with respect to
its instances. The second assumes that the OODB system has a special kind of
metaclass facility. The two frameworks are compared and contrasted. In order to
ground our work in existing systems, we show the addition of a part-whole
semantic relationship both to the ONTOS DB/Explorer OODB system and the VODAK
Model Language (VML).
- Michael Halper, Li-min Liu, James Geller, and Yehoshua Perl
- Dept. of Mathematics & Computer Science, Kean University,
Union, NJ 07083 USA; Dept. of Applied Mathematics Chung Yuan Christian
University, Chung-Li, Taiwan; CS Dept. NJIT Newark, NJ 07102 USA
- mhalper@kean.edu
- Received February 15 2002, Revised October 28 2002, Accepted
November 2 2002
- C661: Enhancing OODB Semantics to Support Browsing in an
OODB Vocabulary Representation
- Abstract: In previous work, we have modeled a vocabulary
given as a semantic network by an OODB (Object-Oriented Database). The OODB
schema thus obtained provides a compact abstract view of the vocabulary. This
enables fast traversal of the vocabulary by a user. In the semantic network
vocabulary, the IS-A relationships express the specialization hierarchy. In our
OODB modeling of the vocabulary, the SUBCLASS relationship expresses the
specialization hierarchy of the classes and supports the inheritance of their
properties. A typical IS-A path in the vocabulary has a corresponding shorter
SUBCLASS path in the OODB schema. In the current paper we expose several cases
where the SUBCLASS hierarchy fails to fully correspond to the IS-A hierarchy of
the vocabulary. In these cases there exist traversal paths in the semantic
network for which there are no corresponding traversal paths in the OODB
schema. The reason for this failure is the existence of some IS-A relationships
between concepts of two classes, that are not connected by a SUBCLASS
relationship. This phenomenon weakens the accuracy of our modeling. To rectify
the situation we introduce a new OODB semantic relationship IS-A' to represent
the existence of IS-A relationships between concepts of a pair of classes which
are not connected via a SUBCLASS relationship. The resulting schema contains
both SUBCLASS relationships and IS-A' relationships which completely models the
IS-A hierarchy of the vocabulary. We define a mixed class level traversal path
to contain either SUBCLASS or IS-A' relationships. Consequently, each traversal
path in the semantic network has a corresponding mixed traversal path in the
OODB schema. Hence the introduction of the semantic OODB IS-A' relationship
improves the modeling of semantic network vocabularies by OODBs.
- Li-min Liu, James Geller and Yehoshua Perl
- Department of Applied Mathematics,Chung Yuan Christian University
22, Pu-Zen, Pu-Chung Li, Chung-Li, Taiwan, R.O.C.; Computer Science Department,
New Jersey Institute of Technology, Newark, NJ 07102 USA
-
thescientist00@yahoo.com
- Received February 22 2002, Revised October 30 2002, Accepted
November 2 2002
- C662: Efficient Parallel Implementations of Unstructured
Mesh Generation with High Performance Fortran
- Abstract:Unstructured mesh generation exposes highly
irregular computation patterns, which imposes a challenge in implementing
triangulation algorithms on parallel machines. This paper reports on an
efficient parallel implementation of locally Delaunay triangulation with High
Performance Fortran (HPF). Our algorithm exploits embarrassing parallelism by
performing sub-block triangulation and boundary merge independently at the same
time. The sub-block triangulation is by a divide and conquer Delaunay algorithm
known for its sequential efficiency, and the boundary triangulation is by an
incremental construction algorithm with low overhead. Compared to prior work,
our parallelization method is both simple and efficient. In the paper, we also
describe a solution to the collinear points problem that usually arises in
large data sets. Our experiences with the HPF implementation show that with
careful control of the data distribution, we are able to parallelize the
program using HPF's standard directives and extrinsic procedures. Experimental
results on several parallel platforms, including an IBM SP2 and a DEC Alpha
farm, show that a parallel efficiency of 31%-66% can be achieved for an 8-node
distributed memory system. We also compare efficiency of the HPF implementation
to that of a similarly hand-coded MPI implementation.
- Min-Bin Chen, Tyng-Ruey Chuang and Jan-Jan Wu
- Department of Management Information Systems, Chung Kuo Institute
of Technology, Mucha, Taipei 116, Taiwan; Institute of Information Science,
Academia Sinica, Nankang, Taipei
- cmb@mail.ckitc.edu.tw
- Received November 23 2002, Comments to Authors March 8 2003,
Revised June 17 2003, Accepted July 25 2003
- C663: A PC Cluster System Employing the IEEE 1394
- Abstract:In this paper, we describe the design and
evaluation of a PC cluster system in which IEEE 1394 is applied. Networks for
parallel cluster computing require low latency and high bandwidth. It is also
important that the networks be commercially available at low cost. Few network
devices satisfy all of the above requirements. However, the IEEE 1394 standard
provides a good compromise for fulfilling these requirements. We have used IEEE
1394, which supports a 400 Mbps data transfer rate, to connect the nodes of a
PC cluster system which we have designed and implemented. We have implemented
two communication libraries. One is a fast communication library called CF for
IEEE 1394. The other is an MPI layer library on the CF library. Experimental
results show that CF achieves a 17.2 microsecond round-trip time. On
application benchmarks, the system was considerably faster than TCP/IP over
Fast Ethernet. Even though the system was constructed at very low cost, it
provides good performance. Using the IEEE 1394 standard is thus a good solution
for low-cost cluster systems.
- Kazuki Hyoudou, Ryota Ozaki and Yasuichi Nakayama
- Department of Computer Science, The University of
Electro-Communications Chofu, Tokyo 182-8585 Japan
- hyoudo-k@igo.cs.uec.ac.jp
- Received November 29 2002, Comments to Authors March 8 2003,
Revised June 9 2003, Accepted July 24 2003
- C664: Data Structures in Java for Matrix Computations
- Abstract:In this paper it is shown how to utilize Java
arrays for matrix computations. We discuss the disadvantages of Java arrays
when used as two-dimensional array for dense matrix computation, and how to
improve the performance. We show how to create efficient dynamic data
structures for sparse matrix computations using Java's native arrays. We
construct a data structure for large sparse matrices that is unique for Java.
This datastructure is shown to be more dynamic and efficient than the
traditional storage schemes for large sparse matrices. Numerical results show
that this new data structure, called Java Sparse Array, is competitive with the
traditional Compressed Row Storage scheme on matrix computation routines. Java
gives increased flexibility without loosing efficiency. Compared to other
object oriented data structures it is shown that Java Sparse Arrays has the
same flexibility.
- Geir Gundersen and Trond Steihaug
- Department of Informatics, University of Bergen, Norway
-
Trond.Steihaug@ii.uib.no
- Received December 10 2002, Comments to Authors March 8 2003,
Revised May 5 2003, Accepted July 24 2003
- C665(Performance Section): Performance and Scalability of
MPI on PC Clusters
- Abstract:The purpose of this paper is to compare the
communication performance and scalability of MPI communication routines on an
NT cluster, a Myrinet Linux cluster, an Ethernet Linux cluster, a Cray T3E-600,
and an SGI Origin 2000. All tests in this paper were run for the various
numbers of processors and 2 message sizes. For most of the MPI tests used in
this paper, the T3E-600 and Origin 2000 outperform the NT cluster, the Myrinet
and Ethernet Linux clusters. In spite of the fact that the Cray T3E-600 is
about 5 years old, it performs best of all machines for most of the tests. For
mpi_bcast, mpi_allgather, and mpi_alltoall, the Myrinet Linux cluster
outperforms the NT cluster. For all other MPI collective routines, the NT
cluster outperforms the Myrinet Linux cluster. For all MPI collective routines,
the Myrinet Linux cluster performs significantly better than the Ethernet Linux
cluster.
- Glenn R. Luecke, Jing Yuan, Silvia Spanoyannis, Marina
Kraeva
- 292 Durham Center Iowa State University Ames, Iowa 50011,
USA
- grl@iastate.edu
- Received February 6 2001, Revised November 14 2002, Accepted
December 3 2002
- Full Paper:../CCPEwebresource/c665pemcs16/c665pemcspaper16.pdf
- C666: A role for Pareto optimality in mining performance
data
- Abstract:Improvements in performance modeling and
identification of computational regimes within software libraries is a critical
first step in developing software libraries that are truly agile with respect
to the application as well as to the hardware. It is shown here that Pareto
ranking, a concept from multi-objective optimization, can be an effective tool
for mining large performance datasets. The approach is illustrated using
software performance data gathered using both the public domain LAPACK library
and an asynchronous communication library based on IBM LAPI active message
library.
- Joël M. Malard
- Pacific Northwest National Laboratory Battelle Boulevard, P.O.
Box 999 Richland, WA 99352
- JM.Malard@pnl.gov
- Received December 31 2002, Comments to Authors March 8 2003,
accepted December 30 2003
- C667: A Performance Study of Job Management Systems
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- T. El-Ghazawi1
- The George Washington University, George Mason University
- tarek@seas.gwu.edu
- May 2002, Revised: Dec. 2002, Accepted: Dec. 2002
- C668: Supporting Bulk Synchronous Parallelism with a High
Bandwidth Optical Interconnect
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- I. Gourlay
- Informatics Research Institute, University of Leeds, Leeds LS2
9JT
-
iain@comp.leeds.ac.uk
- Recieved: May 2002, Revised: Oct. 2002, Accepted: Dec. 2002
- C669: Towards a More Realistic Comparative Analysis of
Multicomputer Networks
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- H. Sarbazi-Azad
- Department of Computing Science, University of Glasgow, Glasgow
G12 8QQ.
- azad@sharif.edu
- May 2002, Revised: Nov. 2002, Accepted: Dec. 2002
- C670: RF-MVTC: An EÆcient Risk-Free Multiversion
Concurrency Control Algorithm
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- A. Boukerche
- Department of Computer Sciences, University of North of
Texas
-
boukerche@cs.unt.edu
- Recieved: June 2002, Revised: Nov 2002, Accepted: Dec. 2002
- C671: On the Performance of Circuit-Switched Networks in
the Presence of Correlated Traffic
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- G. Min
- Department of Computing, School of Informatics, University of
Bradford, Bradford, BD7 1DP, U.K.
- g.min@brad.ac.uk
- May 2002, Revised: Dec. 2002, Accepted: Dec. 2002
- C672: Performance Evaluation of a Random-Walk-Based
Algorithm for Embedding Dynamically Evolving Trees in Hypercubic Networks
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- K. Li
- Department of Computer Science, State University of New York, New
Paltz, New York 12561-2499
-
li@mcs.newpaltz.edu
- May 2002, Revised: Oct. 2002, Accepted: Dec. 2002
- C673: On the Scalability of Many-to-many Reliable Multicast
Sessions
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- W. Y. Yoon
- Samsung Advanced Institute of Technology, South Korea
-
wyyoon@sait.samsung.co.kr
- June 2002, Revised: Dec. 2002, Accepted: Dec. 2002
- C674: One-to-All Broadcasting Scheme for Arbitrary Static
Interconnection Networks with Bursty Background Traffic
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- Demetres Kouvatsos
- Performance Modelling and Engineering Research Group, Department
of Computing, University of Bradford BD7 1DP, Bradford, UK
-
d.d.kouvatsos@scm.brad.ac.uk
- Received October 19 2002
- C675: Editorial on Benchmarking Special Issue
- Special Issue: International Workshop on Performance
Modeling, Evaluation, and Optimization of Parallel and Distributed Systems
(PMEO-PDS'02), held in conjunction with IEEE IPDPS 2002, April 15-19, 2002,
Fort Lauderdale, USA; editors Mohamed Ould-Khaoua and Geyong Min
- Mohamed Ould-Khaoua and Geyong Min
- Glasgow University; Bradford University
- mohamed@dcs.gla.ac.uk
and G.Min@Bradford.ac.ukG.Min@Bradford.ac.uk
- Received February 16 2003
- C676: Toward the Design of a Parallel, Linear Algebraic
System
- Abstract:We describe a generic, software framework for the
solution of linear algebraic systems sufficient to provide for their use in a
parallel, iterative context. Some base assumptions about the structure of the
underlying problem and problem domain, as well as the initial distribution of
the parallel structures are described to provide a framework for the discussion
of an implementation using templated C++ code; however, the design has been
made sufficiently flexible that it could easily be applied to other
implementations of the base data structures.
- W. D. Turner and J. E. Flaherty
- Visualization and Computer Vision, GE Global Research,
Schenectady, NY 12301; Scientific Computation Research Center, and Department
of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180
- flahej@rpi.edu
- Received February13 2003, Comments to Authors April 28 2003
- C677: Linda implementations in Java for concurrent systems
- Abstract: This paper surveys a number of the
implementations of Linda that are available in Java. It provides some
discussion of their strengths and weaknesses, and presents the results from
benchmarking experiments using a network of commodity workstations. Some
extensions to the original Linda programming model are also presented and
discussed, together with examples of their application to parallel processing
problems.
- G. C. Wells, A. G. Chalmers, and P. G. Clayton
- Department of Computer Science, Rhodes University, Grahamstown,
6140, South Africa; Department of Computer Science, University of Bristol,
Merchant Venturers Building, Bristol BS8 1UB, U.K.
- G.Wells@ru.ac.za
- Received February 13 2003, Comments to Authors April 28 2003,
Revised 17 June 2003, Accepted July 24 2003
- C678: Studying Protein Folding on the Grid: Experiences
Using CHARMM on NPACI Resources under Legion
- Abstract:One benefit of a computational grid is the
ability to run high-performance applications over distributed resources simply
and securely. We demonstrated this benefit with an experiment in which we
studied the protein-folding process with the CHARMM molecular simulation
package over a grid managed by Legion, a grid operating system.
High-performance applications can take advantage of grid resources if the grid
operating system provides both low-level functionality as well as high-level
services. We describe the nature of services provided by Legion for
high-performance applications. Our experiences indicate that human factors
continue to play a crucial role in the configuration of grid resources,
underlying resources can be problematic, grid services must tolerate underlying
problems or inform the user, and high-level services must continue to evolve to
meet user requirements. Our experiment not only helped a scientist perform an
important study, but also showed the viability of an integrated approach such
as Legions for managing a grid.
- Anand Natrajan, Nancy Wilkins-Diehr, Anthony D. Fox, Michael
Crowley, Marty A. Humphrey, Andrew S. Grimshaw, Charles L. Brooks III
- Department of Computer Science University of Virginia; San Diego
Supercomputing Center University of California at San Diego; Department of
Molecular Biology The Scripps Research Institute
-
an4m@cs.virginia.edu
- Received February 6 2003, Comments to Author February 6 2003,
Revised February 27 2003, Accepted March 2 2003
- C679: Abstracting Asynchronous Object Interaction using
First-Class Functions and Continuations in an Event-Driven System
- Abstract: The shift in focus of distributed computing from
local-area networks to geographically-distributed environments highlights the
need for new distributed programming models. In this work we explore the
combination of a single-threaded event-driven execution model with a small
number of concepts traditionally found in functional languages. We describe a
distributed programming system, called Orfeo, based on a simple yet powerful
procedural language and on asynchronous remote function calls. We discuss how
to use continuations and first-class function values to construct several
useful control abstractions, showing how a small set of powerful concepts can
provide the basis for a highly asynchronous and flexible distributed
programming system.
- Maria Julia de Lima, Noemi Rodriguez, Roberto Ierusalimschy
- PUC-Rio
-
noemi@inf.puc-rio.br
- Received March 14 2003, Comments to Authors Oct 7 2003
- C680: Parallel Object-Oriented Framework Optimization
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Daniel P. Quinlan, Markus Schordan, Brian Miller and Markus
Kowarschik
- Center for Applied Scientific Computing, Lawrence Livermore
National Laboratory, Livermore CA USA; System Simulation Group, Department of
Computer Science, University of Erlangen-Nuremberg, Germany
- dquinlan@llnl.gov
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C681: A compiler for multiple memory models
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- S. P. Midkiff, J. Lee and D.A. Padua
- School of Electrical and Computer Engineering, Purdue University;
School opf Computer Science and Engineering, Seoul National University;
Department of Computer Science, University of Illinois at Urbana-Champaign
-
smidkiff@purdue.edu
- Received Sept 20 2001, Revised May 29 2002, Accepted July 1st
2002
- C682: Compiling Data-Parallel Programs for Clusters of
SMPs
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Siegfriend Benkner and Thomas Brandes
- Institute of Software Science, University of Vienna Austria;
Fraunhofer Institute for Algorithms and Scientific Computation, St. augustin
Germany
- sigi@ieee.org
- Received Sept 20 2001, Revised May 29 2002, Accepted July 1st
2002
- C683: Group-SPMD programming with orthogonal processor
groups
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Thomas Rauber, Robert Reilein and Gudula Runger
- Fak. f¨ur Mathematik und Physik, Universit¨at Bayreuth,
95440 Bayreuth, Germany; Fak. f¨ur Informatik, Technische Universit¨at
Chemnitz, 09107 Chemnitz, Germany.
-
rauber@uni-bayreuth.de
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C684: The Effect of Cache Models on Iterative Compilation
for Combined Tiling and Unrolling
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- P.M.W. Knijnenburg, T. Kisuki, K. Gallivan and M.F.P.
O'Boyle
- LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, the
Netherlands peterk@liacs.nl Phone: +31 71 5277063 Fax: +31 71 5276985;
Department of Computer Science, Florida State University; Institute for
Computing Systems Architecture, Edinburgh University
- peterk@liacs.nl
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C685: SWARP: A Retargetable Preprocessor for Multimedia
Instructions
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Gilles Pokam, Stephane Bihan, Julien Simonnet, Francois
Bodin
- IRISA, Campus Universitaire de Beaulieu, 35042 Rennes Cedex,
France
- gpokam@irisa.fr
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C686: Space-Time Mapping and Tiling a Helpful
Combination
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Martin Griebl, Peter Faber, Christian Lengauer
- Fakultat fur Mathematik und Informatik, Universit¨at Passau
D-94030, Passau, Germany
-
griebl@fmi.uni-passau.de
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C687: Data Partitioning-Based Parallel Irregular
Reductions
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Eladio Gutierrez, Oscar Plata, Emilio L. Zapata
- Department of Computer Architecture, University of Malaga
E-29071, Malaga, Spain
- eladio@ac.uma.es
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C688: A Fast and Accurate Method for Evaluating the
Upper-Bound of Memory Performance
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- G. Fursin. M.F.P. O'Boyle, O. Temam, G. Watts
- ICSA Division of Informatics, University of Edinburgh JCMB,
King's Buildings Mayfield Rd, Edinburgh EH9 3JZ, UK; Paris South University,
France
- fgg@dcs.ed.ac.uk
- Received Sept 20 2001, Revised May 29 2002, Accepted March 14
2003
- C689: Managing distributed shared arrays in a
bulk-synchronous parallel programming environment
- One of ten papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Christoph W. Kessler
- Programming Environments Laboratory (PELAB), Department of
Computer Science, Linkoping University, S-581 83 Linkoping, Sweden
- chrke@ida.liu.se
- Received Sept 20 2001, Revised May 29 2002, Accepted July 1st
2002
- C690: Introduction to Parallel Computer Compiler Special
Issue
- Editorial for papers in Special Edition of Concurrency:
Practice and Experience dedicated to selected papers presented at the ninth
Compilers for Parallel Computers workshop held in Edinburgh, UK in June, 2001
(CPC 2001).
- Michael O'Boyle
- ICSA Division of Informatics, University of Edinburgh JCMB,
King's Buildings Mayfield Rd, Edinburgh EH9 3JZ, UK
- mob@inf.ed.ac.uk
- Received March 14 2003, Accepted March 14 2003
- C691: A Static Mapping Heuristic for Mapping Parallel
Applications to Heterogeneous Computing Systems
- Abstract: To minimize the execution time of a parallel
application running on a heterogeneous computing distributed system, an
appropriate mapping scheme to allocate the application tasks to the processors
is needed. The general problem of mapping tasks to machines is a well known
NP-hard problem and several heuristics have been proposed to approximate its
optimal solution. In this paper we propose a static graph-based mapping
algorithm, called Heterogeneous Multi-phase Mapping ($HMM$), that permits a
suboptimal mapping of a parallel application onto a heterogeneous computing
distributed system by using a local search technique together with a tabu
search meta-heuristic. $HMM$ allocates parallel tasks by exploiting the
information embedded in the parallelism forms used to implement an application.
We compare HMM with three different leading techniques and with an exhaustive
mapping algorithm. We also give an example of mapping of a pratical application
where HMM verified its usefulness. Experimental results show that $HMM$
performs well demonstrating the applicability of our approach.
- Ranieri Baraglia, Renato Ferrini, Pierluigi Ritrovato
- Istituto di Scienza e Tecnologie dell'Informazione A. Faedo,
Consiglio Nazionale delle Ricerche Area della Ricerca di Pisa Via Moruzzi 1
56126 Pisa, Italy; Centro di Ricerca in Matematica Pura ed Applicata,
Università degli Studi di Salerno Ponte Don Melillo 84084 Fisciano
(Salerno), Italy.
-
ranieri.baraglia@cnuce.cnr.it
- Received March 24 2003, Comments to Authors Oct 7 2003, Accepted
March 17 2004
- C692: Concurrent Urban-Legends
- Abstract: This discussion addresses a number of urban
legends about concurrency in an attempt to separate the myth from the fact. To
dispel certain legends, we argue that:
- Concurrent != Parallel
- Coroutining != Concurrency
- Synchronization != Mutual Exclusion
- Dekker !< Peterson
- Concurrency != Library
- Inheritance Anomaly != Major Concurrency Problem
- Signalling != Hints
- Spurious Wakeup != Efficiency
- Hyman != Peer-Review Failure
Identifying and understanding the fundamental concepts
underlying concurrency is essential to the field. Equally important is not to
confuse sequential and concurrent concepts. Finally, approaches based solely on
efficiency are insufficient to justify a weak or difficult to use concurrent
concept or construct.
- Peter A. Buhr and Ashif S. Harji
- University of Waterloo 200 University Ave. West Waterloo,
Ontario, CANADA, N2L 3G1
-
pabuhr@plg2.math.uwaterloo.ca
- Received March 24 2003, Comments to Authors Oct 7 2003, Accepted
March 9 2004
- C693: A Cache-Efficient Implementation of the Lattice
Boltzmann Method for the Two-Dimensional Diffusion Equation
- Abstract:The lattice Boltzmann method is an important
technique for the numerical solution of partial differential equations because
it has nearly ideal scalability on parallel computers for many applications.
However, to achieve the scalability and speed potential of the lattice
Boltzmann technique, the issues of data reusability in cache-based computer
architectures must be addressed. Utilizing the two-dimensional diffusion
equation, Tt = µ (Txx + Tyy), this paper
examines cache optimization for the lattice Boltzmann method in both serial and
parallel implementations. In this study speedups due to cache optimization were
found to be 1.9 to 2.5 for the serial implementation and 3.6 to 3.8 for the
parallel case in which the domain decomposition was optimized for stride-one
access. In the parallel non-cached implementation the method of domain
decomposition (horizontal or vertical) used for parallelization did not
significantly affect the compute time. In contrast the cache-based
implementation of the lattice Boltzmann method was significantly faster when
the domain decomposition was optimized for stride-one access. Additionally, the
cache-optimized lattice Boltzmann method in which the domain decomposition was
optimized for stride-one access displayed superlinear scalability on all
problem sizes as the number of processors was increased.
- A. C. Velivelli and K. M. Bryden
- Department of Mechanical Engineering Iowa State University, Ames,
IA 50011
-
kmbryden@iastate.edu
- Received March 29 2003, Comments to Authors Oct 8 2003, Accepted
November 16 2003
- C694: Finding stale-value errors in concurrent programs
- Abstract:Concurrent programs can suffer from many types of
errors, not just the well-studied problems of deadlocks and simple race
conditions on variables. This paper addresses a kind of race condition that
arises from reading a variable whose value is possibly out-of-date. The paper
introduces a simple technique for detecting such stale values, and reports on
the encouraging experience with a compile-time checker that uses the
technique.
- Michael Burrows and K. Rustan M. Leino
- Compaq Systems Research Center (Currently Microsoft
Research)
-
leino@microsoft.com
- Received May 1 2003, Comments to Authors Oct 8 2003, Accepted
November 13 2003
- C695: XWPVM: A New Graphical Interface for PVM on Microsoft
Windows Platform
- Abstract: The present article proposes the XWPVM --- a new
graphical interface for PVM implemented on Microsoft Windows platform. XWPVM
assists the user in developing distributed parallel applications. The article
approaches PVM, XPVM, describes the new XWPVM model, its implementation,
results, and presents some conclusions.
- Carlos B. Rosa Junior, Otávio A. S. Carpinteiro, Antonio
C. Zambroni de Souza
- Instituto de Engenharia Elétrica Universidade Federal de
Itajubá Av. BPS 1303, Itajubá, MG, 37500-903, Brazil
- otavio@iee.efei.br
- Received May 20 2003, Comments to Authors Oct 8 2003
- C696: The J2EE ECperf Benchmark Results: Transient Trophies
or Technology Treasures?
- Abstract: ECperf, the widely recognized industry standard
J2EE benchmark, has attracted a large number of results submissions and their
subsequent publication. However, ECperf places little restriction on the
hardware platform, operating systems and databases utilized in the benchmarking
process. This, combined with the existence of only two primary metrics, makes
it difficult to accurately compare the performance of the Application Server
products themselves. By mining the full-disclosure archives for trends and
correlations we have discovered that J2EE technology is very scalable both in a
scale-up and scale-out manner. Other observed trends include, a linear
correlation between middle-tier total processing power and throughput, as well
as between J2EE Application Server license costs and throughput. However, the
results clearly indicate that there is an increasing cost per user with
increasing capacity systems, and scale-up is proportionately more expensive
than scale-out. Finally, the correlation between middle-tier processing power
and throughput, combined with results obtained from a different
'lighter-weight' benchmark, facilitates an estimate of throughput for different
types of J2EE applications.
- Paul Brebner, Jeffrey Gosper
- CSIRO Mathematical and Information Sciences, GPO Box 664,
Canberra, AUSTRALIA
-
Paul.Brebner@csiro.au
- Received May 21 2003; Accepted July 21 2003
- C697: PIPELINES ON HETEROGENEOUS SYSTEMS: MODELS AND
TOOLS
- Abstract: We study the performance of pipeline algorithms
in heterogeneous networks. The concept of heterogeneity is not restricted only
to the differences in computational power of the nodes but also refers to the
network capabilities. We develop an skeleton tool that allows an efficient
block-cyclic mapping of pipelines on heterogeneous systems. The tool supports
pipelines with a number of stages much larger than the number of physical
processors available. We derive an analytical formula that allows to predict
the performance of pipelines in heterogeneous systems. According to the
analytical complexity formula, numerical strategies to solve the optimal
mapping problem are proposed. The computational results prove the accuracy of
the predictions and effectiveness of the approach
- Francisco Almeida, Luz Marina Moreno, Daniel González and
Casiano Rodríguez
- Dpto. Estadística, I. O. y Computación Edificio
Físicas/Matemáticas, Universidad de La Laguna La Laguna, Tenerife
Spain
- falmeida@ull.es
- Received May 29 2003, Comments to Authors Oct 8 2003, Accepted
March 9 2004
- C698: Autonomic Oil Reservoir Optimization on the Grid
- Abstract:The emerging Grid infrastructure and its support
for seamless and secure interactions is enabling a new generation of autonomic
applications where the application components, Grid services, resources, and
data interact as peers to manage, adapt and optimize themselves and the overall
application. In this paper we describe the design, development and operation of
a prototype of such an application that uses peer-to-peer interactions between
distributed services and data on the Grid to enable the autonomic optimization
of an oil reservoir.
- Viraj Bhat, Vincent Matossian, Manish Parashar, Magorzata
Peszynska, Mrinal Sen, Paul Stoffa and Mary F. Wheeler
- The Applied Software Systems Laboratory, Rutgers University,
Piscataway, NJ.; Center for Subsurface Modeling, University of Texas at Austin,
Austin, TX.; Institute for Geophysics, University of Texas at Austin, Austin,
TX.
-
parashar@caip.rutgers.edu
- Received June 14 2003, Comments to Authors Oct 8 2003, Accepted
December 6 2003
- C699: Loosely Coordinated Coscheduling in the Context of
Other Approaches for Dynamic Job Scheduling: A Survey
- Abstract: Loosely coordinated (implicit/dynamic)
coscheduling is a time-sharing approach which originates from NOW environments
of mixed parallel/sequential workloads and limited software support. It is
meant to be an easy-to-implement and scalable approach. Considering that the
percentage of clusters in parallel computing is increasing and easily portable
software is needed, loosely-coordinated coscheduling becomes an attractive
approach for dedicated machines, too. Loose coordination offers attractive
features as a dynamic approach. Static approaches for local job scheduling
assign resources exclusively and nonpreemptively. Such approaches still remain
beyond the desirable resource utilization and average response times.
Conversely, approaches for dynamic scheduling of jobs can preempt resources
and/or adapt their allocation. They typically provide better resource
utilization and response times. Existing dynamic approaches are full preemption
with checkpointing, dynamic adaptation of CPU allocation, and time sharing via
gang or loosely coordinated coscheduling. This survey presents and compares the
different approaches, while focussing especially on the less well explored
loosely coordinated time sharing. The discussion focuses especially on the
implementation problems, in terms of modification of standard operating
systems, the runtime system and the communication libraries.
- Angela C. Sodan
- University of Windsor, Computer Science, 401 Sunset Ave.,
Windsor, ON N9B 3P4 Canada
-
acsodan@davinci.newcs.uwindsor.ca
- Received June 12 2003, Comments to Authors Oct 10 2003, Accepted
March 9 2004
- C700: Scheduling Communication in Multithreaded Programs:
Experimental Results
- Abstract: when the critical path of a communication
session between end-points includes the actions of operating system kernels,
there are attendant overheads. Along with other factors, such as functionality
and flexibility, such overheads motivate and favor the implementation of
communication protocols in user-space. When implemented with threads, such
protocols may hold the key to optimal communication performance and
functionality. Based on implementations of reliable user-space protocols
supported by a threads framework, we focus on our experiences with internal
threads' scheduling techniques and their potential impact on performance. We
present scheduling strategies that enable threads to do both application-level
and communication-related processing. With experiments performed on a Sun
SPARC-5 LAN environment, we show how different scheduling strategies yield
different levels of application-processing efficiency, communication latency
and packet-loss. This work forms part of a larger study on the implementation
of multiple threads-based protocols in a single address-space, and the benefits
of coupling protocols with applications.
- Juan Carlos Gomez, Vernan Rego and V.S. Sunderam
- Department of Computer Sciences, Purdue University, West
Lafayette, IN 47907; Department of Mathematics and Computer Science, Emory
University, Atlanta, GA 30322
- rego@cs.purdue.edu
- Received June 16 2003, Comments to Authors Oct 10 2003, Accepted
April 12 2004
- C731: Fixed Memory Performance Of A 3-D Electromagnetic PIC
Model
- Abstract:Using a parallel 3 dimensional fully kinetic
electromagnetic particle-in-cell (PIC) model, the fixed memory performance was
examined thoroughly on the SGI-Origin 2000 supercomputer with 128 processors.
This parallel PIC model is decomposed only in one dimension with fixed domain
size in each processor. The detailed computational and communication overhead
times incurred by each of the members such as the density solver, the Maxwell
solver and the particle pusher, constituting our PIC model is presented here.
This type of performance analysis is very important in understanding and
improving the overall efficiency of the model.
- Saikat Saha
- Department of Electrical and Computer Engineering, The University
of Alabama in Huntsville, Huntsville, AL-35899, USA.
-
saikat_saha@hotmail.com
- Received June 2 2003, Comments to Authors Oct 10 2003
- C732: The Performance and Scalability of SHMEM and MPI-2
One-Sided Routines on a SGI Origin 2000 and a Cray T3E-600
- Abstract: This paper compares the performance and
scalability of SHMEM and MPI-2 one-sided routines on different communication
patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were
chosen to represent commonly-used communication patterns with low contention
(accessing distant messages, a circular right shift, a binary tree broadcast)
to communication patterns with high contention (a "naive" broadcast and an
all-to-all). For all the tests and for small message sizes, the SHMEM
implementation significantly outperformed the MPI-2 implementation for both the
SGI Origin 2000 and the Cray T3E-600.
- Glenn R. Luecke, Silvia Spanoyannis, Marina Kraeva
- Iowa State University Ames, Iowa 50011-2251 USA
- grl@iastate.edu
- Performance Section: Received November 20 2001, Revised
September 10 2002, Accepted December 13 2002
- C733: Semantic Resource Grid: Model, Method and
Platform
- Abstract: This paper proposes the Semantic Resource Grid
model, which aims at effectively sharing using and managing versatile resources
across the Internet. The kernel of the Resource Grid includes a resource space
model RSM and a uniform resource-using mechanism RUM. The RSM is a coordinate
system with independent coordinates and orthogonal axes for uniformly
specifying resources. A resouirce space has three schemas: a user-view schema
that enables users to easily operate and intelligently use resources, a
universal-resource-view schema for uniformly managing globally distributed
resources, and a semantic-web-view schema that provides across-platform and
machine-understandable resource representation. The RUM provides not only the
end-users with an operable resource browser to operate resources in the
user-view schema by using the built-in Resource Operation Language ROL but also
the application developers with the ROL-based programming environment. The
criteria and method for guiding designers to design a proper resource space are
presented. A software platform based on the proposed model has been implemented
and used for resource sharing and management in distributed research
teams.
- Hai Zhuge
- Knowledge Grid Research Group, Key Lab of Intelligent Information
Processing, Institute of Computing Technology, Chinese Academy of Sciences,
Beijing, 100080, P.R. China
- zhuge@ict.ac.cn
- Received June 4 2003, Comments to Author Oct 10 2003, Accepted
November 13 2003
- C734: An Analytical Study of MPI Middleware Architecture
and its Impact on Performance
- Abstract:This paper presents an analytical study on MPI
middleware architecture for the MPI-1.2 standard, major distinguishing
characteristics of MPI implementations and their impacts on parallel
application performance. The model used to characterize parallel performance is
reviewed and metrics complementing this model and useful for describing
messgae-passing performance are introduced. They capture key aspects of
application performance and are practically measurable.
The key
contribution of this paper is a demonstration that the architecture of the MPI
library has a strong impact on its ability to deliver application performance,
and that pollling architectures (typical of early and still commonly used MPI
implementations) are suboptimal in terms of parallel application runtime. A
classification taxonomy for MPI-1 libraries is offered, and existing libraries
are so classified.
This paper offers comparisons of blocking and polling
modes of the MPI/Pro implementation of MPI, offering a direct comparison of
these strategies for implementing MPI. A complementary direct comparison of
MPI/Pro and MPICH performance is also offered to show the net benefits of
blocking MPI implementation on ceratin real applications' performance.
This
paper demonstrates the need for performance analysis of MPI libraries to go
beyond micro-benchmarking of zero-message-length latency and asymptotic message
bandwidth in order to grade the "quality" of an MPI implementation.
- Rossen Dimitrov and Anthony Skjellum
- Dept. of Computer and Information Sciences, University of Alabama
at Birmingham, High Performance Computing Laboratory, 1300 University Blvd.,
115A Campbell Hall, Birmingham AL 39294-1170
-
tony@MPI-SoftTech.Com
- Received July 2 2003, Comments to Authors Oct 10 2003
- C735: Semantic Link Network Builder and Intelligent
Semantic Browser
- Abstract:Semantic Link Network SLN is a semantic web model
using semantic links to replace the hyperlinks of the current Web. Semantic
Link Network Builder SLN-Builder is a software tool that enables definition,
modification, verification and access of SLN. The SLN-Builder can convert an
SLN definition into XML descriptions for cross-platform information exchange.
The Intelligent Semantic Browser can browse SLN and carry out two types of
reasoning: small granularity reasoning by chaining semantic links and large
granularity reasoning by matching semantic views of SLN. With the help of
reasoning mechanism, the browser can recommend the content that is semantically
relevant to the current browsing content and enables the user to foresee the
end-side content of a semantic link chain. This paper presents the design and
implementation of the SLN-Builder and the intelligent browser as well as the
key algorithms.
- Hai Zhuge and Ruixiang Jia
- Knowledge Grid Research Group, Key Lab of Intelligent Information
Processing, Institute of Computing Technology, Chinese Academy of Sciences,
Beijing, 100080, P.R. China
- zhuge@ict.ac.cn
- Received 18 July 2003, Comments to Authors Oct 10 2003, Accepted
November 16 2003
- C736: Augmenting LZ-77 with authentication and integrity
assurance capabilities
- Abstract: The formidable dissemination capability allowed
by the current network technology makes it increasingly important to devise new
methods to ensure authenticity and integrity. Nowadays it is common practice to
distribute documents in compressed form. In this paper, we propose a simple
variation on the classic LZ-77 algorithm that allows one to hide, within the
compressed document, enough information to warrant its authenticity and
integrity. The design is based on the unpredictability of a certain class of
pseudo- random number generators, in such a way that the hidden data cannot be
retrieved in a reasonable amount of time by an attacker (unless the secret
bit-string key is known). Since it can still be decompressed by the original
LZ-77 algorithm, the embedding is completely \transparent" and
backward-compatible, making it possible to deploy it without disrupting
service. Experiments show that the degradation in compression due to the
embedding is almost negligible.
- Special Issue of SAC'03 Security Track edited by Ronaldo
Menezes
- Mikhail J. Atallah, Stefano Lonardi
- CERIAS and Department of Computer Sciences, Purdue University,
West Lafayette, IN 47907, USA.; Department of Computer Science and Engineering,
University of California, Riverside, CA 92521, USA.
- stelo@cs.ucr.edu
- Received 27 June 2003, Accepted August 23 2003
- C737: Identification and Authentication of Integrated
Circuits
- Abstract:This paper describes a technique to reliably and
securely identify individual integrated circuits (ICs) based on the precise
measurement of circuit delays and a simple challenge- response protocol. This
technique could be used to produce key-cards that are more difficult to clone
than ones involving digital keys on the IC. We consider potential venues of
attack against our system, and present candidate implementations. Experiments
on Field Programmable Gate Arrays show that the technique is viable, but that
our current implementations could require some strengthening before it can be
considered as secure.
- Special Issue of SAC'03 Security Track edited by Ronaldo
Menezes
- Blaise Gassend, Daihyun Lim, Dwaine Clarke, Marten van Dijk,
Srinivas Devadas
- Massachusetts Institute of Technology, Laboratory for Computer
Science, Cambridge, MA 02139, USA; Prof Holstlaan 4, Eindhoven, The
Netherlands
- blaise@gassend.com
- Received 27 June 2003, Accepted August 23 2003
- C738: Access-Controlled Resource Discovery in Pervasive
Networks
- Abstract:Networks of the future will be characterized by a
variety of computational devices that display a level of dynamism not seen in
traditional wired networks. Because of the dynamic nature of these networks,
resource discovery is one of the fundamental problems that must be solved.
While resource discovery systems are not a novel concept, securing these
systems in an efficient and scalable way is challenging. This paper describes
the design and implementation of an architecture for access-controlled resource
discovery. This system achieves this goal by integrating access control with
the Intentional Naming System (INS), a resource discovery and service location
system. The integration is scalable, efficient, and ts well within a
proxy-based security framework designed for dynamic networks. We provide
performance experiments that show how our solution outperforms existing
schemes. The result is a system that provides secure, access- controlled
resource discovery that can scale to large numbers of resources and users.
- Special Issue of SAC'03 Security Track edited by Ronaldo
Menezes
- Sanjay Raman, Dwaine Clarke, Matt Burnside, Srinivas Devadas,
Ronald Rivest
- Massachusetts Institute of Technology, Laboratory for Computer
Science, Cambridge, MA 02139, USA
-
sraman@abp.lcs.mit.edu
- Received 27 June 2003, Accepted August 23 2003
- C739: A role-based infrastructure management system: design
and implementation
- Abstract: Over the last decade there has been tremendous
advance in the theory and practice of role-based access control (RBAC). One of
the most significant aspects of RBAC can be viewed from its management of
permissions on the basis of roles rather than individual users. Consequently,
it reduces administrative costs and potential errors. The management of roles
in various RBAC implementations, however, tends to be conducted in an ad-hoc
basis, closely coupled with a certain context of system environments. This
paper discusses the development of a system whose purpose is to help manage a
valid set of roles with assigned users and permissions for role-based
authorization infrastructures. We have designed and implemented the system,
called RolePartner. The system enables role administrators to build and
configure various components of a RBAC model so as to embody organizational
access control policies which can be separated from di®erent enforcement
mechanisms. Hence the system helps make it possible to lay a foundation for
role-based authorization infrastructures. Three methodological constituents for
our purpose are introduced, together with the design and implementation issues.
The system has a role-centric view for easily managing constrained and
hierarchical roles as well as assigned users and permissions. An
LDAP-accessible directory service was used for a role database. We show that
the system can be seamlessly integrated with an existing privilege-based
authorization infrastructure.
- Special Issue of SAC'03 Security Track edited by Ronaldo
Menezes
- Dongwan Shin, Gail-Joon Ahn, Sangrae Cho, and Seunghun Jin
- The University of North Carolina at Charlotte; Electronics and
Telecommunications Research Institute
- doshin@uncc.edu
- Received 27 June 2003, Accepted August 23 2003
- C740: Editorial for Special Issue of SAC'03 Security
Track
- Abstract: Editorial for Special Issue of SAC'03 Security
Track
- Special Issue of SAC'03 Security Track edited by Ronaldo
Menezes
- rmenezes@cs.fit.edu
- Florida Institute of Technology, Department of Computer Sciences,
150 West University Blvd., Melbourne, FL 32901
- rmenezes@cs.fit.edu
- Accepted August 14 2003
- C741: Simple verification techniques for complex Java
bytecode subroutines
- Abstract:Java is normally compiled to bytecode, which is
verified and then executed by the Java Virtual Machine. Bytecode produced via
compilation must pass verification. The main cause of complexity for bytecode
verification is subroutines, used by compilers to generate more compact code.
The techniques to verify subroutines proposed in the literature reject certain
programs produced by mundane compilers, are difficult to realize within an
implementation of the Java Virtual Machine, or are relatively complicated. This
paper presents a novel technique which is very simple to understand, implement,
and prove sound. It is also very powerful: the set of accepted programs has a
simple characterization which most likely includes all code produced by current
compilers and which enables future compilers to make more extensive use of
subroutines.
- Special Issue for Formal Techniques for Java-like Programs
workshop at ECOOP.2002Málaga, Spain, June 10, 2002 edited by Peter
Mueller, David Naumann and Erik Poll
- Alessandro Coglio
- Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304,
USA
- coglio@kestrel.edu
- Received January 5 2003, Revised May 9 2003, Accepted July 25
2003
- C742: Checking Ownership and Confinement
- Abstract:A number of proposals to manage aliasing in
Java-like programming languages have been advanced over the last five years. It
is not clear how practical these proposals are, that is, how well they relate
to the kinds of programs currently written in Java-like languages. To address
this problem, we analysed heap snapshots from a corpus of Java programs. Our
results indicate that object-oriented programs do in fact exhibit symptoms of
encapsulation in practice, and that proposed models of uniqueness, ownership,
and confinement can usefully describe the aliasing structures of
object-oriented programs. Understanding the kinds of aliasing present in
programs should help us to design formalisms to make explicit the kinds of
aliasing implicit in object-oriented programs.
- Special Issue for Formal Techniques for Java-like Programs
workshop at ECOOP.2002 Málaga, Spain, June 10, 2002 edited by Peter
Mueller, David Naumann and Erik Poll
- Alex Potanin, James Noble, and Robert Biddle
- School of Mathematical and Computing Sciences, Victoria
University of Wellington, PO Box 600, Wellington, New Zealand
- kjx@mcs.vuw.ac.nz
- Received January 5 2003, Revised May 9 2003, Accepted July 25
2003
- C743: Analysing the Java Package/Access Concepts in
Isabelle/HOL
- Abstract: Java access modifiers and packages provide a
mechanism to restrict access to members and types, as an additional means of
information hiding beyond the purely object-oriented concept of classes. In
this paper we clarify the semantics of access modifiers and packages by adding
them to our formal model of Java in the theorem prover Isabelle/HOL. We analyse
which properties we can rely on at runtime, provided that the program has
passed the static accessibility tests.
- Special Issue for Formal Techniques for Java-like Programs
workshop at ECOOP.2002 Málaga, Spain, June 10, 2002 edited by Peter
Mueller, David Naumann and Erik Poll
- Norbert Schirmer
- Technische Universitaet Muenchen, Department of Informatics,
D-85748 Garching, Germany
- schirmer@in.tum.de
- Received January 5 2003, Revised May 9 2003, Accepted July 25
2003
- C744: Transposing F to C#: Expressivity of parametric
polymorphism in an object-oriented language
- Abstract: We present a type-preserving translation of
System F (the polymorphic lambda calculus) into a forthcoming revision of the
C# programming language supporting parameterized classes and polymorphic
methods. The forthcoming revision of Java in JDK 1.5 also makes a suitable
target. We formalize the translation using a subset of C# similar to
Featherweight Java. We prove that the translation is fully type-preserving and
that it preserves behaviour via the novel use of an environment-style semantics
for System F. We observe that whilst parameterized classes alone are sufficient
to encode the parameterized datatypes and let-polymorphism of languages such as
ML and Haskell, it is the presence of dynamic dispatch for polymorphic methods
that supports the encoding of the "first-class polymorphism" found in System F
and recent extensions to ML and Haskell.
- Special Issue for Formal Techniques for Java-like Programs
workshop at ECOOP.2002 Málaga, Spain, June 10, 2002 edited by Peter
Mueller, David Naumann and Erik Poll
- Andrew Kennedy and Don Syme
- Microsoft Research, Cambridge, U.K., 7 J J Thomson Ave, Cambridge
CB3 0FB, UK
- akenn@microsoft.com
- Received January 5 2003, Revised May 9 2003, Accepted July 25
2003
- C745: Special Issue: Formal Techniques for Java-like
Language
- Abstract:Overview of Formal Techniques for Java-like
Language Special Issue
- Special Issue for Formal Techniques for Java-like Programs
workshop at ECOOP.2002 Málaga, Spain, June 10, 2002 edited by Peter
Mueller, David Naumann and Erik Poll
- Peter Müller, David A. Naumann, Erik Poll
- Computing Science Institute University of Nijmegen
- erikpoll@cs.kun.nl
- Received July 16 2003, Accepted July 25 2003
- C746: A Parallel Update Algorithm for Concurrency Control
in Data Warehousing Systems
- Abstract:To speed up on-line analytical processing, data
warehouse, which is usually derived from operational databases, is introduced.
When the operational databases happen to change, the data warehouse gets stale.
To maintain the freshness of data warehouse, such changes need to be frequently
and concurrently propagated into the data warehouse. However, if several update
transactions are allowed to execute concurrently without an appropriate
concurrency control, data inconsistency between data warehouse and operational
databases could arise due to incorrect propagation of the updates on the
operational databases into the data warehouse. In this paper, we propose a
concurrency control algorithm, which could concurrently execute a number of
update transactions in a consistent way. To investigate the applicable areas of
our algorithm, its performance is evaluated by means of simulation approach.
Our experimental results show that the proposed algorithm enables analytical
transactions to read fresher data than other representative concurrency control
algorithms.
- Jinbae Kim
- Department of Computer Science, Anyang Technical College,
Republic of Korea
-
jbkim@ianyang.ac.kr
- Received August 28 2002, Comments to Author July 24 2003
- C747: Simulation of Resource Synchronization in a Dynamic
Real-Time Distributed Computing Environment
- Abstract:Today, more and more distributed computer
applications are being modeled and constructed using real-time principles and
concepts. In 1989, the Object Management Group (OMG) formed a Real-Time Special
Interest Group (RT SIG) with the goal of extending the Common Object Request
Broker Architecture (CORBA) standard to include real-time specifications. This
groups most recent efforts have focused on the requirements of dynamic
distributed real-time systems. One open problem in this area is resource access
synchronization for tasks employing dynamic priority scheduling. This paper
presents two resource synchronization protocols that the authors have developed
which meet the requirements of dynamic distributed realtime systems as
specified by Dynamic Scheduling Real-Time CORBA (DSRT CORBA). The proposed
protocols can be applied to both Earliest Deadline First (EDF) and Least Laxity
First (LLF) dynamic scheduling algorithms, allow distributed nested critical
sections, and avoid unnecessary runtime overhead. In order to evaluate the
performance of the proposed protocols, we analyzed each protocols
schedulability. Since the schedulability of the system is affected by numerous
system configuration parameters, we have designed simulation experiments to
isolate and illustrate the impact of each individual system parameter.
Simulation experiments show the proposed protocols have better performance than
one would realize by applying a schema that utilizes dynamic priority ceiling
update.
- Chen Zhang, David Cordes
- Dept. of Computer Information Systems Bryant College Smithfield,
RI 02917-1284; Department of Computer Science The University of Alabama
Tuscaloosa, AL 35487-0290
- czhang@bryant.edu
- Received July 14 2003, Comments to Authors Oct 12 2003, Accepted
Nov 16 2003
- C748: Evaluation of Cache-based Superscalar and Cacheless
Vector Architectures for Scientific Computations
- Abstract: The growing gap between sustained and peak
performance for scientific applications is a well-known problem in high end
computing. The recent development of parallel vector systems offers the
potential to bridge this gap for many computational science codes and deliver a
substantial increase in computing capabilities. This paper examines the
intranode performance of the NEC SX-6 vector processor and the cache-based IBM
Power3/4 superscalar architectures across a number of scientific computing
areas. First, we present the performance of a microbenchmark suite that
examines low-level machine characteristics. Next, we study the behavior of the
NAS Parallel Benchmarks. Finally, we evaluate the performance of several
scientific computing codes. Results demonstrate that the SX-6 achieves high
performance on a large fraction of our applications and often significantly
outperforms the cache-based architectures. However, certain applications are
not easily amenable to vectorization and would require extensive algorithm and
implementation reengineering to utilize the SX-6 effectively.
- Leonid Oliker, Andrew Canning, Jonathan Carter, John Shalf, David
Skinner, Stephane Ethier, Rupak Biswas, Jahed Djomehri, and Rob Van der
Wijngaart
- CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA
94720; Princeton Plasma Physics Laboratory, Princeton University, Princeton, NJ
08453; NAS Division, NASA Ames Research Center, Moffett Field, CA 94035
- rbiswas@nas.nasa.gov
- Received August 15 2003, Comments to Authors December 30 2003,
Accepted March 9 2004
- C776: Composition and On Demand Deployment of Distributed
Brain Activity Analysis Application on Global Grids
- Abstract:The distribution of knowledge (by scientists) and
data sources (advanced scientific instruments), and the need of large-scale
computational resources for analyzing massive scientific data are two major
problems commonly observed in scientific disciplines. The two popular
scientific disciplines of this nature are brain science and high-energy
physics. The analysis of brain activity data gathered from the MEG
(Magnetoencephalography) instrument is an important research topic in medical
science since it helps doctors in identifying symptoms of diseases. The data
needs to be analyzed exhaustively to efficiently diagnose and analyze brain
functions and requires access to large-scale computational resources. The
potential platform for solving such resource intensive applications is the
Grid. This paper presents the design and development of MEG data analysis
system by leveraging Grid technologies, primarily Nimrod-G, Gridbus, and
Globus. It describes the composition of the neuroscience (brain activity
analysis) application as parameter-sweep application and its on-demand
deployment on Global Grids for distributed execution.
- R. Buyya, S. Date, Y. Mizuno-Matsumoto, S. Venugopal, and D.
Abramson
- Grid Computing and Distributed Systems (GRIDS) Laboratory, Dept.
of Computer Science and Software Engineering, University of Melbourne,
Australia; Graduate School of Information Science and Technology, Department of
Bioinformatic Engineering, Osaka University, Japan; Department of Information
Systems Engineering, Graduate School of Osaka University, Japan; School of
Computer Science and Software Engineering, Monash University, Australia
- raj@cs.mu.oz.au
- Received September 10 2003, Comments to Authors December 30,
2003, Accepted March 9 2004
- C777: Distributed Computing with Triana on the Grid
- Abstract:In this paper, we describe Triana, a distributed
problem-solving environment that makes use of the Grid to enable a user to
compose applications from a set of components, select resources on which the
composed application can be distributed and then execute the application on
those resources. We describe Trianas current pluggable architecture that
can support many different modes of operation by the use of flexible writers
for many popular web service choreography languages. We further show, that the
Triana architecture is middleware independent through the use of the Grid
Application Toolkit (GAT) API and demonstrate this through the use of a GAT
binding to JXTA. We describe how other bindings being developed to Grid
infrastructures, such as OGSA, can seamlessly be integrated within the current
prototype by using the switching capability of the GAT. Finally, we outline an
experiment we conducted using this prototype and discuss its current status.
- Dr Ian Taylor, Dr Ian Wang , Matthew Shields, Shalil Majithia
- School of Computer Science, Cardiff University
-
shalil.majithia@cs.cardiff.ac.uk
- Received September 10 2003, Comments to Authors December 31 2003,
Accepted March 15 2004
- C778: Parallel Gain-Bandwidth Characteristics Calculations
for Thin Avalanche Photodiodes on an SGI Origin 2000 Supercomputer
- Abstract:An important factor for high speed optical
communication is the availability of ultrafast and low-noise photodetectors.
Among the semiconductor photodetectors that are commonly used in today's
long-haul and metro-area fiber-optic systems, avalanche photodiodes (APDs) are
often preferred over p-i-n photodiodes due to their internal gain, which
significantly improves the receiver sensitivity and alleviates the need for
optical pre-amplification. Unfortunately, the random nature of the very process
of carrier impact ionization, which generates the gain, is inherently noisy and
results in fluctuations not only in the gain but also in the time response.
Recently, a theory characterizing the autocorrelation function (or the power
spectral density) of APDs has been developed by us which incorporates the
dead-space effect. The research extends the time-domain analysis of the
dead-space multiplication model to compute the autocorrelation function of the
APD impulse response. However, the computation requires a large amount of
memory space and is very time consuming. In this research, we describe our
experiences in parallelizing the code using in MPI and OpenMP using CAPTools.
Several array partitioning schemes and scheduling policies are implemented and
tested. Our results show that the code is scalable up to 64 processors on an
SGI Origin 2000 machine and has small average errors.
- Yi Pan, Constantinos S. Ierotheou, and Majeed M. Hayat
- \Department of Computer Science Georgia State University Atlanta,
GA 30303, USA; Parallel Processing Research Group University of Greenwich
London SE10 9LS, UK; Majeed M. Hayat Department of Electrical & Computer
Engineering The University of New Mexico Albuquerque, NM 87131-1356, USA
- pan@cs.gsu.edu
- Performance Section: Received 9 September 2002, Accepted 5
September 2003
- C779: Lightweight monitoring of MPI programs in
real-time
- Abstract:Current technologies allow efficient data
collection by several sensors to determine the overall evaluation of the status
of a cluster. However, no previous work of which we are aware analyzes the
behavior of the parallel programs themselves in real-time. In this paper, we
perform a comparison of different artificial intelligence techniques that can
be used to implement a lightweight monitoring and analysis system for parallel
applications on a cluster of Linux workstations. We study the accuracy and
performance of deterministic and stochastic algorithms when we observe the flow
of both function library and OS system calls of parallel programs written with
C and MPI. We demonstrate that monitoring of MPI programs can be achieved with
high accuracy and in some cases with a 0\% false positive rate in real-time,
and we show that the added computational load on each node is small. As an
example, the monitoring of function calls using a Hidden Markov Model generates
3.97\% overhead. The proposed system is able to automatically detect deviations
of a process from its expected behavior in any node of the cluster, and thus it
can be used as an anomaly detector, for performance monitoring to complement
other systems, or as a debugging tool.
- German Florez, Zhen Liu, Susan M. Bridges, Anthony Skjellum and
Rayford B. Vaugh
- Center for Computer Security Research and High Performance
Computer Laboratory, Department of Computer Science and Engineering,
Mississippi State University,MS, 37962-9637, U.S.A.
-
gf24@CSE.MsState.EDU
- Received June 27 2003, Comments to Authors December 31 2003,
Accepted March 9 2004
- C780: The Performance of Parallel Matrix Algorithms on a
Broadcast-based Architecture
- Abstract:Due to advances in fiber-optics and VLSI
technology, interconnection networks which allow multiple simultaneous
broadcasts are becoming feasible. This paper summarizes one such multiprocessor
architecture called the Simultaneous Optical Multiprocessor Exchange Bus
(SOME-Bus). It also presents enhancements to the network interface and the
cache and directory controllers which support cache block combining, capture
and prefetch and allow complete overlap of processing time with the
communication time due to compulsory misses. The paper uses two fundamental
matrix algorithms to characterize the impact of each enhancement on
performance. Cache miss analysis and results from the execution of these
programs on a SOME-Bus simulator show that block capture and prefetch combined
with an effective block replacement policy succeed in significantly reducing
the miss rate due to compulsory misses as the cache size increases, while a
similar increase of cache size in traditional architectures leaves the miss
rate due to compulsory misses unaffected.
- C. Katsinis, D. Hecht, M. Zhu, and H. Narravula
- Electrical and Computer Engineering, Drexel University,
Philadelphia, PA 19104
-
ckatsini@cbis.ece.drexel.edu
- Received October 15 2003, Comments to Authors December 31 2003,
Accepted June 25 2004
- C781: Ultra Quick Matrix Multiplication on Clusters of
Workstations
- Abstract: An ultra quick matrix multiplication algorithm
with linear time complexity is presented and evaluated on a cluster of
networked workstations consisting of Pentium hosts connected together by
Ethernet segments. The obtained results confirm the feasibility of using
networked workstations to provide fast and low cost solutions to many
computationally intensive applications such as large linear algebraic systems.
The paper also presents and verifies an accurate timing model to predict the
performance of the proposed algorithm on arbitrary clusters of workstations.
Through this model the viability of the proposed algorithm can be revealed
without the extra effort that would be needed to carry out real testing.
- Eyas El-QAWASMEH and Abdel-Elah AL-AYYOUB
- Department of Computer Science and Information Systems Jordan
University of Science and Technology P.O.Box 3030, Irbid 22110, Jordan
- eyas@just.edu.jo
- Received October 30 2003, Comments to Authors March 12 2004
- C782: Multivariate Integration and Visualization with
ParInt/ParVis
- Abstract: We present a survey of the contents and
capabilities of the parallel multivariate integration package PARINT and its
companion visualization tool PARVIS, while focusing on the analysis of both the
integrand behavior and the parallel performance of the integration methods.
Current developments include asynchronous adaptive methods for hyperrectangular
and simplex regions. A new parallel quasi-Monte Carlo technique suitable for
heterogeneous platforms has been incorporated for general-purpose integration
in higher dimensions, as well as for the computation of multivariate normal and
t-distributions.
- Elise de Doncker, Rodger Zanny, Karlis Kaugars and Laurentiu
Cucos
- Department of Computer Science, Western Michigan University,
Kalamazoo, MI 49008, USA
-
rrzanny@cs.wmich.edu
- Received October 28 2003, Comments to Authors March 12 2004
- C788: Allocation Management with QBank
- Abstract:QBank is a unique dynamic reservation-based
allocation management system that has proven effective in managing the
utilization of computational resources in a multi-project environment. QBank
allows an organization to guarantee greater fairness and enforce mission
priorities by associating a charge with the use of computational resources and
allocating resource credits which limit how much of the resources may be used
at what time and by whom. It tracks resource utilization and allows for
insightful planning. It also has great potential for its ability to be used in
the emerging field of meta-computing where the problem of allocating and
managing computational resources is a critical necessity. This article
discusses the architecture and design philosophies behind this novel allocation
management system as well as presenting various strategies for its deployment
in a variety of scenarios.
- Scott M. Jackson
- Pacific Northwest National Laboratory, P.O. Box 999, MS K8-91
Richland, WA 99352
-
Scott.Jackson@pnl.gov
- Received 9 December 2003, Comments to Authors March 12 2004
- C789: Integrating Program Components on Distributed Memory
Architectures via MPH
- Abstract:Component programs are usually independently
developed by different groups and remain as independent executables. For
example, coupled climate systems include atmosphere, ocean, land, and ice
models coupled through flux coupler. On parallel distributed memory
supercomputer, it remains a cumbersome task for each executable to recognize
the existence of other executables. This is similar to finding and
communicating to other processes under UNIX environment (imaging one submits
several independent jobs under UNIX and the task is for each job to detect the
existence of other jobs and communicate with them.) We systematically studied
possible component integration alternatives and job execution modes on current
architectures, and identified the following main mechanisms: Single Component
executable; Multi-Executable application (SCME); Multi-Component executable,
Single Executable application (MCSE); Multi-Component executable,
Multi-Executable application (MCME); Multi-Instance executable,
Multi-Executable application (MIME). We further developed MPH, a
multi-component handshaking library to support the above component integration
mechanisms. MPH allows components recognize and talk to each other in a
convenient and consistent way. MPH provides the following capabilities:
component name registration, resource allocation, component-joining,
inter-component communication, inquiry on the multi-component environment, and
standard input/output redirect. MPH currently works on IBM SP, SGI Origin, HP
Compaq clusters, and PC Linux clusters. MPH is adopted by several leading
climate/weather models.
- Chris Ding and Yun He
- Computational Research Division, Lawrence Berkeley National
Laboratory University of California, Berkeley, CA 94720
- chqding@lbl.gov
- Received 17 November 2003, Comments to Authors March 12 2004
- C790: Parallel iterative multilevel solution of mixed
finite element systems for scalar equations
- Abstract:A combination of several contemporary techniques
is used for the efficient parallel solution of the mixed finite element systems
on locally refined grids. Implementation experience and numerical results are
reported.
- V.Chugunov, D.Svyatski, E.Tyrtyshnikov and Yu.Vassilevski
- Institute of Numerical Mathematics, Russian Academy of Sciences,
8 Gubkina str., 119991, GSP-1, Moscow, Russia.
- vasilevs@dodo.inm.ras.ru
- Received 26 November 2003, Comments to Authors March 12 2004,
Accepted October 17 2004
- C797: Dynamic Load Balancing Mechanism for Distributed Java
Applications
- Abstract: Program environments or operating systems
generally leave to the developer the decision on the allocation of program
entities, offering either placement directives, or tools through graphical
interfaces. These approaches cannot always take into account dynamic behavior
of applications, dynamicity in the execution environment, or heterogeneity of
the execution platform. Transparent distribution algorithms are necessary for
automising and optimising application deployment. The ADAJ project, Adaptive
Distributed Applications in Java, deals with placement and migration of Java
objects, meant to deploy automatically parallel Java applications on a cluster
of workstations, using monitoring information about the application behavior.
The transparency obtained through the integration of these tools in the
middleware makes such an environment easy to use and efficiency is
improved.
- Violeta Felea, Bernard Toursel
- LIFL - Informatique IEEA, Bat M3 (USTL - Lille 1), University of
Science and Technology of Lille, 59655 Villeneuve d'Ascq CEDEX FRANCE; ´
Ecole Polytechnique Universitaire de Lille (PolytechLille)
-
Violeta.Felea@lifl.fr
- Received January 6 2004, Comments to Authors June 11 2004,
Accepted August 10 2004
- C798: Efficient and Adaptive Probabilistic Algorithms for
Asynchronous Information Scattering
- Abstract:Asynchronous information scattering is important
in networking, distributed computing, fault tolerance, and parallel computing.
This paper studies a family of probabilistic algorithms for asynchronous
information scattering in computer clusters. The main focus is on algorithm
designs that solve two intrinsic problems in randomized information
dissemination: one is called nonlinear latency in propagation and another is
called performance degradation that happens when the number of active nodes
decreases. Contributions include: (1) New versions of a probabilistic algorithm
have been designed to overcome the two problems above. This paper offers proof
and experiments on the new designs. (2) A new formula is given to calculate
communication complexity in information scattering. The formula corrects the
result of a previous study. (3) The connection between decentralized dynamic
load balancing and the distributed consensus problem is given to illustrate
theoretical and practical advantages of the probabilistic algorithms for
asynchronous information scattering. This paper particularly addresses issues
such as asynchrony, scalability, and high availability.
- Yibing Wang and Robert Hyatt
- Department of Computer and Information Sciences, University of
Alabama at Birmingham, Birmingham, AL 35294, USA
- wangy@cis.uab.edu
- Received January 9 2004, Comments to Authors June 11 2004
- C806: Task Pool Teams: A Hybrid Programming Environment for
Irregular Algorithms on SMP Clusters
- Abstract:Clusters of SMPs (symmetric multiprocessors) are
popular platforms for parallel programming since they provide large
computational power for a reasonable price. For irregular application programs
with dynamically changing computation and data access behavior a flexible
programming model is needed to achieve efficiency. In this paper we propose
task pool teams as a hybrid concept to realize irregular algorithms on clusters
of SMPs. Task pool teams generalize the task pool programming concept for
shared memory machines by combining an explicit message passing layer with
multiple multi-threaded programs. They offer implicit load balance on single
cluster nodes together with multi-threaded, asynchronous communication for the
exchange of remote data. Appropriate communication protocols with an
easy-to-use application programmer interface are provided. The approach is
illustrated with a branch & bound algorithm and the hierarchical radiosity
algorithm.
- JUDITH HIPPOLD and GUDULAR RUNGER
- Chemnitz University of Technology, Department of Computer
Science
-
judith.hippold@informatik.tu-chemnitz.de
- Received 23 Jan 2004, Comments to Authors June 10 2004, Revised
May 4 2005, Accepted June 15 2005
- C807: Modeling Heterogeneous Mesoscopic Fluid In Irregular
geometries Using Shared Memory Systems
- Abstract: We discuss the use of current shared-memory
systems for discrete-particle modeling of heterogeneous mesoscopic fluids in
irregular geometries. This has been demonstrated by way of mesoscopic blood
flow in bifurcating capillary vessels. The plasma is represented by fluid
particles, while the other blood constituents are made of "solid" particles
interacting with harmonic forces. We show that irregular boundary conditions
and heterogeneity of the particle fluid inhibit efficient implementation of the
model on superscalar processors. These data structures render useless many
architectural issues concerning data independence, such as parallel pipelining,
branch prediction, and out-of-order execution. In employing MPI on shared
memory machines, we have constructed a simple middleware library to simplify
paralelization. The particle code was tested on 4 and 8 processors of
SGI/Origin 3800 (R14000/500), IBM Regatta (Power4/1300), SGI Altix 3000
(Itanium 2/1300) systems and two-processor AMD Opteron 240 motherboard. The
tests were performed for the same system employing 2 million fluid and "solid"
particles with and without load-balancing. The efficiency of the particle code
depends critically on the memory latency. Therefore, the latest architectures
with the fastest CPU-memory interface, such as AMD Opteron and Power4,
represent the most promising platforms for modeling the complex mesoscopic
systems with fluid particles. By sacrificing computational efficiency at the
expense of simpler but portable solvers, we can promote the present trends in
the development of easy-to-use computational services based on GRID
paradigm.
- K.Boryczko, W. Dzwinel, D.A. Yuen
- Institute of Computer Science, AGH University of Technology,
Mickiewicza 30, 30-059 Kraków, Poland; Minnesota Supercomputing
Institute, University of Minnesota, MN 55455, Minneapolis, MN, USA
- dzwinel@uci.agh.edu.pl
- Received August 15 2003, Comments to Authors June 11 2004
- C808: Parallel Divide-and-Conquer Scheme for 2D Delaunay
triangulation
- Abstract:This paper describes a parallel
divide-and-conquer algorithm for Delaunay triangulation. This algorithm finds
the affected zone that cover the triangulations that may be modified during the
merge of two sub-block triangulations. With the aid of affected zone,
communications between processors are reduced, the time complexity of
divide-and-conquer remains O(n log n), and the affected zone can be found in
O(n) time steps, where n is the number of points. Issues that increase the
numerical stability, such as line circle test, is also discussed in the paper.
The code was implemented with C, FORTRAN and MPI, so it was easy to port this
program to other machines. Experimental results on IBM SP2 show that a parallel
efficiency of 34%-96% for general distributions can be achieved on an 16-node
distributed memory system.
- Min-Bin Chen, Tyng-Ruey Chuang and Jan-Jan Wu
- Department of Management Information Systems, Chung Kuo Institute
of Technology, Mucha, Taipei 116, Taiwan. ; Institute of Information Science,
Academia Sinica, Nankang, Taipei 115, Taiwan.
-
cmb@cm1.ethome.net.tw
- Received 27 February 2004, Comments to Authors June 11 2004,
accepted June 15 2005
- C809: h-Relation Personalized Communication Strategy For
HP-Adaptive Computations
- Abstract:All-to-all collective communications occur in
many different algorithms in high performance computing, and often involve
regular and predictable patterns of communication when boundary data are being
exchanged in fixed-grid calculations. The scheduling of all-to-all, irregular
and non-predictable communications which use a non-predictable subset of
processors is an important and complicated problem in hp-adaptive methods.
However, the communication pattern can be be described in a sensible way, since
it is derived by partioing a computational domain into sub-domains, where
information is exchanged only between neighboring sub-domains, each sub-domain
is assigned to a processor and forms the basis of communication between the
processors. The communication strategy proposed in this paper automatically
organizes the communication with neighboring sub-domains for irregular
partitions within the computational domain that change throughout the
calculation (after each adaption in hp-adaptive Finite Element Method
algorithms). In general, the strategy addresses the case when nothing can be
assumed about the regularity of the partition of the computational domain, and
the problem is contrained to the communication constraints of the underlying
computational switch network.
- M. Paszynski and K. Milfeld
- Institute for Computational Engineering and Sciences, The
University of Texas at Austin, ACES 6.330, 105, Austin, TX 78712, Campus mail
code C0200; AGH University of Science and Technology, Department of Computer
Methods in Metallurgy, Cracov, Poland; TACC Texas Advanced Computer Center
-
maciek@ices.utexas.edu
- Received March 1 2004, Comments to Authors June 11 2004
- C815: The Design and Implementation of Grid Database
Services in OGSA-DAI
- Abstract: Initially, Grid technologies were principally
associated with supercomputer centres and largescale scientific applications in
physics and astronomy. They are now increasingly seen as being relevant to many
areas of e-Science and e-Business. The emergence of the Open Grid Services
Architecture (OGSA), to complement the ongoing activity on web services
standards, promises to provide a service-based platform that can meet the needs
of both business and scientific applications. Early Grid applications focused
principally on the storage, replication and movement of file-based data. Now
the need for the full integration of database technologies with Grid middleware
is widely recognised. Not only do many Grid applications already use databases
for managing metadata, but increasingly many are associated with large
databases of domain-specific information (e.g. biological or astronomical
data). This paper describes the design and implementation of OGSA-DAI, a
service-based architecture for database access over the Grid. The approach
involves the design of Grid Data Services that allow consumers to discover the
properties of structured data stores and to access their contents. The initial
focus has been on support for access to Relational and XML data, but the
overall architecture has been designed to be extensible to accommodate
different storage paradigms. The paper describes and motivates the design
decisions that have been taken, and illustrates how the approach supports a
range of application scenarios. The OGSADAI software is freely available from
www.ogsadai.org.uk.
- Mario Antonioletti, Malcolm Atkinson, Rob Baxter, Andrew Borley,
Neil P Chue Hong, Brian Collins, Neil Hardman, Ally Hume, Alan Knox, Mike
Jackson, Amy Krause, Simon Laws, James Magowan, Norman W Paton, Dave Pearson,
Tom Sugden, Paul Watson and Martin Westhead
- EPCC, University of Edinburgh, James Clerk Maxwell Building,
Mayfield Road, Edinburgh EH9 3JZ, UK; National e-Science Centre, 15 South
College Street, Edinburgh EH8 9AA, UK; IBM United Kingdom Ltd, Hursley Park,
Winchester S021 2JN, UK; Department of Computer Science, University of
Manchester, Oxford Road, Manchester M13 9PL, UK; Oracle UK Ltd, Thames Valley
Park, Reading RG6 1RA, UK; School of Computing Science, University of
Newcastle, Newcastle-upon-Tyne NE1 7RU, UK.
-
Paul.Watson@newcastle.ac.uk
- Received 13 March 2004, Comments to Authors 11 August 2004,
Accepted October 17 2004
- Full Paper:
../CCPEwebresource/c815watson/c815OGSA-DAI-6.pdf
- C816: WS-GAF: A Framework for Building Grid Applications
using Web Services
- Abstract:This paper presents the motivation and design
decisions for the Web Services Grid Application Framework (WS-GAF), which is a
mapping of Grid architecture requirements onto the Web Services Architecture.
The goal for WS-GAF is to describe a framework for building Grid applications
that adheres to the principles of service-oriented architectures and utilises
existing Web Services technologies. The proposed solution addresses issues
including stateful interactions, logical resource naming, metadata, and
lifetime management
- Savas Parastatidis, Jim Webber, Paul Watson, Thomas Rischbeck
- School of Computing Science, University of Newcastle, Newcastle
upon Tyne, UK 2 Arjuna Technologies Ltd
-
Savas.Parastatidis@newcastle.ac.uk
- Received 14 March 2004, Comments to Authors 11 August 2004,
Accepted October 17 2004
- Full Paper: ../CCPEwebresource/c816webber/c816wsgaf.pdf
- C828: Middleware Benchmarking: Approaches, Results,
Experiences
- Abstract:The report summarizes the results of the Workshop
on Middleware Benchmarking held during OOPSLA 2003. The goal of the workshop
was to help advance the current practice of gathering performance
characteristics of middleware implementations through benchmarking. The
participants of the workshop have focused on identifying requirements and
obstacles of middleware benchmarking and forming a position on the related
issues. Selected requirements and obstacles are presented in section 2 of this
report, together with guidelines to adhere to when benchmarking in section 3,
open issues of current practice in section 4, and perspectives on further
research in section 5.
- Paul Brebner, Emmanuel Cecchet, Julie Marguerite, Petr Tuma,
Octavian Ciuhandu, Bruno Dufour, Lieven Eeckhout, Stéphane
Frénot, Arvind S. Krishna, John Murphy, Clark Verbrugge
- University College London
- P.Brebner@cs.ucl.ac.uk
- Received March 30 2004, Revised May 18 2004, Accepted June 10
2004
- Full Paper:
../CCPEwebresource/c828bebner/c828MiddlewareBenchmarking.pdf
- C829: Distributed Computing in Practice: The Condor
Experience
- Abstract: Since 1984, the Condor project has enabled
ordinary users to do extraordinary computing. Today, the project continues to
explore the social and technical problems of cooperative computing on scales
ranging from the desktop to the world-wide computational grid. In this chapter,
we provide the history and philosophy of the Condor project and describe how it
has interacted with other projects and evolved along with the field of
distributed computing. We outline the core components of the Condor system and
describe how the technology of computing must correspond to social structures.
Throughout, we reflect on the lessons of experience and chart the course
traveled by research ideas as they grow into production systems.
- Douglas Thain,Todd Tannenbaum, and Miron Livny
- Computer Sciences Department, University of Wisconsin-Madison
1210 West Dayton Street, Madison WI 53706
- thain@cs.wisc.edu
- Received March 29 2004, Accepted August 11 2004
- C830: Simplified Strategies yield maximal dense
matrix-matrix multiplication performance
- Abstract: This paper demonstrates that neither "heroic"
combinatorial optimization of parameter spaces, nor pure cache-oblivious
methods, are either necessary or sufficient to achieve consistent performance
in dense matrix-matrix multiplication of arbitrary sized matrices.
Architectural constraints and issues evidently restrict the critical path and
options available for optimal dense matrix-matrix performance. TLB misses.
inter-cache peak bandwidth, and register file size evidently reduce the number
of choices available to achieve maximum performance, so that relatively simple
strategies offer the best overall performance. Interestingly, maximal
small-kernel efficiency is not a guarantee of global minimal multiplication
time. Also, efficient and flat performance is possible at all problem sizes
that fit in main memory, rather than "jagged" performance curves. Higher
achieved performance of this study for smaller problem size offers greater
potential application relevance too. This paper compares and contrasts
copy-based and Morton-order-based methods for dense matrix-matrix
multiplication, and offers models to explain the performance behavior and to
show when to use each of these methods in order to achieve best performance.
This paper suggests specific and simplified design strategies for building
efficient kernels on new cache architectures keyed to fundamental architecture
constraints. Specifically, only a limited number of base cases and only one
assembly function are needed, instead of the numerous base cases and code
explosion seen in other high performance implementations.
- Wenhao Wu, Anthony Skjellum, Kazushige Goto, and Vinod K.
Valsalam
- High Performance Computing Laboratory, Department of Computer and
Information Sciences, Unversity of Alabama at Birmingham, AL 35294; Department
of Computer Science, the University of Texas at Austin, Austin, TX 78712
- tony@cis.uab.edu
- Received 26 April 2004, Comments to Authors August 10 2004
- C831: Learning with Active E-Course in Knowledge Grid
Environment
- Abstract: An active e-course is a self-representable and
self-organizable media mechanism with an open structure. The kernel of the
active e-course is to organize learning materials into a "concept space" rather
than a "page space". By dynamically forming the tailored content and flexible
structure, active e-course caters for different students with different
backgrounds, capabilities and expectations, under different time and venue. It
also provides assessments on students learning performances and gives
corresponding suggestions to guide them in further learning. An authoring tool
for constructing course ontology and a system prototype have been developed to
support the active e-course, enabling a student-centered, highly interactive
and adaptive learning approach. The results of an empirical study show that the
system can enhance the effectiveness and efficiency of learning.
- Hai Zhuge and Yanyan Li
- China Knowledge Grid Research Group, Key Lab of Intelligent
Information Processing Institute of Computing Technology Chinese Academy of
Sciences P.O.Box 2704-28, 100080, Beijing, China
- zhuge@ict.ac.cn
- Received April 5 2004, Comments to Authors August 10 2004,
Accepted October 17 2004
- C832: A New Task Scheduling Method for Distributed Programs
which Require Memory Management in Grids
- Abstract:In Grid applications, it is very likely that
object-oriented languages, such as Java and Ruby, and large-scale
semi-structured data written in XML will be employed. However, because of their
inherent dynamic memory management, Grid applications must sometimes suspend
the execution of all tasks running on the processors. This adversely affects
Grid computing. In this paper, we propose a new task scheduling method called
CP/MM (Critical Path/Memory Management) which can efficiently schedule tasks
for applications requiring memory management. The underlying concept is to
consider the cost due to memory management when the task scheduling system
allocates ready (executable) coarse-grain tasks, or macro-tasks, to processors.
We have developed three task scheduling modules, including CP/MM, for a task
scheduling system which is implemented on a Java RMI (Remote Method Invocation)
communication infrastructure. Moreover, we evaluated the fundamental
performance of CP/MM in two ways. The first is a performance evaluation of
CP/MM when it is applied on a test bed system to experimental test programs
which have many macro-tasks with complicated dependency relations. The test bed
system consists of computers on our two campuses located at a distance of
approximately 32 km from each other. The second is a performance evaluation for
a small but practical test program on a PC cluster. Our experimental results
show that CP/MM can successfully prevent high-priority macro-tasks from being
affected by the garbage collection arising from memory management, so that
CP/MM can efficiently schedule distributed programs whose critical paths are
relatively long.
- H. Koide and Y. Oie
- Department of Artificial Intelligence, Faculty of Computer
Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu,
Iizuka-city, Fukuoka 820-8502 Japan
-
koide@ai.kyutech.ac.jp
- Received June 23 2004, Comments to Authors November 17 2004,
Accepted April 3 2005
- C833: A loosely coupled application model for Grids
- Abstract:As a promising distributed computing paradigm,
Grids can integrate various computation resources on the Internet and organize
them to perform distributed computations as a huge virtual distributed
computer. However, traditional distributed applications face many obstacles in
order to be scheduled effectively and efficiently in Grid environments because
of the dynamic and heterogeneous characteristics of the Internet. Applications
implemented by traditional models such as message passing or distributed
objects are hard to be scheduled robustly and flexibly. In this paper, we
propose a loosely coupled application model for building distributed
applications on Grids. This model emphasizes the autonomy of each distributed
module. Each distributed module can be scheduled onto appropriate Grid
resources at runtime, and once it receives its required data, it can start its
computation without the consideration of other modules in the whole
application. The model weakens the dependencies between different modules of a
distributed application by explicit asynchronous message passing. Necessary
logics are introduced into the model to compose complex relationships between
distributed modules. We assume that a Grid application is composed of remote
service requests and local processing. We model each service request as a
loosely coupled module (LCM). A loosely coupled application can be defined by
the combination of independent LCMs, related LCM groups and local processing.
Different modules in such an application exchange information by explicitly
described data that can be understood by both the application and the Grid
environment. Some parameters in the application model are defined so that
application schedulers in the Grid environment can efficiently implement
application scheduling by appropriate scheduling algorithms designed based on
these parameters. Preliminary results show that this application model can
guarantee the robustness, adaptability and schedulability of Grid
applications.
- Fei Wu and W. K. Ng
- Dept. of Computer Science and Engineering, the Chinese University
of Hong Kong, Shatin, N.T., Hong Kong
-
fwu@cse.cuhk.edu.hk
- Received July 16 2004, Comments to Authors November 17 2004
- C844: Web Service Grids: An Evolutionary Approach
- Abstract:The UK e-Science Programme is a £250M, 5
year initiative which has funded over 100 projects. These application-led
projects are under-pinned by an emerging set of core middleware services that
allow the coordinated, collaborative use of distributed resources. This set of
middleware services runs on top of the research network and beneath the
applications we call the 'Grid'. Grid middleware is currently in transition
from pre-Web Service versions to a new version based on Web Services.
Unfortunately, only a very basic set of Web Services embodied in the Web
Services Interoperability proposal, WS-I, are agreed by most IT companies. IBM
and others have submitted proposals for Web Services for Grids - the Web
Services ResourceFramework and Web Services Notification specifications - to
the OASIS organisation for standardisation. This process could take up to 12
months from March 2004 and the specifications are subject to debate and
potentially significant changes. Since several significant UK e-Science
projects come to an end before the end of this process, the UK therefore needs
to develop a strategy that will protect the UK's investment in Grid middleware
by informing the Open Middleware Infrastructure Institute's (OMII) roadmap and
UK middleware repository in Southampton. This paper sets out an evolutionary
roadmap that will allow us to capture generic middleware components from
projects in a form that will facilitate migration or interoperability with the
emerging Grid Web Services standards and with on-going OGSA developments. In
this paper we therefore define a set of Web Services specifications - that we
call 'WS-I+' to reflect the fact that this is a larger set than currently
accepted by WS-I - that we believe will enable us to achieve the twin goals of
capturing these components and facilitating migration to future standards. We
believe that the extra Web Services specifications we have included in WS-I+
are both helpful in building e-Science Grids and likely to be widely accepted.
- Malcolm Atkinson, David DeRoure, Alistair Dunlop, Geoffrey Fox,
Peter Henderson, Tony Hey, Norman Paton, Steven Newhouse, Savas Parastatidis,
Anne Trefethenand Paul Watson
- National e-Science Centre, Universities of Edinburgh &
Glasgow, 15 South College Street, Edinburgh, UK; OMII, University of
Southampton, UK; Computer Science Department, Indiana University, US; EPSRC,
Polaris House, Swindon, UK; Computing Department, University of Manchester, UK;
School of Computing Science, University of Newcastle, UK
-
Tony.Hey@epsrc.ac.uk
- Received June 30 2004, Revised July 31 2004, Accepted July 31
2004
- C851: Towards Demand-responsive Rearrangement of Mirror
Servers
- Abstract:Various network services are becoming more and
more important serving roles in the social infrastructure in the Internet. The
more clients a service has, the more load the server has, which obviously
lowers the quality of service. To balance load and maintain quality, multiple
servers (mirror servers) are commonly provided that are able to provide access
to the same service. However, it is difficult to estimate the required number
of mirror servers in advance. To overcome this, we propose a new concept called
elastic server groups and discuss its basic mechanism where all server nodes
communicate with one another in a peer-to-peer style to adjust to the expected
number of servers. We also demonstrate the effectiveness of the proposed
mechanism via some experiments.
- Masakuni Agetsuma, Ehsan Mohammad Ali, Kenji Kono, Hideya Iwasaki
and Takashi Masuda
- Course in Computer Science and Information Mathematics, Graduate
School of Electro-Communications, Univ. of Electro-Communications 1-5-1
Chofugaoka Chofu-shi Tokyo 182-8585, Japan ;Department of Computer Science,
Univ. of Electro-Communications 1-5-1 Chofugaoka Chofu-shi Tokyo 182-8585,
Japan
- age@zeus.cs.uec.ac.jp
- Received September 19 2004, Comments to Authors December 20
2004
- C852: Distributed Loop Scheduling Schemes for Heterogeneous
Computer Systems
- Abstract:Distributed Computing Systems are a viable and
less expensive alternative to parallel computers. However, a serious difficulty
in concurrent programming of a distributed system is how to deal with
scheduling and load balancing of such a system which may consist of
heterogeneous computers. Distributed scheduling schemes suitable for parallel
loops with independent iterations on heterogeneous computer clusters have been
designed in the past. In this work we consider a class of Self-Scheduling
schemes for parallel loops with independent iterations that have been applied
to multiprocessor systems. We extend this type of schemes to heterogeneous
distributed systems. We present tests that the distributed versions of these
schemes maintain load balanced execution on heterogeneous systems.
- Anthony T. Chronopoulos, Satish Penmasta, Jianhua Xu, Siraj Ali
- Department of Computer Science, University of Texas at San
Antonio, 6900 N.Loop 1604 West, San Antonio, TX 78249; Lucent Technologies, 101
Crawfords Corner Rd., Holmdel NJ 07733
- atc@cs.utsa.edu
- Received June 30 2004, Comments to Authors December 20 2004,
Accepted March 29 2005
- C853: Transparent access to grid resources for user
software
- Abstract:Grid computing promises access to large amounts
of computing power, but so far adoption of grid computing has been limited to
highly specialized experts for three reasons. First, users are used to batch
systems, and interfaces to grid software are often complex and different than
those in batch systems. Second, users are used to having transparent file
access which grid software does not conveniently provide. Third, efforts to
achieve wide-spread coordination of computers while solving the first two
problems is hampered when clusters are on private networks. Here, we bring
together a variety of software that allows users to almost transparently use
grid resources as if they were local resources while providing transparent
access to files, even when private networks intervene. As a motivating example,
the BaBar Monte Carlo production system is deployed on a truly distributed
environment, the European DataGrid, without any modification to the application
itself.
- Sander Klous, Jaime Frey, Se-Chang Son, Douglas Thain, Alain Roy,
Miron Livny, Jo van den Brand
- NIKHEF, P.O. Box 41882, 1009 DB Amsterdam, The Netherlands;
Computer Science Department, University of Wisconsin, United States
- sander@nikhef.nl
- Received September 3 2004, Comments to Authors December 20 2004,
Accepted March 29 2005
- C880: GMarte: Grid Middleware to Abstract Remote Task
Execution
- Abstract: Grid Computing technologies are now being
largely deployed with the widespread adoption of the Globus Toolkit as the
industrial standard Grid middleware. However, its inherent steep learning curve
discourages the usage of these technologies for non experts. Therefore, to
leverage the usage of Grid Computing, it is important to have available high
level tools that simplify the process of remote task execution. In this paper
we introduce a middleware, developed on top of the Java Commodity Grid, which
offers an object-oriented, user friendly Application Programming Interface,
from the Java language, which eases remote task execution for computationally
intensive parallel applications.
- J. M. Alonso, V. Hernandez and G. Molto
- Departamento de Sistemas Informaticos y Computacion, Universidad
Politecnica de Valencia, Spain.
- jmalonso@dsic.upv.es
- Received November 1 2004, Comments to Author March 28 2005,
Accepted October 27 2005
- C881: LINKING RASTER IMAGE ZONES TO DATABASE BY USING
INVERSE VORONOI DIAGRAMS
- Abstract: The main objective of this study is to solve the
linking problem of raster images, as raster images often contain graphical
objects that need to be linked to a table to store the data about the
characteristics of these graphical objects. In the current technology, image
data files containing land use and land cover data obtained from satellite
images and/or aerial photographs have to be integrated with a Geographical
Information System or Image Processing Systems in order to be linked with a
database to store attribute data about the area in question. This has, so far,
been a real problem for polygon handling using raster data files.
- KEMAL YUKSEK, UGUR AYAN
- Department of Computer Engineering, Istanbul Kultur University,
Turkey
- u.ayan@iku.edu.tr
- Received October 27 2004, Comments to Author March 28 2005
- C882: Jcluster: An Efficient Java Parallel Environment on a
Large-Scale Heterogeneous Cluster
- Abstract: In this paper, we present Jcluster, an efficient
Java parallel environment that provides some critical functions, especially
load balance automatically and high performance communications, for developing
parallel applications in Java on a large-scale heterogenous cluster. Random
stealing is a well-known load-balancing algorithm, however, on a large-scale
cluster, an idle node must randomly steal many times and obtain a task from
other node. In Jcluster environment, we implement a task scheduler based on
transitive random stealing algorithm (TRS). Performance evaluations show that,
with the transitive policy, the scheduler can make any idle node obtain a task
with much less stealing times and the environment has an efficient scalable
performance on a large-scale cluster. In the performance aspects of
communication, with the method of asynchronously multithreaded transmission, we
implement a high performance PVM-like and MPI-like message passing interface
using UDP protocol in pure Java. The evaluation of the communication
performance is conducted among Jcluster environment, LAM-MPI and mpiJava on
LAM-MPI based on the Java Grande Forums pingpong benchmark.
- Bao-Yin Zhang, Guang-Wen Yang and Wei-Min Zheng
- Department of Computer Science and Technology, Tsinghua
University, Beijing, 100084, P.R. China
- zby@tsinghua.edu.cn
- Received November 3 2004, Comments to Author March 28 2005,
Revised April 29 2005, Accepted April 30 2005
- C883: Revocation Techniques for Java Concurrency
- Abstract: This paper proposes two revocation-based schemes
for managing concurrency in Java using a guarded region abstraction. These
schemes alleviate many of the constraints that inhibit construction of
transparently scalable and robust concurrent applications. The first solution,
revocable monitors, augments existing mutual exclusion monitors with the
ability to dynamically (and automatically) resolve priority inversion and
deadlock by reverting program execution to a consistent state when such
situations are detected, while preserving Java semantics. The second technique,
transactional monitors, extends the functionality of revocable monitors by
implementing guarded regions as lightweight transactions that can be executed
concurrently (or in parallel on multiprocessor platforms). We discuss design
and implementation issues of both schemes, as well as a detailed performance
study to compare these schemes against the traditional, state-of-the-art
implementation of Java monitors based on mutual exclusion.
- A. Welc, S. Jagannathan, A. L. Hosking
- Department of Computer Sciences, Purdue University, West
Lafayette, IN 47906
- welc@cs.purdue.edu
- Received November 30 2004, Comments to Author March 28 2005,
Revised May 6 2005, Accepted June 15 2005
- C884: An Efficient Concurrent Implementation of a Neural
Network Algorithm
- Abstract: The focus of this study is how we can
efficiently implement the neural network backpropagation algorithm on a Network
of Workstations (NOW) for concurrent execution. We assume a distributed system
with heterogeneous computers and that the neural network is replicated on each
computer. We propose an architecture model with efficient pattern allocation
that takes into account the speed of processors and overlaps the communication
with computation. The training pattern set is distributed among the
heterogeneous processors with the mapping being fixed during the learning
process. We provide a heuristic pattern allocation algorithm minimizing the
execution time of backpropagation learning. The computations are overlapped
with communications. Under the condition that each processor has to perform a
task directly proportional to its speed, this allocation algorithm has
polynomial-time complexity. We have implemented our model on a dedicated
network of heterogeneous computers using Sejnowski's NetTalk benchmark for
testing.
- Dr. R. Andonie, Dr. A. T. Chronopoulos, Dr. D. Grosu, H.
Galmeanu
- Computer Science Department, Central Washington University,
Ellensburg, WA 98926, USA; Department of Computer Science, The University of
Texas at San Antonio, San Antonio, TX 78249, USA; Department of Computer
Science, Wayne State University, Detroit, MI 48202, USA; Department of
Electronics and Computers, Transylvania University, 2200 Brasov, Romania
- andonie@cwu.edu
- Received November 12 2004, Comments to Author March 28 2005,
Revised April 13 2005, Accepted April 30 2005
- C885: Grids and Web Services for e-Science
- Abstract: This editorial describes four papers that
summarize key Grid technology capabilities to support distributed e-Science
applications. These papers discuss the Condor system supporting computing
communities, the OGSA-DAI service interfaces for databases, the WS-I+ Grid
Service profile and finally WS-GAF the Web Service Grid Application Framework.
We discuss the confluence of mainstream IT Industry development and the very
latest science and computer science research and urge the communities to reach
consensus rapidly. Agreement on a set of core Web Service standards is
essential to allow developers to build Grids and distributed business and
science applications with some assurance that their investment will not be
obviated by the changing Web Service frameworks.
- Tony Hey, Geoffrey Fox
- EPSRC, Polaris House, Swindon, UK; Computer Science Department,
Indiana University, US
- gcf@indiana.edu
- Received January 6 2005, Revised January 17 2005, Accepted
January 17 2005
- C908: Multimedia Vectorization of Floating-Point MIN/MAX
Reductions
- Abstract: Finding the minimum or maximum value in an array
forms an important step in a variety of applications. This paper discusses
vectorization schemes that take advantage of the streaming-SIMD-extensions in
commonly used floating-point MIN and MAX reductions. Performance advantages are
demonstrated with experimental results.
- Aart J.C. Bik, Xinmin Tian, Milind B. Girkar
- Software and Solutions Group, Intel Corporation 3600 Juliete
Lane, Santa Clara, CA 95052, USA
- aart.bik@intel.com
- Received January 13 2005, Comments to Author June 15 2005,
Accepted July 7 2005
- C909: ShanghaiGrid: An Information Service Grid
- Abstract:The goal of the ShanghaiGrid is to provide
information services to the people. It aims to construct a metropolitan-area
information service infrastructure and establish an open standard for
widespread upper-layer applications from both communities and the government.
The Information Service Grid Toolkit and a typical application named Traffic
Information and Guidance are discussed in detail..
- Minglu Li, Min-You Wu, Ying Li, Jian Cao, Linpeng Huang, Qianni
Deng, Xinhua Lin, Changjun Jiang, Weiqin Tong, Yadong Gui, Aoying Zhou, Xinhong
Wu and Shui Jiang
- Dept of Computer Science, Grid Computing Center Shanghai Jiaotong
University, Shanghai, 200030, P.R.China; Computer Science and Technology
School, Soochow University, Suzhou, Tongji University, Shanghai, China;
Shanghai University, Shanghai, China; Shanghai Supercomputer Center, Shanghai,
China; Fudan University, Shanghai, China; Shanghai Urban Transportation
Information Center; East China Institute of Computer Technology
- wu@eece.unm.edu
- Received January 16 2005, Comments to Author June 15 2005,
Accepted August 8 2005
- C910: Crunching Real Data on the Grid: Practice and
Experience with the European DataGrid
- Abstract: The D0 experiment has used the European DataGrid
(EDG) testbed to reprocess real data obtained from the Tevatron collider at the
Fermi National Accelerator Laboratory. Pushing the use of the EDG software
beyond feasibility studies has produced a set of recommendations for authors of
experiment-level software, for producers of middleware, and for designers of
grid systems. This paper describes the D0 experience with the EDG software and
the resulting recommendations.
- D. Groep, J. Templon, C. Loomis
- NIKHEF; Laboratoire de lAcc´el´erator
Lin´eaire (LAL) Orsay, France
- templon@nikhef.nl
- Received September 7 2004, Revised February 17 2005, Accepted
March 28 2005
- C911: A comprehensive peer-to-peer communication model
- Abstract: Peer-to-Peer (P2P) computing is a model
consisting of cooperating distributed processing units terms peers.
Characteristics of this computing model include lack of centralized control and
hierarchical organization of the peers. Instead, peers take on roles of
client/server, provider/requester or spotter (router) as needed. A model called
the P2P Characterization Model (P2PCM) is proposed in this paper to better
describe the communication requirements of P2P applications. Peer-to-Peer
applications are described in terms of an integrated physical component and
service model that allow for cost modeling of processes involved in the
Peer-to-Peer communications. Details of the P2PCM are presented as well as its
application to indexing, discovery and resource utilization. Code
instrumentation of AmosP2P is described in support of the implementation of the
P2PCM.
- Brian J. d'Auriol. and Jesus Pajaro
- Department of Computer Science, The University of Texas at El
Paso, El Paso, TX, USA 79968
- bdauriol@cs.utep.edu
- Received January 27 2005, Comments to Authors June 15 2005,
Accepted January 14 2006
- C912: Detecting Run-Time Errors
- Abstract: Debugging applications programs can be very time
consuming. However, having good software tools can greatly decrease this time.
Some program errors can be found at compile time, but there are many program
errors than cannot be detected until run-time. This is because the information
needed to determine that they are errors is not available until run-time. The
purpose of this paper is to evaluate the ability of a variety of software
products to detect run-time errors, to issue meaningful messages, and to give
the line in the source code where the error occurred. The language a program is
written in determines the kinds of run-time errors that can occur. We limit our
investigation to errors that can occur when executing Fortran, C, and C++
programs. However, most of the software currently available for detecting
run-time errors only supports C and C++. In fact, we found only one product
that provides support for Fortran.
- Glenn R. Luecke, James Coyle, Jim Hoekstra, Marina Kraeva, Ying
Li, Olga Taborskaia, andYanmei Wang
- The Iowa State University's High Performance Computing Group
- grl@iastate.edu
- Received February 4 2005, Comments to Authors June 15 2005,
Accepted August 29 2005
- C913: How to Measure a Large Open Source Distributed
System
- Abstract: How can we measure the spread of a open source
computing system? When a system has no price, no purchase contracts, and no
buyers and sellers, it can be di±cult to judge its impact upon the
world. To explore this issue, we have instrumented the Condor distributed batch
system in a variety of ways and observed its growth to over 50,000 CPUs in the
last ¯ve years. Our instrumentation includes automatic updates by email
and UDP, a voluntary user survey, and download records. Each of these metrics
has various strengths and weaknesses that we are able to compare and contrast.
Using extensive data on the deployment of software versions, we present an
analytic model of version penetration over time and describe the relationship
between downloading and deployment. We also explore the ethical and legal
issues surrounding automatic data collection. Surprisingly, we discover that
objections to automatic data collection are higher among people that are not
using the Condor software. We conclude with some practical advice for others
that wish to measure large distributed systems.
- Douglas Thain, Todd Tannenbaum, and Miron Livny
- Computer Science and Engineering Department, University of Notre
Dame, 384 Fitzpatrick Hall, Notre Dame IN 46556; Computer Sciences Department,
University of Wisconsin-Madison, 1210 West Dayton Street, Madison WI 53706
- dthain@cse.nd.edu
- Received February 6 2005, Comments to Authors June 16 2005,
Accepted September 24 2005
- C914: Experimental Analysis of a Mass Storage System
- Abstract: Mass storage systems (MSSs) play a key role in
parallel computing, especially in data-intensive applications like
bioinformatics. Most contemporary MSSs are implemented as RAID arrays in which
commodity disks are tied together with proprietary controller hardware. The
performance of such systems can be difficult to predict because most of the
internal details of controller behavior are proprietary. We present a
systematic method for empirically evaluating MSS performance by obtaining
measurements on a series of MSS configurations of increasing size and
complexity. We apply this methodology to the newly installed MSS at Ohio
Supercomputer Center (OSC) that has 16 Intel P4 Xeon input/output processors,
each connected to a dedicated IBM FAStT600 Storage Controller (now renamed the
DS4300) that, in turn, supports four 8+1 RAID5 units based on 250GB, 7200 rpm
disks.We observe that the maximum sustained read rate of each FAStT600 is
approximately 280MBytes/sec when multithreaded reads are used. Sustained write
performance is also analyzed. For 8+1 RAID5 systems this should, in theory, be
8/9 times as fast as reads. Our experiments, however, reveal the write
performance to be 1/3 to 1/2 times as fast. Our methodology permits storage
system designers to empirically evaluate the performance of their systems with
considerable confidence. The measurements obtained using our methods permit
application performers to be aware of the limits to the performance of their
codes.
- Shahid Bokhari, Benjamin Rutt, Pete Wyckoff, and Paul
Buerger
- Department of Electrical Engineering UET, Lahore, Pakistan;
Department of Biomedical Informatics Ohio State University, Columbus, Ohio;
Ohio Supercomputer Center Columbus, Ohio
- sbokhari@wol.net.pk
- Received 9 March 2005, Comments to Authors August 8 2005,
Accepted August 30 2005
- C915: Resource Allocation in Grid Computing: An Agent-based
Approach
- Abstract: In this paper, we present an agent-based
resource allocation model for grid computing. Three types of agents namely
Resource Brokering Agents (RBA), Job Agents (JA), and Resource Monitoring
Agents (RMA) are used. The RBAs are static and they act as resource schedulers
as well as brokers for the users to submit their jobs through JAs. The RBAs
incorporate an economic model and a queuing model. The JAs are mobile and
represent the users and perform tasks such as completing the job requests of
the users and executing the job at suitable grid nodes. The RMAs are static and
they reside in the nodes of the local cluster and inform the status of the
resources to the local cluster server. The model differs from other existing
models, as it has the following characteristics: 1) topology-based migration of
agents, 2) analysis of different types of migrations according to the topology
and agents' overheads, and 3) dynamic pricing and negotiation-based resource
allocation. The model is evaluated to investigate the behavior of several
parameters.
- S. S. Manvi, M. N. Birje, Bhanu Prasad
- Department of Electronics and Communication Engineering
Basaveshwar Engineering College, Bagalkot 587 102, INDIA; Department of
Information Science & Engineering Basaveshwar Engineering College, Bagalkot
587 102, INDIA; Department of Computer and Information Sciences Florida A&M
University Tallahassee, FL 32307, USA
- bhanu.prasad@famu.edu
- Received March 1 2005, Comments to Authors June 15 2005
- C916: Auctioning Resources in Grids: Model and
Protocols
- Abstract: In this paper, we propose and study an auction
model for resource management in Grids. We propose and investigate by
simulation three types of auction-based resource allocation protocols: (i)
First-Price Auction Protocol, (ii) Vickrey Auction Protocol and (iii) Double
Auction Protocol. The goal is to find which one is best suitable for the grid
environment from users' perspective as well as from resources' perspective. The
results showed that when we consider a mix of risk-averse and risk-neutral
users the First-Price Auction Protocol favors resources while the Vickrey
Auction Protocol favors users. On the other hand the Double Auction Protocol
favors both users and resources.
- D. Grosu and A. Das
- Department of Computer Science, Wayne State University, 5143 Cass
Avenue, Detroit, MI 48202, USA
-
dgrosu@cs.wayne.edu
- Received February 13 2005, Comments to Authors June 15 2005,
Accepted August 29 2005
- Full Paper:
../CCPEwebresource/c916grosu/c916jauction.pdf
- C917: Design and Implementation of a Linux
High-Availability and Load Balancing Server for Video-on-Demand (VOD)
Systems
- Abstract:In this investigation, we study the
High-Availability and Load-Balancing technologies for clusters of workstations,
in order to both increase the availability and scalability of services and
resources. In other words, to provide a reliable and scalable platform, that is
able to easily add new systems when load increases, and services are not
interrupted when a computer goes down. Today, web-based VOD servers make use of
this technology deployment both practical and cost-effective. Therefore, this
paper presents the hardware and software configurations of PC-based cluster
systems working as VOD system servers.
- Chao-Tung Yang, Ko-Tzu Wang, Kuan-Ching Li
- High Performance Computing Laboratory Dept. of Computer Science
and Information Engineering Tunghai University Taichung City, Taichung 40704
Taiwan ROC; Parallel and Distributed Processing Center Dept. of Computer
Science and Information Management Providence University Shalu, Taichung 43301
Taiwan ROC
- ctyang@mail.thu.edu.tw
- Received April 4 2005 2004, Comments to Authors June 15 2005
- C918: The Argus Prototype: Aggregate Use of Load Modules as
a Highdensity Supercomputer
- Abstract: This paper describes the ARGUS prototype, a high
density, low power supercomputer built from an IXIA network analyzer chassis
and Load Modules. The prototype is configured as a diskless distributed system
that is scalable to 128 processors in a single 9U chassis. The entire system
has a footprint of 1/4 meter2 (2.5 ft2), a volume of 0.09 meter3 (3.3 ft3) and
maximum power consumption of less than 2200 watts. We compare and contrast the
characteristics of ARGUS against various machines including our 32-node Beowulf
and LANLs Green Destiny. Our results show that the computing density
(GFLOPS/ft3) of ARGUS is about 30 times higher than that of the Beowulf, about
3 times higher than that of Green Destiny while performance is comparable.
- Xizhou Feng, Rong Ge and Kirk W. Cameron
- Scalable Performance Laboratory, Department of Computer Science
and Engineering, University of South Carolina, Columbia, SC 29208, USA
- kcameron@cse.sc.edu
- Received April 21 2005, Comments to Authors August 8 2005,
Accepted September 24 2005
- C932: Overlapping Computations and Communications with I/O
in Wavefront Algorithms
- Abstract: Several numerical computation algorithms exhibit
dependences that lead to a wavefront in the computation. Depending on the data
distribution chosen, pipelining communication and computation can be the only
way to avoid a sequential execution of the parallel code. The computation grain
has to be wisely chosen to obtain at the same time a maximum parallelism and a
small communication overhead. On the other hand, when the size of data exceeds
the memory capacity of the target platform, data have to be stored on disk. The
concept of out-of-core computation aims at minimizing the impact of the I/O
needed to compute on such data. It has been applied successfully on several
linear algebra applications. In this paper we apply out-of-core techniques to
wavefront algorithms. The originality of our approach is to overlap
computation, communication, and I/O operations. An original strategy is
proposed using several memory blocks accessed in a cyclic manner.
- E. Caron and F. Desprez,and F. Suter
- LIP, UMR 5668 CNRS, ENS Lyon, INRIA, UCB Lyon, 46 alle d'Italie,
F-69364 Lyon Cedex 07, France.; LORIA, UMR 7503 CNRS, INPL, INRIA, UHP Nancy 1,
Nancy 2, Campus Scientifique - BP 239 - 54506 Vandoeuvre-ls-Nancy Cedex
-
frederic.suter@gmail.com
- Received June 8 2005, Comments to Authors August 29 2005
- C933: Seine: A Dynamic Geometry-based Shared Space
Interaction Framework for Parallel Scientific Applications
- Abstract: While large-scale parallel/distributed
simulations are rapidly becoming critical research modalities in academia and
industry, their efficient and scalable implementations continue to present many
challenges. A key challenge is the dynamic and complex
communication/coordination required by these applications, which depend on
state of the phenomenon being modelled, are determined by the specific
numerical formulation, the domain decomposition and/or sub-domain refinement
algorithms used, etc., and are known only at runtime. This paper presents a
dynamic geometry-based shared-space interaction model for scientific
applications. The model provides the flexibility of shared space based models
to support extremely dynamic communication/coordination patterns while still
enabling scalable implementations. The design and prototype implementation of
an interaction framework based on the model are described. The framework
complements and can be used in conjunction with existing parallel programming
systems such as MPI and OpenMP. An experimental evaluation using an adaptive
multi-block oil reservoir simulation is used to demonstrate the performance and
scalability of applications using the framework.
- L.Zhang and M.Parashar
- The Applied Software System Laboratory, Rutgers University 94
Brett Road, Piscataway, NJ08854, U.S.
- parashar@caip.rutgers.edu
- Received June 12 2005, Comments to Authors August 8 2005,
Accepted September 4 2005
Submitted in 2001 (top)
- Here are instructions for
Grid Computing Environment Special Issue (CLOSED) due Mid July 2001 and here
are submitted papers.
- Here are instructions for Java
Grande/ISCOPE Special Issue (CLOSED) due August 31 2001 and
here are papers
- C496: Open Consensus
- Abstract:This paper presents the abstraction of open
consensus and argues for its use as an effective component for building
reliable agreement protocols in practical asynchronous systems where processes
and links can crash and recover.The specification of open consensus has a
decoupled on-demand and re-entrant behaviour that make its use very
efficient,especially in terms of forced logs, which are known to be major
sources of overhead in distributed systems.We illustrate the use of open
consensus as a basic building block to develop a modular, yet efficient, total
order broadcast protocol.Finally, we describe our Java implementation of our
open consensus abstraction and we convey our efficiency claims through some
practical performance measures.
- Romain Boichat, Svend Frølund, Rachid Guerraoui
- Swiss Federal Institute of Technology, CH 1015, Lausanne and
Hewlett-Packard Laboratories, 1501 Page Mill Rd, Palo Alto
- Email: Romain.Boichat@epfl.ch
- Received January 17 2001, Comments to Authors May 8 2001,
Accepted May 19 2001
- C497: Java for High-Performance Computing
- Abstract:There has been an increasing research interest in
extending the use of Java towards high-performance demanding applications such
as scalable web servers, multimedia applications, and large-scale scientific
applications. However, given the low performance of current Java
implementations, these application domains pose new challenges to both the
application designer and systems developer. In this paper we describe and
classify several important proposals and environments that tackle Java's
performance bottlenecks in order to make the language an effective option for
high-performance computing. We further survey most significant performance
issues while exposing the potential benefits and limitations of current
solutions in such a way that a framework for future research efforts can be
established. We show that most of the proposed solutions can be classified
according to some combination of the three basic parameters: the model adopted
for inter-process communication, language extensions, and the implementation
strategy. In addition, we examine other relevant issues, such as
interoperability, portability, and garbage collection.
- M. Lobosco, C. Amorim, and O. Loques
- Instituto de Computação Universidade Federal
Fluminense Niterói, Rio de Janeiro, Brazil
- Email: loques@ic.uff.br
- Received January 15 2001, Comments to authors May 28 2001,
Accepted July 24 2001
- C499: Data Movement and Control Substrate for Parallel
Adaptive Applications
- Abstract:In this paper, we present the Data Movement and
Control Substrate (DMCS), that implements one-sided low-latency communication
primitives for parallel adaptive and irregular computations. DMCS is built on
top of low-level, vendor-specific communication subsystems such as LAPI for IBM
SP machines and widely available libraries like MPI for clusters of
workstations and PCs. DMCS adds a small overhead to the communication
operations provided by the lower communication system. In return DMCS provides
a flexible and easy to understand application program interface for one-sided
communication operations including put-ops and remote procedure calls.
Furthermore, DMCS is designed so that it can be easily ported and maintained by
non-experts.
- Jeffrey Dobbelaere, Kevin Barker, Nikos Chrisochoides, and Demian
Naves, Keshav Pingali
- Computer Science Department College of William & Mary
Williamsburg, VA 23185.
- Computer Science Department Cornell University Ithaca, NY
14853-3801
- Email: nikos@plato.cs.wm.edu
- Received 11 March 2001, Comments to Authors July 24 2001,
Accepted October 28 2001
- C500: (ACES Special Issue) Parallel visualization of
gigabyte datasets in GeoFEM
- Abstract:Parallel visualization of large datasets in
GeoFEM is described. Our visualization subsystem supports concurrent
visualization with computation, and outputs a simplified small graphic
primitive set, which can be displayed by a basic viewer on clients. This
subsystem provides many kinds of parallel visualization algorithms for the
users to visualize their data from scalar, vector to tensor. Scalar field
topology analysis, flow field topology and semi-automatic parameter design are
employed to improve the quality of visualization results. We also present a
simplification algorithm to reduce the number of output graphic primitives,
accounting for both shape attribute and color attribute. The experimental
results show the effectiveness of our subsystem.
- Issei Fujishiro, Yuriko Takeshima, Li Chen, Hiroko Nakamura , and
Yasuko Suzuki
- Ochanomizu University, Tokyo, Japan
- Research Organization for Information Science & Technology,
Tokyo, Japan
- State Key Lab. of CAD&CG, Zhejiang University, Hangzhou,
310027, P.R. China.
- Email: chen@tokyo.rist.or.jp
- Received 26 February 2001, Comments to Authors August 5 2001,
Accepted November 9 2001
- C501: (ACES Special Issue)Parallelization of a Large-Scale
Computational Earthquake Simulation Program
- Abstract:Here we detail both the methods and preliminary
results of first efforts to parallelize two General Earthquake Model
(GEM)-related codes: 1) a relatively simple data mining procedure based on a
Genetic Algorithm; and 2) the Virtual California simulation of GEM . These
preliminary results, using a simple, heterogeneous processor system, existing
freeware, and with an extremely low cost of both manpower and hardware dollars,
motivate us to more ambitious work with considerably larger-scale computer
earthquake simulations of southern California. The GEM computational problem,
which is essentially a Monte Carlo simulation, is well suited to optimization
on parallel computers, and we outline how we are proceeding in implementing
this new software architecture.
- K.F. Tiampo , J.B. Rundle , S. Gross and S. McGinnis
- Colorado Center for Chaos & Complexity, CIRES, University of
Colorado, Boulder, CO, 80309, USA
- Email: kristy@caldera.colorado.edu
- Received 23 February 2001, Comments to Authors August 5 2001,
Accepted November 3 2001
- C502: (ACES Special Issue)Parallel Iterative Solvers for
Unstructured Grids using Directive/MPI Hybrid Programming Model for GeoFEM
Platform on SMP Cluster Architectures
- Abstract:In this study, efficient parallel iterative
method for unstructured grid has been developed for SMP (shared memory
symmetric multiprocessor) cluster architectures on GeoFEM platform. 3-level
hybrid parallel programming model has been applied : 1. message passing for
inter SMP node communication, 2. loop directive for intra SMP node
parallelization and 3. vectorization for each PE (processing element). Simple
3D elastic linear problems with more than 10 8 DOFs have been solved by 3x3
block ICCG(0) with additive Schwartz domain decomposition and PDJDS/CM-RCM
((Parallel Descending-order Jagged Diagonal Storage/ Cyclic Multicolor-Reverse
Cuthil Mckee)) re-ordering on 16 SMP nodes of Hitachi SR8000 and 20 GFLOPS
performance has been obtained. PDJDS/CM-RCM reordering method provides
excellent vector and parallel performance in SMP nodes. This re-ordering is
essential for parallelization of forward/backward substitution in IC/ILU
factorization with global data dependency. Developed method was also tested on
NEC SX-4 and attained 969 MFLOPS (48.5% of peak performance) using single
processor. Additive Schwartz domain decomposition method provided robustness to
the GeoFEM's parallel iterative solvers with localized preconditioning.
- Kengo Nakajima and Hiroshi Okuda
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Department of Quantum Engineering and Systems Science, The
University of Tokyo, Tokyo, Japan
- Email: nakajima@tokyo.rist.or.jp
- Received 27 February 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
- C503: (ACES Special Issue)Finite Element Modeling of
Multibody Contact and Its Application to Active Faults
- Abstract:Earthquakes have been recognized as resulting
from a stick-slip frictional instability along the faults between deformable
rocks. An arbitrarily shaped contact element strategy, named as node-to-point
contact element strategy is proposed to handle the friction contact between
deformable bodies with stick and finite frictional slip and extended to
simulate the active faults in the crust with a more general nonlinear friction
law. Also introduced is an efficient contact search algorithm for contact
problems among multiple small and finite deformation bodies. Moreover, the
efficiency of the parallel sparse solver for the nonlinear friction contact
problem is investigated. Finally, a model for the plate movement in the
northeast zone of Japan under gravitation is taken as an example to be analyzed
with different friction behaviors.
- H.L. Xing and A. Makinouchi
- Materials Fabrication Lab, The Institute of Physical and Chemical
Research (RIKEN) 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Email: xing@postman.riken.go.jp
- Received 28 February 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
- C504: (ACES Special Issue) Effective Adaptation Technique
for Hexahedral Mesh
- Abstract:This paper describes simple and effective
adaptive techniques for the finite element method.The proposed refinement
method is based on a modified octree divsion. This modified octree-based method
can be applied to an arbitary hexahedral unstructured mesh. Hex-R is the the
name of our implentation and it refines the mesh in cooperation with a finite
element viewer: GPPView. The refined mesh retains the topology of the original
coarse mesh. Therefore one can easily use the proposed method for multigrid
generation. Finally the construction of an adaptive method using Hex-R is
mentioned.
- Yoshitaka Wada
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Email: wada@tokyo.rist.or.jp
- Received March 2 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
- C505: (ACES Special Issue)Thermal Convection Analysis in a
Rotating Shell by a Parallel FEM - Development of a Thermal-Hydraulic Subsystem
of GeoFEM
- Abstract:We carry out a numerical simulation of thermally
driven convection in a rotating spherical shell modeled on the Earth s
outer core using Thermal-Hydraulic subsystem of GeoFEM,which gives a parallel
FEM platform.This simulation is for understanding of the origin of the
geomagnetic field and dynamics of a fluid in the Earths outer core.We
solve a three-dimensional and time dependent process of a Boussinesq fluid in a
rotating spherical shell under effects of self gravity and the Coriolis
force.In this simulation,tri-linear hexahedron element is applied for spatial
distribution,and 2nd-order Adams-Bashforth scheme are applied for time
integration of the temperature and velocity.To satisfy the mass conservation,a
parallel iterative solver of GeoFEM is applied for solving pressure and
correction of the velocity fields.To verify our simulation code,results of the
simulation are compared with the same analysis by the spectral method.In these
results,outline of convection is almost same in these cases;that is,three pairs
of convection columns are formed,and these columns propagate to the westward in
quasi-steady state.However, magnitude of kinetic energy averaged over the shell
is about 93%of that in the case by the spectral method,and drift frequency of
the columns in the GeoFEM case is larger than that in the case by the spectral
method.
- Hiroaki Matsui and Hiroshi Okuda
- Department of Research for Computational Earth Science, Research
Organization for Information Science & Technology (RIST), Tokyo
- Department of Quantum Engineering and System Science, The
University of Tokyo, Tokyo, Japan
- Email: matsui@tokyo.rist.or.jp
- Received March 3 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
- C506: (ACES Special Issue) Performance Optimization of
GeoFEM on Various Computer Architecture
- Abstract:We have a prospect that Geofem get good parallel
performance on various computer architecture by the research which has been
made in GeoFEM team. In this research, we focus a target to performance with a
single processor, and common data structure / coding manner to make GeoFEM
running high performance on various computer architecture was researched.
Test coding for Solver Part of the structure and the fluid
analysis code Data structure and coding manner was evaluated on
scalar/vector/pseudo vector architecture. A new data structure and a new direct
access manner was introduced in the fluid analysis solver. As a result, 21%
performance for peak performance was obtained on pseudo vector architecture.
And We got as good performance as the pseudo-vector execution performance in
scalar execution. 25% performance for peak performance was obtained on vector
architecture. A new direct access manner was introduced in the structure code
solver. As a result, 27% performance for peak performance was obtained on
pseudo vector architecture. 34% performance for peak performance was obtained
on vector architecture.
Test Code for Matrix Assemble Part of the structure analysis
code Coding of removing dependency was finished and performance was
evaluated on vector/scalar machine. 736.8MFlops was obtained at matrix
assembling process on SX-4. 900.7MFlops was obtained at whole test Code on
SX-4, and 2.06GFlops was obtained on VPP5000 (peak : 9.6GFlops). 124MFlops was
obtained at matrix assembling process on Alpha system (Alpha 21164
533MHz).
- Kazuo Minami
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Email: minami@tokyo.rist.or.jp
- Received March 5 2001, Comments to Authors August 5 2001,
Accepted October 27 2001
- C507: (ACES Special Issue)Parallel Multilevel Iterative
Linear Solvers with Unstructured Adaptive Grids for Simulations in Earth
Science
- Abstract:Preconditioned iterative solver is one of the
most powerful choice such as IC (Incomplete Cholesky) or ILU (Incomplete LU)
factorization method for large-scale scientific computation. But in these
methods, iteration number until convergence increases as the problem size
becomes larger. In multigrid solvers, the rate of convergence is independent of
problem size and the number of iterations remains fairly constant. Multigrid is
also a good preconditioner for Krylov iterative solvers. In this study,
multigrid preconditioned conjugate gradient iterative method (MGCG) on parallel
computers has been developed and applied to the Poisson equation in the region
between dual sphere surfaces on semi-unstructured prismatic grids generated
adaptively. Moreover this procedure has been applied to the grids with local
refinement. Computational results on Hitachi SR2201 using up to 128 processors
show good scalability of the developed method.
- Kengo Nakajima
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Email: nakajima@tokyo.rist.or.jp
- Received 19 March 2001, Comments to Authors August 5 2001,
Accepted October 27 2001
- C508: Comparing Windows NT, Linux, and QNX as the Basis for
Cluster Systems
- Abstract:Clusters use commodity hardware and software
components to provide an environment for parallel processing. A major issue in
the development of a cluster system is the choice of the operating system that
will run on each node. We compare three alternatives: Windows NT, Linux, and
QNX a real-time microkernel. The comparison is based on expressive
power, performance, and ease-of-use metrics. The result is that none of these
systems has a clear advantage over the others in all the metrics, but that each
has its strong and weak points. Thus any choice of a base system will involve
some technical compromises, but not major ones.
- Avi Kavas, Dror G. Feitelson
- School of Computer Science and Engineering The Hebrew University,
91904 Jerusalem, Israel
- Email: feit@cs.huji.ac.il
- Received 11 February 2001, Comments to authors May 28 2001,
Accepted July 24 2001
- C509: (ACES Special Issue)Parallel Simulation System for
Earthquake Generation: Fault Analysis Modules and Parallel Coupling
Analysis
- Abstract:Solid earth simulations have recently been
developed to address issues such as natural disasters, global environmental
destruction and the conservation of natural resources. The simulation of solid
earth phenomena involves the analysis of complex structures including strata,
faults, and heterogeneous material properties. Simulation of the generation and
cycle of earthquakes is particularly important solid earth phenomena, but such
a simulation requires analysis of a complex fault dynamics. GeoFEM (Iizuka et
al. , 1999) is a parallel finite element analysis system intended for problems
of solid earth field phenomena. This paper shows recent research of GeoFEM for
simulation of earthquake generation and cycles.
Key words: Solid earth simulations, Generation and
cycle of earthquakes, GeoFEM, Parallel finite element analysis.
- Mikio Iizuka, Daigo Sekita, Hisashi Suito, Mamoru Hyodo, Karzuro
Hirahara, David Place, Peter Mora, Osamu Hazama, and Hiroshi Okuda
- Organization for Information Science & Technology. Mitsubishi
Reserch Institute,Inc. Earth and Planetary Sciencs, Nagoya University QUAKES,
Department of Earth Sciences, The University of Queensland. Yokohama National
University. Department of Quantum Engineering and Systems Science, The
University ofTokyo
- Email: iizuka@tokyo.rist.or.jp
- Received 23 April 2001, Comments to Authors August 5 2001,
Accepted October 27 2001
- C514: Parallel Dynamic Scheduling in a Cluster of
Computers
- Abstract:The parallelization of complex planning and
control problems arising in diverse application areas in the industrial,
services, and commercial environments not only allows the determination of
control variables in the required times but also improves the performance of
the control procedure as more processors are involved in the execution of the
parallel program. In this paper we describe a scheduling application in a water
supply network to demonstrate the benefits of parallel processing. The
procedure we propose combines dynamic programming with genetic algorithms and
time series prediction in order to solve problems in which decisions are made
in stages, and the states an control belong to a continuous space. Taking into
account the computational complexity of these applications and the time
contraints that are usually imposed, the procedure has been implemented by a
parallel program in a cluster of computers, an inexpe sive and widely extended
platform that can make parallelism a practical means of tackling complex
problems in many different environments.
- M. Damas, M. Salmeron, J. Ortega, G. Olivares, H. Pomares
- Department of Computer Architecture and Computer Technology,
University of Granada. Facultad de Ciencias. Campus Fuentenueva s/n. E-18071,
Granada, Spain
- Email: mdamas@atc.ugr.es
- Received 13 November 2000, Comments to authors June 2 2001,
Accepted July 24 2001
- C515: A Parallel Algorithm for Static Slicing of Concurrent
Programs
- Abstract:Slicing of concurrent programs is a
compute-intensive task. To speed up the slicing process, we develop a parallel
algorithm. For this purpose we use a Concurrent Control Flow Graph (CCFG) as
the intermediate representation. We use a network of communicating processes to
develop our parallel algorithm. We have implemented our parallel algorithm and
the experimental results appear promising.
- D. Goswami, R. Mall
- Department of Computer Science and Engineering, Indian Institute
of Technology, Kharagpur INDIA
- Email: diganta@cse.iitkgp.ernet.in
- Received 10 May 2001; comments to Authors August 5 2001, Accepted
July 21 2003
- C516: An Analysis of VI Architecture Primitives in Support
of Parallel and Distributed Communication
- Abstract:We present the results of a detailed study of the
Virtual Interface (VI) paradigm as a communication foundation for a distributed
computing environment. Using Active Messages and the Split-C global memory
model, we analyze the inherent costs of using VI primitives to implement these
high-level communication abstractions. We demonstrate a minimum mapping cost
(i.e. the host processing required to map one abstraction to a lower
abstraction) of 5.4 microsec for both Active Messages and Split-C using 4-way
550 MHz Pentium III SMPs and the Myrinet network. We break down this cost to
use of individual VI primitives in supporting flow control, buffer management
and event processing and identify the completion queue as the source of the
highest overhead. Bulk transfer performance plateaus at 44 MB/sec for both
implementations due to the addition of fragmentation requirements. Based on
this analysis, we present the implications for the VI successor, Infiniband.
- Andrew Begel, Philip Buonadonna, David E. Culler, David Gay
- University of California, Berkeley
- Email: philipb@cs.berkeley.edu
- Received 9 May 2001; Comments to Authors August 19 2001, Accepted
September 15 2001
- C517: Automatic Determination of Matrix-Blocks
- Abstract:Many sparse matrices have a natural block
structure, for instance arising from the discretisation of a physical domain.
We give an algorithm for finding this block structure from the matrix sparsity
pattern. This algorithm can be used for instance in iterative solver libraries
in cases where the user does not or can not pass this block structure
information to the software. The block structure can then be used as a basis
for domain-decomposition based preconditioners.
- Victor Eijkhout
- Department of Computer Science; University of Tennessee,
Knoxville, TN 37996
- Email: eijkhout@cs.utk.edu
- Received 5 June 2001, Comments to Authors October 28 2001
- C518: A framework for high-performance matrix
multiplication based on hierarchical abstractions, algorithms and optimized
low-level kernels
- Abstract:Despite extensive research, optimal performance
has not easily been available previously for matrix multiplication (especially
for large matrices) on most architectures because of the lack of a structured
approach and the limitations imposed by matrix storage formats. A simple but
effective framework is presented here that lays the foundation for building
high-performance matrix-multiplication codes in a structured, portable and
efficient manner. The resulting codes are validated on three dfferent
representative RISC and CISC architectures on which they significantly
outperform highly optimized libraries such as ATLAS and other competing
methodologies reported in the literature. The main component of the proposed
approach is a hierarchical storage format that efficiently generalizes the
applicability of the memory hierarchy friendly Morton ordering to
arbitrary-sized matrices. The storage format supports polyalgorithms, which are
shown here to be essential for obtaining the best possible performance for a
range of problem sizes. Several algorithmic advances are made in this paper,
including an oscillating iterative algorithm for matrix multiplication and a
variable recursion cutoff criterion for Strassen's algorithm. The authors
expose the need to standardize linear algebra kernel interfaces, distinct from
the BLAS, for writing portable high-performance code. These kernel routines
operate on small blocks that fit in the L1 cache. The performance advantages
of the proposed framework can be effectively delivered to new and existing
applications through the use of object-oriented or compiler-based
approaches.
- Vinod Valsalam and Anthony Skjellum
- High Performance Computing Laboratory, Department of Computer
Science, Mississippi State University, MS 39762.
- Email: tony@hpcl.cs.msstate.edu
- Received June 1 2001, Comments to Authors October 28 2001,
Accepted November 21 2001
- C519: Parallel Implementation of the Fluid Particle Model
for Simulating Complex Fluids in the Mesoscale
- Abstract:Dissipative particle dynamics (DPD) and its
generalization - the fluid particle model (FPM) - represent the fluid
particle approach for simulating fluid-like behavior in the mesoscale.
Unlike particles from molecular dynamics (MD) method, the fluid
particle can be viewed as a droplet consisting of liquid
molecules. In FPM, fluid particles interact by both central and
non-central, short-range forces with conservative, dissipative and Brownian
character. In comparison to MD, FPM method in 3-D requires two to three times
more memory load and three times more communication overhead. Computational
load per step per particle is comparable to MD due to the shorter interaction
range allowed between fluid particles than between MD atoms. The
classical linked-cells technique and decomposing the computational box into
strips allow for rapid modifications of the code and for implementing non-cubic
computational boxes. We show that the efficiency of the FPM code depends
strongly on the number of particles simulated, geometry of the box, and the
computer architecture. We give a few examples from long FPM simulations
involving up to 8 million fluid particles and 32 processors. Results from FPM
simulations in 3-D of the phase separation in binary fluid and dispersion of
colloidal slab are presented. Scaling law for symmetric quenching in phase
separation has been properly reconstructed. We show also that the
microstructure of dispersed fluid depends strongly on the contrast between
kinematic viscosities of this fluid phase and the bulk phase. This FPM code can
be applied for simulating mesoscopic flow dynamics in capillary pipes or
critical flow phenomena in narrow blood vessels.
- Krzysztof Boryczko, Witold Dzwinel, David A.Yuen
- AGH Institute of Computer Science, al. Mickiewicza 30, 30-059,
Kraków, Poland, Minnesota Supercomputer Institute, University of
Minnesota, Minneapolis, Minnesota 55415-1227,USA
- Email: dwitek@msi.umn.edu
- Received June 25 2001, Comments to Authors October 28 2001,
Accepted November 10 2001
- C520: (ACES Special Issue)Interaction between Seismogenic
Body and Force-Supplying Body and Precursor in Far Field
- Abstract:Seismogenic body and wall rock of force-supply
are analogized respectively by rocks sample and block of transferring pressure
in test of fracture. Their interactions are used to study precursor in far
field and ultra-far field. Observation studies of stress and strain are made on
them. The results show that there appears mutation or disturbance precursor of
strain in the force-measuring sensor or on block of transferring pressure
before rock fracture beside anomaly variation appearing in all observing points
on sample. The observing points on lock of transferring pressure are 25-90 cm
away from sample, which are 3-10 times of fracture sizes. It is explained that
precursor anomaly is possible appearing in far places from seismogenic area.
The reason forming far field precursor is primarily studied.
- Xu Zhaoyong, Mei Shirong, Yang Runhai, Wang Jinglai, Zhao
Jinming, Wang Bin, Hua Peizhong
- Seismological Bureau of Yunnan Province, Kunming, 650041, P. R.
of China, Center for Analysis and Prediction, CSB, Beijing, 100036, P. R. of
China
- Email: xuzhaoyong@netease.com
- Received Jan 12 2001, Withdrawn August 7 2001
- C521: A Java Extension Framework for Web-based
Simulation
- Abstract:We have designed a framework for the web-based
simulation applications with a Java front-end. In this paper, we discuss the
architectural design and related considerations of this framework. We present
details of a multi-layer architecture, key components and their functions, and
the abstraction of the common features in the web-based simulation into this
framework. This framework is implemented in Java technology with
object-oriented design. This framework can incorporate an arbitrary legacy
simulation application in various domains by using configurable and extensible
interfaces. The usage of our framework can be in the area of developing Java
web-based simulation services and simulation problems.
- Tao Tang and Chu R. Wie
- State University of New York at Buffalo, Dept. of Electrical
Engineering, Bonner Hall, Buffalo, NY 14260
- Email: wie@eng.buffalo.edu>
- Received 22 July 2001, Comments to Authors October 28 2001
- C522: WebSimMicro: Web-based Simulation of Microelectronic
Devices
- Abstract:We have developed a web-based Java front-end for
microelectronic device simulation applications. We used a framework, that we
have previously proposed, for extending legacy simulation applications to the
web. This we shall call WebSimMicro, the Web-based Simulation on
Microelectronic devices. In this paper, we discuss the architecture of
WebSimMicro and the design approach using the proposed framework. This
WebSimMicro project contains a series of semiconductor device simulation
applets, including applets for metal-oxide-semiconductor transistors and
bipolar transistors which can be accessed on our website.
- Tao Tang and Chu R. Wie
- State University of New York at Buffalo, Dept. of Electrical
Engineering, Bonner Hall, Buffalo, NY 14260
- Email: wie@eng.buffalo.edu>
- Received 22 July 2001, Comments to Authors October 28 2001
- C523: Towards a Common Development Framework for
Distributed Applications
- Abstract:The last decade has witnessed a convergence in
several concerns related to the development ofdis- tributed applications
ranging from concurrency issues to architectural platforms. However, these
concerns are often addressed within the specicities of a particular
application area or a specic requirement such as performance. This paper is a
contribution towards dening a common devel- opment framework specically for
distributed applications. It denes the salient characteristics of such
applications and discusses emerging concepts and techniques from a number of
disciplines namely concurrency theory, real-time systems, distributed systems
and parallel processing. It concludes on their possible role in an integrated
development framework that would be suitable for distributed
applications.
Author's Notes:
- The paper is more structured as a survey/tutorial than a
traditional research paper.
- Its main value is to broaden parallel processing researchers'
thinking into considering a much wider perspective which encompasses
distributed applications in general.
- It is not exhaustive nor complete (it can't be, given the
broad scope) but it has some value in explaining a lot of the jargon being used
these days.
- It is quite long, but structured in a way that makes it easy
to skip sections for readers that are familiar with a particular research
area.
- Fethi A. Rabhi
- School of Information Systems, University of New South Wales,
Sydney 2052, Australia.
- Email: f.rabhi@unsw.edu.au
- Received 23 July 2001, Comments to Authors October 28 2001
- C524: Lesser Bear: a lightweight process library for SMP
computers - scheduling mechanism without a lock operation
- Abstract:We have designed and implemented a lightweight
process (thread)library called Lesser Bear for SMP computers.
Lesser Bear has thread-level parallelism and high portability. Lesser Bear
executes threads in parallel by creating UNIX processes as virtual processors
and a memory-mapped file as a huge shared-memory space.To schedule thread in
parallel,the shared-memory space has been divided into working spaces for each
virtual processor,and a ready queue has been distributed. However the previous
version of Lesser Bear sometimes requires a lock operation for dequeueing.We
therefore proposed a scheduling mechanism that does not require a lock
operation.To achieve this,each divided space forms a link topology through the
queue,and we use a lock-free algorithm for the queue operation.This mechanism
is applied to Lesser Bear and evaluated by experimental results.
- Hisashi Oguma and Yasuichi Nakayama
- c/o Prof. Y. Nakayama, Department of Computer Science, The
University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585
JAPAN
- oguma-h@igo.cs.uec.ac.jp
- Received 30 July 2001, Comments to Authors October 28 2001,
Accepted November 22 2001
- C525: Object-Oriented Distributed Computing Based on Remote
Class Reference
- Abstract:Java RMI and Jini provide effective mechanisms
for implementing a distributed computing system. Recently many numeral
libraries have been developed that take advantage of Java as an object-oriented
and portable language. The widely-used client-server method limits the extent
to which the benefits of the object-oriented approach can be exploited because
of the difficulties arising when a remote object is the argument or return
value of a remote or local method. In this paper this problem is solved by
introducing a data object that stores the data structure of the remote object
and related access methods. By using this data object, the client can easily
instantiate a remote object, and use it as the argument or return value of
either a local or remote method.
- Yan Huang, David W. Walker, and Omer F. Rana
- Department of Computer Science, Cardiff University, PO Box 916,
Cardiff, CF24 3XF, United Kingdom
- yan.huang@cs.cf.ac.uk
- Received August 2 2001, Comments to Authors November 11 2001,
Accepted January 20 2002
- C526: A stochastic load balancing algorithm for
i-Computing
- Abstract:This paper presents a stochastic dynamic load
balancing algorithm for Internet Computing, which is a new type of distributed
computing involving heterogeneous workstations from different organizations on
Internet. To realize the practical environment, we assume the system comprised
of heterogeneous, untrusted and non-dedicated workstations connected by
non-dedicated network. Our algorithm uses the product of the average processing
time and the queue length of system jobs as the load index. Dynamic
communication delay is included in the execution cost calculation. The transfer
policy and the location policy are combined in a stochastic algorithm. State
information exchange is done via information feedback and mutual updating.
Simulations demonstrate that our algorithm outperforms conventional approaches
over a wide range of system parameters. These results are reconfirmed by
empirical experiments after we have implemented the algorithms on the DJM
global virtual machine.
- Yuk-Yin WONG, Kwong-Sak LEUNG, Kin-Hong LEE
- Computing Services Centre City University of Hong Kong, Tat Chee
Avenue, KLN., Hong Kong; Department of Computer Science and Engineering Chinese
University of Hong Kong, Shatin, N.T., Hong Kong
- clevin.wong@cityu.edu.hk
- Received August 14 2001, Comments to Authors February 9 2002,
Accepted April 5 2002
- C527: Performance Comparison of CORBA and Message Queuing
under Diverse Network Constraints
- Abstract:Middleware provides standardised mechanisms that
can be used to simplify the construction of reliable and flexible distributed
applications over a network. Applying middleware to C3I problems can provide
simple integration of disparate information systems in the military
environment. Often, various middleware technologies are used as the
communication infrastructure and as a practical ease to the network-programming
problem. CORBA and message queuing are undoubtedly two major middleware
paradigms that are fundamentally distinct from one another. In order to exploit
middleware in the military environment, we need to understand the system
characteristics of these middleware technologies A robust middleware
infrastructure is a necessary underpinning for effective distributed
collaboration across the range of stringent military networks. This paper
presents a performance comparison between these two technologies, under typical
test environments. We attempt to identify problems that arise from different
network conditions in the interaction between client and server. Our
experimental results provide guidance in assessing the feasibility of using
different middleware paradigms in various military computing environments.
- T. Andrew Au
- Defence Science and Technology Organisation, Australia.
- Andrew.Au@dsto.defence.gov.au
- Received October 14 2001(electronic version), Comments to Authors
February 9 2002
- C528: Managing Application Complexity in the SAMRAI
Object-Oriented Framework
- Abstract:A major challenge facing software libraries for
scientific computing is the ability to provide adequate exibility to meet
sophisticated, diverse, and evolving application requirements. Object-oriented
design techniques are valuable tools for capturing characteristics of complex
applications in a software architecture. In this paper, we describe certain
prominent object-oriented features of the SAMRAI software library that have
proven to be useful in application development. SAMRAI is used in a variety of
applications and has demonstrated a substantial amount of code and design
re-use in those applications. This flexibility and extensibility is illustrated
with three different application codes. We emphasize two important features of
our design. First, we describe the composition of complex numerical algorithms
from smaller components which are usable in dfferent applications. Second, we
discuss the extension of existing framework components to satisfy new
application needs.
- Richard D. Hornung, Scott R. Kohn
- Center for Applied Scientific Computing, Lawrence Livermore
National Laboratory
- hornung@llnl.gov
- Received October 16 2001 (electronic version), Accepted for
Special Issue
- C529: Grid Services for Earthquake Science
- Abstract:We describe an information system architecture
for the ACES (Asia-Pacific Cooperation for Earthquake Simulation) community. It
addresses several key features of the field - simulations at multiple scales
that need to be coupled together; real-time and archival observational data,
which needs to be analyzed for patterns and linked to the simulations; a
variety of important algorithms including partial differential equation
solvers, particle dynamics, signal processing and data analysis; a natural
three dimensional space (plus time) setting for both visualization and
observations; the linkage of field to real-time events both as an aid to crisis
management and to scientific discovery. We also address the need to support
education and research for a field whose computational sophistication is
increasing rapidly and spans a broad range. The information system assumes that
all significant data is defined by an XML layer which could be virtual but
whose existence ensures that all data is object-based and can be accessed and
searched in this form. The various capabilities needed by ACES are defined as
Grid Services, which are conformant with emerging standards and implemented
with different levels of fidelity and performance appropriate for the
application. Grid Services can be composed in a hierarchical fashion to address
complex problems. The real-time needs of the field are addressed by high
performance implementation of data transfer and simulation services; further
the environment is linked to real-time collaboration to support interactions
between scientists in geographically distant locations.
- Geoffrey Fox,Sung-Hoon Ko, Marlon Pierce, Ozgur Balsoy, Jake Kim,
Sangmi Lee, Kangseok Kim, Sangyoon Oh, Xi Rao, Mustafa Varank, Hasan Bulut,
Gurhan Gunduz, Xiaohong Qiu, Shrideep Pallickara, Ahmet Uyar, Choonhan Youn
- Florida State University, Indiana University, Syracuse
University
- Marlon.Pierce@wpafb.af.mil
- Received October 3 2001, Comments to Authors October 20 2001,
Accepted October 27 2001
- C530: The Virtual Service Grid: An Architecture for
Delivering High-End Network Services
- Abstract:This paper presents the design of a novel system
architecture, Virtual Service Grid (VSG), for delivering high performance
network services. The VSG is based on the concept of the Virtual Service which
provides location, replication, and fault transparency to clients accessing
remotely deployed high-end services. One of the novel features of the Virtual
Service is the ability to self-scale in response to client demand. The VSG
exploits network- and service-information to make adaptive dynamic replica
selection, creation, and deletion decisions. We describe the VSG architecture,
middleware, and replica management policies. We have deployed the VSG on a
wide-area Internet testbed to evaluate its performance. The results indicate
that the VSG can deliver efficient performance for a wide range of client
workloads, both in terms of reduced response time, and in utilization of system
resources.
- Jon B. Weissman and Byoung-Dai Lee
- Department of Computer Science and Engineering, University of
Minnesota,
- jon@cs.umn.edu
- Received October 16 2001 (electronic version),Comments to Authors
Dec 31 2001, Accepted January 20 2002
- C575: Advanced Concurrency Control in Java
- Abstract:Developing concurrent applications is not a
trivial task. As programs grow larger and become more complex, advanced
concurrency control mechanisms are needed to ensure that application
consistency is not compromised. Managing mutual exclusion on a per-object basis
is not sufficient to guarantee isolation of sets of semantically-related
actions. In this paper, we consider "atomic blocks", a simple and lightweight
concurrency control paradigm that enables arbitrary blocks of code to access
multiple shared objects in isolation. We evaluate various strategies for
implementing atomic blocks in Java, in such a way that concurrency control is
transparent to the programmer, isolation is preserved, and concurrency is
maximized. We discuss these concurrency control strategies and evaluate them in
terms of complexity and performance.
- Pascal A. Felber, Michael K. Reiter
- Bell Laboratories 600 Mountain Ave, Murray Hill, NJ 07974
- pascal@research.bell-labs.com
- Received October 1 2001, Comments to authors Dec 31 2001,
accepted January 20 2002
- C576: A Quality of Service-based Framework for Creating
Distributed Heterogeneous Software Components
- Abstract:Component-based software development offers a
promising solution for taming the complexity found in today's distributed
applications. Today's and future software systems will certainly require
combining heterogeneous software components that are geographically dispersed.
For the successful deployment of such a software system, it is necessary that
its realization, based on assembling heterogeneous components, not only meets
the functional requirements, but also satisfies the non-functional criteria
such as the desired QoS (quality of service). In this paper, a framework based
on the notions of a meta-component model, a generative domain model and QoS
parameters is described. A formal specification based on Two-level Grammar is
used to represent these notions in a tightly integrated way so that QoS becomes
a part of the generative domain model. A simple case study is described in the
context of this framework.
- Rajeev R. Raje, Mikhail Augustom, Barrett R. Bryant, Andrew M.
Olson, Carol Burt
- IUPUI, Naval Postgraduate School Monterey, University of Alabama
at Birmingham, 2AB Inc. Alabama
- rraje@cs.iupui.edu
- Received October 8 2001, Comments to Authors March 18 2002,
Accepted June 10 2002
- C577: A comparison of concurrent programming and
cooperative multithreading
- Abstract:This paper presents a comparison of the
cooperative multithreading model with the general concurrent programming model.
It focuses on the execution time performance of a range of standard concurrent
programming applications. The overall results are mixed. In some cases,
programs written in the cooperative multithreading model outperform those
written in the general concurrent programming model. The contributions of this
paper are twofold. First, it presents a thorough analysis of the performances
of applications in the different models, i.e., to explain the criteria that
determine when a program in one model will outperform an equivalent program in
the other. Second, it examines the tradeoffs in writing programs in the
different programming styles. In some cases, better performance comes at the
cost of more complicated code.
- Aaron W. Keen, Takashi Ishihara, Justin T. Maris, Tiejun Li,
Eugene F. Fodor, and Ronald A. Olsson
- Department of Computer Science, University of California, Davis,
CA 95616 USA
- olsson@cs.ucdavis.edu
- Received October 10 2001, Comments to authors Dec 31 2001,
accepted February 9 2002
- C578: GridSim: A Toolkit for the Modeling and Simulation of
Global Grids
- Abstract:Clusters, Grids, and Peer-to-Peer (P2P) networks
have emerged as popular paradigms for next generation parallel and distributed
computing. They enable aggregation of distributed resources for solving
large-scale problems in science, engineering, and commerce. In Grid and P2P
computing environments, the resources are usually geographically distributed in
multiple administrative domains, managed and owned by different organizations
with different policies, and interconnected by wide-area networks or the
Internet. This introduces a number of resource management and application
scheduling challenges in the domain of security, resource and policy
heterogeneity, fault tolerance, continuously changing resource conditions, and
politics. The resource management and scheduling systems for Grid computing
need to manage resources and application execution depending on either resource
consumers or owners requirements, and continuously adapting to
changes in resource availability. The management resources and scheduling of
applications in such a large-scale distributed systems is complex undertaking.
In order to prove the effectiveness of resource brokers and associated
scheduling algorithms, their performance need to evaluated under different
scenarios such as varying number of resources and users with different
requirements. In Grid environment, it is hard and even impossible to perform
scheduler performance evaluation in a repeatable and controllable manner as
resources and users are distributed across multiple organizations with their
own policies. To overcome this limitation, we have proposed and developed a
Java-based discrete-event Grid simulation toolkit called GridSim. The toolkit
supports modeling and simulation of heterogeneous Grid resources (both time and
space-shared), users and application models. It provides primitives for
creation of application tasks, mapping of tasks to resources and their
management. To demonstrate suitability of the GridSim toolkit, we have
simulated a Nimrod-G like grid resource broker and evaluated the performance of
deadline and budget constrained cost- and time minimization scheduling
algorithms.
- Manzur Murshed, Rajkumar Buyya, and David Abramson
- Gippsland School of Computing and IT , Monash University,
Gippsland Campus Churchill, Vic 3842, Australia; School of Computer Science and
Software Engineering Monash University, C5.41, Caulfield Campus Melbourne, VIC
3145, Australia
- rajkumar@csse.monash.edu.au
- Received November 6 2001, Comments to authors Dec 31 2001,
Accepted Feb 9 2002
- C579: Experience with Memory Management in Open Linda
Systems
- Abstract:Coordination systems, in particular Linda, have
established themselves as important tools for the development of applications
for open systems such as the Internet. This paper shows how to tackle a
forgotten, but crucial problem in open coordination systems: memory management.
As with any system which intends to be of wide use, coordination systems have
to address the problems of memory exhaustion as memory is a finite resource.
This paper first explores the separation between coordination and computation
in order to make it clear that the problem of memory exhaustion in coordination
systems cannot be solved using garbage collection schemes implemented at the
computation language | a garbage collection scheme must exist in the
coordination environment as well. As Linda is arguably the most successful
coordination system, this paper will focus on the Linda family of systems. It
is expected that the solution in Linda can be adapted to other coordination
systems.
- Ronaldo Menezes
- Florida Institute of Technology Department of Computer Sciences
150 West University Blvd Melbourne, FL 32901
- rmenezes@cs.fit.edu
- Received November 21 2001, Comments to Authors March 26 2002,
Accepted August 6 2002
- C580: MPI-CHECK: a Tool for Checking Fortran 90 MPI
Programs
- Abstract: MPI is commonly used to write parallel programs
for distributed memory parallel computers. MPI-CHECK is a tool developed to aid
in the debugging of MPI programs that are written in free or fixed format
Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time and
run-time checking of MPI programs. MPI-CHECK automatically detects the
following problems in the use of MPI routines: (i) mismatch in argument type,
kind, rank or number; (ii) messages which exceed the bounds of the
source/destination array; (iii) negative message lengths; (iv) illegal MPI
calls before MPI_INITIALIZE or after MPI_FINALIZE; (v) inconsistencies between
the declared type of a message and its associated DATATYPE argument; and (vi)
actual arguments which violate the INTENT attribute.
- Glenn Luecke, Hua Chen, James Coyle, Jim Hoekstra, Marina Kraeva,
Yan Zou
- Iowa State University
- grl@iastate.edu
- Received December 4 2001, Comments to Authors March 26 2002,
Accepted April 18 2002
- C581: The Design and Implementation of a Chinese Financial
Invoice Recognition System
- Abstract:This paper designs and implements a financial
invoice recognition system based on the features of the Chinese financial
invoice. By using the linear whole block moving method in each vertical
segment, a new fast algorithm is put forth to detect and rectify the slant
image. To distinguish the different form types (the foundation necessary for
locating the form fields, filtering the form lines, etc.), several
representative form features are discussed and an invoice-type features library
is built by using a semi-automatic machine study method. On the basis of the
recognized invoice type, real invoice form is re-oriented against the
corresponding blank form according to the invoice type feature, solving the
problem of adhesion of characters and form lines, as well as the problem of
characters segmentation and recognition. Based on the financial Chinese invoice
image feature, a mutual rectification mechanism founded on the recognition
results of financial Chinese characters and Arabic numerals is put forward to
raise the recognition rate. Finally, the experimental results and conclusions
are presented.
- MING Delie , LIU Jian , TIAN Jinwen
- State Key Laboratory for Image Processing and Intelligent Control
Institute for Pattern Recognition and Artificial Intelligence, Huazhong
University of Science and Technology, Wuhan, Hubei province, 430074, P.R.China
Tel: 86-27-87556301
- mingdelie@263.net
- Received 17 December 2001, Comments to Authors March 26 2002
- C582: The Virtual Laboratory: Enabling Molecular Modeling
for Drug Design on the World Wide Grid
- Abstract:Computational Grids are emerging as a new
paradigm for sharing and aggregation of geographically distributed resources
for solving large-scale computational and data intensive problems in science,
engineering, and commerce. However, application development, resource
management and scheduling in these environments is a complex undertaking. In
this paper, we illustrate the development of a virtual laboratory environment
by leveraging existing Grid technologies to enable molecular modeling for drug
design on geographically distributed resources. It involves screening millions
of compounds in the chemical database (CDB) against a protein target to
identify those with potential use for drug design. We have used the Nimrod-G
parameter specification language for transforming a molecular docking
application as a parameter sweep application for executing on distributed
systems. We have developed new tools for data management to organize and
provide access to chemical databases as a network service in a scalable and
distributed manner. They enable transparent access to molecule records in the
CDB of interest from remote clients. The Nimrod-G resource broker along with
molecule CDB service and access management infrastructure is used for
scheduling and on-demand processing of docking jobs on the World- Wide Grid
(WWG) resources. The results demonstrate the ease of use and power of the
Nimrod-G and virtual laboratory tools for grid computing.
- Rajkumar Buyya, Kim Branson, Jon Giddy, and David Abramson
- School of Computer Science and Software Engg. Monash University,
Caulfield Campus Melbourne, Australia; Structural Biology, Walter and Eliza
Hall Institute, Royal Parade, Parkville, Melbourne
- rajkumar@csse.monash.edu.au
- Received 19 December 2001, Comments to Authors March 5 2002,
Accepted March 18 2002
- C583: Automatic Refactoring by Simulation of Multiple
Inheritance in Java
- Abstract:This paper shows a technique that enables the
generation of proxy classes in an automatic manner. The model and
implementation extends production programming to support fast and automatic
prototyping of proxies and interfaces able to simulate multiple inheritance and
refactor legacy systems, even when faced with missing source code. Advantages
of the automatic synthesis of a proxy class include: compile-time type
checking, speed of execution, automatic disambiguation (name space collision
resolution) and ease of maintenance. The approach generates Java source that
does method forwarding and creates interfaces as a means to achieve
polymorphism. Disambiguation can be automatic, semi-automatic or manual. The
forwarding code (i.e., proxy) evolves into an adapter as the delegates change
their specification. This protects client classes from change. The interface
and proxy code are generated automatically via reflection. This type of
simulation of multiple inheritance was previously available only to Java
programmers who performed manual delegation or who made use of dynamic proxies.
The technique has been applied at a major aerospace corporation.
- Douglas Lyon
- Computer Engineering Dept. Fairfield University, Fairfield CT
06430
- lyon@docjava.com
- Received 24 December 2001, Comments to Authors March 26 2002,
Accepted April 5 2002
- C584: Workload Decomposition Strategies for Hierarchical
Distributed-Shared Memory Parallel Systems and their Implementation with
Integration of High Level Parallel Languages
- Abstract:In this paper we address the issue of workload
decomposition in programming hierarchical distributed-shared memory parallel
systems. The workload decomposition we have devised consists in a two-stage
procedure: a higher-level decomposition among the computational nodes, and a
lower-level one among the processors of each computational node. By focussing
on porting of a case study PIC application, we have implemented the described
work decomposition without large programming effort by using and integrating
the high-levellanguages High Performance Fortran and OpenMP.
- Sergio Briguglio, Beniamino Di Martino, and Gregorio Vlad
- Associazione Euratom-ENEA sulla Fusione, C.R. Frascati, C.P. 65
-1-00044 Frascati, Rome, Italy; Dip. Ingegneria dell'Informazione, Second
University of Naples, Italy
- briguglio@frascati.enea.it
- Received 18 November 2001, Comments to Authors March 26 2002,
Accepted May 29 2002
- C585: Optimizing Operating System Performance for CC-NUMA
Architecture
- Abstract:CC-NUMA (Cache-Coherent Non-Uniform Memory
Access) architecture is an attractive solution to scalable servers. The
performance of a CC-NUMA system heavily depends on the number of accesses to
remote memory through an interconnection network. To reduce the number of
remote accesses, an operating system needs to exploit the potential locality of
the architecture. This paper describes design and implementation of a
UNIX-based operating system supporting CC-NUMA architecture. The operating
system implements various enhancements by revising kernel algorithms and data
structure. This paper also analyzes the performance of the enhanced operating
system by running commercial benchmarks on a real CC-NUMA system. The
performance analysis shows that the operating system can achieve better
performance and scalability for CC-NUMA by kernel data striping, localization,
and load balancing.
- Moon-Seok Chang
- lBM, MS 905-7C020, 11400 Burnet Rd., Austin, TX 78758, USA
- cmoon@us.ibm.com
- Received December 15 2001, Comments to Authors March 26 2002,
Revised August 30 2002, Accepted July 21 2003
- C586: Implementation of the EARTH programming model on SMP
clusters: a multi-threaded language and runtime system
- Abstract:We designed and implemented an EARTH (Efficient
Architecture for Running THreads) runtime system for a
multi-processor/multi-node cluster. For portability, we built this runtime
system on top of Pthreads under Linux. This implementation enables the
overlapping of communication and computation on a cluster of Symmetric
Multi-Processors (SMP), and lets the interruptions generated by the arrival of
new data drive the system, rather than relying on network polling. We describe
how our implementation of a multi-threading model on a
multi-processor/multi-node system arranges the execution and the
synchronization activities to make the best use of the resources available, and
how the interaction between the local processing and the network activities are
organized. We also describe the EARTH programming model and its associated
programming language, Threaded-C (release 2.0), used to write programs for this
machine model. This programming model supports irregular fine-grain
parallelism through a two-level hierarchy of threads and fibers.We introduce
mutually exclusive fibers that enable the execution of explicitly threaded
Threaded-C code on a multi-processor shared-memory system. One of our new
synchronization mechanisms, atomic mailboxes, implements a non-deterministic
merge operator.
- G. Tremblay, C.J. Morrone, J.N. Amaral, G.R. Gao
- Dept. d'informatique, Univ. du Quebec a Montreal, Montreal, QC,
Canada; Computer Architecture and Parallel Systems Laboratory (CAPSL), Dept. of
Electrical and Computer Engineering, Univ. of Delaware, Newark, DE, USA; Dept.
of Computing Science, Univ. of Alberta, Edmonton, AB, Canada
- amaral@cs.ualberta.ca
- Received January 5 2002, Comments to Authors March 26 2002;
Accepted July 11 2002
- C587: The LINPACK Benchmark: Past, Present, and Future
- Abstract:In this paper we will clear up some of the
confusion and mystery surrounding the LINPACK Benchmark and some of its
variations. We will cover the original LINPACK 100 benchmark, the TOP500, the
HPL program that can be used in the TOP500 measurement. We will examine what is
measured and describe how to interpret the results of the programs'
execution.
- Jack J. Dongarra, Piotr Luszczek and Antoine Petitet
- Innovative Computing Lab, Computer Science Dept, University of
Tennessee, 1122 Volunteer Blvd, Knoxville TN, 37996-3450; Sun Microsystems,
Inc., Paris, France
- dongarra@cs.utk.edu
- Received December 24 2001, Comments to Authors March 26 2002,
Accepted July 2 2002
Submitted in 2000 or earlier(top)
- Home Page for Java Grande 99
jg99papers/papersforjgsicpande.html
- Home Page for Java Grande 2000 jg2000/jg2000index.html
- C467: VGDS: A Distributed Data Structure Framework for
Scientific Computation
- Abstract: This paper gives an overview of the VGDS
(Virtual Global Data Structure) project. The VGDS effort focuses on developing
an integrated, distributed environment that allows fast prototyping of a
diverse set of simulation problems in irregular scientific and engineering
domains, focusing on computations with irregular and adaptive structures. The
framework defines two base libraries: unstructured mesh and adaptive tree, that
capture major data structures involved inirregular scientic computation. The
framework defines multiple layers of class libraries which work together to
provide data-parallel representations to application developers while
encapsulate parallel implementation details into lower layers of the framework.
The layered approach enables easy extension of the base libraries to a variety
of application-specific data structures. Experimental results on a network of
workstations is reported.
- Pangfeng Liu, Jan-Jan Wu
- mailto:wuj@iis.sinica.edu.tw or
mailto:pangfeng@cs.ccu.edu.tw
- Submitted April 21, 2000
- Comments To Authors October 1 2000
- C468: cJVM: A Cluster JVM Architecture for Single System
Image
- Abstract:cJVM is a Java Virtual Machine (JVM) which
provides a single system image of a traditional JVM while executing in a
distributed fashion on the nodes of a cluster. cJVM virtualizes the cluster,
supporting any pure Java application without requiring that applications be
tailored specifically for it. The aim of cJVM is to obtain improved scalability
for a class of Java Server Applications by distributing the application's work
among the cluster's computing resources. cJVM is based on a novel object model
which distinguishes between an application's view of an object (e.g., every
object is a unique data structure) and its implementation (e.g., objects may
have consistent replications on different nodes). This enables us to exploit
knowledge on the usage of individual objects to improve performance (e.g.,
using object replications to increase locality of access to objects).
Currently, we have already completed a prototype which runs pure Java
applications on a cluster of NT workstations connected via a Myrinet fast
switch. The prototype provides a single system image to applications,
distributing the application's threads and objects over the cluster. We have
used cJVM to run without change a real Java Server Application containing over
10K loc and achieve high scalability for it on a cluster. We also achieved
linear speedup for another application with a large number of independent
threads. This paper discusses cJVM's architecture and implementation. It
focuses on achieving a single system image for a traditional JVM on a cluster
while describing in short how we aim at obtaining scalability.
- Yariv Aridor, Michael Factor and Avi Teperman
- mailto:teperman@il.ibm.com
- Submitted May 1, 2000
- Comments To Authors October 1 2000; Accepted November 23
2000
- C474: Parallel solution of rotating flows in cavities
- Abstract:In this paper, we investigate the parallel
solution to the rotating internal flow problems, using the Navier-Stokes
equations as proposed in [16] and [15]. A Runge-Kutta time-stepping scheme was
applied to the equations and both sequential and message- passing
implementations were developed, the latter using MPI , and were tested on an
SGI Origin200 distributed, global shared memory parallel computer. The results
show that our approach to parallelize the sequential implementation requires
little effort whilst providing good results even for medium-sized problems, on
this particular computer.
- Rudnei Dias da Cunha and Alvaro Luiz de Bortoli
- mailto:rudnei@mat.ufrgs.br
- Submitted May 14, 2000
- Comments To Authors October 1 2000 and Accepted October 9
2000
- C475: Analysis and Measurement of the Effect of Kernel
Locks in SMP Systems
- Abstract:This paper reports the use of case studies to
evaluate the performance degradation caused by the kernel-level lock. We define
the lock ratio as a ratio of the execution time for critical sections to the
total execution time of a parallel program. The kernel-level lock ratio
determines how effective programs work on Symmetric MultiProcessor systems. We
have measured the lock ratios and the performance of three types of parallel
programs on SMP systems with Linux 2.0: matrix multiplication, parallel make,
and WWW server programs. Experimental results show that the higher the lock
ratio of parallel programs, the worse their performance becomes. keywords: SMP
Systems, Operating Systems, Parallel Programs, Per- formance Evaluation, Kernel
Lock
- Akihiro Kaieda and Yasuichi Nakayama; Atsuhiro Tanaka, Takashi
Horikawa, Toshiyasu Kurasugi and Issei Kino
- mailto:yasu@cs.uec.ac.jp
- Submitted May 24, 2000
- Accepted October 1 2000
- C476: Effective Multicast Programming in Large Scale
Distributed Systems:The DACE Approach
- Abstract:Many distributed applications have a strong
requirement for efficient dissemination of large amounts of information to
widely spread consumers in large networks.These include applications in
e-commerce and telecommunication.Publish/subscribe is considered one of the
most important interaction styles to model communication at large
scale.Producers publish information for a topic and consumers subscribe to the
topics they wish to be informed of.The decoupling of producers and consumers in
time and in space makes the publish/subscribe paradigm very attractive for
large scale distribution,especially in environments like the Internet. This
paper describes the architecture and implementation of DACE (Distributed
Asynchronous Computing Environment),a framework for publish/subscribe
communication based on an object- oriented programming abstraction in the form
of Distributed Asynchronous Collection (DAC).DACs capture the different
variations of publish/subscribe,without blurring their respective advantages.
The architecture we present is tolerant to network partitions and crash
failures.The underlying model is based on the notion of Topic Membership:a weak
membership for the parties involved in a topic.We present how Topic Membership
enables the realization of a robust and efficient reliable multicast for large
scale.The protocol ensures that,inside a topic,even a subscriber that is
temporarily partitioned away eventually receives a published message.
- Romain Boichat, Patrick Th. Eugster, Rachid Guerraoui, Joe
Sventek
- Swiss Federal Institute of Technology, Lausanne and Agilent
Laboratories Scotland, Edinburgh
- Email:Patrick.Eugster@lsesun6.epfl.ch
- Submitted July17, 2000
- Comments to Author November 23 2000
- Accepted February 3 2001
- C486: Parallel Versions of Stones Strongly Implicit
(SIP) Algorithm
- Abstract:In this paper, we describe various methods of
deriving a parallel version of Stones Strongly Implicit Procedure (SIP)
for solving sparse linear equations arising from finite difference
approximation to partial differential equations (PDEs). Sequential
versions of this algorithm have been very successful in solving semi-conductor,
heat conduction and flow simulation problems and an efficient parallel version
would enable much larger simulations to be run. An initial investigation of
various parallelising strategies was undertaken using a version of High
Performance Fortran (HPF) and the best methods were reprogrammed using the MPI
message passing libraries for increased efficiency. Early attempts concentrated
on developing a parallel version of the characteristic wavefront computation
pattern of the existing sequential SIP code. However, a red-black ordering of
grid points, similar to that used in parallel versions of the Gauss-Seidel
algorithm, is shown to be far more efficient. The results of both the wavefront
and red-black MPI based algorithms are reported for various size problems and
number of processors on a sixteen node IBM SP2.
- J.S. Reeve, A.D. Scurr and J.H. Merlin
- Department of Electronics and Computer Science, University of
Southampton
- Email:jsr@ecs.soton.ac.uk
- Submitted August 24, 2000
- Comments to Authors February 4 2001, Accepted March 12 2001
- C487: Real-time Multi-spectral Image Fusion
- Abstract:This paper describes a novel real-time
multi-spectral imaging capability for surveillance applications. The capability
combines a new high-performance multi-spectral camera system with a distributed
algorithm that computes a spectral-screening Principal Component Transform
(PCT). The camera system uses a novel filter wheel design together with a
high-bandwidth CCD camera to allow image cubes to be delivered at 110 frames
per second with spectral resolution between 400 and 1000 nm. The filters used
in a particular application are selected to highlight a particular object based
on its spectral signature. The distributed algorithm allows image streams from
a dispersed collection of cameras to be disseminated, viewed, and interpreted
by a distributed group of analysts in real-time. It operates on networks of
commercial-off-the-shelf multiprocessors connected with high-performance (e.g.
gigabit) networking, taking advantage of multi-threading where appropriate. The
algorithm uses a concurrent formulation of the PCT to de-correlate and compress
a multi-spectral image cube. Spectral screening is used to give features that
occur infrequently (e.g. mechanized vehicles in a forest) equal importance to
those that occur frequently (e.g. trees in the forest). A human-centered
color-mapping scheme is used to maximize the impact of spectral contrast on the
human visual system. To demonstrate the efficacy of the multi-spectral system,
plant-life scenes with both real and artificial foliage are used. These scenes
demonstrate the systems ability to distinguish elements of a scene, based on
spectral contrast, that cannot be distinguished with the naked eye. The
capability is evaluated in terms of visual performance, scalability, and
real-time throughput. Our previous work on predictive analytical modeling is
extended to answer practical design questions such as For a specified
cost, what system can should be constructed and what performance will it
attain?
- Tiranee Achalakul, Stephen Taylor
- Department Computer Science, Syracuse University
- Email:tachalak@syr.edu
- Submitted September 14, 2000
- Comments to Authors March 13 2001
- C488: Efficient Communication Using Message Prediction for
Cluster of Multiprocessors
- Abstract:With the increasing uniprocessor and SMP
computation power available today,interprocessor communication has become an
important factor that limits the performance of cluster of workstations.Many
factors including communication hardware overhead,communication software
overhead,and the user environment overhead (multithreading,multiuser) affect
the performance of the communication subsystems in such systems. A significant
portion of the software communication overhead belongs to a number of message
copying.Ideally,it is desirable to have a true zero-copy protocol where the
message is moved directly from the send buffer in its user space to the receive
buffer in the destination without any intermediate buffering.However due to the
fact that message-passing applications at the send side do not know the final
receive buffer addresses,early arrival messages have to be buffered at a
temporary area.In this paper,we show that there is a message reception
communication locality in message-passing applications.We have utilized this
communication locality and devised different message predictors at the receiver
sides of communications.In essence,these message predictors can be efficiently
used to drain the network and cache the incoming messages even if the
corresponding receive calls have not been posted yet.The performance of these
predictors,in terms of hit ratio,on some parallel applications are quite
promising and suggest that prediction has the potential to eliminate most of
the remaining message copies .We also show that the proposed predictors do not
have sensitivity to the starting message reception call,and that they perform
better than (or at least equal to) our previously proposed predictors in [3
].
- Ahmad Afsahi, Nikitas J.Dimopoulos
- Queens University and University of Victoria, Canada
- Email:ahmad@ee.queensu.ca
- Submitted September 14, 2000
- Comments to Authors March 13 2001, Accepted November 9 2001
- C489: Implementing a Dynamic Processor Allocation Policy
for Multiprogrammed Parallel Applications in the Solaris TM Operating
System
- Abstract:Parallel applications typically do not perform
well in a multiprogrammed envi- ronment that uses time-sharing to allocate
processor resources to the applications' parallel threads. Coscheduling related
parallel threads, or statically partitioning the system, often can reduce the
applications' execution times, but at the expense of reducing the overall
system utilization. To address this problem, there has been increasing interest
in dynamically allocating processors to applications based on their resource
demands and the dynamically varying system load. The Loop-Level Process Control
(LLPC) policy [16] dynamically adjusts the number of threads an application is
allowed to execute based on the application's available parallelism and the
overall system load. This study demonstrates the feasibility of incorporating
the LLPC strategy into an existing commercial operating system and
parallelizing compiler and provides further evidence of the performance
improvement that is possible using this dynamic allocation strategy. In this
implementation, applications are automatically parallelized and enhanced with
the appropriate LLPC hooks so that each application interacts with the modied
version of the Solaris operating system. The parallelism of the applications
are then dynamically adjusted automatically when they are executed in a
multiprogrammed environment so that all applications obtain a fair share of the
total processing resources.
- Kelvin K. Yue and David J. Lilja
- Sun Microsystems and University of Minnesota
- Email:lilja@ece.umn.edu
- Submitted October 10, 2000, Accepted February 10 2001
- C490: Lesser Bear a Light-weight Process Library for
SMP computers
- Abstract:We have designed and implemented a light-weight
process (thread)li- brary called Lesser Bear for SMP
computers.Lesser Bear has high portability and thread-level
parallelism.Creating UNIX processes as virtual processors and a memory-mapped
.le as a huge shared-memory space enables Lesser Bear to execute threads in
parallel. Lesser Bear requires exclusive operation between peer virtual
processors,and treats a shared-memory space as a critical section for
synchronization of threads. Therefore,thread functions of the previous Lesser
Bear are serialized. In this paper,we present a scheduling mechanism to execute
thread functions in parallel.In the design of the proposed mechanism,we divide
the entire shared- memory space into partial spaces for virtual processors,and
prepare two queues (Protect Queue and Waiver Queue)for each partial space.We
adopt an algorithm in which ock operations are not necessary for
enqueueing.This algorithm allows us to propose a scheduling mechanism that can
reduce the scheduling overhead.The mechanism is applied to Lesser Bear and
evaluated by experimental results.
- Hisashi Oguma and Yasuichi Nakayama
- The University of Electro-Communications, Tokyo
- Email:oguma-h@igo.cs.uec.ac.jp
- Submitted October 24, 2000
- Comments to Authors March 13 2001, Accepted April 23 2001
- C492: PoLAPACK: Parallel Factorization Routines with
Algorithmic Blocking
- Abstract:LU, QR, and Cholesky factorizations are the most
widely used methods for solving dense linear systems of equations, and have
been extensively studied and implemented onvector and parallel computers. Most
of these factorization routines are implemented with block- partitioned
algorithms in order to perform matrix-matrix operations, that is, to obtain the
highest performance by maximizing reuse of data in the upper levels of memory,
such as cache. Since parallel computers have dierent performance ratios of
computation and com- munication, the optimal computational block sizes are
dierent from one another to generate the maximumperformance of an algorithm.
Therefore, the data matrix should be distributed with the machine specific
optimal block size before the computation. Too small or large a block size
makes getting good performance on a machine nearly impossible. In such a case,
getting a better performance may require a complete redistribution of the data
matrix. In this paper, we present parallel LU, QR, and Cholesky factorization
routines with an "algorithmic blocking" on 2-dimensional block cyclic data
distribution. With the algorithmic blocking, it is possible to obtain the near
optimal performance irrespective of the physical block size. The routines are
implemented on the Intel Paragon and the SGI/Cray T3E and compared with the
corresponding ScaLAPACK factorization routines.
- Jaeyoung Choi
- School of Computing Soongsil University 1-1, Sangdo-Dong,
Dongjak-Ku Seoul 156-743, KOREA
- Email:choi@comp.soongsil.ac.kr
- Submitted November 14, 2000
- Comments to Authors February 4 2001, accepted March 12 2001
- C493: Space-Time Tradeoffs for Parallel 3D Reconstruction
Algorithms for Virus Structure Determination
- Abstract:The 3D electron density determination of viruses,
from experimental data provided by electron microscopy, is a data intensive
computation that requires the use of clusters of PCs or parallel computers. In
this paper we report on three parallel algorithms used for 3D recon- struction
of asymmetric objects from their 2d projections. We discuss their
computational, communication, I/O, and space requirements and present some
performance data. The algorithms are general and can be used for 3D
reconstruction of asymmetric objects for applications other than structural
biology.
- Dan C. Marinescu, Yongchang Ji, and Robert E. Lynch
- Department of Computer Sciences Purdue University West Lafayette,
IN 47907
- Email:[dcm, ji,
rel]@cs.purdue.edu
- Submitted November 17, 2000
- Comments to Authors March 13 2001 Accepted April 8 2001
- C494: A New Parallel Domain Decomposition Method for the
Adaptive Finite Element Solution of Elliptic Partial Differential Equations
- Abstract:We present a new domain decomposition algorithm
for the parallel finite element solution of elliptic partial differential
equations. As with most parallel domain decomposition methods each processor is
assigned one or more subdomains and an iteration is devised which allows the
processors to solve their own subproblem(s) concurrently. The novel feature of
this algorithm however is that each of these subproblems is defined over the
entire domain -- although the vast majority of the degrees of freedom for each
subproblem are associated with a single subdomain (owned by the corresponding
processor). This ensures that a global mechanism is contained within each of
the subproblems tackled and so no separate coarse grid solve is required in
order to achieve rapid convergence of the overall iteration. Furthermore, by
following the paradigm introduced in [5], it is demonstrated that this domain
decomposition solver may be coupled easily with a conventional mesh refinement
code, thus allowing the accuracy, reliability and efficiency of mesh adaptivity
to be utilized in a well load-balanced manner. Finally, numerical evidence is
presented which suggests that this technique has significant potential, both in
terms of the rapid convergence properties and the efficiency of the parallel
implementation.
- Randolph E. Bank and Peter K. Jimack
- Dept. of Mathematics, University of California at San Diego, La
Jolla, CA 92093, USA.School of Computer Studies, University of Leeds, Leeds LS2
9JT, UK.
- Email: pkj@comp.leeds.ac.uk
- Recieved 19 May 2000, Accepted December 7 2000,
- C495: Development and Performance Analysis of Real-World
Applications for Distributed and Parallel Architectures
- Abstract:Several large real-world applications have been
developed for distributed and parallel architectures. We examine two different
program development approaches: First, the usage of a high-level programming
paradigm which reduces the time to create a parallel program dramatically but
sometimes at the cost of a reduced performance. A source-to-source compiler,
has been employed to automatically compile programs written in a
high-level programming paradigm into message passing codes. Second,
manual program development by using a low-level programming paradigm
such as message passing enables the programmer to fully exploit a given
architecture at the cost of a time-consuming and error-prone effort.
Performance tools play a central role to support the performance-oriented
development of applications for distributed and parallel architectures. SCALA
a portable instrumentation, measurement, and post-execution performance
analysis system for distributed and parallel programs has been used to
analyse and to guide the application development by selectively instrumenting
and measuring the code versions, by comparing performance information of
several program executions, by computing a variety of important performance
metrics, by detecting performance bottlenecks, and by relating performance
information back to the input program. We show several experiments of SCALA
when applied to real-world applications. These experiments are conducted for a
NEC Cenju-4 distributed memory machine and a cluster of heterogeneous
workstations and networks.
- T. Fahringer, P.Blaha, A. Hossinger, J. Luitz, E. Mehofer, H.
Moritsch, B. Scholz
- Institute for Software Science, University of Vienna
Liechtensteinstrasse 22, A-1092, Vienna, Austria. Also Vienna University of
Technology and Dept. of Business administration at University of Vienna
- Email: tf@par.univie.ac.at
- Recieved August 1 1999, Revised November 20 2000, Accepted
December 18 2000,
- C498: Simulating asynchronous hardware on multiprocessor
platforms: the case of AMULET1
- Abstract:Synchronous VLSI design is approaching a critical
point, with clock distribution becoming an increasingly costly and complicated
issue and power consumption rapidly emerging as a major concern. Hence,
recently, there has been a resurgence of interest in asynchronous digital
design techniques which promise to liberate digital design from the inherent
problems of synchronous systems. This activity has revealed a need for
modelling and simulation techniques suitable for the asynchronous design style.
The concurrent process algebra Communicating Sequential Processes (CSP) and its
executable counterpart, occam, are increasingly advocated as particularly
suitable for this purpose. This paper focuses on issues related to the
execution of CSP/occam models of asynchronous hardware on multiprocessor
machines, including I/O, monitoring and debugging, partition, mapping and load
balancing. These issues are addressed in the context of occarm, an occam
simulation model of the AMULET1 asynchronous microprocessor, however the
solutions devised are more general and may be applied to other systems
too.
- Georgios K. Theodoropoulos
- School of Computer Science, The University of Birmingham,
Birmingham B15 2TT, U.K.
- Email: gkt@cs.bham.ac.uk
- Received Oct 18 2000, Accepted February 22 2001