Preface / Santosh Pande ; Dharma P. Agrawal
Introduction
Compiling for Distributed Memory Multiprocessors / 1:
Motivation / 1.1:
Complexity / 1.2:
Outline of the Monograph / 1.3:
Future Directions / 1.4:
Languages / Section I:
High Performance Fortran 2.0 / Ken Kennedy ; Charles Koelbel / Chapter 1:
History and Overview of HPF / 2:
Data Mapping / 3:
Basic Language Features / 3.1:
Advanced Topics / 3.2:
Data Parallelism / 4:
Task Parallelism / 4.1:
EXTRINSIC Procedures / 5.1:
The TASK_REGION Directive / 5.2:
Input and Output / 6:
Summary and Future Outlook / 7:
The Sisal Project: Real World Functional Programming / Jean-Luc Gaudiot ; Tom DeBoni ; John Feo ; Wim Bohm ; Walid Najjar ; Patrick Miller / Chapter 2:
The Sisal Language: A Short Tutorial |
An Early Implementation: The Optimizing Sisal Compiler |
Update in Place and Copy Elimination |
Build in Place |
Reference Counting Optimization / 3.3: |
Vectorization / 3.4: |
Loop Fusion, Double Buffering Pointer Swap, and Inversion / 3.5: |
Sisal90 |
The Foreign Language Interface |
A Prototype Distributed-Memory SISAL Compiler |
Base Compiler |
Rectangular Arrays |
Block Messages / 5.3: |
Multiple Alignment / 5.4: |
Results / 5.5: |
Further Work / 5.6: |
Architecture Support for Multithreaded Execution |
Blocking and Non-blocking Models / 6.1: |
Code Generation / 6.2: |
Summary of Performance Results / 6.3: |
Conclusions and Future Research |
HPC++ and the HPC++Lib Toolkit / Dennis Gannon ; Peter Beckman ; Elizabeth Johnson ; Todd Green ; Mike Levine / Chapter 3:
The HPC++ Programming and Execution Model |
Level 1 HPC++ / 2.1: |
The Parallel Standard Template Library / 2.2: |
Parallel Iterators / 2.3: |
Parallel Algorithms / 2.4: |
Distributed Containers / 2.5: |
A Simple Example: The Spanning Tree of a Graph |
Multi-threaded Programming
Synchronization |
Examples of Multi-threaded Computations |
Implementing the HPC++ Parallel Loop Directives |
Multi-context Programming and Global Pointers |
Remote Function and Member Calls |
Using CORBA IDL to Generate Proxies
The SPMD Execution Model |
Barrier Synchronization and Collective Operations / 7.1: |
Conclusion / 8: |
A Concurrency Abstraction Model for Avoiding Inheritance Anomaly in Object-Oriented Programs / Sandeep Kumar / Chapter 4:
Approaches to Parallelism Specification |
Issues in Designing a COOPL |
Issues in Designing Libraries |
What Is the Inheritance Anomaly? |
State Partitioning Anomaly (SPA) |
History Sensitiveness of Acceptable States Anomaly (HSASA) |
State Modification Anomaly (SMA) |
Anomaly A |
Anomaly B |
What Is the Reusability of Sequential Classes? |
A Framework for Specifying Parallelism |
Previous Approaches |
The Concurrency Abstraction Model |
The CORE Language |
Specifying a Concurrent Region / 8.1: |
Defining an AC / 8.2: |
Defining a Parallel Block / 8.3: |
Synchronization Schemes / 8.4: |
Illustrations / 9: |
Reusability of Sequential Classes / 9.1: |
Avoiding the Inheritance Anomaly / 9.2: |
The Implementation Approach / 10: |
Conclusions and Future Directions / 11: |
Analysis / Section II: |
Loop Parallelization Algorithms / Alain Darte ; Yves Robert ; Frederic Vivien / Chapter 5:
Input and Output of Parallelization Algorithms |
Input: Dependence Graph |
Output: Nested Loops
Dependence Abstractions |
Dependence Graphs and Distance Sets |
Polyhedral Reduced Dependence Graphs |
Definition and Simulation of Classical Dependence Representations |
Allen and Kennedy's Algorithm |
Algorithm |
Power and Limitations |
Wolf and Lam's Algorithm |
Purpose |
Theoretical Interpretation |
The General Algorithm |
Darte and Vivien's Algorithm |
Another Algorithm Is Needed |
Polyhedral Dependences: A Motivating Example |
Illustrating Example |
Uniformization Step / 6.4: |
Scheduling Step / 6.5: |
Schematic Explanations / 6.6: |
Feautrier's Algorithm / 6.7: |
Array Dataflow Analysis / Paul Feautrier / Chapter 6:
Exact Array Dataflow Analysis |
Notations |
The Program Model |
Data Flow Analysis |
Summary of the Algorithm |
Related Work |
Approximate Array Dataflow Analysis |
From ADA to FADA |
Introducing Parameters |
Taking Properties of Parameters into Account |
Eliminating Parameters |
Analysis of Complex Statements |
What Is a Complex Statement |
ADA in the Presence of Complex Statements |
Procedure Calls as Complex Statements / 4.3: |
Applications of ADA and FADA |
Program Comprehension and Debugging |
Parallelization |
Array Expansion and Array Privatization |
Conclusions |
Appendix : Mathematical Tools / A: |
Polyhedra and Polytopes / A.1: |
Z-modules / A.2: |
Z-polyhedra / A.3: |
Parametric Problems / A.4: |
Interprocedural Analysis Based on Guarded Array Regions / Zhiyuan Li ; Junjie Gu ; Gyungho Lee / Chapter 7:
Preliminary |
Traditional Flow-Insensitive Summaries |
Array Data Flow Summaries |
Guarded Array Regions |
Operations on GAR's |
Predicate Operations |
Constructing Summary GAR's Interprocedurally |
Hierarchical Supergraph |
Summary Algorithms |
Expansions |
Implementation Considerations |
Symbolic Analysis |
Region Numbering |
Range Operations |
Application to Array Privatization and Preliminary Experimental Results |
Array Privatization |
Preliminary Experimental Results |
Related Works |
Automatic Array Privatization / Peng Tu ; David Padua / Chapter 8:
Background |
Algorithm for Array Privatization |
Data Flow Framework |
Inner Loop Abstraction |
An Example |
Profitability of Privatization |
Last Value Assignment |
Demand-Driven Symbolic Analysis |
Gated Single Assignment |
Demand-Driven Backward Substitution |
Backward Substitution in the Presence of Gating Functions |
Examples of Backward Substitution / 4.4: |
Bounds of Symbolic Expression / 4.5: |
Comparison of Symbolic Expressions / 4.6: |
Recurrence and the μ Function / 4.7:
Bounds of Monotonic Variables / 4.8: |
Index Array / 4.9: |
Conditional Data Flow Analysis / 4.10: |
Implementation and Experiments / 4.11: |
Communication Optimizations / Section III: |
Optimal Tiling for Minimizing Communication in Distributed Shared-Memory Multiprocessors / Anant Agarwal ; David Kranz ; Rajeev Barua ; Venkat Natarajan / Chapter 9:
Contributions and Related Work |
Overview of the Paper |
Problem Domain and Assumptions |
Program Assumptions |
System Model |
Loop Partitions and Data Partitions |
A Framework for Loop and Data Partitioning |
Loop Tiles in the Iteration Space |
Footprints in the Data Space |
Size of a Footprint for a Single Reference |
Size of the Cumulative Footprint |
Minimizing the Size of the Cumulative Footprint |
General Case of G |
G Is Invertible, but Not Unimodular |
Columns of G Are Dependent and the Rows Are Independent |
The Rows of G Are Dependent |
Other System Environments |
Coherence-Related Cache Misses |
Effect of Cache Line Size |
Data Partitioning in Distributed-Memory Multicomputers |
Combined Loop and Data Partitioning in DSMs |
The Cost Model
The Multiple Loops Heuristic Method / 7.2:
Implementation and Results |
Algorithm Simulator Experiments |
Experiments on the Alewife Multiprocessor |
A Formulation of Loop Tiles Using Bounding Hyperplanes |
Synchronization References / B: |
Communication-Free Partitioning of Nested Loops / Kuei-Ping Shih ; Chua-Huang Huang ; Jang-Ping Sheu / Chapter 10:
Fundamentals of Array References |
Iteration Spaces and Data Spaces |
Reference Functions |
Properties of Reference Functions |
Loop-Level Partitioning |
Iteration and Data Spaces Partitioning - Uniformly Generated References |
Hyperplane Partitioning of Data Space |
Hyperplane Partitioning of Iteration and Data Spaces |
Statement-Level Partitioning |
Affine Processor Mapping |
Hyperplane Partitioning |
Comparisons and Discussions |
Solving Alignment Using Elementary Linear Algebra / Vladimir Kotlyar ; David Bau ; Induprakas Kodukula ; Keshav Pingali ; Paul Stodghill / Chapter 11:
Linear Alignment |
Equational Constraints |
Reduction to Null Space Computation |
Remarks |
Reducing the Solution Basis |
Affine Alignment |
Encoding Affine Constraints as Linear Constraints |
Replication |
Formulation of Replication |
Heuristics |
Lessons from Some Common Computational Kernels |
Implications for Alignment Heuristic |