Towards Peak Performance with IBM POWER3
Contents
- Towards Peak Performance with IBM POWER3
- Topics
- The POWER3 Residency at Austin, Texas
- RED BOOK: SG24-5155-00 RS/6000 Scientific and Technical Computing: POWER3 Intro
- RS/6000 Processor Roadmap
- POWER3 Architecture
- Stride and Data Cache Structure
- Stride in Fortran loops
- Data Cache structure
- Effects of stride on performance
- The 4-Way Set Associative POWER2 Data Cache
- The 128-Way Set Associative POWER3 Data Cache
- Translation Lookaside Buffer (TLB)
- Superscalar Floating Point Units
- POWER3 Floating Point Unit - Superscalar Pipeline
- IBM XL Fortran - new with V5
- XLF V5 64-bit Support
- IBM XLF V5 Compiler Optimisation Improvements
- XLF V5 Compiler - SMP support
- Compiling and linking for SMP
- XLF V5 SMP Directives
- Tuning for Maximum Megaflops
- Hand tuning techniques
- "Common sense" example
- Sequential processing
- M x N unrolling (matrix multiply)
- Performance: Recent Customer Benchmark
- Benchmark Performance POWER3 vs POWER2
- Benchmark Performance XLF V5 vs XLF V3
- SMP Performance: Throughput
- XLF V5 SMP Performance 2-way parallel
- MPI
- MPI vs XLF V5 parallelisation within an SMP
- HPF "High Performance Fortran"
- MPI within an SMP
- SMP nodes in RS/6000 SP
- SMP/SP programming paradigms
- SMP/SP paradigms