GPU
Books
David B. Kirk and Wen-mei W. Hwu. 2010. Programming Massively Parallel Processors: A Hands-On Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Jason Sanders and Edward Kandrot. 2010. CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley Professional.
Wen-mei W. Hwu. 2011. GPU Computing Gems Emerald Edition. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Web resources
Rob Farber. CUDA, Supercomputing for the Masses. Dr. Dobb's Journal. 21 parts.
1: Introduction
2: Execution of kernels
3: Error handling and global memory performance limitations
4: Understanding and using shared memory (1)
5: Understanding and using shared memory (2)
6: Global memory and the CUDA profiler
7: Comparison G80 vs. GT200 architecture
8: Using libraries with CUDA
9: Extending High-level Languages with CUDA
10: CUDPP, a powerful data-parallel CUDA library
11: Revisiting CUDA memory spaces
12: CUDA 2.2 Changes the Data Movement Paradigm
13: Using texture memory in CUDA
14: Debugging CUDA and using CUDA-GDB
15: Using Pixel Buffer Objects with CUDA and OpenGL
16: CUDA 3.0 provides expanded capabilities
17: CUDA 3.0 provides expanded capabilities and makes development easier
18: Using Vertex Buffer Objects with CUDA and OpenGL
19: Parallel Nsight Part 1: Configuring and Debugging Applications
20: Parallel Nsight Part 2: Using the Parallel Nsight Analysis capabilities
21: The Fermi architecture and CUDA
Victor W. Lee, Changkyu Kim, Jatin Chhugani et al. 2010. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Comput. Archit. News 38, 3 (June 2010), 451-460. DOI=10.1145/1816038.1816021
John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. 2008. Scalable Parallel Programming with CUDA. Queue 6, 2 (March 2008), 40-53. DOI=10.1145/1365490.1365500. PDF available from ACM Queue.
NVIDIA - CUDA Resources
Documentation: getting started guide, programming guides, reference documents
Downloads: driver, toolkit, SDK
Whitepaper: NVIDIA’s Next Generation CUDA Compute Architecture: Fermi
HPC - High Performance Computing
Aad van der Steen. 2010. Overview of recent supercomputers. NCF. Online version: http://www.euroben.nl/reports/web10/overview.php
Program optimization
David Goldberg. 1991. What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. 23, 1 (March 1991), 5-48. DOI=10.1145/103162.103163
Ulrich Drepper. 2007. What Every Programmer Should Know About Memory. Reference at Citeseer
Last change: 2014-09-27