Title | Programming Massively Parallel Processors (English reprint edition) / Series of Renowned Foreign Textbooks for University Computer Education |
Category | |
Author | Kirk (USA) |
Publisher | Tsinghua University Press |
Download | |
Introduction |

Editor's Recommendation

This book introduces the ideas of parallel computing so that readers can carry this way of thinking about problems into high-performance parallel computing. It introduces the use of CUDA, a software development tool created by NVIDIA specifically for massively parallel environments, and shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.

Content Summary

This book introduces the basic concepts of parallel programming and GPU architecture, explores in detail the various techniques used to build parallel programs, and uses case studies to demonstrate the entire development process of a parallel program, from the initial idea of parallel computing through to a practical, efficient implementation (an illustrative CUDA kernel sketch in the spirit of Chapters 3 and 4 follows the contents listing below).

Contents

Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
1.1 GPUs as Parallel Computers
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Parallel Programming Languages and Models
1.5 Overarching Goals
1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
2.1 Evolution of Graphics Pipelines
2.1.1 The Era of Fixed-Function Graphics Pipelines
2.1.2 Evolution of Programmable Real-Time Graphics
2.1.3 Unified Graphics and Computing Processors
2.1.4 GPGPU: An Intermediate Step
2.2 GPU Computing
2.2.1 Scalable GPUs
2.2.2 Recent Developments
2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Matrix-Matrix Multiplication Example
3.4 Device Memories and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
3.6.1 Function Declarations
3.6.2 Kernel Launch
3.6.3 Predefined Variables
3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
4.1 CUDA Thread Organization
4.2 blockIdx and threadIdx
4.3 Synchronization and Transparent Scalability
4.4 Thread Assignment
4.5 Thread Scheduling and Latency Tolerance
4.6 Summary
4.7 Exercises
CHAPTER 5 CUDA™ MEMORIES
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 Memory as a Limiting Factor to Parallelism
5.5 Summary
5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
6.1 More on Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of SM Resources
6.4 Data Prefetching
6.5 Instruction Mix
6.6 Thread Granularity
6.7 Measured Performance and Summary
6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
7.1 Floating-Point Format
7.1.1 Normalized Representation of M
7.1.2 Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Summary
7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
8.1 Application Background
8.2 Iterative Reconstruction
8.3 Computing FHd
Step 1. Determine the Kernel Parallelism Structure
Step 2. Getting Around the Memory Bandwidth Limitation
Step 3. Using Hardware Trigonometry Functions
Step 4. Experimental Performance Tuning
8.4 Final Evaluation
8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
9.1 Application Background
9.2 A Simple Kernel Implementation
9.3 Instruction Execution Efficiency
9.4 Memory Coalescing
9.5 Additional Performance Comparisons
9.6 Using Multiple GPUs
9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
10.1 Goals of Parallel Programming
10.2 Problem Decomposition
10.3 Algorithm Selection
10.4 Computational Thinking
10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™
11.1 Background
11.2 Data Parallelism Model
11.3 Device Architecture
11.4 Kernel Functions
11.5 Device Management and Kernel Launch
11.6 Electrostatic Potential Map in OpenCL
11.7 Summary
11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
12.1 Goals Revisited
12.2 Memory Architecture Evolution
12.2.1 Large Virtual and Physical Address Spaces
12.2.2 Unified Device Memory Space
12.2.3 Configurable Caching and Scratch Pad
12.2.4 Enhanced Atomic Operations
12.2.5 Enhanced Global Memory Access
12.3 Kernel Execution Control Evolution
12.3.1 Function Calls within Kernel Functions
12.3.2 Exception Handling in Kernel Functions
12.3.3 Simultaneous Execution of Multiple Kernels
12.3.4 Interruptible Kernels
12.4 Core Performance
12.4.1 Double-Precision Speed
12.4.2 Better Control Flow Efficiency
12.5 Programming Environment
12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
A.1 matrixmul.cu
A.2 matrixmul_gold.cpp
A.3 matrixmul.h
A.4 assist.h
A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
B.1 GPU Compute Capability Tables
B.2 Memory Coalescing Variations
Index |
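For readers skimming the contents above: Chapters 3 and 4 cover CUDA program structure, device memory transfers, kernel launches, and thread indexing with blockIdx/threadIdx. The sketch below is a minimal, illustrative example of that style of kernel; it is not taken from the book, and the vector-addition task, names, and problem size are assumed purely for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Illustrative kernel in the style of Chapters 3-4: each thread computes one
// element of C = A + B, locating its element via blockIdx/blockDim/threadIdx.
__global__ void vecAdd(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) C[i] = A[i] + B[i];                  // guard against surplus threads
}

int main() {
    const int n = 1 << 20;                          // assumed problem size
    const size_t bytes = n * sizeof(float);

    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);  // host-to-device transfer
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // cover all n elements
    vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);        // kernel launch

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // copy result back to host
    printf("C[0] = %f\n", hC[0]);                       // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

The block/thread decomposition and the bounds check mirror the general pattern the contents listing refers to (thread organization, transparent scalability); the specific kernel shown here is only a stand-in for the matrix-multiplication example the book itself develops.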