[DSP Development][Parallel Computing - CUDA Development] TI OpenCL v01.01.xx
2019-07-13 13:04
TI OpenCL v01.01.xx
TI OpenCL™ Runtime Documentation Contents:
Introduction
OpenCL 1.1 Reference Material
Compilation
Compile Host OpenCL Applications
Compiling OpenCL C Programs
Create an OpenCL program from source, with embedded source
Create an OpenCL program from source, with source in a file
Create an OpenCL program from binary, with binary in a file
Create an OpenCL program from binary, with embedded binary
Caching on-line compilation results
The TI off-line OpenCL C compiler: clocl
Memory Usage
Device Memory
Caching
How DDR3 is Partitioned for Linux System and OpenCL
66AK2x
AM57
Changing DDR3 Partition for OpenCL
Alternate Host malloc/free Extension for Zero Copy OpenCL Kernels
The OpenCL Memory Model
OpenCL Buffers
Global Buffers
Local Buffers
Sub-Buffers
Buffer Read/Write vs. Map/Unmap
Discovering OpenCL Memory Sizes and Limits
Cache Operations
Large OpenCL buffers and Memory Beyond the 32-bit DSP Address Space
Large Buffer Use Cases
User Defined DSP Heap Extension
User Defined DSP Heap Built-in Functions
Allocation of the Underlying Memory for User Defined DSP Heaps
Putting it all Together
Execution Model
Terminology
Device Discovery
Understanding Kernels, Work-groups and Work-items
Enqueueing a Kernel
Mapping the OpenCL C work-item Built-in Functions
OpenCL C Kernel Code
NDRangeKernel Execution on DSP Devices
Extensions
Calling Standard C Code From OpenCL C Code
Calling Standard C code with OpenMP from OpenCL C code
OpenMP dispatch from OpenCL
C66x standard C compiler intrinsic functions
OpenCL C code using printf
DMA Control Using EdmaMgr Functions
Single Transfer EdmaMgr APIs
Multiple Transfer EdmaMgr APIs
Using Extended Memory on the 66AK2x device
Fast Global buffers in on-chip MSMC memory
OpenCL C Builtin Function Extensions
Cache Operations
Environment Variables
Optimization Tips
Optimization Techniques for Host Code
Use Off-line, Embedded Compilation Model
Avoid the read/write Buffer model on shared memory SoC platforms
Use MSMC Buffers Whenever Possible
Dispatch Appropriate Compute Loads
Prefer Kernels with 1 work-item per work-group
Optimization Techniques for Device (DSP) Code
Prefer Kernels with 1 work-item per work-group
Use Local Buffers
Use async_work_group_copy and async_work_group_strided_copy
Avoid DSP writes directly to DDR
Use the reqd_work_group_size attribute on kernels
Use the TI OpenCL extension that allows Standard C code to be called from OpenCL C code
Avoid OpenCL C Barriers
Use the most efficient data type on the DSP
Do Not Use Large Vector Types
Consecutive memory accesses
Prefer the CPU style of writing OpenCL code over the GPU style
Typical Steps to Optimize Device Code
Optimizing 3x3 Gaussian smoothing filter
Overview of Gaussian Filter
Natural C Code
Optimizing for DSP
Performance Improvement
Performance Data
Examples
Building and Running
Example Descriptions
platforms example
simple example
mandelbrot, mandelbrot_native examples
ccode example
matmpy example
offline example
vecadd_openmp example
vecadd_openmp_t example
vecadd example
vecadd_mpax example
vecadd_mpax_openmp example
dsplib_fft example
ooo, ooo_map examples
null example
sgemm example
dgemm example
edmamgr example
dspheap example
Float compute example
Host Code (main.cpp)
OpenCL C kernel code (dsp_compute.cl)
Sample Output
Monte Carlo example
Algorithm for Gaussian Random Number Generation
Executing the code
Sample Output
Debug
Debug with printf
Host side OpenCL application code
DSP side OpenCL kernel code
Debug with gdb
Host side gdb
DSP side debug with host side client gdbc6x
Debug with CCS
Connect emulator to EVM and CCS
Debug DSP side code with CCS
Debug with dsptop
Profiling
Host Side Profiling
DSP Side Profiling
OpenCL on TI-RTOS
Overview
OpenCL on RTOS Package
Running Examples Shipped with OpenCL Package
Basic OpenCL RTOS Application Development
Building Application on Linux
Building Application on Windows
Creating an OpenCL RTOS Application
Limited Customization: Participating DSP Core(s)
Differences from OpenCL Linux (Host running Linux)
Advanced OpenCL RTOS Application Development
Frequently Asked Questions
How do I get support for TI OpenCL products?
Which TI OpenCL Version is Installed?
Using Python OpenCL with the TI OpenCL implementation
Guidelines for porting Stand-alone DSP applications to OpenCL
Heap Memory Management
Stack Usage
Boot Routine Dependencies
Linker Command Files
OpenCL Interoperability with Host OpenMP
MCSDK-HPC to OpenCL Component Version Map
Does TI’s OpenCL support images and samplers?
Why does the OpenCL ICD installed on my platform not find the TI OpenCL implementation?
Why do I get messages about /var/lock/opencl when running OpenCL applications?
Why do I get DLOAD error messages when running OpenCL applications?
How do I limit log file sizes on EVM’s temporary file storage (tmpfs)?
66AK2* EVMs
AM57* EVMs
Readme
OpenCL v01.01.09.x Readme
Platforms supported
Release Notes
Compiler Versions
OpenCL v01.01.08.x Readme
Platforms supported
Release Notes
Compiler Versions
OpenCL v01.01.07.x Readme
Platforms supported
Release Notes
Compiler Versions
Disclaimer
Important Notice