CUDA Occupancy Calculator
GPU Occupancy Data is displayed here and in the graphs:

| Quantity | Value |
|---|---|
| Active Threads per Multiprocessor | |
| Active Warps per Multiprocessor | |
| Active Thread Blocks per Multiprocessor | |
| Occupancy of each Multiprocessor | |
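Each row above can be derived from the number of thread blocks resident on one multiprocessor. The sketch below assumes that count is already known, and it treats occupancy as active warps divided by the maximum warps per multiprocessor. The struct and function names are hypothetical, and all inputs are supplied by the caller rather than queried from a GPU.

```cpp
// Hypothetical container for the "GPU Occupancy Data" rows.
struct OccupancyData {
    int activeThreads;  // Active Threads per Multiprocessor
    int activeWarps;    // Active Warps per Multiprocessor
    int activeBlocks;   // Active Thread Blocks per Multiprocessor
    double occupancy;   // Occupancy of each Multiprocessor, 0.0 .. 1.0
};

// Sketch: derive the occupancy rows from the resident block count.
// A partial warp still occupies a whole warp slot, so the block size
// is rounded up to whole warps before counting active warps.
OccupancyData occupancyData(int activeBlocks, int threadsPerBlock,
                            int threadsPerWarp, int maxWarpsPerSM) {
    const int warpsPerBlock =
        (threadsPerBlock + threadsPerWarp - 1) / threadsPerWarp;
    OccupancyData d{};
    d.activeBlocks = activeBlocks;
    d.activeThreads = activeBlocks * threadsPerBlock;
    d.activeWarps = activeBlocks * warpsPerBlock;
    d.occupancy = static_cast<double>(d.activeWarps) / maxWarpsPerSM;
    return d;
}
```

For example, 4 resident blocks of 96 threads (3 warps each) on a multiprocessor that allows 64 warps give 12 active warps. That is an occupancy of 0.1875.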
Physical Limits for GPU Compute Capability
| Limit | Value |
|---|---|
| Version | |
| Threads per Warp | |
| Warps per Multiprocessor | |
| Threads per Multiprocessor | |
| Thread Blocks per Multiprocessor | |
| Total # of 32-bit registers per Multiprocessor | |
| Register allocation unit size | |
| Register allocation granularity | |
| Max registers per block | |
| Max registers per thread | |
| Shared Memory per Multiprocessor (bytes) | |
| Shared Memory allocation unit size (bytes) | |
| Warp allocation granularity (for register allocation) | |
| Max thread block size | |
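The physical limits can be held in one structure and checked for consistency. The sketch below is a minimal version of that idea. The default values are assumptions loosely modelled on a recent architecture, included only for illustration; verify every value against the compute-capability table for the target GPU before relying on it. `GpuLimits`, `isConsistent` and `isValidBlockSize` are hypothetical names.

```cpp
// Hypothetical container for the "Physical Limits" rows.
// ALL default values are illustrative assumptions, not authoritative data.
struct GpuLimits {
    int threadsPerWarp = 32;
    int maxWarpsPerSM = 64;
    int maxThreadsPerSM = 2048;
    int maxBlocksPerSM = 32;
    int regsPerSM = 65536;             // total 32-bit registers per SM
    int regAllocUnitSize = 256;        // registers allocated in chunks (assumed)
    int maxRegsPerBlock = 65536;
    int maxRegsPerThread = 255;
    int sharedMemPerSM = 167936;       // bytes (assumed)
    int sharedMemAllocUnitSize = 128;  // bytes (assumed)
    int warpAllocGranularity = 4;      // warps (assumed)
    int maxThreadsPerBlock = 1024;
};

// Threads per SM must equal warps per SM times the warp size.
bool isConsistent(const GpuLimits& g) {
    return g.maxThreadsPerSM == g.maxWarpsPerSM * g.threadsPerWarp;
}

// A block size is legal only if it lies in [1, max thread block size].
bool isValidBlockSize(const GpuLimits& g, int threadsPerBlock) {
    return threadsPerBlock >= 1 && threadsPerBlock <= g.maxThreadsPerBlock;
}
```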
Allocation Per Thread Block
| Resource | Value |
|---|---|
| Warps | |
| Registers | |
| Shared Memory | |
Note: CUDA Runtime uses bytes of Shared Memory per Thread Block.
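The per-block allocations are the requested amounts rounded up to the hardware allocation units. The sketch below assumes that registers are allocated per warp, and that shared memory is the user request plus a runtime-reserved amount. That reserved amount is passed in as `reservedSmemPerBlock` because its value is not given here. The allocation-unit defaults (256 registers, 128 bytes) are assumptions, and the function and struct names are hypothetical.

```cpp
// Round x up to the next multiple of unit (unit > 0, x >= 0).
int roundUp(int x, int unit) { return (x + unit - 1) / unit * unit; }

// Hypothetical container for the "Allocation Per Thread Block" rows.
struct BlockAllocation {
    int warps;
    int registers;
    int sharedMemBytes;
};

// Sketch assuming warp-granular register allocation; default unit
// sizes are illustrative assumptions.
BlockAllocation allocatePerBlock(int threadsPerBlock, int regsPerThread,
                                 int smemPerBlock, int reservedSmemPerBlock,
                                 int threadsPerWarp = 32,
                                 int regAllocUnitSize = 256,
                                 int smemAllocUnitSize = 128) {
    BlockAllocation a{};
    // A partial warp is allocated as a full warp.
    a.warps = (threadsPerBlock + threadsPerWarp - 1) / threadsPerWarp;
    // Each warp's registers are rounded up to the register allocation unit.
    const int regsPerWarp =
        roundUp(regsPerThread * threadsPerWarp, regAllocUnitSize);
    a.registers = regsPerWarp * a.warps;
    // User plus runtime-reserved shared memory, rounded up to the unit.
    a.sharedMemBytes =
        roundUp(smemPerBlock + reservedSmemPerBlock, smemAllocUnitSize);
    return a;
}
```

With 256 threads, 37 registers per thread and 4000 bytes of shared memory (no reserved bytes), this gives 8 warps and 10240 registers, because 37 × 32 = 1184 rounds up to 1280 per warp. It also gives 4096 bytes of shared memory.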
Maximum Thread Blocks Per Multiprocessor
| Limiting Factor | Thread Blocks |
|---|---|
| Limited by Max Warps / Blocks per Multiprocessor | |
| Limited by Registers per Multiprocessor | |
| Limited by Shared Memory per Multiprocessor | |
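Each limit counts how many blocks one resource would allow, and the number of active blocks is the smallest of the three. The sketch below is a simplified model of that logic, not the calculator's exact formulas. It takes per-block allocations that are already rounded to allocation units, and its default limits are illustrative assumptions. It also ignores the case where registers per thread exceed the hardware maximum, in which case no block can launch. `blockLimits` and `BlockLimits` are hypothetical names, and `warpsPerBlock` must be positive.

```cpp
#include <algorithm>

// Round x down to a multiple of unit (unit > 0, x >= 0).
int roundDown(int x, int unit) { return x / unit * unit; }

struct BlockLimits {
    int byWarps;      // Limited by Max Warps / Blocks per Multiprocessor
    int byRegisters;  // Limited by Registers per Multiprocessor
    int bySharedMem;  // Limited by Shared Memory per Multiprocessor
    // The multiprocessor holds as many blocks as the tightest limit allows.
    int active() const { return std::min({byWarps, byRegisters, bySharedMem}); }
};

// Simplified model; all default limits are assumptions.
BlockLimits blockLimits(int warpsPerBlock, int regsPerWarp, int smemPerBlock,
                        int maxWarpsPerSM = 64, int maxBlocksPerSM = 32,
                        int regsPerSM = 65536, int warpAllocGranularity = 4,
                        int smemPerSM = 167936) {
    BlockLimits l{};
    // Warp slots, capped by the hard limit on resident blocks.
    l.byWarps = std::min(maxBlocksPerSM, maxWarpsPerSM / warpsPerBlock);
    // Warps whose registers fit, rounded down to the warp allocation
    // granularity, then grouped into whole blocks.
    l.byRegisters = regsPerWarp > 0
        ? roundDown(regsPerSM / regsPerWarp, warpAllocGranularity) / warpsPerBlock
        : maxBlocksPerSM;
    // No shared memory use means no shared-memory limit beyond the block cap.
    l.bySharedMem = smemPerBlock > 0 ? smemPerSM / smemPerBlock : maxBlocksPerSM;
    return l;
}
```

Under these assumed defaults, `blockLimits(8, 1280, 4096)` yields limits of 8, 6 and 41 blocks. Registers are therefore the limiting factor, and 6 blocks (48 of 64 warps) are resident.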
Impact of Varying Block Size
Impact of Varying Register Count Per Thread
Impact of Varying Shared Memory Usage Per Block