CUDA Occupancy Calculator

GPU Occupancy Data is displayed here and in the graphs
Active Threads per Multiprocessor
Active Warps per Multiprocessor
Active Thread Blocks per Multiprocessor
Occupancy of each Multiprocessor
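
The four outputs above are related by simple arithmetic: occupancy is the ratio of active warps to the maximum number of warps a multiprocessor can hold. The following is a minimal sketch of that relationship, not the calculator's own code. The function name `occupancy_summary` is hypothetical, and the defaults (32 threads per warp, 64 warps per multiprocessor) are illustrative assumptions that depend on the compute capability.

```python
def occupancy_summary(active_blocks, threads_per_block,
                      threads_per_warp=32, max_warps_per_sm=64):
    """Return (active threads, active warps, occupancy) for one multiprocessor.

    The default limits are illustrative assumptions, not values from this page.
    """
    # A partially filled warp still occupies a full warp slot, so round up.
    warps_per_block = -(-threads_per_block // threads_per_warp)
    active_warps = active_blocks * warps_per_block
    active_threads = active_blocks * threads_per_block
    return active_threads, active_warps, active_warps / max_warps_per_sm


print(occupancy_summary(active_blocks=4, threads_per_block=256))
```

With these assumed limits, 4 resident blocks of 256 threads give 1024 active threads and 32 active warps, which is half of the 64 available warp slots.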

Physical Limits for GPU Compute Capability
Version
Threads per Warp
Warps per Multiprocessor
Threads per Multiprocessor
Thread Blocks per Multiprocessor
Total # of 32-bit registers per Multiprocessor
Register allocation unit size
Register allocation granularity
Max registers per Block
Max registers per thread
Shared Memory per Multiprocessor (bytes)
Shared Memory Allocation unit size
Warp allocation granularity (for register allocation)
Max thread block size

Allocation Per Thread Block
Warps
Registers
Shared Memory

Note: CUDA Runtime uses bytes of Shared Memory per Thread Block.
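
The per-block allocations listed above can be approximated by rounding each resource up to its allocation unit. The sketch below assumes warp-granularity register allocation, a register allocation unit of 256, and a shared memory allocation unit of 128 bytes; these are illustrative values, and the real ones depend on the compute capability. The name `block_allocation` is hypothetical, and `runtime_smem_bytes` stands in for the runtime's per-block shared memory overhead mentioned in the note, whose value is not given here, so it defaults to 0.

```python
def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def block_allocation(threads_per_block, regs_per_thread, smem_bytes,
                     threads_per_warp=32, reg_alloc_unit=256,
                     smem_alloc_unit=128, runtime_smem_bytes=0):
    """Estimate (warps, registers, shared memory bytes) allocated per block.

    All unit sizes are assumptions for illustration only.
    """
    warps = -(-threads_per_block // threads_per_warp)
    # Assumed model: registers are allocated per warp, rounded to the unit.
    regs_per_warp = round_up(regs_per_thread * threads_per_warp, reg_alloc_unit)
    registers = warps * regs_per_warp
    # Shared memory is the user request plus runtime overhead, rounded up.
    shared = round_up(smem_bytes + runtime_smem_bytes, smem_alloc_unit)
    return warps, registers, shared


print(block_allocation(threads_per_block=100, regs_per_thread=37, smem_bytes=1000))
```

Rounding is why a block that requests 100 threads, 37 registers per thread, and 1000 bytes is charged for more than it asked for in every category under this model.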

Maximum Thread Blocks Per Multiprocessor
Limited by Max Warps / Blocks per Multiprocessor
Limited by Registers per Multiprocessor
Limited by Shared Memory per Multiprocessor
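
The number of blocks that can be resident on a multiprocessor is the smallest of the three limits above. A simplified sketch follows; the `ILLUSTRATIVE_LIMITS` values and the function name `max_blocks_per_sm` are assumptions for illustration, and the plain integer division ignores the finer allocation granularity that an actual calculator applies.

```python
# Illustrative per-multiprocessor limits (assumed, not taken from this page).
ILLUSTRATIVE_LIMITS = {
    "max_blocks": 32,
    "max_warps": 64,
    "registers": 65536,
    "shared_memory": 98304,
}


def max_blocks_per_sm(warps_per_block, regs_per_block, smem_per_block,
                      limits=ILLUSTRATIVE_LIMITS):
    """Return the three block limits and the resulting active block count."""
    by_warps = min(limits["max_blocks"], limits["max_warps"] // warps_per_block)
    # A resource that is not used imposes no limit beyond the block cap.
    by_regs = (limits["registers"] // regs_per_block
               if regs_per_block else limits["max_blocks"])
    by_smem = (limits["shared_memory"] // smem_per_block
               if smem_per_block else limits["max_blocks"])
    return {
        "by_warps": by_warps,
        "by_registers": by_regs,
        "by_shared_memory": by_smem,
        "active_blocks": min(by_warps, by_regs, by_smem),
    }


print(max_blocks_per_sm(warps_per_block=8, regs_per_block=16384, smem_per_block=4096))
```

In this example the register limit (4 blocks) is the binding constraint, even though warps and shared memory would allow 8 and 24 blocks respectively.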

Impact of Varying Block Size

Impact of Varying Register Count Per Thread

Impact of Varying Shared Memory Usage Per Block
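
The "Impact of Varying" graphs sweep one input while holding the others fixed. As a rough sketch of the block-size sweep, the code below recomputes occupancy for a range of block sizes. The constants are illustrative assumptions rather than values from this page, the function name `occupancy_for_block_size` is hypothetical, and allocation-unit rounding is deliberately omitted to keep the example short.

```python
# Illustrative per-multiprocessor limits (assumed for this sketch).
THREADS_PER_WARP = 32
MAX_WARPS = 64
MAX_BLOCKS = 32
REGS_PER_SM = 65536
SMEM_PER_SM = 98304


def occupancy_for_block_size(threads_per_block, regs_per_thread=32,
                             smem_per_block=4096):
    """Estimate occupancy for one block size, ignoring allocation rounding."""
    warps = -(-threads_per_block // THREADS_PER_WARP)
    by_warps = min(MAX_BLOCKS, MAX_WARPS // warps)
    by_regs = REGS_PER_SM // (warps * THREADS_PER_WARP * regs_per_thread)
    by_smem = SMEM_PER_SM // smem_per_block if smem_per_block else MAX_BLOCKS
    active_blocks = min(by_warps, by_regs, by_smem)
    return active_blocks * warps / MAX_WARPS


# Sweep block sizes from 64 to 1024 threads in steps of 64.
sweep = {size: occupancy_for_block_size(size) for size in range(64, 1025, 64)}
for size, occ in sweep.items():
    print(f"{size:5d} threads/block -> {occ:.0%} occupancy")
```

The same pattern applies to the register and shared memory sweeps: fix the block size and vary `regs_per_thread` or `smem_per_block` instead.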
