Ampere (microarchitecture)
Fabrication process | TSMC 7 nm (GA100); Samsung 8 nm "8N" (GA10x) |
---|---|
Predecessors | Volta, Turing |
Successor | Hopper |
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures, officially announced on May 14, 2020. It is named after the French mathematician and physicist André-Marie Ampère.[1][2] Nvidia announced the Ampere-based GeForce 30 series of consumer GPUs at a GeForce Special Event on September 1, 2020.[3][4]
Details
Architectural improvements of the Ampere architecture include the following:[citation needed]
- CUDA Compute Capability 8.0 on GA100 & Compute Capability 8.6 on GA10x
- TSMC's 7 nm FinFET process for GA100
- Samsung's custom 8 nm process (8N) for GA10x
- Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration[5]
- Second-generation Ray Tracing Cores, with concurrent ray tracing, shading, and compute (available on the GeForce 30 series)
- High Bandwidth Memory 2 (HBM2) on GA100
- GDDR6X memory on GA102 (GeForce RTX 3090 & RTX 3080)
- NVLink 3.0 (50 Gbit/s per pair)[5]
- PCI Express 4.0 with SR-IOV support
- Multi-Instance GPU (MIG) virtualization and GPU-partitioning feature (up to seven instances on the A100)
- PureVideo Feature Set K hardware video decoding with AV1 hardware decoding[6] on GA10x & Feature Set J hardware decoding on GA100
  - Five NVDEC units
- New hardware-based five-core JPEG decoder (NVJPG) supporting YUV420, YUV422, YUV444, YUV400, and RGBA; not to be confused with Nvidia's NVJPEG, a GPU-accelerated library for JPEG encoding and decoding
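The reduced-precision formats listed above differ in how they split bits between exponent and mantissa: TF32 keeps FP32's 8-bit exponent but shortens the 23-bit mantissa to 10 bits, while bfloat16 keeps the 8-bit exponent with a 7-bit mantissa. A minimal Python sketch (pure standard library, simulating round-to-nearest-even on the raw FP32 bit pattern; the function name is illustrative, not an Nvidia API) shows the resulting precision loss:

```python
import struct

def round_fp32_mantissa(x: float, mantissa_bits: int) -> float:
    """Round an FP32 value to a reduced mantissa width.

    Simulates formats that keep FP32's 8-bit exponent but shorten the
    23-bit mantissa: TF32 (10 bits) and bfloat16 (7 bits).
    Uses round-to-nearest, ties-to-even on the raw bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits            # low mantissa bits to discard
    half = 1 << (drop - 1)
    lsb = (bits >> drop) & 1             # lowest bit that will be kept
    bits = (bits + (half - 1) + lsb) & ~((1 << drop) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# TF32: 8-bit exponent, 10-bit mantissa
print(round_fp32_mantissa(1 / 3, 10))   # → 0.333251953125
# bfloat16: 8-bit exponent, 7-bit mantissa
print(round_fp32_mantissa(1 / 3, 7))    # → 0.333984375
```

Because both formats retain the full 8-bit exponent, they cover the same dynamic range as FP32 and trade away only mantissa precision, which is what makes TF32 usable as a drop-in Tensor Core mode for FP32 workloads.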
Comparison of Compute Capability: GP100 vs GV100 vs GA100[7][non-primary source needed]
GPU Features | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 |
---|---|---|---|
GPU Codename | GP100 | GV100 | GA100 |
GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere |
Compute Capability | 6.0 | 7.0 | 8.0 |
Threads / Warp | 32 | 32 | 32 |
Max Warps / SM | 64 | 64 | 64 |
Max Threads / SM | 2048 | 2048 | 2048 |
Max Thread Blocks / SM | 32 | 32 | 32 |
Max 32-bit Registers / SM | 65536 | 65536 | 65536 |
Max Registers / Block | 65536 | 65536 | 65536 |
Max Registers / Thread | 255 | 255 | 255 |
Max Thread Block Size | 1024 | 1024 | 1024 |
FP32 Cores / SM | 64 | 64 | 64 |
Ratio of SM Registers to FP32 Cores | 1024 | 1024 | 1024 |
Shared Memory Size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB |
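The per-SM limits in the table above are interrelated: the maximum number of resident warps is the maximum threads per SM divided by the warp size, and the fixed register file bounds how many registers each thread can use at full occupancy. A small Python check of that arithmetic, using the table's values:

```python
WARP_SIZE = 32
MAX_THREADS_PER_SM = 2048
MAX_REGISTERS_PER_SM = 65536

# Maximum resident warps per SM follows from threads / warp size.
max_warps = MAX_THREADS_PER_SM // WARP_SIZE
print(max_warps)              # → 64, matching all three GPUs in the table

# At full occupancy, each thread can use at most this many registers;
# a kernel exceeding it forces the scheduler to keep fewer warps resident.
regs_per_thread_full = MAX_REGISTERS_PER_SM // MAX_THREADS_PER_SM
print(regs_per_thread_full)   # → 32
```

Note the per-thread architectural limit of 255 registers in the table is much higher than 32; a kernel can use up to 255 registers per thread, but only at reduced occupancy.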
Comparison of Precision Support Matrix[8][9]
GPU | CUDA: FP16 | CUDA: FP32 | CUDA: FP64 | CUDA: INT1 (binary) | CUDA: INT4 | CUDA: INT8 | CUDA: TF32 | CUDA: bfloat16 (BF16) | Tensor: FP16 | Tensor: FP32 | Tensor: FP64 | Tensor: INT1 (binary) | Tensor: INT4 | Tensor: INT8 | Tensor: TF32 | Tensor: bfloat16 (BF16) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NVIDIA Tesla P4 | No | Yes | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No | No |
NVIDIA P100 | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No |
NVIDIA Volta | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | No | No | No | No | No | No |
NVIDIA Turing | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | No | Yes | Yes | Yes | No | No |
NVIDIA A100 | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
Comparison of Decode Performance
GPU | Concurrent H.264 streams (1080p30) | Concurrent H.265 (HEVC) streams (1080p30) | Concurrent VP9 streams (1080p30) |
---|---|---|---|
V100[citation needed] | 16[clarification needed] | 22 | 22 |
A100[citation needed] | 75 | 157 | 108 |
A100 accelerator and DGX A100
The Ampere-based A100 accelerator was announced and released on May 14, 2020.[5] The A100 features 19.5 teraflops of FP32 performance, 6912 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics-memory bandwidth.[10] The accelerator was initially available only in the third-generation DGX server, which includes eight A100s.[5] The DGX A100 also includes 15 TB of PCIe gen 4 NVMe storage,[10] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[5]
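The A100's headline figures follow from its configuration: peak FP32 throughput is CUDA cores × boost clock × 2 (one fused multiply-add, i.e. two floating-point operations, per core per cycle), and memory bandwidth is the per-pin data rate × bus width. A back-of-the-envelope check in Python (the 2.43 Gbit/s effective per-pin HBM2 rate is an assumption consistent with the 1555 GB/s figure quoted below; the table rounds it to 2.4 Gbit/s):

```python
# Peak FP32 throughput: cores x clock x 2 ops per FMA
cuda_cores = 6912
boost_clock_hz = 1.41e9                  # 1410 MHz boost clock
peak_fp32_tflops = cuda_cores * boost_clock_hz * 2 / 1e12
print(f"{peak_fp32_tflops:.1f} TFLOPS")  # → 19.5 TFLOPS

# Memory bandwidth: per-pin rate x bus width (bits) / 8 bits per byte
pin_rate_bps = 2.43e9                    # assumed effective HBM2 rate per pin
bus_width_bits = 5120
bandwidth_gbs = pin_rate_bps * bus_width_bits / 8 / 1e9
print(f"{bandwidth_gbs:.0f} GB/s")       # → 1555 GB/s
```

The same two formulas reproduce the V100 and P100 rows of the comparison table below from their clocks, core counts, and bus widths.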
Comparison of accelerators used in DGX:[5][11]
Accelerator | Architecture | FP32 CUDA Cores | FP64 Cores (excl. Tensor) | INT32 Cores | Boost Clock | Memory Clock | Memory Bus Width | Memory Bandwidth | VRAM | Single Precision | Double Precision (FP64) | INT8 (non-Tensor) | INT8 Tensor | INT32 | FP16 | FP16 Tensor | bfloat16 Tensor | TensorFloat-32 (TF32) Tensor | FP64 Tensor | Interconnect | GPU | L1 Cache Size | L2 Cache Size | GPU Die Size | Transistor Count | TDP | Manufacturing Process |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A100 | Ampere | 6912 | 3456 | 6912 | 1410 MHz | 2.4 Gbit/s HBM2 | 5120-bit | 1555 GB/s | 40 GB | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/s | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 826 mm² | 54.2B | 400 W | TSMC 7 nm N7[citation needed] |
V100 | Volta | 5120 | 2560 | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/s | 16 GB/32 GB | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/s | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 815 mm² | 21.1B | 300 W/350 W | TSMC 12 nm FFN[citation needed] |
P100 | Pascal | 3584 | 1792 | N/A | 1480 MHz | 1.4 Gbit/s HBM2 | 4096-bit | 720 GB/s | 16 GB | 10.6 TFLOPS | 5.3 TFLOPS | N/A | N/A | N/A | 21.2 TFLOPS | N/A | N/A | N/A | N/A | 160 GB/s | GP100 | 1344 KB (24 KB × 56) | 4096 KB | 610 mm² | 15.3B | 300 W | TSMC 16 nm FinFET+ |
References
- ^ "NVIDIA's New Ampere Data Center GPU in Full Production". NVIDIA Newsroom. May 14, 2020. https://nvidianews.nvidia.com/news/nvidias-new-ampere-data-center-gpu-in-full-production
- ^ "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog. May 14, 2020.
- ^ "NVIDIA Delivers Greatest-Ever Generational Leap in Performance with GeForce RTX 30 Series GPUs". NVIDIA Newsroom. September 1, 2020. https://nvidianews.nvidia.com/news/nvidia-delivers-greatest-ever-generational-leap-in-performance-with-geforce-rtx-30-series-gpus
- ^ "NVIDIA GeForce Ultimate Countdown". NVIDIA.
- ^ a b c d e f Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
- ^ "GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode". NVIDIA.
- ^ "NVIDIA Ampere Architecture Whitepaper" (PDF). NVIDIA. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
- ^ "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
- ^ "Abstract". docs.nvidia.com.
- ^ a b Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
- ^ "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.