Ampere (microarchitecture)
| Fabrication process | TSMC 7 nm (FinFET) |
|---|---|
| History | |
| Predecessor | Volta |
| Successor | Hopper |
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Volta architecture, officially announced on May 14, 2020. It is named after the French mathematician and physicist André-Marie Ampère.[1][2] Nvidia's announcement focused on professional AI use, data centers, and Nvidia Drive (self-driving automobiles, autopilot, etc.), and it was not clear at the time whether Ampere would also appear in consumer GPUs as the successor to Turing.[3]
Details
Architectural improvements of the Ampere architecture include the following:
- CUDA Compute Capability 8.0 (see the sketch after this list)
- TSMC's 7 nm FinFET process
- Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32), and FP64 support, plus sparsity acceleration[4]
- High Bandwidth Memory 2 (HBM2)
- NVLink 3.0 (50 Gbit/s per signal pair)[4]
- PCI Express 4.0 with SR-IOV support
- Multi-Instance GPU (MIG) virtualization and GPU partitioning
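A minimal CUDA sketch of what the new compute capability level looks like from software, assuming a CUDA 11+ toolkit (the first release to target sm_80); the file name and messages are illustrative, not Nvidia sample code. It queries the device properties at runtime; GA100-based parts report compute capability 8.0.

```cuda
// query.cu — report the compute capability of device 0.
// Build with: nvcc query.cu -o query
// (compiling device code specifically for Ampere would use: nvcc -arch=sm_80 ...)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (prop.major == 8 && prop.minor == 0) {
        // Ampere GA100; this level gates features such as TF32 Tensor Core
        // math and the sparsity acceleration described above.
        printf("Ampere (sm_80) device detected.\n");
    }
    return 0;
}
```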
A100 accelerator and DGX A100
The Ampere-based A100 accelerator was announced and released on May 14, 2020.[4] The A100 features 19.5 teraflops of FP32 performance, 6912 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.[3] The A100 accelerator was initially available only in the third generation of the DGX server, which includes eight A100s.[4] Also included in the DGX A100 are 15 TB of PCIe 4.0 NVMe storage,[3] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[4]
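The headline FP32 figure is consistent with the core count and boost clock; as a back-of-the-envelope check (assuming one fused multiply-add, i.e. two floating-point operations, per CUDA core per cycle):

$$ 6912\ \text{cores} \times 2\ \text{FLOP} \times 1.41 \times 10^{9}\ \text{Hz} \approx 19.5\ \text{TFLOPS} $$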
Comparison of accelerators used in DGX:[4]

| Accelerator | A100 | V100 | P100 |
|---|---|---|---|
| Architecture | Ampere | Volta | Pascal |
| FP32 CUDA cores | 6912 | 5120 | 3584 |
| Boost clock | ~1410 MHz | 1530 MHz | 1480 MHz |
| Memory clock | 2.4 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 1.4 Gbit/s HBM2 |
| Memory bus width | 5120-bit | 4096-bit | 4096-bit |
| Memory bandwidth | 1555 GB/s | 900 GB/s | 720 GB/s |
| VRAM | 40 GB | 16 GB/32 GB | 16 GB |
| Single precision | 19.5 TFLOPS | 15.7 TFLOPS | 10.6 TFLOPS |
| Double precision | 9.7 TFLOPS | 7.8 TFLOPS | 5.3 TFLOPS |
| INT8 tensor | 624 TOPS | N/A | N/A |
| FP16 tensor | 312 TFLOPS | 125 TFLOPS | N/A |
| bfloat16 tensor | 312 TFLOPS | N/A | N/A |
| TensorFloat-32 (TF32) tensor | 156 TFLOPS | N/A | N/A |
| FP64 tensor | 19.5 TFLOPS | N/A | N/A |
| Interconnect | 600 GB/s | 300 GB/s | 160 GB/s |
| GPU | GA100 | GV100 | GP100 |
| GPU die size | 826 mm² | 815 mm² | 610 mm² |
| Transistor count | 54.2B | 21.1B | 15.3B |
| TDP | 400 W | 300 W/350 W | 300 W |
| Manufacturing process | TSMC 7 nm N7 | TSMC 12 nm FFN | TSMC 16 nm FinFET |
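The memory-bandwidth row follows from the memory data rate and bus width. For the A100 (the table's 2.4 Gbit/s per pin is rounded from roughly 2.43 Gbit/s):

$$ \frac{2.43\ \text{Gbit/s} \times 5120}{8\ \text{bit/byte}} \approx 1555\ \text{GB/s} $$

The same arithmetic reproduces the V100 row (1.75 × 4096 / 8 = 896, quoted as 900 GB/s) and the P100 row (1.4 × 4096 / 8 ≈ 717, quoted as 720 GB/s).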
References
- ^ "NVIDIA's New Ampere Data Center GPU in Full Production" (press release). Nvidia Newsroom. May 14, 2020. https://nvidianews.nvidia.com/news/nvidias-new-ampere-data-center-gpu-in-full-production
- ^ "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog. May 14, 2020. https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/
- ^ a b c Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
- ^ a b c d e f Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
External links
- NVIDIA A100 Tensor Core GPU Architecture whitepaper: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
- Nvidia Ampere Architecture: https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/
- Nvidia A100 Tensor Core GPU: https://www.nvidia.com/en-us/data-center/a100/