Ampere (microarchitecture)
Revision as of 22:01, 19 September 2020

Nvidia Ampere
Fabrication process: TSMC 7 nm (A100); Samsung 8 nm 8N (GeForce 30 series)
Predecessors: Turing, Volta
Successor: Hopper

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020, and is named after the French mathematician and physicist André-Marie Ampère.[1][2] Nvidia announced the next-generation GeForce 30 series of consumer GPUs at a GeForce Special Event on September 1, 2020.[3][4]

Details

Architectural improvements of the Ampere architecture include the following:[citation needed]

  • CUDA Compute Capability 8.0 for A100 and 8.6 for the GeForce 30 series
  • TSMC's 7 nm FinFET process for A100
  • Samsung's custom 8 nm process (8N) for the GeForce 30 series
  • Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration[5]
  • Second-generation ray tracing cores; concurrent ray tracing, shading, and compute for the GeForce 30 series
  • High Bandwidth Memory 2 (HBM2) on A100
  • GDDR6X memory for GeForce RTX 3090 and 3080
  • NVLink 3.0 with a 50Gbit/s per pair throughput[5]
  • PCI Express 4.0 with SR-IOV support
  • Multi-Instance GPU (MIG) virtualization and GPU partitioning feature (supporting up to seven instances on A100)
  • PureVideo feature set K hardware video decoding with AV1 hardware decoding[6] for the GeForce 30 series and feature set J for A100
  • Five NVDEC video decode units
  • New hardware-based five-core JPEG decode engine (NVJPG) supporting the YUV420, YUV422, YUV444, YUV400 and RGBA formats; not to be confused with Nvidia NVJPEG, the GPU-accelerated library for JPEG encoding/decoding
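TensorFloat-32 (TF32), listed among the Tensor Core precisions above, keeps FP32's 8-bit exponent (so the same numeric range) but only 10 explicit mantissa bits instead of 23. A minimal Python sketch of that reduction (a simplification: it truncates the low mantissa bits, whereas the hardware rounds, and `tf32_round` is an illustrative name, not an Nvidia API):

```python
import struct

def tf32_round(x: float) -> float:
    """Reduce a value to TF32-like precision: FP32's 8-bit exponent,
    but only 10 explicit mantissa bits (vs FP32's 23)."""
    # Reinterpret the float32 bit pattern as a 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Zero the low 13 mantissa bits (23 - 10 = 13), i.e. truncate.
    bits &= ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, `tf32_round(1.0 + 2**-11)` collapses back to `1.0`, since a contribution of 2^-11 is below TF32's 10-bit mantissa resolution around 1.0.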

Comparison of Compute Capability: GP100 vs GV100 vs GA100[7][non-primary source needed]

GPU Features NVIDIA Tesla P100 NVIDIA Tesla V100 NVIDIA A100
GPU Codename GP100 GV100 GA100
GPU Architecture NVIDIA Pascal NVIDIA Volta NVIDIA Ampere
Compute Capability 6.0 7.0 8.0
Threads / Warp 32 32 32
Max Warps / SM 64 64 64
Max Threads / SM 2048 2048 2048
Max Thread Blocks / SM 32 32 32
Max 32-bit Registers / SM 65536 65536 65536
Max Registers / Block 65536 65536 65536
Max Registers / Thread 255 255 255
Max Thread Block Size 1024 1024 1024
FP32 Cores / SM 64 64 64
Ratio of SM Registers to FP32 Cores 1024 1024 1024
Shared Memory Size / SM 64 KB Configurable up to 96 KB Configurable up to 164 KB
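The register-file figures in the table bound occupancy: with 65,536 32-bit registers per SM and up to 2,048 resident threads, a kernel can use at most 32 registers per thread at full occupancy. A rough Python sketch of that arithmetic (`max_resident_threads` is a hypothetical helper, not a CUDA API; it ignores the shared-memory and block-count limits also listed above):

```python
def max_resident_threads(regs_per_thread: int,
                         regs_per_sm: int = 65536,
                         max_threads_per_sm: int = 2048,
                         warp_size: int = 32) -> int:
    """Upper bound on resident threads per SM given per-thread
    register use (register-file limit only)."""
    # Whole warps whose registers fit in the SM's register file.
    warps_by_regs = regs_per_sm // (regs_per_thread * warp_size)
    return min(warps_by_regs * warp_size, max_threads_per_sm)
```

At 32 registers per thread this yields the full 2,048 threads; at 64 registers it halves to 1,024, which is why register pressure is a common occupancy limiter.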

Comparison of Precision Support Matrix[8][9]

Precisions considered: FP16, FP32, FP64, INT1 (binary), INT4, INT8, TF32, bfloat16 (BF16)

NVIDIA Tesla P4 — CUDA cores: FP32, FP64, INT8; Tensor cores: none
NVIDIA P100 — CUDA cores: FP16, FP32, FP64; Tensor cores: none
NVIDIA Volta — CUDA cores: FP16, FP32, FP64, INT8; Tensor cores: FP16
NVIDIA Turing — CUDA cores: FP16, FP32, FP64, INT8; Tensor cores: FP16, INT1, INT4, INT8
NVIDIA A100 — CUDA cores: FP16, FP32, FP64, INT8, BF16; Tensor cores: FP16, FP64, INT1, INT4, INT8, TF32, BF16

Comparison of Decode Performance

Concurrent 1080p30 decode streams: H.264 / H.265 (HEVC) / VP9
V100[citation needed]: 16[clarification needed] / 22 / 22
A100[citation needed]: 75 / 157 / 108

A100 accelerator and DGX A100

The Ampere-based A100 accelerator was announced and released on May 14, 2020.[5] The A100 features 19.5 teraflops of FP32 performance, 6912 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics memory bandwidth.[10] The accelerator was initially available only in the third generation of the DGX server, which includes eight A100s.[5] Also included in the DGX A100 are 15 TB of PCIe gen 4 NVMe storage,[10] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[5]
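The headline FP32 figure follows from the core count and clock: each CUDA core can retire one fused multiply-add (two floating-point operations) per cycle. A quick back-of-the-envelope check in Python (`peak_fp32_tflops` is an illustrative helper, not an Nvidia API):

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    # cores x clock x 2 ops per FMA, expressed in teraflops
    return cuda_cores * boost_clock_ghz * 2 / 1000

# A100: 6912 CUDA cores at a 1410 MHz boost clock -> roughly 19.5 TFLOPS
a100_fp32 = peak_fp32_tflops(6912, 1.41)
```

The same formula reproduces the V100's quoted 15.7 TFLOPS (5120 cores at 1530 MHz).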

Comparison of accelerators used in DGX:[5][11]

A100 (Ampere, GA100 GPU)
  FP32 CUDA cores: 6912; FP64 cores (excl. Tensor): 3456; INT32 cores: 6912
  Boost clock: 1410 MHz; Memory: 2.4 Gbit/s HBM2, 5120-bit bus, 1555 GB/s; VRAM: 40 GB
  FP32: 19.5 TFLOPS; FP64: 9.7 TFLOPS; INT8 (non-Tensor): N/A; INT8 Tensor: 624 TOPS; INT32: 19.5 TOPS
  FP16: 78 TFLOPS; FP16 Tensor: 312 TFLOPS; bfloat16 Tensor: 312 TFLOPS; TF32 Tensor: 156 TFLOPS; FP64 Tensor: 19.5 TFLOPS
  Interconnect: 600 GB/s; L1 cache: 20736 KB (192 KB × 108); L2 cache: 40960 KB
  Die size: 826 mm²; Transistors: 54.2 billion; TDP: 400 W; Process: TSMC 7 nm N7[citation needed]

V100 (Volta, GV100 GPU)
  FP32 CUDA cores: 5120; FP64 cores (excl. Tensor): 2560; INT32 cores: 5120
  Boost clock: 1530 MHz; Memory: 1.75 Gbit/s HBM2, 4096-bit bus, 900 GB/s; VRAM: 16 GB / 32 GB
  FP32: 15.7 TFLOPS; FP64: 7.8 TFLOPS; INT8 (non-Tensor): 62 TOPS; INT8 Tensor: N/A; INT32: 15.7 TOPS
  FP16: 31.4 TFLOPS; FP16 Tensor: 125 TFLOPS; bfloat16 Tensor: N/A; TF32 Tensor: N/A; FP64 Tensor: N/A
  Interconnect: 300 GB/s; L1 cache: 10240 KB (128 KB × 80); L2 cache: 6144 KB
  Die size: 815 mm²; Transistors: 21.1 billion; TDP: 300 W / 350 W; Process: TSMC 12 nm FFN[citation needed]

P100 (Pascal, GP100 GPU)
  FP32 CUDA cores: 3584; FP64 cores (excl. Tensor): 1792; INT32 cores: N/A
  Boost clock: 1480 MHz; Memory: 1.4 Gbit/s HBM2, 4096-bit bus, 720 GB/s; VRAM: 16 GB
  FP32: 10.6 TFLOPS; FP64: 5.3 TFLOPS; INT8 (non-Tensor): N/A; INT8 Tensor: N/A; INT32: N/A
  FP16: 21.2 TFLOPS; FP16 Tensor: N/A; bfloat16 Tensor: N/A; TF32 Tensor: N/A; FP64 Tensor: N/A
  Interconnect: 160 GB/s; L1 cache: 1344 KB (24 KB × 56); L2 cache: 4096 KB
  Die size: 610 mm²; Transistors: 15.3 billion; TDP: 300 W; Process: TSMC 16 nm FinFET+
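The memory bandwidth column is consistent with per-pin data rate times bus width: for example, 1.75 Gbit/s × 4096 bits ÷ 8 ≈ 896 GB/s for the V100, which rounds to the quoted 900 GB/s (the quoted pin rates themselves are rounded). A sketch of that arithmetic (`mem_bandwidth_gb_s` is an illustrative helper, not a vendor API):

```python
def mem_bandwidth_gb_s(pin_rate_gbit_s: float, bus_width_bits: int) -> float:
    """Theoretical memory bandwidth: each pin transfers pin_rate
    gigabits per second; divide by 8 to convert bits to bytes."""
    return pin_rate_gbit_s * bus_width_bits / 8
```

Plugging in the A100's 2.4 Gbit/s pins and 5120-bit bus gives 1536 GB/s, slightly below the quoted 1555 GB/s because the actual pin rate is a bit above 2.4 Gbit/s.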

References

  1. ^ NVIDIA Newsroom. "NVIDIA's New Ampere Data Center GPU in Full Production".
  2. ^ "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog. May 14, 2020.
  3. ^ NVIDIA Newsroom. "NVIDIA Delivers Greatest-Ever Generational Leap with GeForce RTX 30 Series GPUs".
  4. ^ "NVIDIA GeForce Ultimate Countdown". NVIDIA.
  5. ^ a b c d e f Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  6. ^ "GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode". NVIDIA.
  7. ^ "NVIDIA A100 Tensor Core GPU Architecture" (PDF). www.nvidia.com. Retrieved September 18, 2020.
  8. ^ "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
  9. ^ "Abstract". docs.nvidia.com.
  10. ^ a b Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
  11. ^ "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.