Ampere (microarchitecture)
Fabrication process | TSMC 7 nm (GA100); Samsung 8 nm "8N" (GA10x) |
---|---|
Predecessors | Volta, Turing |
Successor | Hopper |
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures, officially announced on May 14, 2020. It is named after the French mathematician and physicist André-Marie Ampère.[1][2] Nvidia announced the Ampere-based GeForce 30 series of consumer GPUs at a GeForce Special Event on September 1, 2020.[3][4]
Details
Architectural improvements of the Ampere architecture include the following:[citation needed]
- CUDA Compute Capability 8.0 on GA100 & Compute Capability 8.6 on GA10x
- TSMC's 7 nm FinFET process for GA100
- Samsung's custom 8 nm process (8N) for GA10x
- Third-generation Tensor Cores with FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration[5]
- Second-generation Ray Tracing Cores, with concurrent ray tracing, shading, and compute (available on the GeForce 30 series)
- High Bandwidth Memory 2 (HBM2) on GA100
- GDDR6X memory on GA102 (GeForce RTX 3090 & RTX 3080)
- NVLink 3.0 (50 Gbit/s per pair)[5]
- PCI Express 4.0 with SR-IOV support
- Multi-Instance GPU (MIG) virtualization and GPU-partitioning feature (up to seven instances on the A100)
- PureVideo Feature Set K hardware video decoding with AV1 hardware decoding[6] on GA10x & Feature Set J hardware decoding on GA100
  - Five NVDEC units
- New hardware-based five-core JPEG decoder (NVJPG) supporting YUV420, YUV422, YUV444, YUV400, and RGBA; not to be confused with Nvidia's NVJPEG, a GPU-accelerated library for JPEG encoding and decoding
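The reduced-precision formats listed above differ in how they split bits between exponent and mantissa: TF32 keeps FP32's 8-bit exponent but shortens the 23-bit mantissa to 10 bits, while bfloat16 keeps the 8-bit exponent with a 7-bit mantissa. A minimal Python sketch (pure standard library, simulating round-to-nearest-even on the raw FP32 bit pattern; the function name is illustrative, not an Nvidia API) shows the resulting precision loss:

```python
import struct

def round_fp32_mantissa(x: float, mantissa_bits: int) -> float:
    """Round an FP32 value to a reduced mantissa width.

    Simulates formats that keep FP32's 8-bit exponent but shorten the
    23-bit mantissa: TF32 (10 bits) and bfloat16 (7 bits).
    Uses round-to-nearest, ties-to-even on the raw bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits            # low mantissa bits to discard
    half = 1 << (drop - 1)
    lsb = (bits >> drop) & 1             # lowest bit that will be kept
    bits = (bits + (half - 1) + lsb) & ~((1 << drop) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# TF32: 8-bit exponent, 10-bit mantissa
print(round_fp32_mantissa(1 / 3, 10))   # → 0.333251953125
# bfloat16: 8-bit exponent, 7-bit mantissa
print(round_fp32_mantissa(1 / 3, 7))    # → 0.333984375
```

Because both formats retain the full 8-bit exponent, they cover the same dynamic range as FP32 and trade away only mantissa precision, which is what makes TF32 usable as a drop-in Tensor Core mode for FP32 workloads.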
Comparison of Compute Capability: GP100 vs GV100 vs GA100[7][non-primary source needed]
GPU Features | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 |
---|---|---|---|
GPU Codename | GP100 | GV100 | GA100 |
GPU Architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere |
Compute Capability | 6.0 | 7.0 | 8.0 |
Threads / Warp | 32 | 32 | 32 |
Max Warps / SM | 64 | 64 | 64 |
Max Threads / SM | 2048 | 2048 | 2048 |
Max Thread Blocks / SM | 32 | 32 | 32 |
Max 32-bit Registers / SM | 65536 | 65536 | 65536 |
Max Registers / Block | 65536 | 65536 | 65536 |
Max Registers / Thread | 255 | 255 | 255 |
Max Thread Block Size | 1024 | 1024 | 1024 |
FP32 Cores / SM | 64 | 64 | 64 |
Ratio of SM Registers to FP32 Cores | 1024 | 1024 | 1024 |
Shared Memory Size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB |
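The per-SM limits in the table above are interrelated: the maximum number of resident warps is the maximum threads per SM divided by the warp size, and the fixed register file bounds how many registers each thread can use at full occupancy. A small Python check of that arithmetic, using the table's values:

```python
WARP_SIZE = 32
MAX_THREADS_PER_SM = 2048
MAX_REGISTERS_PER_SM = 65536

# Maximum resident warps per SM follows from threads / warp size.
max_warps = MAX_THREADS_PER_SM // WARP_SIZE
print(max_warps)              # → 64, matching all three GPUs in the table

# At full occupancy, each thread can use at most this many registers;
# a kernel exceeding it forces the scheduler to keep fewer warps resident.
regs_per_thread_full = MAX_REGISTERS_PER_SM // MAX_THREADS_PER_SM
print(regs_per_thread_full)   # → 32
```

Note the per-thread architectural limit of 255 registers in the table is much higher than 32; a kernel can use up to 255 registers per thread, but only at reduced occupancy.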
Comparison of Precision Support Matrix[8][9]
GPU | CUDA: FP16 | CUDA: FP32 | CUDA: FP64 | CUDA: INT1 (binary) | CUDA: INT4 | CUDA: INT8 | CUDA: TF32 | CUDA: bfloat16 (BF16) | Tensor: FP16 | Tensor: FP32 | Tensor: FP64 | Tensor: INT1 (binary) | Tensor: INT4 | Tensor: INT8 | Tensor: TF32 | Tensor: bfloat16 (BF16) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NVIDIA Tesla P4 | No | Yes | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No | No |
NVIDIA P100 | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No | No | No | No |
NVIDIA Volta | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | No | No | No | No | No | No |
NVIDIA Turing | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | No | Yes | Yes | Yes | No | No |
NVIDIA A100 | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
Comparison of Decode Performance
GPU | Concurrent H.264 streams (1080p30) | Concurrent H.265 (HEVC) streams (1080p30) | Concurrent VP9 streams (1080p30) |
---|---|---|---|
V100[citation needed] | 16[clarification needed] | 22 | 22 |
A100[citation needed] | 75 | 157 | 108 |
A100 accelerator and DGX A100
The Ampere-based A100 accelerator was announced and released on May 14, 2020.[5] The A100 features 19.5 teraflops of FP32 performance, 6912 CUDA cores, 40 GB of graphics memory, and 1.6 TB/s of graphics-memory bandwidth.[10] The accelerator was initially available only in the third-generation DGX server, which includes eight A100s.[5] The DGX A100 also includes 15 TB of PCIe gen 4 NVMe storage,[10] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[5]
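The A100's headline figures follow from its configuration: peak FP32 throughput is CUDA cores × boost clock × 2 (one fused multiply-add, i.e. two floating-point operations, per core per cycle), and memory bandwidth is the per-pin data rate × bus width. A back-of-the-envelope check in Python (the 2.43 Gbit/s effective per-pin HBM2 rate is an assumption consistent with the 1555 GB/s figure quoted below; the table rounds it to 2.4 Gbit/s):

```python
# Peak FP32 throughput: cores x clock x 2 ops per FMA
cuda_cores = 6912
boost_clock_hz = 1.41e9                  # 1410 MHz boost clock
peak_fp32_tflops = cuda_cores * boost_clock_hz * 2 / 1e12
print(f"{peak_fp32_tflops:.1f} TFLOPS")  # → 19.5 TFLOPS

# Memory bandwidth: per-pin rate x bus width (bits) / 8 bits per byte
pin_rate_bps = 2.43e9                    # assumed effective HBM2 rate per pin
bus_width_bits = 5120
bandwidth_gbs = pin_rate_bps * bus_width_bits / 8 / 1e9
print(f"{bandwidth_gbs:.0f} GB/s")       # → 1555 GB/s
```

The same two formulas reproduce the V100 and P100 rows of the comparison table below from their clocks, core counts, and bus widths.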
Comparison of accelerators used in DGX:[5][11]
Accelerator | Architecture | FP32 CUDA Cores | FP64 Cores (excl. Tensor) | INT32 Cores | Boost Clock | Memory Clock | Memory Bus Width | Memory Bandwidth | VRAM | Single Precision | Double Precision (FP64) | INT8 (non-Tensor) | INT8 Tensor | INT32 | FP16 | FP16 Tensor | bfloat16 Tensor | TensorFloat-32 (TF32) Tensor | FP64 Tensor | Interconnect | GPU | L1 Cache Size | L2 Cache Size | GPU Die Size | Transistor Count | TDP | Manufacturing Process |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A100 | Ampere | 6912 | 3456 | 6912 | 1410 MHz | 2.4 Gbit/s HBM2 | 5120-bit | 1555 GB/s | 40 GB | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/s | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 826 mm² | 54.2B | 400 W | TSMC 7 nm N7[citation needed] |
V100 | Volta | 5120 | 2560 | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/s | 16 GB/32 GB | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/s | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 815 mm² | 21.1B | 300 W/350 W | TSMC 12 nm FFN[citation needed] |
P100 | Pascal | 3584 | 1792 | N/A | 1480 MHz | 1.4 Gbit/s HBM2 | 4096-bit | 720 GB/s | 16 GB | 10.6 TFLOPS | 5.3 TFLOPS | N/A | N/A | N/A | 21.2 TFLOPS | N/A | N/A | N/A | N/A | 160 GB/s | GP100 | 1344 KB (24 KB × 56) | 4096 KB | 610 mm² | 15.3B | 300 W | TSMC 16 nm FinFET+ |
References
- ^ "NVIDIA's New Ampere Data Center GPU in Full Production". NVIDIA Newsroom. May 14, 2020. https://nvidianews.nvidia.com/news/nvidias-new-ampere-data-center-gpu-in-full-production
- ^ "NVIDIA Ampere Architecture In-Depth". NVIDIA Developer Blog. May 14, 2020.
- ^ "NVIDIA Delivers Greatest-Ever Generational Leap in Performance with GeForce RTX 30 Series GPUs". NVIDIA Newsroom. September 1, 2020. https://nvidianews.nvidia.com/news/nvidia-delivers-greatest-ever-generational-leap-in-performance-with-geforce-rtx-30-series-gpus
- ^ "NVIDIA GeForce Ultimate Countdown". NVIDIA.
- ^ a b c d e f Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
- ^ "GeForce RTX 30 Series GPUs: Ushering In A New Era of Video Content With AV1 Decode". NVIDIA.
- ^ "NVIDIA Ampere Architecture Whitepaper" (PDF). NVIDIA. https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf
- ^ "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
- ^ "Abstract". docs.nvidia.com.
- ^ a b Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.
- ^ "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.