Kepler (microarchitecture): Difference between revisions


Revision as of 19:00, 28 July 2013

The Nvidia Kepler architecture is the first Nvidia GPU architecture to focus on energy efficiency.

Overview

Where the goal of the previous architecture, Fermi, was to increase raw performance (particularly for compute and tessellation), Nvidia's goal with the Kepler architecture was to increase performance per watt, while still striving for overall performance increases.[1] Nvidia achieved this primarily through the use of a unified clock. Abandoning the shader clock found in its previous GPU designs increases efficiency, even though more cores are required to achieve similar levels of performance. This is not only because the cores are more power efficient (two Kepler cores using about 90% of the power of one Fermi core, according to Nvidia's numbers), but also because the reduction in clock speed delivers a 50% reduction in power consumption in that area.
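The trade-off described above can be checked with some rough arithmetic. The sketch below is illustrative only: it assumes the Fermi shader clock runs at twice the core clock (the "double-pumped" clock), normalises one Fermi core to one unit of power, and uses the 90% figure attributed to Nvidia. The function and variable names are made up for this example.

```python
# Illustrative arithmetic only, not a measurement.
FERMI_SHADER_CLOCK_RATIO = 2.0   # assumption: shader clock = 2x core clock
KEPLER_CLOCK_RATIO = 1.0         # Kepler cores run at the unified core clock


def relative_throughput(cores: int, clock_ratio: float) -> float:
    """Work per unit time, in units of 'one core running at the core clock'."""
    return cores * clock_ratio


# One Fermi core at the shader clock does the same work as two Kepler cores
# at the core clock.
fermi_work = relative_throughput(1, FERMI_SHADER_CLOCK_RATIO)
kepler_work = relative_throughput(2, KEPLER_CLOCK_RATIO)

fermi_power = 1.0    # one Fermi core, normalised
kepler_power = 0.9   # two Kepler cores, per the figure quoted above

perf_per_watt_gain = (kepler_work / kepler_power) / (fermi_work / fermi_power)
print(f"{perf_per_watt_gain:.2f}x")  # prints "1.11x"
```

Under these assumptions, the same amount of work is done for about 10% less power, which is where the need for more cores at a lower clock comes from.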

Kepler also introduced a new form of texture handling known as bindless textures. Previously, textures needed to be bound by the CPU to a particular slot in a fixed-size table before the GPU could reference them. This led to two limitations: one was that because the table was fixed in size, there could only be as many textures in use at one time as could fit in this table (128). The second was that the CPU was doing unnecessary work: it had to load each texture, and also bind each texture loaded in memory to a slot in the binding table.[1] With bindless textures, both limitations are removed. The GPU can access any texture loaded into memory, increasing the number of available textures and removing the performance penalty of binding.
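The difference between the two models can be sketched with a toy example. This is not graphics API code: `SlotTable` and `BindlessHeap` are hypothetical classes that only mimic the two access patterns, with the 128-slot limit taken from the text.

```python
class SlotTable:
    """Toy model of a fixed-size binding table (128 slots, as in the text)."""

    def __init__(self, size: int = 128):
        self.slots = [None] * size

    def bind(self, slot: int, texture: str) -> None:
        # CPU-side work: every texture must be bound to a slot before use.
        # Raises IndexError when the slot lies outside the fixed-size table.
        self.slots[slot] = texture

    def sample(self, slot: int) -> str:
        return self.slots[slot]


class BindlessHeap:
    """Toy model of bindless access: any loaded texture is reachable by handle."""

    def __init__(self):
        self._textures = {}
        self._next_handle = 0

    def load(self, texture: str) -> int:
        # Loading returns a handle; there is no separate bind step.
        handle = self._next_handle
        self._textures[handle] = texture
        self._next_handle += 1
        return handle

    def sample(self, handle: int) -> str:
        return self._textures[handle]


table = SlotTable()
for i in range(128):
    table.bind(i, f"tex{i}")      # the table is now full

heap = BindlessHeap()
handles = [heap.load(f"tex{i}") for i in range(1000)]  # no 128-texture cap
```

In the toy model, a 129th `bind` fails because the table has no slot for it, while the handle-based heap keeps growing with the number of loaded textures.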

Finally, with Kepler, Nvidia was able to increase the memory clock to 6 GHz. To accomplish this, Nvidia needed to design an entirely new memory controller and bus. While still shy of the theoretical 7 GHz limit of GDDR5, this is well above the 4 GHz speed of the memory controller for Fermi.
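To see what the higher memory clock means for bandwidth, the figures can be plugged into the usual peak-bandwidth formula (data rate per pin times bus width, divided by 8 bits per byte). The sketch assumes the quoted "GHz" values are effective per-pin data rates in Gbit/s, and the 256-bit bus width is an assumption chosen for illustration and held constant so that only the clock changes; neither the bus width nor the function name comes from the text.

```python
def peak_bandwidth_gb_s(data_rate_gbit_s: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s for a given per-pin data rate and bus width."""
    return data_rate_gbit_s * bus_width_bits / 8


BUS_WIDTH_BITS = 256  # assumption for illustration, not stated in the text

kepler_bw = peak_bandwidth_gb_s(6.0, BUS_WIDTH_BITS)      # 192.0 GB/s
fermi_bw = peak_bandwidth_gb_s(4.0, BUS_WIDTH_BITS)       # 128.0 GB/s
gddr5_limit_bw = peak_bandwidth_gb_s(7.0, BUS_WIDTH_BITS)  # 224.0 GB/s

print(kepler_bw / fermi_bw)  # prints 1.5
```

At equal bus width, the move from 4 GHz to 6 GHz gives 50% more peak bandwidth.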

Features

The GeForce 600 Series contains products from both the older Fermi and newer Kepler generations of Nvidia GPUs. Kepler-based members of the 600 series add the following standard features to the GeForce family:

PCI Express 3.0 interface
DisplayPort 1.2
HDMI 1.4a 4K x 2K video output
PureVideo VP5 hardware video acceleration with up to 4K x 2K H.264 decode support
Hardware H.264 encoding acceleration block (NVENC)
Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)
Next Generation Streaming Multiprocessor (SMX)
Simplified Instruction Scheduler
Bindless Textures
CUDA Compute Capability 3.0
GPU Boost
TXAA Support

Manufactured by TSMC on a 28 nm process

Next Generation Streaming Multiprocessor (SMX)

The Kepler architecture employs a new streaming multiprocessor design called SMX. The SMX is key to Kepler's power efficiency, because the whole GPU runs on a single "core clock" rather than the double-pumped "shader clock". The unified clock improves power efficiency (two Kepler CUDA cores consume about 90% of the power of one Fermi CUDA core), but it also means the SMX needs additional processing units to execute a whole warp per cycle. Kepler also needed to increase raw GPU performance to remain competitive. As a result, the number of CUDA cores per array doubled from 16 to 32, the number of CUDA core arrays from 3 to 6, and the load/store and SFU groups from 1 each to 2 each. The scheduling resources doubled as well: warp schedulers went from 2 to 4, dispatch units from 4 to 8, and the register file doubled to 64K entries. Because doubling the processing units and resources takes up more die space, the PolyMorph Engine was not doubled but enhanced, making it capable of producing a polygon in 2 cycles instead of 4.[3] With Kepler, Nvidia had to work not only on power efficiency but also on area efficiency. Nvidia therefore opted for 8 dedicated FP64 CUDA cores per SMX to save die space while still offering FP64 capability, since the regular Kepler CUDA cores are not FP64-capable. The result of these changes is an increase in graphics performance, with less emphasis on FP64 performance.
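The unit counts in this section can be tallied in a few lines. The sketch below simply records the numbers given in the text; the `Multiprocessor` class and its field names are hypothetical, and the "Fermi" entry is the configuration the text compares against.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Multiprocessor:
    name: str
    cores_per_array: int
    core_arrays: int
    ldst_groups: int
    sfu_groups: int
    warp_schedulers: int
    dispatch_units: int

    @property
    def cuda_cores(self) -> int:
        return self.cores_per_array * self.core_arrays


# Figures as stated in the text above.
fermi_sm = Multiprocessor("Fermi SM", 16, 3, 1, 1, 2, 4)
kepler_smx = Multiprocessor("Kepler SMX", 32, 6, 2, 2, 4, 8)

FP64_UNITS_PER_SMX = 8  # dedicated FP64 cores per SMX, per the text

print(fermi_sm.cuda_cores)    # prints 48
print(kepler_smx.cuda_cores)  # prints 192

# Ratio of dedicated FP64 units to regular CUDA cores in one SMX.
fp64_ratio = Fraction(FP64_UNITS_PER_SMX, kepler_smx.cuda_cores)
print(fp64_ratio)             # prints 1/24
```

The 1/24 ratio makes concrete how little of the SMX is devoted to FP64 work.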

A New Instruction Scheduler

Additional die area was freed by replacing the complex hardware scheduler with a simpler one that relies on software scheduling. Warp scheduling was moved to Nvidia's compiler, and because the GPU math pipeline now has a fixed latency, the compiler can exploit instruction-level parallelism and superscalar execution in addition to thread-level parallelism. As instructions are statically scheduled, hardware scheduling inside a warp becomes redundant, since the latency of the math pipeline is already known. This resulted in savings in die area and improved power efficiency.
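The idea that a known, fixed latency lets scheduling be done ahead of time can be sketched with a toy list scheduler. This is not Nvidia's compiler algorithm: the function, the instruction names, the latency of 4 cycles and the issue width of 2 are all assumptions made for illustration. The sketch also assumes the dependency graph is acyclic and the latency is at least 1.

```python
def static_schedule(program, latency, issue_width):
    """Assign an issue cycle to each instruction before execution.

    program: dict mapping instruction name -> list of names it depends on,
             in program order. Dependencies must be acyclic.
    latency: fixed number of cycles until a result is available.
    issue_width: maximum instructions issued per cycle (superscalar issue).
    """
    issue_cycle = {}
    pending = list(program)
    cycle = 0
    while pending:
        issued = []
        for name in pending:
            if len(issued) == issue_width:
                break
            # Ready only if every dependency was issued at least `latency` cycles ago.
            ready = all(
                dep in issue_cycle and issue_cycle[dep] + latency <= cycle
                for dep in program[name]
            )
            if ready:
                issued.append(name)
        for name in issued:
            issue_cycle[name] = cycle
            pending.remove(name)
        cycle += 1
    return issue_cycle


program = {"a": [], "b": [], "c": ["a", "b"], "d": [], "e": ["c", "d"]}
schedule = static_schedule(program, latency=4, issue_width=2)
print(schedule)  # prints {'a': 0, 'b': 0, 'd': 1, 'c': 4, 'e': 8}
```

Because the latency is fixed, the whole schedule is known before the program runs: independent instructions (`a` and `b`) issue together, and `d` fills a cycle in which `c` is still waiting for its inputs.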

GPU Boost

GPU Boost is a new feature which is roughly analogous to turbo boosting of a CPU. The GPU is always guaranteed to run at a minimum clock speed, referred to as the "base clock". This clock speed is set to the level which will ensure that the GPU stays within TDP specifications, even at maximum loads. When loads are lower, however, there is room for the clock speed to be increased without exceeding the TDP. In these scenarios, GPU Boost will gradually increase the clock speed in steps, until the GPU reaches a predefined power target (which is 170W by default). By taking this approach, the GPU will ramp its clock up or down dynamically, so that it is providing the maximum amount of speed possible while remaining within TDP specifications.

The power target, as well as the size of the clock increase steps that the GPU will take, are both adjustable via third-party utilities and provide a means of overclocking Kepler-based cards.
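The stepping behaviour can be sketched as a simple loop. Only the 170 W power target comes from the text; the linear power model, the 1000 MHz base clock, the 13 MHz step, the 1300 MHz cap and all names below are hypothetical values chosen so that full load at the base clock sits exactly at the power target.

```python
WATTS_PER_MHZ_AT_FULL_LOAD = 0.17  # hypothetical: 1000 MHz at full load = 170 W


def boosted_clock(load, base_mhz=1000, step_mhz=13,
                  power_target_w=170.0, max_mhz=1300):
    """Raise the clock in fixed steps while the modelled power stays within target.

    load: fraction of maximum load, between 0 and 1 (toy linear power model).
    """
    def power_draw(clock_mhz):
        return load * WATTS_PER_MHZ_AT_FULL_LOAD * clock_mhz

    clock = base_mhz  # the GPU never runs below the base clock
    while (clock + step_mhz <= max_mhz
           and power_draw(clock + step_mhz) <= power_target_w):
        clock += step_mhz
    return clock


print(boosted_clock(load=1.0))  # prints 1000 (no headroom at maximum load)
print(boosted_clock(load=0.9))  # prints 1104
print(boosted_clock(load=0.5))  # prints 1299 (limited by the clock cap)
```

Raising `power_target_w` or changing `step_mhz` in this toy model corresponds to the two adjustments that third-party utilities expose.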

Microsoft Direct3D Support

Nvidia Fermi and Kepler GPUs of the GeForce 600 series support the Direct3D 11.0 specification. Nvidia originally stated that the Kepler architecture has full DirectX 11.1 support, which includes the Direct3D 11.1 path. The following "Modern UI" Direct3D 11.1 features, however, are not supported:

Target-Independent Rasterization (2D rendering only).
16xMSAA Rasterization (2D rendering only).
Orthogonal Line Rendering Mode.
UAV (Unordered Access View) in non-pixel-shader stages.

According to Microsoft's definition, Direct3D Feature Level 11_1 must be complete; otherwise the Direct3D 11.1 path cannot be executed.[8] The integrated Direct3D features of the Kepler architecture are the same as those of the GeForce 400 series Fermi architecture.
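The all-or-nothing rule can be expressed as a set comparison. The sketch below is a simplification: `LISTED_11_1_FEATURES` contains only the four features named above, not the full Direct3D 11.1 requirement set, and the function name is hypothetical.

```python
# Only the four features listed above; the real 11_1 requirement set is larger.
LISTED_11_1_FEATURES = {
    "Target-Independent Rasterization",
    "16xMSAA Rasterization",
    "Orthogonal Line Rendering Mode",
    "UAV in non-pixel-shader stages",
}


def can_use_11_1_path(supported_features):
    """The 11.1 path is usable only if every required feature is supported."""
    return LISTED_11_1_FEATURES <= set(supported_features)


kepler_supported = set()  # per the text, none of the four is supported
missing = LISTED_11_1_FEATURES - kepler_supported

print(can_use_11_1_path(kepler_supported))  # prints False
print(len(missing))                         # prints 4
```

A single missing feature is enough to make the check fail, which is why partial support does not qualify for Feature Level 11_1.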

TXAA Support

Exclusive to Kepler GPUs, TXAA is a new anti-aliasing method from Nvidia that is designed for direct implementation into game engines. TXAA is based on the MSAA technique and custom resolve filters. It is designed to address a key problem in games known as shimmering or temporal aliasing. TXAA addresses this by smoothing out the scene in motion, so that in-game scenes are cleared of aliasing and shimmering.

NVENC

NVENC is Nvidia's power-efficient fixed-function encoder that can take encoded input, decode it, preprocess it, and encode H.264-based content. NVENC's output format is limited to H.264. Even with this limited format, NVENC can support encoding at resolutions of up to 4096x4096.

Like Intel’s Quick Sync, NVENC is currently exposed through a proprietary API, though Nvidia does have plans to provide NVENC usage through CUDA.

@@ Line 3: / Line 3: @@
 <!--- Write your article below this line --->
 Nvidia Kepler Architecture is the first Nvidia GPU architecture to focuse on energy efficiency.
 '''Overview'''
@@ Line 15: / Line 17: @@
 The GeForce 600 Series contains products from both the older Fermi and newer Kepler generations of Nvidia GPUs. Kepler based members of the 600 series add the following standard features to the GeForce family:
+* PCI Express 3.0 interface
+* DisplayPort 1.2
+* HDMI 1.4a 4K x 2K video output
+* Purevideo VP5 hardware video acceleration with up to 4K x 2K H.264 decode support
+* Hardware H.264 encoding acceleration block (NVENC)
+* Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)
+* Next Generation Streaming Multiprocessor (SMX)
+* Simplified Instruction Scheduler
+* Bindless Textures
+* CUDA Compute Capability 3.0
+* GPU Boost
+* TXAA Support
+* Manufactured by TSMC on a 28 nm process
-PCI Express 3.0 interface
-DisplayPort 1.2
-HDMI 1.4a 4K x 2K video output
-Purevideo VP5 hardware video acceleration with up to 4K x 2K H.264 decode support
-Hardware H.264 encoding acceleration block (NVENC)
-Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)
-Next Generation Streaming Multiprocessor (SMX)
-Simplified Instruction Scheduler
-Bindless Textures
-CUDA Compute Capability 3.0
-GPU Boost
-TXAA Support
-Manufactured by TSMC on a 28 nm process
@@ Line 48: / Line 50: @@
 Nvidia Fermi and Kepler GPUs of the GeForce 600 series support the Direct3D 11.0 specification. Nvidia originally stated that the Kepler architecture has full DirectX 11.1 support, which includes the Direct3D 11.1 path. The following " Modern UI " Direct3D 11.1 features, however, are not supported:
+* Target-Independent Rasterization (2D rendering only).
+* 16xMSAA Rasterization (2D rendering only).
+* Orthogonal Line Rendering Mode.
+* UAV (Unordered Access View) in non-pixel-shader stages.
-Target-Independent Rasterization (2D rendering only).
-xMSAA Rasterization (2D rendering only).
-Orthogonal Line Rendering Mode.
-UAV (Unordered Access View) in non-pixel-shader stages.
 According to the definition by Microsoft, Direct3D Feature Level 11_1 must be complete, otherwise the Direct3D 11.1 path can not be executed.[8] ↵The integrated Direct3D features of the Kepler architecture are the same as those of the GeForce 400 series Fermi architecture.
@@ Line 73: / Line 75: @@
 *
 *
 <!--- STOP! Be warned that by using this process instead of Articles for Creation, this article is subject to scrutiny. As an article in "mainspace", it will be DELETED if there are problems, not just declined. If you wish to use AfC, please return to the Wizard and continue from there. --->


References