SSE5: Difference between revisions
SSE4 is not SSSE3 |
PetraMagna (talk | contribs) m Reverted 1 edit by 35.151.18.6 (talk) to last revision by 2A0D:6FC0:9FC:3000:D7D4:D2C5:91D:7284 |
(104 intermediate revisions by 72 users not shown) | |||
The '''SSE5''' (short for '''Streaming SIMD Extensions version 5''') was a [[SIMD]] instruction set extension proposed by [[Advanced Micro Devices|AMD]] on August 30, 2007 as a supplement to the 128-bit [[Streaming SIMD Extensions|SSE]] core instructions in the [[AMD64]] architecture. |
{{future chip}} |
AMD chose not to implement SSE5 as originally proposed. In May 2009, AMD replaced SSE5 with three smaller instruction set extensions named [[XOP instruction set|XOP]], [[FMA instruction set|FMA4]], and [[F16C]], which retain the proposed functionality of SSE5 but encode the instructions differently for better compatibility with Intel's proposed [[Advanced Vector Extensions|AVX]] instruction set.
The '''SSE5''' (short for '''Streaming [[SIMD]] Extensions 5'''), announced on August 30, 2007, is a new 128-bit extension to the [[AMD64]] instruction set (itself a 64-bit extension to the 32-bit Intel x86 instruction set) for the [[Bulldozer (processor)|AMD Bulldozer]] processor, due to begin production in 2009. |
The three SSE5-derived instruction sets were introduced in the [[Bulldozer (microarchitecture)|Bulldozer]] processor core, released in October 2011 on a [[32 nanometer|32 nm]] process.<ref>{{cite web | url=https://arstechnica.com/news.ars/post/20081114-amd-fusion-now-pushed-back-to-2011.html | title=AMD Fusion now pushed back to 2011 | date=November 14, 2008 | first=Joel | last=Hruska | publisher=[[Ars Technica]]}}</ref> |
==Compatibility==
AMD's SSE5 extension bundle does not include the full set of [[Intel]]'s [[SSE4]] instructions, making it a competitor to SSE4 rather than a successor.
==SSE5 enhancements== |
The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which were designed to improve single-threaded performance. Some SSE5 instructions were [[instruction set#Code density|3-operand instructions]], the use of which was expected to increase the average number of [[instructions per cycle]] achievable by [[x86]] code.<ref name="Reg1">{{cite web | url=https://www.theregister.co.uk/2007/08/30/amd_sse5/ | title=AMD plots single thread boost with x86 extensions | date=August 30, 2007 | first=Ashlee | last=Vance |author-link=Ashlee Vance | publisher=[[The Register]]}}</ref> Selected new instructions include:<ref>{{cite web | url=http://developer.amd.com/SSE5 | title=128-Bit SSE5 Instruction Set | publisher=[[Advanced Micro Devices|AMD]] Developer Central | access-date=January 28, 2008 |archive-url = https://web.archive.org/web/20080115163416/http://developer.amd.com/SSE5 <!-- Bot retrieved archive --> |archive-date = January 15, 2008}}</ref>
* Fused multiply–accumulate (FMACxx) instructions
* Integer multiply–accumulate (IMAC, IMADC) instructions
* Permutation (PPERM, PERMPx) and conditional move (PCMOV) instructions
* Precision control, rounding, and conversion instructions
AMD claimed SSE5 would provide dramatic performance improvements, particularly in [[high-performance computing]] (HPC), [[multimedia]], and [[computer security]] applications, including a 5x performance gain for [[Advanced Encryption Standard|AES encryption]] and a 30% performance gain for the [[discrete cosine transform]] (DCT) used for example in video processing.<ref name="Reg1"/> |
==2009 revision== |
The SSE5 specification included a proposed extension to the general coding scheme of [[x86]] instructions in order to allow instructions to have more than two operands. In 2008, [[Intel]] announced their planned [[Advanced Vector Extensions|AVX]] instruction set which proposed a different way of coding instructions with more than two operands. The two proposed coding schemes, SSE5 and AVX, are mutually incompatible, although the AVX scheme has certain advantages over the SSE5 scheme: most importantly, AVX has plenty of space for future extensions, including larger vector sizes. |
In May 2009, AMD published a revised specification for the planned future instructions. This revision changes the coding scheme to make it compatible with the AVX scheme, but with a differing prefix byte in order to avoid overlap between instructions introduced by AMD and instructions introduced by Intel. |
* [[Streaming SIMD Extensions|SSE]], [[SSE2]], [[SSE3]], [[SSSE3]], [[SSE4]], [[SSE4a]] |
The revised instruction set no longer carries the name SSE5, which had been criticized as misleading, but most of the instructions in the new revision are functionally identical to those in the original SSE5 specification; only the way the instructions are coded differs. The planned additions to the AMD instruction set consist of three subsets:
* [[SIMD]] |
* [[3DNow!]] Professional |
#[[XOP instruction set|XOP]]: Integer vector [[multiply–accumulate]] instructions, integer vector horizontal addition, integer vector compare, shift and rotate instructions, byte permutation and conditional move instructions, floating point fraction extraction.
#[[FMA instruction set|FMA4]]: Floating-point vector [[multiply–accumulate]].
#[[F16C]]: [[Half-precision]] floating-point conversion.
Both XOP and FMA4 are removed in newer AMD processors using the [[Zen (microarchitecture)|Zen microarchitecture]].<ref>{{cite web|url=http://www.phoronix.com/scan.php?page=article&item=amd-ryzen-znver1&num=1|title=The Impact Of GCC Zen Compiler Tuning On AMD Ryzen Performance|author=Michael Larabel|date=March 3, 2017|website=[[Phoronix]]|quote=But with Zen being a clean-sheet design, there are some instruction set extensions found in Bulldozer processors not found in Zen/znver1. Those no longer present include FMA4 and XOP.}}</ref> |
* [[x86 instruction listings]] |
* [[Fused multiply–add]] |
==References==
* [http://support.amd.com/TechDocs/43479.pdf AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions] |
*[https://community.amd.com/thread/98392 AMD and Intel incompatible - What to do? AMD Developer Forums] |
{{Use mdy dates|date=October 2018}} |
{{AMD technology}} |
{{Multimedia extensions|state=uncollapsed}} |
* [http://www.dailytech.com/article.aspx?newsid=8666 AMD Announces SSE5 Instruction Set], DailyTech, August 30, 2007, accessed August 30, 2007. |
[[Category:X86 instructions]] |
[[Category:SIMD computing]] |
[[Category:AMD technologies]] |
Latest revision as of 11:38, 7 November 2024
SSE5 (short for Streaming SIMD Extensions version 5) was a SIMD instruction set extension proposed by AMD on August 30, 2007, as a supplement to the 128-bit SSE core instructions in the AMD64 architecture.
AMD chose not to implement SSE5 as originally proposed. In May 2009, AMD replaced SSE5 with three smaller instruction set extensions named XOP, FMA4, and F16C, which retain the proposed functionality of SSE5 but encode the instructions differently for better compatibility with Intel's proposed AVX instruction set.
The three SSE5-derived instruction sets were introduced in the Bulldozer processor core, released in October 2011 on a 32 nm process.[1]
Compatibility
AMD's SSE5 extension bundle does not include the full set of Intel's SSE4 instructions, making it a competitor to SSE4 rather than a successor.
SSE5 enhancements
The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which were designed to improve single-threaded performance. Some SSE5 instructions were 3-operand instructions, the use of which was expected to increase the average number of instructions per cycle achievable by x86 code.[2] Selected new instructions include:[3]
- Fused multiply–accumulate (FMACxx) instructions
- Integer multiply–accumulate (IMAC, IMADC) instructions
- Permutation (PPERM, PERMPx) and conditional move (PCMOV) instructions
- Precision control, rounding, and conversion instructions
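As a rough illustration of two of the instruction families listed above, the sketch below models their arithmetic in plain Python rather than with real SIMD instructions. The helper names (fused_multiply_add, unfused_multiply_add, pcmov) are hypothetical, and the bitwise-select behavior shown for PCMOV is an assumption based on how the later XOP conditional move is usually described, not a transcription of the SSE5 specification.

```python
from fractions import Fraction

MASK128 = (1 << 128) - 1  # SSE5 was proposed for 128-bit registers


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Hypothetical helper: a*b + c with one rounding at the very end."""
    # Fraction holds each finite double exactly, so the only rounding
    # happens in the final conversion back to float.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


def pcmov(src1: int, src2: int, selector: int) -> int:
    """Assumed PCMOV model: take src1 bits where selector is 1, else src2 bits."""
    return ((src1 & selector) | (src2 & ~selector)) & MASK128


# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)      # keeps the -2**-60 term
unfused = unfused_multiply_add(a, b, c)  # loses it when the product is rounded
selected = pcmov(0xFFFF0000, 0x0000FFFF, 0xFF00FF00)

print(fused, unfused, hex(selected))
```

The fused result retains a small term that the two-step computation discards, which is the accuracy argument usually made for fused multiply–accumulate; any speed benefit comes from the hardware and is not modelled here.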
AMD claimed SSE5 would provide dramatic performance improvements, particularly in high-performance computing (HPC), multimedia, and computer security applications, including a 5x performance gain for AES encryption and a 30% performance gain for the discrete cosine transform (DCT), which is used, for example, in video processing.[2]
2009 revision
The SSE5 specification included a proposed extension to the general coding scheme of x86 instructions in order to allow instructions to have more than two operands. In 2008, Intel announced their planned AVX instruction set which proposed a different way of coding instructions with more than two operands. The two proposed coding schemes, SSE5 and AVX, are mutually incompatible, although the AVX scheme has certain advantages over the SSE5 scheme: most importantly, AVX has plenty of space for future extensions, including larger vector sizes.
In May 2009, AMD published a revised specification for the planned future instructions. This revision changes the coding scheme to make it compatible with the AVX scheme, but with a differing prefix byte in order to avoid overlap between instructions introduced by AMD and instructions introduced by Intel.
The revised instruction set no longer carries the name SSE5, which had been criticized as misleading, but most of the instructions in the new revision are functionally identical to those in the original SSE5 specification; only the way the instructions are coded differs. The planned additions to the AMD instruction set consist of three subsets:
- XOP: Integer vector multiply–accumulate instructions, integer vector horizontal addition, integer vector compare, shift and rotate instructions, byte permutation and conditional move instructions, floating point fraction extraction.
- FMA4: Floating-point vector multiply–accumulate.
- F16C: Half-precision floating-point conversion.
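To make the half-precision conversion concrete, here is a minimal sketch that uses Python's struct module (format character "e", IEEE 754 binary16) to mimic what a float-to-half and half-to-float conversion does to a value. The function names are hypothetical, and the sketch does not use F16C itself; the hardware instructions convert several values per operation and let the caller choose a rounding mode, neither of which is modelled here.

```python
import struct


def float_to_half_bits(x: float) -> int:
    """Hypothetical helper: round x to binary16 and return the 16-bit pattern."""
    # struct rounds to nearest-even and raises OverflowError for values
    # too large for binary16, unlike hardware, which would saturate to infinity.
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_bits_to_float(bits: int) -> float:
    """Hypothetical helper: widen a 16-bit binary16 pattern back to a float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


one_bits = float_to_half_bits(1.0)
tenth_bits = float_to_half_bits(0.1)
tenth_back = half_bits_to_float(tenth_bits)

print(hex(one_bits), hex(tenth_bits), tenth_back)
```

The round trip of 0.1 shows the precision lost when narrowing to 16 bits: only 10 fraction bits survive, so the value that comes back is close to, but not equal to, the original.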
Both XOP and FMA4 were removed in newer AMD processors that use the Zen microarchitecture.[4]
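Because support for these extensions varies by processor, software normally checks for them at run time before using them. The sketch below decodes the relevant CPUID feature bits from register values supplied by the caller; it does not execute CPUID itself. The function name and the sample register values are hypothetical, and the bit positions (XOP and FMA4 in ECX of CPUID function 8000_0001h, F16C in ECX of function 0000_0001h) are taken from memory of AMD's CPUID documentation and should be checked against it.

```python
XOP_BIT = 11   # ECX of CPUID Fn8000_0001 (assumed from AMD's CPUID documentation)
FMA4_BIT = 16  # ECX of CPUID Fn8000_0001 (assumed from AMD's CPUID documentation)
F16C_BIT = 29  # ECX of CPUID Fn0000_0001 (assumed from AMD's CPUID documentation)


def decode_sse5_successors(leaf1_ecx: int, ext_leaf1_ecx: int) -> dict:
    """Hypothetical helper: report which SSE5-derived extensions are flagged."""
    return {
        "xop": bool((ext_leaf1_ecx >> XOP_BIT) & 1),
        "fma4": bool((ext_leaf1_ecx >> FMA4_BIT) & 1),
        "f16c": bool((leaf1_ecx >> F16C_BIT) & 1),
    }


# Made-up register values: one with all three bits set, one with only F16C.
all_three = decode_sse5_successors(1 << F16C_BIT, (1 << XOP_BIT) | (1 << FMA4_BIT))
f16c_only = decode_sse5_successors(1 << F16C_BIT, 0)

print(all_three)
print(f16c_only)
```

Keeping the decoding separate from the CPUID call makes the logic easy to test with synthetic values; on a real system the two ECX values would come from a CPUID wrapper or from the operating system's feature flags.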
See also
References
[1] Hruska, Joel (November 14, 2008). "AMD Fusion now pushed back to 2011". Ars Technica.
[2] Vance, Ashlee (August 30, 2007). "AMD plots single thread boost with x86 extensions". The Register.
[3] "128-Bit SSE5 Instruction Set". AMD Developer Central. Archived from the original on January 15, 2008. Retrieved January 28, 2008.
[4] Larabel, Michael (March 3, 2017). "The Impact Of GCC Zen Compiler Tuning On AMD Ryzen Performance". Phoronix. Quote: "But with Zen being a clean-sheet design, there are some instruction set extensions found in Bulldozer processors not found in Zen/znver1. Those no longer present include FMA4 and XOP."
External links
- A New SSE Instruction Set: AMD Announces SSE5, AnandTech
- AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions
- AMD and Intel incompatible - What to do? AMD Developer Forums