Advanced Matrix Extensions: Difference between revisions

Latest revision as of 00:14, 17 December 2024

Advanced Matrix Extensions (AMX), also known as Intel Advanced Matrix Extensions (Intel AMX), are extensions to the x86 instruction set architecture (ISA) for microprocessors from Intel. They were originally designed to operate on matrices in order to accelerate artificial intelligence (AI) and machine learning (ML) workloads.^[1] They can also be applied to other problems, including optimal routing, graph problems and optimisation.^[2]

Extensions

AMX was introduced by Intel in June 2020. It was first supported by the Sapphire Rapids microarchitecture for Xeon servers, released in January 2023.^[3]^[4] It introduced two-dimensional registers called tiles, on which accelerators can perform operations. It is intended as an extensible architecture; the first accelerator implemented is the tile matrix multiply unit (TMUL).^[5]^[6]
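
Before using tiles, software describes their shapes in a 64-byte descriptor that the LDTILECFG instruction reads. The sketch below models that descriptor for palette 1 (8 tiles of up to 16 rows × 64 bytes). The field layout follows Intel's published programming reference as we understand it and should be checked against that document before relying on it. The names `tile_config`, `tile_config_init` and `tile_config_set` are hypothetical illustrations, not an Intel API. The code only builds the descriptor in memory and executes no AMX instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the 64-byte LDTILECFG memory operand (palette 1). */
struct tile_config {
    uint8_t  palette_id;   /* 1 selects the 8-tile, 16 x 64-byte palette */
    uint8_t  start_row;    /* restart point for interrupted tile loads/stores; normally 0 */
    uint8_t  reserved[14]; /* must be zero */
    uint16_t colsb[16];    /* bytes per row for tiles 0..15 (palette 1 uses 0..7) */
    uint8_t  rows[16];     /* number of rows for tiles 0..15 (palette 1 uses 0..7) */
};

_Static_assert(sizeof(struct tile_config) == 64, "descriptor must be 64 bytes");

/* Palette-1 limits taken from the article text. */
enum { AMX_NUM_TILES = 8, AMX_MAX_ROWS = 16, AMX_MAX_COLSB = 64 };

/* Zero the whole descriptor (reserved bytes included) and select palette 1. */
static void tile_config_init(struct tile_config *cfg)
{
    memset(cfg, 0, sizeof *cfg);
    cfg->palette_id = 1;
}

/* Record the shape of one tile; returns 0 on success, -1 if out of range. */
static int tile_config_set(struct tile_config *cfg, int tile, int rows, int colsb)
{
    if (tile < 0 || tile >= AMX_NUM_TILES) return -1;
    if (rows < 1 || rows > AMX_MAX_ROWS) return -1;
    if (colsb < 1 || colsb > AMX_MAX_COLSB) return -1;
    cfg->rows[tile] = (uint8_t)rows;
    cfg->colsb[tile] = (uint16_t)colsb;
    return 0;
}
```

On real hardware, a descriptor built this way would then be passed to LDTILECFG, for example through a compiler intrinsic. That step is deliberately left out here so that the sketch runs on any machine.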

Revision 46 of Intel Architecture Instruction Set Extensions and Future Features, published in September 2022, documented a new extension, AMX-FP16, which adds support for half-precision floating-point numbers. Revision 48, from March 2023, documented AMX-COMPLEX, which adds support for half-precision floating-point complex numbers. Both extensions are available in the Granite Rapids family of server processors, although AMX-COMPLEX is available only in Granite Rapids-D.^[7]

Tile matrix multiply unit

The TMUL unit supports BF16 and INT8 input types.^[8] AMX-FP16 and AMX-COMPLEX add support for real and complex FP16 numbers, respectively. The register file consists of 8 tiles, each with 16 rows of 64 bytes (32 BF16/FP16 or 64 INT8 elements per row, 1 KiB per tile). The only supported operation is matrix multiplication with accumulation, $C_{nm} \mathrel{+}= \sum_{j=1}^{J} A_{nj} B_{jm}$.^[5]
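
The accumulation formula can be checked against a plain scalar model. The sketch below is a hypothetical reference routine (`tile_dp_int8_ref` is not an Intel API) for the INT8 case. It uses signed 8-bit inputs, 32-bit accumulators and the same indices as the formula ($C_{nm}$, $A_{nj}$, $B_{jm}$), and rejects shapes that exceed the tile limits given above. It is neither cycle-accurate nor layout-accurate. As we understand Intel's documentation, instructions such as TDPBSSD expect $B$ in a packed layout, with four consecutive $j$ values of one column in each 32-bit element. This model uses ordinary row-major arrays instead.

```c
#include <stdint.h>

/* Tile limits from the text: 16 rows of 64 bytes. */
enum { TILE_ROWS = 16, TILE_COLSB = 64 };

/*
 * Scalar model of C[n][m] += sum_j A[n][j] * B[j][m] for an
 * (N x J) INT8 matrix A, a (J x M) INT8 matrix B and an (N x M)
 * INT32 matrix C, all stored row-major. Returns -1 if a shape would
 * not fit the tile limits, otherwise 0.
 */
int tile_dp_int8_ref(int N, int J, int M,
                     const int8_t *A, const int8_t *B, int32_t *C)
{
    if (N < 1 || N > TILE_ROWS) return -1;      /* rows of A and C */
    if (J < 1 || J > TILE_COLSB) return -1;     /* one A row holds J bytes */
    if (M < 1 || M * 4 > TILE_COLSB) return -1; /* one C row holds M int32 values */

    for (int n = 0; n < N; n++) {
        for (int m = 0; m < M; m++) {
            int32_t acc = C[n * M + m];         /* accumulate into existing C */
            for (int j = 0; j < J; j++)
                acc += (int32_t)A[n * J + j] * (int32_t)B[j * M + m];
            C[n * M + m] = acc;
        }
    }
    return 0;
}
```

With the maximal INT8 shape ($N = M = 16$, $J = 64$), the inner statement executes $16 \cdot 16 \cdot 64$ times, one multiply and one add each, which is the operation count discussed below.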

A 4th Gen Intel Xeon Scalable processor can perform 2048 INT8 or 1024 BF16 operations per cycle:^[9]^[10] the maximal input sizes are $16 \times J$ for $A$ and $J \times 16$ for $B$, where $J$ is 64 for INT8 and 32 for BF16. The matrix multiplication requires $256J$ multiplications and $256J$ additions, thus performing $512J$ operations in 16 cycles, or $32J$ operations per cycle.^[10]
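
The per-cycle figures follow from a short count. The worked version below uses only the quantities stated above and, like the text, counts each multiplication and each addition as one operation:

$$
\begin{aligned}
\text{multiplications} &= 16 \cdot 16 \cdot J = 256J,\\
\text{additions} &= 256J,\\
\text{operations per cycle} &= \frac{256J + 256J}{16} = 32J,\\
J = 64\ (\text{INT8}) &\;\Rightarrow\; 32 \cdot 64 = 2048,\\
J = 32\ (\text{BF16}) &\;\Rightarrow\; 32 \cdot 32 = 1024.
\end{aligned}
$$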

Software support

- Compiler and assembler support
  - LLVM 13^[11]^[12]^[13]^[14]
  - GCC 11^[15]^[16]^[17]
  - GNU Assembler (GAS): initial support committed on 25 June 2020^[18]^[14]
- Operating system support
  - glibc: support for detecting the AMX feature in CPUs committed on 25 June 2020^[19]
  - Linux kernel: supported since version 5.16^[20]
  - VMware vSphere: AMX support in virtual machines released in ESXi 8.0u1 for VMs using hardware version 20^[21]
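
Before a program executes AMX instructions on Linux, it typically checks CPUID for the feature and then asks the kernel for permission to use the large tile-data register state. That permission request (`arch_prctl` with `ARCH_REQ_XCOMP_PERM`) arrived in Linux 5.16 alongside AMX support. The sketch below makes several assumptions. It targets x86-64 Linux with GCC or Clang (`<cpuid.h>`). The CPUID bit position (leaf 7, subleaf 0, EDX bit 24 for AMX-TILE) and the constants `ARCH_REQ_XCOMP_PERM` (0x1023) and `XFEATURE_XTILEDATA` (18) are taken from Intel's and the kernel's documentation as we understand them. Those constants are defined locally because older libc headers may not provide them. The function names are hypothetical.

```c
#define _GNU_SOURCE /* declares syscall() in <unistd.h> under glibc */

#if defined(__linux__) && defined(__x86_64__)
#include <cpuid.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ARCH_REQ_XCOMP_PERM 0x1023 /* assumed value, see kernel xstate docs */
#define XFEATURE_XTILEDATA  18     /* assumed value, see kernel xstate docs */

/* Returns 1 if CPUID.(EAX=7,ECX=0):EDX[24] (AMX-TILE) is set, else 0. */
int amx_tile_supported(void)
{
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid_count returns 0 if leaf 7 is not available. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 24) & 1u);
}

/* Returns 1 if the kernel granted this process use of AMX tile data, else 0. */
int amx_request_permission(void)
{
    return syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) == 0;
}
#else
/* Other platforms: report AMX as unavailable. */
int amx_tile_supported(void) { return 0; }
int amx_request_permission(void) { return 0; }
#endif
```

A caller would run AMX code only when both functions return 1. On CPUs without AMX, or on kernels older than 5.16, the permission request is expected to fail and the second function returns 0.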

References

^ Hemsoth, Nicole (August 19, 2021). "With AMX, Intel Adds AI/ML Sparkle to Sapphire Rapids". The Next Platform.
^ Respondek, J.S. (2025). Fast Matrix Multiplication with Applications, Ch. 9. Studies in Big Data. Vol. 166. Springer Cham. ISBN 978-3-031-76929-0.
^ "Intel AMX: Erste Informationen zur Advanced Matrix Extensions Architecture" [Intel AMX: First information on the Advanced Matrix Extensions architecture]. heise online. 28 June 2020.
^ Cutress, Ian. "Intel Xeon Sapphire Rapids: How To Go Monolithic with Tiles". AnandTech.
^ ^a ^b "Intel® Architecture Instruction Set Extensions and Future Features".
^ Schor, David (June 29, 2020). "The x86 Advanced Matrix Extension (AMX) Brings Matrix Operations; To Debut with Sapphire Rapids".
^ Larabel, Michael (July 12, 2023). "Intel Granite Rapids D Support Merged Into GCC 14". Phoronix.
^ "Advanced Matrix Extension (AMX) - x86 - WikiChip". en.wikichip.org.
^ "Accelerate Artificial Intelligence (AI) Workloads with Intel Advanced Matrix Extensions (Intel AMX)" (PDF). Intel. Retrieved 2023-04-13.
^ ^a ^b "Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1". Intel.
^ "What's New in LLVM for 4th Gen Intel® Xeon® & Max Series CPUs". Retrieved 21 April 2023.
^ Larabel, Michael (2020-07-02). "Intel AMX Support Begins Landing In LLVM". Phoronix. Retrieved 2020-07-02.
^ "[X86-64] Support Intel AMX instructions". GitHub. 2020-07-02. Retrieved 2020-07-02.
^ ^a ^b Larabel, Michael (2020-07-02). "Intel AMX Support Lands In The GNU Assembler". Phoronix. Retrieved 2020-07-02.
^ "GCC 11 Release Series — Changes, New Features, and Fixes - GNU Project". Retrieved 21 April 2023.
^ "[PATCH] Enable GCC support for AMX". 2020-07-06. Retrieved 2020-07-09.
^ "Enable GCC support for AMX-TILE, AMX-INT8, AMX-BF16. · gcc-mirror/gcc@5c60984". GitHub. Retrieved 2022-09-05.
^ "commits with Intel AMX". 2020-07-02. Retrieved 2020-07-02.
^ "x86: Detect Intel Advanced Matrix Extensions". 2020-07-02. Retrieved 2020-07-02.
^ "Linux 5.16 Features Include FUTEX2, Intel AMX, Folios, DG2/Alchemist, More Apple Silicon Support". Phoronix.
^ "Accessing Sapphire Rapids AMX instructions on vSphere". Earl C. Ruby III. 2023-08-24.

External links

- Intel Intrinsics Guide
- Wikichip: Advanced Matrix Extension (AMX) - x86


@@ Line 1: / Line 1: @@
+{{Short description|Extensions to the x86 instruction set architecture}}
-{{draft article|subject=cpu}}
+'''Advanced Matrix Extensions''' ('''AMX'''), also known as '''Intel Advanced Matrix Extensions''' ('''Intel AMX'''), are extensions to the [[x86]] [[instruction set architecture]] (ISA) for [[microprocessor]]s from [[Intel]] originally designed to work on [[matrix (mathematics)|matrices]] to accelerate [[artificial intelligence]] (AI) and [[machine learning]] (ML) workloads.<ref>{{Cite web|url=https://www.nextplatform.com/2021/08/19/with-amx-intel-adds-ai-ml-sparkle-to-sapphire-rapids/|title=With AMX, Intel Adds AI/ML Sparkle to Sapphire Rapids|first=Nicole|last=Hemsoth|date=August 19, 2021|website=The Next Platform}}</ref> They can also be applied to optimal [[Vehicle routing problem|routing problems]], graph problems, [[Integer programming|optimisation]] and others.<ref>{{cite book|title=Fast Matrix Multiplication with Applications, Ch. 9 | last = Respondek | first = J.S. | publisher=Springer Cham| series=Studies in Big Data |volume=166| date=2025  |isbn=978-3-031-76929-0 |url=https://link.springer.com/book/9783031769290}}</ref>
-{{short description|Extensions to the x86 instruction set architecture for microprocessors from Intel and AMD}}
-'''Advanced Matrix Extensions''' ('''AMX''', also known as '''Intel® Advanced Matrix Extensions or Intel® AMX''') is an extensions to the [[x86]] [[instruction set architecture]] for [[microprocessor]]s from [[Intel Corporation|Intel]] and [[Advanced Micro Devices|AMD]] which is designed to work on matrices and is meant accelerate AI-related workload, was introduced by Intel in June 2020 and first supported by Intel with the [[Sapphire Rapids]] [[microarchitecture]]. It introduces 2-dimensional registers called "tiles" upon which accelerators can perform operations. It is intended as an extensible architecture, the first accelerator implemented is called TMUL (tile matrix multiply unit) <ref>https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf</ref> <ref>https://fuse.wikichip.org/news/3600/the-x86-advanced-matrix-extension-amx-brings-matrix-operations-to-debut-with-sapphire-rapids/</ref>
-== {{Anchor|AVX1}}Advanced Vector Extensions ==
+==Extensions==
+AMX was introduced by Intel in June 2020 and first supported by Intel with the [[Sapphire Rapids (microprocessor)|Sapphire Rapids]] [[microarchitecture]] for [[Xeon]] servers, released in January 2023.<ref>{{Cite web|url=https://www.heise.de/news/Intel-AMX-Erste-Informationen-zur-Advanced-Matrix-Extensions-Architecture-4797415.html|title=Intel AMX: Erste Informationen zur Advanced Matrix Extensions Architecture|first=heise|last=online|website=heise online|date=28 June 2020 }}</ref><ref>{{Cite web|url=https://www.anandtech.com/show/16921/intel-sapphire-rapids-nextgen-xeon-scalable-gets-a-tiling-upgrade|title=Intel Xeon Sapphire Rapids: How To Go Monolithic with Tiles|first=Ian|last=Cutress|website=[[AnandTech]]}}</ref> It introduced 2-dimensional [[processor register|registers]] called tiles upon which accelerators can perform operations. It is intended as an extensible architecture; the first accelerator implemented is called tile matrix multiply unit (TMUL).<ref name="iaiseaffpr">{{cite web |title=Intel® Architecture Instruction Set Extensions and Future Features |url=https://www.intel.com/content/www/us/en/content-details/790021/intel-architecture-instruction-set-extensions-programming-reference.html }}</ref><ref>{{Cite web|url=https://fuse.wikichip.org/news/3600/the-x86-advanced-matrix-extension-amx-brings-matrix-operations-to-debut-with-sapphire-rapids/|title=The x86 Advanced Matrix Extension (AMX) Brings Matrix Operations; To Debut with Sapphire Rapids|first=David|last=Schor|date=June 29, 2020}}</ref>
-==={{Anchor|TMUL}}TMUL===
-=== Compiler and assembler support ===
-* [[LLVM]] initial support commited at 1 July 2020 <ref>{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Intel-AMX-LLVM-Starts|title=Intel AMX Support Begins Landing In LLVM|last=Larabel |first=Michael|date=2020-07-02|website=www.phoronix.com|language=en-US|access-date=2020-07-02}}</ref> <ref>{{Cite web|url=https://github.com/llvm/llvm-project/commit/aded4f0cc070fcef6763c9a3c2ba764d652b692e|title=[X86-64] Support Intel AMX instructions |date=2020-07-02|language=en-US|access-date=2020-07-02}}</ref>
-* [[GNU Assembler]] (GAS) initial support commited at 25 June 2020 <ref>{{Cite web|url=https://sourceware.org/git/?p=binutils-gdb.git&a=search&st=commit&s=Intel+AMX|title=commits with Intel AMX|date=2020-07-02|language=en-US|access-date=2020-07-02}}</ref>
+In Intel Architecture Instruction Set Extensions and Future Features revision 46, published in September 2022, a new AMX-FP16 extension was documented. This extension adds support for [[Half-precision floating-point format|half-precision floating-point]] numbers. In revision 48 from March 2023, AMX-COMPLEX was documented, adding support for half-precision floating-point [[complex numbers]]. Both extensions are available in the [[Granite Rapids]] set of server processors (with AMX-COMPLEX support only being available in [[Xeon D|Granite Rapids-D]]<ref>{{cite web |url=https://www.phoronix.com/news/Granite-Rapids-D-GCC-14 |title=Intel Granite Rapids D Support Merged Into GCC 14 |first=Michael |last=Larabel |website=[[Phoronix Test Suite#Phoronix website|Phoronix]] |date=July 12, 2023}}</ref>).
-=== CPUs with AMX ===
+==={{Anchor|Tile Matrix multiply Unit}}Tile matrix multiply unit===
-== Applications ==
+TMUL unit supports [[bfloat16 floating-point format|BF16]] and [[INT8]] input types.<ref>{{Cite web|url=https://en.wikichip.org/wiki/x86/amx|title=Advanced Matrix Extension (AMX) - x86 - WikiChip|website=en.wikichip.org}}</ref> AMX-FP16 and AMX-COMPLEX also add support for real and complex [[Half-precision floating-point format|FP16]] numbers. The register file consists of 8 tiles, each with 16 rows of size of 64 bytes (32 BF16/FP16 or 64 INT8 elements). The only supported operation is [[matrix multiplication]] <math display="inline"> C_{nm} += \sum_{j=1}^J A_{nj}B_{jm}.</math><ref name="iaiseaffpr" />
-=== Software ===
-*[[glibc]] support for detecting AMX feature in CPUs commited at 25 Jun 2020 <ref>{{Cite web|url=https://sourceware.org/git/?p=glibc.git;a=commit;h=4fdd4d41a17dda26c854ed935658154a17d4b906|title=x86: Detect Intel Advanced Matrix Extensions|date=2020-07-02|language=en-US|access-date=2020-07-02}}</ref>
+th Gen Intel Xeon Scalable processor can perform 2048 INT8 or 1024 BF16 operations per cycle:<ref>{{cite web |url=https://www.intel.com/content/dam/www/central-libraries/us/en/documents/2022-12/accelerate-ai-with-amx-sb.pdf |title=Accelerate Artificial Intelligence (AI) Workloads with Intel Advanced Matrix Extensions (Intel AMX) |access-date=2023-04-13 |publisher=Intel}}</ref><ref name="iaorm">{{cite web |url=https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html |title=Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1  |publisher=Intel}}</ref> the maximal input sizes are <math display="inline">16 \times J</math> for {{math|''A''}} and <math display="inline">J \times 16</math> for {{math|''B''}}, where {{math|''J''}} is 64 for INT8 and 32 for BF16. The matrix multiplication requires <math display="inline">256J</math> multiplication and <math display="inline">256J</math> additions, thus performing <math display="inline">512J</math> operations in 16 cycles.<ref name="iaorm" />
-== References ==
-{{Reflist|30em}}
+== Software support ==
+* Compiler and assembler support
+** [[LLVM]] 13<ref>{{Cite web |title=What's New in LLVM for 4th Gen Intel® Xeon® & Max Series CPUs |access-date=21 April 2023 |url= https://www.intel.com/content/www/us/en/developer/articles/technical/whats-new-in-llvm-for-4th-gen-intel-xeon-processor.html }}</ref><ref>{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Intel-AMX-LLVM-Starts|title=Intel AMX Support Begins Landing In LLVM|last=Larabel |first=Michael|date=2020-07-02|website=[[Phoronix Test Suite#Phoronix website|Phoronix]]|language=en-US|access-date=2020-07-02}}</ref><ref>{{Cite web|url=https://github.com/llvm/llvm-project/commit/aded4f0cc070fcef6763c9a3c2ba764d652b692e|title=[X86-64] Support Intel AMX instructions |website=[[GitHub]] |date=2020-07-02|language=en-US|access-date=2020-07-02}}</ref><ref name=":0">{{Cite web|url=https://www.phoronix.com/scan.php?page=news_item&px=Intel-AMX-Gas|title=Intel AMX Support Lands In The GNU Assembler|last=Larabel|first=Michael|date=2020-07-02|website=[[Phoronix Test Suite#Phoronix website|Phoronix]]|language=en-US|access-date=2020-07-02}}</ref>
+** [[GNU Compiler Collection|GCC]] 11<ref>{{Cite web |title=GCC 11 Release Series — Changes, New Features, and Fixes - GNU Project |access-date=21 April 2023 |url= https://gcc.gnu.org/gcc-11/changes.html }}</ref><ref>{{Cite web|url=https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549415.html|title=[PATCH] Enable GCC support for AMX|date=2020-07-06|language=en-US|access-date=2020-07-09}}</ref><ref>{{Cite web |title=Enable GCC support for AMX-TILE, AMX-INT8, AMX-BF16. · gcc-mirror/gcc@5c60984 |url=https://github.com/gcc-mirror/gcc/commit/5c609842d13a4c9c6be1a10f7980a74d27daeb85 |access-date=2022-09-05 |website=GitHub |language=en}}</ref>
+** [[GNU Assembler]] (GAS) initial support committed on 25 June 2020<ref>{{Cite web|url=https://sourceware.org/git/?p=binutils-gdb.git&a=search&st=commit&s=Intel+AMX|title=commits with Intel AMX|date=2020-07-02|language=en-US|access-date=2020-07-02}}</ref><ref name=":0"/>
+*Operating system support
+**[[glibc]] support for detecting AMX feature in CPUs committed on 25 June 2020<ref>{{Cite web|url=https://sourceware.org/git/?p=glibc.git;a=commit;h=4fdd4d41a17dda26c854ed935658154a17d4b906|title=x86: Detect Intel Advanced Matrix Extensions|date=2020-07-02|language=en-US|access-date=2020-07-02}}</ref>
+**[[Linux kernel]] support since version 5.16<ref>{{Cite web|url=https://www.phoronix.com/review/linux-516-features|title=Linux 5.16 Features Include FUTEX2, Intel AMX, Folios, DG2/Alchemist, More Apple Silicon Support|website=[[Phoronix Test Suite#Phoronix website|Phoronix]]}}</ref>
+**[[VMware vSphere]] support for AMX in virtual machines released in [[ESXi]] version 8.0u1 for VMs using Hardware Version 20<ref>{{Cite web|url=https://earlruby.org/2023/08/accessing-sapphire-rapids-amx-instructions-on-vsphere/|title=Accessing Sapphire Rapids AMX instructions on vSphere|date=2023-08-24|language=en-US|website=Earl C. Ruby III}}</ref>
+== References ==
+{{Reflist}}
 == External links ==
 * [https://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel Intrinsics Guide]
+* [https://en.wikichip.org/wiki/x86/amx Wikichip: Advanced Matrix Extension (AMX) - x86]
 {{AMD technology}}
@@ Line 28: / Line 35: @@
 [[Category:X86 instructions]]
 [[Category:SIMD computing]]
-[[Category:Advanced Micro Devices technologies]]
+[[Category:AMD technologies]]

Intel technology
Platforms	Centrino Centrino 2 Viiv MID Tablet CULV Ultrabook Skulltrail NUC Galileo Edison Curie Evo
Discontinued	Common Building Block MultiProcessor Specification Intel Communication Streaming Architecture Intel Inboard 386 Intel Play MMC-1 MMC-2
Current	Advanced Programmable Interrupt Controller CNVi Intel Turbo Boost vPro Intel Secure Key Intel Management Engine Active Management Technology AMT versions High-bandwidth Digital Content Protection High Definition Audio Hub Architecture Rapid Storage Technology SpeedStep Serial Digital Video Out Host Embedded Controller Interface Hyper-threading Omni-Path Platform Environment Control Interface QuickPath Interconnect Platform Controller Hub System Management Bus Thunderbolt Ultra Path Interconnect
Upcoming	Silicon Photonics Link

Instruction set extensions
SIMD (RISC)	Alpha MVI ARM NEON SVE MIPS MDMX MIPS-3D MXU MIPS SIMD PA-RISC MAX Power ISA VMX SPARC VIS
SIMD (x86)	MMX (1996) 3DNow! (1998) SSE (1999) SSE2 (2001) SSE3 (2004) SSSE3 (2006) SSE4 (2006) SSE5 ~~(2007)~~ AVX (2008) F16C (2009) XOP (2009) FMA (FMA4: 2011, FMA3: 2012) AVX2 (2013) AVX-512 (2015) AMX (2022) AVX10 (2023)
Bit manipulation	BMI (ABM: 2007, BMI1: 2012, BMI2: 2013, TBM: 2012) ADX (2014)
Compressed instructions	Thumb MIPS16e ASE RVC
Security and cryptography	PadLock (2003) AES-NI (2008); ARMv8 also has AES instructions CLMUL (2010) RDRAND (2012) SHA (2013) MPX (2015) SGX (2015) TDX (2021)
Transactional memory	TSX (2013) ASF
Virtualization	VT-x (2005) AMD-V (2006) VT-d (AMD-Vi)
Suspended extensions' dates are ~~struck through~~.