
Intel microcode: Difference between revisions

From Wikipedia, the free encyclopedia
m typo
P6 and later micro-operations: add main article on Micro-operations.
 
(38 intermediate revisions by 23 users not shown)
Line 1: Line 1:
{{Short description|Microcode in x86 Intel processors}}
'''Intel microcode''' is [[microcode]] that runs inside [[x86]] processors made by [[Intel]]. Since the [[P6 (microarchitecture)|P6 microarchitecture]] introduced in the mid-1990s, the microcode programs can be [[patch (computing)|patched]] by the operating system or [[BIOS]] firmware to workaround bugs found in the CPU after release.<ref name="gwennap-20070915"/> Intel had originally designed microcode updates for processor debugging under its [[design for testing]] (DFT) initiative.<ref name="intel-dft-1998">{{cite paper|url=https://www.intel.com/content/dam/www/public/us/en/documents/research/1998-vol02-iss-2-intel-technology-journal.pdf|title=An Overview of Advanced Failure Analysis Techniques for Pentium and Pentium Pro Microprocessors|journal=Intel Technology Journal|date=20 April 1998|issue=Q2|editor1-first=Lin|editor1-last=Chao|author1=Yeoh Eng Hong|author2=Lim Seong Leong|author3=Wong Yik Choong|author4=Lock Choon Hou|author5=Mahmud Adnan|quote=Pentium Pro microprocessor ... Micropatching {{abbr|DFT|Design For Testability}} feature. ... consists of two key elements: the microcode patch RAM and several pairs of Match and Destination registers. ... Microcode Instruction Pointer (UIP) matches the content of a Match register, the UIP will be reloaded with a new address from the Destination register. ... {{abbr|UIP|Microcode Instruction Pointer}} for the reset subroutine can be set in the Match register ... thereby bypassing the reset subroutine altogether.}}</ref>
'''Intel microcode''' is [[microcode]] that runs inside [[x86]] processors made by [[Intel]]. Since the [[P6 (microarchitecture)|P6 microarchitecture]] introduced in the mid-1990s, the microcode programs can be [[patch (computing)|patched]] by the operating system or [[BIOS]] firmware to work around bugs found in the CPU after release.<ref name="gwennap-20070915"/> Intel had originally designed microcode updates for processor debugging under its [[design for testing]] (DFT) initiative.<ref name="intel-dft-1998">{{cite journal|url=https://www.intel.com/content/dam/www/public/us/en/documents/research/1998-vol02-iss-2-intel-technology-journal.pdf|title=An Overview of Advanced Failure Analysis Techniques for Pentium and Pentium Pro Microprocessors|journal=Intel Technology Journal|date=20 April 1998|issue=Q2|editor1-first=Lin|editor1-last=Chao|author1=Yeoh Eng Hong|author2=Lim Seong Leong|author3=Wong Yik Choong|author4=Lock Choon Hou|author5=Mahmud Adnan|quote=Pentium Pro microprocessor ... Micropatching {{abbr|DFT|Design For Testability}} feature. ... consists of two key elements: the microcode patch RAM and several pairs of Match and Destination registers. ... Microcode Instruction Pointer (UIP) matches the content of a Match register, the UIP will be reloaded with a new address from the Destination register. ... {{abbr|UIP|Microcode Instruction Pointer}} for the reset subroutine can be set in the Match register ... thereby bypassing the reset subroutine altogether.}}</ref>
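The micropatching mechanism quoted in the 1998 Intel Technology Journal reference (Match and Destination register pairs that redirect the Microcode Instruction Pointer) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name, the dictionary representation, and the addresses are hypothetical and do not describe Intel's actual hardware layout.

```python
from dataclasses import dataclass, field


@dataclass
class MicropatchUnit:
    """Toy model of Match/Destination register pairs (illustrative only)."""

    # Maps a Match register value to its paired Destination register value.
    pairs: dict = field(default_factory=dict)

    def next_uip(self, uip: int) -> int:
        """Return the address the microcode sequencer would fetch from."""
        # On a match, the UIP is reloaded from the Destination register;
        # otherwise execution continues at the original address.
        return self.pairs.get(uip, uip)


# Hypothetical addresses: 0x0100 is a ROM routine, 0x2000 lies in patch RAM.
unit = MicropatchUnit(pairs={0x0100: 0x2000})
redirected = unit.next_uip(0x0100)  # matches, so it is redirected
unchanged = unit.next_uip(0x0101)   # no match, so it is left alone
```

In this model, bypassing a routine amounts to placing its entry address in a Match register and the address of the replacement code in the paired Destination register.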


Following the [[Pentium FDIV bug]] the patchable microcode function took on a wider purpose to allow in-field updating without needing to do a [[product recall]].<ref name="gwennap-20070915"/>
Following the [[Pentium FDIV bug]], the [[patchable microcode]] function took on a wider purpose to allow in-field updating without needing to do a [[product recall]].<ref name="gwennap-20070915"/>


In the P6 and later microarchitectures, [[x86 instruction listings|x86 instructions]] are internally converted into simpler [[Reduced instruction set computer|RISC]]-style [[micro-operation]]s that are specific to a particular processor and [[stepping level]].<ref name="gwennap-20070915"/>
In the P6 and later microarchitectures, [[x86 instruction listings|x86 instructions]] are internally converted into simpler [[Reduced instruction set computer|RISC]]-style [[micro-operation]]s that are specific to a particular processor and [[stepping level]].<ref name="gwennap-20070915"/>


==Micro-operations==
==Pre-P6 microcode==
On the [[Intel 80486]] and AMD [[Am486]] there are approximately 5000 lines of microcode assembly, totalling approximately 240 Kbits stored in the microcode [[Read-only memory|ROM]].<ref name="trumbull-1994">{{cite report|url=https://ir.amd.com/sec-filings/content/0000898430-94-000804/EX-99_1.txt|first=Patricia V.|last=Trumbull|date=1994-10-07|access-date=2021-05-10|series=[[United States District Court for the Northern District of California]]|location=San Jose|via=[[Advanced Micro Devices]]|title=Intel Corporation v. Advanced Micro Devices|number=C-93-20301 PVT|type=Findings of fact and conclusions of law following "[[In-circuit emulation|ICE]]" module of trial|quote=Twelve pins are affiliated with the "ICE" circuitry. … AMD 486DXL and DXLV connect three pins associated with "[[In-circuit emulation|ICE]]" in order to implement its "[[System Management Mode|SMM]]" feature. … 250 lines or 12,032 bits of the "ICE" microcode in the [[Intel 80486|486]]. "[[In-circuit emulation|ICE]]" constitutes about five percent of the total 486 microcode. … two lines … (used to set the "[[In-circuit emulation|ICE]]" mode "[[Flip-flop (electronics)|flip flop]]") … blue coded lines of microcode are associated with production testing and not used for "[[In-circuit emulation|ICE]]" related purposes. … Seventy-five red coded lines were used by Intel to perform "[[System Management Mode|SMM]]" in its [[Intel 486SL|486SL]], a data sheet function of this version of the chip. About 32 yellow coded lines perform routine operations which are not unique to "ICE." About two lines remain dedicated solely to "ICE."}}</ref>
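The figures above can be cross-checked against the court findings quoted in the reference, which give 250 lines (12,032 bits) for the "ICE" microcode. The following sketch assumes that every microcode line has the same width as the ICE lines, which the source does not state explicitly.

```python
# Figures taken from the quoted court findings.
ICE_LINES = 250
ICE_BITS = 12_032

# Approximate total stated in the article text.
TOTAL_LINES = 5_000

# Assumption: all lines share the average width of the ICE lines.
bits_per_line = ICE_BITS / ICE_LINES  # roughly 48 bits per line

# Integer arithmetic avoids floating-point rounding in the estimate.
estimated_total_bits = ICE_BITS * TOTAL_LINES // ICE_LINES

# Share of the microcode attributed to ICE.
ice_fraction = ICE_LINES / TOTAL_LINES

print(f"{bits_per_line:.3f} bits per line")
print(f"{estimated_total_bits} bits in total")
print(f"{ice_fraction:.0%} of lines are ICE-related")
```

Under that assumption the estimate comes to 240,640 bits, consistent with the stated total of approximately 240 Kbits and with ICE making up about five percent of the microcode.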
On the Pentium Pro, each micro-operation is 72-bits wide,<ref>{{cite paper|url=https://people.eecs.berkeley.edu/~kubitron/courses/cs152-S04/lectures/lec25-power.pdf|title=Dynamic Scheduling in P6 (Pentium Pro, II, III)|journal=Low Power Design, Advanced Intel Processors|series=CS152 Computer Architecture and Engineering|issue=Lecture 25|date=3 May 2004|first=John|last=Kubiatowicz|quote=Complex 80x86 instructions are executed by a conventional microprogram ({{abbr|8K|8192}} x 72 bits) that issues long sequences of micro-operations}}</ref>{{rp|43}} or 118-bits wide.<ref name="linley-19950216">{{cite news|url=https://pdfs.semanticscholar.org/fe2b/b73d7046a6ed87ce9b18d62f194d67fa2100.pdf|first=Linley|last=Gwennap|date=16 February 1995|work=[[Microprocessor Report]]|publisher=MicroDesign Resources|volume=9|number=2|pages=1–7|title=Intel's P6 Uses Decoupled Superscalar Design|s2cid=14414612|quote=P6 uops have a fixed length of 118 bits, using a regular structure to encode an operation, two sources, and a destination. The source and destination fields are each wide enough to contain a 32-bit operand.}}</ref>{{rp|2}}<ref name="asanovic-2002"/>{{rp|14}} This includes an opcode, two source fields, and one destination field,<ref name="colwell-steck-19950412">{{cite document|url=http://datasheets.chipdb.org/Intel/x86/P6/p6updt.pdf|title=A 0.6 μm BiCMOS Processor With Dynamic Execution|first1=Robert P.|last1=Colwell|first2=Randy L.|last2=Steck|author3=Intel Corporation|date=1995-04-12|access-date=2020-05-27|page=7|quote=Micro-ops are the atomic unit of work in the P6 processor and are {{sic|comprised |hide=y|of}} an opcode, two source and one destination operand. 
These micro-ops are fixed length and are more general than the Pentium(R) processor's microcode since they need to be scheduled.}}</ref>{{rp|7}} with the ability to hold a 32-bit immediate value.<ref name="linley-19950216"/><ref name="asanovic-2002"/>{{rp|14}} The Pentium Pro is able to detect [[parity error]]s in its internal microcode {{abbr|ROM|Read-Only Memory}} and report these via the [[Machine Check Architecture]].<ref>{{cite report|url-status=unfit|url=x|archiveurl=http://folk.uio.no/inf242/doc/242692_1.pdf|archivedate=6 September 2001|title=16.6.1. Simple Error Codes|page=401|date=3 January 1996|accessdate=1 October 2018|work=Machine Check Architecture|volume=3: Operating System Writer's Guide|series=Pentium® Pro Family Developer's Manual|quote=unique codes indicate global error information … Microcode ROM Parity Error|issue=December 1995}}</ref>

==P6 and later micro-operations==
{{Main|Micro-operation}}
Starting with the Pentium Pro, in most Intel x86 processors, instructions are converted by the instruction fetch and decode unit to sequences of processor-specific micro-operations that are directly executed by the processor. For the instructions that are implemented in microcode, the microcode consists of micro-operations fetched from on-chip memory.<ref name="pentium-pro-tour">{{cite web|url=http://www.intel.com/procs/ppro/info/p6white/index.htm|title=A Tour of the Pentium Pro Processor Microarchitecture|website=Intel|archive-url=https://web.archive.org/web/19961220080210/http://www.intel.com/procs/ppro/info/p6white/index.htm|archive-date=1996-12-20|url-status=dead}}</ref>

On the Pentium Pro, each micro-operation is 72-bits wide,<ref>{{cite journal|url=https://people.eecs.berkeley.edu/~kubitron/courses/cs152-S04/lectures/lec25-power.pdf|title=Dynamic Scheduling in P6 (Pentium Pro, II, III)|journal=Low Power Design, Advanced Intel Processors|series=CS152 Computer Architecture and Engineering|issue=Lecture 25|date=3 May 2004|first=John|last=Kubiatowicz|quote=Complex 80x86 instructions are executed by a conventional microprogram ({{abbr|8K|8192}} x 72 bits) that issues long sequences of micro-operations}}</ref>{{rp|43}} or 118-bits wide.<ref name="linley-19950216">{{cite news|url=https://pdfs.semanticscholar.org/fe2b/b73d7046a6ed87ce9b18d62f194d67fa2100.pdf|archive-url=https://web.archive.org/web/20181008134943/https://pdfs.semanticscholar.org/fe2b/b73d7046a6ed87ce9b18d62f194d67fa2100.pdf|url-status=dead|archive-date=8 October 2018|first=Linley|last=Gwennap|date=16 February 1995|work=[[Microprocessor Report]]|publisher=MicroDesign Resources|volume=9|number=2|pages=1–7|title=Intel's P6 Uses Decoupled Superscalar Design|s2cid=14414612|quote=P6 uops have a fixed length of 118 bits, using a regular structure to encode an operation, two sources, and a destination. The source and destination fields are each wide enough to contain a 32-bit operand.}}</ref>{{rp|2}}<ref name="asanovic-2002"/>{{rp|14}} This includes an opcode, two source fields, and one destination field,<ref name="colwell-steck-19950412">{{cite web|url=http://datasheets.chipdb.org/Intel/x86/P6/p6updt.pdf|title=A 0.6 μm BiCMOS Processor With Dynamic Execution|first1=Robert P.|last1=Colwell|first2=Randy L.|last2=Steck|author3=Intel Corporation|date=1995-04-12|access-date=2020-05-27|page=7|quote=Micro-ops are the atomic unit of work in the P6 processor and are {{sic|comprised |hide=y|of}} an opcode, two source and one destination operand. 
These micro-ops are fixed length and are more general than the Pentium(R) processor's microcode since they need to be scheduled.}}</ref>{{rp|7}} with the ability to hold a 32-bit immediate value.<ref name="linley-19950216"/><ref name="asanovic-2002"/>{{rp|14}} The Pentium Pro is able to detect [[parity error]]s in its internal microcode {{abbr|ROM|Read-Only Memory}} and report these via the [[Machine Check Architecture]].<ref>{{cite report|url-status=unfit|url=x|archive-url=http://folk.uio.no/inf242/doc/242692_1.pdf|archive-date=6 September 2001|title=16.6.1. Simple Error Codes|page=401|date=3 January 1996|access-date=1 October 2018|work=Machine Check Architecture|volume=3: Operating System Writer's Guide|series=Pentium® Pro Family Developer's Manual|quote=unique codes indicate global error information … Microcode ROM Parity Error|issue=December 1995}}</ref>


Micro-operations have a consistent format with up to three source inputs and two destination outputs.<ref name="ronen-2005018"/> The processor performs [[register renaming]] to map these inputs to and from the real [[register file]] (RRF) before and after their execution.<ref name="ronen-2005018"/> [[Out-of-order execution]] is used, so micro-operations may not execute in the same order as the instructions they represent.
Micro-operations have a consistent format with up to three source inputs and two destination outputs.<ref name="ronen-2005018"/> The processor performs [[register renaming]] to map these inputs to and from the real [[register file]] (RRF) before and after their execution.<ref name="ronen-2005018"/> [[Out-of-order execution]] is used, so micro-operations may not execute in the same order as the instructions they represent.


During development of the Pentium Pro, several microcode fixes were included between the A2 and B0 steppings.<ref name="papworth-199604">{{cite news|url=http://web.cecs.pdx.edu/~berkina/R10_papworth_ieeemicro_1996.pdf|page=14|work=IEEE Micro|title=Tuning the Pentium Pro Microarchitecture|first=David B.|last=Papworth|author2=Intel Corporation|issn=0272-1732|date=April 1996|accessdate=8 October 2018|quote=B0 stepping incorporated several microcode bugs and speed path fixes for problems discovered on the A-step silicon}}</ref> For the Pentium II (based on the P6 Pentium Pro), additional micro-operations were added to support the [[MMX (instruction set)|MMX instruction set]].<ref name="kagan-et-al-1997"/> In several cases, "microcode assists" were added to handle rare corner-cases in a reliable way.<ref name="kagan-et-al-1997"/>
During development of the Pentium Pro, several microcode fixes were included between the A2 and B0 steppings.<ref name="papworth-199604">{{cite news|url=http://web.cecs.pdx.edu/~berkina/R10_papworth_ieeemicro_1996.pdf|page=14|work=IEEE Micro|title=Tuning the Pentium Pro Microarchitecture|first=David B.|last=Papworth|author2=Intel Corporation|issn=0272-1732|date=April 1996|access-date=8 October 2018|quote=B0 stepping incorporated several microcode bugs and speed path fixes for problems discovered on the A-step silicon|archive-date=8 October 2018|archive-url=https://web.archive.org/web/20181008095801/http://web.cecs.pdx.edu/~berkina/R10_papworth_ieeemicro_1996.pdf|url-status=dead}}</ref> For the Pentium II (based on the P6 Pentium Pro), additional micro-operations were added to support the [[MMX (instruction set)|MMX instruction set]].<ref name="kagan-et-al-1997"/> In several cases, "microcode assists" were added to handle rare corner-cases in a reliable way.<ref name="kagan-et-al-1997"/>


The Pentium 4 can have 126 micro-operations in flight at the same time.<ref name="hinton-et-al-2001"/>{{rp|10}} Micro-operations are decoded and stored in an Execution Trace Cache with 12,000 entries, to avoid repeated decoding of the same x86 instructions.<ref name="hinton-et-al-2001"/>{{rp|5}} Groups of six micro-operations are packed into a trace line.<ref name="hinton-et-al-2001"/>{{rp|5}} Micro-operations can borrow extra immediate data space within the same cache-line.<ref name="fog-micro-2020">{{cite document|url=https://www.agner.org/optimize/microarchitecture.pdf|title=The microarchitecture of Intel, AMD and VIA CPUs|type=An optimization guide for assembly programmers and compiler makers|first=Agner|last=Fog|publisher=Technical University of Denmark|date=2020-05-25|page=49|quote=… If a μop has an immediate 32-bit operand outside the ±2<sup>15</sup> interval so that it cannot be represented as a 16-bit signed integer, then it will use two trace cache entries unless it can borrow storage space from a nearby μop. … A μop in need of extra storage space can borrow 16 bits of extra storage space from a nearby μop that doesn't need its own data space.}}</ref>{{rp|49}} Complex instructions, such as exception handling, result in jumping to the microcode ROM.<ref name="hinton-et-al-2001"/>{{rp|6}} During development of the Pentium 4, microcode accounted for 14% of processor bugs versus 30% of processor bugs during development of the Pentium Pro.<ref name="bentley-rand-2001">{{cite journal|url=https://www.intel.com/content/dam/www/public/us/en/documents/research/2001-vol05-iss-1-intel-technology-journal.pdf|quote=Bug Discussion|title=Validating The Intel® Pentium® 4 Processor|first1=Bob|last1=Bentley|first2=Rand|last2=Gray|page=29–26|journal=Intel Technology Journal|issue=Q1|year=2001|editor1-first=Lin|editor1-last=Chao}}</ref>{{rp|35}}
The Pentium 4 can have 126 micro-operations in flight at the same time.<ref name="hinton-et-al-2001"/>{{rp|10}} Micro-operations are decoded and stored in an Execution Trace Cache with 12,000 entries, to avoid repeated decoding of the same x86 instructions.<ref name="hinton-et-al-2001"/>{{rp|5}} Groups of six micro-operations are packed into a trace line.<ref name="hinton-et-al-2001"/>{{rp|5}} Micro-operations can borrow extra immediate data space within the same cache-line.<ref name="fog-micro-2020">{{cite web|url=https://www.agner.org/optimize/microarchitecture.pdf|title=The microarchitecture of Intel, AMD and VIA CPUs|type=An optimization guide for assembly programmers and compiler makers|first=Agner|last=Fog|publisher=Technical University of Denmark|date=2020-05-25|page=49|quote=… If a μop has an immediate 32-bit operand outside the ±2<sup>15</sup> interval so that it cannot be represented as a 16-bit signed integer, then it will use two trace cache entries unless it can borrow storage space from a nearby μop. … A μop in need of extra storage space can borrow 16 bits of extra storage space from a nearby μop that doesn't need its own data space.}}</ref>{{rp|49}} Complex instructions, such as exception handling, result in jumping to the microcode ROM.<ref name="hinton-et-al-2001"/>{{rp|6}} During development of the Pentium 4, microcode accounted for 14% of processor bugs versus 30% of processor bugs during development of the Pentium Pro.<ref name="bentley-rand-2001">{{cite journal|url=https://www.intel.com/content/dam/www/public/us/en/documents/research/2001-vol05-iss-1-intel-technology-journal.pdf|quote=Bug Discussion|title=Validating The Intel® Pentium® 4 Processor|first1=Bob|last1=Bentley|first2=Rand|last2=Gray|pages=29–26|journal=Intel Technology Journal|issue=Q1|year=2001|editor1-first=Lin|editor1-last=Chao}}</ref>{{rp|35}}
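The immediate-operand rule quoted from Agner Fog's guide can be sketched as follows. This is a simplified model: the function names are hypothetical, and the single `can_borrow` flag stands in for the real conditions under which a nearby micro-operation has spare storage.

```python
def fits_in_signed_16(value: int) -> bool:
    """True if the value can be represented as a 16-bit signed integer."""
    return -(1 << 15) <= value < (1 << 15)


def trace_entries_needed(immediate: int, can_borrow: bool) -> int:
    """Estimate trace cache entries used by one micro-operation.

    Simplified model: a wide immediate costs a second entry unless
    16 bits of storage can be borrowed from a nearby micro-operation.
    """
    if fits_in_signed_16(immediate) or can_borrow:
        return 1
    return 2


small = trace_entries_needed(1000, can_borrow=False)            # fits in 16 bits
wide_borrowed = trace_entries_needed(100_000, can_borrow=True)   # borrows space
wide_alone = trace_entries_needed(100_000, can_borrow=False)     # needs 2 entries
```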


The [[Intel Core (microarchitecture)|Intel Core microarchitecture]] introduced in 2006 added "[[Macro-Ops Fusion|micro-operations fusion]]" for some common pairs of instructions including comparison followed by a jump.<ref name="gelas-20060501"/> The instruction decoders in the Core convert x86 instructions into microcode in three different ways:
The [[Intel Core (microarchitecture)|Intel Core microarchitecture]], introduced in 2006, added "[[Macro-Ops Fusion|macro-operations fusion]]" for some common pairs of instructions, including a comparison followed by a jump.<ref name="gelas-20060501"/> The instruction decoders in the Core convert x86 instructions into micro-operations in three different ways:
{| class="wikitable"
{| class="wikitable"
|+ Conversion of x86 instructions to micro-operations on Core<ref name="gelas-20060501"/>
|+ Conversion of x86 instructions to micro-operations on Core<ref name="gelas-20060501"/>
Line 30: Line 37:


==Update facility==
==Update facility==
In the mid-1990s, a facility for supplying new microcode was initially referred to as the Pentium Pro '''BIOS Update Feature'''.<ref name="intel-bios-19960112">{{cite report|url=http://datasheets.chipdb.org/Intel/x86/Pentium%20Pro/PPPBIOS.PDF|title=8: Pentium Pro Processor BIOS Update Feature|version=2.0|date=12 January 1996|publisher=Intel |accessdate=3 November 2020|page=45|quote=authentication procedure relies upon the decryption provided by the processor to verify an update from a potentially hostile sources.}}</ref> It was intended that user-mode applications should make a [[BIOS interrupt call]] to supply a new "BIOS Update Data Block", which the BIOS would partially validate and save to [[nonvolatile BIOS memory]]; this could be supplied to the installed processors on next boot.<ref name="intel-bios-19960112" />
In the mid-1990s, a facility for supplying new microcode was initially referred to as the Pentium Pro '''BIOS Update Feature'''.<ref name="intel-bios-19960112">{{cite report|url=http://datasheets.chipdb.org/Intel/x86/Pentium%20Pro/PPPBIOS.PDF|title=8: Pentium Pro Processor BIOS Update Feature|version=2.0|date=12 January 1996|publisher=Intel |access-date=3 November 2020|page=45|quote=authentication procedure relies upon the decryption provided by the processor to verify an update from a potentially hostile sources.}}</ref><ref name="Stiller_1996"/> It was intended that user-mode applications should make a [[BIOS interrupt call]] to supply a new "BIOS Update Data Block", which the BIOS would partially validate and save to [[nonvolatile BIOS memory]]; this could be supplied to the installed processors on next boot.<ref name="intel-bios-19960112" />


Intel distributed a program called <code>BUP_UTIL.EXE</code>, renamed <code>CHECKUP3.EXE</code> that could be run under [[DOS]]. Collections of multiple microcode updates were concatencated together and numerically numbered with the extension <code>.PDB</code>, such as <code>PEP6.PDB</code>.<ref name="mueller-199809">{{cite book|url=http://computarium.lcd.lu/library/PDF/MUELLER_Upgrading_and_Repairing_PCs_1998.pdf|title=Upgrading and Repairing PCs|edition=Tenth Anniversary|first=Scott|last1=Mueller|first2=Craig|last2=Zacker|date=September 1998|isbn=0-7897-1636-4|publisher=[[Que Publishing]]|editor1-first=Jim|editor1-last=Minatel|editor2-first=Jill|editor2-last=Byus|editor3-first=Rick|editor3-last=Kughen|page=79|quote=Processor Steppings (Revisions) and Microcode Update Revisions Supported by the Update Database File PEP6.PDB … Using the processor update utility (CHECKUP3.EXE), … can easily verify … the correct microcode update|accessdate=1 October 2018}}</ref>{{rp|79}}
Intel distributed a program called <code>BUP_UTIL.EXE</code>, renamed <code>CHECKUP3.EXE</code>, that could be run under [[DOS]]. Collections of multiple microcode updates were concatenated together and numbered, with the extension <code>.PDB</code>, such as <code>PEP6.PDB</code>.<ref name="mueller-199809">{{cite book|url=http://computarium.lcd.lu/library/PDF/MUELLER_Upgrading_and_Repairing_PCs_1998.pdf|title=Upgrading and Repairing PCs|edition=Tenth Anniversary|first1=Scott|last1=Mueller|first2=Craig|last2=Zacker|date=September 1998|isbn=0-7897-1636-4|publisher=[[Que Publishing]]|editor1-first=Jim|editor1-last=Minatel|editor2-first=Jill|editor2-last=Byus|editor3-first=Rick|editor3-last=Kughen|page=79|quote=Processor Steppings (Revisions) and Microcode Update Revisions Supported by the Update Database File PEP6.PDB … Using the processor update utility (CHECKUP3.EXE), … can easily verify … the correct microcode update|access-date=1 October 2018}}</ref>{{rp|79}}


===Processor interface===
===Processor interface===
Line 47: Line 54:
# [[Padding (cryptography)|Padding]] consisting of random values, to obfuscate understanding of the format of the microcode update.<ref name="gwennap-20070915"/>
# [[Padding (cryptography)|Padding]] consisting of random values, to obfuscate understanding of the format of the microcode update.<ref name="gwennap-20070915"/>


Each block is encoded differently, and the majority of the 2,000 bytes are not used as configuration program and SRAM micro-operation contents themselves are much smaller.<ref name="gwennap-20070915"/> Final determination and validation of whether an update can be applied to a processor is performed during [[decryption]] via the processor.<ref name="intel-bios-19960112"/> Each microcode update is specific to a particular CPU revision, and is designed to be rejected by CPUs with a different [[stepping level]]. Microcode updates are encrypted to prevent tampering and to enable validation.<ref>{{cite magazine|url=http://www.techweb.com/se/directlink.cgi?EET19970630S0007|url-status=dead|archiveurl=https://web.archive.org/web/19991113012445/http://www.techweb.com/se/directlink.cgi?EET19970630S0007|archivedate=1999-11-13|title=Intel preps plan to bust bugs in Pentium MPUs|first=Alexander|last=Wolfe|magazine=[[EE Times]]|via=[[CMP Technology|Techweb]]|accessdate=3 October 2018|date=30 June 1997|issue=960|quote=obscure moniker "BIOS Update Feature." … "Each BIOS Update is tailored for a particular stepping of [a] processor," … data block is mapped directly-… after decryption-to the microcode itself.}}</ref>
Each block is encoded differently, and most of the 2,000 bytes are unused, because the configuration program and the SRAM micro-operation contents are themselves much smaller.<ref name="gwennap-20070915"/> The final determination and validation of whether an update can be applied to a processor is performed by the processor during [[decryption]].<ref name="intel-bios-19960112"/> Each microcode update is specific to a particular CPU revision, and is designed to be rejected by CPUs with a different [[stepping level]]. Microcode updates are encrypted to prevent tampering and to enable validation.<ref>{{cite magazine|url=http://www.techweb.com/se/directlink.cgi?EET19970630S0007|url-status=dead|archive-url=https://web.archive.org/web/19991113012445/http://www.techweb.com/se/directlink.cgi?EET19970630S0007|archive-date=1999-11-13|title=Intel preps plan to bust bugs in Pentium MPUs|first=Alexander|last=Wolfe|magazine=[[EE Times]]|via=[[CMP Technology|Techweb]]|access-date=3 October 2018|date=30 June 1997|issue=960|quote=obscure moniker "BIOS Update Feature." … "Each BIOS Update is tailored for a particular stepping of [a] processor," … data block is mapped directly-… after decryption-to the microcode itself.}}</ref>


With the Pentium there are two layers of encryption and the precise details explicitly {{Em|not}} documented by Intel, instead being only known to fewer than ten employees.<ref name="wolfe-1997">{{cite magazine|url=http://www.eetimes.com/news/97/963news/hole.html|archiveurl=https://web.archive.org/web/20030309102752/http://www.eetimes.com/news/97/963news/hole.html|archivedate=2003-03-09|magazine=[[EE Times]]|date=30 June 1997|first=Alexander|last=Wolfe|title=Hole seen in Intel's bug-busting feature|location=Santa Clara|quote=Ajay Malhortra, a technical marketing manager based here at Intel's microprocessor group. "Not only is the data block containing the microcode patch encrypted, but once the processor examines the header of the BIOS update, there are two levels of encryption in the processor that must occur before it will successfully load the update." … closely guarded secret. "There is no documentation," said Frank Binns, an architect in Intel's microprocessor group. "It's not as if you can get an Intel 'Red Book' with this stuff written down. It's actually in the heads of less than 10 people in the whole of Intel."}}</ref>
With the Pentium there are two layers of encryption, and the precise details are explicitly {{Em|not}} documented by Intel, instead being known only to fewer than ten employees.<ref name="wolfe-1997">{{cite magazine|url=http://www.eetimes.com/news/97/963news/hole.html|archive-url=https://web.archive.org/web/20030309102752/http://www.eetimes.com/news/97/963news/hole.html|archive-date=2003-03-09|magazine=[[EE Times]]|date=30 June 1997|first=Alexander|last=Wolfe|title=Hole seen in Intel's bug-busting feature|location=Santa Clara|quote=Ajay Malhortra, a technical marketing manager based here at Intel's microprocessor group. "Not only is the data block containing the microcode patch encrypted, but once the processor examines the header of the BIOS update, there are two levels of encryption in the processor that must occur before it will successfully load the update." … closely guarded secret. "There is no documentation," said Frank Binns, an architect in Intel's microprocessor group. "It's not as if you can get an Intel 'Red Book' with this stuff written down. It's actually in the heads of less than 10 people in the whole of Intel."}}</ref>


Microcode updates for [[Intel Atom]], [[Nehalem (microarchitecture)|Nehalem]] and [[Sandy Bridge]] additionally contain an extra 520-byte header containing a 2048-bit [[RSA (cryptosystem)|RSA]] modulus with an exponent of 17 decimal.<ref name="chen-ahn-20141211"/>{{rp|7,8}}
Microcode updates for [[Intel Atom]], [[Nehalem (microarchitecture)|Nehalem]] and [[Sandy Bridge]] additionally contain an extra 520-byte header containing a 2048-bit [[RSA (cryptosystem)|RSA]] modulus with an exponent of 17 decimal.<ref name="chen-ahn-20141211"/>{{rp|7,8}}
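The role of the public exponent 17 can be illustrated with textbook RSA on a deliberately tiny modulus. This is a sketch, not the scheme used in real updates: the primes, the function names, and the integer "digest" are hypothetical, and the hash function and padding format used by the processors are not described in the text above.

```python
# Public exponent stated in the article text.
PUBLIC_EXPONENT = 17

# Toy primes (assumption): real updates use a 2048-bit modulus.
P, Q = 61, 53
MODULUS = P * Q

# Private exponent: modular inverse of 17 (requires Python 3.8+).
PRIVATE_EXPONENT = pow(PUBLIC_EXPONENT, -1, (P - 1) * (Q - 1))


def sign(digest: int) -> int:
    """Textbook RSA signature of an integer digest (no padding)."""
    return pow(digest, PRIVATE_EXPONENT, MODULUS)


def verify(digest: int, signature: int) -> bool:
    """Raise the signature to the power 17 and compare with the digest."""
    return pow(signature, PUBLIC_EXPONENT, MODULUS) == digest % MODULUS


signature = sign(65)
accepted = verify(65, signature)   # genuine digest
rejected = verify(66, signature)   # altered digest
```

A small public exponent such as 17 keeps verification cheap, since it needs only a handful of modular multiplications, which suits a check performed inside the processor.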
Line 61: Line 68:
| Core || PIII … {{nowrap|Core 2}} ||style="text-align:right;"| 4048 ||style="text-align:right;"| 3096
| Core || PIII … {{nowrap|Core 2}} ||style="text-align:right;"| 4048 ||style="text-align:right;"| 3096
|-
|-
| Netburst || {{nowrap|{{abbr|P4|Pentium 4}}, {{nowrap|Pentium D}}, Celeron ||style="text-align:right;"| 2000–7120 ||style="text-align:right;"| 2000 + N*1024 || chained block cipher
| Netburst || {{nowrap|{{abbr|P4|Pentium 4}}}}, {{nowrap|Pentium D}}, Celeron ||style="text-align:right;"| 2000–7120 ||style="text-align:right;"| 2000 + N*1024 || chained block cipher
|-
|-
| Atom, Nehalem, {{nowrap|Sandy Bridge}} || {{nowrap|Core i3/i5/i7}} ||style="text-align:right;"| 976–16336 ||style="text-align:right;"| 976 + N*1024; 5120 || AES + RSA signature
| Atom, Nehalem, {{nowrap|Sandy Bridge}} || {{nowrap|Core i3/i5/i7}} ||style="text-align:right;"| 976–16336 ||style="text-align:right;"| 976 + N*1024; 5120 || AES + RSA signature
Line 70: Line 77:


During the mid-1980s [[NEC]] and Intel had a long-running US federal court case about microcode copyright.<ref name="elkins-1990">{{cite journal|url=https://repository.jmls.edu/cgi/viewcontent.cgi?article=1423&context=jitpl|
During the mid-1980s [[NEC]] and Intel had a long-running US federal court case about microcode copyright.<ref name="elkins-1990">{{cite journal|url=https://repository.jmls.edu/cgi/viewcontent.cgi?article=1423&context=jitpl|
journal=Computer/Law Journal|volume=10|issue=4|date=Winter 1990|first=David S.|last=Elkins|title=NEC v. Intel: A Guide to Using "Clean Room" Procedures as Evidence|page=453|quote=NEC's use of its [[cleanroom software engineering|clean room procedures]] as trial evidence … [[William Percival Gray|Judge Gray]] defined microcode … within the Copyright Act's definition of a "computer program," … Intel's microcode is copyrightable. … Intel's microcode did not contain the required copyright notice. … copyrights had been forfeited. … Intel was left with no basis for its claim of copying}}</ref> NEC had been acting as a [[second source]] for [[Intel 8086]] CPUs with its NEC μPD8086, and held long-term patent and copyright cross-licensing agreements with Intel. In August 1982 Intel sued NEC for copyright infringement over the microcode implementation.<ref name="hinckley-198701">{{cite journal|url=https://digitalcommons.law.scu.edu/cgi/viewcontent.cgi?article{{=}}1031&context=chtlj|title=NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright Editors'|first=Robert C.|last=Hinckley|date=January 1987|journal=Santa Clara High Technology Law Journal|volume=3|issue=1|format=Article 2|quote=Appendix: Microcode formats; [[Intel 8086|8086]]/8088 Format; [[NEC V20|V20]]/V30 format}}</ref><ref name="leong-19880328">{{cite magazine|url=https://books.google.com/books?id=JRgDwCkMX_cC&pg=PP84|title=Intel witness recants story|first=Kathy Chin|last=Leong|magazine=Computerworld : the newsweekly of information systems management|pages=83, 84|location=San Jose|date=28 March 1988|accessdate=2 October 2018|volume=22|number=13|issn=0010-4841}}</ref> NEC prevailed by demonstrating via [[Clean room design|cleanroom software engineering]] that the similarities in the implementation of microcode on its V20 and V30 processors was the result of the restrictions demanded by the architecture, rather than via copying.<ref name="elkins-1990"/>
journal=Computer/Law Journal|volume=10|issue=4|date=Winter 1990|first=David S.|last=Elkins|title=NEC v. Intel: A Guide to Using "Clean Room" Procedures as Evidence|page=453|quote=NEC's use of its [[cleanroom software engineering|clean room procedures]] as trial evidence … [[William Percival Gray|Judge Gray]] defined microcode … within the Copyright Act's definition of a "computer program," … Intel's microcode is copyrightable. … Intel's microcode did not contain the required copyright notice. … copyrights had been forfeited. … Intel was left with no basis for its claim of copying}}</ref> NEC had been acting as a [[second source]] for [[Intel 8086]] CPUs with its NEC μPD8086, and held long-term patent and copyright cross-licensing agreements with Intel. In August 1982 Intel sued NEC for copyright infringement over the microcode implementation.<ref name="hinckley-198701">{{cite journal |last=Hinckley |first=Robert C. |date=January 1987 |title=NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright Editors' |url=https://digitalcommons.law.scu.edu/cgi/viewcontent.cgi?article=1031&context=chtlj |journal=Santa Clara High Technology Law Journal |volume=3 |issue=1 |quote=Appendix: Microcode formats; [[Intel 8086|8086]]/8088 Format; [[NEC V20|V20]]/V30 format}}</ref><ref name="leong-19880328">{{cite magazine|url=https://books.google.com/books?id=JRgDwCkMX_cC&pg=PP84|title=Intel witness recants story|first=Kathy Chin|last=Leong|magazine=[[Computerworld]]|pages=83, 84|location=San Jose|date=28 March 1988|access-date=2 October 2018|volume=22|number=13|issn=0010-4841}}</ref> NEC prevailed by demonstrating via [[Clean room design|cleanroom software engineering]] that the similarities in the implementation of microcode on its V20 and V30 processors were the result of the restrictions demanded by the architecture, rather than of copying.<ref name="elkins-1990"/>


The [[Intel 386]] can perform a [[built-in self-test]] of the microcode and [[programmable logic array]]s, with the value of the self-test placed in the <code>EAX</code> register.<ref name="intel-386-dx-199512">{{cite document|url=x|url-status=unfit|archiveurl=http://pdf.datasheetcatalog.com/datasheet/Intel/mXtuvqv.pdf|archivedate=3 September 2004|title=Intel386 DX Microprocessor 32-BIT CHMOS Microprocessor with Integrated Memory Management|date=December 1995|issue=231630–011|quote=self-test checks the function of all of the Control ROM … EAX register will contain a signature of 00000000h indicating the Intel386 DX passed its self-test of microcode and major [[programmable logic array|PLA]] contents}}</ref> During the BIST, the microprogram counter is re-used to walk through all of the ROMs, with the results being collated via a network of multiple-input signature registers (MISRs) and linear-feedback shift registers.<ref>{{cite paper|url=https://nptel.ac.in/courses/Webcourse-contents/IIT%20Kharagpur/Embedded%20systems/Pdf/Lesson-40.pdf|title=5.1 Exhaustive Test in the Intel 80386|series=Testing of Embedded System|journal=Built-In-Self-Test (BIST) for Embedded Systems|page=21|publisher=[[IIT Kharagpur]]|date=7 October 2006|accessdate=6 October 2018|quote=For ROMs, the patterns are generated by the microprogram counter which is part of the normal logic.}}</ref> On start up of the [[Intel 486]], a hardware-controlled BIST runs for 2<sup>20</sup> clock cycles to check various arrays including the microcode ROM, after which control is transferred to the microcode for further self-testing of registers and computation units.<ref name="gelsinger-1999">{{cite journal|url=https://www.computer.org/csdl/proceedings/iccd/1989/1971/00/00063355.pdf|title=Computer Aided Design and Built In Self Test on the i486™ CPU|first1=Patrick|last1=Gelsinger|authorlink1=Pat 
Gelsinger|first2=Sundar|last2=Iyengar|first3=Joseph|last3=Krauskopf|first4=James|last4=Nadir|author5=Intel|publisher=IEEE|year=1989|pages=200–201|journal=Computer Design: VLSI in Computers and Processors}}</ref> The Intel 486 microcode ROM has 250,000 transistors.<ref name="gelsinger-1999"/>
The [[Intel 386]] can perform a [[built-in self-test]] of the microcode and [[programmable logic array]]s, with the value of the self-test placed in the <code>EAX</code> register.<ref name="intel-386-dx-199512">{{cite web|url=x|url-status=unfit|archive-url=http://pdf.datasheetcatalog.com/datasheet/Intel/mXtuvqv.pdf|archive-date=3 September 2004|title=Intel386 DX Microprocessor 32-BIT CHMOS Microprocessor with Integrated Memory Management|date=December 1995|issue=231630–011|quote=self-test checks the function of all of the Control ROM … EAX register will contain a signature of 00000000h indicating the Intel386 DX passed its self-test of microcode and major [[programmable logic array|PLA]] contents}}</ref> During the BIST, the microprogram counter is re-used to walk through all of the ROMs, with the results being collated via a network of multiple-input signature registers (MISRs) and linear-feedback shift registers.<ref>{{cite journal|url=https://nptel.ac.in/courses/Webcourse-contents/IIT%20Kharagpur/Embedded%20systems/Pdf/Lesson-40.pdf|title=5.1 Exhaustive Test in the Intel 80386|series=Testing of Embedded System|journal=Built-In-Self-Test (BIST) for Embedded Systems|page=21|publisher=[[IIT Kharagpur]]|date=7 October 2006|access-date=6 October 2018|quote=For ROMs, the patterns are generated by the microprogram counter which is part of the normal logic.}}</ref> On start up of the [[Intel 486]], a hardware-controlled BIST runs for 2<sup>20</sup> clock cycles to check various arrays including the microcode ROM, after which control is transferred to the microcode for further self-testing of registers and computation units.<ref name="gelsinger-1999">{{cite conference|url=https://www.computer.org/csdl/proceedings/iccd/1989/1971/00/00063355.pdf|title=Computer Aided Design and Built In Self Test on the i486™ CPU|first1=Patrick|last1=Gelsinger|author-link1=Pat 
Gelsinger|first2=Sundar|last2=Iyengar|first3=Joseph|last3=Krauskopf|first4=James|last4=Nadir|author5=Intel|publisher=IEEE|year=1989|pages=200–201|conference=1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors}}</ref> The Intel 486 microcode ROM has 250,000 transistors.<ref name="gelsinger-1999"/>
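
The signature-based ROM check described above can be illustrated with a minimal sketch. It assumes a 32-bit register and a feedback polynomial chosen only for illustration (<code>0x04C11DB7</code>); the actual register widths, polynomials and ROM layout of the 386 and 486 are not given in the cited sources, and the names <code>lfsr_step</code>, <code>misr_signature</code> and <code>rom</code> are hypothetical. A counter walks the ROM addresses, and a multiple-input signature register compacts every word into a single value that is compared with a known-good signature.

```python
def lfsr_step(state: int, taps: int, width: int) -> int:
    """Advance a Galois-style LFSR by one clock."""
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & ((1 << width) - 1)
    if msb:
        state ^= taps  # feedback polynomial; an assumption, not Intel's
    return state


def misr_signature(words, taps: int = 0x04C11DB7, width: int = 32, seed: int = 0) -> int:
    """Compact a stream of ROM words into one signature.

    On each clock the register shifts like an LFSR and the next
    data word is XORed in on the parallel inputs.
    """
    mask = (1 << width) - 1
    state = seed & mask
    for word in words:
        state = lfsr_step(state, taps, width) ^ (word & mask)
    return state


# Hypothetical ROM contents, addressed by a simple counter that stands in
# for the microprogram counter.
rom = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xDEADBEEF]
good = misr_signature(rom[addr] for addr in range(len(rom)))

# A single flipped bit in one word yields a different signature.
faulty = list(rom)
faulty[2] ^= 0x00000100
bad = misr_signature(faulty)

print(f"good={good:08x} bad={bad:08x} match={good == bad}")
```

In this sketch a pass is signalled by the computed signature equalling the stored reference value; the Intel386 DX datasheet quoted above reports a pass as <code>00000000h</code> in <code>EAX</code>, which suggests the hardware folds that comparison into the final signature, although the exact mechanism is not described in the sources.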


AMD had a long-term contract to reuse Intel's 286, 386 and 486 microcode.<ref name="infoworld-20041017"/> In October 2004, a court ruled that the agreement did not cover AMD distributing Intel's 486 [[in-circuit emulation]] (ICE) microcode.<ref name="infoworld-20041017"/>


===Direct Access Testing===
Direct Access Testing (DAT) is included in Intel CPUs as part of the [[design for testing]] (DFT) and Design for Debug (DFD) initiatives to allow full coverage testing of individual CPUs prior to sale.<ref name="wu-2004">{{cite document|url=https://eecs.ceas.uc.edu/~jonewb/TESTING/Papers/Intel.pdf|title=An optimized DFT and test pattern generation strategy for an Intel high performance microprocessor|conference=Test|year=2004|first1=David M.|last1=Wu|first2=Mike|last2=Lin|first3=Madhukar|last3=Reddy|first4=Talal|last4=Jaber|first5=Anil|last5=Sabbavarapu|first6=Larry|last6=Thatcher|author7=Intel Corporation|pages=38,43,44|quote=Direct Access Testing (DAT) for array access and diagnosis and Programmable Weak Write Test Mode (PWWTM) for memory cell stability test to reduce the test time. … Array {{abbr|DFT|Design for Test}} test strategy is to use PBIST (Programmable Built-In Self Test) to test the second level cache and use DAT to test the remaining arrays … PBIST is available through the JTAG TAP controller. <!-- … a PX chip may be sold at any one of the 7 configurations when either or some part of the UL1 … on top of the possible configurations to substitute the defective column with the spare column. --> … DAT mode in PX as shown in Figure 4 … PX has more arrays (>110) … array test coverage of PX is 99.3% ‒ the highest in Pentium 4 family}}</ref>
Direct Access Testing (DAT) is included in Intel CPUs as part of the [[design for testing]] (DFT) and Design for Debug (DFD) initiatives to allow full coverage testing of individual CPUs prior to sale.<ref name="wu-2004">{{cite web|last1=Wu|first1=David M.|last2=Lin|first2=Mike|last3=Reddy|first3=Madhukar|last4=Jaber|first4=Talal|last5=Sabbavarapu|first5=Anil|last6=Thatcher|first6=Larry|author7=Intel Corporation|year=2004|title=An optimized DFT and test pattern generation strategy for an Intel high performance microprocessor|url=https://eecs.ceas.uc.edu/~jonewb/TESTING/Papers/Intel.pdf|journal=|pages=38, 43, 44|quote=Direct Access Testing (DAT) for array access and diagnosis and Programmable Weak Write Test Mode (PWWTM) for memory cell stability test to reduce the test time. … Array {{abbr|DFT|Design for Test}} test strategy is to use PBIST (Programmable Built-In Self Test) to test the second level cache and use DAT to test the remaining arrays … PBIST is available through the JTAG TAP controller. <!-- … a PX chip may be sold at any one of the 7 configurations when either or some part of the UL1 … on top of the possible configurations to substitute the defective column with the spare column. --> … DAT mode in PX as shown in Figure 4 … PX has more arrays (>110) … array test coverage of PX is 99.3% ‒ the highest in Pentium 4 family}}</ref>


In May 2020, a script reading directly from the Control Register Bus (CRBUS)<ref>{{cite web |last1=Team |first1=uCode Research |title=chip-red-pill/crbus_scripts |url=https://github.com/chip-red-pill/crbus_scripts |accessdate=26 May 2020 |date=25 May 2020}}</ref> (after exploiting "Red Unlock" over JTAG using a USB-A to USB-A 3.0 cable with debugging capabilities, without D+, D- and Vcc<ref>{{Citation|author=Positive Research|title=ptresearch/IntelTXE-PoC|date=2020-07-21|url=https://github.com/ptresearch/IntelTXE-PoC|access-date=2020-07-25}}</ref>) was used to read the loaded microcode and patch arrays through the Local Direct Access Test (LDAT) port of the Intel [[Goldmont]] CPU.<ref name="ermolev-20200519-2">{{cite twitter|number=1262697756805795841|user=_markel___|first=Mark|last=Ermolov|date=2020-05-19|title=Using the Local Direct Access Test (LDAT) DFT feature of Intel Atom CPU, we dumped Microcode Sequencer ROM. Also, we extracted what we think is IROM (Immediates for uops) and even managed to modify MS Patch RAM and Match/Patch registers}}</ref> These arrays are accessible only after the CPU has been put into a specific mode; there are five of them, accessed through offset 0x6a0:<ref name="bosch-20200522">{{cite web|url=https://pbx.sh/ldat/|title=Intel LDAT notes|date=2020-05-22|access-date=2020-05-26|first=Peter|last=Bosch|quote=PDAT CR: 0x6A0; Array Select: 0‒4}}</ref>
In May 2020, a script reading directly from the Control Register Bus (CRBUS)<ref>{{cite web |last1=Team |first1=uCode Research |title=chip-red-pill/crbus_scripts |website=[[GitHub]] |url=https://github.com/chip-red-pill/crbus_scripts |access-date=26 May 2020 |date=25 May 2020}}</ref> (after exploiting "Red Unlock" over JTAG using a USB-A to USB-A 3.0 cable with debugging capabilities, without D+, D− and Vcc<ref>{{Citation|author=Positive Research|title=ptresearch/IntelTXE-PoC|date=2020-07-21|url=https://github.com/ptresearch/IntelTXE-PoC|access-date=2020-07-25}}</ref>) was used to read the loaded microcode and patch arrays through the Local Direct Access Test (LDAT) port of the Intel [[Goldmont]] CPU.<ref name="ermolev-20200519-2">{{cite twitter|number=1262697756805795841|user=_markel___|first=Mark|last=Ermolov|date=2020-05-19|title=Using the Local Direct Access Test (LDAT) DFT feature of Intel Atom CPU, we dumped Microcode Sequencer ROM. Also, we extracted what we think is IROM (Immediates for uops) and even managed to modify MS Patch RAM and Match/Patch registers}}</ref> These arrays are accessible only after the CPU has been put into a specific mode; there are five of them, accessed through offset 0x6a0:<ref name="bosch-20200522">{{cite web|url=https://pbx.sh/ldat/|title=Intel LDAT notes|date=2020-05-22|access-date=2020-05-26|first=Peter|last=Bosch|quote=PDAT CR: 0x6A0; Array Select: 0‒4}}</ref>


{{ordered list|start=0
{{ordered list|start=0
Line 92: Line 99:
{{reflist|refs=
{{reflist|refs=
<ref name="ronen-2005018">{{cite report|url=http://www.cs.tau.ac.il/~afek/p6tx050111.pdf|title=Micro Operations (Uops)
<ref name="ronen-2005018">{{cite report|url=http://www.cs.tau.ac.il/~afek/p6tx050111.pdf|title=Micro Operations (Uops)
|work=The Pentium II/III Processor "Compiler on a Chip"|first=Ronny|last=Ronen|author2=Intel Labs|location=Haifa|publisher=[[Tel Aviv University]]|date=18 January 2005|accessdate=23 January 2018|archive-url=https://web.archive.org/web/20070416221626/http://www.cs.tau.ac.il/~afek/p6tx050111.pdf|archive-date=16 April 2007|pages=26, 31, 32, 43, 44, 46|quote=Each "[[complex instruction set computer|CISC]]" {{abbr|inst|instruction}} is broken into one or more [[micro-operation|uops]] … Canonical representation of {{abbr|src|source}}/{{abbr|dest|destination}} (3 {{abbr|src|source}}, 2 {{abbr|dest|destination}}) … e.g., <code>pop eax</code> becomes <code>esp1<-esp0+4, eax1<-[esp0]</code> … {{abbr|ID|Instruction Decoder}}: Convert instructions into {{abbr|uops|micro-operations}}. Buffers up to 6 {{abbr|uops|micro-operations}} … {{abbr|Alloc|Allocation}} & {{abbr|RAT|Register Alias Table}} … able to work on up to 3 {{abbr|uops|micro-operations}} per clock … Reservation station (RS) … Pool of all "not yet executed" {{abbr|uops|micro-operations}} (up to 20) … In order Retirement: … Retires up to 3 {{abbr|uops|micro-operations}} per clock … {{abbr|OOO|Out Of Order}} Cluster … Up to 5 resource-ready {{abbr|uops|micro-operations}} are selected, and dispatched per clock}}</ref>
|work=The Pentium II/III Processor "Compiler on a Chip"|first=Ronny|last=Ronen|author2=Intel Labs|location=Haifa|publisher=[[Tel Aviv University]]|date=18 January 2005|access-date=23 January 2018|archive-url=https://web.archive.org/web/20070416221626/http://www.cs.tau.ac.il/~afek/p6tx050111.pdf|archive-date=16 April 2007|pages=26, 31, 32, 43, 44, 46|quote=Each "[[complex instruction set computer|CISC]]" {{abbr|inst|instruction}} is broken into one or more [[micro-operation|uops]] … Canonical representation of {{abbr|src|source}}/{{abbr|dest|destination}} (3 {{abbr|src|source}}, 2 {{abbr|dest|destination}}) … e.g., <code>pop eax</code> becomes <code>esp1<-esp0+4, eax1<-[esp0]</code> … {{abbr|ID|Instruction Decoder}}: Convert instructions into {{abbr|uops|micro-operations}}. Buffers up to 6 {{abbr|uops|micro-operations}} … {{abbr|Alloc|Allocation}} & {{abbr|RAT|Register Alias Table}} … able to work on up to 3 {{abbr|uops|micro-operations}} per clock … Reservation station (RS) … Pool of all "not yet executed" {{abbr|uops|micro-operations}} (up to 20) … In order Retirement: … Retires up to 3 {{abbr|uops|micro-operations}} per clock … {{abbr|OOO|Out Of Order}} Cluster … Up to 5 resource-ready {{abbr|uops|micro-operations}} are selected, and dispatched per clock}}</ref>
<ref name="gwennap-20070915">{{cite news|archive-url=https://web.archive.org/web/20091221182054/https://www.ele.uva.es/~jesman/BigSeti/ftp/Cajon_Desastre/MPR/111204.pdf|archivedate=21 December 2009|url=https://www.ele.uva.es/~jesman/BigSeti/ftp/Cajon_Desastre/MPR/111204.pdf|url-status=dead|title=P6 Microcode Can Be Patched|first=Linley|last=Gwennap|date=15 September 1997|work=[[Microprocessor Report]]|accessdate=23 January 2018|quote=Intel has implemented a microcode patch capability in its [[P6 (microarchitecture)|P6]] processors, including [[Pentium Pro]] and [[Pentium II]] … allows the microcode to be altered after the processor is fabricated, repairing bugs that are found after the processor is designed. … originally intended the feature to be used only for debugging, but after dealing with the expense of the [[Pentium FDIV bug]] … Intel decided to make it usable in the field. … P6 chip contains a complete set of microcode in an internal [[read-only memory|ROM]] … BIOS writes a memory address into a special CPU register to trigger a download sequence … P6 processors contain a small [[static random-access memory|SRAM]] that holds up to 60 microinstructions. The patch code is downloaded into this SRAM … also contains a set of "match" registers that cause a trap when a particular microcode address is encountered. (This is similar to the "instruction [[breakpoint]]" capability used to debug [[assembly language|assembly code]].) This trap, which takes a single cycle to process, vectors microcode execution into the patch RAM. … downloaded microcode consists of two segments. … first is an initialization routine that is run immediately … also initializes the match registers, if necessary. … second segment contains one or more patches that remain in the patch RAM during normal operation and are accessed via a match-register trap. … original microcode is stored in ROM, … match registers allow the operation of the microcode to be changed. 
In this way, an [[x86 assembly language|x86 instruction]] that is operating incorrectly can be repaired, assuming it is implemented in microcode. … a patch is created to replace a section of the original microcode, performing the correct operation and then [[branch (computer science)|jumping]] back. … number of match registers, … more than one. … single bug, … might require multiple patches, and some bugs are too complex to repair … mechanism could allow multiple bugs to be fixed, … features of the P6 processor can be disabled via a special register … 2,048-byte block of data. The block contains a 48-byte header—which includes a date code, the [[CPUID|CPU ID]] (which includes the [[stepping level]]) of the target processor, and a checksum—and 2,000 bytes of data to be downloaded by the processor. … checksum … is not used by the CPU. … 2,000 data bytes are encrypted in a way that Intel claims will be extremely difficult to break. The bytes are divided into blocks of varying lengths, each of which is encoded differently. … typically much smaller than 2,000 bytes, the remaining data is random noise intended to confuse anyone attempting to break the encryption. … Intel has not published any information on the format of its microcode, … is deliberately designed to be difficult to understand. Only a small number of Intel employees know the P6 microcode formats.}}</ref>
<ref name="gwennap-20070915">{{cite news|archive-url=https://web.archive.org/web/20091221182054/https://www.ele.uva.es/~jesman/BigSeti/ftp/Cajon_Desastre/MPR/111204.pdf|archive-date=21 December 2009|url=https://www.ele.uva.es/~jesman/BigSeti/ftp/Cajon_Desastre/MPR/111204.pdf|url-status=dead|title=P6 Microcode Can Be Patched|first=Linley|last=Gwennap|date=15 September 1997|work=[[Microprocessor Report]]|access-date=23 January 2018|quote=Intel has implemented a microcode patch capability in its [[P6 (microarchitecture)|P6]] processors, including [[Pentium Pro]] and [[Pentium II]] … allows the microcode to be altered after the processor is fabricated, repairing bugs that are found after the processor is designed. … originally intended the feature to be used only for debugging, but after dealing with the expense of the [[Pentium FDIV bug]] … Intel decided to make it usable in the field. … P6 chip contains a complete set of microcode in an internal [[read-only memory|ROM]] … BIOS writes a memory address into a special CPU register to trigger a download sequence … P6 processors contain a small [[static random-access memory|SRAM]] that holds up to 60 microinstructions. The patch code is downloaded into this SRAM … also contains a set of "match" registers that cause a trap when a particular microcode address is encountered. (This is similar to the "instruction [[breakpoint]]" capability used to debug [[assembly language|assembly code]].) This trap, which takes a single cycle to process, vectors microcode execution into the patch RAM. … downloaded microcode consists of two segments. … first is an initialization routine that is run immediately … also initializes the match registers, if necessary. … second segment contains one or more patches that remain in the patch RAM during normal operation and are accessed via a match-register trap. … original microcode is stored in ROM, … match registers allow the operation of the microcode to be changed. 
In this way, an [[x86 assembly language|x86 instruction]] that is operating incorrectly can be repaired, assuming it is implemented in microcode. … a patch is created to replace a section of the original microcode, performing the correct operation and then [[branch (computer science)|jumping]] back. … number of match registers, … more than one. … single bug, … might require multiple patches, and some bugs are too complex to repair … mechanism could allow multiple bugs to be fixed, … features of the P6 processor can be disabled via a special register … 2,048-byte block of data. The block contains a 48-byte header—which includes a date code, the [[CPUID|CPU ID]] (which includes the [[stepping level]]) of the target processor, and a checksum—and 2,000 bytes of data to be downloaded by the processor. … checksum … is not used by the CPU. … 2,000 data bytes are encrypted in a way that Intel claims will be extremely difficult to break. The bytes are divided into blocks of varying lengths, each of which is encoded differently. … typically much smaller than 2,000 bytes, the remaining data is random noise intended to confuse anyone attempting to break the encryption. … Intel has not published any information on the format of its microcode, … is deliberately designed to be difficult to understand. Only a small number of Intel employees know the P6 microcode formats.}}</ref>
<ref name="gelas-20060501">{{cite news|url=https://www.anandtech.com/show/1998/3|title=Intel Core versus AMD's K8 architecture|first=Johan|last=De Gelas|date=1 May 2006|accessdate=23 January 2018|work=[[AnandTech]]|page=3|quote=Core architecture is equipped with four x86 decoders, 3 simple decoders and 1 complex decoder … to translate the 1 to 15 byte variable length x86 instructions into … fixed length RISC-like instructions (called micro-ops). … common x86 instructions are translated into a single micro-op … complex decoder is responsible for the instructions that produce up to 4 micro-ops. … really long and complex x86 instructions are handled by a microcode sequencer. … macro-op fusion … the x86 compare instruction (<code>{{abbr|CMP|Compare}}</code>) is fused with a jump (<code>{{abbr|JNE TARG|Jump Not Equal to Target}}</code>).}}</ref>
<ref name="gelas-20060501">{{cite news|url=https://www.anandtech.com/show/1998/3|title=Intel Core versus AMD's K8 architecture|first=Johan|last=De Gelas|date=1 May 2006|access-date=23 January 2018|work=[[AnandTech]]|page=3|quote=Core architecture is equipped with four x86 decoders, 3 simple decoders and 1 complex decoder … to translate the 1 to 15 byte variable length x86 instructions into … fixed length RISC-like instructions (called micro-ops). … common x86 instructions are translated into a single micro-op … complex decoder is responsible for the instructions that produce up to 4 micro-ops. … really long and complex x86 instructions are handled by a microcode sequencer. … macro-op fusion … the x86 compare instruction (<code>{{abbr|CMP|Compare}}</code>) is fused with a jump (<code>{{abbr|JNE TARG|Jump Not Equal to Target}}</code>).}}</ref>
<ref name="chen-ahn-20141211">{{cite document|url=https://www.dcddcc.com/docs/2014_paper_microcode.pdf|title=Security Analysis of x86 Processor Microcode|first1=Daming Dominic|last1=Chen|first2=Gail-Joon|last2=Ahn|publisher=[[Arizona State University]]|date=11 December 2014|accessdate=23 January 2018|pages=1, 5, 7|quote=supervisor privileges (ring zero) are required to update processor microcode … Since the 1970s, processor manufacturers have decoded the x86 … into a sequence of … (RISC) micro-operations (uops) … introduced writable patch memory to provide an update mechanism for implementing dynamic debugging capabilities and correcting processor errata, especially after the infamous [[Pentium FDIV bug]] of 1994. … P6 (Pentium Pro) microarchitecture in 1995, … [[AMD K7|K7 microarchitecture]] in 1999 … with [[symmetric multiprocessing]] (SMP) … should be executed synchronously on each logical processor … patch RAM in addition to the {{abbr|MROM|Microcode read-only memory}} … up to 60 microinstructions, with patching implemented by pairs of match and destination registers. … a 520 byte block containing a 2048-bit RSA modulus that appears to be constant within each processor family. This is followed by a four byte RSA exponent with the fixed value 11h}}</ref>
<ref name="chen-ahn-20141211">{{cite web|url=https://www.dcddcc.com/docs/2014_paper_microcode.pdf|title=Security Analysis of x86 Processor Microcode|first1=Daming Dominic|last1=Chen|first2=Gail-Joon|last2=Ahn|publisher=[[Arizona State University]]|date=11 December 2014|access-date=23 January 2018|pages=1, 5, 7|quote=supervisor privileges (ring zero) are required to update processor microcode … Since the 1970s, processor manufacturers have decoded the x86 … into a sequence of … (RISC) micro-operations (uops) … introduced writable patch memory to provide an update mechanism for implementing dynamic debugging capabilities and correcting processor errata, especially after the infamous [[Pentium FDIV bug]] of 1994. … P6 (Pentium Pro) microarchitecture in 1995, … [[AMD K7|K7 microarchitecture]] in 1999 … with [[symmetric multiprocessing]] (SMP) … should be executed synchronously on each logical processor … patch RAM in addition to the {{abbr|MROM|Microcode read-only memory}} … up to 60 microinstructions, with patching implemented by pairs of match and destination registers. … a 520 byte block containing a 2048-bit RSA modulus that appears to be constant within each processor family. This is followed by a four byte RSA exponent with the fixed value 11h}}</ref>
<ref name="hardice">{{cite web|url=http://www.hardice.org/hardice/reference/intel/probe-mode/details-of-intel-probe-mode|title=Details of Intel Probe mode|work=Hardice|accessdate=23 January 2018|quote=emit a packet over the {{abbr|BPM|0-7 pins; Breakpoint Monitor Pins 0‒7}} when special instructions are executed … To enable Extended Execution Trace, special microcode patches must be applied … For the Pentium 4 only, there exists a second type … called microcode Extended Execution Trace … Control Register Bus in turn allows access to internal arrays and functions on the processor, such as accessing the {{abbr|LLC|Last Level Cache}} and the microcode/{{abbr|Virtual Fuse|VFuse}} PROM. … that sits on the CPU package but is not within the CPU silicon die. This PROM also contains the microcode that the CPU loads during cold boot. … breakpoint on a 48-bit microcode address … accessed by the {{abbr|TAP|Test Access Port}} commands {{abbr|BRKPTCTLA|Breakpoint Control A}} and {{abbr|BRKPTCTLB|Breakpoint Control B}}.}}</ref>
<ref name="hardice">{{cite web|url=http://www.hardice.org/hardice/reference/intel/probe-mode/details-of-intel-probe-mode|title=Details of Intel Probe mode|work=Hardice|access-date=23 January 2018|quote=emit a packet over the {{abbr|BPM|0-7 pins; Breakpoint Monitor Pins 0‒7}} when special instructions are executed … To enable Extended Execution Trace, special microcode patches must be applied … For the Pentium 4 only, there exists a second type … called microcode Extended Execution Trace … Control Register Bus in turn allows access to internal arrays and functions on the processor, such as accessing the {{abbr|LLC|Last Level Cache}} and the microcode/{{abbr|Virtual Fuse|VFuse}} PROM. … that sits on the CPU package but is not within the CPU silicon die. This PROM also contains the microcode that the CPU loads during cold boot. … breakpoint on a 48-bit microcode address … accessed by the {{abbr|TAP|Test Access Port}} commands {{abbr|BRKPTCTLA|Breakpoint Control A}} and {{abbr|BRKPTCTLB|Breakpoint Control B}}.}}</ref>
<ref name="hinton-et-al-2001">{{cite news|url=https://www.intel.com/content/dam/www/public/us/en/documents/research/2001-vol05-iss-1-intel-technology-journal.pdf|work=Intel Technology Journal|issue=Q1|year=2001|editor-first=Lin|editor-last=Chao|title=The Microarchitecture of the Pentium 4 Processor|first1=Glenn|last1=Hinton|first2=Dave|last2=Sager|first3=Mike|last3=Upton|first4=Darrell|last4=Boggs|first5=Doug|last5=Carmean|first6=Alan|last6=Kyker|first7=Patrice|last7=Roussel|quote=IA-32 instruction bytes are then decoded into basic operations called uops (micro-operations) … advanced form of a Level 1 (L1) instruction cache called the Execution Trace Cache … between the instruction decode logic and the execution core … to store the already decoded … uops. … instructions are decoded once … then used repeatedly from there … has a capacity to hold up to 12K uops … similar hit rate to an 8K to 16K byte conventional instruction cache. … packs the uops into groups of six uops per trace line … microcode ROM … for complex IA-32 instructions, such as string move, and for fault and interrupt handling … Trace Cache jumps into the microcode ROM which then issues the uops … After the microcode ROM finishes sequencing uops … front end of the machine resumes fetching uops from the Trace Cache. … deep buffering of the Pentium 4 processor (126 uops and 48 loads in flight)}}</ref>
<ref name="hinton-et-al-2001">{{cite news|url=https://www.intel.com/content/dam/www/public/us/en/documents/research/2001-vol05-iss-1-intel-technology-journal.pdf|work=Intel Technology Journal|issue=Q1|year=2001|editor-first=Lin|editor-last=Chao|title=The Microarchitecture of the Pentium 4 Processor|first1=Glenn|last1=Hinton|first2=Dave|last2=Sager|first3=Mike|last3=Upton|first4=Darrell|last4=Boggs|first5=Doug|last5=Carmean|first6=Alan|last6=Kyker|first7=Patrice|last7=Roussel|quote=IA-32 instruction bytes are then decoded into basic operations called uops (micro-operations) … advanced form of a Level 1 (L1) instruction cache called the Execution Trace Cache … between the instruction decode logic and the execution core … to store the already decoded … uops. … instructions are decoded once … then used repeatedly from there … has a capacity to hold up to 12K uops … similar hit rate to an 8K to 16K byte conventional instruction cache. … packs the uops into groups of six uops per trace line … microcode ROM … for complex IA-32 instructions, such as string move, and for fault and interrupt handling … Trace Cache jumps into the microcode ROM which then issues the uops … After the microcode ROM finishes sequencing uops … front end of the machine resumes fetching uops from the Trace Cache. … deep buffering of the Pentium 4 processor (126 uops and 48 loads in flight)}}</ref>
<ref name="shanley-1998">{{cite book|first=T.|last=Shanley|publisher=Addison-Wesley Professional|year=1998|work=Pentium Pro and Pentium II System Architecture|url=https://books.google.com/books?id=MLJClvCYh34C&pg=PA432|title=The BIOS Update Loader|page=435|isbn=9780201309737}}</ref>
<ref name="shanley-1998">{{cite book|first=T.|last=Shanley|publisher=Addison-Wesley Professional|year=1998|title=Pentium Pro and Pentium II System Architecture|url=https://books.google.com/books?id=MLJClvCYh34C&pg=PA432|page=435|isbn=9780201309737}}</ref>
<ref name="asanovic-2002">{{cite paper|url=https://dspace.mit.edu/bitstream/handle/1721.1/35849/6-823Spring-2002/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-823Computer-System-ArchitectureSpring2002/4E0FC5FE-6F01-43D7-95FE-91E32EB349CF/0/lecture20.pdf|first=Krste|last=Asanovic|issue=Spring|year=2002|journal=Microprocessor Evolution: 4004 to Pentium Pro|page=14|title=P6 uops|accessdate=23 January 2018|quote=Each uop has fixed format of around 118 bits … – opcode, two sources, and destination … – sources and destination fields are 32-bits wide to hold immediate or operand}}</ref>
<ref name="asanovic-2002">{{cite journal|url=https://dspace.mit.edu/bitstream/handle/1721.1/35849/6-823Spring-2002/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-823Computer-System-ArchitectureSpring2002/4E0FC5FE-6F01-43D7-95FE-91E32EB349CF/0/lecture20.pdf|first=Krste|last=Asanovic|author-link=Krste Asanović|issue=Spring|year=2002|journal=Microprocessor Evolution: 4004 to Pentium Pro|page=14|title=P6 uops|access-date=23 January 2018|quote=Each uop has fixed format of around 118 bits … – opcode, two sources, and destination … – sources and destination fields are 32-bits wide to hold immediate or operand}}</ref>
<ref name="kagan-et-al-1997">{{cite paper|url=https://www.smtnet.com/library/files/upload/pentium-microarchitecture.pdf|title=MMX Microarchitecture of Pentium Processors With MMX Technology and Pentium II Microprocessors|first1=Michael|last1=Kagan|first2=Simcha|last2=Gochman|first3=Doron|last3=Orenstien|first4=Derrick|last4=Lin|journal=[[Intel Technology Journal]]|issue=Q3|year=1997|pages=6, 7|quote=Pentium II processor's microarchitecture is similar to that of the Pentium Pro microprocessor … modified to convert the new [[MMX (instruction set)|MMX]] instructions to Pentium Pro processor-specific uops (new Single Instruction Multiple Data [SIMD] uops were added to implement the new functionality). … A microcode assist was created to correct the problem and redo the operation. An assist is a customer-invisible event that flushes out the machine and allows microcode to handle rare but difficult-to-handle problems. Since all MMX instructions zero the {{abbr|TOS|Top of Stack}}, the assist needs to write the {{abbr|TOS|Top of Stack}} to zero and restart the operation. … Illegal opcodes that are instruction holes in the MMX instruction opcode map are defined to generate a one uop assist call. This assist call instructs the ROB to flush the machine and causes an assist microcode flow to cause the processor to handle illegal opcode faults.}}</ref>
<ref name="kagan-et-al-1997">{{cite journal|url=https://www.smtnet.com/library/files/upload/pentium-microarchitecture.pdf|title=MMX Microarchitecture of Pentium Processors With MMX Technology and Pentium II Microprocessors|first1=Michael|last1=Kagan|first2=Simcha|last2=Gochman|first3=Doron|last3=Orenstien|first4=Derrick|last4=Lin|journal=Intel Technology Journal|issue=Q3|year=1997|pages=6, 7|quote=Pentium II processor's microarchitecture is similar to that of the Pentium Pro microprocessor … modified to convert the new [[MMX (instruction set)|MMX]] instructions to Pentium Pro processor-specific uops (new Single Instruction Multiple Data [SIMD] uops were added to implement the new functionality). … A microcode assist was created to correct the problem and redo the operation. An assist is a customer-invisible event that flushes out the machine and allows microcode to handle rare but difficult-to-handle problems. Since all MMX instructions zero the {{abbr|TOS|Top of Stack}}, the assist needs to write the {{abbr|TOS|Top of Stack}} to zero and restart the operation. … Illegal opcodes that are instruction holes in the MMX instruction opcode map are defined to generate a one uop assist call. This assist call instructs the ROB to flush the machine and causes an assist microcode flow to cause the processor to handle illegal opcode faults.}}</ref>
<ref name="kim-et-al-2004">{{cite document|url=http://maggini.eng.umd.edu/pub/pre-exec-cgo2004.pdf|title=Physical Experimentation with Prefetching Helper Threads on Intels Hyper-Threaded Processors|first1=Dongkeun|last1=Kim|first2=Steve|last2=Shih-wei Liao|first3=Perry H.|last3=Wang|first4=Juan|last4=del Cuvillo|first5=Xinmin|last5=Tian|first6=Xiang|last6=Zou|first7=Hong|last7=Wang|first8=Donald|last8=Yeung|first9=Milind|last9=Girkar|first10=John P.|last10=Shen|date=11 January 2004|accessdate=24 January 2018|pages=4, 5|quote=L1 Trace cache: 12K micro-ops, 8-way set associative, 6 micro-ops per line … Shared: Trace cache, … {{abbr|IA-32|Intel Architecture 32-bit}} instruction decode, Microcode ROM, {{abbr|Uop|Micro-operation}} retirement logic, … Partitioned: Uop queue}}</ref>
<ref name="kim-et-al-2004">{{cite web|url=http://maggini.eng.umd.edu/pub/pre-exec-cgo2004.pdf|title=Physical Experimentation with Prefetching Helper Threads on Intels Hyper-Threaded Processors|first1=Dongkeun|last1=Kim|first2=Steve|last2=Shih-wei Liao|first3=Perry H.|last3=Wang|first4=Juan|last4=del Cuvillo|first5=Xinmin|last5=Tian|first6=Xiang|last6=Zou|first7=Hong|last7=Wang|first8=Donald|last8=Yeung|first9=Milind|last9=Girkar|first10=John P.|last10=Shen|date=11 January 2004|access-date=24 January 2018|pages=4, 5|quote=L1 Trace cache: 12K micro-ops, 8-way set associative, 6 micro-ops per line … Shared: Trace cache, … {{abbr|IA-32|Intel Architecture 32-bit}} instruction decode, Microcode ROM, {{abbr|Uop|Micro-operation}} retirement logic, … Partitioned: Uop queue}}</ref>
<ref name="infoworld-20041017">{{cite news|url=https://books.google.com/books?id=ZDgEAAAAMBAJ&pg=PA5|work=[[InfoWorld]]|date=17 October 1994|accessdate=24 January 2018|page=5|title=Court ruling against AMD causes some concern|quote=The decision by the federal district court in San Jose, Calif., said that AMD does not have the right to use Intel's [[in-circuit emulation]] (ICE) code in the AMD microprocessors. This code is present on all AMD 486s but is only used in a low-power 486-DXL and 486-DXLV processors. … AMD has started to rework its entire line of 486s to eliminate the code.}}</ref>
<ref name="infoworld-20041017">{{cite news|url=https://books.google.com/books?id=ZDgEAAAAMBAJ&pg=PA5|work=[[InfoWorld]]|date=17 October 1994|access-date=24 January 2018|page=5|title=Court ruling against AMD causes some concern|quote=The decision by the federal district court in San Jose, Calif., said that AMD does not have the right to use Intel's [[in-circuit emulation]] (ICE) code in the AMD microprocessors. This code is present on all AMD 486s but is only used in a low-power 486-DXL and 486-DXLV processors. … AMD has started to rework its entire line of 486s to eliminate the code.}}</ref>
<ref name="Stiller_1996">{{cite magazine |title=Prozessorgeflüster |series=Trends & News / aktuell - Prozessoren |language=de |author-first1=Andreas |author-last1=Stiller |author-first2=Matthias R. |author-last2=Paul<!-- info contributor on processor internals --> |date=1996-05-12 |volume=1996 |issue=6 |magazine=[[c't – magazin für computertechnik]] |publisher=[[Verlag Heinz Heise GmbH & Co KG]] |issn=0724-8679 |page=20 |url=https://www.heise.de/ct/artikel/Prozessorgefluester-284546.html |access-date=2017-08-28 |url-status=live |archive-url=https://web.archive.org/web/20170828172141/https://www.heise.de/ct/artikel/Prozessorgefluester-284546.html |archive-date=2017-08-28}}</ref>
}}
}}


==Further reading==
{{refbegin}}
{{refbegin}}
* {{cite patent|country=US|patent-number=5404473|status=patent|inventor1-first=David B.|inventor1-last=Papworth|inventor2-first=Michael A.|inventor2-last=Fetterman|inventor3-first=Andrew F.|inventor3-last=Glew|inventor4-first=Lawrence O.|inventor4-last=Smith (III)|inventor5-first=Michael M.|inventor5-last=Hancock|inventor6-first=Beth|inventor6-last=Schultz|assign1=[[Intel]]|title=Apparatus and method for handling string operations in a pipelined processor|pridate=1994-03-01|fdate=1994-03-01|date-issued=1995-04-04|pubdate=1995-04-04}} "the first {{abbr|Cuops|Control micro operations}} in a REP swing operation loads the {{abbr|MS|micro sequencer}} Loop Counter with the number of iterations remaining after the unrolled iterations are executed. … a small number of iterations (e.g., seven), are sent during the time it takes for the Loop Counter in the MS to be loaded. This unrolled code is executed conditionally based on the value of (E)CX … remaining three iterations are turned into [[NOP (code)|NOPS]]."
* {{cite patent|country=US|number=5404473|status=patent|inventor1-first=David B.|inventor1-last=Papworth|inventor2-first=Michael A.|inventor2-last=Fetterman|inventor3-first=Andrew F.|inventor3-last=Glew|inventor4=Lawrence O. Smith (III), Michael M. Hancock, first=Beth Schultz|assign1=[[Intel]]|title=Apparatus and method for handling string operations in a pipelined processor|pridate=1994-03-01|fdate=1994-03-01|pubdate=1995-04-04}} "the first {{abbr|Cuops|Control micro operations}} in a REP swing operation loads the {{abbr|MS|micro sequencer}} Loop Counter with the number of iterations remaining after the unrolled iterations are executed. … a small number of iterations (e.g., seven), are sent during the time it takes for the Loop Counter in the MS to be loaded. This unrolled code is executed conditionally based on the value of (E)CX … remaining three iterations are turned into [[NOP (code)|NOPS]]."
* {{cite patent|country=US|patent-number=5559974|inventor1-first=Darrell D.|inventor1-last=Boggs|inventor2-first=Gary L.|inventor2-last=Brown|inventor3-first=Michael M.|inventor3-last=Hancock|inventor4-first=Donald D.|inventor4-last=Parker|assign1=[[Intel]]|status=patent|title=Decoder having independently loaded micro-alias and macro-alias registers accessible simultaneously by one micro-operation|pridate=1994-03-01|fdate=1996-09-24|pubdate=1996-09-24}}
* {{cite patent|country=US|number=5559974|inventor1-first=Darrell D.|inventor1-last=Boggs|inventor2-first=Gary L.|inventor2-last=Brown|inventor3-first=Michael M.|inventor3-last=Hancock|inventor4-first=Donald D.|inventor4-last=Parker|assign1=[[Intel]]|status=patent|title=Decoder having independently loaded micro-alias and macro-alias registers accessible simultaneously by one micro-operation|pridate=1994-03-01|fdate=1996-09-24|pubdate=1996-09-24}}
* {{cite patent|country=US|patent-number=5566298|inventor1-first=Darrell D.|inventor1-last=Boggs|inventor2-first=Gary L.|inventor2-last=Brown|inventor3-first=Michael M.|inventor3-last=Hancock|inventor4-first=Donald D.|inventor4-last=Parker|inventor5-first=Gail M.|inventor5-last=Rupnick|assign1=[[Intel]]|status=patent|title=Method for state recovery during assist and restart in a decoder having an alias mechanism|pridate=1994-03-01|fdate=1994-03-01|date-issued=1996-10-15|pubdate=1996-10-15}} "… control returns to the Micro-operation Sequence (MS) unit to issue further error correction Control micro-operations (Cuops). In order to simplify restart, the Cuops originating from the error-causing macroinstruction supplied by the translate programmable logic arrays (XLAT PLAs) are loaded into the Cuop registers, with their valid bits unasserted."
* {{cite patent|country=US|number=5566298|inventor1-first=Darrell D.|inventor1-last=Boggs|inventor2-first=Gary L.|inventor2-last=Brown|inventor3-first=Michael M.|inventor3-last=Hancock|inventor4=Donald D. Parker, Gail M. Rupnick|assign1=[[Intel]]|status=patent|title=Method for state recovery during assist and restart in a decoder having an alias mechanism|pridate=1994-03-01|fdate=1994-03-01|pubdate=1996-10-15}} "… control returns to the Micro-operation Sequence (MS) unit to issue further error correction Control micro-operations (Cuops). In order to simplify restart, the Cuops originating from the error-causing macroinstruction supplied by the translate programmable logic arrays (XLAT PLAs) are loaded into the Cuop registers, with their valid bits unasserted."
* {{cite patent|country=US|patent-number=5600806|title=Method and apparatus for aligning an instruction boundary in variable length macroinstructions with an instruction buffer|inventor1-first=Gary L.|inventor1-last=Brown|inventor2-first=Donald D.|inventor2-last=Parker|assign1=[[Intel]]|pridate=1994-03-01|date-issued=1997-02-04|pubdate=1997-02-04|status=patent}} "ADD, XOR, SUB, AND, and OR, which are implemented with one generic Cuop. Another group of instructions representable by only one {{abbr|Cuop|Control micro-operation}} includes {{abbr|ADC|Add with Carry}} and {{abbr|SBB|Subtract with Borrow}}."
* {{cite patent|country=US|number=5600806|title=Method and apparatus for aligning an instruction boundary in variable length macroinstructions with an instruction buffer|inventor1-first=Gary L.|inventor1-last=Brown|inventor2-first=Donald D.|inventor2-last=Parker|assign1=[[Intel]]|pridate=1994-03-01|pubdate=1997-02-04|status=patent}} "ADD, XOR, SUB, AND, and OR, which are implemented with one generic Cuop. Another group of instructions representable by only one {{abbr|Cuop|Control micro-operation}} includes {{abbr|ADC|Add with Carry}} and {{abbr|SBB|Subtract with Borrow}}."
* {{cite patent|country=US|patent-number=5630083|title=Decoder for decoding multiple instructions in parallel|inventor1-first=Adrian L.|inventor1-last=Carbine|inventor2-first=Gary L.|inventor2-last=Brown|inventor3-first=Donald D.|inventor3-last=Parker|assign1=[[Intel]]|pridate=1994-03-01|fdate=1996-07-03|date-issued=1997-05-13|pubdate=2013-03-01|status=patent}}
* {{cite patent|country=US|number=5630083|title=Decoder for decoding multiple instructions in parallel|inventor1-first=Adrian L.|inventor1-last=Carbine|inventor2-first=Gary L.|inventor2-last=Brown|inventor3-first=Donald D.|inventor3-last=Parker|assign1=[[Intel]]|pridate=1994-03-01|fdate=1996-07-03|pubdate=2013-03-01|status=patent}}
* {{cite patent|country=US|patent-number=6055656|inventor1-first=James A.|inventor1-last=Wilson, Jr.|inventor2-first=Anthony C.|inventor2-last=Miller|inventor3-first=Michael W.|inventor3-last=Rhodehamel|inventor4-first=Adrian|inventor4-last=Carbine|inventor5-first=Derek B. I.|inventor5-last=Feltham|inventor6-first=Sumeet|inventor6-last=Agrawal|title=Control register bus access through a standardized test access port|pubdate=2000-04-25|fdate=1995-05-02|pridate=1995-05-02|issue-date=2000-04-25|assign1=[[Intel]]|status=patent}}
* {{cite patent|country=US|number=6055656|inventor1-first=James A.|inventor1-last=Wilson, Jr.|inventor2-first=Anthony C.|inventor2-last=Miller|inventor3-first=Michael W.|inventor3-last=Rhodehamel|inventor4=Adrian Carbine, Derek B. I. Feltham, Sumeet Agrawal|title=Control register bus access through a standardized test access port|pubdate=2000-04-25|fdate=1995-05-02|pridate=1995-05-02|assign1=[[Intel]]|status=patent}}
* {{cite patent|country=US|patent-number=20030196096|inventor1-last=Sutton|inventor1-first=James A.|title=Microcode patch authentication|pubdate=2003-10-16|fdate=2002-04-12|status=patent}}
* {{cite patent|country=US|number=20030196096|inventor1-last=Sutton|inventor1-first=James A.|title=Microcode patch authentication|pubdate=2003-10-16|fdate=2002-04-12|status=patent}}
* {{cite patent|country=US|patent-number=5948097|title=Method and apparatus for changing privilege levels in a computer system without use of a call gate|inventor1-first=Andrew|inventor1-last=Glew|inventor2-first=Scott Dion|inventor2-last=Rodgers|assign1=[[Intel]]|fdate=1996-08-29|pridate=1996-08-29|date-issued=1999-09-07|pubdate=1999-09-07|status=patent}} "SYSENTER and SYSEXIT are assembly-language instructions that may be executed on an Intel architecture processor, such as the Pentium Pro processor … micro-operation is determined to be ready when its source fields have been filled with appropriate data … instruction decode unit comprises one or more translate (XLAT) programmable logic arrays (PLAs) that decode each instruction in to one or more micro-operations. … SYSENTER and SYSEXIT instructions are decoded in to micro-operations that perform the steps illustrated in FIGS. 5 and 6, respectively."
* {{cite patent|country=US|number=5948097|title=Method and apparatus for changing privilege levels in a computer system without use of a call gate|inventor1-first=Andrew|inventor1-last=Glew|inventor2-first=Scott Dion|inventor2-last=Rodgers|assign1=[[Intel]]|fdate=1996-08-29|pridate=1996-08-29|pubdate=1999-09-07|status=patent}} "SYSENTER and SYSEXIT are assembly-language instructions that may be executed on an Intel architecture processor, such as the Pentium Pro processor … micro-operation is determined to be ready when its source fields have been filled with appropriate data … instruction decode unit comprises one or more translate (XLAT) programmable logic arrays (PLAs) that decode each instruction in to one or more micro-operations. … SYSENTER and SYSEXIT instructions are decoded in to micro-operations that perform the steps illustrated in FIGS. 5 and 6, respectively."
* {{cite web|url=https://opensource.apple.com/source/xnu/xnu-3789.41.3/osfmk/i386/ucode.c.auto.html|title=Microcode updater interface sysctl|work=[[XNU]]|quote=<code>#define IA32_BIOS_UPDT_TRIG (0x79) /* microcode update trigger MSR */</code>|accessdate=24 January 2018|format=<code>ucode.c</code> driver}}
* {{cite web|url=https://opensource.apple.com/source/xnu/xnu-3789.41.3/osfmk/i386/ucode.c.auto.html|title=Microcode updater interface sysctl|work=[[XNU]]|quote=<code>#define IA32_BIOS_UPDT_TRIG (0x79) /* microcode update trigger MSR */</code>|access-date=24 January 2018|format=<code>ucode.c</code> driver}}
* {{cite book|title=Efficient Embedded Memory Testing with APG|volume=1|doi=10.1109/TEST.2002.1041744|first1=A. T.|last1=Sivaram|first2=Daniel|last2=Fan|first3=A.|last3=Yiin|date=2002-10-10|isbn=0-7803-7542-4|issn=1089-3539|publisher=IEEE|location=Baltimore, Maryland|s2cid=19579807}}
* {{cite conference|volume=1|doi=10.1109/TEST.2002.1041744|first1=A. T.|last1=Sivaram|first2=Daniel|last2=Fan|first3=A.|last3=Yiin|book-title=Proceedings. International Test Conference|title=Efficient embedded memory testing with APG|date=2002-10-10|pages=47–54|isbn=0-7803-7542-4|issn=1089-3539|publisher=IEEE|location=Baltimore, Maryland|s2cid=19579807}}
* {{cite paper|first=Peter|last=Bosch|date=2020-10-01|access-date=2020-11-01|url=https://www.youtube.com/watch?v=4oFOpDflJMA|conference=Hardwear.io Security Conference|location=Netherlands|title=Under the hood of a CPU: Reverse Engineering the P6 microcode}}
* {{cite web|first=Peter|last=Bosch|date=2020-10-01|access-date=2020-11-01|url=https://www.youtube.com/watch?v=4oFOpDflJMA|location=Netherlands|title=Under the hood of a CPU: Reverse Engineering the P6 microcode|website=[[YouTube]]}}


{{refend}}
{{refend}}

== External links ==
* [https://github.com/chip-red-pill/uCodeDisasm uCodeDisasm] — Intel microcode disassembler in Python (from CRBUS), names of uops


[[Category:Intel x86 microprocessors|Microcode, Intel]]

Latest revision as of 16:06, 16 October 2024

Intel microcode is microcode that runs inside x86 processors made by Intel. Since the P6 microarchitecture introduced in the mid-1990s, the microcode programs can be patched by the operating system or BIOS firmware to work around bugs found in the CPU after release.[1] Intel had originally designed microcode updates for processor debugging under its design for testing (DFT) initiative.[2]

Following the Pentium FDIV bug, the patchable microcode function took on a wider purpose to allow in-field updating without needing to do a product recall.[1]

In the P6 and later microarchitectures, x86 instructions are internally converted into simpler RISC-style micro-operations that are specific to a particular processor and stepping level.[1]

Pre-P6 microcode


On the Intel 80486 and AMD Am486 there are approximately 5000 lines of microcode assembly, totalling approximately 240 Kbits stored in the microcode ROM.[3]
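The approximate totals above can be reproduced from the figures quoted in the court findings (250 lines, or 12,032 bits, of "ICE" microcode making up about five percent of the total 486 microcode). The following is a minimal Python sketch of that arithmetic, assuming the five percent share applies equally to lines and to bits; the function and constant names are illustrative only.

```python
def scale_to_total(part: int, percent_of_total: int) -> int:
    """Scale a partial count up to the whole, given its percentage share."""
    return part * 100 // percent_of_total


ICE_LINES = 250        # lines of "ICE" microcode quoted in the findings
ICE_BITS = 12_032      # bits of "ICE" microcode quoted in the findings
ICE_SHARE_PERCENT = 5  # "about five percent of the total 486 microcode"

total_lines = scale_to_total(ICE_LINES, ICE_SHARE_PERCENT)
total_bits = scale_to_total(ICE_BITS, ICE_SHARE_PERCENT)
bits_per_line = ICE_BITS / ICE_LINES

print(total_lines, total_bits, bits_per_line)
```

The result is about 5,000 lines and roughly 240,000 bits, which matches the rounded figures in the text.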

P6 and later micro-operations


Starting with the Pentium Pro, in most Intel x86 processors, instructions are converted by the instruction fetch and decode unit to sequences of processor-specific micro-operations that are directly executed by the processor. For the instructions that are implemented in microcode, the microcode consists of micro-operations fetched from on-chip memory.[4]

On the Pentium Pro, sources differ on the width of a micro-operation: one describes it as 72 bits wide,[5]: 43  while others give 118 bits.[6]: 2 [7]: 14  Each micro-operation includes an opcode, two source fields, and one destination field,[8]: 7  with the ability to hold a 32-bit immediate value.[6][7]: 14  The Pentium Pro is able to detect parity errors in its internal microcode ROM and report these via the Machine Check Architecture.[9]

Micro-operations have a consistent format with up to three source inputs and two destination outputs.[10] The processor performs register renaming to map these inputs to and from the real register file (RRF) before and after their execution.[10] Out-of-order execution is used, so the micro-operations, and the instructions they represent, may not be executed in the order in which they appear in the program.

During development of the Pentium Pro, several microcode fixes were included between the A2 and B0 steppings.[11] For the Pentium II (based on the P6 Pentium Pro), additional micro-operations were added to support the MMX instruction set.[12] In several cases, "microcode assists" were added to handle rare corner-cases in a reliable way.[12]

The Pentium 4 can have 126 micro-operations in flight at the same time.[13]: 10  Micro-operations are decoded and stored in an Execution Trace Cache that holds up to 12K micro-operations, to avoid repeated decoding of the same x86 instructions.[13]: 5  Groups of six micro-operations are packed into a trace line.[13]: 5  Micro-operations can borrow extra immediate data space within the same cache-line.[14]: 49  Complex instructions, such as string moves, as well as fault and interrupt handling, result in a jump to the microcode ROM.[13]: 6  During development of the Pentium 4, microcode accounted for 14% of processor bugs, versus 30% of processor bugs during development of the Pentium Pro.[15]: 35 

The Intel Core microarchitecture introduced in 2006 added "macro-operations fusion" for some common pairs of instructions, including a comparison followed by a jump.[16] The instruction decoders in the Core convert x86 instructions into micro-operations in three different ways:

Conversion of x86 instructions to micro-operations on Core[16]
x86 instructions | x86 decoders | micro-operations
common | simple decoder × 3 | 1–3
most others | complex decoder × 1 | ≤4
very complex | microcode sequencer | many

For Intel's hyper-threading implementation of simultaneous multithreading, the microcode ROM, trace cache, and instruction decoders are shared, but the micro-operation queue is not shared.[17]

Update facility


In the mid-1990s, a facility for supplying new microcode was initially referred to as the Pentium Pro BIOS Update Feature.[18][19] It was intended that user-mode applications should make a BIOS interrupt call to supply a new "BIOS Update Data Block", which the BIOS would partially validate and save to nonvolatile BIOS memory; this could be supplied to the installed processors on next boot.[18]

Intel distributed a program called BUP_UTIL.EXE, later renamed CHECKUP3.EXE, that could be run under DOS. Collections of multiple microcode updates were concatenated together and numbered, with the extension .PDB, such as PEP6.PDB.[20]: 79 

Processor interface


The processor boots up using a set of microcode held inside the processor and stored in an internal ROM.[1] A microcode update populates a separate SRAM and a set of "match registers" that act as breakpoints within the microcode ROM, to allow jumping to the updated list of micro-operations in the SRAM.[1] The Microcode Instruction Pointer (UIP) is compared against all of the match registers, with any match resulting in a jump to the corresponding destination microcode address.[2]: 3  In the original P6 architecture there is space in the SRAM for 60 micro-operations, and there are multiple match/destination register pairs.[1][2]: 3  It takes a single processor cycle to jump from ROM microcode to patched microcode held in SRAM.[1] Each match register pair consists of a microcode match address and a microcode destination address.[21]
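The match-and-redirect behaviour described above can be illustrated with a toy model. The sketch below is written in Python for readability and is not based on any published Intel format: the class name, the addresses and the micro-operation names are all hypothetical, and the real hardware performs the comparison in parallel rather than by dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class PatchUnit:
    """Toy model of ROM microcode plus patch RAM and match/destination pairs."""
    rom: dict                                        # microcode address -> micro-operation (ROM)
    patch_ram: dict = field(default_factory=dict)    # patch address -> micro-operation (SRAM)
    match_pairs: dict = field(default_factory=dict)  # match address -> destination address

    def fetch(self, uip: int):
        """Return the (possibly redirected) UIP and the micro-operation fetched there."""
        # If the UIP equals any match register, reload it from the destination register.
        if uip in self.match_pairs:
            uip = self.match_pairs[uip]
        # In this toy model, patch RAM entries are looked up before ROM entries.
        if uip in self.patch_ram:
            return uip, self.patch_ram[uip]
        return uip, self.rom[uip]


# Hypothetical example: redirect ROM address 0x100 to a patched micro-operation.
unit = PatchUnit(rom={0x100: "buggy_uop", 0x101: "next_uop"})
unit.patch_ram[0x7F00] = "fixed_uop"
unit.match_pairs[0x100] = 0x7F00

print(unit.fetch(0x100))
print(unit.fetch(0x101))
```

Fetching from the matched address yields the patched micro-operation, while unmatched addresses continue to be served from ROM.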

The processor must be in protection ring zero ("Ring 0") in order to initiate a microcode update.[21]: 1  Each CPU in a symmetric multiprocessing arrangement needs to be updated individually.[21]: 1 

An update is initiated by placing its address in the eax register, setting ecx = 0x79, and executing a wrmsr (write model-specific register) instruction.[22]: 435 
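Because wrmsr can only be executed in Ring 0, the register setup is shown here only as a calculation. This Python sketch computes the operand values the instruction would consume; it relies on the general behaviour of wrmsr, which writes the 64-bit value in edx:eax to the MSR selected by ecx. The function name is hypothetical, and the edx handling is an assumption that goes beyond the text above, which mentions only eax.

```python
IA32_BIOS_UPDT_TRIG = 0x79  # microcode update trigger MSR, loaded into ecx


def wrmsr_operands(update_address: int) -> dict:
    """Split a linear address into the ecx, eax and edx values that wrmsr consumes."""
    if not 0 <= update_address < 2**64:
        raise ValueError("address must fit in 64 bits")
    return {
        "ecx": IA32_BIOS_UPDT_TRIG,
        "eax": update_address & 0xFFFFFFFF,          # low 32 bits of the address
        "edx": (update_address >> 32) & 0xFFFFFFFF,  # high 32 bits (zero for 32-bit addresses)
    }


# Hypothetical address of an update held in memory.
print({name: hex(value) for name, value in wrmsr_operands(0x0012_3400).items()})
```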

Microcode update format


Intel distributes microcode updates as a 2,048-byte (2-kilobyte) binary blob.[1] The update contains information about which processors it is designed for, so that this can be checked against the result of the CPUID instruction.[1] The structure is a 48-byte header, followed by 2,000 bytes intended to be read directly by the processor to be updated:[1]

  1. A microcode program that is executed by the processor during the microcode update process.[1] This microcode is able to reconfigure and enable or disable components using a special register, and it must update the breakpoint match registers.[1]
  2. Up to sixty patched micro-operations to be populated into the SRAM.[1]
  3. Padding consisting of random values, to obfuscate understanding of the format of the microcode update.[1]

Each block is encoded differently, and most of the 2,000 bytes are unused, because the configuration program and the SRAM micro-operation contents are themselves much smaller.[1] Final determination and validation of whether an update can be applied to a processor is performed by the processor during decryption.[18] Each microcode update is specific to a particular CPU revision, and is designed to be rejected by CPUs with a different stepping level. Microcode updates are encrypted to prevent tampering and to enable validation.[23]
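The unencrypted 48-byte header can be separated from the encrypted payload without any knowledge of the payload encoding. The Python sketch below assumes the header field layout that Intel documents in its Software Developer's Manual (nine little-endian 32-bit fields followed by 12 reserved bytes); that layout, the field names and the checksum rule are not taken from the sources cited in this article and should be treated as assumptions here. The encrypted data is returned untouched.

```python
import struct
from typing import NamedTuple

# Nine little-endian 32-bit fields plus 12 reserved bytes = 48 bytes.
HEADER = struct.Struct("<9I12x")


class UpdateHeader(NamedTuple):
    header_version: int
    update_revision: int
    date_bcd: int             # date code, BCD-encoded
    processor_signature: int  # compared against the CPUID result
    checksum: int
    loader_revision: int
    processor_flags: int
    data_size: int
    total_size: int


def parse_update(blob: bytes):
    """Split an update into its parsed header and the opaque encrypted payload."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than the 48-byte header")
    header = UpdateHeader(*HEADER.unpack_from(blob))
    return header, blob[HEADER.size:]


def checksum_ok(blob: bytes) -> bool:
    """True when all 32-bit words of the blob sum to zero modulo 2**32."""
    if len(blob) % 4:
        return False
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    return (sum(words) & 0xFFFFFFFF) == 0
```

A caller would typically compare `processor_signature` with the value returned by CPUID before attempting to load the update; as noted in the cited source, the checksum is not used by the CPU itself.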

With the Pentium there are two layers of encryption, and the precise details are explicitly not documented by Intel, being known only to fewer than ten employees.[24]

Microcode updates for Intel Atom, Nehalem and Sandy Bridge additionally contain an extra 520-byte header containing a 2048-bit RSA modulus with an exponent of 17 decimal.[21]: 7, 8 

Observed Intel microcode data-block lengths (in bytes)[21]: 16 
Microarchitecture | Example processors | Supplied length | Functional length | Suspected encoding
P6 | Pentium Pro | 2000 | 864; 872; 944; 1968 | 64-bit block cipher
Core | PIII … Core 2 | 4048 | 3096 |
NetBurst | P4, Pentium D, Celeron | 2000–7120 | 2000 + N*1024 | chained block cipher
Atom, Nehalem, Sandy Bridge | Core i3/i5/i7 | 976–16336 | 976 + N*1024; 5120 | AES + RSA signature
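The "base + N*1024" patterns in the table can be checked mechanically against a given length. The following Python sketch only encodes the two patterns listed above; the function name and dictionary are illustrative, and a match says nothing about whether an update is valid.

```python
# Base lengths taken from the "Functional length" column of the table.
OBSERVED_BASES = {
    "NetBurst": 2000,
    "Atom, Nehalem, Sandy Bridge": 976,
}


def fits_pattern(length: int, base: int, step: int = 1024) -> bool:
    """True when length == base + N*step for some integer N >= 0."""
    return length >= base and (length - base) % step == 0


# The upper ends of the supplied-length ranges fit their respective patterns.
print(fits_pattern(7120, OBSERVED_BASES["NetBurst"]))
print(fits_pattern(16336, OBSERVED_BASES["Atom, Nehalem, Sandy Bridge"]))
```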

Debugging


Special debugging-specific microcode can be loaded to enable Extended Execution Trace, which then outputs extra information via the Breakpoint Monitor Pins.[25] On the Pentium 4, loading special microcode can give access to Microcode Extended Execution Trace mode.[25] When using the JTAG Test Access Port (TAP), a pair of Breakpoint Control registers allows breaking on microcode addresses.[25]

During the mid-1980s NEC and Intel had a long-running US federal court case about microcode copyright.[26] NEC had been acting as a second source for Intel 8086 CPUs with its NEC μPD8086, and held long-term patent and copyright cross-licensing agreements with Intel. In August 1982 Intel sued NEC for copyright infringement over the microcode implementation.[27][28] NEC prevailed by demonstrating via cleanroom software engineering that the similarities in the implementation of microcode on its V20 and V30 processors were the result of restrictions imposed by the architecture, rather than of copying.[26]

The Intel 386 can perform a built-in self-test (BIST) of the microcode and programmable logic arrays, with the result of the self-test placed in the EAX register.[29] During the BIST, the microprogram counter is re-used to walk through all of the ROMs, with the results being collated via a network of multiple-input signature registers (MISRs) and linear-feedback shift registers.[30] On start up of the Intel 486, a hardware-controlled BIST runs for 2^20 (roughly one million) clock cycles to check various arrays including the microcode ROM, after which control is transferred to the microcode for further self-testing of registers and computation units.[31] The Intel 486 microcode ROM has 250,000 transistors.[31]
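The way a signature register compresses ROM contents into a single value can be illustrated with a toy multiple-input signature register. The Python sketch below is a generic textbook-style MISR, not Intel's circuit: the 16-bit width, the feedback polynomial 0x1021, the zero seed and the sample ROM words are all arbitrary assumptions chosen for illustration.

```python
def misr_step(state: int, data: int, width: int = 16, poly: int = 0x1021) -> int:
    """Advance the register by one clock: shift with feedback, then XOR in the data word."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if msb:
        state ^= poly  # feedback taps, applied when a 1 is shifted out
    return state ^ (data & mask)


def misr_signature(words, width: int = 16, poly: int = 0x1021, seed: int = 0) -> int:
    """Compress a sequence of ROM words into one signature value."""
    state = seed
    for word in words:
        state = misr_step(state, word, width, poly)
    return state


# Hypothetical ROM contents and a copy with a single flipped bit.
rom_words = [0x1234, 0xABCD, 0x0F0F, 0x8001]
faulty_words = [0x1234, 0xABCD, 0x0F0E, 0x8001]

print(hex(misr_signature(rom_words)), hex(misr_signature(faulty_words)))
```

A self-test of this kind compares the final signature with a known-good value, so that a fault anywhere in the ROM is likely to change the result.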

AMD had a long-term contract to reuse Intel's 286, 386 and 486 microcode.[32] In October 1994, a court ruled that the agreement did not cover AMD distributing Intel's 486 in-circuit emulation (ICE) microcode.[32]

Direct Access Testing


Direct Access Testing (DAT) is included in Intel CPUs as part of the design for testing (DFT) and Design for Debug (DFD) initiatives, which allow full-coverage testing of individual CPUs prior to sale.[33]

In May 2020, a script reading directly from the Control Register Bus (CRBUS)[34] was used to read from the Local Direct Access Test (LDAT) port of the Intel Goldmont CPU, retrieving the loaded microcode and patch arrays.[36] This followed the exploitation of "Red Unlock" over JTAG, using a USB-A to USB-A 3.0 cable with debugging capabilities (without D+, D− and Vcc).[35] These arrays are only accessible after the CPU has been put into a specific mode, and consist of five arrays accessed through offset 0x6a0:[37]

  1. ROM: Microcode triads
  2. ROM: Sequence Words
  3. RAM: Sequence Words (updatable)
  4. RAM: Match/Patch pairs (updatable)
  5. RAM: Microcode triads (updatable)

References

  1. Gwennap, Linley (15 September 1997). "P6 Microcode Can Be Patched" (PDF). Microprocessor Report. Archived from the original (PDF) on 21 December 2009. Retrieved 23 January 2018. Intel has implemented a microcode patch capability in its P6 processors, including Pentium Pro and Pentium II … allows the microcode to be altered after the processor is fabricated, repairing bugs that are found after the processor is designed. … originally intended the feature to be used only for debugging, but after dealing with the expense of the Pentium FDIV bug … Intel decided to make it usable in the field. … P6 chip contains a complete set of microcode in an internal ROM … BIOS writes a memory address into a special CPU register to trigger a download sequence … P6 processors contain a small SRAM that holds up to 60 microinstructions. The patch code is downloaded into this SRAM … also contains a set of "match" registers that cause a trap when a particular microcode address is encountered. (This is similar to the "instruction breakpoint" capability used to debug assembly code.) This trap, which takes a single cycle to process, vectors microcode execution into the patch RAM. … downloaded microcode consists of two segments. … first is an initialization routine that is run immediately … also initializes the match registers, if necessary. … second segment contains one or more patches that remain in the patch RAM during normal operation and are accessed via a match-register trap. … original microcode is stored in ROM, … match registers allow the operation of the microcode to be changed. In this way, an x86 instruction that is operating incorrectly can be repaired, assuming it is implemented in microcode. … a patch is created to replace a section of the original microcode, performing the correct operation and then jumping back. … number of match registers, … more than one. … single bug, … might require multiple patches, and some bugs are too complex to repair … mechanism could allow multiple bugs to be fixed, … features of the P6 processor can be disabled via a special register … 2,048-byte block of data. The block contains a 48-byte header—which includes a date code, the CPU ID (which includes the stepping level) of the target processor, and a checksum—and 2,000 bytes of data to be downloaded by the processor. … checksum … is not used by the CPU. … 2,000 data bytes are encrypted in a way that Intel claims will be extremely difficult to break. The bytes are divided into blocks of varying lengths, each of which is encoded differently. … typically much smaller than 2,000 bytes, the remaining data is random noise intended to confuse anyone attempting to break the encryption. … Intel has not published any information on the format of its microcode, … is deliberately designed to be difficult to understand. Only a small number of Intel employees know the P6 microcode formats.
  2. Yeoh Eng Hong; Lim Seong Leong; Wong Yik Choong; Lock Choon Hou; Mahmud Adnan (20 April 1998). Chao, Lin (ed.). "An Overview of Advanced Failure Analysis Techniques for Pentium and Pentium Pro Microprocessors" (PDF). Intel Technology Journal (Q2). Pentium Pro microprocessor ... Micropatching DFT feature. ... consists of two key elements: the microcode patch RAM and several pairs of Match and Destination registers. ... Microcode Instruction Pointer (UIP) matches the content of a Match register, the UIP will be reloaded with a new address from the Destination register. ... UIP for the reset subroutine can be set in the Match register ... thereby bypassing the reset subroutine altogether.
  3. ^ Trumbull, Patricia V. (1994-10-07). Intel Corporation v. Advanced Micro Devices (Findings of fact and conclusions of law following "ICE" module of trial). United States District Court for the Northern District of California. San Jose. Retrieved 2021-05-10 – via Advanced Micro Devices. Twelve pins are affiliated with the "ICE" circuitry. … AMD 486DXL and DXLV connect three pins associated with "ICE" in order to implement its "SMM" feature. … 250 lines or 12,032 bits of the "ICE" microcode in the 486. "ICE" constitutes about five percent of the total 486 microcode. … two lines … (used to set the "ICE" mode "flip flop") … blue coded lines of microcode are associated with production testing and not used for "ICE" related purposes. … Seventy-five red coded lines were used by Intel to perform "SMM" in its 486SL, a data sheet function of this version of the chip. About 32 yellow coded lines perform routine operations which are not unique to "ICE." About two lines remain dedicated solely to "ICE."
  4. ^ "A Tour of the Pentium Pro Processor Microarchitecture". Intel. Archived from the original on 1996-12-20.
  5. ^ Kubiatowicz, John (3 May 2004). "Dynamic Scheduling in P6 (Pentium Pro, II, III)" (PDF). Low Power Design, Advanced Intel Processors. CS152 Computer Architecture and Engineering (Lecture 25). Complex 80x86 instructions are executed by a conventional microprogram (8K x 72 bits) that issues long sequences of micro-operations
  6. ^ a b Gwennap, Linley (16 February 1995). "Intel's P6 Uses Decoupled Superscalar Design" (PDF). Microprocessor Report. Vol. 9, no. 2. MicroDesign Resources. pp. 1–7. S2CID 14414612. Archived from the original (PDF) on 8 October 2018. P6 uops have a fixed length of 118 bits, using a regular structure to encode an operation, two sources, and a destination. The source and destination fields are each wide enough to contain a 32-bit operand.
  7. ^ a b Asanovic, Krste (2002). "P6 uops" (PDF). Microprocessor Evolution: 4004 to Pentium Pro (Spring): 14. Retrieved 23 January 2018. Each uop has fixed format of around 118 bits … – opcode, two sources, and destination … – sources and destination fields are 32-bits wide to hold immediate or operand
  8. ^ Colwell, Robert P.; Steck, Randy L.; Intel Corporation (1995-04-12). "A 0.6 μm BiCMOS Processor With Dynamic Execution" (PDF). p. 7. Retrieved 2020-05-27. Micro-ops are the atomic unit of work in the P6 processor and are comprised of an opcode, two source and one destination operand. These micro-ops are fixed length and are more general than the Pentium(R) processor's microcode since they need to be scheduled.
  9. ^ 16.6.1. Simple Error Codes (PDF). Machine Check Architecture (Report). Pentium® Pro Family Developer's Manual. Vol. 3: Operating System Writer's Guide. 3 January 1996. p. 401. Archived from the original on 6 September 2001. Retrieved 1 October 2018. unique codes indicate global error information … Microcode ROM Parity Error
  10. ^ a b Ronen, Ronny; Intel Labs (18 January 2005). Micro Operations (Uops) (PDF). The Pentium II/III Processor "Compiler on a Chip" (Report). Haifa: Tel Aviv University. pp. 26, 31, 32, 43, 44, 46. Archived from the original (PDF) on 16 April 2007. Retrieved 23 January 2018. Each "CISC" inst is broken into one or more uops … Canonical representation of src/dest (3 src, 2 dest) … e.g., pop eax becomes esp1<-esp0+4, eax1<-[esp0] … ID: Convert instructions into uops. Buffers up to 6 uops … Alloc & RAT … able to work on up to 3 uops per clock … Reservation station (RS) … Pool of all "not yet executed" uops (up to 20) … In order Retirement: … Retires up to 3 uops per clock … OOO Cluster … Up to 5 resource-ready uops are selected, and dispatched per clock
  11. ^ Papworth, David B.; Intel Corporation (April 1996). "Tuning the Pentium Pro Microarchitecture" (PDF). IEEE Micro. p. 14. ISSN 0272-1732. Archived from the original (PDF) on 8 October 2018. Retrieved 8 October 2018. B0 stepping incorporated several microcode bugs and speed path fixes for problems discovered on the A-step silicon
  12. ^ a b Kagan, Michael; Gochman, Simcha; Orenstien, Doron; Lin, Derrick (1997). "MMX Microarchitecture of Pentium Processors With MMX Technology and Pentium II Microprocessors" (PDF). Intel Technology Journal (Q3): 6, 7. Pentium II processor's microarchitecture is similar to that of the Pentium Pro microprocessor … modified to convert the new MMX instructions to Pentium Pro processor-specific uops (new Single Instruction Multiple Data [SIMD] uops were added to implement the new functionality). … A microcode assist was created to correct the problem and redo the operation. An assist is a customer-invisible event that flushes out the machine and allows microcode to handle rare but difficult-to-handle problems. Since all MMX instructions zero the TOS, the assist needs to write the TOS to zero and restart the operation. … Illegal opcodes that are instruction holes in the MMX instruction opcode map are defined to generate a one uop assist call. This assist call instructs the ROB to flush the machine and causes an assist microcode flow to cause the processor to handle illegal opcode faults.
  13. ^ a b c d Hinton, Glenn; Sager, Dave; Upton, Mike; Boggs, Darrell; Carmean, Doug; Kyker, Alan; Roussel, Patrice (2001). Chao, Lin (ed.). "The Microarchitecture of the Pentium 4 Processor" (PDF). Intel Technology Journal. No. Q1. IA-32 instruction bytes are then decoded into basic operations called uops (micro-operations) … advanced form of a Level 1 (L1) instruction cache called the Execution Trace Cache … between the instruction decode logic and the execution core … to store the already decoded … uops. … instructions are decoded once … then used repeatedly from there … has a capacity to hold up to 12K uops … similar hit rate to an 8K to 16K byte conventional instruction cache. … packs the uops into groups of six uops per trace line … microcode ROM … for complex IA-32 instructions, such as string move, and for fault and interrupt handling … Trace Cache jumps into the microcode ROM which then issues the uops … After the microcode ROM finishes sequencing uops … front end of the machine resumes fetching uops from the Trace Cache. … deep buffering of the Pentium 4 processor (126 uops and 48 loads in flight)
  14. ^ Fog, Agner (2020-05-25). "The microarchitecture of Intel, AMD and VIA CPUs" (PDF) (An optimization guide for assembly programmers and compiler makers). Technical University of Denmark. p. 49. … If a μop has an immediate 32-bit operand outside the ±2¹⁵ interval so that it cannot be represented as a 16-bit signed integer, then it will use two trace cache entries unless it can borrow storage space from a nearby μop. … A μop in need of extra storage space can borrow 16 bits of extra storage space from a nearby μop that doesn't need its own data space.
  15. ^ Bentley, Bob; Gray, Rand (2001). Chao, Lin (ed.). "Validating The Intel® Pentium® 4 Processor" (PDF). Intel Technology Journal (Q1): 29–26. Bug Discussion
  16. ^ a b De Gelas, Johan (1 May 2006). "Intel Core versus AMD's K8 architecture". AnandTech. p. 3. Retrieved 23 January 2018. Core architecture is equipped with four x86 decoders, 3 simple decoders and 1 complex decoder … to translate the 1 to 15 byte variable length x86 instructions into … fixed length RISC-like instructions (called micro-ops). … common x86 instructions are translated into a single micro-op … complex decoder is responsible for the instructions that produce up to 4 micro-ops. … really long and complex x86 instructions are handled by a microcode sequencer. … macro-op fusion … the x86 compare instruction (CMP) is fused with a jump (JNE TARG).
  17. ^ Kim, Dongkeun; Shih-wei Liao, Steve; Wang, Perry H.; del Cuvillo, Juan; Tian, Xinmin; Zou, Xiang; Wang, Hong; Yeung, Donald; Girkar, Milind; Shen, John P. (11 January 2004). "Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors" (PDF). pp. 4, 5. Retrieved 24 January 2018. L1 Trace cache: 12K micro-ops, 8-way set associative, 6 micro-ops per line … Shared: Trace cache, … IA-32 instruction decode, Microcode ROM, Uop retirement logic, … Partitioned: Uop queue
  18. ^ a b c 8: Pentium Pro Processor BIOS Update Feature (PDF) (Report). 2.0. Intel. 12 January 1996. p. 45. Retrieved 3 November 2020. authentication procedure relies upon the decryption provided by the processor to verify an update from a potentially hostile sources.
  19. ^ Stiller, Andreas; Paul, Matthias R. (1996-05-12). "Prozessorgeflüster". c't – magazin für computertechnik. Trends & News / aktuell - Prozessoren (in German). Vol. 1996, no. 6. Verlag Heinz Heise GmbH & Co KG. p. 20. ISSN 0724-8679. Archived from the original on 2017-08-28. Retrieved 2017-08-28.
  20. ^ Mueller, Scott; Zacker, Craig (September 1998). Minatel, Jim; Byus, Jill; Kughen, Rick (eds.). Upgrading and Repairing PCs (PDF) (Tenth Anniversary ed.). Que Publishing. p. 79. ISBN 0-7897-1636-4. Retrieved 1 October 2018. Processor Steppings (Revisions) and Microcode Update Revisions Supported by the Update Database File PEP6.PDB … Using the processor update utility (CHECKUP3.EXE), … can easily verify … the correct microcode update
  21. ^ a b c d e Chen, Daming Dominic; Ahn, Gail-Joon (11 December 2014). "Security Analysis of x86 Processor Microcode" (PDF). Arizona State University. pp. 1, 5, 7. Retrieved 23 January 2018. supervisor privileges (ring zero) are required to update processor microcode … Since the 1970s, processor manufacturers have decoded the x86 … into a sequence of … (RISC) micro-operations (uops) … introduced writable patch memory to provide an update mechanism for implementing dynamic debugging capabilities and correcting processor errata, especially after the infamous Pentium FDIV bug of 1994. … P6 (Pentium Pro) microarchitecture in 1995, … K7 microarchitecture in 1999 … with symmetric multiprocessing (SMP) … should be executed synchronously on each logical processor … patch RAM in addition to the MROM … up to 60 microinstructions, with patching implemented by pairs of match and destination registers. … a 520 byte block containing a 2048-bit RSA modulus that appears to be constant within each processor family. This is followed by a four byte RSA exponent with the fixed value 11h
  22. ^ Shanley, T. (1998). Pentium Pro and Pentium II System Architecture. Addison-Wesley Professional. p. 435. ISBN 9780201309737.
  23. ^ Wolfe, Alexander (30 June 1997). "Intel preps plan to bust bugs in Pentium MPUs". EE Times. No. 960. Archived from the original on 1999-11-13. Retrieved 3 October 2018 – via Techweb. obscure moniker "BIOS Update Feature." … "Each BIOS Update is tailored for a particular stepping of [a] processor," … data block is mapped directly-… after decryption-to the microcode itself.
  24. ^ Wolfe, Alexander (30 June 1997). "Hole seen in Intel's bug-busting feature". EE Times. Santa Clara. Archived from the original on 2003-03-09. Ajay Malhortra, a technical marketing manager based here at Intel's microprocessor group. "Not only is the data block containing the microcode patch encrypted, but once the processor examines the header of the BIOS update, there are two levels of encryption in the processor that must occur before it will successfully load the update." … closely guarded secret. "There is no documentation," said Frank Binns, an architect in Intel's microprocessor group. "It's not as if you can get an Intel 'Red Book' with this stuff written down. It's actually in the heads of less than 10 people in the whole of Intel."
  25. ^ a b c "Details of Intel Probe mode". Hardice. Retrieved 23 January 2018. emit a packet over the BPM when special instructions are executed … To enable Extended Execution Trace, special microcode patches must be applied … For the Pentium 4 only, there exists a second type … called microcode Extended Execution Trace … Control Register Bus in turn allows access to internal arrays and functions on the processor, such as accessing the LLC and the microcode/Virtual Fuse PROM. … that sits on the CPU package but is not within the CPU silicon die. This PROM also contains the microcode that the CPU loads during cold boot. … breakpoint on a 48-bit microcode address … accessed by the TAP commands BRKPTCTLA and BRKPTCTLB.
  26. ^ a b Elkins, David S. (Winter 1990). "NEC v. Intel: A Guide to Using "Clean Room" Procedures as Evidence". Computer/Law Journal. 10 (4): 453. NEC's use of its clean room procedures as trial evidence … Judge Gray defined microcode … within the Copyright Act's definition of a "computer program," … Intel's microcode is copyrightable. … Intel's microcode did not contain the required copyright notice. … copyrights had been forfeited. … Intel was left with no basis for its claim of copying
  27. ^ Hinckley, Robert C. (January 1987). "NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright Editors'". Santa Clara High Technology Law Journal. 3 (1). Appendix: Microcode formats; 8086/8088 Format; V20/V30 format
  28. ^ Leong, Kathy Chin (28 March 1988). "Intel witness recants story". Computerworld. Vol. 22, no. 13. San Jose. pp. 83, 84. ISSN 0010-4841. Retrieved 2 October 2018.
  29. ^ "Intel386 DX Microprocessor 32-BIT CHMOS Microprocessor with Integrated Memory Management" (PDF). December 1995. Archived from the original on 3 September 2004. self-test checks the function of all of the Control ROM … EAX register will contain a signature of 00000000h indicating the Intel386 DX passed its self-test of microcode and major PLA contents
  30. ^ "5.1 Exhaustive Test in the Intel 80386" (PDF). Built-In-Self-Test (BIST) for Embedded Systems. Testing of Embedded System. IIT Kharagpur: 21. 7 October 2006. Retrieved 6 October 2018. For ROMs, the patterns are generated by the microprogram counter which is part of the normal logic.
  31. ^ a b Gelsinger, Patrick; Iyengar, Sundar; Krauskopf, Joseph; Nadir, James; Intel (1999). Computer Aided Design and Built In Self Test on the i486™ CPU (PDF). 1989 IEEE International Conference on Computer Design: VLSI in Computers and Processors. IEEE. pp. 200–201.
  32. ^ a b "Court ruling against AMD causes some concern". InfoWorld. 17 October 1994. p. 5. Retrieved 24 January 2018. The decision by the federal district court in San Jose, Calif., said that AMD does not have the right to use Intel's in-circuit emulation (ICE) code in the AMD microprocessors. This code is present on all AMD 486s but is only used in a low-power 486-DXL and 486-DXLV processors. … AMD has started to rework its entire line of 486s to eliminate the code.
  33. ^ Wu, David M.; Lin, Mike; Reddy, Madhukar; Jaber, Talal; Sabbavarapu, Anil; Thatcher, Larry; Intel Corporation (2004). "An optimized DFT and test pattern generation strategy for an Intel high performance microprocessor" (PDF). pp. 38, 43, 44. Direct Access Testing (DAT) for array access and diagnosis and Programmable Weak Write Test Mode (PWWTM) for memory cell stability test to reduce the test time. … Array DFT test strategy is to use PBIST (Programmable Built-In Self Test) to test the second level cache and use DAT to test the remaining arrays … PBIST is available through the JTAG TAP controller. … DAT mode in PX as shown in Figure 4 … PX has more arrays (>110) … array test coverage of PX is 99.3% ‒ the highest in Pentium 4 family
  34. ^ uCode Research Team (25 May 2020). "chip-red-pill/crbus_scripts". GitHub. Retrieved 26 May 2020.
  35. ^ Positive Research (2020-07-21), ptresearch/IntelTXE-PoC, retrieved 2020-07-25
  36. ^ Ermolov, Mark [@_markel___] (2020-05-19). "Using the Local Direct Access Test (LDAT) DFT feature of Intel Atom CPU, we dumped Microcode Sequencer ROM. Also, we extracted what we think is IROM (Immediates for uops) and even managed to modify MS Patch RAM and Match/Patch registers" (Tweet) – via Twitter.
  37. ^ Bosch, Peter (2020-05-22). "Intel LDAT notes". Retrieved 2020-05-26. PDAT CR: 0x6A0; Array Select: 0‒4

Further reading

  • US patent 5404473, Papworth, David B.; Fetterman, Michael A. & Glew, Andrew F. et al., "Apparatus and method for handling string operations in a pipelined processor", published 1995-04-04, assigned to Intel  "the first Cuops in a REP string operation loads the MS Loop Counter with the number of iterations remaining after the unrolled iterations are executed. … a small number of iterations (e.g., seven), are sent during the time it takes for the Loop Counter in the MS to be loaded. This unrolled code is executed conditionally based on the value of (E)CX … remaining three iterations are turned into NOPS."
  • US patent 5559974, Boggs, Darrell D.; Brown, Gary L. & Hancock, Michael M. et al., "Decoder having independently loaded micro-alias and macro-alias registers accessible simultaneously by one micro-operation", published 1996-09-24, assigned to Intel 
  • US patent 5566298, Boggs, Darrell D.; Brown, Gary L. & Hancock, Michael M. et al., "Method for state recovery during assist and restart in a decoder having an alias mechanism", published 1996-10-15, assigned to Intel  "… control returns to the Micro-operation Sequence (MS) unit to issue further error correction Control micro-operations (Cuops). In order to simplify restart, the Cuops originating from the error-causing macroinstruction supplied by the translate programmable logic arrays (XLAT PLAs) are loaded into the Cuop registers, with their valid bits unasserted."
  • US patent 5600806, Brown, Gary L. & Parker, Donald D., "Method and apparatus for aligning an instruction boundary in variable length macroinstructions with an instruction buffer", published 1997-02-04, assigned to Intel  "ADD, XOR, SUB, AND, and OR, which are implemented with one generic Cuop. Another group of instructions representable by only one Cuop includes ADC and SBB"
  • US patent 5630083, Carbine, Adrian L.; Brown, Gary L. & Parker, Donald D., "Decoder for decoding multiple instructions in parallel", published 2013-03-01, assigned to Intel 
  • US patent 6055656, Wilson, Jr., James A.; Miller, Anthony C. & Rhodehamel, Michael W. et al., "Control register bus access through a standardized test access port", published 2000-04-25, assigned to Intel 
  • US patent 20030196096, Sutton, James A., "Microcode patch authentication", published 2003-10-16 
  • US patent 5948097, Glew, Andrew & Rodgers, Scott Dion, "Method and apparatus for changing privilege levels in a computer system without use of a call gate", published 1999-09-07, assigned to Intel  "SYSENTER and SYSEXIT are assembly-language instructions that may be executed on an Intel architecture processor, such as the Pentium Pro processor … micro-operation is determined to be ready when its source fields have been filled with appropriate data … instruction decode unit comprises one or more translate (XLAT) programmable logic arrays (PLAs) that decode each instruction in to one or more micro-operations. … SYSENTER and SYSEXIT instructions are decoded in to micro-operations that perform the steps illustrated in FIGS. 5 and 6, respectively."
  • "Microcode updater interface sysctl" (ucode.c driver). XNU. Retrieved 24 January 2018. #define IA32_BIOS_UPDT_TRIG (0x79) /* microcode update trigger MSR */
  • Sivaram, A. T.; Fan, Daniel; Yiin, A. (2002-10-10). "Efficient embedded memory testing with APG". Proceedings. International Test Conference. Vol. 1. Baltimore, Maryland: IEEE. pp. 47–54. doi:10.1109/TEST.2002.1041744. ISBN 0-7803-7542-4. ISSN 1089-3539. S2CID 19579807.
  • Bosch, Peter (2020-10-01). "Under the hood of a CPU: Reverse Engineering the P6 microcode". YouTube. Netherlands. Retrieved 2020-11-01.
  • uCodeDisasm — Intel microcode disassembler in Python (from CRBUS), names of uops