x86
The generic term x86 refers to the "CISC" type instruction set of the most commercially successful CPU architecture[1] in the history of personal computing, developed by Intel and used in processors from Intel, AMD, VIA, and others. The term derives from the model numbers of the first few generations of CPUs that were backward compatible with Intel's original 16-bit 8086 of 1978, most of which ended in "86".[2]
As the x86 term became common after the introduction of the 80386 in 1985, it usually implies a binary compatibility also with the extended 32-bit instruction set of the 80386. This may sometimes be emphasized as x86-32 to distinguish it either from the original 16-bit x86-16 or from the newer 64-bit x86-64 (also called x64).[3]
Today, x86 hardware usually implies also 64-bit capabilities, at least for personal computers and servers. However, to avoid compatibility problems, x86 software usually implies only 32-bit, while x86-64 or x64 are used to denote exclusive 64-bit software.[4][5]
The only significant competitors to x86 in PCs were the Motorola 68k (a CISC design) and the PowerPC (a RISC design) instruction sets. However, by August 7, 2006, Apple Inc. had switched to x86 CPUs, granting the x86 instruction set an effective monopoly among desktop and notebook processors. The x86 also held a growing majority among servers and workstations. Markets without a significant x86 presence include low-cost embedded processors found in appliances and toys, among others.[6]
A vast amount of computer software is written for the x86 platform, including nearly all modern commercial operating systems from MS-DOS and Microsoft Windows to Linux, BSD, Solaris OS, and Mac OS X, making the x86 instruction set architecture indispensable on a global scale, and practically irreplaceable.
Chronology
The table below lists well-known[7] brands of consumer-targeted x86 (instruction set) processors, grouped by generation. Note: the definition of a CPU generation is not strict. Each generation is roughly marked by a significantly improved and commercially successful processor microarchitecture design.
Generation | Introduction | Prominent Consumer CPU brands | Addressing | Notable features |
---|---|---|---|---|
1 (IA-16) | 1978 | Intel 8086, Intel 8088, Intel 80186, NEC V20 | 16-bit | First x86 microprocessors |
2 | 1982 | Intel 80286 | 16-bit | built-in MMU |
3 (IA-32) | 1985 | Intel386, AMD Am386 | 32-bit | IA-32 instruction set, MMU with paging |
4 | 1989 | Intel486 | 32-bit | Instruction pipeline, integrated FPU, integrated cache |
5 | 1993 | Pentium, AMD K5, AMD K6 | 32-bit | Superscalar, 64-bit bus, MMX |
6 | 1995 | Pentium Pro, Pentium II, AMD K6-2, Cyrix 6x86, Pentium III | 32-bit | RISC core, L2 cache, superpipelining, SSE |
6-M | 2003 | Pentium M | 32-bit | low power |
7 (IA-32, X86-64) | 1999 | Athlon, Athlon XP, Pentium 4, Pentium D | 32-bit, 64-bit | SSE2, SSE3, Hyper-Threading |
7-M | 2006 | Intel Core | 32-bit | dual-core |
8 (X86-64) | 2003 | Athlon 64, Intel Core 2 | 64-bit | x86-64 instruction set, multi-core |
History
The x86 architecture first appeared as the Intel 8086 CPU released in 1978, a fully 16-bit design based on the earlier Intel 8085. Although not binary compatible, it was designed to allow assembly language programs written for the 8085 to be mechanically translated into equivalent 8086 assembly. This made the new processor a tempting migration path for 8085 hardware and software vendors, but, mainly due to a wider data bus, not without significant redesign of system hardware. To address this, Intel introduced the almost identical, but externally 8-bit, 8088, which permitted simpler printed circuit boards, demanded fewer (1-bit wide) DRAM chips, and could more easily be interfaced to already established (i.e. low-cost) 8-bit system and peripheral chips. Among other, non-technical factors, this contributed to IBM building its IBM PC around the 8088, despite the presence of (at the time) better 16-bit microprocessors from Motorola, Zilog, and National Semiconductor. Subsequently, the IBM PC became the dominant personal computer platform, and the 8088 and its successors became the dominant CPU architecture for desktop and laptop computers.
At various times, companies such as IBM, NEC, AMD, TI, STM, Fujitsu, OKI, Siemens, Cyrix, Intersil, C&T, NexGen, and UMC started to design and/or manufacture x86 processors intended for personal computers as well as embedded systems. Such x86 implementations are seldom plain copies but often employ different internal microarchitectures as well as different solutions at the electronic and physical levels. Quite naturally, early compatible chips were 16-bit, while 32-bit designs appeared much later. For the personal computer market, real quantities started to appear around 1990 with i386 and i486 compatible processors, often named similarly to Intel's original chips. Other companies, which designed or manufactured x86 or x87 processors, include ITT Corporation, National Semiconductor, ULSI System Technology, and Weitek.
Following the fully pipelined i486, Intel introduced the Pentium brand name (which, unlike numbers, could be trademarked) for their new line of superscalar x86 designs. With the x86 naming scheme now legally cleared, IBM partnered with Cyrix to produce the 5x86 and then the very efficient 6x86 (M1) and 6x86MX (MII) lines of Cyrix designs, which were the first x86 chips implementing register renaming to enable speculative execution. AMD meanwhile designed and manufactured the advanced but delayed 5k86 (K5), which, internally, was heavily based on AMD's earlier 29K RISC design; similar to NexGen's Nx586, it used a strategy where dedicated pipeline stages decode x86 instructions into uniform and easily handled micro-operations, a method that has remained standard to this day.
Some early versions of these chips had heat dissipation problems. The 6x86 was also affected by a few minor compatibility issues, the Nx586 lacked an FPU and (the then crucial) pin-compatibility, while the K5 had somewhat disappointing performance when it was (eventually) launched. Low customer awareness of alternatives to the Pentium line further contributed to these designs being comparatively unsuccessful, despite the fact that the K5 had very good Pentium compatibility and the 6x86 was significantly faster than the Pentium on integer code.[8] AMD later managed to establish itself as a serious contender with the K6 line of processors, which gave way to the highly successful Athlon and Opteron. There were also other contenders, such as Centaur Technology (IDT), Rise Technology, and Transmeta. VIA Technologies' energy-efficient C3 and C7 processors were designed by Centaur and are in full production today.
The architecture has twice been extended to a larger word size. In 1985, Intel released the 32-bit 386 to gradually replace the earlier 16-bit chips (which were sold for many more years). This extension to the architecture is sometimes called x86-32 to differentiate it from the original "x86-16" or the newer x86-64 extension. However, it was originally referred to as i386 by Intel (and others) and later renamed IA-32 (for Intel Architecture, 32-bit) when Intel unveiled its unrelated 64-bit Itanium architecture, referred to as IA-64. Between 1999 and 2003, AMD further extended the architecture to 64 bits, originally calling it x86-64 in AMD documents, but now AMD64. Intel soon adopted AMD's architectural extensions under the name IA-32e, which was later renamed EM64T and finally Intel 64 (not to be confused with the unrelated IA-64 architecture). Microsoft and Sun Microsystems use the vendor-neutral term x64 for this same x86-64 architecture.
Design
Technical overview
The x86 architecture is a variable instruction length, primarily two-address, "CISC" design with emphasis on backward compatibility. The instruction set is not typical CISC however, but basically an extended and orthogonalized version of the simple eight-bit 8085 architecture. Words are stored in little-endian order and 16-bit and 32-bit accesses are allowed to unaligned memory addresses. To conserve opcode space, most register-addresses are three bits, and at most one operand can be in memory (in contrast with some highly orthogonal CISC designs such as PDP-11 where both operands can be in memory), but this memory operand may also be the destination, while the other operand, the source, can be either register or immediate. This contributes, among other factors, to a code footprint that rivals 8-bit machines and enables efficient use of instruction cache memory. During execution, current x86 processors employ a few extra decoding steps to split most instructions into smaller pieces, micro-ops, which are readily executed by a micro-architecture that could be (simplistically) described as a RISC-machine without the usual load/store limitations. The small number of general registers (also inherited from 8085) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. Much work has therefore been invested in making such accesses as fast as register accesses, i.e. a one cycle instruction throughput in most circumstances.
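The little-endian byte order mentioned above can be illustrated with a minimal Python sketch. The helper names `store_little_endian` and `load_little_endian` are hypothetical and exist only for this example; they model how a multi-byte value is laid out at ascending memory addresses, not how any particular processor implements the access.

```python
def store_little_endian(value, width_bytes):
    """Return the bytes of `value` in ascending address order (least significant byte first)."""
    return value.to_bytes(width_bytes, byteorder="little")


def load_little_endian(raw):
    """Reassemble an integer from bytes stored least significant byte first."""
    return int.from_bytes(raw, byteorder="little")


# A 16-bit word and a 32-bit doubleword as they would appear in memory.
word = store_little_endian(0x1234, 2)
dword = store_little_endian(0x12345678, 4)
print(word.hex(), dword.hex())
```

The least significant byte lands at the lowest address, so the word 1234h occupies memory as the byte 34h followed by 12h.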
Segmentation
Minicomputers during the late 1970s were running up against the 16-bit 64-KB address limit, as memory had become cheaper. Most such companies therefore redesigned their processors to directly handle 32-bit addressing and data. The original 8086, developed from the simple 8085 microprocessor and primarily aiming at another market, instead adopted a much-criticized concept of segment registers which raised the memory address limit by only 4 bits, to 20 bits (1 megabyte).
Data and/or code could be managed within "near" 16-bit segments within this 1 MB address space, or a compiler could operate in a "far" mode using 32-bit segment:offset pairs reaching (only) 1 MB. While that would also prove to be quite limiting by the mid-1980s, it was working for the emerging PC market, and made it very simple to translate software from the older 8080, 8085, and Z80 to the newer processor. Seven years later, in 1985, this cumbersome addressing model was effectively factored out by the introduction of 32-bit offset registers, in the 386 design.
The original 8086 and 8088
The original Intel 8086 and 8088 have fourteen 16-bit registers. Four of them (AX, BX, CX, DX) are general registers, although each has an additional purpose; for example, only CX can be used as a counter with the loop instruction. Each can be accessed as two separate bytes (thus BX's high byte can be accessed as BH and its low byte as BL). Four segment registers (CS, DS, SS and ES) are used to form a memory address. There are two pointer registers: SP points to the top of the stack, and BP is used to point at some other place in the stack or in memory (as an offset). Two registers (SI and DI) are for array indexing. The FLAGS register contains flags such as the carry flag, overflow flag and zero flag. Finally, the instruction pointer (IP) points to the current instruction.
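The byte-wise access to the general registers can be sketched as plain bit manipulation. The function names below are hypothetical and chosen for this illustration only; the sketch assumes a register is modeled as a 16-bit unsigned integer.

```python
def split_word(reg):
    """Return (high byte, low byte) of a 16-bit register value, e.g. (BH, BL) for BX."""
    return (reg >> 8) & 0xFF, reg & 0xFF


def set_low_byte(reg, value):
    """Model a write to the low half (e.g. BL): the high half is left unchanged."""
    return (reg & 0xFF00) | (value & 0xFF)


def set_high_byte(reg, value):
    """Model a write to the high half (e.g. BH): the low half is left unchanged."""
    return ((value & 0xFF) << 8) | (reg & 0x00FF)


bx = 0x12AB
bh, bl = split_word(bx)
print(hex(bh), hex(bl))
```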
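Addressing modes can be summarized by this formula: segment : [BX or BP] + [SI or DI] + [displacement], where the segment is one of CS, DS, SS or ES and each bracketed term is optional.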
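One way to read the addressing-mode summary is as an effective-address computation: an optional base register (BX or BP), an optional index register (SI or DI) and an optional displacement are added modulo 2^16. The sketch below is an illustration under stated assumptions: the function name `effective_address` and the register dictionary are hypothetical, and the default-segment rule (SS when BP is the base, DS otherwise) is the conventional 8086 behaviour rather than something stated in this article.

```python
def effective_address(regs, base=None, index=None, displacement=0):
    """Return (default segment name, 16-bit offset) for an 8086-style memory operand."""
    if base not in (None, "BX", "BP"):
        raise ValueError("base must be BX, BP or None")
    if index not in (None, "SI", "DI"):
        raise ValueError("index must be SI, DI or None")
    total = displacement
    if base is not None:
        total += regs[base]
    if index is not None:
        total += regs[index]
    # Assumption: BP-based operands default to the stack segment, all others to DS.
    default_segment = "SS" if base == "BP" else "DS"
    # Offsets wrap around within the 64 KB segment.
    return default_segment, total & 0xFFFF


regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0020, "DI": 0x0030}
print(effective_address(regs, base="BX", index="SI", displacement=4))
```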
The 8086 has 64 KB of 8-bit (or alternatively 32 K-word of 16-bit) I/O space, and a 64 KB (one segment) stack in memory supported by hardware. Only words (2 bytes) can be pushed to the stack. The stack grows downwards (toward numerically lower addresses), with its top pointed to by SS:SP. There are 256 interrupts, which can be invoked by both hardware and software. The interrupts can cascade, using the stack to store the return address.
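The downward-growing, word-only stack can be modeled in a few lines. This is a simplified sketch, not a description of the hardware: the class name `Stack8086` is hypothetical, the stack segment is modeled as a standalone 64 KB byte array, and SP is assumed to address the most recently pushed word.

```python
class Stack8086:
    """Toy model of a one-segment (64 KB) stack holding 16-bit words."""

    def __init__(self, sp=0x0100):
        self.memory = bytearray(0x10000)  # one 64 KB stack segment
        self.sp = sp                      # offset of the most recently pushed word

    def push(self, word):
        # SP is decremented by 2 first, so the stack grows toward lower addresses.
        self.sp = (self.sp - 2) & 0xFFFF
        self.memory[self.sp] = word & 0xFF                        # low byte at lower address
        self.memory[(self.sp + 1) & 0xFFFF] = (word >> 8) & 0xFF  # high byte above it

    def pop(self):
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


stack = Stack8086(sp=0x0100)
stack.push(0x1234)
print(hex(stack.sp))
```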
Real mode
Real mode is an operating mode of 80286 and later x86-compatible CPUs. Real mode is characterized by a 20 bit segmented memory address space (meaning that only 1 MB of memory can be addressed), direct software access to BIOS routines and peripheral hardware, and no concept of memory protection or multitasking at the hardware level. All x86 CPUs in the 80286 series and later start up in real mode at power-on; 80186 CPUs and earlier had only one operational mode, which is equivalent to real mode in later chips.
In real mode, memory access is segmented. This is done by shifting the segment address left by 4 bits and adding an offset in order to receive a final 20-bit address. For example, if DS is A000h and SI is 5677h, DS:SI will point at the absolute address DS × 16 + SI = A5677h. Thus the total address space in real mode is 2^20 bytes, or 1 MB, quite an impressive figure for 1978. All memory addresses consist of both a segment and offset; every type of access (code, data, or stack) has a default segment register associated with it (for data the register is usually DS, for code it is CS, and for stack it is SS). For data accesses, the segment register can be explicitly specified (using a segment override prefix) to use any of the four segment registers.
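The address calculation described above can be written out as a short sketch. The function name `real_mode_address` is hypothetical, and the 20-bit mask assumes the wrap-around behaviour of the original 8086 with its 20 address lines.

```python
def real_mode_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit absolute address."""
    # Shifting left by 4 bits is the same as multiplying the segment by 16.
    # Assumption: the result is truncated to 20 bits, as on the original 8086.
    return ((segment << 4) + offset) & 0xFFFFF


# The example from the text: DS = A000h, SI = 5677h.
address = real_mode_address(0xA000, 0x5677)
print(hex(address))
```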
In this scheme, two different segment/offset pairs can point at a single absolute location. Thus, if DS is A111h and SI is 4567h, DS:SI will point at the same A5677h as above. This scheme makes it impossible to use more than four segments at once. CS and SS are vital for the correct functioning of the program, so that only DS and ES can be used to point to data segments outside the program (or, more precisely, outside the currently-executing segment of the program) or the stack. This scheme was intended as a compatibility measure with the Intel 8085.
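Because segments start every 16 bytes but span 64 KB, many segment:offset pairs alias the same absolute location. The sketch below enumerates them for a given address; the function name `aliases` is hypothetical, and pairs that would only match through 20-bit wrap-around are deliberately ignored.

```python
def aliases(linear):
    """List the (segment, offset) pairs whose segment * 16 + offset equals `linear`."""
    pairs = []
    # Valid offsets share the low 4 bits of the address and differ by multiples of 16.
    for offset in range(linear & 0xF, 0x10000, 16):
        base = linear - offset
        if base < 0:
            break  # the segment would have to start below address 0
        pairs.append((base >> 4, offset))
    return pairs


pairs = aliases(0xA5677)
print(len(pairs))
```

Both pairs mentioned in the text, A000h:5677h and A111h:4567h, appear in the resulting list.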
The segmented nature can make programming and compiler design difficult, because the use of near and far pointers affects performance. The introduction of bank switching schemes such as EEMS made programming even more complicated before the adoption of 32-bit addressing methods with later processors.
16-bit protected mode
In addition to real mode, the Intel 80286 supports protected mode, expanding addressable physical memory to 16 MB and addressable virtual memory to 1 GB. This is done by using the segment registers only for storing an index to a segment table. There are two such tables, the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT), each holding up to 8192 segment descriptors, each segment giving access to 64 KB of memory. The segment table provides a 24-bit base address, which is added to the desired offset to create an absolute address. Each segment can be assigned one of four ring levels used for hardware-based computer security.
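The figures in this paragraph can be checked with a little arithmetic. The sketch below also decodes a segment register value under the assumption that it follows the usual protected-mode selector layout (13-bit table index, one table-indicator bit, 2-bit requested privilege level); that layout is not spelled out in this article, and the function names are hypothetical.

```python
def decode_selector(selector):
    """Split a 16-bit selector into table index, table choice and privilege level."""
    return {
        "index": selector >> 3,                      # one of 8192 descriptors
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,                       # one of four ring levels
    }


def physical_address_286(segment_base, offset):
    """Add a 24-bit descriptor base and a 16-bit offset, truncated to 24 address bits."""
    return (segment_base + offset) & 0xFFFFFF


PHYSICAL_BYTES = 2 ** 24              # 24-bit addresses: 16 MB
VIRTUAL_BYTES = 2 * 8192 * 64 * 1024  # two tables x 8192 segments x 64 KB: 1 GB

print(decode_selector(0x002B), VIRTUAL_BYTES)
```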
Because real mode DOS programs may do direct hardware access or perform segment arithmetic, both incompatible with protected mode, an operating system (OS) is limited in its ability to run these applications as processes. To overcome these difficulties, Intel introduced the 80386 with virtual 8086 mode. While still subject to paging, it uses real mode to form linear addresses and allows the OS to trap both I/O and memory access. By design, protected mode programs do not assume a relation between selector values and physical addresses.
Operating systems like OS/2 1.x try to switch the processor between protected and real modes. This is both slow and unsafe, because a real mode program can easily crash a computer. OS/2 1.x defines restrictive programming rules allowing a Family API or bound program to run in either real or protected mode.
Windows 3.0 could run real mode programs in 16-bit protected mode. When transitioning to protected mode, Windows 3.0 preserved the single privilege level model that was used in real mode, which is why Windows applications and DLLs can hook interrupts and do direct hardware access. That lasted through the Windows 9x series. If a Windows 1.x or 2.x program is written properly and avoids segment arithmetic, it will run the same way in both real and protected modes. Windows programs generally avoid segment arithmetic because Windows implements a software virtual memory scheme, moving program code and data in memory when programs are not running, so manipulating absolute addresses is dangerous; programs should only keep handles to memory blocks when not running. Starting an old program while Windows 3.0 is running in protected mode triggers a warning dialog, suggesting either to run Windows in real mode or to obtain an updated version of the application. Updating well-behaved programs using the MARK utility with the MEMORY parameter avoids this dialog. It is not possible to have some GUI programs running in 16-bit protected mode and other GUI programs running in real mode. In Windows 3.1, real mode disappeared.
32-bit protected mode
The Intel 80386 introduced a significant advance in x86 architecture: an all 32-bit design supporting paging. All of the registers, instructions, I/O space and memory are 32-bit. Memory is accessed through a 32-bit extension of protected mode. As in the 286, segment registers are used to index a segment table describing the division of memory. With a 32-bit offset, every application may access up to 4 GB (or more with memory segments). In addition, 32-bit protected mode supports paging, a mechanism making it possible to use virtual memory. An exception to this design is the Intel 80386SX, which is 32-bit with 24-bit addressing and a 16-bit data bus.
No new general-purpose registers were added. All 16-bit registers except the segment registers were expanded to 32 bits. This is represented by prefixing an "E" (for Extended) to the register names (thus the expanded AX became EAX, SI became ESI and so on). With wider registers and operands and additional instructions, the machine code format was expanded. To provide backward compatibility, segments with executable code can be marked as containing either 16-bit or 32-bit instructions. Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment or vice versa.
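The relationship between an extended register and its original 16-bit name can be sketched as follows. The function names are hypothetical, and the sketch assumes the 16-bit name refers to the low half of the 32-bit register, so that a 16-bit write leaves the upper half untouched.

```python
def read_ax(eax):
    """AX is modeled as the low 16 bits of EAX."""
    return eax & 0xFFFF


def write_ax(eax, value):
    """Model a 16-bit write to AX: the upper 16 bits of EAX are preserved."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


eax = 0x12345678
print(hex(read_ax(eax)), hex(write_ax(eax, 0xABCD)))
```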
Paging and segmented memory access are required for modern multitasking operating systems. Linux, 386BSD and Windows NT were developed for the 386 because it was the first Intel architecture CPU to support paging and 32-bit segment offsets. The 386 architecture became the basis of all further development in the x86 series. The success of Windows 3.1, the first widely accepted version of Microsoft Windows, was largely due to its ability to take advantage of 386 features, even though it was used mainly to run multiple sessions rather than to take advantage of the native 32-bit instruction set.
The Intel 80387 math co-processor was integrated into the next CPU in the series, the Intel 80486 (the 486SX, sold as a budget processor, had its co-processor disabled or removed). The new floating point unit (FPU) performed floating point calculations, important for scientific applications and graphic design.
MMX and beyond
MMX is a SIMD instruction set designed by Intel, introduced in 1997 for Pentium MMX microprocessors. It developed out of a similar unit first used on the Intel i860. It is supported on most subsequent IA-32 processors by Intel and other vendors. MMX is typically used for video applications.
MMX added 8 new 64-bit registers to the architecture, known as MM0 through MM7 (generically MMn). In reality, these new registers are aliases for the existing x87 FPU stack registers. Hence, anything done to the floating point stack also affects the MMX registers. Unlike the floating point stack, these MMn registers are randomly accessible.
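To illustrate what SIMD means for a 64-bit MMn register, the sketch below treats one 64-bit integer as four independent 16-bit lanes and adds them lane by lane with wrap-around, in the spirit of the MMX packed-word add. The Python function name `paddw` is chosen for this example; it is a software model, not a call into real MMX hardware.

```python
def paddw(a, b):
    """Add two 64-bit values as four independent 16-bit lanes, wrapping on overflow."""
    result = 0
    for lane in range(4):
        shift = lane * 16
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        # The mask discards the carry, so it never spills into the next lane.
        result |= ((x + y) & 0xFFFF) << shift
    return result


total = paddw(0x0001000200030004, 0x0010002000300040)
print(hex(total))
```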
3DNow!
In 1997 AMD introduced 3DNow! which consisted of SIMD floating point instruction enhancements to MMX. The introduction of this technology coincided with the rise of 3D entertainment applications and was designed to improve the CPU's vector processing performance of graphic-intensive applications. 3D video game developers and 3D graphics hardware vendors use 3DNow! to enhance their performance on AMD's K6 and Athlon series of processors.
SSE
In 1999, Intel introduced the Streaming SIMD Extensions (SSE) instruction set which added eight new 128 bit registers (not overlaid with other registers) and 70 floating point instructions.
In 2000 Intel introduced the SSE2 instruction set, adding a complete complement of integer instructions (analogous to MMX) to the original SSE registers and 64-bit SIMD floating point instructions to the original SSE registers. The first addition made MMX almost obsolete, and the second allowed the instructions to be realistically targeted by conventional compilers.
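The register widths above imply that a 128-bit SSE register holds four 32-bit floats, or, with SSE2, two 64-bit floats. The sketch below models this with Python's `struct` module; the helper names `pack_ps`, `unpack_ps` and `add_ps` are hypothetical and only imitate a lane-wise single-precision add in software.

```python
import struct


def pack_ps(values):
    """Pack four floats into 16 bytes, like four single-precision lanes."""
    return struct.pack("<4f", *values)


def unpack_ps(raw):
    """Unpack 16 bytes into a tuple of four floats."""
    return struct.unpack("<4f", raw)


def add_ps(a, b):
    """Add two packed values lane by lane, rounding each sum to single precision."""
    return pack_ps([x + y for x, y in zip(unpack_ps(a), unpack_ps(b))])


register_bits = 8 * struct.calcsize("<4f")
result = unpack_ps(add_ps(pack_ps((1.0, 2.0, 3.0, 4.0)), pack_ps((0.5, 0.5, 0.5, 0.5))))
print(register_bits, result)
```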
Introduced in 2004 along with the Prescott revision of the Pentium 4 processor, SSE3 added specific memory and thread-handling instructions to boost the performance of Intel's HyperThreading technology. AMD licensed the SSE3 instruction set and implemented most of the SSE3 instructions for its revision E and later Athlon 64 processors. The Athlon 64 does not support HyperThreading and lacks those SSE3 instructions used only for HyperThreading.
64-bit Long mode
By 2002, it was obvious that the 32-bit address space of the x86 architecture was limiting its performance in applications requiring large data sets. A 32-bit address space allows the processor to directly address only 4 GB of data, a size surpassed by applications such as video processing and database engines. With 64-bit addresses, one can directly address 16,777,216 TiB (more than 16 billion GB) of data, although most 64-bit architectures do not support access to the full 64-bit address space; AMD64, for example, uses only 48 bits of a 64-bit address, split across 4 paging levels.
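The sizes quoted here follow directly from powers of two, as the sketch below shows. The split of a 48-bit address into four 9-bit table indices plus a 12-bit page offset assumes 4 KB pages, which is the common AMD64 arrangement but is not stated in this article; the function name `split_virtual_address` is hypothetical.

```python
TIB = 2 ** 40

FULL_64_BIT_TIB = 2 ** 64 // TIB   # address space of a full 64-bit address, in TiB
AMD64_48_BIT_TIB = 2 ** 48 // TIB  # address space of a 48-bit address, in TiB
GB_32_BIT = 2 ** 32 // 2 ** 30     # address space of a 32-bit address, in GB


def split_virtual_address(va):
    """Split a 48-bit virtual address into four 9-bit table indices and a page offset.

    Assumes 4 KB pages: 4 levels x 9 bits + 12 offset bits = 48 bits.
    Indices are returned from the top-level table down to the lowest level.
    """
    offset = va & 0xFFF
    indices = [(va >> (12 + 9 * level)) & 0x1FF for level in (3, 2, 1, 0)]
    return indices, offset


print(FULL_64_BIT_TIB, AMD64_48_BIT_TIB, split_virtual_address(0x401234))
```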
AMD, who would traditionally follow the lead of Intel, took the initiative of extending the 32-bit x86 architecture to 64-bit, initially calling it x86-64, later renaming it AMD64. The Opteron, Athlon 64, Turion 64, and later Sempron families of processors use this architecture. The success of the AMD64 line of processors coupled with the lukewarm reception of the IA-64 architecture prompted Intel to reverse-engineer and adopt the instruction set, adding new extensions of its own and branding it the EM64T architecture, and later re-branding it Intel 64.
In its literature and product version names, Microsoft and Sun refer to AMD64/Intel 64 collectively as x64 in the Windows and Solaris operating systems respectively. Linux distributions refer to it either as "x86-64", its variant "x86_64", or "amd64". BSD systems use "amd64" while Mac OS X uses "x86_64".
This was the first time that a major upgrade of the x86 architecture was initiated and originated by a manufacturer other than Intel. It was also the first time that Intel accepted technology of this nature from an outside source.
Virtualization
x86 virtualization is difficult because the architecture did not meet the Popek and Goldberg requirements until recently. Nevertheless, there are several commercial x86 virtualization products, such as VMware, Parallels and Microsoft Virtual PC, as well as open source virtualization projects such as Bochs and QEMU. Other solutions, such as the Kernel-based Virtual Machine ("KVM"), require newer processors which provide better hardware support for virtualization.
Intel and AMD have introduced x86 processors with hardware-based virtualization extensions that overcome the classical virtualization limitations of the x86 architecture. These extensions are known as Intel VT (IVT or simply VT), which was code-named "Vanderpool", and AMD-V, which was code-named "Pacifica". Although most modern x86 server processors and many modern x86 desktop processors include these extensions, the technology is generally considered immature at this point, with most software-based virtualization outperforming these extensions.[9] This is expected to change as the technology matures.
See also
- IA-32
- x86 assembly language
- x86 instruction listings
- x87
- Real mode — Unreal mode — Virtual 8086 mode — Protected mode — Long mode
- x86-64
- IA-64
- Microarchitecture
- List of Intel microprocessors
- List of AMD microprocessors
- List of VIA microprocessors
- List of x86 manufacturers
Footnotes
- ^ Unlike the microarchitecture (and the specific electronic and physical implementation) used for a specific chip design
- ^ Intel abandoned its "x86" naming scheme with the Pentium in 1993 (as numbers could not be trademarked). However, the term x86 was already firmly established among technicians, compiler writers etc.
- ^ Intel's names are IA-32 and Intel 64 (EM64T or IA-32e) for x86 and x86-64 respectively. Likewise, AMD today prefers AMD64 over the x86-64 name it once introduced.
- ^ "Linux* Kernel Compiling". Intel. Retrieved 2007-09-04.
- ^ "Intel Web page search result for "x64"". Retrieved 2007-09-04.
- ^ The embedded processor's market is populated by more than 20 different architectures, which, due to the price sensitivity, low power and hardware simplicity requirements, outnumber the x86.
- ^ "Microprocessor Hall of Fame". Intel. Retrieved 2007-08-11.
- ^ It had a slower Floating point unit however, which is slightly ironic as Cyrix started out as a designer of fast Floating point units for x86 processors.
- ^ A Comparison of Software and Hardware Techniques for x86 Virtualization
References
- Adams, Keith (2006). "A Comparison of Software and Hardware Techniques for x86 Virtualization" (PDF). Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, 2006. ACM 1-59593-451-0/06/0010. Retrieved 2006-12-22.
- Rosenblum, Mendel (May 2005). "Virtual machine monitors: current technology and future trends" (PDF). IEEE Computer, volume 38, issue 5.