
ARM architecture family

From Wikipedia, the free encyclopedia
Revision as of 03:09, 8 November 2012

ARM
The ARM logo
Designer: ARM Holdings
Bits: 32-bit and 64-bit implementations
Introduced: 1985
Version: ARMv8[1]
Design: RISC
Type: Register-Register
Encoding: Fixed
Branching: Condition code
Endianness: Bi (Little as default)
Extensions: NEON, Thumb, Jazelle, VFP, A64
Registers: 16 (32-bit) / 31 (64-bit)[1]

An ARM processor implements one of the RISC instruction set architectures developed by the British company ARM Holdings; the family is historically 32-bit, with a 64-bit extension added in ARMv8. The architecture originated at Acorn Computers in 1983 and has been developed by ARM since the company's formation in 1990. Measured by the number of chips produced, it is the most widely used 32-bit instruction set architecture.[2][3] ARM formerly stood for Advanced RISC Machine and, before that, Acorn RISC Machine.[citation needed]

ARM Holdings does not manufacture chips itself; instead, it licenses its designs to semiconductor manufacturers under several kinds of licence. The large number of licensees, together with advantages of the architecture such as low power consumption, mean that ARM chips are the most widely used processors in embedded systems: for example, almost all current smartphones and tablet computers contain one or more licensed ARM processors.

Features and applications

In 2005, about 98% of the more than one billion mobile phones sold that year used at least one ARM processor.[4] As of 2009, ARM processors accounted for approximately 90% of all embedded 32-bit RISC processors[5] and were used extensively in consumer electronics, including personal digital assistants (PDAs), tablets, mobile phones, digital media and music players, hand-held game consoles, calculators and computer peripherals such as hard drives and routers.

Licensees

The ARM architecture is licensable. Companies that are current or former ARM licensees include Advanced Micro Devices, Inc.,[6] Alcatel-Lucent, Apple Inc., AppliedMicro, Atmel, Broadcom, Cirrus Logic, CSR plc, Digital Equipment Corporation, Ember, Energy Micro, Freescale, Fuzhou Rockchip, Intel (through DEC), LG, Marvell Technology Group, Microsemi, Microsoft, NEC, Nintendo, Nuvoton, Nvidia, NXP (formerly Philips Semiconductor), Oki, ON Semiconductor, Psion, Qualcomm, Renesas, Research In Motion, Samsung, Sharp, Silicon Labs, Sony, ST-Ericsson, STMicroelectronics, Symbios Logic, Texas Instruments, VLSI Technology, Yamaha, and ZiiLABS.

In addition to the abstract architecture, ARM offers several microprocessor core designs, including the ARM7, ARM9, ARM11, Cortex-A8, Cortex-A9, and Cortex-A15. Companies often license these designs from ARM to manufacture and integrate into their own system on a chip (SoC) with other components like RAM, GPUs, or radio basebands (for mobile phones).

System-on-chip packages integrating ARM's core designs include Nvidia Tegra's first three generations, CSR plc's Quatro family, ST-Ericsson's Nova and NovaThor, Silicon Labs's Precision32 MCU, Texas Instruments's OMAP products, Samsung's Hummingbird and Exynos products, Apple's A4, A5, A5X, A6 and A6X chips, and Freescale's i.MX.

Companies can also obtain an ARM architectural license for designing their own, different CPU cores using the ARM instruction set. Distinct ARM architecture implementations by licensees include Apple's A6, AppliedMicro's X-Gene, Qualcomm's Snapdragon and Krait, DEC's StrongARM, Marvell (formerly Intel) XScale, and Nvidia's planned Project Denver.

History

Originally conceived by Acorn Computers for use in its personal computers, the first ARM-based products were the co-processor modules for the BBC Micro series of computers. After achieving success with the BBC Micro computer, Acorn Computers Ltd considered how to move on from the relatively simple MOS Technology 6502 processor to address business markets like the one that would soon be dominated by the IBM PC, launched in 1981. The Acorn Business Computer (ABC) plan required a number of second processors to be made to work with the BBC Micro platform, but processors such as the Motorola 68000 and National Semiconductor 32016 were unsuitable, and the 6502 was not powerful enough for a graphics-based user interface.[7]

Acorn would need a new architecture, having tested all of the available processors and found them wanting. Acorn then seriously considered designing its own processor, and its engineers came across papers on the Berkeley RISC project. They felt the project showed that if a class of graduate students could create a competitive 32-bit processor, then Acorn would have no problem doing so. A trip to the Western Design Center in Phoenix, where the 6502 was being updated by what was effectively a single-person company, showed Acorn engineers Steve Furber[8] and Sophie Wilson that they did not need massive resources and state-of-the-art R&D facilities.

Wilson set about developing the instruction set, writing a simulation of the processor in BBC Basic that ran on a BBC Micro with a second 6502 processor. It convinced the Acorn engineers that they were on the right track. Before they could go any further, however, they would need more resources. It was time for Wilson to approach Acorn's CEO, Hermann Hauser, and explain what was afoot. Once the go-ahead had been given, a small team was put together to implement Wilson's model in hardware.

A Conexant ARM processor used mainly in routers

Acorn RISC Machine: ARM2

The official Acorn RISC Machine project started in October 1983. VLSI Technology, Inc. was chosen as the silicon partner, since it already supplied Acorn with ROMs and some custom chips. The design was led by Wilson and Furber and consciously followed the same efficiency ethos as the 6502.[9] A key design goal was low-latency input/output (interrupt) handling like that of the 6502, whose memory access architecture had allowed developers to produce fast machines without costly direct memory access hardware. VLSI produced the first ARM silicon on 26 April 1985; it worked the first time and came to be known as the ARM1.[10] The first production version, the ARM2, was available the following year.

The ARM1 second processor for the BBC Micro

Its first practical application was as a second processor to the BBC Micro, where it was used to develop the simulation software to finish work on the support chips (VIDC, IOC, MEMC) and to speed up the operation of the CAD software used in developing ARM2. Wilson subsequently rewrote BBC Basic in ARM assembly language, and the in-depth knowledge obtained from designing the instruction set allowed the code to be very dense, making ARM BBC Basic an extremely good test for any ARM emulator. The original aim of a principally ARM-based computer was achieved in 1987 with the release of the Acorn Archimedes.

In 1992 Acorn once more won the Queen's Award for Technology for the ARM.

The ARM2 featured a 32-bit data bus, a 26-bit address space and twenty-seven 32-bit registers. Program code had to lie within the first 64 MB of memory because R15 held both the program counter and the processor status: the top six bits held the condition flags and interrupt-disable bits and the bottom two bits held the processor mode, leaving 24 bits for the program counter. Since instructions are word-aligned, those 24 bits address 2^26 bytes. The ARM2 had a transistor count of just 30,000, compared with 68,000 in Motorola's 68000, which was six years older. Much of this simplicity comes from the absence of microcode (which accounts for about one-quarter to one-third of the 68000's transistors) and, like most CPUs of the day, of any cache. This simplicity led to low power usage, while the ARM2 still performed better than the Intel 80286.[11] A successor, ARM3, was produced with a 4 KB cache, which further improved performance.
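The combined program counter and status register can be illustrated with a small C model. This is a sketch assuming the conventional 26-bit R15 layout (N, Z, C and V flags in bits 31 to 28, IRQ and FIQ disable bits in bits 27 and 26, a word-aligned program counter in bits 25 to 2, and the processor mode in bits 1 and 0); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a 26-bit-era R15 (ARM2/ARM3). */
struct r15_fields {
    bool n, z, c, v;      /* condition flags, bits 31..28 */
    bool irq_disabled;    /* I bit, bit 27 */
    bool fiq_disabled;    /* F bit, bit 26 */
    uint32_t pc;          /* word-aligned byte address, bits 25..2 */
    unsigned mode;        /* bits 1..0: 0 User, 1 FIQ, 2 IRQ, 3 Supervisor */
};

struct r15_fields decode_r15(uint32_t r15)
{
    struct r15_fields f;
    f.n = (r15 >> 31) & 1u;
    f.z = (r15 >> 30) & 1u;
    f.c = (r15 >> 29) & 1u;
    f.v = (r15 >> 28) & 1u;
    f.irq_disabled = (r15 >> 27) & 1u;
    f.fiq_disabled = (r15 >> 26) & 1u;
    /* 24 bits of word address; the two low address bits are implicitly 0,
       so the reachable range is 2^26 bytes = 64 MB. */
    f.pc = r15 & 0x03FFFFFCu;
    f.mode = r15 & 3u;
    return f;
}
```

For example, decode_r15(0x80008003) reports the N flag set, a program counter of 0x8000 and mode 3 (Supervisor). Masking rather than shifting keeps the program counter as a byte address, which makes the 64 MB limit visible: the largest representable value is 0x03FFFFFC.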

Apple, DEC, Intel, Marvell: ARM6, StrongARM, XScale

In the late 1980s Apple Computer and VLSI Technology started working with Acorn on newer versions of the ARM core. The work was so important that Acorn spun off the design team in 1990 into a new company called Advanced RISC Machines Ltd. Advanced RISC Machines became ARM Ltd when its parent company, ARM Holdings plc, floated on the London Stock Exchange and NASDAQ in 1998.[12]

The new Apple-ARM work would eventually turn into the ARM6, first released in early 1992. Apple used the ARM6-based ARM 610 as the basis for their Apple Newton PDA. In 1994, Acorn used the ARM 610 as the main central processing unit (CPU) in their Risc PC computers. DEC licensed the ARM6 architecture and produced the StrongARM. At 233 MHz this CPU drew only one watt (more recent versions draw far less). This work was later passed to Intel as a part of a lawsuit settlement, and Intel took the opportunity to supplement their ageing i960 line with the StrongARM. Intel later developed its own high performance implementation named XScale which it has since sold to Marvell.

Licensing

The ARM core has remained largely the same size throughout these changes: ARM2 had 30,000 transistors, while ARM6 grew only to 35,000. ARM's business has always been to sell IP cores, which licensees use to create microcontrollers and CPUs based on this core. The original design manufacturer combines the ARM core with a number of optional parts to produce a complete CPU, one that can be built on old semiconductor fabs and still deliver substantial performance at a low cost. The most successful implementation has been the ARM7TDMI, with hundreds of millions sold. Atmel was an early design centre for ARM7TDMI-based embedded systems.

ARM licensed about 1.6 billion cores in 2005, and about 1 billion ARM cores went into mobile phones that year.[13] By January 2008, over 10 billion ARM cores had been built, and in 2008 iSuppli predicted that by 2011, 5 billion ARM cores would be shipping per year.[14] As of January 2011, ARM stated that over 15 billion ARM processors had shipped.[15]

The ARM architectures used in smartphones, personal digital assistants and other mobile devices range from ARMv5, in obsolete or low-end devices, to ARMv7 Cortex-A processors in current high-end devices. ARMv6 processors represented a step up in performance from standard ARMv5 cores and are used in some cases, but Cortex processors (ARMv7) now provide faster and more power-efficient options than all of those earlier generations. ARMv7 application processors also typically include a hardware floating-point unit, which affects both performance and the choice of ABI (for example, whether floating-point arguments are passed in floating-point registers).

In 2009, some manufacturers introduced netbooks based on ARM architecture CPUs, in direct competition with netbooks based on Intel Atom.[16] The analyst firm IHS iSuppli estimated that by 2015, ARM ICs would be in 23% of all laptops.[17]

In 2011, HiSilicon Technologies Co. Ltd. licensed a variety of ARM technology to be used in communications chip designs. These included 3G/4G basestations, networking infrastructure and mobile computing applications.[18]

ARM cores

    Architecture  Family
    ARMv1         ARM1
    ARMv2         ARM2, ARM3
    ARMv3         ARM6, ARM7
    ARMv4         StrongARM, ARM7TDMI, ARM9TDMI
    ARMv5         ARM7EJ, ARM9E, ARM10E, XScale
    ARMv6         ARM11, ARM Cortex-M
    ARMv7         ARM Cortex-A, ARM Cortex-M, ARM Cortex-R
    ARMv8         ARM Cortex-A50[19]

A summary of the numerous vendors who implement ARM cores in their design is provided by ARM.[20]

Example applications of ARM cores

ARM cores are used in a number of products, particularly various PDAs and smartphones. Some computing examples are the Acorn Archimedes, Apple iPad and ASUS Eee Pad Transformer. Some other uses are the Apple iPhone smartphone, iPod portable media player, Canon PowerShot A470 digital camera, Nintendo DS handheld games console and TomTom automotive navigation system.

Since 2005, ARM has also been involved in the University of Manchester's SpiNNaker computer, which uses ARM cores to simulate parts of the human brain.[21]

ARM chips are also used in the Raspberry Pi, BeagleBoard, BeagleBone, PandaBoard and other single-board computers, because they are very small, inexpensive and use little power.

Architecture

Since 1995, the ARM Architecture Reference Manual has been the primary source of documentation on the ARM processor architecture and instruction set, distinguishing interfaces that all ARM processors are required to support (such as instruction semantics) from implementation details that may vary. The architecture has evolved over time, and starting with the Cortex series of cores, three "profiles" are defined:

  • "Application" profile: Cortex-A series
  • "Real-time" profile: Cortex-R series
  • "Microcontroller" profile: Cortex-M series.

Profiles are allowed to subset the architecture. For example, the ARMv6-M profile (used by the Cortex-M0, Cortex-M0+ and Cortex-M1) is a subset of the ARMv7-M profile (it supports fewer instructions).

CPU modes

The ARM architecture specifies the following CPU modes. At any moment, the CPU can be in only one mode, but it can switch modes in response to external events (interrupts) or programmatically.

User mode
The only non-privileged mode.
System mode
The only privileged mode that is not entered by an exception. It can only be entered by executing an instruction that explicitly writes to the mode bits of the CPSR.
Supervisor (svc) mode
A privileged mode entered whenever the CPU is reset or when a SWI instruction is executed.
Abort mode
A privileged mode that is entered whenever a prefetch abort or data abort exception occurs.
Undefined mode
A privileged mode that is entered whenever an undefined instruction exception occurs.
Interrupt mode
A privileged mode that is entered whenever the processor accepts an IRQ interrupt.
Fast Interrupt mode
A privileged mode that is entered whenever the processor accepts an FIQ interrupt.
Hyp mode
A hypervisor mode introduced with the ARMv7-A Virtualization Extensions, first implemented in the Cortex-A15 processor, to provide hardware virtualisation support.
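Each mode corresponds to a value in the five mode bits (M[4:0]) of the CPSR. The sketch below uses the standard encodings for the modes listed above; the helper names are hypothetical, and Monitor mode (added by the Security Extensions) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard values of the CPSR mode field, bits 4..0. */
enum arm_mode {
    MODE_USR = 0x10, /* User */
    MODE_FIQ = 0x11, /* Fast Interrupt */
    MODE_IRQ = 0x12, /* Interrupt */
    MODE_SVC = 0x13, /* Supervisor */
    MODE_ABT = 0x17, /* Abort */
    MODE_HYP = 0x1A, /* Hyp (Virtualization Extensions) */
    MODE_UND = 0x1B, /* Undefined */
    MODE_SYS = 0x1F  /* System */
};

/* Extract the mode field from a CPSR value (reserved values are not checked). */
enum arm_mode cpsr_mode(uint32_t cpsr)
{
    return (enum arm_mode)(cpsr & 0x1Fu);
}

/* User mode is the only non-privileged mode. */
bool mode_is_privileged(enum arm_mode m)
{
    return m != MODE_USR;
}

/* System mode is the only privileged mode that no exception enters. */
bool mode_entered_by_exception(enum arm_mode m)
{
    return m != MODE_USR && m != MODE_SYS;
}
```

For example, cpsr_mode(0x600001D3) returns MODE_SVC, the mode the processor enters on reset.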

Instruction set

To keep the design clean, simple and fast, the original ARM implementation was hardwired without microcode, like the much simpler 8-bit 6502 processor used in prior Acorn microcomputers.

The ARM architecture includes the following RISC features:

  • Load/store architecture.
  • No support for misaligned memory accesses (although now supported in ARMv6 cores, with some exceptions related to load/store multiple word instructions).
  • Uniform 16 × 32-bit register file.
  • Fixed instruction width of 32 bits to ease decoding and pipelining, at the cost of decreased code density. Later, the Thumb instruction set increased code density.
  • Mostly single clock-cycle execution.

To compensate for the simpler design, compared with contemporary processors of the mid-1980s such as the Intel 80286 and Motorola 68020, some additional design features were used:

  • Conditional execution of most instructions, reducing branch overhead and compensating for the lack of a branch predictor.
  • Arithmetic instructions alter condition codes only when desired.
  • 32-bit barrel shifter which can be used without performance penalty with most arithmetic instructions and address calculations.
  • Powerful indexed addressing modes.
  • A link register for fast leaf function calls.
  • Simple, but fast, 2-priority-level interrupt subsystem with switched register banks.

Arithmetic instructions

The ARM supports add, subtract, and multiply instructions. The integer divide instructions are only implemented by ARM cores based on the following ARM architectures:

  • ARMv7-M and ARMv7E-M architectures always include divide instructions.[22]
  • ARMv7-R architecture always includes divide instructions in the Thumb instruction set, but optionally in the ARM instruction set.[23]
  • ARMv7-A architecture optionally includes the divide instructions. The instructions might not be implemented, or implemented only in the Thumb instruction set, or implemented in both the Thumb and ARM instruction sets; they are always implemented in both if the Virtualization Extensions are included.[23]
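On cores without hardware divide, compilers call a run-time helper instead (the ARM EABI names the unsigned one __aeabi_uidiv). The sketch below shows the classic shift-and-subtract method such a helper can use, producing one quotient bit per step; soft_udiv is a hypothetical name, and real library routines are usually far more heavily optimised.

```c
#include <stdint.h>

/* Unsigned 32-bit long division in base 2. The caller must ensure
   divisor != 0. If remainder is non-NULL, the remainder is stored there. */
uint32_t soft_udiv(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint32_t rem = 0;
    for (int bit = 31; bit >= 0; --bit) {
        /* Bring the next dividend bit down into the partial remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        if (rem >= divisor) {
            rem -= divisor;          /* the divisor "goes into" rem once */
            quotient |= 1u << bit;   /* so this quotient bit is 1 */
        }
    }
    if (remainder)
        *remainder = rem;
    return quotient;
}
```

The loop always runs 32 iterations, which is why software division is much slower than a single hardware divide instruction.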

Registers

Registers R0-R7 are the same across all CPU modes; they are never banked. Registers R8-R12 are banked only in FIQ mode, which has its own copies R8_fiq to R12_fiq.

R13 and R14 are banked across all privileged CPU modes except system mode. That is, each mode that can be entered because of an exception has its own R13 and R14. These registers generally contain the stack pointer and the return address from function calls, respectively.

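Registers across CPU modes
    Register  usr   sys   svc       abt       und       irq       fiq
    R0-R7     (one set, shared by all modes)
    R8        R8    R8    R8        R8        R8        R8        R8_fiq
    R9        R9    R9    R9        R9        R9        R9        R9_fiq
    R10       R10   R10   R10       R10       R10       R10       R10_fiq
    R11       R11   R11   R11       R11       R11       R11       R11_fiq
    R12       R12   R12   R12       R12       R12       R12       R12_fiq
    R13       R13   R13   R13_svc   R13_abt   R13_und   R13_irq   R13_fiq
    R14       R14   R14   R14_svc   R14_abt   R14_und   R14_irq   R14_fiq
    R15       (shared by all modes)
    CPSR      (shared by all modes)
    SPSR      n/a   n/a   SPSR_svc  SPSR_abt  SPSR_und  SPSR_irq  SPSR_fiq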

Aliases:

R13 is also referred to as SP, the Stack Pointer.

R14 is also referred to as LR, the Link Register.

R15 is also referred to as PC, the Program Counter.
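The banking in the register table can be expressed as a mapping from a mode and a visible register number to a physical register. The sketch below assumes a hypothetical numbering of the 31 physical general-purpose registers implied by the table (16 user registers, seven FIQ copies, and an SP/LR pair for each of svc, abt, und and irq); it ignores Hyp and Monitor modes.

```c
enum cpu_mode { USR, SYS, SVC, ABT, UND, IRQ, FIQ };

/* Hypothetical physical numbering:
   0-15  : R0-R15 as seen in User/System mode
   16-22 : R8_fiq-R14_fiq
   23-24 : R13_svc, R14_svc     25-26 : R13_abt, R14_abt
   27-28 : R13_und, R14_und     29-30 : R13_irq, R14_irq
   Returns -1 for an invalid visible register number. */
int physical_reg(enum cpu_mode mode, int r)
{
    if (r < 0 || r > 15)
        return -1;
    if (mode == FIQ && r >= 8 && r <= 14)
        return 16 + (r - 8);          /* FIQ banks R8-R14 */
    if (r == 13 || r == 14) {
        switch (mode) {               /* other exception modes bank only SP and LR */
        case SVC: return 23 + (r - 13);
        case ABT: return 25 + (r - 13);
        case UND: return 27 + (r - 13);
        case IRQ: return 29 + (r - 13);
        default:  break;              /* USR and SYS share the user registers */
        }
    }
    return r;                         /* unbanked: R0-R7, R15, and R8-R12 outside FIQ */
}
```

Entering IRQ mode therefore switches to a different physical R13 and R14 (physical_reg(IRQ, 13) is 29) while R0-R12 remain the interrupted code's registers, so an IRQ handler must save any of them it uses. FIQ's private R8-R14 are what let a short fast-interrupt handler run without saving registers first.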

Conditional execution

The conditional execution feature (called predication) is implemented with a 4-bit condition code selector (the predicate) on every instruction; one of the four-bit codes is reserved as an "escape code" to specify certain unconditional instructions, but nearly all common instructions are conditional. Most CPU architectures only have condition codes on branch instructions.

This cuts down significantly on the encoding bits available for displacements in memory access instructions, but on the other hand it avoids branch instructions when generating code for small if statements. The standard example of this is the subtraction-based Euclidean algorithm:

In the C programming language, the loop is:

    while(i != j) {
       if (i > j)
           i -= j;
       else
           j -= i;
    }

In ARM assembly, the loop is:

loop:   CMP  Ri, Rj         ; set condition "NE" if (i != j),
                            ;               "GT" if (i > j),
                            ;            or "LT" if (i < j)
        SUBGT  Ri, Ri, Rj   ; if "GT" (greater than), i = i-j;
        SUBLT  Rj, Rj, Ri   ; if "LT" (less than), j = j-i;
        BNE  loop           ; if "NE" (not equal), then loop

which avoids the branches around the then and else clauses. If Ri and Rj are equal, neither SUB instruction executes and the BNE falls through, so no separate conditional branch is needed to implement the while test at the top of the loop, as would have been the case had, for example, SUBLE (less than or equal) been used.
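How the condition field interacts with the flags can be modelled in C. The sketch below evaluates the condition codes from the N, Z, C and V flags set by CMP and then steps through the GCD loop above one predicated instruction at a time. All names are hypothetical, and both inputs must be positive; otherwise the loop, like the assembly, does not terminate.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool n, z, c, v; };

/* Flags produced by CMP a, b (computes a - b and discards the result). */
struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (a >= b);                          /* carry set means "no borrow" */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* Evaluate a 4-bit condition field. 0xF is the escape code; this model
   simply reports false for it. */
bool cond_passes(unsigned cond, struct flags f)
{
    switch (cond & 0xFu) {
    case 0x0: return f.z;                    /* EQ */
    case 0x1: return !f.z;                   /* NE */
    case 0x2: return f.c;                    /* CS/HS */
    case 0x3: return !f.c;                   /* CC/LO */
    case 0x4: return f.n;                    /* MI */
    case 0x5: return !f.n;                   /* PL */
    case 0x6: return f.v;                    /* VS */
    case 0x7: return !f.v;                   /* VC */
    case 0x8: return f.c && !f.z;            /* HI */
    case 0x9: return !f.c || f.z;            /* LS */
    case 0xA: return f.n == f.v;             /* GE */
    case 0xB: return f.n != f.v;             /* LT */
    case 0xC: return !f.z && f.n == f.v;     /* GT */
    case 0xD: return f.z || f.n != f.v;      /* LE */
    case 0xE: return true;                   /* AL */
    default:  return false;                  /* 0xF escape code */
    }
}

enum { COND_NE = 0x1, COND_LT = 0xB, COND_GT = 0xC };

/* The GCD loop, executed one predicated instruction at a time. */
int32_t gcd_predicated(int32_t i, int32_t j)
{
    struct flags f;
    do {
        f = cmp_flags((uint32_t)i, (uint32_t)j);  /* CMP   Ri, Rj     */
        if (cond_passes(COND_GT, f)) i -= j;      /* SUBGT Ri, Ri, Rj */
        if (cond_passes(COND_LT, f)) j -= i;      /* SUBLT Rj, Rj, Ri */
    } while (cond_passes(COND_NE, f));            /* BNE   loop       */
    return i;
}
```

gcd_predicated(12, 18) returns 6. SUBGT and SUBLT both test the flags from the same CMP, because SUB without the S suffix does not update them.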

One of the ways that Thumb code provides a denser encoding is to remove that four-bit selector from non-branch instructions.

Other features

Another feature of the instruction set is the ability to fold shifts and rotates into the "data processing" (arithmetic, logical, and register-register move) instructions, so that, for example, the C statement

a += (j << 2);

could be rendered as a single-word, single-cycle instruction on the ARM:[24]

ADD  Ra, Ra, Rj, LSL #2

This results in the typical ARM program being denser than expected with fewer memory accesses; thus the pipeline is used more efficiently.
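The shifter's effect on the second operand can be sketched in C. This model covers the four immediate shift types (LSL, LSR, ASR, ROR) for shift amounts of 1 to 31, plus 0 for LSL; the special encodings for an amount of 0 and register-specified shifts are deliberately not modelled, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum shift_type { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR };

/* Apply an immediate shift, as the barrel shifter does to the second operand. */
uint32_t barrel_shift(uint32_t value, enum shift_type type, unsigned amount)
{
    assert(amount < 32 && (amount > 0 || type == SHIFT_LSL));
    switch (type) {
    case SHIFT_LSL:
        return value << amount;
    case SHIFT_LSR:
        return value >> amount;
    case SHIFT_ASR:
        /* Arithmetic shift: replicate the sign bit into vacated positions. */
        if (value & 0x80000000u)
            return (value >> amount) | ~(0xFFFFFFFFu >> amount);
        return value >> amount;
    case SHIFT_ROR:
        return (value >> amount) | (value << (32u - amount));
    }
    return value;
}

/* Model of ADD Rd, Rn, Rm, <shift> #amount: one instruction, no separate shift. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, enum shift_type type, unsigned amount)
{
    return rn + barrel_shift(rm, type, amount);
}
```

With this model, the C statement a += (j << 2) corresponds to a = add_shifted(a, j, SHIFT_LSL, 2).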

The ARM processor also has some features rarely seen in other RISC architectures, such as PC-relative addressing (indeed, on 32-bit ARM the PC is one of its 16 registers[1]) and pre- and post-increment addressing modes.
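The pre- and post-increment (pre- and post-indexed) modes correspond closely to common C pointer idioms. In the sketch below, the comments describe instructions a compiler targeting 32-bit ARM may choose; actual code generation depends on the compiler and optimisation level, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* "*p++" matches a post-indexed load such as LDR r3, [r0], #4:
   load from the address in r0, then add 4 to r0, in one instruction. */
int32_t sum_post_indexed(const int32_t *p, size_t n)
{
    int32_t total = 0;
    while (n--)
        total += *p++;
    return total;
}

/* "*++p" matches a pre-indexed load with write-back such as LDR r3, [r0, #4]!:
   add 4 to r0 first, then load from the updated address. Requires n >= 1. */
int32_t sum_pre_indexed(const int32_t *p, size_t n)
{
    int32_t total = *p;
    while (--n)
        total += *++p;
    return total;
}
```

In both forms the address update happens inside the load itself, so a loop needs no separate ADD to advance the pointer.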

Another item of note is that the instruction set has grown over time. Some early ARM processors (before ARM7TDMI), for example, have no instruction to store a two-byte quantity.

Pipelines and other implementation issues

The ARM7 and earlier implementations have a three-stage pipeline, the stages being fetch, decode and execute. Higher-performance designs have deeper pipelines: the ARM9 has five stages and the Cortex-A8 has thirteen. Additional implementation changes for higher performance include a faster adder and more extensive branch prediction logic. The difference between the ARM7DI and ARM7DMI cores, for example, was an improved multiplier (hence the added "M").
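One visible consequence of the original three-stage pipeline is that, in ARM state, reading the PC returns the address of the current instruction plus 8, and branch offsets are encoded relative to that value. The sketch below computes the target of a B or BL instruction under that rule; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the target of an ARM-state B or BL instruction.
   Encoding: cond[31:28], 101[27:25], L[24], signed 24-bit word offset[23:0].
   Returns false if the word is not a B/BL encoding. */
bool arm_branch_target(uint32_t insn, uint32_t insn_addr, uint32_t *target)
{
    if (((insn >> 25) & 7u) != 5u || (insn >> 28) == 0xFu)
        return false;                 /* not B/BL (cond 0xF is BLX immediate) */
    uint32_t offset = insn & 0x00FFFFFFu;
    if (offset & 0x00800000u)
        offset |= 0xFF000000u;        /* sign-extend the 24-bit field */
    /* PC reads as insn_addr + 8; the word offset becomes a byte offset. */
    *target = insn_addr + 8u + (offset << 2);
    return true;
}
```

For example, 0xEAFFFFFE (an unconditional branch with an offset of -2 words) at address 0x8000 branches to itself, a common way to write an endless loop.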

Coprocessors

The ARM family of processors does not support or have any instructions similar to Intel x86's CPUID. There are, however, mechanisms for addressing coprocessors in the ARM architecture.

The ARM architecture provides a non-intrusive way of extending the instruction set using "coprocessors" which can be addressed using MCR, MRC, MRRC, MCRR, and similar instructions. The coprocessor space is divided logically into 16 coprocessors with numbers from 0 to 15, coprocessor 15 (cp15) being reserved for some typical control functions like managing the caches and MMU operation (on processors that have one).
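As an example of cp15 in use, the identification data that x86 software would obtain with CPUID is read on ARMv7 from the Main ID Register with MRC p15, 0, <Rt>, c0, c0, 0, which normally requires a privileged mode. The sketch below only decodes a value that has already been read; the field layout follows the ARMv7 MIDR, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the ARMv7 Main ID Register (MIDR). */
struct midr_fields {
    unsigned implementer;   /* bits 31..24, e.g. 0x41 ('A') for ARM */
    unsigned variant;       /* bits 23..20, major revision ("r" in rNpM) */
    unsigned architecture;  /* bits 19..16 */
    unsigned part_number;   /* bits 15..4 */
    unsigned revision;      /* bits 3..0, minor revision ("p" in rNpM) */
};

struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f;
    f.implementer  = (midr >> 24) & 0xFFu;
    f.variant      = (midr >> 20) & 0xFu;
    f.architecture = (midr >> 16) & 0xFu;
    f.part_number  = (midr >> 4) & 0xFFFu;
    f.revision     = midr & 0xFu;
    return f;
}
```

For the illustrative value 0x410FC090, decode_midr reports implementer 0x41 (ARM), part number 0xC09, variant 0 and revision 0, which would be written as r0p0.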

In ARM-based machines, peripheral devices are usually attached to the processor by mapping their physical registers into ARM memory space or into the coprocessor space or connecting to another device (a bus) which in turn attaches to the processor. Coprocessor accesses have lower latency so some peripherals (for example an XScale interrupt controller) are designed to be accessible in both ways (through memory and through coprocessors).

In other cases, chip designers only integrate hardware using the coprocessor mechanism. For example, an image processing engine might be a small ARM7TDMI core combined with a coprocessor that has specialised operations to support a specific set of HDTV transcoding primitives.

Debugging

All modern ARM processors include hardware debugging facilities; without them, software debuggers could not perform basic operations like halting, stepping, and breakpointing of code starting from reset. These facilities are built using JTAG support, though some newer cores optionally support ARM's own two-wire "SWD" protocol. In ARM7TDMI cores, the "D" represented JTAG debug support, and the "I" represented presence of an "EmbeddedICE" debug module. For ARM7 and ARM9 core generations, EmbeddedICE over JTAG was a de facto debug standard, although it was not architecturally guaranteed.

The ARMv7 architecture defines basic debug facilities at an architectural level. These include breakpoints, watchpoints, and instruction execution in a "Debug Mode"; similar facilities were also available with EmbeddedICE. Both "halt mode" and "monitor" mode debugging are supported. The actual transport mechanism used to access the debug facilities is not architecturally specified, but implementations generally include JTAG support.

There is a separate ARM "CoreSight" debug architecture, which is not architecturally required by ARMv7 processors.

DSP enhancement instructions

To improve the ARM architecture for digital signal processing and multimedia applications, a few new instructions were added to the set.[25] These are signified by an "E" in the names of the ARMv5TE and ARMv5TEJ architectures. E variants also imply T, D, M and I.

The new instructions are common in digital signal processor architectures. They are variations on signed multiply–accumulate, saturated add and subtract, and count leading zeros.
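The saturating add and count-leading-zeros operations can be modelled in portable C. The sketch below mirrors the behaviour of QADD and CLZ on 32-bit values; it does not model the sticky Q flag that QADD sets on saturation, and the function names are hypothetical.

```c
#include <stdint.h>

/* Saturating signed add, modelled on QADD: the result is clamped to the
   32-bit signed range instead of wrapping around. */
int32_t qadd32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Count leading zeros, modelled on CLZ: returns 32 for an input of 0. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
        ++n;
    return n;
}
```

Saturation avoids the glitches that wrap-around would cause in audio and image code, and 31 - clz32(x) gives the index of the highest set bit of a non-zero x, which is useful for normalisation.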

Jazelle

Jazelle DBX (Direct Bytecode eXecution) is a technique that allows Java Bytecode to be executed directly in the ARM architecture as a third execution state (and instruction set) alongside the existing ARM and Thumb-mode. Support for this state is signified by the "J" in the ARMv5TEJ architecture, and in ARM9EJ-S and ARM7EJ-S core names. Support for this state is required starting in ARMv6 (except for the ARMv7-M profile), although newer cores only include a trivial implementation that provides no hardware acceleration.

Thumb

To improve compiled code density, processors since the ARM7TDMI (released in 1994[26]) have featured the Thumb instruction set, which has its own execution state. (The "T" in "TDMI" indicates the Thumb feature.) When in this state, the processor executes the Thumb instruction set, a compact 16-bit encoding for a subset of the ARM instruction set.[27] Most Thumb instructions map directly to normal ARM instructions. The space saving comes from making some of the instruction operands implicit and limiting the number of possibilities compared with the ARM instructions executed in ARM state.

In Thumb, the 16-bit opcodes have less functionality. For example, only branches can be conditional, and many opcodes are restricted to accessing only half of all of the CPU's general purpose registers. The shorter opcodes give improved code density overall, even though some operations require extra instructions. In situations where the memory port or bus width is constrained to less than 32 bits, the shorter Thumb opcodes allow increased performance compared with 32-bit ARM code, as less program code may need to be loaded into the processor over the constrained memory bandwidth.
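The register restriction follows directly from the encoding. The sketch below builds the 16-bit Thumb "move immediate" instruction (written MOVS in unified assembly syntax), whose destination field is only three bits wide; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode the 16-bit Thumb instruction MOVS Rd, #imm8.
   Layout: 001 00 Rd[10:8] imm8[7:0]. The three-bit Rd field can name only
   R0-R7, so other registers (and immediates above 255) are rejected. */
bool thumb_encode_movs_imm(unsigned rd, unsigned imm8, uint16_t *out)
{
    if (rd > 7 || imm8 > 0xFF)
        return false;
    *out = (uint16_t)(0x2000u | (rd << 8) | imm8);
    return true;
}
```

thumb_encode_movs_imm(0, 42, &w) produces 0x202A, while a request for R8 is rejected because R8 cannot be named in a three-bit field.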

Embedded hardware, such as the Game Boy Advance, typically has a small amount of RAM accessible with a full 32-bit datapath; the majority is accessed via a 16-bit or narrower secondary datapath. In this situation, it usually makes sense to compile Thumb code and hand-optimise a few of the most CPU-intensive sections using full 32-bit ARM instructions, placing these wider instructions into the 32-bit bus accessible memory.

The first processor with a Thumb instruction decoder was the ARM7TDMI. All ARM9 and later families, including XScale, have included a Thumb instruction decoder.

Thumb-2

Thumb-2 technology made its debut in the ARM1156 core, announced in 2003. Thumb-2 extends the limited 16-bit instruction set of Thumb with additional 32-bit instructions to give the instruction set more breadth, thus producing a variable-length instruction set. A stated aim for Thumb-2 is to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory. In ARMv7 this goal can be said to have been met.[citation needed]
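Because Thumb-2 mixes 16-bit and 32-bit instructions, a decoder must determine each instruction's length from its first halfword. The sketch below applies the ARMv7 rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction; the function names are hypothetical and the input is assumed to be well-formed code.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the Thumb-2 instruction starting with this halfword. */
unsigned thumb2_insn_length(uint16_t first_halfword)
{
    unsigned top5 = (first_halfword >> 11) & 0x1Fu;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Count the instructions in a buffer of halfwords. */
size_t thumb2_count(const uint16_t *code, size_t n_halfwords)
{
    size_t count = 0, i = 0;
    while (i < n_halfwords) {
        i += thumb2_insn_length(code[i]) / 2;  /* skip 1 or 2 halfwords */
        ++count;
    }
    return count;
}
```

For the halfwords 0x202A, 0xF000, 0xF800, 0xE7FE, thumb2_count returns 3: a 16-bit move, a 32-bit BL made of two halfwords, and a 16-bit branch.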

Thumb-2 extends both the ARM and Thumb instruction sets with yet more instructions, including bit-field manipulation, table branches and conditional execution. A new "Unified Assembly Language" (UAL) supports generation of either Thumb-2 or ARM instructions from the same source code; versions of Thumb seen on ARMv7 processors are essentially as capable as ARM code (including the ability to write interrupt handlers). This requires a bit of care, and use of a new "IT" (if-then) instruction, which permits up to four successive instructions to execute based on a tested condition or its inverse. When assembling for ARM state the IT instruction is ignored, but when assembling for Thumb-2 it generates an actual instruction. For example:

; if (r0 == r1)
CMP r0, r1
ITE EQ        ; ARM: no code ... Thumb: IT instruction
; then r0 = r2;
MOVEQ r0, r2  ; ARM: conditional; Thumb: condition via ITE 'T' (then)
; else r0 = r3;
MOVNE r0, r3  ; ARM: conditional; Thumb: condition via ITE 'E' (else)
; recall that the Thumb MOV instruction has no bits to encode "EQ" or "NE"

All ARMv7 chips support the Thumb-2 instruction set. Chips in the Cortex-A, Cortex-R and ARM11 series support both "ARM instruction set state" and "Thumb instruction set state", whereas Cortex-M chips execute only Thumb code.[28][29][30]

Thumb Execution Environment (ThumbEE)

ThumbEE, also termed Thumb-2EE and marketed as Jazelle RCT (Runtime Compilation Target), was announced in 2005, first appearing in the Cortex-A8 processor. ThumbEE is a fourth instruction set state, making small changes to the Thumb-2 extended Thumb instruction set. These changes make the instruction set particularly suited to code generated at runtime (e.g. by JIT compilation) in managed execution environments. ThumbEE is a target for languages such as Java, C#, Perl and Python, and allows JIT compilers to output smaller compiled code without impacting performance.

New features provided by ThumbEE include automatic null pointer checks on every load and store instruction, an instruction to perform an array bounds check, access to registers r8-r15 (where the Jazelle/DBX Java VM state is held), and special instructions that call a handler.[31] Handlers are small sections of frequently called code, commonly used to implement a feature of a high level language, such as allocating memory for a new object. These changes come from repurposing a handful of opcodes, and knowing the core is in the new ThumbEE mode.
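The checks that ThumbEE performs in hardware are the same checks a managed runtime would otherwise emit as extra instructions. The sketch below is a plain C model of a checked array load; the returned status codes stand in for ThumbEE's branches to handlers, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes standing in for handler branches. */
enum ee_status { EE_OK, EE_NULL_POINTER, EE_OUT_OF_BOUNDS };

/* A null base corresponds to ThumbEE's automatic null check on loads and
   stores; the index test corresponds to the array-check instruction (CHKA),
   which fails when the unsigned index is not below the array length. */
enum ee_status ee_checked_load(const int32_t *base, size_t length,
                               size_t index, int32_t *out)
{
    if (base == NULL)
        return EE_NULL_POINTER;
    if (index >= length)
        return EE_OUT_OF_BOUNDS;
    *out = base[index];
    return EE_OK;
}
```

In ThumbEE the common path costs no extra instructions for the null check, which is where the code-size saving for JIT-generated code comes from.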

Floating-point (VFP)

VFP (Vector Floating Point) technology is an FPU coprocessor extension to the ARM architecture. It provides low-cost single-precision and double-precision floating-point computation fully compliant with the ANSI/IEEE Std 754-1985 Standard for Binary Floating-Point Arithmetic. VFP provides floating-point computation suitable for a wide spectrum of applications such as PDAs, smartphones, voice compression and decompression, three-dimensional graphics and digital audio, printers, set-top boxes, and automotive applications. The VFP architecture was intended to support execution of short "vector mode" instructions but these operated on each vector element sequentially and thus did not offer the performance of true single instruction, multiple data (SIMD) vector parallelism. This vector mode was therefore removed shortly after its introduction,[32] to be replaced with the much more powerful NEON Advanced SIMD unit.

Some devices such as the ARM Cortex-A8 have a cut-down VFPLite module instead of a full VFP module, and require roughly ten times more clock cycles per floating-point operation.[33] Other floating-point and/or SIMD implementations found in ARM-based systems include the FPA coprocessor, the FPE software emulator and iWMMXt. They provide some of the same functionality as VFP but are not opcode-compatible with it.

Advanced SIMD (NEON)

The Advanced SIMD extension (also known as NEON or "MPE", Media Processing Engine) is a combined 64- and 128-bit single instruction, multiple data (SIMD) instruction set that provides standardised acceleration for media and signal processing applications. NEON is included in all Cortex-A8 devices but is optional in Cortex-A9 devices.[34] NEON can execute MP3 audio decoding on CPUs running at 10 MHz and can run the GSM adaptive multi-rate (AMR) speech codec at no more than 13 MHz. It features a comprehensive instruction set, separate register files and independent execution hardware.[35] NEON supports 8-, 16-, 32- and 64-bit integer and single-precision (32-bit) floating-point data, and SIMD operations for audio and video processing as well as graphics and gaming. A single NEON instruction can operate on up to 16 data elements at once (for example, sixteen 8-bit lanes of a 128-bit register). The NEON hardware shares the same floating-point registers as VFP. Devices such as the ARM Cortex-A8 and Cortex-A9 support 128-bit vectors but process them 64 bits at a time,[33] whereas newer Cortex-A15 devices can process 128 bits at once.
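A single NEON instruction acting on a 128-bit register of sixteen 8-bit lanes can be modelled in plain C as sixteen independent operations. On an ARM target these operations are available as the arm_neon.h intrinsics vaddq_u8 and vqaddq_u8; the portable model below uses hypothetical names so that it runs anywhere.

```c
#include <stdint.h>

/* One 128-bit register viewed as sixteen 8-bit lanes. */
typedef struct { uint8_t lane[16]; } u8x16;

/* Lane-wise wrapping add, like VADD.I8 on a Q register. */
u8x16 vadd_u8x16(u8x16 a, u8x16 b)
{
    u8x16 r;
    for (int i = 0; i < 16; ++i)
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    return r;
}

/* Lane-wise unsigned saturating add, like VQADD.U8: results clamp at 255. */
u8x16 vqadd_u8x16(u8x16 a, u8x16 b)
{
    u8x16 r;
    for (int i = 0; i < 16; ++i) {
        unsigned s = (unsigned)a.lane[i] + b.lane[i];
        r.lane[i] = (uint8_t)(s > 255u ? 255u : s);
    }
    return r;
}
```

Brightening a pixel by adding 100 to a value of 200 gives 44 with the wrapping add but 255 with the saturating add, which is why saturating forms are common in media code.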

Security Extensions (TrustZone)

The Security Extensions, marketed as TrustZone Technology, are found in ARMv6KZ and later application profile architectures. They provide a low-cost alternative to adding an additional dedicated security core to an SoC, by providing two virtual processors backed by hardware-based access control. This enables the application core to switch between two states, referred to as worlds (to reduce confusion with other names for capability domains), in order to prevent information from leaking from the more trusted world to the less trusted world. This world switch is generally orthogonal to all other capabilities of the processor, thus each world can operate independently of the other while using the same core. Memory and peripherals are then made aware of the operating world of the core and may use this to provide access control to secrets and code on the device.

Typical applications of TrustZone Technology are to run a rich operating system in the less trusted world, and smaller security-specialized code in the more trusted world (named TrustZone Software, a TrustZone optimised version of the Trusted Foundations Software developed by Trusted Logic), allowing much tighter digital rights management for controlling the use of media on ARM-based devices,[36] and preventing any unapproved use of the device. Open Virtualization is an open source implementation of the trusted world architecture for TrustZone.

In practice, since the specific implementation details of TrustZone are proprietary and have not been publicly disclosed for review, it is unclear what level of assurance is provided for a given threat model.

No-execute page protection

Since ARMv6, the ARM architecture has supported no-execute page protection, referred to as XN (eXecute Never).[37]
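A minimal sketch of how XN affects accesses, assuming 4 KB pages and a descriptor layout modelled on the ARMv7 short-descriptor small-page format (bit 1 marks a small page, bit 0 is XN); access permissions, the other descriptor fields and the multi-level table walk are all simplified away, and the function names are hypothetical. An instruction fetch from an XN page is refused (on real hardware this raises a permission fault, reported as a Prefetch Abort), while ordinary data reads from the same page are still allowed.

```python
PAGE_SHIFT = 12      # 4 KB pages
SMALL_PAGE = 1 << 1  # assumed: bit 1 set marks a valid small-page descriptor
XN = 1 << 0          # assumed: bit 0 is the eXecute Never flag

# Hypothetical flat page table: virtual page number -> descriptor
EXAMPLE_PAGE_TABLE = {
    0x8: SMALL_PAGE,       # code page: executable
    0x9: SMALL_PAGE | XN,  # data/stack page: never executable
}


def _descriptor(page_table, va):
    """Look up the descriptor for a virtual address; 0 means unmapped."""
    return page_table.get(va >> PAGE_SHIFT, 0)


def can_fetch_instruction(page_table, va):
    """Instruction fetches need a valid page whose XN bit is clear."""
    desc = _descriptor(page_table, va)
    return bool(desc & SMALL_PAGE) and not (desc & XN)


def can_read_data(page_table, va):
    """XN does not restrict data reads; only a valid mapping is required."""
    return bool(_descriptor(page_table, va) & SMALL_PAGE)
```

Marking data and stack pages XN means that code injected into them (for example through a buffer overflow) cannot be executed directly.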

ARMv8 and 64-bit

Announced in late 2011, ARMv8 represents the first fundamental change to the ARM architecture. It adds a 64-bit architecture, dubbed 'AArch64', and a new 'A64' instruction set. Within the context of ARMv8, the 32-bit architecture and instruction set are referred to as 'AArch32' and 'A32', respectively. The Thumb instruction sets are referred to as 'T32' and have no 64-bit counterpart. ARMv8 allows 32-bit applications to run under a 64-bit OS, and a 32-bit OS to run under the control of a 64-bit hypervisor.[1] Applied Micro, AMD, Broadcom, Calxeda, HiSilicon, Samsung, STMicroelectronics and other companies have announced implementation plans.[38][39][40][41] ARM announced its Cortex-A53 and Cortex-A57 cores on 30 October 2012.[19]

ARMv8 makes VFPv3/v4 and Advanced SIMD (NEON) standard in both AArch32 and AArch64. It also adds cryptography instructions supporting AES and SHA-1/SHA-256.
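For a sense of the work the SHA-256 instructions take off the general-purpose pipeline, here is one SHA-256 compression round written out in plain Python, following the FIPS 180-4 definitions. The ARMv8 instructions (such as SHA256H) carry out this kind of round computation in hardware on the SIMD registers; the Python function names are illustrative and are not ARM intrinsics.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(x, y, z):
    """'Choose': for each bit, take y where x is 1, otherwise z."""
    return (x & y) ^ (~x & z & MASK32)


def maj(x, y, z):
    """'Majority': each output bit is the majority of the three input bits."""
    return (x & y) ^ (x & z) ^ (y & z)


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def sha256_round(state, k, w):
    """One compression round on the eight working variables (a..h),
    given round constant k and message-schedule word w."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK32
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK32
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)
```

A full SHA-256 block runs 64 such rounds, each needing several rotates, XORs and 32-bit additions in software, which is why dedicated instructions give a large speed-up.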

AArch64 features:

  • New instruction set, A64
    • 31 general-purpose 64-bit registers
    • Instructions are still 32 bits long and mostly the same as A32
    • Most instructions can take 32-bit or 64-bit arguments
    • Addresses assumed to be 64-bit
  • Advanced SIMD (NEON) enhanced
    • Has 32 × 128-bit registers (up from 16), also accessible via VFPv4
    • Supports double-precision floating point
    • Fully IEEE 754 compliant
    • AES encrypt/decrypt and SHA-1/SHA-2 hashing instructions also use these registers
  • A new exception system
    • Fewer banked registers and modes
  • Memory translation from 48-bit virtual addresses based on the existing LPAE, which was designed to be easily extended to 64-bit
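The 48-bit translation scheme can be illustrated by splitting a virtual address into per-level table indices. This sketch assumes a 4 KB translation granule, where each table holds 512 eight-byte entries (9 index bits) and four levels cover bits 47 to 12, leaving a 12-bit page offset. Other granule sizes split the address differently, and the sketch handles only the lower half of the address space; the function name is hypothetical.

```python
GRANULE_SHIFT = 12  # assumed 4 KB granule: 12-bit page offset
INDEX_BITS = 9      # 4 KB table / 8-byte descriptors = 512 entries
LEVELS = 4          # levels 0 to 3
VA_BITS = 48


def split_virtual_address(va):
    """Return ([level0, level1, level2, level3] indices, page offset)."""
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("sketch handles only addresses in [0, 2**48)")
    offset = va & ((1 << GRANULE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS):
        # Level 0 uses bits 47..39, level 3 uses bits 20..12.
        shift = GRANULE_SHIFT + INDEX_BITS * (LEVELS - 1 - level)
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


if __name__ == "__main__":
    va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
    print(split_virtual_address(va))
```

Each index selects one entry in the table at that level, and the final entry supplies the physical page to which the offset is added.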

OS support:

  • Linux: patches adding ARMv8 support were posted for review by Catalin Marinas of ARM Ltd and are due to be included in Linux kernel version 3.7.[42]

ARM licensees

ARM Ltd does not manufacture or sell CPU devices based on its own designs, but rather licenses the processor architecture to interested parties. ARM offers a variety of licensing terms, varying in cost and deliverables. To all licensees, ARM provides an integrable hardware description of the ARM core, a complete software development toolset (compiler, debugger, SDK) and the right to sell manufactured silicon containing the ARM CPU.

Fabless licensees, who wish to integrate an ARM core into their own chip design, are usually only interested in acquiring a ready-to-manufacture verified IP core. For these customers, ARM delivers a gate netlist description of the chosen ARM core, along with an abstracted simulation model and test programs to aid design integration and verification. More ambitious customers, including integrated device manufacturers (IDM) and foundry operators, choose to acquire the processor IP in synthesizable RTL (Verilog) form. With the synthesizable RTL, the customer has the ability to perform architectural level optimisations and extensions. This allows the designer to achieve exotic design goals not otherwise possible with an unmodified netlist (high clock speed, very low power consumption, instruction set extensions, etc.). While ARM does not grant the licensee the right to resell the ARM architecture itself, licensees may freely sell manufactured product (chip devices, evaluation boards, complete systems, etc.). Merchant foundries can be a special case; not only are they allowed to sell finished silicon containing ARM cores, they generally hold the right to re-manufacture ARM cores for other customers.

Like most IP vendors, ARM prices its IP based on perceived value. In architectural terms, lower-performing ARM cores command lower license costs than higher-performing cores. In implementation terms, a synthesizable core costs more than a hard macro (blackbox) core. Complicating matters, a merchant foundry that holds an ARM license (such as Samsung and Fujitsu) can offer reduced licensing costs to its fab customers. In exchange for acquiring the ARM core through the foundry's in-house design services, the customer can reduce or eliminate payment of ARM's upfront license fee. Compared to dedicated semiconductor foundries (such as TSMC and UMC) without in-house design services, Fujitsu/Samsung charge 2 to 3 times more per manufactured wafer. For low- to mid-volume applications, a design service foundry offers lower overall pricing (through subsidisation of the license fee). For high-volume, mass-produced parts, the long-term cost reduction achievable through lower wafer pricing reduces the impact of ARM's NRE (non-recurring engineering) costs, making the dedicated foundry a better choice.
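The volume trade-off can be made concrete with a simple cost model. It assumes that the design service foundry waives ARM's upfront license fee entirely and charges a fixed multiple of the dedicated foundry's wafer price; the fee, wafer price and markup in the example are hypothetical figures, not ARM's or any foundry's actual prices. Below the break-even wafer count the design service route is cheaper; above it the dedicated foundry wins.

```python
def dedicated_foundry_cost(wafers, license_fee, wafer_price):
    """Pay ARM's upfront fee once, then the lower per-wafer price."""
    return license_fee + wafers * wafer_price


def design_service_cost(wafers, wafer_price, markup):
    """No upfront fee (assumed fully waived), but each wafer costs `markup` times more."""
    return wafers * wafer_price * markup


def break_even_wafers(license_fee, wafer_price, markup):
    """Wafer count n at which both routes cost the same:
    license_fee + n * p == n * p * markup  =>  n = license_fee / ((markup - 1) * p)."""
    if markup <= 1:
        raise ValueError("markup must exceed 1 for a break-even point to exist")
    return license_fee / ((markup - 1) * wafer_price)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    fee, price, markup = 1_000_000, 2_000, 2.5
    print(f"break-even at about {break_even_wafers(fee, price, markup):.0f} wafers")
    for wafers in (100, 1_000):
        print(wafers,
              dedicated_foundry_cost(wafers, fee, price),
              design_service_cost(wafers, price, markup))
```

With these made-up inputs the break-even point is about 333 wafers: at 100 wafers the design service route costs less, at 1,000 wafers the dedicated foundry does.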

Many semiconductor and IC design firms hold ARM licenses. Analog Devices, AppliedMicro, Atmel, Broadcom, Cirrus Logic, Energy Micro, Faraday Technology, Freescale, Fujitsu, Intel (through its settlement with Digital Equipment Corporation), IBM, Infineon Technologies, Marvell Technology Group, Nintendo, Nvidia, NXP Semiconductors, OKI, Qualcomm, Samsung, Sharp, STMicroelectronics and Texas Instruments are among the many companies that have licensed ARM technology in one form or another.

Approximate licensing costs

ARM's 2006 annual report and accounts state that royalties totalling £88.7 million ($164.1 million) were the result of licensees shipping 2.45 billion units.[43] This is equivalent to £0.036 ($0.067) per unit shipped. However, this is averaged across all cores, including expensive new cores and inexpensive older cores.

In the same year ARM's licensing revenues for processor cores were £65.2 million (US$119.5 million),[44] in a year when 65 processor licenses were signed,[45] an average of £1 million ($1.84 million) per license. Again, this is averaged across both new and old cores.

Given that ARM's 2006 income from processor cores was approximately 60% from royalties and 40% from licenses, ARM makes the equivalent of £0.06 ($0.11) per unit shipped including both royalties and licenses. However, as one-off licenses are typically bought for new technologies, unit sales (and hence royalties) are dominated by more established products. Hence, the figures above do not reflect the true costs of any single ARM product.
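The per-unit figures above can be reproduced from the quoted 2006 numbers. The sketch follows the text's own method of dividing the per-unit royalty by a 60% royalty share; from the quoted totals the actual share works out to about 57.6%, which is why the text says "approximately". The constant and function names are illustrative.

```python
# Figures quoted from ARM's 2006 annual report (see the text above).
ROYALTIES_GBP_M = 88.7   # royalties, £ million
ROYALTIES_USD_M = 164.1  # royalties, $ million
UNITS_M = 2450           # 2.45 billion units, expressed in millions
LICENSES_GBP_M = 65.2    # processor licensing revenue, £ million
LICENSES_USD_M = 119.5   # processor licensing revenue, $ million
LICENSES_SIGNED = 65
ROYALTY_SHARE = 0.60     # "approximately 60%" of processor-core income


def per_unit_figures():
    royalty_gbp = ROYALTIES_GBP_M / UNITS_M  # £M / M units = £ per unit
    royalty_usd = ROYALTIES_USD_M / UNITS_M
    return {
        "royalty_per_unit_gbp": royalty_gbp,
        "royalty_per_unit_usd": royalty_usd,
        "average_license_gbp_m": LICENSES_GBP_M / LICENSES_SIGNED,
        "average_license_usd_m": LICENSES_USD_M / LICENSES_SIGNED,
        # Scale royalties up to total core income using the 60% share.
        "total_per_unit_gbp": royalty_gbp / ROYALTY_SHARE,
        "total_per_unit_usd": royalty_usd / ROYALTY_SHARE,
        # Royalty share actually implied by the quoted totals.
        "implied_royalty_share": ROYALTIES_GBP_M / (ROYALTIES_GBP_M + LICENSES_GBP_M),
    }


if __name__ == "__main__":
    for name, value in per_unit_figures().items():
        print(f"{name}: {value:.3f}")
```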

Operating systems

Android, a popular[citation needed] operating system running on the ARM architecture

Acorn systems

The very first ARM-based Acorn Archimedes personal computers ran an interim operating system called Arthur, which evolved into RISC OS, used on later ARM-based systems from Acorn and other vendors.

Embedded operating systems

The ARM architecture is supported by a large number of embedded and real-time operating systems, including Windows CE, Windows RT, Symbian, ChibiOS/RT, FreeRTOS, eCos, Integrity, Nucleus PLUS, MicroC/OS-II, QNX, RTEMS, CoOS, BRTOS, RTXC Quadros, ThreadX, Unison Operating System, uTasker, VxWorks, MQX and OSE.[46]

Unix

The ARM architecture is supported by Unix and Unix-like operating systems such as:

Windows

The ARMv5, ARMv6 and ARMv7 architectures are supported by Windows CE 5 and by the operating systems built on it, Windows Embedded Compact and Windows Mobile. Microsoft's smaller .NET Micro Framework runs exclusively on ARM.

Windows 8 and Windows Phone will support ARMv7. Microsoft demonstrated a preliminary version of Windows (version 6.2.7867) running on an ARM-based computer at the 2011 Consumer Electronics Show.[71] In December 2011, Microsoft released a document on hardware certification of OEM products, the Windows Hardware Certification Requirements, which confirms that it intends to prevent the installation of alternative operating systems on ARM-based devices running Windows 8. The document states that x86 and x86-64 devices will be required to have Secure UEFI enabled, and allows for a custom Secure Boot mode in which the user can add signatures. However, on ARM devices, switching to custom Secure Boot mode or disabling Secure Boot will not be compatible with running Windows.

In June 2012, Nvidia confirmed that its ARM-based Tegra SoC would power the non-Pro version of Microsoft's Surface tablet computer.[72] This model runs Windows RT, a version of Windows 8 developed specifically for ARM devices.

The ReactOS project, a clean-room, reverse-engineered, open-source implementation of Windows NT 5.x, also has an ARM port in development.

See also

References

  1. ^ a b c d Grisenthwaite, Richard (2011). "ARMv8 Technology Preview" (PDF). Retrieved 31 October 2011.
  2. ^ "ARM Cores Climb Into 3G Territory" by Mark Hachman, 2002.
  3. ^ "The Two Percent Solution" by Jim Turley 2002.
  4. ^ "ARMed for the living room".
  5. ^ doi:10.1145/1941487.1941501.
  6. ^ "AMD Strengthens Security Solutions through Technology Partnership with ARM ".
  7. ^ Manners, David (29 April 1998). "ARM's way". Electronics Weekly. Retrieved 26 October 2012.
  8. ^ Furber, Stephen B. (2000). ARM system-on-chip architecture. Boston: Addison-Wesley. ISBN 0-201-67519-6.
  9. ^ Goodwins, Rupert (4 December 2010). "Intel's victims: Eight would-be giant killers". ZDNet. Retrieved 7 March 2012.
  10. ^ "Some facts about the Acorn RISC Machine" Roger Wilson posting to comp.arch, 2 November 1988. Retrieved 25 May 2007.
  11. ^ Patterson, Jason. "The Acorn Archimedes". The History Of Computers During My Lifetime – The 1980s. Retrieved 12 March 2008.
  12. ^ "ARM Corporate Backgrounder", ARM Technology.
  13. ^ "ARMed for the living room" by Tom Krazit 2006.
  14. ^ "ARM Achieves 10 Billion Processor Milestone", ARM Technology, 22 January 2008.
  15. ^ "Company Profile – ARM", ARM Company Profile, 11 January 2011.
  16. ^ "ARM netbook ships with detachable tablet" by Eric Brown 2009
  17. ^ Dylan McGrath, EE Times. "IHS: ARM ICs to be in 23% of laptops in 2015." 18 July 2011. Retrieved 20 July 2011.
  18. ^ Peter Clarke, EE Times. "HiSilicon extends ARM licenses for 3G/4G." 2 August 2011. Retrieved 2 August 2011.
  19. ^ a b "ARM Launches Cortex-A50 Series, the World's Most Energy-Efficient 64-bit Processors" (Press release). ARM Holdings. Retrieved 31 October 2012.
  20. ^ "Line Card" (PDF). 2003. Retrieved 1 October 2012.
  21. ^ Parrish, Kevin (14 July 2011). "One Million ARM Cores Linked to Simulate Brain". EE Times. Retrieved 2 August 2011.
  22. ^ ARMv7-M Architecture Reference Manual; ARM Holdings.
  23. ^ a b ARMv7-A and ARMv7-R Architecture Reference Manual; ARM Holdings.
  24. ^ "9.1.2. Instruction cycle counts".
  25. ^ "ARM DSP Instruction Set Extensions". Arm.com. Archived from the original on 14 April 2009. Retrieved 18 April 2009.
  26. ^ ARM7TDMI Technical Reference Manual page ii
  27. ^ Jaggar, Dave (1996). ARM Architecture Reference Manual. Prentice Hall. pp. 6–1. ISBN 978-0-13-736299-8.
  28. ^ "ARM Processor Instruction Set Architecture". Arm.com. Archived from the original on 15 April 2009. Retrieved 18 April 2009.
  29. ^ "ARM aims son of Thumb at uCs, ASSPs, SoCs". Linuxdevices.com. Retrieved 18 April 2009.
  30. ^ "ARM Information Center". Infocenter.arm.com. Retrieved 18 April 2009.
  31. ^ "Arm strengthens Java compilers: New 16-Bit Thumb-2EE Instructions Conserve System Memory" by Tom R. Halfhill 2005.
  32. ^ "VFP directives and vector notation". Arm.com. Retrieved 21 November 2011.
  33. ^ a b "Differences between ARM Cortex-A8 and Cortex-A9". Shervin Emami. Retrieved 21 November 2011.
  34. ^ "Cortex-A9 Processor". Arm.com. Retrieved 21 November 2011.
  35. ^ "About the Cortex-A9 NEON MPE". Arm.com. Retrieved 21 November 2011.
  36. ^ "ARM Announces Availability of Mobile Consumer DRM Software Solutions Based on ARM T". News.thomasnet.com. Retrieved 18 April 2009.
  37. ^ "APX and XN (execute never) bits have been added in VMSAv6 [Virtual Memory System Architecture]", ARM Architecture Reference Manual. Retrieved 2009/12/01.
  38. ^ Anand Lal Shimpi (14 November 2011). "Applied Micro's X-Gene: The First ARMv8 SoC". AnandTech. Retrieved 31 October 2012.
  39. ^ Lawrence Latif (30 October 2012). "AMD says ARM based Opteron chips will appear in 2014". The Inquirer. Retrieved 31 October 2012.
  40. ^ Anand Lal Shimpi. "AMD Will Build 64-bit ARM based Opteron CPUs for Servers, Production in 2014". AnandTech. Retrieved 31 October 2012.
  41. ^ ARM Keynote: ARM Cortex-A53 and ARM Cortex-A57 64bit ARMv8 processors launched on armdevices.net
  42. ^ Linus Torvalds (1 October 2012). "Re: [GIT PULL] arm64: Linux kernel port". Linux kernel mailing list. Retrieved 2 October 2012.
  43. ^ "Business review/Financial review/IFRS", p. 10, ARM annual report and accounts, 2006. Retrieved 7 May 2007.
  44. ^ Based on total £110.6 million ($202.5 million) divided by "License revenues by product"; "Business review/Financial review/IFRS" and "Key performance indicators" respectively, p. 10 / p. 3 ARM annual report and accounts, 2006. Retrieved 7 May 2007.
  45. ^ "Key performance indicators", p. 3, ARM annual report and accounts, 2006. Retrieved 7 May 2007.
  46. ^ "Software Enablement". arm.com. ARM Ltd. Archived from the original on 16 November 2010. Retrieved 18 November 2010.
  47. ^ "Android Source Code". Archived from the original on 7 July 2011. Retrieved 1 July 2011.
  48. ^ "Arch Linux Arm". Retrieved 12 November 2011.
  49. ^ http://www.angstrom-distribution.org/
  50. ^ Womack, Brian (8 July 2009). "Google to Challenge Microsoft With Operating System". Bloomberg. Retrieved 8 July 2009.
  51. ^ "ARM Port". Retrieved 24 July 2012.
  52. ^ "ELinOS supported boards". Archived from the original on 18 April 2010. Retrieved 22 April 2010.
  53. ^ "Architectures/ARM". Retrieved 1 June 2009.
  54. ^ "Gentoo Linux ARM Development". Retrieved 1 June 2009.
  55. ^ "New release for ARM cpus". 25 January 2007. Retrieved 17 September 2009.
  56. ^ "Mer ARM builds". Retrieved 27 November 2011.
  57. ^ "Platform Support for MontaVista Linux". Archived from the original on 18 January 2010. Retrieved 16 February 2010.
  58. ^ "PuppyLinux". Retrieved 2 May 2012.
  59. ^ "RedSleeve". Retrieved 28 March 2012.
  60. ^ "Slackware Linux for ARM". Archived from the original on 9 June 2009. Retrieved 1 June 2009.
  61. ^ "TimeSys". Retrieved 30 September 2011.
  62. ^ "Ubuntu on Arm". Canonical Ltd. 2009. Archived from the original on 20 May 2009. Retrieved 15 June 2009.
  63. ^ "ARM". Retrieved 1 June 2009.
  64. ^ "Building for the Devices". Retrieved September 2012.
  65. ^ "Porting webOS to a touchscreen ARM netbook (or convertible tablet)". Retrieved September 2012.
  66. ^ "Wind River – Board Support Packages". Archived from the original on 5 February 2010. Retrieved 16 February 2010.
  67. ^ "openSUSE 12.2 for ARM Final!". Retrieved 6 November 2012.
  68. ^ "FreeBSD/ARM Project". Archived from the original on 27 April 2009. Retrieved 1 June 2009.
  69. ^ "Hardware supported by NetBSD". Archived from the original on 10 June 2009. Retrieved 1 June 2009.
  70. ^ "OpenBSD/armish". Archived from the original on 9 June 2009. Retrieved 1 June 2009.
  71. ^ Microsoft demonstrates early build of Windows 8
  72. ^ "Nvidia confirms Tegra under the hood of Microsoft Surface Win RT edition". moneycontrol.com. Retrieved 19 June 2012.

Further reading

  • The Definitive Guide to the ARM Cortex-M3; 2nd Edition; Joseph Yiu; Newnes; 479 pages; 2009; ISBN 978-1-85617-963-8. (Online Sample)
  • The Definitive Guide to the ARM Cortex-M0; 2nd Edition; Joseph Yiu; Newnes; 552 pages; 2011; ISBN 978-0-12-385477-3. (Online Sample)