Cell (processor): Difference between revisions

From Wikipedia, the free encyclopedia
Architecture: I dont see why this is false color at all
Krappie (talk | contribs)
m Open source software development: wikify the gnu toolchain, also make it one word

Revision as of 08:51, 27 September 2005

File:Cellchip.jpg
(IBM Microelectronics) Cell microprocessor

Cell is a microprocessor jointly developed by IBM, Toshiba and Sony. The Cell architecture is intended to be scalable through the use of vector processing. The first major commercial application of Cell is in Sony's upcoming PlayStation 3 game console.

History

In 2000, Sony Computer Entertainment Inc., Toshiba Corp., and IBM formed an alliance ("STI") to design and build the processor. The STI Design Center in Austin, Texas opened in March 2001. [1] The Cell was designed over a period of four years, using enhanced versions of the design tools for the POWER4 processor. Over 400 engineers from the three companies worked together in Austin, with critical support from eleven of IBM's design centers. [2]

On May 17, 2005, Sony Computer Entertainment confirmed the specifications of the Cell processor that will ship in the forthcoming PlayStation 3 console. This Cell will have one POWER processing element (PPE) on the core, with seven SPEs and one SPE reserved for redundancy (to help increase manufacturing yield). It will be clocked at 3.2 GHz, although in lab conditions the processor has reportedly been clocked successfully at up to 5.2 GHz. The chips are being fabricated using a 90 nanometre SOI process at IBM's facility in East Fishkill, New York. Full production may switch at some later date to a 65 nm or 45 nm process jointly developed by IBM and Toshiba at their Nagasaki fabrication plant.

Architecture

File:Cell.JPG
Cell's die

While the Cell chip can have a number of different configurations, the basic configuration is composed of one "Power Processor Element" ("PPE") (sometimes called "Processing Element", or "PE"), and multiple "Synergistic Processing Elements" ("SPE") [3]. The PPE and SPEs are linked together by an internal high-speed bus dubbed the "Element Interconnect Bus" ("EIB"). Due to the nature of its applications, Cell is optimized towards single precision floating point computation, though its PPE still allows it to perform more general purpose computing tasks.

Power Processor Element

The PPE is based on the POWER Architecture, which is the basis of IBM's line of POWER and PowerPC offerings. The PPE is not intended as the primary processor for the system, but acts as a controller for the eight SPEs, which handle most of the computational workload. However, the PPE is used to run conventional operating systems due to its similarity to other 64-bit PowerPC processors, and because the SPEs are designed for vectorized floating point code execution. The PPE contains a 32 KiB Level 1 instruction cache, a 32 KiB Level 1 data cache, and a 512 KiB Level 2 cache. Additionally, IBM has included a VMX (AltiVec) unit in the Cell PPE. [4]

Synergistic Processing Elements (SPE)

Each SPE is composed of a "Synergistic Processing Unit" ("SPU") and an SMF unit (DMA, MMU, and bus interface). [5] An SPE is a RISC processor with a 128-bit SIMD organization [6] for single and double precision instructions. Each SPE contains a 256 KiB local memory area for instructions and data (called "local store"), which is visible to the PPE and can be addressed directly by software. The local store does not operate like a conventional CPU cache, since it is neither transparent to software nor does it contain hardware structures that predict what data to load. Each SPE contains a register file of 128 registers, each 128 bits wide, and measures 14.5 mm² on a 90 nm process.

In one typical usage scenario, the system will load the SPEs with small programs (similar to threads), chaining the SPEs together to handle each step in a complex operation. For instance, a set-top box might load programs for reading a DVD, video and audio decoding, and display, and the data would be passed off from SPE to SPE until finally ending up on the TV. Another possibility is to partition the input data set and have several SPEs performing the same kind of operation in parallel. At 4 GHz, each SPE gives a theoretical 32 GFLOPS of performance. Performance of the PPE's VMX unit is unclear, but should be around 32 GFLOPS in addition to the SPEs.
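The 32 GFLOPS figure can be reproduced with a short calculation. This is a minimal sketch, not an official derivation: the 128-bit SIMD width comes from the text above, while the factor of two floating point operations per lane per cycle (one fused multiply-add) is an assumption not stated in this article.

```python
# Back-of-the-envelope peak single-precision throughput.
SIMD_WIDTH_BITS = 128          # from the text: 128-bit SIMD organization
FLOAT_BITS = 32                # one single-precision value
FLOPS_PER_LANE_PER_CYCLE = 2   # assumption: one fused multiply-add per lane per cycle


def peak_gflops(clock_ghz: float, units: int = 1) -> float:
    """Theoretical peak GFLOPS for `units` SIMD units at `clock_ghz`."""
    lanes = SIMD_WIDTH_BITS // FLOAT_BITS  # 4 single-precision lanes
    return lanes * FLOPS_PER_LANE_PER_CYCLE * clock_ghz * units


per_spe = peak_gflops(4.0)              # one SPE at 4 GHz
eight_spes = peak_gflops(4.0, units=8)  # all eight SPEs at 4 GHz
print(per_spe, eight_spes)
```

Under these assumptions one SPE reaches 32 GFLOPS and eight SPEs reach 256 GFLOPS at 4 GHz, which matches the figures quoted in this article.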

Compared with a modern personal computer, the Cell processor's overall floating point performance seemingly dwarfs that of the SIMD units in desktop CPUs like the Pentium 4 and the Athlon 64. However, floating point capability alone is a single-dimensional, application-specific metric. Unlike the Cell processor, these desktop CPUs are better suited to the general purpose software one might run on a personal computer.

Element Interconnect Bus (EIB)

The EIB is a circular bus made of two channels in opposite directions. It enables communication between the PPE and SPEs. It is also connected to the L2 cache, the memory controller, and the FlexIO for external communications. [7]

Memory controller and I/O

Cell contains a dual-channel, next-generation Rambus XDR memory controller that is incorporated on-die. Using 32-bit wide data buses and two channels, the overall peak memory bandwidth is 25.6 GB/s (2 channels × 2 devices per channel × 2 bytes per device × 3.2 GHz). The system interface used in Cell, also a Rambus design, is known as FlexIO. The FlexIO interface is organized into 12 "lanes", each lane being a unidirectional 8-bit wide point-to-point path. Five of these are inbound lanes to Cell, while the remaining seven are outbound. This provides a theoretical peak bandwidth of 76.8 GB/s (44.8 GB/s outbound, 32 GB/s inbound).
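The bandwidth figures above follow from simple multiplication, sketched below. The function and parameter names are illustrative only. The per-lane rate of 6.4 GB/s is not stated in the text; it is inferred here by dividing the quoted outbound total by the seven outbound lanes (44.8 / 7).

```python
def xdr_bandwidth_gb_s(channels=2, devices_per_channel=2,
                       bytes_per_device=2, rate_ghz=3.2):
    """Peak memory bandwidth in GB/s, using the factors quoted in the text."""
    return channels * devices_per_channel * bytes_per_device * rate_ghz


def flexio_bandwidth_gb_s(lanes, per_lane_gb_s=6.4):
    """Peak FlexIO bandwidth; 6.4 GB/s per lane is inferred, not sourced."""
    return lanes * per_lane_gb_s


memory = xdr_bandwidth_gb_s()
outbound = flexio_bandwidth_gb_s(7)   # seven outbound lanes
inbound = flexio_bandwidth_gb_s(5)    # five inbound lanes

# Rounded for display, since binary floating point is inexact.
print(round(memory, 1), round(outbound, 1), round(inbound, 1),
      round(outbound + inbound, 1))
```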

Broadband Engine

Much less information is available about the 'broadband engine', most coming from patent applications. It is believed that Cell allows for multiple processing cores to be put onto one die, and the patent shows four cores on one die. The companies designing Cell have claimed that they intend to scale the processor for various uses, both low-end and high-end, by varying the number of cores on the chip, the number of units in a single core, and by linking multiple chips to each other via network or memory bus.

Figures on current implementation

The following figures reflect estimates of performance of the only current implementation of Cell, designed for the PlayStation 3 game console:

  • 256 GFLOPS in single-precision operations at 4 GHz. [8]
  • 25-30 GFLOPS in double-precision operations at 4 GHz. [9] (probably 26 [10])
  • 234 million transistors [11]
  • 1 PPE with 32 KiB L1 instruction cache, 32 KiB L1 data cache, and 512 KiB L2 cache [12]
  • 8 SPEs with 256 KiB local memory each [13]
  • 0.9 - 1.3 V nominal supply voltage [14]
  • 10 digital thermal sensors [15]
  • 5 power management states (Dynamic Power Management) [16]
  • 221 mm² die (90 nm process) [17]
  • Power consumption is not yet known. MacWorld suggests 30 W, [18] while other sources speculate 50-80 W or more. [19] Electronic Design notes that each SPE consumes about 2 W at 3 GHz and 4 W at 4 GHz. Combining the eight SPEs, the PPE, and other logic, the Cell processor will dissipate close to 30 W at 3 GHz, and perhaps double that at 4 GHz. [20]

Architecture compared

In some ways the Cell system resembles early Seymour Cray designs in reverse. The famed CDC 6600 used a single very fast processor to handle the mathematical calculations, while a series of ten slower systems were given smaller programs to keep the main memory fed with data. In the Cell the problem has been reversed: reading the data is no longer the difficult part. Because of the complex encodings used in industry, the problem today is efficiently decoding that data into an ever-less-compressed version as quickly as possible.

In other ways the Cell resembles a modern desktop computer on a single chip.

Modern graphics cards have multiple elements very similar to the SPEs, known as vertex shader units, with an attached high-speed memory. Programs, known as shaders, are loaded onto the units to process the basic geometry fed from the computer's CPU, apply styles, and display it.

The main differences are that the Cell's SPEs are much more general purpose than shader units, and the ability to chain the SPEs under program control offers considerably more flexibility, allowing the Cell to handle graphics, sound, or anything else.

Possible applications

Blade server

IBM has already presented a blade server prototype based on two Cell processors, running the 2.6.11 Linux kernel. [21] The processors ran at 2.4 - 2.8 GHz. IBM expects to run them at 3.0 GHz soon, providing 200 GFLOPS single-precision floating point performance per CPU (or 400 GFLOPS per board). IBM also expects to arrange seven blades in a single rackmount chassis (similar to their BladeCenter product line) for a total performance of 2.8 TFLOPS (or 284 GFLOPS in double precision) per chassis. However, the performance numbers released by IBM are still theoretical, and real-world performance may fall significantly short of them.
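The per-board and per-chassis totals follow directly from the per-CPU figure, as the sketch below shows. All inputs are the numbers quoted above; the per-CPU double-precision value is merely what the 284 GFLOPS chassis total implies, not a figure published in this article.

```python
GFLOPS_PER_CPU = 200        # single precision at 3.0 GHz, as quoted
CPUS_PER_BLADE = 2
BLADES_PER_CHASSIS = 7
CHASSIS_DP_GFLOPS = 284     # double precision per chassis, as quoted

per_board = GFLOPS_PER_CPU * CPUS_PER_BLADE
per_chassis_tflops = per_board * BLADES_PER_CHASSIS / 1000

# Implied double-precision rate per CPU (derived, not sourced).
cpus_per_chassis = CPUS_PER_BLADE * BLADES_PER_CHASSIS
implied_dp_per_cpu = CHASSIS_DP_GFLOPS / cpus_per_chassis

print(per_board, per_chassis_tflops, round(implied_dp_per_cpu, 1))
```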

Console videogames

Sony's PlayStation 3 video game console will contain the first production application of the Cell processor, clocked at 3.2 GHz and containing seven usable SPEs. Eight SPEs will be manufactured on each chip, but one will be disabled at the factory, allowing Sony to increase the yield of processor manufacture.

Home cinema

Reportedly, Toshiba is considering producing HDTVs using Cell. They have already presented a system to decode 48 MPEG-2 streams simultaneously on a 1920×1080 screen. [22]

Software engineering

Due to the flexible nature of the Cell, there are several possibilities for the utilization of its resources: [23]

Job queue

The PPE maintains a job queue, schedules jobs in SPEs, and monitors progress. Each SPE runs a "mini kernel" whose role is to fetch a job, execute it, and synchronize with the PPE.
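The job queue model can be illustrated with ordinary threads standing in for the processing elements. This is only a conceptual sketch in Python, not Cell code: the names `spe_mini_kernel` and `ppe_run` are hypothetical, and a real implementation would use the SPE libraries and DMA transfers rather than in-process queues.

```python
import queue
import threading


def spe_mini_kernel(spe_id, jobs, results):
    """Hypothetical SPE 'mini kernel': fetch a job, execute it, report back."""
    while True:
        job = jobs.get()
        if job is None:              # sentinel from the PPE: no more work
            break
        job_id, func, arg = job
        results.put((job_id, spe_id, func(arg)))


def ppe_run(job_list, num_spes=8):
    """Hypothetical PPE role: maintain the queue, schedule, monitor progress."""
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=spe_mini_kernel, args=(i, jobs, results))
               for i in range(num_spes)]
    for w in workers:
        w.start()
    for job_id, (func, arg) in enumerate(job_list):
        jobs.put((job_id, func, arg))
    done = {}
    for _ in job_list:               # monitor: wait for one result per job
        job_id, _spe_id, value = results.get()
        done[job_id] = value
    for _ in workers:                # tell every mini kernel to stop
        jobs.put(None)
    for w in workers:
        w.join()
    return [done[i] for i in range(len(job_list))]


def square(x):
    return x * x


outputs = ppe_run([(square, n) for n in range(5)])
print(outputs)  # [0, 1, 4, 9, 16]
```

Results are keyed by job number because the jobs may finish in any order.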

Self-multitasking of SPEs

The kernel and scheduling are distributed across the SPEs. Tasks are synchronized using mutexes or semaphores, as in a conventional operating system. Ready-to-run tasks wait in a queue for an SPE to execute them. The SPEs use shared memory for all tasks in this configuration.
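A rough sketch of this model follows, again with Python threads standing in for SPEs. There is no central scheduler here: every worker pulls from the same ready queue, tasks may enqueue further tasks, and a mutex guards the shared memory. All names (`ready`, `shared_memory`, `sum_range`, `spe_loop`) are hypothetical and chosen for illustration.

```python
import queue
import threading

ready = queue.Queue()             # ready-to-run tasks, shared by all SPEs
shared_memory = {"total": 0}      # stands in for memory shared by all tasks
memory_lock = threading.Lock()    # mutex guarding the shared memory


def sum_range(lo, hi):
    """Task: split large ranges into two new tasks, sum small ones directly."""
    if hi - lo > 100:
        mid = (lo + hi) // 2
        ready.put((sum_range, (lo, mid)))
        ready.put((sum_range, (mid, hi)))
    else:
        partial = sum(range(lo, hi))
        with memory_lock:
            shared_memory["total"] += partial


def spe_loop():
    """Each SPE schedules itself by taking the next ready task."""
    while True:
        task = ready.get()
        if task is None:          # shutdown marker
            ready.task_done()
            break
        func, args = task
        try:
            func(*args)
        finally:
            ready.task_done()


spes = [threading.Thread(target=spe_loop) for _ in range(8)]
for t in spes:
    t.start()
ready.put((sum_range, (0, 1000)))
ready.join()                      # waits for all tasks, including spawned ones
for _ in spes:
    ready.put(None)
for t in spes:
    t.join()
print(shared_memory["total"])     # 499500
```

A task enqueues its children before it is marked done, so `ready.join()` cannot return while work is still outstanding.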

Stream processing

Each SPE runs a distinct program. Data comes from an input stream and is sent to the SPEs. When an SPE has finished processing, the output data is sent to an output stream.
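One way to picture stream processing is as a chain of stages, each running its own program and passing data to the next. The sketch below assumes such a chained arrangement and uses Python threads and queues in place of SPEs and DMA transfers. The names `spe_stage` and `run_stream` are hypothetical, and the three toy "programs" are placeholders for real decoding steps.

```python
import queue
import threading

END = object()   # end-of-stream marker passed down the chain


def spe_stage(program, inbox, outbox):
    """One SPE: apply its own program to each item and forward the result."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)       # propagate shutdown to the next stage
            break
        outbox.put(program(item))


def run_stream(programs, input_stream):
    """Chain one stage per program and collect the output stream in order."""
    queues = [queue.Queue() for _ in range(len(programs) + 1)]
    stages = [threading.Thread(target=spe_stage,
                               args=(prog, queues[i], queues[i + 1]))
              for i, prog in enumerate(programs)]
    for s in stages:
        s.start()
    for item in input_stream:
        queues[0].put(item)
    queues[0].put(END)
    output = []
    while True:
        item = queues[-1].get()
        if item is END:
            break
        output.append(item)
    for s in stages:
        s.join()
    return output


decoded = run_stream([lambda x: x + 1, lambda x: x * 2, str], [1, 2, 3])
print(decoded)  # ['4', '6', '8']
```

Because each stage is a single consumer on a FIFO queue, items leave the chain in the order they entered it.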

Open source software development

As of 2005-06-23, patches enabling Cell support in the Linux kernel were submitted for inclusion by IBM developers [24]. Arnd Bergmann (one of the developers of the aforementioned patches) also described the Linux-based Cell architecture at LinuxTag 2005. [25]

Both the PPE and the SPEs are programmable in C/C++ using a common API provided by libraries. According to Sony, a compiler, debugger, IDE, performance analyzer, and Cell emulator should be made available soon. [26] IBM has developed a pseudo-filesystem for Linux called "Spufs" that simplifies access to and use of the SPE resources.

IBM is currently maintaining the Linux kernel and GDB ports, while Sony maintains the GNU toolchain (GCC, binutils). [27]

Acronyms

References

  1. ^ "Introduction to the Cell multiprocessor". 7 September 2005.
  2. ^ "CELL Processor Gets Ready To Entertain The Masses". 8 February 2005.
  3. ^ "Arnd Bergmann on Cell". 25 June 2005.
  4. ^ "Spufs: The Cell Synergistic Processing Unit as a virtual file system". 25 June 2005.
  5. ^ "Cell-CPU auf dem LinuxTag (at the LinuxTag)". 25 June 2005.
  6. ^ "Open sourcing of Cell coming to fruition". 10 June 2005.
  7. ^ "Unleashing the power: A programming example of large FFTs on Cell (broadcast replay)". 9 June 2005.
  8. ^ "IBM Discloses Cell Based Blade Server Board Prototype". 25 May 2005.
  9. ^ "IBM will unlock door to Cell". 23 May 2005.
  10. ^ "Toshiba Demonstrates Cell Microprocessor Simultaneously Decoding 48 MPEG-2 Streams". 25 April 2005.
  11. ^ "CELL: A New Platform for Digital Entertainment". 9 March 2005.
  12. ^ "CELL Microprocessor Revisited". 28 February 2005.
  13. ^ "Power Efficient Processor Design and the Cell Processor" (PDF). 16 February 2005.
  14. ^ "Prospects For the CELL Microprocessor Beyond Games". 11 February 2005.
  15. ^ "ISSCC 2005: The CELL Microprocessor". 10 February 2005.
  16. ^ "A 4.8Ghz Fully Pipelined Embedded SRAM in the Streaming Processor of a CELL Processor" (PDF). 9 February 2005.
  17. ^ "The Design and Implementation of a First-Generation CELL Processor" (PDF). 8 February 2005.
  18. ^ "IBM, Sony, Toshiba unveil nine-core Cell processor". 7 February 2005.
  19. ^ "Cell Microprocessor Briefing". 7 February 2005.
  20. ^ "The Cell Processor Programming Model". LinuxTag 2005. 11 June.
  21. ^ "IBM Research - Cell". IBM. 11 June.
  22. ^ "Cell Architecture Explained". Blachford. 22 September.