Floating point operations per second

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 68.144.182.127 (talk) at 02:31, 7 June 2005 (FLOPS. GPUs, and game consoles). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

For other uses, see Disambiguation

In computing, FLOPS is an abbreviation of floating point operations per second. It is used as a measure of a computer's performance, especially in fields of scientific computing that make heavy use of floating-point calculations. (Note: a hertz is one cycle per second, so a clock rate in hertz counts cycles rather than operations. Compare MIPS, million instructions per second.)

Computing devices exhibit an enormous range of performance levels in floating-point applications, so it makes sense to introduce larger units than the FLOPS. The standard SI prefixes can be used for this purpose, resulting in such units as the megaFLOPS (MFLOPS, 10^6 FLOPS), the gigaFLOPS (GFLOPS, 10^9 FLOPS), the teraFLOPS (TFLOPS, 10^12 FLOPS), and the petaFLOPS (PFLOPS, 10^15 FLOPS).
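As an illustration, the prefixes listed above can be applied mechanically to a raw figure. The following is a minimal sketch; the helper name `format_flops` and the table `PREFIXES` are hypothetical and chosen only for this example.

```python
# Hypothetical helper: express a raw FLOPS figure using the SI prefixes
# named in the text (mega, giga, tera, peta).
PREFIXES = [
    (1e15, "PFLOPS"),
    (1e12, "TFLOPS"),
    (1e9, "GFLOPS"),
    (1e6, "MFLOPS"),
]


def format_flops(flops):
    """Return a string such as '80 MFLOPS' for a rate given in FLOPS."""
    for scale, unit in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:g} {unit}"
    # Rates below one million are left unscaled.
    return f"{flops:g} FLOPS"


print(format_flops(80e6))      # the Cray-1 figure quoted in this article
print(format_flops(70.72e12))  # the Blue Gene figure quoted in this article
```

The table is ordered from largest to smallest so that the first matching prefix is the most compact one.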

One should speak in the singular of a FLOPS and not of a FLOP, although the latter is frequently encountered. The final S stands for second and does not indicate a plural.

The performance spectrum

A cheap but modern desktop computer using, for example, a Pentium 4 or Athlon 64 CPU typically has a clock frequency in excess of 2 GHz and computational performance in the range of a few GFLOPS. Even some video game consoles of the late 1990s and early 2000s, such as the Dreamcast and the GameCube, had performance in excess of one GFLOPS (but see below).

An early supercomputer, the Cray-1, was installed at Los Alamos National Laboratory in 1976. The Cray-1 was capable of 80 MFLOPS. In the less than 30 years since then, the computational speed of supercomputers has increased more than a millionfold.

As of November 5, 2004, the fastest computer in the world was the IBM Blue Gene supercomputer, measured at 70.72 TFLOPS. This machine was a prototype of the Blue Gene/L system that IBM is building for the Lawrence Livermore National Laboratory in California. During a speed test on March 24, 2005, it was rated at 135.5 TFLOPS, a record achieved by doubling the number of racks to 32. Each rack holds 1,024 processors, yet the chips are the same as those found in high-end computers. The complete version will have a total of 64 racks and a theoretical peak speed of 360 TFLOPS.

Distributed computing uses the Internet to link personal computers to achieve a similar effect. SETI@home, the largest such project, computes at more than 100 TFLOPS, and Folding@home, the most powerful distributed computing project, has been able to sustain over 200 TFLOPS. As of June 2005, GIMPS is sustaining 17 TFLOPS.
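The "millionfold" comparison and the per-rack rates can be checked directly from the figures quoted in this section. This is only a back-of-the-envelope sketch; the constant names are made up for the example, and the per-rack values are simple averages, not measurements.

```python
# Figures quoted in this section, converted to FLOPS.
CRAY_1 = 80e6                  # Cray-1, 1976
BLUE_GENE_32_RACKS = 135.5e12  # rated speed, March 2005 test
BLUE_GENE_64_RACKS = 360e12    # theoretical figure for the complete machine

# Ratio between the 2005 Blue Gene rating and the Cray-1.
speedup = BLUE_GENE_32_RACKS / CRAY_1

# Average rate per rack (a plain division, not a measured quantity).
measured_per_rack = BLUE_GENE_32_RACKS / 32
theoretical_per_rack = BLUE_GENE_64_RACKS / 64

print(f"Blue Gene vs Cray-1: about {speedup:,.0f} times faster")
print(f"Rated speed per rack: {measured_per_rack / 1e12:.2f} TFLOPS")
print(f"Theoretical speed per rack: {theoretical_per_rack / 1e12:.3f} TFLOPS")
```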

Pocket calculators are at the other end of the performance spectrum. Each calculation request to a typical calculator requires only a single operation, so there is rarely any need for it to respond faster than its operator can perceive. Any response time below 0.1 second is experienced as instantaneous by a human operator, so a simple calculator could be said to operate at about 10 FLOPS.

Humans are even worse floating-point processors. If it takes a person a quarter of an hour to carry out a pencil-and-paper long division with 10 significant digits, that person would be calculating in the milliFLOPS range.
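The two low-end estimates follow from dividing an operation count by the time taken. The sketch below assumes, as the text does, that each calculator request and each long division counts as a single floating-point operation; the function name `flops_from_time` is hypothetical.

```python
def flops_from_time(operations, seconds):
    """Rate in FLOPS for a given number of operations and elapsed time."""
    return operations / seconds


# One operation per 0.1 second response time.
calculator = flops_from_time(1, 0.1)

# One long division per quarter of an hour (15 minutes = 900 seconds).
human = flops_from_time(1, 15 * 60)

print(f"Calculator: {calculator:.0f} FLOPS")
print(f"Human: {human * 1000:.2f} milliFLOPS")
```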

FLOPS as a measure of performance

In order for FLOPS to be useful as a measure of floating-point performance, a standard benchmark must be available on all computers of interest. One example is the LINPACK benchmark.
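To show how a benchmark turns a timed computation into a FLOPS figure, here is a minimal sketch in the spirit of LINPACK: it solves a dense linear system and divides a nominal operation count by the elapsed time. It is not the LINPACK benchmark itself. The function names, the matrix size and the random test data are assumptions made for this example, and the count (2/3)n^3 + 2n^2 is the nominal figure conventionally used for this kind of solve.

```python
import random
import time


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Both arguments are modified in place.
    """
    n = len(b)
    for k in range(n):
        # Choose the row with the largest pivot to keep the elimination stable.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n=60, seed=0):
    """Return (estimated FLOPS, largest residual) for one n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    a_work = [row[:] for row in a]
    b_work = b[:]

    start = time.perf_counter()
    x = solve(a_work, b_work)
    elapsed = time.perf_counter() - start

    # Nominal operation count for the solve (an assumption of this sketch).
    operations = (2.0 / 3.0) * n**3 + 2.0 * n**2

    # Check the answer against the original, unmodified system.
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return operations / elapsed, residual


rate, residual = measure_flops()
print(f"Estimated rate: {rate / 1e6:.2f} MFLOPS (residual {residual:.1e})")
```

Because this runs in an interpreter, the reported rate reflects interpreter overhead far more than the hardware's floating-point capability, which is one reason a standard, carefully specified benchmark is needed for comparisons.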

FLOPS in isolation are arguably not very useful as a benchmark for modern computers. Many factors besides raw floating-point speed affect computer performance, such as I/O performance, interprocessor communication, cache coherence, and the memory hierarchy. This means that supercomputers are in general only capable of a small fraction of their "theoretical peak" FLOPS throughput (obtained by adding together the theoretical peak FLOPS performance of every element of the system). Even when operating on large, highly parallel problems, their performance will be bursty, in large part because of the residual effects of Amdahl's law. Real benchmarks therefore measure both peak actual FLOPS performance and sustained FLOPS performance.
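The gap between theoretical peak and achievable performance can be illustrated with Amdahl's law, which bounds the speedup of a program whose parallel fraction is p on N processors at 1 / ((1 - p) + p / N). The sketch below uses made-up numbers (1,024 elements of 2.8 GFLOPS each, 99% parallel work) purely for illustration; they do not describe any real machine.

```python
def theoretical_peak(element_peaks):
    """Theoretical peak: the sum of the peak FLOPS of every element."""
    return sum(element_peaks)


def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup given by Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical system: 1,024 identical elements of 2.8 GFLOPS each.
processors = 1024
peak = theoretical_peak([2.8e9] * processors)

# Hypothetical workload in which 99% of the work can run in parallel.
speedup = amdahl_speedup(0.99, processors)
fraction_of_ideal = speedup / processors

print(f"Theoretical peak: {peak / 1e12:.2f} TFLOPS")
print(f"Speedup bound on {processors} processors: {speedup:.1f}x")
print(f"Fraction of ideal scaling: {fraction_of_ideal:.1%}")
```

Even with only 1% serial work, the bound in this example is under a tenth of ideal scaling, which is consistent with the observation that sustained figures fall well below the summed peak.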

For ordinary (non-scientific) applications, integer operations (measured in MIPS) are far more common. Measuring floating-point speed therefore does not accurately predict how a processor will perform on an arbitrary problem. However, for many scientific jobs, such as data analysis, a FLOPS rating is an effective measure.

FLOPS, GPUs, and game consoles

Very high FLOPS figures are often quoted for inexpensive computer video cards and game consoles.

For example, the PlayStation 3, due for release in 2006, has been announced as having a system floating-point performance of 2.0 TFLOPS, roughly double the announced rating of the Xbox 360, which is itself rated in the TFLOPS class. Despite the PS3's higher TFLOPS figure, the graphics of the Xbox 360 are said to be better than those of the PS3, although neither console has been released yet, so no real comparison has been made. By comparison, a general-purpose PC would have a rating of only a few GFLOPS if the performance of its CPU alone were considered. The TFLOPS ratings of the game consoles would appear to class them as supercomputers, if comparisons based on FLOPS alone were valid.

However, these FLOPS figures should be treated with caution, as they are not in general comparable like-for-like with FLOPS figures for a fully programmable general-purpose CPU. The game console figures are based on total system performance (CPU + GPU). Most of the FLOPS performance of game consoles and video cards comes from their GPUs, which are deeply pipelined vector processors specialized for graphics operations, with only limited programmability. This is possible because 3D graphics operations are a classic example of a highly parallelizable problem that can easily be split between different execution units and pipelines, allowing a large speed gain to be obtained by scaling the number of logic gates rather than clock speed alone. Furthermore, system memory bandwidth is a major factor in each console's performance. The Xbox 360's overall system memory bandwidth reportedly exceeds the PS3's by a factor of five, which may offset the claim that the PS3's Cell processor makes it the more powerful system.
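A rough way to see why wide, specialized hardware can quote such high figures is to estimate peak throughput as execution units × operations per unit per cycle × clock frequency. The sketch below rests on that simplified model, and every number in it is hypothetical: neither configuration describes a real CPU, GPU or console.

```python
def peak_flops(units, flops_per_unit_per_cycle, clock_hz):
    """Simplified peak estimate: units x operations per cycle x clock rate."""
    return units * flops_per_unit_per_cycle * clock_hz


# Hypothetical general-purpose CPU: one floating-point unit, high clock.
cpu = peak_flops(units=1, flops_per_unit_per_cycle=2, clock_hz=3.0e9)

# Hypothetical graphics processor: many vector units, much lower clock.
gpu = peak_flops(units=32, flops_per_unit_per_cycle=16, clock_hz=0.4e9)

print(f"CPU-like design: {cpu / 1e9:.1f} GFLOPS")
print(f"GPU-like design: {gpu / 1e9:.1f} GFLOPS")
print(f"Ratio: {gpu / cpu:.1f}x")
```

In this model the GPU-like design wins on width alone despite its lower clock, but the figure says nothing about how much of that throughput is reachable by general-purpose code or whether memory bandwidth can keep the units busy.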
