Intel C++ Compiler

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Azakhark (talk | contribs) at 09:34, 18 May 2010 (Optimizations). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Intel C++ Compiler
Developer(s): Intel
Stable release: 11.1 / June 23, 2009
Operating system: Linux, Microsoft Windows and Mac OS X
Type: Compiler
License: Proprietary
Website: http://software.intel.com/en-us/intel-compilers/

Intel C++ Compiler (also known as icc or icl) is a group of C/C++ compilers from Intel. Compilers are available for Linux, Microsoft Windows and Mac OS X.

Intel supports compilation for its IA-32, Intel 64 and Itanium 2 processors and for certain non-Intel but compatible processors, such as certain AMD processors. Developers should check system requirements. The Intel C++ Compiler for IA-32 and Intel 64 features an automatic vectorizer that can generate SSE, SSE2, SSE3 and SSE4 SIMD instructions; the embedded variant can generate Intel Wireless MMX and MMX 2 instructions.[1] Since its introduction, the Intel C++ Compiler for IA-32 has greatly increased adoption of SSE2 in Windows application development.[citation needed]
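As an illustration of the kind of loop an automatic vectorizer targets, the following is a minimal sketch rather than Intel sample code. The function name `saxpy` is chosen for this example only. Each iteration is independent and walks contiguous memory, so a vectorizing compiler may process several elements per SIMD instruction; whether it actually does so depends on the compiler, the options used and the target processor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element.
// The iterations do not depend on each other and access memory with unit
// stride, which is the loop shape an automatic vectorizer typically looks for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // precondition: both vectors have equal length
    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The scalar source stays unchanged; any SIMD code is produced by the compiler, so the same function remains portable to compilers that do not vectorize it.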

Intel C++ Compiler further supports both OpenMP 3.0 and automatic parallelization for symmetric multiprocessing. With the add-on capability Cluster OpenMP, the compiler can also automatically generate Message Passing Interface calls for distributed memory multiprocessing from OpenMP directives.
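A minimal sketch of an OpenMP work-sharing loop is shown below. The function name `dot` is an assumption made for this example. The pragma asks an OpenMP-enabled compiler to split the iterations across threads and combine the per-thread partial sums; a compiler invoked without OpenMP support ignores the pragma and runs the loop serially.

```cpp
#include <cassert>
#include <vector>

// Dot product with an OpenMP reduction.
// With OpenMP enabled, each thread accumulates a private partial sum that is
// combined at the end of the loop; without OpenMP the loop runs serially.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());  // precondition: both vectors have equal length
    const double* ap = a.data();
    const double* bp = b.data();
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += ap[i] * bp[i];
    }
    return sum;
}
```

Because a parallel reduction may add the partial sums in a different order than the serial loop, floating-point results can differ in the last digits for general inputs.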

Intel C++ Compiler belongs to the family of compilers with the Edison Design Group frontend (like the SGI MIPSpro, Comeau C++, Portland Group, and others). The compiler is also notable for being widely used for SPEC CPU Benchmarks of IA-32, x86-64, and Itanium 2 architectures.

The Intel C++ Compiler is available in four forms. It is part of Intel Parallel Studio, the Intel C++ Compiler Professional Edition package, the Intel Compiler Suite package and the Intel Cluster Toolkit, Compiler Edition. The Intel Software Products site provides more information.

Optimizations

Intel tunes its compilers for its own hardware platforms, aiming to minimize stalls and to produce code that executes in the fewest cycles. The Intel C++ Compiler supports three separate high-level techniques for optimizing the compiled program: interprocedural optimization (IPO), profile-guided optimization (PGO),[2] and high-level optimizations (HLO). It also supports tools and techniques for adding parallelism to applications and maintaining it.

Profile-guided optimization is a mode of optimization in which the compiler can access data from a sample run of the program across a representative input set. The data indicate which areas of the program are executed more frequently and which are executed less frequently. All optimizations benefit from profile-guided feedback because they are less reliant on heuristics when making compilation decisions.

High-level optimizations are optimizations performed on a version of the program that more closely represents the source code. This includes loop interchange, loop fusion, loop unrolling, loop distribution, data prefetch, and more.[3] These optimizations are usually very aggressive and may take considerable compilation time.
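Loop interchange, one of the transformations listed above, can be illustrated by hand. The sketch below is not compiler output; the function names and the row-major layout are assumptions for this example. Both functions compute the same result, but the second visits memory contiguously, which is the form an optimizer would prefer to reach.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before interchange: the inner loop walks down a column of a row-major
// matrix, so consecutive accesses are `cols` elements apart.
void scale_column_order(std::vector<double>& m, std::size_t rows,
                        std::size_t cols, double k) {
    assert(m.size() == rows * cols);  // precondition: m holds rows * cols values
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            m[i * cols + j] *= k;
        }
    }
}

// After interchange: the loops are swapped, so the inner loop walks along a
// row and consecutive accesses are adjacent in memory.
void scale_row_order(std::vector<double>& m, std::size_t rows,
                     std::size_t cols, double k) {
    assert(m.size() == rows * cols);  // precondition: m holds rows * cols values
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            m[i * cols + j] *= k;
        }
    }
}
```

The interchange is legal here because every element is updated independently, so the order of iterations does not affect the result.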

Interprocedural optimization applies typical compiler optimizations (such as constant propagation) over a broader scope that may include multiple procedures, multiple files, or the entire program.[4]
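The following sketch shows the kind of opportunity that only a broader scope exposes. The function names and the file names mentioned in the comments are hypothetical. Compiled one file at a time, `strided_sum` must handle any stride; an optimizer that sees the whole program could notice that the only caller passes a stride of 1 and specialize or inline the function accordingly.

```cpp
#include <cassert>

// Imagine this function in one source file (a hypothetical stats.cpp).
// Seen in isolation, it has to support every positive stride.
int strided_sum(const int* data, int count, int stride) {
    assert(stride > 0);  // precondition: stride must be positive
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += data[i * stride];
    }
    return total;
}

// Imagine this caller in another source file (a hypothetical report.cpp).
// It always passes stride == 1. With a cross-file view, constant propagation
// could turn the callee's loop into a simple unit-stride sum.
int sum_all(const int* data, int count) {
    return strided_sum(data, count, 1);
}
```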

The compilers include a parallel debugger extension, Intel Threading Building Blocks, lambda function support, and a source checker tool for use with threaded code.

Intel's compiler has been criticized for applying, by default, floating-point optimizations that are not allowed by the C standard and that require a special flag with other compilers such as gcc.[5]
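One reason such optimizations are restricted is that floating-point addition is not associative, so regrouping operands can change a result. The sketch below demonstrates the effect in general terms and does not reproduce any specific transformation made by the Intel compiler; the function names are chosen for this example, and `volatile` is used only to pin each grouping in place.

```cpp
#include <cassert>

// Evaluates (a + b) + c, the grouping the language rules give to a + b + c.
double sum_left(double a, double b, double c) {
    volatile double t = a + b;  // volatile keeps this grouping from being rearranged
    return t + c;
}

// Evaluates a + (b + c), a regrouping that is algebraically equivalent
// but not numerically equivalent.
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}

// With a large value, its negation and a small value, the small value
// survives in one grouping and is absorbed by rounding in the other.
void demonstrate() {
    const double big = 1e20;
    const double small = 1.0;
    assert(sum_left(big, -big, small) == 1.0);
    assert(sum_right(big, -big, small) == 0.0);
}
```

A compiler that reassociates such expressions for speed can therefore produce results that differ from a strict evaluation, which is why other compilers make this behavior opt-in.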

Languages

Intel's suite of compilers has front ends for C, C++, and Fortran.

Early versions of ICC for Linux that predate GCC 3.x used the Dinkumware name mangling scheme in order to provide a more standard implementation of C++ than GCC 2.x. This made their ABI incompatible with both GCC 2.x and GCC 3.x. Intel removed the Dinkumware libraries in the 10.0 release (June 2007). Since then, the compiler has been compatible with GCC 3.2 and later.

Architectures

Versions

The following versions of Intel C++ Compiler have been released:

Compiler version | Release date | Major new features
Intel C++ Compiler 11.1 | June 23, 2009 | Support for the latest Intel SSE4.2, AVX and AES instructions. Parallel Debugger Extension. Improved integration into Microsoft Visual Studio, Eclipse CDT 5.0 and Mac Xcode IDE.
Intel C++ Compiler 11.0 | November 2008 | Initial C++0x support [1]. VS2008 IDE integration on Windows. OpenMP 3.0. Source Checker for static memory/parallel diagnostics.
Intel C++ Compiler 10.1 | November 7, 2007 | New OpenMP compatibility runtime library: code built with the new OpenMP RTL can be mixed with libraries and objects built by Visual C++. Using the new libraries requires the new option "-Qopenmp /Qopenmp-lib:compat" on Windows and "-openmp -openmp-lib:compat" on Linux. Supports more intrinsics from Visual Studio 2005. VS2008 support is command line only in this release; IDE integration was not yet supported.
Intel C++ Compiler 10.0 | June 5, 2007[6] | Improved parallelizer and vectorizer, Streaming SIMD Extensions 4 (SSE4), new and enhanced optimization reports for advanced loop transformations, new optimized exception handling implementation.
Intel C++ Compiler 9.0 | June 14, 2005[7] | AMD64 architecture (for Windows), software-based speculative pre-computation (SSP) optimization, improved loop optimization reports.[8][9]
Intel C++ Compiler 8.1 | September 2004 | AMD64 architecture (for Linux).[10][11]
Intel C++ Compiler 8.0 | December 15, 2003[12] | Precompiled headers, code-coverage tools. [2]
Intel C++ Compiler 7.1 | March 2003 | Partial support for the Intel Pentium 4 with Streaming SIMD Extensions 3 (SSE3). [3]
Intel C++ Compiler 7.0 | November 25, 2002[13] | [4]
Intel C++ Compiler 6.0 | April 24, 2002[14] | [5]

Experimental / Prototype Versions

In addition, the following "prototype" editions have been made available:

Compiler version | Release date | Major new features
Intel STM Compiler Prototype Edition | September 17, 2007[15] | Prototype version of the Intel compiler that implements support for Software Transactional Memory (STM). The Intel STM Compiler supports Linux and Windows, producing 32-bit code for x86 (Intel and AMD) processors. Intel stated the belief that "The availability of such a prototype compiler allows unprecedented exploration by C / C++ software developers of a promising technique to make programming for multi-core easier." The STM compiler requires that the Intel compiler already be installed.
Intel Concurrent Collections for C/C++ 0.3 | September 2008 | Intel Concurrent Collections for C/C++ provides a mechanism for constructing C++ programs that execute in parallel. It allows developers to ignore issues of parallelism such as low-level threading constructs or scheduling/distribution of computations. The model allows developers to specify high-level computational steps including inputs and outputs without imposing unnecessary ordering on their execution. Code within the computational steps is written using standard serial constructs of the C++ language. Data is either local to a computational step or explicitly produced and consumed by the steps. It supports multiple styles of parallelism (e.g., data, task, pipeline parallel).

Flags and manuals

Documentation can be found at the Intel Software Technical Documentation site.

Windows | Linux | Comment
/Od | -O0 | No optimization
/O1 | -O1 | Optimize for size
/O2 | -O2 | Optimize for speed (enables some optimizations)
/O3 | -O3 | Enable all O2 optimizations plus intensive loop optimizations
/QxO | -xO | Enables SSE3, SSE2 and SSE instruction set optimizations for non-Intel CPUs [16]
/fast | -fast | Shorthand. On Windows this equates to "/O3 /Qipo /xT /no-prec-div"; on Linux, "-O3 -ipo -static -xHOST -no-prec-div". Note that the processor-specific optimization flag (-xHOST) optimizes for the processor the code is compiled on; it is the only flag implied by -fast that may be overridden.
/Qprof-gen | -prof_gen | Compile the program and instrument it for a profile-generating run.
/Qprof-use | -prof_use | May only be used after running a program that was previously compiled using prof_gen. Uses profile information during each step of the compilation process.

Debugging

The Intel compiler provides debugging information in the standard formats used by common debuggers (DWARF 2 on Linux, as used by gdb, and COFF on Windows). The flags to compile with debugging information are /Zi on Windows and -g on Linux.

Intel also provides its own debugger called idb, which can be run in either a dbx-compatible or a gdb-compatible command mode.

While the Intel compiler can generate a gprof compatible profiling output, Intel also provides a kernel level, system-wide statistical profiler as a separate product called VTune. VTune features an easy-to-use GUI (integrated into Visual Studio for Windows, Eclipse for Linux) as well as a command line interface.

The 11.x releases of the compiler introduced the Parallel Debugger Extension, which provides techniques for debugging threaded applications. It can be used with other, compatible compilers, such as Microsoft Visual C++ on Windows as available in Visual Studio 2005 and 2008 and gcc on Linux.

Criticism

Here is an almost-verbatim quote from a blog[17]: The Intel compiler and several different Intel function libraries have suboptimal performance on AMD and VIA processors. The reason is that the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU, it also checks the vendor ID string. If the vendor string is "GenuineIntel" then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version.
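The difference between the two dispatching strategies described in the quote can be sketched as follows. This is a hypothetical illustration, not Intel's dispatcher code: the `CpuInfo` structure, the `CodePath` enumeration and both function names are assumptions, and a real dispatcher would obtain the vendor string and feature flags from the CPUID instruction.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a processor; real code would fill this from CPUID.
struct CpuInfo {
    std::string vendor;  // vendor ID string, e.g. "GenuineIntel" or "AuthenticAMD"
    bool has_sse2;
    bool has_sse3;
};

enum class CodePath { Generic, Sse2, Sse3 };

// Feature-based dispatch: pick the best path the CPU reports support for.
CodePath dispatch_by_features(const CpuInfo& cpu) {
    assert(!cpu.has_sse3 || cpu.has_sse2);  // sanity check: SSE3 CPUs also support SSE2
    if (cpu.has_sse3) return CodePath::Sse3;
    if (cpu.has_sse2) return CodePath::Sse2;
    return CodePath::Generic;
}

// Vendor-gated dispatch, modeled on the behavior the quote describes:
// the feature flags are only consulted when the vendor string matches.
CodePath dispatch_vendor_gated(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") {
        return CodePath::Generic;  // slowest path, regardless of supported features
    }
    return dispatch_by_features(cpu);
}
```

For a non-Intel processor that reports SSE3 support, the first function selects the SSE3 path while the second falls back to the generic path, which is the performance gap the criticism refers to.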

This vendor-specific CPU dispatching decreases the performance on non-Intel processors of software built with an Intel compiler or an Intel function library, possibly without the knowledge of the programmer. This has allegedly led to misleading benchmarks.[18] A legal battle between AMD and Intel over this and other issues was settled in November 2009.[19] In addition, the US Federal Trade Commission has filed an antitrust complaint against Intel.[20]

See also

References