[[Image:Ideal compiler.png|right|thumb|300px|A diagram of the operation of an ideal compiler.]]

A '''compiler''' is a [[computer program]] that translates a computer program written in one [[computer language]] (called the ''source language'') into an equivalent program written in another computer language (called the output, object, or ''target language'').

== Introduction and history ==

Most compilers translate [[source code]] written in a [[high level language]] to [[object code]] or [[machine language]] that may be directly executed by a computer or a [[virtual machine]]. Translation from a low level language to a high level one is also possible; a tool that reconstructs a high level language program which could have generated a given low level program is normally known as a [[decompiler]]. Compilers also exist which translate from one high level language to another (source-to-source compilers, described below), or sometimes to an intermediate language that still needs further processing; these are sometimes known as [[cascader]]s.

Typical compilers output so-called [[object code|object file]]s, which contain [[machine code]] augmented by information about the name and location of entry points and external calls (to functions not contained in the object file). A set of object files, which need not all have come from a single compiler provided that the compilers used share a common output format, may then be [[linker|linked]] together to create the final executable, which can be run directly by a user.

<!-- ### in the following paragraph (only), upper-case FORTRAN is correct, as it was the name used at the time, and on IBM's early compilers ### -->
Several experimental compilers were developed in the [[1950s]], but the [[Fortran|FORTRAN]] team led by [[John Backus]] at [[IBM]] is generally credited as having introduced the first complete compiler, in [[1957]]. [[COBOL]] was an early language to be compiled on multiple architectures, in [[1960]]. [http://www.interesting-people.org/archives/interesting-people/199706/msg00011.html]

The idea of compilation quickly caught on, and most of the principles of compiler design were developed during the [[1960s]].

A compiler is itself a computer program written in some ''implementation language''. Early compilers were written in [[Assembler|assembly language]]. The first ''self-hosting'' compiler -- capable of compiling its own source code in a high-level language -- was created for [[Lisp programming language|Lisp]] by Hart and Levin at [[Massachusetts Institute of Technology|MIT]] in [[1962]]. [http://www.ai.mit.edu/research/publications/browse/0000browse.shtml]
The use of high-level languages for writing compilers gained added impetus in the early [[1970s]] when Pascal and C compilers were written in their own languages. Building a self-hosting compiler is a [[Bootstrapping|bootstrapping]] problem -- the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter.

During the [[1980s]] and [[1990s]] a large number of free compilers and [[compiler development tools]] were developed for all kinds of languages, both as part of the [[GNU Compiler Collection|GNU]] project and other [[open-source]] initiatives. Some of them are considered to be of high quality, and their freely available source code is a valuable resource for anyone interested in modern compiler concepts.

== Types of compilers ==

A compiler may produce code intended to run on the same type of computer and operating system ("[[platform]]") as the compiler itself runs on. This is sometimes called a native-code compiler. Alternatively, it might produce code designed to run on a different platform. This is known as a [[cross compiler]]. Cross compilers are very useful when bringing up a new hardware platform for the first time (see [[bootstrapping]]). A "source to source compiler" is a type of compiler that takes a program in a high level language as its input and outputs a program in another (or the same) high level language. For example, an automatic parallelizing compiler will frequently take in a high level language program as input and then transform the code and annotate it with parallel code annotations (e.g. [[OpenMP]]) or language constructs (e.g. Fortran's <code>DOALL</code> statements).
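
To make the source-to-source idea concrete, here is a deliberately naive sketch in Python. The function name and the unconditional annotation are illustrative assumptions; a real parallelizing compiler must first prove that the loop iterations are independent before annotating them.

 # A toy 'source to source' transformation: annotate C-like 'for' loops
 # with an OpenMP pragma.  Purely illustrative -- a real compiler would
 # analyze the loop body before claiming it is parallelizable.
 def annotate_loops(c_source):
     out = []
     for line in c_source.splitlines():
         if line.lstrip().startswith('for'):
             indent = line[:len(line) - len(line.lstrip())]
             out.append(indent + '#pragma omp parallel for')
         out.append(line)
     return '\n'.join(out)
 
 print(annotate_loops('for (i = 0; i < n; i++)\n    a[i] = b[i] + c[i];'))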

* [[One-pass compiler]], like early compilers for [[Pascal programming language|Pascal]]
** The compilation is done in one pass, hence it is very fast.
* [[Threaded code compiler]] (or interpreter), like most implementations of [[FORTH]]
** This kind of compiler can be thought of as a database lookup program. It just replaces given strings in the source with given binary code. The level of this binary code can vary; in fact, some FORTH compilers can compile programs that don't even need an operating system.
* [[Incremental compiler]], like many Lisp systems
** Individual functions can be compiled in a run-time environment that also includes interpreted functions. Incremental compilation dates back to 1962 and the first Lisp compiler, and is still used in [[Common Lisp]] systems.
* [[Stage compiler]] that compiles to assembly language of a theoretical machine, like some Prolog implementations
** This Prolog machine is also known as the [[Warren abstract machine]] (or WAM). Byte-code compilers for Java, Python and many more are also a subtype of this; a minimal byte-code sketch appears after this list.
* [[Just-in-time compilation|Just-in-time compiler]], used by Smalltalk and Java systems
** Applications are delivered in [[bytecode]], which is compiled to native machine code just prior to execution.
* A [[retargetable compiler]] is a compiler that can relatively easily be modified to generate code for different [[Central processing unit|CPU]] architectures. The object code produced by these is frequently of lesser quality than that produced by a compiler developed specifically for a processor. Retargetable compilers are often also cross compilers. [[GNU Compiler Collection|GCC]] is an example of a retargetable compiler.
* A parallelizing compiler converts a serial input program into a form suitable for efficient execution on a [[parallel computer]] architecture.
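
As a rough illustration of the byte-code approach mentioned in the list above, the following Python sketch compiles arithmetic expressions into a flat instruction list and evaluates it on a stack machine. All names here are hypothetical, and real byte-code formats such as Java's or Python's are far more elaborate.

 # A minimal byte-code sketch: expressions (supporting only + and *)
 # are compiled to a flat list of (opcode, argument) instructions
 # and run on a stack machine.
 def compile_expr(node):
     # The AST is a nested tuple such as ('+', 2, ('*', 3, 4)).
     if isinstance(node, (int, float)):
         return [('PUSH', node)]
     op, left, right = node
     return compile_expr(left) + compile_expr(right) + [(op, None)]
 
 def run(code):
     # Evaluate the instruction list on a simple operand stack.
     stack = []
     for opcode, arg in code:
         if opcode == 'PUSH':
             stack.append(arg)
         else:
             b, a = stack.pop(), stack.pop()
             stack.append(a + b if opcode == '+' else a * b)
     return stack.pop()
 
 print(run(compile_expr(('+', 2, ('*', 3, 4)))))   # prints 14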

== Compiled vs. interpreted languages ==

Many people divide higher-level programming languages into [[compiled language]]s and [[interpreted language]]s. However, there is rarely anything about a language that requires it to be compiled or interpreted. Compilers and interpreters are ''implementations'' of languages, not languages themselves. The categorization usually reflects the most popular or widespread implementations of a language -- for instance, BASIC is thought of as an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters. There are exceptions, however; some language specifications assume the use of a compiler (as with C), or spell out that implementations must include a compilation facility (as with Common Lisp); on the other hand, some languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder (one such example is the capability of executing arbitrary source code contained in a run-time supplied string). <!--- examples of languages that have this should be supplied --->
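
To make the last point concrete, the following Python fragment executes source code that exists only as a run-time string. An implementation that guarantees this feature must carry a compiler or interpreter inside every compiled program.

 # Executing source supplied only at run time: trivial for an interpreter,
 # but it forces a compiled implementation to embed a compiler or
 # interpreter in the generated executable.
 source = 'print(6 * 7)'   # in practice this might come from user input
 exec(source)              # the string is compiled and run on the spot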

== Compiler design ==

In the past, compilers were divided into many passes<sup>[[Compiler#Notes|1]]</sup> to save space. A pass in this context is a run of the compiler through the source code of the program to be compiled, building up the internal data of the compiler (such as the evolving symbol table and other assisting data). When each pass is finished, the compiler can free the internal data space needed during that pass. This 'multipass' method of compiling was the common compiler technology of the time, largely because the main memories of host computers were small relative to the source code and data.

Many modern compilers share a common 'two stage' design. The [[front end]] translates the source language into an intermediate representation. The second stage is the [[back end]], which works with the internal representation to produce code in the output language. The front end and back end may operate as separate passes, or the front end may call the back end as a [[subroutine]], passing it the intermediate representation.

This approach mitigates complexity by separating the concerns of the front end, which typically revolve around language semantics, error checking, and the like, from the concerns of the back end, which concentrates on producing output that is both efficient and correct. It also has the advantage of allowing the use of a single back end for multiple source languages, and similarly allows the use of different back ends for different targets.

Often, optimizers and error checkers can be shared by both front ends and back ends if they are designed to operate on the intermediate language that a front end passes to a back end. This can let many compilers (combinations of front ends and back ends) reuse the large amounts of work that often go into code analyzers and optimizers.
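
The following minimal Python sketch, with entirely hypothetical names and a toy one-statement language, shows the shape of this two-stage design: the front end produces an intermediate representation, and an interchangeable back end turns that representation into output.

 # Front end: translate 'z = x + y' style source into a toy
 # three-address intermediate representation (IR).
 def front_end(source):
     target, expr = [s.strip() for s in source.split('=')]
     a, op, b = expr.split()
     return [(op, target, a, b)]
 
 # Back end: emit pseudo-assembly text from the IR.  A second back end
 # for a different target could consume exactly the same IR.
 def back_end(ir):
     mnemonics = {'+': 'ADD', '-': 'SUB', '*': 'MUL'}
     return '\n'.join('%s %s, %s, %s' % (mnemonics[op], dst, a, b)
                      for op, dst, a, b in ir)
 
 print(back_end(front_end('z = x + y')))   # ADD z, x, y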

Certain languages can be compiled in a single pass because their design requires variables and other objects to be declared, and executable procedures to be predeclared, before they are referenced or used. The [[Pascal programming language]] is well known for this capability, and in fact many Pascal compilers are themselves written in Pascal, thanks to the rigid specification of the language and the possibility of compiling Pascal programs in a single pass.

== Compiler front end ==
The compiler front end consists of multiple phases itself, each informed by [[formal language]] theory:

# [[Lexical analysis]] - breaking the source code text into small pieces ('tokens' or 'terminals'), each representing a single atomic unit of the language, for instance a [[keyword]], [[identifier]] or [[symbol|symbol name]]. The token language is typically a [[regular language]], so a [[finite state automaton]] constructed from a [[regular expression]] can be used to recognize it. This phase is also called lexing or scanning; a minimal lexer sketch appears after this list.
# [[Syntax analysis]] - identifying the syntactic structure of the source code. This phase focuses on structure alone: it recognizes how the sequence of tokens combines into the hierarchical constructs of the language. This phase is also called parsing.
# [[Semantic analysis]] - determining the ''meaning'' of the program code and beginning to prepare for output. Type checking is performed in this phase, and it is where most compiler errors show up.
# [[Intermediate language generation]] - an equivalent to the original program is created in an intermediate language.
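
As a small illustration of phase 1 above, the following Python sketch builds a lexer from one regular expression per token class. The token names and patterns are invented for this example, and unrecognized characters are simply skipped rather than reported.

 import re
 
 # One regular expression per token class, combined into a single
 # master pattern with named groups.
 TOKEN_SPEC = [
     ('NUMBER', r'\d+'),
     ('IDENT',  r'[A-Za-z_]\w*'),
     ('OP',     r'[+\-*/=]'),
     ('SKIP',   r'\s+'),
 ]
 MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))
 
 def tokenize(text):
     # Yield (kind, value) tokens, discarding whitespace.
     for match in MASTER.finditer(text):
         if match.lastgroup != 'SKIP':
             yield (match.lastgroup, match.group())
 
 print(list(tokenize('count = count + 1')))
 # [('IDENT', 'count'), ('OP', '='), ('IDENT', 'count'),
 #  ('OP', '+'), ('NUMBER', '1')]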

== Compiler back end ==
While there are applications where only the compiler front end is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the front end to the back end, which produces a functionally equivalent program in the output language. This is done in multiple steps:
# [[Compiler Analysis]] - the process of gathering program information from the intermediate representation of the input source files. Typical analyses are define-use and [[use-define chain]]s, [[data dependence analysis]], [[alias analysis]], etc. Accurate analysis is the basis for any compiler optimization. The [[call graph]] and [[control flow graph]] are usually also built during the analysis phase.
# [[Compiler optimization|Optimization]] - the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are [[inline expansion]], [[dead code elimination]], [[constant propagation]], [[loop transformation]], [[register allocation]] and even [[auto parallelization]]; a small constant-folding sketch appears after this list.
# [[Code generation]] - the transformed intermediate language is translated into the output language, usually the native [[machine language]] of the system. This involves resource and storage decisions, such as deciding which variables to fit into [[registers]] and [[computer storage|memory]] and the selection and scheduling of appropriate [[machine instruction]]s along with their associated [[addressing mode]]s (see also [[Sethi-Ullman algorithm]]).
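
As a small illustration of the optimization step, the following Python sketch performs constant folding over a toy three-address intermediate representation. The IR format is invented for this example, and a real pass would also propagate folded constants into later instructions.

 # Constant folding: replace operations whose operands are both literal
 # numbers with a direct assignment of the computed constant.
 def constant_fold(ir):
     folded = []
     for dest, op, a, b in ir:
         if isinstance(a, int) and isinstance(b, int) and op in ('+', '*'):
             value = a + b if op == '+' else a * b
             folded.append((dest, 'const', value, None))
         else:
             folded.append((dest, op, a, b))
     return folded
 
 before = [('t1', '+', 2, 3), ('t2', '*', 't1', 4)]
 print(constant_fold(before))
 # [('t1', 'const', 5, None), ('t2', '*', 't1', 4)]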

== Notes ==
# A ''pass'' has also been known as a ''parse'' in some textbooks. The idea is that the source code is ''parsed'' by gradual, iterative refinement to produce the completely translated object code at the end of the process. There is, however, some dispute over the general use of ''parse'' for all those phases (passes), since some of them, e.g. object code generation, are arguably not regarded to be parsing as such.

== References ==
*''[[Compilers: Principles, Techniques and Tools]]'' by [[Alfred V. Aho]], [[Ravi Sethi]], and [[Jeffrey D. Ullman]] (ISBN 0201100886) is considered to be the standard authority on compiler basics, and makes a good primer for the techniques mentioned above. (It is often called the ''Dragon Book'' because of the picture on its cover showing a Knight of Programming fighting the Dragon of Compiler Design.) [http://www.aw.com/catalog/academic/product/0,4096,0201100886,00.html External link to publisher's catalog entry]

*''[[Understanding and Writing Compilers: A Do It Yourself Guide]]'' (ISBN 0333217322) by [[Richard Bornat]] is an unusually helpful book, being one of the few that adequately explains the recursive generation of machine instructions from a parse-tree. Having learnt his subject in the early days of mainframes and minicomputers, the author has many useful insights that more recent books often fail to convey.

==See also==
* [[List of important publications in computer science#Compilers| Important publications in compilers]] for [[programming language]]s
*[[assembler]]s
*Interpreters:
**[[interpreter (computer software)|interpreter software]]
**[[abstract interpretation]]
*[[linker]]s
*[[parsing]]:
**[[Top-down parsing]]
**[[Bottom-up parsing]]
**[[Semantic analysis]]
***[[attribute grammar]]
*[[Semantics encoding]]
*[[error avalanche]]
*[[recompilation]]
*[[decompiler]]
*[[Just-in-time compiler]]
*[[Jello compiler|Jello]]
*[[Loop nest optimization]]
*[[Meta-Compilation]]
*[[preprocessor]]
*[[parallel compilers]]

==External links==
*[http://codepedia.com/compile What is "compile"?] from the developer's encyclopedia
<!-- delink dev-pedia for the time being -->
*[http://www.kegel.com/crosstool/ Building and Testing gcc/glibc cross toolchains]
*[http://citeseer.org/cs?q=compiler Citations from CiteSeer]
*[http://compilers.iecc.com/index.html The comp.compilers newsgroup and RSS feed]
*[http://compilers.iecc.com/crenshaw/ ''Let's Build a Compiler'' by Jack Crenshaw (1988 to 1995)] "a non-technical introduction to compiler construction"
*[http://www.gtoal.com/software/CompilersOneOhOne Simple compiler source] from the "[http://groups.yahoo.com/group/compilers101/ Compilers 101]" group. One page, easy to follow.
*[http://www.tutorial-reports.com/computer-science/parallel-compiler/ Parallel Compilers]

[[Category:Computer programming tools]][[Category:Computer terminology]]
[[Category:Computer science]][[Category:Compilers|*]]

[[af:Vertalerkonstruksie]]
[[bg:Компилатор]]
[[ca:Compilador]]
[[cs:Překladač]]
[[de:Compiler]]
[[et:Kompilaator]]
[[es:Compilador]]
[[fr:Compilateur]]
[[ko:컴파일러]]
[[it:Compilatore]]
[[he:מהדר]]
[[lt:Kompiliatorius]]
[[hu:Fordítóprogram]]
[[nl:Compiler]]
[[ja:コンパイラ]]
[[no:Kompilator]]
[[pl:Kompilator]]
[[pt:Compilador]]
[[ru:Транслятор]]
[[simple:Compiler]]
[[fi:Ohjelmointikielen kääntäjä]]
[[sv:Kompilator]]
[[th:ตัวแปรโปรแกรม]]
[[tr:Derleyici]]
[[zh:编译器]]
