
Programming language


This is an old revision of this page, as edited by Kuciwalker (talk | contribs) at 00:26, 8 October 2005 (Classifications of programming languages). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


A programming language or computer language is a standardized communication technique for expressing instructions to a computer. It is a set of syntactic and semantic rules used to define computer programs. A language enables a programmer to specify precisely (but see Genetic Programming) what data a computer will act upon, how these data will be stored or transmitted, and what actions will be taken under various circumstances.

Features of a programming language

Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.

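These specifications usually cover the topics described in the sections below: data types, data structures, instruction and control flow, and design philosophy.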

Most languages that are widely used, or have been used for a considerable period of time, have standardization bodies that meet regularly to create and publish formal definitions of the language and to discuss extending or supplementing the existing definitions.

Data types

Internally, all data in modern digital computers are stored simply as zeros and ones (binary). The data typically represent information in the real world, such as names, bank accounts and measurements, so programming languages organize the low-level binary data into these high-level concepts.
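As a small illustration of the same bits carrying different meanings, the following Python sketch reads two bytes first as text and then as a number. The byte values and variable names are chosen here purely for illustration.

```python
# Two bytes as they might sit in memory: 01001000 01101001.
raw = bytes([0b01001000, 0b01101001])

# Interpreted as ASCII text, the bytes spell a short word.
as_text = raw.decode("ascii")

# Interpreted as one big-endian unsigned integer, the same bytes are a number.
as_number = int.from_bytes(raw, byteorder="big")

print(as_text)
print(as_number)
```

Which interpretation applies is decided by the type the program assigns to the data, not by the bits themselves.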

The particular system by which data are organized in a program is the type system of the programming language; the design and study of type systems is known as type theory. Languages can be classified as statically typed or dynamically typed. Statically typed languages can be further subdivided into languages with manifest types, where each variable and function declaration has its type explicitly declared, and type-inferred languages. It is possible to perform type inference on programs written in a dynamically typed language, but it is equally possible to write programs in these languages that make type inference infeasible. Dynamically typed languages are sometimes called latently typed.

With statically typed languages, there usually are pre-defined types for individual pieces of data (such as numbers within a certain range, strings of letters, etc.), and programmatically named values (variables) can have only one fixed type and allow only certain operations: numbers cannot change into names and vice versa. Most mainstream statically typed languages, such as C, C++, C#, Java and Delphi, require all types to be specified explicitly; advocates argue that this makes the program easier to understand, while detractors object to the verbosity it produces. Type inference is a mechanism whereby the type specifications can often be omitted completely, if it is possible for the compiler to infer the types of values from the contexts in which they are used. For example, if a variable is assigned the value 1, a type-inferring compiler does not need to be told explicitly that the variable is an integer. Type-inferred languages can be more flexible to use, particularly when they also implement parametric polymorphism. Examples of type-inferring languages are Haskell and ML. There are, however, many different uses for integers; it might, for example, make sense in a program to prevent inadvertently adding a phone number to the number of apples in a box. Some languages, such as Ada, therefore allow the definition of different kinds of mutually incompatible integers; this is called strong typing.
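The idea of incompatible kinds of integers can be sketched in Python, with the important caveat that the check below happens at run time, whereas Ada would reject the program at compile time. The class names Quantity, AppleCount and PhoneNumber are hypothetical and exist only for this example.

```python
class Quantity:
    """An integer value tagged with the kind of thing it represents."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Only values of exactly the same kind may be added.
        if type(other) is not type(self):
            return NotImplemented
        return type(self)(self.value + other.value)


class AppleCount(Quantity):
    pass


class PhoneNumber(Quantity):
    pass


total = AppleCount(3) + AppleCount(4)    # allowed: same kind

try:
    AppleCount(3) + PhoneNumber(5551234)
    mixed_allowed = True
except TypeError:
    mixed_allowed = False                # rejected: different kinds
```

Both kinds store an ordinary integer, yet the program cannot add one to the other by accident.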

Dynamically typed languages treat all data locations interchangeably, so inappropriate operations (like adding names, or sorting numbers alphabetically) will not cause errors until run-time -- although some implementations provide some form of static checking for obvious errors. Examples of these languages are APL, Objective-C, Lisp, Smalltalk, JavaScript, Tcl, Prolog, Python, and Ruby.
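A minimal Python sketch of this behaviour follows; the function describe and its arguments are invented for illustration. The inappropriate operation is accepted when the function is defined and fails only if the offending line is actually executed.

```python
def describe(item, double_it):
    # Nothing here declares what type `item` must have.
    if double_it:
        return item * 2      # works for numbers and for strings
    return item + 1          # fails for strings, but only when executed


# The faulty line is never reached, so no error is reported.
ok_result = describe("ab", double_it=True)

try:
    describe("ab", double_it=False)
    failed_at_run_time = False
except TypeError:
    failed_at_run_time = True

# Sorting numbers "alphabetically" is accepted without complaint.
alphabetical = sorted(str(n) for n in [10, 9, 2])
```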

Strongly typed languages do not permit the usage of values as different types; they are rigorous about detecting incorrect type usage, either at runtime for dynamically typed languages, or at compile time for statically typed languages. Ada, Java, ML, and Oberon are examples of strongly typed languages.

Weakly typed languages do not strictly enforce type rules or have an explicit type-violation mechanism, often allowing for undefined behavior, segmentation violations, or other unsafe behavior if types are assigned incorrectly. C, assembly language, C++, and Tcl are examples of weakly typed languages.

Note that strong vs. weak is a continuum; Java is a strongly typed language relative to C, but is weakly typed relative to ML. Use of these terms is often a matter of perspective, much in the way that an assembly language programmer would consider C to be a high-level language while a Java programmer would consider C to be a low-level language.

Note that strong and static are orthogonal concepts. Java is a strongly, statically typed language. C is a weakly, statically typed language. Python is a strongly, dynamically typed language. Tcl is a weakly, dynamically typed language. Beware, however, that some people incorrectly use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed. In the latter usage, C would be called strongly typed, despite the fact that C catches relatively few type errors and that it is both trivial and common to defeat its type system (even accidentally).
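The combination of strong and dynamic typing attributed to Python above can be observed directly. This is only a sketch; the variable names are arbitrary.

```python
x = 42             # the name x is bound to an integer
x = "forty-two"    # dynamic: the same name may later be bound to a string

try:
    result = "1" + 1       # strong: no silent conversion between str and int
except TypeError:
    result = None

explicit = int("1") + 1    # a conversion must be requested explicitly
```

The first two lines show that types belong to values rather than to names; the rejected addition shows that the language still refuses to treat a value as a different type.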

Aside from when and how the correspondence between expressions and types is determined, there is also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well (see record). Functional languages often restrict names to denoting run-time computed values directly, instead of naming memory locations where values may be stored, and in some cases refuse to allow the value denoted by a name to be modified at all. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values.

Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
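Python falls into the last of these groups. The sketch below, with invented names apply_twice and make_adder, shows a procedure being passed as a parameter, created at run time, returned from an expression, and bound to a new name.

```python
def apply_twice(func, value):
    # A procedure received as a parameter.
    return func(func(value))


def make_adder(n):
    # A new procedure is created each time make_adder runs,
    # and is returned like any other value.
    def add(x):
        return x + n
    return add


add_three = make_adder(3)            # bound to a new name
result = apply_twice(add_three, 10)
print(result)
```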

Data structures

Most languages also provide ways to assemble complex data structures from built-in types and to associate names with these new combined types, using constructs such as arrays, lists, stacks, and files.
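As one possible illustration in Python, the standard dataclasses module can give names to new combined types built from built-in ones. The types Measurement and Experiment and their fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    label: str       # built-in string
    value: float     # built-in number


@dataclass
class Experiment:
    name: str
    readings: list = field(default_factory=list)   # a list of Measurement


run = Experiment("trial-1")
run.readings.append(Measurement("length", 2.5))
run.readings.append(Measurement("width", 1.5))

# A list used as a stack: the most recently added item comes off first.
last = run.readings.pop()
```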

Object-oriented languages allow the programmer to define data types, usually called classes, whose instances ("objects") have their own intrinsic functions and variables (called methods and attributes respectively). A program containing objects allows the objects to operate as independent but interacting sub-programs; this interaction can be designed at coding time to model or simulate real-life interacting objects. Languages such as Python and Ruby were developed as object-oriented (OO) languages. They are comparatively easy to learn and to use, and are gaining popularity in professional programming circles, as well as being accessible to non-professionals. It is commonly thought that object orientation makes languages more intuitive, increasing the public availability and power of customized computer applications.
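A brief Python sketch of objects with attributes and methods follows, modelling two interacting bank accounts. The class Account and its method names are assumptions made for this example.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # attribute
        self.balance = balance    # attribute

    def deposit(self, amount):    # method
        self.balance += amount

    def transfer_to(self, other, amount):
        # One object interacts with another through its methods.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        other.deposit(amount)


alice = Account("Alice", 100)
bob = Account("Bob", 20)
alice.transfer_to(bob, 30)
```

Each account keeps its own state, and the transfer is expressed as one object asking another to act.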

Instruction and control flow

Once data have been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure.

Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used for control flow (branches, definitions by cases, loops, backtracking, functional composition).
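Several of the control-flow forms listed above (branches, definition by cases, loops, functional composition) can be sketched in a few lines of Python. The helper names classify and compose are illustrative only.

```python
def classify(n):
    # Definition by cases, written with branches.
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"


def compose(f, g):
    # Functional composition: apply g first, then f.
    return lambda x: f(g(x))


counts = {}
for n in [-2, 0, 5, 7]:    # a loop lets one program cover many inputs
    kind = classify(n)
    counts[kind] = counts.get(kind, 0) + 1

classify_square = compose(classify, lambda x: x * x)
```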

Design philosophy

For the above-mentioned purposes, each language has been developed using a particular design or philosophy. Some aspect or another is particularly stressed by the way the language uses data structures, or by the way its notation encourages certain ways of solving problems or expressing their structure.

Since programming languages are artificial languages, they require a high degree of discipline to specify accurately which operations are desired. Programming languages are not error tolerant; however, the burden of recognizing and using the special vocabulary is reduced by help messages generated by the programming language implementation. A few languages offer a high degree of freedom by allowing self-modification, in which a program rewrites parts of itself to handle new cases. Typically, only machine language, Prolog, PostScript, and the members of the Lisp family (Common Lisp, Scheme) provide this capability. In the MUMPS language this technique is called dynamic recompilation; emulators and other virtual machines exploit it for greater performance.

There are a variety of ways to classify programming languages. The distinctions are not clear-cut; implementations of a particular language standard may fall into more than one class. For example, a language may have both a compiler and an interpreter.

In addition, most compilers contain some run-time interpreter features. The most notable example is the familiar I/O format string, which is written in a small, specialized language and is used to describe how to convert program data to or from an external representation. This string is typically interpreted at run time by a specialized format-language interpreter included in the run-time support libraries. Many programmers have found the flexibility of this arrangement to be very valuable.
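Python's printf-style % operator gives a convenient way to see a format string being interpreted at run time. In the sketch below the format strings are ordinary data held in a dictionary; the dictionary keys and variable names are invented for illustration.

```python
value = 3.14159

# The format strings are ordinary data, so one can be chosen at run time.
formats = {"short": "%.2f", "padded": "%8.3f", "scientific": "%.1e"}

# Each string is interpreted by the formatting routine when % is applied.
rendered = {name: fmt % value for name, fmt in formats.items()}

# Left-justified text in a field of 6, then a number in a field of 6.
row = "%-6s|%6.2f" % ("pi", value)
```

Because the format is a string rather than compiled code, a program can read it from a file or build it from user input without being recompiled.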

Compiler and interpreter

Programming languages are generally implemented by one of two kinds of program translator: a compiler or an interpreter. With a compiler, the source code is reduced to a set of machine-specific instructions before being saved as an executable file. With an interpreter, the code is saved in the same form in which it was entered and is translated each time it runs. Compiled programs generally run faster than interpreted ones because interpreted programs must be reduced to machine instructions at run time. However, an interpreter makes some things easy that are difficult or impossible with a conventional compiler. For example, interpreted programs can modify themselves by adding or changing functions at run time. It is also usually easier to develop applications in an interpreted environment because the application does not have to be recompiled each time a small section is tested.
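The ability to change a function at run time can be sketched in Python using the built-in exec, which compiles and runs source text supplied while the program is executing. The function greet and the replacement source are hypothetical, and the example makes no claim about how any particular interpreter is built.

```python
def greet(name):
    return "Hello, " + name


before = greet("Ada")

# Source text that becomes available only while the program is running.
new_source = "def greet(name):\n    return 'Good day, ' + name\n"

# exec compiles and runs the text, defining greet inside `namespace`.
namespace = {}
exec(new_source, namespace)

# Rebind the existing name to the newly created function.
greet = namespace["greet"]

after = greet("Ada")
```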

History of programming languages

The development of programming languages, unsurprisingly, follows closely the development of the physical and electronic processes used in today's computers.

Programming languages have been under development for decades and will remain so for many years to come. They began as lists of steps for wiring a computer to perform a task. These steps eventually found their way into software and began to acquire newer and better features. The first major languages were each intended for a single purpose, whereas the languages of today can be used for almost any purpose and are distinguished instead by the style of programming they support. The languages of tomorrow may become more natural with the invention of quantum and biological computers.

Charles Babbage is often credited with designing the first computer-like machines, which had several programs written for them (in the equivalent of assembly language) by Ada Lovelace.

In the 1940s the first recognizably modern, electrically powered computers were created. Military calculation needs, such as encryption, decryption, trajectory calculation and the massive number crunching needed in the development of atomic bombs, were a driving force in early computer development. Computers of that period were extremely large, slow and expensive; advances in electronic technology in the post-war years led to the construction of more practical electronic computers. At that time only Konrad Zuse imagined the use of a programming language (developed eventually as Plankalkül) like those of today for solving problems.

Subsequent breakthroughs in electronic technology (transistors and integrated circuits) drove the development of increasingly reliable and more usable computers. The first widely used high-level programming language was Fortran, developed during 1954–57 by an IBM team led by John W. Backus. It is still widely used for numerical work, with the latest international standard (Fortran 2003) released in 2004. A Computer Languages History graphic shows a timeline from Fortran in 1954.

Dennis Ritchie developed the C programming language at Bell Labs in the early 1970s, initially for the DEC PDP-11, and with Brian Kernighan wrote the book that popularized it. Later, under the lead of Bjarne Stroustrup, the programming language C++ appeared in 1985 as an object-oriented language largely backward compatible with C. Sun Microsystems released Java in 1995; it became very popular as an introductory programming language taught in universities. Microsoft announced the C# programming language in 2000; it is very similar to C++ and Java. There are many other languages (cf. List of programming languages).



Classifications of programming languages

Formal semantics

The rigorous definition of the meaning of programming languages is the subject of formal semantics.
