Lisp (programming language): Difference between revisions

Revision as of 00:58, 6 September 2004

Lisp is a family of computer programming languages with a long history. Developed first as an abstract notation for recursive functions, it later became the favored language of artificial intelligence research during the field's heyday in the 1970s and 1980s. Lisp languages are today used in a number of fields, from Web development to finance [1], and are also common in computer science education.

The name Lisp derives from "List Processing". Linked lists are one of Lisp languages' major data structures, and the same basic list operations work in all Lisp dialects. Other commonalities in Lisp dialects include dynamic typing, support for functional programming, and the ability to manipulate source code as data.

Lisp languages also have an instantly recognizable appearance. Program code is written using the same syntax as lists -- the parenthesized S-expression syntax. Every sub-expression in a program (or data structure) is set off with parentheses. This makes Lisp languages easy to parse, and also makes metaprogramming simple -- that is, writing programs which write other programs. This was a major reason for Lisp's popularity in the 1970s and 1980s: artificial intelligence programmers believed that Lisp would lend itself naturally to self-propagating programs.

Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older. Like Fortran, Lisp has changed a great deal since its early days, and a number of dialects have existed over its history. Today, the most widely-known Lisp dialects for general-purpose programming are Common Lisp and Scheme.

History

Information Processing Language was the first AI language, from 1955 or 1956, and already included many of the concepts, such as list-processing and recursion, which came to be used in Lisp.

Lisp was invented by John McCarthy in 1958 while he was at MIT. McCarthy published its design in a paper in Communications of the ACM in 1960, entitled "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I". (Part II was never published.) He showed that with a few simple operators and a notation for functions (see minimal lisp below) one can build a whole programming language.

Lisp was originally implemented by Steve Russell on an IBM 704 computer, and two instructions on that machine became the primitive operations for decomposing lists: car (Contents of Address Register) and cdr (Contents of Decrement Register). Most dialects of LISP still use car and cdr for the operations that return the first item in a list and the rest of the list respectively.

Because of its expressiveness and flexibility, Lisp became popular with the artificial intelligence community. However, Lisp had its downsides as well: programs generate a large amount of intermediate output, which takes up memory and has to be garbage collected. This made it difficult to run Lisp on the memory-limited stock hardware of the day. In the 1970s, an increasing user community and generous government funding led to the creation of LISP machines: dedicated hardware for running Lisp environments and programs. Modern compiler construction techniques, together with today's computer capacities (gigantic by the standards of the 1970s), have made this specialization unnecessary, and quite efficient Lisp environments now exist.

During the 1980s and 1990s, a great effort was made to unify the numerous Lisp dialects into a single language. The new language, Common Lisp, was essentially a superset of the dialects it replaced. In 1994, ANSI published the Common Lisp standard, "ANSI X3.226-1994 Information Technology Programming Language Common Lisp." By this time the world market for Lisp was much smaller than in its heyday.

The language is amongst the oldest programming languages still in use as of the time of writing in 2003. Algol, Fortran and COBOL are of a similar vintage, and Fortran and COBOL are also still being used.

The if-then-else structure, now taken for granted as an essential element of any programming language, was invented by McCarthy for use in Lisp, where it first appeared. It was inherited by Algol, which popularized it.

Syntax

Lisp is an expression-oriented language. Unlike most other languages, no distinction is made between "expressions" and "statements"; all code and data are written as expressions. When an expression is evaluated, it produces a value (or list of values), which then can be embedded into other expressions.

McCarthy's 1960 paper introduced two types of syntax: S-expressions (Symbolic Expressions), which are also called sexps, and M-expressions (Meta Expressions), which express functions of S-expressions. M-expressions never found favour, and almost all LISPs today use S-expressions to manipulate both code and data.

The heavy use of parentheses in S-expressions has been criticized -- one joke acronym for Lisp is "Lots of Irritating Superfluous Parentheses" -- but the S-expression syntax is also responsible for much of Lisp's power: the syntax is extremely regular, which facilitates manipulation by computer.

The reliance on expressions gives the language great flexibility. Because Lisp functions are themselves written as lists, they can be processed exactly like data: programs can easily be written to manipulate other programs. This is known as metaprogramming. Many Lisp dialects exploit this feature using macro systems, which make it possible to extend the language almost without limit.

A Lisp list is written with its elements separated by whitespace, and delimited by parentheses. For example,

(1 2 "foo")

is a list whose elements have the values 1, 2, and "foo". These values are implicitly typed: they are respectively two integers and a string, and do not have to be declared as such. The empty list () is also represented as nil.

Expressions are written as lists, using prefix notation. The first element in the list is the name of a form, i.e. a function, operator, macro, or "special form" (see below). The remaining elements of the list are the arguments. For example, the function list returns its arguments as a list, so the expression

(list 1 2 "foo")

evaluates to the list (1 2 "foo"). If any of the arguments are expressions, they are recursively evaluated before the enclosing expression is evaluated. For example,

(list 1 2 (list 3 4))

evaluates to the list (1 2 (3 4)). Note that the third argument is a list; lists can be nested.

Arithmetic operators are treated similarly. The expression

(+ 1 2 3 4)

evaluates to 10. The equivalent under infix notation would be "1+2+3+4".

"Special forms" provide LISP's control structure. For example, the special form if takes three arguments. If the first argument is non-nil, it evaluates to the second argument; otherwise, it evaluates to the third argument. Thus, the expression

(if nil
    (list 1 2 "foo")
    (list 3 4 "bar"))

evaluates to (3 4 "bar"). (Of course, this would be more useful if a non-trivial expression had been substituted in place of nil!)
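Because if is itself an expression, its value can be returned from a function or embedded in a larger expression. The following is a small sketch in Common Lisp; the function name sign-label is made up for illustration.

```lisp
;; SIGN-LABEL is a hypothetical name used only for this illustration.
(defun sign-label (n)
  "Return a string describing whether N is negative."
  (if (< n 0)
      "negative"
      "non-negative"))

(sign-label -3)            ; => "negative"
(+ 1 (if (> 2 1) 10 20))   ; => 11, the value of IF feeds the addition
```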

Another such form, defun (implemented as a macro in Common Lisp), is used to define functions. The arguments to defun are the function name, a list of parameters, and the expression that the function evaluates to.
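As a minimal sketch of such a definition in Common Lisp (the name square is an assumption chosen for illustration):

```lisp
;; SQUARE is a hypothetical example function, not taken from the text.
(defun square (x)
  "Return X multiplied by itself."
  (* x x))

(square 5)   ; => 25
```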

Pairs and lists

A Lisp list is a singly-linked list. Each cell of this list is called a cons (or sometimes a pair) and is composed of two pointers, called the car and cdr. These names are today arbitrary, but stem from the assembly-code operations for dereferencing pointers on a 1950s-era IBM computer. ("CAR" was an acronym for "Contents of Address Register", and "CDR" for "Contents of Decrement Register".) Today, "car" and "cdr" should be taken as simply shorthand for "first pointer of a cons" and "second pointer of a cons". They are equivalent to the data and next fields discussed in the article linked list.

Of the many data structures that can be built out of singly-linked lists, one of the most basic is called a proper list. A proper list is either the special NIL (empty list) symbol, or a pair in which the car points to a datum (which may be another pair structure, such as a list), and the cdr points to another proper list.

If a given pair is taken to be the head of a linked list, then its car points to the first element of the list, and its cdr points to the rest of the list. For this reason, the car and cdr functions are also called first and rest when referring to conses which are part of a linked list (rather than, say, a tree).

Thus, a Lisp list is not an atomic object, as an instance of a container class in C++ or Java would be. A list is nothing more than an aggregate of linked conses. A variable which refers to a given list is simply a pointer to the first cons in the list. Traversal of a list is usually done by "cdring down" the list; that is, taking successive cdrs to visit each cons of the list.
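"Cdring down" a list can be sketched as follows in Common Lisp. The function name my-length is hypothetical (the built-in length already does this), and the sketch assumes its argument is a proper list.

```lisp
;; MY-LENGTH is a hypothetical name; the built-in LENGTH does the same job.
(defun my-length (lst)
  "Count the conses of the proper list LST by taking successive cdrs."
  (do ((cell lst (cdr cell))   ; step to the next cons each time round
       (n 0 (1+ n)))           ; count the conses visited so far
      ((null cell) n)))        ; stop at NIL, the end of a proper list

(my-length '(a b c))   ; => 3
```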

Parenthesized S-expressions represent linked list structure. There are several ways to represent the same list as an S-expression. A cons can be written in dotted-pair notation as (a . b), where a is the car and b the cdr. A longer proper list might be written (a . (b . (c . (d . NIL)))) in dotted-pair notation. This is conventionally abbreviated as (a b c d) in list notation. An improper list may be written in a combination of the two -- as (a b c . d) for the list of three conses whose last cdr is d.
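The equivalence of these notations can be checked at a Common Lisp prompt. This is only an illustrative sketch using the standard functions cons, equal, cdr and cddr:

```lisp
;; Building the list cons by cons gives the same structure as list notation.
(equal (cons 'a (cons 'b (cons 'c (cons 'd nil))))
       '(a b c d))                       ; => T

;; The reader treats dotted-pair and list notation as the same structure.
(equal '(a . (b . (c . (d . nil))))
       '(a b c d))                       ; => T

;; In the improper list, the cdr of the third cons is D rather than NIL.
(cdr (cddr '(a b c . d)))                ; => D
```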

Because conses and lists are so universal in Lisp systems, it is a common misconception that they are Lisp's only data structure. In fact, all but the most simplistic Lisps have other data structures -- such as vectors (arrays), hash tables, structures, and so forth.
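For instance, Common Lisp provides vectors and hash tables as built-in types. The sketch below uses the standard functions vector, aref, make-hash-table and gethash; the function name container-demo is made up for illustration.

```lisp
;; CONTAINER-DEMO is a hypothetical name used only for this illustration.
(defun container-demo ()
  "Return an element of a vector and a value looked up in a hash table."
  (let ((v (vector 10 20 30))
        (h (make-hash-table :test #'equal)))   ; EQUAL test allows string keys
    (setf (gethash "one" h) 1)                 ; store a key/value pair
    (list (aref v 1)                           ; constant-time indexed access
          (gethash "one" h))))

(container-demo)   ; => (20 1)
```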

Shared structure

Lisp lists, being simple linked lists, can share structure with one another. That is to say, two lists can have the same tail, or final sequence of conses. For instance, after the execution of the following Common Lisp code --

(setq foo (list 'a 'b 'c))
(setq bar (cons 'x (cdr foo)))

-- the lists foo and bar are (a b c) and (x b c) respectively. However, the tail (b c) is the same structure in both lists. Altering it, such as by replacing the c with a goose, will affect both lists.

Sharing structure rather than copying can be a dramatic performance improvement. However, it means that Lisp functions can alter lists passed to them as arguments. This can be a source of bugs, and functions which alter their arguments are documented as destructive for this very reason.
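The effect of altering a shared tail can be sketched in Common Lisp as follows. The function name shared-tail-demo is hypothetical, and local variables are used in place of top-level setq.

```lisp
;; SHARED-TAIL-DEMO is a hypothetical name used only for this illustration.
(defun shared-tail-demo ()
  "Show that a destructive change to a shared tail is visible in both lists."
  (let* ((foo (list 'a 'b 'c))
         (bar (cons 'x (cdr foo))))    ; BAR reuses the conses of FOO's tail
    (setf (third foo) 'goose)          ; destructively replace C in the tail
    (list foo
          bar
          (eq (cdr foo) (cdr bar)))))  ; same conses, not merely equal ones

(shared-tail-demo)   ; => ((A B GOOSE) (X B GOOSE) T)
```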

Aficionados of functional programming avoid destructive functions. In the Scheme dialect, which favors the functional style, the names of destructive functions are marked with a cautionary exclamation point, or "bang" -- such as set-car! (read set car bang), which replaces the car of a cons. In the Common Lisp dialect, destructive functions are commonplace; in fact, the language includes a special facility, setf, to make it easier to define and use them. A frequent style in Common Lisp is to write code functionally (without destructive calls) when prototyping, then to add destructive calls as an optimization where it is safe to do so.
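As a rough illustration of setf in Common Lisp, the same operator assigns to several kinds of "place"; (setf (car cell) ...) plays the role of Scheme's set-car!. The function name setf-demo is made up for this sketch.

```lisp
;; SETF-DEMO is a hypothetical name used only for this illustration.
(defun setf-demo ()
  "Destructively update a cons and a vector through SETF."
  (let ((cell (cons 1 2))
        (v (vector 0 0)))
    (setf (car cell) 'first)    ; replace the car of the cons
    (setf (aref v 1) 'second)   ; replace element 1 of the vector
    (list (car cell) (cdr cell) (aref v 1))))

(setf-demo)   ; => (FIRST 2 SECOND)
```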

Self-evaluating forms and quoting

Lisp evaluates expressions which are entered by the user. Most expressions evaluate to some other (usually, simpler) expression -- for instance, a variable evaluates to its value; (+ 2 3) evaluates to 5. However, some expressions evaluate to themselves. They are parsed by the read function, but are left alone by eval. Numbers and strings are this way: if you enter 5 into Lisp, you just get back the same 5.

Other expressions -- such as lists and symbols -- can also be marked to prevent them from being evaluated. This is the role of the quote special form, or its abbreviation ' (a single quotation mark). For instance, usually if you enter the symbol foo you will get back the value of that variable -- or an error, if there is no such binding. If you wish to refer to the symbol itself, you enter (quote foo) or, usually, 'foo.
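A short sketch of the difference between quoted and unquoted forms, as seen at a Common Lisp prompt:

```lisp
(quote foo)   ; => FOO, the symbol itself rather than a variable's value
'foo          ; => FOO, the abbreviated spelling of the same thing
'(+ 1 2)      ; => (+ 1 2), a three-element list that is not evaluated
(+ 1 2)       ; => 3, the same list evaluated as code
```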

More complex forms of quoting are used with macros. For instance, both Common Lisp and Scheme support the backquote or quasiquote, entered with the ` character. This is almost the same as the plain quote, except it allows variables to be interpolated into a quoted list.
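A minimal sketch of the backquote in Common Lisp follows. Inside a backquoted list, a comma inserts the value of an expression and comma-at (,@) splices in the elements of a list; the function names describe-point and wrap-progn are assumptions made up for illustration.

```lisp
;; DESCRIBE-POINT and WRAP-PROGN are hypothetical names for illustration.
(defun describe-point (x y)
  "Build a list in which only the comma-marked parts are evaluated."
  `(point :x ,x :y ,y))

(defun wrap-progn (forms)
  "Splice the list FORMS into a PROGN form using comma-at."
  `(progn ,@forms))

(describe-point 1 2)                   ; => (POINT :X 1 :Y 2)
(wrap-progn '((print 1) (print 2)))    ; => (PROGN (PRINT 1) (PRINT 2))
```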

Self-evaluating forms and quoted forms are Lisp's equivalent of literals. However, they are not necessarily constants. In some Lisp dialects it is possible to modify the values of literals in program code. For instance, if a quoted form is used in the body of a function, and is changed as a side-effect, that function's behavior may differ on subsequent calls. This is usually a bug, and is undefined behavior in some dialects. When behavior like this is intentional, using a closure is the explicit way to do it.
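A closure that keeps state between calls might look like the following Common Lisp sketch; the name make-counter is hypothetical.

```lisp
;; MAKE-COUNTER is a hypothetical name used only for this illustration.
(defun make-counter ()
  "Return a function that yields 1, 2, 3, ... on successive calls."
  (let ((count 0))              ; state captured by the closure
    (lambda ()
      (incf count))))           ; each call increments and returns COUNT

(let ((counter (make-counter)))
  (funcall counter)             ; => 1
  (funcall counter))            ; => 2
```

The state lives in a variable rather than in a literal, so each call to make-counter produces an independent counter.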

List structure of program code

A fundamental distinction between Lisp and other languages is that in Lisp, program code is not simply text. Parenthesized S-expressions, as depicted above, are the printed representation of Lisp code, but as soon as these are entered into a Lisp system they are translated by the parser (called the READ function) into linked list and tree structures in memory. Lisp macros operate on these structures, not on the program text. In contrast, in most other languages the parser's output is purely internal to the language implementation and cannot be manipulated by the programmer. Macros in C, for instance, operate on the level of the preprocessor, before the parser is invoked, and cannot re-structure the program code in the way Lisp macros can.
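To illustrate, here is a sketch of a Common Lisp macro that builds its expansion with ordinary list operations. The name my-unless is hypothetical (Common Lisp already provides unless); macroexpand-1 is the standard function for inspecting an expansion.

```lisp
;; MY-UNLESS is a hypothetical re-implementation of the built-in UNLESS.
(defmacro my-unless (test &body body)
  "Expand into an IF that runs BODY only when TEST is false."
  ;; TEST and BODY arrive as unevaluated list structure, so the
  ;; expansion is assembled with the ordinary functions LIST and CONS.
  (list 'if test nil (cons 'progn body)))

(macroexpand-1 '(my-unless (> 1 2) (print "ok") 42))
;; => (IF (> 1 2) NIL (PROGN (PRINT "ok") 42))

(my-unless (> 1 2) 42)   ; => 42
```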

In simplistic Lisp implementations, this list structure is directly interpreted to run the program; a function is literally a piece of list structure which is traversed by the interpreter in executing it. However, most actual Lisp systems (including all conforming Common Lisp systems) also include a compiler which transforms such functions into machine code.

Evaluation and the REPL

Lisp languages are frequently used with an interactive command line, which may be combined with an integrated development environment. The user types in expressions at the command line, or directs the IDE to transmit them to the Lisp system. Lisp reads the entered expressions, evaluates them, and prints the result. For this reason, the Lisp command line is called a "read-eval-print loop", or REPL.

The basic operation of the REPL is as follows. This is a simplistic description which omits many elements of a real Lisp, such as quoting and macros.

The read function accepts textual S-expressions as input, and parses them into list structure. For instance, if you type the string (+ 1 2) at the prompt, read translates this into a linked list with three elements -- the symbol +, the number 1, and the number 2. It so happens that this list is also a valid piece of Lisp code; that is, it can be evaluated. This is because the car of the list names a function -- the addition operation -- and the cdr (1 2) is a valid list of arguments to that function.
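As a rough illustration of the read step, the standard Common Lisp function read-from-string parses text into list structure, which can then be taken apart with car and cdr. The function name parse-demo is made up for this sketch.

```lisp
;; PARSE-DEMO is a hypothetical name used only for this illustration.
(defun parse-demo (text)
  "Parse TEXT as one S-expression; return its car and cdr in a list."
  (let ((form (read-from-string text)))   ; text -> list structure
    (list (car form) (cdr form))))

(parse-demo "(+ 1 2)")   ; => (+ (1 2))
```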

The eval function evaluates list structure, returning some other piece of structure as a result. Evaluation does not have to mean interpretation; some Lisp systems compile every expression to native machine code. It is simple, however, to describe evaluation as interpretation: To evaluate a list whose car names a function, eval first evaluates each of the arguments given in its cdr, then applies the function to the arguments. In this case, the function is addition, and applying it to the argument-list (1 2) yields the answer 3. This is the result of the evaluation.
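The evaluation rule just described can be sketched as a toy interpreter in Common Lisp. This is a deliberately tiny illustration, not how any real eval is implemented: the name tiny-eval is hypothetical, and it handles only numbers and nested calls to + and *.

```lisp
;; TINY-EVAL is a hypothetical toy evaluator, far simpler than a real EVAL.
(defun tiny-eval (form)
  "Evaluate numbers and nested (+ ...) or (* ...) forms only."
  (cond ((numberp form) form)                          ; self-evaluating
        ((consp form)
         ;; Evaluate each argument in the cdr first, then apply the
         ;; function named by the car to the resulting values.
         (let ((args (mapcar #'tiny-eval (cdr form))))
           (ecase (car form)
             (+ (apply #'+ args))
             (* (apply #'* args)))))
        (t (error "Cannot evaluate ~S" form))))

(tiny-eval '(+ 1 2))         ; => 3
(tiny-eval '(+ 1 (* 2 3)))   ; => 7
```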

It is the job of the print function to represent output to the user. For a simple result such as 3 this is trivial. An expression which evaluated to a piece of list structure would require that print traverse the list and print it out as an S-expression.

To implement a Lisp REPL, it is necessary only to implement these three functions and an infinite-loop function. (Naturally, the implementation of eval will be complicated, since it must also implement all the primitive functions like car and + and special forms like if.) This done, a basic REPL itself is but a single line of code: (loop (print (eval (read)))).

Example programs

Here are some examples of Lisp code. While not typical of Lisp programs used in industry, they are typical of Lisp as it is usually taught in computer science courses.

As the reader may have noticed from the above discussion, Lisp syntax lends itself naturally to recursion. Mathematical problems such as the enumeration of recursively-defined sets are simple to express in this notation. This function evaluates to the factorial of its argument:

(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))

This is an alternative function, which is more efficient in some Lisp systems because it uses tail recursion:

(defun factorial (n &optional (acc 1))
  (if (<= n 1)
      acc
      (factorial (- n 1) (* acc n))))

Here's a contrasting iterative version using Common Lisp's loop macro:

(defun factorial (n)
  ;; Accumulate the product in fac; starting from 1 makes (factorial 0) return 1.
  (loop with fac = 1
        for i from 1 to n
        do (setf fac (* fac i))
        finally (return fac)))

The following function takes a list argument and evaluates to the reverse of the list. (Lisp actually has a built-in reverse function which does the same thing.)

;; Named my-reverse so as not to redefine the built-in reverse.
(defun my-reverse (l &optional acc)
  (if (atom l)
      acc
      (my-reverse (cdr l) (cons (car l) acc))))

Object systems

Various object systems and models have been built on top of, alongside, or into Lisp, including:

Flavors, built at MIT
The Common Lisp Object System, CLOS (descended from Flavors)

CLOS features multiple inheritance, multiple dispatch ("multimethods"), and a powerful system of "method combinations". In fact, Common Lisp, which includes CLOS, was the first object-oriented language to be officially standardized.
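Multiple dispatch can be sketched in Common Lisp as follows. The class names and the generic function meet are made up for illustration; the point is that the method is chosen by the classes of both arguments, not only the first.

```lisp
;; ANIMAL, DOG, CAT and MEET are hypothetical names for illustration.
(defclass animal () ())
(defclass dog (animal) ())
(defclass cat (animal) ())

(defgeneric meet (a b)
  (:documentation "Describe what happens when A meets B."))

;; Fallback method, applicable to any pair of animals.
(defmethod meet ((a animal) (b animal))
  :ignore)

;; More specific method, selected only for a dog meeting a cat.
(defmethod meet ((a dog) (b cat))
  :chase)

(meet (make-instance 'dog) (make-instance 'cat))   ; => :CHASE
(meet (make-instance 'cat) (make-instance 'dog))   ; => :IGNORE
```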

Implementation

There have been many implementations of Lisp languages. Very early Lisps were implemented as interpreters, although native-code compilation was also an early feature. During the 1980s, a number of firms produced Lisp machines -- computers dedicated to running Lisp, whose operating system and hardware were customized for Lisp. Almost all modern Common Lisp systems are native-code compilers, while most Scheme systems are interpreters. A Lisp system may be implemented using a SECD virtual machine.

Genealogy and Variants

Over its almost fifty-year history, Lisp has spawned many variations on the core theme of an S-expression language. Moreover, each given dialect may have several implementations -- for instance, there are more than a dozen implementations of Common Lisp.

Differences between dialects may be quite significant -- for instance, Common Lisp and Scheme do not even use the same keyword to define functions! Within a dialect that is standardized, however, conformant implementations support the same core language, but with different extensions and libraries.

(Note: The following list is a mix of dialects and implementations, and is far from complete and not in chronological order!)

Lisp -- McCarthy's original version, developed at MIT.
Common Lisp -- descended mainly from ZetaLISP and Franz, with some InterLISP input. Prevailing standard for industrial use today.
MacLisp -- developed for MIT's Project MAC (no relation to Apple's Macintosh, or to McCarthy), direct descendant of LISP.
ZetaLisp -- used on the Lisp machines, direct descendant of MacLisp.
InterLisp -- developed at BBN (as BBN LISP), later adopted as a "west coast" Lisp for the Xerox Lisp machines. A small version called "InterLISP 65" was published for Atari's 6502-based computer line.
Franz Lisp -- originally a Berkeley project; later run by Franz, Inc.
Gold Hill Common Lisp -- an early PC implementation of Common Lisp.
Coral Lisp -- an implementation of LISP for the Macintosh.
Scheme -- a minimalist LISP originally designed for teaching; an early user of lexical variable scoping rather than dynamic scoping.
AutoLISP/Visual LISP -- customization language for the AutoCAD product.
Emacs Lisp -- scripting language for the Emacs editor.
Oaklisp -- an object-oriented dialect of Scheme with first-class classes.
Guile -- a GNU implementation of Scheme designed for extension systems.
Cambridge Lisp -- originally implemented on IBM mainframes; published by Metacomco for the Amiga.
the Knowledge Representation System
Lispkit Lisp -- a purely functional ("pure Lisp") dialect implemented on a virtual machine (the SECD machine) and used as a testbed for experimentation with functional language concepts.
Symmetric Lisp -- A parallel Lisp in which environments are first-class objects. It is implemented in Common Lisp.
STING -- A parallel dialect of Scheme intended to serve as a high-level operating system for symbolic programming languages. Features include first-class threads and processors and customisable scheduling policies.
*LISP (STARLISP) -- A data-parallel extension of Common LISP for the Connection Machine, uses "pvars".

External links

http://lisp.org -- Association of Lisp Users
http://alu.cliki.net/ -- Association of Lisp Users Wiki, a general discussion of things Lispish
http://www.cliki.net/ -- CLiki, a wiki about free software in Common Lisp.
http://www.cons.org/ -- a collection of Lisp-related sites
http://www.gnu.org/software/gcl -- a GNU cross-platform Common Lisp implementation
an interactive LISP course
Design patterns in Lisp
Mid-Sweden University Sundsvall Common Lisp B-level course, Notes from the lectures, spring of 1997 (PDF document format)
http://www.gigamonkeys.com/book/ -- A book on teaching yourself Lisp.
http://www.lisp.org/table/systems.htm -- A list of Common Lisp implementations.
