WordNet

From Wikipedia, the free encyclopedia
Revision as of 17:20, 5 September 2005

WordNet is a semantic lexicon for the English language. It groups English words into sets of synonyms called synsets, provides short definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD-style license and can be downloaded and used freely. The database can also be browsed online.

WordNet was created, and is maintained, at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985. Over the years, the project has received about $3 million in funding, mainly from government agencies interested in machine translation.

Database contents

As of 2005, the database contains about 150,000 words organized in over 115,000 synsets, for a total of 203,000 word-sense pairs; in compressed form, it is about 12 megabytes in size.
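
To make the arithmetic behind these figures concrete, the Python sketch below counts words, synsets and word-sense pairs in a tiny hypothetical database. The synset identifiers and memberships are illustrative assumptions, not real WordNet data. The point is that each (word, synset) membership counts as one word-sense pair, so a word that appears in several synsets is counted once as a word but several times as a pair.

```python
# Hypothetical toy database: each synset identifier maps to the words it contains.
# Identifiers and memberships are illustrative only, not real WordNet data.
toy_synsets = {
    "dog.n.01": ["dog", "domestic dog", "Canis familiaris"],
    "frump.n.01": ["frump", "dog"],
    "good.a.01": ["good", "right", "ripe"],
    "right.n.01": ["right"],
}


def database_stats(synsets):
    """Return (distinct words, synsets, word-sense pairs) for a synset table."""
    words = {w for members in synsets.values() for w in members}
    # Every (word, synset) membership is one word-sense pair, so a word
    # that appears in k synsets contributes k pairs.
    pairs = sum(len(members) for members in synsets.values())
    return len(words), len(synsets), pairs


print(database_stats(toy_synsets))  # (7, 4, 9)
```

Applied to the 2005 figures above, about 203,000 pairs over about 150,000 words works out to roughly 1.35 senses per word on average.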

WordNet distinguishes between nouns, verbs, adjectives and adverbs on the assumption that these are stored differently in the human brain. Every synset contains a group of synonymous words or collocations (a collocation is a sequence of words that go together to form a specific meaning, such as "car pool"); words typically participate in several synsets. The meaning of the synsets is further clarified with short defining glosses. A typical example synset with gloss is:

good, right, ripe -- (most suitable or right for a particular purpose; "a good time to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")
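
A synset of this kind can be modelled as a small record that holds its words, its gloss and its example sentences. The following Python sketch assumes a hypothetical `Synset` class (not part of any WordNet distribution) and reproduces the layout of the example entry above.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical minimal synset record: synonymous words, a gloss and examples."""

    words: list[str]
    definition: str
    examples: list[str] = field(default_factory=list)

    def format_entry(self) -> str:
        # Lay the entry out as: words -- (definition; "example"; "example")
        parts = [self.definition] + [f'"{ex}"' for ex in self.examples]
        return f"{', '.join(self.words)} -- ({'; '.join(parts)})"


good = Synset(
    words=["good", "right", "ripe"],
    definition="most suitable or right for a particular purpose",
    examples=[
        "a good time to plant tomatoes",
        "the right time to act",
        "the time is ripe for great sociological changes",
    ],
)
print(good.format_entry())
```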

Every synset is connected to other synsets via a number of relations. These relations vary based on the type of word:

  • Nouns
    • synonyms: synsets with similar meaning
    • hypernyms: Y is a hypernym of X if every X is a (kind of) Y
    • hyponyms: Y is a hyponym of X if every Y is a (kind of) X
    • coordinate terms: Y is a coordinate term of X if X and Y share a hypernym
    • holonyms: Y is a holonym of X if X is a part of Y
    • meronyms: Y is a meronym of X if Y is a part of X
  • Verbs
    • synonyms
    • hypernyms: the verb Y is a hypernym of the verb X if the activity X is a (kind of) Y
    • coordinate terms: those verbs sharing a common hypernym
  • Adjectives
    • synonyms and related nouns
    • antonyms: adjectives of opposite meaning
  • Adverbs
    • synonyms and root adjectives
    • antonyms
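
Several of the relations listed above are defined in terms of one another. Hyponymy is the inverse of hypernymy, holonymy is the inverse of meronymy, and coordinate terms are found by going up to a shared hypernym and back down. The Python sketch below illustrates this over a hypothetical toy graph. The synset names and links are illustrative assumptions rather than an extract from the database.

```python
# Hypothetical toy noun graph (illustrative names and links only):
# each synset maps to its direct hypernyms and to its parts (meronyms).
hypernyms = {
    "dog": {"canine"},
    "wolf": {"canine"},
    "canine": {"carnivore"},
    "car": {"motor vehicle"},
}
meronyms = {
    "car": {"wheel", "engine"},
}


def invert(relation):
    """Build the inverse relation, e.g. hyponyms from hypernyms."""
    inverse = {}
    for x, ys in relation.items():
        for y in ys:
            inverse.setdefault(y, set()).add(x)
    return inverse


hyponyms = invert(hypernyms)  # Y is a hyponym of X if every Y is a kind of X
holonyms = invert(meronyms)   # Y is a holonym of X if X is a part of Y


def coordinate_terms(x):
    """Synsets other than x that share at least one direct hypernym with x."""
    return {
        sibling
        for parent in hypernyms.get(x, set())
        for sibling in hyponyms.get(parent, set())
        if sibling != x
    }


print(sorted(hyponyms["canine"]))  # ['dog', 'wolf']
print(coordinate_terms("dog"))     # {'wolf'}
print(holonyms["wheel"])           # {'car'}
```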

WordNet also provides the polysemy count of a word: the number of synsets that contain the word. If a word participates in several synsets (i.e. has several senses), then typically some senses are much more common than others. WordNet quantifies this with a frequency score: all words in several sample texts were semantically tagged with their corresponding synsets, and the number of times each word appeared in each sense was counted.
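
The polysemy count and a frequency-based ordering of senses can be sketched as follows. This is an illustrative Python example only: the sense-tagged sample, the synset identifiers and the helper names `polysemy_count` and `ranked_senses` are hypothetical, and the counting is a simplification of how WordNet's frequency scores were actually derived.

```python
from collections import Counter

# Hypothetical sense-tagged sample: (word, synset) pairs as a human tagger might record them.
tagged_sample = [
    ("bank", "bank.n.01"),
    ("bank", "bank.n.02"),
    ("bank", "bank.n.02"),
    ("bank", "bank.n.02"),
    ("run", "run.v.01"),
]

# Hypothetical list of every synset each word belongs to, whether tagged or not.
senses_of = {
    "bank": ["bank.n.01", "bank.n.02", "bank.v.01"],
    "run": ["run.v.01", "run.n.01"],
}


def polysemy_count(word):
    """Number of synsets that contain the word."""
    return len(senses_of.get(word, []))


def ranked_senses(word, sample):
    """Order a word's senses by how often they were tagged in the sample.

    Untagged senses count as 0; sorted() is stable, so ties keep their listed order.
    """
    counts = Counter(sense for w, sense in sample if w == word)
    return sorted(senses_of.get(word, []), key=lambda s: -counts[s])


print(polysemy_count("bank"))                # 3
print(ranked_senses("bank", tagged_sample))  # ['bank.n.02', 'bank.n.01', 'bank.v.01']
```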

The database's interface is able to deduce the root form of a word from the user's input; only the root form is stored in the database.
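
The reduction of an inflected input to its root form can be illustrated with an exception list for irregular forms plus a few suffix-stripping rules, keeping only candidates that actually exist in the lexicon. The Python sketch below is a minimal illustration under those assumptions. The lexicon, the exception entries and the rule list are hypothetical and far smaller than what a real interface would use, and the sketch ignores part of speech.

```python
# Hypothetical, tiny data for illustration only.
lexicon = {"dog", "church", "fly", "walk", "run", "goose"}
exceptions = {"geese": "goose", "ran": "run"}  # irregular forms
suffix_rules = [  # (ending to remove, replacement)
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
    ("ed", ""),
    ("ing", ""),
]


def root_forms(word):
    """Return candidate root forms of `word` that are stored in the lexicon."""
    word = word.lower()
    if word in exceptions:
        return [exceptions[word]]
    candidates = [word] if word in lexicon else []
    for suffix, replacement in suffix_rules:
        if word.endswith(suffix):
            base = word[: -len(suffix)] + replacement
            if base in lexicon and base not in candidates:
                candidates.append(base)
    return candidates


print(root_forms("flies"))  # ['fly']
print(root_forms("geese"))  # ['goose']
```

Here "geese" is resolved through the exception list, while "flies" is reduced to "fly" by replacing the ending "ies" with "y"; inputs that match no stored root return an empty list.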

Knowledge structure

Both nouns and verbs are organized into hierarchies, defined by hypernym or IS A relationships. For instance, sense 1 of the word dog has the following hypernym hierarchy, in which the words on the same level are synonyms of each other: this sense of dog is synonymous with the corresponding senses of domestic dog and Canis familiaris, and so on. Each set of synonyms, also known as a synset, has a unique index, and its members share properties such as the gloss, or dictionary definition.

 dog, domestic dog, Canis familiaris
    => canine, canid
       => carnivore
         => placental, placental mammal, eutherian, eutherian mammal
           => mammal
             => vertebrate, craniate
               => chordate
                 => animal, animate being, beast, brute, creature, fauna
                   => ...
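
Producing a listing like the one above amounts to following direct hypernym links upward from a synset. The Python sketch below assumes a single direct hypernym per synset, stored in a hypothetical dictionary keyed by each synset's first word. It stops at animal because the listing above is truncated there.

```python
# Hypothetical table of direct hypernyms, keyed by each synset's first word,
# simplified from the dog hierarchy above to a single parent per synset.
hypernym_of = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(synset):
    """Follow direct hypernym links upward until a synset has no recorded parent.

    Assumes the hierarchy is acyclic, as an IS A hierarchy should be.
    """
    chain = [synset]
    while chain[-1] in hypernym_of:
        chain.append(hypernym_of[chain[-1]])
    return chain


# Print the chain in an indented layout similar to the listing above.
for depth, name in enumerate(hypernym_chain("dog")):
    print(" " * (2 * depth) + ("=> " if depth else "") + name)
```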

At the top level, these hierarchies are organized into 25 primitive groups for nouns and 15 for verbs. For maintenance purposes, each of these groups forms its own lexicographer file.

In the case of adjectives, the organization is different. Two opposite 'head' senses act as binary poles, while 'satellite' synonyms connect to one of the two heads via synonymy relations. Thus, the hierarchies and the concept of lexicographer files do not apply to adjectives in the way they do to nouns and verbs.
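
The cluster organization of adjectives can be sketched with two maps: one pairing the opposing heads, and one linking each satellite to its head. In the Python example below, the wet/dry cluster and its satellites are illustrative assumptions. The opposing pole of a satellite is reached by going through its own head, since the opposition is recorded only between the heads.

```python
# Hypothetical adjective cluster (illustrative only): two opposing head senses,
# and satellites that are each linked to exactly one head.
antonym_heads = {"wet": "dry", "dry": "wet"}
satellite_head = {
    "soggy": "wet",
    "damp": "wet",
    "arid": "dry",
    "parched": "dry",
}


def head_of(adjective):
    """Return the head an adjective belongs to (a head is its own head), or None."""
    if adjective in antonym_heads:
        return adjective
    return satellite_head.get(adjective)


def opposite_pole(adjective):
    """Find the opposing head by going through the adjective's own head."""
    head = head_of(adjective)
    return antonym_heads.get(head) if head else None


print(opposite_pole("arid"))  # wet
print(opposite_pole("wet"))   # dry
```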

Limitations

Unlike many other dictionaries, WordNet does not include information about etymology, pronunciation or the forms of irregular verbs, and it contains only limited information about usage.

The actual lexicographic and semantic information is maintained in lexicographer files, which are processed by a tool called grind to produce the distributed database. Both grind and the lexicographer files are freely available, but modifying and maintaining the database is nonetheless difficult.

The EuroWordNet project has produced wordnets for several European languages and linked them together; however, these are not freely available. The Global Wordnet project attempts to coordinate the production and linking of wordnets for all languages. Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce its own online WordNet.

The eXtended WordNet is a project at the University of Texas at Dallas which aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available for automatic knowledge processing systems. It is also freely available under a license similar to WordNet's.

The GCIDE project produces a dictionary by combining a public-domain 1913 edition of Webster's Dictionary with some WordNet definitions and material provided by volunteers. It is released under the GPL, a copyleft license.

The hypernym/hyponym relationships among the noun synsets can be used as an ontology in the computer science sense. The SUMO upper ontology has produced a mapping from the WordNet synsets for nouns and verbs to SUMO classes. The OpenCyc upper ontology is also linked to WordNet. WordNet was the primary source for constructing the lower classes of the SENSUS ontology.
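
One basic use of the noun hierarchy as an ontology is answering subsumption questions such as "is every X a kind of Y?". Because a synset can have more than one direct hypernym, the Python sketch below searches the hypernym graph breadth-first instead of following a single chain. The graph fragment is a hypothetical illustration, not an extract of WordNet or of any mapping to SUMO or OpenCyc.

```python
from collections import deque

# Hypothetical fragment of a noun hierarchy. A synset may have several
# direct hypernyms, so the structure is a directed acyclic graph.
direct_hypernyms = {
    "dog": ["canine", "domestic animal"],
    "canine": ["carnivore"],
    "domestic animal": ["animal"],
    "carnivore": ["placental"],
    "placental": ["mammal"],
    "mammal": ["animal"],
}


def is_a(concept, category):
    """True if `category` subsumes `concept` via zero or more hypernym links."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if current == category:
            return True
        for parent in direct_hypernyms.get(current, []):
            if parent not in seen:  # avoid revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return False


print(is_a("dog", "mammal"))  # True
print(is_a("mammal", "dog"))  # False
```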

FrameNet is a similar project. It consists of a lexicon based on annotating over 100,000 sentences with their semantic properties. The unit in focus is the lexical frame, a type of state or event together with the properties associated with it.
