
Factored language model

A factored language model (FLM) is an extension of a conventional language model. In an FLM, each word is viewed as a vector of $K$ factors: $w_n = \{f_n^1, f_n^2, \ldots, f_n^K\}$. An FLM provides a probabilistic model $P(f \mid f_1, \ldots, f_N)$, in which the prediction of a factor $f$ is based on $N$ parents $\{f_1, \ldots, f_N\}$. For example, if $w$ denotes a word token and $t$ denotes a part-of-speech tag for English, the expression $P(w_n \mid w_{n-2}, w_{n-1}, t_{n-1})$ gives a model for predicting the current word token based on a traditional N-gram context as well as the part-of-speech tag of the previous word.
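As an illustration, the following minimal Python sketch estimates $P(w_n \mid w_{n-2}, w_{n-1}, t_{n-1})$ by simple relative frequency over a toy tagged corpus. The class name, corpus, and unsmoothed estimator are illustrative assumptions, not part of the model definition; a real FLM would apply smoothing, as discussed below.

    from collections import defaultdict

    class ToyFLM:
        """Maximum-likelihood sketch of the factored model
        P(w_n | w_{n-2}, w_{n-1}, t_{n-1}): the current word is predicted
        from the two previous words plus the previous word's POS tag."""

        def __init__(self, tagged_corpus):
            # tagged_corpus: list of (word, pos_tag) pairs
            self.joint = defaultdict(int)    # counts of (w_{n-2}, w_{n-1}, t_{n-1}, w_n)
            self.context = defaultdict(int)  # counts of (w_{n-2}, w_{n-1}, t_{n-1})
            for i in range(2, len(tagged_corpus)):
                w2, _ = tagged_corpus[i - 2]
                w1, t1 = tagged_corpus[i - 1]
                wn, _ = tagged_corpus[i]
                self.joint[(w2, w1, t1, wn)] += 1
                self.context[(w2, w1, t1)] += 1

        def prob(self, w2, w1, t1, wn):
            # Relative-frequency estimate; returns 0.0 for unseen contexts.
            c = self.context[(w2, w1, t1)]
            return self.joint[(w2, w1, t1, wn)] / c if c else 0.0

    corpus = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
              ("the", "DT"), ("dog", "NN"), ("sleeps", "VBZ")]
    flm = ToyFLM(corpus)
    print(flm.prob("the", "dog", "NN", "barks"))  # 0.5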

A main advantage of factored language models is that they allow users to incorporate linguistic knowledge, for example by explicitly modeling the relationship between word tokens and part-of-speech tags in English, or morphological information (stems, roots, etc.) in Arabic.

As with N-gram models, smoothing techniques are necessary in parameter estimation. In particular, generalized back-off is used in training an FLM: when the full context is too sparse, see the sketch below.
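A rough sketch of such a back-off, continuing the Python example above: when the full context falls below a count threshold, one parent is dropped and the estimate recurses on the reduced context. Unlike standard N-gram back-off, which always drops the most distant word, an FLM may drop any parent; Bilmes and Kirchhoff also describe combining several back-off paths in parallel. The function name, threshold, discount, and back-off weight below are placeholders, not values from the paper.

    def backoff_prob(counts, parents, child, theta=1):
        """Hypothetical sketch of generalized back-off.  `counts` maps
        context tuples (parents...) and full tuples (parents..., child)
        to integer counts.  If the full event is seen at least `theta`
        times, use a discounted relative-frequency estimate; otherwise
        greedily drop the parent whose removal leaves the most frequent
        remaining context, and recurse."""
        parents = tuple(parents)
        if not parents:
            # Base case: unigram relative frequency over length-1 keys.
            total = sum(c for key, c in counts.items() if len(key) == 1)
            return counts.get((child,), 0) / total if total else 0.0
        full = parents + (child,)
        if counts.get(full, 0) >= theta:
            d = 0.9  # placeholder discount from a real smoothing scheme
            return d * counts[full] / counts[parents]
        # Candidate reduced contexts, each with one parent removed.
        candidates = [parents[:i] + parents[i + 1:] for i in range(len(parents))]
        best = max(candidates, key=lambda p: counts.get(p, 0))
        alpha = 0.1  # placeholder back-off weight (must normalize in practice)
        return alpha * backoff_prob(counts, best, child, theta)

    counts = {("the", "dog", "NN"): 2, ("the", "dog", "NN", "barks"): 1,
              ("dog", "NN"): 2, ("dog", "NN", "barks"): 1,
              ("the",): 2, ("dog",): 2, ("barks",): 1, ("sleeps",): 1}
    print(backoff_prob(counts, ("the", "dog", "NN"), "barks"))  # 0.9 * 1/2 = 0.45

In practice the discount and back-off weights must be chosen so that the resulting distribution normalizes; the greedy "drop the parent leaving the best-supported context" rule is just one strategy the generalized framework permits.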

References

  • J. Bilmes and K. Kirchhoff (2003). "Factored Language Models and Generalized Parallel Backoff" (PDF). Human Language Technology Conference.