Pitman–Yor process: Difference between revisions
No edit summary |
Entropeneur (talk | contribs) mNo edit summary |
||
(22 intermediate revisions by 16 users not shown) | |||
Line 1: | Line 1: | ||
{{expert| [[probability theory]]}} |
|||
In [[probability theory]], a '''Pitman–Yor process'''<ref name=Ishwaran03>{{cite journal |
In [[probability theory]], a '''Pitman–Yor process'''<ref name=Ishwaran03>{{cite journal |
||
|first1=H|last1=Ishwaran |
|first1=H|last1=Ishwaran |
||
Line 5: | Line 4: | ||
|title=Generalized weighted Chinese restaurant processes for species sampling mixture models |
|title=Generalized weighted Chinese restaurant processes for species sampling mixture models |
||
|journal=Statistica Sinica |
|journal=Statistica Sinica |
||
|volume=13|pages=1211– |
|volume=13|pages=1211–1235 |
||
|year=2003 |
|year=2003 |
||
|first1=Jim |last1=Pitman |first2=Marc |last2=Yor |
|first1=Jim |last1=Pitman |first2=Marc |last2=Yor |
||
|title=The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator |
|title=The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator |
||
Line 15: | Line 13: | ||
|doi=10.1214/aop/1024404422 |
|doi=10.1214/aop/1024404422 |
||
|mr=1434129 | zbl = 0880.60076 |
|mr=1434129 | zbl = 0880.60076 |
||
}}</ref> |
|||
,<ref name=Teh2006>{{cite journal |
|||
|first1=Yee Whye |last1=Teh |
|first1=Yee Whye |last1=Teh |
||
|title=A hierarchical Bayesian language model based on Pitman–Yor processes |
|title=A hierarchical Bayesian language model based on Pitman–Yor processes |
||
|journal=Proceedings of the 21st International Conference on Computational Linguistics and the 44th |
|journal=Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics |
||
|year=2006 |
|year=2006 |
||
}}</ref> denoted PY(''d'', ''θ'', ''G''<sub>0</sub>), is a [[stochastic process]] whose sample path is a [[probability distribution]]. A random sample from this process is an infinite discrete probability distribution, consisting of an infinite set of atoms drawn from ''G''<sub>0</sub>, with weights drawn from a two-parameter [[ |
}}</ref> denoted PY(''d'', ''θ'', ''G''<sub>0</sub>), is a [[stochastic process]] whose sample path is a [[probability distribution]]. A random sample from this process is an infinite discrete probability distribution, consisting of an infinite set of atoms drawn from ''G''<sub>0</sub>, with weights drawn from a two-parameter [[Poisson-Dirichlet distribution]]. The process is named after [[Jim Pitman]] and [[Marc Yor]]. |
||
The parameters governing the Pitman–Yor process are: 0 ≤ ''d'' < 1 a discount parameter, a strength parameter ''θ'' > −''d'' and a base distribution ''G''<sub>0</sub> over a probability space ''X''. When ''d'' = 0, it becomes the [[Dirichlet process]]. The discount parameter gives the Pitman–Yor process more flexibility over tail behavior than the Dirichlet process, which has exponential tails. This makes Pitman–Yor process useful for modeling data with [[power-law]] tails (e.g., word frequencies in natural language). |
The parameters governing the Pitman–Yor process are: 0 ≤ ''d'' < 1 a discount parameter, a strength parameter ''θ'' > −''d'' and a base distribution ''G''<sub>0</sub> over a probability space ''X''. When ''d'' = 0, it becomes the [[Dirichlet process]]. The discount parameter gives the Pitman–Yor process more flexibility over tail behavior than the Dirichlet process, which has exponential tails. This makes Pitman–Yor process useful for modeling data with [[power-law]] tails (e.g., word frequencies in natural language). |
||
The exchangeable random partition induced by the Pitman–Yor process is an example of a [[Poisson–Kingman partition]], and of a [[Gibbs type random partition]]. |
The exchangeable random partition induced by the Pitman–Yor process is an example of a [[Chinese_restaurant_process#Two-parameter_generalization|Chinese restaurant process]], a [[Poisson–Kingman partition]], and of a [[Gibbs type random partition]]. |
||
==Naming conventions== |
==Naming conventions== |
||
The name "Pitman–Yor process" was coined by Ishwaran and James<ref>{{cite journal |
The name "Pitman–Yor process" was coined by Ishwaran and James<ref>{{cite journal |
||
|first1=H. |
|first1=H. |
||
Line 38: | Line 33: | ||
|journal=Journal of the American Statistical Association |
|journal=Journal of the American Statistical Association |
||
|year=2001 |
|year=2001 |
||
|doi=10.1198/016214501750332758 |
|||
|volume=96 |
|||
|issue=453 |
|||
|pages=161–173 |
|||
|citeseerx=10.1.1.36.2559 |
|||
|first1=M. |
|first1=M. |
||
|last1=Perman |
|last1=Perman |
||
Line 47: | Line 47: | ||
|title=Size-biased sampling of Poisson point processes and excursions |
|title=Size-biased sampling of Poisson point processes and excursions |
||
|journal=Probability Theory and Related Fields |
|journal=Probability Theory and Related Fields |
||
|volume=92 |
|||
|pages=21–39 |
|||
|year=1992 |
|year=1992 |
||
|doi=10.1007/BF01205234 |
|||
|doi-access=free |
|||
}}</ref><ref name=Perman1990>{{cite thesis |
}}</ref><ref name=Perman1990>{{cite thesis |
||
|first1=M. |
|first1=M. |
||
|last1=Perman |
|last1=Perman |
||
|title=Random Discrete Distributions Derived from Subordinators |
|title=Random Discrete Distributions Derived from Subordinators |
||
| |
|publisher=Department of Statistics, University of California at Berkeley |
||
|year=1990 |
|year=1990 |
||
}}</ref> so technically it perhaps may have been better named the Perman–Pitman–Yor process. |
|||
It is also sometimes referred to as the two-parameter Poisson–Dirichlet process, after the two-parameter generalization of the Poisson–Dirichlet distribution which describes the joint distribution of the sizes of the atoms in the random measure, sorted by strictly decreasing order |
It is also sometimes referred to as the two-parameter Poisson–Dirichlet process, after the two-parameter generalization of the Poisson–Dirichlet distribution which describes the joint distribution of the sizes of the atoms in the [[random measure]], sorted by strictly decreasing order. |
||
==See also== |
==See also== |
||
*[[Chinese restaurant process]] |
*[[Chinese restaurant process]] |
||
*[[Dirichlet distribution]] |
*[[Dirichlet distribution]] |
||
*[[Dirichlet process]] |
|||
*[[Latent Dirichlet allocation]] |
*[[Latent Dirichlet allocation]] |
||
Line 69: | Line 72: | ||
{{Stochastic processes}} |
{{Stochastic processes}} |
||
{{DEFAULTSORT:Pitman-Yor process}} |
|||
[[Category:Stochastic processes]] |
[[Category:Stochastic processes]] |
||
[[Category: |
[[Category:Nonparametric Bayesian statistics]] |
||
[[Category: |
[[Category:Cluster analysis algorithms]] |
||
Latest revision as of 07:12, 7 July 2024
In probability theory, a Pitman–Yor process,[1][2][3][4] denoted PY(d, θ, G0), is a stochastic process whose sample path is a probability distribution. A random sample from this process is an infinite discrete probability distribution, consisting of an infinite set of atoms drawn from G0, with weights drawn from a two-parameter Poisson–Dirichlet distribution. The process is named after Jim Pitman and Marc Yor.
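The description above can be written compactly as follows. The symbols π_k (weights), φ_k (atoms) and δ (point mass) are notation assumed here for illustration and are not fixed by the text:

```latex
G \;=\; \sum_{k=1}^{\infty} \pi_k \,\delta_{\phi_k},
\qquad
\phi_k \overset{\text{iid}}{\sim} G_0,
\qquad
(\pi_1, \pi_2, \ldots) \sim \mathrm{PD}(d, \theta),
\qquad
\sum_{k=1}^{\infty} \pi_k = 1 ,
```

where the weights are taken to be independent of the atoms, so that G is distributed as PY(d, θ, G0).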
The parameters governing the Pitman–Yor process are a discount parameter 0 ≤ d < 1, a strength parameter θ > −d, and a base distribution G0 over a probability space X. When d = 0, it becomes the Dirichlet process. The discount parameter gives the Pitman–Yor process more flexibility over tail behavior than the Dirichlet process, which has exponential tails. This makes the Pitman–Yor process useful for modeling data with power-law tails (e.g., word frequencies in natural language).
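One common way to see how d and θ shape the weights is the stick-breaking representation, in which V_k ~ Beta(1 − d, θ + k·d) independently and the k-th weight is V_k times the stick length left over by the earlier breaks. The sketch below is a minimal illustration under that representation; the function name, the truncation at `num_atoms`, and the `seed` argument are assumptions made for the example.

```python
import random


def pitman_yor_stick_breaking(d, theta, num_atoms, seed=None):
    """Return the first `num_atoms` stick-breaking weights for PY(d, theta).

    Hypothetical helper: the infinite sequence is truncated, so the
    returned weights sum to slightly less than 1.
    """
    if not (0 <= d < 1) or theta <= -d:
        raise ValueError("requires 0 <= d < 1 and theta > -d")
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any atom
    for k in range(1, num_atoms + 1):
        # V_k ~ Beta(1 - d, theta + k * d); both shape parameters are > 0.
        v = rng.betavariate(1.0 - d, theta + k * d)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


if __name__ == "__main__":
    # d = 0 gives the Dirichlet process case, V_k ~ Beta(1, theta).
    dp_weights = pitman_yor_stick_breaking(0.0, 1.0, 20, seed=0)
    py_weights = pitman_yor_stick_breaking(0.5, 1.0, 20, seed=0)
    print(sum(dp_weights), sum(py_weights))
```

To obtain an approximate draw of the random distribution itself, each weight would be paired with an atom sampled independently from G0. With d > 0 the leftover stick shrinks more slowly, which is one way to see the heavier tail.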
The exchangeable random partition induced by the Pitman–Yor process is an example of a Chinese restaurant process (in its two-parameter generalization), a Poisson–Kingman partition, and a Gibbs-type random partition.
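The two-parameter Chinese restaurant process can be simulated directly. In the usual seating rule, with n customers already seated at K tables of sizes n_1, …, n_K, the next customer joins table k with probability (n_k − d)/(θ + n) and opens a new table with probability (θ + K·d)/(θ + n). The following is a minimal sketch of that rule; the function name and the `seed` argument are assumptions made for the example.

```python
import random


def pitman_yor_crp(n, d, theta, seed=None):
    """Seat `n` customers by the two-parameter CRP; return the table sizes."""
    if not (0 <= d < 1) or theta <= -d:
        raise ValueError("requires 0 <= d < 1 and theta > -d")
    rng = random.Random(seed)
    # The first customer always opens the first table.
    table_sizes = [1] if n > 0 else []
    for seated in range(1, n):
        # Unnormalized weights: theta + K*d for a new table, n_k - d for
        # table k. They sum to theta + seated, which is positive here.
        new_table_weight = theta + len(table_sizes) * d
        u = rng.random() * (theta + seated)
        if u < new_table_weight:
            table_sizes.append(1)
            continue
        u -= new_table_weight
        for k, size in enumerate(table_sizes):
            u -= size - d
            if u < 0:
                table_sizes[k] += 1
                break
        else:
            # Guard against floating-point round-off in the running sum.
            table_sizes[-1] += 1
    return table_sizes


if __name__ == "__main__":
    for d in (0.0, 0.5):
        sizes = pitman_yor_crp(10_000, d, 1.0, seed=0)
        print(f"d={d}: {len(sizes)} tables, largest has {max(sizes)} customers")
```

Running the loop for d = 0 and d > 0 side by side is a quick way to see the difference in the number of occupied tables: it grows roughly logarithmically in n for the Dirichlet process and roughly like n^d when d > 0.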
Naming conventions
The name "Pitman–Yor process" was coined by Ishwaran and James[5] after Pitman and Yor's review on the subject.[2] However, the process was originally studied by Perman et al.[6][7]
It is also sometimes referred to as the two-parameter Poisson–Dirichlet process, after the two-parameter generalization of the Poisson–Dirichlet distribution, which describes the joint distribution of the sizes of the atoms in the random measure, sorted in strictly decreasing order.
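See also
- Chinese restaurant process
- Dirichlet distribution
- Dirichlet process
- Latent Dirichlet allocation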
References
1. Ishwaran, H.; James, L. F. (2003). "Generalized weighted Chinese restaurant processes for species sampling mixture models". Statistica Sinica. 13: 1211–1235.
2. Pitman, Jim; Yor, Marc (1997). "The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator". Annals of Probability. 25 (2): 855–900. CiteSeerX 10.1.1.69.1273. doi:10.1214/aop/1024404422. MR 1434129. Zbl 0880.60076.
3. Pitman, Jim (2006). Combinatorial Stochastic Processes. Lecture Notes in Mathematics. Vol. 1875. Berlin: Springer-Verlag. ISBN 9783540309901.
4. Teh, Yee Whye (2006). "A hierarchical Bayesian language model based on Pitman–Yor processes". Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics.
5. Ishwaran, H.; James, L. (2001). "Gibbs Sampling Methods for Stick-Breaking Priors". Journal of the American Statistical Association. 96 (453): 161–173. CiteSeerX 10.1.1.36.2559. doi:10.1198/016214501750332758.
6. Perman, M.; Pitman, J.; Yor, M. (1992). "Size-biased sampling of Poisson point processes and excursions". Probability Theory and Related Fields. 92: 21–39. doi:10.1007/BF01205234.
7. Perman, M. (1990). Random Discrete Distributions Derived from Subordinators (Thesis). Department of Statistics, University of California at Berkeley.