Monte Carlo integration: Difference between revisions

Revision as of 02:57, 1 June 2013

An illustration of Monte Carlo integration. In this example, the domain D is the inner circle and the domain E is the square. Because the square's area can be easily calculated, the area of the circle can be estimated by the ratio (0.8) of the points inside the circle (40) to the total number of points (50), yielding an approximation for π/4 ≈ 0.8

In mathematics, Monte Carlo integration is a technique for numerical integration using random numbers. It is a particular Monte Carlo method that numerically computes a definite integral. While other algorithms usually evaluate the integrand on a regular grid,^[1] Monte Carlo algorithms randomly choose the points at which the integrand is evaluated.^[2] This method is particularly useful for higher-dimensional integrals.^[3]

Informally, to estimate the area of a domain D, first pick a simple domain E whose area is easily calculated and which contains D. Now pick a sequence of random points that fall within E. Some fraction of these points will also fall within D. The area of D is then estimated as this fraction multiplied by the area of E.

Several techniques are available for performing a Monte Carlo integration. Uniform sampling, stratified sampling and importance sampling are the most common.

Overview

Consider a set Ω, a subset of R^m, over which the multidimensional definite integral

I=\int _{\Omega }f({\overline {\mathbf {x} }})\,d{\overline {\mathbf {x} }}

is to be calculated, where the volume of Ω is known:

V=\int _{\Omega }d{\overline {\mathbf {x} }}

The most naive approach to compute I is to sample points uniformly on Ω:^[4] given N uniform samples,

{\overline {\mathbf {x} }}_{1},\cdots ,{\overline {\mathbf {x} }}_{N}\in \Omega ,

I can be approximated by

I\approx Q_{N}\equiv V{\frac {1}{N}}\sum _{i=1}^{N}f({\overline {\mathbf {x} }}_{i})=V\langle f\rangle .

This is because the law of large numbers ensures that

\lim _{N\to \infty }Q_{N}=I.
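The estimator Q_N = V⟨f⟩ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain Ω is taken to be an axis-aligned box so that V is known exactly, and the function name `mc_integrate` and its parameters are hypothetical, not part of any library.

```python
import random

def mc_integrate(f, bounds, n, rng=None):
    """Plain Monte Carlo estimate Q_N = V * <f> over an axis-aligned box.

    bounds is a list of (low, high) pairs, one per dimension (assumption:
    Omega is a box, so its volume V is the product of the side lengths).
    """
    rng = rng or random.Random()
    volume = 1.0
    for low, high in bounds:
        volume *= high - low
    total = 0.0
    for _ in range(n):
        # One uniform sample point in the box.
        x = [rng.uniform(low, high) for low, high in bounds]
        total += f(x)
    return volume * total / n

# Usage: integrate f(x, y) = x * y over [0, 1] x [0, 2]; the exact value is 1.
estimate = mc_integrate(lambda x: x[0] * x[1], [(0.0, 1.0), (0.0, 2.0)],
                        n=100_000, rng=random.Random(0))
print(estimate)
```

For a domain that is not a box, one common choice is to enclose it in a box and set the integrand to zero outside the domain.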

Implementation issues such as the quality of the pseudorandom number generator and limited floating-point precision can introduce systematic errors, but these matter only in very particular cases.

The error bars of Q_N can be estimated from the unbiased sample variance of f,

\mathrm {Var} (f)\equiv \sigma _{N}^{2}={\frac {1}{N-1}}\sum _{i=1}^{N}\left(f({\overline {\mathbf {x} }}_{i})-\langle f\rangle \right)^{2},

which leads to

\mathrm {Var} (Q_{N})={\frac {V^{2}}{N^{2}}}\sum _{i=1}^{N}\mathrm {Var} (f)=V^{2}{\frac {\mathrm {Var} (f)}{N}}=V^{2}{\frac {\sigma _{N}^{2}}{N}}.

As long as the sequence

\left\{\sigma _{1}^{2},\sigma _{2}^{2},\sigma _{3}^{2},\ldots \right\}

is bounded, this variance decreases asymptotically to zero as 1/N. The estimation of the error of Q_N is thus

\delta Q_{N}\approx {\sqrt {\mathrm {Var} (Q_{N})}}=V{\frac {\sigma _{N}}{\sqrt {N}}},

which decreases as ${\tfrac {1}{\sqrt {N}}}$. The familiar law of the random walk applies: reducing the error by a factor of 10 requires a 100-fold increase in the number of sample points. This result is strong in the sense that it does not depend on the number of dimensions of the integral. Most deterministic methods, such as the trapezoidal rule, depend strongly on the dimension, because they fill the space with a grid whose number of points grows exponentially with the number of dimensions.^[5]

The above expression provides a statistical estimate of the error on the result, which is not a strict error bound; random sampling of the region may not uncover all the important features of the function, resulting in an underestimate of the error.
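The error estimate δQ_N = V σ_N / √N and its 1/√N scaling can be checked numerically. The sketch below is an illustration only: the one-dimensional integrand sin(x) on [0, π] (exact integral 2) is chosen here as a test case, and the function name is hypothetical.

```python
import math
import random

def mc_integrate_with_error(f, a, b, n, rng):
    """Return (Q_N, delta_Q_N) for the 1D integral of f over [a, b]."""
    volume = b - a
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    # Unbiased sample variance sigma_N^2 (divides by N - 1).
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return volume * mean, volume * math.sqrt(var / n)

rng = random.Random(1)
results = []
for n in (100, 10_000, 1_000_000):
    q, dq = mc_integrate_with_error(math.sin, 0.0, math.pi, n, rng)
    results.append((n, q, dq))
    # Each 100-fold increase in n should shrink dq by roughly a factor of 10.
    print(n, q, dq)
```

The printed error is a statistical estimate, not a bound: the true deviation from 2 will occasionally exceed it.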

Example

A paradigmatic example of a Monte Carlo integration is the estimation of π. Consider the function

H\left(x,y\right)={\begin{cases}1&{\text{if }}x^{2}+y^{2}\leq 1\\0&{\text{else}}\end{cases}}

and the set Ω = [−1,1] × [−1,1] with V = 4. Notice that

I_{\pi }=\int _{\Omega }H(x,y)dxdy=\pi .

Thus, a crude way of calculating the value of π with Monte Carlo integration is to pick N random points uniformly in Ω and compute

Q_{N}=4{\frac {1}{N}}\sum _{i=1}^{N}H(x_{i},y_{i}).

In the figure the relative error is shown, following the expected scaling.
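The π estimate can be reproduced with a short Python sketch. The function name `estimate_pi` and the choice of sample sizes are illustrative assumptions; the code simply counts the fraction of uniform points in [−1, 1] × [−1, 1] for which H = 1 and multiplies by V = 4.

```python
import math
import random

def estimate_pi(n, rng):
    """Estimate pi as 4 * (fraction of uniform points in [-1, 1]^2 with x^2 + y^2 <= 1)."""
    hits = 0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # H(x, y) = 1
            hits += 1
    return 4.0 * hits / n

rng = random.Random(42)
for n in (100, 10_000, 1_000_000):
    q = estimate_pi(n, rng)
    # Third column is the relative error |Q_N - pi| / pi.
    print(n, q, abs(q - math.pi) / math.pi)
```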

Wolfram Mathematica Example

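The code below describes a process of integrating the function

\operatorname {func} (x)={\frac {1}{1+\sinh(2x)\,(\ln x)^{2}}}

using the Monte Carlo method in Mathematica:

code:

(* Integrand *)
func[x_] := 1/(1 + Sinh[2*x]*(Log[x])^2)
(* Plot the integrand together with the weight function *)
p = Plot[func[x], {x, 0.8, 3}];
p1 = Plot[PDF[NormalDistribution[1, 0.399], 1.1*x - 0.1], {x, 0.8, 3}];
Show[{p, p1}]
(* Stationary points of the integrand *)
NSolve[D[func[x], x] == 0, x, Reals]
(* Weight function, as defined in the original *)
Distrib[x_, average_, var_] := PDF[NormalDistribution[average, var], 1.1*x - 0.1]
(* Draw n samples from the normal distribution truncated to [0.8, 3] *)
n = 10;
RV = RandomVariate[TruncatedDistribution[{0.8, 3}, NormalDistribution[1, 0.399]], n]
(* Weighted (importance-sampling) estimate *)
Int = 1/n Total[func[RV]/Distrib[RV, 1, 0.399]]*Integrate[Distrib[x, 1, 0.399], {x, 0.8, 3}]
(* Reference value from deterministic quadrature *)
NIntegrate[func[x], {x, 0.8, 3}]
(* Unweighted average over the same samples, scaled by the interval length *)
Int2 = ((3 - 0.8)/n) Total[func[RV]]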

Recursive stratified sampling

Recursive stratified sampling is a generalization of one-dimensional adaptive quadratures to multi-dimensional integrals. On each recursion step the integral and the error are estimated using a plain Monte Carlo algorithm. If the error estimate is larger than the required accuracy the integration volume is divided into sub-volumes and the procedure is recursively applied to sub-volumes.

The ordinary 'dividing by two' strategy does not work in many dimensions, because the number of sub-volumes grows far too quickly to keep track of. Instead, one estimates along which dimension a subdivision should bring the most dividends and subdivides the volume only along this dimension.

The stratified sampling algorithm concentrates the sampling points in the regions where the variance of the function is largest, thus reducing the overall variance and making the sampling more effective, as shown in the illustration.

The popular MISER routine implements a similar algorithm.
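The recursion described above can be sketched as follows. This is a simplified illustration, not the MISER routine: the rule for picking the split axis (smallest summed sample variance of the two halves), the per-region sample count `n`, the depth cap and all function names are assumptions made for this sketch, and the exploratory samples of a region are simply discarded when it is subdivided.

```python
import math
import random

def sample_variance(values):
    """Unbiased sample variance; infinite if there are too few values to judge."""
    n = len(values)
    if n < 2:
        return math.inf
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def stratified(f, lo, hi, tol, rng, n=64, depth=0, max_depth=12):
    """Return (estimate, error) for the integral of f over the box [lo, hi]."""
    volume = math.prod(h - l for l, h in zip(lo, hi))
    points = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(n)]
    values = [f(p) for p in points]
    estimate = volume * sum(values) / n
    error = volume * math.sqrt(sample_variance(values) / n)
    if error <= tol or depth >= max_depth:
        return estimate, error

    # Pick the axis whose midpoint split gives the smallest summed variance
    # of the two halves, judged from the samples already drawn.
    best_axis, best_score = 0, math.inf
    for axis in range(len(lo)):
        mid = 0.5 * (lo[axis] + hi[axis])
        left = [v for p, v in zip(points, values) if p[axis] < mid]
        right = [v for p, v in zip(points, values) if p[axis] >= mid]
        score = sample_variance(left) + sample_variance(right)
        if score < best_score:
            best_axis, best_score = axis, score

    mid = 0.5 * (lo[best_axis] + hi[best_axis])
    left_hi = list(hi)
    left_hi[best_axis] = mid
    right_lo = list(lo)
    right_lo[best_axis] = mid
    # Errors of independent halves add in quadrature, so each half gets tol / sqrt(2).
    sub_tol = tol / math.sqrt(2.0)
    e1, d1 = stratified(f, lo, left_hi, sub_tol, rng, n, depth + 1, max_depth)
    e2, d2 = stratified(f, right_lo, hi, sub_tol, rng, n, depth + 1, max_depth)
    return e1 + e2, math.hypot(d1, d2)

def peak(p):
    """Narrow 2D Gaussian bump; its integral over the unit square is close to pi / 50."""
    return math.exp(-50.0 * ((p[0] - 0.5) ** 2 + (p[1] - 0.5) ** 2))

estimate, error = stratified(peak, [0.0, 0.0], [1.0, 1.0], tol=0.002,
                             rng=random.Random(5))
print(estimate, error)
```

Regions far from the peak satisfy the tolerance immediately, so most of the function evaluations end up near the bump, which is the behaviour the text describes.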

MISER Monte Carlo

The MISER algorithm is based on recursive stratified sampling. This technique aims to reduce the overall integration error by concentrating integration points in the regions of highest variance.^[6]

The idea of stratified sampling begins with the observation that for two disjoint regions a and b with Monte Carlo estimates of the integral $E_{a}(f)$ and $E_{b}(f)$, obtained from $N_{a}$ and $N_{b}$ sample points, and variances $\sigma _{a}^{2}(f)$ and $\sigma _{b}^{2}(f)$, the variance Var(f) of the combined estimate

E(f)={\tfrac {1}{2}}\left(E_{a}(f)+E_{b}(f)\right)

is given by,

\mathrm {Var} (f)={\frac {\sigma _{a}^{2}(f)}{4N_{a}}}+{\frac {\sigma _{b}^{2}(f)}{4N_{b}}}

It can be shown that this variance is minimized by distributing the points such that,

{\frac {N_{a}}{N_{a}+N_{b}}}={\frac {\sigma _{a}}{\sigma _{a}+\sigma _{b}}}

Hence the smallest error estimate is obtained by allocating sample points in proportion to the standard deviation of the function in each sub-region.
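The allocation rule follows from a one-line minimization. The derivation below is a sketch that treats N_a as a continuous variable and holds the total N = N_a + N_b fixed; the symbol N is introduced here for convenience.

```latex
\begin{aligned}
\mathrm{Var}(f) &= \frac{\sigma_a^2}{4N_a} + \frac{\sigma_b^2}{4(N - N_a)},
\qquad N = N_a + N_b \ \text{fixed},\\[4pt]
\frac{\partial\,\mathrm{Var}(f)}{\partial N_a}
 &= -\frac{\sigma_a^2}{4N_a^2} + \frac{\sigma_b^2}{4N_b^2} = 0
 \;\Longrightarrow\; \frac{N_a}{N_b} = \frac{\sigma_a}{\sigma_b}
 \;\Longrightarrow\; \frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b},\\[4pt]
\mathrm{Var}_{\min}(f) &= \frac{(\sigma_a + \sigma_b)^2}{4N}.
\end{aligned}
```

For comparison, an equal split N_a = N_b = N/2 gives (σ_a² + σ_b²)/(2N), which is never smaller than the minimum and equals it only when σ_a = σ_b.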

The MISER algorithm proceeds by bisecting the integration region along one coordinate axis to give two sub-regions at each step. The direction is chosen by examining all d possible bisections and selecting the one which will minimize the combined variance of the two sub-regions. The variance in the sub-regions is estimated by sampling with a fraction of the total number of points available to the current step. The same procedure is then repeated recursively for each of the two half-spaces from the best bisection. The remaining sample points are allocated to the sub-regions using the formula for N_a and N_b. This recursive allocation of integration points continues down to a user-specified depth where each sub-region is integrated using a plain Monte Carlo estimate. These individual values and their error estimates are then combined upwards to give an overall result and an estimate of its error.

Importance sampling

VEGAS Monte Carlo

The VEGAS algorithm takes advantage of the information stored during the sampling, and uses it and importance sampling to efficiently estimate the integral I. It samples points from the probability distribution described by the function |f| so that the points are concentrated in the regions that make the largest contribution to the integral.

In general, if the Monte Carlo integral of f is sampled with points distributed according to a probability distribution described by the function g, we obtain an estimate:

E_{g}(f;N)=E\left({\tfrac {f}{g}};N\right)

with a corresponding variance,

\mathrm {Var} _{g}(f;N)=\mathrm {Var} \left({\tfrac {f}{g}};N\right)

If the probability distribution is chosen as

g={\tfrac {|f|}{I(|f|)}}

then it can be shown that the variance $\mathrm {Var} _{g}(f;N)$ vanishes for an integrand that does not change sign, and the error in the estimate will be zero. In practice it is not possible to sample from the exact distribution g for an arbitrary function, so importance sampling algorithms aim to produce efficient approximations to the desired distribution.
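The vanishing variance can be seen directly. The short calculation below assumes f ≥ 0 on the integration region, so that I(|f|) = I(f), and writes the estimate as the sample mean of f/g:

```latex
\frac{f(\mathbf{x})}{g(\mathbf{x})}
 = \frac{f(\mathbf{x})\, I(|f|)}{|f(\mathbf{x})|}
 = I(f) \quad \text{wherever } f(\mathbf{x}) > 0
\;\Longrightarrow\;
E_g(f;N) = \frac{1}{N}\sum_{i=1}^{N} \frac{f(\mathbf{x}_i)}{g(\mathbf{x}_i)} = I(f),
\qquad \mathrm{Var}_g(f;N) = 0 .
```

Every sample contributes the same value, so there is nothing left to fluctuate. The catch is that constructing g requires the normalization I(|f|), which is essentially the quantity being sought.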

The VEGAS algorithm approximates the exact distribution by making a number of passes over the integration region, building a histogram of the function f on each pass. Each histogram is used to define a sampling distribution for the next pass. Asymptotically this procedure converges to the desired distribution.^[7] To avoid the number of histogram bins growing like K^d, where K is the number of bins per axis and d the number of dimensions, the probability distribution is approximated by a separable function:

g(x_{1},x_{2},\ldots )=g_{1}(x_{1})g_{2}(x_{2})\ldots

so that the number of bins required is only Kd. This is equivalent to locating the peaks of the function from the projections of the integrand onto the coordinate axes. The efficiency of VEGAS depends on the validity of this assumption. It is most efficient when the peaks of the integrand are well-localized. If an integrand can be rewritten in a form which is approximately separable this will increase the efficiency of integration with VEGAS.

VEGAS incorporates a number of additional features, and combines both stratified sampling and importance sampling.^[7] The integration region is divided into a number of "boxes", with each box getting a fixed number of points (the goal is 2). Each box can then have a fractional number of bins, but if the number of bins per box is less than two, VEGAS switches to a kind of variance reduction (rather than importance sampling).

This routine uses the VEGAS Monte Carlo algorithm to integrate the function f over the dim-dimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu, each of size dim. The integration uses a fixed number of function calls. The result and its error estimate are based on a weighted average of independent samples.

The VEGAS algorithm computes a number of independent estimates of the integral internally, according to an iterations parameter, and returns their weighted average. Random sampling of the integrand can occasionally produce an estimate where the error is zero, particularly if the function is constant in some regions. An estimate with zero error causes the weighted average to break down and must be handled separately.
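The breakdown for a zero-error estimate is easy to see from the form of the weighted average. The sketch below assumes the standard inverse-variance weighting, with weights 1/σ_i²; that a given VEGAS implementation combines its iterations in exactly this way is an assumption, and the function name `combine_estimates` is hypothetical.

```python
import math

def combine_estimates(estimates):
    """Inverse-variance weighted average of independent (value, error) pairs.

    Raises ValueError if any error is zero, since the weight 1 / error**2
    would be infinite; such estimates need separate handling.
    """
    if any(err == 0.0 for _, err in estimates):
        raise ValueError("zero-error estimate: weighted average is undefined")
    weights = [1.0 / (err * err) for _, err in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    # The combined error is 1 / sqrt(sum of weights).
    return value, 1.0 / math.sqrt(total)

# Usage: three independent estimates of the same integral.
value, error = combine_estimates([(1.02, 0.04), (0.99, 0.02), (1.00, 0.02)])
print(value, error)
```

The two more precise estimates dominate the result, and the combined error is smaller than any of the individual errors.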

Metropolis–Hastings algorithm

Importance sampling provides a very important tool for performing Monte Carlo integration.^[3] Its main result is that uniform sampling of ${\overline {\mathbf {x} }}$ is a particular case of a more general choice, in which the samples are drawn from any distribution $p({\overline {\mathbf {x} }})$ . The idea is that $p({\overline {\mathbf {x} }})$ can be chosen to decrease the variance of the estimate Q_N.

Consider the following example, in which one would like to numerically integrate a Gaussian function, centered at 0 with σ = 1, from −1000 to 1000. If the samples are drawn uniformly on the interval [−1000, 1000], only a very small fraction of them contribute significantly to the integral. This can be improved by drawing the samples from a different distribution, for instance a Gaussian distribution centered at 0 with σ = 1. Of course, the "right" choice depends strongly on the integrand.

Formally, given a set of samples chosen from a distribution

p({\overline {\mathbf {x} }}):\qquad {\overline {\mathbf {x} }}_{1},\cdots ,{\overline {\mathbf {x} }}_{N}\in \Omega ,

the estimator for I is given by^[3]

Q_{N}\equiv {\frac {1}{N}}\sum _{i=1}^{N}{\frac {f({\overline {\mathbf {x} }}_{i})}{p({\overline {\mathbf {x} }}_{i})}},

where $p({\overline {\mathbf {x} }})$ is normalized to one on Ω. Notice that if $p({\overline {\mathbf {x} }})$ is the uniform distribution, $p=1/V$, this estimator reduces to the one introduced in the overview, $Q_{N}=V\langle f\rangle$.
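The Gaussian example can be tried numerically. The sketch below uses the estimator in the form (1/N) Σ f(x_i)/p(x_i) with a normalized sampling density p, and takes the integrand to be the unnormalized Gaussian exp(−x²/2), whose integral is √(2π); both choices, and all function names, are assumptions made for this illustration. The tiny probability mass of p outside [−1000, 1000] is neglected.

```python
import math
import random

def f(x):
    """Unnormalised Gaussian centred at 0 with sigma = 1; its integral is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)

def uniform_estimate(n, rng, a=-1000.0, b=1000.0):
    """Plain Monte Carlo: uniform samples on [a, b], i.e. p(x) = 1 / (b - a)."""
    return (b - a) * sum(f(rng.uniform(a, b)) for _ in range(n)) / n

def normal_pdf(x, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_estimate(n, rng, sigma=1.0):
    """Importance sampling: draw x_i from N(0, sigma^2) and average f(x_i) / p(x_i)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        total += f(x) / normal_pdf(x, sigma)
    return total / n

exact = math.sqrt(2.0 * math.pi)
rng = random.Random(0)
print("exact            ", exact)
print("uniform, N=1000  ", uniform_estimate(1000, rng))
print("gaussian sigma=1 ", importance_estimate(1000, rng, sigma=1.0))
print("gaussian sigma=2 ", importance_estimate(1000, rng, sigma=2.0))
```

With σ = 1 the ratio f/p is constant, so every sample returns √(2π) up to rounding; with σ = 2 the estimate fluctuates but remains far more accurate than uniform sampling, where only a handful of the 1000 points land near the peak.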

The Metropolis–Hastings algorithm is one of the most widely used algorithms for generating samples ${\overline {\mathbf {x} }}$ from $p({\overline {\mathbf {x} }})$ ,^[3] thus providing an efficient way of computing integrals.
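A minimal sketch of the idea, assuming a one-dimensional target density known only up to a constant and a symmetric uniform random-walk proposal (the Metropolis special case of Metropolis–Hastings). The function name, step size and burn-in length are illustrative choices, not prescriptions.

```python
import math
import random

def metropolis(log_p, x0, n, step, rng):
    """Random-walk Metropolis sampler for a 1D density known up to a constant.

    log_p returns log p(x) up to an additive constant. Proposals are
    y = x + U(-step, step), which is symmetric, so the acceptance ratio
    reduces to p(y) / p(x).
    """
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n):
        y = x + rng.uniform(-step, step)
        lq = log_p(y)
        # Accept with probability min(1, p(y) / p(x)); otherwise keep x.
        if lq >= lp or rng.random() < math.exp(lq - lp):
            x, lp = y, lq
        samples.append(x)
    return samples

# Usage: sample a standard normal and estimate E[x^2], whose exact value is 1.
rng = random.Random(3)
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n=200_000, step=2.5, rng=rng)
kept = chain[1000:]  # discard an initial burn-in segment
second_moment = sum(x * x for x in kept) / len(kept)
print(second_moment)
```

Successive samples in the chain are correlated, so an error bar computed as if the samples were independent will generally be too small.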

Notes

[1] Press et al., 2007, Chap. 4.
[2] Press et al., 2007, Chap. 7.
[3] Newman, 1999, Chap. 2.
[4] Newman, 1999, Chap. 1.
[5] Press et al., 2007.
[6] Press, 1990, pp. 190–195.
[7] Lepage, 1978.

References

R. E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numerica vol. 7, Cambridge University Press, 1998, pp. 1–49.
S. Weinzierl, Introduction to Monte Carlo methods,
W.H. Press, G.R. Farrar, Recursive Stratified Sampling for Multidimensional Monte Carlo Integration, Computers in Physics, v4 (1990).
G.P. Lepage, A New Algorithm for Adaptive Multidimensional Integration, Journal of Computational Physics 27, 192-203, (1978)
G.P. Lepage, VEGAS: An Adaptive Multi-dimensional Integration Program, Cornell preprint CLNS 80-447, March 1980
J. M. Hammersley, D.C. Handscomb (1964) Monte Carlo Methods. Methuen. ISBN 0-416-52340-4
Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8.
Newman, MEJ; Barkema, GT (1999). Monte Carlo Methods in Statistical Physics. Clarendon Press.
Robert, CP; Casella, G (2004). Monte Carlo Statistical Methods (2nd ed.). Springer. ISBN 978-1-4419-1939-7.

External links

Café math : Monte Carlo Integration : A blog article describing Monte Carlo integration (principle, hypothesis, confidence interval)
Module for Monte Carlo Integration
Internet Resources for Monte Carlo Integration

@@ Line 55: / Line 55: @@
 A paradigmatic example of a Monte Carlo integration is the estimation of π. Consider the function
-:<math>H \left(x,y\right)=\begin{cases}
+:<math>H\left(x,y\right)=\begin{cases}
 & \text{if }x^{2}+y^{2}\leq1\\
 & \text{else}