Talk:Kernel density estimation
WikiProject Statistics (Unassessed)
WikiProject Mathematics (Start-class, Mid-priority)
This article may be too technical for most readers to understand. (Particularly the section about the risk function.)
Incorrect caption
Note that the figure shows \(\sum_i K(x - x_i)\) (the sum of the kernel curves) rather than their average \(\tfrac{1}{n}\sum_i K(x - x_i)\) as the caption says. --anon
- How do you know, as there is no y-axis in the picture? Oleg Alexandrov (talk) 03:31, 1 March 2006 (UTC)
- \(\hat f\) is an average. An average is never greater than the largest component. If you look at the graph, the blue curve is clearly the sum of the component curves. Zik 03:40, 5 March 2006 (UTC)
- You are right, I fixed the caption. I have no idea how I had missed that. :) Oleg Alexandrov (talk) 01:09, 6 March 2006 (UTC)
name
In my experience calling the technique Parzen windowing is limited specifically to time-series analysis, and mainly in engineering fields. In general statistics (and in statistical machine learning), the term kernel density estimation is much more common. Therefore I'd propose it be moved there. As an aside, the attribution to Parzen is also historically problematic, since Rosenblatt introduced the technique into the statistics literature in 1956, and it had been used in several more obscure papers as early as the 1870s, and again in the early 1950s. --Delirium 22:59, 26 August 2006 (UTC)
x
What is x in the equation? --11:06, 5 October 2006 (UTC)
- It is a real number, I guess. Oleg Alexandrov (talk) 02:55, 6 October 2006 (UTC)
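- To make this concrete: in the usual form of the estimator (a reconstruction; the article's exact notation may differ), x is simply the point at which the density is being estimated:

\[ \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \]

where \(x_1, \dots, x_n\) are the observed samples, \(h > 0\) is the bandwidth, and \(K\) is the kernel.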
Changing the name of this page
The technique called here Parzen window is called kernel density estimation in non-parametric statistics. It seems to me to be a much more general term and much clearer for people searching for it. The comment above states the same problem. I also agree that the article should refer to the Parzen-Rosenblatt notion of a kernel, and not just that of Parzen. The definition of a Parzen-Rosenblatt kernel should later be added to the kernel (statistics) page. —The preceding unsigned comment was added by Gpeilon (talk • contribs).
- That's fine with me. If you move the page, you should also fix the double redirects. That is, after the move, while viewing the article at the new name, click on "what links here" on the left, and any redirects which point to redirects need to be made to point to the new name. Cheers, Oleg Alexandrov (talk) 03:18, 9 January 2007 (UTC)
Formula for optimal bandwidth
Hi, I just noticed that the optimal global bandwidth in Rosenblatt, M. The Annals of Mathematical Statistics, Vol. 42, No. 6. (Dec., 1971), pp. 1815-1842. has an additional factor of . Just an oversight, or is there a reason for the difference that I'm missing? Best, Yeteez 18:34, 24 May 2007 (UTC)
In addition, what is the lower-case n in the optimal bandwidth formula? It is undefined. CnlPepper (talk) 17:18, 13 December 2007 (UTC)
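For reference, the AMISE-optimal bandwidth usually quoted in this literature has the form below (assuming the article states the standard result); the lower-case n is the sample size:

\[ h_{\mathrm{AMISE}} = \left( \frac{R(K)}{m_2(K)^2 \, R(f'') \, n} \right)^{1/5}, \qquad R(g) = \int g(x)^2 \, dx, \quad m_2(K) = \int x^2 K(x) \, dx \]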
Scaling factor
Shouldn't the scaling factor in the formula for K(x) be dropped, on the grounds that it is already there in the form of h in the formula for \(\hat f(x)\)?
--Santaclaus 15:45, 7 June 2007 (UTC)
Though I'm not sure whether it violates the guidelines of what Wikipedia is, I like the example section. But I would like to see the commands in some non-proprietary language, e.g. R. --Ben T/C 14:41, 2 July 2007 (UTC)
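In that spirit, a minimal sketch of what a non-proprietary equivalent could look like, here in Python with NumPy/SciPy (the sample data are invented for illustration):

import numpy as np
from scipy.stats import gaussian_kde

# Invented sample: 100 draws from a bimodal mixture.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1.0, 50), rng.normal(3, 0.5, 50)])

# gaussian_kde selects a bandwidth automatically (Scott's rule by default).
kde = gaussian_kde(data)

# Evaluate the estimated density on a grid of points.
x = np.linspace(-6, 6, 200)
density = kde(x)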
Practical Use
Can somebody please add a paragraph on what the practical use of kernel density estimation is, and provide an example from statistics or econometrics? Thanks!
Kernel?
Isn't a Gaussian with variance of 1 totally arbitrary? On the other hand, using the PDF of your measurement tool as a kernel seems quite meaningful. For example, if you are measuring people's heights and know you can measure to a std. dev of 1/4", then convolving the set of measured heights by a Gaussian with std. dev of 1/4" seems like it captures everything you know about the data set. For example, in the limit of one sample, the estimation would reflect our best guess of the distribution for that one person. 155.212.242.34 22:07, 6 November 2007 (UTC)
- Anybody? —Ben FrantzDale (talk) 16:00, 26 August 2008 (UTC)
- I agree with the above poster that a standard Gaussian is arbitrary. True, Gaussians are often used as the kernel, but the variance of the Gaussian is usually selected based on the "coarseness" of the desired result, and is therefore not necessarily 1. —Preceding unsigned comment added by Zarellam (talk • contribs) 07:08, 17 April 2009 (UTC)
- The variance then is the parameter h and can still be chosen as desired. I fixed this on the page. 170.223.0.55 (talk) 14:57, 27 April 2009 (UTC)
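As a sketch of the suggestion in this thread (assuming Gaussian measurement error; the 1/4" standard deviation is the original poster's example, and the single height is invented):

import numpy as np

def kde_measurement_error(x, samples, sigma=0.25):
    # Use the measurement-error PDF as the kernel: a Gaussian with
    # std. dev. sigma (here 1/4 inch) centered on each sample.
    z = (x - samples[:, None]) / sigma           # shape: (n_samples, n_points)
    kernels = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)                  # average the per-sample kernels

# With one measured height, the estimate is just that single Gaussian,
# reflecting everything we know about that one person.
heights = np.array([68.0])
grid = np.linspace(66.0, 70.0, 100)
density = kde_measurement_error(grid, heights)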
Properties section - more on \(h\)
It appears that the section Properties tells us how to select \(h\). However, I found several things confusing here, and would like to see them described more clearly.
First, if I'm interpreting correctly, \(R(K)\) and \(m_2(K)\) would be constants for the standard normal kernel that was earlier stated to be the common choice, e.g. \(R(K) = 1/(2\sqrt{\pi})\) and \(m_2(K) = 1\). The fact that these constants for the standard normal were not given confused me and left me thinking that maybe there was a notational inconsistency or something, or that I wasn't interpreting something right. So please, mention what these constants are for the standard kernel choice.
Next and more serious, I'm still confused about \(R(f'')\). It appears that we're going to find \(h\) in order to find \(\hat f\). But apparently \(R(f'')\) must be estimated as a function of \(f\). I mean, if \(f\) is the underlying true distribution, which we don't know, then we don't know \(f''\), so the implication is that we'd need to use \(\hat f\), which is defined in terms of \(h\). So it seems like \(h\) has a circular definition. —Preceding unsigned comment added by 98.207.54.162 (talk) 19:12, 7 February 2009 (UTC)
- You are correct. I added an internal link about \(R\) and \(m_2\) to the relevant page. Somebody with more knowledge of the estimation algorithms (cross-validation, plug-in, etcetera) should have a look, as none of those algorithms are presently discussed on Wikipedia. In any case the parameter \(h\) must be estimated from the input data set, and is usually derived from its variance \(\sigma^2\). Probably something should be said about the derivation of the formula for \(h\) as well, which is (I think) the AMISE form. 78.21.160.201 (talk) 12:44, 27 August 2009 (UTC)
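One standard escape from this circularity, in line with the reply above (h derived from the sample variance), is a normal-reference rule of thumb: plug a Gaussian in for the unknown f when evaluating \(R(f'')\). A minimal sketch (the 1.06 constant is the Gaussian-reference value):

import numpy as np

def normal_reference_bandwidth(samples):
    # h = 1.06 * sigma_hat * n^(-1/5): assumes f is roughly Gaussian,
    # which removes the dependence of h on the unknown R(f'').
    n = len(samples)
    sigma_hat = np.std(samples, ddof=1)
    return 1.06 * sigma_hat * n ** (-1 / 5)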
Comparison to histogram
The description of a histogram as KDE with a boxcar kernel is not entirely accurate. In a histogram the bin centers are fixed, whereas in KDE the kernel is centered on each data point. See this page for more explanation. --Nubicles (talk) 04:19, 20 February 2009 (UTC)
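A small sketch of the distinction (the data, bin width, and kernel width are arbitrary):

import numpy as np

data = np.array([1.1, 1.9, 2.0, 4.2])
h = 1.0

# Histogram: bin edges are fixed in advance, independently of the data.
edges = np.arange(0.0, 6.0, h)
hist_density, _ = np.histogram(data, bins=edges, density=True)

# Boxcar KDE: a box of width 2h is centered on *each* data point.
def boxcar_kde(x):
    inside = np.abs(x - data[:, None]) <= h     # which data points cover x
    return inside.mean(axis=0) / (2 * h)

grid = np.linspace(0.0, 6.0, 121)
kde_density = boxcar_kde(grid)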