Goertzel algorithm: Difference between revisions
m WP:CHECKWIKI error 61 fix, References after punctuation per WP:REFPUNC and WP:PAIC using AWB (8459) |
ParaTechNoid (talk | contribs) Major reorganisation -- yet essential technical content is mostly unchanged. |
||
Line 1: | Line 1: | ||
The '''Goertzel algorithm''' is a [[digital signal processing]] (DSP) technique for |
The '''Goertzel algorithm''' is a [[digital signal processing]] (DSP) technique that provides a means for efficient evaluation of individual terms of the [[Discrete Fourier Transform]] (DFT), thus making it useful in certain practical applications, such as recognition of [[Dual-tone multi-frequency|DTMF]] tones produced by the buttons pushed on a telephone keypad. The algorithm was first described by [[Gerald Goertzel]] in 1958<ref>{{Citation |first=G. |last=Goertzel |date=January 1958 |title= An Algorithm for the Evaluation of Finite Trigonometric Series |journal= American Mathematical Monthly |volume=65 |issue=1 |pages=34–35 |doi=10.2307/2310304 |url=http://www.jstor.org/discover/10.2307/2310304?uid=3739664&uid=2134&uid=2&uid=70&uid=4&uid=3739256&sid=55978747833}}</ref>. |
||
Like the DFT, the ''Goertzel Algorithm'' analyses one selectable frequency component from a [[Discrete signal]] <ref>{{Harvtxt|Mock|1985}}</ref><ref>{{Harvtxt|Chen|1996}}</ref><ref>{{Harvtxt|Schmer|2000}}</ref>. Unlike direct DFT calculations, the ''Goertzel algorithm'' applies a single real-valued coefficient at each iteration, using real-valued arithmetic for real-valued input sequences. For covering a full spectrum, the Goertzel algorithm has a [[Computational complexity theory|higher order of complexity]] than [[Fast Fourier transform]] (FFT) algorithms; but for computing a small number of selected frequency components, it is more numerically efficient. The simple structure of the ''Goertzel algorithm'' makes it well suited to small processors and embedded applications, though not limited to these. |
|||
A practical application of this algorithm is recognition of the [[Dual-tone multi-frequency|DTMF]] tones produced by the buttons pushed on a telephone keypad.<ref>{{Harvtxt|Mock|1985}}</ref><ref>{{Harvtxt|Chen|1996}}</ref><ref>{{Harvtxt|Schmer|2000}}</ref> |
|||
The Goertzel Algorithm can also be used "in reverse" as a sinusoid synthesis function, which requires only 1 multiplication and 1 subtraction per generated sample.<ref>http://www.dattalo.com/technical/theory/sinewave.html</ref><ref>http://cs-www.cs.yale.edu/c2/images/uploads/AudioProc-TR.pdf</ref> |
|||
== |
== The algorithm == |
||
The Goertzel algorithm |
The main calculation in the Goertzel algorithm has the form of a [[digital filter]], and for this reason the algorithm is often called a ''Goertzel Filter''. The filter has a cascade of two stages. The first stage calculates an intermediate sequence, <math>s(n)</math>, given an input sequence, <math>x(n)</math>: |
||
:<math>s(n) = x(n) + 2 \cos(2 \pi \omega) s(n-1) - s(n-2) |
:(1) <math>s(n) = x(n) + 2 \cos(2 \pi \omega) s(n-1) - s(n-2)</math> |
||
The output sequence <math>s(n)</math> from the first filter, is then applied to the following filter to produce output sequence <math>y(n)</math>. |
|||
where <math>s(-2) = s(-1) = 0</math> and <math>\omega</math> is some frequency of interest, in cycles per sample, which should be less than 1/2. This effectively implements a second-order [[Infinite impulse response|IIR]] filter with poles at <math>e^{+2 \pi i \omega}</math> and <math>e^{-2 \pi i \omega}</math>, and requires only one multiplication (assuming <math>2 \cos(2 \pi \omega)</math> is pre-computed), one addition and one subtraction per input sample. For [[Real number|real]] inputs, these operations are real. |
|||
:(2) <math>y(n) = s(n) - e^{-2 \pi i \omega} s(n-1)</math> |
|||
The first filter stage can be observed to be a second-order [[Infinite impulse response|IIR]] filter with a [[Digital filter|direct form]] structure. This particular structure has the property that its internal state variables equal the past output values from that stage. Input values <math>x(n)</math> for <math>n<0</math> are presumed all equal to 0. To establish the initial filter state so that evaluation can begin at sample <math>x(0)</math>, the filter states are assigned initial values <math>s(-2) = s(-1) = 0</math>. The <math>\omega</math> parameter specifies the normalised frequency to be analysed, given in cycles per sample. To avoid [[Aliasing|aliasing hazards]], frequency <math>\omega</math> is often restricted to the range 0 to 1/2, but the results are mathematically valid outside of this range. |
|||
The [[Z transform]] of this process is |
|||
:<math>\frac{S(z)}{X(z)} = \frac{1}{1 - 2 \cos(2 \pi \omega) z^{-1} + z^{-2}} = \frac{1}{(1 - e^{+2 \pi i \omega} z^{-1})(1 - e^{-2 \pi i \omega} z^{-1})}.</math> |
|||
Applying an additional, [[Finite Impulse Response|FIR]], transform of the form |
|||
:<math>\frac{Y(z)}{S(z)} = 1 - e^{-2 \pi i \omega}z^{-1}</math> |
|||
will give an overall transform of |
|||
:<math>\frac{S(z)}{X(z)} \frac{Y(z)}{S(z)} = \frac{Y(z)}{X(z)} = \frac{(1 - e^{-2 \pi i \omega}z^{-1})}{(1 - e^{+2 \pi i \omega} z^{-1})(1 - e^{-2 \pi i \omega} z^{-1})} = \frac{1}{1 - e^{+2 \pi i \omega} z^{-1}}.</math> |
|||
The time-domain equivalent of this overall transform is |
|||
:<math>y(n) = x(n) + e^{+2 \pi i \omega} y(n-1) = \sum_{k=-\infty}^{n}x(k) e^{+2 \pi i \omega (n-k)} = e^{+2 \pi i \omega n} \sum_{k=-\infty}^{n}x(k) e^{-2 \pi i \omega k},</math> |
|||
which becomes, assuming <math>x(k) = 0</math> for all <math>k < 0</math>, |
|||
:<math>y(n) = e^{+2 \pi i \omega n} \sum_{k=0}^{n}x(k) e^{-2 \pi i \omega k},</math> |
|||
or, the equation for the <math>(n+1)</math>-sample [[Discrete Fourier transform|DFT]] of sequence <math>x</math>, evaluated for <math>\omega</math> and multiplied by the phase factor <math>e^{+2 \pi i \omega n}</math>. |
|||
The second stage filter can be observed to be a [[Finite Impulse Response|FIR filter]], since its calculations do not use any of its past outputs. |
|||
Note that applying the additional transform Y(z)/S(z) only requires the last two samples of the <math>s</math> sequence. Consequently, upon processing <math>N</math> samples <math>x(0)...x(N-1)</math>, the last two samples from the <math>s</math> sequence can be used to compute the value of a [[Discrete Fourier transform|DFT]] bin, which corresponds to the chosen frequency <math>\omega</math> as |
|||
:<math>X(\omega) = y(N-1) e^{-2 \pi i \omega (N - 1)} = ( s(N-1) - e^{-2 \pi i \omega} s(N-2) ) e^{-2 \pi i \omega (N - 1)}.</math> |
|||
For the special (however classical) case often found when computing [[Discrete Fourier transform|DFT]] bins, where <math>\omega N = k</math> for some integer, <math>k</math>, this simplifies to |
|||
:<math>X(\omega) = ( s(N-1) - e^{-2 \pi i \omega} s(N-2) ) e^{+2 \pi i \omega} = e^{+2 \pi i \omega} s(N-1) - s(N-2).</math> |
|||
In either case, the corresponding power can be computed using the same cosine term required to compute <math>s</math> as |
|||
:<math>X(\omega) X'(\omega) = s(N-2)^2 + s(N-1)^2 - 2 cos(2 \pi \omega) s(N-2) s(N-1).</math> |
|||
This is due to the fact that the power does not depend on the phase factor <math>e^{-2 \pi i \omega (N - 1)}</math>. |
|||
[[Z transform]] methods can be applied to study the properties of the filter cascade. The Z transform of the first filter stage given in equation (1) is: |
|||
== Implementation == |
|||
:(3) <math>\frac{S(z)}{X(z)} = \frac{1}{1 - 2 \cos(2 \pi \omega) z^{-1} + z^{-2}} = \frac{1}{(1 - e^{+2 \pi i \omega} z^{-1})(1 - e^{-2 \pi i \omega} z^{-1})}.</math> |
|||
The Z transform of the second filter stage given in equation (2) is: |
|||
:(4) <math>\frac{Y(z)}{S(z)} = 1 - e^{-2 \pi i \omega}z^{-1}</math> |
|||
The combined transfer function of the cascade of the two filter stages is then |
|||
:(5) <math>\frac{S(z)}{X(z)} \frac{Y(z)}{S(z)} = \frac{Y(z)}{X(z)} = \frac{(1 - e^{-2 \pi i \omega}z^{-1})}{(1 - e^{+2 \pi i \omega} z^{-1})(1 - e^{-2 \pi i \omega} z^{-1})} = \frac{1}{1 - e^{+2 \pi i \omega} z^{-1}}.</math> |
|||
This can be transformed back to an equivalent time domain sequence, and the terms unrolled back to the first input term at index <math>n=0</math>. |
|||
:(6) <math> \begin{align} |
|||
y(n) & = x(n) + e^{+2 \pi i \omega} y(n-1) \\ |
|||
& = \sum_{k=-\infty}^{n}x(k) e^{+2 \pi i \omega (n-k)} \\ |
|||
& = e^{+2 \pi i \omega n} \sum_{k=0}^{n} x(k) e^{-2 \pi i \omega k} |
|||
\end{align} </math> |
|||
It can be observed that the [[Pole (complex analysis)|poles]] of the filter's [[Z transform]] are located at <math>e^{+2 \pi i \omega}</math> and <math>e^{-2 \pi i \omega}</math>, on a circle of unit radius centered on the origin of the complex [[Z transform|Z transform plane]]. This property indicates that the filter process is [[Marginal stability|marginally stable]] and potentially vulnerable to [[Numerical stability|numerical error accumulation]] when computed using low-precision arithmetic and long input sequences. |
|||
When implemented in a general-purpose processor, values for <math>s(n-1)</math> and <math>s(n-2)</math> can be retained in variables and new values of <math>s</math> can be shifted through as they are computed, assuming that only the final two values of the <math>s</math> sequence are required. The code may then be as follows: |
|||
== DFT computations == |
|||
For the important case of computing a DFT term, the following special restrictions are applied. |
|||
* the filtering terminates at index <math>n=N</math> where <math>N</math> is the number of terms in the input sequence of the DFT |
|||
* the frequencies chosen for the Goertzel analysis are restricted to the special form |
|||
:(7) <math>\omega = \frac{K}{N}</math> |
|||
* the index number <math>K</math> indicating the "frequency bin" of the DFT is selected from the set of index numbers |
|||
:(8) <math>K \epsilon \{ 0, 1, 2, ..., N-1 \} </math> |
|||
Making these substitutions into equation (6), and observing that the term <math>e^{-2 \pi i K} = 1.0</math> , equation (6) then takes the following form. |
|||
:(9) <math>y(N) = \sum_{k=0}^{N} x(k)e^{-2 \pi i \frac{k K}{N}} </math> |
|||
We can observe that the right side of equation (9) is extremely similar to the defining formula for DFT term <math>X(K)</math>, the DFT term for index number K, but not exactly the same. The summation shown in equation (9) requires N+1 input terms, but only N input terms are available when evaluating a DFT. A simple but inelegant expedient is to extend the input sequence <math>x</math> with one more artificial value <math>x(N)=0</math> <ref>http://cnx.org/content/m12024/latest/</ref>. We can see from equation (9), the mathematical effect on the final result is the same as removing term <math>x(N)</math> from the summation, thus delivering the intended DFT value. |
|||
However, there is a more elegant approach that avoids the extra filter pass. From equation (1), we can note that when the extended input term <math>x(N)=0</math> is used in the final step, |
|||
:(10) <math>s(N) = 2 \cos(2 \pi \omega) S(N-1) - s(N-2)</math> |
|||
Thus, the algorithm can be completed as follows: |
|||
* terminate the IIR filter after processing input term <math>x(N-1)</math> |
|||
* apply equation (10) to construct <math>s(N)</math> from the prior outputs <math>s(N-1)</math> and <math>s(N-2)</math> |
|||
* apply equation (2) with the calculated <math>s(N)</math> value, and with <math>s(N-1)</math> produced by the final direct calculation of the filter. |
|||
The last two mathematical operations are simplified by combining them algebraically. |
|||
:(11) <math>\begin{align} y(N)\quad & = s(N) - e^{-2 \pi i \frac{K}{N}} s(N-1)\quad \\ |
|||
& = (2 \cos(2 \pi \omega) s(N-1) - s(N-2)) - e^{-2 \pi i \frac{K}{N}} s(N-1) \\ |
|||
& = e^{2 \pi i \frac{K}{N}} s(N-1) - s(N-2) |
|||
\end{align}</math> |
|||
Note that stopping the filter updates at term <math>N-1</math> and immediately applying equation (2) rather than equation (11) misses the final filter state updates, yielding a result with incorrect phase <ref> http://www.eetimes.com/design/signal-processing-dsp/4024443/The-Goertzel-Algorithm. Use with caution.</ref>. |
|||
The particular filtering structure chosen for the ''Goertzel Algorithm'' is the key to its efficient DFT calculations. We can observe that only one output value <math>y(N)</math> is used for calculating the DFT, so calculations for all the other output terms are omitted. Since the FIR filter is not calculated, the IIR stage calculations <math>s(0), s(1),</math> etc. can be discarded immediately after updating the first stage's internal state. |
|||
This seems to leave a paradox: to complete the algorithm, the FIR filter stage must be evaluated once using the final two outputs from the IIR filter stage, while for computational efficiency the IIR filter iteration discards its output values. This is where the properties of the direct-form filter structure are applied. The two internal state variables of the IIR filter provide the last two values of the IIR filter output, which are the terms required to evaluate the FIR filter stage. |
|||
== Applications == |
|||
=== Power spectrum terms === |
|||
Examining equation (6), a final IIR filter pass to calculate term <math>s(N)</math> using a supplemental input value <math>x(N)=0</math> applies a complex multiplier of magnitude 1.0 to the previous term <math>s(N-1)</math>. Consequently, <math>s(N)</math> and <math>s(N-1)</math> represent equivalent signal power. It is equally valid to apply equation equation (11) and calculate the signal power from term <math>y(N)</math>, or to apply equation (2) and calculate the signal power from term <math>y(N-1)</math>. Both cases lead to the following expression for the signal power represented by DFT term X(K). |
|||
:(12) <math>\begin{align} |
|||
X(K) X'(K) & = y(N)\ y'(N) = y(N-1)\ y'(N-1) \\ |
|||
& = s(N-1)^2 + s(N-2)^2 - 2 cos(2 \pi \frac{K}{N})\ s(N-1)\ s(N-2) |
|||
\end{align} </math> |
|||
In the implementation illustrated below, the variables <math>sprev</math> and <math>sprev2</math> are used for the IIR filter state values that preserve output history. |
|||
<pre> |
<pre> |
||
Nterms defined here |
|||
s_prev = 0 ; |
|||
Kterm selected here |
|||
s_prev2 = 0 ; |
|||
omega = 2 * PI * Kterm / Nterms; |
|||
normalized_frequency = target_frequency / sample_rate ; |
|||
wr = cos(omega); |
|||
coeff = 2 * cos(2 * PI * normalized_frequency) ; |
|||
wi = sin(omega); |
|||
for each sample, x[n], |
|||
coeff = 2 * wr; |
|||
s = x[n] + coeff * s_prev - s_prev2 ; |
|||
s_prev2 = s_prev ; |
|||
sprev = 0; |
|||
sprev2 = 0; |
|||
for each index k in range 0 to Nterms-1 |
|||
s = x[k] + coeff * sprev - sprev2; |
|||
sprev2 = sprev; |
|||
sprev = s; |
|||
end |
end |
||
power = s_prev2 * s_prev2 + s_prev * s_prev - coeff * s_prev * s_prev2 ; |
|||
power = sprev2*sprev2 + sprev*sprev - coeff*sprev*sprev2 ; |
|||
</pre> |
</pre> |
||
This implementation presumes that the complete input data sequence <math>x</math> is provided as an array. It is also possible to organise the computations so that incoming samples are delivered singly, in real-time, to a [[Object-oriented programming|software object]] that maintains the filter state between updates, with the final power result accessed after the other processing is done. |
|||
Instead of storing the history of samples in an array it is also possible to process the incoming samples iteratively in real-time. |
|||
=== Single DFT term with real-valued arithmetic === |
|||
The case of real-valued input data arises frequently, especially in embedded systems where the input streams result from direct measurements of physical processes. Comparing to the implementation shown in the previous section, when the input data are real-valued, the filter internal state variables <i>sprev</i> and <i>sprev2</i> can be observed also to be real-valued, consequently, no complex arithmetic is required in the first IIR stage. Optimizing for real-valued arithmetic typically is as simple as applying appropriate real-valued data types for the variables. |
|||
After the calculations using input term <math>x(N-1)</math>, and filter iterations are terminated, equation (11) must be applied to evaluate the DFT term. The final calculation uses complex-valued arithmetic, but this can be converted into real-valued arithmetic by separating real and imaginary terms. |
|||
:(13) <math>\begin{align} |
|||
Wr & = cos(2 \pi \frac{K}{N}) \\ |
|||
Wi & = sin(2 \pi \frac{K}{N}) \\ |
|||
y(N) & = (Wr\ s(N-1) - s(N-2)) + i (Wi\ s(N-1)) |
|||
\end{align} |
|||
</math> |
|||
Comparing to the power spectrum application, the only difference is the calculation used to finish. |
|||
<pre> |
|||
(Same IIR filter calculations as in the signal power implementation) |
|||
XKreal = sprev * wr - sprev2; |
|||
XKimag = sprev * wi; |
|||
</pre> |
|||
=== Phase detection === |
|||
This application requires the same evaluation of DFT term <math>X(K)</math>, as discussed in the previous section, using a real-valued or complex-valued input stream. Then the signal phase can be evaluated as: |
|||
:(14) <math> phase = tan^{-1}(imaginary(X(K)) / (real(X(K)) </math> |
|||
taking appropriate precautions for singularities, quadrant, and so forth when computing the inverse tangent function. |
|||
=== Complex signals in real arithmetic === |
|||
Since complex signals decompose linearly into real and imaginary parts, the ''Goertzel algorithm'' in real-valued arithmetic can be applied separately to the sequence of real parts, yielding <math>yr(n)</math>, and to the sequence of imaginary parts, yielding <math>yi(n)</math>, both computed in real arithmetic. After that, the two complex-valued partial results can be recombined: |
|||
:(15) <math>y(n) = yr(n) + i\ yi(n)</math> |
|||
== Computational complexity == |
== Computational complexity == |
||
According to [[Computational complexity theory]], computing a set of <math>M</math> DFT terms using <math>M</math> applications of the Goertzel algorithm on a data set with <math>N</math> values with a "cost per operation" of <math>K</math> has a complexity order |
|||
To compute a single [[Discrete Fourier transform|DFT]] bin for a complex sequence of length ''N'', this algorithm requires 2''N'' multiplications and 4''N'' additions/subtractions within the loop, as well as 4 multiplications and 4 additions/subtractions to compute <math>X(\omega)</math>, for a total of 2''N''+4 multiplications and 4''N''+4 additions/subtractions (for real sequences, the required operations are half that amount). In contrast, the [[Fast Fourier transform]] (FFT) requires 2log<sub>2</sub>''N'' multiplications and 3log<sub>2</sub>''N'' additions/subtractions per [[Discrete Fourier transform|DFT]] bin, but must compute all ''N'' bins simultaneously (similar optimizations are available to halve the number of operations in an FFT when the input sequence is real). |
|||
:<math> K\ N\ M</math> |
|||
* To compute a single [[Discrete Fourier transform|DFT]] bin <math>X(\omega)</math> for a complex input sequence of length <math>N</math>, the Goertzel algorithm requires <math>2 N</math> multiplications and <math>4\ N</math> additions/subtractions within the loop, as well as 4 multiplications and 4 final additions/subtractions, for a total of <math>2 N\ + 4</math> multiplications and <math>4 N\ + 4</math> additions/subtractions. This is repeated for each of the <math>M</math> frequencies. |
|||
When the number of desired [[Discrete Fourier transform|DFT]] bins, ''M'', is small (e.g., when detecting [[DTMF]] tones), it is computationally advantageous to implement the Goertzel algorithm, rather than the FFT. Approximately, this occurs when |
|||
:<math>M < \frac{5}{6} \log_2 N</math> |
|||
or if, for some reason, ''N'' is not an integral power of 2 while you stick to a simple FFT algorithm like the 2-radix [[Cooley-Tukey FFT algorithm]], and zero-padding the samples out to an integral power of 2 would violate |
|||
:<math>M < \frac{5 N_{\mathrm{padded}}}{6 N} \log_2\left(N_{\mathrm{padded}}\right)</math> |
|||
Moreover, the Goertzel algorithm can be computed as samples come in, and the FFT algorithm may require a large table of ''N'' pre-computed sines and cosines to be efficient. |
|||
In contrast, using an [[Fast Fourier transform|FFT]] on a data set with <math>N</math> values has a complexity order |
|||
If multiplications are not considered as difficult as additions, or vice versa, the 5/6 ratio can shift between anything from 3/4 (additions dominate) to 1/1 (multiplications dominate). |
|||
:<math> K\ N\ log(N)</math> |
|||
* This is harder to apply directly because it depends on the FFT algorithm used, but a typical example is a radix-2 FFT, which requires <math>2 log_2(N)</math> multiplications and <math>3 log_2(N)</math> additions/subtractions per [[Discrete Fourier transform|DFT]] bin, for each of the <math>N</math> bins. |
|||
In the complexity order expressions, when the number of calculated terms <math>M</math> is smaller than <math>log(N)</math>, the advantage of the Goertzel algorithm is clear. But because FFT code is comparatively very complex, the "cost per unit of work" factor <math>K</math> is often much larger for an FFT, and the practical advantage favours the Goertzel algorithm more than the complexity expressions indicate. |
|||
As a rule-of-thumb for determining whether a radix-2 FFT or a ''Goertzel algorithm'' is more efficient, adjust the number of terms N in the data set upward to the nearest exact power of 2, calling this <math>N_2</math>, and the Goertzel algorithm is likely to be faster if |
|||
:<math>M <= \frac{5 N_2}{6 N} \log_2 N_2</math> |
|||
FFT implementations and processing platforms have a significant impact on the relative performance. Some FFT implementations<ref>{{Citation |last1=Press|last2=Flannery|Last3=Teukolsky|Last4=Vetterling|title=Numerical Recipes, The Art of Scientific Computing|publisher=Cambridge University Press||date=2007|chapter=Chapter 12|doi= }}</ref> perform internal complex-number calculations to generate coefficients on-the-fly, significantly increasing their "cost K per unit of work." FFT and DFT algorithms can use tables of pre-computed coefficient values for better numerical efficiency, but this requires more accesses to coefficient values buffered in external memory, which can lead to increased cache contention that counters some of the numerical advantage. |
|||
Both algorithms gain approximately a factor of 2 efficiency when using real-valued rather than complex-valued input data. However, these gains are natural for the ''Goertzel algorithm'' but will not be achieved for the FFT without using certain algorithm variants specialised for [[Fast Fourier transform|transforming real-valued data]]. |
|||
== References == |
== References == |
||
Line 67: | Line 156: | ||
* {{Citation |last=Mock |first=P. |url=http://focus.ti.com/lit/an/spra168/spra168.pdf |title=Add DTMF Generation and Decoding to DSP-μP Designs |journal=EDN |date=March 21, 1985 |issn=0012-7515 |doi= }}; also found in DSP Applications with the TMS320 Family, Vol. 1, Texas Instruments, 1989. |
* {{Citation |last=Mock |first=P. |url=http://focus.ti.com/lit/an/spra168/spra168.pdf |title=Add DTMF Generation and Decoding to DSP-μP Designs |journal=EDN |date=March 21, 1985 |issn=0012-7515 |doi= }}; also found in DSP Applications with the TMS320 Family, Vol. 1, Texas Instruments, 1989. |
||
* {{Citation |last=Schmer |first=Gunter |url=http://focus.ti.com/lit/an/spra096a/spra096a.pdf |title=DTMF Tone Generation and Detection: An Implementation Using the TMS320C54x |publisher=Texas Instruments |series=Application Report |volume=SPRA096a |date= May 2000 |doi= }} |
* {{Citation |last=Schmer |first=Gunter |url=http://focus.ti.com/lit/an/spra096a/spra096a.pdf |title=DTMF Tone Generation and Detection: An Implementation Using the TMS320C54x |publisher=Texas Instruments |series=Application Report |volume=SPRA096a |date= May 2000 |doi= }} |
||
* {{Citation |last=Proakis |first= J. G. |first2=D. G. |last2= Manolakis |title=Digital Signal Processing: Principles, Algorithms, and Applications |location= Upper Saddle River, NJ |publisher=Prentice Hall |year=1996 |pages=480–481 |isbn= |doi= }} |
|||
==External links== |
==External links== |
||
* http://netwerkt.wordpress.com/2011/08/25/goertzel-filter/ C Code implementation of iterative Goertzel algorithm |
* http://netwerkt.wordpress.com/2011/08/25/goertzel-filter/ C Code implementation of iterative Goertzel algorithm |
||
* http://www. |
* http://www.embedded.com/showArticle.jhtml?articleID=17301593 |
||
* http://cnx.org/content/m12024/latest/ |
|||
* http://www.embedded.com/showArticle.jhtml?articleID=17301593 |
|||
* {{Citation |last=Proakis |first= J. G. |first2=D. G. |last2= Manolakis |title=Digital Signal Processing: Principles, Algorithms, and Applications |location= Upper Saddle River, NJ |publisher=Prentice Hall |year=1996 |pages=480–481 |isbn= |doi= }} |
|||
[[Category:FFT algorithms]] |
[[Category:FFT algorithms]] |
Revision as of 20:52, 18 November 2012
The Goertzel algorithm is a digital signal processing (DSP) technique that provides a means for efficient evaluation of individual terms of the Discrete Fourier Transform (DFT), thus making it useful in certain practical applications, such as recognition of DTMF tones produced by the buttons pushed on a telephone keypad. The algorithm was first described by Gerald Goertzel in 1958[1].
Like the DFT, the Goertzel Algorithm analyses one selectable frequency component from a Discrete signal [2][3][4]. Unlike direct DFT calculations, the Goertzel algorithm applies a single real-valued coefficient at each iteration, using real-valued arithmetic for real-valued input sequences. For covering a full spectrum, the Goertzel algorithm has a higher order of complexity than Fast Fourier transform (FFT) algorithms; but for computing a small number of selected frequency components, it is more numerically efficient. The simple structure of the Goertzel algorithm makes it well suited to small processors and embedded applications, though not limited to these.
The Goertzel Algorithm can also be used "in reverse" as a sinusoid synthesis function, which requires only 1 multiplication and 1 subtraction per generated sample.[5][6]
The algorithm
The main calculation in the Goertzel algorithm has the form of a digital filter, and for this reason the algorithm is often called a Goertzel Filter. The filter has a cascade of two stages. The first stage calculates an intermediate sequence, , given an input sequence, :
- (1)
The output sequence from the first filter, is then applied to the following filter to produce output sequence .
- (2)
The first filter stage can be observed to be a second-order IIR filter with a direct form structure. This particular structure has the property that its internal state variables equal the past output values from that stage. Input values for are presumed all equal to 0. To establish the initial filter state so that evaluation can begin at sample , the filter states are assigned initial values . The parameter specifies the normalised frequency to be analysed, given in cycles per sample. To avoid aliasing hazards, frequency is often restricted to the range 0 to 1/2, but the results are mathematically valid outside of this range.
The second stage filter can be observed to be a FIR filter, since its calculations do not use any of its past outputs.
Z transform methods can be applied to study the properties of the filter cascade. The Z transform of the first filter stage given in equation (1) is:
- (3)
The Z transform of the second filter stage given in equation (2) is:
- (4)
The combined transfer function of the cascade of the two filter stages is then
- (5)
This can be transformed back to an equivalent time domain sequence, and the terms unrolled back to the first input term at index .
- (6)
It can be observed that the poles of the filter's Z transform are located at and , on a circle of unit radius centered on the origin of the complex Z transform plane. This property indicates that the filter process is marginally stable and potentially vulnerable to numerical error accumulation when computed using low-precision arithmetic and long input sequences.
DFT computations
For the important case of computing a DFT term, the following special restrictions are applied.
- the filtering terminates at index where is the number of terms in the input sequence of the DFT
- the frequencies chosen for the Goertzel analysis are restricted to the special form
- (7)
- the index number indicating the "frequency bin" of the DFT is selected from the set of index numbers
- (8)
Making these substitutions into equation (6), and observing that the term , equation (6) then takes the following form.
- (9)
We can observe that the right side of equation (9) is extremely similar to the defining formula for DFT term , the DFT term for index number K, but not exactly the same. The summation shown in equation (9) requires N+1 input terms, but only N input terms are available when evaluating a DFT. A simple but inelegant expedient is to extend the input sequence with one more artificial value [7]. We can see from equation (9), the mathematical effect on the final result is the same as removing term from the summation, thus delivering the intended DFT value.
However, there is a more elegant approach that avoids the extra filter pass. From equation (1), we can note that when the extended input term is used in the final step,
- (10)
Thus, the algorithm can be completed as follows:
- terminate the IIR filter after processing input term
- apply equation (10) to construct from the prior outputs and
- apply equation (2) with the calculated value, and with produced by the final direct calculation of the filter.
The last two mathematical operations are simplified by combining them algebraically.
- (11)
Note that stopping the filter updates at term and immediately applying equation (2) rather than equation (11) misses the final filter state updates, yielding a result with incorrect phase [8].
The particular filtering structure chosen for the Goertzel Algorithm is the key to its efficient DFT calculations. We can observe that only one output value is used for calculating the DFT, so calculations for all the other output terms are omitted. Since the FIR filter is not calculated, the IIR stage calculations etc. can be discarded immediately after updating the first stage's internal state.
This seems to leave a paradox: to complete the algorithm, the FIR filter stage must be evaluated once using the final two outputs from the IIR filter stage, while for computational efficiency the IIR filter iteration discards its output values. This is where the properties of the direct-form filter structure are applied. The two internal state variables of the IIR filter provide the last two values of the IIR filter output, which are the terms required to evaluate the FIR filter stage.
Applications
Power spectrum terms
Examining equation (6), a final IIR filter pass to calculate term using a supplemental input value applies a complex multiplier of magnitude 1.0 to the previous term . Consequently, and represent equivalent signal power. It is equally valid to apply equation equation (11) and calculate the signal power from term , or to apply equation (2) and calculate the signal power from term . Both cases lead to the following expression for the signal power represented by DFT term X(K).
- (12)
In the implementation illustrated below, the variables and are used for the IIR filter state values that preserve output history.
Nterms defined here Kterm selected here omega = 2 * PI * Kterm / Nterms; wr = cos(omega); wi = sin(omega); coeff = 2 * wr; sprev = 0; sprev2 = 0; for each index k in range 0 to Nterms-1 s = x[k] + coeff * sprev - sprev2; sprev2 = sprev; sprev = s; end power = sprev2*sprev2 + sprev*sprev - coeff*sprev*sprev2 ;
This implementation presumes that the complete input data sequence is provided as an array. It is also possible to organise the computations so that incoming samples are delivered singly, in real-time, to a software object that maintains the filter state between updates, with the final power result accessed after the other processing is done.
Single DFT term with real-valued arithmetic
The case of real-valued input data arises frequently, especially in embedded systems where the input streams result from direct measurements of physical processes. Comparing to the implementation shown in the previous section, when the input data are real-valued, the filter internal state variables sprev and sprev2 can be observed also to be real-valued, consequently, no complex arithmetic is required in the first IIR stage. Optimizing for real-valued arithmetic typically is as simple as applying appropriate real-valued data types for the variables.
After the calculations using input term , and filter iterations are terminated, equation (11) must be applied to evaluate the DFT term. The final calculation uses complex-valued arithmetic, but this can be converted into real-valued arithmetic by separating real and imaginary terms.
- (13)
Comparing to the power spectrum application, the only difference is the calculation used to finish.
(Same IIR filter calculations as in the signal power implementation) XKreal = sprev * wr - sprev2; XKimag = sprev * wi;
Phase detection
This application requires the same evaluation of DFT term , as discussed in the previous section, using a real-valued or complex-valued input stream. Then the signal phase can be evaluated as:
- (14)
taking appropriate precautions for singularities, quadrant, and so forth when computing the inverse tangent function.
Complex signals in real arithmetic
Since complex signals decompose linearly into real and imaginary parts, the Goertzel algorithm in real-valued arithmetic can be applied separately to the sequence of real parts, yielding , and to the sequence of imaginary parts, yielding , both computed in real arithmetic. After that, the two complex-valued partial results can be recombined:
- (15)
Computational complexity
According to Computational complexity theory, computing a set of DFT terms using applications of the Goertzel algorithm on a data set with values with a "cost per operation" of has a complexity order
- To compute a single DFT bin for a complex input sequence of length , the Goertzel algorithm requires multiplications and additions/subtractions within the loop, as well as 4 multiplications and 4 final additions/subtractions, for a total of multiplications and additions/subtractions. This is repeated for each of the frequencies.
In contrast, using an FFT on a data set with values has a complexity order
- This is harder to apply directly because it depends on the FFT algorithm used, but a typical example is a radix-2 FFT, which requires multiplications and additions/subtractions per DFT bin, for each of the bins.
In the complexity order expressions, when the number of calculated terms is smaller than , the advantage of the Goertzel algorithm is clear. But because FFT code is comparatively very complex, the "cost per unit of work" factor is often much larger for an FFT, and the practical advantage favours the Goertzel algorithm more than the complexity expressions indicate.
As a rule-of-thumb for determining whether a radix-2 FFT or a Goertzel algorithm is more efficient, adjust the number of terms N in the data set upward to the nearest exact power of 2, calling this , and the Goertzel algorithm is likely to be faster if
FFT implementations and processing platforms have a significant impact on the relative performance. Some FFT implementations[9] perform internal complex-number calculations to generate coefficients on-the-fly, significantly increasing their "cost K per unit of work." FFT and DFT algorithms can use tables of pre-computed coefficient values for better numerical efficiency, but this requires more accesses to coefficient values buffered in external memory, which can lead to increased cache contention that counters some of the numerical advantage.
Both algorithms gain approximately a factor of 2 efficiency when using real-valued rather than complex-valued input data. However, these gains are natural for the Goertzel algorithm but will not be achieved for the FFT without using certain algorithm variants specialised for transforming real-valued data.
References
- ^ Goertzel, G. (January 1958), "An Algorithm for the Evaluation of Finite Trigonometric Series", American Mathematical Monthly, 65 (1): 34–35, doi:10.2307/2310304
- ^ Mock (1985)
- ^ Chen (1996)
- ^ Schmer (2000)
- ^ http://www.dattalo.com/technical/theory/sinewave.html
- ^ http://cs-www.cs.yale.edu/c2/images/uploads/AudioProc-TR.pdf
- ^ http://cnx.org/content/m12024/latest/
- ^ http://www.eetimes.com/design/signal-processing-dsp/4024443/The-Goertzel-Algorithm. Use with caution.
- ^ Press; Flannery (2007), "Chapter 12", Numerical Recipes, The Art of Scientific Computing, Cambridge University Press
{{citation}}
: Cite has empty unknown parameter:|1=
(help); Unknown parameter|Last3=
ignored (|last3=
suggested) (help); Unknown parameter|Last4=
ignored (|last4=
suggested) (help)
- Chen, Chiouguey J. (June 1996), Modified Goertzel Algorithm in DTMF Detection Using the TMS320C80 DSP (PDF), Application Report, vol. SPRA066, Texas Instruments
- Mock, P. (March 21, 1985), "Add DTMF Generation and Decoding to DSP-μP Designs" (PDF), EDN, ISSN 0012-7515; also found in DSP Applications with the TMS320 Family, Vol. 1, Texas Instruments, 1989.
- Schmer, Gunter (May 2000), DTMF Tone Generation and Detection: An Implementation Using the TMS320C54x (PDF), Application Report, vol. SPRA096a, Texas Instruments
- Proakis, J. G.; Manolakis, D. G. (1996), Digital Signal Processing: Principles, Algorithms, and Applications, Upper Saddle River, NJ: Prentice Hall, pp. 480–481
External links
- http://netwerkt.wordpress.com/2011/08/25/goertzel-filter/ C Code implementation of iterative Goertzel algorithm
- http://www.embedded.com/showArticle.jhtml?articleID=17301593