Reassignment method

{{more citations needed|date=March 2023}}
[[Image:Reassigned spectrogral surface of bass pluck.png|thumb|400px|Reassigned spectral surface for the onset of an acoustic bass tone having a sharp pluck and a fundamental frequency of approximately 73.4 Hz. Sharp spectral ridges representing the harmonics are evident, as is the abrupt onset of the tone. The spectrogram was computed using a 65.7 ms Kaiser window with a shaping parameter of 12.]]
The '''method of reassignment''' is a technique for sharpening a [[time-frequency representation]] (e.g. [[spectrogram]] or the [[short-time Fourier transform]]) by mapping the data to time-frequency coordinates that are nearer to the true [[Support (mathematics)|region of support]] of the analyzed signal. The method has been introduced independently by several parties under various names, including ''method of reassignment'', ''remapping'', ''time-frequency reassignment'', and ''modified moving-window method''.<ref name="hainsworth">{{Cite thesis |type=PhD |chapter=Chapter 3: Reassignment methods |title=Techniques for the Automated Analysis of Musical Audio |last=Hainsworth |first=Stephen |year=2003 |publisher=University of Cambridge |citeseerx=10.1.1.5.9579 }}</ref> The method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay. This mapping to reassigned time-frequency coordinates is very precise for signals that are separable in time and frequency with respect to the analysis window.
== Introduction ==
Many signals of interest have a distribution of energy that varies in time and frequency. For example, any sound signal having a beginning or an end has an energy distribution that varies in time, and most sounds exhibit considerable variation in both time and frequency over their duration. Time-frequency representations are commonly used to analyze or characterize such signals. They map the one-dimensional time-domain signal into a two-dimensional function of time and frequency. A time-frequency representation describes the variation of spectral energy distribution over time, much as a musical score describes the variation of musical pitch over time.

In audio signal analysis, the spectrogram is the most commonly used time-frequency representation, probably because it is well understood and immune to so-called "cross-terms" that sometimes make other time-frequency representations difficult to interpret. But the windowing operation required in spectrogram computation introduces an unwelcome tradeoff between time resolution and frequency resolution, so spectrograms provide a time-frequency representation that is blurred in time, in frequency, or in both dimensions. The method of time-frequency reassignment is a technique for refocusing time-frequency data in a blurred representation like the spectrogram by mapping the data to time-frequency coordinates that are nearer to the true region of support of the analyzed signal.<ref name="improving" />
== The spectrogram as a time-frequency representation ==
{{main|Spectrogram}}
One of the best-known time-frequency representations is the spectrogram, defined as the squared magnitude of the short-time Fourier transform. Though the short-time phase spectrum is known to contain important temporal information about the signal, this information is difficult to interpret, so typically, only the short-time magnitude spectrum is considered in short-time spectral analysis.<ref name="improving"/>

As a time-frequency representation, the spectrogram has relatively poor resolution. Time and frequency resolution are governed by the choice of analysis window, and greater concentration in one domain is accompanied by greater smearing in the other.<ref name="improving"/>
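
The trade-off between time and frequency resolution can be observed numerically. The following is a minimal sketch in plain Python and is not taken from the cited sources: the helper names (<code>hann</code>, <code>spectrogram_point</code>, <code>time_smear</code>, <code>frequency_smear</code>), the choice of a Hann window, and the test signals are all illustrative assumptions. It evaluates the spectrogram as the squared magnitude of a discrete short-time Fourier transform, then measures how far a single impulse spreads in time and how far a complex sinusoid spreads in frequency, using the half-power points as the measure of spread.

```python
import cmath
import math


def hann(half):
    """Symmetric Hann window sampled at offsets u = -half..half (assumed window)."""
    width = half + 1
    return [0.5 * (1.0 + math.cos(math.pi * u / width))
            for u in range(-half, half + 1)]


def spectrogram_point(x, window, n, omega):
    """Squared magnitude of X(n, omega) = sum_m x[m] h(n - m) exp(-j omega m)."""
    half = len(window) // 2
    acc = 0j
    for u in range(-half, half + 1):      # u = n - m is the window offset
        m = n - u
        if 0 <= m < len(x):
            acc += x[m] * window[u + half] * cmath.exp(-1j * omega * m)
    return abs(acc) ** 2


def time_smear(half, length=201, m0=100):
    """Number of frames in which an impulse at m0 exceeds half its peak energy."""
    x = [0.0] * length
    x[m0] = 1.0
    w = hann(half)
    peak = spectrogram_point(x, w, m0, 0.0)
    return sum(1 for n in range(length)
               if spectrogram_point(x, w, n, 0.0) > 0.5 * peak)


def frequency_smear(half, omega0=1.0, length=201, n=100, step=0.002):
    """Half-power bandwidth (rad/sample) of a complex sinusoid at omega0."""
    x = [cmath.exp(1j * omega0 * m) for m in range(length)]
    w = hann(half)
    peak = spectrogram_point(x, w, n, omega0)
    count = sum(1 for k in range(-250, 251)
                if spectrogram_point(x, w, n, omega0 + k * step) > 0.5 * peak)
    return count * step


for half in (7, 31):
    print(half, time_smear(half), frequency_smear(half))
```

Under these assumptions, lengthening the window (<code>half=31</code> instead of <code>half=7</code>) widens the time smear of the impulse and narrows the frequency smear of the sinusoid, which is the behaviour described above.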

A time-frequency representation having improved resolution, relative to the spectrogram, is the [[Wigner–Ville distribution]], which may be interpreted as a short-time Fourier transform with a window function that is perfectly matched to the signal. The Wigner–Ville distribution is highly concentrated in time and frequency, but it is also highly nonlinear and non-local. Consequently, this distribution is very sensitive to noise, and generates cross-components that often mask the components of interest, making it difficult to extract useful information concerning the distribution of energy in multi-component signals.<ref name="improving"/>

[[Cohen's class distribution function|Cohen's class]] of bilinear time-frequency representations is a class of "smoothed" Wigner–Ville distributions, employing a smoothing kernel that can reduce the sensitivity of the distribution to noise and suppress cross-components, at the expense of smearing the distribution in time and frequency. This smearing causes the distribution to be non-zero in regions where the true Wigner–Ville distribution shows no energy.<ref name="improving"/>

The spectrogram is a member of Cohen's class. It is a smoothed Wigner–Ville distribution with the smoothing kernel equal to the Wigner–Ville distribution of the analysis window. The method of reassignment smooths the Wigner–Ville distribution, but then refocuses the distribution back to the true regions of support of the signal components. The method has been shown to reduce time and frequency smearing of any member of Cohen's class.<ref name="improving">{{cite journal |author1=F. Auger |author2=P. Flandrin |name-list-style=amp |date=May 1995 |title=Improving the readability of time-frequency and time-scale representations by the reassignment method |journal=IEEE Transactions on Signal Processing |volume=43 |issue=5 |pages=1068–1089 |doi=10.1109/78.382394 |bibcode=1995ITSP...43.1068A |citeseerx=10.1.1.646.794 |s2cid=6336685 }}</ref><ref>P. Flandrin, F. Auger, and E. Chassande-Mottin,
''Time-frequency reassignment: From principles to algorithms'',
in Applications in Time-Frequency Signal Processing
== The method of reassignment ==
Pioneering work on the method of reassignment was published by Kodera, Gendrin, and de Villedary under the name of ''Modified Moving Window Method''.<ref name=Kodera>{{cite journal |author1=K. Kodera |author2=R. Gendrin |author3=C. de Villedary |name-list-style=amp |date=Feb 1978 |title=Analysis of time-varying signals with small BT values |journal=IEEE Transactions on Acoustics, Speech, and Signal Processing |volume=26 |issue=1 |pages=64–76 |doi=10.1109/TASSP.1978.1163047 }}</ref> Their technique enhances the resolution in time and frequency of the classical Moving Window Method (equivalent to the spectrogram) by assigning to each data point a new time-frequency coordinate that better reflects the distribution of energy in the analyzed signal.<ref name=Kodera/>{{rp|67}}

In the classical moving window method, a time-domain signal, <math>x(t)</math>, is decomposed into a set of coefficients, <math>\epsilon( t, \omega )</math>, based on a set of elementary signals, <math>h_{\omega}(t)</math>, defined by<ref name=Kodera/>{{rp|73}}<!-- far from the same notation as Kodera p73, but the same thing. -->
:<math>h_{\omega}(t) = h(t) e^{j \omega t} </math>
where <math>h(t)</math> is a (real-valued) lowpass kernel function, like the window function in the short-time Fourier transform. The coefficients in this decomposition are defined by
:<math>\begin{align}
\epsilon( t, \omega ) &= \int x(\tau) h( t - \tau ) e^{ -j \omega \left[ \tau - t \right]} d\tau \\
&= e^{ j \omega t} \int x(\tau) h( t - \tau ) e^{ -j \omega \tau } d\tau \\
&= e^{ j \omega t} X(t, \omega) \\
&= X_{t}(\omega) \\
&= M_{t}(\omega) e^{j \phi_{t}(\omega)}
\end{align}</math>
where <math>M_{t}(\omega)</math> is the magnitude, and <math>\phi_{t}(\omega)</math> the phase, of <math>X_{t}(\omega)</math>, the Fourier transform of the signal <math>x(t)</math> shifted in time by <math>t</math> and windowed by <math>h(t)</math>.<ref name=Fitz09>{{cite arXiv |last1=Fitz |first1=Kelly R. |last2=Fulop |first2=Sean A. |title=A Unified Theory of Time-Frequency Reassignment |date=2009 |class=cs.SD |eprint=0903.3080 }} – this preprint manuscript is written by a previous contributor to this Wikipedia article; see [[Special:Diff/239438445|their contribution]].</ref>{{rp|4}}

The signal <math>x(t)</math> can be reconstructed from the moving window coefficients by<ref name=Fitz09/>{{rp|8}}
:<math>\begin{align}
x(t) & = \iint X_{\tau}(\omega) h^{*}_{\omega}(\tau - t) d\omega d\tau \\
& = \iint X_{\tau}(\omega) h( \tau - t ) e^{ -j \omega \left[ \tau - t \right]} d\omega d\tau \\
&= \iint M_{\tau}(\omega) e^{j \phi_{\tau}(\omega)} h( \tau - t ) e^{ -j \omega \left[ \tau - t \right]} d\omega d\tau \\
&= \iint M_{\tau}(\omega) h( \tau - t ) e^{ j \left[ \phi_{\tau}(\omega) - \omega \tau+ \omega t \right] } d\omega d\tau
\end{align}</math>
For signals having magnitude spectra, <math>M(t,\omega)</math>, whose time variation is slow relative to the phase variation, the maximum contribution to the reconstruction integral comes from the vicinity of the point <math>t,\omega</math> satisfying the phase stationarity condition<ref name=Kodera/>{{rp|74}}
:<math>\begin{align}
\frac{\partial}{\partial \omega} \left[ \phi_{\tau}(\omega) - \omega \tau + \omega t\right] & = 0 \\
\frac{\partial}{\partial \tau} \left[ \phi_{\tau}(\omega) - \omega \tau + \omega t \right] & = 0
\end{align}</math>
or equivalently, around the point <math>\hat{t}, \hat{\omega}</math> defined by<ref name=Kodera/>{{rp|74}}
:<math>\begin{align}
\hat{t}(\tau, \omega) & = \tau - \frac{\partial \phi_{\tau}(\omega)}{\partial \omega} = -\frac{\partial \phi(\tau, \omega)}{\partial \omega} \\
\hat{\omega}(\tau, \omega) & = \frac{\partial \phi_{\tau}(\omega)}{\partial \tau} = \omega + \frac{\partial \phi(\tau, \omega)}{\partial \tau}
\end{align}</math>
Here <math>\phi(\tau, \omega) = \phi_{\tau}(\omega) - \omega \tau</math> is the phase of the short-time Fourier transform <math>X(\tau, \omega)</math>, which follows from <math>X_{\tau}(\omega) = e^{j \omega \tau} X(\tau, \omega)</math>.
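
As a check on these definitions, the reassigned coordinates can be worked out by hand for two idealized signals. The derivation below is an illustration rather than a result quoted from the cited sources, and it assumes a real, even window <math>h(t)</math> with Fourier transform <math>H(\omega)</math> (an assumption introduced here). For a complex sinusoid <math>x(t) = e^{j \omega_0 t}</math>, substituting <math>u = \tau - s</math> in the short-time Fourier transform gives
:<math>\begin{align}
X(\tau, \omega) &= \int e^{j \omega_0 s} h(\tau - s) e^{-j \omega s} ds = e^{-j (\omega - \omega_0) \tau} H(\omega - \omega_0) \\
\phi(\tau, \omega) &= -(\omega - \omega_0) \tau + \text{const} \\
\hat{\omega}(\tau, \omega) &= \omega + \frac{\partial \phi(\tau, \omega)}{\partial \tau} = \omega_0, \qquad \hat{t}(\tau, \omega) = -\frac{\partial \phi(\tau, \omega)}{\partial \omega} = \tau
\end{align}</math>
wherever <math>H(\omega - \omega_0) \neq 0</math>, so every point is moved to the true frequency and keeps its time. For an impulse <math>x(t) = \delta(t - t_0)</math>,
:<math>\begin{align}
X(\tau, \omega) &= h(\tau - t_0) e^{-j \omega t_0} \\
\phi(\tau, \omega) &= -\omega t_0 + \text{const} \\
\hat{t}(\tau, \omega) &= -\frac{\partial \phi(\tau, \omega)}{\partial \omega} = t_0, \qquad \hat{\omega}(\tau, \omega) = \omega + \frac{\partial \phi(\tau, \omega)}{\partial \tau} = \omega
\end{align}</math>
wherever <math>h(\tau - t_0) \neq 0</math>, so every point is moved to the true time of the impulse and keeps its frequency.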

This phenomenon is known in such fields as optics as the [[stationary phase approximation|principle of stationary phase]], which states that for periodic or quasi-periodic signals, the variation of the Fourier phase spectrum not attributable to periodic oscillation is slow with respect to time in the vicinity of the frequency of oscillation, and in surrounding regions the variation is relatively rapid. Analogously, for impulsive signals, which are concentrated in time, the variation of the phase spectrum is slow with respect to frequency near the time of the impulse, and in surrounding regions the variation is relatively rapid.<ref name=Kodera/>{{rp|73}}

In reconstruction, positive and negative contributions to the synthesized waveform cancel, due to destructive interference, in frequency regions of rapid phase variation. Only regions of slow phase variation (stationary phase) will contribute significantly to the reconstruction, and the maximum contribution (center of gravity) occurs at the point where the phase is changing most slowly with respect to time and frequency.<ref name=Kodera/>{{rp|71}}

The time-frequency coordinates thus computed are equal to the local group delay, <math>\hat{t}_{g}(t,\omega)</math>, and local instantaneous frequency, <math>\hat{\omega}_{i}(t,\omega)</math>, and are computed from the phase of the short-time Fourier transform, which is normally ignored when constructing the spectrogram. These quantities are ''local'' in the sense that they represent a windowed and filtered signal that is localized in time and frequency, and are not global properties of the signal under analysis.<ref name=Kodera/>{{rp|70}}

The modified moving window method, or method of reassignment, changes (reassigns) the point of attribution of <math>\epsilon(t,\omega)</math> to this point of maximum contribution <math>\hat{t}(t,\omega), \hat{\omega}(t,\omega)</math>, rather than to the point <math>t,\omega</math> at which it is computed. This point is sometimes called the ''center of gravity'' of the distribution, by way of analogy to a mass distribution. This analogy is a useful reminder that the attribution of spectral energy to the center of gravity of its distribution only makes sense when there is energy to attribute, so the method of reassignment has no meaning at points where the spectrogram is zero-valued.<ref name="improving" />
== Efficient computation of reassigned times and frequencies ==
In digital signal processing, it is most common to sample the time and frequency domains. The discrete Fourier transform is used to compute samples <math>X(k)</math> of the Fourier transform from samples <math>x(n)</math> of a time domain signal. The reassignment operations proposed by Kodera ''et al.'' cannot be applied directly to the discrete short-time Fourier transform data, because partial derivatives cannot be computed directly on data that is discrete in time and frequency, and it has been suggested that this difficulty has been the primary barrier to wider use of the method of reassignment.

It is possible to approximate the partial derivatives using finite differences. For example, the phase spectrum can be evaluated at two nearby times, and the partial derivative with respect to time can be approximated as the difference between the two values divided by the time difference, as in
:<math>\begin{align}
\frac{\partial \phi(t, \omega)}{\partial t} & \approx \frac{1}{\Delta t} \left[ \phi \left (t + \frac{\Delta t}{2}, \omega \right ) - \phi \left (t - \frac{\Delta t}{2}, \omega \right ) \right] \\
\frac{\partial \phi(t, \omega)}{\partial \omega} & \approx \frac{1}{\Delta \omega}\left[ \phi \left (t, \omega+ \frac{\Delta \omega}{2} \right ) - \phi \left (t, \omega-\frac{\Delta \omega}{2} \right ) \right]
\end{align}</math>
For sufficiently small values of <math>\Delta t</math> and <math>\Delta \omega</math>, and provided that the phase difference is appropriately "unwrapped", this finite-difference method yields good approximations to the partial derivatives of phase, because in regions of the spectrum in which the evolution of the phase is dominated by rotation due to sinusoidal oscillation of a single, nearby component, the phase is a linear function.

Independently of Kodera ''et al.'', Nelson arrived at a similar method for improving the time-frequency precision of short-time spectral data from partial derivatives of the short-time phase spectrum.<ref name="crossspectral">{{cite journal |author=D. J. Nelson |date=Nov 2001 |title=Cross-spectral methods for processing speech |journal=Journal of the Acoustical Society of America |volume=110 |issue=5 |pages=2575–2592 |doi=10.1121/1.1402616 |pmid=11757947 |bibcode=2001ASAJ..110.2575N }}</ref> It can be shown that Nelson's ''cross spectral surfaces'' compute an approximation of the derivatives that is equivalent to the finite differences method.
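
The finite-difference estimate of the time derivative of phase can be sketched in a few lines. The code below is an illustration under stated assumptions, not an implementation from Kodera ''et al.'' or Nelson: the function names, the Hann window, and the unit sample spacing are all chosen here for the example. It uses a forward difference between two adjacent frames (<math>\Delta t</math> equal to one sample) rather than the centred difference written above, and it takes the phase of the product of one short-time transform with the conjugate of the other, which returns the phase difference already wrapped into <math>(-\pi, \pi]</math> and so avoids explicit unwrapping for small differences.

```python
import cmath
import math


def hann(half):
    """Symmetric Hann window sampled at offsets u = -half..half (assumed window)."""
    width = half + 1
    return [0.5 * (1.0 + math.cos(math.pi * u / width))
            for u in range(-half, half + 1)]


def stft_point(x, window, n, omega):
    """X(n, omega) = sum_m x[m] h(n - m) exp(-j omega m), omega in rad/sample."""
    half = len(window) // 2
    acc = 0j
    for u in range(-half, half + 1):      # u = n - m is the window offset
        m = n - u
        if 0 <= m < len(x):
            acc += x[m] * window[u + half] * cmath.exp(-1j * omega * m)
    return acc


def finite_difference_frequency(x, window, n, omega):
    """Estimate omega + d(phi)/dt from frames n and n + 1 (one-sample spacing)."""
    a = stft_point(x, window, n, omega)
    b = stft_point(x, window, n + 1, omega)
    # Phase of b * conj(a) is the wrapped phase advance over one sample.
    dphi_dt = cmath.phase(b * a.conjugate())
    return omega + dphi_dt


# Hypothetical test signal: complex sinusoid at 0.5 rad/sample.
signal = [cmath.exp(1j * 0.5 * m) for m in range(256)]
estimate = finite_difference_frequency(signal, hann(31), 128, 0.45)
print(estimate)
```

For this single-component test signal the phase is linear in time, so the estimate evaluated at the nearby analysis frequency 0.45 rad/sample recovers the true frequency of 0.5 rad/sample up to rounding error.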

Auger and Flandrin showed that the method of reassignment, proposed in the context of the spectrogram by Kodera ''et al.'', could be extended to any member of [[Cohen's class]] of time-frequency representations by generalizing the reassignment operations to
:<math>\begin{align}
\hat{t} (t,\omega) &= t - \frac{\iint \tau \cdot W_{x}(t-\tau,\omega -\nu) \cdot \Phi(\tau,\nu) d\tau d\nu} {\iint W_{x} \left (t-\tau,\omega -\nu \right ) \cdot \Phi (\tau,\nu) d\tau d\nu } \\
\hat{\omega} (t,\omega) & = \omega - \frac{\iint \nu \cdot W_{x}(t-\tau,\omega -\nu) \cdot \Phi(\tau,\nu) d\tau d\nu} {\iint W_{x}(t-\tau,\omega -\nu) \cdot \Phi(\tau,\nu) d\tau d\nu}
\end{align}</math>
where <math>W_{x}(t,\omega)</math> is the Wigner–Ville distribution of <math>x(t)</math>, and <math>\Phi(t,\omega)</math> is the kernel function that defines the distribution. They further described a method for computing the times and frequencies for the reassigned spectrogram efficiently and accurately without explicitly computing the partial derivatives of phase.<ref name="improving" />
In the case of the spectrogram, the reassignment operations can be computed by

:<math>\begin{align}
\hat{t} (t,\omega) & = t - \Re \left \{ \frac{ X_{\mathcal{T}h}(t,\omega) \cdot X^*(t,\omega) }{ | X(t,\omega) |^2 } \right \} \\
\hat{\omega}(t,\omega) & = \omega + \Im \left \{ \frac{ X_{\mathcal{D}h}(t,\omega) \cdot X^*(t,\omega) }{ | X(t,\omega) |^2 } \right \}
\end{align}</math>

where <math>X(t,\omega)</math> is the short-time Fourier transform computed using an analysis window <math>h(t)</math>, <math>X_{\mathcal{T}h}(t,\omega)</math> is the short-time Fourier transform computed using a time-weighted analysis window <math>h_{\mathcal{T}}(t) = t \cdot h(t)</math>, and <math>X_{\mathcal{D}h}(t,\omega)</math> is the short-time Fourier transform computed using a time-derivative analysis window <math>h_{\mathcal{D}}(t) = \tfrac{d}{dt}h(t)</math>.

Using the auxiliary window functions <math>h_{\mathcal{T}}(t)</math> and <math>h_{\mathcal{D}}(t)</math>, the reassignment operations can be computed at any time-frequency coordinate <math>t,\omega</math> from an algebraic combination of three Fourier transforms evaluated at <math>t,\omega</math>. Since these algorithms operate only on short-time spectral data evaluated at a single time and frequency, and do not explicitly compute any derivatives, this gives an efficient method of computing the reassigned discrete short-time Fourier transform.

One constraint in this method of computation is that <math>| X(t,\omega) |^2</math> must be non-zero. This is not much of a restriction, since the reassignment operation itself implies that there is some energy to reassign, and has no meaning when the distribution is zero-valued.

==Separability== |
The short-time Fourier transform can often be used to estimate the amplitudes and phases of the individual components in a ''multi-component'' signal, such as a quasi-harmonic musical instrument tone. Moreover, the time and frequency reassignment operations can be used to sharpen the representation by attributing the spectral energy reported by the short-time Fourier transform to the point that is the local center of gravity of the complex energy distribution.<ref>K. Fitz, L. Haken, On the use of time-frequency reassignment in additive sound modeling, Journal of the Audio Engineering Society 50 (11) (2002) 879–893.</ref>

For a signal consisting of a single component, the instantaneous frequency can be estimated from the partial derivatives of phase of any short-time Fourier transform channel that passes the component. If the signal is to be decomposed into many components,

:<math>x(t) = \sum_{n} A_{n}(t) e^{j \theta_{n}(t)}</math>

and the instantaneous frequency of each component is defined as the derivative of its phase with respect to time, that is,

:<math>\omega_{n}(t) = \frac{d \theta_{n}(t)}{d t},</math>

then the instantaneous frequency of each individual component can be computed from the phase of the response of a filter that passes that component, provided that no more than one component lies in the passband of the filter.

This is the property, in the frequency domain, that Nelson called ''separability'',<ref name="crossspectral" /> and it is required of all signals so analyzed. If this property is not met, then the desired multi-component decomposition cannot be achieved, because the parameters of individual components cannot be estimated from the short-time Fourier transform. In such cases, a different analysis window must be chosen so that the separability criterion is satisfied.

If the components of a signal are separable in frequency with respect to a particular short-time spectral analysis window, then the output of each short-time Fourier transform filter is a filtered version of, at most, a single dominant (having significant energy) component, and so the derivative with respect to time of the phase of <math>X(t,\omega_0)</math> is equal to the derivative with respect to time of the phase of the dominant component at <math>\omega_0.</math> Therefore, if a component <math>x_n(t)</math> having instantaneous frequency <math>\omega_{n}(t)</math> is the dominant component in the vicinity of <math>\omega_0,</math> then the instantaneous frequency of that component can be computed from the phase of the short-time Fourier transform evaluated at <math>\omega_0.</math> That is,

:<math>\begin{align}
\omega_{n}(t) &= \frac{\partial}{\partial t} \arg\{ x_{n}(t) \} \\
&= \frac{\partial }{\partial t} \arg\{ X(t,\omega_{0}) \}
\end{align}</math>

Just as each bandpass filter in the short-time Fourier transform filterbank may pass at most a single complex exponential component, two temporal events must be sufficiently separated in time that they do not lie in the same windowed segment of the input signal. This is the property of separability in the time domain, and is equivalent to requiring that the time between two events be greater than the length of the impulse response of the short-time Fourier transform filters, the span of non-zero samples in <math>h(t).</math>

<gallery mode=packed heights=300px>
Long-window reassigned spectrogram of speech.png|Long-window reassigned spectrogram of the word "open", computed using a 54.4 ms Kaiser window with a shaping parameter of 9, emphasizing harmonics.
Short-window reassigned spectrogram of speech.png|Short-window reassigned spectrogram of the word "open", computed using a 13.6 ms Kaiser window with a shaping parameter of 9, emphasizing formants and glottal pulses.
</gallery>

In general, there is an infinite number of equally valid decompositions for a multi-component signal. The separability property must be considered in the context of the desired decomposition. For example, in the analysis of a speech signal, an analysis window that is long relative to the time between glottal pulses is sufficient to separate harmonics, but the individual glottal pulses will be smeared, because many pulses are covered by each window (that is, the individual pulses are not separable, in time, by the chosen analysis window). An analysis window that is much shorter than the time between glottal pulses may resolve the glottal pulses, because no window spans more than one pulse, but the harmonic frequencies are smeared together, because the main lobe of the analysis window spectrum is wider than the spacing between the harmonics (that is, the harmonics are not separable, in frequency, by the chosen analysis window).<ref name="crossspectral"/>{{rp|2585}}

== Extensions ==

=== Consensus complex reassignment ===

Gardner and Magnasco (2006) argue that the [[auditory nerve]]s may use a form of the reassignment method to process sounds. These nerves are known for preserving timing (phase) information better than magnitude information. The authors devise a variation of reassignment with complex values (i.e. both phase and magnitude) and show that it produces sparse outputs like auditory nerves do. By running this reassignment with windows of different bandwidths (see discussion in the section above), a "consensus" that captures multiple kinds of signals is found, again like the auditory system. They argue that the algorithm is simple enough for neurons to implement.<ref name=Gar06>{{cite journal |last1=Gardner |first1=Timothy J. |last2=Magnasco |first2=Marcelo O. |title=Sparse time-frequency representations |journal=Proceedings of the National Academy of Sciences |date=18 April 2006 |volume=103 |issue=16 |pages=6094–6099 |doi=10.1073/pnas.0601707103|doi-access=free |pmid=16601097 |pmc=1431718 |bibcode=2006PNAS..103.6094G }}</ref>

=== Synchrosqueezing transform ===

{{empty section|date=January 2024}}

<ref name=Meignen19>{{cite journal |last1=Meignen |first1=Sylvain |last2=Oberlin |first2=Thomas |last3=Pham |first3=Duong-Hung |title=Synchrosqueezing transforms: From low- to high-frequency modulations and perspectives |journal=Comptes Rendus Physique |date=July 2019 |volume=20 |issue=5 |pages=449–460 |doi=10.1016/j.crhy.2019.07.001|bibcode=2019CRPhy..20..449M }}</ref>

== References ==

{{Reflist}}

== Further reading == |
Line 456: | Line 176: | ||
* [http://www.klingbeil.com/spear/ SPEAR - Sinusoidal Partial Editing Analysis and Resynthesis]
* [http://www.cerlsoundgroup.org/Loris/ Loris - Open-source software for sound modeling and morphing]
* [http://musicalgorithms.ewu.edu/algorithms/roughness.html SRA - A web-based research tool for spectral and roughness analysis of sound signals] {{Webarchive|url=https://web.archive.org/web/20191118182132/http://musicalgorithms.ewu.edu/algorithms/Roughness.html |date=2019-11-18 }} (supported by a Northwest Academic Computing Consortium grant to J. Middleton, Eastern Washington University)

{{Clear}}
{{Compression methods}}

[[Category:Time–frequency analysis]]
[[Category:Transforms]]
[[Category:Data compression]]
Latest revision as of 00:53, 6 December 2024
This article needs additional citations for verification. (March 2023) |
The method of reassignment is a technique for sharpening a time-frequency representation (e.g. spectrogram or the short-time Fourier transform) by mapping the data to time-frequency coordinates that are nearer to the true region of support of the analyzed signal. The method has been independently introduced by several parties under various names, including method of reassignment, remapping, time-frequency reassignment, and modified moving-window method.[1] The method of reassignment sharpens blurry time-frequency data by relocating the data according to local estimates of instantaneous frequency and group delay. This mapping to reassigned time-frequency coordinates is very precise for signals that are separable in time and frequency with respect to the analysis window.
Introduction
Many signals of interest have a distribution of energy that varies in time and frequency. For example, any sound signal having a beginning or an end has an energy distribution that varies in time, and most sounds exhibit considerable variation in both time and frequency over their duration. Time-frequency representations are commonly used to analyze or characterize such signals. They map the one-dimensional time-domain signal into a two-dimensional function of time and frequency. A time-frequency representation describes the variation of spectral energy distribution over time, much as a musical score describes the variation of musical pitch over time.
In audio signal analysis, the spectrogram is the most commonly used time-frequency representation, probably because it is well understood, and immune to so-called "cross-terms" that sometimes make other time-frequency representations difficult to interpret. But the windowing operation required in spectrogram computation introduces an unsavory tradeoff between time resolution and frequency resolution, so spectrograms provide a time-frequency representation that is blurred in time, in frequency, or in both dimensions. The method of time-frequency reassignment is a technique for refocussing time-frequency data in a blurred representation like the spectrogram by mapping the data to time-frequency coordinates that are nearer to the true region of support of the analyzed signal.[2]
The spectrogram as a time-frequency representation
One of the best-known time-frequency representations is the spectrogram, defined as the squared magnitude of the short-time Fourier transform. Though the short-time phase spectrum is known to contain important temporal information about the signal, this information is difficult to interpret, so typically, only the short-time magnitude spectrum is considered in short-time spectral analysis.[2]
As a time-frequency representation, the spectrogram has relatively poor resolution. Time and frequency resolution are governed by the choice of analysis window and greater concentration in one domain is accompanied by greater smearing in the other.[2]
A time-frequency representation having improved resolution, relative to the spectrogram, is the Wigner–Ville distribution, which may be interpreted as a short-time Fourier transform with a window function that is perfectly matched to the signal. The Wigner–Ville distribution is highly concentrated in time and frequency, but it is also highly nonlinear and non-local. Consequently, this distribution is very sensitive to noise, and generates cross-components that often mask the components of interest, making it difficult to extract useful information concerning the distribution of energy in multi-component signals.[2]
Cohen's class of bilinear time-frequency representations is a class of "smoothed" Wigner–Ville distributions, employing a smoothing kernel that can reduce sensitivity of the distribution to noise and suppresses cross-components, at the expense of smearing the distribution in time and frequency. This smearing causes the distribution to be non-zero in regions where the true Wigner–Ville distribution shows no energy.[2]
The spectrogram is a member of Cohen's class. It is a smoothed Wigner–Ville distribution with the smoothing kernel equal to the Wigner–Ville distribution of the analysis window. The method of reassignment smooths the Wigner–Ville distribution, but then refocuses the distribution back to the true regions of support of the signal components. The method has been shown to reduce time and frequency smearing of any member of Cohen's class.[2][3] In the case of the reassigned spectrogram, the short-time phase spectrum is used to correct the nominal time and frequency coordinates of the spectral data, and map it back nearer to the true regions of support of the analyzed signal.
The method of reassignment
Pioneering work on the method of reassignment was published by Kodera, Gendrin, and de Villedary under the name of Modified Moving Window Method.[4] Their technique enhances the resolution in time and frequency of the classical Moving Window Method (equivalent to the spectrogram) by assigning to each data point a new time-frequency coordinate that better reflects the distribution of energy in the analyzed signal.[4]: 67
In the classical moving window method, a time-domain signal, is decomposed into a set of coefficients, , based on a set of elementary signals, , defined[4]: 73
where is a (real-valued) lowpass kernel function, like the window function in the short-time Fourier transform. The coefficients in this decomposition are defined
where is the magnitude, and the phase, of , the Fourier transform of the signal shifted in time by and windowed by .[5]: 4
can be reconstructed from the moving window coefficients by[5]: 8
For signals having magnitude spectra, , whose time variation is slow relative to the phase variation, the maximum contribution to the reconstruction integral comes from the vicinity of the point satisfying the phase stationarity condition[4]: 74
or equivalently, around the point defined by[4]: 74
This phenomenon is known in such fields as optics as the principle of stationary phase, which states that for periodic or quasi-periodic signals, the variation of the Fourier phase spectrum not attributable to periodic oscillation is slow with respect to time in the vicinity of the frequency of oscillation, and in surrounding regions the variation is relatively rapid. Analogously, for impulsive signals, that are concentrated in time, the variation of the phase spectrum is slow with respect to frequency near the time of the impulse, and in surrounding regions the variation is relatively rapid.[4]: 73
In reconstruction, positive and negative contributions to the synthesized waveform cancel, due to destructive interference, in frequency regions of rapid phase variation. Only regions of slow phase variation (stationary phase) will contribute significantly to the reconstruction, and the maximum contribution (center of gravity) occurs at the point where the phase is changing most slowly with respect to time and frequency.[4]: 71
The time-frequency coordinates thus computed are equal to the local group delay, and local instantaneous frequency, and are computed from the phase of the short-time Fourier transform, which is normally ignored when constructing the spectrogram. These quantities are local in the sense that they represent a windowed and filtered signal that is localized in time and frequency, and are not global properties of the signal under analysis.[4]: 70
The modified moving window method, or method of reassignment, changes (reassigns) the point of attribution of to this point of maximum contribution , rather than to the point at which it is computed. This point is sometimes called the center of gravity of the distribution, by way of analogy to a mass distribution. This analogy is a useful reminder that the attribution of spectral energy to the center of gravity of its distribution only makes sense when there is energy to attribute, so the method of reassignment has no meaning at points where the spectrogram is zero-valued.[2]
Efficient computation of reassigned times and frequencies
In digital signal processing, it is most common to sample the time and frequency domains. The discrete Fourier transform is used to compute samples of the Fourier transform from samples of a time domain signal. The reassignment operations proposed by Kodera et al. cannot be applied directly to the discrete short-time Fourier transform data, because partial derivatives cannot be computed directly on data that is discrete in time and frequency, and it has been suggested that this difficulty has been the primary barrier to wider use of the method of reassignment.
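It is possible to approximate the partial derivatives using finite differences. For example, the phase spectrum can be evaluated at two nearby times, and the partial derivative with respect to time be approximated as the difference between the two values divided by the time difference, as in
<math>\frac{\partial \phi(t,\omega)}{\partial t} \approx \frac{1}{\Delta t} \left[ \phi\left(t + \tfrac{\Delta t}{2}, \omega\right) - \phi\left(t - \tfrac{\Delta t}{2}, \omega\right) \right]</math>
<math>\frac{\partial \phi(t,\omega)}{\partial \omega} \approx \frac{1}{\Delta \omega} \left[ \phi\left(t, \omega + \tfrac{\Delta \omega}{2}\right) - \phi\left(t, \omega - \tfrac{\Delta \omega}{2}\right) \right]</math>
where <math>\phi(t,\omega)</math> denotes the short-time phase spectrum. For sufficiently small values of <math>\Delta t</math> and <math>\Delta \omega,</math> and provided that the phase difference is appropriately "unwrapped", this finite-difference method yields good approximations to the partial derivatives of phase, because in regions of the spectrum in which the evolution of the phase is dominated by rotation due to sinusoidal oscillation of a single, nearby component, the phase is a linear function.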
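The finite-difference idea can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy; the function name, the Hann window, the FFT size, and the one-sample hop are illustrative choices and are not taken from the cited papers. The expected phase advance of each bin center is subtracted before wrapping, which is one common way to carry out the "unwrapping" step.

```python
import numpy as np

def finite_difference_frequency(x, start, fs, n_fft=1024, hop=1):
    """Estimate per-channel frequency (Hz) from the phase difference of two frames.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    w = np.hanning(n_fft)
    # Two short-time spectra taken `hop` samples apart (two nearby times).
    X0 = np.fft.rfft(x[start:start + n_fft] * w)
    X1 = np.fft.rfft(x[start + hop:start + hop + n_fft] * w)

    bin_omega = 2 * np.pi * np.arange(len(X0)) / n_fft   # bin centers, rad/sample
    dphi = np.angle(X1) - np.angle(X0)                   # raw phase difference

    # "Unwrap": remove the advance expected at the bin center, then wrap the
    # remaining deviation into [-pi, pi).
    dev = dphi - bin_omega * hop
    dev = (dev + np.pi) % (2 * np.pi) - np.pi

    omega = bin_omega + dev / hop                        # rad/sample
    return omega * fs / (2 * np.pi), np.abs(X0) ** 2

# Usage: a 440.7 Hz cosine sampled at 8 kHz (made-up test signal).
fs = 8000.0
n = np.arange(4096)
x = np.cos(2 * np.pi * 440.7 * n / fs)
freqs, power = finite_difference_frequency(x, 1000, fs)
peak = int(np.argmax(power))
print(f"bin center {peak * fs / 1024:.2f} Hz, estimated {freqs[peak]:.2f} Hz")
```

The estimate is only meaningful in channels dominated by a single component, which is the separability condition discussed later in the article.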
Independently of Kodera et al., Nelson arrived at a similar method for improving the time-frequency precision of short-time spectral data from partial derivatives of the short-time phase spectrum.[6] It is easily shown that Nelson's cross spectral surfaces compute an approximation of the derivatives that is equivalent to the finite differences method.
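Auger and Flandrin showed that the method of reassignment, proposed in the context of the spectrogram by Kodera et al., could be extended to any member of Cohen's class of time-frequency representations by generalizing the reassignment operations to
<math>\hat{t} (t,\omega) = t - \frac{\iint \tau \cdot W_{x}(t-\tau,\omega -\nu) \cdot \Phi(\tau,\nu) d\tau d\nu} {\iint W_{x}(t-\tau,\omega -\nu) \cdot \Phi (\tau,\nu) d\tau d\nu }</math>
<math>\hat{\omega} (t,\omega) = \omega - \frac{\iint \nu \cdot W_{x}(t-\tau,\omega -\nu) \cdot \Phi(\tau,\nu) d\tau d\nu} {\iint W_{x}(t-\tau,\omega -\nu) \cdot \Phi(\tau,\nu) d\tau d\nu}</math>
where <math>W_{x}(t,\omega)</math> is the Wigner–Ville distribution of <math>x(t)</math>, and <math>\Phi(t,\omega)</math> is the kernel function that defines the distribution. They further described an efficient method for computing the times and frequencies for the reassigned spectrogram efficiently and accurately without explicitly computing the partial derivatives of phase.[2]
In the case of the spectrogram, the reassignment operations can be computed by
<math>\hat{t} (t,\omega) = t - \Re \left \{ \frac{ X_{\mathcal{T}h}(t,\omega) \cdot X^*(t,\omega) }{ | X(t,\omega) |^2 } \right \}</math>
<math>\hat{\omega}(t,\omega) = \omega + \Im \left \{ \frac{ X_{\mathcal{D}h}(t,\omega) \cdot X^*(t,\omega) }{ | X(t,\omega) |^2 } \right \}</math>
where <math>X(t,\omega)</math> is the short-time Fourier transform computed using an analysis window <math>h(t)</math>, <math>X_{\mathcal{T}h}(t,\omega)</math> is the short-time Fourier transform computed using a time-weighted analysis window <math>h_{\mathcal{T}}(t) = t \cdot h(t)</math>, and <math>X_{\mathcal{D}h}(t,\omega)</math> is the short-time Fourier transform computed using a time-derivative analysis window <math>h_{\mathcal{D}}(t) = \tfrac{d}{dt}h(t)</math>.
Using the auxiliary window functions <math>h_{\mathcal{T}}(t)</math> and <math>h_{\mathcal{D}}(t)</math>, the reassignment operations can be computed at any time-frequency coordinate <math>t,\omega</math> from an algebraic combination of three Fourier transforms evaluated at <math>t,\omega</math>. Since these algorithms operate only on short-time spectral data evaluated at a single time and frequency, and do not explicitly compute any derivatives, this gives an efficient method of computing the reassigned discrete short-time Fourier transform.
One constraint in this method of computation is that <math>| X(t,\omega) |^2</math> must be non-zero. This is not much of a restriction, since the reassignment operation itself implies that there is some energy to reassign, and has no meaning when the distribution is zero-valued.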
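The three-transform computation can be sketched for a single frame as follows. This is a hedged illustration, assuming NumPy; the function name, the Gaussian window (chosen because its derivative is known in closed form), and the energy threshold are assumptions of this sketch rather than details from Auger and Flandrin. Note one convention difference: the sketch multiplies the frame by a window indexed by the offset from the frame center, so the signs of the two corrections are opposite to those in formulas that index the window as h(t − τ).

```python
import numpy as np

def reassigned_frame(x, center, fs, n_win=1025):
    """Reassigned times (s) and frequencies (Hz) for one frame centered at `center`.

    Hypothetical helper for illustration; n_win is assumed odd.
    """
    half = n_win // 2
    n = np.arange(-half, half + 1)            # sample offsets from the frame center
    sigma = half / 4.0                        # Gaussian width (assumption)
    h = np.exp(-0.5 * (n / sigma) ** 2)       # analysis window h
    h_t = n * h                               # time-weighted window, t * h(t)
    h_d = -(n / sigma ** 2) * h               # analytic derivative dh/dt (per sample)

    seg = x[center - half:center + half + 1]
    X = np.fft.rfft(seg * h)                  # three Fourier transforms of the
    X_t = np.fft.rfft(seg * h_t)              # same segment, one per window
    X_d = np.fft.rfft(seg * h_d)

    power = np.abs(X) ** 2
    valid = power > 1e-12 * power.max()       # no reassignment where there is no energy
    t_off = np.full(X.shape, np.nan)
    w_off = np.full(X.shape, np.nan)
    t_off[valid] = np.real(X_t[valid] * np.conj(X[valid])) / power[valid]
    w_off[valid] = np.imag(X_d[valid] * np.conj(X[valid])) / power[valid]

    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    # Signs follow the offset-indexed window convention described above.
    t_hat = (center + t_off) / fs
    f_hat = freqs - w_off * fs / (2 * np.pi)
    return t_hat, f_hat, power

# Usage: an impulse 37 samples after the frame center is reassigned to its true time.
fs = 8000.0
x = np.zeros(4096)
x[2048 + 37] = 1.0
t_hat, f_hat, power = reassigned_frame(x, 2048, fs)
print(t_hat[10] * fs)   # sample index of the reassigned time for bin 10
```

No phase derivative is computed anywhere: each corrected coordinate comes from a ratio of spectra evaluated at the same time-frequency point, and bins without energy are left undefined (NaN).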
Separability
The short-time Fourier transform can often be used to estimate the amplitudes and phases of the individual components in a multi-component signal, such as a quasi-harmonic musical instrument tone. Moreover, the time and frequency reassignment operations can be used to sharpen the representation by attributing the spectral energy reported by the short-time Fourier transform to the point that is the local center of gravity of the complex energy distribution.[7]
For a signal consisting of a single component, the instantaneous frequency can be estimated from the partial derivatives of phase of any short-time Fourier transform channel that passes the component. If the signal is to be decomposed into many components,
<math>x(t) = \sum_{n} A_{n}(t) e^{j \theta_{n}(t)}</math>
and the instantaneous frequency of each component is defined as the derivative of its phase with respect to time, that is,
<math>\omega_{n}(t) = \frac{d \theta_{n}(t)}{d t},</math>
then the instantaneous frequency of each individual component can be computed from the phase of the response of a filter that passes that component, provided that no more than one component lies in the passband of the filter.
This is the property, in the frequency domain, that Nelson called separability[6] and is required of all signals so analyzed. If this property is not met, then the desired multi-component decomposition cannot be achieved, because the parameters of individual components cannot be estimated from the short-time Fourier transform. In such cases, a different analysis window must be chosen so that the separability criterion is satisfied.
If the components of a signal are separable in frequency with respect to a particular short-time spectral analysis window, then the output of each short-time Fourier transform filter is a filtered version of, at most, a single dominant (having significant energy) component, and so the derivative with respect to time of the phase of <math>X(t,\omega_0)</math> is equal to the derivative with respect to time of the phase of the dominant component at <math>\omega_0.</math> Therefore, if a component <math>x_n(t)</math> having instantaneous frequency <math>\omega_{n}(t)</math> is the dominant component in the vicinity of <math>\omega_0,</math> then the instantaneous frequency of that component can be computed from the phase of the short-time Fourier transform evaluated at <math>\omega_0.</math> That is,
<math>\omega_{n}(t) = \frac{\partial}{\partial t} \arg\{ x_{n}(t) \} = \frac{\partial }{\partial t} \arg\{ X(t,\omega_{0}) \}</math>
Just as each bandpass filter in the short-time Fourier transform filterbank may pass at most a single complex exponential component, two temporal events must be sufficiently separated in time that they do not lie in the same windowed segment of the input signal. This is the property of separability in the time domain, and is equivalent to requiring that the time between two events be greater than the length of the impulse response of the short-time Fourier transform filters, the span of non-zero samples in <math>h(t).</math>
Figure: Long-window reassigned spectrogram of the word "open", computed using a 54.4 ms Kaiser window with a shaping parameter of 9, emphasizing harmonics.
Figure: Short-window reassigned spectrogram of the word "open", computed using a 13.6 ms Kaiser window with a shaping parameter of 9, emphasizing formants and glottal pulses.
In general, there is an infinite number of equally valid decompositions for a multi-component signal. The separability property must be considered in the context of the desired decomposition. For example, in the analysis of a speech signal, an analysis window that is long relative to the time between glottal pulses is sufficient to separate harmonics, but the individual glottal pulses will be smeared, because many pulses are covered by each window (that is, the individual pulses are not separable, in time, by the chosen analysis window). An analysis window that is much shorter than the time between glottal pulses may resolve the glottal pulses, because no window spans more than one pulse, but the harmonic frequencies are smeared together, because the main lobe of the analysis window spectrum is wider than the spacing between the harmonics (that is, the harmonics are not separable, in frequency, by the chosen analysis window).[6]: 2585
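The frequency side of this tradeoff can be checked numerically. The sketch below, assuming NumPy, measures the main-lobe width of the two Kaiser windows mentioned in the figure captions and compares it with the harmonic spacing of a voice. The 44.1 kHz sample rate and the 120 Hz pitch are hypothetical values chosen for illustration (neither is stated in the article), and the helper name is likewise made up.

```python
import numpy as np

def kaiser_first_null_hz(duration_s, beta, fs, n_fft=1 << 18):
    """Numerically locate the first spectral null of a Kaiser window, in Hz.

    The null-to-null main-lobe width is twice the returned value.
    """
    n = int(round(duration_s * fs))
    w = np.kaiser(n, beta)
    mag = np.abs(np.fft.rfft(w, n=n_fft))          # heavily zero-padded spectrum
    # The magnitude falls monotonically across the main lobe; the first bin
    # where it starts rising again marks the first null.
    first_rise = int(np.argmax(np.diff(mag) > 0))
    return first_rise * fs / n_fft

fs = 44100.0        # assumed sample rate
pitch_hz = 120.0    # hypothetical pitch, so harmonics are spaced 120 Hz apart
for duration in (54.4e-3, 13.6e-3):
    null_hz = kaiser_first_null_hz(duration, 9.0, fs)
    print(f"{duration * 1e3:.1f} ms window: main lobe about {2 * null_hz:.0f} Hz wide, "
          f"harmonic spacing {pitch_hz:.0f} Hz")
```

Because the window shape is the same in both cases, shortening the window by a factor of four widens the main lobe by about the same factor, which is one rough way to see why the short window cannot separate closely spaced harmonics.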
Extensions
Consensus complex reassignment
Gardner and Magnasco (2006) argue that the auditory nerves may use a form of the reassignment method to process sounds. These nerves are known to preserve timing (phase) information better than magnitude information. The authors propose a variation of reassignment that uses complex values (that is, both phase and magnitude) and show that it produces sparse outputs, as auditory nerves do. Running this reassignment with windows of different bandwidths (see the discussion in the section above) yields a "consensus" that captures multiple kinds of signals, again like the auditory system. They argue that the algorithm is simple enough for neurons to implement.[8]
Synchrosqueezing transform
References
1. Hainsworth, Stephen (2003). "Chapter 3: Reassignment methods". Techniques for the Automated Analysis of Musical Audio (PhD). University of Cambridge. CiteSeerX 10.1.1.5.9579.
2. F. Auger & P. Flandrin (May 1995). "Improving the readability of time-frequency and time-scale representations by the reassignment method". IEEE Transactions on Signal Processing. 43 (5): 1068–1089. Bibcode:1995ITSP...43.1068A. CiteSeerX 10.1.1.646.794. doi:10.1109/78.382394. S2CID 6336685.
3. P. Flandrin, F. Auger, and E. Chassande-Mottin, Time-frequency reassignment: From principles to algorithms, in Applications in Time-Frequency Signal Processing (A. Papandreou-Suppappola, ed.), ch. 5, pp. 179–203, CRC Press, 2003.
4. K. Kodera; R. Gendrin & C. de Villedary (Feb 1978). "Analysis of time-varying signals with small BT values". IEEE Transactions on Acoustics, Speech, and Signal Processing. 26 (1): 64–76. doi:10.1109/TASSP.1978.1163047.
5. Fitz, Kelly R.; Fulop, Sean A. (2009). "A Unified Theory of Time-Frequency Reassignment". arXiv:0903.3080 [cs.SD].
6. D. J. Nelson (Nov 2001). "Cross-spectral methods for processing speech". Journal of the Acoustical Society of America. 110 (5): 2575–2592. Bibcode:2001ASAJ..110.2575N. doi:10.1121/1.1402616. PMID 11757947.
7. K. Fitz, L. Haken, On the use of time-frequency reassignment in additive sound modeling, Journal of the Audio Engineering Society 50 (11) (2002) 879–893.
8. Gardner, Timothy J.; Magnasco, Marcelo O. (18 April 2006). "Sparse time-frequency representations". Proceedings of the National Academy of Sciences. 103 (16): 6094–6099. Bibcode:2006PNAS..103.6094G. doi:10.1073/pnas.0601707103. PMC 1431718. PMID 16601097.
9. Meignen, Sylvain; Oberlin, Thomas; Pham, Duong-Hung (July 2019). "Synchrosqueezing transforms: From low- to high-frequency modulations and perspectives". Comptes Rendus Physique. 20 (5): 449–460. Bibcode:2019CRPhy..20..449M. doi:10.1016/j.crhy.2019.07.001.
Further reading
- S. A. Fulop and K. Fitz, A spectrogram for the twenty-first century, Acoustics Today, vol. 2, no. 3, pp. 26–33, 2006.
- S. A. Fulop and K. Fitz, Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications, Journal of the Acoustical Society of America, vol. 119, pp. 360–371, Jan 2006.
External links
- TFTB - Time-Frequency ToolBox
- SPEAR - Sinusoidal Partial Editing Analysis and Resynthesis
- Loris - Open-source software for sound modeling and morphing
- SRA - A web-based research tool for spectral and roughness analysis of sound signals Archived 2019-11-18 at the Wayback Machine (supported by a Northwest Academic Computing Consortium grant to J. Middleton, Eastern Washington University)