User:Jliu49/sandbox
Computational Auditory Scene Analysis
What is CASA?
Computational auditory scene analysis (CASA) is the study of auditory scene analysis (ASA) by computational means. It emerged with the aim of developing machine systems that separate sound sources in complex acoustic mixtures by applying perceptual principles [1]. CASA aims to achieve human performance in ASA from monaural or binaural acoustic scene input [1]. The work of Albert Bregman was a main contribution to ASA [2]. Instead of modeling the human auditory system in detail, CASA builds on certain principles of auditory processing and sound separation. CASA is often compared with automatic speech recognition (ASR), with which it shares goals and evaluation methods, but the scope of CASA is much wider than that of ASR. ASR, like any form of speech enhancement, assumes that the input consists of a target signal and a noise component. CASA instead assumes that the input contains a variety of components that can appear and disappear unpredictably, and it aims to process and manipulate these signals in a way similar to human performance [1].
Principles
Human ASA
Since CASA models functional parts of the auditory system, it is necessary to view parts of the biological auditory system in terms of known physical models. The auditory periphery, which consists of three areas (the outer, middle and inner ear), acts as a complex transducer that converts sound vibrations into action potentials in the auditory nerve. The outer ear consists of the external ear, the ear canal and the eardrum. Like an acoustic funnel, the outer ear helps to locate the sound source [3]. The ear canal acts as a resonant tube (like an organ pipe), amplifying frequencies between 2,000 and 5,500 Hz, with a maximum amplification of about 11 dB around 4,000 Hz [4]. The cochlea, the organ of hearing, contains two membranes: Reissner’s membrane and the basilar membrane. The basilar membrane responds to an audio stimulus at the region whose resonant frequency matches the stimulus frequency. The movement of the basilar membrane displaces the inner hair cells in one direction, which encodes a half-wave rectified signal as action potentials in the spiral ganglion cells. The axons of these cells make up the auditory nerve, which carries the rectified stimulus. Like the basilar membrane, the auditory nerve fibers are frequency selective. For lower frequencies, the fibers exhibit “phase locking”. Neurons in higher auditory pathway centers are tuned to specific stimulus features, such as periodicity, sound intensity, and amplitude and frequency modulation [1].
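The half-wave rectification attributed to the inner hair cells can be illustrated with a minimal sketch. It assumes that a pure 500 Hz tone stands in for basilar membrane displacement at a single place; the sampling rate, the tone and the function name `half_wave_rectify` are illustrative choices and are not taken from any particular CASA model.

```python
import numpy as np


def half_wave_rectify(displacement):
    """Keep positive displacement and set negative displacement to zero.

    This is only a crude stand-in for hair cell transduction: it captures
    the one-directional response and nothing else.
    """
    return np.maximum(displacement, 0.0)


# Assumed example input: 10 ms of a 500 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(int(round(0.01 * fs))) / fs
membrane = np.sin(2 * np.pi * 500 * t)
rectified = half_wave_rectify(membrane)
```

The rectified signal follows the positive half-cycles of the tone, so it still carries the timing of the stimulus, which is the information that phase locking preserves at lower frequencies.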
There are also neuroanatomical associations between ASA and the posterior cortical areas, including the posterior superior temporal lobes and the posterior cingulate. Studies have found that ASA, including its segregation and grouping operations, is impaired in patients with Alzheimer’s disease.
System Architecture
(brand new information to page)
Cochleagram
(brand new information to page)
As the first stage of CASA processing, the cochleagram creates a time-frequency representation of the input signal. The processing mimics the components of the outer and middle ear and then breaks the signal up into the frequency bands that are naturally selected by the cochlea and hair cells. Because the basilar membrane is frequency selective, it is modeled with a filterbank in which each filter is associated with a specific point on the membrane (CASA book pg 15). A 4th-order gammatone filter has been shown to give an excellent fit to the experimentally derived human auditory filter shape [105 resource from CASA book, pg 16].
1. Figure 1.5 in CASA book (pg 14).
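The filterbank stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the CASA book: the bandwidths use the Glasberg and Moore ERB formula with the commonly used factor 1.019, the centre frequencies are spaced evenly on the ERB-rate scale, and the channel count, frequency range, frame length and all function names are hypothetical choices. Outer and middle ear filtering and hair cell transduction are left out.

```python
import numpy as np


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg-Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def erb_space(f_low, f_high, n):
    """Return n centre frequencies spaced evenly on the ERB-rate scale."""
    lo = 21.4 * np.log10(4.37 * f_low / 1000.0 + 1.0)
    hi = 21.4 * np.log10(4.37 * f_high / 1000.0 + 1.0)
    rates = np.linspace(lo, hi, n)
    return (10.0 ** (rates / 21.4) - 1.0) * 1000.0 / 4.37


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(round(duration * fs))) / fs
    b = 1.019 * erb(fc)  # assumed bandwidth scaling
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))  # normalise to unit energy


def cochleagram(x, fs, n_channels=32, f_low=80.0, f_high=5000.0,
                frame_len=0.02, hop=0.01):
    """Filter x with each channel and sum the output energy per frame."""
    fcs = erb_space(f_low, f_high, n_channels)
    win = int(round(frame_len * fs))
    step = int(round(hop * fs))
    n_frames = 1 + (len(x) - win) // step
    cg = np.zeros((n_channels, n_frames))
    for c, fc in enumerate(fcs):
        # Truncate the convolution so every channel has the input's length.
        y = np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
        for m in range(n_frames):
            frame = y[m * step:m * step + win]
            cg[c, m] = np.sum(frame ** 2)
    return fcs, cg


# Assumed example input: 200 ms of a 1 kHz tone sampled at 16 kHz.
fs = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(3200) / fs)
centre_freqs, cg = cochleagram(tone, fs)
```

Each row of `cg` corresponds to one point on the modeled basilar membrane and each column to one time frame. For the tone above, most of the energy falls in the channels whose centre frequencies lie near 1 kHz.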
Correlogram
(brand new information to page)[5]
Cross-Correlogram
(brand new information to page)
Time-Frequency Masks
(brand new information to page)
Resynthesis
(brand new information to page)
Evaluation of CASA systems
Will include the following:
a. Comparison with Clean Target Signal
b. Automatic Recognition Measure
c. Human Listening
d. Correspondence with Biological Data
Applications
(in this section, addition to current information)[6]
Monaural CASA
(historical)
Binaural CASA
(historical)
Neural CASA Models
(current)
Analysis of Musical Audio Signals
(current)[7]
Neural Perceptual Modeling
(current)
See also
References
- Wang, DeLiang, "Computational Scene Analysis", Challenges for Computational Intelligence, Springer, Berlin, pp. 163-191, 2007.
- Bregman, A., Auditory Scene Analysis. MIT Press, Cambridge, MA, 1990.
- Warren, R., Auditory Perception: A New Analysis and Synthesis. Cambridge University Press, New York, 1999.
- Wiener, F., "On the diffraction of a progressive wave by the human head", Journal of the Acoustical Society of America, vol. 19, pp. 143-146, 1947.
- Wang, DeLiang, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-Interscience, Hoboken, NJ, 2006.
- Brown, G., Cooke, M., "Computational auditory scene analysis", Computer Speech and Language, vol. 8, pp. 297-336, 1994.
- Godsmark, D., Brown, G., "A blackboard architecture for computational auditory scene analysis", Speech Communication, vol. 27, pp. 351-366, 1999.