
User:Jliu49/sandbox

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Jliu49 (talk | contribs) at 13:08, 18 July 2012. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Computational Auditory Scene Analysis

What is CASA?

Building on Bregman’s work on the principles underlying the perception of complex acoustic mixtures, computational auditory scene analysis (CASA) emerged with the aim of developing machine systems that separate sound sources using perceptual principles (CASA book, xvii). As the study of auditory scene analysis (ASA) by computational means, CASA aims to match human ASA performance from monaural or binaural recordings of an acoustic scene (CASA, pg 11).

Instead of modeling the human auditory system in detail, the study of CASA works within the basis of certain principles of auditory processing and sound separation.

CASA has been criticized for lacking a common theory. It is often compared with automatic speech recognition, which shares goals and evaluation methods with CASA; however, the scope of CASA is much wider than that of automatic speech recognition.

Goals

(Top edit on existing material with added application information)

i. Marr’s framework

CASA is to describe the environmental audio to the listener (CASA 12, referencing [85]).

ii. Bregman’s framework

CASA is to computationally extract individual streams from one or two recordings of an acoustic scene (CASA 12, referencing [13]).

Cocktail party processor

1. CASA has helped construct a “cocktail party processor”. [material already exists on WIKI page]

Applications

(Brief overview, will be expanded later in the article)

1. Compared with ASR (automatic speech recognition): CASA aims to achieve human performance in ASA from monaural or binaural acoustic scene input (CASA, pg 11), and its scope is wider than the recognition of speech alone.


Compared with ASR

Difference between ASR (automatic speech recognition) and CASA [1]


Basics of CASA Systems

Human ASA

(The main neuroscience part of the article)[2]

Structure and Function of the Auditory System

Will include the neuroanatomy and function (with a focus on CASA applications) of the following:

a. Auditory periphery

i. The auditory periphery acts as a complex transducer that converts sound vibrations into action potentials in the auditory nerve. The periphery consists of three regions: the outer, middle, and inner ear. The outer ear consists of the external ear (pinna), the ear canal, and the eardrum.

  1. The flange of the pinna forms a conical structure that functions like an old-fashioned ear trumpet (Auditory Perception, pg 5).
  2. The outer ear acts like an acoustic funnel and helps to locate the sound source (Auditory Perception, pg 5).
  3. The ear canal acts as a resonant tube (like an organ pipe), amplifying frequencies between 2,000 and 5,500 Hz, with a maximum amplification of about 11 dB occurring around 4,000 Hz (Wiener, 1947, in Auditory Perception, pg 6).
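The resonance described in item 3 follows from modeling the ear canal as a quarter-wave resonator (a tube open at the concha and closed at the eardrum). A minimal sketch, using typical textbook values for canal length and the speed of sound (these values are illustrative assumptions, not from the sources cited above):

```python
# Quarter-wave resonator model of the ear canal (illustrative sketch).
# The canal is treated as a tube closed at the eardrum and open at the
# concha; its fundamental resonance is f0 = c / (4 * L).

SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 °C (assumed)
CANAL_LENGTH = 0.025     # m, a typical adult ear-canal length (~2.5 cm, assumed)

def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND):
    """Fundamental resonant frequency (Hz) of a tube open at one end."""
    return c / (4.0 * length_m)

f0 = quarter_wave_resonance(CANAL_LENGTH)
print(f"Predicted ear-canal resonance: {f0:.0f} Hz")  # → 3430 Hz
```

The predicted resonance near 3,400 Hz is consistent with the ~4,000 Hz peak amplification reported by Wiener.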

b. Cochlea

i. As the organ of hearing, the cochlea contains two membranes, Reissner’s membrane and the basilar membrane (CASA pg 3).

c. Movements of the basilar membrane

i. The basilar membrane responds to an acoustic stimulus where the stimulus frequency matches the resonant frequency of a particular region of the membrane (CASA pg 3).

ii. The stimulus frequency can also be represented by a place code (CASA pg 3).

iii. Place code theory (http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/pitch/pitch.html)

iv. Movement of the basilar membrane displaces the inner hair cells, which encode a half-wave rectified version of the signal as action potentials in the spiral ganglion cells. The axons of these cells make up the auditory nerve (CASA pg 3).

d. Auditory nerve responses

i. “The auditory nerve encodes a half-wave rectified version of the stimulus because the APs are only initiated by the movement of the hairs in one direction” (CASA book pg 3).

ii. Auditory nerve fibers are frequency-selective, like the basilar membrane. At lower frequencies, the fibers exhibit “phase locking”, firing in synchrony with a particular phase of the stimulus.
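The half-wave rectification described above can be sketched as a crude inner-hair-cell model: negative deflections of the membrane are discarded, while the stimulus period is preserved. This is an illustrative sketch, not a physiological model; the sample rate and tone frequency are arbitrary choices.

```python
import numpy as np

# Crude inner-hair-cell model: half-wave rectification of basilar-membrane
# motion, since APs are initiated only by hair movement in one direction.

fs = 16000                       # sample rate in Hz (assumed)
t = np.arange(0, 0.01, 1 / fs)   # 10 ms of signal
x = np.sin(2 * np.pi * 440 * t)  # 440 Hz tone as the membrane displacement

rectified = np.maximum(x, 0.0)   # keep only positive deflections

# The rectified signal is non-negative yet retains the stimulus periodicity,
# which is the basis of the temporal (phase-locking) code.
```

Because the rectified output still repeats at the stimulus period, downstream periodicity analysis (e.g. the correlogram, discussed later) can recover pitch information from it.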

e. Auditory cortex

i. Neurons in higher auditory pathway centers are tuned to specific stimuli features, such as periodicity, sound intensity, amplitude and frequency modulation (CASA pg 3).

System Architecture

(brand new information to page)

Cochleagram

(brand new information to page)

1. Figure 1.5 in CASA book (pg 14).

Correlogram

(brand new information to page)[3]

1. As the first stage of CASA processing, the cochleagram creates a time-frequency representation of the input signal. Mimicking the processing of the outer and middle ear, the signal is decomposed into the frequency channels that are naturally selected by the cochlea and hair cells. Because of the frequency selectivity of the basilar membrane, a filterbank is used to model the membrane, with each filter associated with a specific point on the basilar membrane (CASA book pg 15). A 4th-order gammatone filter has been shown to give an excellent fit to the experimentally derived human auditory filter shape [105 resource from CASA book, pg 16].
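The gammatone filterbank mentioned above can be sketched as follows. The ERB (equivalent rectangular bandwidth) formula and the 1.019 bandwidth factor are standard choices from the psychoacoustics literature (Glasberg and Moore), assumed here rather than taken from this draft:

```python
import numpy as np

# Sketch of a 4th-order gammatone filter impulse response, the filter shape
# cited above as an excellent fit to the human auditory filter.

def erb(f):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f (Hz)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a gammatone filter centred at fc (Hz)."""
    t = np.arange(0, duration, 1.0 / fs)
    b = 1.019 * erb(fc)  # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # normalise peak amplitude to 1

# A cochleagram front end applies a bank of such filters, one per
# basilar-membrane place; centre frequencies here are illustrative:
fs = 16000
centre_freqs = [100, 250, 500, 1000, 2000, 4000]
bank = [gammatone_ir(fc, fs) for fc in centre_freqs]
```

Convolving the input signal with each impulse response (e.g. `np.convolve`) and then applying hair-cell-style rectification yields the channel activity that the cochleagram displays over time.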


Cross-Correlogram

(brand new information to page)

Time-Frequency Masks

(brand new information to page)

Resynthesis

(brand new information to page)

Evaluation of CASA systems

Will include the following:

a. Comparison with Clean Target Signal

b. Automatic Recognition Measure

c. Human Listening

d. Correspondence with Biological Data
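A common instantiation of evaluation method (a) is a signal-to-noise ratio between the separated output and the clean target. A minimal sketch; the SNR formula is standard, and the signals used below are illustrative, not from any CASA system:

```python
import numpy as np

# Evaluation by comparison with a clean target signal: treat the difference
# between target and estimate as "noise" and report the ratio in dB.

def snr_db(target, estimate):
    """SNR in dB between a clean target and a separated estimate."""
    target = np.asarray(target, dtype=float)
    noise = target - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))

# A clean tone versus a deliberately degraded estimate of it:
clean = np.sin(2 * np.pi * 0.01 * np.arange(1000))
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(1000)
score = snr_db(clean, noisy)  # higher is better; perfect separation → infinity
```

Automatic-recognition measures (b) instead feed the separated output to an ASR system and report recognition accuracy, while (c) and (d) rely on human listeners and physiological data respectively.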

Applications

(in this section, addition to current information)[4]

Monaural CASA

(historical)

Binaural CASA

(historical)

Neural CASA Models

(current)

Analysis of Musical Audio Signals

(current)[5]

Neural Perceptual Modeling

(current)

See also

Auditory scene analysis

References

  1. ^ Wang, DeLiang, “Computational Scene Analysis”, in Challenges for Computational Intelligence, Springer, Berlin, pp. 163–191, 2007.
  2. ^ Wang, DeLiang, “Computational Scene Analysis”, in Challenges for Computational Intelligence, Springer, Berlin, pp. 163–191, 2007.
  3. ^ Wang, DeLiang, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, Hoboken, N.J.: Wiley-Interscience, 2006.
  4. ^ Brown, G., Cooke, M., “Computational auditory scene analysis”, Computer Speech and Language, vol. 8, pp. 297–336, 1994.
  5. ^ Godsmark, D., Brown, G., “A blackboard architecture for computational auditory scene analysis”, Speech Communication, vol. 27, pp. 351–366, 1999.