
User:Jliu49/sandbox

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Jliu49 (talk | contribs) at 13:08, 18 July 2012. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Computational Auditory Scene Analysis

What is CASA?

Building on Bregman’s work on the principles underlying the perception of complex acoustic mixtures, computational auditory scene analysis (CASA) emerged with the aim of developing machine systems that separate sound sources using perceptual principles (CASA book, xvii). As the study of auditory scene analysis (ASA) by computational means, CASA aims to match human ASA performance from monaural or binaural recordings of an acoustic scene (CASA, pg 11).

Instead of modeling the human auditory system in detail, the study of CASA works within the basis of certain principles of auditory processing and sound separation.

CASA has been criticized for lacking a common theory. It is often compared with automatic speech recognition, which shares goals and evaluation methods with CASA; however, the scope of CASA is much wider than that of automatic speech recognition.

Goals

(Top edit on existing material with added application information)

i. Marr’s framework

CASA is to describe the environmental audio to the listener (CASA 12, referencing [85]).

ii. Bregman’s framework

CASA is to computationally extract individual streams from one or two recordings of an acoustic scene (CASA 12, referencing [13]).

Cocktail party processor

1. CASA has helped construct a “cocktail party processor”. [material already exists on WIKI page]

Applications

(Brief overview, will be expanded later in the article)

1. Compared with ASR (automatic speech recognition): CASA aims to achieve human performance in ASA from monaural or binaural acoustic scene input (CASA, pg 11), and its scope is wider than the recognition of speech alone.


Compared with ASR

Difference between ASR (automatic speech recognition) and CASA [1]


Basics of CASA Systems

Human ASA

(The main neuroscience part of the article)[2]

Structure and Function of the Auditory System

Will include the neuroanatomy and function (with a focus on CASA applications) of the following:

a. Auditory periphery

i. The auditory periphery acts as a complex transducer that converts sound vibrations into action potentials in the auditory nerve. The periphery consists of three regions: the outer, middle, and inner ear. The outer ear consists of the external ear (pinna), the ear canal, and the eardrum.

  1. The flange of the pinna forms a conical structure that functions like an old-fashioned ear trumpet (Auditory Perception, pg 5).
  2. The outer ear acts like an acoustic funnel and helps to locate the sound source (Auditory Perception, pg 5).
  3. The ear canal acts as a resonant tube (like an organ pipe), amplifying frequencies between 2,000 and 5,500 Hz, with a maximum amplification of about 11 dB occurring around 4,000 Hz (Wiener, 1947, in Auditory Perception, pg 6).
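The resonance described in item 3 follows from modeling the ear canal as a quarter-wave resonator (a tube open at the concha and closed at the eardrum). A minimal sketch, using typical textbook values for canal length and the speed of sound (these values are illustrative assumptions, not from the sources cited above):

```python
# Quarter-wave resonator model of the ear canal (illustrative sketch).
# The canal is treated as a tube closed at the eardrum and open at the
# concha; its fundamental resonance is f0 = c / (4 * L).

SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 °C (assumed)
CANAL_LENGTH = 0.025     # m, a typical adult ear-canal length (~2.5 cm, assumed)

def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND):
    """Fundamental resonant frequency (Hz) of a tube open at one end."""
    return c / (4.0 * length_m)

f0 = quarter_wave_resonance(CANAL_LENGTH)
print(f"Predicted ear-canal resonance: {f0:.0f} Hz")  # → 3430 Hz
```

The predicted resonance near 3,400 Hz is consistent with the ~4,000 Hz peak amplification reported by Wiener.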

b. Cochlea

i. As the organ of hearing, the cochlea contains two membranes, Reissner’s membrane and the basilar membrane (CASA pg 3).

c. Movements of the basilar membrane

i. The basilar membrane responds to an acoustic stimulus where the stimulus frequency matches the resonant frequency of a particular region of the membrane (CASA pg 3).

ii. The stimulus frequency can also be represented by a place code (CASA pg 3).

iii. Place code theory (http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/pitch/pitch.html)

iv. Movement of the basilar membrane displaces the inner hair cells, which encode a half-wave rectified version of the signal as action potentials in the spiral ganglion cells. The axons of these cells make up the auditory nerve (CASA pg 3).

d. Auditory nerve responses

i. “The auditory nerve encodes a half-wave rectified version of the stimulus because the APs are only initiated by the movement of the hairs in one direction” (CASA book pg 3).

ii. Auditory nerve fibers are frequency-selective, like the basilar membrane. At lower frequencies, the fibers exhibit “phase locking”, firing in synchrony with a particular phase of the stimulus.
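The half-wave rectification described above can be sketched as a crude inner-hair-cell model: negative deflections of the membrane are discarded, while the stimulus period is preserved. This is an illustrative sketch, not a physiological model; the sample rate and tone frequency are arbitrary choices.

```python
import numpy as np

# Crude inner-hair-cell model: half-wave rectification of basilar-membrane
# motion, since APs are initiated only by hair movement in one direction.

fs = 16000                       # sample rate in Hz (assumed)
t = np.arange(0, 0.01, 1 / fs)   # 10 ms of signal
x = np.sin(2 * np.pi * 440 * t)  # 440 Hz tone as the membrane displacement

rectified = np.maximum(x, 0.0)   # keep only positive deflections

# The rectified signal is non-negative yet retains the stimulus periodicity,
# which is the basis of the temporal (phase-locking) code.
```

Because the rectified output still repeats at the stimulus period, downstream periodicity analysis (e.g. the correlogram, discussed later) can recover pitch information from it.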

e. Auditory cortex

i. Neurons in higher auditory pathway centers are tuned to specific stimuli features, such as periodicity, sound intensity, amplitude and frequency modulation (CASA pg 3).

System Architecture

(brand new information to page)

Cochleagram

(brand new information to page)

1. Figure 1.5 in CASA book (pg 14).

Correlogram

(brand new information to page)[3]

1. As the first stage of CASA processing, the cochleagram creates a time-frequency representation of the input signal. Mimicking the processing of the outer and middle ear, the signal is decomposed into the frequency channels that are naturally selected by the cochlea and hair cells. Because of the frequency selectivity of the basilar membrane, a filterbank is used to model the membrane, with each filter associated with a specific point on the basilar membrane (CASA book pg 15). A 4th-order gammatone filter has been shown to give an excellent fit to the experimentally derived human auditory filter shape [105 resource from CASA book, pg 16].
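The gammatone filterbank mentioned above can be sketched as follows. The ERB (equivalent rectangular bandwidth) formula and the 1.019 bandwidth factor are standard choices from the psychoacoustics literature (Glasberg and Moore), assumed here rather than taken from this draft:

```python
import numpy as np

# Sketch of a 4th-order gammatone filter impulse response, the filter shape
# cited above as an excellent fit to the human auditory filter.

def erb(f):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f (Hz)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a gammatone filter centred at fc (Hz)."""
    t = np.arange(0, duration, 1.0 / fs)
    b = 1.019 * erb(fc)  # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # normalise peak amplitude to 1

# A cochleagram front end applies a bank of such filters, one per
# basilar-membrane place; centre frequencies here are illustrative:
fs = 16000
centre_freqs = [100, 250, 500, 1000, 2000, 4000]
bank = [gammatone_ir(fc, fs) for fc in centre_freqs]
```

Convolving the input signal with each impulse response (e.g. `np.convolve`) and then applying hair-cell-style rectification yields the channel activity that the cochleagram displays over time.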


Cross-Correlogram

(brand new information to page)

Time-Frequency Masks

(brand new information to page)

Resynthesis

(brand new information to page)

Evaluation of CASA systems

Will include the following:

a. Comparison with Clean Target Signal

b. Automatic Recognition Measure

c. Human Listening

d. Correspondence with Biological Data
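A common instantiation of evaluation method (a) is a signal-to-noise ratio between the separated output and the clean target. A minimal sketch; the SNR formula is standard, and the signals used below are illustrative, not from any CASA system:

```python
import numpy as np

# Evaluation by comparison with a clean target signal: treat the difference
# between target and estimate as "noise" and report the ratio in dB.

def snr_db(target, estimate):
    """SNR in dB between a clean target and a separated estimate."""
    target = np.asarray(target, dtype=float)
    noise = target - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))

# A clean tone versus a deliberately degraded estimate of it:
clean = np.sin(2 * np.pi * 0.01 * np.arange(1000))
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(1000)
score = snr_db(clean, noisy)  # higher is better; perfect separation → infinity
```

Automatic-recognition measures (b) instead feed the separated output to an ASR system and report recognition accuracy, while (c) and (d) rely on human listeners and physiological data respectively.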

Applications

(in this section, addition to current information)[4]

Monaural CASA

(historical)

Binaural CASA

(historical)

Neural CASA Models

(current)

Analysis of Musical Audio Signals

(current)[5]

Neural Perceptual Modeling

(current)

See also

Auditory scene analysis

References

  1. ^ Wang, DeLiang, “Computational Scene Analysis”, in Challenges for Computational Intelligence, Springer, Berlin, pp. 163–191, 2007.
  2. ^ Wang, DeLiang, “Computational Scene Analysis”, in Challenges for Computational Intelligence, Springer, Berlin, pp. 163–191, 2007.
  3. ^ Wang, DeLiang, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, Hoboken, N.J.: Wiley-Interscience, 2006.
  4. ^ Brown, G., Cooke, M., “Computational auditory scene analysis”, Computer Speech and Language, vol. 8, pp. 297–336, 1994.
  5. ^ Godsmark, D., Brown, G., “A blackboard architecture for computational auditory scene analysis”, Speech Communication, vol. 27, pp. 351–366, 1999.