Revision as of 01:36, 26 July 2012

This is the user sandbox of Jliu49.

Computational Auditory Scene Analysis

What is CASA?

CASA emerged with the aim of developing machine systems that separate sound sources in complex acoustic mixtures by applying perceptual principles ^[1]. As the computational study of auditory scene analysis (ASA), CASA aims to reach human-level performance in ASA from monaural or binaural acoustic input ^[1]. A main contributor to ASA was the work of Albert Bregman ^[2]. Rather than modeling the human auditory system in detail, CASA works from a set of principles of auditory processing and sound separation. Although CASA is often compared with automatic speech recognition (ASR), with which it shares goals and evaluation methods, its scope is much wider. ASR, like any form of speech enhancement, assumes that the input consists of a signal (target) and a noise component. CASA instead assumes that the input contains a variety of components that can appear and disappear unpredictably, and it aims to process and manipulate these signals in a manner similar to human performance ^[1].

Principles

Human ASA

Since CASA serves to model functional parts of the auditory system, it is necessary to view parts of the biological auditory system in terms of known physical models. The auditory periphery consists of three areas, the outer, middle and inner ear, and acts as a complex transducer that converts sound vibrations into action potentials in the auditory nerve. The outer ear consists of the external ear, ear canal and eardrum. Like an acoustic funnel, the outer ear helps to locate the sound source ^[3]. The ear canal acts as a resonant tube (like an organ pipe), amplifying frequencies between 2,000 and 5,500 Hz, with a maximum amplification of about 11 dB occurring around 4,000 Hz ^[4]. The cochlea, the organ of hearing, contains two membranes: Reissner’s membrane and the basilar membrane. The basilar membrane responds to an audio stimulus where the stimulus frequency matches the resonant frequency of a particular region of the membrane. The movement of the basilar membrane displaces the inner hair cells in one direction, which encodes a half-wave rectified signal as action potentials in the spiral ganglion cells. The axons of these cells make up the auditory nerve, which carries the rectified stimulus. Auditory nerve fibers are frequency selective, similar to the basilar membrane. For lower frequencies, the fibers exhibit “phase locking”. Neurons in higher auditory pathway centers are tuned to specific stimulus features, such as periodicity, sound intensity, and amplitude and frequency modulation ^[1]. There are also neuroanatomical associations of ASA with the posterior cortical areas, including the posterior superior temporal lobes and the posterior cingulate. Studies have found that ASA, including segregation and grouping operations, is impaired in patients with Alzheimer’s disease.
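The half-wave rectification attributed to the inner hair cells above can be illustrated with a short sketch. It is only an illustration of the signal operation, not a model of hair-cell physiology or of any particular CASA system. The function names `tone` and `half_wave_rectify`, the 500 Hz stimulus, and the 16 kHz sampling rate are all assumptions chosen for the example.

```python
import math


def tone(freq_hz, duration_s=0.01, sample_rate=16000):
    """Return samples of a unit-amplitude sine tone (hypothetical stimulus)."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def half_wave_rectify(samples):
    """Keep deflections in one direction and set the opposite ones to zero."""
    return [max(0.0, s) for s in samples]


# Assumed example values: a 500 Hz tone sampled at 16 kHz.
stimulus = tone(500.0)
rectified = half_wave_rectify(stimulus)
```

In this sketch the positive half of each cycle passes through unchanged and the negative half is zeroed, which mirrors the description of hair cells responding to displacement in one direction only.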
