'''McGurk effect'''

I would like to expand this article to include more information on its effect in whole sentences and add in some missing citations. I've also been looking at information and studies regarding how this phenomenon affects listeners in various languages besides English, particularly Italian, German, Spanish, Japanese, and Chinese, and would like to add information about this. Along with these two areas I would like to add information on how this effect compares in spoken and sung phonemes. My hope is also to weave some information into the article on how and why the brain combines the visual phoneme and the auditory phoneme into a third phoneme. I will be using the sources listed below.

'''Annotated Bibliography Section'''


1. The authors of this article are both researchers and have written numerous articles relating to speech, hearing, and computer or technology use, though each has a different background: one studies how people integrate sensory information in speech perception, and the other builds lipreading machines that can decode speech. I did not detect any particular bias in their article. The article mainly discusses how people and machines can integrate auditory and visual information to understand speech. While the article was published in ''American Scientist'', it does not contain much technical jargon and could be easily understood by a college-level student or someone with an interest in the subject.<ref>{{cite journal|last1=Massaro|first1=Dominic W.|last2=Stork|first2=David G.|title=Speech recognition and sensory integration: a 240-year-old theorem helps explain how people and machines can integrate auditory and visual information to understand speech|journal=American Scientist|date=May–June 1998|volume=86|pages=236–239|doi=10.1511/1998.25.236}}</ref>


2. The authors of this article are Italian researchers in the Audiology and Phoniatrics Department at the University Hospital of Ferrara. I cannot locate further credentials in English, but I came across multiple publications and articles written by them individually and collectively. This article discusses the phenomenon as it occurs in various other languages, with the main focus of the study on Italian subjects. The article starts out fairly basic, but it soon becomes apparent that it is a journal publication, and it becomes much more technical and over the head of the average reader. I did not notice any bias in the article.<ref>{{cite journal|last1=Bovo|first1=R.|last2=Ciorba|first2=A.|last3=Prosser|first3=S.|last4=Martini|first4=A.|title=The McGurk phenomenon in Italian listeners|journal=Acta Otorhinolaryngologica Italica|date=August 2009|volume=29|pages=203–208}}</ref>


3. This article is written by several authors with a wide range of experience. I think this group is a reliable source due to their combined knowledge of the music and sound field as well as its relation to auditory research. Each member has authored numerous articles and publications. The article appears to give a fair assessment of the study and does not seem biased to me. The article is found in a professional journal and includes quite a bit of technical terminology, but with a little further research it can be easily understood by college students or anyone familiar with speech therapy or with an auditory or music background. This article discusses how phonemes are heard and visualized in sung syllables versus speech patterns.<ref>{{cite journal|last1=Quinto|first1=Lena|last2=Thompson|first2=William Forde|last3=Russo|first3=Frank A.|last4=Trehub|first4=Sandra E.|title=A comparison of the McGurk effect for spoken and sung syllables|journal=Attention, Perception, & Psychophysics|date=August 2010|volume=72|issue=6|pages=1450–1454|doi=10.3758/APP.72.6.1450|pmid=20675792}}</ref>




=== Current work on McGurk effect article ===

''Previous article in regular type; new material in '''bold''' (tables are new material).''


The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. It is a compelling illusion in which humans perceive mismatched audiovisual speech as a completely different syllable. It suggests that speech perception is multimodal; that is, it involves information from more than one sensory modality. The effect may be experienced when a video of one phoneme's production is dubbed with a sound recording of a different phoneme being spoken. Often the perceived phoneme is a third, intermediate phoneme. As an example, when the syllable /ba-ba/ is spoken over the lip movements of /ga-ga/, the perception is of /da-da/. The McGurk effect is sometimes called the McGurk–MacDonald effect.


It was first described in a paper by Harry McGurk and John MacDonald in 1976. The effect was discovered by accident when McGurk and his research assistant, MacDonald, asked a technician to dub a video with a phoneme different from the one spoken while conducting a study on how infants perceive language at different developmental stages. When the video was played back, both researchers heard a third phoneme rather than the one spoken or mouthed in the video. McGurk and MacDonald originally believed that this effect resulted from the common phonetic and visual properties of /b/ and /g/. '''The researchers explain that the sound 'bah' is acoustically closer to 'dah', and that 'gah' is visually closer to 'dah' than to 'bah'. These similarities create mixed signals in the brain's auditory and visual processing centers, and the brain, needing to find a common factor between the two, perceives 'dah'.'''<ref>{{cite web|title=Psycholinguistics/Perception of Continuous Speech|url=http://en.wikiversity.org/wiki/Psycholinguistics/Perception_of_Continuous_Speech|website=Wikiversity|access-date=October 5, 2011}}</ref> '''Research suggests that visual information plays an important role in how well we process auditory information, even when sufficient auditory information is presented in a clear and accurate format.'''<ref>{{cite web|last=Green|first=Kerry P.|title=Studies of the McGurk Effect: Implications for Theories of Speech Perception|publisher=University of Arizona}}</ref> '''Listeners unconsciously watch mouth and facial movements as a form of lipreading to understand a speaker's meaning. When the facial area or facial movements are obscured, whether through the McGurk effect or a lack of visual contact with the face, miscommunication is more likely.'''
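
One way to make this "common factor" idea concrete is the Bayesian account described in the Massaro and Stork article cited above, which treats the auditory and visual channels as independent sources of evidence about the spoken phoneme. As a simplified, illustrative formulation (the notation and numbers are mine, not the source's), the probability of perceiving /da/ given audio ''A'' and video ''V'' is

:<math>P(\text{/da/} \mid A, V) = \frac{P(A \mid \text{/da/})\, P(V \mid \text{/da/})}{\sum_{p} P(A \mid p)\, P(V \mid p)},</math>

where the sum runs over the candidate phonemes. With made-up likelihoods: if the audio fits /ba/, /da/, /ga/ at 0.6, 0.3, 0.1 and the lips fit them at 0.1, 0.3, 0.6, the products are 0.06, 0.09, and 0.06, so /da/ becomes the most probable percept even though neither channel favored it on its own.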


'''Further studies have shown that the McGurk effect appears with other consonants and vowels and that it can exist throughout whole sentences, but the illusion is greater with certain vowels, consonants, and words.'''<ref>{{cite web|last=Nicholson|first=Annie Rose H.|title=Using Words to Examine the McGurk Effect|url=http://www.laurenscharff.com/courseinfo/SL02/an2mcgurk.html|access-date=October 5, 2011}}</ref> '''Studies have shown that the McGurk effect is stronger when matching vowel combinations are used in the auditory and visual syllable stimuli than when nonmatching vowel combinations are used. The same was true for matching and nonmatching vowel combinations in the spoken-word stimuli; examples of both are shown in the tables below.'''



{| class="wikitable"
|-
! Sounds/Matching Vowels !! Words/Matching Vowels
|-
| ba/ga || bat/gat
|-
| be/ge || bad/gad
|-
| bi/gi || moo/goo
|-
| bo/go || bent/vest
|-
| bu/gu || might/die
|}

{| class="wikitable"
|-
! Sounds/Nonmatching Vowels !! Words/Nonmatching Vowels
|-
| ga/bi || bat/vet
|-
| ba/gi || bet/vat
|-
| bi/ga || mail/deal
|-
| gi/ba || mat/dead
|-
| be/gu || met/gal
|}

'''When subjects watched a video with the visual "My gag kok me koo grive" dubbed with the audio "My bap pop me poo brive", most subjects reported hearing "My dad taught me to drive".'''<ref>{{cite journal|last1=Massaro|first1=Dominic W.|last2=Stork|first2=David G.|title=Speech recognition and sensory integration: a 240-year-old theorem helps explain how people and machines can integrate auditory and visual information to understand speech|journal=American Scientist|date=May–June 1998|volume=86|pages=236–239|doi=10.1511/1998.25.236}}</ref> The effect is very robust; that is, knowledge about it seems to have little effect on one's perception of it. '''Subjects can be told of the effect before watching a dubbed video and will still hear the third phoneme while watching, yet if they close their eyes they hear the correct auditory stimulus. Subjects report hearing a third phoneme even when watching a dubbed video of themselves mouthing the sound.''' This '''illusion''' is different from certain optical illusions, which break down once one "sees through" them. '''Precise synchronization of mouth movements and dubbed words, clarity of image, and the gaze patterns of the subjects do not determine whether subjects hear the third phoneme. It has been shown that the auditory stimulus can lag the visual stimulus by as much as 250 msec, or lead it by as much as 60 msec, before results begin to be affected.'''<ref>{{cite journal|last1=Quinto|first1=Lena|last2=Thompson|first2=William Forde|last3=Russo|first3=Frank A.|last4=Trehub|first4=Sandra E.|title=A comparison of the McGurk effect for spoken and sung syllables|journal=Attention, Perception, & Psychophysics|date=August 2010|volume=72|issue=6|pages=1450–1454|doi=10.3758/APP.72.6.1450|pmid=20675792}}</ref>
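
As a small illustration of that asymmetric timing window, the sketch below (Python) encodes the bounds quoted above. The sign convention and the treatment of the bounds as hard cutoffs are my assumptions for illustration; the studies describe gradual degradation, not a sharp threshold.

<syntaxhighlight lang="python">
def within_mcgurk_window(audio_offset_ms: float) -> bool:
    """Return True if an audiovisual offset falls inside the rough
    tolerance window quoted in the text for the McGurk effect.

    Positive values mean the audio lags the video; negative values
    mean the audio leads it. The bounds (+250 ms lag, -60 ms lead)
    are approximate figures, not exact thresholds.
    """
    MAX_AUDIO_LAG_MS = 250.0   # audio may trail the lips by ~250 ms
    MAX_AUDIO_LEAD_MS = 60.0   # audio may precede the lips by ~60 ms
    return -MAX_AUDIO_LEAD_MS <= audio_offset_ms <= MAX_AUDIO_LAG_MS

print(within_mcgurk_window(100))   # True: audio 100 ms late
print(within_mcgurk_window(-200))  # False: audio 200 ms early
</syntaxhighlight>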

'''Research has shown that the McGurk effect is prevalent in languages other than English. It has been demonstrated that the effect is strong in Italian, German, Spanish, and Hungarian, while it is weaker in Japanese, Chinese, and Thai. These latter languages have simpler phonological cues, with fewer consonant contrasts and fewer visually distinct contrasts. It is possible that languages with more complex phonological characteristics require more attention to visual cues.'''<ref>{{cite journal|last1=Bovo|first1=R.|last2=Ciorba|first2=A.|last3=Prosser|first3=S.|last4=Martini|first4=A.|title=The McGurk phenomenon in Italian listeners|journal=Acta Otorhinolaryngologica Italica|date=August 2009|volume=29|pages=203–208}}</ref> '''Using the McGurk effect to decipher the lyrics of a song has a greater outcome than auditory stimuli alone, but its effect is not as strong as with the spoken word. This may be due to differences in the movement of the jaw and lips: in speech these movements are for the most part minimal, whereas in singing the mouth is opened wider and lip movements are exaggerated to produce higher pitches, a fuller sound, and articulated sung vowels.'''<ref>{{cite journal|last1=Quinto|first1=Lena|last2=Thompson|first2=William Forde|last3=Russo|first3=Frank A.|last4=Trehub|first4=Sandra E.|title=A comparison of the McGurk effect for spoken and sung syllables|journal=Attention, Perception, & Psychophysics|date=August 2010|volume=72|issue=6|pages=1450–1454|doi=10.3758/APP.72.6.1450|pmid=20675792}}</ref>


Study into the McGurk effect is being used to produce more accurate speech recognition programs by making use of a video camera and lip reading software. It has also been examined in relation to witness testimony; Wareham & Wright's 2005 study showed that inconsistent visual information can change the perception of spoken utterances, suggesting that the McGurk effect may have many influences in everyday perception.
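
To make the fusion behind such audiovisual recognizers concrete, here is a minimal late-fusion sketch in Python. It multiplies per-phoneme likelihoods from a hypothetical acoustic model and a hypothetical lip-reading model, following the integration formula given earlier; the phoneme labels and all scores are invented for illustration and are not the output of any real recognizer.

<syntaxhighlight lang="python">
def fuse_audiovisual(audio: dict[str, float],
                     visual: dict[str, float]) -> dict[str, float]:
    """Multiply audio and visual likelihoods per phoneme, then
    renormalize so the fused scores sum to 1."""
    fused = {p: audio[p] * visual[p] for p in audio}
    total = sum(fused.values())
    return {p: score / total for p, score in fused.items()}

# Audio favors /ba/, the lips favor /ga/; the fused result favors the
# intermediate /da/, mirroring the classic McGurk percept.
audio_scores = {"ba": 0.6, "da": 0.3, "ga": 0.1}
visual_scores = {"ba": 0.1, "da": 0.3, "ga": 0.6}
print(fuse_audiovisual(audio_scores, visual_scores))
# {'ba': 0.2857..., 'da': 0.4285..., 'ga': 0.2857...}
</syntaxhighlight>

A real system would also weight the two streams by their reliability, for example down-weighting the audio in noise, but the multiplicative combination above is the core of the integration.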
