Wikipedia talk:Manual of Style/Pronunciation/Archive 6

This is an archive of past discussions on Wikipedia:Manual of Style. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Standard for phonemic IPA transcription of international English

I propose that we develop, or better yet find an existing standard for a broad phonemic IPA transcription which would cover all dialects of English. Such a transcription scheme would be functionally equivalent to one of the dictionary-specific respelling schemes (see the details of fifteen different schemes at pronunciation respelling for English), in that only a limited number of symbols would be used. But unlike any of those, it would serve equally well for an anglophone from London, Edinburgh, Pretoria, Toronto, Dallas, or Canberra. It would also be compatible with and a step towards learning the full set of IPA used for other languages or describing speech sounds in detail, for which other dictionary respellings cannot be used.

To demonstrate the basic pronunciation of English words, the phonemic IPA could be surrounded by slashes /.../. Phonetic IPA in brackets [...] would be used for foreign words, or when discussing detailed sounds, such as comparing regional English dialects or describing a speech impediment.

There's an example of such a schema here: The sounds of English and the International Phonetic Alphabet at antimoon.com. It is footnoted with the assumptions it makes. Interestingly, it uses a superscript /ʳ/ to show where rhotic and non-rhotic accents differ, and a regular /r/ where they don't. This may be a good start for an international chart for English pronunciation. Does anyone know of any other examples, to compare? —Michael Z. 2006-08-17 01:11 Z

See the above threads for reasons why soundalike pronunciation guides are approximately 72 trillion times more useful to our readership than the above idea. Tempshill 04:01, 17 August 2006 (UTC)

Mm-hm. I'm proposing a standardization of the way we use IPA for English, which I hope will make it easier and more consistent with its use in dictionaries and throughout Wikipedia. Abandoning IPA is a separate issue. —Michael Z. 2006-08-17 05:00 Z

Let me interject here: why do IPA proponents here seem to see this as an all-or-nothing proposition? Most of us who are critical of IPA, like me, would like to see another system (like "pro-nun", but not necessarily that, just some overall decent system) alongside IPA. What we're saying (most of us, anyhow), is that IPA alone is a very bad idea. +ILike2BeAnonymous 06:00, 17 August 2006 (UTC)

I'm proposing refining the way we render English words in IPA. You are talking about another issue. —Michael Z. 2006-08-17 06:12 Z

I've done a bit of research on the topic of IPA use in dictionaries, and summarized it at Pronunciation respelling for English#International Phonetic Alphabet. The gist is that nearly all English dictionaries which use IPA use a standard method of phonemic transcription called the Gimson system.[1] It was developed for British Received Pronunciation, but is quite compatible with General American English, and in fact a number of dictionaries use five additional non-phonemic symbols to represent both British and American English with a single transcription.[2] Others use two transcriptions for British and General American, or use a combined system and supplement it with some American transcriptions. —Michael Z. 2006-08-17 05:08 Z

I like this idea. A consistent way of rendering IPA that doesn't need multiple versions for each dialect, and which doesn't rely on the personal transcription abilities of linguists, would be a boon. Ideally, it would be applicable by non-linguists by reference to a sound chart. It wouldn't be useful at all for non-English pronunciations, but for English terms it would make it more accessible to editors and readers both. — Saxifrage ✎ 07:42, 17 August 2006 (UTC)

It will never work. The various dialectal differences in English cannot be reduced to a single transcription system. (How would we possibly capture the variation in words like "banana", "pasta", and "tomato"?) Nor would a Wikipedia-specific system ever be accepted by the linguists here; for one thing, it would be a form of original research. It would also ultimately be as ad-hoc as the pro-nun-see-AY-shun systems that are generally rejected here. Let's not waste our time trying to develop this further. User:Angr 15:52, 27 August 2006 (UTC)

I'm not sure myself it will work, but I'm more optimistic than Angr. Most of the differences between the major English dialects can be smoothed over by a suitable broad transcription, but not all. The [æ]/[ɑ] distinction in words like banana can't be; neither can unsystematic differences like [təˈmætoʊ]/[təˈmeɪtoʊ]/[təˈmɑtoʊ]. The solution for these rare cases is to provide multiple pronunciations. I think the original-research is a red herring; this isn't content, it's convention, like a hundred WP conventions. The pronunciation guideline already mandates a broad transcription for general purposes; all this proposal does is define what "broad transcription" means in the context of WP. --CJGB (Chris) 19:25, 27 August 2006 (UTC)

I think Chris has it right. When the point is to describe three different pronunciations of a word, then three different, narrower transcriptions are needed, and they would be put in square brackets [...] to indicate that they are phonetic transcriptions. Likewise when comparing pronunciation in different English dialects. But just to demonstrate the general phonemes of a word which is recognizable in most or all dialects, we can use a simpler, broad phonemic transcription, which would appear in slashes /.../. This is the way IPA is intended to be used.

And this is not original research, whether that would be acceptable in this context or not. The whole point is that there already is the Gimson phonemic system which is widely used by other references. I'm suggesting that we investigate the details, and consider adopting a version of it. —Michael Z. 2006-09-11 02:27 Z

Broad IPA for English (vowels)

This thread has been silent for a while, but I have made a table comparing the various IPA renderings for English dialects in Wikipedia. I added one column under the name "broad", which could serve as a broad transcription applicable to all dialects with only minor deviations. The proposal is based on the Gimson system mentioned above and the general understanding that in non-rhotic dialects a written /r/ is pronounced as a schwa or a lengthening of the vowel. Please have a look. −Woodstone 12:16, 27 August 2006 (UTC)

word	RP	AmEng	AusEng	broad	after discussion	poss. rhotics
full vowels
bid	/ɪ/	/ɪ/	/ɪ/	/ɪ/	/ɪ/
bead	/iː/	/i/	/iː/	/iː/	/i/
bed	/ɛ/,/e/	/ɛ/	/e/	/ɛ/	/ɛ/
bad	/æ/,/a/	/æ/	/æː/	/æː/	/æ/
bath	/ɑː/	/æ/	/aː/	???	???
dance	/ɑː/	/æ/	/æ/	???	???
pasta	/æ/	/ɑ/	/aː/	???	???
carry	/æ/	/ɛ/	/æ/	???	???
pod	/ɒ/	/ɑ/	/ɔ/	/ɑ/	/ɒ/
cloth	/ɒ/	/ɔ/	/ɔ/	???	???
father	/ɑː/		/aː/	/ɑː/	/ɑ/
bud	/ʌ/	/ʌ/	/a/	/ʌ/	/ʌ/
hurry	/ʌ/	/ɜ/	/a/	???	???
bought	/ɔː/	/ɔ/	/oː/	/ɔː/	/ɔ/
toe	/əʊ/	/o/	/əʉ/	/oː/	/oʊ/
good	/ʊ/	/ʊ/	/ʊ/	/ʊ/	/ʊ/
booed	/uː/	/u/	/ʉː/	/uː/	/u/
diphthongs
bay	/eɪ/	/e/	/æɪ/	/eɪ/	/eɪ/
boy	/ɔɪ/	/ɔɪ/	/oɪ/	/ɔɪ/	/ɔɪ/
buy	/aɪ/,/ʌɪ/	/aɪ/	/ɑe/	/aɪ/	/aɪ/
cow	/aʊ/	/aʊ/	/æɔ/	/aʊ/	/aʊ/
idea					/iə/
rhotacised vowels (r silent in non-US)
bird	/ɜː/,/əː/	/ɝ/	/ɜː/	/ɜr/	/ɜr/	/ɝ/ or /ɜɹ/
beer	/ɪə/	/ɪɹ/	/ɪə/	/ir/	/ir/	/iɚ/ or /iɹ/
bear	/ɛə/,/ɛː/	/ɛɹ/	/eː/	/ɛr/	/ɛr/	/ɛɚ/ or /ɛɹ/
bar		/ɑɹ/		/ɑr/	/ɑr/	/ɑɚ/ or /ɑɹ/
bore		/ɔɹ/		/ɔr/	/ɔr/	/ɔɚ/ or /ɔɹ/
boor	/ʊə/,/ɔː/	/ʊɹ/	/ʊə/	/ʊr/	/ʊr/	/ʊɚ/ or /ʊɹ/
reduced vowels
roses	/ɪ/	/ɨ/	/ə/	/ə/	/ə/
rosa's					?
runner		/ɚ/		/ər/	/ər/	/ɚ/ or /əɹ/
bottle	/l̩/	/l̩/	/l̩/	/l̩/	/əl/
button	/n̩/	/n̩/	/n̩/	/n̩/	/ən/
rhythm	/m̩/	/m̩/	/m̩/	/m̩/	/əm/
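
As an illustrative sketch only, and not part of the proposal itself, the idea behind the "broad" column can be read as a lookup: one broad symbol per key word, expanded into each dialect's rendering. The values below are copied from the rhotacised rows of the table above (bird, beer, bear, boor); where the table lists alternatives, only the first is used. The names BROAD_TO_DIALECT and realise are hypothetical.

```python
# Hypothetical lookup built from the rhotacised rows of the table above.
# Only rows with all three dialect columns filled in are included.
BROAD_TO_DIALECT = {
    "ɜr": {"RP": "ɜː", "AmEng": "ɝ", "AusEng": "ɜː"},   # bird
    "ir": {"RP": "ɪə", "AmEng": "ɪɹ", "AusEng": "ɪə"},  # beer
    "ɛr": {"RP": "ɛə", "AmEng": "ɛɹ", "AusEng": "eː"},  # bear
    "ʊr": {"RP": "ʊə", "AmEng": "ʊɹ", "AusEng": "ʊə"},  # boor
}


def realise(broad: str, dialect: str) -> str:
    """Return the dialect rendering of a broad symbol.

    Unknown symbols or dialects fall back to the broad symbol unchanged.
    """
    return BROAD_TO_DIALECT.get(broad, {}).get(dialect, broad)


print(realise("ir", "RP"))
print(realise("ir", "AmEng"))
```

A real key would need a row for every symbol in the table, and the "???" rows (bath, dance, pasta, carry, cloth, hurry) show exactly where a single lookup of this kind stops working.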

Largely looks good, Woodstone, but I'd be in favor of maintaining the roses/Rosa's distinction that some people make by transcribing the reduced vowel in the former as barred-i (or maybe as ɪ). I also see no reason to merge the quality of "wasp" with that of "father"--for those whose dialects make the distinction, it will cause error, and for those whose dialects don't, it will end up being merged automatically, as is appropriate to that dialect. One question that the proponents of IPA have largely not addressed is whether we need several transcriptions, RP, GenAm, GenAus, etc., for every pronunciation. Something as bulky as "Tudor (IPA: RP: ['tjuːdə], GenAm: ['t(j)udɚ], GenAus: ['tʃʉːdə])" is completely unnecessary, and probably also harder to read, for those with a basic rather than advanced knowledge of IPA, than simply ['tjuːdɚ].--Atemperman 20:19, 28 August 2006 (UTC)

I would argue for removing the diacritics. Partly this makes it easier for everyone to write the transcriptions, but in larger part because diacritics indicate a small deviation. I don't think we need to indicate small deviations in a broad-broad transcription such as this as they are either systematic (at least, systematic on the level we're looking at, in terms of how the broad-broad transcription translates to the broad transcription of the individual dialects), or they indicate unnecessary detail. So, I'd leave off the length indicators, and write the 'le' in 'bottle' as /əl/ (and so forth for N and M). You've taken this approach with the rhotics, using /r/ instead of diacritics or the inverted small-r. — Saxifrage ✎ 21:38, 28 August 2006 (UTC)

Personally, I like the convention of using inverted r to represent "droppable" final [r], but I wonder how kosher it is for professional linguists. (The old OED used it.) Alternatively, you could use a superscripted r (again, is it kosher?) or place it in brackets.--CJGB (Chris) 21:59, 28 August 2006 (UTC)

In linguistics, /r/ and /ɹ/ mean different things: the first is the trilled "r" found in Spanish and other languages, the second is the sonorant "r" found in English and others. The reason I suspect Woodstone used the upright r is that, when the context is only English, there is a linguistic convention of convenience of using the normal r character to stand in for /ɹ/. This tends to be done in broad transcriptions, which makes it particularly well-suited for this scheme. It's not a bad idea to indicate droppable R somehow, but using normal r and inverted r to make the distinction would be confusing. If it was absolutely going to be done with those two characters, I'd suggest the inverse: use inverted r where the r is necessary, and the "less detailed" normal r in the "less detailed" position of droppable Rs. — Saxifrage ✎ 23:08, 28 August 2006 (UTC)

I expect you're right about using turned r for droppable r. I'm not crazy about your "reverse" proposal -- I feel pretty strongly that /ɹ/ should be /r/ in broad transcription. Not that I'm boss. One approach might be to use /ɚ/ for the second element in the r diphthongs. Then we just regard /ɚ/ and /ə/ as merging in non-rhotic dialects. (Of course, that means not merging /ɜr/ and /ər/, as I suggested elsewhere.) I'll add a column to show what I mean. --CJGB (Chris) 19:18, 29 August 2006 (UTC)

I must say that I’m sceptical that this can be pulled off. It will teach people a bad form of the IPA, because in a best-case scenario for many characters they’ll learn to associate symbol (a) with sound (b), and then when they see symbol (a) in the transcription of a foreign language, they’ll think it’s sound (b) when in fact it’s almost identical to their own sound (c). For that reason, I think that this task is better-off with a respelling pr’-NUN-see-AY-sh’n key. (My personal opinion is that an unambiguous respelling key should be used for English that is defined against the IPA for various English dialects (exclusively); and for every other language the IPA should be used (exclusively). This means we can avoid multiple pronunciation keys for words unless there’s authentically multiple pronunciations as with ‘glass’.)

In any case this is not a phonemic transcription (or any sort of underlying transcription), so /slashes/ shouldn’t be used. I’m using quotes here, but I could’ve used italics just as well.

However, assuming I am out-voted and this is accepted, I have some suggestions to make it a bit more useful. Firstly, I think the long o should be marked as ‘oʊ’ rather than ‘oː’. This makes it consistent with ‘eɪ’, is a transcription used for American English and in Australian English dictionaries and conservatively for RP, and in any case only Scottish and some other similar accents actually ever use [oː]: American English has (to my ears) usually [oʊ], sometimes [o].

Secondly, I think it should be unambiguous regardless of whether the colons are read. They should be kept for clarity, but this basically means that the vowel of pod should be ‘ɒ’, rather than ‘ɑ’. Ideally, no look-ahead should be necessary, but rhotic/non-rhotic differences make it so. I also agree that no diacritics should be used, so that əl, əm etc. is sufficient as it was with əɹ.

There are cases of the ‘non-rhotic’ vowels that don’t correspond to the ‘rhoticised’ vowels of rhotic dialects, for instance, in ‘idea’, ‘theatre’ or ‘yeah’. I therefore suggest that the form for them should be more like ‘iːə’ (Ikea), ‘ɪə’ (idea), ‘ɪəɹ’ (beer); ‘eɪəɹ’ (player), ‘ɛə’ (yeah), ‘ɛəɹ’ (bear). Rhotic speakers will therefore understand to associate ‘ɪə’ with ‘iːə’, whereas non-rhotic speakers will understand to associate ‘ɪəɹ’ with ‘ɪə’. (Note also that the AusE vowel in boor isn’t /ʊə/, unlike the vowels in tour or pure which could be described as such. I suggest the key should be changed to tour, and that this should be considered a case when there’s legitimately two pronunciations.)

The ‘bad’ vowel shouldn’t be included. Although it’s phonemically relevant to Australian English, it’s also phonemically relevant to other dialects (and just doesn’t happen to be included). Between all these dialects the distribution varies (for instance, ‘bad’ has the short vowel in parts of the US that have a similar split). So long as it’s mentioned that ‘æ’ stands for both /æ/ and /æː/ in the respective dialects, I don’t see that it matters, and a guess will usually be relevant. When the pronunciation in a given dialect is relevant, a proper use of the IPA will need to be used anyway.

—Felix the Cassowary 12:34, 29 August 2006 (UTC)

I have added a column with the above proposals worked in as far as I understand them. I dropped all diacritics, replacing the symbol for pod and using schwa before sonantic consonants. Using different symbols for droppable or firm r is tricky and would defeat the simplicity to a certain extent. I don't foresee many cases where it would be important. I do not quite understand the remark that the proposed system is not phonemic. To a very large extent it is phonemic within each dialect. How can bad be omitted? It is a clear phonemic distinction with a minimal pair bad/bed. Feel free to add your ideas in the last column. −Woodstone 18:38, 29 August 2006 (UTC)

Is /ɜr/ really a phoneme? Couldn't it be merged with /ər/?--CJGB (Chris) 18:56, 29 August 2006 (UTC)

It is a sound, but an uncommon one. In my own dialect it's similar enough to /ər/ that I think merging in the key would be appropriate, but I can't comment on that for other dialects. — Saxifrage ✎ 20:53, 29 August 2006 (UTC)

The proposal to use rhotacized schwa might work for beer, bear and even boor, but I have some hesitation for bar. And it breaks the simplicity of no diacritics. I have added the inverted r in the same column, just to see how it looks. Normal /r/ would stay for pure consonant r. I suppose /ɜ/ (always stressed) could be considered an allophone of /ə/ (never stressed) as suggested above. −Woodstone 21:22, 29 August 2006 (UTC)

Feel free to comment, but please do not destroy earlier proposals. That makes it impossible to follow the discussion. If you have comments, add them here or in a new column. −Woodstone 11:54, 30 August 2006 (UTC)

Adding new lines to the draft chart was not "destroying earlier proposals", but if you'd rather pretend the difficulties don't exist by deleting them, that's your prerogative. —The preceding signed comment was added by Angr (talk • contribs). 13:49, 30 August 2006 (UTC)

I agree almost entirely with Felix the Cassowary's comments.

The words bath, dance, cloth, pasta, carry have been raised as difficulties by Angr. For the first four, I would suggest simply giving two pronunciations, e.g. bath /bæθ/, /bɑθ/. The alternative option of coming up with a special symbol for trap-bath split words (/a/, maybe) seems to suffer from a couple of problems: that /a/ wouldn't be an actual phoneme anywhere, and if dance and calf were written /dans/ and /kaf/ (as suggested by their RP and GenAm pronunciations) then Australians and the northern English (respectively) might be confused. For carry, I don't know: the marry/merry/Mary merger is a regular process and so the GenAm pronunciation could be predicted from /kæri/, but that might be confusing. In the non-IPA Help:Pronunciation respelling key, arr ended up being used for words like carry, with a separate line in the key.

---JHJ 16:28, 30 August 2006 (UTC)

My 2 cents:

1. Don't forget the horse-hoarse merger: I suggest /or/ for FORCE and /ɔr/ for NORTH.
2. For the weak vowel merger, I think there is precedent for barred i /ɨ/ for Roses, leaving schwa /ə/ for Rosa's.
3. As Angr says, you need to accommodate Lexical sets BATH and CLOTH; I think one symbol is much better than two. I suggest TRAP /æ/, BATH /a/, PALM /ɑ/ and LOT /ɒ/, CLOTH /ɔ/, THOUGHT /ɔː/. Why bother handling some accent distinctions and then copping out and leaving others raw? I can't see how the fact that trap-bath split words are not a phoneme anywhere is relevant. The overall proposed transcription will not be 1-1 with any accent, so one more is not imposing any extra imperfection for any given speaker. The possibility of North English being confused by "dance" and "bad" having different representations is no worse than the already-allowed possibility of Americans confused by "father" and "bother".
4. You should also allow for happy tensing: if you use /i/ for happY you can use /iː/ for FLEECE.
5. Re "pasta" foreign A (and similarly "yoghurt" foreign O): maybe these are just the largest classes in the miscellaneous pronunciation differences which can't be grouped meaningfully.
6. In my rhotic accent, "idea" has no second diphthong: it's trisyllabic with second-syllable stress [aɪˈdiː.ə], like "Ikea". I think it's stress-related and can be put under happY-tensing, viz. /aɪdiə/.
7. In conclusion, I also agree with Mr Cassowary: the degree of departure from any kind of phonemic status means I think using IPA will be misleading. Why not just go with a respelling system? I've always liked Chambers'; a few augmentations and it would work. If the respelling is flexible enough it should be possible to do a little applet or something to translate its rendition into IPA for most accents: a future technical enhancement? Keep IPA for articles on accents, phonetics, and non-English languages. jnestorius^(talk) 17:56, 30 August 2006 (UTC)

The horse-hoarse merger can be safely ignored, I think; the distinction is kept only in minority pronunciations everywhere except Scotland and Ireland, and we aren't even attempting to cover those accents here. As for BATH and CLOTH words, I don't think anyone will remember to use different symbols for them. Most Americans won't know whether a given word they pronounce with [æ] is to be transcribed with /æ/ or /a/ under this scheme, nor will most Brits know whether a given word they pronounce with [ɒ] is to be transcribed with /ɒ/ or /ɔ/. —The preceding signed comment was added by Angr (talk • contribs). 18:37, 30 August 2006 (UTC)

Can I ask why you think we're not attempting to cover Scottish and Irish accents? Personally I'm reasonably relaxed about the NORTH/FORCE distinction because it's generally obvious (to those of us who make it) from the spelling, and the handful of exceptions like sport (and force itself) are mainly not the sort of words that are likely to need transcribing in Wikipedia, but I don't see why the Scots and Irish (and the rest of us who speak neither GenAm nor RP) should be left out. One of my biggest problems with the use of IPA-based systems of transcription is their tendency to be overly dialect-specific - remember that RP is only spoken by a fairly tiny proportion of the UK population. I would support a systematic respelling system whose key included notes for speakers of different accents. --JHJ 08:05, 31 August 2006 (UTC)

A few more points. When I said that the symbols should be unambiguous without a colon, I didn’t mean the colon should be removed. I was going for the system where a colon and a change of symbol are used.

When I said that ‘bad’ shouldn’t be included, I meant the vowel described as ‘[æː]’, not the vowel described as ‘[æ]’. That’s precisely the sort of confusion I expected would ensue from using a symbol for ‘[æː]’.

Merging ‘[əɹ]’ and ‘[ɜɹ]’ would be a bad idea. Not for any real reason, they just don’t seem/sound/feel anything alike. One’s a long stressed vowel, the other’s a short unstressed vowel—I think that’s enough reason to keep them apart!

I’m wholly against using a symbol for the trap/bath split. It’s just not doable for precisely the same reason that the bad vowel isn’t doable: The distribution varies. I say /dæːns/ and /baːθ/ and /kæsəl/ and /kaːnt/. It would be much clearer to just say “‘dæns’ or ‘dɑːns’” (or, better, “dans or daans”). This difference is not at all comparable to the difference between /kaː/ and /kɑɹ/: One is an automatic process that happens in all contexts that match a certain criteria; the second happens only sometimes (but is mostly well-established). It also means that when unknowing Americans write that can’t is pronounced ‘kænt’, they’re not wrong; they just haven’t included all the information.

I’m double against trying to use a single symbol for ‘pasta’ (fwiw, I say ‘pasta’ and ‘pastor’ exactly alike, as /paːstə/).

jnestorius, your point 6 is precisely what I was saying. You speak a rhotic dialect, so you merge ‘ɪə’ with ‘iːə’. I speak a non-rhotic dialect, so I merge ‘ɪə’ and ‘ɪəɹ’ (Q. What do you call a deer with no eyes? A. No eye deer! Q. What do you call a deer with neither eyes nor legs? A. Still no eye deer!) The table doesn’t go far enough in not including other cases like ‘ɛə’.

I re-iterate that I don’t think we should be using the IPA for this purpose. Using the IPA for a dictionary in a country like Australia where the pronunciation is largely homogeneous is fine; or in a language like German where there’s a standard; but for an international English-language encyclopædia where there’s no such standard it’s simply not going to work.

—Felix the Cassowary 14:28, 31 August 2006 (UTC)

Felix: re idea: my bad, I thought you were handling people who stress the first syllable. Regarding trap-bath-pasta: I think the important point is to signal which pronunciation differences are accent-driven and which are free variations.

Angr: Probably more people distinguish force-north than speak AusE. Do we cut back to the Big 2, GAm and RP? I don't like the idea that we should avoid doing something because it's hard. Perhaps we could add some kind of verification/accent parameter to pronunciation templates: {{Eng-pron|IPA = ɪgzɑmpl|dialects = "RP:Aus:NZ"}} which signals "if you speak a different accent, your pronunciation may vary." Relevant speakers could trawl these and add dialects at leisure. This would not just be useful on IPA transcriptions, could work for the pseudo-IPA discussed in this section, or for respellings.

jnestorius^(talk) 17:02, 31 August 2006 (UTC)

Fwiw, most Australians get by without feeling the need to use /a:/ in ‘example’. Regarding dialect vs free variation, thing is to some extent trap-bath is free variation. I say /kæsəl/, but I could say /kaːsəl/ if I wanted to. Sometimes I do. I think there’s also regions of variation in the border between the normally-splitting parts of England with the normally not-splitting parts? —Felix the Cassowary 14:05, 1 September 2006 (UTC)

Yes, a bit. That's why I mentioned calf: based on RP and GenAm it appears to be a classic trap/bath word, but in northern England it usually has the long vowel. So the northern English (like me) might be puzzled by a transcription that implies that it has the same vowel as staff, because for us it doesn't. There are a handful of other words where I have the short vowel but where I've heard the long vowel from speakers who don't normally have the split: examples include aunt, laugh, master. In particular I think Birmingham accents (just north of the main isogloss) tend to have the long vowel in aunt and laugh. All these variations would probably be best served by giving two pronunciations, along the lines of "dans or daans", as you suggested above.--JHJ 14:32, 1 September 2006 (UTC)

Something needs to be done about yod-dropping. My accent is pretty close to GAm, and new is /nu/, not British-style /nju/. Same with Duke, tutor /tuɾɹ/, assume /ə'sum/, Zeus, and probably others that I can't think of now. So we need to find a 'standard'. Oh yeah, is tissue supposed to be /'tɪ ʃu/? Are we considering the whine-wine merger standard? Lock-Loch merger? Is baths /bæθs/ or /baðz/? Cameron Nedland 20:48, 4 September 2006 (UTC)

What are we trying to achieve?

I've pretty much stopped participating in this discussion. I was under the impression that the purpose of a very broad IPA transcription that could be used as a respelling key would be to ignore the distinctions of each dialect. We can't possibly presume to create a transcription system that can capture the distinctions between dialects with simple representations as people have been discussing—is a handful of Wikipedia volunteers really sufficient for an achievement that has been unattainable by all linguists combined to date?

In order to be successful, we have to aim at a much more modest goal: an inaccurate, very broad respelling key that is good enough to grossly indicate pronunciation, and which is easily implementable by non-experts without reference to a tangle of notes and first-hand or linguistics-grade level of familiarity with different dialects. Essentially, we need a respelling key that is to a broad transcription as a broad transcription is to a narrow one. Am I mistaken?

(Even if I am mistaken, I am right about this: we have failed to clearly and precisely describe the problem we are attempting to solve, and so our discussion is unfocused and unproductive. In the field of problem-solving, having a vaguely-specified problem is a very well-understood path to failing to solve that problem.) — Saxifrage ✎ 21:15, 4 September 2006 (UTC)

I agree the aim is to come up with a respelling key that is good enough to grossly indicate pronunciation, and which is easily readable by non-experts. Whether it needs to be easily writable by non-experts is different. Ideally we need to allow anyone who knows how they pronounce a word to add that information, but at the same time ensure that a pronunciation specific to their dialect does not mislead a different reader with a different dialect.

The reason we are arguing over finicky dialect details is to ensure the key is, as far as possible, not misleading for any accent. To rework my earlier suggestion, suppose a Surrey editor indicates the pronunciation of "force", as {{Eng-pron|key = fɔːs|dialects = "RP"}}; then a Nebraskan amends this to {{Eng-pron|key = fɔːrs|dialects = "RP;GAm"}}; then a Glaswegian changes it to {{Eng-pron|key = fors|dialects = "RP;GAm;Scot"}}. None of these representations is incorrect, but each is successively more precise (assuming the key given in the preceding section). On the other hand, "booth" would give {{Eng-pron|key = buːð|dialects = "RP"}}{{Eng-pron|key = buːθ|dialects = "GAm"}}, since this is a lexical difference.
Because the respelling key is "very broad" or "inaccurate", some of us believe IPA is inappropriate. The difference between [djuː] and [duː] for due is a particularly tricky one; whereas [dū] for due and [doo] for do seems more versatile (to me).

jnestorius^(talk) 21:04, 10 September 2006 (UTC)
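
A minimal sketch of how the suggested dialect parameter might be consumed, assuming the {{Eng-pron}} call looks the way it is written in the examples above. No such template exists; the function names parse_eng_pron and pick are hypothetical, and the separator between dialects is taken to be either ";" or ":" because both appear in the examples.

```python
import re


def parse_eng_pron(template: str) -> dict:
    """Parse one {{Eng-pron|...}} call into its key and list of dialects."""
    inner = template.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    name, *params = inner[2:-2].split("|")
    fields = {}
    for param in params:
        field, _, value = param.partition("=")
        fields[field.strip()] = value.strip().strip('"')
    return {
        "template": name.strip(),
        "key": fields.get("key", ""),
        # Accept both separators used in the discussion: "RP;GAm" and "RP:Aus:NZ".
        "dialects": [d for d in re.split(r"[;:]", fields.get("dialects", "")) if d],
    }


def pick(entries: list, dialect: str):
    """Return the key vouched for in `dialect`, or None if nobody has confirmed it."""
    for entry in entries:
        if dialect in entry["dialects"]:
            return entry["key"]
    return None


booth = [parse_eng_pron(t) for t in (
    '{{Eng-pron|key = buːð|dialects = "RP"}}',
    '{{Eng-pron|key = buːθ|dialects = "GAm"}}',
)]
print(pick(booth, "GAm"))   # the GAm entry
print(pick(booth, "Scot"))  # None: no editor has confirmed a Scottish pronunciation
```

Returning None for an unlisted dialect corresponds to the "your pronunciation may vary" signal: the reader sees that nobody with that accent has yet checked the transcription.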

My experience with transliterations of the Thai language into English characters makes me very wary about [dū] for due and [doo] for do: non-native speakers of English have no feeling for such (and even many native speakers might not find it practical). The seemingly understandable indication keeps a reader from looking up "ū" or "oo" in a table, while he/she may well have (and thus 'learn') a completely wrong idea. Perhaps first of all determine whether the English Wikipedia is intended for native speakers only (as many Wikipedia versions in other languages are), or whether you also wish to achieve a world version. — SomeHuman 11 Sep 2006 01:49 (UTC)

Wow, I made a simple proposal and went away for a few days, and I find that many more knowledgeable editors find the idea interesting! After reading the resulting discussion, I do still think that phonemic IPA can be generalized enough to serve well for representing the "general English" pronunciation of many words. As a Canadian, I understand that I may have to have some understanding of my own accent to be able to fully use such a scheme, and the same may apply to, say, a Scot, East Indian, or South African reader. I would be content if a scheme would be able to represent a majority of words in RP and General American, as several dictionaries seem to have done—perhaps not perfectly, but well enough to go to press.

Remember, this applies only where the words are phonemically consistent in RP and GAm, and any time a word is "pronounced differently" in two dialects, we can provide two different broad transcriptions, and when there are important detailed differences, we could switch to a narrower phonetic IPA. I do appreciate all of the knowledgeable comments above which show how detached from reality my naïve "phonemically consistent" comment is.

As to whether such a scheme should use IPA symbols or not: I still think that a phonemic IPA scheme could be functionally equivalent to a respelling system—this is the intent of IPA's brackets [...] vs slashes /.../, and it applies perfectly to our case. Obviously it's not quite that simple, as shown by the example of [dū] for due implying a palatalized [dʲu:] for some readers. But couldn't a phonemic IPA system also encapsulate this? I mean, couldn't we simply say that [u:] implies that this palatalization occurs, in dialects which feature it? Essentially, that [u:] is equivalent to [ū]? —Michael Z. 2006-09-11 03:16 Z

Not unless you have a different notation for /uː/ that doesn't cause palatalisation, as in do, noon, two (as opposed to due, new, tune). Again, I think a non-IPA system would be easier here, though I think the bold oo used for the former group (as against ew for the latter group) in Help: Pronunciation respelling key isn't the best choice. Note also that /dj/ and /tj/ have merged with /dʒ/ and /tʃ/ in many varieties, leading to due being a homophone of Jew, but this is predictable and probably doesn't make it much harder to come up with a system.--JHJ 11:13, 12 September 2006 (UTC)

Perhaps this is one of the rare cases when two phonemic pronunciations must be offered. This effect is not applied the same way in all words: due = GAm [du:], RP [dʲu:]; schedule: [skedʒul], [ʃedʲul] (I think). When it's not necessary to discuss the regional differences in detail, perhaps a conditional notation would be sufficient: due: /d(ʲ)u:/.

If we were using one of the dictionary respelling systems, we'd have to offer two different pronunciations for some words anyway: do: {doo}; due: {doo, dū}; pure: {pūr}.

Of course there's no need to provide a pronunciation in the articles about the common words due and schedule, so the cases where this comes up in a phonemic pronunciation will be quite rare. —Michael Z. 2006-09-12 18:53 Z
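To make the conditional notation concrete, here is a minimal Python sketch of how a parenthesised segment such as the one in /d(^j)u:/ could be expanded into the individual pronunciations it stands for. The function name expand_optional is made up for illustration, and the sketch assumes parentheses are never nested; it is not part of any existing tool.

```python
import re
from itertools import product


def expand_optional(transcription):
    """Expand each parenthesised optional segment into with/without variants.

    Hypothetical helper: assumes parentheses are not nested.
    """
    # With one capture group, re.split alternates fixed text (even indexes)
    # and the captured optional segments (odd indexes).
    parts = re.split(r"\(([^()]*)\)", transcription)
    fixed = parts[0::2]
    optional = parts[1::2]

    variants = []
    # Try every combination of dropping (False) or keeping (True) a segment.
    for choice in product((False, True), repeat=len(optional)):
        pieces = [fixed[0]]
        for keep, segment, following in zip(choice, optional, fixed[1:]):
            pieces.append(segment if keep else "")
            pieces.append(following)
        variants.append("".join(pieces))
    return variants


print(expand_optional("/d(^j)u:/"))
```

A transcription with no parentheses comes back unchanged as a one-element list, so the same helper could be applied to every transcription regardless of whether it uses the conditional notation.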

On the other hand, keep in mind the intended scope and anticipated context. Articles on the common words do and due do not need to show readers how to pronounce those words—that's a problem for Wiktionary editors to deal with. In a linguistics context, we would use a narrow phonetic transcription; for example, an article about English dialects may describe the British and American pronunciation of due as [dʲu:] and [du:]. Remember, the intent of this proposal is a scheme to demonstrate the pronunciation of unfamiliar English words to readers of English—not to teach the English language.

Thanks for the thoughtful comments, everyone. —Michael Z. 2006-09-11 03:16 Z

Short question: couldn't we simply adopt a scheme like the one described in this review of the Collins COBUILD dictionary? Specific assumptions are listed here. —Michael Z. 2006-09-11 03:22 Z

An excellent source for our efforts above here was given by user JHJ, to be found at IPA transcription systems for English. I will shortly work it into the table. −Woodstone 12:08, 12 September 2006 (UTC)

Looking at the pages from www.antimoon.com that Michael Z. links to above, I notice the following defects: (1) they apparently ignore the BATH and CLOTH lexical sets; (2) they are laboring under the misapprehension (lamentably widespread among British linguists) that Americans pronounce LOT words with a long vowel. General American pot is NOT homophonous with RP part, however convenient it is for Wells, antimoon, and co. to pretend it is. —An gr 20:47, 20 October 2006 (UTC)

I think you're reading too much into the antimoon table. You should have left "apparent misapprehension" rather than removing "apparent". True, it says in the footnotes "In AmE, ɑː is pronounced instead of ɒ", but, based on the table itself, this is to be interpreted as "In AmE hot has the same vowel as father". It also cautions, before the table, "One symbol can mean two different phonemes in American and British English." Maybe they should have used ɑ rather than ɑː as the one symbol for the two phonemes. Maybe that caution should have been in big red letters. These seem like minor issues. Are you worried that someone who (a) is very familiar with the details of IPA and (b) knows little about English phonology will interpret the transcriptions too narrowly? That seems unlikely. As for BATH and CLOTH, the table is only enumerating phonemes, not showing their distribution. jnestorius^(talk) 00:03, 21 October 2006 (UTC)

It doesn't seem unlikely at all. There are plenty of Brits and Europeans who are familiar enough with IPA to know what [ɑː] means but don't know enough about American English to know that when the footnote says "In AmE, ɑː is pronounced instead of ɒ" it isn't true of a very large number of words for a very large number of American English speakers. —An gr 10:14, 21 October 2006 (UTC)

The Lazar's Proposal

Here's my proposal for a diaphonemic IPA scheme. Basically, it would show every distinction found in every major dialect. In this system, there would only be a handful of words (such as "idea") that would require multiple transcriptions. --The Lazar 20:57, 23 September 2006 (UTC)

Word	Proposed System
Full Vowels
bid	/ɪ/
Sirius	/ɪ/
bead	/iː/
bed	/ɛ/
merry	/ɛ/
bad	/æ/
marry	/æ/
bath	/æː/
dance	/æˑ/
pasta	/ɑˑ/
father	/ɑː/
pod	/ɒ/
cloth	/ɒː/
bought	/ɔː/
bud	/ʌ/
hurry	/ʌ/
toe	/oʊ/
good	/ʊ/
booed	/uː/
Diphthongs
bay	/eɪ/
boy	/ɔɪ/
buy	/aɪ/
cow	/aʊ/
few	/ju:/
due	/ju:/
Rhotacized Vowels
bird	/ɜːr/
beer	/ɪər/
bear	/ɛər/
bar	/ɑːr/
border	/ɔːr/
boarder	/ɔːr/
boor	/ʊər/
Reduced Vowels
roses	/ɪ/
rosa's	/ə/
runner	/ər/
bottle	/l̩/
button	/n̩/
rhythm	/m̩/

I've taken the liberty of changing "bored" in the table to "boarder" and adding "border", to highlight that your scheme ignores the NORTH-FORCE split. Which I disagree with, by the way. jnestorius^(talk) 21:37, 23 September 2006 (UTC)

Yep, my system does in fact ignore the NORTH-FORCE distinction (which isn't technically a split, but that's splitting hairs), and (although the table above only focuses on vowel phonemes) it would also ignore the WHICH-WITCH distinction. (And I must be derelict as a New Englander, seeing as how I actually know some people that maintain the NORTH-FORCE distinction!) I think we should only show features that are found in a) General American, b) RP, or c) mainstream Australian English. I think that nowadays, the NORTH-FORCE distinction isn't widespread enough to warrant inclusion in the pronunciation guide. There's a bunch of regional phonemic distinctions found in NYC, Ireland, Scotland, and Australia that arguably deserve as much recognition as NORTH-FORCE. --The Lazar 02:05, 24 September 2006 (UTC)

Well, NORTH-FORCE is one of the Scottish and Irish features. You also haven't specified a transcription for borrow; I'm not going to second-guess you there. Looking at your 5 different symbols for bad-bath-dance-pasta-father, I'm surprised you bridle at FORCE. jnestorius^(talk) 02:32, 24 September 2006 (UTC)

Here then is my proposal. Is anyone fundamentally opposed to attempting a one-transcription-fits-(nearly)-all-dialects standard? If not, let's quickly agree a first-draft policy, implement it soon, and fine-tune it later as needs be. We can all sit around arguing the fine details and nothing will get done. Let's start with a fairly minimal set of phonemes, so that where necessary we provide multiple transcriptions of a word or phrase. If, after review, it seems like adding an extra symbol will reduce this compromise, we can do so. As MZ points out, some distinctions for common words may not occur in the words we actually need to transcribe, so we will save time by not catering for such distinctions to begin with. The only thing that gives me pause is the question of revising existing transcriptions if our standard does change. Can anyone think of something clever with Templates / Whatlinkshere / Talkpage notes / Bots that might make this less painful? jnestorius^(talk) 02:32, 24 September 2006 (UTC)

I suppose you could make the case that a) NORTH-FORCE should be included because it's the most widespread of all the "regional" phonemic distinctions, and b) my system makes way too much use of length marks and half-length marks (ie, maybe the "pasta" foreign words and the differing extents of the trap-bath split in RP and AusEng are just cases where two transcriptions should be given). As for "borrow", it would use /ɒ/ (cf "merry", "marry", "hurry"); you're right that it should have been included for the sake of comprehensiveness. Below is a compromise between the viewpoints we've each expressed on this talk page. --The Lazar 02:43, 24 September 2006 (UTC)

Word	Compromise System
Full Vowels
bid	/ɪ/
Sirius	/ɪ/
bead	/iː/
bed	/ɛ/
merry	/ɛ/
bad	/æ/
marry	/æ/
bath	/æ:/
dance	/æ:/, /æ/
pasta	/ɑ:/, /æ/
father	/ɑː/
pod	/ɒ/
borrow	/ɒ/
cloth	/ɒː/
bought	/ɔː/
bud	/ʌ/
hurry	/ʌ/
toe	/oʊ/
good	/ʊ/
booed	/uː/
Diphthongs
bay	/eɪ/
boy	/ɔɪ/
buy	/aɪ/
cow	/aʊ/
few	/ju:/
due	/ju:/
Rhotacized Vowels
bird	/ɜːr/
beer	/ɪər/
bear	/ɛər/
bar	/ɑːr/
border	/ɔːr/
boarder	/oːr/
boor	/ʊər/
Reduced Vowels
roses	/ɪ/
rosa's	/ə/
runner	/ər/
bottle	/l̩/
button	/n̩/
rhythm	/m̩/

"Is anyone fundamentally opposed to attempting a one-transcription-fits-(nearly)-all-dialects standard?" Yes. I am. It's a futile attempt and a waste of time. Angr 14:51, 27 September 2006 (UTC)

So, what would you prefer to do? At the moment, pronunciation information in Wikipedia is a mess. Some articles use ad hoc "pro-NUN" spellings, which are fairly simple but aren't actually very good at representing certain not-so-subtle points of English phonology like the distinction most dialects have between the vowels of FOOT and GOOSE. Some (mainly astronomical, I think) use Kwami's Help:Pronunciation respelling key, which I think is actually quite a good idea (although the details of the scheme could probably be improved) but it's clear that a lot of people don't like it. Some use a single IPA phonemic transcription based on an unspecified dialect (GenAm more often than not, I suspect), which is nice if the reader speaks that dialect but maybe not if they don't. A few give different transcriptions based on different dialects, which is fine in some contexts but is too space-hungry in others. Some use IPA phonetic transcriptions, which suffer from the same problems as phonemic ones but IMO more so (because dialects differ more in their phonetics than in their phonology).--JHJ 16:26, 27 September 2006 (UTC)

I would prefer to list the pronunciations separately in the few cases where (1) a pronunciation guide is deemed helpful (it really isn't necessary for everyday English words unless the pronunciation is really divergent from what the spelling would lead you to expect, or in cases like Yoghurt and Tomato where it is precisely the difference in pronunciation that is interesting) and (2) the topic is not limited to one English-speaking country (placenames in Britain can be rendered in RP alone, those in the U.S. in GenAm alone, etc., following the principle already in place for spelling conventions). I think in cases where the only significant difference in pronunciation is the presence vs. absence of coda /r/, putting "(r)" in parentheses can cover both pronunciations. That was what I did at Berlin when I listed the pronunciation as [bə(r)ˈlɪn]; but even there it could be considered preferable to write "GenAm [bɚˈlɪn], RP [bəˈlɪn]". Angr 17:42, 27 September 2006 (UTC)

I agree that most articles do not need pronunciation guides. I (quite strongly) disagree that placenames in Scotland, Wales and northern England should have only RP pronunciations given. Writing Glasgow [ˈglɑːzgəʊ], for example, would be an abomination. (In fact, I'm almost tempted to try it and watch the reaction :-).) As for the [bə(r)ˈlɪn] solution, it seems reasonable in that example, but it'll run into problems with the GOAT vowel, for which I can think of about five different phonetic transcriptions in reasonably standard accents, for example.--JHJ 18:14, 27 September 2006 (UTC)

On second thoughts, I'm not sure that your RP version of Berlin is right. For my own speech I feel the sound in the first syllable is the NURSE vowel (a frontish roundedish vowel in my speech, corresponding to RP [ɜː]), not a schwa; it might be worth looking up something like the Longman Pronunciation Dictionary to check what it says.--JHJ 18:55, 27 September 2006 (UTC)

LPD agrees with you that Berlin has the NURSE vowel, not the lettER vowel, in the first syllable. So it makes an interesting minimal pair with one pronunciation of Boleyn. Angr 19:31, 27 September 2006 (UTC)

As for the "compromise system", it's mostly OK. I'd replace the syllabic consonants with schwa+ordinary consonant (which I find more intuitive, and the symbols involved are probably easier to deal with), and I'd change the example for reduced /ɪ/ from roses to something which is more reliably pronounced with /ɪ/ in dialects that make the distinction, Lenin for example.--JHJ 16:26, 27 September 2006 (UTC)

In the case of "syllabic m" using schwa + ordinary consonant is also more accurate. Despite the spelling, rhythm and prism are pronounced [ˈrɪðəm] and [ˈprɪzəm] in ordinary speech; syllabic [m] is really found only in fast speech and then only after bilabial consonants (e.g. [ˈoʊpm̩] for open). Angr 17:42, 27 September 2006 (UTC)

Since the exact transcription might be debatable, in cases like this I would prefer the simpler transcription, or the one requiring a simpler set of symbols, for the sake of the reader and the principle. The schwa is definitely required, so it may be better not to require the reader to learn the dotted syllabic consonants /l̩, n̩, m̩/. I don't object strongly to them though, because they still work if the reader ignores the dot, and of course, it is easily associated with the dot syllable symbol. (The same applies to using /ər/ instead of /əɹ, ɚ, ɝ/.) —Michael Z. 2006-09-28 20:46 Z

So perhaps a set of guidelines as suggested by Angr above would be more useful than a detailed "generic" IPA scheme. Then we can fall back on some more detailed IPA guidelines, like "keep it simple", and "prefer the schwa to a syllabic consonant". There's lots of advice to be gleaned from all of the comments above. Let's try to distill this discussion into some helpful advice for writing IPA for English. —Michael Z. 2006-10-21 03:55 Z

A local-accent transcription alternative proposal

Regarding local pronunciations, Louisville, Kentucky#Pronunciation is a featured article with quite a nice section. (I'm not sure if ['lǝvǝl] would be better as ['lʌvǝl]: is [ǝ] meant to be the local realisation of /ʌ/, and if so, is it accurate?)

I'm not wedded to a one-size-fits-all solution, but I certainly don't want to restrict transcriptions to RP/GAm for articles particularly associated with regions with other accents. If we model our pronunciation convention on the spelling convention, the appropriate phrasing of the spelling convention is not "use British or American spelling as appropriate"; it's "use local spelling". Currently, IPA transcriptions of English words link to Help:IPA for English. (well, some link to IPA but probably shouldn't.) I think they would better link to a page in the Wikipedia: namespace. We could have a series of such pages, one per accent, that would simply list all IPA symbols for its phonemes, plus a few sample words for each. I say put these in Wikipedia: rather than main namespace as they would be minimal pages designed solely as an aid for readers to work out the pronunciation, like the key in any dictionary. There would be links back to the Help:IPA for English article and to the main-namespace article which describes the relevant phonology in detail. To start with, we have Wikipedia:IPA key for Received Pronunciation, Wikipedia:IPA key for General American and Wikipedia:IPA key for Australian English. If someone then feels none of the above is suited for whatever word, they can add another page for the relevant accent and link to that, provided there is verifiability for the listed phoneme inventory and symbols. This places the onus on the local patriot to put in the extra effort they feel their locale deserves. As for non-region-specific words, I'm happy to point to RP and GAm transcriptions, since with that as a start it should be easy to work out the sound in one's local accent based on the spelling (RP /ˈfænʃɔː/ might be Scottish /ˈfænʃor/, but not if it's spelled Featherstonehaugh ). As to whether we need always specify both pronunciations: I guess not; but if someone sees only one and decides to add the other that shouldn't be disallowed. 
jnestorius^(talk) 20:10, 27 September 2006 (UTC)

Well, I'd prefer a diaphonemic system, which would almost certainly succeed in covering a wider range of dialects (though obviously it's not going to cover all dialects) with, in many cases, fewer transcriptions, and would probably be easier to read.--JHJ 17:40, 28 September 2006 (UTC)

An example

As a way of comparing how different methods cope with a particular example, how would they deal with the table of the main Uranian moons on the Uranus page? At the moment, it looks like this:

The main Uranian moons (compared to Earth's Moon)
Name (Pronunciation key)		Diameter (km)	Mass (kg)	Orbital radius (km)	Orbital period (d)
Miranda	mə-ran'-də məˈrændə	470 (14%)	7.0×10¹⁹ (0.1%)	129,000 (35%)	1.4 (5%)
Ariel	arr'-ee-əl ˈariəl	1160 (33%)	14×10²⁰ (1.8%)	191,000 (50%)	2.5 (10%)
Umbriel	um'-bree-əl ˈʌmbriəl	1170 (34%)	12×10²⁰ (1.6%)	266,000 (70%)	4.1 (15%)
Titania	tə-taan'-yə təˈtɑnjə	1580 (45%)	35×10²⁰ (4.8%)	436,000 (115%)	8.7 (30%)
Oberon	oe'-bər-on ˈɔʊbərɑn	1520 (44%)	30×10²⁰ (4.1%)	584,000 (150%)	13.5 (50%)

Under a diaphonemic scheme (the one currently labelled "Compromise System"), it might end up more like this:

The main Uranian moons (compared to Earth's Moon)
Name (Pronunciation key)		Diameter (km)	Mass (kg)	Orbital radius (km)	Orbital period (d)
Miranda	/mɪˈrændə/	470 (14%)	7.0×10¹⁹ (0.1%)	129,000 (35%)	1.4 (5%)
Ariel	/ˈæriəl/	1160 (33%)	14×10²⁰ (1.8%)	191,000 (50%)	2.5 (10%)
Umbriel	/ˈʌmbriəl/	1170 (34%)	12×10²⁰ (1.6%)	266,000 (70%)	4.1 (15%)
Titania	/tɪˈtɑːnjə/	1580 (45%)	35×10²⁰ (4.8%)	436,000 (115%)	8.7 (30%)
Oberon	/ˈoʊbərɒn/	1520 (44%)	30×10²⁰ (4.1%)	584,000 (150%)	13.5 (50%)

with a link to the key (which should be in the Wikipedia: namespace) where the link to Help:Pronunciation respelling key is now. Note how it needs only one transcription for Oberon, which would probably need several in some other methods.

---JHJ 17:40, 28 September 2006 (UTC)
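As a rough illustration of how a linked key could work for such transcriptions, the following Python sketch looks up each vowel symbol of a transcription in a small table taken from a subset of the "Compromise System" rows above. The names KEY and explain are hypothetical, and the greedy longest-match rule is an assumption made for the sketch: it ignores syllable boundaries, so an /ər/ sequence is always read as the runner vowel even where the /r/ begins the next syllable.

```python
# Subset of the "Compromise System" table: symbol -> example word.
# Where one symbol has several example words, only the first is listed.
KEY = {
    "ɪ": "bid", "iː": "bead", "ɛ": "bed", "æ": "bad", "ɑː": "father",
    "ɒ": "pod", "ɔː": "bought", "ʌ": "bud", "oʊ": "toe", "ʊ": "good",
    "uː": "booed", "eɪ": "bay", "ɔɪ": "boy", "aɪ": "buy", "aʊ": "cow",
    "ɜːr": "bird", "ɪər": "beer", "ɛər": "bear", "ɑːr": "bar",
    "ɔːr": "border", "ʊər": "boor", "ə": "rosa's", "ər": "runner",
}


def explain(transcription):
    """Return (symbol, example word) pairs for the key symbols found.

    Hypothetical helper: matches the longest key symbol at each position
    and silently skips consonants, stress marks and unknown symbols.
    """
    text = transcription.strip("/")
    symbols = sorted(KEY, key=len, reverse=True)  # longest symbols first
    found = []
    i = 0
    while i < len(text):
        for symbol in symbols:
            if text.startswith(symbol, i):
                found.append((symbol, KEY[symbol]))
                i += len(symbol)
                break
        else:
            i += 1  # not a key symbol; move on one character
    return found


for symbol, word in explain("/tɪˈtɑːnjə/"):
    print(symbol, "as in", word)
```

For Titania this lists the vowels of bid, father and rosa's, which is the kind of lookup a reader would otherwise do by hand against the key page.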

That really helps put it into perspective. It certainly looks suitable to get the pronunciation across to me, a Canadian. Perhaps we can find more real examples to get a broader picture. I'll start with one, below. —Michael Z. 2006-09-28 18:52 Z

Recce /ˈre.ki/ (from reconnaissance)

If you want some to play with, try List of names in English with non-intuitive pronunciations jnestorius^(talk) 19:42, 30 September 2006 (UTC)

Lithuanian

I've put a notice at Wikipedia talk:Simplified phonetic transcription for Lithuanian drawing its editors' attention to this page, and vice versa. Looks like it's only used on a few articles. jnestorius^(talk) 19:42, 30 September 2006 (UTC)

Doesn't seem to make a whole lot of sense

I'm not a linguist, so I'm not an expert on systems for representing pronunciation. However, it would seem strange to me to develop your own system and keep the non-intuitive aspects of the IPA. As long as you're going to diverge from IPA, you might as well go all-out and adopt a system like that used in the Random House Dictionary of the English Language, reserving the IPA only for those sounds not commonly in the English language. -- Mwalcoff 00:44, 24 October 2006 (UTC)

Choosing an IPA scheme for English is not developing our own system. IPA is very flexible, and allows the transcriber to create a narrow or broad, phonetic or phonemic transcription. The work we're doing here is to help editors keep IPA for English consistent, and to better represent various English accents or dialects with fewer transcriptions.

I don't know what you mean by non-intuitive aspects. The IPA draws on various characters used in English and other Latin alphabets, and is more intuitive than the similar-looking but diverse plethora of respelling systems used in other dictionaries. —Michael Z. 2006-10-24 01:15 Z

See, and that's why I wrote what I did above in "What are we trying to achieve?" The original point was actually a simplified system that was easy to use and understand, reserving proper IPA for cases where the details mattered. Attempting a new interpretation of IPA that allows for a consistent transcription across multiple accents is much, much more ambitious, and is something that linguists have failed at for years. Trying to do so here as amateurs is futile. — Saxifrage ✎ 01:43, 24 October 2006 (UTC)

(edit conflict)

Fair enough on your first point. On your second, I don't want to get into the pros and cons of IPA again, but there's no way the IPA is more intuitive than the systems used in American dictionaries as far as native speakers of English are concerned. Which do you think is more intuitive? "ʃɪˈkɑːgoʊ" or "shi-kä'-gō"? -- Mwalcoff 01:48, 24 October 2006 (UTC)

Well, I agree that choosing an existing dictionary scheme is probably a safer way to go than developing an all-new system. Doing a detailed survey of existing IPA schemes is the right start, anyway.

Honestly, I prefer the IPA. I grew up with the North American-style pronunciation systems in my school dictionaries, and I still don't know what ä means—or is that a, aa, ah, aw, ȧ, or o?

Yes, an English reader can easily guess the consonants in shi-kä'-gō, but there is nothing particularly intuitive about the vowels. Does "shi" sound like shy, she, or ship? Does kä represent caw or kay? Does gō sound like gaw or go? The diacritics don't give many hints. I happen to know that macrons usually top "long" vowels and breves "short" ones. I know what the diaeresis or umlaut represents in French and German, but kä bears no relationship to them. And is shi-kä'-gō more intuitive than SHi-kä'-gō, shi-kä'-gO, shih-kah'-goh, shi-kaa'-gō, shĭ-kä'-gō, s͡hi-kä'-gō, or shi-kaw'-gō? It appears that other American dictionary editors thought that aa, ah, aw, ȧ, or even o were just as good. At least I can learn one IPA, and recognize it when I see it. But if we transcribe words with one of the other systems, then I need the key at hand every single time I read a transcription.

I am a bit familiar with some other European languages (although the only one I actually speak is written with the Cyrillic alphabet), so to me the "pure" sound of the basic IPA vowels is intuitive: [a, e, i, o, u]. The sound of [æ] is pretty easy to guess, [ɑ] and [ɒ] are rounded a-sounds, the one that looks more like a normal letter a sounds closer to [a]. At least the combination [oʊ] looks like a diphthong of [o] and [u], unlike ō. [θ] and [ð] correspond to the sounds in the Greek and Old English letters. Very distinctive symbols like the long [ʃ] are easy to learn and remember. Most of IPA has a similar kind of visual logic or inherent mnemonic aid, to me.

Finally, I find that on the computer screen, it can be hard to distinguish diacritics accurately, especially háčeks from breves. I laboured to transcribe the OED respelling from a low-res scan: it made my brain hurt, and I'm not positive I got it all right. In contrast, IPA uses distinctive letter shapes to convey the most important information, and only uses diacritics for more obscure technical stuff.

No matter which system is used, a reader will require a key or some learning to understand it well enough to use it even for ordinary English words. But at least IPA is unified and consistent, is easy to recognize, is found in many English dictionaries (maybe I'm also prejudiced because the best Canadian dictionary, the Canadian Oxford uses it), is usable for other languages, is understood by many or most learners of English, and has some internal visual logic. When I look at all the other systems in the respelling article table, my mind just boggles. —Michael Z. 2006-10-24 02:51 Z

It is true that IPA can have a lot of variation in the way it's transcribed. The other aim of the current effort is to help keep it sensible and consistent for our purposes. Cheers. —Michael Z. 2006-10-24 03:27 Z

As I said before, I'm not going to get into the pros and cons of the IPA. But I'm sure that you'll get far more people who will be able to guess the pronunciation of "shi-kä'-gō" than "ʃɪˈkɑːgoʊ". That's not just because the IPA uses several oddball symbols, but because many people, in the US at least, remember the symbols for short and long vowels from phonics education. The "long" vowels have macrons; the "short" ones don't. That's 10 vowel sounds taken care of right there. Almost all American dictionaries use those symbols. (Some use breve accents for short vowels.) Because the letter "a" has three basic sounds, the "ä" symbol is what's often used to represent the "father" sound. There are differences among the systems used in the different dictionaries because they are copyrighted, and they can't steal from each other. But all the major mass-market dictionaries (New World, Merriam, Random House, etc.) use a variation of what millions of kids learn in school.

Your view of what's intuitive is colored by your knowledge of other languages. In English, the letter "i" makes the /ɪ/ or /aɪ/ sound, so using it to mean the sound in "meet" is completely non-intuitive. Most native English speakers would tend to think of the "o" in "Chicago" as a single sound, so it would make no sense to them to throw a second symbol in after the /o/. You may think /æ/ is easy to guess, but I had no idea what it meant when I first saw it. And how many English speakers know Greek or Old English?

The IPA does have its advantages, especially in scientific use. But you're not going to convince me that /tʃʌm/ is more intuitive than /chum/. -- Mwalcoff 03:32, 24 October 2006 (UTC)

Some good points. Which one of the dictionary systems would you choose? —Michael Z. 2006-10-24 07:47 Z

I'd suggest an arbitrary respelling key, like (though not copied from) the ones used by dictionaries. Currently articles tend to have IPA and proNUN guides in them, the latter being inconsistent and amateurish. A guide that uses a straight "this symbol = this sound" key would be useful to readers. It can't replace the IPA due to the lack of detail and precision, but if we're using two systems (for different ends) anyway, why not do the reader-friendly one well? — Saxifrage ✎ 19:36, 24 October 2006 (UTC)

I agree. We do have the issue that the systems used in American dictionaries are all copyrighted. But in reality, there's only a few common English sounds we'd have to worry about, since the symbols for "short" and "long" vowels, as well as "ä," are nearly universal in the US. We'd need to choose symbols for the vowels in took and paw and decide what to do about r-colored vowels. As far as consonants go, we could just use the letter or letters that represent each sound in English and use IPA for other sounds. We'd have to decide whether to allow for two letters to represent one sound or to come up with a way to make them one symbol, as in "s͡h." Finally, we'd have to find a way to differentiate the two "th" sounds. -- Mwalcoff 22:22, 24 October 2006 (UTC)

Copyrighted!? I hadn't thought of that. Isn't there at least a solid and simple scheme from an old Webster's or something else out of copyright we can use? I know that no original research is meant for article content, but the same principle applies to meta-information: some professional linguist probably did a better job than we could, and his system will have stood the test of time (at least if it's one that continued to be used for any length of time). Also, I thought part of the point was to use a system more familiar than IPA, which is out the window if we just mix-'n'-match our own. —Michael Z. 2006-10-26 00:11 Z

I don't know. Wiktionary uses the American Heritage system (as well as IPA and SAMPA), and as far as I know, they haven't been sued yet. -- Mwalcoff 00:53, 27 October 2006 (UTC)

OT, but: Michael Z., IPA cardinal [ɑ] is a low, back unrounded sound and about the only time I’ve seen /ɑ/ to represent a rounded vowel aside from in confused cross-dialectal English transcriptions is in Swedish and Hungarian where the rounding is not contrastive for either; nor compulsory in Swedish; and there’s certain phonological advantages to ignoring the rounding in Hungarian. In fact, the only difference between IPA cardinal [ɑ] and [ɒ] is that the latter has open rounding. —Felix the Cassowary 06:33, 24 October 2006 (UTC)

Hm, I guess I didn't mean rounded in the technical sense. The two round-shaped a's sound similar to me: I guess the back vowels feel somewhat between [a] and [o], and the one that looks more like a conventional letter a is the one that is less rounded, or closer to the [a]. Not sure if it makes sense the way I explain it, but it works to remind me of the sound. —Michael Z. 2006-10-24 07:41 Z

How to enter IPA

There is no instruction on how to enter IPA on this page. Don't you think that there should be? Many, it seems, don't know how. One way is to enter the Unicode number like this "ɑ" (you'll have to go to edit to be able to see what I mean; note also the IPA template). Another way is to copy & paste the symbol (still remember the IPA template unless it's a regular roman letter). Here's one place to find these. Jimp 02:58, 25 October 2006 (UTC)
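For editors who want to see what "entering the Unicode number" amounts to, here is a small Python sketch that turns each non-ASCII character into a decimal numeric character reference and wraps the result in the {{IPA|}} template mentioned on this page. The function names are made up for illustration, and the sketch assumes the decimal form of the reference is the one wanted.

```python
import html


def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal reference such as &#593;."""
    return "".join(ch if ord(ch) < 128 else "&#" + str(ord(ch)) + ";" for ch in text)


def wrap_ipa(text):
    """Wrap a transcription in the IPA template (hypothetical convenience helper)."""
    return "{{IPA|" + text + "}}"


alpha = "\u0251"  # LATIN SMALL LETTER ALPHA, the "father" vowel symbol
print(to_numeric_entities(alpha))            # &#593;
print(wrap_ipa(to_numeric_entities(alpha)))  # {{IPA|&#593;}}

# Decoding the reference gives back the original character.
print(html.unescape("&#593;") == alpha)      # True
```

Plain roman letters pass through untouched, so only the special symbols end up as numbers.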

I've added a short section to get you started: Wikipedia:Manual of Style (pronunciation)#Entering IPA characters. I use the OS X character palette all the time. —Michael Z. 2006-10-25 04:27 Z

Michael, thank you for writing the section. I'm sure editors will find it useful. Jimp 03:33, 26 October 2006 (UTC)

I just added an observation that you can add IPA characters using Wikipedia’s character map. In fact, I’m a little confused as to why this whole discussion came up, seeing as AFAIK almost all IPA characters can be added using it. Anyway, Saxifrage just commented it out, saying ‘most IPA characters are not in the character map below the edit box right now’. But as far as I can see, most are: They were added a while ago. For reference, I can see:

IPA: t̪ d̪ ʈ ɖ ɟ ɡ ɢ ʡ ʔ ɸ ʃ ʒ ɕ ʑ ʂ ʐ ʝ ɣ ʁ ʕ ʜ ʢ ɦ ɱ ɳ ɲ ŋ ɴ ʋ ɹ ɻ ɰ ʙ ʀ ɾ ɽ ɫ ɬ ɮ ɺ ɭ ʎ ʟ ɥ ʍ ɧ ɓ ɗ ʄ ɠ ʛ ʘ ǀ ǃ ǂ ǁ ɨ ʉ ɯ ɪ ʏ ʊ ɘ ɵ ɤ ə ɚ ɛ ɜ ɝ ɞ ʌ ɔ ɐ ɶ ɑ ɒ ʰ ʷ ʲ ˠ ˤ ⁿ ˡ ˈ ˌ ː ˑ ̪ • {{IPA|}}

As well as, from the various other sections,

Characters: ... ħ ð þ œ æ ø (skipping caps)

Greek: ... β ... θ ... χ

That seems to be far more than most people would ever use; the only ones I can immediately not see are the ligatures and some ext-IPA symbols, but the ligatures are deprecated now, I think, and ext-IPA symbols aren’t actually IPA symbols.

Are there any glaring oversights? Is Wikipedia behaving different from me versus everyone else?—I’ll note that Wiktionary had more when Wikipedia only had less, but I’m sure I did nothing to get these extra characters showing.

—Felix the Cassowary 12:55, 27 October 2006 (UTC)

My pardon. Either my brain malfunctioned or they're changing the character map from day-to-day again. I recall not seeing an IPA section when I looked yesterday. Consider my objection retracted! — Saxifrage ✎ 17:00, 27 October 2006 (UTC)

I always forget that editors see that. I used my style sheet to hide the character map many months (years?) ago, because it didn't work in my browser, and added a lot of visual clutter to the page. —Michael Z. 2006-10-27 17:29 Z

The topic came up because one day I was wandering around the 'pedia and stumbled on someone's comment that they don't know how to enter IPA. I'd recently suggested to some other editor that he use IPA (instead of SAMPA). I thought "Hey, maybe that bloke doesn't know how to enter IPA either." I thought it might be useful to have somewhere to point such editors. Yes, there is the character map, but some editors, especially new ones, might not even know what it's all about; also, not all computers display all the characters ... I have the proof before me I'm afraid. -- Jimp 16:16, 8 November 2006 (UTC)

Audio pronunciation proposal

We should replace the IPA pronunciation guides in all Wikipedia articles with actual audio files of humans speaking the words, like Wiktionary has for many of its entries. If there are (a guess) 100,000 articles that would benefit from a pronunciation, we would need just 1000 editors to pronounce 100 words each. We could have different icons to tag a person's accent: "This is an American pronunciation of the word", "This is a British pronunciation of the word", etc.

This would be useful to people, unlike IPA pronunciation guides, which are of use only to linguists, because, statistically, nobody uses IPA and nobody understands it. Tempshill 20:40, 3 November 2006 (UTC)

I think the only statistics that back up your bolded, italicised statement are either very old or were a survey of two cats and three dogs or similarly incomplete samples. Many people understand the IPA, it’s just that they’re probably a small minority of all potential Wikipedia readers.

We should not replace the IPA pronunciations with audio files; sometimes, someone who can’t play the sound for whatever reasons but is fluent with the IPA might come across an article, and the pronunciation key would help them. This has happened to me many times.

But we certainly should add them. I’ve seen a few articles with audio files (I’ve rarely listened to them, so they might be wildly inaccurate), and I see no reason why more shouldn’t have them. I’m sure people who have better recording setups than me will continue to contribute.

—Felix the Cassowary 00:31, 4 November 2006 (UTC)

I would start a project to try and get this going but lack the time at the moment. BTW, you're wildly overestimating the number of people who understand IPA, which is in fact useless to non-linguists. Tempshill 17:33, 4 November 2006 (UTC)

If you can provide a more accessible textual rendering of speech, you're welcome to suggest that. In the meantime however, IPA is the most accurate and precise way of representing text that is in common use by those who need to accurately and precisely represent speech. That said, a drive to get audio files added to all pages that currently have pronunciation guides is a great idea. — Saxifrage ✎ 17:59, 4 November 2006 (UTC)

Strong words. Actors and (opera)-singers also depend on IPA for singing/talking pieces not in their own language. If you know IPA you can pronounce just about anything; what you get wrong is intonation and stress-patterns. I don't know of any good scheme to show intonation. Kaleissin 13:10, 29 January 2007 (UTC)

While recordings might be helpful, it is ridiculous to say that the IPA guides should be removed because no one understands them. If you don't understand them, then look up what they mean in the article on IPA. If you are using Wikipedia, then you are obviously familiar with the practice of looking up unknown information, so why not do so here? You would have to do this anyway no matter what system of phonetic respelling was used, since any of them can only be understood with reference to a key. And if some other, idiosyncratic respelling system were used, there would actually be FEWER people who could say they understood it properly -- EVERYONE would have to look up the key, since no one could be sure what it meant. At least now, with the standardized IPA transcriptions, those people who happened to have been exposed to IPA before can dispense with that inconvenience -- the people who haven't had IPA before must, it is true, resort to the pronunciation key, but they would have to do so no matter what system was used.

The tendency of your criticism is therefore against not only IPA, but against any respelling system. Should all the phonetic respellings really be replaced with audio recordings? Such a move is not only unnecessary and impractical, it is insulting, since anyone who has ever opened a dictionary is familiar with the principle of phonetic respelling -- it would be, I think, a form of insult to imply that our readers are either too unintelligent, or too lazy, to apply the same principle here. I support the use of recordings as supplements to the respellings, but to remove them entirely would make the reader dependent on something cumbersome and likely to malfunction, in place of a supposedly complex notation which, in reality, anyone can easily understand who wishes to take the time to do so, and which the rest may safely ignore. Gheuf 07:34, 6 November 2006 (UTC)

We can all believe whatever we want to about whether the number of people who can come to understand IPA is significant or insignificant; the truth is, there are many situations where audio is superior, and still some situations (such as print republication, always a possibility for Wikipedia; technical or contextual problems with audio, such as in a small library; second-language or hearing-impaired listeners who cannot catch some phonetic distinctions by ear; etc.) where audio is impractical and IPA is still necessary. Let's agree that both are desirable; no agreement is necessary on which is "better". This thread is going nowhere. --Homunq 13:40, 29 January 2007 (UTC)

Problems with foreign-language IPA

There are many pages where a foreign name is given with a foreign pronunciation. This is great for pretentious people who like to try to emulate foreign languages when saying foreign names, but for people like me, who just want to know how to say a foreign name in an ordinary English way, the way a newscaster would say it, these pronunciations are not very helpful. For example, the article on Chopin gives a fantastic IPA transcription for how to say his name in French ([fʁedeʁik fʁɑ̃swa ʃɔpɛ̃]), but gives no clue for us ordinary folk who don't want to sound like snobby francophiles how to pronounce this famous composer's name. The normal English pronunciation for Chopin is not obvious—one might guess [tʃɒpən] or [tʃoʊpən] given no knowledge of French—but is in fact [ˈʃoʊpæn].

The Chopin article says only that he is Polish-French, so one could reasonably conclude that the IPA given is supposed to be any of the three languages in context—French, Polish, or English. I feel like the MoS should more strongly encourage including English IPA for foreign names, and should require clearly labeling foreign IPA with what language it represents. I think the guidelines should say that whenever a non-English pronunciation is given, the language should be identified. Nohat 06:44, 7 November 2006 (UTC)

Personally, I'd prefer to be told how he pronounced it himself than to be told how I "should" pronounce it in English. (And you would need a source for your claim about the "usual" English pronunciation; I don't say it like that, and not just because your transcription is American and I'm not - I don't say it like "show pan" in my accent either.) I do agree that it should be made clear that it's French, though.--JHJ 09:57, 7 November 2006 (UTC)

I don't think there is any real argument that foreign names don't have English pronunciations—Most English dictionaries (OED excepted) give pronunciations for proper names, and there are whole dictionaries of pronunciations, some of which focus just on proper names [3]. All these pronunciation sources give English pronunciations; that is, pronunciations using only English phonemes. Getting sources for proper name transcriptions is not difficult, then, but really is only necessary if there is something controversial about a pronunciation, as, apparently, is the case with Chopin. Indeed I think it is the foreign-language pronunciations which will be more difficult to find sources for than English ones.

As for the pronunciation of Chopin, Merriam-Webster, American Heritage, and Random House all give [ʃoʊpæn] [4], [5], [6]. None of the online British dictionaries that I can find have entries for Chopin, so if British English or related dialects have another pronunciation for it, that information is not available in obvious places online. Regardless, I would have thought it would be obvious that if there is more than one usual English pronunciation, then they should all be given, like at Gdansk. I wasn't trying to make some overwrought point about what I think is the "correct" English pronunciation of Chopin. The only point I was making is that this is the English Wikipedia, and there ought to be English pronunciation guides. I'm certainly not opposing foreign-language IPA pronunciations—obviously some people, such as yourself, find them useful—I just think that overall they are less useful than English IPA pronunciations. There are fewer people who can read them (only people who know all of the IPA rather than just the subset used to represent English), even fewer people who can properly pronounce them, and of those who can, only those who like to flaunt their knowledge of foreign languages would actually use them. English pronunciations, on the other hand, are useful to people who are only familiar with the IPA used for English, which is a much larger group, and who want to talk about the subject of a Wikipedia article without sounding ignorant (or pretentious). Nohat 18:14, 7 November 2006 (UTC)

OK. I'm just saying: (a) I personally prefer having the original forms; to someone familiar with the IPA they provide more information. (b) We need to be careful with anglicisations - there's more variation than some people realise both in how much names are anglicised and in which sounds are chosen. I have a nasalised vowel (and no [n]) in Chopin, which I would think is not unusual in Britain at least. In many words British speakers tend to anglicise [a] and [o] sounds as the TRAP and LOT vowels respectively, while American speakers tend to prefer the PALM and GOAT vowels (using Wells's lexical sets rather than dialect-specific phonetic symbols); this all makes sense given differences between accents, but it needs care. (c) Often (and I think Chopin may be an example) most readers will already know how they pronounce it in English, so is an English pronunciation really necessary?--JHJ 19:53, 7 November 2006 (UTC)

It seems really dangerous to assume that anyone reading the Wikipedia article on a topic will already know anything about the topic, let alone how to pronounce it. It is true that there is variation in how names are anglicized, but that shouldn't be construed as a justification for not trying to inform our readers at all. As it stands, if someone who doesn't know how to pronounce Chopin goes to the Chopin article, he or she will not likely come away with any idea of how to pronounce it. If we added something like "English: [ʃoʊpæn] or [ʃoʊpæ̃]", then the reader has a much better chance of coming away with an idea for how to pronounce this foreign name. Setting aside irrelevant debates on how best to represent the GOAT vowel in IPA, I'm not sure how including this information could be seen as anything but an improvement. Nohat 22:35, 7 November 2006 (UTC)

It seems to me that most foreign names will not have a canonical anglicized pronunciation; people will use the nearest approximating phonemes and these may vary by (English) accent. The Gdańsk example is interesting: "The Polish name Gdańsk is usually pronounced IPA [gəˈdɑːnsk], [gəˈdaɪnsk], or [gəˈdænsk] in English." I don't think that's a good model. To me, [ɑː] (well, [ɑ]) seems American and [æ] British, and [aɪ] just plain wrong. It could be fleshed out with more detail, saying which pronunciations are accepted or disputed in which English-speaking region. In that particular case, I think listening to the Polish recording and trying to approximate it will serve well enough for any English-speaker. More interesting are cases where, as Nohat says, there is a danger of sounding pretentious by being too faithful to the source language. In such cases, I agree the anglicized pronunciation is appropriate for inclusion. Of course, here too there are differences by English accent e.g. French stress. The best standard would be "if no anglicized transcription is provided, assume it approximates the native one"; however, in a permanent work-in-progress like Wikipedia it will never be possible to know whether the lack of an anglicized transcription is in accordance with this principle or simply a lacuna. jnestorius^(talk) 20:57, 7 November 2006 (UTC)

I think that the claim that most foreign names will not have a canonical anglicization is incorrect. General as well as pronouncing dictionaries are full of them, and this information should be provided to our readers.

American Heritage gives all three of [gəˈdɑːnsk], [gəˈdaɪnsk], and [gəˈdænsk] for Gdansk. Merriam-Webster and Random House both give [gəˈdɑːnsk] and [gəˈdænsk]. NBC Handbook of Pronunciation gives just [gəˈdɑːnsk]. Longman Pronouncing Dictionary gives [gəˈdænsk] for British English and [gəˈdɑːnsk] for American English. Pronouncing Dictionary of Proper Names gives both [gəˈdɑːnsk] and [gəˈdænsk], without any dialectal marks. In summary, [gəˈdɑːnsk] and [gəˈdænsk] appear in many pronunciation authorities of both American and British provenance, although [gəˈdɑːnsk] is identified as American and [gəˈdænsk] as British in Longman, and [gəˈdaɪnsk] only appears in American Heritage.

If you listen to the recording on Gdansk, I think you will agree that [gəˈdaɪnsk] sounds most like the Polish. But yet, this pronunciation only appears in American Heritage (as the last pronunciation given) and you called it "just plain wrong". Which kind of disproves the argument that "listening to the Polish recording and trying to approximate it will serve well enough for any English speaker", doesn't it? Nohat 22:35, 7 November 2006 (UTC)

I unreservedly agree with Nohat. Many placenames and people from foreign-language-speaking areas have conventionalised English pronunciations (which might differ between dialects). These should always be listed as the first pronunciation in any lists of pronunciations. The number of times I’ve gone to an article on Wikipedia primarily to find out what the accepted English pronunciation is, only to be presented with the foreign pronunciation, has been really disappointing... For many people and places, I think if you tried to pronounce the name according to the nearest English phonemes, many people would have no idea what you were talking about. —Felix the Cassowary 23:35, 7 November 2006 (UTC)

Nohat: Touché! ([ˈtaʊtʃɪ]??) I guess the choice between evaluating a pronunciation as "pretentious" and "wrong" depends on how much of the foreign language you know — guess how much Polish I know...still, the Polish transcription was up there too...mea culpa ([ʃaɪt]). I prefer the native pronunciation before the English, but since for many places, the English spelling is also different, it would be more consistent to do as Felix says. Thus Ganges River would stay:

The Ganges River (English: /ˈgænʤiz/; Gangā /ˈgəŋgaː/ in most Indian languages)

while Colombia would have an addition (assuming this is the correct pronunciation...gee, it's not like the Spanish at all!):

Colombia (English [kə'lʌmbɪə]), or formally, the Republic of Colombia (Spanish: República de Colombia, IPA [re'puβ̞lika ð̞e ko'lombja])

jnestorius^(talk) 00:03, 8 November 2006 (UTC)

The Ganges and Colombia examples are exactly what I mean! Just like you say. I'm not sure, though, what to do about extremely minor phonemic differences like the difference between what you put for Colombia and what I would have put: English [kə'lʌmbiə].

In any case, I would like to recommend we add the following to the "Foreign names" section of the MoS page:

When a foreign name has a usual English pronunciation (or pronunciations), include both the English and foreign-language pronunciations. Transcriptions should always have a label identifying what language they are transcribing.

Nohat 01:02, 8 November 2006 (UTC)

I endorse your suggested wording. As for the minor phonemic differences issue, that's what most of this Talk page has been discussing for a good while now, to little effect. jnestorius^(talk) 10:12, 8 November 2006 (UTC)

I also endorse that wording. The issue of how to represent English pronunciations still exists, of course, but it's not specific to foreign names.--JHJ 12:02, 8 November 2006 (UTC)

Since the proposed addition was uncontroversial, I added it to the guidelines. Nohat 19:03, 10 November 2006 (UTC)