
Wikipedia talk:Manual of Style/Spelling


This is an old revision of this page, as edited by Jared Grainger (talk | contribs) at 17:21, 10 January 2006 (Tagging pages: well don't remove the context from my writing). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Tagging pages - Read this first

This section summarises a proposal for handling national varieties of English in WP. Please read this first, then discuss below and vote.

The Problems

If you are reading this, you are probably familiar with some of the following issues:

  1. Spelling inconsistencies within and across articles
  2. Resources wasted "correcting" spellings
  3. Resources wasted "correcting back" spellings (either after some time, and so on forever, or immediately, potentially starting arguments or edit wars)
  4. Resources wasted arguing and trying to interpret the current guidelines.

The proposal

A series of templates will be defined, in the form {en:humour} (with double braces, of course). Once that is done, editors will write variant words as follows:

I have no sense of {{en:humour}} because I'm a pizza.

A "locale" setting will be added to user profiles, allowing users to specify their preferred variety. Based on that setting, the sentence above will be rendered as

I have no sense of humour because I'm a pizza.

for a "UK" user and as

I have no sense of humor because I'm a pizza.

for a "US" user. That way, everyone gets to read WP articles in their favourite spelling style. It will be like having two (or more) WP editions that are always automatically in sync; unlike the editions in languages other than English, they will not be affected by forking.
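A minimal sketch of how the rendering step might work, in Python. Everything here is hypothetical: the `VARIANTS` table stands in for whatever dictionary the templates would be backed by, the `locale` string stands in for the proposed user preference, and US is assumed as the default for readers without one.

```python
import re

# Hypothetical static map: template word -> spelling for each variety.
VARIANTS = {
    "humour": {"UK": "humour", "US": "humor"},
    "colour": {"UK": "colour", "US": "color"},
}
# Accept either spelling as the template name (see Q1 below).
VARIANTS["humor"] = VARIANTS["humour"]
VARIANTS["color"] = VARIANTS["colour"]

# Matches {{en:word}} and captures the word.
TEMPLATE = re.compile(r"\{\{en:(\w+)\}\}")


def render(wikitext, locale, default="US"):
    """Replace each {{en:word}} with the spelling for the reader's locale."""
    def substitute(match):
        word = match.group(1)
        spellings = VARIANTS.get(word)
        if spellings is None:
            return word  # unknown template word: show it as written
        return spellings.get(locale, spellings[default])
    return TEMPLATE.sub(substitute, wikitext)


text = "I have no sense of {{en:humour}} because I'm a pizza."
print(render(text, "UK"))  # I have no sense of humour because I'm a pizza.
print(render(text, "US"))  # I have no sense of humor because I'm a pizza.
```

Unknown template words fall back to the word as written, so a missing dictionary entry degrades to today's behaviour rather than breaking the page.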

The dynamics of the proposal

With this mechanism in place, it is very unlikely that a casual reader will change a variant word to their preferred spelling: as soon as they see the template, they will think twice, possibly learn about yet another benefit of being logged in, and, if they care that much about these things, propagate this idiom to other articles and other words. People would learn by example, and WP would gradually converge from its current state to a superior equilibrium.

This mechanism can be extended easily to other varieties of the English language either immediately or when/if the need arises.

There would be one and only one correct equilibrium, which would please everyone and which every well-intentioned Wikipedian would actively help to reach. This is in contrast to the current chaotic situation, where opposing forces try to pull the spelling to "their" side and WP, in this respect, never reaches an equilibrium.

Using such templates would be a guideline. As with many guidelines, some editors (most, I would say) will be unaware of it, and write "naturally". Note that this will make articles strictly no worse than the current situation. Then the anal guy comes around and spots the oh-my-god-horrible "misspelling". Being anal, he or she is aware of the guidelines, and therefore corrects it accordingly. Everybody happy, the end.

Again, if you forget to stick to the rules (genuinely or deliberately), it's not a problem, because most people won't notice or won't care, while the anal guys above will correct it immediately to the correct version.

Open issues (food for thought)

Q: How would the title of an article be handled?
A: Good point. Not worse than it is now, but ideas are welcome.

Other features

This mechanism could be useful for other languages (e.g. Portuguese) that have similar spelling idiosyncrasies.

Also it may turn out handy for other similar issues that some people feel strongly about, such as spelling of "God/G-d/god".

QA

Q1: {en:humour} or {en:humor}?
A1: Since this code will be seen only by editors, it doesn't really matter. We could allow both to mean the same thing.

Q2: How do we avoid the "UK Labor Party"?
A2: When talking about the UK Labour Party, you would write it as is, like you do now. Anybody manually correcting it as "Labor Party" or relying on "{en:labour} party" is clearly making a mistake.

Q3: What happens to users that are not logged in or that haven't set the locale preference?
A3: Users who are not logged in see the default spelling. The default could be US, UK or Jamaican; we could hold a vote, try to determine it from the IP address, or whatever. But I don't think this is important, because if somebody complains we can simply say: if you don't like it, create an account and/or set up your preferences.

Q4: Can we use bots to get to the equilibrium faster?
A4: We could, but I don't think it's a good idea. Instances like "Labour Party" would be messed up, and I think it's more important not to break what's currently correct than automating the changes. I think trying to automate the transition is missing the point made in the "dynamics" section.

Q5: Wouldn't this make Wikitext totally unreadable and a chore to edit?
A5: Please explain how this

I have a poor sense of {{en::humor}} because I'm a pizza.

is so much more unreadable and difficult to edit than this

I have a poor sense of humor—I'm a pizza after all. (P. Margherita)

Q6: I can read all flavours of English just fine. Isn't this proposal useless?
A6: No. See "The Problems" subsection above.

Q7: Isn't this gonna put strain on our servers?
A7: No. All that is required is parsing the templates, looking up a user preference and looking up the right spelling in a static in-memory map. Compare this to the all-important "Skin" feature.

Q8: Isn't this gonna take a long time to implement?
A8: This is beside the point. We are not asking you to implement it, simply to express your view on the proposal. If there is agreement that this would be a welcome feature, it will be prioritised.

Q9: What about other grammatical/punctuation variations?
A9: I think we should stick to spellings to start with, for the following reasons:

  1. It's much simpler
  2. It would be much easier to convince people that this is a good idea

In other words, I think that anything more complicated than "downtown" vs. "city centre" should not be done just yet.

Q10: I'd still like to read about US topics in US spelling and UK topics in UK spelling.
A10: We can leave local spellings throughout a local article as per current guidelines. Or we can add a {context:UK} or {context:US} at the top of "local" articles/sections and users can choose to let this template override the localisation templates.
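A sketch of the second option in A10, assuming the tag is written with double braces like the other templates and that readers have a hypothetical opt-in flag (`honour_context`). It handles only one article-level tag; per-section tags would need the text to be split into sections first.

```python
import re

# Hypothetical spelling table, as in the rendering sketch.
VARIANTS = {
    "colour": {"UK": "colour", "US": "color"},
    "favour": {"UK": "favour", "US": "favor"},
}
TEMPLATE = re.compile(r"\{\{en:(\w+)\}\}")
CONTEXT = re.compile(r"\{\{context:(\w+)\}\}\s*")


def render(wikitext, user_locale, honour_context=True, default="US"):
    """Expand {{en:word}} templates. An article-level {{context:XX}} tag
    overrides the reader's locale if the reader has opted in."""
    locale = user_locale
    ctx = CONTEXT.search(wikitext)
    if ctx:
        if honour_context:
            locale = ctx.group(1)
        wikitext = CONTEXT.sub("", wikitext)  # the marker itself is not shown

    def substitute(match):
        word = match.group(1)
        spellings = VARIANTS.get(word)
        if spellings is None:
            return word
        return spellings.get(locale, spellings[default])

    return TEMPLATE.sub(substitute, wikitext)


article = "{{context:UK}}\nThe {{en:colour}} guard."
print(render(article, "US"))                        # The colour guard.
print(render(article, "US", honour_context=False))  # The color guard.
```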

Q11: My idea is along the same lines, but it's better.
A11: Excellent, we'd love to hear about it! The proposal above is only tentative, and some variants are already being discussed (e.g. MOIO vs MABIO below). But don't forget to vote!

PizzaMargherita 14:34, 8 January 2006 (UTC)

Votes on tagging pages proposal

"A feature like this would be welcome in the English Wikipedia."

  • Agree I like the idea, but only if the implementation at least has some kind of automatic tagging, possibly in the form of a checkbox when editing a page. I am also in favour of exploring an alternate implementation (MOIO, see below) further. Jared Grainger 18:29, 8 January 2006 (UTC)
  • Mildly oppose. Thank you for putting so much effort into explaining your proposal. I can understand your reasoning, but I disagree. Here is why I oppose the proposal:
  1. I don't like the idea of splitting Wikipedia into two (or more) viewing modes. There is only one English language and there should only be one Wikipedia. Wikipedia is an international project and having different spellings coexist in that project gives it an international flavour.
  2. There are many spelling variations that can be used in both British and American English (dialog(ue), travel(l)er, realize(-ise), fetus/foetus, per cent or percent, theatre/theater and so on...) What about Canadian spelling? In order to make everyone happy, you'd have to devise a system that enables every user to create his own individual system of spelling.
  3. There would never be consensus about the "default view". I'd guess that more than 90% of all Wikipedia users are not logged in. What will they see? Using tags will shift the current spelling controversies to a higher level of abstraction. Now, people argue about what spellings to use in an article. Using tags, they will argue about how these tags will be interpreted.
  4. Assuming (on average) five spelling variants per article, roughly five million tags must be put into place! Did you realize that? I think resources should rather be used to write new and improve already existing articles. I know that many people care about spelling, but all in all, it's a minor issue.
  5. Spelling is just the tip of the iceberg of linguistic variations. What about punctuation, grammar, lexical differences?
  6. There is one rule concerning spelling that has gained general consensus: Articles related to a certain English-speaking country should bear that country's spelling. For instance: London -> UK spelling. There would be strong resistance among the editors of such articles to changing all British spellings like behaviour to {en:behavior} just in order to allow a small minority of users to read the article in their favourite spelling. Nobbie 13:20, 10 January 2006 (UTC)
  • Oppose. Burden on resources and on editors wading through the edit screen, for minimal benefit. Problems going beyond spelling, such as "she is in hospital" and the like. Too many options where more than one spelling is used within one geographical region, even if another region doesn't use one of them. Would create more arguments than it avoids. Gene Nygaard 16:45, 10 January 2006 (UTC)

Tagging pages

Please note, the top part of these comments precedes the proposal above. PizzaMargherita 11:25, 9 January 2006 (UTC)

Is it feasible to create some templates to tag every page (by en-GB, en-AU, en-GB-oed, en-US, en-CA...)? It would be more convenient for editors, and pages could be kept more consistent. - Cheung1304 19:27, 3 Mar 2005 (UTC)

There seems to be a discussion about tags on the talk page of Wikipedia:Manual_of_Style Nobbie 13:48, 4 Mar 2005 (UTC)

Proposed templates: Template:BrE, Template:AmE, Template:CaE, etc. Cheung1304 03:45, 9 Mar 2005 (UTC)

We shouldn't mark articles as only being editable by Britons, Americans, Canadians, etc. This is a very divisive proposal that would only add to Wikistress and edit wars, jguk 23:36, 22 Mar 2005 (UTC)
This isn't the right approach. Articles should avoid containing editorial information; that's what the Talk: pages are for. Another approach would be to use HTML comments <!-- comments -->, but I'm not sure it's necessary. — Matt Crypto 11:47, 23 Mar 2005 (UTC)
A comment saying <!-- This page is written using (country) English. Do not change the spelling or style to that of another dialect. --> would be sufficient. For example, there could be comments in the coding for the pages on Chicago, Illinois and White House warning people that changing the text on those pages from American English to British English is a no-no. --/ɛvɪs/ 20:16, Mar 23, 2005 (UTC)
To use Kenneth Williams' last words, what's the bloody point? jguk 20:36, 23 Mar 2005 (UTC)
<rant>Because people like you think we all come from England. Some of us, however, LIVE IN THE WESTERN HEMISPHERE. You don't need to overrun this site with British words that the general public cannot understand. </rant> 209.2.60.75 20:12, 3 October 2005 (UTC)


Maybe you'd be interested to know that I also live in the Western hemisphere? :) jguk 20:33, 3 October 2005 (UTC)

Hmmm, I think only one country in the western hemisphere uses US spelling; the rest (that have English as an official language) use English spelling. I'd be worried about any general public that reads so poorly that it cannot understand English spelling but would understand US spelling. Pete.Hurd 20:42, 3 October 2005 (UTC)
I haven't seen any Canadian Tyre stores. Have you? It's a little more complicated than U.S. and UK, and parts of the UK also have strange things not shared by all of the UK. Gene Nygaard 05:08, 8 December 2005 (UTC)

This is quite an interesting problem, as it is pretty much totally about style. If dialects are/could be classified as above (en-GB, en-AU etc.), then we have a set of defined dialects. The Wikipedia approach to entirely different languages is to have a whole separate set of articles for each language (meaning differing content on Welsh pages and English pages, so someone in Wales could be looking at a totally different article to their neighbour. Good thing? Separate issue). It would be overkill to apply this same approach to different dialects within a language, as the only differences are spelling, grammar and phrasing. Therefore, there could be a tag to give a section of text alternatives in each dialect, e.g. [ [ dialect:en-GB|The colours he favoured were considered humorous to her.|en-US|The colors he favored were considered humorous to her. ] ]. Of course this also requires the ability for a reader to select their preferred dialect; does anyone know if such information is already available in locales etc.? I suppose if a user's language is en and their country is GB then it could be assumed their dialect choice is en-GB. What do people think? Is there a place where ideas such as this can be put forward? --Splidje 12:57, 7 December 2005 (UTC)

I think this makes a lot of sense. See my proposal here and here.
Nobody really liked it, but then again nobody gave any good reasons why we shouldn't do it.
In essence: this
I have a poor sense of {{EN::humor}} because I'm a pizza.
would render as this
I have a poor sense of humor because I'm a pizza.
or this
I have a poor sense of humour because I'm a pizza.
depending on user preferences. If we want to get fancy, we could even allow personalised dialects, i.e. one could set up one's own Pizza dialect where template "humor" renders as "humour" (UK) but "color" renders as "color" (US). PizzaMargherita 21:03, 7 December 2005 (UTC)
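A personalised dialect could be modelled as a base variety plus per-word overrides. A minimal sketch, with a hypothetical `VARIANTS` table and the `{{EN::word}}` syntax from the example above:

```python
import re

# Hypothetical spelling table keyed by template word.
VARIANTS = {
    "humor": {"UK": "humour", "US": "humor"},
    "color": {"UK": "colour", "US": "color"},
}


def render(wikitext, base, overrides=None):
    """Expand {{EN::word}} using the reader's base variety, except for
    words the reader has individually assigned to another variety."""
    overrides = overrides or {}

    def substitute(match):
        word = match.group(1)
        forms = VARIANTS.get(word)
        if forms is None:
            return word
        return forms.get(overrides.get(word, base), word)

    return re.sub(r"\{\{EN::(\w+)\}\}", substitute, wikitext)


# The "Pizza dialect": US spelling throughout, except UK "humour".
pizza_dialect = {"humor": "UK"}
print(render("My {{EN::color}} and my sense of {{EN::humor}}.", "US", pizza_dialect))
# My color and my sense of humour.
```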
I don't think it's even worth striving for. I'd sooner put up with some variation in spelling rather than having the additional complications in editing, in reading the edit screen, and especially all the clutter that will show up on my watchlist and on recent changes as everybody rushes to add all those silly tags. Plus the strong likelihood of several different robots running amok because they have been poorly designed in an attempt to automate that. Gene Nygaard 05:08, 8 December 2005 (UTC)
All of these points have been addressed in the discussions linked above, which I encourage you to read.
If clutter in your watchlist is your primary concern, fear not, for that could only decrease. This would be a consequence of the existence of a stable equilibrium. In fact one of the issues that this proposal is trying to address is edit wars on spelling.
As for complications in reading and editing wikitext, do you really think that this
I have a poor sense of {{EN::humor}} because I'm a pizza.
is any more complicated than this?
I have a poor sense of humor&mdash;I'm a pizza after all. (P.&nbsp;Margherita)
Finally, I don't think there's any need for robots. Everything is explained in the links. PizzaMargherita 22:49, 8 December 2005 (UTC)

I (DerekP) think this would be a waste of time because the use of alternative spellings for English language words is not an issue that is causing real problems in the world; except to people who fuss over non-important issues. If we have tagging, then every word should have to be tagged just in case there is an alternative that the author doesn't know about yet.

{{EN::I}} {{EN::have}} {{EN::a}} {{EN::poor}} {{EN::sense}} {{EN::of}}
{{EN::humor}} {{EN::because}} {{EN::I'm}} {{EN::a}} {{EN::pizza}}.
 

In short, why bother wasting time and effort over small issues before the big issues are dealt with?

And by the way, should that be

{{EN:humor}}

or

{{EN:humour}}

or either?


DerekP 02:12, 15 December 2005 (UTC)

"the use of alternative spellings for English language words is not an issue that is causing real problems in the world." - I agree that war and hunger are more serious problems, and I accept that they should be dealt with first, but you can't deny that spelling diversity is wasting a lot of time and resources ("correcting", "correcting back", discussing, etc), and that spellings are inconsistent across articles, often within articles.
"every word should have to be tagged" - No, it shouldn't. I'll try to summarise and clarify the proposal in a standalone section.
"should that be
{{EN:humor}}
or
{{EN:humour}}
or either?" - Strange that: spelling inconsistencies in articles are not a problem, but the convention adopted in the template itself (seen only by editors) is. OK, in that case my answer is that both templates will be introduced and will yield exactly the same result. PizzaMargherita 17:54, 23 December 2005 (UTC)
Ok, so you are saying: 1. the problems you have listed exist only in your head and 2. Q and A number 7 are utter rubbish. Fair enough, but can you please explain why? Thanks. PizzaMargherita 16:32, 8 January 2006 (UTC)
I don't think the MABIO method would take much in the way of resources. There would only be a handful of additional tags per article and the dictionary is unlikely to change very often, meaning that it can be cached or inlined into the code. The MOIO method would be somewhat inefficient if improperly implemented (i.e. scanning every word every time the article is viewed), but all that needs to be done is to scan the text when it is saved and mark all the dialect words. Then it is comparable to MABIO in efficiency. This also brings another possible implementation idea to mind: put a checkbox that automatically scans the article and tags dialect words. Jared Grainger 17:59, 8 January 2006 (UTC)
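A rough sketch of the save-time scan, assuming a hypothetical `DIALECT_WORDS` list and, as an extra assumption, that only lower-case words are tagged (so capitalised names such as "Labour Party" are skipped). It turns untagged text into explicitly tagged text once, at save time, so each page view only has to expand tags. Note that a plain word list like this cannot tell homographs apart.

```python
import re

# Hypothetical list of words that have a regional variant.
DIALECT_WORDS = {"colour", "color", "humour", "humor", "favour", "favor"}

# Either an existing {{...}} template (left untouched) or a lower-case word.
# Capitalised words never match, so names like "Labour Party" are skipped.
TOKEN = re.compile(r"\{\{[^{}]*\}\}|\b[a-z]+\b")


def tag_on_save(wikitext):
    """Run once when an edit is saved: wrap known dialect words in
    {{en:...}} so that page views only need to expand tags."""
    def tag(match):
        tok = match.group(0)
        if tok.startswith("{{") or tok not in DIALECT_WORDS:
            return tok
        return "{{en:" + tok + "}}"
    return TOKEN.sub(tag, wikitext)


print(tag_on_save("The colour of Colour TV."))
# The {{en:colour}} of Colour TV.
```

Because existing templates are matched first and left alone, running the scan again on already-tagged text changes nothing.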

In response to Splidje's original question: yes this would be complete overkill, a waste of programmers' time, would complicate wikitext and become a big inconvenience for all editors. Articles will end up in a worse mix of English dialects, as they will end up half-dialectified, while most editors try to work around, ignore, or remove the dialect tags littering them.

If you really want to work on the software, it's probably best to develop an extension to Mediawiki, and discuss it there. I will oppose adding anything like this to English Wikipedia.

Unobtrusively tagging articles as suggested at the very top of this discussion might be a good idea. But why don't we all go improve some articles instead of generating more words about this proposal? Michael Z. 2006-01-8 18:28 Z

"complete overkill" - see QA7, please articulate your argument if you disagree. Any technical feasibility study is more than welcome at this stage, whatever its outcome. On the other hand, opposing the proposal simply by saying "It won't work" or "If it passes I'm not gonna comply" is not very constructive.
"a waste of programmers' time" - see QA8
"would complicate wikitext and become a big inconvenience for all editors" - see QA5
"Articles will end up in a worse mix of English dialects, as they will end up half-dialectified" - No, that is the current state. Please explain how the situation will be worse. Thanks. PizzaMargherita 18:47, 8 January 2006 (UTC)
Complete overkill in that it complicates wikitext. I can't look at an article and just type an addition, without risking it becoming a mix of dialects when someone views it. There are already too many different whiz-bang templates cluttering wikitext, without adding this one which has to be mixed in everywhere. Who's going to take on the task of patrolling Wikipedia, searching out "tire tread" vs. "I tire easily" and adding the English dialect templates to the right ones? Michael Z. 2006-01-8 22:52 Z
I must insist: you can't tell me that the proposed templates will make the wikitext (which only editors see) more unreadable than it is already. Is "&nbsp;" readable? Maybe not, but sometimes using it is the one and only right thing to do. We should write what we mean: "{en:humour}" means "humour: this word is written in two ways depending on the locale", whereas "Labour Party" means "Labour Party".
"I can't look at an article and just type an addition, without risking it becoming a mix of dialects when someone views it." - I agree, but the same can be said about the present situation, which is strictly worse than the proposed one.
"Who's going to take on the task of patrolling Wikipedia, searching out "tire tread" vs. "I tire easily" and adding the English dialect templates to the right ones?" - Please note that there is no need to actively and exhaustively perform this task. Please read the dynamics of the proposal. The answer to your question is: it will be the same people who are creating these problems in the first place, along with every good Wikipedian who bumps into such an occurrence, in much the same way you correct a typo in a random article when you are reading it. I think that a question like "Who's going to patrol Wikipedia?" could have been justified in 2001, but now it sounds a bit silly. PizzaMargherita 23:45, 8 January 2006 (UTC)

MABIO vs MOIO

I have moved discussions about an alternative implementation into this subsection. PizzaMargherita 11:25, 9 January 2006 (UTC)

Ok, I suppose a central database of dialectic versions of words would eliminate redundancy. However, there are two opposite approaches to referencing this database. Either the tagging is done on every word that is to be rendered by the "dialect engine", or the tagging is done on every word to be skipped by the engine - e.g.
... was a member of the UK {{nodialect:Labour}} party.
The former approach allows for a more efficient rendering algorithm which just has to be called on finding a tag, but means people have to tag every instance of every word in every article (redundancy); the latter approach means quite a simple job for editors, as all they have to do is exclude the odd word, but it means the algorithm needs to sweep through every word in an article and look up matches in the database. --Splidje 10:10, 13 December 2005 (UTC)
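The two approaches can be sketched side by side. The word list and both function names are hypothetical; `render_tagged` only does work where an explicit `{{en:...}}` tag appears, while `render_untagged` has to visit every word and skips only words wrapped in `{{nodialect:...}}`.

```python
import re

# Hypothetical word list: either spelling maps to its forms per variety.
PAIRS = [("colour", "color"), ("labour", "labor"), ("humour", "humor")]
FORMS = {}
for uk, us in PAIRS:
    FORMS[uk] = FORMS[us] = {"UK": uk, "US": us}


def render_tagged(text, locale):
    """Former approach: only words wrapped in {{en:...}} are looked up."""
    def expand(match):
        word = match.group(1)
        return FORMS.get(word, {}).get(locale, word)
    return re.sub(r"\{\{en:(\w+)\}\}", expand, text)


def render_untagged(text, locale):
    """Latter approach: every word is looked up unless it is wrapped
    in {{nodialect:...}}."""
    def convert(match):
        if match.group(1) is not None:  # {{nodialect:Word}}: keep as written
            return match.group(1)
        word = match.group(2)
        forms = FORMS.get(word.lower())
        if forms is None:
            return word
        out = forms.get(locale, word)
        return out.capitalize() if word[0].isupper() else out
    return re.sub(r"\{\{nodialect:(\w+)\}\}|(\w+)", convert, text)


print(render_tagged("The {{en:colour}} of the UK Labour Party", "US"))
# The color of the UK Labour Party
print(render_untagged("The colour of the UK {{nodialect:Labour}} Party", "US"))
# The color of the UK Labour Party
print(render_untagged("The colour of the UK Labour Party", "US"))
# The color of the UK Labor Party
```

The last example shows why the second approach depends on the exclusion tag: without it, the party name is converted too.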
I see what you mean. I prefer the former approach, because I think the latter is more difficult to implement and it's less explicit in what it does. I don't think the former is such a burden for editors. Also, the latter wouldn't prevent somebody who is unaware of the mechanism from changing one spelling to another, with no net effect on the article. I mean, with the latter approach, writing "colour" or "color" doesn't change the end result (provided one has set the locale in the preferences), and so there is still no equilibrium to speak of.
Anyway I think now the real problem is to get enough people to agree that some form of tagging for dialects would be welcome. Then we can worry about the details. PizzaMargherita 12:06, 13 December 2005 (UTC)
True. How does one go about doing that? Does wikipedia / mediawiki (this is a property of the software) have a mechanism for rallying support behind something? --Splidje 13:33, 13 December 2005 (UTC)


Regarding the discussion about two possible methods of implementation, I believe that only marking words that should have their spelling "forced" is preferable because it is much simpler for the editors.

Advantages:

  • Every article is automatically affected.
  • No need to worry about marking words with variable spelling.
  • Editors can use their preferred spelling without worrying about it. They can write colour or color when they edit but it will appear in the dialect of the user when it is displayed.
    • In fact, this makes the feature mostly transparent as each editor edits and views in his/her preferred dialect. Many of them won't even realize it! This is especially good for new editors. It also reduces conflict.
  • The spelling in an article will always be consistent.
  • Only the most die-hard "spelling partisan" will go to the trouble of explicitly forcing every word to be in his/her preferred spelling.


Disadvantages:

  • Some words that should only appear using a particular spelling would be inadvertently changed, although I'd imagine most of these words would be capitalized or tagged to begin with and could be automatically ignored.
    • This fact might go unnoticed by some editors because of the transparent conversion.

An option to show spelling variations in the preview would be useful for catching any problems with this system.


Disadvantages of the other method (explicitly marking words with variable spelling):

  • There are many spelling variations and keeping track of and marking them all is a chore. The editing process becomes much more complicated.
  • Many people will not even be aware of such a feature, or simply won't bother marking their words.
    • Because of the above two points, many articles will still have inconsistent spelling.
  • Spelling partisans won't cooperate anyway.
  • Every article needs to be updated. This is a monumental task.

Advantages:

  • Where it matters (i.e. when a word MUST be spelled a particular way) the spelling in articles stays the same unless someone manually edits them.

Finally, some more thoughts on the subject in general: Although writing about U.S. subjects in American English (for example) might seem like a good standard for selecting a spelling dialect, it really doesn't benefit the reader. Most people would prefer to read words in the way they are accustomed to seeing them, regardless of the subject matter. The advantage of having selectable dialects is that every reader gets to view the page as they prefer to see it. It also allows people to read an article using unusual (for them) spelling in case they feel it makes the article more colourful or they are interested in what the differences are. I don't think it's that difficult to implement and automatic dialect selection based on IP address is easy. Jared Grainger 06:38, 8 January 2006 (UTC)

Hi Jared, thanks for your thoughts. I agree with many things you say, but I think you missed an important point. The "mark-only-invariant-occurrences" (MOIO) implementation (e.g. UK {nodialect:Labour} Party) does not lend itself to a gradual introduction. I.e. when it's introduced, a lot of pages that do not have the "nodialect" tagging will instantly change from correct to broken. Ok, on the other hand a lot of the articles will instantly change from "debatable" to "consistent", but I don't think it's worth it, and it would be a drastic change that, like with robots, I think it's better to avoid. With the other "mark-all-but-invariant-occurrences" (MABIO), the WP will naturally evolve from the current state to a better one. And as I think you mentioned, in the MOIO scenario, if a UK editor writes "Labour party" without tags, s/he doesn't realise that it's a problem, because that's what he would read back. However, all non-UK readers would see "Labor Party", which is not just debatable, it is plain wrong.
Anyway, as I said, what this proposal needs now is support. Once we have that, we can start looking into the implementation in more detail. PizzaMargherita 11:57, 8 January 2006 (UTC)
As I said above, the software could be set to ignore words in certain styles (e.g. capitalized, tagged, italics) which would probably take care of 99% of the problems. The multi-dialect preview option would also help in this regard. I believe the advantages of MOIO clearly outweigh those of MABIO if the main problem of MOIO (words that must be invariant, which is also MABIO's main advantage) can be resolved. Would you agree on that point?
Additional research into words that must be invariant would be useful.
Regardless of how it is implemented, I am in favour of some way of converting dialects. Jared Grainger 17:59, 8 January 2006 (UTC)
As I said, I'll be happy to continue this debate on the particulars of the implementation once we have a general agreement that 1. we have a problem and 2. it's a problem worth solving. I think some people are in denial right now. Once again thanks for your ideas, keep them coming. PizzaMargherita 19:11, 8 January 2006 (UTC)
Well then, perhaps you should start another poll directly above your original one. Something like "Show your support for or against the displaying of pages in the user's dialect, assuming the details can be worked out before it is implemented." That would probably get more people to vote since it leaves the technical issues aside. Then they can proceed to the more specific poll if they feel they understand the technical issues well enough. Jared Grainger 19:32, 8 January 2006 (UTC)


This is what I expected to hear eventually. The programmers have spoken and they don't want to bother with it. Oh well, no use discussing it any further I guess....
However, you didn't comment on the "transparent" MOIO method. What are your thoughts about that implementation method (besides the fact that you don't like the idea in general), Mr. Z programmer? Jared Grainger 19:00, 8 January 2006 (UTC)
Am I Mr. Z programmer? If by transparent, you mean that all text gets automatically converted, that would be a very bad idea. How does the server determine reliably whether some text is a direct quotation or a proper name, or not? Single quotation marks, double quotation marks, and italics can all be used to mark quotations, and for other uses. How does the server know which sense of "tire" or "curb" is being used? Michael Z. 2006-01-8 22:52 Z
I think Michael hit the nail on the head here, it looks like any automation is potentially troublesome due to homographs. PizzaMargherita 23:45, 8 January 2006 (UTC)


Michael, you might not have read what I (Jared Grainger) wrote about this above. The idea is that the software ignores anything "special", such as capitalized words, italics, words in quotes, etc., which would probably be correct the vast majority of the time.

The few problems that arise with homonyms can be corrected with tags whenever someone spots them and the editing page could optionally display dialect words and their alternatives to quickly check this when editing old stuff or writing new stuff. True, a few errors will crop up in older articles but they will be edited away with time and it's not like the articles out there are flawless anyway.

Your example about tire/tyre being used as a verb would make an interesting case study. How many times is tire ambiguously used as a verb in WP? I'd guess that most articles are written in the past tense, so they'd use tired, and tyred isn't a real word. Some made-up examples:

  • He grew tired of the war...
  • ...was quoted as saying "this tires me."

Neither of those would cause a problem with properly written software, because tired wouldn't be in its dictionary and it would notice that tires is surrounded by quotation marks.

To test this further, I searched Wikipedia with Google for "tire" and scanned the summary text of the first 100 matches for possible problems. Here are some direct excerpts:

  • Tire irons are also...
  • The Goodyear Tire & Rubber Company...
  • Canadian Tire is a Canadian retail...
  • Tyre (tire) characteristics...
  • ...such as tire (tiri-) [an article on Elvish]
  • ...when the eye's photoreceptors, primarily those known as cone cells, "tire" from the over stimulation...
  • ...words like tire and jail ... [talking about differences between dialects]

Note the Google summaries don't include italics so there were one or two that looked like they would have caused a problem but were actually italicized in the article.

  • tire was only used as a verb once, and it was in quotes
  • Most editors correctly italicized words when referring to the word itself (e.g. tyre is spelled tire in the U.S.)
  • Tire was always capitalized when used as the name of a company


The only problem I saw was when the word was at the beginning of a sentence (and therefore capitalized) but again, software could easily handle this situation as there's no proper name composed of the single word Tire (i.e. it's always "Tire Kingdom" or something like that). Therefore, dialect words that are capitalized at the beginning of a sentence and are not followed by another capitalized word (e.g. Tire chains are a...) would be treated as a simple dialect word.
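The sentence-start rule just described could look roughly like this. It is a hypothetical sketch: the two-word table, the whitespace tokenization (which collapses runs of spaces) and the punctuation handling are simplifying assumptions, and it only converts US spellings to UK ones.

```python
# Hypothetical two-entry US-to-UK table, for illustration only.
US_TO_UK = {"tire": "tyre", "tires": "tyres"}
SENTENCE_END = (".", "!", "?")


def is_capitalized(word):
    return word[:1].isupper()


def convert_to_uk(text, table=US_TO_UK):
    """Convert lowercase variant words, plus a capitalized one only when it
    starts a sentence and the next word is not capitalized (so "Tire chains"
    converts but "Tire Kingdom" does not). Whitespace is normalized to single
    spaces by split()/join()."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        core = tok.rstrip(".,;:!?")
        tail = tok[len(core):]  # trailing punctuation, kept as-is
        key = core.lower()
        if key in table:
            at_start = i == 0 or tokens[i - 1].endswith(SENTENCE_END)
            next_cap = i + 1 < len(tokens) and is_capitalized(tokens[i + 1])
            if core.islower():
                core = table[key]
            elif is_capitalized(core) and at_start and not next_cap:
                core = table[key].capitalize()
        out.append(core + tail)
    return " ".join(out)
```

With this rule, "Tire Kingdom sells tires." becomes "Tire Kingdom sells tyres.", and "Canadian Tire" is left alone because Tire does not start a sentence there.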

In conclusion, out of 100 articles, I only saw one instance where my software design would have made a mistake: "tyre (tire)." Regardless of the user's dialect settings, this would be simple to spot and correct. Readers will see "tyre (tyre)" or "tire (tire)", realize that it's an old article, and remove the (tyre)/(tire) part. A tag isn't even required in this case.

I also did a quick check on tires before posting this and the results were similar. I only checked the first 50, but I only found one problem phrase: "tires ('tyres' in the UK)."

So there is some data that supports my belief that simple software rules will work almost all of the time and that the few mistakes that slip through will be easily spotted and corrected with a single tag.

Can anyone find a word that is commonly used in WP in situations that would be erroneous depending on the dialect? Remember, it doesn't count if...

  • it has special formatting
  • it is part of a quote
  • it meets the criteria I mentioned above for words at the beginning of a sentence.

...because my software design would skip such words.

I get the impression that there are a lot of closed minds involved in this debate, but I always try to keep an open mind. So if anyone can find a major flaw in my idea that cannot be easily remedied with simple software I will be the first to abandon it. Jared Grainger 01:07, 9 January 2006 (UTC)

I'm mildly in favour of automation, but only as an optional editing tool for new material, and in a MABIO framework. In other words, as I think you said, when one is editing we can have a checkbox saying "scan the diff for variant words and mark them for me if they don't have special formatting", and in the preview they could be shown marked, or something. So if I write "Joe did some labour for the Labour Party", and I choose to check the checkbox (i.e. nothing automatic is happening behind my back, which I would be against), then the preview would show "Joe did some {en:labour} for the Labour Party". On the other hand, if you write "Labor has been underpaid." and use the tool, the tool mis-identifies this occurrence as an invariant. At that point, the only option for the editor is to explicitly type 7 more characters: "{en:Labor} has been underpaid." Even so, frankly I'm not convinced that saving 7 typed characters per occurrence is worth the effort, or that it deserves to clutter the edit page with a third checkbox. PizzaMargherita 07:56, 9 January 2006 (UTC)
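A rough sketch of the optional, editor-invoked tagging step described here, plus the per-locale rendering of {{en:...}} templates from the proposal (written with double braces, as the proposal specifies). The word list and function names are hypothetical; only lowercase, unformatted words are tagged, so an occurrence like "Labor" at the start of a sentence still has to be tagged by hand, as noted above.

```python
import re

# Hypothetical (UK, US) pairs; the proposal never fixed an actual template list.
PAIRS = [("labour", "labor"), ("humour", "humor"), ("colour", "color")]
TO_UK = {us: uk for uk, us in PAIRS}
TO_US = {uk: us for uk, us in PAIRS}
VARIANT_WORDS = set(TO_UK) | set(TO_US)

# Existing {{en:...}} templates are matched first so they are never re-wrapped.
TOKEN = re.compile(r"\{\{en:[A-Za-z]+\}\}|[A-Za-z]+")
TEMPLATE = re.compile(r"\{\{en:([A-Za-z]+)\}\}")


def tag_variants(text):
    """Run only when the editor ticks the checkbox: wrap lowercase variant
    words in {{en:...}}. Capitalized words ("Labour Party") are left alone."""
    def wrap(m):
        tok = m.group(0)
        return "{{en:%s}}" % tok if tok in VARIANT_WORDS else tok
    return TOKEN.sub(wrap, text)


def render(text, locale):
    """Expand each {{en:word}} template into the reader's preferred spelling,
    keeping an initial capital if the editor typed one."""
    table = TO_US if locale == "US" else TO_UK

    def expand(m):
        word = m.group(1)
        key = word.lower()
        spelled = table.get(key, key)
        return spelled.capitalize() if word[0].isupper() else spelled
    return TEMPLATE.sub(expand, text)
```

Here `tag_variants("Joe did some labour for the Labour Party")` gives "Joe did some {{en:labour}} for the Labour Party", which a US reader sees as "Joe did some labor for the Labour Party"; a hand-typed "{{en:Labor}}" renders as "Labour" for a UK reader.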
First of all, you apparently aren't reading what I write, because I wrote a whole paragraph explaining why capitalized words at the beginning of a sentence are easy to fix in nearly every case.
Secondly, you're talking about automation but only in your framework (MABIO) from your POV and then you start disparaging it, leading to reader confusion about my method (MOIO).
Thirdly, "clutter the edit page with a third checkbox???" I'm sorry, but I find this ridiculous. I offered this idea as a compromise of sorts because I believe your view has some merit, even though I find it inferior to mine.
BTW, thanks for taking care of all the chores of moving stuff around and reorganizing. Jared Grainger 20:45, 9 January 2006 (UTC)


  • "the vast majority of the time"
  • "True, a few errors will crop up"
  • "out of 100 articles . . . I only saw one instance"
  • "I only found one problem phrase"
  • "will work almost all of the time"
That just doesn't seem good enough to me. The software you envision will work right, say, 99.9% of the time. You also say that it "ignores anything 'special,'", so in some percentage of cases it will not change a dialectic word when it ought to (to name one example, when quotation marks indicate a translation instead of a quotation). So the software would change the dialect of articles, with some failures and some false positives. Plus some articles will be only partly tagged for dialect. Plus others will be untagged. Plus wikitext gets peppered with a new template, and to set these correctly editors have to be familiar with several English dialects and understand the complex logic by which the template fails.
Wikipedia's English gets less consistent. Editing plain text becomes more complex. Editors get to argue about what is correct U.S. English, and U.K. English, and Canadian English, and Indian English . . . What benefit do we get for this cost?
Let me put this another way: I'm Canadian. For every Canadian editor dialectifying articles, there are probably ten writing them. For every Canadian editor writing, there are probably twenty writing articles using other English dialects. The dialectifier can tag maybe 0.01% of all articles? Cost of added dialect tags in running text: more than zero. Benefit to me: practically zero. Resulting cost/benefit ratio of this scheme: practically infinite.
Or another way: you say your software idea will almost never change the sense of a sentence. That's not good enough. I'd rather read an article which is simply written in New Zealand English, than one that's 99.99% reliably machine-converted to Canadian English. I don't want the server changing the sense of a sentence or paragraph ever!
Or another: can somebody point to an example of such software that works well? Michael Z. 2006-01-9 08:03 Z



  • "the vast majority of the time"
  • "True, a few errors will crop up"
  • "out of 100 articles . . . I only saw one instance"
  • "I only found one problem phrase"
  • "will work almost all of the time"
That just doesn't seem good enough to me. The software you envision will work right, say, 99.9% of the time. You also say that it "ignores anything 'special,'", so in some percentage of cases it will not change a dialectic word when it ought to (to name one example, when quotation marks indicate a translation instead of a quotation). So the software would change the dialect of articles, with some failures and some false positives. Plus some articles will be only partly tagged for dialect. Plus others will be untagged. Plus wikitext gets peppered with a new template, and to set these correctly editors have to be familiar with several English dialects and understand the complex logic by which the template fails.


I think maybe you're confusing the two methods. My method (christened MOIO by Pizza) doesn't use much in the way of tags or templates except when necessary to force a word to be ignored by the automatic conversion. (Internal "tags" can be used for processing efficiency, but this is completely transparent to the editors/readers.) In my research, as shown above, NO tags would have been required because the only "mistake" it would have made was when someone was attempting to show the difference in dialects (e.g. someone wrote "tyre (tire)"). Such phrases are obsolete with dialect conversion and can be removed. Any kind of tags or special editing considerations would be very rarely necessary in my design. Jared Grainger 20:07, 9 January 2006 (UTC)


Wikipedia's English gets less consistent. Editing plain text becomes more complex. Editors get to argue about what is correct U.S. English, and U.K. English, and Canadian English, and Indian English . . . What benefit do we get for this cost?
Well, I disagree that consistency would decrease and that editing plain text would become more complex. With transparent conversion the user writes plain text (simple) and the reader sees it in his dialect (consistent, unlike now, where there are many different dialects and sometimes mixtures). In the event of a dialectal error, which should be rare in my estimation, someone adds a tag to force the word to be ignored by the dialect engine. I'm sure you'll disagree, but I can't help that.
However, I don't understand why you think this would cause more arguments about correct forms of English. Could you please explain, preferably with an example of some sort?
What benefit do we get for this cost?
Well, here are the benefits as I perceive them:
  • Almost a complete end to conflicts over spelling
  • Consistent spelling over all of WP, not just consistent within an article.
  • More comfortable reading for users
  • Articles look more professional (some people think dialects of other countries are "crude").
  • People are less likely to be "turned off" by "misspelled" words
  • The ability to view an article in a different spelling dialect for fun or educational purposes. I can learn about many different spelling dialects just by switching the page. Maybe I'm planning a trip or moving to another country and want to get used to alternate spellings.
And the disadvantages:
  • Additional resources required
  • Errors will be introduced into articles
All other disadvantages (complexity, etc.) stem from these errors. If the number of errors is small enough, and they can be handled easily enough, then I believe the advantages outweigh the disadvantages. Jared Grainger 20:07, 9 January 2006 (UTC)


Let me put this another way: I'm Canadian. For every Canadian editor dialectifying articles, there are probably ten writing them. For every Canadian editor writing, there are probably twenty writing articles using other English dialects. The dialectifier can tag maybe 0.01% of all articles? Cost of added dialect tags in running text: more than zero. Benefit to me: practically zero. Resulting cost/benefit ratio of this scheme: practically infinite.
Or another way: you say your software idea will almost never change the sense of a sentence. That's not good enough. I'd rather read an article which is simply written in New Zealand English, than one that's 99.99% reliably machine-converted to Canadian English. I don't want the server changing the sense of a sentence or paragraph ever!
Okay, that's fair enough. Clearly our opinions are irreconcilable. I believe that consistency, reducing conflicts over spelling dialects and making reading more comfortable is worth the possibility of a few errors.
I would like to point out that spelling variations rarely affect the meaning, and even when they do, it's almost inconceivable that the meaning would really be changed. The result might be technically incorrect, but should still be understandable. I believe that far more errors in grammar, spelling, and meaning are produced by humans in the course of writing an article than would be produced by transparent conversion of spelling dialects. But again, that's just my opinion and obviously you disagree.
But here's something different: I don't know of any words that are likely to cause serious problems for my software design, but what do you think about a limited scope version for "very safe" words? I highly doubt there would be any problems with words like capitalisation and capitalization. Would this meet your expectations of 100% accuracy? Jared Grainger 20:07, 9 January 2006 (UTC)


Or another: can somebody point to an example of such software that works well? Michael Z. 2006-01-9 08:03 Z
Would that change your mind? There may be programs out there or a simple prototype program could be written. Would you be in favour of transparent conversions if a demo that "works well" could be found/produced? What about you, Pizza? Jared Grainger 20:07, 9 January 2006 (UTC)


I'm increasingly of the idea that any attempt to automate tagging (be it with robots or MOIO) is not only risky, but it's also missing the nature of the problems we are trying to solve. As I see it, the overall goal is not to make WP spelling consistent overnight. The goal is to set a point of equilibrium to which everybody can help converge (if it makes any sense at all to speak of equilibrium in WP), at the same time settling all dialect disputes instantly. PizzaMargherita 12:08, 9 January 2006 (UTC)
I already discussed the advantages and disadvantages of the two methods above so I won't reiterate them here. Your idea isn't bad, and it would definitely help somewhat, but I don't think it's the best solution and it could backfire.
You aim for gradual acceptance, but what if your MABIO tags are introduced and subsequently become unpopular? A year later they might even become "deprecated" and discouraged from use, only provided for backwards compatibility.
With my method, the articles don't even need to be "unrolled" if MOIO falls into disfavour; only the article rendering software needs to be disabled because all spelling conversions are done automatically and the original text with the original spelling is still there! Only the rare invariant tags would remain in a few articles. Jared Grainger 20:07, 9 January 2006 (UTC)
When discussing efficiency, which I think would be another problem (real or perceived) for MOIO, you suggested:
The MOIO method would be somewhat inefficient if improperly implemented (i.e. scanning every word every time the article is viewed), but all that needs to be done is to scan the text when it is saved and mark all the dialect words.
Now, this suggestion is what I would call computer-aided MABIO, which I don't oppose. On the other hand, I agree with Michael that any demonstrably fallible automation (which pure MOIO, your research tells us, clearly is) should not happen, or at least not unbeknownst to the editor - hence the optional checkbox.
As I explained when replying to Michael, the possibility of adding "tags" for efficiency is transparent and completely internal so that not every word needs to be checked when converting dialects purely for efficiency reasons. No human would ever see those "tags."
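The save-time scan discussed here might be organized like this: the stored text is never modified, the offsets of variant words are computed once per save, and each page view only substitutes at those offsets. The SavedRevision record, the word table and the function names are assumptions for illustration (MediaWiki has no such feature), and the skip rules for quotes, italics and capitalized words are omitted to keep the sketch short.

```python
import re
from dataclasses import dataclass, field

# Hypothetical UK-to-US table; the reverse table is derived from it.
TO_US = {"colour": "color", "tyre": "tire", "humour": "humor"}
TO_UK = {us: uk for uk, us in TO_US.items()}
LOWER_WORD = re.compile(r"\b[a-z]+\b")


@dataclass
class SavedRevision:
    text: str                                   # original wikitext, never altered
    spans: list = field(default_factory=list)   # (start, end) of each variant word


def save(text):
    """Scan once, at save time, and record where the variant words are."""
    spans = [m.span() for m in LOWER_WORD.finditer(text)
             if m.group(0) in TO_US or m.group(0) in TO_UK]
    return SavedRevision(text, spans)


def view(rev, locale):
    """Build the reader's copy by substituting only the recorded spans."""
    table = TO_US if locale == "US" else TO_UK
    out, pos = [], 0
    for start, end in rev.spans:
        word = rev.text[start:end]
        out.append(rev.text[pos:start])
        out.append(table.get(word, word))
        pos = end
    out.append(rev.text[pos:])
    return "".join(out)
```

Because `view` only reads `rev.text`, switching the feature off amounts to serving the stored text directly, which matches the point made above that the original spelling is always preserved.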


As for either implementation "falling into disfavour", MOIO would be affected in the same way as MABIO, because as you say you would be left with invariant tags. For both implementations, however, the "problem" can be immediately resolved with robots, because the wikitext at that point is tagged, and there can be no replacement errors. PizzaMargherita 20:39, 9 January 2006 (UTC)
You are correct.

This brings to mind another advantage of my method: it can be easily deployed on a "trial basis" covering as few or as many articles as desired and just as quickly unrolled.

Imagine this scenario: one day a new item appears in everyone's preferences allowing them to test the new "transparent spelling dialect conversion" feature, along with a wiki page explaining it. People who don't want to use it don't have to change their preferences, as it will be disabled by default and unregistered users won't even have the option. Those people still see articles in the dialect in which they were originally written. Comments are gathered over time and the community gives its feedback on the new system.

The beautiful part is that it is all done in software and doesn't affect the original articles. Rolling it back would be instant (just disable the option in users' preferences) and no work would be lost, because people didn't have to manually tag articles in the first place, as with the MABIO method. OK, the effort required to create a handful of invariant-word tags would be wasted, but that's nothing compared to losing the thousands of tags on every dialect word that the MABIO method requires if it needs to be rolled back.

How can you deny that this is a good idea? Especially when you add in all the other advantages/disadvantages I wrote about earlier. These points can be added to my list:

MOIO

  • Disabled by default
  • Covers every article automatically
  • More consensus/community friendly. Everyone has a chance to try it and comment about it
  • Passive: people can continue to edit as normal without participating in the system.


MABIO

  • No way to let users "try before they buy" -- tags must be implemented first and many articles must be changed before people have a chance to sample the system.
  • Active participation required on the part of all editors or the results get in a muddle.
  • If it fails, everyone who spent their time tagging dialect words will feel cheated.

If after reading the above, the two other people involved in this discussion continue to insist that "provably fallible software is bad regardless of the success rate" and "we need to gradually reach equilibrium" and no one else joins in the discussion then I will give up. Jared Grainger 21:38, 9 January 2006 (UTC)

Wikipedia rule?

This is in the Manual of Style… so what’s the rule for deciding which spellings to use?

The guidelines are here. Also see the "Tagging Pages" proposal above. PizzaMargherita 10:02, 23 December 2005 (UTC)

IANA language tag

Can this be defined on the project page? Maurreen 07:40, 19 Feb 2005 (UTC)

See [1] and IANA. Language tags are used frequently in HTML files. "en" is the tag for English (all varieties).
Used in en.wikipedia.org, for example. Nobbie 10:41, 19 Feb 2005 (UTC)

U.S.

I'm not sure "the official standard of the U.S. administration" is the best phrasing. How about "used by the U.S. government"? Maurreen 07:42, 19 Feb 2005 (UTC)

Name of article

This should not be a subpage of the Manual of Style, as it is just guidance on what spellings are predominant in different territories. In fact, I tend to think it should be in the main article namespace, rather than the Wikipedia namespace. I can't think of a snappy name though: Comparison of spellings in different forms of standard English does not really have a ring to it. Kind regards, jguk 12:58, 19 Feb 2005 (UTC)

I'll suggest this again, as it was never intended that this page should become an opportunity to get into nationalistic arguments on spelling. I propose moving this page to Comparison of spellings in different forms of standard English in the main namespace. Any comments? jguk 20:38, 23 Mar 2005 (UTC)
That's ok with me. I created this page and put it into the Wikipedia namespace because I was "inspired" by the discussion on the talk page. The name you propose is... um. I can't think of a better name, either. Nobbie 16:41, 25 Mar 2005 (UTC)

Australian spelling

Not sure where the creator of the spelling chart got his information for standard Australian spelling. I have fixed the Australian spellings for English words. – AxSkov 08:54, 24 Feb 2005 (UTC)

Other things

Shouldn't this page have

  • whether countries call the little dot at the end of sentences a period or a full stop?
  • the names for the letter Z (zee, zed)?
  • whether people say "spelled" or "spelt" (and similar differences)?
  • whether the past participle for get is got and/or gotten (both are used in American English)?
  • the date orders used by countries (such as March 23, 2005 or 23 March 2005 for the date I posted this)?
  • whether groups are referred to as singular or plural entities ("The company is having its anniversary party" or "The company are having their anniversary party")?

This page could use to be more thorough. The American and British English differences page would be a good template, but I'd like a page comparing more than two dialects. --/ɛvɪs/ 20:09, Mar 23, 2005 (UTC)

Aluminium/Aluminum

Just a clarification here... do the normal British/U.S. English rules apply in the case of aluminium vs. aluminum, considering that aluminium is the internationally recognised spelling preferred by the IUPAC?

I ask because there is a minor dispute concerning this on the PowerBook page, for a product made by a company founded in the U.S., with an international market (Apple). I can see why one spelling must be arbitrarily selected where it is explicitly British vs U.S. spellings, but this is an example where there is a more "neutral" spelling. Any comments? StuartH 06:00, 28 July 2005 (UTC)

Perhaps like sulfur. — Instantnood 17:29, August 22, 2005 (UTC)

Spell article titles according to most google hits?

Is there a policy for US/British spelling of article titles when one spelling yields many more hits than the other spelling? (This is regarding a discussion at Talk:Behavioural genetics; the US spelling (behavioral) has 5.5 times the number of google hits) --Nectar T 08:09, 11 September 2005 (UTC)

Wikipedia policy is explicit in allowing all forms of standard English - that is, British spellings are permissible. The corollary of this is that we do not use the number of google hits (alone) to decide these things, as google always prefers American spellings and American usage over other standard forms of English, jguk 20:30, 13 September 2005 (UTC)
One problem with that is that the 5.5 times as many internet users searching for the dominant spelling will not come across the Wikipedia article. That is, for every 100 people searching for this topic, fewer than 20 would find the Wikipedia article.--Nectar T 08:03, 4 October 2005 (UTC)
Wikipedia is not for the exclusive use of US citizens, so why pass a rule to exclude the rest of the world? Redirects will show whatever the other spelling is; this is a non-problem. As a professional scientist working in this area I'm used to searching in an intelligent way, and I'd guess that half the scientific literature uses the English spelling. If we were to pass a decree that US spelling was the official spelling of Wikipedia, then the rest of the world would be given the finger. I don't like that. Pete.Hurd 19:12, 4 October 2005 (UTC)

Where a word can be spelt in different ways - or where different words are used for the same thing (courgette (wikipedia entry 12th on google.com) v zucchini (wikipedia entry 7th on google.com), etc.) - then WP policy says we make use of a redirect, which helps search engines find things anyway, jguk 19:41, 4 October 2005 (UTC)

These are the right arguments, but in this case the spelling not used doesn't show up within Google's top 110 entries (I stopped there). A page being invisible to 80% of google searchers is a practical problem, as is a page being invisible to 20% of google searchers. (These numbers assume the groups using different spellings create web pages at the same rate). I wonder if some day WP software can be made to host the same page with titles in both dialects etc. (Google gives weight to a page's title).--Nectar T 03:42, 5 October 2005 (UTC)

Wikipedia can't expect to be top of the list for every single article - and our article on behavioural genetics is only three paragraphs long, so that many of the higher-rated articles on google are far more illuminating on the subject than the wikipedia article - so I'm not concerned about the google-rating on this one. Hopefully the WP article will improve over time, at which point its google rating will probably improve greatly too, jguk 19:32, 5 October 2005 (UTC)

Plants/Animals - American or International English?

The article for rose has had a few reverts of the spelling for color/colour. The discussion for that particular issue is here: Talk:Rose#American_vs._British_spelling, but it is a wider issue from what I can see. The only guidelines that I have found are this page and Wikipedia:Manual_of_Style#National_varieties_of_English. Neither really covers which is the preferred spelling for a fairly neutral topic like plants. Iris (plant), Poinsettia, and Rhododendron use British English, but Azalea, Orchid, and Tulip use American English. There isn't much rhyme or reason and the lack of a clear guideline is a drag. Please comment. Cacophony 03:21, 23 September 2005 (UTC)


Let me suggest that sticking to the rule of following the original article's spelling convention is a lot easier than thousands of parallel debates about which alternative is more logical for each of many many many different topics across wikipedia. If I were emperor I'd make a different rule, but there are good reasons that I won't become emperor. Pete.Hurd 04:13, 23 September 2005 (UTC)