Wikipedia:WikiProject Punctuation: Difference between revisions

Edit by Shell Kinney. Edit summary: Dump files to process: completed 351000-353000

*[http://www-2.cs.cmu.edu/~tom7/periodbot/349000-351000.html 349000-351000] - [[User:Cjpuffin|Cjpuffin]] working
*[http://www-2.cs.cmu.edu/~tom7/periodbot/355000-357000.html 355000-357000] - [[User:NatureBoy|NatureBoy]] working
*[http://www-2.cs.cmu.edu/~tom7/periodbot/357000-359000.html 357000-359000]
*[http://www-2.cs.cmu.edu/~tom7/periodbot/361000-363000.html 361000-363000]

Revision as of 18:29, 28 June 2005

Project Punctuation is a project to fix missing punctuation in Wikipedia articles.

Goals

This project exists to correct common typographic and grammatical errors in Wikipedia. The errors are discovered automatically by software crawling offline dumps. Because these errors are difficult to recognize and correct automatically, potential errors are collected together into lists that are processed manually by volunteers. Fortunately, this task is very easy for humans.

Project status

As of June 2005, a first analysis is complete and manual processing is underway (about 46% complete as of 23 June 2005). This analysis detects paragraphs of text that lack punctuation at the end.

How to help

The output of the analysis is written to a series of dump files, which, because of their size, are not stored on Wikipedia. To help, choose one of the dump files from the list below. Go through the entire dump file and fix all of the articles that need help. When you're done, edit this page to remove that file from the list, so that nobody duplicates the work.

To help people find this project, consider using an edit summary like the following: missing period ([[Wikipedia:WikiProject Punctuation|You can help!]])

Fixing articles

The dump files appear as a series of article titles (which are links to their Wikipedia pages), each followed by a list of paragraphs that may have problems. These are found by a computer program, so not every article or paragraph that appears needs to be fixed. Also, because the dumps are based on the last downloadable version of Wikipedia (currently 16 May 2005), someone may have fixed the mistakes independently, or the article may no longer exist. However, the rate of actual errors is very high, so there is plenty to do.

Here's what to fix:

  • If there is a paragraph of English text that does not end in punctuation, add it. Exceptions:
  • Items in a list. Most are filtered out, but some are formatted using paragraphs and markup; these are usually easy to spot because many list items appear together in the dump
  • Paragraphs that end with See also: [link] — it is standard Wikipedia practice to omit a period for "see also", "main article" and similar (many are filtered out)
  • Paragraphs that end with parenthetical citations or links (but the previous sentence should end with punctuation!)
  • Quotations, with an attribution
Example: "Brevity is the soul of wit." — William Shakespeare
  • Paragraphs that end in a parenthetical remark, with internal punctuation. This is bad style, but we are only attempting to fix incontrovertible mistakes (usually filtered out automatically)
Example: The king commanded him to leave (but he didn't.)
  • Paragraphs that end in an abbreviation or word that contains a period. This is also bad style, but not incontrovertibly a mistake
Example: Many English Wikipedia editors come from the [[U.S.]]
  • However, you should fix a period that has been wrongly included inside a [[ ]] link
Example: Links to articles that don't exist will be [[red.]]
Example: The link to an article should not include a [[punctuation|period.]]
  • Sentences that have not actually ended, because they are followed by a list, equation, table, or image (many of these are filtered out)
  • Sentences that are incomplete. If it's not obvious how to repair them, you can leave a message on the Talk page asking for help
Example: In 1926, the president declared
  • Missing punctuation on text that is otherwise badly broken. If a page is marked for cleanup, you might just leave it for the person who ultimately cleans it up. If it's not marked, you should consider marking it with {{cleanup}}
Example: nascar is totaly sweet & its generally agreed the cars are very very fast
  • Sometimes you will also catch stray characters that have been inserted, or broken Wiki markup. You should fix these too.

Generally, if you're not sure whether something is a mistake, or whether it falls within the scope of this project, don't fix it. There are plenty of real mistakes to tackle first.
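
For anyone suggesting filter patterns, the rules above can be roughly sketched in code. This is not periodbot's actual implementation, which is not shown on this page; the function name needs_punctuation, the character sets, and the regular expression are all assumptions chosen for illustration. It only covers a few of the exceptions (list items, "see also" lines, trailing brackets and quotes), so human review is still needed.

```python
import re

# Assumed set of characters that count as closing punctuation.
TERMINAL = ".!?:;"
# Assumed closing quotes/brackets that may follow the final punctuation,
# e.g. the "]]" in [[U.S.]] or the ")" in (but he didn't.)
CLOSERS = "\"')]"
# Wiki markup prefixes that suggest a list item, heading, table or template.
MARKUP_PREFIXES = "*#:;=|{"
# Paragraphs ending in "See also: [[link]]" or "Main article: [[link]]".
SEE_ALSO = re.compile(
    r"(see also|main article)s?\s*:?\s*\[\[[^\]]+\]\]$", re.IGNORECASE
)


def needs_punctuation(paragraph: str) -> bool:
    """Return True if the paragraph looks like it is missing final punctuation."""
    text = paragraph.strip()
    if not text:
        return False
    # Skip list items and other markup lines.
    if text[0] in MARKUP_PREFIXES:
        return False
    # Skip "see also" / "main article" lines, which conventionally omit a period.
    if SEE_ALSO.search(text):
        return False
    # Ignore trailing quotes and brackets before checking the last character.
    core = text.rstrip(CLOSERS)
    if not core:
        return False
    return core[-1] not in TERMINAL


if __name__ == "__main__":
    samples = [
        "In 1926, the president declared",
        "The king commanded him to leave (but he didn't.)",
        "Many English Wikipedia editors come from the [[U.S.]]",
        "See also: [[Punctuation]]",
    ]
    for sample in samples:
        print(needs_punctuation(sample), "-", sample)
```

A check like this cannot tell a bad-style ending from a real mistake, and it would still flag a quotation followed by an attribution, which is why flagged paragraphs are collected for volunteers instead of being corrected automatically.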

Dump files to process

These are the files. The numbers are article ID ranges (Wikipedia has about 650,000 of them), and each dump file contains a varying number of potential errors (roughly 40 hits, about half of which will be real errors). To show others that you are working through a dump file, edit this page and add - ~~~ working to its row. Don't forget to remove the link here once you have processed the file!

Status:

Complete: #0–238,999. (11:39, 18 Jun 2005 (UTC))
Active: #239,000–369,000
Remaining: #369,001–650,000 (to be posted when we finish the current batch)
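
As a rough illustration of how the ID ranges map to dump files, the sketch below builds file links for a range. It assumes that every file covers 2,000 article IDs and follows the naming pattern of the links shown on this page (for example 349000-351000.html); both are inferred from those links, not confirmed, and the function name dump_file_urls is hypothetical.

```python
# Base address taken from the dump-file links on this page.
BASE_URL = "http://www-2.cs.cmu.edu/~tom7/periodbot/"
# Assumed width of each dump file, inferred from names like 349000-351000.
STEP = 2000


def dump_file_urls(start: int, end: int, step: int = STEP) -> list[str]:
    """List the assumed dump-file URLs covering article IDs from start to end."""
    urls = []
    for low in range(start, end, step):
        high = min(low + step, end)  # the last file may be shorter
        urls.append(f"{BASE_URL}{low}-{high}.html")
    return urls


if __name__ == "__main__":
    # The active batch listed above: #239,000 to #369,000.
    active = dump_file_urls(239000, 369000)
    print(len(active), "files in the active batch")
    print(active[0])
```

Under these assumptions the active batch corresponds to 65 files; files already processed and removed from the list would still appear in this output.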


Other ways to help

Missing punctuation continues to be a problem in new articles, so I will continue to run periodbot occasionally. Suggestions for patterns to automatically filter out are appreciated, since we don't currently do anything to avoid seeing false positives again and again.

I am also interested in other analyses to run.

Participants

Feel free to add your name (use ~~~) to this list if you have helped process the dump files.

Similar WikiProjects