User:Citation bot: Difference between revisions
==External links==
* [http://code.google.com/p/citation-bot/ Source code repository]

[[Category:Wikipedia bots]]
Revision as of 00:36, 23 March 2010
This user account is a bot operated by Smith609 (talk). It is used to make repetitive automated or semi-automated edits that would be extremely tedious to do manually, in accordance with the bot policy. Administrators: if this bot is malfunctioning or causing harm, please block it.
User interaction
Activate: Find out how you can use Citation bot on your own pages here.
Bugs: Please report any bugs, ideas or suggestions at Google Code (or, failing that, here). You can check out the bot's source code from its Subversion repository.
Emergency shutoff
Administrators: Click here to learn how to block this bot with minimal disruption. Non-administrators can report misbehaving bots to Wikipedia:Administrators' noticeboard/Incidents.
Function summary
This bot was originally designed to add digital object identifiers (DOIs) to references; it now does much more, adding PubMed identifiers (PMIDs) and ISBNs and fixing common formatting errors. Ideas for new functions are welcome.
The bot periodically works through every page on Wikipedia that uses citation templates. Statistics on its progress are available at http://toolserver.org/~verisimilus/Bot/DOI_bot/progress-doibot.php?date=20090101; remove the 'date' argument to see its progress over all time. Note that the bot only operates automatically when there are no outstanding bugs; automatic (or operator-supervised) edits are prefixed by [Pu##] here.
Stopping the bot from editing
- To prevent the Citation bot from editing a page, include the text
{{bots|deny=Citation bot}}
anywhere on the page. Please also leave a note here explaining why this has become necessary, so that the problem can be resolved!
- If the bot is erroneously adding a DOI, author, etc. to a citation and you want to stop it from adding the data again, put a comment in place of the appropriate parameter: the bot will not overwrite existing data. So use something along the lines of
|doi = <!-- this comment stops Citation bot adding the wrong DOI here-->
or words to that effect. Again, it may be possible for me to fix the underlying problem if you let me know about it, but in a few rare instances (such as false positives and editor preference) an automatic fix is impossible.
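Both mechanisms above can be sketched in Python. This is a simplified illustration, not Citation bot's PHP code: the function names `bot_may_edit` and `fill_missing` are hypothetical. The exclusion check understands only `{{nobots}}` and the `deny=` form of `{{bots}}`, and ignores `allow=`. A template is assumed to have already been parsed into a dict of parameter names and values.

```python
import re

# Simplified match for {{nobots}} and {{bots|...}}; not the bot's own parser.
BOTS_TEMPLATE = re.compile(r"\{\{\s*(nobots|bots)\s*(\|[^}]*)?\}\}", re.IGNORECASE)


def bot_may_edit(wikitext: str, bot_name: str) -> bool:
    """Return False if the page excludes bot_name via {{nobots}} or {{bots|deny=...}}."""
    for match in BOTS_TEMPLATE.finditer(wikitext):
        name, args = match.group(1).lower(), match.group(2) or ""
        if name == "nobots":
            return False
        for arg in args.strip("|").split("|"):
            key, _, value = arg.partition("=")
            if key.strip().lower() == "deny":
                denied = {v.strip().lower() for v in value.split(",")}
                if "all" in denied or bot_name.lower() in denied:
                    return False
    return True


def fill_missing(params: dict, found: dict) -> dict:
    """Add looked-up values only where a parameter is absent or blank.

    A value holding an HTML comment is not blank, so it is left untouched.
    """
    result = dict(params)
    for key, value in found.items():
        if not result.get(key, "").strip():
            result[key] = value
    return result
```

Under this model, `|doi = <!-- ... -->` counts as existing data, so `fill_missing` never replaces it. That is the behaviour the comment trick relies on.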
False positives
If the bot is adding seemingly unrelated data to a citation, it is probably receiving a false positive from the citation databases it consults. Unfortunately, the bot has no way of knowing this, so there are two ways to avoid it:
- Change the citation template to one that the bot doesn't modify, such as cite web or cite news;
- Add a comment to one or more of the parameters. The bot will not override these comments, and they will reduce the chance of the citation databases returning false positives.
Capitalisation errors
See User:Citation_bot/capitalisation_exclusions.
- The bot also incorrectly changes PMID author initials from uppercase to title case when rendering {{cite pmid}}; e.g. Smith JQ becomes Smith Jq. This will soon be fixed.
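One way to avoid the initials bug is to title-case only the surname and leave the initials alone. This hypothetical sketch assumes PubMed-style names in the form "Surname INITIALS", with the initials as the last space-separated token. It is not the bot's actual fix, and surnames such as O'Brien or van der Berg would need special handling.

```python
def titlecase_author(name: str) -> str:
    """Title-case a 'Surname INITIALS' author string without touching the initials.

    Hypothetical helper: assumes the last space-separated token is the initials.
    """
    parts = name.split()
    if len(parts) < 2:
        return name.title()
    *surname, initials = parts
    # Capitalise each surname word, but keep the initials fully uppercase.
    return " ".join(p.capitalize() for p in surname) + " " + initials.upper()
```

With this approach, both `"SMITH JQ"` and `"Smith JQ"` come out as `"Smith JQ"`, not `"Smith Jq"`.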
Reading the edit summaries
To assist debugging, the bot's edit summaries begin with a code in [square brackets]. This code identifies how the bot was initiated (letters) and which revision of the code was used (number). When major development is underway, the publicly accessible interface to the bot may use an older version of the code that has been established to be bug-free.
- Pu - Initiated from the server. May be operating supervised or unsupervised.
- U - Initiated by a user
- Ax - {{Cite arXiv}} maintenance, activated when a blank template is detected
- C - {{cite doi}} family maintenance, activated when a blank template is detected
If a bug is marked as 'fixed in r50' and you notice it in an edit beginning [U40], there is no need to report it again: that edit ran revision 40, which predates the fix. If you see it in an edit starting [Pu60], however, please do report that it wasn't fixed as expected.
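The rule above can be expressed as a short check. This sketch assumes the code is always the first thing in the summary, in the form `[LettersDigits]`. The prefix descriptions are copied from the list above. The names `parse_summary` and `worth_reporting` are hypothetical.

```python
import re

# Prefixes and meanings as listed on this page.
INITIATORS = {
    "Pu": "initiated from the server",
    "U": "initiated by a user",
    "Ax": "{{Cite arXiv}} maintenance",
    "C": "{{cite doi}} family maintenance",
}

SUMMARY_CODE = re.compile(r"^\[([A-Za-z]+)(\d+)\]")


def parse_summary(summary: str) -> tuple[str, str, int]:
    """Split '[Pu60] ...' into ('Pu', 'initiated from the server', 60)."""
    match = SUMMARY_CODE.match(summary)
    if match is None:
        raise ValueError(f"no [code] prefix in {summary!r}")
    prefix, revision = match.group(1), int(match.group(2))
    return prefix, INITIATORS.get(prefix, "unknown initiator"), revision


def worth_reporting(summary: str, fixed_in: int) -> bool:
    """A bug is worth reporting only if the edit ran a revision that includes the fix."""
    _, _, revision = parse_summary(summary)
    return revision >= fixed_in
```

For a bug marked 'fixed in r50', `worth_reporting("[U40] ...", 50)` is False and `worth_reporting("[Pu60] ...", 50)` is True.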
Function
Automatic or Manually Assisted: Automatic
Programming Language(s): PHP w/ Snoopy & BasicBot
Function Summary: Maintains and expands citations; ensures that they comply with standards.
Edit period(s) (e.g. Continuous, daily, one time run): Will do a thorough job every few months; will be available to be used on specific articles whenever requested.
Edit rate requested: 6 edits per minute. In practice, querying other websites will be the rate-limiting step.
Function Details: Citation bot only amends the parameters of {{cite journal}}, {{cite book}}, {{cite arXiv}} and {{citation}}.
- Adds a DOI if missing
- Replaces "id=PMID #" and "id=DOI #" with "pmid=#" and "doi=#"
- Replaces "url=http://dx.doi.org/#" with "doi=#"
- Converts all parameter names (not values) to lowercase (they won't show up in the output otherwise), and replaces "authors" with "author" (a common typo)
- Removes "doilabel" parameters, which are now redundant
- Searches for all missing parameters (including URL), then adds them if available. This is especially convenient when only the PMID/arXiv/DOI is included within the template
- If a URL is already present, attempts to deduce its format (e.g. free full text, abstract, deadlink) and sets the format parameter accordingly
- If a URL is not present, follows the DOI link; if it can deduce that free access is available, sets the URL parameter to the landing page with a note on the format
- Where the {{cite doi}} template has been used, creates or expands the accompanying reference.
- Automatically expands the multi-use {{cite doi}} and {{cite pmid}} templates
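The parameter clean-ups in the list above ("id=PMID #", "url=http://dx.doi.org/#", lowercase names, "authors", "doilabel") can be sketched in Python. This is an illustration under stated assumptions, not the bot's PHP code. The function name `normalise_params` is hypothetical, parameters are assumed to be already parsed into a dict, and, as elsewhere on this page, existing non-blank values are never overwritten.

```python
import re

# Only the dx.doi.org resolver mentioned on this page is recognised.
DOI_URL = re.compile(r"^https?://dx\.doi\.org/(10\..+)$", re.IGNORECASE)
ID_FORM = re.compile(r"^\s*(PMID|DOI)\s*:?\s*(\S+)\s*$", re.IGNORECASE)


def normalise_params(params: dict[str, str]) -> dict[str, str]:
    """Apply the listed clean-ups to one citation template's parameters."""
    out: dict[str, str] = {}
    for name, value in params.items():
        key = name.strip().lower()      # names must be lowercase to render
        if key == "authors":            # common typo
            key = "author"
        if key == "doilabel":           # redundant, so dropped
            continue
        if key in out and out[key].strip():
            continue                    # keep the first non-blank value
        out[key] = value

    def move(target: str, new_value: str, old_key: str) -> None:
        # Move a value into `target` only if that parameter is empty.
        if not out.get(target, "").strip():
            out[target] = new_value
            del out[old_key]

    id_match = ID_FORM.match(out.get("id", ""))
    if id_match:                        # id=PMID # -> pmid=#, id=DOI # -> doi=#
        move(id_match.group(1).lower(), id_match.group(2), "id")

    url_match = DOI_URL.match(out.get("url", "").strip())
    if url_match:                       # url=http://dx.doi.org/# -> doi=#
        move("doi", url_match.group(1), "url")
    return out
```

If the target parameter is already filled, the original `id` or `url` is left in place, so no data is discarded.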
DOI location logic
- The bot uses a variety of methods to locate a DOI, in the order stated:
  - Search CrossRef for citation information based on the available citation details.
  - Use the url parameter:
    - Search for a DOI within the url parameter.
    - Check that the URL is active (not a 404 page containing an example DOI).
    - Check the metadata of the web page linked to by the url parameter for a DOI.
    - Scour the page source for a DOI.
  - If no URL is provided, use the Yahoo! API to search for "title" + authors.
    - With the retrieved URL:
      - Does the URL contain a DOI (e.g. http://example.com/view=article&id=10.1001/doi/ishere)? If so, does the page contain data telling us we've got the right title? The only sites I've seen with DOIs in the URL are BioOne and Blackwell Publishing; the former encodes the title in an invisible span.
      - Do the <meta> tags contain a dc.Identifier or citation_doi?
        - If so, check that the dc.title or citation_title matches the title we want.
      - Is there a DOI anywhere in the page?
        - Are there lots of DOIs?
          - Do any occur in association with the title? If there are any <br>, <p>, <li> or <td> tags between the title and a DOI, the DOI could refer to a different reference, and we'll have to ignore it.
        - Is there a unique DOI?
          - Does the DOI appear in the first 5000 characters of the document? If so, it is probably part of the document description. Any later, and it's more likely to be a reference.
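The page-level checks in the list above (DOI in the URL, `<meta>` tags, then a scan of the page source) can be sketched in Python. This is not the bot's code, and it makes several assumptions. The page HTML has already been fetched, and the CrossRef and Yahoo! steps are left out. The DOI regex is approximate. Titles are compared exactly, ignoring case. A meta title that disagrees is treated as a wrong page. `find_doi` and `MetaReader` are hypothetical names. The 5000-character threshold and the `<br>`, `<p>`, `<li>`, `<td>` rule come from the list.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Approximate DOI pattern (an assumption); it stops at quotes, tags and '&'.
DOI = re.compile(r"10\.\d{4,9}/[^\s\"'<>&]+")
# Tags that, between a title and a DOI, suggest the DOI belongs to another reference.
SEPARATORS = re.compile(r"<(?:br|p|li|td)\b", re.IGNORECASE)


class MetaReader(HTMLParser):
    """Collect <meta name=... content=...> values, keyed by lowercased name."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "meta" and values.get("name") and values.get("content"):
            self.meta[values["name"].lower()] = values["content"]


def find_doi(url: str, html: str, title: str) -> Optional[str]:
    """Apply the page checks listed above, in order; return a DOI or None."""
    wanted = title.strip().lower()
    page = html.lower()

    # 1. A DOI in the URL counts only if the page also mentions the title.
    in_url = DOI.search(url)
    if in_url and wanted in page:
        return in_url.group(0)

    # 2. dc.Identifier / citation_doi counts only if the meta title matches.
    reader = MetaReader()
    reader.feed(html)
    reader.close()
    meta_doi = DOI.search(reader.meta.get("citation_doi") or reader.meta.get("dc.identifier") or "")
    if meta_doi:
        meta_title = reader.meta.get("citation_title") or reader.meta.get("dc.title") or ""
        # Assumption: metadata naming a different work means the page is wrong.
        return meta_doi.group(0) if meta_title.strip().lower() == wanted else None

    # 3. Otherwise look for DOIs anywhere in the page source.
    matches = list(DOI.finditer(html))
    if not matches:
        return None
    if len({m.group(0) for m in matches}) == 1:
        # A unique DOI early in the page is probably part of its description.
        return matches[0].group(0) if matches[0].start() < 5000 else None
    title_at = page.find(wanted)
    if title_at == -1:
        return None
    for m in matches:
        start, end = sorted((title_at, m.start()))
        if not SEPARATORS.search(html, start, end):
            return m.group(0)  # no <br>, <p>, <li> or <td> between title and DOI
    return None
```

The example URL from the list, `http://example.com/view=article&id=10.1001/doi/ishere`, yields `10.1001/doi/ishere` under this pattern, provided the page mentions the wanted title.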
PMID location
- The bot also queries PubMed to search for PMID parameters.
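A PubMed lookup of this kind could be written against NCBI's E-utilities `esearch` endpoint. This is a present-day sketch and not necessarily how the bot queries PubMed. It assumes a title and, optionally, a first author in PubMed's "Smith J" form, and it uses the `[ti]` and `[au]` field tags. It also assumes, as a precaution, that a PMID should be accepted only when exactly one record matches. `pubmed_search_url` and `pmid_from_response` are hypothetical names, and the HTTP request itself is omitted.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(title: str, author: str = "") -> str:
    """Build an esearch query for a citation's title and, optionally, its first author."""
    term = f'"{title}"[ti]'            # [ti]: search the title field
    if author:
        term += f" AND {author}[au]"   # [au]: search the author field
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "json"})


def pmid_from_response(body: str) -> Optional[str]:
    """Accept a PMID only when the search matched exactly one record."""
    ids = json.loads(body)["esearchresult"]["idlist"]
    return ids[0] if len(ids) == 1 else None
```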
ISBN identification
- The bot also finds ISBNs and expands book information when it is provided with an ISBN.
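Before an ISBN is used to expand book information, a quick sanity check is to verify its check digit. The ISBN-10 and ISBN-13 rules below are standard. The function names are hypothetical, and this page does not say whether Citation bot performs such a check.

```python
def normalise_isbn(isbn: str) -> str:
    """Drop hyphens, spaces and an 'ISBN' prefix; keep digits and X."""
    return "".join(ch for ch in isbn.upper() if ch in "0123456789X")


def is_valid_isbn(isbn: str) -> bool:
    """Return True if the ISBN-10 or ISBN-13 check digit is correct."""
    s = normalise_isbn(isbn)
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10: weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(s) == 13 and s.isdigit():
        # ISBN-13: weights alternate 1, 3; the weighted sum must be divisible by 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0
    return False
```

For example, `978-0-306-40615-7` and `0-306-40615-2` both pass, while changing the final digit of either makes it fail.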
arXiv handling
- The bot expands {{cite arXiv}} templates that have an eprint parameter, and updates them to use {{cite journal}} where appropriate
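The conversion step could look roughly like this. This page does not define "where appropriate", so the rule used here is an assumption: switch to {{cite journal}} once a journal name is known. The sketch also assumes the eprint identifier moves to cite journal's `arxiv` parameter and that the arXiv-only `class` parameter is dropped. `upgrade_arxiv` is a hypothetical name.

```python
def upgrade_arxiv(template: str, params: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Switch {{cite arXiv}} to {{cite journal}} once a journal is known.

    Hypothetical rule: "where appropriate" is taken to mean that a journal
    name has been found for the preprint.
    """
    if template.strip().lower() != "cite arxiv" or not params.get("journal", "").strip():
        return template, params           # leave unpublished preprints alone
    # Drop arXiv-only parameters; keep everything else unchanged.
    new = {k: v for k, v in params.items() if k not in ("eprint", "class")}
    if params.get("eprint", "").strip():
        new["arxiv"] = params["eprint"]   # keep the preprint identifier
    return "cite journal", new
```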
Template uniformity
- Where a mixture of {{citation}} and {{cite x}} family templates is used in an article, the bot identifies them and modifies the article to use the dominant form.
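Identifying the dominant form amounts to counting the two template families. This sketch covers only that detection step, not the rewrite of the templates themselves. It uses a simplified regex that treats any `{{cite something}}` as part of the cite family and ignores unrelated templates such as `{{citation needed}}`. It returns None when the article is not mixed or when there is a tie, which is an assumption about how ties should be handled. `dominant_style` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Optional

# Matches {{citation|...}} and {{cite xxx|...}}; a simplified pattern.
TEMPLATE_NAME = re.compile(r"\{\{\s*(citation|cite[ _]\w+)\s*[|}]", re.IGNORECASE)


def dominant_style(wikitext: str) -> Optional[str]:
    """Return 'citation' or 'cite x' if an article mixes both, else None."""
    counts = Counter(
        "citation" if m.group(1).lower() == "citation" else "cite x"
        for m in TEMPLATE_NAME.finditer(wikitext)
    )
    if len(counts) < 2:
        return None                      # not mixed: nothing to do
    (top, n1), (_, n2) = counts.most_common(2)
    return top if n1 > n2 else None      # tie: no dominant form
```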
Bot approval
- Wikipedia:Bots/Requests_for_approval/DOI_bot
- Wikipedia:Bots/Requests_for_approval/DOI_bot_2
- Wikipedia:Bots/Requests_for_approval/DOI_bot_3
- Wikipedia:Bots/Requests_for_approval/Citation_bot_4
- Wikipedia:Bots/Requests_for_approval/Citation_bot_5