
User:Proteins/Writing scripts for Wikipedia: Difference between revisions

From Wikipedia, the free encyclopedia
Proteins (talk | contribs)

Revision as of 14:59, 30 October 2008

Scripts are amazing. They give you nearly unlimited power to analyze Wikipedia articles, to modify their appearance and even to add new elements. For example, you can count the number of polysyllabic words (analysis), color the words according to their number of syllables (modification) and create interactive dialogs for the reader (addition). In general, scripts do not affect the underlying article stored in the database, so different people can view the same article according to their own preferences by using different scripts.

Scripts are also not hard to write! You need to know some HTML tags, and you have to learn how browsers represent the HTML internally, in a so-called DOM tree. Once you've learned those things, and mastered a few commands in JavaScript, you can do anything. There's already a WikiProject devoted to scripts, but this essay was written in the hope that Wikipedians might appreciate a slightly simpler introduction to scripts. Please let me know if something is unclear or incorrect.

Wiki markup, HTML and the DOM tree

Wikipedia articles can be represented in three forms: as wiki-markup, as HTML and as a DOM tree. This section explains the difference, and how you can see and modify the same article in each of its forms.

A typical article on Wikipedia is written in standard wiki-markup. For example, bold-faced words are written as '''bold-faced words''', with three single quotes on each side. I believe this is the form of the article stored in the Wikipedia database. To see and modify the article in this form, you click on the "edit this page" tab at the top of the article.
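
To give a feel for how wiki-markup corresponds to HTML, here is a minimal sketch that converts the bold-face markup into HTML B tags. The function name boldToHtml is made up for this essay, and the sketch covers only this one rule; it is not how the MediaWiki software itself parses a page.

```javascript
// Hypothetical helper for illustration: handles only the '''bold''' rule.
function boldToHtml(wikiText) {
  // Match the shortest run of text between two triple-quote markers.
  return wikiText.replace(/'''(.+?)'''/g, "<b>$1</b>");
}

console.log(boldToHtml("These are '''bold-faced words''' in a sentence."));
// → These are <b>bold-faced words</b> in a sentence.
```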

When you request an article from the Wikipedia database, it is returned to you as HTML code. This HTML code is generated by the MediaWiki software from the underlying wiki-markup. You can see this HTML code by clicking on "View page source" or "View source" in your browser. Your browser takes the returned HTML code and renders it for you into the beautiful webpage you see before you. You can modify the HTML code directly using JavaScript, but it's difficult; it's much better and more customary to modify the rendering of the page by manipulating the DOM tree, which is the browser's internal representation of the HTML code.

Every browser converts the received HTML code into a DOM tree; the word "DOM" stands for "Document Object Model". It's a tree, meaning that it arranges the HTML elements into a hierarchy. You can view this tree as described in the next section, and you can modify this tree however you wish using JavaScript.
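
The hierarchy can also be seen from JavaScript itself. The following sketch walks a tree and returns an indented outline, one node per line. The function name outline is my own choice; the only things it relies on are the nodeName and childNodes properties that DOM nodes provide. On a real article the outline will be long, since it includes text nodes as well.

```javascript
// Return an indented outline of a DOM (sub)tree as an array of lines.
// Works on any object that has nodeName and childNodes, as DOM nodes do.
function outline(node, depth) {
  depth = depth || 0;
  var indent = new Array(depth + 1).join("  "); // two spaces per level
  var lines = [indent + node.nodeName];
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    lines = lines.concat(outline(children[i], depth + 1));
  }
  return lines;
}

// In a browser, print the outline of the whole page.
if (typeof document !== "undefined") {
  console.log(outline(document).join("\n"));
}
```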

How to view the DOM tree in your browser

Most browsers allow you to see the DOM tree, which is the browser's internal representation of the webpage. The following instructions should allow you to see it in different browsers:

  • In Firefox 3, the best approach is to download an add-on known as "DOM Inspector". Once added, it should appear under the "Tools" menu in the top bar of the browser, which is next to the "Bookmarks" menu. DOM Inspector can also be activated using the keyboard shortcut Ctrl-Shift-I.
  • In Google Chrome, right-clicking on any part of the page summons a menu. At the bottom of that menu is the choice "Inspect element", which shows the position of the element in the DOM tree.
  • In Internet Explorer 7, the Internet Explorer Developer Toolbar, a free download from Microsoft, is used to show the DOM tree. This toolbar can be found at the far right, behind the double arrows that are to the right of the "Tools" menu, which is itself to the right of the "Page" menu.
  • In Safari, click on the "Develop" menu and select the choice "Show Web Inspector". The Develop menu is located in the topmost menu bar, between the "Bookmarks" and "Window" menus. If the Develop menu is not there, click on the "Edit" menu and select its last element, "Preferences". A window will pop up, on which you choose the last tab, labeled "Advanced". At the bottom of the Advanced screen is a checkbox labeled "Show Develop menu in menu bar." Clicking this checkbox should introduce the Develop menu in the menu bar.
  • In Opera, the equivalent DOM inspector can be turned on by clicking on the "Tools" menu in the top menu bar (sandwiched between the "Widgets" and "Help" menus). Under the Tools menu, click on the "Advanced" submenu, and from the resulting sub-sub-menu, choose "Developer Tools". This should turn on an analysis system at the bottom of the screen, which incidentally can also be detached into a window of its own. Within this analysis window, clicking on the "DOM" tab should reveal the DOM tree. One drawback of this inspector seems to be that it does not reveal the changes in the DOM tree after your script has run. Instead, it reloads the webpage afresh, always showing the original unmodified DOM tree.

The DOM tree of typical Wikipedia pages

Inspecting the DOM tree of Wikipedia articles will reveal a common architecture. The main content of the article is contained inside a DIV element with the id label "bodyContent"; to reach this crucial node, however, you need to drill down a few levels. The bodyContent node is found under the "content" node, which in turn is under the "column-content" node, which in turn is under the "globalWrapper" node, which in turn is under the standard BODY node, which is under the HTML node, which is under the "document" node, the top of the DOM tree. Thus, to reach bodyContent, you need to follow this sequence of child-nodes (sometimes called a "trail" through the document, or an XPath):

document → HTML → BODY → globalWrapper → column-content → content → bodyContent
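
You can reconstruct such a trail for any node by climbing upward through its parentNode links. Here is a minimal sketch; the function names label and trailTo are hypothetical, and it assumes the page really has an element with the id "bodyContent", as described above. In a browser the top node reports its name as "#document".

```javascript
// Label a node by its id when it has one, otherwise by its node name.
function label(node) {
  return node.id ? node.id : node.nodeName;
}

// Climb from a node to the top of the tree, collecting labels on the way.
function trailTo(node) {
  var trail = [];
  for (var n = node; n; n = n.parentNode) {
    trail.unshift(label(n)); // parents go in front of their children
  }
  return trail.join(" → ");
}

// In a browser, show the trail leading to the article content.
if (typeof document !== "undefined") {
  var contentNode = document.getElementById("bodyContent");
  if (contentNode) {
    console.log(trailTo(contentNode));
  }
}
```

In practice, getElementById takes you straight to the node, so you rarely need to walk the trail by hand; the trail is mainly useful for understanding where a node sits.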

Why are so many levels necessary before getting to the main article? The MediaWiki software uses these other levels to add all the extra decorations found on the page. For example, the user commands along the upper edge at the right, such as your user name, your user talk page, your preferences, etc., are found under the "column-one" node, which is the sibling node of the "column-content" node. So are the tabs at the top of the page such as "article", "talk", "edit this page", etc., as well as the menus for navigation, search, interaction and toolbox in the left-hand column. By placing these in a separate node, they can be located and manipulated independently from the content.

Looking inside the bodyContent node using a DOM inspector reveals all the HTML code that makes up the article. For example, typical section headings are contained under H2 nodes, whereas successive levels of subsections are contained under H3, H4 and H5 nodes. Normal text is contained in paragraph nodes labeled "P". Unordered (that is, bullet-pointed) lists and ordered (that is, numbered) lists are contained under UL and OL nodes, respectively; individual items in both cases are contained under LI (list item) nodes. Indentation is represented by DL lists, in which the indented text is contained under a DD node. In some cases, a DL list is an actual definition list, one that has defined, boldfaced terms contained under DT nodes; these terms are generated using an initial semicolon in wiki-markup. Larger-scale groupings of HTML nodes can be made using DIV and SPAN tags.
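
To check which of these node types an article actually uses, you can tally the node names under bodyContent. The sketch below is one simple way to do it; countNodeNames is a name I made up, and the count includes the starting node itself as well as text nodes, which report their name as "#text".

```javascript
// Count how many nodes of each name occur in a (sub)tree,
// including the starting node itself.
function countNodeNames(node, counts) {
  counts = counts || {};
  counts[node.nodeName] = (counts[node.nodeName] || 0) + 1;
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    countNodeNames(children[i], counts);
  }
  return counts;
}

// In a browser, tally the nodes that make up the article, if present.
if (typeof document !== "undefined") {
  var articleNode = document.getElementById("bodyContent");
  if (articleNode) {
    console.log(countNodeNames(articleNode));
  }
}
```

For example, the number reported for H2 tells you how many top-level section headings the article has.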