Tag soup

Tag soup is HTML code written without regard for the rules of HTML structure and semantics (HTML is the markup language in which Web pages are written). It typically results when an author uses HTML to control how a document looks rather than to describe what its content means.

Tag soup is characterized by a large number of common authoring mistakes, such as malformed HTML tags, improperly nested HTML elements, unescaped special characters (especially ampersands (&) and less-than signs (<)), and the use of HTML elements and attributes to create visual effects without respect for their implied meaning (that is, against their semantic purpose; see divitis).
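As an illustration of the improperly nested elements mentioned above, the following is a minimal sketch that uses Python's standard html.parser module to flag closing tags that do not match the innermost open element. The names NestingChecker and find_nesting_problems are hypothetical and chosen only for this example. The check is deliberately stricter than HTML itself, which allows some end tags (such as </p> or </li>) to be omitted, so it should be read as a demonstration rather than as a validator.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records closing tags that do not match
    the innermost element that is still open."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.problems = []       # human-readable descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. <br/> is reported as a start tag plus an end tag
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
            return
        if self.open_elements:
            innermost = self.open_elements[-1]
            self.problems.append(
                f"</{tag}> found while <{innermost}> was the innermost open element"
            )
        else:
            self.problems.append(f"</{tag}> found while no element was open")
        # Recover by closing everything up to and including the matching element.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass


def find_nesting_problems(markup):
    """Return a list of nesting problems found in the given markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    for problem in find_nesting_problems("<p><b><i>text</b></i></p>"):
        print(problem)
```

For the overlapping <b> and <i> elements in the sample string, the sketch reports two problems: one for each closing tag that arrives out of order.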

Although often thought of as typifying private, semi-professional, or hobbyist Web sites, tag soup is also produced by many professional Web page layout programs and written by hand by many professional Web developers, including on some of the highest-profile sites.

Widespread usage of tag soup

Today, the majority of Web pages consist of invalid or malformed HTML and thus may be considered tag soup.

One possible cause of the proliferation of tag soup may be that until the release of Macromedia Dreamweaver MX 2003, no WYSIWYG editor produced valid and well-formed code.

Another factor in the popularity of tag soup is that most mainstream Web browsers currently in use tolerate code that is invalid or not well-formed without raising any errors. Thus, testing Web pages in current mainstream browsers does not reveal whether the markup is valid or well-formed.

Implications

Early browsers were very forgiving of malformed HTML and went to great lengths to render a Web page in the manner they thought the author 'intended' it to look.

Because of this, most current mainstream Web browsers can render Web pages in more than one mode, including a "Quirks mode". The Web browser switches into Quirks mode when it encounters a Web page that appears to be using tag soup, a decision usually based on the page's document type declaration (or the lack of one). Quirks mode allows the browser to render the Web page in the same way as older browsers may have rendered it. The problem of tag soup is thus carried forward, as each newly released browser needs to be able to render the existing Web.

While most mainstream Web browsers can render tag soup in more or less the way the author 'intended' it, many other user agents cannot. For example, Web browsers for people with disabilities may have problems rendering the page. Other examples of user agents which may have problems with malformed code or code which is not used for its intended purpose include tools such as search engine spiders and Web browsers in hand-held devices.

How XHTML affects tag soup

XHTML is a reformulation of the HTML language based on XML. The XML specification clearly defines what a conforming user agent (such as a Web browser) must do when it encounters code that is not well-formed. Thus, a browser interpreting a Web page as XHTML will refuse to display the page if it encounters a well-formedness error, ensuring that future XHTML will not be tag soup.
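The difference between the two parsing models can be sketched with Python's standard library, used here only as a stand-in for a browser: xml.etree.ElementTree rejects input that is not well-formed, while html.parser processes the same input without raising an error. The function names and the SOUP sample below are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Overlapping <b> and <i> elements: acceptable to a lenient HTML parser,
# but not well-formed XML.
SOUP = "<p><b>bold <i>both</b> italic</i></p>"


def accepted_by_xml_parser(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class TagCollector(HTMLParser):
    """Collects the names of start tags in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_by_html_parser(markup):
    """Feed the markup to the lenient HTML parser and return the start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("XML parser accepts the sample:", accepted_by_xml_parser(SOUP))
    print("HTML parser saw start tags:", tags_seen_by_html_parser(SOUP))
```

An unescaped ampersand, as in "<p>fish & chips</p>", is rejected by the XML parser in the same way, whereas the HTML parser passes over it silently.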

However, XHTML 1.0 states that XHTML may be interpreted by current Web browsers as HTML if it follows a set of compatibility guidelines defined in Appendix C of the XHTML 1.0 Recommendation. At this time, the popular web browser Internet Explorer is unable to interpret XHTML documents as XML, and thus most current XHTML pages are served to browsers as HTML, using the MIME type of "text/html".

Because XHTML 1.0 served to browsers as HTML is parsed as if it were badly formed HTML, XHTML 1.0 is affected by tag soup in the same way as HTML.

Versions of XHTML after 1.0 do not allow XHTML to be served to browsers as HTML. If implemented according to the recommendation, this should prevent the problem of tag soup once XHTML served as XHTML is supported by all major browsers.

Semantic markup used for presentational reasons

Some design idioms that were once good workarounds, given the lack of presentational features in early HTML specifications, are now considered tag soup. These include the use of HTML table elements for page layout (rather than for tabular data), the HTML font element, and single-pixel GIF images used for spacing (spacer GIFs). It is now advisable to use CSS in place of such hacks.
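A rough sketch of how two of these idioms might be spotted automatically is given below, again using Python's standard html.parser. The class name, the function name, and in particular the spacer GIF heuristic (an image whose src ends in ".gif" and whose width or height attribute is exactly "1") are assumptions made for this example. Layout tables are not covered, since telling them apart from data tables generally requires human judgment.

```python
from html.parser import HTMLParser


class PresentationalHackFinder(HTMLParser):
    """Hypothetical helper: notes font elements and likely spacer GIFs."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; valueless attributes have value None.
        attributes = dict(attrs)
        if tag == "font":
            self.findings.append("font element")
        elif tag == "img":
            src = (attributes.get("src") or "").lower()
            one_pixel = (
                attributes.get("width") == "1" or attributes.get("height") == "1"
            )
            # Assumed heuristic: a GIF that is one pixel wide or one pixel high.
            if src.endswith(".gif") and one_pixel:
                self.findings.append("possible spacer GIF")


def find_presentational_hacks(markup):
    """Return the findings for the given markup, in document order."""
    finder = PresentationalHackFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    sample = '<font color="red">Sale</font><img src="spacer.gif" width="1" height="10">'
    print(find_presentational_hacks(sample))
```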

Examples of tag soup

Other examples of tag soup include:

Elements and attributes which do not exist in an HTML or XHTML specification but which are extensions to particular web browsers.
Heading elements which don't accurately represent the structure of the document. For example, an <h3> element which does not represent a section within a section designated by <h2>. In this case the author may have chosen the level of heading according to how it looked in a mainstream Web browser, rather than to represent the structure of the document.
Unnecessarily large or bloated markup, even when it is valid. For example, very long lists of keywords in the head of the document, or table elements nested inside other table elements when it is not necessary.
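
The heading example above lends itself to a simple mechanical check. The sketch below, which assumes Python's standard html.parser and uses the hypothetical names HeadingOutline and skipped_heading_levels, lists every place where a heading is more than one level deeper than the heading before it. It can only detect skipped levels; whether a heading actually matches the content it introduces still has to be judged by a person.

```python
import re
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects the levels (1 to 6) of h1..h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        match = re.fullmatch(r"h([1-6])", tag)
        if match:
            self.levels.append(int(match.group(1)))


def skipped_heading_levels(markup):
    """Return (previous_level, level) pairs where one or more levels are skipped.

    A previous_level of 0 means no heading had been seen yet, so a document
    whose first heading is deeper than <h1> is also reported.
    """
    outline = HeadingOutline()
    outline.feed(markup)
    outline.close()

    skips = []
    previous = 0
    for level in outline.levels:
        if level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips


if __name__ == "__main__":
    print(skipped_heading_levels("<h1>Title</h1><h3>Subsection</h3>"))
```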