Quotation mark glyphs: Difference between revisions
Revision as of 10:57, 15 March 2012
Different typefaces, character encodings and computer languages use various encodings and glyphs for quotation marks. This article lists some of these glyphs along with their Unicode code points and HTML entities.
Quotation marks in English
English curved quotes, also called “book quotes” or “curly quotes”, resemble small figures six and nine raised above the baseline (like 6...9 and 66...99), but solid, that is, with the counters filled. In many typefaces, the shapes are the same as those of an inverted (upside-down) comma and a normal comma.
Typewriter quotation marks
"Ambidextrous" quotation marks were introduced on typewriters to reduce the number of keys on the keyboard, and were inherited by computer keyboards and character sets. Some earlier computer systems had character sets with proper opening and closing quotes. However, the ASCII character set, which has been used on a wide variety of computers since the 1960s, contains only a straight single quote (U+0027 ' APOSTROPHE) and a straight double quote (U+0022 " QUOTATION MARK).
Many systems, like the personal computers of the 1980s and early '90s, actually drew these quotes like curved closing quotes on-screen and in printouts, so text would appear like this (approximately):
”Good morning, Dave”, said HAL.
’Good morning, Dave’, said HAL.
These same systems often drew the grave accent (`, U+0060) as an open quote glyph (actually a high-reversed-9 glyph, to preserve some usability as a grave). This gives a proper appearance at the cost of semantic correctness. Nothing similar was available for the double quote, so many people resorted to using two single quotes for double quotes, which would look like the following:
‛‛Good morning, Dave’’, said HAL.
‛Good morning, Dave’, said HAL.
The typesetting application TeX still uses this convention for input files. However, the appearance of these characters has varied greatly from font to font. On systems that provide straight quotes and grave accents, as most do today (and as Unicode specifies), the result is poor, as shown here:
``Good morning, Dave'', said HAL.
`Good morning, Dave', said HAL.
The Unicode slanted/curved quotes described below are shown here for comparison:
“Good morning, Dave”, said HAL.
‘Good morning, Dave’, said HAL.
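The relationship between the TeX-style input convention and the Unicode curved quotes can be sketched in a few lines of Python. This is only an illustration: the function name `tex_quotes_to_unicode` is made up for this example, and the naive replacement ignores apostrophes inside words and any TeX-specific context such as math or verbatim text.

```python
def tex_quotes_to_unicode(text: str) -> str:
    """Map TeX-style ASCII quote sequences to Unicode curved quotes.

    Illustrative sketch only: every remaining ' is treated as a closing
    single quote (which is also the preferred apostrophe glyph).
    """
    # Doubled sequences must be handled before the single characters,
    # otherwise `` would be turned into two single opening quotes.
    text = text.replace("``", "\u201c")  # left double quotation mark
    text = text.replace("''", "\u201d")  # right double quotation mark
    text = text.replace("`", "\u2018")   # left single quotation mark
    text = text.replace("'", "\u2019")   # right single quotation mark
    return text
```

Applied to the two straight-quote example lines above, the function returns the two curved-quote lines shown for comparison.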
Quotation marks in electronic documents
Historically, support for curved quotes was a problem in information technology, primarily because the widely used ASCII character set did not include a representation for them. To use non-ASCII characters in e-mail and on Usenet, the sending mail application needs to set a MIME type specifying the encoding. In most cases (the exceptions being if UTF-7 is used or if the 8BITMIME extension is present), this also requires the use of a content-transfer encoding.
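As a rough illustration of the two pieces mentioned above (a declared encoding plus a content-transfer encoding), the following sketch builds a message with Python's standard `email` package. The addresses, the subject, and the helper name `build_message` are placeholders invented for this example; the choice of UTF-8 with quoted-printable is one possible combination, not the only valid one.

```python
from email.message import EmailMessage


def build_message(body: str) -> EmailMessage:
    """Build a text message whose body may contain curved quotes."""
    msg = EmailMessage()
    msg["From"] = "hal@example.org"    # placeholder address
    msg["To"] = "dave@example.org"     # placeholder address
    msg["Subject"] = "Greeting"
    # charset ends up as a parameter of the Content-Type header;
    # cte selects the Content-Transfer-Encoding used for the body.
    msg.set_content(body, charset="utf-8", cte="quoted-printable")
    return msg


msg = build_message("\u201cGood morning, Dave\u201d, said HAL.\n")
# The serialized message is pure ASCII, so it is safe to print anywhere.
print(msg.as_string())
```

In the serialized output, the Content-Type header carries the `charset` parameter, and each curved quote appears as an escaped three-byte UTF-8 sequence, so the message survives transports that only pass 7-bit data.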
Curved and straight quotes are also sometimes referred to as smart quotes (“…”) and dumb quotes ("…") respectively; these names refer to a function found in several word processors that automatically converts straight quotes typed by the user into curved quotes. This function, known as “educating quotes”, was developed for systems that lack separate open- and close-quote keyboard keys.
Since curved quotes are the typographically correct ones, word processors have traditionally offered curved quotes to users. Before Unicode was widely accepted and supported, this meant representing the curved quotes in whatever 8-bit encoding the software and underlying operating system were using. However, the character sets for Windows and Macintosh used two different pairs of values for curved quotes, and ISO 8859-1 (historically the default character set for the Unixes and older Linux systems) has no curved quotes, which made cross-platform compatibility quite difficult to implement.
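The mismatch can be inspected with Python's built-in codecs. The sketch below assumes that the codec names `cp1252`, `mac_roman`, and `latin-1` stand in for the Windows, Macintosh, and ISO 8859-1 character sets; the helper name `quote_bytes` is made up for this example.

```python
CURVED_QUOTES = {
    "left single": "\u2018",
    "right single": "\u2019",
    "left double": "\u201c",
    "right double": "\u201d",
}


def quote_bytes(codec: str):
    """Return {name: hex byte value} for the curved quotes in `codec`,
    or None if the codec cannot represent them at all."""
    try:
        return {name: ch.encode(codec).hex() for name, ch in CURVED_QUOTES.items()}
    except UnicodeEncodeError:
        return None


for codec in ("cp1252", "mac_roman", "latin-1"):
    print(codec, quote_bytes(codec))

# Bytes written under one 8-bit encoding and read under another
# silently turn into a different character.
misread = "\u201c".encode("cp1252").decode("mac_roman")
assert misread != "\u201c"
```

The Windows and Macintosh codecs report different byte values for the same four characters, and the ISO 8859-1 codec reports that it has no representation for them.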
Compounding the problem is the “smart quotes” feature mentioned above, which some word processors (including Microsoft Word and OpenOffice.org) use by default. With this feature turned on, users may not have realized that the ASCII-compatible straight quotes they were typing on their keyboards ended up as something entirely different.
Further, the “smart quotes” feature converts opening apostrophes (such as in the words ’tis, ’em, and ’til) into opening single quotation marks—essentially upside-down apostrophes. A blatant example of this error appears in the advertisements for the television show ‘Til Death.
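A deliberately naive version of such a conversion shows why the opening-apostrophe error arises. This is a sketch of one simple heuristic (a quote is "opening" if it starts the text or follows whitespace or an opening bracket); it is not the algorithm of any particular word processor, and the name `educate_quotes` is chosen only for this example.

```python
import re

LSQUO, RSQUO = "\u2018", "\u2019"
LDQUO, RDQUO = "\u201c", "\u201d"

# A straight quote at the start of the text, or after whitespace or an
# opening bracket, is assumed to be an opening quote.
_OPENING_CONTEXT = r"(^|[\s(\[{])"


def educate_quotes(text: str) -> str:
    """Convert straight quotes to curved quotes using a naive heuristic."""
    text = re.sub(_OPENING_CONTEXT + '"', lambda m: m.group(1) + LDQUO, text)
    text = text.replace('"', RDQUO)   # every remaining " is closing
    text = re.sub(_OPENING_CONTEXT + "'", lambda m: m.group(1) + LSQUO, text)
    text = text.replace("'", RSQUO)   # every remaining ' is closing
    return text
```

With this heuristic, `educate_quotes("don't")` produces the expected apostrophe, but `educate_quotes("'tis")` produces an opening single quote in front of "tis", because the heuristic cannot tell a leading apostrophe from the start of a quotation.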
Unicode support has since become the norm for operating systems. Thus, in at least some cases, transferring content containing curved quotes (or any other non-ASCII characters) from a word processor to another application or platform has become less troublesome, provided all steps in the process (including the clipboard, if applicable) are Unicode-aware. However, some applications still use the older character sets, or output data using them, so problems still occur.
There are other considerations for including curved quotes in the widely used markup languages HTML, XML, and SGML. If the encoding of the document supports direct representation of the characters, they can be used, but doing so can result in difficulties if the document needs to be edited by someone who is using an editor that cannot support the encoding. For example, many simple text editors only handle a few encodings or assume that the encoding of any file opened is a platform default, so the quote characters may appear as “garbage”. HTML includes a set of entities for curved quotes: &lsquo; (left single), &rsquo; (right single), &sbquo; (low 9 single), &ldquo; (left double), &rdquo; (right double), and &bdquo; (low 9 double). XML does not define these by default, but specifications based on it can do so, and XHTML does. In addition, while the HTML 4, XHTML and XML specifications allow specifying numeric character references in either hexadecimal or decimal, SGML and older versions of HTML (and many old implementations) only support decimal references. Thus, to represent curly quotes in XML and SGML, it is safest to use the decimal numeric character references. That is, to represent the double curly quotes use &#8220; and &#8221;, and to represent single curly quotes use &#8216; and &#8217;. Both numeric and named references function correctly in almost every modern browser. While using numeric references can make a page more compatible with outdated browsers, using named references is safer for systems that handle multiple character encodings (e.g., RSS aggregators and search results).
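The decimal references discussed above can be generated and decoded with the Python standard library. This is a minimal sketch: the helper name `to_decimal_refs` is made up for this example, and it escapes every non-ASCII character, not only quotation marks.

```python
import html


def to_decimal_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric
    character reference, leaving ASCII text untouched."""
    # The 'xmlcharrefreplace' error handler emits &#NNNN; for each
    # character that the target encoding cannot represent.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = to_decimal_refs("\u201cGood morning, Dave\u201d, said HAL.")
print(escaped)  # pure ASCII output

# html.unescape() resolves both numeric and named HTML references.
assert html.unescape(escaped) == "\u201cGood morning, Dave\u201d, said HAL."
assert html.unescape("&ldquo;&rdquo;") == "\u201c\u201d"
```

The result is plain ASCII, so it can be edited in tools that do not understand the document's encoding, at the cost of being harder to read in source form.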
Quotation marks in Unicode
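In Unicode, 29 characters are marked "Quotation mark". Many of them belong to the general character categories "Pi" (punctuation, initial quote) and "Pf" (punctuation, final quote); others, such as the typewriter quotes, the low-9 quotes, and the CJK corner brackets, are classified as other, open, or close punctuation.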
Quotation marks in Unicode (Character property "Quotation_Mark"=Yes)

Glyph | Code | Unicode name | HTML | Comments |
---|---|---|---|---|
" | U+0022 | quotation mark | &quot; | Typewriter (“programmer’s”) quote, ambidextrous |
' | U+0027 | apostrophe | &#39; | Typewriter (“programmer’s”) straight single quote, ambidextrous |
« | U+00AB | left-pointing double angle quotation mark | &laquo; | Double angle quote (chevron, guillemet, duck-foot quote), left |
» | U+00BB | right-pointing double angle quotation mark | &raquo; | Double angle quote, right |
‘ | U+2018 | left single quotation mark | &lsquo; | Single curved quote, left |
’ | U+2019 | right single quotation mark | &rsquo; | Single curved quote, right |
‚ | U+201A | single low-9 quotation mark | &sbquo; | Low single curved quote, left |
‛ | U+201B | single high-reversed-9 quotation mark | &#8219; | also called single reversed comma, quotation mark |
“ | U+201C | left double quotation mark | &ldquo; | Double curved quote, or “curly quote”, left |
” | U+201D | right double quotation mark | &rdquo; | Double curved quote, right |
„ | U+201E | double low-9 quotation mark | &bdquo; | Low double curved quote, left |
‟ | U+201F | double high-reversed-9 quotation mark | &#8223; | also called double reversed comma, quotation mark |
‹ | U+2039 | single left-pointing angle quotation mark | &lsaquo; | Single angle quote, left |
› | U+203A | single right-pointing angle quotation mark | &rsaquo; | Single angle quote, right |
Quotation marks in Chinese, Japanese, and Korean (CJK) | ||||
「 | U+300C | left corner bracket | &#12300; | CJK |
」 | U+300D | right corner bracket | &#12301; | CJK |
『 | U+300E | left white corner bracket | &#12302; | CJK |
』 | U+300F | right white corner bracket | &#12303; | CJK |
〝 | U+301D | reversed double prime quotation mark | &#12317; | CJK |
〞 | U+301E | double prime quotation mark | &#12318; | CJK |
〟 | U+301F | low double prime quotation mark | &#12319; | CJK |
Alternate encodings | ||||
﹁ | U+FE41 | presentation form for vertical left corner bracket | &#65089; | CJK Compatibility, preferred use: U+300C |
﹂ | U+FE42 | presentation form for vertical right corner bracket | &#65090; | CJK Compatibility, preferred use: U+300D |
﹃ | U+FE43 | presentation form for vertical left corner white bracket | &#65091; | CJK Compatibility, preferred use: U+300E |
﹄ | U+FE44 | presentation form for vertical right corner white bracket | &#65092; | CJK Compatibility, preferred use: U+300F |
＂ | U+FF02 | fullwidth quotation mark | &#65282; | Halfwidth and Fullwidth Forms, corresponds with U+0022 |
＇ | U+FF07 | fullwidth apostrophe | &#65287; | Halfwidth and Fullwidth Forms, corresponds with U+0027 |
｢ | U+FF62 | halfwidth left corner bracket | &#65378; | Halfwidth and Fullwidth Forms, corresponds with U+300C |
｣ | U+FF63 | halfwidth right corner bracket | &#65379; | Halfwidth and Fullwidth Forms, corresponds with U+300D |
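The code point, Unicode name, and general category of the characters in the table can be looked up with Python's `unicodedata` module. The sketch below is illustrative: the helper name `describe` and the small `SAMPLE` list are made up for this example, and the reported values depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# A few characters from the table above, written as escapes so the
# source file itself stays ASCII.
SAMPLE = ['"', "'", "\u00ab", "\u00bb", "\u2018", "\u2019",
          "\u201a", "\u201c", "\u201d", "\u201e", "\u300c", "\u300d"]


def describe(ch: str):
    """Return (code point, Unicode name, general category, decimal
    numeric character reference) for a single character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        f"&#{ord(ch)};",
    )


for ch in SAMPLE:
    # Every field is ASCII, so printing is safe on any terminal.
    print(" | ".join(describe(ch)))
```

The category column makes it easy to see which characters are classified as initial or final quotes ("Pi", "Pf") and which fall under other punctuation categories.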