ISO/IEC 8859-1: Difference between revisions
→Code page layout: Added visible Unicode code point number Tag: Reverted |
Citation bot (talk | contribs) Removed parameters. | Use this bot. Report bugs. | #UCB_CommandLine |
||
(128 intermediate revisions by 57 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|Character encoding |
{{Short description|Character encoding}} |
||
{{About||the Unicode block also called "Latin 1"|Latin-1 Supplement (Unicode block)|the |
{{About||the Unicode block also called "Latin 1"|Latin-1 Supplement (Unicode block)|the 1990s-era misnomer "ISO-8859-1"|Windows-1252}} |
||
{{Infobox character encoding |
{{Infobox character encoding |
||
| name = ISO/IEC 8859-1:1998 |
| name = ISO/IEC 8859-1:1998 |
||
| mime = ISO-8859-1 |
| mime = ISO-8859-1 |
||
| alias = iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819 |
| alias = iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819 |
||
| image = Latin-1-infobox.svg |
| image = Latin-1-infobox.svg{{!}}class=skin-invert-image |
||
| caption = ISO 8859-1 code page layout |
| caption = ISO/IEC 8859-1 code page layout |
||
| standard = [[ISO/IEC 8859]] |
| standard = [[ISO/IEC 8859]] |
||
| lang = [[English language|English]], [[Western Latin character sets (computing)|various others]] |
| lang = [[English language|English]], [[Western Latin character sets (computing)|various others]] |
||
| status = |
| status = |
||
| extends = [[US-ASCII]] |
| extends = [[ASCII|US-ASCII]] |
||
| basedon = [[DEC MCS]] |
| basedon = [[Multinational Character Set|DEC MCS]] |
||
| next = {{plainlist| |
| next = {{plainlist| |
||
* [[UTF-8]] |
|||
* [[UTF-16]]}} |
|||
| otherrelated = {{plainlist| |
|||
* [[ISO/IEC 8859-15]] |
* [[ISO/IEC 8859-15]] |
||
* [[Windows-1252]] |
* [[Windows-1252]] |
||
* [[BraSCII]]}} |
|||
| classification = [[Extended ASCII]], [[ISO 8859]] |
| classification = [[Extended ASCII]], [[ISO/IEC 8859]] |
||
}} |
}} |
||
'''ISO/IEC |
'''ISO/IEC 8859-1:1998''', ''Information technology—[[8-bit computing|8-bit]] single-[[byte]] coded graphic [[character (computing)|character]] sets—Part 1: Latin alphabet No. 1'', is part of the [[ISO/IEC 8859]] series of [[ASCII]]-based standard [[character encoding]]s, first edition published in 1987. ISO/IEC 8859-1 encodes what it refers to as "'''Latin alphabet no. 1'''", consisting of 191 [[character (computing)|characters]] from the [[Latin script]]. This character-encoding scheme is used throughout the [[Americas]], [[Western Europe]], [[Oceania]], and much of [[Africa]]. It is the basis for some popular 8-bit character sets and the first two blocks of characters in [[Unicode]]. |
||
{{As of|2024|12}}, 1.1% of all<!-- (and 15 of the top 1000<ref>{{Cite web|title=Usage Survey of Character Encodings broken down by Ranking|url=https://w3techs.com/technologies/cross/character_encoding/ranking|access-date=2024-12-16|website=W3Techs|language=en |url-status=live}}</ref>) --> [[website|web sites]] use {{nowrap|ISO/IEC 8859-1}}.<ref name="encoding">{{Cite web|title=Historical trends in the usage statistics of character encodings for Web sites, December 2024|url=https://w3techs.com/technologies/history_overview/character_encoding|access-date=2024-12-16|website=W3Techs }}</ref><ref>{{Cite web|url=https://w3techs.com/forum/topic/22994|title=Source of character encoding statistics?|website=W3Techs |date=August 2014 |first1=John |last1=Cowan |first2=Sam |last2=Soltano |url-status=live |archive-url= https://archive.today/20240404230303/https://w3techs.com/forum/topic/22994 |archive-date=4 April 2024 }}</ref> It is the most declared single-byte character encoding, but because Web browsers and the [[HTML5]] standard<ref name="WHATWG">{{cite web |url=https://encoding.spec.whatwg.org/#names-and-labels |title=Encoding |at=sec. 5.2 Names and labels |publisher=[[WHATWG]] |date=27 January 2015 |access-date=4 February 2015 |archive-url=https://web.archive.org/web/20150204174315/https://encoding.spec.whatwg.org/#names-and-labels |archive-date=4 February 2015 |url-status=live}}</ref> interpret it as the superset [[Windows-1252]], documents declared as ISO-8859-1 may include characters from that set. 
Some countries and languages show higher usage than the global average: in 2024, usage among websites in Brazil was at<!-- arguably, adding together 7.1% (ISO-8859-1) + 0.4% (Windows-1252) = 7.5%, or because some pages include more than one encoding, likely better to show 100-97.1% = --> 2.9%,<ref>{{Cite web |title=Distribution of Character Encodings among websites that use Brazil |url=https://w3techs.com/technologies/segmentation/sl-br-/character_encoding |access-date=2024-12-16 |website=W3Techs }}</ref> and in Germany at <!-- adding together 2.5% (ISO-8859-1) + 0.5% (Windows-1252) = 3.0% or because some pages include more than one encoding 100-97.5 = --> 2.5%.<ref>{{Cite web|title=Distribution of Character Encodings among websites that use .de|url=https://w3techs.com/technologies/segmentation/tld-de-/character_encoding|access-date=2024-12-16|website=W3Techs }}</ref><ref>{{Cite web|title=Distribution of Character Encodings among websites that use German|url=https://w3techs.com/technologies/segmentation/cl-de-/character_encoding|access-date=2024-12-16|website=W3Techs |url-status=live |archive-url=https://archive.today/20240404232501/https://w3techs.com/technologies/segmentation/cl-de-/character_encoding |archive-date=4 April 2024}}</ref> |
|||
ISO-8859-1 was (according to the standard, at least) the default encoding of documents delivered via [[Hypertext Transfer Protocol|HTTP]] with a [[media type|MIME type]] beginning with "text/" ([[HTML5]] changed this to [[Windows-1252]]).<ref>{{Cite web|url=https://encoding.spec.whatwg.org/|title=Encoding Standard|website=encoding.spec.whatwg.org}}</ref><ref>{{Cite web|url=https://html.spec.whatwg.org/multipage/infrastructure.html|title=HTML Standard|website=html.spec.whatwg.org}}</ref> {{As of|2022|01}}, 1.1% of all (but only 5 of the top 1000<ref>{{Cite web|url=https://w3techs.com/technologies/cross/character_encoding/ranking|title=Usage Survey of Character Encodings broken down by Ranking|website=w3techs.com|language=en|access-date=2022-01-03}}</ref>) [[website]]s use {{nowrap|ISO 8859-1}}.<ref name="encoding">{{cite web|url=https://w3techs.com/technologies/history_overview/character_encoding|title=Historical trends in the usage of character encodings, September 2021|access-date=2021-09-22}}</ref><ref>{{Cite web|url=https://w3techs.com/forum/topic/22994|title=Source of character encoding statistics?|website=w3techs.com}}</ref> It is the most ''declared'' single-byte character encoding in the world on the web, but as web browsers interpret it as the superset [[Windows-1252]] the documents may include characters from that set. |
|||
ISO-8859-1 was (according to the standard, at least) the default encoding of documents delivered via [[Hypertext Transfer Protocol|HTTP]] with a [[media type|MIME type]] beginning with {{code|text/}}, the default encoding of the values of certain descriptive HTTP headers, and defined the repertoire of characters allowed in [[HTML]] 3.2 documents. It is specified by many other standards.{{examples|date=August 2024}} In practice, the superset encoding Windows-1252 is the more likely effective default<ref>{{Cite web |title=c++ - What is the native narrow string encoding on Windows? |url=https://stackoverflow.com/questions/4649388/what-is-the-native-narrow-string-encoding-on-windows |date=Jan 2011 |access-date=2023-02-16 |website=Stack Overflow |language=en}}</ref> and it is increasingly common for [[UTF-8]] to work whether or not a standard specifies it. |
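The practical difference between the two encodings can be sketched as follows. This is a minimal illustration, assuming Python's standard <code>latin-1</code> and <code>cp1252</code> codecs; the helper name <code>decode_both</code> is hypothetical. The two decoders agree on bytes 0x00–0x7F and 0xA0–0xFF and differ only in 0x80–0x9F, where ISO-8859-1 has C1 control codes and Windows-1252 has printable characters.

```python
def decode_both(data: bytes) -> tuple:
    """Decode the same bytes as ISO-8859-1 and as Windows-1252 (hypothetical helper)."""
    return data.decode("latin-1"), data.decode("cp1252")


# 0x93 and 0x94 are C1 control codes in ISO-8859-1 but curly quotes in Windows-1252;
# 0xE9 is the letter e with acute accent in both.
as_latin1, as_cp1252 = decode_both(b"\x93caf\xe9\x94")

# Outside the range 0x80-0x9F the two decoders give identical results.
same_elsewhere = all(
    bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
    for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)
```

A page labelled ISO-8859-1 that contains byte 0x93 therefore shows an opening curly quote in a browser, although the declared encoding assigns that value to a control code.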
|||
Depending on the country, use can be much higher than the global average, e.g. for Germany at 4.7% (and including Windows-1252 at <!-- adding together 4.7% (ISO-8859-1) + 0.6% (Windows-1252) = 5.3% or because of round-off --> 5.2%).<ref>{{Cite web|title=Distribution of Character Encodings among websites that use .de|url=https://w3techs.com/technologies/segmentation/tld-de-/character_encoding|access-date=2022-01-03|website=w3techs.com}}</ref><ref>{{Cite web|title=Distribution of Character Encodings among websites that use German|url=https://w3techs.com/technologies/segmentation/cl-de-/character_encoding|access-date=2021-09-22|website=w3techs.com}}</ref> |
|||
⚫ | '''ISO-8859-1''' is the [[Internet Assigned Numbers Authority|IANA]] preferred name for this standard when supplemented with the [[C0 and C1 control codes]] from [[ISO/IEC 6429]]. The following other aliases are registered: '''iso-ir-100''', '''csISOLatin1''', '''latin1''', '''l1''', and '''IBM819'''. '''Code page 28591''', a.k.a. '''Windows-28591''', is used for it in Windows.<ref>{{cite web |url=https://msdn.microsoft.com/en-us/library/dd317756(v=vs.85).aspx |title=Code Page Identifiers |publisher=Microsoft Corporation |access-date=2010-12-19}}</ref> IBM calls it '''code page 819''' or '''CP819''' ('''[[CCSID]] 819''').<ref>{{cite web|title=Code page 819 information document|archive-url=https://web.archive.org/web/20170116144609/https://www-01.ibm.com/software/globalization/cp/cp00819.html|archive-date=2017-01-16|url=https://www-01.ibm.com/software/globalization/cp/cp00819.html}}</ref><ref>{{cite web|title=CCSID 819 information document|archive-url=https://web.archive.org/web/20160327100212/http://www-01.ibm.com/software/globalization/ccsid/ccsid819.html|archive-date=2016-03-27|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid819.html}}</ref><ref>{{Citation|title=Code Page CPGID 00819 (pdf)|url=https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00819.pdf|publisher=IBM}}</ref><ref>{{Citation|title=Code Page CPGID 00819 (txt)|url=https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00819.txt|publisher=IBM}}</ref> [[Oracle Database|Oracle]] calls it '''WE8ISO8859P1'''.<ref name="Oracle_2002_ISO8859">{{cite book |title=Oracle9i Database Globalization Support Guide |author-first1=Cathy |author-last1=Baird |author-first2=Dan |author-last2=Chiba |author-first3=Winson |author-last3=Chu |author-first4=Jessica |author-last4=Fan |author-first5=Claire |author-last5=Ho |author-first6=Simon |author-last6=Law |author-first7=Geoff |author-last7=Lee |author-first8=Peter |author-last8=Linsley |author-first9=Keni |author-last9=Matsuda 
|author-first10=Tamzin |author-last10=Oscroft |author-first11=Shige |author-last11=Takeda |author-first12=Linus |author-last12=Tanaka |author-first13=Makoto |author-last13=Tozawa |author-first14=Barry |author-last14=Trute |author-first15=Mayumi |author-last15=Tsujimoto |author-first16=Ying |author-last16=Wu |author-first17=Michael |author-last17=Yau |author-first18=Tim |author-last18=Yu |author-first19=Chao |author-last19=Wang |author-first20=Simon |author-last20=Wong |author-first21=Weiran |author-last21=Zhang |author-first22=Lei |author-last22=Zheng |author-first23=Yan |author-last23=Zhu |author-first24=Valarie |author-last24=Moore |publisher=[[Oracle Corporation]] |edition=Release 2 (9.2) |date=2002 |orig-year=1996 |id=Oracle A96529-01 |chapter=Appendix A: Locale Data |url=https://docs.oracle.com/cd/B10501_01/server.920/a96529.pdf |access-date=2017-02-14 |url-status=live |archive-url=https://web.archive.org/web/20170214190952/https://docs.oracle.com/cd/B10501_01/server.920/a96529.pdf |archive-date=2017-02-14}}</ref> |
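As an illustration of how such aliases behave in software, the sketch below resolves the labels named above through Python's standard codec registry. It assumes CPython's built-in alias table; the vendor names Windows-28591 and WE8ISO8859P1 are left out because they are not assumed to be known to Python.

```python
import codecs

# Labels taken from the text above; all are expected to name the same codec.
ALIASES = ["ISO-8859-1", "iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]

# codecs.lookup() normalizes each label and returns a CodecInfo object
# whose .name attribute is the canonical codec name.
resolved = {alias: codecs.lookup(alias).name for alias in ALIASES}

for alias, canonical in resolved.items():
    print(f"{alias:12} -> {canonical}")
```

Lookup is case-insensitive, so <code>IBM819</code> and <code>ibm819</code> resolve identically.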
||
ISO-8859-1 was the default encoding of the values of certain descriptive HTTP headers, and defined the repertoire of characters allowed in [[HTML]] 3.2 documents, and is specified by many other standards. This is sometimes assumed to be the encoding of text on [[Microsoft Windows]] (and [[Unix]]) if there is no [[byte order mark]] (BOM); this is only gradually being changed to [[UTF-8]]. |
|||
⚫ | '''ISO-8859-1''' is the [[Internet Assigned Numbers Authority|IANA]] preferred name for this standard when supplemented with the [[C0 and C1 control codes]] from [[ISO/IEC 6429]]. The following other aliases are registered: '''iso-ir-100''', '''csISOLatin1''', '''latin1''', '''l1''', '''IBM819''' |
||
== Coverage == |
== Coverage == |
||
{{See also|Latin-script alphabet}} |
{{See also|Latin-script alphabet}} |
||
Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following languages (while it may exclude correct [[ |
Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following languages (although the correct [[quotation mark#German|quotation marks]] are missing for many of them, including [[German language|German]] and [[Icelandic language|Icelandic]]): |
||
=== Modern languages with complete coverage === |
=== Modern languages with complete coverage === |
||
{{columns-list|colwidth= |
{{columns-list|colwidth=15em| |
||
* [[Afrikaans language|Afrikaans]] |
* [[Afrikaans language|Afrikaans]] |
||
* [[Albanian language|Albanian]] |
* [[Albanian language|Albanian]] |
||
Line 43: | Line 44: | ||
* [[Galician language|Galician]] |
* [[Galician language|Galician]] |
||
* [[Icelandic language|Icelandic]] |
* [[Icelandic language|Icelandic]] |
||
* [[Ido]] |
|||
* [[Irish Language|Irish]] |
* [[Irish Language|Irish]] |
||
* [[Indonesian language|Indonesian]] |
* [[Indonesian language|Indonesian]] |
||
* [[Italian language|Italian]] |
* [[Italian language|Italian]] |
||
* [[Leonese dialect|Leonese]] |
* [[Leonese dialect|Leonese]] |
||
* [[Lojban]] |
|||
* [[Luxembourgish language|Luxembourgish]]{{efn|Basic classical orthography}} |
* [[Luxembourgish language|Luxembourgish]]{{efn|Basic classical orthography}} |
||
* [[Malay language|Malay]]{{efn|[[Rumi script]]}} |
* [[Malay language|Malay]]{{efn|[[Rumi script]]}} |
||
Line 54: | Line 57: | ||
* [[Portuguese language|Portuguese]]{{efn|European and Brazilian}} |
* [[Portuguese language|Portuguese]]{{efn|European and Brazilian}} |
||
* [[Romansh language|Rhaeto-Romanic]] |
* [[Romansh language|Rhaeto-Romanic]] |
||
* [[Rotokas alphabet|Rotokas]] |
|||
* [[Scottish Gaelic]] |
* [[Scottish Gaelic]] |
||
* [[Scots language|Scots]] |
* [[Scots language|Scots]] |
||
Line 61: | Line 65: | ||
* [[Swedish language|Swedish]] |
* [[Swedish language|Swedish]] |
||
* [[Tagalog language|Tagalog]] |
* [[Tagalog language|Tagalog]] |
||
* [[Toki Pona]] |
|||
* [[Walloon language|Walloon]] |
* [[Walloon language|Walloon]] |
||
}} |
}} |
||
Line 79: | Line 84: | ||
| [[Danish language|Danish]] || [[Ǿ]], ǿ (the accent is optional and ǿ is very rare)|| Ø, ø or øe || |
| [[Danish language|Danish]] || [[Ǿ]], ǿ (the accent is optional and ǿ is very rare)|| Ø, ø or øe || |
||
|- |
|- |
||
| [[Dutch language|Dutch]] || [[IJ (digraph)|IJ]], ij ( |
| [[Dutch language|Dutch]] || [[IJ (digraph)|IJ]], ij (debatable), [[j́]] (in emphasized words like "blíj́f") || [[digraph (orthography)|digraphs]] IJ, ij or ÿ; blíjf || |
||
|- |
|- |
||
| [[Estonian language|Estonian]] || [[Š]], š, [[Ž]], ž (only present in loanwords) || Sh, sh, Zh, zh || [[ISO/IEC 8859-15|ISO-8859-15]], [[Windows-1252]] |
| [[Estonian language|Estonian]], [[Finnish language|Finnish]] || [[Š]], š, [[Ž]], ž (only present in loanwords) || Sh, sh, Zh, zh || [[ISO/IEC 8859-15|ISO-8859-15]], [[Windows-1252]] |
||
⚫ | |||
⚫ | |||
|- |
|- |
||
| [[French language|French]] || [[Œ]], œ, and the very rare [[Ÿ]] || [[digraph (orthography)|digraphs]] OE, oe; Y or Ý || [[ISO/IEC 8859-15|ISO-8859-15]], [[Windows-1252]] |
| [[French language|French]] || [[Œ]], œ, and the very rare [[Ÿ]] || [[digraph (orthography)|digraphs]] OE, oe; Y or Ý || [[ISO/IEC 8859-15|ISO-8859-15]], [[Windows-1252]] |
||
|- |
|- |
||
| [[German language|German]] || [[Capital ẞ|ẞ]] (capital ß, used only in all capitals |
| [[German language|German]] || [[Capital ẞ|ẞ]] (capital ß, used only in all capitals) || [[digraph (orthography)|digraph]] SS or SZ || |
||
|- |
|- |
||
| [[Hungarian language|Hungarian]] || [[Ő]], ő, [[Ű]], ű || Ö, ö, Ü, ü || [[ISO/IEC 8859-2]], [[Windows-1250]] |
| [[Hungarian language|Hungarian]] || [[Ő]], ő, [[Ű]], ű || Ö, ö, Ü, ü <br />[[Õ]], õ, [[Û]], û (the characters replaced in {{nowrap|[[ISO/IEC 8859-2|8859-2]]}}) || [[ISO/IEC 8859-2|ISO-8859-2]], [[Windows-1250]] |
||
|- |
|- |
||
| [[Irish language|Irish]] ([[Irish orthography#Alphabet|traditional orthography]])|| Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṗ, ṗ, Ṡ, ṡ, Ṫ, ṫ || Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Ph, ph, Sh, sh, Th, th || [[ISO/IEC 8859-14|ISO-8859-14]] |
| [[Irish language|Irish]] ([[Irish orthography#Alphabet|traditional orthography]])|| Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṗ, ṗ, Ṡ, ṡ, Ṫ, ṫ || Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Ph, ph, Sh, sh, Th, th || [[ISO/IEC 8859-14|ISO-8859-14]] |
||
|- |
|- |
||
⚫ | |||
⚫ | |||
| [[Welsh language|Welsh]] || [[Ẁ]], ẁ, [[Ẃ]], ẃ, [[Ŵ]], ŵ, [[Ẅ]], ẅ, [[Ỳ]], ỳ, [[Ŷ]], ŷ, [[Ÿ]] || W, w, Y, y, Ý, ý || [[ISO/IEC 8859-14|ISO-8859-14]] |
| [[Welsh language|Welsh]] || [[Ẁ]], ẁ, [[Ẃ]], ẃ, [[Ŵ]], ŵ, [[Ẅ]], ẅ, [[Ỳ]], ỳ, [[Ŷ]], ŷ, [[Ÿ]] || W, w, Y, y, Ý, ý || [[ISO/IEC 8859-14|ISO-8859-14]] |
||
|} |
|} |
||
Line 100: | Line 104: | ||
=== Quotation marks === |
=== Quotation marks === |
||
For some languages listed above, the correct typographical [[Quotation mark#Summary table|quotation marks]] are missing, as only {{code|« »}}, {{code|" "}}, and {{code|' '}} are included. Also, this scheme does not provide for oriented (6- or 9-shaped) single or double quotation marks. Some fonts will display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks, but this is not considered part of the modern standard. |
For some languages listed above, the correct typographical [[Quotation mark#Summary table|quotation marks]] are missing, as only {{code|« »}}, {{code|" "}}, and {{code|' '}} are included. Also, this scheme does not provide for oriented (6- or 9-shaped) single or double quotation marks. Some fonts will display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks (see {{section link|Quotation mark|Typewriters and early computers}}), but this is not considered part of the modern standard. |
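The gap can be checked mechanically. The following sketch assumes Python's standard <code>latin-1</code> codec; the helper name <code>in_latin1</code> is hypothetical. It tests the quotation marks listed above against the oriented marks that the encoding lacks.

```python
def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-1 (hypothetical helper)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Quotation marks the text lists as included: guillemets, straight double, straight single.
included = {ch: in_latin1(ch) for ch in "\u00ab\u00bb\"'"}

# Oriented (6- or 9-shaped) single and double quotation marks, U+2018 to U+201E,
# as used in English and German typography.
oriented = {ch: in_latin1(ch) for ch in "\u2018\u2019\u201a\u201c\u201d\u201e"}
```

Every value in <code>included</code> is true and every value in <code>oriented</code> is false, which is why text using oriented quotation marks needs Windows-1252 or a Unicode encoding.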
||
=== Superscript digits === |
|||
Only three superscript digits are encoded: <code>²</code> at 0xB2, <code>³</code> at 0xB3, and <code>¹</code> at 0xB9. The superscript digits 0 and 4–9 are absent, and no subscript digits are encoded. A workaround is to use rich-text formatting for the digits not covered by this standard. |
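A short sketch of this coverage, assuming Python's standard <code>latin-1</code> codec (the helper name <code>latin1_byte</code> is hypothetical), maps each superscript and subscript digit to its byte value where one exists:

```python
# Unicode superscript digits 0-9; only 1, 2 and 3 lie in the Latin-1 range.
SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
# Unicode subscript digits 0-9 (U+2080 to U+2089).
SUBSCRIPTS = "".join(chr(0x2080 + d) for d in range(10))


def latin1_byte(ch):
    """Return the ISO-8859-1 byte value of ch, or None if it cannot be encoded."""
    try:
        return ch.encode("latin-1")[0]
    except UnicodeEncodeError:
        return None


superscript_bytes = {digit: latin1_byte(ch) for digit, ch in enumerate(SUPERSCRIPTS)}
subscript_bytes = {digit: latin1_byte(ch) for digit, ch in enumerate(SUBSCRIPTS)}
```

Only the entries for digits 1, 2 and 3 in <code>superscript_bytes</code> hold a byte value; every other entry in both dictionaries is <code>None</code>.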
|||
== History == |
== History == |
||
ISO |
ISO 8859-1 was based on the [[Multinational Character Set]] (MCS) used by [[Digital Equipment Corporation]] (DEC) in the popular [[VT220]] terminal in 1983. It was developed within the [[Ecma International|European Computer Manufacturers Association]] (ECMA), and published in March 1985 as [[ECMA-94]],<ref name="ECMA_1985_ECMA94_R1" /> by which name it is still sometimes known. The second edition of ECMA-94 (June 1986)<ref>{{Cite web|url=https://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf|title=Second edition of ECMA-94 (June 1986)}}</ref> also included [[ISO/IEC 8859-2|ISO 8859-2]], [[ISO/IEC 8859-3|ISO 8859-3]], and [[ISO/IEC 8859-4|ISO 8859-4]] as part of the specification. |
||
The original draft of ISO |
The original draft of ISO 8859-1 placed French ''Œ'' and ''œ'' at code points 215 (0xD7) and 247 (0xF7), as in the MCS. However, the delegate from France, being neither a linguist nor a typographer, falsely stated that these are not independent French letters on their own, but mere [[Orthographic ligature|ligatures]] (like ''fi'' or ''fl''), supported by the delegate team from [[Bull Publishing Company]], who regularly did not print French with ''Œ/œ'' in their house style at the time. An anglophone delegate from Canada insisted on retaining ''Œ/œ'' but was rebuffed by the French delegate and the team from Bull. These code points were soon filled with × and ÷ under the suggestion of the German delegation. Support for French was further reduced when it was again falsely stated that the letter ''ÿ'' is "not French", resulting in the absence of the capital ''Ÿ''. In fact, the letter ''ÿ'' is found in a number of French proper names, and the capital letter has been used in dictionaries and encyclopedias.<ref>{{Cite journal|last=André|first=Jacques|title=ISO Latin-1, norme de codage des caractères européens? Trois caractères français en sont absents!|journal=Cahiers GUTenberg|issue=25|date=1996|pages=65–77|doi=10.5802/cg.205 |url=http://www.numdam.org/article/CG_1996___25_65_0.pdf|language=fr}}</ref> These characters were added to [[ISO/IEC 8859-15#1999|ISO/IEC 8859-15:1999]]. [[BraSCII]] matches the original draft. |
||
In 1985, [[Commodore International|Commodore]] adopted ECMA-94 for its new [[AmigaOS]] operating system.<ref name="Amiga-1251">{{Cite web |title=Registration of new charset [Amiga-1251] |date=2003-01-10 |author-first=Michael |author-last=Malyshev |url=https://www.iana.org/assignments/charset-reg/Amiga-1251 |publisher=ATO-RU (Amiga Translation Organization - Russian Department) |access-date=2016-12-05 |url-status=live |archive-url=https://web.archive.org/web/20161205191644/https://www.iana.org/assignments/charset-reg/Amiga-1251 |archive-date=2016-12-05}}</ref> The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga |
In 1985, [[Commodore International|Commodore]] adopted ECMA-94 for its new [[AmigaOS]] operating system.<ref name="Amiga-1251">{{Cite web |title=Registration of new charset [Amiga-1251] |date=2003-01-10 |author-first=Michael |author-last=Malyshev |url=https://www.iana.org/assignments/charset-reg/Amiga-1251 |publisher=ATO-RU (Amiga Translation Organization - Russian Department) |access-date=2016-12-05 |url-status=live |archive-url=https://web.archive.org/web/20161205191644/https://www.iana.org/assignments/charset-reg/Amiga-1251 |archive-date=2016-12-05}}</ref> The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding.{{Citation needed|date=April 2012}} |
||
In 1990, the |
In 1990, the first version of [[Unicode]] used the code points of ISO-8859-1 as the first 256 Unicode code points. |
||
In 1992, the [[Internet Assigned Numbers Authority|IANA]] registered the character map '''ISO_8859-1:1987''', more commonly known by its preferred [[MIME]] name of '''ISO-8859-1''' (note the extra hyphen over ISO |
In 1992, the [[Internet Assigned Numbers Authority|IANA]] registered the character map '''ISO_8859-1:1987''', more commonly known by its preferred [[MIME]] name of '''ISO-8859-1''' (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the [[Internet]]. This map assigns the [[C0 and C1 control codes]] to the unassigned code values and thus provides 256 characters, one for every possible 8-bit value. |
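Both properties, the one-to-one match with the first 256 Unicode code points and the full use of every 8-bit value, can be sketched as follows. The example assumes that Python's standard <code>latin-1</code> codec implements the IANA map; the split into graphic and control ranges follows the code page layout, with DEL (0x7F) counted among the controls.

```python
# Every 8-bit value decodes under the IANA map (here Python's "latin-1" codec).
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each byte value maps to the Unicode code point with the same number.
identity = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

# Graphic characters of the standard proper occupy 0x20-0x7E and 0xA0-0xFF;
# the remaining values are control codes supplied by the IANA map.
graphic = [b for b in range(256) if 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF]
controls = [b for b in range(256) if b not in graphic]

print(len(graphic), len(controls))
```

The count of graphic positions, 95 + 96, agrees with the 191 characters that the standard itself defines.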
||
== Code page layout == |
== Code page layout == |
||
Line 154: | Line 161: | ||
|- |
|- |
||
| {{chset-left1|2x}} |
| {{chset-left1|2x}} |
||
| {{chset-ctrl1 |
| {{chset-ctrl1|32 U+0020: SPACE | [[Space character| SP ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|33 U+0021: EXCLAMATION MARK | [[!]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|34 U+0022: QUOTATION MARK | [["]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|35 U+0023: NUMBER SIGN | [[Number sign|#]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|36 U+0024: DOLLAR SIGN | [[$]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|37 U+0025: PERCENT SIGN | [[%]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|38 U+0026: AMPERSAND | [[&]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|39 U+0027: APOSTROPHE | [[']] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|40 U+0028: LEFT PARENTHESIS | [[(]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|41 U+0029: RIGHT PARENTHESIS | [[)]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|42 U+002A: ASTERISK | [[*]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|43 U+002B: PLUS SIGN | [[+]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|44 U+002C: COMMA | [[,]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|45 U+002D: HYPHEN-MINUS | [[-]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|46 U+002E: FULL STOP | [[Full stop|.]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|47 U+002F: SOLIDUS | [[Slash (punctuation)|/]] | style=background:#EFF}} |
||
|- |
|- |
||
| {{chset-left1|3x}} |
| {{chset-left1|3x}} |
||
| {{chset-cell1 |
| {{chset-cell1|48 U+0030: DIGIT ZERO | [[0]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|49 U+0031: DIGIT ONE | [[1]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|50 U+0032: DIGIT TWO | [[2]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|51 U+0033: DIGIT THREE | [[3]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|52 U+0034: DIGIT FOUR | [[4]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|53 U+0035: DIGIT FIVE | [[5]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|54 U+0036: DIGIT SIX | [[6]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|55 U+0037: DIGIT SEVEN | [[7]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|56 U+0038: DIGIT EIGHT | [[8]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|57 U+0039: DIGIT NINE | [[9]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|58 U+003A: COLON | [[Colon (punctuation)|:]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|59 U+003B: SEMICOLON | [[;]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|60 U+003C: LESS-THAN SIGN | [[Less-than sign|<]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|61 U+003D: EQUALS SIGN | [[=]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|62 U+003E: GREATER-THAN SIGN | [[Greater-than sign|>]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|63 U+003F: QUESTION MARK | [[?]] | style=background:#EFF}} |
||
|- |
|- |
||
| {{chset-left1|4x}} |
| {{chset-left1|4x}} |
||
| {{chset-cell1 |
| {{chset-cell1|64 U+0040: COMMERCIAL AT | [[@]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|65 U+0041: LATIN CAPITAL LETTER A | [[A]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|66 U+0042: LATIN CAPITAL LETTER B | [[B]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|67 U+0043: LATIN CAPITAL LETTER C | [[C]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|68 U+0044: LATIN CAPITAL LETTER D | [[D]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|69 U+0045: LATIN CAPITAL LETTER E | [[E]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|70 U+0046: LATIN CAPITAL LETTER F | [[F]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|71 U+0047: LATIN CAPITAL LETTER G | [[G]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|72 U+0048: LATIN CAPITAL LETTER H | [[H]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|73 U+0049: LATIN CAPITAL LETTER I | [[I]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|74 U+004A: LATIN CAPITAL LETTER J | [[J]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|75 U+004B: LATIN CAPITAL LETTER K | [[K]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|76 U+004C: LATIN CAPITAL LETTER L | [[L]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|77 U+004D: LATIN CAPITAL LETTER M | [[M]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|78 U+004E: LATIN CAPITAL LETTER N | [[N]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|79 U+004F: LATIN CAPITAL LETTER O | [[O]] }} |
||
|- |
|- |
||
| {{chset-left1|5x}} |
| {{chset-left1|5x}} |
||
| {{chset-cell1 |
| {{chset-cell1|80 U+0050: LATIN CAPITAL LETTER P | [[P]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|81 U+0051: LATIN CAPITAL LETTER Q | [[Q]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|82 U+0052: LATIN CAPITAL LETTER R | [[R]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|83 U+0053: LATIN CAPITAL LETTER S | [[S]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|84 U+0054: LATIN CAPITAL LETTER T | [[T]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|85 U+0055: LATIN CAPITAL LETTER U | [[U]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|86 U+0056: LATIN CAPITAL LETTER V | [[V]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|87 U+0057: LATIN CAPITAL LETTER W | [[W]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|88 U+0058: LATIN CAPITAL LETTER X | [[X]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|89 U+0059: LATIN CAPITAL LETTER Y | [[Y]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|90 U+005A: LATIN CAPITAL LETTER Z | [[Z]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|91 U+005B: LEFT SQUARE BRACKET | [[Left square bracket|[]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|92 U+005C: REVERSE SOLIDUS | [[Backslash|\]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|93 U+005D: RIGHT SQUARE BRACKET | [[Right square bracket|]]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|94 U+005E: CIRCUMFLEX ACCENT | [[^]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|95 U+005F: LOW LINE | [[Underscore|_]] | style=background:#EFF}} |
||
|- |
|- |
||
| {{chset-left1|6x}} |
| {{chset-left1|6x}} |
||
| {{chset-cell1 |
| {{chset-cell1|96 U+0060: GRAVE ACCENT | [[`]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|97 U+0061: LATIN SMALL LETTER A | [[a]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|98 U+0062: LATIN SMALL LETTER B | [[b]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|99 U+0063: LATIN SMALL LETTER C | [[c]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|100 U+0064: LATIN SMALL LETTER D | [[d]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|101 U+0065: LATIN SMALL LETTER E | [[e]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|102 U+0066: LATIN SMALL LETTER F | [[f]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|103 U+0067: LATIN SMALL LETTER G | [[g]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|104 U+0068: LATIN SMALL LETTER H | [[h]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|105 U+0069: LATIN SMALL LETTER I | [[i]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|106 U+006A: LATIN SMALL LETTER J | [[j]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|107 U+006B: LATIN SMALL LETTER K | [[k]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|108 U+006C: LATIN SMALL LETTER L | [[l]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|109 U+006D: LATIN SMALL LETTER M | [[m]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|110 U+006E: LATIN SMALL LETTER N | [[n]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|111 U+006F: LATIN SMALL LETTER O | [[o]] }} |
||
|- |
|- |
||
| {{chset-left1|7x}} |
| {{chset-left1|7x}} |
||
| {{chset-cell1 |
| {{chset-cell1|112 U+0070: LATIN SMALL LETTER P | [[p]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|113 U+0071: LATIN SMALL LETTER Q | [[q]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|114 U+0072: LATIN SMALL LETTER R | [[r]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|115 U+0073: LATIN SMALL LETTER S | [[s]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|116 U+0074: LATIN SMALL LETTER T | [[t]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|117 U+0075: LATIN SMALL LETTER U | [[u]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|118 U+0076: LATIN SMALL LETTER V | [[v]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|119 U+0077: LATIN SMALL LETTER W | [[w]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|120 U+0078: LATIN SMALL LETTER X | [[x]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|121 U+0079: LATIN SMALL LETTER Y | [[y]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|122 U+007A: LATIN SMALL LETTER Z | [[z]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|123 U+007B: LEFT CURLY BRACKET | [[Left curly bracket|{]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|124 U+007C: VERTICAL LINE | [[Vertical bar||]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|125 U+007D: RIGHT CURLY BRACKET | [[Right curly bracket|}]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|126 U+007E: TILDE | [[~]] | style=background:#EFF}} |
||
| {{chset-cell1|||style=background:#DDD}} |
| {{chset-cell1|||style=background:#DDD}} |
||
|- |
|- |
||
Line 298: | Line 305: | ||
|- |
|- |
||
| {{chset-left1|Ax}} |
| {{chset-left1|Ax}} |
||
| {{chset-ctrl1 |
| {{chset-ctrl1|160 U+00A0: NO-BREAK SPACE | [[NBSP]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|161 U+00A1: INVERTED EXCLAMATION MARK | [[¡]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|162 U+00A2: CENT SIGN | [[¢]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|163 U+00A3: POUND SIGN | [[£]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|164 U+00A4: CURRENCY SIGN | [[¤]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|165 U+00A5: YEN SIGN | [[¥]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|166 U+00A6: BROKEN BAR | [[¦]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|167 U+00A7: SECTION SIGN | [[§]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|168 U+00A8: DIAERESIS | [[¨]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|169 U+00A9: COPYRIGHT SIGN | [[©]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|170 U+00AA: FEMININE ORDINAL INDICATOR | [[ª]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|171 U+00AB: LEFT-POINTING DOUBLE ANGLE QUOTATION MARK | [[«]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|172 U+00AC: NOT SIGN | [[¬]] | style=background:#EFF}} |
||
| {{chset-ctrl1 |
| {{chset-ctrl1|173 U+00AD: SOFT HYPHEN | [[soft hyphen|SHY]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|174 U+00AE: REGISTERED SIGN | [[®]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|175 U+00AF: MACRON | [[¯]] | style=background:#EFF}} |
||
|- |
|- |
||
| {{chset-left1|Bx}} |
| {{chset-left1|Bx}} |
||
| {{chset-cell1 |
| {{chset-cell1|176 U+00B0: DEGREE SIGN | [[°]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|177 U+00B1: PLUS-MINUS SIGN | [[±]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|178 U+00B2: SUPERSCRIPT TWO | [[superscript|²]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|179 U+00B3: SUPERSCRIPT THREE | [[superscript|³]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|180 U+00B4: ACUTE ACCENT | [[´]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|181 U+00B5: MICRO SIGN | [[µ]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|182 U+00B6: PILCROW SIGN | [[¶]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|183 U+00B7: MIDDLE DOT | [[·]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|184 U+00B8: CEDILLA | [[¸]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|185 U+00B9: SUPERSCRIPT ONE | [[superscript|¹]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|186 U+00BA: MASCULINE ORDINAL INDICATOR | [[º]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|187 U+00BB: RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK | [[»]] | style=background:#EFF}} |
||
| {{chset-cell1 |
| {{chset-cell1|188 U+00BC: VULGAR FRACTION ONE QUARTER | [[Fraction#Typographical variations|¼]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|189 U+00BD: VULGAR FRACTION ONE HALF | [[½]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|190 U+00BE: VULGAR FRACTION THREE QUARTERS | [[Fraction#Typographical variations|¾]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|191 U+00BF: INVERTED QUESTION MARK | [[¿]] | style=background:#EFF}} |
||
|- |
|- |
||
| {{chset-left1|Cx}} |
| {{chset-left1|Cx}} |
||
| {{chset-cell1 |
| {{chset-cell1|192 U+00C0: LATIN CAPITAL LETTER A WITH GRAVE | [[À]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|193 U+00C1: LATIN CAPITAL LETTER A WITH ACUTE | [[Á]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|194 U+00C2: LATIN CAPITAL LETTER A WITH CIRCUMFLEX | [[Â]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|195 U+00C3: LATIN CAPITAL LETTER A WITH TILDE | [[Ã]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|196 U+00C4: LATIN CAPITAL LETTER A WITH DIAERESIS | [[Ä]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|197 U+00C5: LATIN CAPITAL LETTER A WITH RING ABOVE | [[Å]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|198 U+00C6: LATIN CAPITAL LETTER AE | [[Æ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|199 U+00C7: LATIN CAPITAL LETTER C WITH CEDILLA | [[Ç]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|200 U+00C8: LATIN CAPITAL LETTER E WITH GRAVE | [[È]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|201 U+00C9: LATIN CAPITAL LETTER E WITH ACUTE | [[É]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|202 U+00CA: LATIN CAPITAL LETTER E WITH CIRCUMFLEX | [[Ê]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|203 U+00CB: LATIN CAPITAL LETTER E WITH DIAERESIS | [[Ë]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|204 U+00CC: LATIN CAPITAL LETTER I WITH GRAVE | [[Ì]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|205 U+00CD: LATIN CAPITAL LETTER I WITH ACUTE | [[Í]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|206 U+00CE: LATIN CAPITAL LETTER I WITH CIRCUMFLEX | [[Î]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|207 U+00CF: LATIN CAPITAL LETTER I WITH DIAERESIS | [[Ï]] }} |
||
|- |
|- |
||
| {{chset-left1|Dx}} |
| {{chset-left1|Dx}} |
||
| {{chset-cell1 |
| {{chset-cell1|208 U+00D0: LATIN CAPITAL LETTER ETH | [[Ð]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|209 U+00D1: LATIN CAPITAL LETTER N WITH TILDE | [[Ñ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|210 U+00D2: LATIN CAPITAL LETTER O WITH GRAVE | [[Ò]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|211 U+00D3: LATIN CAPITAL LETTER O WITH ACUTE | [[Ó]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|212 U+00D4: LATIN CAPITAL LETTER O WITH CIRCUMFLEX | [[Ô]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|213 U+00D5: LATIN CAPITAL LETTER O WITH TILDE | [[Õ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|214 U+00D6: LATIN CAPITAL LETTER O WITH DIAERESIS | [[Ö]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|215 U+00D7: MULTIPLICATION SIGN | [[×]] | style=background:#EFD}} |
||
| {{chset-cell1 |
| {{chset-cell1|216 U+00D8: LATIN CAPITAL LETTER O WITH STROKE | [[Ø]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|217 U+00D9: LATIN CAPITAL LETTER U WITH GRAVE | [[Ù]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|218 U+00DA: LATIN CAPITAL LETTER U WITH ACUTE | [[Ú]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|219 U+00DB: LATIN CAPITAL LETTER U WITH CIRCUMFLEX | [[Û]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|220 U+00DC: LATIN CAPITAL LETTER U WITH DIAERESIS | [[Ü]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|221 U+00DD: LATIN CAPITAL LETTER Y WITH ACUTE | [[Ý]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|222 U+00DE: LATIN CAPITAL LETTER THORN | [[Þ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|223 U+00DF: LATIN SMALL LETTER SHARP S | [[ß]] }} |
||
|- |
|- |
||
| {{chset-left1|Ex}} |
| {{chset-left1|Ex}} |
||
| {{chset-cell1 |
| {{chset-cell1|224 U+00E0: LATIN SMALL LETTER A WITH GRAVE | [[à]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|225 U+00E1: LATIN SMALL LETTER A WITH ACUTE | [[á]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|226 U+00E2: LATIN SMALL LETTER A WITH CIRCUMFLEX | [[â]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|227 U+00E3: LATIN SMALL LETTER A WITH TILDE | [[ã]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|228 U+00E4: LATIN SMALL LETTER A WITH DIAERESIS | [[ä]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|229 U+00E5: LATIN SMALL LETTER A WITH RING ABOVE | [[å]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|230 U+00E6: LATIN SMALL LETTER AE | [[æ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|231 U+00E7: LATIN SMALL LETTER C WITH CEDILLA | [[ç]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|232 U+00E8: LATIN SMALL LETTER E WITH GRAVE | [[è]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|233 U+00E9: LATIN SMALL LETTER E WITH ACUTE | [[é]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|234 U+00EA: LATIN SMALL LETTER E WITH CIRCUMFLEX | [[ê]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|235 U+00EB: LATIN SMALL LETTER E WITH DIAERESIS | [[ë]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|236 U+00EC: LATIN SMALL LETTER I WITH GRAVE | [[ì]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|237 U+00ED: LATIN SMALL LETTER I WITH ACUTE | [[í]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|238 U+00EE: LATIN SMALL LETTER I WITH CIRCUMFLEX | [[î]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|239 U+00EF: LATIN SMALL LETTER I WITH DIAERESIS | [[ï]] }} |
||
|- |
|- |
||
| {{chset-left1|Fx}} |
| {{chset-left1|Fx}} |
||
| {{chset-cell1 |
| {{chset-cell1|240 U+00F0: LATIN SMALL LETTER ETH | [[ð]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|241 U+00F1: LATIN SMALL LETTER N WITH TILDE | [[ñ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|242 U+00F2: LATIN SMALL LETTER O WITH GRAVE | [[ò]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|243 U+00F3: LATIN SMALL LETTER O WITH ACUTE | [[ó]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|244 U+00F4: LATIN SMALL LETTER O WITH CIRCUMFLEX | [[ô]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|245 U+00F5: LATIN SMALL LETTER O WITH TILDE | [[õ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|246 U+00F6: LATIN SMALL LETTER O WITH DIAERESIS | [[ö]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|247 U+00F7: DIVISION SIGN | [[÷]] | style=background:#EFD}} |
||
| {{chset-cell1 |
| {{chset-cell1|248 U+00F8: LATIN SMALL LETTER O WITH STROKE | [[ø]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|249 U+00F9: LATIN SMALL LETTER U WITH GRAVE | [[ù]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|250 U+00FA: LATIN SMALL LETTER U WITH ACUTE | [[ú]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|251 U+00FB: LATIN SMALL LETTER U WITH CIRCUMFLEX | [[û]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|252 U+00FC: LATIN SMALL LETTER U WITH DIAERESIS | [[ü]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|253 U+00FD: LATIN SMALL LETTER Y WITH ACUTE | [[ý]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|254 U+00FE: LATIN SMALL LETTER THORN | [[þ]] }} |
||
| {{chset-cell1 |
| {{chset-cell1|255 U+00FF: LATIN SMALL LETTER Y WITH DIAERESIS | [[ÿ]] }} |
||
|- |
|- |
||
| {{chset-table-footer1| |
| {{chset-table-footer1| |
||
Line 415: | Line 422: | ||
=== ISO/IEC 8859-15 === |
=== ISO/IEC 8859-15 === |
||
[[ISO/IEC 8859-15]] was developed in 1999, as an update of ISO/IEC 8859-1. It provides some characters for French and Finnish text and the [[euro sign]], which are missing from ISO/IEC 8859-1. This required the removal of some infrequently used characters from ISO/IEC |
[[ISO/IEC 8859-15|ISO/IEC 8859-15]] was developed in 1999, as an update of ISO/IEC 8859-1. It provides some characters for French and Finnish text and the [[euro sign]], which are missing from ISO/IEC 8859-1. This required the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: {{code|¤}}, {{code|¦}}, {{code|¨}}, {{code|´}}, {{code|¸}}, {{code|¼}}, {{code|½}}, and {{code|¾}}. Ironically, three of the newly added characters ({{code|Œ}}, {{code|œ}}, and {{code|Ÿ}}) had already been present in [[DEC (company)|DEC]]'s 1983 [[Multinational Character Set]] (MCS), the predecessor to ISO/IEC 8859-1 (1987). Since their original code points were now reused for other purposes, the characters had to be reintroduced under different, less logical code points. |
||
ISO-IR-204, a more minor modification, had been registered in 1998, altering ISO-8859-1 by replacing the [[universal currency sign]] (¤) with the euro sign<ref>{{ |
ISO-IR-204, a more minor modification (called '''code page 61235''' by FreeDOS),<ref>{{cite web |url=https://github.com/FDOS/cpi/blob/master/CPIISO/codepage.txt |title=Cpi/CPIISO/Codepage.TXT at master · FDOS/Cpi |website=[[GitHub]] }}</ref> had been registered in 1998, altering ISO-8859-1 by replacing the [[universal currency sign]] (¤) with the euro sign<ref>{{cite iso-ir |number=204 |title=Supplementary set for Latin-1 alternative with EURO SIGN |sponsor=ITS Information Technology Standardization |date=1998-09-16}}</ref> (the same substitution made by ISO-8859-15). |
||
=== Windows-1252 === |
=== Windows-1252 === |
||
The popular [[Windows-1252]] character set adds all the missing characters provided by [[ISO/IEC 8859-15]], plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 ([[hexadecimal|hex]] 80 to 9F). It is very common to mislabel Windows-1252 text as being in ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Many |
The popular [[Windows-1252]] character set adds all the missing characters provided by [[ISO/IEC 8859-15]], plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 ([[hexadecimal|hex]] 80 to 9F). It is very common to mislabel Windows-1252 text as being in ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Many Web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters, and that behavior was later standardized in [[HTML5]].<ref>{{cite web |last=van Kesteren |first=Anne |url=https://encoding.spec.whatwg.org/#names-and-labels |work=Encoding Standard |title=5.2 Names and labels |publisher=[[WHATWG]] |date=27 January 2015 |access-date=4 February 2015 |archive-url=https://web.archive.org/web/20150204174315/https://encoding.spec.whatwg.org/#names-and-labels |archive-date=4 February 2015 |url-status=live}}</ref> |
||
=== Mac Roman === |
=== Mac Roman === |
||
The [[Apple Macintosh]] computer introduced a character encoding called [[Mac Roman]] in 1984. It was meant to be suitable for Western European [[desktop publishing]]. It is a superset of |
The [[Apple Macintosh]] computer introduced a character encoding called [[Mac Roman]] in 1984. It was meant to be suitable for Western European [[desktop publishing]]. It is a superset of ASCII, and has most of the characters that are in ISO-8859-1 and all the extra characters from Windows-1252, but in a totally different arrangement. The few printable characters that are in ISO/IEC 8859-1, but not in this set, are often a source of trouble when editing text on Web sites using older Macintosh browsers, including the last version of [[Internet Explorer for Mac]]. |
||
=== Other === |
=== Other === |
||
DOS |
[[DOS]] has [[code page 850]], which has all printable characters that ISO-8859-1 has, albeit in a totally different arrangement, plus the most widely used [[graphic character]]s from [[code page 437|code page 437]]. |
||
⚫ | Between 1989<ref name="HP82240B_1989"/> and 2015,<!-- End of production of HP 50g, HP's last RPL calculator. --> [[Hewlett-Packard]] used another superset of ISO-8859-1 on many of their calculators. [[RPL character set|This proprietary character set]] was sometimes referred to simply as "ECMA-94" as well.<ref name="HP82240B_1989">{{cite book |title=HP 82240B Infrared Printer |publisher=[[Hewlett-Packard]] |date=August 1989 |edition=1 |id=HP reorder number 82240-90014 |location=Corvallis, OR, USA }}</ref> HP also has [[code page 1053]], which adds the medium shade (▒, U+2592) at 0x7F.<ref>{{cite web|title=Code Page 1053|url=https://www-03.ibm.com/systems/resources/systems_i_software_globalization_pdf_cp01053z.pdf|archive-url=https://web.archive.org/web/20130121104245/http://www-03.ibm.com/systems/resources/systems_i_software_globalization_pdf_cp01053z.pdf|archive-date=2013-01-21}}</ref> |
||
Several [[EBCDIC]] code pages were purposely designed to have the same set of characters as ISO-8859-1, to allow easy conversion between them. |
|||
⚫ | Between 1989<ref name="HP82240B_1989"/> and 2015,<!-- |
||
== See also == |
== See also == |
||
* [[Latin script in Unicode]] |
* [[Latin script in Unicode]] |
||
* [[Unicode]] |
* [[Unicode]] |
||
* [[Universal Character Set]] |
* [[Universal Coded Character Set]] |
||
** [[DIN 91379|European<!-- Latin --> Unicode subset (DIN 91379)]]<!-- (Thus also Greek and [[Cyrillic script|Cyrillic]] for [[Bulgarian language|Bulgarian]]) --> |
|||
* [[UTF-8]] |
* [[UTF-8]] |
||
* [[Windows code page]]s |
* [[Windows code page]]s |
||
Line 443: | Line 453: | ||
== External links == |
== External links == |
||
*[https://www.iso.org/standard/28245.html ISO/IEC 8859-1:1998] |
*[https://www.iso.org/standard/28245.html ISO/IEC 8859-1:1998] |
||
*[http://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf ISO/IEC FDIS 8859-1:1998]<!-- Mirror: http://open-std.org/JTC1/sc2/wg3/docs/n411.pdf --> — 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 ''(draft dated February 12, 1998, published April 15, 1998)'' |
*[http://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf ISO/IEC FDIS 8859-1:1998] {{Webarchive|url=https://web.archive.org/web/20200930013716/http://std.dkuug.dk/JTC1/SC2/WG3/docs/n411.pdf |date=2020-09-30 }}<!-- Mirror: http://open-std.org/JTC1/sc2/wg3/docs/n411.pdf --> — 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 ''(draft dated February 12, 1998, published April 15, 1998)'' |
||
*[https:// |
*[https://ecma-international.org/publications-and-standards/standards/ecma-94/ Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets — Latin Alphabets No. 1 to No. 4] ''2nd edition (June 1986)'' |
||
*[https:// |
*[https://itscj.ipsj.or.jp/ir/100.pdf ISO-IR 100] Right-Hand Part of Latin Alphabet No.1 ''(February 1, 1986)'' |
||
*[https://www.eki.ee/letter/ The Letter Database] |
*[https://www.eki.ee/letter/ The Letter Database] |
||
*<!-- <ref name="Czyborra_1998"> -->{{cite web |title=The ISO 8859 Alphabet Soup |author-first=Roman |author-last=Czyborra |date=1998-12-01 |url=https://czyborra.com/charsets/iso8859.html#ISO-8859-1 |access-date=2016-12-01 |url-status=live |archive-url=https://web.archive.org/web/20161201141241/http://czyborra.com/charsets/iso8859.html |archive-date=2016-12-01}} [https://czyborra.com/charsets/iso8859-1.txt.gz |
*<!-- <ref name="Czyborra_1998"> -->{{cite web |title=The ISO 8859 Alphabet Soup |author-first=Roman |author-last=Czyborra |date=1998-12-01 |url=https://czyborra.com/charsets/iso8859.html#ISO-8859-1 |access-date=2016-12-01 |url-status=live |archive-url=https://web.archive.org/web/20161201141241/http://czyborra.com/charsets/iso8859.html |archive-date=2016-12-01}} [https://czyborra.com/charsets/iso8859-1.txt.gz] [https://czyborra.com/charsets/iso8859-1.bdf.gz]<!-- </ref> --> |
||
{{Character encodings}} |
{{Character encodings}} |
Latest revision as of 03:18, 24 December 2024
MIME / IANA | ISO-8859-1 |
---|---|
Alias(es) | iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819 |
Language(s) | English, various others |
Standard | ISO/IEC 8859 |
Classification | Extended ASCII, ISO/IEC 8859 |
Extends | US-ASCII |
Based on | DEC MCS |
Succeeded by | |
Other related encoding(s) | |
ISO/IEC 8859-1:1998, Information technology—8-bit single-byte coded graphic character sets—Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. ISO/IEC 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is the basis for some popular 8-bit character sets and the first two blocks of characters in Unicode.
As of December 2024, 1.1% of all web sites use ISO/IEC 8859-1.[1][2] It is the most declared single-byte character encoding, but as Web browsers and the HTML5 standard[3] interpret such documents as the superset Windows-1252, they may include characters from that set. Some countries or languages show a higher usage than the global average: in 2024, usage among Brazilian websites is at 2.9%,[4] and among German websites at 2.5%.[5][6]
ISO-8859-1 was (according to the standard, at least) the default encoding of documents delivered via HTTP with a MIME type beginning with text/, the default encoding of the values of certain descriptive HTTP headers, and defined the repertoire of characters allowed in HTML 3.2 documents. It is specified by many other standards.[example needed] In practice, the superset encoding Windows-1252 is the more likely effective default[7] and it is increasingly common for UTF-8 to work whether or not a standard specifies it.
ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered: iso-ir-100, csISOLatin1, latin1, l1, IBM819, and CP819. Code page 28591, a.k.a. Windows-28591, is used for it in Windows.[8] IBM calls it code page 819 or CP819 (CCSID 819).[9][10][11][12] Oracle calls it WE8ISO8859P1.[13]
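As a rough illustration of how such aliases behave in practice, the sketch below asks Python's standard `codecs` registry whether each registered alias resolves to the same codec as ISO-8859-1. The helper name `resolves_to_latin1` is hypothetical, and which labels a given runtime recognises is a property of that runtime, not of the standard; vendor names such as WE8ISO8859P1 are not expected to resolve here.

```python
import codecs

# Aliases taken from the paragraph above. Whether a particular runtime
# recognises each label is checked at run time, not assumed.
ALIASES = ["iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]


def resolves_to_latin1(label: str) -> bool:
    """Return True if `label` names the same codec as ISO-8859-1."""
    try:
        return codecs.lookup(label).name == codecs.lookup("ISO-8859-1").name
    except LookupError:
        # Unknown label for this runtime (for example a vendor-specific name).
        return False


for label in ALIASES:
    print(label, resolves_to_latin1(label))
```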
Coverage
Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following languages (although the correct quotation marks are missing for many of them, including German and Icelandic):
Modern languages with complete coverage
- Notes
- ^ Basic classical orthography
- ^ Rumi script
- ^ Bokmål and Nynorsk
- ^ European and Brazilian
Languages with incomplete coverage
ISO-8859-1 was commonly used[citation needed] for certain languages, even though it lacks characters used by these languages. In most cases, only a few letters are missing or they are rarely used, and they can be replaced with characters that are in ISO-8859-1 using some form of typographic approximation. The following table lists such languages.
Language | Missing characters | Typical workaround | Supported by |
---|---|---|---|
Catalan | Ŀ, ŀ (deprecated) | L·, l· | |
Danish | Ǿ, ǿ (the accent is optional and ǿ is very rare) | Ø, ø or øe | |
Dutch | IJ, ij (debatable), j́ (in emphasized words like "blíj́f") | digraphs IJ, ij or ÿ; blíjf | |
Estonian, Finnish | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
French | Œ, œ, and the very rare Ÿ | digraphs OE, oe; Y or Ý | ISO-8859-15, Windows-1252 |
German | ẞ (capital ß, used only in all capitals) | digraph SS or SZ | |
Hungarian | Ő, ő, Ű, ű | Ö, ö, Ü, ü; Õ, õ, Û, û (the characters replaced in 8859-2) | ISO-8859-2, Windows-1250 |
Irish (traditional orthography) | Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṗ, ṗ, Ṡ, ṡ, Ṫ, ṫ | Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Ph, ph, Sh, sh, Th, th | ISO-8859-14 |
Maltese | Ċ, ċ, Ġ, ġ, Ħ, ħ, Ż, ż | C, c, G, g, H, h, Z, z | ISO-8859-3 |
Welsh | Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ẅ, ẅ, Ỳ, ỳ, Ŷ, ŷ, Ÿ | W, w, Y, y, Ý, ý | ISO-8859-14 |
The letter ÿ, which appears in French only very rarely, mainly in city names such as L'Haÿ-les-Roses and never at the beginning of words, is included only in lowercase form. The slot corresponding to its uppercase form is occupied by the lowercase letter ß from the German language, which did not have an uppercase form at the time when the standard was created.
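The typographic approximations in the table can be sketched as a substitution pass applied before encoding. This is only an illustration: the `FALLBACKS` mapping and the `to_latin1` helper are hypothetical names, the mapping covers just a few rows of the "Typical workaround" column, and any character without an entry still raises an error.

```python
# Hypothetical fallback table built from a few rows of the table above.
# Keys are single code points; values are ISO-8859-1 approximations.
FALLBACKS = {
    "\u0152": "OE",      # Œ (French)
    "\u0153": "oe",      # œ (French)
    "\u0178": "Y",       # Ÿ (French, very rare)
    "\u0160": "Sh",      # Š (Estonian, Finnish loanwords)
    "\u0161": "sh",      # š
    "\u017d": "Zh",      # Ž
    "\u017e": "zh",      # ž
    "\u0150": "\u00d6",  # Ő -> Ö (Hungarian)
    "\u0151": "\u00f6",  # ő -> ö
    "\u0170": "\u00dc",  # Ű -> Ü
    "\u0171": "\u00fc",  # ű -> ü
    "\u1e9e": "SS",      # ẞ (German capital sharp s)
}
_TABLE = str.maketrans(FALLBACKS)


def to_latin1(text: str) -> bytes:
    """Apply the fallbacks, then encode as ISO-8859-1.

    Raises UnicodeEncodeError for characters that have neither an
    ISO-8859-1 code point nor an entry in FALLBACKS.
    """
    return text.translate(_TABLE).encode("latin-1")


print(to_latin1("c\u0153ur"))   # French "cœur"
print(to_latin1("Gy\u0151r"))   # Hungarian "Győr"
```

A substitution table like this is lossy by design: the original spelling cannot be recovered from the encoded bytes.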
Quotation marks
For some languages listed above, the correct typographical quotation marks are missing, as only « », " ", and ' ' are included. Also, this scheme does not provide for oriented (6- or 9-shaped) single or double quotation marks. Some fonts will display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks (see Quotation mark § Typewriters and early computers), but this is not considered part of the modern standard.
Superscript digits
Only 3 superscript digits have been encoded: ² at 0xB2, ³ at 0xB3, and ¹ at 0xB9, lacking the superscript digit 0 and digits 4–9. Additionally, none of the subscript digits have been encoded. A workaround would be to use rich text formatting for the digits not covered by this standard.
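The gap can be checked mechanically. The following sketch, which assumes Python's built-in `latin-1` codec as a stand-in for the standard, tests each Unicode superscript and subscript digit for encodability; the helper name `encodable` is hypothetical.

```python
def encodable(ch: str, encoding: str = "latin-1") -> bool:
    """Return True if the single character `ch` exists in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉"

available_super = [d for d in SUPERSCRIPTS if encodable(d)]
available_sub = [d for d in SUBSCRIPTS if encodable(d)]

print("superscripts available:", available_super)
print("subscripts available:", available_sub)
```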
History
ISO 8859-1 was based on the Multinational Character Set (MCS) used by Digital Equipment Corporation (DEC) in the popular VT220 terminal in 1983. It was developed within the European Computer Manufacturers Association (ECMA), and published in March 1985 as ECMA-94,[14] by which name it is still sometimes known. The second edition of ECMA-94 (June 1986)[15] also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.
The original draft of ISO 8859-1 placed French Œ and œ at code points 215 (0xD7) and 247 (0xF7), as in the MCS. However, the delegate from France, being neither a linguist nor a typographer, falsely stated that these are not independent French letters on their own, but mere ligatures (like fi or fl), supported by the delegate team from Bull Publishing Company, who regularly did not print French with Œ/œ in their house style at the time. An anglophone delegate from Canada insisted on retaining Œ/œ but was rebuffed by the French delegate and the team from Bull. These code points were soon filled with × and ÷ under the suggestion of the German delegation. Support for French was further reduced when it was again falsely stated that the letter ÿ is "not French", resulting in the absence of the capital Ÿ. In fact, the letter ÿ is found in a number of French proper names, and the capital letter has been used in dictionaries and encyclopedias.[16] These characters were added to ISO/IEC 8859-15:1999. BraSCII matches the original draft.
In 1985, Commodore adopted ECMA-94 for its new AmigaOS operating system.[17] The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding.[citation needed]
In 1990, the first version of Unicode used the code points of ISO-8859-1 as the first 256 Unicode code points.
In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control codes to the otherwise unassigned code values, and thus provides 256 characters, one for every possible 8-bit value.
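Both properties described above (every 8-bit value is assigned, and the values coincide with the first 256 Unicode code points) can be observed directly. The sketch below assumes that Python's built-in `latin-1` codec implements the 256-value IANA map rather than the 191-character ISO standard proper.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding never fails, because the IANA map leaves no value unassigned.
text = all_bytes.decode("latin-1")

# Each byte value b decodes to the Unicode code point U+00bb with the
# same number, so decoding is an identity mapping on the numeric values.
code_points = [ord(ch) for ch in text]
print(code_points == list(range(256)))

# The mapping is reversible: encoding restores the original bytes.
print(text.encode("latin-1") == all_bytes)
```

This identity is why ISO-8859-1 is sometimes used as a lossless way to carry arbitrary bytes through a text-only interface.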
Code page layout
Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x | | | | | | | | | | | | | | | | |
1x | | | | | | | | | | | | | | | | |
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
8x | | | | | | | | | | | | | | | | |
9x | | | | | | | | | | | | | | | | |
Ax | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
Fx | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |
Legend (cell shading in the original table):
- Undefined
- Symbols and punctuation
- Undefined in the first release of ECMA-94 (1985).[14] In the original draft Œ was at 0xD7 and œ was at 0xF7.
Similar character sets
ISO/IEC 8859-15
ISO/IEC 8859-15 was developed in 1999, as an update of ISO/IEC 8859-1. It provides some characters for French and Finnish text and the euro sign, which are missing from ISO/IEC 8859-1. This required the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾. Ironically, three of the newly added characters (Œ, œ, and Ÿ) had already been present in DEC's 1983 Multinational Character Set (MCS), the predecessor to ISO/IEC 8859-1 (1987). Since their original code points were now reused for other purposes, the characters had to be reintroduced under different, less logical code points.
ISO-IR-204, a more minor modification (called code page 61235 by FreeDOS),[18] had been registered in 1998, altering ISO-8859-1 by replacing the universal currency sign (¤) with the euro sign[19] (the same substitution made by ISO-8859-15).
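The eight replaced positions can be listed by decoding every byte value under both encodings and keeping the ones that disagree. The sketch below uses Python's built-in `iso8859-1` and `iso8859-15` codecs; the helper name `differing_bytes` is hypothetical. No ISO-IR-204 codec is assumed to exist in the standard library, so that variant is not covered.

```python
def differing_bytes(enc_a: str, enc_b: str) -> dict[int, tuple[str, str]]:
    """Map each byte value that decodes differently to its two characters."""
    result = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(enc_a)
        char_b = raw.decode(enc_b)
        if char_a != char_b:
            result[value] = (char_a, char_b)
    return result


diff = differing_bytes("iso8859-1", "iso8859-15")
for value, (old, new) in sorted(diff.items()):
    print(f"0x{value:02X}: {old} -> {new}")
```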
Windows-1252
The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). It is very common to mislabel Windows-1252 text as being in ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Many Web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters, and that behavior was later standardized in HTML5.[20]
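The mislabelling problem can be reproduced with two bytes. In this sketch, which relies on Python's built-in `cp1252` and `latin-1` codecs, the same byte string is decoded both ways: as Windows-1252 it yields curly quotation marks, while as ISO-8859-1 it yields invisible C1 control characters. The sample text is made up for illustration.

```python
import unicodedata

# "quoted" wrapped in Windows-1252 smart quotes (bytes 0x93 and 0x94).
raw = b"\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")   # curly double quotation marks
as_latin1 = raw.decode("latin-1")  # C1 controls U+0093 and U+0094

print(repr(as_cp1252))
print(repr(as_latin1))
print(unicodedata.category(as_latin1[0]))  # general category of U+0093

# Going the other way, a curly quote has no ISO-8859-1 code point, so a
# lenient encoder substitutes a question mark.
print("\u201c".encode("latin-1", errors="replace"))
```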
Mac Roman
The Apple Macintosh computer introduced a character encoding called Mac Roman in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, and has most of the characters that are in ISO-8859-1 and all the extra characters from Windows-1252, but in a totally different arrangement. The few printable characters that are in ISO/IEC 8859-1, but not in this set, are often a source of trouble when editing text on Web sites using older Macintosh browsers, including the last version of Internet Explorer for Mac.
Other
DOS has code page 850, which has all printable characters that ISO-8859-1 has, albeit in a totally different arrangement, plus the most widely used graphic characters from code page 437.
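The coverage claims made for Mac Roman and code page 850 can be compared with a small sketch. It assumes Python's built-in `cp850` and `mac_roman` codecs, which may reflect later revisions of those code pages than the ones described here; the helper name `missing_from` is hypothetical.

```python
# The 96 characters in the upper half of ISO-8859-1 (0xA0 to 0xFF).
LATIN1_UPPER = [chr(code) for code in range(0xA0, 0x100)]


def missing_from(encoding: str) -> list[str]:
    """List the upper-half ISO-8859-1 characters that `encoding` lacks."""
    missing = []
    for ch in LATIN1_UPPER:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print("cp850 lacks:", missing_from("cp850"))
print("mac_roman lacks:", missing_from("mac_roman"))

# Same repertoire, different arrangement: one letter, two byte values.
print("\u00e9".encode("latin-1"), "\u00e9".encode("cp850"))
```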
Between 1989[21] and 2015, Hewlett-Packard used another superset of ISO-8859-1 on many of their calculators. This proprietary character set was sometimes referred to simply as "ECMA-94" as well.[21] HP also has code page 1053, which adds the medium shade (▒, U+2592) at 0x7F.[22]
Several EBCDIC code pages were purposely designed to have the same set of characters as ISO-8859-1, to allow easy conversion between them.
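A shared character set means conversion is a pure permutation of byte values, with no character lost in either direction. The sketch below checks this for Python's built-in `cp500` and `cp037` codecs, which are used here as examples only; the text above does not name specific EBCDIC code pages. The helper names `same_repertoire` and `ebcdic_to_latin1` are hypothetical.

```python
ALL_BYTES = bytes(range(256))


def same_repertoire(encoding: str, reference: str = "latin-1") -> bool:
    """Return True if both encodings decode the 256 byte values to the
    same set of characters (possibly in a different order)."""
    return set(ALL_BYTES.decode(encoding)) == set(ALL_BYTES.decode(reference))


def ebcdic_to_latin1(data: bytes, encoding: str = "cp500") -> bytes:
    """Re-encode EBCDIC bytes as ISO-8859-1 bytes of the same length."""
    return data.decode(encoding).encode("latin-1")


print(same_repertoire("cp500"), same_repertoire("cp037"))

# The letter A is 0xC1 in EBCDIC and 0x41 in ISO-8859-1.
print(ebcdic_to_latin1(b"\xc1"))
```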
See also
- Latin script in Unicode
- Unicode
- Universal Coded Character Set
- UTF-8
- Windows code pages
- ISO/IEC JTC 1/SC 2
References
- ^ "Historical trends in the usage statistics of character encodings for Web sites, December 2024". W3Techs. Retrieved 2024-12-16.
- ^ Cowan, John; Soltano, Sam (August 2014). "Source of character encoding statistics?". W3Techs. Archived from the original on 4 April 2024.
- ^ "Encoding". WHATWG. 27 January 2015. sec. 5.2 Names and labels. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ "Distribution of Character Encodings among websites that use Brazil". W3Techs. Retrieved 2024-12-16.
- ^ "Distribution of Character Encodings among websites that use .de". W3Techs. Retrieved 2024-12-16.
- ^ "Distribution of Character Encodings among websites that use German". W3Techs. Archived from the original on 4 April 2024. Retrieved 2024-12-16.
- ^ "c++ - What is the native narrow string encoding on Windows?". Stack Overflow. Jan 2011. Retrieved 2023-02-16.
- ^ "Code Page Identifiers". Microsoft Corporation. Retrieved 2010-12-19.
- ^ "Code page 819 information document". Archived from the original on 2017-01-16.
- ^ "CCSID 819 information document". Archived from the original on 2016-03-27.
- ^ Code Page CPGID 00819 (pdf) (PDF), IBM
- ^ Code Page CPGID 00819 (txt), IBM
- ^ Baird, Cathy; Chiba, Dan; Chu, Winson; Fan, Jessica; Ho, Claire; Law, Simon; Lee, Geoff; Linsley, Peter; Matsuda, Keni; Oscroft, Tamzin; Takeda, Shige; Tanaka, Linus; Tozawa, Makoto; Trute, Barry; Tsujimoto, Mayumi; Wu, Ying; Yau, Michael; Yu, Tim; Wang, Chao; Wong, Simon; Zhang, Weiran; Zheng, Lei; Zhu, Yan; Moore, Valarie (2002) [1996]. "Appendix A: Locale Data". Oracle9i Database Globalization Support Guide (PDF) (Release 2 (9.2) ed.). Oracle Corporation. Oracle A96529-01. Archived (PDF) from the original on 2017-02-14. Retrieved 2017-02-14.
- ^ a b Standard ECMA-94: 8-bit Single-Byte Coded Graphic Character Set (PDF) (1 ed.). European Computer Manufacturers Association (ECMA). March 1985 [1984-12-14]. Archived (PDF) from the original on 2016-12-02. Retrieved 2016-12-01.
[…] Since 1982 the urgency of the need for an 8-bit single-byte coded character set was recognized in ECMA as well as in ANSI/X3L2 and numerous working papers were exchanged between the two groups. In February 1984 ECMA TC1 submitted to ISO/TC97/SC2 a proposal for such a coded character set. At its meeting of April 1984 SC decided to submit to TC97 a proposal for a new item of work for this topic. Technical discussions during and after this meeting led TC1 to adopt the coding scheme proposed by X3L2. Part 1 of Draft International Standard DTS 8859 is based on this joint ANSI/ECMA proposal. […] Adopted as an ECMA Standard by the General Assembly of Dec. 13–14, 1984. […]
- ^ "Second edition of ECMA-94 (June 1986)" (PDF).
- ^ André, Jacques (1996). "ISO Latin-1, norme de codage des caractères européens? Trois caractères français en sont absents!" (PDF). Cahiers GUTenberg (in French) (25): 65–77. doi:10.5802/cg.205.
- ^ Malyshev, Michael (2003-01-10). "Registration of new charset [Amiga-1251]". ATO-RU (Amiga Translation Organization - Russian Department). Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- ^ "Cpi/CPIISO/Codepage.TXT at master · FDOS/Cpi". GitHub.
- ^ ITS Information Technology Standardization (1998-09-16). Supplementary set for Latin-1 alternative with EURO SIGN (PDF). ITSCJ/IPSJ. ISO-IR-204.
- ^ van Kesteren, Anne (27 January 2015). "5.2 Names and labels". Encoding Standard. WHATWG. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
- ^ a b HP 82240B Infrared Printer (1 ed.). Corvallis, OR, USA: Hewlett-Packard. August 1989. HP reorder number 82240-90014.
- ^ "Code Page 1053" (PDF). Archived from the original (PDF) on 2013-01-21.
External links
- ISO/IEC 8859-1:1998
- ISO/IEC FDIS 8859-1:1998 Archived 2020-09-30 at the Wayback Machine — 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets — Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
- ISO-IR 100 Right-Hand Part of Latin Alphabet No.1 (February 1, 1986)
- The Letter Database
- Czyborra, Roman (1998-12-01). "The ISO 8859 Alphabet Soup". Archived from the original on 2016-12-01. Retrieved 2016-12-01. [1] [2]