Numerals in Unicode: Difference between revisions
Latest revision as of 05:03, 2 November 2024
A numeral (often called number in Unicode) is a character that denotes a number. The decimal digits 0–9 are used in many writing systems throughout the world, but the graphemes representing them differ widely. Unicode therefore includes 22 different sets of graphemes for the decimal digits, along with various decimal points, thousands separators, negative signs, and so on. Unicode also includes several non-decimal numerals, such as Aegean numerals, Roman numerals, counting rod numerals, Mayan numerals, Cuneiform numerals and ancient Greek numerals. There are also many typographical variations of the Western Arabic numerals, provided for specialized mathematical use and for compatibility with earlier character sets, such as ² or ②, and composite characters such as ½.
Numerals by numeric property
Unicode groups characters by their numerical property as used in text, with four values for Numeric Type. The first is the "not a number" type. The second covers decimal-radix digits, as commonly used in Western-style decimals (plain 0–9). The third covers numbers that are not part of a decimal system, such as Roman numerals. The fourth covers decimal digits in a typographic context, such as encircled numbers. Numbering schemes such as "A. B. C." for chapters are not covered.
Numeric Type[a][b] (Unicode character property)
Numeric type | Code | Has numeric value | Example | Remarks |
---|---|---|---|---|
Not numeric | <none> | No | | Numeric Value="NaN" |
Decimal | De | Yes | | Straight digit (decimal-radix). Corresponds both ways with General Category=Nd[a] |
Digit | Di | Yes | | Decimal, but in typographic context |
Numeric | Nu | Yes | | Numeric value, but not decimal-radix |
a. ^ "Section 4.6: Numeric Value". The Unicode Standard. Unicode Consortium. September 2024.
b. ^ "Unicode 16.0 Derived Numeric Types". Unicode Character Database. Unicode Consortium. 2024-04-30.
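The four Numeric Type values can be approximated from Python's standard `unicodedata` module. The sketch below is an illustration only: the helper name `numeric_type` is hypothetical, and the mapping relies on the assumption that `unicodedata.decimal`, `unicodedata.digit` and `unicodedata.numeric` succeed for progressively wider sets of characters, which is how they behave for the sample characters shown.

```python
import unicodedata

def numeric_type(ch):
    """Return 'De', 'Di', 'Nu', or None for a single character (hypothetical helper)."""
    # decimal() only succeeds for straight decimal-radix digits.
    if unicodedata.decimal(ch, None) is not None:
        return "De"
    # digit() also succeeds for digits in a typographic context.
    if unicodedata.digit(ch, None) is not None:
        return "Di"
    # numeric() succeeds for anything that carries a numeric value.
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"
    return None

# ASCII 7, Arabic-Indic 3, circled 2, Roman numeral twelve, one half, letter A
for ch in ["7", "\u0663", "\u2461", "\u216B", "\u00BD", "A"]:
    print(f"U+{ord(ch):04X}", numeric_type(ch))
```

The result depends on the version of the Unicode Character Database bundled with the Python interpreter in use.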
Hexadecimal digits
Hexadecimal digits in Unicode are not separate characters; existing letters and digits are used. These characters have the character property Hex_Digit=Yes, and also ASCII_Hex_Digit=Yes where appropriate.
Characters in Unicode marked Hex_Digit=Yes[a] | | |
---|---|---|
0123456789ABCDEF | Basic Latin, capitals | Also ASCII_Hex_Digit=Yes |
0123456789abcdef | Basic Latin, small letters | Also ASCII_Hex_Digit=Yes |
０１２３４５６７８９ＡＢＣＤＥＦ | Fullwidth forms, capitals | |
０１２３４５６７８９ａｂｃｄｅｆ | Fullwidth forms, small letters | |
a. ^ "Unicode 16.0 UCD: PropList.txt". 2024-05-31. Retrieved 2024-09-13.
Numerals by script
Hindu–Arabic numerals
The Hindu–Arabic numeral system involves ten digits representing 0–9. Unicode includes the Western Arabic numerals in the Basic Latin (or ASCII-derived) block. The digits are repeated in several other scripts: Eastern Arabic, Balinese, Bengali, Devanagari, Ethiopic, Gujarati, Gurmukhi, Khmer, Lao, Limbu, Malayalam, Mongolian, Myanmar, New Tai Lue, Nko, Oriya, Telugu, Thai, Tibetan and Osmanya. Unicode includes a numeric value property for each digit to assist in collation and other text-processing operations. However, there is no direct mapping between the various related digits.
Although Arabic is written from right to left, while English is written left to right, in both languages numbers are written with the most significant digit on the left and the least significant on the right.
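Because there is no character-to-character mapping between the digit sets, software usually goes through the numeric value property instead. A minimal sketch in Python, assuming the standard `unicodedata` module; the function name `to_western_digits` is made up for this illustration:

```python
import unicodedata

def to_western_digits(text):
    """Replace every decimal digit, in any script, with its ASCII equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digit characters
        out.append(ch if value is None else str(value))
    return "".join(out)

print(to_western_digits("\u0663\u0664"))        # Arabic-Indic digits three, four
print(to_western_digits("\u0967\u0968\u0969"))  # Devanagari digits one, two, three
```

Characters without a decimal value pass through unchanged, so the function can be applied to mixed text.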
Fractions
The fraction slash character (U+2044) allows authors using Unicode to compose any arbitrary fraction from the decimal digits. It was intended to instruct the font renderer to make the surrounding digits smaller, raising those on the left and lowering those on the right, but this is rarely implemented. (A workaround is to use the superscript and subscript characters described below, but these are available only for Arabic numerals.) Unicode also includes a handful of vulgar fractions as compatibility characters, but discourages their use.
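The two ways of composing a fraction described above can be sketched in Python. The helper names `compose_fraction` and `stacked_fraction` are hypothetical; the translation tables use the superscript and subscript digit characters that Unicode encodes.

```python
import unicodedata

FRACTION_SLASH = "\u2044"
# Superscript digits 0-9 are scattered: U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079.
SUPERSCRIPTS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)
# Subscript digits 0-9 are contiguous: U+2080..U+2089.
SUBSCRIPTS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)

def compose_fraction(numerator, denominator):
    """Plain digits around the fraction slash; layout is left to the renderer."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

def stacked_fraction(numerator, denominator):
    """Workaround: superscript numerator and subscript denominator."""
    return (str(numerator).translate(SUPERSCRIPTS)
            + FRACTION_SLASH
            + str(denominator).translate(SUBSCRIPTS))

print(compose_fraction(3, 16))
print(stacked_fraction(3, 4))
# A precomposed vulgar fraction decomposes (NFKC) to digits around U+2044.
print(unicodedata.normalize("NFKC", "\u00BD"))
```

How either form is displayed depends on the font and rendering engine.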
Decimal fractions
Several characters in Unicode can serve as a decimal separator, depending on the locale. Decimal fractions are represented in text as a sequence of decimal digit numerals with a decimal separator between the whole-number portion and the fractional portion. For example, the decimal fraction for ¼ is expressed as zero-point-two-five ("0.25"). Unicode has no dedicated general decimal separator but unifies the decimal separator function with other punctuation characters, so the "." used in "0.25" is the same period character (U+002E) used to end a sentence. However, cultures vary in the glyph or grapheme used for a decimal separator. In some locales, the comma (U+002C) may be used instead: "0,25". Still other locales use a space (or non-breaking space) for "0 25". The Arabic writing system includes a dedicated decimal separator character that looks much like a comma, "٫" (U+066B), which when combined with the Arabic digits to express one-quarter appears as "٠٫٢٥".
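Since the separator is ordinary punctuation, a parser has to be told which character plays that role. The following Python sketch assumes the caller already knows the separator for the locale; the function `parse_decimal_fraction` is hypothetical and does not handle signs or digit grouping.

```python
import unicodedata
from fractions import Fraction

def parse_decimal_fraction(text, separator="."):
    """Parse digits from any script with a caller-supplied decimal separator."""
    ascii_chars = []
    for ch in text:
        if ch == separator:
            ascii_chars.append(".")
        else:
            # Raises ValueError if ch is not a decimal digit.
            ascii_chars.append(str(unicodedata.decimal(ch)))
    return Fraction("".join(ascii_chars))

print(parse_decimal_fraction("0.25"))
print(parse_decimal_fraction("0,25", separator=","))
# Arabic-Indic digits with ARABIC DECIMAL SEPARATOR (U+066B)
print(parse_decimal_fraction("\u0660\u066B\u0662\u0665", separator="\u066B"))
```

All three calls describe the same value, one quarter, which is why the result is returned as an exact `Fraction`.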
Characters for mathematical constants
Currently, three Unicode characters semantically represent mathematical constants: U+210E ℎ PLANCK CONSTANT, U+210F ℏ PLANCK CONSTANT OVER TWO PI, and U+2107 ℇ EULER CONSTANT (of unknown significance[1]). Other mathematical constants can be represented using characters that have multiple semantic uses. For example, although Unicode includes a character used for the natural exponent, ℯ (U+212F), its UCS canonical name derives from its glyph: U+212F ℯ SCRIPT SMALL E. Likewise, the mathematical constant π (3.141592…) is represented by U+03C0 π GREEK SMALL LETTER PI.
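The character names quoted above can be looked up with Python's `unicodedata` module. This short sketch also checks that none of these characters carries a Numeric Value, which suggests the "constant" semantics live only in the name; the variable names are arbitrary.

```python
import unicodedata

constants = "\u210E\u210F\u2107\u212F\u03C0"

# Map each character to its formal Unicode name.
names = {ch: unicodedata.name(ch) for ch in constants}
# None of them has a numeric value property.
values = {ch: unicodedata.numeric(ch, None) for ch in constants}

for ch in constants:
    print(f"U+{ord(ch):04X}", names[ch], values[ch])
```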
Rich text and other compatibility numerals
The Western Arabic numerals also appear among the compatibility characters as rich text variant forms, including bold, double-struck, monospace, sans-serif and sans-serif bold, along with fullwidth variants for legacy vertical text support.
Rich text parenthesized, circled and other variants are also included in the blocks Enclosed CJK Letters and Months; Enclosed Alphanumerics; Superscripts and Subscripts; Number Forms; and Dingbats.
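Because these forms are compatibility characters, compatibility normalization (NFKC) folds them back to plain Western Arabic digits. A small Python sketch, with `fold_compat` as a hypothetical helper name:

```python
import unicodedata

def fold_compat(text):
    """Apply NFKC so styled or enclosed digits become plain ones."""
    return unicodedata.normalize("NFKC", text)

# Mathematical bold 1, double-struck 2, monospace 3
print(fold_compat("\U0001D7CF\U0001D7DA\U0001D7F9"))
# Fullwidth 4 and 2
print(fold_compat("\uFF14\uFF12"))
# Circled digit two and parenthesized digit one
print(fold_compat("\u2461"), fold_compat("\u2474"))
```

Note that folding discards the styling, and enclosed forms may expand to more than one character, as the parenthesized digit does.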
Suzhou (huāmǎ/Sūzhōu mǎzi) numerals
The huāmǎ (simplified Chinese: 花码; traditional Chinese: 花碼) or Sūzhōu mǎzi (simplified Chinese: 苏州码子; traditional Chinese: 蘇州碼字) system is a variation of the rod numeral system. Rod numerals are closely related to the counting rods and the abacus, which is why the numeric symbols for 1, 2, 3, 6, 7 and 8 in the huāmǎ system resemble their representation on the abacus. Nowadays, the huāmǎ system is only used for displaying prices in Chinese markets or on traditional handwritten invoices.
The digits of the Suzhou numerals are in the CJK Symbols and Punctuation block at U+3021–U+3029, U+3007, U+5341, U+5344, and U+5345. In Unicode 3.0 these characters were incorrectly called Hangzhou style numerals. In Unicode 4.0, an erratum was added which stated:[2]
The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms used by traders to display the prices of goods. The use of "HANGZHOU" in the names is a misnomer.
All references to "Hangzhou" in the Unicode standard have been corrected to "Suzhou" except for the character names themselves, which cannot be changed once assigned, according to the Unicode Stability Policy.[3] (This policy allows software to use the names as unique identifiers.)
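The stable but misnamed character names are still what software sees today. As an illustration in Python (the variable name `suzhou_digits` is arbitrary), the names and numeric values of U+3021 to U+3029 can be listed with `unicodedata`:

```python
import unicodedata

# U+3021..U+3029: the nine Suzhou digits, still named "HANGZHOU NUMERAL ..."
suzhou_digits = [chr(cp) for cp in range(0x3021, 0x302A)]

for ch in suzhou_digits:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.numeric(ch))

# Zero is shared with other systems.
print(unicodedata.name("\u3007"), unicodedata.numeric("\u3007"))
```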
Japanese and Korean numerals
Ancient Greek numerals
Unicode provides support for several variants of Greek numerals, assigned to the Supplementary Multilingual Plane from U+10140 through U+1018F.[4]
Attic numerals were used by ancient Greeks, possibly from the 7th century BC. They were also known as Herodianic numerals because they were first described in a 2nd-century manuscript by Herodian. They are also known as acrophonic numerals because all of the symbols used derive from the first letters of the words that the symbols represent: 'one', 'five', 'ten', 'hundred', 'thousand' and 'ten thousand'. See Greek numerals and acrophony.
Decimal | Symbol | Greek numeral |
---|---|---|
1 | Ι | ἴος or ἰός (ios) |
5 | Π | πέντε (pente) |
10 | Δ | δέκα (deka) |
100 | Η | ἑκατόν (hekaton) |
1000 | Χ | χίλιοι (khilioi) |
10000 | Μ | μύριοι (myrioi) |
Ancient Greek Numbers (official Unicode Consortium code chart, PDF)
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+1014x | 𐅀 | 𐅁 | 𐅂 | 𐅃 | 𐅄 | 𐅅 | 𐅆 | 𐅇 | 𐅈 | 𐅉 | 𐅊 | 𐅋 | 𐅌 | 𐅍 | 𐅎 | 𐅏 |
U+1015x | 𐅐 | 𐅑 | 𐅒 | 𐅓 | 𐅔 | 𐅕 | 𐅖 | 𐅗 | 𐅘 | 𐅙 | 𐅚 | 𐅛 | 𐅜 | 𐅝 | 𐅞 | 𐅟 |
U+1016x | 𐅠 | 𐅡 | 𐅢 | 𐅣 | 𐅤 | 𐅥 | 𐅦 | 𐅧 | 𐅨 | 𐅩 | 𐅪 | 𐅫 | 𐅬 | 𐅭 | 𐅮 | 𐅯 |
U+1017x | 𐅰 | 𐅱 | 𐅲 | 𐅳 | 𐅴 | 𐅵 | 𐅶 | 𐅷 | 𐅸 | 𐅹 | 𐅺 | 𐅻 | 𐅼 | 𐅽 | 𐅾 | 𐅿 |
U+1018x | 𐆀 | 𐆁 | 𐆂 | 𐆃 | 𐆄 | 𐆅 | 𐆆 | 𐆇 | 𐆈 | 𐆉 | 𐆊 | 𐆋 | 𐆌 | 𐆍 | 𐆎 | |
Roman numerals
Roman numerals originated in ancient Rome, adapted from Etruscan numerals. The system used in classical antiquity was slightly modified in the Middle Ages to produce the system used today. It is based on certain letters that are given values as numerals.
Roman numerals are commonly used today in numbered lists (in outline format), clockfaces, pages preceding the main body of a book, chord triads in music analysis (Roman numeral analysis), the numbering of movie and video game sequels, book publication dates, successive political leaders or children with identical names, and the numbering of some sport events, such as the Olympic Games or the Super Bowl.
Unicode has a number of characters specifically designated as Roman numerals, as part of the Number Forms[5] range from U+2160 to U+2188. This range includes both upper- and lowercase numerals, as well as pre-combined characters for numbers up to 12 (Ⅻ or XII). One reason for the existence of pre-combined numbers is to facilitate the setting of multiple-letter numbers (such as VIII) on a single horizontal line in Asian vertical text. The Unicode standard, however, includes special Roman numeral code points for compatibility only, stating that "[f]or most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters".[6]
Additionally, characters exist for archaic[5] forms of 1000, 5000, 10,000, large reversed C (Ɔ), late 6 (ↅ, similar to Greek Stigma: Ϛ), early 50 (ↆ, similar to down arrow ↓⫝⊥[7]), 50,000, and 100,000. The small reversed c, ↄ, is not intended to be used in Roman numerals, but as lower case Claudian letter Ↄ.
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Value[8] | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 50 | 100 | 500 | 1,000 |
U+216x | Ⅰ | Ⅱ | Ⅲ | Ⅳ | Ⅴ | Ⅵ | Ⅶ | Ⅷ | Ⅸ | Ⅹ | Ⅺ | Ⅻ | Ⅼ | Ⅽ | Ⅾ | Ⅿ |
U+217x | ⅰ | ⅱ | ⅲ | ⅳ | ⅴ | ⅵ | ⅶ | ⅷ | ⅸ | ⅹ | ⅺ | ⅻ | ⅼ | ⅽ | ⅾ | ⅿ |
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|---|
Value | 1000 | 5000 | 10,000 | 100 | 100 | 6 | 50 | 50,000 | 100,000 |
U+218x | ↀ | ↁ | ↂ | Ↄ | ↄ | ↅ | ↆ | ↇ | ↈ |
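The relationship between the dedicated Roman numeral characters and ordinary Latin letters can be inspected in Python. This is a sketch using the standard `unicodedata` module; `roman_info` is a hypothetical helper that returns the name, numeric value and NFKC form of one character.

```python
import unicodedata

def roman_info(ch):
    """Return (name, numeric value, NFKC form) for one Roman numeral character."""
    return (
        unicodedata.name(ch),
        unicodedata.numeric(ch),
        unicodedata.normalize("NFKC", ch),
    )

print(roman_info("\u216B"))  # precombined twelve folds to the letters X, I, I
print(roman_info("\u2177"))  # lowercase eight
print(roman_info("\u2181"))  # archaic five thousand has no Latin-letter equivalent
```

The precombined characters fold to sequences of Latin letters, in line with the recommendation to compose Roman numerals from letters, while the archaic forms remain unchanged under normalization.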
In blackletter or script typefaces, Roman numerals are set in Roman type. Such typefaces may contain Roman numerals matching their own style in the Unicode range U+2160–217F; if they do not, a matching Antiqua typeface is used for the Roman numerals.
Unicode has characters for Roman fractions in the Ancient Symbols[9] block: sextans, uncia, semuncia, sextula, dimidia sextula, siliqua, and as.
Counting rod numerals
Counting rod numerals are included in their own block in the Supplementary Multilingual Plane (SMP) as of Unicode 5.0. There are nine "horizontal" digits (U+1D360 to U+1D368) and nine "vertical" digits (U+1D369 to U+1D371); the horizontal digits are used for odd powers of ten and the vertical digits for even powers of ten. Zero should be represented by U+3007 (〇, ideographic number zero) and the negative sign by U+20E5 (combining reverse solidus overlay).[10] This block also contains other counting-rod-like symbols, such as the well-known tally mark for 5 (four vertical strokes struck through). Because these characters are outside the BMP, font support may still be limited.
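Being outside the BMP has practical consequences for encoding as well as fonts. The Python sketch below looks at the first character of the block; the variable names are arbitrary, and only properties available through the standard library are shown.

```python
import unicodedata

first_rod = "\U0001D360"  # first code point of the Counting Rod Numerals block

code_point = ord(first_rod)
outside_bmp = code_point > 0xFFFF                  # beyond the Basic Multilingual Plane
utf16_bytes = len(first_rod.encode("utf-16-le"))   # one surrogate pair
utf8_bytes = len(first_rod.encode("utf-8"))

print(f"U+{code_point:04X}", unicodedata.name(first_rod))
print("outside BMP:", outside_bmp)
print("UTF-16 bytes:", utf16_bytes, "UTF-8 bytes:", utf8_bytes)
print("numeric value:", unicodedata.numeric(first_rod))
```

Software that assumes one 16-bit unit per character will split such characters, which is a separate issue from missing glyphs.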
Counting Rod Numerals (official Unicode Consortium code chart, PDF)
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+1D36x | 𝍠 | 𝍡 | 𝍢 | 𝍣 | 𝍤 | 𝍥 | 𝍦 | 𝍧 | 𝍨 | 𝍩 | 𝍪 | 𝍫 | 𝍬 | 𝍭 | 𝍮 | 𝍯 |
U+1D37x | 𝍰 | 𝍱 | 𝍲 | 𝍳 | 𝍴 | 𝍵 | 𝍶 | 𝍷 | 𝍸 | |||||||
See also
- Number Forms (Unicode block)
References
1. It is unknown which constant this is supposed to be. Xerox standard XCCS 353/046 just says "Euler's."
2. Freytag, Asmus; Rick McGowan; Ken Whistler (2006-05-08). "UTN #27: Known anomalies in Unicode Character Names". Technical Notes. Unicode Consortium. Retrieved 2008-06-13.
3. "Name Stability". Unicode Character Encoding Stability Policy. Unicode Consortium. 2008-02-28. Retrieved 2008-06-13.
4. Unicode Charts: Ancient Greek Numbers
5. Unicode Number Forms
6. The Unicode Standard, Version 6.0 – Electronic edition (PDF), Unicode, Inc., 2011, p. 486
7. David J. Perry: Proposal to Add Additional Ancient Roman Characters to UCS
8. For the first two rows
9. Unicode Ancient Symbols
10. The Unicode Standard, Version 5.0 – Electronic edition (PDF), Unicode, Inc., 2006, pp. 499–500