Complex text layout: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Reverting edit(s) by 2A01:CB06:8031:F655:91F5:2A4E:8AF3:5945 (talk) to rev. 1053024295 by 182.186.248.192: Not providing a reliable source (RW 16.1)
 
(26 intermediate revisions by 23 users not shown)
Line 1: Line 1:
{{Short description|Neighbour-dependent grapheme positioning}}
[[Image:JanaSanskritSans ddhrya.svg|thumb|The [[Devanagari]] ''{{IAST|ddhrya}}''-ligature of [http://tdil.mit.gov.in/download/janasanskrit.htm JanaSanskritSans], should be invoked by the layout engine to render the sequence of seven Unicode characters द + ् + ध + ् + र + ् + य = द्ध्र्य.]]
{{Self reference|For assistance with enabling complex text layout on your computer, see [[Help:Multilingual support]].}}
[[Image:arabicrender.png|thumb|250px|The word {{lang|ar| العربية }} ''{{transl|ar|al-arabiyyah}}'', "the Arabic [language]" in Arabic, in stages of rendering. The first line shows the letters as they are unprocessed, the result that would be given by an application without complex script rendering. In the second line the bidirectional display mechanism has come to play, and in the third the [[glyph]] shaping mechanism has rendered the letters according to context.]]
{{No footnotes|date=July 2013}}
[[File:JanaSanskritSans ddhrya.svg|thumb|The [[Devanagari]] ''{{IAST|ddhrya}}''-ligature, as displayed in the [https://web.archive.org/web/20110716160603/http://tdil.mit.gov.in/download/janasanskrit.htm JanaSanskritSans] font, which should be invoked by the layout engine to render the sequence द + ् + ध + ् + र + ् + य = द्ध्र्य.]]
[[File:Arabicrender.png|thumb|250px|The word {{lang|ar|العربية}} ''{{transl|ar|al-arabiyyah}}'', "the Arabic [language]" in Arabic, in successive stages of rendering. The first line shows the letters in left-to-right order and unjoined, as they might appear in an application without complex text layout. In the second line, bidirectional display has been applied, and in the third the [[glyph]]-shaping mechanism has rendered the letters according to context.]]


'''Complex text layout''' ('''CTL''') or '''complex text rendering''' is the [[typesetting]] of [[writing system]]s in which the shape or positioning of a [[grapheme]] depends on its relation to other graphemes. The term is used in the field of software [[internationalization]], where each grapheme is a [[character (computing)|character]].
:''See [[Wikipedia:COMPLEX|Help:Multilingual support]] for enabling complex text layout on your computer''


Scripts which require CTL for proper display may be known as '''complex scripts'''. Examples include the [[Arabic alphabet]] and scripts of the [[Brahmic scripts|Brahmic family]], such as [[Devanagari]], [[Khmer script]] or the [[Thai alphabet]]. Many scripts do not require CTL. For instance, the [[Latin alphabet]] or [[Chinese character]]s can be typeset by simply displaying each character one after another in straight rows or columns. However, even these scripts have alternate forms or optional features (such as [[cursive]] writing) which require CTL to produce on computers.
'''Complex text layout''' (abbreviated '''CTL''') or '''complex text rendering''' refers to the typesetting of [[writing system]]s which require complex transformations between text input and text display for proper rendering on the screen or the printed page (also known as '''complex scripts'''). In other words, for these scripts the way text is stored is not mapped to the way it is displayed in a straightforward fashion. The term is used in the field of software [[internationalization]].


==Characteristics requiring CTL==
Examples of writing systems requiring CTL are the [[Arabic alphabet]] and scripts of the [[Brahmic family]] such as [[Devanagari]] or the [[Thai alphabet]].
The main characteristics of CTL complexity are:
* [[Bi-directional text]], where characters may be written from either right-to-left or left-to-right direction.
* [[Context-sensitive shaping]] and [[ligature (typography)|ligature]]s, where a character may change its shape, dependent on its location and/or the surrounding characters. For example, a character in [[Arabic script]] can have as many as four different shape-forms, depending on context.
* Ordering, where the displayed order of the characters is not the same as the logical order. For example, in Devanagari, which is written from left to right, the grapheme for "short i" appears to the left of ("before") the consonant that it follows: in {{lang|sa|कि}} ''ki'', the {{lang|sa|ि}} ''-i'' should render on the left, its bow reaching until above the {{lang|sa|क}} ''k-'' to the right.


Not all occurrences of these characteristics require CTL. For example, the [[Greek alphabet]] has context-sensitive shaping of the letter [[sigma]], which appears as ς at the end of a word and σ elsewhere. However, these two forms are normally stored as different characters; for instance, [[Unicode]] has both {{unichar|03C2|GREEK SMALL LETTER FINAL SIGMA}} and {{unichar|03C3|GREEK SMALL LETTER SIGMA}}, and does not treat them as [[Unicode equivalence|equivalent]]. For collation and comparison purposes, software should consider the string "δῖος Ἀχιλλεύς" equivalent to "δῖοσ Ἀχιλλεύσ",<ref>{{cite web|url = https://www.unicode.org/faq/greek.html#5|title = FAQ - Greek Language & Script|accessdate = 2013-09-13|publisher = Unicode Consortium|date = 2012-12-03|quote = It is easier to simply equate the two sigma codes for operations which are concerned with word content, for example.}}</ref> but for typesetting purposes they are distinct and CTL is not required to choose the correct form.
CTL is a generalization of the concept of [[ligature (typography)|ligature]]: for the [[Latin alphabet]], ligatures are usually considered a marginal aesthetic concern, but there is no fundamental difference between the ligatures required for acceptable typesetting of the Arabic script, and typesetting a Latin [[cursive]].<ref>Indeed, historically, the Arabic alphabet is simply a cursive of the [[Nabataean alphabet]], with context-dependent letter shapes that became mandatory from ca. the 4th century AD.</ref> Conversely, most characters of the [[Chinese script]] are compositional and could be considered ligatures, but are usually encoded as so many individual characters, that typesetting requires an enormous typeface rather than sophisticated layout. An example of a contextual variant that is not considered a ligature is Greek final [[sigma]] ς, the word-final contextual variant of the usual σ shape. Unicode encodes both variants separately, at U+03C2 and U+03C3 respectively. However, for collation and comparison purposes, software should likely consider the string "δῖος Ἀχιλλεύς." equivalent to "δῖοσ Ἀχιλλεύσ." (Unicode does not direct conforming software to treat ς and σ as canonically or compatibility [[Unicode equivalence|equivalent]]).


==Implementations==
The main characteristics of CTL language complexity are:
<!-- Lists in this section are in alphabetical order to avoid POV issues. -->
* [[Bi-directional text]], where characters may be written from either right-to-left or left-to-right direction.
Most text-rendering software that is capable of CTL will include information about specific scripts, and so will be able to render them correctly without [[computer font|font files]] needing to supply instructions on how to lay out characters. Such software is usually provided in a [[library (computing)|library]]; examples include:
* [[Context-sensitive shaping]] (ligatures), where a character may change its shape, dependent on its location and/or the surrounding characters. For example, a character in [[Arabic script]] can have as many as four different shape-forms, depending on context.
* [[Core Text]] for [[macOS]]
* Ordering, the displayed order of the characters is not the same as the logical order. For example, in Devanagari, which is written from left to right, the grapheme for "short i" appears to the left of ("before") the preceding consonant: in {{lang|sa|कि}} ''ki'', the {{lang|sa|ि}} ''-i'' should render on the left, its bow reaching until above the {{lang|sa|क}} ''k-'' to the right.
* [[Uniscribe]] (with Universal Shaping Engine) and [[DirectWrite]] for [[Microsoft Windows]]
* [[HarfBuzz]], a [[cross-platform]] library
* [[Pango]], a cross-platform library which nowadays incorporates [[HarfBuzz]]


However, such software is unable to properly render any script for which it lacks instructions, which can include many minority scripts. The alternative approach is to include the rendering instructions in the font file itself. Rendering software still needs to be capable of reading and following the instructions, but this is relatively simple.
== Implementations ==
Some CTL implementations do not encapsulate information about specific scripts. In these implementations, the script-specific CTL information resides within the font files. Therefore, they are able to render any script:
* [[Apple Advanced Typography]]
* [[Graphite (SIL)|Graphite]]


Examples of this latter approach include [[Apple Advanced Typography]] (AAT) and [[Graphite (SIL)|Graphite]]. Both of these names encompass both the instruction format and the software supporting it; AAT is included on [[Apple Inc.|Apple]] [[operating system]]s, while Graphite is available for [[Microsoft Windows]] and [[Linux]]-based systems.
Other CTL implementations encapsulate information about specific scripts. In these implementations, the script-specific CTL information is provided by the CTL implementation. Therefore, they are only able to render the scripts that are previously implemented:
* [[International Components for Unicode]] (ICU)
* [[Pango]] provides text services to [[GTK+]]
* [[Harfbuzz]] is the new [[OpenType]] layout engine for Pango and [[Qt (toolkit)|Qt]]
* [[Uniscribe]] and its successor, [[DirectWrite]]


The [[OpenType]] format is primarily intended for systems using the first approach (layout knowledge in the renderer, not the font), but it has a few features that assist with CTL, such as contextual ligatures. AAT and Graphite instructions can be embedded in OpenType font files.
==Notes==
<references />


== See also ==
==See also==
* [[Typography]]
* [[Typography]]
* [[Unicode]]
* [[Unicode]]
* Writing systems which require complex text layout:
* Writing systems which require complex text layout:
** [[Arabic alphabet]] (technically an [[abjad]])
** [[Arabic alphabet]]
** Most of the [[Brahmic family of scripts]]
** Most of the [[Brahmic scripts|Brahmic]] family of scripts
** [[N'Ko script]]
** [[N'Ko script]]
** [[Tengwar]] (diacritics and numbers)
** [[Tengwar]] (diacritics and numbers)


==References==
== External links ==
{{Reflist}}

==External links==
* [http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=CmplxRndExamples Examples of complex rendering] &mdash; [[SIL international]]'s examples of complex writing systems around the world
* [http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=CmplxRndExamples Examples of complex rendering] &mdash; [[SIL international]]'s examples of complex writing systems around the world
* [http://www.opengroup.org/desktop/ctl/ Complex Text Layout] &mdash; [[The Open Group]]'s Desktop Technologies
* [http://www.opengroup.org/desktop/ctl/ Complex Text Layout] &mdash; [[The Open Group]]'s Desktop Technologies
* [http://www.mozilla.org/projects/ctl/ Supporting Indic Scripts in Mozilla] &mdash; also other CTL scripts
* [https://web.archive.org/web/20050206102353/http://www.mozilla.org/projects/ctl/ Supporting Indic Scripts in Mozilla] &mdash; also other CTL scripts
* [http://sila.mozdev.org/ Project SILA] &mdash; [[Graphite (SIL)|Graphite]] and [[Mozilla]] integration project
* [https://web.archive.org/web/20121020041740/http://sila.mozdev.org/ Project SILA] &mdash; [[Graphite (SIL)|Graphite]] and [[Mozilla]] integration project
* [http://developers.sun.com/techtopics/global/products_platforms/solaris/reference/whitepapers/#ctl CTL Architecture in Solaris] &mdash; Solaris Globalization Whitepapers
* [http://developers.sun.com/techtopics/global/products_platforms/solaris/reference/whitepapers/#ctl CTL Architecture in Solaris] &mdash; Solaris Globalization Whitepapers
* [http://www.microsoft.com/globaldev/DrIntl/faqs/Complex.mspx Complex Scripts] &mdash; Microsoft Global Development and Computing Portal
* [http://www.microsoft.com/globaldev/DrIntl/faqs/Complex.mspx Complex Scripts] &mdash; Microsoft Global Development and Computing Portal
* [http://linux.thai.net/~thep/ Theppitak's Homepage] &mdash; information about Thai language processing
* [http://linux.thai.net/~thep/ Theppitak's Homepage] &mdash; information about Thai language processing
* [http://www.freedesktop.org/wiki/Software/HarfBuzz HarfBuzz' page] at [[Freedesktop.org]]
* [http://www.freedesktop.org/wiki/Software/HarfBuzz HarfBuzz's page] at [[Freedesktop.org]]
* [http://www.d-type.com/unicode_text/ D-Type Unicode Text Module — Portable software library for complex text]
* [http://www.d-type.com/unicode_text/ D-Type Unicode Text Module — Portable software library for complex text]
* [https://github.com/salshaaban/BidiRenderer BidiRenderer] &mdash; An application that illustrates the shaping and layout of complex text in bidirectional paragraphs using FriBidi, FreeType, and HarfBuzz
* [https://github.com/Tehreer/Tehreer-Android Tehreer-Android] &mdash; A library that gives full control over text related technologies such as bidirectional algorithm, open type shaping, text typesetting and text rendering
*[https://github.com/Tehreer/Tehreer-Cocoa Tehreer-Cocoa] &mdash; Standalone font/text engine for iOS


{{DEFAULTSORT:Complex Text Layout}}
[[Category:Typesetting]]
[[Category:Typesetting]]
[[Category:Indic computing]]
[[Category:Indic computing]]
[[Category:Natural language and computing]]

[[de:Complex Text Layout]]
[[ja:複雑なテキスト配置]]
[[th:Complex Text Layout]]
[[zh:複雜文字編排]]

Latest revision as of 13:05, 1 August 2023

The Devanagari ddhrya-ligature, as displayed in the JanaSanskritSans font, which should be invoked by the layout engine to render the sequence द + ् + ध + ् + र + ् + य = द्ध्र्य.
The word العربية al-arabiyyah, "the Arabic [language]" in Arabic, in successive stages of rendering. The first line shows the letters in left-to-right order and unjoined, as they might appear in an application without complex text layout. In the second line, bidirectional display has been applied, and in the third the glyph-shaping mechanism has rendered the letters according to context.

Complex text layout (CTL) or complex text rendering is the typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes. The term is used in the field of software internationalization, where each grapheme is a character.

Scripts which require CTL for proper display may be known as complex scripts. Examples include the Arabic alphabet and scripts of the Brahmic family, such as Devanagari, Khmer script or the Thai alphabet. Many scripts do not require CTL. For instance, the Latin alphabet or Chinese characters can be typeset by simply displaying each character one after another in straight rows or columns. However, even these scripts have alternate forms or optional features (such as cursive writing) which require CTL to produce on computers.

Characteristics requiring CTL

The main characteristics of CTL complexity are:

  • Bi-directional text, where characters may be written either right-to-left or left-to-right.
  • Context-sensitive shaping and ligatures, where a character may change its shape depending on its position and/or the surrounding characters. For example, a character in Arabic script can have as many as four different shape-forms, depending on context.
  • Ordering, where the displayed order of the characters is not the same as the logical order. For example, in Devanagari, which is written from left to right, the grapheme for "short i" appears to the left of ("before") the consonant that it follows: in कि ki, the ि -i should render on the left, its bow extending over the क k- to its right.
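
The three characteristics can be observed in the Unicode character data itself. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper names `describe` and `presentation_forms` are illustrative and not part of any layout library. The Arabic presentation forms inspected here are compatibility characters: a layout engine normally selects contextual glyphs from the font rather than substituting these code points, so the sketch only illustrates that up to four contextual shapes exist for one letter.

```python
import unicodedata

def describe(text):
    """Return (code point, name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Ordering: "ki" is stored consonant first, vowel sign second,
# although the vowel sign is drawn to the left of the consonant.
ki = "\u0915\u093F"
print(describe(ki))

# Bidirectional text: every character carries a bidirectional class.
latin_class = unicodedata.bidirectional("a")        # Latin small letter a
arabic_class = unicodedata.bidirectional("\u0639")  # ARABIC LETTER AIN
print(latin_class, arabic_class)

def presentation_forms(base, first, last):
    """Map shape tag -> code point for the compatibility forms of `base`
    found in the inclusive code point range first..last."""
    forms = {}
    for cp in range(first, last + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # A contextual form decomposes as e.g. "<initial> 0628".
        if len(parts) == 2 and parts[1] == f"{ord(base):04X}":
            forms[parts[0].strip("<>")] = f"U+{cp:04X}"
    return forms

# Context-sensitive shaping: the four contextual shapes of ARABIC LETTER BEH.
beh_forms = presentation_forms("\u0628", 0xFE8F, 0xFE92)
print(beh_forms)
```

The stored order of कि shows the consonant before the vowel sign, so placing the vowel sign on the left is entirely the job of the layout engine.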

Not all occurrences of these characteristics require CTL. For example, the Greek alphabet has context-sensitive shaping of the letter sigma, which appears as ς at the end of a word and σ elsewhere. However, these two forms are normally stored as different characters; for instance, Unicode has both U+03C2 ς GREEK SMALL LETTER FINAL SIGMA and U+03C3 σ GREEK SMALL LETTER SIGMA, and does not treat them as equivalent. For collation and comparison purposes, software should consider the string "δῖος Ἀχιλλεύς" equivalent to "δῖοσ Ἀχιλλεύσ",[1] but for typesetting purposes they are distinct and CTL is not required to choose the correct form.
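
The distinction between the two sigma characters can be sketched with Python's standard library. This is an illustration under the assumption that Unicode case folding is an acceptable stand-in for "operations concerned with word content"; the function name `same_word_content` is hypothetical.

```python
import unicodedata

FINAL_SIGMA = "\u03C2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03C3"        # GREEK SMALL LETTER SIGMA

def same_word_content(a, b):
    """Compare two strings after Unicode case folding, which maps
    final sigma to the ordinary small sigma."""
    return a.casefold() == b.casefold()

with_final = "δῖος Ἀχιλλεύς"
without_final = with_final.replace(FINAL_SIGMA, SIGMA)

# The stored strings differ, and no normalization form unifies them,
# so the two sigmas are neither canonically nor compatibility equivalent.
distinct_under_normalization = all(
    unicodedata.normalize(form, FINAL_SIGMA) != unicodedata.normalize(form, SIGMA)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

print(with_final == without_final)                   # stored text differs
print(distinct_under_normalization)                  # still distinct after normalization
print(same_word_content(with_final, without_final))  # equal for comparison purposes
```

Because the correct form is already stored in the text, a renderer can draw each character as given, and only comparison code needs to treat the two as the same letter.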

Implementations

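Most text-rendering software that is capable of CTL will include information about specific scripts, and so will be able to render them correctly without font files needing to supply instructions on how to lay out characters. Such software is usually provided in a library; examples include:

  • Core Text for macOS
  • Uniscribe (with Universal Shaping Engine) and DirectWrite for Microsoft Windows
  • HarfBuzz, a cross-platform library
  • Pango, a cross-platform library which nowadays incorporates HarfBuzz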

However, such software is unable to properly render any script for which it lacks instructions, which can include many minority scripts. The alternative approach is to include the rendering instructions in the font file itself. Rendering software still needs to be capable of reading and following the instructions, but this is relatively simple.

Examples of this latter approach include Apple Advanced Typography (AAT) and Graphite. Each name refers to both the instruction format and the software that supports it; AAT is included on Apple operating systems, while Graphite is available for Microsoft Windows and Linux-based systems.

The OpenType format is primarily intended for systems using the first approach (layout knowledge in the renderer, not the font), but it has a few features that assist with CTL, such as contextual ligatures. AAT and Graphite instructions can be embedded in OpenType font files.

See also

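  • Typography
  • Unicode
  • Writing systems which require complex text layout:
      • Arabic alphabet
      • Most of the Brahmic family of scripts
      • N'Ko script
      • Tengwar (diacritics and numbers)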

References

  1. ^ "FAQ - Greek Language & Script". Unicode Consortium. 2012-12-03. Retrieved 2013-09-13. It is easier to simply equate the two sigma codes for operations which are concerned with word content, for example.