
Binary-to-text encoding: Difference between revisions

From Wikipedia, the free encyclopedia
remove TXMS, was added by anon user and only linked to one implementation
 
(44 intermediate revisions by 38 users not shown)
Line 3: Line 3:
{{original research|date=April 2010}}
{{original research|date=April 2010}}
{{more citations needed|date=December 2012}}
{{more citations needed|date=December 2012}}
{{Cleanup bare URLs|date=September 2022}}


}}
}}
{{anchor|ASCII armor}} A '''binary-to-text encoding''' is [[code|encoding]] of [[data (computing)|data]] in [[plain text]]. More precisely, it is an encoding of binary data in a sequence of [[character (computing)|printable characters]]. These encodings are necessary for transmission of data when the channel does not allow binary data (such as [[email]] or [[NNTP]]) or is not [[8-bit clean]]. [[Pretty Good Privacy|PGP]] documentation ({{IETF RFC|4880}}) uses the term "'''ASCII armor'''" for binary-to-text encoding when referring to [[Base64]].
{{anchor|ASCII armor}} A '''binary-to-text encoding''' is [[code|encoding]] of [[data (computing)|data]] in [[plain text]]. More precisely, it is an encoding of binary data in a sequence of [[character (computing)|printable characters]]. These encodings are necessary for transmission of data when the [[communication channel]] does not allow binary data (such as [[email]] or [[NNTP]]) or is not [[8-bit clean]]. [[Pretty Good Privacy|PGP]] documentation ({{IETF RFC|4880}}) uses the term "'''ASCII armor'''" for binary-to-text encoding when referring to [[Base64]].


==Overview==
==Overview==
Line 12: Line 11:


==Description==
==Description==
The [[ASCII]] text-encoding standard uses 7 bits to encode characters. With this it is possible to encode 128 (i.e. 2<sup>7</sup>) unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in [[English language|English]], plus a selection of [[C0 and C1 control codes|control codes]] which do not represent printable characters. For example, the capital letter ''A'' is ASCII character 65, the numeral ''2'' is ASCII 50, the character ''<nowiki>}</nowiki>'' is ASCII 125, and the [[metacharacter]] ''carriage return'' is ASCII 13. Systems based on ASCII use seven bits to represent these values digitally.
The [[ASCII]] text-encoding standard uses 7 bits to encode characters. With this it is possible to encode 128 (i.e. 2<sup>7</sup>) unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in [[English language|English]], plus a selection of [[control character]]s which do not represent printable characters. For example, the capital letter '''A''' is represented in 7 bits as 100 0001<sub>2</sub>, 0x41 (101<sub>8</sub>); the numeral '''2''' is 011 0010<sub>2</sub>, 0x32 (62<sub>8</sub>); the character '''<nowiki>}</nowiki>''' is 111 1101<sub>2</sub>, 0x7D (175<sub>8</sub>); and the [[control character]] '''RETURN''' is 000 1101<sub>2</sub>, 0x0D (15<sub>8</sub>).
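
The values quoted above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name <code>ascii_forms</code> is made up for this example and does not come from any standard.

```python
def ascii_forms(ch: str) -> tuple[str, str, str]:
    """Return the 7-bit binary, hexadecimal, and octal forms of one ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    # "07b" pads to seven binary digits; "#04x" gives a 0x-prefixed two-digit hex value.
    return format(code, "07b"), format(code, "#04x"), format(code, "o")


# The four characters used as examples in the text ("\r" is carriage return).
for ch in ("A", "2", "}", "\r"):
    print(repr(ch), *ascii_forms(ch))
```

Every result fits in seven binary digits, which is the property that seven-bit channels rely on.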


In contrast, most computers store data in memory organized in eight-bit [[byte]]s. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit ''text'' and eight-bit ''binary'' data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.
In contrast, most computers store data in memory organized in eight-bit [[byte]]s. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit ''text'' and eight-bit ''binary'' data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.


It is often desirable, however, to be able to send non-textual data through text-based systems, such as when one might attach an image file to an e-mail message. To accomplish this, the data is encoded in some way, such that eight-bit data is encoded into seven-bit ASCII characters (generally using only alphanumeric and punctuation characters—the [[ASCII#ASCII printable characters|ASCII printable characters]]). Upon safe arrival at its destination, it is then decoded back to its eight-bit form. This process is referred to as binary to text encoding. Many programs perform this conversion to allow for data-transport, such as [[Pretty Good Privacy|PGP]] and [[GNU Privacy Guard]] (GPG).
It is often desirable, however, to be able to send non-textual data through text-based systems, such as when one might attach an image file to an e-mail message. To accomplish this, the data is encoded in some way, such that eight-bit data is encoded into seven-bit ASCII characters (generally using only alphanumeric and punctuation characters, that is, the ASCII printable characters). Upon safe arrival at its destination, it is then decoded back to its eight-bit form. This process is referred to as binary-to-text encoding. Many programs, such as [[Pretty Good Privacy|PGP]] and [[GNU Privacy Guard]], perform this conversion to allow for data transport.
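
As a minimal sketch of the encode-then-decode round trip described above, the following uses Base64 from Python's standard <code>base64</code> module. Base64 is only one possible choice, and the function names <code>to_text</code> and <code>from_text</code> are hypothetical.

```python
import base64


def to_text(data: bytes) -> str:
    """Encode arbitrary eight-bit data as printable ASCII characters (Base64)."""
    return base64.b64encode(data).decode("ascii")


def from_text(text: str) -> bytes:
    """Recover the original eight-bit data from its Base64 text form."""
    return base64.b64decode(text)


payload = bytes(range(256))  # every possible eight-bit byte value
encoded = to_text(payload)

# Every character of the encoded form is a printable seven-bit ASCII character.
assert all(32 < ord(c) < 127 for c in encoded)
# Decoding restores the data exactly.
assert from_text(encoded) == payload
```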


==Encoding plain text==
==Encoding plain text==
{{See also|Delimiter#ASCII armor|Return-to-libc attack#Protection from return-to-libc attacks}}Binary-to-text encoding methods are also used as a mechanism for encoding [[plain text]]. For example:
{{See also|Delimiter#ASCII armor|Return-to-libc attack#Protection from return-to-libc attacks}}Binary-to-text encoding methods are also used as a mechanism for encoding [[plain text]]. For example:
* Some systems have a more limited character set they can handle; not only are they not [[8-bit clean]], some cannot even handle every printable ASCII character.
* Some systems have a more limited character set they can handle; not only are they not [[8-bit clean]], some cannot even handle every printable ASCII character.
* Other systems have limits on the number of characters that may appear between [[line break (computing)|line break]]s, such as the "1000 characters per line" limit of some [[SMTP]] software, as allowed by {{IETF RFC|2821}}.
* Other systems have limits on the number of characters that may appear between line breaks, such as the "1000 characters per line" limit of some [[Simple Mail Transfer Protocol]] software, as allowed by {{IETF RFC|2821}}.
* Still others add [[header (computing)|header]]s or [[trailer (information technology)|trailer]]s to the text.
* Still others add [[header (computing)|header]]s or [[trailer (information technology)|trailer]]s to the text.
* A few poorly-regarded but still-used protocols use [[in-band signaling]], causing confusion if specific patterns appear in the message. The best-known is the string "From&nbsp;" (including trailing space) at the beginning of a line used to separate mail messages in the [[mbox]] file format.
* A few poorly regarded but still-used protocols use [[in-band signaling]], causing confusion if specific patterns appear in the message. The best-known example is the string "From&nbsp;" (including trailing space) at the beginning of a line, used to separate mail messages in the [[mbox]] file format.
By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely [[Transparency (telecommunication)|transparent]]. This is sometimes referred to as 'ASCII armoring'. For example, the ViewState component of [[ASP.NET]] uses [[base64]] encoding to safely transmit text via HTTP POST, in order to avoid [[delimiter collision]].
By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely [[Transparency (telecommunication)|transparent]]. This is sometimes referred to as 'ASCII armoring'. For example, the ViewState component of [[ASP.NET]] uses [[base64]] encoding to safely transmit text via HTTP POST, in order to avoid [[delimiter collision]].
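
The effect of armoring plain text can be sketched as follows, assuming Base64 as the encoding and a 76-character line length. Both are illustrative choices, and the function names <code>armor</code> and <code>unarmor</code> are hypothetical. Because the Base64 alphabet contains no space, an armored line can never begin with the mbox separator "From ".

```python
import base64

LINE_LENGTH = 76  # assumed wrap width, well below a 1000-character line limit


def armor(text: str) -> str:
    """Encode plain text as Base64 and wrap it into short lines."""
    raw = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "\n".join(raw[i:i + LINE_LENGTH] for i in range(0, len(raw), LINE_LENGTH))


def unarmor(armored: str) -> str:
    """Reverse armor(); b64decode skips the inserted line breaks by default."""
    return base64.b64decode(armored).decode("utf-8")


message = "Dear list,\nFrom here on the text is unchanged.\n"
armored = armor(message)
print(armored)
print(unarmor(armored) == message)
```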


== Encoding standards ==
== Encoding standards ==
The table below compares the most used forms of binary-to-text encodings. The efficiency listed is the ratio between number of bits in the input and the number of bits in the encoded output.
The table below compares the most used forms of binary-to-text encodings. The efficiency listed is the ratio between the number of bits in the input and the number of bits in the encoded output.
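
The efficiency figure can be reproduced for the encodings available in Python's standard <code>base64</code> module. This sketch assumes each output character occupies eight bits, so the ratio reduces to input bytes divided by output characters. The helper name <code>efficiency</code> is made up for this example, and the 240-byte sample is chosen so that no padding is needed.

```python
import base64


def efficiency(encode, data: bytes) -> float:
    """Ratio of input bits to output bits, assuming 8 bits per output character."""
    return len(data) / len(encode(data))


# 240 is divisible by 3, 4, and 5, so none of these encodings has to pad.
sample = bytes(range(240))

for name, encode in [
    ("Base16", base64.b16encode),
    ("Base32", base64.b32encode),
    ("Base64", base64.b64encode),
    ("Ascii85", base64.a85encode),
]:
    print(f"{name}: {efficiency(encode, sample):.1%}")
```

For short or unaligned inputs, padding lowers the measured ratio below these figures.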


{| class="wikitable sortable"
{| class="wikitable sortable"
Line 33: Line 32:
! Encoding !! Data type !! Efficiency !! Programming language implementations !! Comments
! Encoding !! Data type !! Efficiency !! Programming language implementations !! Comments
|-
|-
| [[Ascii85]] || Arbitrary || 80% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk] {{Webarchive|url=https://web.archive.org/web/20141229031706/http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts |date=2014-12-29 }}, [http://www.ibiblio.org/pub/packages/ccic/software/unix/utils/btoa.c C], [https://github.com/woolstar/test/blob/master/encode/asc85.c C (2)], [https://web.archive.org/web/20131227071331/http://www.codinghorror.com/blog/2005/10/c-implementation-of-ascii85.html C#], [https://web.archive.org/web/20210927102719/http://blog.wezeku.com/2010/07/01/f-ascii85-module/ F#], [https://pkg.go.dev/encoding/ascii85 Go], [https://web.archive.org/web/20160304035222/http://java.freehep.org/freehep-io/apidocs/org/freehep/util/io/ASCII85.html Java] [https://metacpan.org/pod/Convert::Ascii85 Perl], [https://docs.python.org/3/library/base64.html#base64.a85encode Python], [https://web.archive.org/web/20151208205520/https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)]|| There exist several variants of this encoding, [[Base85]], [[btoa]], etc.
| [[ASCII]] {{Efn|Not strictly a text encoding as output includes non-printable characters|name=nonprint}}|| Arbitrary || data-sort-value="87.5%"| 87.5%|| Most languages || Repacks 8-bit binary data into 7-bit units by bit-shifting, so that 7 bytes of binary data occupy 8 seven-bit characters. The output can contain any ASCII value, including all possible control codes. This scheme is seldom used in practice.
|-
|-
| [[Ascii85]] || Arbitrary || 80% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk], [http://www.ibiblio.org/pub/packages/ccic/software/unix/utils/btoa.c C], [https://github.com/woolstar/test/blob/master/encode/asc85.c C (2)], [http://www.codinghorror.com/blog/2005/10/c-implementation-of-ascii85.html C#], [http://blog.wezeku.com/2010/07/01/f-ascii85-module/ F#], [https://pkg.go.dev/encoding/ascii85 Go], [https://web.archive.org/web/20160304035222/http://java.freehep.org/freehep-io/apidocs/org/freehep/util/io/ASCII85.html Java], [https://metacpan.org/pod/Convert::Ascii85 Perl], [https://docs.python.org/3/library/base64.html#base64.a85encode Python], [https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)] || Several variants of this encoding exist, including [[Base85]] and [[btoa]].
| [[Base32]] || Arbitrary || 62.5% || [http://sourceforge.net/projects/cyoencode/ ANSI C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://pkg.go.dev/encoding/base32 Go], [http://commons.apache.org/codec/ Java], [https://github.com/zanaptak/BinaryToTextEncoding C# F#], [https://docs.python.org/dev/library/base64.html#base64.b32encode Python] || {{space}}
|-
|-
| [[Base36]] || Integer || data-sort-value="64%"|~64% || bash, [[C (programming language)|C]], [[C++]], [[C Sharp (programming language)|C#]], [[Java (programming language)|Java]], [[Perl]], [[PHP]], [[Python (programming language)|Python]], Visual Basic, [[Swift (programming language)|Swift]], many others
| [[Base32]] || Arbitrary || 62.5% || [http://sourceforge.net/projects/cyoencode/ ANSI C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://pkg.go.dev/encoding/base32 Go], [http://commons.apache.org/codec/ Java], [https://docs.python.org/dev/library/base64.html#base64.b32encode Python] || {{space}}
|-
| [[Base36]] || Integer || data-sort-value="64%"|~64% || [[Base36#bash implementation|bash]], [[Base36#C implementation|C]], [[Base36#C++ implementation|C++]], [[Base36#C|C#]], [[Base36#Java implementation|Java]], [[Base36#Perl implementation|Perl]], [[Base36#PHP implementation|PHP]], [[Base36#Python implementation|Python]], [[Base36#Visual Basic implementation|Visual Basic]], [[Base36#Swift implementation|Swift]], many others
|Uses the [[Arabic numerals]] 0–9 and the [[Latin alphabet|Latin letters]] A–Z (the [[ISO basic Latin alphabet]]). Commonly used by [[URL redirection]] systems like [[TinyURL]] or SnipURL/Snipr as compact alphanumeric identifiers.
|Uses the [[Arabic numerals]] 0–9 and the [[Latin alphabet|Latin letters]] A–Z (the [[ISO basic Latin alphabet]]). Commonly used by [[URL redirection]] systems like [[TinyURL]] or SnipURL/Snipr as compact alphanumeric identifiers.
|-
|-
| [[Base45]] || Arbitrary || ~67% (97%{{efn|Encoding for QR code generation automatically selects the encoding to match the input character set, encoding 2 alphanumeric characters in 11 bits, and Base45 encodes 16 bits into 3 such characters. The efficiency is thus 32 bits of binary data encoded in 33 bits: 97%.}}) || [https://github.com/Dasio/base45/ Go] || Defined in Draft IETF Specification RFC 9285 for including binary data compactly in a [[QR code]].<ref>{{Cite web|url=https://datatracker.ietf.org/doc/draft-faltstrom-base45/|title = The Base45 Data Encoding|date = 2022-08-11|last1 = Fältström|first1 = Patrik|last2 = Ljunggren|first2 = Freik|last3 = Gulik|first3 = Dirk-Willem van|quote=Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1. ... Such data has to be converted into an appropriate text before that text could be encoded as a QR code. ... Base45 ... offers a more compact QR code encoding.}}</ref>
| Base45 || Arbitrary || ~67% (97%{{efn|Encoding for QR code generation automatically selects the encoding to match the input character set, encoding 2 alphanumeric characters in 11 bits, and Base45 encodes 16 bits into 3 such characters. The efficiency is thus 32 bits of binary data encoded in 33 bits: 97%.}}) || [https://github.com/Dasio/base45/ Go], [https://pypi.org/project/base45/ Python] || Defined in IETF Specification RFC 9285 for including binary data compactly in a [[QR code]].<ref>{{Cite web|url=https://rfc-editor.org/rfc/rfc9285|title = The Base45 Data Encoding|date = 2022-08-11|last1 = Fältström|first1 = Patrik|last2 = Ljunggren|first2 = Fredrik|last3 = Gulik|first3 = Dirk-Willem van|quote=Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1. ... Such data has to be converted into an appropriate text before that text could be encoded as a QR code. ... Base45 ... offers a more compact QR code encoding.}}</ref>
|-
|-
| [[Base56]] || Integer || — || [http://rossduggan.ie/blog/codetry/base-56-integer-encoding-in-php/index.html PHP] [https://github.com/jyn514/base56 Python] [https://pkg.go.dev/toolman.org/encoding/base56 Go] || A variant of Base58 encoding which further sheds the lowercase 'i' and 'o' characters in order to minimise the risk of fraud and human-error.<ref>{{cite web |title=Base-56 Integer Encoding in PHP |url=http://rossduggan.ie/blog/codetry/base-56-integer-encoding-in-php/index.html}}</ref>
| Base56 || Integer || — || [http://rossduggan.ie/blog/codetry/base-56-integer-encoding-in-php/index.html PHP], [https://github.com/jyn514/base56 Python], [https://pkg.go.dev/toolman.org/encoding/base56 Go] || A variant of Base58 encoding which further sheds the '1' and the lowercase 'o' characters in order to minimise the risk of fraud and human error.<ref>{{cite web |last=Duggan |first=Ross |date=August 18, 2009 |title=Base-56 Integer Encoding in PHP |url=http://rossduggan.ie/blog/codetry/base-56-integer-encoding-in-php/index.html}}</ref>
|-
|-
| {{anchor|Base58}}[[Base58]] || Integer || data-sort-value="73%"|~73% || [https://github.com/bitcoin/bitcoin/blob/master/src/base58.h C++], [https://pypi.python.org/pypi/base58 Python] || Similar to Base64, but modified to avoid both non-alphanumeric characters (+ and /) and letters that might look ambiguous when printed (0{{snd}} zero, I{{snd}} capital i, O{{snd}} capital o and l{{snd}} lower-case L). Base58 is used to represent [[bitcoin]] addresses.<ref>{{cite web |title=Protocol documentation |url=https://en.bitcoin.it/wiki/Protocol_documentation#Addresses |website=Bitcoin Wiki |access-date=10 July 2021}}</ref> Some messaging and social media systems [[Line wrap and word wrap|break lines]] on non-alphanumeric strings. This is avoided by not using [[Percent-encoding#Types of URI characters|URI reserved characters]] such as +. For [[segwit]] it was replaced by Bech32, see below.
| {{anchor|Base58}}Base58 || Integer || data-sort-value="73%"|~73% || [https://github.com/bitcoin/libbase58 C], [https://github.com/bitcoin/bitcoin/blob/master/src/base58.h C++], [https://pypi.python.org/pypi/base58 Python], [https://github.com/medo64/Medo/blob/main/src/Medo/Convert/Base58.cs C#], [https://github.com/NovaCrypto/Base58 Java] || Similar to Base64, but modified to avoid both non-alphanumeric characters (+ and /) and letters that might look ambiguous when printed (0{{snd}} zero, I{{snd}} capital i, O{{snd}} capital o and l{{snd}} lower-case L). Base58 is used to represent [[bitcoin]] addresses.{{cn|date=April 2023}} Some messaging and social media systems [[Line wrap and word wrap|break lines]] on non-alphanumeric strings. This is avoided by not using [[Percent-encoding#Reserved characters|URI reserved characters]] such as +. For [[SegWit]], it was replaced by Bech32, see below.
[[File:Original source code bitcoin-version-0.1.0 file base58.h.png|400px|thumb|Base58 in the original bitcoin source code]]
[[File:Original source code bitcoin-version-0.1.0 file base58.h.png|400px|thumb|Base58 in the original bitcoin source code]]
|-
|-
| [[Base62]] || Arbitrary || ~74% || [https://github.com/fbernier/base62 Rust] || Similar to Base64, but contains only alphanumeric characters.
| [[Base62]] || Arbitrary || ~74% || [https://github.com/fbernier/base62 Rust], [https://pypi.org/project/pybase62/ Python]|| Similar to Base64, but contains only alphanumeric characters.
|-
|-
| [[Base64]] || Arbitrary || 75% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk], [http://base64.sourceforge.net/ C], [http://www.fpx.de/fp/Software/UUDeview/ C (2)], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://pkg.go.dev/encoding/base64 Go], [https://docs.python.org/3/library/base64.html#base64.b64encode Python], many others || {{space}}
| [[Base64]] || Arbitrary || 75% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk] {{Webarchive|url=https://web.archive.org/web/20141229031706/http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts |date=2014-12-29 }}, [http://base64.sourceforge.net/ C], [http://www.fpx.de/fp/Software/UUDeview/ C (2)], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://pkg.go.dev/encoding/base64 Go], [https://docs.python.org/3/library/base64.html#base64.b64encode Python], many others || An early and still-popular encoding, first specified as part of {{IETF RFC|989}} in 1987
|-
|-
| [[Base85]] ({{IETF RFC|1924}}) || Arbitrary || 80% || [https://github.com/woolstar/test/blob/master/encode/base85.c C], [https://docs.python.org/3/library/base64.html#base64.b85encode Python] [https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)] || Revised version of [[Ascii85]].
| [[Base85]] || Arbitrary || 80% ||[https://github.com/woolstar/test/blob/master/encode/base85.c C], [https://docs.python.org/3/library/base64.html#base64.b85encode Python], [https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)]
| Revised version of [[Ascii85]].
|-
|-
| Base91<ref>https://www.iiis.org/CDs2010/CD2010SCI/CCCT_2010/PapersPdf/TB100QM.pdf {{Bare URL PDF|date=March 2022}}</ref> || Arbitrary || 81% || [https://github.com/zanaptak/BinaryToTextEncoding C# F#] || Constant width variant
| Base91<ref>{{Cite web |author=Dake He |author2=Yu Sun |author3=Zhen Jia |author4=Xiuying Yu |author5=Wei Guo |author6=Wei He |author7=Chao Qi |author8=Xianhui Lu |title=A Proposal of Substitute for Base85/64 – Base91 |url=https://www.iiis.org/CDs2010/CD2010SCI/CCCT_2010/PapersPdf/TB100QM.pdf |website=International Institute of Informatics and Systemics}}</ref>|| Arbitrary || 81% || [https://github.com/zanaptak/BinaryToTextEncoding C# F#] || Constant width variant
|-
|-
| basE91<ref>http://base91.sourceforge.net/</ref> || Arbitrary || 81% || [https://sourceforge.net/projects/base91/ C, Java, PHP, 8086 Assembly, AWK] [https://github.com/zanaptak/BinaryToTextEncoding C# F#] [https://crates.io/crates/base91 Rust] || Variable width variant
| basE91<ref>{{Cite web |title=binary to ASCII text encoding |url=https://base91.sourceforge.net/ |access-date=2023-03-20 |website=basE91 |publisher=[[SourceForge]]}}</ref>|| Arbitrary || 81% || [https://sourceforge.net/projects/base91/ C, Java, PHP, 8086 Assembly, AWK], [https://github.com/zanaptak/BinaryToTextEncoding C#, F#], [https://crates.io/crates/base91 Rust] || Variable-width variant
|-
|-
| Base94<ref>{{cite web | url=https://vorakl.com/articles/base94/ | title=Convert binary data to a text with the lowest overhead :: Vorakl's notes }}</ref> || Arbitrary || 82% || [https://github.com/vorakl/base94 Python] [https://gist.github.com/iso2022jp/4054241 C] || {{space}}
| Base94<ref>{{cite web |date=April 18, 2020 |title=Convert binary data to a text with the lowest overhead |url=https://vorakl.com/articles/base94/ |website=Vorakl's notes}}</ref>|| Arbitrary || 82% || [https://github.com/vorakl/base94 Python], [https://gist.github.com/iso2022jp/4054241 C], [https://crates.io/crates/base94 Rust] || {{space}}
|-
|-
| Base122<ref>{{cite web | url=http://blog.kevinalbs.com/base122 | title=Base-122 Encoding }}</ref> || Arbitrary || 87.5% || [https://github.com/kevinAlbs/Base122 JavaScript] [https://github.com/Theelx/pybase122 Python] [https://github.com/patrickfav/base122-java Java] [https://github.com/eyaler/ztml Base125 Python and Javascript] [https://github.com/vence722/base122-go Go] [https://github.com/kevinAlbs/libbase122 C]|| {{space}}
| Base122<ref>{{cite web |last=Albertson |first=Kevin |date=Nov 26, 2016 |title=Base-122 Encoding |url=http://blog.kevinalbs.com/base122}}</ref>|| Arbitrary || 87.5% || [https://github.com/kevinAlbs/Base122 JavaScript], [https://github.com/Theelx/pybase122 Python], [https://github.com/patrickfav/base122-java Java], [https://github.com/eyaler/ztml Base125 Python and Javascript], [https://github.com/vence722/base122-go Go], [https://github.com/kevinAlbs/libbase122 C]|| {{space}}
|-
|-
| BaseXML<ref>{{cite web | url=https://github.com/kriswebdev/BaseXML | title=BaseXML - for XML1.0+ | website=[[GitHub]] | date=16 March 2019 }}</ref> || Arbitrary || 83.5% || [https://github.com/kriswebdev/BaseXML C Python JavaScript] || {{space}}
| BaseXML<ref>{{cite web | url=https://github.com/kriswebdev/BaseXML | title=BaseXML - for XML1.0+ | website=[[GitHub]] | date=16 March 2019 }}</ref> || Arbitrary || 83.5% || [https://github.com/kriswebdev/BaseXML C Python JavaScript] || {{space}}
|-
|-
| Bech32 || Arbitrary || data-sort-value="62.5%"|62.5% + at least 8 chars (label, separator, 6-char [[error correcting code|ECC]]) || C, C++, JavaScript, Go, Python, Haskell, Ruby, Rust || Specification.<ref>{{Cite web|url=https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#bech32|title=Bitcoin/Bips|website=[[GitHub]]|date=8 December 2021}}</ref> Used in Bitcoin and the [[Lightning Network]].<ref>{{cite web|url=https://github.com/lightningnetwork/lightning-rfc/blob/master/11-payment-encoding.md|title=''Payment encoding'' in the Lightning RFC repo|date=2020-10-15|author=Rusty Russell|website=[[GitHub]]|author-link=Rusty Russell|display-authors=etal}}</ref> The data portion is encoded like Base32 with the possibility to check and correct up to 6 mistyped characters using the 6-character [[BCH code]] at the end, which also checks/corrects the HRP (Human Readable Part). The Bech32m variant has a subtle change that makes it more resilient to changes in length.<ref>{{cite web|url=https://github.com/sipa/bips/blob/bip-bech32m/bip-0350.mediawiki|title=Bech32m format for v1+ witness addresses|website=[[GitHub]]|date=5 December 2021}}</ref>
| {{anchor|Bech32|Bech32m}}Bech32 || Arbitrary || data-sort-value="62.5%"|62.5% + at least 8 chars (label, separator, 6-char [[error correcting code|ECC]]) || C, C++, [[JavaScript]], [[Go (programming language)|Go]], Python, [[Haskell]], [[Ruby (programming language)|Ruby]], [[Rust (programming language)|Rust]]|| Specification.<ref>{{Cite web |date=8 December 2021 |title=bitcoin/bips |url=https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#bech32 |website=[[GitHub]]}}</ref> Used in Bitcoin and the [[Lightning Network]].<ref>{{cite web|url=https://github.com/lightningnetwork/lightning-rfc/blob/master/11-payment-encoding.md|title=''Payment encoding'' in the Lightning RFC repo|date=2020-10-15|author=Rusty Russell|website=[[GitHub]]|author-link=Rusty Russell|display-authors=etal}}</ref> The data portion is encoded like Base32 with the possibility to check and correct up to 6 mistyped characters using the 6-character [[BCH code]] at the end, which also checks/corrects the Human Readable Part. The Bech32m variant has a subtle change that makes it more resilient to changes in length.<ref>{{cite web|url=https://github.com/sipa/bips/blob/bip-bech32m/bip-0350.mediawiki|title=Bech32m format for v1+ witness addresses|website=[[GitHub]]|date=5 December 2021}}</ref>
|-
|-
| [[BinHex]] || Arbitrary || 75%|| [http://metacpan.org/module/Convert::BinHex Perl], [http://www.fpx.de/fp/Software/UUDeview/ C], [http://ibiblio.org/pub/linux/utils/compress/macutils.tar.gz C (2)] || MacOS Classic
| [[BinHex]] || Arbitrary || 75%|| [http://metacpan.org/module/Convert::BinHex Perl], [http://www.fpx.de/fp/Software/UUDeview/ C], [http://ibiblio.org/pub/linux/utils/compress/macutils.tar.gz C (2)] || MacOS Classic
Line 73: Line 71:
| [[Hexadecimal#Base16 (transfer encoding)|Hexadecimal]] (Base16) || Arbitrary || 50% || Most languages || Exists in [[uppercase]] and [[Letter case#All lowercase|lowercase]] variants
| [[Hexadecimal#Base16 (transfer encoding)|Hexadecimal]] (Base16) || Arbitrary || 50% || Most languages || Exists in [[uppercase]] and [[Letter case#All lowercase|lowercase]] variants
|-
|-
| [[Intel HEX]] || Arbitrary || data-sort-value="50%"|≲50% || [https://github.com/vsergeev/libGIS C library], [http://srecord.sourceforge.net/ C++] || Typically used to program [[EPROM]], [[Flash memory|NOR-Flash]] memory chips
| [[Intel HEX]] || Arbitrary || data-sort-value="50%"|≲50% || [https://github.com/vsergeev/libGIS C library], [http://srecord.sourceforge.net/ C++] || Typically used to program [[EPROM]], [[Flash memory|NOR flash]] memory chips
|-
|-
| [[MIME]] || Arbitrary || See [[Quoted-printable]] and [[Base64]] || See [[Quoted-printable]] and [[Base64]] || Encoding container for e-mail-like formatting
| [[MIME]] || Arbitrary || See [[Quoted-printable]] and [[Base64]] || See [[Quoted-printable]] and [[Base64]] || Encoding container for e-mail-like formatting
|-
|-
| [[Percent-encoding]]|| Text ([[URI]]s), Arbitrary ([https://tools.ietf.org/html/rfc1738 RFC1738]) || data-sort-value="40%"|~40%{{efn|For arbitrary data; encoding all 189 non-unreserved characters with three bytes, and the remaining 66 characters with one.}} (33–70%{{efn|For text; only encoding each of the 18 reserved characters.}}) || [http://www.geekhideout.com/urlcode.shtml C], [https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse Python], probably many others || {{space}}
| [[MOS Technology file format]] || Arbitrary || || || Typically used to program [[EPROM]], [[Flash memory|NOR-Flash]] memory chips.
|-
| [[Percent encoding]] || Text ([[URI]]s), Arbitrary ([https://tools.ietf.org/html/rfc1738 RFC1738]) || data-sort-value="40%"|~40%{{efn|For arbitrary data; encoding all 190 non-unreserved byte values with three bytes, and the remaining 66 unreserved characters with one.}} (33–70%{{efn|For text; only encoding each of the 18 reserved characters.}}) || [http://www.geekhideout.com/urlcode.shtml C], [https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse Python], probably many others || {{space}}
|-
|-
| [[Quoted-printable]] || Text || data-sort-value="33%"|~33–100%{{efn|1= One byte stored as =XX. Encoding all but the 94 characters which don't need it (incl. space and tab).}} || Probably many || Preserves line breaks; cuts lines at 76 characters
| [[Quoted-printable]] || Text || data-sort-value="33%"|~33–100%{{efn|1= One byte stored as =XX. Encoding all but the 94 characters which don't need it (incl. space and tab).}} || Probably many || Preserves line breaks; cuts lines at 76 characters
|-
|-
| [[S-record]] (Motorola hex) || Arbitrary || 49.6% || [https://github.com/vsergeev/libGIS C library], [http://srecord.sourceforge.net/ C++] || Typically used to program [[EPROM]], [[Flash memory|NOR-Flash]] memory chips. 49.6% assumes 255 binary bytes per record.
| [[S-record]] (Motorola hex) || Arbitrary || 49.6% || [https://github.com/vsergeev/libGIS C library], [http://srecord.sourceforge.net/ C++] || Typically used to program [[EPROM]], [[Flash memory|NOR flash]] memory chips. 49.6% assumes 255 binary bytes per record.
|-
|-
| [[Tektronix hex]] || Arbitrary || || || Typically used to program [[EPROM]], [[Flash memory|NOR-Flash]] memory chips.
| [[Tektronix hex]] || Arbitrary || || || Typically used to program [[EPROM]], [[Flash memory|NOR flash]] memory chips.
|-
|-
| TxMS || Hexadecimal || ~32% || [https://github.com/bchainhub/txms.js Node.js (and CLI)] || Used to transmit [[Blockchain]] transactions via [[SMS]] using [[UTF-16BE]].
| [[Uuencoding]] || Arbitrary || data-sort-value="60%"|~60% ([[Uuencoding#Disadvantages|up to 70%]]) || [[Uuencoding#Support in Perl|Perl]], [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/misc/UUEncoder.java Java], [https://docs.python.org/3/library/uu.html Python], probably many others || Largely replaced by MIME and yEnc
|-
|-
| [[Xxencoding]] || Arbitrary || data-sort-value="75%"|~75% (similar to Uuencoding) || [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi] || Proposed (and occasionally used) as replacement for Uuencoding to avoid character set translation problems between ASCII and the EBCDIC systems that could corrupt Uuencoded data
| [[Uuencoding]] || Arbitrary || data-sort-value="60%"|~60% ([[Uuencoding#Disadvantages|up to 70%]]) || [[Uuencoding#Perl|Perl]], [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/misc/UUEncoder.java Java], [https://docs.python.org/3/library/uu.html Python], probably many others || An early encoding developed in 1980 for [[Unix-to-Unix Copy]]. Largely replaced by MIME and [[yEnc]]
|-
|-
| [[yEnc]] {{Efn|name=nonprint}}|| Arbitrary, mostly non-text || data-sort-value="98%"|~98% || [http://www.fpx.de/fp/Software/UUDeview/ C] [https://github.com/whoughton/yEnc JavaScript] [https://github.com/eshaz/simple-yenc JavaScript (2)] [https://github.com/eyaler/ztml crEnc Python and Javascript] || Includes a CRC checksum
| [[Xxencoding]] || Arbitrary || data-sort-value="75%"|~75% (similar to Uuencoding) || [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi] || Proposed (and occasionally used) as replacement for Uuencoding to avoid character set translation problems between ASCII and the EBCDIC systems that could corrupt Uuencoded data
|-
|-
| z85 ([https://rfc.zeromq.org/spec/32/ ZeroMQ spec:32/Z85]) || Binary & ASCII || 80% (similar to Ascii85/Base85) || [https://github.com/zeromq/rfc/blob/master/src/spec_32.c C] (original), [https://github.com/coenm/Z85e C#], [https://pub.dev/packages/z85 Dart], [https://github.com/jamesruan/z85/blob/master/src/z85.erl Erlang], [https://github.com/tilinna/z85 Go],[https://github.com/philanc/plc/blob/master/plc/base85.lua Lua], [https://github.com/fxn/z85 Ruby], [https://docs.rs/z85/latest/src/z85/lib.rs.html Rust] and others... || Specifies a subset of ASCII similar to [[Ascii85]], omitting a few characters that may cause program bugs (<code>` \ " ' _ , ;</code>). The format conforms to [https://rfc.zeromq.org/spec/32/ ZeroMQ spec:32/Z85].
| z85 ([https://rfc.zeromq.org/spec/32/ ZeroMQ spec:32/Z85]) || Binary & ASCII || 80% (similar to Ascii85/Base85) || [https://github.com/zeromq/rfc/blob/master/src/spec_32.c C] (original), [https://github.com/coenm/Z85e C#], [https://pub.dev/packages/z85 Dart], [https://github.com/jamesruan/z85/blob/master/src/z85.erl Erlang], [https://github.com/tilinna/z85 Go], [https://github.com/philanc/plc/blob/master/plc/base85.lua Lua], [https://github.com/fxn/z85 Ruby], [https://docs.rs/z85/latest/src/z85/lib.rs.html Rust] and others || Specifies a subset of ASCII similar to [[Ascii85]], omitting a few characters that may cause program bugs (<code>` \ " ' _ , ;</code>). The format conforms to [https://rfc.zeromq.org/spec/32/ ZeroMQ spec:32/Z85].
|-
|-
| {{IETF RFC|1751}} ([[S/KEY]]) || Arbitrary || 33% || C,<ref name="RFC1760" /> [https://www.dlitz.net/software/pycrypto/doc/#crypto-util-rfc1751 Python], ...
| {{IETF RFC|1751}} ([[S/KEY]]) || Arbitrary || 33% || C,<ref name="RFC1760" /> [https://www.dlitz.net/software/pycrypto/doc/#crypto-util-rfc1751 Python]
|
|
"A Convention for [[Human-readable]] 128-bit Keys". A series of small English words is easier for humans to read, remember, and type in than decimal or other binary-to-text encoding systems.<ref>
"A Convention for [[Human-readable]] 128-bit Keys". A series of small English words is easier for humans to read, remember, and type in than decimal or other binary-to-text encoding systems.<ref>
Line 114: Line 110:
Some of these encoding (quoted-printable and percent encoding) are based on a set of allowed characters and a single [[escape character]]. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
Some of these encoding (quoted-printable and percent encoding) are based on a set of allowed characters and a single [[escape character]]. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.


Some other encodings ([[base64]], [[uuencoding]]) are based on mapping all possible sequences of six [[bit]]s into different printable characters. Since there are more than 2<sup>6</sup>&nbsp;=&nbsp;64 printable characters, this is possible. A given sequence of bytes is translated by viewing it as stream of bits, breaking this stream in chunks of six bits and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.
Some other encodings ([[base64]], [[uuencoding]]) are based on mapping all possible sequences of six [[bit]]s into different printable characters. Since there are more than 2<sup>6</sup>&nbsp;=&nbsp;64 printable characters, this is possible. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream in chunks of six bits and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.


Some encodings (the original version of BinHex and the recommended encoding for [[CipherSaber]]) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard [[hexadecimal]] digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding—expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.
Some encodings (the original version of BinHex and the recommended encoding for [[CipherSaber]]) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard [[hexadecimal]] digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding—expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.


Out of [[PETSCII]]'s first 192 codes, 164 have visible representations when quoted: 5 (white), 17–20 and 28–31 (colors and cursor controls), 32–90 (ascii equivalent), 91–127 (graphics), 129 (orange), 133–140 (function keys), 144–159 (colors and cursor controls), and 160–192 (graphics).<ref>http://sta.c64.org/cbm64pet.html et al</ref> This theoretically permits encodings, such as base128, between PETSCII-speaking machines.
Out of [[PETSCII]]'s first 192 codes, 164 have visible representations when quoted: 5 (white), 17–20 and 28–31 (colors and cursor controls), 32–90 (ascii equivalent), 91–127 (graphics), 129 (orange), 133–140 (function keys), 144–159 (colors and cursor controls), and 160–192 (graphics).<ref>{{Cite web |title=Commodore 64 PETSCII codes |url=https://sta.c64.org/cbm64pet.html |website=sta.c64.org}}</ref> This theoretically permits encodings, such as base128, between PETSCII-speaking machines.


== See also ==
== See also ==
Line 126: Line 122:
* [[Computer number format]]
* [[Computer number format]]
* [[Geocode]]
* [[Geocode]]
* [[Numeral system]]s, [[List of numeral systems#By type of notations|listed by notation type]] <!-- This is here to help readers to find encodings that may not belong in this article (e.g. programmers or cryptographers looking for something such as [[Base 26]]), since the topic of this article is _not currently “Data-to-text encodings”, but rather “Binary-to-text encodings”
* [[Numeral system]]s, [[List of numeral systems#By type of notations|listed by notation type]] <!-- This is here to help readers to find encodings that may not belong in this article (e.g. programmers or cryptographers looking for something such as [[Base 26]]), since the topic of this article is _not currently "Data-to-text encodings", but rather "Binary-to-text encodings"


Originally was to be a ‘hatnote’, viz:
Originally was to be a 'hatnote', viz:
{{see also|List of numeral systems#By type of notation}}
{{see also|List of numeral systems#By type of notation}}



Latest revision as of 17:23, 29 October 2024

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the communication channel does not allow binary data (such as email or NNTP) or is not 8-bit clean. PGP documentation (RFC 4880) uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

Overview


The basic need for a binary-to-text encoding comes from a need to communicate arbitrary binary data over preexisting communications protocols that were designed to carry only English-language human-readable text. Those protocols may be only 7-bit safe (and within that range may avoid certain ASCII control codes), may require line breaks at certain maximum intervals, and may not preserve whitespace. Thus, only the 94 printable ASCII characters (excluding the space) are "safe" to use to convey data.

Description


The ASCII text-encoding standard uses 7 bits to encode characters. With this it is possible to encode 128 (i.e. 2⁷) unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in English, plus a selection of control characters which do not represent printable characters. For example, the capital letter A is represented in 7 bits as 100 0001₂, 0x41 (101₈), the numeral 2 is 011 0010₂, 0x32 (62₈), the character } is 111 1101₂, 0x7D (175₈), and the control character RETURN is 000 1101₂, 0x0D (15₈).
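
The values quoted above can be checked with a short sketch. The helper name `describe` is only illustrative and is not part of any standard.

```python
def describe(ch: str) -> str:
    """Return the 7-bit binary, hexadecimal, and octal forms of an ASCII character."""
    code = ord(ch)
    return f"{code:07b} 0x{code:02X} {code:o}"

# The four characters used as examples in the text.
for label, ch in [("A", "A"), ("2", "2"), ("}", "}"), ("RETURN", "\r")]:
    print(label, describe(ch))
```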

In contrast, most computers store data in memory organized in eight-bit bytes. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit text and eight-bit binary data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.

It is often desirable, however, to be able to send non-textual data through text-based systems, such as when one might attach an image file to an e-mail message. To accomplish this, the data is encoded in some way, such that eight-bit data is encoded into seven-bit ASCII characters (generally using only alphanumeric and punctuation characters, which are among the ASCII printable characters). Upon safe arrival at its destination, it is then decoded back to its eight-bit form. This process is referred to as binary-to-text encoding. Many programs perform this conversion to allow for data transport, such as PGP and GNU Privacy Guard.
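
As a rough illustration of this round trip, the sketch below uses Base64 from Python's standard library. The choice of Base64 and the sample bytes are assumptions made for the example; any encoding from the table would serve.

```python
import base64

# Arbitrary eight-bit data, including a control code and bytes above 127.
payload = bytes([0x00, 0x0D, 0x7F, 0x80, 0xFF])

encoded = base64.b64encode(payload)   # contains only ASCII characters
decoded = base64.b64decode(encoded)   # original eight-bit data restored

print(encoded.decode("ascii"))
print(decoded == payload)
```

Every byte of `encoded` is below 128, so it survives a channel that preserves only seven bits per character.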

Encoding plain text


Binary-to-text encoding methods are also used as a mechanism for encoding plain text. For example:

  • Some systems have a more limited character set they can handle; not only are they not 8-bit clean, some cannot even handle every printable ASCII character.
  • Other systems have limits on the number of characters that may appear between line breaks, such as the "1000 characters per line" limit of some Simple Mail Transfer Protocol software, as allowed by RFC 2821.
  • Still others add headers or trailers to the text.
  • A few poorly-regarded but still-used protocols use in-band signaling, causing confusion if specific patterns appear in the message. The best-known is the string "From " (including trailing space) at the beginning of a line, used to separate mail messages in the mbox file format.

By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely transparent. This is sometimes referred to as 'ASCII armoring'. For example, the ViewState component of ASP.NET uses base64 encoding to safely transmit text via HTTP POST, in order to avoid delimiter collision.
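
A minimal sketch of this idea, assuming Base64 as the armoring and a made-up message containing the troublesome "From " line start:

```python
import base64

# Plain text whose second line begins with the mbox separator "From ".
message = "Hello,\nFrom here on, the text could confuse an mbox parser.\n"

# encodebytes breaks its output into lines of at most 76 characters.
armored = base64.encodebytes(message.encode("utf-8")).decode("ascii")
restored = base64.decodebytes(armored.encode("ascii")).decode("utf-8")

print(armored)
print(restored == message)
```

Because the Base64 alphabet contains no space, no armored line can begin with "From ", and the receiving end recovers the text unchanged.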

Encoding standards


The table below compares the most used forms of binary-to-text encodings. The efficiency listed is the ratio between the number of bits in the input and the number of bits in the encoded output.

| Encoding | Data type | Efficiency | Programming language implementations | Comments |
|---|---|---|---|---|
| Ascii85 | Arbitrary | 80% | awk (archived 2014-12-29 at the Wayback Machine), C, C (2), C#, F#, Go, Java, Perl, Python, Python (2) | There exist several variants of this encoding, Base85, btoa, etc. |
| Base32 | Arbitrary | 62.5% | ANSI C, Delphi, Go, Java, C#, F#, Python | |
| Base36 | Integer | ~64% | bash, C, C++, C#, Java, Perl, PHP, Python, Visual Basic, Swift, many others | Uses the Arabic numerals 0–9 and the Latin letters A–Z (the ISO basic Latin alphabet). Commonly used by URL redirection systems like TinyURL or SnipURL/Snipr as compact alphanumeric identifiers. |
| Base45 | Arbitrary | ~67% (97%[a]) | Go, Python | Defined in IETF Specification RFC 9285 for including binary data compactly in a QR code.[1] |
| Base56 | Integer | | PHP, Python, Go | A variant of Base58 encoding which further sheds the '1' and the lowercase 'o' characters in order to minimise the risk of fraud and human-error.[2] |
| Base58 | Integer | ~73% | C, C++, Python, C#, Java | Similar to Base64, but modified to avoid both non-alphanumeric characters (+ and /) and letters that might look ambiguous when printed (0 – zero, I – capital i, O – capital o and l – lower-case L). Base58 is used to represent bitcoin addresses.[citation needed] Some messaging and social media systems break lines on non-alphanumeric strings. This is avoided by not using URI reserved characters such as +. For SegWit, it was replaced by Bech32, see below. Base58 in the original bitcoin source code |
| Base62 | Arbitrary | ~74% | Rust, Python | Similar to Base64, but contains only alphanumeric characters. |
| Base64 | Arbitrary | 75% | awk (archived 2014-12-29 at the Wayback Machine), C, C (2), Delphi, Go, Python, many others | An early and still-popular encoding, first specified as part of RFC 989 in 1987 |
| Base85 | Arbitrary | 80% | C, Python, Python (2) | Revised version of Ascii85. |
| Base91[3] | Arbitrary | 81% | C#, F# | Constant width variant |
| basE91[4] | Arbitrary | 81% | C, Java, PHP, 8086 Assembly, AWK, C#, F#, Rust | Variable width variant |
| Base94[5] | Arbitrary | 82% | Python, C, Rust | |
| Base122[6] | Arbitrary | 87.5% | JavaScript, Python, Java, Base125 Python and Javascript, Go, C | |
| BaseXML[7] | Arbitrary | 83.5% | C, Python, JavaScript | |
| Bech32 | Arbitrary | 62.5% + at least 8 chars (label, separator, 6-char ECC) | C, C++, JavaScript, Go, Python, Haskell, Ruby, Rust | Specification.[8] Used in Bitcoin and the Lightning Network.[9] The data portion is encoded like Base32 with the possibility to check and correct up to 6 mistyped characters using the 6-character BCH code at the end, which also checks/corrects the Human Readable Part. The Bech32m variant has a subtle change that makes it more resilient to changes in length.[10] |
| BinHex | Arbitrary | 75% | Perl, C, C (2) | MacOS Classic |
| Decimal | Integer | ~42% | Most languages | Usually the default representation for input/output from/to humans. |
| Hexadecimal (Base16) | Arbitrary | 50% | Most languages | Exists in uppercase and lowercase variants |
| Intel HEX | Arbitrary | ≲50% | C library, C++ | Typically used to program EPROM, NOR flash memory chips |
| MIME | Arbitrary | See Quoted-printable and Base64 | See Quoted-printable and Base64 | Encoding container for e-mail-like formatting |
| Percent-encoding | Text (URIs), Arbitrary (RFC1738) | ~40%[b] (33–70%[c]) | C, Python, probably many others | |
| Quoted-printable | Text | ~33–100%[d] | Probably many | Preserves line breaks; cuts lines at 76 characters |
| S-record (Motorola hex) | Arbitrary | 49.6% | C library, C++ | Typically used to program EPROM, NOR flash memory chips. 49.6% assumes 255 binary bytes per record. |
| Tektronix hex | Arbitrary | | | Typically used to program EPROM, NOR flash memory chips. |
| TxMS | Hexadecimal | ~32% | Node.js (and CLI) | Used to transmit Blockchain transactions via SMS using UTF-16BE. |
| Uuencoding | Arbitrary | ~60% (up to 70%) | Perl, C, Delphi, Java, Python, probably many others | An early encoding developed in 1980 for Unix-to-Unix Copy. Largely replaced by MIME and yEnc |
| Xxencoding | Arbitrary | ~75% (similar to Uuencoding) | C, Delphi | Proposed (and occasionally used) as replacement for Uuencoding to avoid character set translation problems between ASCII and the EBCDIC systems that could corrupt Uuencoded data |
| z85 (ZeroMQ spec:32/Z85) | Binary & ASCII | 80% (similar to Ascii85/Base85) | C (original), C#, Dart, Erlang, Go, Lua, Ruby, Rust and others | Specifies a subset of ASCII similar to Ascii85, omitting a few characters that may cause program bugs (` \ " ' _ , ;). The format conforms to ZeroMQ spec:32/Z85. |
| RFC 1751 (S/KEY) | Arbitrary | 33% | C,[11] Python | "A Convention for Human-readable 128-bit Keys". A series of small English words is easier for humans to read, remember, and type in than decimal or other binary-to-text encoding systems.[12] Each 64-bit number is mapped to six short words, of one to four characters each, from a public 2048-word dictionary.[11] |
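
Several of the efficiency figures in the table can be reproduced with the sketch below. It assumes that each output character occupies one 8-bit byte, which is consistent with the listed values but is not stated explicitly in the table. The function names are illustrative only.

```python
import math

def fixed_width_efficiency(input_bits: int, output_chars: int) -> float:
    """Input bits divided by output bits, counting each output character as 8 bits."""
    return input_bits / (8 * output_chars)

def integer_base_efficiency(base: int) -> float:
    """Efficiency of writing an integer in the given base: log2(base) bits per 8-bit character."""
    return math.log2(base) / 8

print(fixed_width_efficiency(8, 2))    # hexadecimal: 1 byte -> 2 characters
print(fixed_width_efficiency(40, 8))   # Base32: 5 bytes -> 8 characters
print(fixed_width_efficiency(24, 4))   # Base64: 3 bytes -> 4 characters
print(fixed_width_efficiency(32, 5))   # Ascii85: 4 bytes -> 5 characters
print(round(integer_base_efficiency(10), 3))  # decimal
print(round(integer_base_efficiency(58), 3))  # Base58
```

Under this assumption the results match the 50%, 62.5%, 75%, and 80% entries, and give roughly 42% and 73% for decimal and Base58.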

The 95 isprint codes 32 to 126 are known as the ASCII printable characters.

Some older and now-uncommon formats include BOO, BTOA, and USR encoding.

Most of these encodings generate text containing only a subset of all ASCII printable characters: for example, the base64 encoding generates text that only contains upper case and lower case letters (A–Z, a–z), numerals (0–9), and the "+", "/", and "=" symbols.

Some of these encodings (quoted-printable and percent encoding) are based on a set of allowed characters and a single escape character. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
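
The escape-character approach can be seen with Python's standard `urllib.parse` and `quopri` modules. The sample string is made up for illustration.

```python
import quopri
from urllib.parse import quote, unquote_to_bytes

# Mostly printable ASCII, plus one non-ASCII letter and both escape characters.
raw = "café = 100% arabica".encode("utf-8")

percent_text = quote(raw)            # escape character is "%"
qp_text = quopri.encodestring(raw)   # escape character is "="

print(percent_text)
print(qp_text.decode("ascii"))

# Both encodings are reversible.
print(unquote_to_bytes(percent_text) == raw)
print(quopri.decodestring(qp_text) == raw)
```

In both outputs the letters and digits pass through unchanged, while the non-ASCII bytes and the escape character itself are replaced by escape sequences.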

Some other encodings (base64, uuencoding) are based on mapping all possible sequences of six bits into different printable characters. Since there are more than 2⁶ = 64 printable characters, this is possible. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.
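
The six-bit chunking can be sketched directly. The function below is a deliberately naive illustration that uses the standard Base64 alphabet and "=" padding. The name `encode_six_bit` is hypothetical, and real implementations work on integers rather than bit strings.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_six_bit(data: bytes) -> str:
    # View the input as one long string of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # Map every six-bit chunk to one printable character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Base64 pads the text with "=" to a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)

print(encode_six_bit(b"Man"))
print(encode_six_bit(b"Man") == base64.b64encode(b"Man").decode("ascii"))
```

Uuencoding follows the same chunking but uses a different character mapping and line format, which this sketch does not cover.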

Some encodings (the original version of BinHex and the recommended encoding for CipherSaber) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard hexadecimal digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding—expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.
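
The size trade-off can be checked with a small sketch. The 768-byte sample is arbitrary, chosen to be a multiple of three so that Base64 needs no padding.

```python
import base64

data = bytes(range(256)) * 3   # 768 bytes covering every byte value

# Hexadecimal: each byte expands independently to two characters.
hex_text = "".join(f"{byte:02x}" for byte in data)
# Base64: every 3 source bytes expand to 4 characters.
b64_text = base64.b64encode(data).decode("ascii")

print(len(hex_text))                  # 1536
print(len(b64_text))                  # 1024
print(len(hex_text) / len(b64_text))  # 1.5
```

The per-byte loop used for the hexadecimal text gives the same result as the built-in `bytes.hex()` method, which shows how little state the encoder needs.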

Out of PETSCII's first 192 codes, 164 have visible representations when quoted: 5 (white), 17–20 and 28–31 (colors and cursor controls), 32–90 (ASCII equivalent), 91–127 (graphics), 129 (orange), 133–140 (function keys), 144–159 (colors and cursor controls), and 160–192 (graphics).[13] This theoretically permits encodings, such as base128, between PETSCII-speaking machines.

See also


Notes

  1. ^ Encoding for QR code generation automatically selects the encoding to match the input character set, encoding 2 alphanumeric characters in 11 bits, and Base45 encodes 16 bits into 3 such characters. The efficiency is thus 32 bits of binary data encoded in 33 bits: 97%.
  2. ^ For arbitrary data; encoding all 189 non-unreserved characters with three bytes, and the remaining 66 characters with one.
  3. ^ For text; only encoding each of the 18 reserved characters.
  4. ^ One byte stored as =XX. Encoding all but the 94 characters which don't need it (incl. space and tab).

References

  1. ^ Fältström, Patrik; Ljunggren, Fredrik; Gulik, Dirk-Willem van (2022-08-11). "The Base45 Data Encoding". Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1. ... Such data has to be converted into an appropriate text before that text could be encoded as a QR code. ... Base45 ... offers a more compact QR code encoding.
  2. ^ Duggan, Ross (August 18, 2009). "Base-56 Integer Encoding in PHP".
  3. ^ Dake He; Yu Sun; Zhen Jia; Xiuying Yu; Wei Guo; Wei He; Chao Qi; Xianhui Lu. "A Proposal of Substitute for Base85/64 – Base91" (PDF). International Institute of Informatics and Systemics.
  4. ^ "binary to ASCII text encoding". basE91. SourceForge. Retrieved 2023-03-20.
  5. ^ "Convert binary data to a text with the lowest overhead". Vorakl's notes. April 18, 2020.
  6. ^ Albertson, Kevin (Nov 26, 2016). "Base-122 Encoding".
  7. ^ "BaseXML - for XML1.0+". GitHub. 16 March 2019.
  8. ^ "bitcoin/bips". GitHub. 8 December 2021.
  9. ^ Rusty Russell; et al. (2020-10-15). "Payment encoding in the Lightning RFC repo". GitHub.
  10. ^ "Bech32m format for v1+ witness addresses". GitHub. 5 December 2021.
  11. ^ a b RFC 1760 "The S/KEY One-Time Password System".
  12. ^ RFC 1751 "A Convention for Human-Readable 128-bit Keys"
  13. ^ "Commodore 64 PETSCII codes". sta.c64.org.