Comma code: Difference between revisions
m →top: http→https for Google Books and Google News using AWB |
50% commas in all data axiom and converting Huffman codes to comma codes |
||
Line 1: | Line 1: | ||
A '''comma code''' is a type of [[prefix-free code]] in which a '''comma''', a particular symbol or sequence of symbols, occurs at the end of a code word and never occurs otherwise.<ref name="Wade1994">{{cite book |last=Wade |first=Graham |title=Signal Coding and Processing |url=https://books.google.com/books?id=CJswCy7_W8YC&pg=PA56 |date=8 September 1994 |publisher=Cambridge University Press |isbn=978-0-521-42336-6 |page=56}}</ref> |
A '''comma code''' is a type of [[prefix-free code]] in which a '''comma''', a particular symbol or sequence of symbols, occurs at the end of a code word and never occurs otherwise.<ref name="Wade1994">{{cite book |last=Wade |first=Graham |title=Signal Coding and Processing |url=https://books.google.com/books?id=CJswCy7_W8YC&pg=PA56 |date=8 September 1994 |publisher=Cambridge University Press |isbn=978-0-521-42336-6 |page=56}}</ref> This is an intuitive way to express arrays. |
||
For example, [[Fibonacci coding]] is a comma code in which the comma is <code>11</code>. <code>11</code> and <code>1011</code> are valid Fibonacci code words, but <code>101</code>, <code>0111</code>, and <code>11011</code> are not. |
For example, [[Fibonacci coding]] is a comma code in which the comma is <code>11</code>. <code>11</code> and <code>1011</code> are valid Fibonacci code words, but <code>101</code>, <code>0111</code>, and <code>11011</code> are not. |
||
Line 5: | Line 5: | ||
== Examples == |
== Examples == |
||
* [[Unary coding]], in which the comma is <code>0</code>. |
* [[Unary coding]], in which the comma is <code>0</code>. This allows NULL values (when the code word consists of the comma alone, a single <code>0</code>, the value can be taken as NULL or as 0). |
||
* [[Fibonacci coding]], in which the comma is <code>11</code>. |
* [[Fibonacci coding]], in which the comma is <code>11</code>. |
||
* All [[Huffman coding|Huffman codes]] can be converted to comma codes by prepending a <code>1</code> to every code word and using a single <code>0</code> both as a code word and as the comma. |
|||
{| class="wikitable" style="text-align: center;" |
|||
!Symbol |
|||
!Code |
|||
!Comma Code |
|||
|- |
|||
|Comma |
|||
|N/A |
|||
|0 |
|||
|- |
|||
|0 |
|||
|00 |
|||
|100 |
|||
|- |
|||
|1 |
|||
|01 |
|||
|101 |
|||
|- |
|||
|2 |
|||
|10 |
|||
|110 |
|||
|- |
|||
|3 |
|||
|11 |
|||
|111 |
|||
|} |
|||
* 50% commas in all data axiom: all data, specifically variable-length bijective data, can be shown to consist of exactly 50% commas. |
|||
All data, or suitably curated same-length data, exhibits so-called [[implied probability]]. Scrambled data that cannot be distinguished from maximum-entropy or random data exhibits known and observable statistical probabilities when it is read. Ideally, no prior knowledge is assumed except for the observable fact that the data is high-entropy data. This is similar to the concept of 'code space' as demonstrated by [[Chen–Ho encoding|Chen-Ho encoding]]. |
|||
Such data, which can be termed 'generic data', can be analysed using any interleaving unary code as headers, where additional bijective bits (equal in number to the length of the unary code just read) are read as data while the unary code just read serves as an introduction or header for the data. This header serves as a comma. The data can be read in an interleaving fashion, between each bit of the header, or in post-read fashion, where the data is read only after the entire unary header code has been read, as in [[Chen–Ho encoding|Chen-Ho encoding]]. |
|||
It can be seen by random walk techniques that all generic data has, on average, a header or comma of 2 bits and an additional 2 bits of data. |
|||
This also allows for an inexpensive base-increase algorithm before transmission over non-binary communication channels, such as base-3 or base-5 channels. |
|||
{| class="wikitable" style="text-align: center;" |
|||
!n |
|||
!RL code |
|||
!Next code |
|||
!Data |
|||
!Commas |
|||
|- |
|||
|1 |
|||
|1<code>''?''</code> |
|||
|0<code>''?''</code> |
|||
|? |
|||
|, |
|||
|- |
|||
|2 |
|||
|1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code> |
|||
|?? |
|||
|,, |
|||
|- |
|||
|3 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|??? |
|||
|,,, |
|||
|- |
|||
|4 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|???? |
|||
|,,,, |
|||
|- |
|||
|5 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|????? |
|||
|,,,,, |
|||
|- |
|||
|6 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|?????? |
|||
|,,,,,, |
|||
|- |
|||
|7 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|??????? |
|||
|,,,,,,, |
|||
|- |
|||
|8 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|???????? |
|||
|,,,,,,,, |
|||
|- |
|||
|9 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|????????? |
|||
|,,,,,,,,, |
|||
|- |
|||
|10 |
|||
|1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code>1<code>''?''</code> |
|||
|0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code>0<code>''?''</code> |
|||
|?????????? |
|||
|,,,,,,,,,, |
|||
|- |
|||
| colspan="5" |... |
|||
|} |
|||
Of course, a single comma is used to separate each field of data, which shows that all the data consists of 50% commas. This is visible from the implied probability of 50% for the <code>0</code> code in Huffman base-3 codes (<code>0</code>, <code>10</code>, <code>11</code>) or in the base-5 comma code shown above. The cost-per-character quotient of higher-base communication has to stay near the logarithmic value <math display="inline">\frac{\log(\text{base})}{\log(2)}</math> for the data, and below 2 bits for the comma character, to remain cost-effective. |
|||
== See also == |
== See also == |
Revision as of 09:23, 12 October 2022
A comma code is a type of prefix-free code in which a comma, a particular symbol or sequence of symbols, occurs at the end of a code word and never occurs otherwise.[1] This is an intuitive way to express arrays.
For example, Fibonacci coding is a comma code in which the comma is `11`. `11` and `1011` are valid Fibonacci code words, but `101`, `0111`, and `11011` are not.
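The Fibonacci example can be checked with a short Python sketch. The function names `fibonacci_encode`, `is_fibonacci_codeword` and `split_codewords` are illustrative and do not come from the article; the encoder uses the usual greedy Zeckendorf construction.

```python
def fibonacci_encode(n: int) -> str:
    """Return the Fibonacci code word for a positive integer n."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    # Fibonacci numbers 1, 2, 3, 5, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs = [f for f in fibs if f <= n]
    # Greedy Zeckendorf representation, largest term first.
    bits = []
    for f in reversed(fibs):
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Emit the smallest term first, then the extra 1 that completes the comma.
    return "".join(reversed(bits)) + "1"


def is_fibonacci_codeword(word: str) -> bool:
    """True if word ends with the comma 11 and contains 11 nowhere else."""
    return word.endswith("11") and word.find("11") == len(word) - 2


def split_codewords(bits: str) -> list[str]:
    """Cut a bit string into code words after every occurrence of the comma."""
    words, start, i = [], 0, 0
    while i + 1 < len(bits):
        if bits[i:i + 2] == "11":
            words.append(bits[start:i + 2])
            start = i = i + 2
        else:
            i += 1
    return words
```

With these definitions, `11011` is rejected as a single code word because it splits into the two code words `11` and `011`.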
Examples
- Unary coding, in which the comma is `0`. This allows NULL values (when the code word consists of the comma alone, a single `0`, the value can be taken as NULL or as 0).
- Fibonacci coding, in which the comma is `11`.
- All Huffman codes can be converted to comma codes by prepending a `1` to every code word and using a single `0` both as a code word and as the comma.
Symbol | Code | Comma Code |
---|---|---|
Comma | N/A | 0 |
0 | 00 | 100 |
1 | 01 | 101 |
2 | 10 | 110 |
3 | 11 | 111 |
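The conversion in the table can be sketched in Python as follows. This is a minimal illustration under the assumption that the input code is prefix-free; the names `to_comma_code` and `decode` and the dictionary key `"Comma"` are hypothetical and chosen only to mirror the table.

```python
def to_comma_code(codebook: dict[str, str]) -> dict[str, str]:
    """Prepend 1 to every code word and reserve the single bit 0 as the comma."""
    converted = {symbol: "1" + word for symbol, word in codebook.items()}
    converted["Comma"] = "0"  # assumed label, taken from the table's first row
    return converted


def decode(bits: str, comma_codebook: dict[str, str]) -> list[str]:
    """Decode a bit string by matching code words greedily from the left."""
    inverse = {word: symbol for symbol, word in comma_codebook.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return symbols


# The two-bit code from the table above.
table = to_comma_code({"0": "00", "1": "01", "2": "10", "3": "11"})
```

Because every converted code word starts with `1` and the comma is the lone `0`, the result stays prefix-free whenever the original code is, which is what lets `decode` match greedily.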
- 50% commas in all data axiom: all data, specifically variable-length bijective data, can be shown to consist of exactly 50% commas.
All data, or suitably curated same-length data, exhibits so-called implied probability. Scrambled data that cannot be distinguished from maximum-entropy or random data exhibits known and observable statistical probabilities when it is read. Ideally, no prior knowledge is assumed except for the observable fact that the data is high-entropy data. This is similar to the concept of 'code space' as demonstrated by Chen-Ho encoding.
Such data, which can be termed 'generic data', can be analysed using any interleaving unary code as headers, where additional bijective bits (equal in number to the length of the unary code just read) are read as data while the unary code just read serves as an introduction or header for the data. This header serves as a comma. The data can be read in an interleaving fashion, between each bit of the header, or in post-read fashion, where the data is read only after the entire unary header code has been read, as in Chen-Ho encoding.
It can be seen by random walk techniques that all generic data has, on average, a header or comma of 2 bits and an additional 2 bits of data.
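One way to check the 2-bit averages is a plain geometric-series calculation (not the random walk argument the text refers to). It assumes the bits are independent and uniformly random, so that a header of length $n$ occurs with probability $2^{-n}$, and that each header of length $n$ is followed by $n$ data bits:

$$
E[n] = \sum_{n=1}^{\infty} n \, 2^{-n} = 2, \qquad \frac{E[\text{header}]}{E[\text{header}] + E[\text{data}]} = \frac{2}{2 + 2} = \frac{1}{2}
$$

Under these assumptions the header and the data each average 2 bits, which is where the 50% figure comes from.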
This also allows for an inexpensive base-increase algorithm before transmission over non-binary communication channels, such as base-3 or base-5 channels.
n | RL code | Next code | Data | Commas |
---|---|---|---|---|
1 | 1? | 0? | ? | , |
2 | 1? 1? | 0? 0? | ?? | ,, |
3 | 1? 1? 1? | 0? 0? 0? | ??? | ,,, |
4 | 1? 1? 1? 1? | 0? 0? 0? 0? | ???? | ,,,, |
5 | 1? 1? 1? 1? 1? | 0? 0? 0? 0? 0? | ????? | ,,,,, |
6 | 1? 1? 1? 1? 1? 1? | 0? 0? 0? 0? 0? 0? | ?????? | ,,,,,, |
7 | 1? 1? 1? 1? 1? 1? 1? | 0? 0? 0? 0? 0? 0? 0? | ??????? | ,,,,,,, |
8 | 1? 1? 1? 1? 1? 1? 1? 1? | 0? 0? 0? 0? 0? 0? 0? 0? | ???????? | ,,,,,,,, |
9 | 1? 1? 1? 1? 1? 1? 1? 1? 1? | 0? 0? 0? 0? 0? 0? 0? 0? 0? | ????????? | ,,,,,,,,, |
10 | 1? 1? 1? 1? 1? 1? 1? 1? 1? 1? | 0? 0? 0? 0? 0? 0? 0? 0? 0? 0? | ?????????? | ,,,,,,,,,, |
... |
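The table can be read in more than one way. The Python sketch below implements one possible reading, stated here as an assumption: the stream is a sequence of (header bit, data bit) pairs, and a field is a maximal run of pairs whose header bits are equal, so a run of `1?` pairs is closed by the first `0?` pair and vice versa. The function name `read_interleaved` is hypothetical.

```python
from itertools import groupby


def read_interleaved(bits: str) -> list[tuple[str, str]]:
    """Split an interleaved stream into (header, data) fields.

    Assumed layout: h d h d ..., where each header bit h is followed by one
    data bit d, and a field ends when the header bit changes value.
    """
    if len(bits) % 2:
        raise ValueError("expected an even number of bits (header/data pairs)")
    pairs = [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]
    fields = []
    # groupby collects consecutive pairs that share the same header bit.
    for flag, run in groupby(pairs, key=lambda pair: pair[0]):
        run = list(run)
        header = flag * len(run)
        data = "".join(data_bit for _, data_bit in run)
        fields.append((header, data))
    return fields


fields = read_interleaved("101110010011")
```

In this reading every field carries as many header bits as data bits by construction, which matches the equal-length Data and Commas columns of the table.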
Of course, a single comma is used to separate each field of data, which shows that all the data consists of 50% commas. This is visible from the implied probability of 50% for the `0` code in Huffman base-3 codes (`0`, `10`, `11`) or in the base-5 comma code shown above. The cost-per-character quotient of higher-base communication has to stay near the logarithmic value log(base)/log(2) for the data, and below 2 bits for the comma character, to remain cost-effective.
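For orientation, the logarithmic value can be worked out for the two bases mentioned on this page. These are standard logarithm values, not figures taken from the revision itself:

$$
\frac{\log 3}{\log 2} = \log_2 3 \approx 1.585 \text{ bits}, \qquad \frac{\log 5}{\log 2} = \log_2 5 \approx 2.322 \text{ bits}
$$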
See also
References
- ^ Wade, Graham (8 September 1994). Signal Coding and Processing. Cambridge University Press. p. 56. ISBN 978-0-521-42336-6.