Byte

From Wikipedia, the free encyclopedia

For the computer industry magazine, see Byte (magazine).
Multiple-byte units

Decimal
Value      Metric
1000       kB   kilobyte
1000^2     MB   megabyte
1000^3     GB   gigabyte
1000^4     TB   terabyte
1000^5     PB   petabyte
1000^6     EB   exabyte
1000^7     ZB   zettabyte
1000^8     YB   yottabyte
1000^9     RB   ronnabyte
1000^10    QB   quettabyte

Binary
Value      IEC              Memory
1024       KiB  kibibyte    KB  kilobyte
1024^2     MiB  mebibyte    MB  megabyte
1024^3     GiB  gibibyte    GB  gigabyte
1024^4     TiB  tebibyte    TB  terabyte
1024^5     PiB  pebibyte
1024^6     EiB  exbibyte
1024^7     ZiB  zebibyte
1024^8     YiB  yobibyte

Orders of magnitude of data

A byte is commonly used as a unit of storage measurement in computers, regardless of the type of data being stored. It is also one of the basic integral data types in many programming languages.

Meanings

The word "byte" has numerous closely related meanings:

  1. A contiguous sequence of a fixed number of bits (binary digits). In recent years, the use of a byte to mean 8 bits has become nearly ubiquitous.
  2. A contiguous sequence of bits within a binary computer that comprises the smallest addressable sub-field of the computer's natural word size: that is, the smallest unit of binary data on which meaningful computation could be performed, or which formed a natural data boundary. For example, the CDC 6000 series scientific mainframes divided their 60-bit floating-point words into 10 six-bit bytes. These bytes conveniently held Hollerith data from punched cards, typically the upper-case alphabet and decimal digits. Because of the machine's 12-bit I/O architecture, CDC also often referred to 12-bit quantities as bytes, each holding two 6-bit display code characters. The PDP-10 used the assembly instructions LDB and DPB to load and deposit bytes of arbitrary width; these operations survive today in Common Lisp. Bytes of six, seven, or nine bits were used on some computers, for example within the 36-bit word of the PDP-10. (A sketch of such sub-word byte extraction follows this list item.)
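
The following is a rough, purely illustrative C sketch of sub-word byte extraction; the function name six_bit_byte and the sample word are invented for this example, and the code does not model any particular machine's instruction set. It pulls individual six-bit bytes out of a 60-bit word held in a 64-bit integer:

    #include <stdio.h>
    #include <stdint.h>

    /* Return the n-th six-bit byte of a 60-bit word (n = 0 is the most
       significant byte).  The word occupies the low 60 bits of a
       64-bit integer. */
    static unsigned six_bit_byte(uint64_t word, int n) {
        int shift = 60 - 6 * (n + 1);          /* bit position of the requested byte */
        return (unsigned)((word >> shift) & 0x3F);
    }

    int main(void) {
        uint64_t word = 0x0123456789ABCDEULL;  /* an arbitrary 60-bit example value */
        for (int n = 0; n < 10; n++)
            printf("byte %d = %02o\n", n, six_bit_byte(word, n));  /* octal, as on CDC machines */
        return 0;
    }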

History

The term byte was coined by Werner Buchholz in 1957 during the early design phase for the IBM Stretch computer. Originally it was defined by a 4-bit byte-size field in the instruction format, allowing bytes of one to sixteen bits (the production design reduced this to a 3-bit field, allowing one to eight bits in a byte); typical I/O equipment of the period used six-bit units. A fixed eight-bit byte size was later adopted and promulgated as a standard by the System/360.

The term "byte" comes from "bite", as in the smallest amount of data a computer could "bite" at once. The spelling change not only reduced the chance of a "bite" being mistaken for a "bit", but was also consistent with the penchant of early computer scientists for making up words and changing spellings. However, in the 1960s the IBM Education Department in the UK taught that a bit was a "Binary digIT" and a byte a "BinarY TuplE" (from n-tuple, i.e. [quin]tuple, [sex]tuple, [sep]tuple, [oc]tuple ...).[citation needed] A byte was also often referred to as "an 8-bit byte", reinforcing the notion that it was a tuple of n bits and that other sizes were possible. Other sources state that the word byte derives from "BinarY TablE".[citation needed]

  3. A contiguous sequence of binary bits in a serial data stream, such as in modem or satellite communications, or from a disk-drive head, which is the smallest meaningful unit of data. These bytes might include start bits, stop bits, or parity bits, and thus could vary from 7 to 12 bits to contain a single 7-bit ASCII code.
  4. A datatype, or a synonym for a datatype, in certain programming languages. C, for example, defines a byte as a storage unit at least large enough to hold any character of the execution environment (clause 3.5 of the C standard). Since the C char integral type holds at least 8 bits (clause 5.2.4.2.1), a byte in C has at least 256 distinct bit patterns, whether char is signed or unsigned. Java's primitive byte data type is always defined as 8 bits wide and signed, holding values from -128 to 127. (A short C sketch after this list illustrates these guarantees.)
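
The following minimal C sketch, using only standard headers, prints the properties guaranteed by the clauses cited above: sizeof(char) is 1 by definition, and CHAR_BIT, which is at least 8, gives the number of bits in a byte:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* In C, the byte is the storage unit occupied by a char. */
        printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
        printf("sizeof(char):             %zu\n", sizeof(char));   /* always 1 */
        printf("unsigned char range:      0..%u\n", (unsigned)UCHAR_MAX);
        return 0;
    }

On a typical modern platform this prints CHAR_BIT as 8 and UCHAR_MAX as 255, i.e. 256 distinct values per byte.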

Early microprocessors, such as Intel's 8008 (the direct predecessor of the 8080 and, later, the 8086), could perform a small number of operations on four-bit quantities, such as the DAA (decimal adjust) instruction and the "half carry" flag, which were used to implement decimal arithmetic routines. These four-bit quantities were called "nibbles", in homage to the then-common 8-bit "bytes".
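
As a simple illustration, not tied to any particular processor, the following C sketch splits a byte into the two four-bit nibbles described above:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t b = 0x5A;                    /* an arbitrary example byte */
        uint8_t low  = b & 0x0F;             /* low nibble:  0xA */
        uint8_t high = (uint8_t)(b >> 4);    /* high nibble: 0x5 */
        printf("byte 0x%02X -> high nibble 0x%X, low nibble 0x%X\n", b, high, low);
        return 0;
    }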

Alternate words

The eight-bit byte is often called an octet in formal contexts such as industry standards, as well as in networking and telecommunication, in order to avoid any confusion about the number of bits involved. However, 8-bit bytes are now firmly embedded in such common standards as Ethernet and HTML. Octet is also the word used for the eight-bit quantity in many non-English languages, where the pun on bite does not translate.

Half of an eight-bit byte (four bits) is sometimes called a nibble (sometimes spelled nybble) or a hex digit. The nibble is often called a semioctet in a networking or telecommunication context and also by some standards organizations. In addition, a 2-bit quantity is sometimes called a crumb, although this term is rarely used. [citation needed]

Abbreviation/Symbol

IEEE 1541 and Metric-Interchange-Format specify "B" as the symbol for byte (e.g. MB means megabyte), whilst IEC 60027 seems silent on the subject. Furthermore, B means bel (see decibel), another (logarithmic) unit used in the same field.

IEEE 1541 specifies "b" as the symbol for bit; however, IEC 60027 and the Metric-Interchange-Format specify "bit" as the symbol (e.g. Mbit for megabit), achieving maximum disambiguation from byte.

"b" vs. "B" confusion seems to be common enough to have inspired the creation of a dedicated website b is not B.

French-speaking countries sometimes use an uppercase "O" for "octet". This is not permitted in SI because of the risk of confusion with the digit zero, and because capital letters are reserved for unit symbols derived from proper names, e.g. A = ampere, J = joule, versus s = second, m = metre.

Consequently, the lowercase "o" is used instead, and already appears in multiples such as ko and Mo; see Octet (computing).

Names for different units

The prefixes used for byte measurements are usually the same as the SI prefixes used for other measurements, but they are often given slightly different values: in the binary convention they are based on powers of 1,024 (2^10), a convenient binary number, while the SI prefixes are based on powers of 1,000 (10^3), a convenient decimal number. The table below illustrates these differences. See binary prefix for further discussion.

Prefix   Name   SI (decimal) meaning     Binary meaning        Size difference
k        kilo   10^3  = 1000^1           2^10 = 1024^1          2.40%
M        mega   10^6  = 1000^2           2^20 = 1024^2          4.86%
G        giga   10^9  = 1000^3           2^30 = 1024^3          7.37%
T        tera   10^12 = 1000^4           2^40 = 1024^4          9.95%
P        peta   10^15 = 1000^5           2^50 = 1024^5         12.59%
E        exa    10^18 = 1000^6           2^60 = 1024^6         15.29%
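
The percentage column can be reproduced with a short C computation (the prefix list here stops at exa, matching the table; link with -lm for the math library):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const char *prefixes[] = { "kilo", "mega", "giga", "tera", "peta", "exa" };
        for (int i = 1; i <= 6; i++) {
            double binary  = pow(1024.0, i);   /* 2^(10*i) */
            double decimal = pow(1000.0, i);   /* 10^(3*i) */
            double diff = (binary - decimal) / decimal * 100.0;
            printf("%-4s  %6.2f%%\n", prefixes[i - 1], diff);
        }
        return 0;
    }

This prints 2.40% for kilo, rising to 15.29% for exa, matching the "Size difference" column.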

Fractional amounts of information are usually measured in bits, nats, or bans.

See also