AES3: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 00:47, 1 June 2012

AES3 is the standard used for the transport of digital audio signals between professional audio devices. It is also known as AES/EBU and is published by the Audio Engineering Society (AES) and as part of IEC 60958. It was developed by the AES and the European Broadcasting Union (EBU), first published in 1985, and revised in 1992 and 2003. It can carry two channels of PCM audio over several different transmission media, including balanced and unbalanced lines and optical fiber. A consumer variant of the standard, S/PDIF, is also available.

History and development

Development of a standard for digital audio interconnection between both professional and domestic audio equipment started in the mid-1980s as a joint effort between the Audio Engineering Society and the European Broadcasting Union, and culminated in the publication of the AES3 standard in 1985. Early on, the standard was frequently known as AES/EBU. Revisions were issued in 1992 and 2003, and both AES and EBU versions of the standard exist. Variants using different physical connections are specified in IEC 60958; these are essentially a consumer version of AES3 for use within the domestic "Hi-Fi" environment, using connectors more commonly found in the consumer market, and are commonly known as S/PDIF. This work has provided the most commonly used method for digitally interconnecting audio equipment worldwide using physically separate cables for each stereo audio connection.

AES3 is used in television production and in the post-production process with digital audio workstations and digital mixing consoles.

Hardware connections

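The AES3 standard parallels part 4 of the international standard IEC 60958. Of the physical interconnection types defined by IEC 60958, three are in common use: Type I balanced (XLR connectors), Type II unbalanced (RCA connectors), and Type II optical fiber (TOSLINK connectors).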

The AES-3id standard defines a 75-ohm BNC electrical variant of AES3. This uses the same cabling, patching and infrastructure as analogue or digital video, and is thus common in the broadcast industry.

F05 connectors, 5 mm connectors for plastic optical fiber, are more commonly known by their Toshiba brand name, TOSLINK. The precursor of the IEC 60958 Type II specification was the Sony/Philips Digital Interface, or S/PDIF. For details on the format of AES/EBU data, see the article on S/PDIF. Note that the electrical levels differ between AES/EBU and S/PDIF.

For information on the synchronization of digital audio structures, see the AES11 standard. The ability to insert unique identifiers into an AES3 bit stream is covered by the AES52 standard.

AES3 digital audio format can also be carried over an Asynchronous Transfer Mode network. The standard for packing AES3 frames into ATM cells is AES47.

Gender of connector

IEC 60958 Type I Balanced:

  • INPUT: XLR male plug (cable) mates to XLR female jack (device)
  • OUTPUT: XLR female plug (cable) mates to XLR male jack (device)

IEC 60958 Type II Unbalanced:

  • INPUT: RCA male plug (cable) mates to RCA female jack (device)
  • OUTPUT: RCA male plug (cable) mates to RCA female jack (device)

IEC 60958 Type II Optical fiber:

  • INPUT: Fiber male plug (cable) mates to TOSLINK female jack (device)
  • OUTPUT: Fiber male plug (cable) mates to TOSLINK female jack (device)

Protocol

Simple representation of the protocol for both AES/EBU and S/PDIF

The low-level protocol for data transmission in AES/EBU and S/PDIF is largely identical, and the following discussion applies for S/PDIF as well unless otherwise noted.

AES/EBU was designed primarily to support stereo PCM-encoded audio in either DAT format at 48 kHz or CD format at 44.1 kHz. No attempt was made to use a carrier able to support both rates; instead, AES/EBU allows the data to be run at any rate, and the data is encoded using biphase mark code (BMC) so that the receiver can recover the clock from the signal itself.
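The clock-recovery property can be illustrated with a minimal Python sketch of biphase mark coding. It assumes the usual BMC rule (a transition at the start of every bit cell, plus an extra mid-cell transition for a 1), which matches the "Normal 0 bits" and "Normal 1 bits" waveforms shown later in this article. The names `bmc_encode` and `bmc_decode` are illustrative only and do not come from any standard or library.

```python
def bmc_encode(bits, level=0):
    """Encode data bits as biphase mark code.

    `level` is the line level (0 or 1) at the end of the previous bit cell.
    Returns a list of half-cell line levels, two per input bit.
    """
    halves = []
    for bit in bits:
        level ^= 1          # every bit cell starts with a transition
        halves.append(level)
        if bit:
            level ^= 1      # a 1 adds a second transition mid-cell
        halves.append(level)
    return halves


def bmc_decode(halves):
    """Recover data bits: a cell whose two halves differ is a 1."""
    return [int(a != b) for a, b in zip(halves[0::2], halves[1::2])]


data = [1, 0, 0, 1, 1, 0]
line = bmc_encode(data)
print(line)                # [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(bmc_decode(line))    # [1, 0, 0, 1, 1, 0]
```

Because only transitions carry information, inverting the polarity of the whole line signal leaves the decoded bits unchanged, and the guaranteed transition at every cell boundary is what gives the receiver something to lock its clock to at any sample rate.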

One 64-bit frame is transmitted per sample period. It is divided into two 32-bit subframes, or channels, containing one sample each: A (left) and B (right). Each subframe consists of 32 time slots used to transmit individual data bits or synchronization information. 24 bits are available for audio data, of which 20 bits are normally used.

192 consecutive frames are grouped into an audio block, and certain status information is transmitted once per audio block. At the default 48 kHz sample rate, there are 250 audio blocks per second and 3,072,000 bits per second, giving a biphase clock of 6.144 MHz.[1]
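The figures above follow directly from the frame structure, as this short Python sketch shows. It assumes only what the text states (64 bits per frame, 192 frames per block, two biphase half-cells per bit); the function name `link_rates` is illustrative.

```python
FRAMES_PER_BLOCK = 192   # frames in one audio block
BITS_PER_FRAME = 64      # two 32-bit subframes


def link_rates(sample_rate_hz):
    """Return (audio blocks per second, bits per second, biphase clock in Hz)."""
    blocks_per_second = sample_rate_hz / FRAMES_PER_BLOCK
    bit_rate = sample_rate_hz * BITS_PER_FRAME
    biphase_clock_hz = 2 * bit_rate   # BMC uses two half-cells per bit
    return blocks_per_second, bit_rate, biphase_clock_hz


print(link_rates(48_000))   # (250.0, 3072000, 6144000)
print(link_rates(44_100))   # (229.6875, 2822400, 5644800)
```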

The 32 time slots of each subframe are used as follows:

Time slots 0 to 3

These slots contain a specially coded preamble that identifies the subframe and its position within the audio block. They are not normal BMC-encoded data bits, although they do still have zero DC bias.

Three preambles are possible:

  • X (or M): 11100010 if the previous time slot was "0", 00011101 if it was "1". (Equivalently, 10010011 NRZI encoded.) Marks a word for channel A (left), other than at the start of an audio block.
  • Y (or W): 11100100 if the previous time slot was "0", 00011011 if it was "1". (Equivalently, 10010110 NRZI encoded.) Marks a word for channel B (right).
  • Z (or B): 11101000 if the previous time slot was "0", 00010111 if it was "1". (Equivalently, 10011100 NRZI encoded.) Marks a word for channel A (left) at the start of an audio block.

They are called X, Y, Z in the AES3 standard; and M, W, B in IEC 958 (an AES extension).

The 8-bit preambles are transmitted in time allocated to the first four time slots of each subframe (time slots 0 to 3). Any of the three marks the beginning of a subframe. X or Z marks the beginning of a frame, and Z marks the beginning of an audio block.
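The relationship between the NRZI form of each preamble and its two line patterns can be checked with a small Python sketch. It assumes NRZI in the sense used above (a 1 means the line level toggles, a 0 means it holds); `preamble_levels` and `block_preambles` are illustrative names, not part of any standard.

```python
# NRZI form of each preamble, as listed above: "1" = transition, "0" = no transition.
PREAMBLE_NRZI = {"X": "10010011", "Y": "10010110", "Z": "10011100"}


def preamble_levels(name, prev_level):
    """Return the eight half-cell line levels of a preamble as a string.

    `prev_level` is the line level (0 or 1) just before the preamble starts.
    """
    level = prev_level
    out = []
    for transition in PREAMBLE_NRZI[name]:
        if transition == "1":
            level ^= 1
        out.append(str(level))
    return "".join(out)


def block_preambles(frames=192):
    """Preamble sequence for one 2-channel audio block: Z or X for A, Y for B."""
    return "".join(("Z" if i == 0 else "X") + "Y" for i in range(frames))


print(preamble_levels("X", 0), preamble_levels("X", 1))   # 11100010 00011101
print(preamble_levels("Z", 0), preamble_levels("Z", 1))   # 11101000 00010111
print(block_preambles(4))                                 # ZYXYXYXY
```

Each pattern holds the line level for three consecutive half-cells, which cannot occur in normal BMC data because every bit cell begins with a transition; this is what lets a receiver tell a preamble apart from data.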

| 0 | 1 | 2 | 3 |  | 0 | 1 | 2 | 3 | Time slots
 _____       _            _____   _
/     \_____/ \_/  \_____/     \_/ \ Preamble X
 _____     _              ___   ___
/     \___/ \___/  \_____/   \_/   \ Preamble Y
 _____   _                _   _____
/     \_/ \_____/  \_____/ \_/     \ Preamble Z
 ___     ___            ___     ___ 
/   \___/   \___/  \___/   \___/   \ Normal 0 bits
 _   _   _   _        _   _   _   _
/ \_/ \_/ \_/ \_/  \_/ \_/ \_/ \_/ \ Normal 1 bits

| 0 | 1 | 2 | 3 |  | 0 | 1 | 2 | 3 | Time slots

In 2-channel AES3, the preambles form a pattern of ZYXYXYXY…, but it is straightforward to extend this structure to additional channels (more subframes per frame), each with a Y preamble, as is done in the MADI protocol.

Time slots 4 to 7

If the audio word length is more than 20 bits, these slots carry the least significant bits of the audio sample data.

If the audio word length is 20 bits (the default) or less, these time slots can carry auxiliary information such as a low-quality auxiliary audio channel for producer talkback or recording studio-to-studio communication.

Time slots 8 to 27

These time slots carry 20 bits of audio information, starting with the LSB and ending with the MSB. If the source provides fewer than 20 bits, the unused LSBs are set to logical 0 (for example, for 16-bit audio read from a CD, bits 8–11 are set to 0).

Time slots 28 to 31

These time slots carry associated bits as follows:

  • V (28) Validity bit: it is set to zero if the audio sample word data are correct and suitable for D/A conversion. Otherwise, the receiving equipment may be instructed to mute its output during the presence of defective samples. It is used by most CD players to indicate that concealment rather than error correction is taking place.
  • U (29) User bit: any kind of data such as running time, song, track number, etc. One bit per audio channel per frame forms a serial data stream. Each channel of each audio block has a single 192-bit control word.
  • C (30) Channel status bit: like the user bit, the bits from each frame of an audio block are grouped to make a 192-bit channel status word. Its structure depends on whether AES/EBU or S/PDIF is used.
  • P (31) Parity bit: for error detection. A parity bit is provided to permit the detection of an odd number of errors resulting from malfunctions in the interface. If used, it is set to provide even parity over bits 4–31.
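The time-slot layout described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the subframe is modelled as a 32-bit integer in which bit n stands for time slot n, slots 0 to 3 are left at zero because the preamble is not ordinary data, and the function name `pack_subframe` and its parameters are hypothetical.

```python
def pack_subframe(sample, bits=16, v=0, u=0, c=0):
    """Pack one audio sample and its V, U, C bits into time slots 4..31.

    Returns an int whose bit n represents time slot n. Slots 0..3 (the
    preamble) are left as 0 and must be generated separately.
    """
    if not 1 <= bits <= 24:
        raise ValueError("word length must be 1..24 bits")
    # Keep the low `bits` bits (two's complement) and left-justify them so
    # the MSB always lands in time slot 27; unused LSB slots stay 0.
    audio24 = (sample & ((1 << bits) - 1)) << (24 - bits)
    word = audio24 << 4                  # audio occupies slots 4..27
    word |= (v & 1) << 28                # validity
    word |= (u & 1) << 29                # user data
    word |= (c & 1) << 30                # channel status
    parity = bin(word).count("1") & 1    # parity of slots 4..30
    word |= parity << 31                 # makes slots 4..31 even parity
    return word


print(hex(pack_subframe(1, bits=16)))    # 0x80001000
print(hex(pack_subframe(-1, bits=16)))   # 0xffff000
```

For a 16-bit sample the audio lands in slots 12 to 27, so slots 4 to 11 are zero, in line with the zero-filled LSBs described above. Since slot 4 is sent first, a transmitter would shift the word out starting from bit 4.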

Channel status word in AES/EBU

As stated before, there is one channel status bit in each subframe, making one 192-bit word for each channel in each block. This 192-bit word is usually presented as 192/8 = 24 bytes. The contents of the channel status word are completely different between the AES3 and S/PDIF standards, although they agree that the first channel status bit (byte 0 bit 0) distinguishes between the two. In the case of AES3, the standard describes in detail how the bits have to be used. Here is a summary of the channel status word:

  • byte 0: basic control data: sample rate, compression, emphasis
    • bit 0: A value of 1 indicates this is AES/EBU channel status data. 0 indicates this is S/PDIF data.
    • bit 1: A value of 0 indicates this is linear audio PCM data. A value of 1 indicates other (usually non-audio) data.
    • bits 2–4: Indicates the type of signal preemphasis applied to the data. Generally set to 100 (none).
    • bit 5: A value of 0 indicates that the source is locked to some (unspecified) external time sync. A value of 1 indicates an unlocked source.
    • bits 6–7: Sample rate. These bits are redundant when real-time audio is transmitted (the receiver can observe the sample rate directly), but are useful if AES/EBU data is recorded or otherwise stored. Options are unspecified, 48 kHz (the default), 44.1 kHz, and 32 kHz.
  • byte 1: indicates if the audio stream is stereo, mono or some other combination.
    • bits 0–3: Indicates the relationship of the two channels; they might be unrelated audio data, a stereo pair, duplicated mono data, music and voice commentary, or a stereo sum/difference code.
    • bits 4–7: Used to indicate the format of the user channel word.
  • byte 2: audio word length
    • bits 0–2: Aux bits usage. This indicates how the aux bits (time slots 4–7) are used. Generally set to 000 (unused) or 001 (used for 24-bit audio data).
    • bits 3–5: Word length. Specifies the sample size, relative to the 20- or 24-bit maximum. Can specify 0, 1, 2 or 4 missing bits. Unused bits are filled with 0, but audio processing functions such as mixing will generally fill them in with valid data without changing the effective word length.
    • bits 6–7: Unused
  • byte 3: used only for multichannel applications
  • byte 4: Additional sample rate information.
    • bits 0–1: indicate the grade of the sample rate reference, per AES11.
    • bit 2: reserved
    • bits 3–6: Extended sample rate. This indicates other sample rates, not representable in byte 0 bits 6–7. Values are assigned for 24, 96, and 192 kHz, as well as 22.05, 88.2, and 176.4 kHz.
    • bit 7: This "sampling frequency scaling flag", if set, indicates that the sample rate is multiplied by 1/1.001 to match NTSC video frame rates.
  • byte 5: reserved
  • bytes 6–9: Four ASCII characters for indicating channel origin. Widely used in large studios.
  • bytes 10–13: Four ASCII characters indicating channel destination, to control automatic switchers. Less often used.
  • bytes 14–17: 32-bit sample address, incrementing block-to-block by 192 (because there are 192 frames per block). At 48 kHz, this wraps every 24h51m18.485333s.
  • bytes 18–21: as above, but offset to indicate samples since midnight.[2]
  • byte 22: contains information about the reliability of the channel status word.
    • bits 0–3: reserved
    • bit 4: if set, bytes 0–5 (signal format) are unreliable.
    • bit 5: if set, bytes 6–13 (channel labels) are unreliable.
    • bit 6: if set, bytes 14–17 (sample address) are unreliable.
    • bit 7: if set, bytes 18–21 (timestamp) are unreliable.
  • byte 23: CRC. This byte is used to detect corruption of the channel status word, as might be caused by switching mid-block. (Generator polynomial is x^8 + x^4 + x^3 + x^2 + 1, with the register preset to all ones.)
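A few of the fields summarised above can be pulled out of a 24-byte channel status word with a short Python sketch. Several things here are assumptions and not confirmed by the text: channel status bit n of byte k is stored as bit n of `cs[k]` (bit 0 least significant), multi-byte values are little-endian (the cited EBU text says LSBs are transmitted first), and the CRC processes bits in that same LSB-first order. The function names are illustrative.

```python
def crc8_channel_status(data):
    """CRC-8, polynomial x^8 + x^4 + x^3 + x^2 + 1, register preset to all ones.

    Assumption: bits are fed LSB-first within each byte, so the polynomial's
    low byte 0x1D appears in its bit-reversed form 0xB8.
    """
    crc = 0xFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xB8 if crc & 1 else crc >> 1
    return crc


def parse_channel_status(cs):
    """Decode selected fields of a 24-byte AES/EBU channel status word."""
    if len(cs) != 24:
        raise ValueError("channel status word must be 24 bytes")
    return {
        "professional": bool(cs[0] & 0x01),        # byte 0 bit 0
        "non_audio": bool(cs[0] & 0x02),           # byte 0 bit 1
        "unlocked": bool(cs[0] & 0x20),            # byte 0 bit 5
        "origin": cs[6:10].decode("ascii", errors="replace"),
        "destination": cs[10:14].decode("ascii", errors="replace"),
        "sample_address": int.from_bytes(cs[14:18], "little"),
        "time_of_day": int.from_bytes(cs[18:22], "little"),
        "format_unreliable": bool(cs[22] & 0x10),  # byte 22 bit 4
        "crc_ok": crc8_channel_status(cs[:23]) == cs[23],
    }


def wrap_seconds(sample_rate_hz):
    """Seconds until the 32-bit sample address wraps around."""
    return 2**32 / sample_rate_hz


print(round(wrap_seconds(48_000), 6))   # 89478.485333, i.e. 24h51m18.485333s
```

With this bit ordering, running the CRC over all 24 bytes of an intact word gives zero, which offers a second way to check a received block.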

Notes

  1. ^ "The AES/EBU digital audio signal distribution standard". Broadcastengineering.com. Retrieved 2012-05-13.
  2. ^ AES/EBU Interface Standard (PDF), EBU, p. 12, Bytes 18 to 21, Bits 0 to 7: Time of day sample address code. Value (each Byte): 32-bit binary value representing the first sample of current block. LSBs are transmitted first. Default value shall be logic "0". Note: This is the time-of-day laid down during the source encoding of the signal and shall remain unchanged during subsequent operations. A value of all zeros for the binary sample address code shall, for the purposes of transcoding to real time, or to time codes in particular, be taken as midnight (i.e. 00 h, 00 mm, 00 s, 00 frame). Transcoding of the binary number to any conventional time code requires accurate sampling frequency information to provide the sample accurate time.
