Media type
An Internet media type[1] is a two-part identifier for file formats on the Internet. The identifiers were originally defined in RFC 2046 for use in email sent through SMTP, but their use has expanded to other protocols such as HTTP, RTP and SIP. These types were originally called MIME types, and are sometimes referred to as Content-types, after the name of a header in several protocols whose value is such a type. The original name, MIME type, referred to their use in identifying non-ASCII parts of email messages composed using the MIME (Multipurpose Internet Mail Extensions) specification. Without MIME types, an email client could not tell whether an attachment was, for example, a graphics file or a spreadsheet, and so could not handle the attachment appropriately.
A media type is composed of two or more parts: a type, a subtype, and zero or more optional parameters. For example, subtypes of text have an optional charset parameter that can be included to indicate the character encoding (e.g. text/html; charset=UTF-8), and subtypes of multipart type often define a boundary between parts. Allowed charset values are defined in the list of IANA character sets.
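The type, subtype and parameter structure described above can be illustrated with a short Python sketch. It relies on the standard library's email.message.Message class to split a media type string; the helper name parse_media_type and the sample boundary value are assumptions made for this illustration, not part of any specification.

```python
from email.message import Message


def parse_media_type(value):
    """Split a media type string into (type, subtype, parameters).

    Hypothetical helper for illustration; it delegates the actual
    parsing to the standard library's Content-Type handling.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(full_type, ''), (name, value), ...];
    # the first entry is the type itself, the rest are parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


if __name__ == "__main__":
    print(parse_media_type("text/html; charset=UTF-8"))
    print(parse_media_type('multipart/mixed; boundary="frontier"'))
```

Parameter names are lowercased by the library and quoted values are unquoted, so a quoted boundary comes back as a bare string.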
Prior to RFC 6648,[2] experimental or non-standard[3] media types were prefixed with x-, but this practice was deprecated due to incompatibility problems when the experimental types were standardized. Subtypes that begin with vnd. are vendor-specific;[4] subtypes that begin with prs. are in the personal or vanity tree.[5] New media types can be created with the procedures outlined in RFC 4288.
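As a rough illustration of these prefix conventions, the following Python sketch sorts a media type by the prefix of its subtype. The function name registration_tree and the returned labels are hypothetical; only the x-, vnd. and prs. prefixes come from the text above, and anything else is simply reported as unprefixed.

```python
def registration_tree(media_type):
    """Return a label for the subtype prefix of a media type.

    Illustrative only: the labels are made up for this sketch.
    """
    # Drop any parameters, then take the part after the slash.
    essence = media_type.split(";", 1)[0].strip().lower()
    subtype = essence.partition("/")[2]
    if subtype.startswith("vnd."):
        return "vendor"
    if subtype.startswith("prs."):
        return "personal"
    if subtype.startswith("x-"):
        return "experimental"
    return "unprefixed"


if __name__ == "__main__":
    for name in ("application/vnd.ms-excel", "application/x-tar", "text/html"):
        print(name, "->", registration_tree(name))
```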
In addition to email clients, web browsers also support various media types. This enables the browser to display or output files that are not in HTML format. Media type specification is also an important information source for search engines for the classification of data files on the web.
There are many registered media types, such as GIF graphics files and PostScript files. It is also possible to define custom media types.
Limitations
Internet media types are often used as part of a communication protocol between two applications (the source and destination). In this context, media type identifiers suffer from several problems.
The first problem is the ability of the source application (e.g. a web server or email client) to correctly determine an Internet media type for a piece of content. Many applications attempt to classify a file heuristically, using its filename extension or magic numbers. Neither approach is perfect, and either may misclassify the content's media type:
- Incorrect filename extension: a filename extension classifier will report an incorrect media type. For instance, some applications incorrectly give Rich Text Format files the .doc file extension instead of the correct .rtf extension.
- No filename extension: a filename extension classifier will report no media type, or will (incorrectly) report a catch-all type such as application/octet-stream. Files without an extension are common on Unix systems.
- Filename extension collisions: when multiple formats use the same filename extension, a filename extension classifier will choose one media type arbitrarily. For instance, both Microsoft Word templates and Graphviz graph files use the extension .dot.
- Ambiguous container formats: a magic number classifier may give a correct, though non-specific, media type, thus preventing a meaningful interpretation of the content. For instance, Office Open XML (.docx) and Java archive (.jar) files are both implemented internally as zipped archives. A magic number system may classify such files as application/zip instead of the more specific type. Similar problems occur between XML and application formats implemented on top of XML.
- Ambiguous magic numbers: an attacker can create a file which is identified simultaneously as two separate Internet media types. For instance, the internal structure of a Gifar makes it both a valid GIF image and a Java archive.
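The two heuristics and some of their failure modes can be sketched in Python as follows. The lookup tables are deliberately tiny and purely illustrative, and the function names are assumptions of this sketch; real classifiers use far larger databases.

```python
import os

# Illustrative tables only; not a complete or authoritative mapping.
EXTENSION_TYPES = {
    ".gif": "image/gif",
    ".rtf": "application/rtf",
    ".doc": "application/msword",
    ".jar": "application/java-archive",
    ".zip": "application/zip",
}

MAGIC_NUMBERS = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),  # also matches .jar and .docx
    (b"{\\rtf", "application/rtf"),
]


def guess_by_extension(filename):
    """Return a media type from the filename extension, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(ext)


def guess_by_magic(data):
    """Return a media type from the leading bytes, or None."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return None


def classify(filename, data):
    """Return both guesses so that a caller can see where they disagree."""
    return guess_by_extension(filename), guess_by_magic(data)


if __name__ == "__main__":
    # An RTF file saved with a .doc extension: the two guesses disagree.
    print(classify("memo.doc", b"{\\rtf1\\ansi hello"))
    # A .jar file: the magic number only reveals the ZIP container.
    print(classify("app.jar", b"PK\x03\x04rest of archive"))
    # No extension and no known signature: neither heuristic helps.
    print(classify("README", b"plain text"))
```

Returning both guesses, rather than picking one, leaves the decision about which to trust to the caller, since neither heuristic is reliable on its own.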
The second problem is the destination application's ability to trust the Internet media type reported by the sender. As described above, the reported media type is incorrect in some circumstances and must be treated with skepticism. As early as 2002, the W3C unambiguously warned that it is a "serious error" if the Internet media type is incorrect, and that software should not attempt to guess a correct media type.[1]: Section 2 Nonetheless, software engineering principles encourage software that forgives a certain degree of malformed input, and user experience suffers when software fails to interpret content correctly. Consequently, many destination applications are designed to attempt recovery from such errors and identify a correct media type.[6][7]
The destination application has no more knowledge of the content than the source application, so inferring the media type at the destination is equally difficult. This can lead to incompatibilities between source and destination applications and, in the worst case, to security vulnerabilities such as the Gifar attack or cross-site scripting attacks.[8][9] Advanced content sniffing approaches have been proposed to balance interoperability and security in such situations.[7]
List of common media types
IANA manages the official registry of media types. It includes the following types:
Type application
For multipurpose files:
- application/atom+xml: Atom feeds
- application/ecmascript: ECMAScript/JavaScript; Defined in RFC 4329 (equivalent to application/javascript but with stricter processing rules)
- application/EDI-X12: EDI X12 data; Defined in RFC 1767
- application/EDIFACT: EDI EDIFACT data; Defined in RFC 1767
- application/json: JavaScript Object Notation JSON; Defined in RFC 4627
- application/javascript: ECMAScript/JavaScript; Defined in RFC 4329 (equivalent to application/ecmascript but with looser processing rules). It is not accepted in IE 8 or earlier; text/javascript is accepted but it is defined as obsolete in RFC 4329. The "type" attribute of the <script> tag in HTML5 is optional. In practice, omitting the media type of JavaScript programs is the most interoperable solution, since all browsers have always assumed the correct default even before HTML5.
- application/octet-stream: Arbitrary binary data.[10] Generally speaking this type identifies files that are not associated with a specific application. Contrary to past assumptions by software packages such as Apache this is not a type that should be applied to unknown files. In such a case, a server or application should not indicate a content type, as it may be incorrect, but rather, should omit the type in order to allow the recipient to guess the type.[11]
- application/ogg: Ogg, a multimedia bitstream container format; Defined in RFC 5334
- application/pdf: Portable Document Format, PDF has been in use for document exchange on the Internet since 1993; Defined in RFC 3778
- application/postscript: PostScript; Defined in RFC 2046
- application/rdf+xml: Resource Description Framework; Defined by RFC 3870
- application/rss+xml: RSS feeds
- application/soap+xml: SOAP; Defined by RFC 3902
- application/font-woff: Web Open Font Format; (candidate recommendation; use application/x-font-woff until standard is official)
- application/xhtml+xml: XHTML; Defined by RFC 3236
- application/xml: XML files; Defined by RFC 3023
- application/xml-dtd: DTD files; Defined by RFC 3023
- application/xop+xml: XOP
- application/zip: ZIP archive files; Registered[12]
- application/gzip: Gzip; Defined in RFC 6713
Type audio
For audio.
- audio/basic: mulaw audio at 8 kHz, 1 channel; Defined in RFC 2046
- audio/L24: 24bit Linear PCM audio at 8–48 kHz, 1-N channels; Defined in RFC 3190
- audio/mp4: MP4 audio
- audio/mpeg: MP3 or other MPEG audio; Defined in RFC 3003
- audio/ogg: Ogg Vorbis, Speex, Flac and other audio; Defined in RFC 5334
- audio/vorbis: Vorbis encoded audio; Defined in RFC 5215
- audio/vnd.rn-realaudio: RealAudio; Documented in RealPlayer Help[13]
- audio/vnd.wave: WAV audio; Defined in RFC 2361
- audio/webm: WebM open media format
Type image
- image/gif: GIF image; Defined in RFC 2045 and RFC 2046
- image/jpeg: JPEG JFIF image; Defined in RFC 2045 and RFC 2046
- image/pjpeg: JPEG JFIF image; Associated with Internet Explorer; Listed in ms775147(v=vs.85) - Progressive JPEG, initiated before global browser support for progressive JPEGs (Microsoft and Firefox).
- image/png: Portable Network Graphics; Registered,[14] Defined in RFC 2083
- image/svg+xml: SVG vector image; Defined in SVG Tiny 1.2 Specification Appendix M
- image/tiff: Tag Image File Format (only for Baseline TIFF); Defined in RFC 3302
- image/vnd.microsoft.icon: ICO image; Registered[15]
Type message
- message/http: Defined in RFC 2616
- message/imdn+xml: IMDN Instant Message Disposition Notification; Defined in RFC 5438
- message/partial: Email; Defined in RFC 2045 and RFC 2046
- message/rfc822: Email; EML files, MIME files, MHT files, MHTML files; Defined in RFC 2045 and RFC 2046
Type model
For 3D models.
- model/example: Defined in RFC 4735
- model/iges: IGS files, IGES files; Defined in RFC 2077
- model/mesh: MSH files, MESH files; Defined in RFC 2077, SILO files
- model/vrml: WRL files, VRML files; Defined in RFC 2077
- model/x3d+binary: X3D ISO standard for representing 3D computer graphics, X3DB binary files
- model/x3d+vrml: X3D ISO standard for representing 3D computer graphics, X3DV VRML files
- model/x3d+xml: X3D ISO standard for representing 3D computer graphics, X3D XML files
Type multipart
For archives and other objects made of more than one part.
- multipart/mixed: MIME Email; Defined in RFC 2045 and RFC 2046
- multipart/alternative: MIME Email; Defined in RFC 2045 and RFC 2046
- multipart/related: MIME Email; Defined in RFC 2387 and used by MHTML (HTML mail)
- multipart/form-data: MIME Webform; Defined in RFC 2388
- multipart/signed: Defined in RFC 1847
- multipart/encrypted: Defined in RFC 1847
Type text
For human-readable text and source code.
- text/cmd: commands; subtype resident in Gecko browsers like Firefox 3.5
- text/css: Cascading Style Sheets; Defined in RFC 2318
- text/csv: Comma-separated values; Defined in RFC 4180
- text/html: HTML; Defined in RFC 2854
- text/javascript (Obsolete): JavaScript; Defined in and obsoleted by RFC 4329 in order to discourage its usage in favor of application/javascript. However, text/javascript is allowed in HTML 4 and 5 and, unlike application/javascript, has cross-browser support. The "type" attribute of the <script> tag in HTML5 is optional and there is no need to use it at all since all browsers have always assumed the correct default (even in HTML 4 where it was required by the specification).
- text/plain: Textual data; Defined in RFC 2046 and RFC 3676
- text/vcard: vCard (contact information); Defined in RFC 6350
- text/xml: Extensible Markup Language; Defined in RFC 3023
Type video
For video.
- video/mpeg: MPEG-1 video with multiplexed audio; Defined in RFC 2045 and RFC 2046
- video/mp4: MP4 video; Defined in RFC 4337
- video/ogg: Ogg Theora or other video (with audio); Defined in RFC 5334
- video/quicktime: QuickTime video; Registered[16]
- video/webm: WebM Matroska-based open media format
- video/x-matroska: Matroska open media format
- video/x-ms-wmv: Windows Media Video; Documented in Microsoft KB 288102
- video/x-flv: Flash video (FLV files)
List of common media subtype prefixes
Prefix vnd
For vendor-specific files.
- application/vnd.oasis.opendocument.text: OpenDocument Text; Registered[17]
- application/vnd.oasis.opendocument.spreadsheet: OpenDocument Spreadsheet; Registered[18]
- application/vnd.oasis.opendocument.presentation: OpenDocument Presentation; Registered[19]
- application/vnd.oasis.opendocument.graphics: OpenDocument Graphics; Registered[20]
- application/vnd.ms-excel: Microsoft Excel files
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: Microsoft Excel 2007 files
- application/vnd.ms-powerpoint: Microsoft PowerPoint files
- application/vnd.openxmlformats-officedocument.presentationml.presentation: Microsoft PowerPoint 2007 files
- application/vnd.openxmlformats-officedocument.wordprocessingml.document: Microsoft Word 2007 files
- application/vnd.mozilla.xul+xml: Mozilla XUL files
- application/vnd.google-earth.kml+xml: KML files (e.g. for Google Earth)[21]
Prefix x
For non-standard files.
- application/x-deb: deb (file format), a software package format used by the Debian project
- application/x-dvi: device-independent document in DVI format
- application/x-font-ttf: TrueType Font; No registered MIME type, but this is the most commonly used
- application/x-javascript
- application/x-latex: LaTeX files
- application/x-mpegURL: .m3u8 variant playlist
- application/x-rar-compressed: RAR archive files
- application/x-shockwave-flash: Adobe Flash files, for example with the extension .swf
- application/x-stuffit: StuffIt archive files
- application/x-tar: Tarball files
- application/x-www-form-urlencoded: Form Encoded Data; Documented in HTML 4.01 Specification, Section 17.13.4.1
- application/x-xpinstall: Add-ons to Mozilla applications (Firefox, Thunderbird, SeaMonkey, and the discontinued Sunbird)
- audio/x-aac: .aac audio files
- audio/x-caf: Apple's CAF audio files
- image/x-xcf: GIMP image file
- text/x-gwt-rpc: GoogleWebToolkit data
- text/x-jquery-tmpl: jQuery template data
Prefix x-pkcs
For PKCS standard files.
- application/x-pkcs12: p12 files
- application/x-pkcs12: pfx files
- application/x-pkcs7-certificates: p7b files
- application/x-pkcs7-certificates: spc files
- application/x-pkcs7-certreqresp: p7r files
- application/x-pkcs7-mime: p7c files
- application/x-pkcs7-mime: p7m files
- application/x-pkcs7-signature: p7s files
References
- "Internet Media Type registration, consistency of use". W3C. 2002-09-04. Retrieved 2012-02-29.
- "RFC 6648 - Deprecating the "X-" Prefix and Similar Constructs in Application Protocols". IETF. 2012. Retrieved 2012-10-07.
- Freed, N.; Borenstein, N. (1996). "RFC 2045 - Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies". IETF. Retrieved 2006-11-29.
- Freed, N.; Klensin, J.; Postel, J. (1996). "RFC 2048 - Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures, Section 2.1.2 - Vendor Tree". IETF. Retrieved 2011-12-05.
- Freed, N.; Klensin, J. (2005). "RFC 4288 - Media Type Specifications and Registration Procedures". IETF. Retrieved 2008-06-14.
- "MIME Type Detection in Windows Internet Explorer". Microsoft. Retrieved 2012-07-14.
- http://mimesniff.spec.whatwg.org/ MIME Sniffing Standard, Living Standard, Last Updated 29 November 2012. Editors Gordon P. Hemsley, Adam Barth, Ian Hickson.
- "CVE-2008-5343 (under review)". MITRE Corporation. 4 December 2008. Retrieved 1 January 2013.
- Henry Sudhof (11 February 2009). "Risky sniffing: MIME sniffing in Internet Explorer enables cross-site scripting attacks". The H. Retrieved 2012-07-14.
- RFC 2046 - Multipurpose Internet Mail Extensions (MIME) Part Two: Media types. Tools.ietf.org. Retrieved on 2010-09-29.
- W3C (1999). "RFC 2616: 7. Entity". Hypertext Transfer Protocol -- HTTP/1.1. The Internet Society. Retrieved 28 May 2012.
- MIME SUBTYPE NAME: zip
- "Supported Media Formats". RealPlayer Help. RealNetworks. 2010. Retrieved 28 May 2012.
- MIME SUBTYPE NAME: png
- MIME subtype name : Vendor Tree - vnd.microsoft.icon
- Quicktime
- vnd.oasis.opendocument.text
- vnd.oasis.opendocument.spreadsheet
- vnd.oasis.opendocument.presentation
- vnd.oasis.opendocument.graphics
- "Application Media Types". IANA. Retrieved 2012-02-19.
External links
- IANA MIME media types list
- IANA character sets
- RFC 2045, RFC 2046 - Multipurpose Internet Mail Extensions (MIME), parts 1 and 2
- RFC 4288 - Media Type Specifications and Registration Procedures