Shapefile
Filename extension | .shp, .shx, .dbf |
---|---|
Internet media type | application/vnd.shp, application/vnd.shx, application/vnd.dbf |
Developed by | ESRI |
Type of format | GIS |
Standard | Shapefile Technical Description |
The ESRI Shapefile, or simply a shapefile, is a popular geospatial vector data format for geographic information systems software. It is developed and regulated by ESRI as a (mostly) open specification for data interoperability among ESRI and other software products.[1] A "shapefile" commonly refers to a collection of files with ".shp", ".shx", ".dbf", and other extensions on a common prefix name (e.g., "lakes.*"). Strictly speaking, the shapefile is the file with the ".shp" extension; however, this file alone is incomplete for distribution, as the other supporting files are required.
Shapefiles spatially describe geometries: points, polylines, and polygons. These could represent, for example, water wells, rivers, and lakes, respectively. Each item may also have attributes that describe it, such as its name or temperature.
Overview
A shapefile is a digital vector storage format for geometric location and associated attribute information. The format lacks the capacity to store topological information. The shapefile format was introduced with ArcView GIS version 2 in the early 1990s. Shapefiles can now be read and written by a variety of free and non-free programs.
Shapefiles are simple because they store primitive geometrical data types: points, lines, and polygons. These primitives are of limited use without attributes that specify what they represent. Therefore, a table of records stores properties/attributes for each primitive shape in the shapefile. Shapes (points/lines/polygons) together with data attributes can represent geographical data in many different ways, and these representations support powerful and accurate computations.
File components
While the term "shapefile" is quite common, a "shapefile" is actually a set of several files. Three individual files are normally mandatory to store the core data that comprises a shapefile. A further eight optional files primarily store index data to improve performance. Each individual file should conform to the MS-DOS 8.3 naming convention (8-character filename prefix, full stop, 3-character filename suffix, such as shapefil.shp) in order to be compatible with older applications that handle shapefiles. For the same reason, all files should be located in the same folder.
Mandatory files:
- .shp — shape format; the feature geometry itself
- .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
- .dbf — attribute format; columnar attributes for each shape, in dBase III format
Optional files:
- .prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format
- .sbn and .sbx — a spatial index of the features
- .fbn and .fbx — a spatial index of the features for shapefiles that are read-only
- .ain and .aih — an attribute index of the active fields in a table or a theme's attribute table
- .ixs — a geocoding index for read-write shapefiles
- .mxs — a geocoding index for read-write shapefiles (ODB format)
- .atx — an attribute index for the .dbf file in the form of shapefile.columnname.atx (ArcGIS 8 and later)
- .shp.xml — metadata in XML format
In each of the .shp, .shx, and .dbf files, the shapes in each file correspond to each other in sequence. That is, the first record in the .shp file corresponds to the first record in the .shx and .dbf files, and so on.
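As a small illustration of the file set described above, the following sketch checks whether the three mandatory files exist alongside a given .shp path. The function name `missing_components` is hypothetical, and the check assumes lowercase extensions on a shared prefix; it does not validate the contents or the record correspondence between the files.

```python
from pathlib import Path

# The three mandatory components named in the list above.
MANDATORY_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the mandatory suffixes that have no file next to shp_path.

    Assumption: all components share the same prefix and folder and use
    lowercase extensions, e.g. lakes.shp, lakes.shx, lakes.dbf.
    """
    base = Path(shp_path)
    return [s for s in MANDATORY_SUFFIXES if not base.with_suffix(s).exists()]
```

An empty list means all three mandatory files are present; optional files such as .prj are deliberately not checked here.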
Shapefile shape format (.shp)
The main file (.shp) contains the primary geographic reference data in the shapefile. The file consists of a single fixed length header followed by one or more variable length records. Each of the variable length records includes a record header component and a record contents component. A detailed description of the file format is given in the ESRI Shapefile Technical Description.[1] This format should not be confused with the AutoCAD shape font source format, which shares the .shp extension.
The main file header is fixed at 100 bytes in length and contains 17 fields: nine 4-byte unsigned integer fields followed by eight 8-byte double-precision floating-point fields:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0-3 | uint32 | big | File code (always hex value 0x0000270a) |
4-23 | uint32 | big | (Unused) |
24-27 | uint32 | big | File length (in 16-bit words) |
28-31 | uint32 | little | Version |
32-35 | uint32 | little | Shape type (see reference below) |
36-99 | double | little | Bounding box of all shapes contained in the shapefile. Minimum and maximum values for X, Y, Z, and M in the following order: min X, min Y, max X, max Y, min Z, max Z, min M, max M. |
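The header layout in the table above can be decoded with a few fixed-offset reads. The following is a minimal sketch using Python's standard `struct` module; the function name `parse_shp_header` and the keys of the returned dictionary are illustrative choices, not part of the format.

```python
import struct


def parse_shp_header(header):
    """Decode the 100-byte main file header into a dict (illustrative keys)."""
    if len(header) < 100:
        raise ValueError("header must be at least 100 bytes")
    # Bytes 0-3: file code, big-endian, always 0x0000270a.
    (file_code,) = struct.unpack(">I", header[0:4])
    if file_code != 0x0000270A:
        raise ValueError("not a shapefile header: bad file code")
    # Bytes 24-27: file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">I", header[24:28])
    # Bytes 28-35: version and shape type, little-endian.
    version, shape_type = struct.unpack("<2I", header[28:36])
    # Bytes 36-99: eight little-endian doubles in the order given above.
    xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax = struct.unpack(
        "<8d", header[36:100]
    )
    return {
        "file_length_bytes": 2 * length_words,  # words are 16 bits wide
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
        "z_range": (zmin, zmax),
        "m_range": (mmin, mmax),
    }
```

Note the mixed byte order: the first fields are big-endian while the version, shape type, and bounding box are little-endian, so each group is unpacked with its own format prefix.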
The file then contains any number of variable-length records. Each record is prefixed with a record-header of 8 bytes:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0-3 | uint32 | big | Record Number |
4-7 | uint32 | big | Record length (length of the record contents in 16-bit words, excluding this 8-byte header) |
Following the record header is the actual record:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0-3 | uint32 | little | Shape type (see reference below) |
4- | - | - | Shape content |
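Walking the records of a main file follows directly from these tables. The sketch below iterates over the raw bytes of a .shp file already loaded into memory; the generator name `iter_records` is hypothetical, and it assumes that the record length counts only the record contents, not the 8-byte record header.

```python
import struct


def iter_records(data):
    """Yield (record_number, shape_type, content) from the bytes of a .shp file."""
    pos = 100  # skip the fixed-length file header
    while pos + 8 <= len(data):
        # Record header: record number and content length, both big-endian.
        number, length_words = struct.unpack(">2I", data[pos:pos + 8])
        start = pos + 8
        end = start + 2 * length_words  # length is given in 16-bit words
        if length_words < 2 or end > len(data):
            raise ValueError("truncated or malformed record")
        content = data[start:end]
        # The record contents begin with a little-endian shape type.
        (shape_type,) = struct.unpack("<I", content[:4])
        yield number, shape_type, content
        pos = end
```

Because the records are variable length, a reader that uses only the .shp file has to visit them in order like this; jumping straight to a given record requires the index file.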
The variable length record contents depend on the shape type. The following are the possible shape types:
Value | Shape type | Fields |
---|---|---|
0 | Null Shape | None |
1 | Point | Shape Type, X, Y |
3 | Polyline | Shape Type, Box, NumParts, NumPoints, Parts, Points |
5 | Polygon | Shape Type, Box, NumParts, NumPoints, Parts, Points |
8 | MultiPoint | Shape Type, Box, NumPoints, Points |
11 | PointZ | Shape Type, X, Y, Z, M |
13 | PolylineZ | Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points, Z range, Z array Optional: M range, M array |
15 | PolygonZ | Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points, Z range, Z array Optional: M range, M array |
18 | MultiPointZ | Mandatory: Shape Type, Box, NumPoints, Points, Z range, Z array Optional: M range, M array |
21 | PointM | Shape Type, X, Y, M |
23 | PolylineM | Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points Optional: M range, M array |
25 | PolygonM | Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, Points Optional: M range, M array |
28 | MultiPointM | Mandatory: Shape Type, Box, NumPoints, Points Optional: M range, M array |
31 | MultiPatch | Mandatory: Shape Type, Box, NumParts, NumPoints, Parts, PartTypes, Points, Z range, Z array Optional: M range, M array |
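To make the field lists above concrete, the following sketch decodes the contents of a Point record and of a Polyline or Polygon record. It assumes the field encodings given in the ESRI technical description, which the table does not spell out: all fields are little-endian, Box is four doubles, NumParts and NumPoints are 4-byte integers, Parts is an array of starting point indices, and Points is an array of X, Y double pairs. The function names are hypothetical.

```python
import struct


def decode_point(content):
    """Decode a Point record (shape type 1) into an (x, y) tuple."""
    shape_type, x, y = struct.unpack("<Idd", content[:20])
    if shape_type != 1:
        raise ValueError("not a Point record")
    return x, y


def decode_poly(content):
    """Decode a Polyline (3) or Polygon (5) record into (shape_type, box, parts).

    Each element of parts is a list of (x, y) tuples.
    """
    (shape_type,) = struct.unpack_from("<I", content, 0)
    if shape_type not in (3, 5):
        raise ValueError("not a Polyline or Polygon record")
    box = struct.unpack_from("<4d", content, 4)
    num_parts, num_points = struct.unpack_from("<2I", content, 36)
    # Parts holds the index of the first point of each part.
    starts = struct.unpack_from(f"<{num_parts}I", content, 44)
    flat = struct.unpack_from(f"<{2 * num_points}d", content, 44 + 4 * num_parts)
    points = list(zip(flat[0::2], flat[1::2]))
    # Slice the flat point list at the part boundaries.
    bounds = list(starts) + [num_points]
    parts = [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
    return shape_type, box, parts
```

Polyline and Polygon share one decoder here because their field lists are identical; only the interpretation of the parts (lines versus rings) differs.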
Shapefile shape index format (.shx)
The shapefile index contains the same 100-byte header as the .shp file, followed by any number of 8-byte fixed-length records which consist of the following two fields:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0-3 | uint32 | big | Record offset (in 16-bit words) |
4-7 | uint32 | big | Record length (in 16-bit words) |
Using this index, it is possible to seek backwards in the shapefile by seeking backwards first in the shape index (which is possible because it uses fixed-length records), reading the record offset, and using that to seek to the correct position in the .shp file. It is also possible to seek forwards an arbitrary number of records by using the same method.
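The seeking procedure described above can be sketched as follows, working on the bytes of both files held in memory. The function names are hypothetical, and the sketch assumes that the stored offset is measured from the start of the .shp file to the record header and that the stored length covers only the record contents.

```python
import struct


def record_span(shx_bytes, index):
    """Return (byte_offset, content_length_bytes) for a zero-based record index."""
    pos = 100 + 8 * index  # fixed-length entries follow the 100-byte header
    if index < 0 or pos + 8 > len(shx_bytes):
        raise IndexError("record index out of range")
    offset_words, length_words = struct.unpack(">2I", shx_bytes[pos:pos + 8])
    return 2 * offset_words, 2 * length_words  # convert 16-bit words to bytes


def read_record(shp_bytes, shx_bytes, index):
    """Return the contents of one record without scanning the earlier records."""
    offset, length = record_span(shx_bytes, index)
    start = offset + 8  # skip the 8-byte record header in the .shp file
    return shp_bytes[start:start + length]
```

Since every index entry is 8 bytes, the position of entry `index` is a simple multiplication, which is what makes seeking in either direction cheap.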
Shapefile attribute format (.dbf)
Attributes for each shape are stored in the xBase (dBase) format, which has an open specification.
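For orientation, the sketch below reads the table header of a .dbf file. The layout it relies on is not given in this article and is stated here as an assumption based on the commonly documented dBase III structure: a 32-byte header with the record count at bytes 4-7, the header length at bytes 8-9, and the record length at bytes 10-11 (all little-endian), followed by 32-byte field descriptors terminated by a 0x0D byte. The function name `parse_dbf_header` is hypothetical.

```python
import struct


def parse_dbf_header(data):
    """Return (num_records, record_length, fields) from the bytes of a .dbf file.

    Each field is a (name, type, length, decimals) tuple.
    Assumes the commonly documented dBase III header layout.
    """
    num_records, header_len, record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    pos = 32  # field descriptors start right after the 32-byte table header
    limit = min(header_len, len(data))
    while pos + 32 <= limit and data[pos] != 0x0D:
        # Field name: up to 11 bytes, padded with NUL bytes.
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        field_type = chr(data[pos + 11])  # e.g. 'C' character, 'N' numeric
        length = data[pos + 16]
        decimals = data[pos + 17]
        fields.append((name, field_type, length, decimals))
        pos += 32
    return num_records, record_len, fields
```

The record count returned here should match the number of records in the .shp and .shx files, given that records in the three files correspond in sequence.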
Shapefile projection format (.prj)
The projection information contained in the .prj file is critical in order to understand the data contained in the .shp file correctly. Although it is technically optional, it is most often provided, as it is not necessarily possible to guess the projection of any given points.
Shapefile spatial index format (.sbn)
Part of ArcView's spatial index. If this file is out of date, ArcView will not display the shapefile correctly: it will appear as though many features have been deleted. To recreate the spatial index in ArcView, do the following:
- Go to the table
- Select the Shape field
- Select Field->Remove Index from the menu
- Select Field->Create Index from the menu
To recreate the spatial index in ArcCatalog, do the following:
- Right-click the shapefile and choose Properties
- Click the Indexes tab
- At the bottom, choose Delete to remove the index
- At the bottom, choose Add to recreate the index
Note that the .shp file contains all of the information necessary to parse it successfully; the spatial index file is not strictly necessary, though some implementations do require it.
Limitations
Topology and shapefiles
Shapefiles do not have the ability to store topological information. ArcInfo coverages and Personal/Enterprise Geodatabases do have the ability to store feature topology.
Spatial representation
The edges of a polyline or polygon are defined using points, which can give the shape a jagged edge at higher resolutions. Additional points are required to produce smooth shapes, which means storing considerably more data than, for example, Bézier curves, which can capture complexity with smooth curves and fewer points. Currently, none of the shapefile types support Bézier curves.
Data storage
Unlike most databases, the attribute format is based on the older xBase standard, which is incapable of storing null values in its fields. This limitation can make the storage of data in the attributes less flexible. In ArcGIS products, values that should be null are instead replaced with a 0 (without warning), which can make the data misleading. ArcGIS products address this problem through ESRI's geodatabase offerings, one of which is based on Microsoft Access.
Mixing shape types
Each shapefile can technically store a mix of different shape types, as the shape type precedes each record, but common use of the specification dictates that only shapes of a single type can be in a single file. For example, a shapefile cannot contain both Polyline and Polygon data. Thus, well (point), river (polyline), and lake (polygon) data must be kept in three separate files.
References
- Environmental Systems Research Institute, Inc. (July 1998). "ESRI Shapefile Technical Description" (PDF). Retrieved 2007-07-04.
External links
- Shapefile file extensions - ESRI Webhelp docs for ArcGIS 9.2 (2007)
- ESRI Shapefile Technical Description - ESRI White Paper, July 1998
- ESRI - Understanding Topology and Shapefiles
- Shapefile Examples
- shapelib - Shapefile C Library
- OGR - The Open Source OGR library for handling many vector formats (e.g. shapefiles, Google Earth KML, GML, GRASS, GMT, PostGIS, etc.)
- A very basic shp viewer