Portable Executable: Difference between revisions
Revision as of 10:56, 16 November 2006
Filename extension | .exe, .obj, .dll |
---|---|
Internet media type | application/vnd.microsoft.portable-executable, application/efi |
Developed by | Microsoft |
Type of format | Binary, executable, object, shared libraries |
Extended from | COFF |
The Portable Executable (PE) format is a file format for executables, object code, and DLLs, used in 32-bit and 64-bit versions of Windows operating systems. The term "portable" refers to the format's portability across all 32-bit (and by extension 64-bit) Windows operating systems. The PE format is a data structure that encapsulates the information the Windows OS loader needs to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data, and thread-local storage (TLS) data. On NT operating systems, the PE format is used for EXE, DLL, OBJ, SYS (device driver), and other file types.
PE is a modified version of the Unix COFF file format. PE/COFF is an alternative term in Windows development.
On Windows NT operating systems, PE currently supports the IA-32, IA-64, and AMD64/EM64T (or "x86-64") instruction set architectures. Before Windows 2000, Windows NT, and thus PE, supported the MIPS, DEC Alpha, and PowerPC instruction set architectures. Because PE is used on Windows CE, it continues to support several variants of the MIPS architecture, and also supports the ARM (including Thumb) and SuperH instruction set architectures.
Brief history
Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating system. All later versions of Windows, including Windows 95/98/ME, support the file structure. The format has retained limited legacy support to bridge the gap between DOS-based and NT systems. For example, PE/COFF headers still include an MS-DOS executable program, which is by default a stub that displays the simple message "This program cannot be run in DOS mode" (or similar). PE also continues to serve the changing Windows platform. Some extensions include the .NET PE format (see below), a 64-bit version called PE32+ (sometimes PE+), and a specification for Windows CE.
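The relationship between the MS-DOS stub and the PE headers can be illustrated with a minimal Python sketch. It assumes the standard layout from the PE/COFF specification: the "MZ" magic at offset 0, the 32-bit offset of the PE signature stored at 0x3C, and the optional-header magic values 0x10B (PE32) and 0x20B (PE32+). The function names and the synthetic test image are hypothetical and exist only for illustration.

```python
import struct

def locate_pe_header(data: bytes) -> dict:
    """Follow the MS-DOS header to the PE signature and read a few COFF fields."""
    if data[:2] != b"MZ":
        raise ValueError("missing MS-DOS 'MZ' magic")
    # e_lfanew: 32-bit little-endian file offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the signature; read Machine and NumberOfSections.
    machine, n_sections = struct.unpack_from("<HH", data, pe_offset + 4)
    # The optional header starts right after the COFF header; its magic tells PE32 from PE32+.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    kind = {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, "unknown")
    return {"pe_offset": pe_offset, "machine": machine,
            "sections": n_sections, "format": kind}

def make_fake_image() -> bytes:
    """Build a tiny synthetic header blob (not a loadable executable)."""
    dos = bytearray(0x80)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x80)          # PE signature lives at 0x80
    coff = struct.pack("<HHIIIHH", 0x014C, 2, 0, 0, 0, 0xE0, 0x0102)  # 0x014C = i386
    optional_magic = struct.pack("<H", 0x10B)        # PE32
    return bytes(dos) + b"PE\0\0" + coff + optional_magic

info = locate_pe_header(make_fake_image())
print(info)
```

A real file would be checked the same way after reading its bytes from disk; the synthetic image is used here only so the sketch runs without any external file.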
Technical details
A PE file consists of a number of headers and sections, which together tell the dynamic linker how to map the file into memory. Because an executable image consists of several different regions which require different memory protections, the start of each section must be aligned to a page boundary in memory. For instance, the .text section (which holds program code) is typically mapped execute/read-only, and the .data section (holding global variables) is mapped no-execute/read-write. However, to avoid wasting space, the sections are not page-aligned on disk. Part of the job of the dynamic linker is to map each section individually and assign the correct permissions to the resulting regions, according to the instructions found in the headers.
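The difference between on-disk and in-memory alignment can be sketched in Python. The alignment values below (0x200 on disk, 0x1000 in memory), the header size, and the section sizes are assumed typical figures chosen for illustration, and the helper names are hypothetical; a real loader reads these values from the headers.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def layout_sections(sections, headers_size=0x400,
                    file_alignment=0x200, section_alignment=0x1000):
    """Compute where each section sits on disk and where it is mapped in memory."""
    file_offset = align_up(headers_size, file_alignment)
    rva = align_up(headers_size, section_alignment)
    layout = []
    for name, size, protection in sections:
        layout.append({"name": name, "file_offset": file_offset,
                       "rva": rva, "protection": protection})
        # On disk, sections are packed at the small file alignment...
        file_offset += align_up(size, file_alignment)
        # ...but in memory each one starts on a page boundary.
        rva += align_up(size, section_alignment)
    return layout

sections = [
    (".text", 0x1234, "execute/read-only"),
    (".data", 0x300, "no-execute/read-write"),
]
for entry in layout_sections(sections):
    print(f'{entry["name"]:6} file=0x{entry["file_offset"]:X} '
          f'rva=0x{entry["rva"]:X} {entry["protection"]}')
```

With these assumed numbers, .data begins at file offset 0x1800 but is mapped at relative address 0x3000, which shows how the on-disk layout stays compact while each mapped section still starts on its own page.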
One structure of note is the import address table (IAT), which is used as a lookup table when the application calls a function in another module, such as a Windows API function. Because a compiled PE DLL/EXE cannot know in advance where the other DLLs it depends upon are located in memory, calls to imported functions must be indirect. As the dynamic linker loads modules and joins them together, it writes the actual address of each destination function into the corresponding IAT slot. By default, a call to an imported function goes to a small stub that jumps indirectly through the IAT slot. Though this adds an extra jump over the cost of an intra-module call, the performance hit is mostly negligible and easily worth the flexibility of dynamic libraries. If the compiler knows ahead of time that a call will be inter-module (via a dllimport attribute), it can produce more optimised code that calls through the IAT slot directly with a single indirect call opcode.
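The binding step can be modelled with a toy Python sketch. It is not how the Windows loader is implemented; it only assumes that each loaded module has a base address and a table of exported names with offsets from that base, and that each IAT slot ends up holding one resolved address. All base addresses, offsets, and helper names below are made up for illustration.

```python
def bind_imports(imports, loaded_modules):
    """Fill one IAT slot per import with the absolute address of its target.

    imports:        list of (dll_name, function_name) pairs
    loaded_modules: dll_name -> (base_address, {function_name: offset_from_base})
    """
    iat = []
    for dll, func in imports:
        base, exports = loaded_modules[dll]
        iat.append(base + exports[func])
    return iat

# Hypothetical load addresses and export offsets, known only at load time.
loaded_modules = {
    "kernel32.dll": (0x7C800000, {"ExitProcess": 0x1000, "Sleep": 0x2000}),
    "user32.dll": (0x7E410000, {"MessageBoxA": 0x3000}),
}
imports = [
    ("kernel32.dll", "Sleep"),
    ("user32.dll", "MessageBoxA"),
]

iat = bind_imports(imports, loaded_modules)
for (dll, func), address in zip(imports, iat):
    print(f"{dll}!{func} -> 0x{address:08X}")
```

The calling code never needs to change when a DLL loads at a different base: only the slot contents differ, which is the flexibility the indirection buys.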
PE files do not normally contain position-independent code. Instead they are compiled to a preferred base address, and all addresses emitted by the compiler/linker are fixed ahead of time. If a PE file cannot be loaded at its preferred address (because it is already taken by something else), the operating system will rebase it. This involves recalculating every absolute address and modifying the code to use the new values. The loader does this by subtracting the preferred load address from the actual load address to obtain a delta, which it then adds to each absolute address in the image to produce the relocated address. The locations that need this fix-up are recorded in a list of base relocations, and the loader applies the delta to the value stored at each listed location. The patched code is now private to the process and no longer shareable, so many of the memory-saving benefits of DLLs are lost in this scenario. Rebasing also slows down loading of the module significantly. For this reason rebasing is to be avoided wherever possible, and the DLLs shipped by Microsoft have base addresses pre-computed so that they do not overlap. In the no-rebase case PE therefore has the advantage of very efficient code, but in the presence of rebasing the memory usage hit can be expensive. Contrast this with ELF, which uses fully position-independent code and a global offset table, trading execution time for lower memory usage.
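The delta calculation can be shown with a short Python sketch. It assumes 32-bit little-endian absolute addresses and takes the relocation list as plain offsets into the image; the real on-disk relocation table is organised in per-page blocks, which this sketch does not parse. The function name, base addresses, and image contents are hypothetical.

```python
import struct

def rebase(image: bytearray, reloc_offsets, preferred_base: int, actual_base: int) -> None:
    """Patch every listed 32-bit absolute address in place by the load delta."""
    delta = actual_base - preferred_base
    for offset in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        # Wrap to 32 bits, as a 32-bit address field would.
        struct.pack_into("<I", image, offset, (value + delta) & 0xFFFFFFFF)

# A 16-byte toy image holding one absolute address at offset 4.
preferred_base = 0x00400000
actual_base = 0x00A50000
image = bytearray(16)
struct.pack_into("<I", image, 4, preferred_base + 0x1000)

rebase(image, [4], preferred_base, actual_base)
(patched,) = struct.unpack_from("<I", image, 4)
print(f"0x{patched:08X}")
```

Because the patch writes into the image, any page containing a fixed-up address differs from the copy on disk, which is why rebased pages can no longer be shared between processes.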
.NET, metadata, and the PE format
Microsoft's .NET Framework has extended the PE format with features which support the Common Language Runtime (an implementation of the .NET Virtual Machine). Among the additions are a CLR Header and a CLR Data section. Upon loading a binary, the OS loader yields execution to the CLR via a reference in the PE/COFF IMPORT table. The CLR VM then loads the CLR Header and Data sections.
The CLR Data section contains two important segments: Metadata and Intermediate Language (IL) code:
- Metadata contains information relevant to the assembly, including the assembly manifest. A manifest describes the assembly in detail, including unique identification (via a hash, version number, etc.), data on exported components, extensive type information (supported by the Common Type System (CTS)), external references, and a list of files within the assembly. The CLR environment makes extensive use of metadata.
- Intermediate Language (IL) code is abstracted, language independent code that satisfies the .NET CLR's Common Intermediate Language (CIL) requirement. The term "Intermediate" refers to the nature of IL code as cross-language and cross-platform compatible. This intermediate language, similar to bytecode in the Java programming language, allows platforms and languages to support the common .NET CLR (rather than vice versa). IL supports object-oriented programming (polymorphism, inheritance, abstract types, etc.), exceptions, events, and various data structures. IL code is assembled into a .NET PE for execution by the CLR.
Use on other operating systems
The PE format is also used by ReactOS, as ReactOS is intended to be binary-compatible with Windows. It has also historically been used by a number of other operating systems, including SkyOS and BeOS R3. However, both SkyOS and BeOS eventually moved to ELF.
See also
Related tools
- PEBrowse, a Portable Executable (Win32) file viewer/dissection utility
- CFF Explorer, a PE editor with full support for PE32/64. Includes utilities, a rebuilder, hex editor, import adder, and resource viewer. First PE editor with support for .NET.
- PE Explorer (shareware), a PE file viewer/editor that also provides an API function syntax lookup, dependency viewer, section editor, and a disassembler for generating annotated code dumps.
- yoda's LordPE Deluxe, a PE editor and dumper. LordPE can edit nearly all PE structures and offers more control than PE Explorer.
- PEDUMP, a console application with C++ source code, useful for comparing two PE files via a script: redirect the console output to a text file and compare the results with a diff tool such as ExamDiff.
- GNU Binutils with --target=i386-pe, --target=mips-pe etc.
- Anywhere PE Viewer is a free tool for exploring PE files (headers, export table, import table, resources). It is a pure Java application.
External links
- Microsoft Portable Executable and Common Object File Format Specification
- The original Portable Executable article by Matt Pietrek (MSDN Magazine, March 1994)
- Part I. An In-Depth Look into the Win32 Portable Executable File Format by Matt Pietrek (MSDN Magazine, February 2002)
- Part II. An In-Depth Look into the Win32 Portable Executable File Format by Matt Pietrek (MSDN Magazine, March 2002)
- PE File Format Diagram
- The .NET File Format by Daniel Pistelli