Second Level Address Translation
Latest revision as of 16:42, 5 July 2024
Second Level Address Translation (SLAT), also known as nested paging, is a hardware-assisted virtualization technology which makes it possible to avoid the overhead associated with software-managed shadow page tables.
AMD has supported SLAT through the Rapid Virtualization Indexing (RVI) technology since the introduction of its third-generation Opteron processors (code name Barcelona). Intel's implementation of SLAT, known as Extended Page Tables (EPT), was introduced in the Nehalem microarchitecture found in certain Core i7, Core i5, and Core i3 processors.
ARM's virtualization extensions support SLAT in the form of Stage-2 page tables, provided by a Stage-2 MMU. The guest uses the Stage-1 MMU. Support was added as optional in the ARMv7ve architecture and is also supported in the ARMv8 (32-bit and 64-bit) architectures.
Overview
The introduction of protected mode to the x86 architecture with the Intel 80286 processor brought the concepts of physical memory and virtual memory to mainstream architectures. When processes use virtual addresses and an instruction requests access to memory, the processor translates the virtual address to a physical address using a page table or translation lookaside buffer (TLB). A virtual machine is allocated virtual memory of the host system, which serves as physical memory for the guest system, and the same address translation process also takes place within the guest. This increases the cost of memory access, since address translation must be performed twice: once inside the guest system (using a software-emulated guest page table) and once inside the host system (using the physical map, pmap).
To make this translation efficient, software engineers implemented software-based shadow page tables. A shadow page table translates guest virtual addresses directly to host physical addresses. Each VM has a separate shadow page table, and the hypervisor is in charge of managing them. The cost is high, however: every time a guest updates its page table, the hypervisor is triggered to manage the allocation of the page table and its changes.
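The shadow-table bookkeeping described above can be sketched with a toy model. This is a minimal illustration, not how any particular hypervisor is implemented: page tables are reduced to Python dictionaries that map page numbers to page numbers, and the names `build_shadow`, `ShadowedGuest`, and `traps` are hypothetical. Rebuilding the whole table on every update is also a simplification chosen for brevity.

```python
def build_shadow(guest_pt, host_map):
    """Compose guest-virtual -> guest-physical with guest-physical -> host-physical."""
    return {gva: host_map[gpa] for gva, gpa in guest_pt.items() if gpa in host_map}


class ShadowedGuest:
    """Toy guest whose page-table writes are intercepted by the hypervisor."""

    def __init__(self, host_map):
        self.host_map = dict(host_map)  # guest-physical page -> host-physical page
        self.guest_pt = {}              # guest-virtual page -> guest-physical page
        self.shadow = {}                # guest-virtual page -> host-physical page
        self.traps = 0                  # number of hypervisor interventions

    def guest_map_page(self, gva_page, gpa_page):
        # Every guest page-table update traps so the shadow table stays in sync.
        self.guest_pt[gva_page] = gpa_page
        self.traps += 1
        self.shadow = build_shadow(self.guest_pt, self.host_map)

    def translate(self, gva_page):
        # A single lookup at access time: the shadow table already holds the answer.
        return self.shadow[gva_page]
```

In this model a memory access costs one lookup, while each guest page-table update costs one trap, which is the overhead SLAT is meant to avoid.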
In order to make this translation more efficient, processor vendors implemented technologies commonly called SLAT. By treating each guest-physical address as a host-virtual address, a slight extension of the hardware used to walk a non-virtualized page table (now the guest page table) can walk the host page table. With multilevel page tables the host page table can be viewed conceptually as nested within the guest page table. A hardware page table walker can treat the additional translation layer almost like adding levels to the page table.
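The idea of treating a guest-physical address as the input to a second lookup can be sketched as follows. This is a simplified model under stated assumptions: each table has a single level, pages are 4 KiB, and the names `walk` and `slat_translate` are hypothetical. Real hardware walks multilevel tables and also translates every guest page-table pointer through the host table.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def walk(table, addr):
    """Single-level lookup: map the page number, keep the page offset."""
    frame = table[addr >> PAGE_SHIFT]
    return (frame << PAGE_SHIFT) | (addr & PAGE_MASK)


def slat_translate(guest_pt, host_pt, guest_virtual):
    """Translate guest-virtual -> guest-physical -> host-physical."""
    guest_physical = walk(guest_pt, guest_virtual)  # first stage, controlled by the guest
    return walk(host_pt, guest_physical)            # second stage, controlled by the hypervisor
```

The guest can change `guest_pt` freely without involving the hypervisor, because the second stage is applied independently at translation time.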
Using SLAT and multilevel page tables, the number of levels needed to be walked to find the translation doubles when the guest-physical address is the same size as the guest-virtual address and the same size pages are used. This increases the importance of caching values from intermediate levels of the host and guest page tables. It is also helpful to use large pages in the host page tables to reduce the number of levels (e.g., in x86-64, using 2 MB pages removes one level in the page table). Since memory is typically allocated to virtual machines at coarse granularity, using large pages for guest-physical translation is an obvious optimization, reducing the depth of look-ups and the memory required for host page tables.
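As a rough illustration of why caching intermediate entries and using large host pages matter, the cost of an uncached nested walk can be estimated by counting memory references rather than levels. The sketch below assumes a simple model that is not stated in the text above: every guest page-table level is reached through a full host walk plus one read of the guest entry, and the final guest-physical address needs one more host walk. The function name `nested_walk_refs` is hypothetical.

```python
def nested_walk_refs(guest_levels, host_levels):
    """Memory references for one uncached nested walk under the assumed model."""
    # Each guest level: translate the table pointer (host_levels reads) + read the guest entry.
    per_guest_level = host_levels + 1
    # After the last guest level, the resulting guest-physical address is translated once more.
    return guest_levels * per_guest_level + host_levels
```

Under this model, `nested_walk_refs(4, 4)` gives 24 references for four-level guest and host tables, and `nested_walk_refs(4, 3)` gives 19 when large host pages remove one host level.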
Implementations
Rapid Virtualization Indexing
Rapid Virtualization Indexing (RVI), known as Nested Page Tables (NPT) during its development, is an AMD second-generation hardware-assisted virtualization technology for the processor memory management unit (MMU).[1][2] RVI was introduced in the third generation of Opteron processors, code name Barcelona.[3]
A VMware research paper found that RVI offers performance gains of up to 42% compared with a software-only (shadow page table) implementation.[4] Tests conducted by Red Hat showed a doubling in performance for OLTP benchmarks.[5]
Extended Page Tables
Extended Page Tables (EPT) is an Intel second-generation x86 virtualization technology for the memory management unit (MMU). EPT support is found in Intel's Core i3, Core i5, Core i7 and Core i9 CPUs, among others.[6] It is also found in some newer VIA CPUs. EPT is required in order to launch a logical processor directly in real mode, a feature called "unrestricted guest" in Intel's jargon, and introduced in the Westmere microarchitecture.[7][8]
According to a VMware evaluation paper, "EPT provides performance gains of up to 48% for MMU-intensive benchmarks and up to 600% for MMU-intensive microbenchmarks", although it can actually cause code to run slower than a software implementation in some corner cases.[9]
Stage-2 page-tables
Stage-2 page-table support is present in ARM processors that implement exception level 2 (EL2).
Extensions
Mode Based Execution Control
Mode Based Execution Control (MBEC) is an extension to x86 SLAT implementations first available in Intel Kaby Lake and AMD Zen+ CPUs (known on the latter as Guest Mode Execute Trap or GMET).[10] The extension splits the execute bit in the extended page table (guest page table) into two bits: one for user execute and one for supervisor execute.[11]
MBEC was introduced to speed up the execution of unsigned guest user-mode code while kernel-mode code integrity is enforced. Under this configuration, unsigned code pages can be marked executable in user mode but must be marked no-execute in kernel mode. To ensure that all executable guest kernel-mode code is signed even if the guest kernel is compromised, the guest kernel has no permission to modify the execute bit of any memory page. Modifying the execute bit, or switching the guest page table that contains it, is delegated to a more privileged entity, in this case the host hypervisor. Without MBEC, each entry from unsigned user-mode execution into signed kernel-mode execution must be accompanied by a VM exit to the hypervisor to switch to the kernel-mode page table. Conversely, each return from signed kernel mode to unsigned user mode requires a VM exit to perform another page table switch. VM exits significantly degrade code execution performance.[12][13] With MBEC, the same page table can be shared between unsigned user-mode code and signed kernel-mode code, with two sets of execute permissions selected by the execution context. VM exits are no longer necessary when the execution context switches between unsigned user mode and signed kernel mode.
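The two-bit execute permission can be illustrated with a small sketch. This is a conceptual model only: the class `SlatEntry`, its field names, and the function `may_execute` are hypothetical and do not reflect the actual bit layout of any page-table entry format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlatEntry:
    """Hypothetical second-level entry with separate execute bits per mode."""
    exec_user: bool
    exec_supervisor: bool


def may_execute(entry, supervisor_mode):
    """Pick the execute bit that matches the current guest privilege mode."""
    return entry.exec_supervisor if supervisor_mode else entry.exec_user


# Unsigned code: runnable in user mode, never in kernel mode.
UNSIGNED_CODE = SlatEntry(exec_user=True, exec_supervisor=False)
# Signed kernel code: runnable in kernel mode.
SIGNED_KERNEL_CODE = SlatEntry(exec_user=False, exec_supervisor=True)
```

Because the check depends only on the current mode, one table serves both contexts and no table switch is needed when the guest moves between user mode and kernel mode.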
Support in software
Hypervisors that support SLAT include the following:
- Hyper-V for Windows Server 2008 R2, Windows 8 and later.[14] Hyper-V on Windows 8 and later versions of Microsoft Windows requires SLAT.[15][16]
- Hypervisor.framework, a native macOS hypervisor, available since macOS 10.10[17]
- KVM, since version 2.6.26 of the Linux kernel mainline[18][19]
- Parallels Desktop for Mac, since version 5[20]
- VirtualBox, since version 2.0.0[21]
- VMware ESX, since version 3.5[4]
- VMware Workstation. VMware Workstation 14 and later versions require SLAT.[22]
- Xen, since version 3.2.0[23]
- Qubes OS — SLAT mandatory[24]
- bhyve[25][26] — SLAT mandatory and slated to remain mandatory
- vmm, a native hypervisor on OpenBSD — SLAT mandatory[27][28]
- ACRN, an open-source lightweight hypervisor, built with real-time and safety-criticality in mind, optimized for IoT and Edge usages.[29]
- QEMU, an open-source embeddable hypervisor and chipset emulator.[30][31][32][33][34]
Some of the above hypervisors require SLAT to work at all (not merely to run faster), as they do not implement a software shadow page table; the list is not fully updated to reflect that.
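Since several of these hypervisors refuse to run without SLAT, it can be useful to check a host before installing one. The following is a minimal sketch for x86 Linux hosts only, assuming the kernel reports the feature in the `flags` line of `/proc/cpuinfo` as `ept` (Intel) or `npt` (AMD); the function name `slat_flags` is hypothetical, and other operating systems expose this information differently.

```python
def slat_flags(cpuinfo_text):
    """Return the SLAT-related CPU flags found in /proc/cpuinfo-style text."""
    wanted = {"ept", "npt"}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list CPU feature names, separated by whitespace.
        if sep and key.strip() == "flags":
            found |= wanted & set(value.split())
    return found
```

On a Linux machine, the function can be given the contents of `/proc/cpuinfo`, for example `slat_flags(open("/proc/cpuinfo").read())`; an empty set suggests that SLAT is absent or not exposed to the operating system.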
See also
- AMD-V (codename Pacifica) – the first-generation AMD hardware virtualization support
- Page table
- VT-x
References
- ^ "Rapid Virtualization Indexing with Windows Server 2008 R2 Hyper-V | The Virtualization Blog". Blogs.amd.com. 2009-03-23. Retrieved 2010-05-16.
- ^ "AMD-V Nested Paging" (PDF). July 2008. Archived from the original (PDF) on 2012-09-05. Retrieved 2013-12-11.
- ^ "VMware engineer praises AMD's Nested Page Tables". Searchservervirtualization.techtarget.com. 2008-07-21. Retrieved 2010-05-16.
- ^ a b "Performance Evaluation of AMD RVI Hardware Assist" (PDF). Retrieved 2010-05-16.
- ^ "Red Hat Magazine | Red Hat Enterprise Linux 5.1 utilizes nested paging on AMD Barcelona Processor to improve performance of virtualized guests". Magazine.redhat.com. 2007-11-20. Retrieved 2010-05-16.
- ^ "Intel Virtualization Technology List". Ark.intel.com. Retrieved 2014-02-17.
- ^ Implementation of a BIOS emulation support for BHyVe: A BSD Hypervisor. "Intel added unrestricted guest mode on Westmere micro-architecture and later Intel CPUs, it uses EPT to translate guest physical address access to host physical address. With this mode, VMEnter without enable paging is allowed."
- ^ "Intel 64 and IA-32 Architectures Developer's Manual, Vol. 3C" (PDF). Intel. Retrieved 13 December 2015.
If the 'unrestricted guest' VM-execution control is 1, the 'enable EPT' VM-execution control must also be 1.
- ^ Performance Evaluation of Intel EPT Hardware Assist
- ^ Cunningham, Andrew (2021-08-27). "Why Windows 11 has such strict hardware requirements, according to Microsoft". Ars Technica. Retrieved 2024-03-18.
- ^ Mulnix, David L. "Intel Xeon Processor Scalable Family Technical Overview". intel. Retrieved 3 September 2021.
- ^ Analysis of the Attack Surface of Windows 10 Virtualization-based Security
- ^ Arkley, Brent. "The potential performance Impact of Device Guard (HVCI)". Borec's Legacy meets Modern Device Management Blog. Retrieved 3 September 2021.
- ^ "AMD-V Rapid Virtualization Indexing and Windows Server 2008 R2 Hyper-V Second Level Address Translation". Doing IT Virtual. Retrieved 2010-05-16.
- ^ Bott, Ed (2011-12-08). "Does your PC have what it takes to run Windows 8's Hyper-V?". ZDNet. Retrieved 2014-02-17.
- ^ "Support & Drivers". Retrieved 13 December 2015.
- ^ "Hypervisor | Apple Developer Documentation".
- ^ "Kernel Newbies: Linux 2 6 26".
- ^ Sheng Yang (2008-06-12). "Extending KVM with new Intel Virtualization technology" (PDF). linux-kvm.org. KVM Forum. Archived from the original (PDF) on 2014-03-27. Retrieved 2013-03-17.
- ^ Parallels. "KB Parallels: What's new in Parallels Desktop 5 for Mac". kb.parallels.com. Retrieved 2016-04-12.
- ^ "Changelog for VirtualBox 2.0". Archived from the original on 2014-10-22.
- ^ liz. "VMware Workstation 14 Pro Release Notes". docs.vmware.com. Retrieved 2020-11-19.
- ^ "Benchmarks: Xen 3.2.0 on AMD Quad-Core Opteron with RVI". 2008-06-15. Retrieved 2011-05-13.
- ^ "Hardware Compatibility List (HCL)". Qubes OS. Retrieved 2020-01-06.
- ^ Implementation of a BIOS emulation support for BHyVe: A BSD Hypervisor
- ^ "21.7. FreeBSD as a Host with bhyve". Retrieved 13 December 2015.
- ^ Coming Soon to OpenBSD/amd64: A Native Hypervisor
- ^ vmm(4) — virtual machine monitor
- ^ ACRN Memory Management High-Level Design
- ^ "Features/VT-d - QEMU". wiki.qemu.org. Retrieved 2023-11-12.
- ^ "Hyper-V Enlightenments — QEMU documentation". www.qemu.org. Retrieved 2023-11-12.
- ^ "Add Intel VT-d nested translation [LWN.net]". lwn.net. Retrieved 2023-11-12.
- ^ "Intel Virtualisation: How VT-x, KVM and QEMU Work Together". Binary Debt. 2018-10-14. Retrieved 2023-11-12.
- ^ "Features/KVMNestedVirtualizationTestsuite - QEMU". wiki.qemu.org. Retrieved 2023-11-12.