Io uring: Difference between revisions

From Wikipedia, the free encyclopedia

Latest revision as of 20:09, 25 November 2024

io_uring[a] (previously known as aioring) is a Linux kernel system call interface for asynchronous I/O operations on storage devices. It addresses performance issues with similar interfaces, such as read()/write() and aio_read()/aio_write(), that operate on data accessed through file descriptors.[2][3]: 2

Development is ongoing, with most of the work done by Jens Axboe at Meta.[2]

Interface

io_uring works by creating two circular buffers, called "queue rings", which hold submitted I/O requests and their completions, respectively. For storage devices, these are called the submission queue (SQ) and the completion queue (CQ).[4] Sharing these buffers between the kernel and the application improves I/O performance by eliminating the extra, expensive system calls otherwise needed to copy them between the two.[2][4][5] According to the io_uring design paper, the SQ buffer is writable only by consumer applications, and the CQ buffer is writable only by the kernel.[2]: 3
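
The shared-ring arrangement described above can be illustrated with a toy model. The following Python sketch is not the real io_uring ABI: it makes no system calls and shares no memory with the kernel, and the names Ring, submit, kernel_process and demo, as well as the layout of the request tuples, are assumptions made purely for illustration. It only shows how two fixed-size circular buffers, each with a single writer and a single reader, can carry requests in one direction and results in the other.

```python
"""Toy model of paired submission/completion rings (not the kernel ABI)."""


class Ring:
    """Fixed-size circular buffer with one producer and one consumer."""

    def __init__(self, entries):
        # A power-of-two size lets an index be wrapped with a bit mask.
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.slots = [None] * entries
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def is_full(self):
        return self.tail - self.head > self.mask

    def push(self, item):
        """Producer side: store an item, or return False if the ring is full."""
        if self.is_full():
            return False
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        """Consumer side: return the oldest item, or None if the ring is empty."""
        if self.head == self.tail:
            return None
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def submit(sq, requests):
    """Application side: the only writer of the submission queue."""
    queued = 0
    for request in requests:
        if not sq.push(request):
            break  # SQ is full; the caller must retry later
        queued += 1
    return queued


def kernel_process(sq, cq, files):
    """Stand-in for the kernel: reads the SQ and is the only writer of the CQ."""
    completed = 0
    while not cq.is_full():
        request = sq.pop()
        if request is None:
            break
        # Hypothetical request layout: (user_data, file name, offset, length).
        user_data, name, offset, length = request
        cq.push((user_data, files[name][offset:offset + length]))
        completed += 1
    return completed


def demo():
    files = {"log.txt": b"hello io_uring"}  # in-memory stand-in for a file
    sq, cq = Ring(4), Ring(8)
    submit(sq, [(1, "log.txt", 0, 5), (2, "log.txt", 6, 8)])
    kernel_process(sq, cq, files)
    completions = []
    while (entry := cq.pop()) is not None:
        completions.append(entry)
    return completions


if __name__ == "__main__":
    print(demo())
```

Each completion carries the same user_data tag as the request that produced it, which is how the application in this model matches results to submissions without any further bookkeeping.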

eBPF can be combined with io_uring.[6]

History

The Linux kernel has supported asynchronous I/O since version 2.5, but the interface was seen as difficult to use and inefficient.[7] This older API supports only certain niche use cases:[8] notably, it enables asynchronous operation only when the O_DIRECT flag is used and the file being accessed is already allocated. This prevents use of the page cache and exposes the application to complex O_DIRECT semantics. Linux AIO also does not support sockets, so it cannot be used to multiplex network and disk I/O.[9]

The io_uring kernel interface was adopted in Linux kernel version 5.1 to resolve the deficiencies of Linux AIO.[2][5][10] The liburing library provides an API to interact with the kernel interface easily from userspace.[2]: 12 

Security

io_uring has been noted for exposing a significant attack surface and for being structurally difficult to integrate with the Linux security subsystem.[11]

In June 2023, Google's security team reported that 60% of the exploits submitted to its bug bounty program in 2022 targeted vulnerabilities in the Linux kernel's io_uring component. As a result, io_uring was disabled for apps in Android, and disabled entirely in ChromeOS and on Google servers.[12] Docker consequently also disabled io_uring in its default seccomp profile.[13]

Notes

  1. ^ Input/output user ring[1]

References

  1. ^ Axboe, Jens. "@axboe@fosstodon.org".
  2. ^ a b c d e f "Linux Kernel Getting io_uring To Deliver Fast & Efficient I/O". Phoronix. 2019-02-14. Retrieved 2021-03-14.
  3. ^ Axboe, Jens (October 15, 2019). "Efficient IO with io_uring" (PDF).
  4. ^ a b "Getting Hands-on with io_uring using Go". developers.mattermost.com. Retrieved 2021-11-20.
  5. ^ a b "The rapid growth of io_uring". LWN.net. Retrieved 2021-11-20.
  6. ^ "BPF meets io_uring". LWN.net. Retrieved 2023-04-17.
  7. ^ Corbet, Jonathan. "Ringing in a new asynchronous I/O API". LWN.net. Retrieved 2021-03-14.
  8. ^ "What's new with io_uring" (PDF). Retrieved 2022-06-01.
  9. ^ "Linux Asynchronous I/O". 2014-04-21. Archived from the original on 2015-04-06. Retrieved 2023-06-16. Blocking during io_submit on ext4, on buffered operations, network access, pipes, etc. Some operations are not well-represented by the AIO interface. With completely unsupported operations like buffered reads, operations on a socket or pipes, the entire operation will be performed during the io_submit syscall, with the completion available immediately for access with io_getevents. AIO access to a file on a filesystem like ext4 is partially supported: if a metadata read is required to look up the data block (ie if the metadata is not already in memory), then the io_submit call will block on the metadata read. Certain types of file-enlarging writes are completely unsupported and block for the entire duration of the operation.
  10. ^ "Faster IO through io_uring". Kernel Recipes 2019. Retrieved 2021-03-14.
  11. ^ Corbet, Jonathan (2022-07-28). "Security requirements for new kernel features". LWN.net. Retrieved 2023-06-16.
  12. ^ Koczka, Tamás. "Learnings from kCTF VRP's 42 Linux kernel exploits submissions". Google Online Security Blog. Google. Archived from the original on 2024-09-22. Retrieved 14 June 2023. 60% of the submissions exploited the io_uring component of the Linux kernel
  13. ^ "Update RuntimeDefault seccomp profile to disallow io_uring related syscalls by vinayakankugoyal · Pull Request #9320 · containerd/containerd". GitHub. 2023-11-02. Archived from the original on 2024-01-06. Retrieved 2024-10-20.
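
External links

  * Efficient I/O with io_uring (https://kernel.dk/io_uring.pdf), in-depth description of the motivation behind io_uring, its interface (data structures etc.), and performance assessment
  * liburing source repository (https://git.kernel.dk/cgit/liburing/tree/)
  * io_uring source directory in the Linux kernel repository (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/io_uring)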