Comparing sequential I/O performance between io_uring and read(2)/write(2)
Quick post: I just wanted to share some details about the results from the latest video.
As the title says, I’m comparing the performance of sequential reads and writes using read(2)/write(2) (which I’ll collectively call sync from now on) against io_uring. This is nothing more than implementing cp with uring and comparing it against the familiar command line tool. Vanilla, non-fancy, sync-based cp wins hands down, in case you’re wondering, and this shouldn’t come as a surprise if you know how io_uring works (and if you don’t, may I suggest some intro material).
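For reference, the sync side of the comparison boils down to a loop like the one below. This is a minimal sketch rather than the code from the video: the function name sync_copy and the 128 KiB buffer size are assumptions made for illustration, and the real cp does more than this (metadata, sparse files and so on).

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical buffer size; the post does not say what the real code uses.
const BUF_SIZE: usize = 128 * 1024;

/// Copy `src` to `dst` with plain blocking reads and writes, one chunk at a time.
/// Returns the number of bytes copied.
fn sync_copy(src: &Path, dst: &Path) -> io::Result<u64> {
    let mut input = File::open(src)?;
    let mut output = File::create(dst)?;
    let mut buf = vec![0u8; BUF_SIZE];
    let mut total = 0u64;
    loop {
        // Blocks until the kernel has put some data in the buffer.
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of file
        }
        // write_all keeps writing until the whole chunk has been written.
        output.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

Calling sync_copy(Path::new("chunk"), Path::new("chunk_copy")) would copy the same file the cp run further down uses. Every read and write happens on the calling thread, one after the other, so there is nothing for the kernel to run in parallel.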
As you’ll see in the video, measuring the two with time shows that uring spends a multiple of its life in kernel land (sys is almost twice real):
chris@desktop:~/sources/codetales/code_tales/episode012$ time target/release/copy io-uring
real 0m2,084s
user 0m1,508s
sys 0m3,848s
This makes sense, given that io_uring deals with blocking reads and writes by handing the work off to a kernel worker. In this implementation I use the ring to “emulate” sequential, blocking reads and writes, but we still clearly have a lot of parallelism in the kernel: sys time can only exceed real time if more than one CPU is busy at once. The following flamegraph shows exactly that.
Compare that to the timing for vanilla cp:
chris@desktop:~/sources/codetales/code_tales/episode012$ time cp chunk chunk_copy
real 0m1,752s
user 0m0,000s
sys 0m1,686s
and its flamegraph:
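Yup. That’s about half the work compared to io_uring, and the half that’s missing is io_sq_thread. And what does that function do? It’s io_uring’s submission queue polling thread in the kernel, the one that picks up the requests and manages the io_uring workers.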
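To put numbers on “about half”, the time figures quoted above can be turned into CPU seconds spent per second of wall-clock time. The sketch below is only an illustration: parse_time and cpu_per_wall are hypothetical helper names, and the parser assumes the bash time format shown above (minutes, then seconds, with either a comma or a dot as the decimal separator).

```rust
/// Parse a duration as printed by bash's `time`, e.g. "0m2,084s" is 2.084 seconds.
/// Accepts both ',' and '.' as the decimal separator.
fn parse_time(s: &str) -> Option<f64> {
    let (mins, rest) = s.trim().split_once('m')?;
    let secs = rest.strip_suffix('s')?.replace(',', ".");
    Some(mins.parse::<f64>().ok()? * 60.0 + secs.parse::<f64>().ok()?)
}

/// CPU seconds (user + sys) consumed per second of wall-clock time.
/// A value above 1.0 means more than one CPU was busy on average.
fn cpu_per_wall(real: &str, user: &str, sys: &str) -> Option<f64> {
    let real = parse_time(real)?;
    Some((parse_time(user)? + parse_time(sys)?) / real)
}
```

Fed with the figures from the two runs, cpu_per_wall("0m2,084s", "0m1,508s", "0m3,848s") comes out at roughly 2.6 for the uring copy, while cpu_per_wall("0m1,752s", "0m0,000s", "0m1,686s") comes out at roughly 0.96 for cp. Looking at sys time alone, cp’s 1.686 s is a bit under half of uring’s 3.848 s.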
So there. Not much of a mystery, but I thought this would tie a nice bow on the video, and it’s a nice place to host the flamegraphs I’ll show there.
Thank you for reading.