Comparing sequential I/O performance between
Quick post: I just wanted to share some details about the results from the latest video.
As the title says, I’m comparing the performance of sequential reads and writes using read(2)/write(2) (which I’ll collectively call sync from now on) against io_uring. This is nothing more than implementing cp with uring and comparing it against the familiar command line tool. Vanilla, non-fancy cp wins, hands down, in case you’re wondering, and this shouldn’t come as a surprise if you know how io_uring works (and if you don’t, may I suggest some intro material).
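For reference, here is a minimal sketch of what the sync side of the comparison boils down to: one blocking read followed by one blocking write, repeated until the file ends. This is not the code from the video, and the real cp may take other paths internally. The name `copy_sync` and the `buf_size` parameter are made up for illustration, and `buf_size` is assumed to be greater than zero.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical helper: copy `src` to `dst` with plain blocking reads and
/// writes, one buffer at a time. Returns the number of bytes copied.
/// Assumes `buf_size > 0`; a zero-sized buffer would copy nothing.
fn copy_sync(src: &Path, dst: &Path, buf_size: usize) -> io::Result<u64> {
    let mut input = File::open(src)?;
    let mut output = File::create(dst)?;
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;

    loop {
        // Blocking read: returns 0 at end of file.
        let n = match input.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            // Retry if a signal interrupted the call.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Blocking write: write_all loops until the whole slice is written.
        output.write_all(&buf[..n])?;
        total += n as u64;
    }

    Ok(total)
}
```

Calling something like `copy_sync(Path::new("chunk"), Path::new("chunk_copy"), 64 * 1024)` would do the whole copy on the calling thread, so all the kernel time is spent inside the read and write calls themselves, with no helper threads involved.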
As you’ll see in the video, measuring the two with time shows that uring spends most of its CPU time in kernel land, with sys time well above the wall-clock time.
chris@desktop:~/sources/codetales/code_tales/episode012$ time target/release/copy io-uring
real 0m2,084s
user 0m1,508s
sys 0m3,848s
This makes sense, given that io_uring deals with blocking reads and writes by handing the work off to a kernel worker. In this implementation I use the ring to “emulate” sequential, blocking reads and writes, but we still clearly have a lot of parallelism in the kernel. The following flamegraph shows exactly that.
Compare that to the timing for vanilla cp
chris@desktop:~/sources/codetales/code_tales/episode012$ time cp chunk chunk_copy
real 0m1,752s
user 0m0,000s
sys 0m1,686s
and its flamegraph
Yup. That’s about half the work compared to io_uring and the half that’s missing is io_sq_thread. And what does that method do? It manages the
So there. Not much of a mystery, but I thought this would tie a nice bow on the video, and it is a nice place to host the flamegraphs I’ll show there.
Thank you for reading.