Quick post: I just wanted to share some details about the results from the latest video.

As the title says, I’m comparing the performance of sequential reads and writes done with read(2)/write(2) (which I’ll collectively call sync from now on) against io_uring. This is nothing more than implementing cp with uring and comparing it against the familiar command line tool. Vanilla, non-fancy, sync-based cp wins hands down, in case you’re wondering. This shouldn’t come as a surprise if you know how io_uring works (and if you don’t, may I suggest some intro material).
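For reference, here is a minimal sketch of what a sync copy loop looks like in Rust, using only the standard library. It is not the code from the video and not what cp does internally. The name `sync_copy` and the `buf_size` parameter are made up for illustration.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Copy `src` to `dst` with plain blocking reads and writes.
/// `buf_size` is a hypothetical knob for the chunk size; it must be non-zero,
/// otherwise the first read returns 0 and nothing is copied.
/// Returns the number of bytes copied.
fn sync_copy(src: &Path, dst: &Path, buf_size: usize) -> io::Result<u64> {
    let mut input = File::open(src)?;
    let mut output = File::create(dst)?;
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;

    loop {
        // Each call blocks in a read; a return value of 0 means end of file.
        let n = match input.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            // A read interrupted by a signal is simply retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // write_all keeps writing until the whole chunk has been written.
        output.write_all(&buf[..n])?;
        total += n as u64;
    }

    Ok(total)
}
```

Calling something like `sync_copy(Path::new("chunk"), Path::new("chunk_copy"), 64 * 1024)` does the whole copy on the calling thread: one read, one write, repeat until the file ends.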

As you’ll see in the video, measuring the two with time shows that the io_uring version spends far more time in kernel land: its sys time alone is nearly double its wall-clock time.

chris@desktop:~/sources/codetales/code_tales/episode012$ time target/release/copy io-uring

real    0m2,084s
user    0m1,508s
sys     0m3,848s

This makes sense, given that io_uring deals with blocking reads and writes by handing the work off to a kernel worker. In this implementation I use the ring to “emulate” sequential, blocking reads and writes, but we still clearly get a lot of parallelism in the kernel. The following flamegraph shows exactly that.

Flamegraph for io_uring copy. Click for the actual SVG

Compare that to the timing for vanilla cp:

chris@desktop:~/sources/codetales/code_tales/episode012$ time cp chunk chunk_copy 

real    0m1,752s
user    0m0,000s
sys     0m1,686s

and its flamegraph:

Flamegraph for sync copy. Click for the actual SVG

Yup. That’s about half the work compared to io_uring, and the half that’s missing is io_sq_thread. And what does that function do? It is the kernel thread that polls the submission queue and issues the requests on the program’s behalf.
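To put rough numbers on that, here is a small sketch that turns the two time outputs into "average cores kept busy" (CPU time divided by wall-clock time) and a kernel-time ratio. The constants are the figures from the runs above. The function names are made up for illustration.

```rust
/// One `time` measurement in seconds: (real, user, sys).
type Timing = (f64, f64, f64);

/// Figures copied from the two runs in this post.
const IO_URING: Timing = (2.084, 1.508, 3.848);
const CP: Timing = (1.752, 0.000, 1.686);

/// Average number of cores kept busy: (user + sys) / real.
/// Anything above 1.0 means work was running in parallel.
fn busy_cores(t: Timing) -> f64 {
    let (real, user, sys) = t;
    (user + sys) / real
}

/// How much more kernel (sys) time `a` used compared to `b`.
fn sys_ratio(a: Timing, b: Timing) -> f64 {
    a.2 / b.2
}
```

With these numbers, `busy_cores(IO_URING)` comes out at roughly 2.57 and `busy_cores(CP)` at roughly 0.96, while `sys_ratio(IO_URING, CP)` is roughly 2.28. So cp keeps just under one core busy, and the io_uring version keeps about two and a half busy to move the same data.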

So there. Not much of a mystery, but I thought this would tie a nice bow on the video, and it’s a nice place to host the flamegraphs I’ll show there.

Thank you for reading.