Linux AIO sucks less

Posted by Christopher Smith Sun, 30 Mar 2008 16:11:00 GMT

So, for the last little while on the Zumastor project, I’ve been working on integrating AIO into the code base in order to absorb some of the latency penalty we pay for disk seeks.

Critical for this was getting AIO to work with poll(2), because the ddsnapd daemon follows the tried-and-true “poll then do something” loop that allows for efficient, scalable, and relatively simple Unix server design. Unfortunately (or fortunately, if you are familiar with POSIX ;-), Linux’s native AIO doesn’t follow the POSIX AIO spec. Instead, it implements its own completion queue, which you drain with io_getevents(2). That queue isn’t exposed as a file descriptor, so you can’t poll it. So, I hacked together a library that spawns a separate thread which does nothing but read in completion events and copy them out to a pipe, so that the main thread can poll that pipe just like any other file descriptor. Ugly? Yes. Wasteful? Yes. Easier to work with than the apparent alternatives? Yes.
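To make the workaround concrete, here is a minimal sketch of the thread-to-pipe bridge. It assumes the raw native AIO syscalls are called through syscall(2) with the kernel’s `<linux/aio_abi.h>` types rather than through libaio, and the names `aio_bridge`, `bridge_start` and `bridge_wait` are hypothetical; this is an illustration of the technique, not the actual Zumastor library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>  /* aio_context_t, struct iocb, struct io_event */
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc has no wrappers for the native AIO syscalls, so call them directly. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx) {
    return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs) {
    return syscall(SYS_io_submit, ctx, nr, iocbs);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *timeout) {
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

/* Hypothetical bridge: one AIO context plus a pipe the main loop can poll. */
struct aio_bridge {
    aio_context_t ctx;
    int pipe_rd;       /* main loop polls this end */
    int pipe_wr;       /* helper thread writes completions here */
    pthread_t thread;
};

/* Helper thread: block in io_getevents() and copy each event into the pipe. */
static void *bridge_thread(void *arg) {
    struct aio_bridge *b = arg;
    struct io_event ev[16];
    for (;;) {
        long n = sys_io_getevents(b->ctx, 1, 16, ev, NULL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return NULL;  /* context destroyed or fatal error */
        }
        /* A struct io_event (32 bytes) is below PIPE_BUF, so each write is atomic. */
        for (long i = 0; i < n; i++)
            if (write(b->pipe_wr, &ev[i], sizeof ev[i]) != (ssize_t)sizeof ev[i])
                return NULL;
    }
}

static int bridge_start(struct aio_bridge *b, unsigned nr_events) {
    int fds[2];
    memset(b, 0, sizeof *b);  /* io_setup() requires *ctx == 0 */
    if (sys_io_setup(nr_events, &b->ctx) < 0 || pipe(fds) < 0)
        return -1;
    b->pipe_rd = fds[0];
    b->pipe_wr = fds[1];
    return pthread_create(&b->thread, NULL, bridge_thread, b) == 0 ? 0 : -1;
}

/* What the main loop does: poll the pipe, then read one completion from it. */
static int bridge_wait(struct aio_bridge *b, struct io_event *out, int timeout_ms) {
    struct pollfd pfd = { .fd = b->pipe_rd, .events = POLLIN };
    if (poll(&pfd, 1, timeout_ms) != 1)
        return -1;
    return read(b->pipe_rd, out, sizeof *out) == (ssize_t)sizeof *out ? 0 : -1;
}
```

In a real daemon the pipe’s read end would simply be one more entry in the existing poll() set; the cost is an extra thread plus a copy of every event through the kernel, which is exactly the “ugly and wasteful” part.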

I got most of the way through the process, but then ran into what appears to be some kind of race condition in AIO: when I submitted multiple IO requests at once, I lost completion events the vast majority of the time. I still haven’t tracked it down, but while looking for possible sources of the problem, I discovered a heretofore unknown (well, unknown to those of us on the project, at least) syscall: eventfd(2).

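eventfd does for AIO what signalfd(2) does for signals: you flag each iocb with IOCB_FLAG_RESFD and point it at an eventfd (libaio wraps this as io_set_eventfd()), the kernel increments the eventfd’s counter whenever one of those requests completes, and the eventfd is an ordinary descriptor you can poll. In other words, it does the obvious thing that we wanted in the first place but were too dense to find. The moral of the story: even if you think a Linux API (AIO in this case) sucks, expect it to suck less than you think, and when it seems not to, ask whether you’ve missed something.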
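For comparison, here is a minimal sketch of the eventfd approach with no helper thread at all. It again assumes raw syscall(2) wrappers and the `IOCB_FLAG_RESFD`/`aio_resfd` fields from `<linux/aio_abi.h>`; the helper names `prep_pwrite_efd` and `reap_completions` are hypothetical, not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>  /* struct iocb, IOCB_FLAG_RESFD */
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static long sys_io_setup(unsigned nr, aio_context_t *ctx) {
    return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs) {
    return syscall(SYS_io_submit, ctx, nr, iocbs);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *tmo) {
    return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, tmo);
}

/* Prepare a pwrite whose completion also increments the eventfd counter. */
static void prep_pwrite_efd(struct iocb *cb, int fd, const void *buf,
                            size_t len, long long off, int efd) {
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_lio_opcode = IOCB_CMD_PWRITE;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_flags = IOCB_FLAG_RESFD;  /* ask the kernel to signal aio_resfd */
    cb->aio_resfd = efd;
}

/* Call after poll() reports efd readable. Harvests up to max events into ev
 * and returns how many, or -1 on error. */
static int reap_completions(aio_context_t ctx, int efd, struct io_event *ev, long max) {
    uint64_t ready;  /* completions signalled since the last read */
    if (read(efd, &ready, sizeof ready) != (ssize_t)sizeof ready)
        return -1;   /* reading also resets the counter to 0 */
    long want = ready < (uint64_t)max ? (long)ready : max;
    struct timespec zero = { 0, 0 };  /* zero timeout: never block here */
    long got = sys_io_getevents(ctx, want, want, ev, &zero);
    if (got < 0)
        got = 0;
    if ((uint64_t)got < ready) {
        /* Credit back what we did not consume so poll() fires again. */
        uint64_t rest = ready - (uint64_t)got;
        if (write(efd, &rest, sizeof rest) != (ssize_t)sizeof rest)
            return -1;
    }
    return (int)got;
}
```

The design choice worth noting is the write-back in `reap_completions`: reading an eventfd zeroes its counter, so if the caller’s buffer is smaller than the number of pending completions, the unconsumed count has to be added back or the remaining events would sit in the queue with nothing to wake poll().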