Above picture was shamelessly stolen from: http://computer-history.info/Page4.dir/pages/IBM.7030.Stretch.dir/
In this blog post I’m going to follow up on my threading models post and talk about different types of I/O, how they work, and when you might want to consider using them. Much like threading models, I/O models come with terminology which can be confusing. That confusion leads to misconceptions which I hope to clear up here.
Let’s start first by going over some operating system basics.
A system call is a common interface which allows user applications and the operating system kernel to interact with one another. Some familiar functions which are system calls: open(), read(), and write(). These are system calls which ask the kernel to do I/O on behalf of the user process.
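There is a cost associated with making system calls. On x86 Linux, system calls were traditionally implemented via a software interrupt (int 0x80), and newer kernels use the faster sysenter/syscall instructions; either way the processor changes privilege level. This switch from user to kernel mode is often loosely called a context switch, although strictly speaking it is a mode switch: a context switch is when the kernel switches from running one process or thread to another.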
User applications typically execute at the most restricted privilege level available where interaction with I/O devices (and other stuff) is not allowed. As a result user applications use system calls to get the kernel to complete privileged I/O (and other) operations.
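To make the system call interface concrete, here is a minimal C sketch (not from the original post) that issues the same write request two ways: through the usual libc wrapper, and through glibc's generic syscall() function with the raw call number. The names write_via_wrapper and write_via_raw_syscall are made up for illustration, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write msg to fd through the libc wrapper for the write system call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_via_wrapper(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

/* Make the same request through syscall(2), which takes the raw system
 * call number (SYS_write) followed by the call's arguments. */
ssize_t write_via_raw_syscall(int fd, const char *msg)
{
    return (ssize_t)syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling write_via_wrapper(1, "hello\n") and write_via_raw_syscall(1, "hello\n") should both print to standard output; in each case the kernel does the actual I/O on behalf of the process.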
Synchronous blocking I/O
This is the most familiar and most common type of I/O out there. When an I/O operation is initiated in this model (maybe by calling a system call such as read(), write(), ioctl(), …), the user application making the system call is put into a waiting state by the kernel. The application sleeps until the I/O operation has completed (or has generated an error) at which point it is scheduled to run again. Data is transferred from the device to memory and possibly into another buffer for the user-land application.
Pros:
- Easy to use and well understood
Cons:
- Does not maximize I/O throughput
- Causes all threads in a process to block if that process uses green threads
This method of I/O is very straightforward and simple to use, but it has many downsides. In a previous post about threading models, I mentioned that doing blocking I/O in a green thread causes all green threads to stop executing until the I/O operation has completed.
This happens because there is only one kernel context which can be scheduled, so that context is put into a waiting state in the kernel until the data has been copied to the user buffer and the process can run again.
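A small C sketch of the blocking behaviour, under some assumptions of my own: a pipe stands in for the I/O device, and a forked child plays the part of slow hardware by writing one byte after a delay. The helper name blocking_read_wait_ms is hypothetical. The parent's read() does not return until the byte arrives, and the function reports roughly how long the caller was asleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns roughly how many milliseconds a blocking read() spent waiting for
 * a child process that writes one byte after delay_ms milliseconds.
 * Returns -1 on error. */
long blocking_read_wait_ms(int delay_ms)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: pretend to be a slow device. */
        struct timespec d = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        close(p[0]);
        nanosleep(&d, NULL);
        if (write(p[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: the kernel puts us to sleep inside read() until data arrives. */
    struct timespec start, end;
    char c;
    close(p[1]);
    clock_gettime(CLOCK_MONOTONIC, &start);
    ssize_t n = read(p[0], &c, 1);
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(p[0]);
    waitpid(pid, NULL, 0);

    if (n != 1)
        return -1;
    return (long)(end.tv_sec - start.tv_sec) * 1000
         + (end.tv_nsec - start.tv_nsec) / 1000000;
}
```

Calling blocking_read_wait_ms(200) should return a value close to 200: the calling thread did nothing at all for that long, which is exactly the problem when that thread is carrying every green thread in the process.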
Synchronous non-blocking I/O
This model of I/O is not very well known compared to other models. This is good because this model isn’t very useful.
In this model, a file descriptor is created via open(), but a flag is passed in (O_NONBLOCK on most Linux kernels) to tell the kernel: If data is not available immediately, do not put me to sleep. Instead let me know so I can go on with my life. I’ll try back later.
Pros:
- If no I/O is available other work can be completed in the meantime
- When I/O is available, it does not block the thread (even models with green threads)
Cons:
- Does not maximize I/O throughput for the application
- Lots of system call overhead – constantly making system calls to see if I/O is ready
- Can be high latency if I/O arrives and a system call is not made for a while
This model of I/O is typically very inefficient because the I/O system call made by the application may return EAGAIN or EWOULDBLOCK repeatedly. The application can either:
- busy-wait for the data to become available (calling its I/O system call over and over), or
- try to do other work for a bit, and retry the I/O system call later
At some point the I/O will either return an error or it will be able to complete.
If this type of I/O is used in a system with green threads, the entire process is not blocked but the efficiency is very poor due to the constant polling with system calls from user-land. Each time a system call is invoked a privilege level change occurs on the processor and the application’s register state has to be saved so that the kernel can execute.
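Here is a rough C sketch of the non-blocking model. It differs from the description above in one respect: instead of passing O_NONBLOCK to open(), it sets the flag on an already-open descriptor with fcntl(), which has the same effect and also works for pipes and sockets. The helper names set_nonblocking and try_read, and the -2 return value, are my own inventions for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Try to read without sleeping.
 * Returns the number of bytes read, 0 on end of file, -1 on a real error,
 * or -2 if no data is available yet and the caller should retry later. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

An application using this model ends up calling try_read() in a loop, doing other work whenever it returns -2. Every one of those attempts is a full system call, which is where the overhead described above comes from.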
Asynchronous blocking I/O
This model of I/O is much better known. In fact, this is how Ruby implements I/O for its green threads.
In this model, non-blocking file descriptors are created (similar to the previous model) and they are monitored by calling either select() or poll(). The system call to select()/poll() blocks the process (the process is put into a sleeping state in the kernel) and the system call returns when either an error has occurred or when the file descriptors are ready to be read from or written to.
Pros:
- When I/O is available it does not block
- Lots of I/O can be issued to execute in parallel
- Notifications occur when one or more file descriptors are ready (helps to improve I/O throughput)
Cons:
- Calling select(), poll(), or epoll_wait() blocks the calling thread (entire application if using green threads)
- Lots of file descriptors for I/O means lots that have to be checked (can be avoided with epoll)
What is important to note here is that more than one file descriptor can be monitored and when select/poll returns, more than one of the file descriptors may be able to do non-blocking I/O. This is great because it increases the application’s I/O throughput by allowing many I/O operations to occur in parallel.
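A minimal sketch of this in C using poll(), with a timeout so the caller does not sleep forever. The helper name wait_readable and the MAX_FDS limit are assumptions of mine, not anything from the original post. Note the loop at the end: after poll() returns, every descriptor still has to be examined to find the ready ones.

```c
#include <poll.h>

#define MAX_FDS 64   /* arbitrary limit chosen for this sketch */

/* Wait up to timeout_ms for any of the nfds descriptors in fds to become
 * readable. On return ready[i] is 1 if fds[i] has data and 0 otherwise.
 * Returns the number of descriptors with events, 0 on timeout, -1 on error. */
int wait_readable(const int *fds, int nfds, int timeout_ms, int *ready)
{
    struct pollfd pfds[MAX_FDS];
    int i, n;

    if (nfds < 0 || nfds > MAX_FDS)
        return -1;

    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;   /* we only care about readability */
        pfds[i].revents = 0;
    }

    /* The one blocking call: sleep until something is ready or time is up. */
    n = poll(pfds, (nfds_t)nfds, timeout_ms);

    /* Scan the whole set to find out which descriptors are ready. */
    for (i = 0; i < nfds; i++)
        ready[i] = (n > 0) && (pfds[i].revents & POLLIN) != 0;

    return n;
}
```

With three descriptors where the first and third have pending data, wait_readable() should return 2 and set ready to {1, 0, 1}; the caller can then read from both without blocking.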
Of course there are two main drawbacks of using this model:
- select()/poll() block – so if they are used in a system with green threads, all the threads are put to sleep while these system calls are executing.
- You must check the entire set of file descriptors to determine which are ready. This can be bad if you have a lot of file descriptors, because you can potentially spend a lot of time checking file descriptors which aren’t ready (epoll fixes this problem).
This model is important for all you Ruby programmers out there — this is the type of I/O that Ruby uses internally. The calls to select cause Ruby to block while they are being executed.
There are some work-arounds though:
- Timeouts – select() and poll() let you set timeouts so your app doesn’t have to sleep endlessly if there is no I/O to process – it can continue executing other code in the meantime. This is what Ruby does.
- epoll (or kqueue on BSD) – epoll allows you to register a set of file descriptors you are interested in. You then make blocking epoll_wait() calls (they accept timeouts) which will return only the file descriptors which are ready for I/O. This allows you to avoid searching through all your file descriptors every time.
At the very least you should set a timeout so that you can do other work if no I/O is ready. If possible though, use epoll.
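For comparison, here is a rough epoll sketch in C (Linux only). Descriptors are registered once with epoll_ctl(), and each epoll_wait() call hands back only the ones that are ready, so there is no scan over the full set. The helper names watch_readable and wait_ready and the MAX_EVENTS limit are hypothetical choices for this example.

```c
#include <sys/epoll.h>

#define MAX_EVENTS 64   /* arbitrary batch size chosen for this sketch */

/* Register fd with the epoll instance epfd for readability notifications.
 * Returns 0 on success, -1 on error. */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;   /* handed back to us by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms and store the ready descriptors in out (at most max).
 * Returns how many were stored, 0 on timeout, -1 on error. */
int wait_ready(int epfd, int *out, int max, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int i, n;

    if (max > MAX_EVENTS)
        max = MAX_EVENTS;

    n = epoll_wait(epfd, events, max, timeout_ms);
    for (i = 0; i < n; i++)
        out[i] = events[i].data.fd;   /* only ready descriptors appear here */
    return n;
}
```

The epoll instance itself comes from epoll_create1(0). After that, the application registers each descriptor once and then loops on wait_ready(), doing other work whenever it times out.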
Asynchronous non-blocking I/O
This is probably the least widely known model of I/O out there. On Linux there are two implementations: the POSIX AIO functions in glibc (described below), and the kernel’s native AIO interface, which the libaio library wraps.
In this I/O model, you can initiate I/O using aio_read(), aio_write(), and a few others. Before using these functions, you must set up a struct aiocb including fields which indicate how you’d like to get notifications and where the data can be read from or written to. Notifications can be delivered in a couple of different ways:
- Signal – a signal of your choosing (SIGIO, for example) is delivered to the process when the I/O has completed
- Callback – a callback function is called when the I/O has completed
Pros:
- Helps maximize I/O throughput by allowing lots of I/O to be issued in parallel
- Allows application to continue processing while I/O is executing, callback or POSIX signal when done
Cons:
- Wrapper for libaio may not exist for your programming environment
- Network I/O may not be supported
This method of I/O is really awesome because it does not block the calling application and allows multiple I/O operations to be executed in parallel which increases the I/O throughput of the application.
The downsides to using libaio are:
- Wrapper may not exist for your favorite programming language.
- Unclear whether libaio supports network I/O on all systems — may only support disk I/O. When this happens, the library falls back to using normal synchronous blocking I/O.
You should try out this I/O model if your programming environment has support for it and it either has support for network I/O or you don’t need it.
In conclusion, you should use synchronous blocking I/O when you are writing small apps which won’t see much traffic. For more intense applications, you should definitely use one of the two asynchronous models. If possible, avoid synchronous non-blocking I/O at all costs.
Remember that the goal is to increase I/O throughput to scale your application to withstand thousands of requests per second. Doing any sort of blocking I/O in your application can (depending on threading model) cause your entire application to block, increasing latency and slowing the user experience to a crawl.
In this post I’m going to describe some simple uses for one of my favorite command line tools: strace. In the next post, I’ll show how I used strace to fix a bug in ruby 1.8.7’s threading implementation which had significant performance implications.
strace comes standard with all Linux distributions I’ve used and for good reason. strace is a handy command line tool which allows you to see which system calls a particular user application is making, the arguments passed in, and the return values. This is particularly useful if you are trying to debug cryptic error messages from poorly documented software or if you are interested in taking a peek under the hood of an application you are tuning, debugging, or optimizing. As a rule of thumb, if some command-line utility is misbehaving, I always run strace and take a look at where it is choking before googling or bugging someone.
Let’s look at a simple C program and a simple ruby program and compare the strace output to see how both apps interact with the kernel.
int main(int argc, char *argv[])
Both programs look pretty simple, but the strace output is not so simple. I like to use “strace -ttT executable” which gives the time each syscall was made and how long each syscall took. Check the strace man page for more flags – it is very flexible. strace allows you to track the syscall counts, filter for specific syscalls, and more.
At a high level: wow. A lot of stuff happens when you execute code, especially ruby code! Let’s take a look at some stuff in particular.
When the C-version of hello starts up, the binary is passed to execve to be executed, the heap is extended with a call to brk, and various regions of RAM are mmapped for shared libraries. Finally, we see that write is called and passed a file descriptor of 1, our message, and the length. So, this strace gives us an idea of the startup cost for a simple C executable and what syscalls that executable hits.
Let’s compare this to the strace for our ruby test script.
The ruby version of hello incurs a much greater initialization cost, but this makes sense. First, the ruby interpreter needs to be loaded which memory maps various regions of RAM for shared libraries. We can see from the strace which libraries ruby loads; in this case we see things like: libruby, libdl, libc, and libcrypt just to name a few. In the end though, ruby also calls the write system call, passes a file descriptor of 1, the message, and the length.
If a syscall fails, you will see the error code it returned. On your local box try something like “strace /dev/null/nada”; since /dev/null is not a directory, you should see a “Not a directory” error.
That’s about it for my first post – If you don’t recognize some of the listed syscalls, their arguments, or return values consult the almighty man pages for more information. I hope you will work strace into your toolbox if you haven’t already. For those of you who already know about strace, this should serve as a nice warmup before we start digging into ruby internals in my next blog post.