Tuesday, October 1, 2013

How Many Threads Should I Run?

Running operations sequentially (i.e. one after another) often leaves one or more of your CPU cores sitting idle. This can lead to poor performance, or at the very least, not optimum performance. Why not make use of those idle processors and get your work done quicker?

This leads us to spawning multiple threads in our application to run operations concurrently. But how do you know how many threads to run? If you've ever written concurrent applications, you've probably found yourself increasing the thread count, watching your application finish quicker along the way, but then get slower once your thread count exceeds a certain number. So, you reduce the number of threads until you find that "magic number" of threads that seems to yield the best performance. But, how do you find that "magic number" of threads without so much experimentation? Knowing the number of CPU cores on the machine and the type of work that each thread is doing, we can accurately estimate the number of threads we need.

Let's say that we have four CPU cores on our machine. To take full advantage of the CPU, we should run at least four threads. However, we can increase the number of threads by quite a bit, if the threads spend most of their time waiting on a response from outside calls (e.g. web service requests). If the threads are sitting in a blocking/wait state, why not spawn another thread and put another service call in that external service queue? However, if your threads are very busy crunching numbers and chewing up CPU cycles, then you'll actually pay a penalty by spawning more threads. This is because the CPU cores are already busy, so throwing another thread at it, will only make the CPU switch back and forth between jobs (i.e. context switching).

When we talk about type of work that your threads are performing, we're referring to the "blocking coefficient". A job that spends most of it's time blocking/waiting for some external service has a very high blocking coefficient. A job that spends its time crunching numbers and performing calculations has a very low blocking coefficient, because it doesn't block (or blocks very minimally).

When determining the number of threads to run in your application, use the following formula:

Number of Threads = Number of CPU Cores / (1 - Blocking Coefficient Percentage)

For example, if the job that you're planning to multi-thread spends 90% of its time blocking/waiting on an external service, the "Blocking Coefficient Percentage" = .90. This means that on a quad core machine, you would want 40 threads to ensure that you are taking full advantage of your CPU.

There are a few things to keep in mind when using this algorithm. First of all, don't forget that your development machine won't necessarily have the same number of CPU cores as the production server that will run your code. This will require you to calculate the number of available CPU cores at runtime. Also, don't forget that there may be other applications running on that production server, so don't be a CPU hog!

Writing concurrent applications is much more involved than this simple formula, but this should give you a good starting point when trying to figure out how many threads you should spawn.