Performance Engineering: Series-II
How to Improve the Throughput of a Service?
Just to recap, throughput is measured as the number of requests an app can handle per second without violating its latency SLA. Before we dive into how to improve performance, let's look at the flow in a typical web service. Most web services pull data from a data store, process it, and serve it as part of the response. Serialization is involved both in reading the request and in writing the response back. To understand this better, let's look at how a single request flows through the service.
The request lands on a worker thread from the server's thread pool. This thread deserializes the request and makes a database call to load the required data. Until the database server responds, the thread is blocked (in a blocked state and not runnable). Similarly, the client is blocked until the server provides the response. Now let's look at a multi-request flow.
If you observe the above flow carefully, you will notice that "server thread-1" is in a blocked state from time T1 until T3. So when a new request arrives at time T2, it has to be handled by another worker, "server thread-2". This means the kernel has to perform a thread context switch from "server thread-1" to "server thread-2". Context switching between kernel threads requires saving the CPU registers of the thread being switched out and restoring the registers of the thread being scheduled in. This is a costly operation, because CPU cycles go to bookkeeping instead of useful work. Now imagine the cycles lost when a web server with hundreds of threads is handling incoming requests. Is there a mechanism to avoid thread context switches, or at least keep them to a minimum?
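The thread-per-request model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's setup. `time.sleep` stands in for the blocking database call. The names `DB_LATENCY_S`, `blocking_db_call`, `handle_request` and `serve` are hypothetical. With a fixed pool, a request that arrives while every thread is blocked simply waits, so 8 requests on 4 threads take roughly two database round-trips.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DB_LATENCY_S = 0.05  # hypothetical database round-trip time (assumption)


def blocking_db_call(request_id: int) -> str:
    # time.sleep stands in for a blocking socket read: the calling
    # worker thread can do nothing else until it returns.
    time.sleep(DB_LATENCY_S)
    return f"row-for-{request_id}"


def handle_request(request_id: int) -> tuple[int, str]:
    # Thread-per-request handler: deserialize (omitted), query, respond.
    row = blocking_db_call(request_id)
    return threading.get_ident(), row


def serve(num_requests: int, pool_size: int) -> tuple[float, int]:
    """Run requests on a fixed thread pool; return (elapsed seconds, threads used)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle_request, range(num_requests)))
    elapsed = time.perf_counter() - start
    threads_used = len({thread_id for thread_id, _ in results})
    return elapsed, threads_used


if __name__ == "__main__":
    # 8 blocking requests on 4 threads: the second batch queues behind
    # the first, so this takes roughly 2 x DB_LATENCY_S.
    elapsed, threads = serve(num_requests=8, pool_size=4)
    print(f"{elapsed:.3f}s using {threads} threads")
```

Raising `pool_size` hides the wait, but only by adding more threads for the kernel to schedule. That is exactly the context-switch cost described above.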
Enter the world of Reactive Programming!!
Reactive programming is non-blocking and can be achieved in any of the following ways:
- Dataflow Programming - Shared Memory
- Reactive Streams
In the multi-threaded model, each thread blocks on its socket/connection until data arrives from the other side. The kernel then has to wake the thread up and switch to it once the data is ready. In a reactive system, this waiting is handled by an event loop. On Linux, the event loop uses epoll to check whether data is ready on any of the file descriptors it watches. That way, a dedicated thread takes care of the I/O operations while the remaining threads perform the actual work. Instead of a large pool of threads, this model uses a queue of tasks/events that are ready for processing. The number of worker threads is kept equal to the number of logical cores. Because these threads execute tasks in a non-blocking fashion, they rarely need to be context-switched out. Let's look at some of the popular implementations in this space:
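The event-loop idea can be sketched with Python's standard `selectors` module. Its `DefaultSelector` picks epoll on Linux (kqueue on macOS/BSD). This is a simplified sketch, not how any of the frameworks below is implemented. It assumes one thread plays both the I/O role and the worker role, uses `socket.socketpair()` in place of real client connections, and treats `handle` and `event_loop` as hypothetical names. The thread waits in exactly one place, `select()`, and then drains a queue of tasks whose input is already available.

```python
import selectors
import socket
from collections import deque


def handle(conn: socket.socket, payload: bytes) -> None:
    # Hypothetical request handler: the "work" is upper-casing the payload.
    # sendall on a non-blocking socket is fine for tiny replies; a real
    # server would register EVENT_WRITE and buffer partial writes.
    conn.sendall(payload.upper())


def event_loop(server_sides: list, expected: int) -> int:
    """Serve up to `expected` requests on one thread; return how many were handled."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on macOS/BSD
    for conn in server_sides:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    tasks = deque()  # queue of tasks whose input has already arrived
    handled = 0
    while handled < expected:
        # The only place the thread waits: until *any* registered socket is readable.
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing arrived within the timeout; stop the sketch
        for key, _mask in events:
            conn = key.fileobj
            payload = conn.recv(1024)  # data is ready, so this does not block
            if payload:
                tasks.append((conn, payload))
            else:
                sel.unregister(conn)  # peer closed the connection
        # Drain the queue; no task waits on I/O while it runs.
        while tasks:
            conn, payload = tasks.popleft()
            handle(conn, payload)
            handled += 1
    sel.close()
    return handled


if __name__ == "__main__":
    # socketpair() gives connected (client, server) ends without a real network.
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"request-{i}".encode())
    print(event_loop([server for _client, server in pairs], expected=3))
    for client, server in pairs:
        print(client.recv(1024))
        client.close()
        server.close()
```

In the layout the paragraph describes, the `select()` loop would run on the dedicated I/O thread. It would hand the queued tasks to a pool sized to the logical cores rather than draining them itself.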
- Tokio - in Rust
- Quarkus - uses Netty & Vert.x
- Goroutines in Go & coroutines in Kotlin
- Akka - uses actors
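Python is not one of the ecosystems listed, but its `asyncio` module multiplexes coroutines onto an event loop in a similar spirit. That makes it a compact way to sketch the idea. `asyncio.sleep` stands in for a non-blocking database driver, which is an assumption since real drivers differ. `DB_LATENCY_S`, `non_blocking_db_call`, `handle_request` and `serve` are hypothetical names. While a coroutine awaits the simulated database, it yields the thread back to the loop. As a result, 100 concurrent requests complete on a single thread in roughly one database round-trip.

```python
import asyncio
import threading
import time

DB_LATENCY_S = 0.05  # hypothetical database round-trip time (assumption)


async def non_blocking_db_call(request_id: int) -> str:
    # asyncio.sleep yields control to the event loop instead of blocking
    # the thread, standing in for a non-blocking database driver.
    await asyncio.sleep(DB_LATENCY_S)
    return f"row-for-{request_id}"


async def handle_request(request_id: int) -> tuple[int, str]:
    row = await non_blocking_db_call(request_id)
    return threading.get_ident(), row


async def serve(num_requests: int) -> tuple[float, int]:
    """Run all requests concurrently; return (elapsed seconds, threads used)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(num_requests)))
    elapsed = time.perf_counter() - start
    return elapsed, len({thread_id for thread_id, _ in results})


if __name__ == "__main__":
    elapsed, threads = asyncio.run(serve(100))
    print(f"{elapsed:.3f}s using {threads} thread(s)")
```

The `await` is what makes this work. A call that blocked the thread instead, such as `time.sleep` inside the coroutine, would stall every other request on the loop.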
What is the benefit of making a legacy/existing system reactive?
When building a new system, you can adopt a reactive methodology from the start. But what is the incentive to modify an existing app? Since most services are now deployed on the cloud, making an existing system reactive can result in more throughput per host/pod. This translates directly into lower infrastructure cost, potentially cutting it to half or less depending on the use case. If the development budget is limited, you can still apply the reactive methodology to the core service your team owns. Quarkus is one such framework that makes it easier to port your application without a major rewrite.