In modern web development, performance is crucial, especially for API endpoints that handle a high volume of requests. As web applications grow and users demand more features, optimizing the speed of API responses becomes a key objective. One powerful technique that has been proven to improve the performance of API endpoints is parallel programming.
In this article, we’ll explore how parallel programming can significantly improve the performance of an API endpoint, using a real-world example to demonstrate the impact.
By leveraging parallel execution, architects and developers can significantly reduce wait times, enhance system efficiency, and build more scalable and high-performing applications.
Let’s imagine we have an API endpoint that retrieves data from multiple external services, aggregates the results, and then returns the combined data to the client. Each external service may take a different amount of time to respond, and the API has to wait for all services to return their data before it can combine and send the final response.
Consider the following scenario: the API calls each external service one after another, waiting for each response before sending the next request.
In this case, the total response time is the sum of every service’s response time, and users experience delays while they wait for the aggregated results.
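The sequential baseline can be sketched as follows. The service names and delay values are illustrative placeholders (the only figure taken from this article is the slowest service's 700ms), and `CallServiceAsync` is a hypothetical stand-in for a real HTTP call:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class SequentialDemo
{
    // Hypothetical stand-in for a call to one external service.
    public static async Task<string> CallServiceAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs); // simulate network latency
        return $"{name} result";
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();

        // Each await completes before the next request is even sent,
        // so the total time is roughly the SUM of the three delays.
        var r1 = await CallServiceAsync("Service 1", 200);
        var r2 = await CallServiceAsync("Service 2", 400);
        var r3 = await CallServiceAsync("Service 3", 700);

        sw.Stop();
        Console.WriteLine($"Sequential total: ~{sw.ElapsedMilliseconds}ms");
    }
}
```

With these illustrative delays, the sequential total lands around 1,300ms, because each request waits for the previous one to finish.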
This inefficiency not only slows down API response times but also impacts the overall performance and scalability of the system. For developers and architects, this means increased processing delays, higher resource consumption, and a suboptimal user experience. Parallel programming addresses these challenges by enabling faster execution, improved responsiveness, and greater flexibility in handling external dependencies.
Parallel programming is a technique that allows multiple tasks to be executed concurrently, reducing the overall time spent on processing. Rather than waiting for one service to respond before sending a request to the next one, parallel programming enables the API to send all the requests at the same time, and once all responses are received, it can aggregate and return the results.
Here’s how the process works with parallel programming: the API sends requests to all of the services at once, waits for every response to arrive, and then aggregates and returns the combined results.
This approach takes advantage of multi-threading or asynchronous programming to execute operations concurrently, rather than serially. The key benefit here is that the total time to process the requests is determined by the slowest response, rather than the sum of all individual response times.
For example, suppose Service 3 is the slowest, taking 700ms to respond. In a parallel approach, the total time would be about 700ms, since the longest task (Service 3) dictates the total execution time.
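A minimal sketch of the parallel version, using the same hypothetical `CallServiceAsync` helper and illustrative delays as before (only the 700ms figure comes from the example above):

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class ParallelDemo
{
    // Hypothetical stand-in for a call to one external service.
    public static async Task<string> CallServiceAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs); // simulate network latency
        return $"{name} result";
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();

        // Start all three requests before awaiting any of them.
        Task<string> t1 = CallServiceAsync("Service 1", 200);
        Task<string> t2 = CallServiceAsync("Service 2", 400);
        Task<string> t3 = CallServiceAsync("Service 3", 700);

        // Task.WhenAll waits for every task to finish; the elapsed
        // time tracks the slowest call (~700ms), not the sum.
        string[] results = await Task.WhenAll(t1, t2, t3);

        sw.Stop();
        Console.WriteLine($"Parallel total: ~{sw.ElapsedMilliseconds}ms, {results.Length} results");
    }
}
```

The key difference from the sequential version is that the tasks are created first and only awaited afterwards, so all three calls are in flight at the same time.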
We have an eCommerce platform with an API endpoint that aggregates product information for an order that is being processed. As an order grows (up to a limit of 5,000 items), the queries against the Elasticsearch index for product information become larger and more expensive to serve due to the size and complexity of the indexed data. Since the index’s structure could not be changed to reduce that complexity, we opted instead to split the requests to the index into discrete chunks.
By introducing parallel programming, we were able to send all of the chunked requests simultaneously. Using C#'s Task.WhenAll() for asynchronous execution, we allowed the application to await all responses in parallel. Once the data for every chunk had been retrieved, the results were combined and returned in a single response.
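The chunking approach can be sketched like this. `QueryProductsAsync` is a hypothetical stand-in for one Elasticsearch query, and the chunk size and delay are illustrative values, not the production configuration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class ChunkedQueryDemo
{
    // Hypothetical stand-in for one Elasticsearch query over a chunk of product IDs.
    public static async Task<List<string>> QueryProductsAsync(IReadOnlyList<int> productIds)
    {
        await Task.Delay(50); // simulate the round trip to the index
        return productIds.Select(id => $"product-{id}").ToList();
    }

    public static async Task<List<string>> GetOrderProductsAsync(List<int> productIds, int chunkSize)
    {
        // Split the order's product IDs into fixed-size chunks and fire
        // one query per chunk. Chunks are independent of each other,
        // so no coordination between them is needed.
        var tasks = productIds
            .Chunk(chunkSize)                         // .NET 6+ LINQ helper
            .Select(chunk => QueryProductsAsync(chunk))
            .ToList();

        // Await every chunk query in parallel, then flatten the
        // per-chunk results into a single aggregated list.
        var perChunkResults = await Task.WhenAll(tasks);
        return perChunkResults.SelectMany(r => r).ToList();
    }

    public static async Task Main()
    {
        var ids = Enumerable.Range(1, 250).ToList();
        var products = await GetOrderProductsAsync(ids, chunkSize: 100);
        Console.WriteLine(products.Count); // all 250 products, from 3 parallel queries
    }
}
```

Because each chunk query is independent, the wall-clock cost of the whole batch is roughly that of the slowest single chunk rather than the sum of all of them.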
The performance improvement was immediate: Elasticsearch processed smaller, more manageable requests simultaneously, and we aggregated all of the results at the end. This is made possible by the fact that the results in different chunks do not depend on each other in any way. For small orders there was no performance impact, but as orders grew closer to the size limit, the performance gains were in excess of 50%.
By implementing parallel programming, we were able to significantly improve the performance of our API endpoint. The reduction in response time enhanced the user experience, especially during peak traffic periods, and allowed the platform to handle more requests concurrently.
Parallel programming is a powerful tool that can be leveraged to optimize API endpoints, especially when dealing with I/O-bound tasks, like making multiple API calls to external services. With the right implementation, parallelism can unlock significant performance gains, making applications faster, more responsive, and more scalable.