Using async to overwhelm downstream servers

posted: September 5, 2020

tl;dr: Sometimes async code just overloads a downstream server or service, causing more problems than it solves...

At my last company, Uprising Technology, we used almost exclusively Python in the back end of our web app. Some people with a cursory exposure to Python think that it is a synchronous programming language, but there are actually multiple ways to write asynchronous (async) code in Python. Python has async and await keywords, coroutines, multiple async frameworks (including asyncio, which is built into the standard library), and several ways to do multithreading and even multiprocessing. I think it’s fair to say that the majority of Python code in the world right now is written in a synchronous fashion, but there is certainly async Python code out there in the wild.

One of the simplest data mining jobs that we had at Uprising was a Python script I wrote which would hit up a third party service’s API for fresh information about people who were already in the Uprising database. In exchange for the monthly fee we paid the third party, we received a fixed number of API credits that we could use over the course of a month. We used some of these credits in our daily load jobs, to obtain data from the third party when brand new people showed up in the Uprising database. Then at the end of the month we would run the script I wrote, to use up almost all of our remaining monthly API credits by refreshing the data for existing people. Once the calendar advanced to a new month, our credits were reset to our monthly allotment. To receive maximum value from the credits, we had to use them up before month’s end.

The monthly script was a very simple synchronous job. It allowed the person running it to specify which database records to refresh. It went through the records one at a time, calling the third party API using the popular Python requests library, receiving a response, parsing it, and then updating the Uprising database record before moving on to the next one. It had exception handling to deal with all the API errors that we had seen. It was slow but reliable, typically taking six to twelve hours to use up the remaining monthly credits, with the main bottleneck by far being the response time of the third party API. But since it was run on the last day or second-to-last day of the month and no one had to babysit it, it really didn’t matter how long it took to run to completion, just so long as it did.
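The core loop of that job would have looked something like this sketch. The helper names (`fetch`, `update`) and the credit accounting are hypothetical stand-ins, not the actual Uprising code; the point is the shape: one blocking call at a time, credit counted, record updated, next.

```python
def refresh_records(record_ids, fetch, update, max_credits):
    """Refresh records one at a time until the monthly API credits run out.

    fetch(record_id)        -> dict of fresh data (costs one API credit),
                               e.g. a requests.get() against the vendor API
    update(record_id, data) -> persists the fresh data to the database
    """
    used = 0
    for record_id in record_ids:
        if used >= max_credits:
            break  # don't burn credits we don't have
        try:
            data = fetch(record_id)  # blocking HTTP call; the real bottleneck
            used += 1
            update(record_id, data)
        except Exception as exc:
            # the real job handled each known API error specifically
            print(f"skipping {record_id}: {exc}")
    return used
```

Because each request waits for the previous response, the third party API never sees more than one request in flight from this job.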

There came a point where the third party data vendor enhanced their service by adding some additional data that we wanted to load into our database, so the job needed a minor enhancement. We had a relatively new developer who wanted to take on this assignment to gain some experience dealing with databases and third party APIs. This developer was also eager to learn more Python and put it into practice, and decided that the job could be sped up considerably by making it asynchronous and doing multiple API calls concurrently. The job was rewritten to do so by using a Python multithreading framework. I think it’s fair to say that this was the developer’s first multithreaded program ever.
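A rewrite in that spirit, sketched with the standard library's `concurrent.futures` (I don't recall which multithreading framework was actually used, and `fetch`/`update` remain hypothetical helpers), might look like:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def refresh_records_threaded(record_ids, fetch, update, workers=20):
    """Issue up to `workers` API calls concurrently, updating the DB
    as each response comes back."""
    done, failed = 0, 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit everything up front; the pool keeps `workers` requests in flight
        futures = {pool.submit(fetch, rid): rid for rid in record_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                update(rid, future.result())
                done += 1
            except Exception:
                failed += 1  # under concurrent load, the vendor API failed in new ways
    return done, failed
```

With twenty requests in flight instead of one, the pressure on the third party API is twenty times what it was designed to absorb from us.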

A program that does multithreading to achieve asynchronous behavior is considerably more complex than a synchronous, single-threaded program. Python has clean, simple syntax for many operations, but not for multithreading. The source code complexity multiplied. The multithreaded program was indeed faster. The big problem, however, was that it was now unreliable. It might run for thirty minutes or an hour and then crash. By issuing many concurrent requests to the third party API, it was causing the API to fail in unforeseen ways. Debugging multithreaded programs is notoriously difficult. We couldn’t get the job to run to completion reliably. Worse, because it had to be restarted numerous times, we were failing to achieve our goal of using up all our credits by month end.

To make it reliable, we could have tried to fix all the problems in the multithreaded job, or we could have reverted to the original single-threaded job and added the needed functionality to it. The quickest hack, however, was to effectively make it single-threaded by setting the thread pool size to one. This kept all the complexity of the multithreading framework in the source code and the job ran no faster than the original job, but at least it ran to completion.
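In the `concurrent.futures` sketch above, the hack amounts to passing `max_workers=1`: the pool machinery is all still there, but only one request is ever in flight, so the calls go out strictly one at a time in submission order, just like the original job (`fetch` and `update` are again hypothetical helpers):

```python
from concurrent.futures import ThreadPoolExecutor

def refresh_serially(record_ids, fetch, update):
    """All the ceremony of a thread pool, with the behavior of a for loop."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # with a single worker, pool.map executes fetches one by one, in order
        for rid, data in zip(record_ids, pool.map(fetch, record_ids)):
            update(rid, data)
```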

Asynchronous code and multithreading certainly have their uses, and can be the best way to solve certain problems. But sometimes, especially in client systems calling a downstream server or service, they can overwhelm the downstream server or service with requests, effectively just moving the overall system bottleneck downstream. Sometimes the better solution is just to issue one request at a time and wait for the response to come back. This allows the downstream service to throttle the overall pace of the job to a level that it can handle.