
This is a quick look at the performance and usability of Python's asyncio. As a bit of a preface: about a decade ago I was primarily a Python developer and hadn't touched Go yet. While asyncio existed, its use was uncommon; the usual way to achieve parallel tasks was multiprocessing, maybe behind Flask.

I'm now mainly a Go developer, but a job change means I occasionally work with Python code too. asyncio is common now, and with my experience of goroutines, Go's take on virtual threads, I was excited to see how Python felt with coroutines. Even if they aren't virtual threads per se, can coroutines let us cleanly perform concurrent work within the confines of Python's single-threaded nature?

Synchronous Python

Consider this simple task: create a list of items using a pure-Python workload, then send them over a network, which is an IO workload. In this case the list holds strings giving the number of steps each of the integers up to 100,000 takes to reach 1 under the Collatz conjecture, and we send them to an HTTP server. This is derived from a real task which had a more complex Python workload:

from datetime import datetime
import requests


def collatz_steps(n):
    steps = 0
    while n > 1:
        steps += 1
        if n % 2 == 0:
            n //= 2  # floor division keeps n an integer
            continue
        n = n * 3 + 1
    return steps


def format_lines(n):
    for i in range(n):
        yield f'The collatz steps for {i} is {collatz_steps(i)}'


def send_lines(lines):
    with requests.Session() as session:
        for line in lines:
            with session.post('http://localhost', data=line) as result:
                result.raise_for_status()


def main():
    start = datetime.now()

    lines = list(format_lines(100_000))
    lines_end = datetime.now()
    print(f'Generated lines in {lines_end - start}')

    send_lines(lines)
    send_end = datetime.now()
    print(f'Sent lines in {send_end - lines_end}')

    print(f'Total time {send_end - start}')


if __name__ == '__main__':
    main()

# Generated lines in 0:00:00.842061
# Sent lines in 0:00:28.908400
# Total time 0:00:29.750461

We can see it takes about a second to make the strings and then 30 seconds to send them to the server one at a time; clearly this sequential approach is very inefficient.

Asynchronous transmission

Now we get to the state of the code as I found it: the network requests are performed in parallel, with a semaphore limiting the concurrent requests:

from datetime import datetime
import asyncio
import aiohttp


def collatz_steps(n):
    steps = 0
    while n > 1:
        steps += 1
        if n % 2 == 0:
            n //= 2
            continue
        n = n * 3 + 1
    return steps


def format_lines(n):
    for i in range(n):
        yield f'The collatz steps for {i} is {collatz_steps(i)}'


async def send_worker(session, sem, line):
    async with sem, session.post('http://localhost', data=line) as response:
        response.raise_for_status()


async def send_lines(lines):
    # The default ClientSession supports 100 concurrent connections
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(10)
        tasks = set()
        for line in lines:
            tasks.add(asyncio.create_task(send_worker(session, sem, line)))
        await asyncio.gather(*tasks)


async def main():
    start = datetime.now()

    lines = list(format_lines(100_000))
    lines_end = datetime.now()
    print(f'Generated lines in {lines_end - start}')

    await send_lines(lines)
    send_end = datetime.now()
    print(f'Sent lines in {send_end - lines_end}')

    print(f'Total time {send_end - start}')


if __name__ == '__main__':
    asyncio.run(main())

# Generated lines in 0:00:00.838362
# Sent lines in 0:00:07.010335
# Total time 0:00:07.848697

The time to send all the requests is now a mere 7 seconds, great!

Fully asynchronous

In my real case it took 30 minutes to produce the data and 90 minutes to send it all over the network, so I was keen to start sending results as soon as the first was available. I made this naive implementation using an asynchronous generator:

Getting it wrong

from datetime import datetime
import asyncio
import aiohttp


def collatz_steps(n):
    steps = 0
    while n > 1:
        steps += 1
        if n % 2 == 0:
            n //= 2
            continue
        n = n * 3 + 1
    return steps


async def format_lines(n):
    for i in range(n):
        yield f'The collatz steps for {i} is {collatz_steps(i)}'


async def send_worker(session, sem, line):
    async with sem, session.post('http://localhost', data=line) as response:
        response.raise_for_status()


async def send_lines(lines):
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(10)
        tasks = set()
        async for line in lines:
            tasks.add(asyncio.create_task(send_worker(session, sem, line)))
        await asyncio.gather(*tasks)


async def main():
    start = datetime.now()
    await send_lines(format_lines(100_000))
    print(f'Total time: {datetime.now() - start}')

if __name__ == '__main__':
    asyncio.run(main())

# Total time: 0:00:08.583039

It takes longer? The reason is that coroutines don't actually start when you create them. This might seem strange to a Go developer, but a coroutine only has a chance to start when the single-threaded event loop can switch to it, and that can only happen at an await statement. The next await after the coroutines are created is outside the loop, at await asyncio.gather(*tasks), so we still only start sending results after we've generated them all. Note that any await statement can switch to your coroutine, not just one that awaits it.

async def send_lines(lines):
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(10)
        tasks = set()
        async for line in lines:
            tasks.add(asyncio.create_task(send_worker(session, sem, line)))
        # none of the tasks have started yet
        await asyncio.gather(*tasks)
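
This behaviour is easy to see in isolation. A minimal demonstration, with illustrative names, using only the standard library:

```python
import asyncio

order = []


async def worker():
    order.append('worker ran')


async def main():
    task = asyncio.create_task(worker())
    order.append('task created')  # the worker has not run yet
    await asyncio.sleep(0)        # yield to the event loop
    order.append('after first await')
    await task


asyncio.run(main())
print(order)
# ['task created', 'worker ran', 'after first await']
```

The worker only runs during the await asyncio.sleep(0), when the event loop finally gets a chance to schedule it.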

Correct, but slow

After some research, and attempting to scatter my code with await asyncio.sleep(0), I came to the following solution. Periodically we give new coroutines a chance to start, wait for at least one of them to finish (we can't wait for at least none), and then continue making more. This also handles another problem: if a coroutine raises an exception, we won't find out until we gather the tasks. Without this check we would have attempted to send every request before discovering that all of them had failed.

async def send_lines(lines):
    async with aiohttp.ClientSession() as session:
        sem = asyncio.Semaphore(10)
        tasks = set()
        async for line in lines:
            tasks.add(asyncio.create_task(send_worker(session, sem, line)))

            if len(tasks) > 20:
                done, tasks = await asyncio.wait(
                    tasks,
                    return_when=asyncio.FIRST_COMPLETED
                )
                for task in done:
                    if exc := task.exception():
                        raise exc
        await asyncio.gather(*tasks)

# Total time: 0:00:08.187411

Alas, this is still slower than sequential generation with parallel transmission. I suspect the reason for this is a combination of two things:

  • The program saturated a CPU making the requests in parallel, so the best we could hope for is performance parity.
  • We’ve added additional processing by checking the number of outstanding tasks on each iteration of the loop and periodically handling exceptions.

I can imagine a case with sufficiently slow transmission, perhaps a much more remote server, where this would beat the asynchronous transmission example, but the ratio of generating time to transmitting time is going to be so small as to make it not worth the extra complexity.

Conclusions

How about the syntax of asyncio: is it easy to use? Does it produce readable code?

It's surprising to a Go developer that coroutines will not execute when they are created, and that they cannot execute until the currently executing coroutine yields to the event loop at an await statement. This is unlike Go, where your goroutine can be preempted at any point. Perhaps task switching is expensive, or perhaps this is a deliberate design to keep the safety advantages that come with Python being single threaded, at least in the reference interpreter.

Exceptions not surfacing until a coroutine is awaited shouldn't surprise a Go developer; we have the same situation with functions returning non-nil errors. A panic may stop the program right away, but Go's panics are not exceptions, and exceptions are how Python reports errors. Still, we have useful tools like channels, or errgroup.Group with context.Context, to fail early without writing extra error-checking code.

I find asyncio pollutes your code a lot: for a call at the top of your stack to run in a coroutine, every function in the stack down to the call to asyncio.run must be declared with async def and either be explicitly awaited or sit in an async with, async for, or other async scope. It might be better to start an event loop in send_lines rather than calling main from global scope, but in a program with more duties the event loop couldn't be shared.

I think asyncio exposes you to too much of the runtime implementation and makes you do too much of the coroutine handling yourself. I did also try using an eager_task_factory, but there you're configuring the event loop; why am I configuring my runtime in the middle of my business logic?

What asyncio does do well is allow basic IO-like tasks to run in parallel while sticking to the mantra of Python being single threaded. None of the drawbacks of multiprocessing, and no need to mention the global interpreter lock until this sentence. Being single threaded is a big advantage in some ways: you can break invariants without needing a mutex, and you can make system calls that are reserved for programs with one OS thread.

I think asyncio is a reasonable tool when used to yield from tasks that would otherwise block. Use it when you do IO or you sleep. Don't use it for pure functions, don't make pure functions async, and only make generators async if they do IO or sleep. These are all properties of the asynchronous transmission example that I naively tried to improve. If you are trying to make your whole program parallel then you probably shouldn't be using Python. It's not magic, but use it for the right purpose and, as Paul Daniels said: you'll like this… not a lot, but you'll like it.
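
As a final illustration of that guideline, here is a synchronous generator for pure work next to an async one that genuinely awaits; asyncio.sleep stands in for real IO, and the names are illustrative:

```python
import asyncio


def cpu_lines(n):
    # pure Python work: a plain generator is the right tool
    for i in range(n):
        yield f'line {i}'


async def io_lines(n):
    # async is justified only because the body really awaits;
    # the sleep stands in for a network read
    for i in range(n):
        await asyncio.sleep(0)
        yield f'line {i}'


async def main():
    sync_result = list(cpu_lines(3))
    async_result = [line async for line in io_lines(3)]
    return sync_result, async_result


sync_result, async_result = asyncio.run(main())
print(sync_result == async_result)
# True
```

Making cpu_lines async would buy nothing: with no await in its body it can never yield to the event loop, which is exactly the trap in the "Getting it wrong" example.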
