asyncio

Scrapy has partial support for asyncio. After you install the asyncio reactor, you may use asyncio and asyncio-powered libraries in any coroutine.

Installing the asyncio reactor

To enable asyncio support, your TWISTED_REACTOR setting needs to be set to 'twisted.internet.asyncioreactor.AsyncioSelectorReactor', which is the default value.

If you are using AsyncCrawlerRunner or CrawlerRunner, you also need to install the AsyncioSelectorReactor reactor manually. You can do that using install_reactor():

from scrapy.utils.reactor import install_reactor
install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
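As a rough sketch of where that call fits in a script that drives CrawlerRunner, the reactor can be installed first and the crawl started afterwards. The names ASYNCIO_REACTOR and run_spider are illustrative and not part of Scrapy, and running the reactor through twisted.internet.task.react is only one possible choice. The Scrapy and Twisted imports are placed inside the function so that nothing reactor-related is imported before install_reactor() runs.

```python
ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def run_spider(spider_cls):
    """Hypothetical helper: install the asyncio reactor, then crawl once."""
    from scrapy.utils.reactor import install_reactor

    # Install the reactor before anything imports twisted.internet.reactor.
    install_reactor(ASYNCIO_REACTOR)

    from scrapy.crawler import CrawlerRunner
    from twisted.internet.task import react

    def main(reactor):
        runner = CrawlerRunner()
        # crawl() returns a Deferred that fires when the spider finishes.
        return runner.crawl(spider_cls)

    # react() runs the reactor until that Deferred fires, then exits the process.
    react(main)
```

Calling run_spider(MySpider) from the entry point of your script would start the crawl, where MySpider stands for any spider class of yours.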

Handling a pre-installed reactor

twisted.internet.reactor and some other Twisted imports install the default Twisted reactor as a side effect. Once a Twisted reactor is installed, it is not possible to switch to a different reactor at run time.

If you configure the asyncio Twisted reactor and, at run time, Scrapy complains that a different reactor is already installed, the most likely cause is one of these imports somewhere in your code.

You can usually fix the issue by moving those offending module-level Twisted imports to the method or function definitions where they are used. For example, if you have something like:

from twisted.internet import reactor


def my_function():
    reactor.callLater(...)

Switch to something like:

def my_function():
    from twisted.internet import reactor

    reactor.callLater(...)

Alternatively, you can try to manually install the asyncio reactor, with install_reactor(), before those imports happen.
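When it is unclear which import installed a reactor early, a small diagnostic can help narrow it down. The sketch below relies on Twisted registering the installed reactor under the module name twisted.internet.reactor in sys.modules; the helper names are hypothetical and not part of Scrapy or Twisted.

```python
import sys


def reactor_already_installed() -> bool:
    """Return True if some import has already installed a Twisted reactor."""
    # Twisted registers the installed reactor under this module name.
    return "twisted.internet.reactor" in sys.modules


def loaded_twisted_modules() -> list[str]:
    """List the Twisted modules imported so far, to help find the offending import."""
    return sorted(name for name in sys.modules if name.split(".")[0] == "twisted")
```

Calling reactor_already_installed() at several points during start-up, for example right before install_reactor(), shows roughly where the default reactor gets installed.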

Integrating Deferred code and asyncio code

Coroutine functions can await on Deferreds by wrapping them into asyncio.Future objects. Scrapy provides two helpers for this:

scrapy.utils.defer.deferred_to_future(d: Deferred[_T]) → Future[_T]

Return an asyncio.Future object that wraps d.

This function requires AsyncioSelectorReactor to be installed.

When using the asyncio reactor, you cannot await on Deferred objects from Scrapy callables defined as coroutines; you can only await on Future objects. Wrapping Deferred objects into Future objects allows you to wait on them:

import scrapy
from scrapy import Spider
from scrapy.utils.defer import deferred_to_future


class MySpider(Spider):
    ...

    async def parse(self, response):
        additional_request = scrapy.Request('https://example.org/price')
        deferred = self.crawler.engine.download(additional_request)
        additional_response = await deferred_to_future(deferred)

Changed in version 2.14: This function no longer installs an asyncio loop if called before the Twisted asyncio reactor is installed. A RuntimeError is raised in this case.

scrapy.utils.defer.maybe_deferred_to_future(d: Deferred[_T]) → Deferred[_T] | Future[_T]

Return d as an object that can be awaited from a Scrapy callable defined as a coroutine.

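What you can await in Scrapy callables defined as coroutines depends on the value of TWISTED_REACTOR: with the asyncio reactor you can only await on asyncio.Future objects, while with any other reactor you can only await on Deferred objects.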

If you want to write code that uses Deferred objects but works with any reactor, use this function on all Deferred objects:

import scrapy
from scrapy import Spider
from scrapy.utils.defer import maybe_deferred_to_future


class MySpider(Spider):
    ...

    async def parse(self, response):
        additional_request = scrapy.Request('https://example.org/price')
        deferred = self.crawler.engine.download(additional_request)
        additional_response = await maybe_deferred_to_future(deferred)

Tip

If you don’t need to support reactors other than the default AsyncioSelectorReactor, you can use deferred_to_future(); otherwise, you should use maybe_deferred_to_future().

Tip

If you need to use these functions in code that aims to be compatible with older versions of Scrapy that do not provide them, down to Scrapy 2.0 (earlier versions do not support asyncio), you can copy their implementation into your own code.

Coroutines and futures can be wrapped into Deferreds (for example, when a Scrapy API requires passing a Deferred to it) using the following helpers:

scrapy.utils.defer.deferred_from_coro(o: Awaitable[_T]) → Deferred[_T]
scrapy.utils.defer.deferred_from_coro(o: _T2) → _T2

Convert a coroutine or other awaitable object into a Deferred, or return the object as is if it isn’t a coroutine.

scrapy.utils.defer.deferred_f_from_coro_f(coro_f: Callable[_P, Awaitable[_T]]) → Callable[_P, Deferred[_T]]

Convert a coroutine function into a function that returns a Deferred.

The coroutine function will be called at the time when the wrapper is called. Wrapper args will be passed to it. This is useful for callback chains, as callback functions are called with the previous callback result.
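To illustrate the callback-chain case, the sketch below attaches a coroutine function to an existing Deferred. The names normalize_price and add_coroutine_callback are hypothetical, and the helper is meant to be called from code that already runs under a Scrapy-managed reactor. The Scrapy import sits inside the function only to keep the module importable elsewhere.

```python
from typing import Any


async def normalize_price(raw: str) -> float:
    """Hypothetical coroutine callback: receives the previous callback's result."""
    return float(raw.strip().lstrip("$"))


def add_coroutine_callback(d: Any) -> Any:
    """Attach normalize_price to the Deferred d as a regular callback."""
    from scrapy.utils.defer import deferred_f_from_coro_f

    # The wrapper calls normalize_price(result) when the callback runs and
    # returns a Deferred, so later callbacks receive the parsed float.
    d.addCallback(deferred_f_from_coro_f(normalize_price))
    return d
```

Because the wrapper forwards its arguments, the coroutine function keeps the usual callback signature: its first parameter is the result of the previous callback.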

scrapy.utils.defer.ensure_awaitable(o: Awaitable[_T], _warn: str | None = None) → Awaitable[_T]
scrapy.utils.defer.ensure_awaitable(o: _T, _warn: str | None = None) → Awaitable[_T]

Convert any value to an awaitable object.

For a Deferred object, use maybe_deferred_to_future() to wrap it into a suitable object. For an awaitable object of a different type, return it as is. For any other value, return a coroutine that completes with that value.

Added in version 2.14.
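The two non-Deferred branches described above can be pictured with a standard-library-only stand-in. This is a simplified sketch, not Scrapy's implementation: ensure_awaitable_sketch is a hypothetical name, and it deliberately omits the Deferred branch, which in Scrapy goes through maybe_deferred_to_future().

```python
import asyncio
import inspect
from typing import Any, Awaitable


def ensure_awaitable_sketch(value: Any) -> Awaitable[Any]:
    """Simplified stand-in for ensure_awaitable() without Deferred handling."""
    if inspect.isawaitable(value):
        # Already awaitable: hand it back unchanged.
        return value

    async def completed() -> Any:
        # Plain value: wrap it in a coroutine that completes with that value.
        return value

    return completed()


async def demo() -> list:
    async def already_async() -> str:
        return "from coroutine"

    return [
        await ensure_awaitable_sketch(already_async()),
        await ensure_awaitable_sketch(42),
    ]
```

With a helper of this shape, calling code can always write await on the result, whether the underlying callable was synchronous or asynchronous.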

Enforcing asyncio as a requirement

If you are writing a component that requires asyncio to work, use scrapy.utils.asyncio.is_asyncio_available() to enforce it as a requirement. For example:

from scrapy.utils.asyncio import is_asyncio_available


class MyComponent:
    def __init__(self):
        if not is_asyncio_available():
            raise ValueError(
                f"{MyComponent.__qualname__} requires asyncio support. "
                "Make sure you have configured the asyncio reactor in the "
                "TWISTED_REACTOR setting. See the asyncio documentation "
                "of Scrapy for more information."
            )

scrapy.utils.asyncio.is_asyncio_available() → bool

Check if it’s possible to call asyncio code that relies on the asyncio event loop.

Added in version 2.14.

Currently this function is identical to scrapy.utils.reactor.is_asyncio_reactor_installed(): it returns True if the Twisted reactor that is installed is AsyncioSelectorReactor, returns False if a different reactor is installed, and raises a RuntimeError if no reactor is installed. In a future Scrapy version, when Scrapy supports running without a Twisted reactor, this function will also return True when running in that mode, so code that doesn’t directly require a Twisted reactor should use this function instead of is_asyncio_reactor_installed().

When this returns True, an asyncio loop is installed and used by Scrapy. It’s possible to call functions that require it, such as asyncio.sleep(), and await on asyncio.Future objects in Scrapy-related code.

When this returns False, a non-asyncio Twisted reactor is installed. It’s not possible to use asyncio features that require an asyncio event loop or await on asyncio.Future objects in Scrapy-related code, but it’s possible to await on Deferred objects.

scrapy.utils.reactor.is_asyncio_reactor_installed() → bool

Check whether the installed reactor is AsyncioSelectorReactor.

Raise a RuntimeError if no reactor is installed.

In a future Scrapy version, when Scrapy supports running without a Twisted reactor, this function won’t be useful for checking if it’s possible to use asyncio features, so code that doesn’t directly require a Twisted reactor should use scrapy.utils.asyncio.is_asyncio_available() instead of this function.

Changed in version 2.13: In earlier Scrapy versions this function silently installed the default reactor if there was no reactor installed. Now it raises an exception to prevent silent problems in this case.

Windows-specific notes

The Windows implementation of asyncio can use two event loop implementations, ProactorEventLoop (default) and SelectorEventLoop. However, only SelectorEventLoop works with Twisted.

Scrapy changes the event loop class to SelectorEventLoop automatically when you change the TWISTED_REACTOR setting or call install_reactor().
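When debugging a Windows setup, it can help to confirm which kind of loop is actually in use. The following standard-library sketch uses hypothetical helper names; it only inspects loop objects and does not change any asyncio configuration.

```python
import asyncio


def is_selector_loop(loop: asyncio.AbstractEventLoop) -> bool:
    """Return True if loop is a selector-based loop, the kind Twisted can use."""
    return isinstance(loop, asyncio.SelectorEventLoop)


def default_loop_class_name() -> str:
    """Name of the loop class asyncio creates by default on this platform."""
    loop = asyncio.new_event_loop()
    try:
        return type(loop).__name__
    finally:
        loop.close()
```

Inside a coroutine running under Scrapy, is_selector_loop(asyncio.get_running_loop()) would be expected to return True on Windows once the asyncio reactor is installed.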

Note

Other libraries you use may require ProactorEventLoop, e.g. because it supports subprocesses (this is the case with playwright), so you cannot use them together with Scrapy on Windows (but you should be able to use them on WSL or native Linux).

Using custom asyncio loops

You can also use custom asyncio event loops with the asyncio reactor. Set the ASYNCIO_EVENT_LOOP setting to the import path of the desired event loop class to use it instead of the default asyncio event loop.
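As an illustration, a project settings fragment selecting a third-party loop could look like the following. The choice of uvloop is an assumption made for the example: it must be installed separately and is not required by Scrapy.

```python
# settings.py (fragment)

# The asyncio reactor is the default; it is repeated here for clarity.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Import path of the event loop class to use instead of the default one.
# Assumption for this example: the uvloop package is installed.
ASYNCIO_EVENT_LOOP = "uvloop.Loop"
```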

Switching to a non-asyncio reactor

If for some reason your code doesn’t work with the asyncio reactor, you can use a different reactor by setting the TWISTED_REACTOR setting to its import path (e.g. 'twisted.internet.epollreactor.EPollReactor') or to None, which will use the default reactor for your platform. If you are using AsyncCrawlerRunner or AsyncCrawlerProcess you also need to switch to their Deferred-based counterparts: CrawlerRunner or CrawlerProcess respectively.