Drawbacks of installing source distributions (sdist) and how to improve reproducibility

Scope and audience

This page contains relevant information for package consumers (people installing third-party projects that may have been built with Setuptools). It explains why installing from Source Distributions can be less predictable than installing from wheels, and offers tips on how to improve installation-time reproducibility. It does not describe how to build packages with Setuptools, nor is it a statement of policy about what publishers must do [1].

The sdist format was one of the first packaging formats created by the Python community (predating the advent of the wheel). Although still very useful for distributing and sharing Python libraries and applications, sdists are notoriously difficult to work with in circumstances that require high build reproducibility and tolerance to disruptions.

This guide reviews the concept of sdist, highlights its uses and drawbacks, and explores practices to improve build reproducibility when relying on sdists.

What is an sdist?

You can read more about the sdist format and its wheel counterpart in Package Formats, but for the sake of this document an sdist can be considered a simple .tar.gz archive that contains all the files necessary to build a Python project, so that it can later be installed into the end user’s environment.

The most defining characteristic of the sdist format is its platform independence, as the distributions do not include binary executable files. This format is very flexible and, although usually composed of a simple copy of the source code files with some extra metadata files added, it can also include platform-independent code automatically generated during the build phase [2].

When is an sdist useful?

Sometimes it can be tricky to distribute Python packages that contain binary extensions, especially when they are built for platforms that do not define a cross-version stable ABI. Moreover, package indexes like PyPI may restrict their offer to a handful of well-known platforms. Finally, for certain edge cases, the build process may require machine-specific parameters.

In this context, distributing code via sdists becomes a valuable fallback. It allows users on other platforms to access the source code and attempt to recompile the extensions locally.

What are the drawbacks of an sdist?

Despite their usefulness, working with sdists can be challenging. One major difficulty is reconstructing a compatible build environment in which the sdist can be processed into a wheel, especially when it comes to build dependencies.

While PEP 518 introduced a standard for declaring build dependencies distributed as Python packages (e.g. via PyPI), many projects also rely on non-Python dependencies, such as compilers and binary system-level libraries, that cannot be declared in standard metadata. These dependencies can vary significantly across systems, and their installation is often neither automated nor documented, i.e., they are simply assumed to be present.
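
As a concrete illustration, the Python-level build dependencies mentioned above are typically declared in a pyproject.toml file; the backend and version pins shown here are only examples:

```toml
[build-system]
# Python packages needed to turn the sdist into a wheel (pins are examples).
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"
```

Note that there is no equivalent field for a C compiler or a system library needed by an extension module; that is precisely the gap described above.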

Another issue is tooling drift: even if a project was originally buildable from its sdist, changes in its build dependencies (e.g., updates, deprecations and security fixes) can break compatibility over time [3]. This drift is a natural tendency of software systems and is especially pronounced for older projects.

Therefore, mission-critical systems and environments that cannot afford unforeseen or unintended interruptions should not rely on sdists. If your project or product requires high reliability and minimal disruption, you should adapt your workflow to increase resiliency and reproducibility, or disallow sdists altogether.

How to improve reproducibility in your workflow and avoid sdist drawbacks?

The first step is to determine whether your workflow relies, directly or indirectly, on sdists, and to prevent them from being compiled on demand.

Installers like pip or uv have options that help with this. For example, you can set the environment variable PIP_ONLY_BINARY to the value :all: to prevent sdists from being installed (see the corresponding uv alternative). When this setting is enabled, any installation that fails will indicate which packages are not available as wheels, helping you pinpoint installations that rely on sdists.
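
A minimal sketch of this check; the requirements file name in the comments is a placeholder:

```shell
# Refuse sdists for every package; installations then fail fast whenever a
# dependency is only available as a source distribution.
export PIP_ONLY_BINARY=:all:
# Equivalent one-off forms (file names are placeholders):
#   pip install --only-binary :all: -r requirements.txt
#   uv pip install --only-binary :all: -r requirements.txt
```

The failing install's error output then lists the packages for which no wheel could be found.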

Once these packages are identified, the next step is to build them in a controlled environment. You can use pip's PIP_CONSTRAINT / PIP_BUILD_CONSTRAINT environment variables or uv's --build-constraint CLI option to enforce specific versions of Python packages [4].
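
The pinning step might look like the following sketch; the file name and version pins are hypothetical:

```shell
# Write a constraints file that pins the build dependencies.
cat > build-constraints.txt <<'EOF'
setuptools==75.0.0
wheel==0.44.0
EOF
# Point pip at it; the pins then apply when pip resolves dependencies and,
# via the build-constraint variant, when it sets up isolated build envs.
export PIP_CONSTRAINT="$PWD/build-constraints.txt"
export PIP_BUILD_CONSTRAINT="$PWD/build-constraints.txt"
# Building the wheel would then use the pinned tools, e.g.:
#   pip wheel ./some-sdist-project --wheel-dir wheelhouse
```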

To further improve the consistency of OS-level tools and libraries, you can leverage your CI/CD provider’s configuration method, for example GitHub Actions, Bitbucket Pipelines, GitLab CI/CD, Jenkins, CircleCI or Semaphore.

Alternatively, you can use containers (e.g. Docker, nerdctl or Podman), immutable operating system distributions or package managers (e.g. NixOS/Nix) or configuration management tools (e.g. Ansible, Chef or Puppet) to implement Infrastructure as Code (IaC) and ensure build environments are reproducible and version-controlled.

Consider caching the resulting wheels locally via “wheelhouse” directories or hosting them in private package indexes (such as devpi). This allows you to serve pre-built distributions internally, which reduces reliance on external sources, improves build stability, and often results in faster workflows as a welcome side effect.
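
A sketch of the wheelhouse approach; the directory and requirements file names are placeholders:

```shell
# Build (or download) all wheels once into a local directory.
mkdir -p wheelhouse
#   pip wheel -r requirements.txt --wheel-dir wheelhouse
# Later installs use only the local wheels and never reach the network:
#   pip install --no-index --find-links wheelhouse -r requirements.txt
```

With --no-index, an install can only succeed if every requirement is already present in the wheelhouse, which makes missing pre-built distributions visible immediately.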

Finally, it’s important to regularly audit your pinned or cached (build) dependencies for known security vulnerabilities and critical bug fixes, and update them accordingly. This can be done through an out-of-band workflow — such as a scheduled job or a monthly CI/CD pipeline — that does not interfere with your mission-critical or low-tolerance environments. This approach ensures that your systems remain secure and up to date without compromising the stability of your primary workflows.
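
As an illustration, such a scheduled job could scan the pinned versions with a tool like pip-audit, a third-party vulnerability scanner; the file name below is hypothetical:

```shell
# Scan the pinned (build) dependencies for known vulnerabilities,
# e.g. from a monthly scheduled pipeline.
PINS="build-constraints.txt"
#   pip install pip-audit
#   pip-audit -r "$PINS"
```

Findings from this job can then feed back into the constraints file on your own schedule, rather than breaking the primary build workflow.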

Footnotes