n8's blog

switching a big python library from setup.py to pyproject.toml

On December 29th, 2022, zanie opened this issue to suggest we (prefecthq/prefect) migrate from the python package setup of old (e.g. setup.py, setup.cfg) to the now-standard pyproject.toml paradigm for defining project and build configuration.

Beyond being modern best practice, it's just a lot more convenient to have everything - all your pytest, ruff, etc. config - in the same place. And in case you don't trust my opinion, here's another reason: uv's top-level API (sync, lock, etc.) requires a pyproject.toml 🙂.

Here we are, over 2 years later 😅 but we've finally come around to it. prefect is a large codebase that historically spread its configuration across a pile of files (setup.py, setup.cfg, versioneer.py, and several requirements*.txt files) for linting, packaging, tests, and more.

and now, all of that is replaced by a single pyproject.toml file

so, how'd we do it?

Let's walk through each major aspect of our package configuration and how we migrated it, focusing on the benefits we've seen.

package metadata, dependencies, and extras

Previously, we had a setup.py that handled our package metadata and dependencies by reading from multiple requirements files:

from pathlib import Path
import versioneer
from setuptools import find_packages, setup

def read_requirements(file: str) -> list[str]:
    requirements: list[str] = []
    if Path(file).exists():
        requirements = open(file).read().strip().split("\n")
    return requirements

client_requires = read_requirements("requirements-client.txt")
install_requires = read_requirements("requirements.txt")[1:] + client_requires
dev_requires = read_requirements("requirements-dev.txt")
otel_requires = read_requirements("requirements-otel.txt")

setup(
    name="prefect",
    description="Workflow orchestration and management.",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    python_requires=">=3.9",
    install_requires=install_requires,
    extras_require={
        "dev": dev_requires,
        "otel": otel_requires,
        "aws": "prefect-aws>=0.5.0",
        # ... many more extras
    },
)
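A small aside on that `[1:]` slice: requirements.txt presumably starts with a `-r requirements-client.txt` include line, which the naive line-splitting above would otherwise hand to setuptools as a "dependency". A sketch with made-up file contents:

```python
# Hypothetical requirements.txt contents (illustrative only; the real file differs)
text = "-r requirements-client.txt\nhttpx>=0.23\ncachetools>=5.3\n"

# The same naive parsing that read_requirements used
requirements = text.strip().split("\n")

# The first line is an include directive, not a dependency, hence the [1:] slice
print(requirements[0])   # -r requirements-client.txt
print(requirements[1:])  # ['httpx>=0.23', 'cachetools>=5.3']
```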

This approach had several drawbacks: dependency information was scattered across multiple files, the metadata was computed at build time (so tools couldn't inspect it statically), and the parsing code itself was one more thing to maintain.

Now with hatch (a modern Python build tool), all of this lives directly in pyproject.toml:

[project]
name = "prefect"
description = "Workflow orchestration and management."
requires-python = ">=3.9"
dependencies = [
    "aiosqlite>=0.17.0,<1.0.0",
    "alembic>=1.7.5,<2.0.0",
    # ... more dependencies
]

[project.optional-dependencies]
aws = ["prefect-aws"]
# ... many more extras

[dependency-groups]
dev = ["..."] # all of our dev dependencies

Note that dev is in the [dependency-groups] table and not in [project.optional-dependencies] - dependency groups are a recent Python standard (PEP 735) that lets you group dependencies in a way that won't be exposed in published project metadata. This dev group receives a little bit of special treatment from uv (which we'll see later on when we run our tests).

This consolidation creates a single source of truth for our dependencies, making it immediately clear what's required for each domain of the project. It also enables us to use modern tools like uv that can automatically manage our environment based on this configuration.

managing integration packages with uv workspaces

Prefect has a core library and multiple integration packages (AWS, GCP, Kubernetes, etc.) that live in the same repository. With our new setup, we've configured each integration package with its own pyproject.toml, while using uv's source references to link them together during development.

In our main pyproject.toml, we define paths to all integration packages:

[tool.uv.sources]
prefect-aws = { path = "src/integrations/prefect-aws" }
prefect-azure = { path = "src/integrations/prefect-azure" }
prefect-gcp = { path = "src/integrations/prefect-gcp" }
# ... other integrations

And in each integration package's pyproject.toml, we reference the main Prefect package:

# In src/integrations/prefect-aws/pyproject.toml
[tool.uv.sources]
prefect = { path = "../../../" }

This approach allows us to:

  1. Maintain separate package configurations for each integration
  2. Develop and test integrations against the local Prefect codebase
  3. Keep dependencies properly isolated while still having a unified development experience
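Those relative paths are easy to sanity-check: from an integration's directory, ../../../ walks back up to the repository root. A quick sketch using only stdlib path math:

```python
import posixpath

# Where an integration's pyproject.toml lives, relative to the repo root
integration_dir = "src/integrations/prefect-aws"

# The `prefect = { path = "../../../" }` source, resolved from that
# directory, points back at the repo root (".")
core = posixpath.normpath(posixpath.join(integration_dir, "../../.."))
print(core)  # .
```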

Once you have the prefect repo cloned, install dependencies for all integrations by running:

uv sync --all-extras

Because the integrations depend on prefect from PyPI, the old setup was especially painful for local development. To work on an integration against local core changes, you had to install from the integration's root, then go back to the project root and install prefect in editable mode. Beyond the tedium, this often broke editors' understanding of the resulting virtualenv in annoying ways.

version management

Version management used to be handled by a customized versioneer.py with more configuration in setup.cfg:

[versioneer]
VCS = git
style = pep440
versionfile_source = src/prefect/_version.py
versionfile_build = prefect/_version.py
version_regex = ^(\d+\.\d+\.\d+(?:[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*)?)$

When evaluating modern alternatives, we initially looked at hatch-vcs, but found it didn't offer the same level of customization we needed to maintain continuity with our existing versioning scheme. Specifically, we needed to:

  1. Support our existing version format for backward compatibility
  2. Generate a version file with additional metadata (build date, git commit)
  3. Handle development versions with specific formatting

After exploring several options, we settled on versioningit, which provides the flexibility we needed while integrating nicely with hatch:

[tool.hatch.version]
source = "versioningit"

[tool.versioningit.vcs]
match = ["[0-9]*.[0-9]*.[0-9]*", "[0-9]*.[0-9]*.[0-9]*.dev[0-9]*"]
default-tag = "0.0.0"

[tool.versioningit.format]
distance = "{base_version}+{distance}.{vcs}{rev}"
dirty = "{base_version}+{distance}.{vcs}{rev}.dirty"
distance-dirty = "{base_version}+{distance}.{vcs}{rev}.dirty"
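These format rules are plain Python-style templates. Here's a sketch of what the distance rule renders for a commit 14 commits past a hypothetical 3.2.1 tag (the hash and numbers are made up; versioningit substitutes "g" for {vcs} when the VCS is git):

```python
# The "distance" template from [tool.versioningit.format] above
template = "{base_version}+{distance}.{vcs}{rev}"

version = template.format(
    base_version="3.2.1",  # nearest tag (hypothetical)
    distance=14,           # commits since that tag (hypothetical)
    vcs="g",               # versioningit's letter for git
    rev="abc1234",         # abbreviated commit hash (hypothetical)
)
print(version)  # 3.2.1+14.gabc1234
```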

One particularly nice feature of versioningit is the ability to write version information to a file during build time using a custom script. This allowed us to maintain backward compatibility with code that relied on our previous version information format.

The custom script, tools/write_build_info.py:
import textwrap
from datetime import datetime, timezone
from pathlib import Path
from subprocess import CalledProcessError, check_output
from typing import Any

def write_build_info(
    project_dir: str | Path, template_fields: dict[str, Any], params: dict[str, Any]
) -> None:
    """
    Write the build info to the project directory.
    """
    path = Path(project_dir) / params.get("path", "src/prefect/_version.py")

    try:
        git_hash = check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    except CalledProcessError:
        git_hash = "unknown"

    build_dt_str = template_fields.get(
        "build_date", datetime.now(timezone.utc).isoformat()
    )
    version = template_fields.get("version", "unknown")
    dirty = "dirty" in version

    build_info = textwrap.dedent(
        f"""\
            # Generated by versioningit
            __version__ = "{version}"
            __build_date__ = "{build_dt_str}"
            __git_commit__ = "{git_hash}"
            __dirty__ = {dirty}
            """
    )

    with open(path, "w") as f:
        f.write(build_info)

and the corresponding configuration:

[tool.versioningit.write]
method = { module = "write_build_info", value = "write_build_info", module-dir = "tools" }
path = "src/prefect/_build_info.py"

build configuration

Our old build configuration was split between setup.py and setup.cfg. Now it's all handled by hatch:

[build-system]
requires = ["hatchling", "versioningit"]
build-backend = "hatchling.build"

[tool.hatch.build]
artifacts = ["src/prefect/_build_info.py", "src/prefect/server/ui"]

[tool.hatch.build.targets.sdist]
include = ["/src/prefect", "/README.md", "/LICENSE", "/pyproject.toml"]

This consolidation has significantly simplified our build process. We no longer need to maintain separate files for different aspects of the build, and the declarative nature of TOML makes it much easier to understand and modify the configuration.

One piece of nuance here is our inclusion of src/prefect/server/ui in the sdist. This is a directory that is .gitignore'd, but is generated at UI build time. We include it in the sdist so that after installing prefect from PyPI users can run the dashboard with prefect server start.

development tooling configuration

Previously, tool configurations were also scattered across multiple files; mypy, pytest, ruff, and codespell each had their own config file or setup.cfg section.

Now (similar to the dependency declarations) they're all in one place:

[tool.mypy]
plugins = ["pydantic.mypy"]
ignore_missing_imports = true
follow_imports = "skip"
python_version = "3.9"

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-rfEs --mypy-only-local-stub"
norecursedirs = ["*.egg-info"]
python_files = ["test_*.py", "bench_*.py"]
python_functions = ["test_*", "bench_*"]
markers = [
    "service(arg): a service integration test. For example 'docker'",
    "clear_db: marker to clear the database after test completion",
]

[tool.ruff]
...

[tool.codespell]
...

This consolidation makes it much easier to find and modify tooling configuration.
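As a small illustration of the markers registered above, here's how one of them might be applied (the test body and name are hypothetical; applying a marker attaches it to the function's pytestmark list):

```python
import pytest

# `clear_db` is one of the custom markers declared in [tool.pytest.ini_options]
@pytest.mark.clear_db
def test_example_writes_rows():
    ...

# The decorator records the mark on the function itself
print([m.name for m in test_example_writes_rows.pytestmark])  # ['clear_db']
```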

CI/CD improvements

One of the most pleasant improvements we've seen from this migration is in our CI/CD process.

Previously, some or all of our CI pipelines had to:

  1. Install multiple requirements files in the correct order
  2. Implement some relatively manual dependency caching
  3. Use complex multi-step processes for different test scenarios

Looking at our GitHub Actions workflows now, we've dramatically simplified dependency installation across all our test jobs. For example, the core of our python-tests.yaml is now just:

jobs:
  run-tests:
    steps:
      - name: Set up uv and Python ${{ matrix.python-version }}
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
          python-version: ${{ matrix.python-version }}
          cache-dependency-glob: "pyproject.toml"

      - name: Run tests
        run: |
          uv run pytest ${{ matrix.test-type.modules }} \
          --numprocesses auto \
          --maxprocesses 6 \
          --dist worksteal \
          --disable-docker-image-builds \
          --exclude-service kubernetes \
          --exclude-service docker \
          --durations 26

All by itself, uv run will inspect the project dependencies, install the dev group by default (pass --no-dev to skip it), and then run pytest according to our flags and the pytest config in pyproject.toml.

We use slight variations of uv run and uv sync for scenarios with different requirements.

Suffice it to say that uv makes everything easier, but perhaps most significantly our workflow files are now much cleaner, which makes them easier to read and maintain.

Compare the before:

- name: Install dependencies
  run: |
    uv pip install ".[dev]"
    uv pip install -r requirements-otel.txt
    uv pip install -r requirements-markdown-tests.txt

To the after:

- name: Install dependencies
  run: uv sync --group markdown-docs --extra otel

recap

This migration has delivered several concrete improvements: a single source of truth for metadata, dependencies, and tool configuration; a much smoother local development story for our integration packages; and dramatically simpler CI workflows.

For teams considering a similar migration, we recommend:

  1. Start with the PyPA guide on migrating to pyproject.toml
  2. Choose tools that work well together - we found hatch, versioningit, and uv to be excellent choices for us
  3. Consider how your build and dependency management affects different aspects of your workflow, from development to CI/CD

By embracing modern packaging standards and tools, we've not only simplified our configuration but also improved the development experience for our team and contributors.

Have any questions or believe there’s a mistake in this post? Get a hold of us on GitHub!

Bonus!

This has been a library-focused blog post, but check out this great YouTube video by Hynek Schlawack where he explains his app-focused approach to structuring projects with pyproject.toml and uv.

#open-source #python