switching a big python library from setup.py to pyproject.toml
On December 29th, 2022, zanie opened this issue to suggest we (prefecthq/prefect) migrate from the python package setup of old (e.g. `setup.py`, `setup.cfg`) to the now-standard `pyproject.toml` paradigm for defining project and build configuration.

Beyond being modern best practice, it's just a lot more convenient to have everything - all your `pytest`, `ruff`, etc. config - in the same place, and in case you don't trust my opinion, here's another reason: `uv`'s top-level API (`sync`, `lock`, etc.) requires a `pyproject.toml` 🙂.
Here we are, over 2 years later 😅 but we've finally come around to it. `prefect` is a large codebase that historically required these config files for linting, packaging, tests, etc.:
- `.ruff.toml`
- `requirements.txt`
- `requirements-dev.txt`
- `requirements-client.txt`
- `requirements-otel.txt`
- `requirements-markdown-tests.txt`
- `setup.cfg`
- `setup.py`
- `versioneer.py`
- `MANIFEST.in`

and now, all of that is replaced by a single `pyproject.toml` file.
so, how'd we do it?
Let's walk through each major aspect of our package configuration and how we migrated it, focusing on the benefits we've seen.
package metadata, dependencies, and extras
Previously, we had a `setup.py` that handled our package metadata and dependencies by reading from multiple requirements files:
```python
from pathlib import Path

import versioneer
from setuptools import find_packages, setup


def read_requirements(file: str) -> list[str]:
    requirements: list[str] = []
    if Path(file).exists():
        requirements = open(file).read().strip().split("\n")
    return requirements


client_requires = read_requirements("requirements-client.txt")
install_requires = read_requirements("requirements.txt")[1:] + client_requires
dev_requires = read_requirements("requirements-dev.txt")
otel_requires = read_requirements("requirements-otel.txt")

setup(
    name="prefect",
    description="Workflow orchestration and management.",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    python_requires=">=3.9",
    install_requires=install_requires,
    extras_require={
        "dev": dev_requires,
        "otel": otel_requires,
        "aws": "prefect-aws>=0.5.0",
        # ... many more extras
    },
)
```
This approach had several drawbacks:
- Multiple sources of truth for dependencies (easy to make inconsistent updates)
- Harder to track which dependencies were needed for what purpose (e.g. which extras to install in CI)
Now with `hatch` (a modern Python build tool), all of this lives directly in `pyproject.toml`:
```toml
[project]
name = "prefect"
description = "Workflow orchestration and management."
requires-python = ">=3.9"
dependencies = [
    "aiosqlite>=0.17.0,<1.0.0",
    "alembic>=1.7.5,<2.0.0",
    # ... more dependencies
]

[project.optional-dependencies]
aws = ["prefect-aws"]
# ... many more extras

[dependency-groups]
dev = ["..."]  # all of our dev dependencies
```
Note that `dev` is in the `[dependency-groups]` table and not in `[project.optional-dependencies]` - dependency groups are a modern python standard that allows you to group dependencies in a way that won't be exposed in published project metadata. This `dev` group receives a little bit of special treatment from `uv` (which we'll see later on when we run our tests).
This consolidation creates a single source of truth for our dependencies, making it immediately clear what's required for each domain of the project. It also enables us to use modern tools like `uv` that can automatically manage our environment based on this configuration.
managing integration packages with uv workspaces
Prefect has a core library and multiple integration packages (AWS, GCP, Kubernetes, etc.) that live in the same repository. With our new setup, we've configured each integration package with its own `pyproject.toml`, while using `uv`'s source references to link them together during development.
In our main `pyproject.toml`, we define paths to all integration packages:
```toml
[tool.uv.sources]
prefect-aws = { path = "src/integrations/prefect-aws" }
prefect-azure = { path = "src/integrations/prefect-azure" }
prefect-gcp = { path = "src/integrations/prefect-gcp" }
# ... other integrations
```
And in each integration package's `pyproject.toml`, we reference the main Prefect package:
```toml
# In src/integrations/prefect-aws/pyproject.toml
[tool.uv.sources]
prefect = { path = "../../../" }
```
This approach allows us to:
- Maintain separate package configurations for each integration
- Develop and test integrations against the local Prefect codebase
- Keep dependencies properly isolated while still having a unified development experience
Once you have the `prefect` repo cloned, install dependencies for all integrations by running:

```shell
uv sync --all-extras
```
Because integrations depend on `prefect` from PyPI, the old setup was especially painful when working on integrations locally. When developing, you needed to install from an integration's root, then go back to the project root and install `prefect` as editable to pick up changes from core. Beyond the tedium, this often broke editors' understanding of the resulting `venv` in annoying ways.
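Though we wire things up with explicit path sources, `uv` also supports declaring a workspace outright, where members resolve against the local checkout instead of PyPI. A minimal sketch (the member glob is illustrative, not our exact config):

```toml
[tool.uv.workspace]
members = ["src/integrations/*"]

[tool.uv.sources]
# Workspace members are resolved from the local checkout during development.
prefect-aws = { workspace = true }
```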
version management
Version management used to be handled by a customized `versioneer.py` with more configuration in `setup.cfg`:
```ini
[versioneer]
VCS = git
style = pep440
versionfile_source = src/prefect/_version.py
versionfile_build = prefect/_version.py
version_regex = ^(\d+\.\d+\.\d+(?:[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*)?)$
```
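To make the `version_regex` concrete, here's a quick sketch of what it accepts as a raw Python pattern (the example version strings are made up):

```python
import re

# The version_regex from the versioneer config above, as a raw Python pattern.
VERSION_RE = re.compile(r"^(\d+\.\d+\.\d+(?:[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*)?)$")

assert VERSION_RE.match("3.2.1")      # plain release
assert VERSION_RE.match("3.2.1rc2")   # pre-release suffix is allowed
assert not VERSION_RE.match("v3.2.1")  # leading "v" is rejected
assert not VERSION_RE.match("3.2")     # all three components are required
```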
When evaluating modern alternatives, we initially looked at `hatch-vcs`, but found it didn't offer the same level of customization we needed to maintain continuity with our existing versioning scheme. Specifically, we needed to:
- Support our existing version format for backward compatibility
- Generate a version file with additional metadata (build date, git commit)
- Handle development versions with specific formatting
After exploring several options, we settled on `versioningit`, which provides the flexibility we needed while integrating nicely with `hatch`:
```toml
[tool.hatch.version]
source = "versioningit"

[tool.versioningit.vcs]
match = ["[0-9]*.[0-9]*.[0-9]*", "[0-9]*.[0-9]*.[0-9]*.dev[0-9]*"]
default-tag = "0.0.0"

[tool.versioningit.format]
distance = "{base_version}+{distance}.{vcs}{rev}"
dirty = "{base_version}+{distance}.{vcs}{rev}.dirty"
distance-dirty = "{base_version}+{distance}.{vcs}{rev}.dirty"
```
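As a rough illustration of what those format strings produce, here's the `distance` template rendered with plain `str.format` standing in for versioningit's template engine (the field values are made up; versioningit computes them from git):

```python
# The "distance" format string from the config above.
DISTANCE_FMT = "{base_version}+{distance}.{vcs}{rev}"

# Hypothetical state: 3 commits past the 3.2.1 tag, at abbreviated git rev abc1234.
version = DISTANCE_FMT.format(base_version="3.2.1", distance=3, vcs="g", rev="abc1234")
print(version)  # → 3.2.1+3.gabc1234
```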
One particularly nice feature of `versioningit` is the ability to write version information to a file at build time using a custom script. This allowed us to maintain backward compatibility with code that relied on our previous version information format.
custom script
```python
from __future__ import annotations

import textwrap
from datetime import datetime, timezone
from pathlib import Path
from subprocess import CalledProcessError, check_output
from typing import Any


def write_build_info(
    project_dir: str | Path, template_fields: dict[str, Any], params: dict[str, Any]
) -> None:
    """
    Write the build info to the project directory.
    """
    path = Path(project_dir) / params.get("path", "src/prefect/_version.py")
    try:
        git_hash = check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    except CalledProcessError:
        git_hash = "unknown"
    build_dt_str = template_fields.get(
        "build_date", datetime.now(timezone.utc).isoformat()
    )
    version = template_fields.get("version", "unknown")
    dirty = "dirty" in version
    build_info = textwrap.dedent(
        f"""\
        # Generated by versioningit
        __version__ = "{version}"
        __build_date__ = "{build_dt_str}"
        __git_commit__ = "{git_hash}"
        __dirty__ = {dirty}
        """
    )
    with open(path, "w") as f:
        f.write(build_info)
```
```toml
[tool.versioningit.write]
method = { module = "write_build_info", value = "write_build_info", module-dir = "tools" }
path = "src/prefect/_build_info.py"
```
build configuration
Our old build configuration was split between `setup.py` and `setup.cfg`. Now it's all handled by `hatch`:
```toml
[build-system]
requires = ["hatchling", "versioningit"]
build-backend = "hatchling.build"

[tool.hatch.build]
artifacts = ["src/prefect/_build_info.py", "src/prefect/server/ui"]

[tool.hatch.build.targets.sdist]
include = ["/src/prefect", "/README.md", "/LICENSE", "/pyproject.toml"]
```
This consolidation has significantly simplified our build process. We no longer need to maintain separate files for different aspects of the build, and the declarative nature of TOML makes it much easier to understand and modify the configuration.
One piece of nuance here is our inclusion of `src/prefect/server/ui` in the sdist. This directory is `.gitignore`'d, but is generated at UI build time. We include it in the sdist so that, after installing `prefect` from PyPI, users can run the dashboard with `prefect server start`.
development tooling configuration
Previously, tool configurations were also scattered across multiple files. In our case, we had:

- `.ruff.toml` for linting configuration
- `setup.cfg` for assorted tooling configuration (`pytest`, `mypy`, etc.)
- `.codespellrc` for spellcheck configuration via `codespell`

Now (similar to the dependency declarations) they're all in one place:
```toml
[tool.mypy]
plugins = ["pydantic.mypy"]
ignore_missing_imports = true
follow_imports = "skip"
python_version = "3.9"

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-rfEs --mypy-only-local-stub"
norecursedirs = ["*.egg-info"]
python_files = ["test_*.py", "bench_*.py"]
python_functions = ["test_*", "bench_*"]
markers = [
    "service(arg): a service integration test. For example 'docker'",
    "clear_db: marker to clear the database after test completion",
]

[tool.ruff]
# ...

[tool.codespell]
# ...
```
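For instance, with the `markers` entries above registered, a test can tag itself with a service requirement, and CI can include or exclude it accordingly (as with the `--exclude-service` flags you'll see in our workflows). A sketch, assuming `pytest` is installed; the test itself is hypothetical:

```python
import pytest

# Hypothetical test tagged with the custom "service" marker registered above.
@pytest.mark.service("docker")
def test_pull_image():
    """Body elided; would exercise a Docker-backed code path."""

# The marker is recorded on the function for pytest's collector to read.
mark = test_pull_image.pytestmark[0]
assert mark.name == "service"
assert mark.args == ("docker",)
```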
This consolidation makes it much easier to find and modify tooling configuration.
CI/CD improvements
One of the most pleasant improvements we've seen from this migration is in our CI/CD process.
Previously, some or all of our CI pipelines had to:
- Install multiple requirements files in the correct order
- Implement some relatively manual dependency caching
- Use complex multi-step processes for different test scenarios
Looking at our GitHub Actions workflows now, we've dramatically simplified dependency installation across all our test jobs. For example, the core of our `python-tests.yaml` is now just:
```yaml
jobs:
  run-tests:
    steps:
      - name: Set up uv and Python ${{ matrix.python-version }}
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
          python-version: ${{ matrix.python-version }}
          cache-dependency-glob: "pyproject.toml"
      - name: Run tests
        run: |
          uv run pytest ${{ matrix.test-type.modules }} \
            --numprocesses auto \
            --maxprocesses 6 \
            --dist worksteal \
            --disable-docker-image-builds \
            --exclude-service kubernetes \
            --exclude-service docker \
            --durations 26
```
All by itself, `uv run` will inspect the project dependencies, install the `dev` group by default (or skip it with `--no-dev` if you want), and then run `pytest` according to our flags and our `pyproject.toml` config for `pytest`.
We use slight variations of `run` and `sync` for scenarios with varying requirements:

- `uv run pytest` to simply run tests as configured by our project (requires no setup! ✨)
- `uv sync --compile-bytecode --no-editable` to install deps for static analysis
- `uv sync --group benchmark --compile-bytecode` to install deps for benchmarks
- `uv sync --group markdown-docs` to install deps for documentation tests
It's sufficient to say that `uv` just makes everything easier, but perhaps most significantly our workflow files are now just much cleaner - which makes things easier to read and maintain.
Compare the before:
```yaml
- name: Install dependencies
  run: |
    uv pip install ".[dev]"
    uv pip install -r requirements-otel.txt
    uv pip install -r requirements-markdown-tests.txt
```
To the after:
```yaml
- name: Install dependencies
  run: uv sync --group markdown-docs --extra otel
```
recap
This migration has delivered several concrete improvements:
- Project configuration consolidated from nearly a dozen files to a single `pyproject.toml` file
- Simplified build configuration with `hatch` and `versioningit`
- Reliable one-step contributor setup with `uv sync`
- Consistent and concise installation of specific dependency groups in CI
- Faster, more efficient workflows with modern tools
- Clean separation between core and integration packages while maintaining a unified development experience
For teams considering a similar migration, we recommend:
- Start with the PyPA guide on migrating to `pyproject.toml`
- Choose tools that work well together - we found `hatch`, `versioningit`, and `uv` to be excellent choices for us
- Consider how your build and dependency management affects different aspects of your workflow, from development to CI/CD
By embracing modern packaging standards and tools, we've not only simplified our configuration but also improved the development experience for our team and contributors.
Have any questions or believe there’s a mistake in this post? Get a hold of us on GitHub!
Bonus!
This has been a library-focused blog post, but check out this great YouTube video by Hynek Schlawack where he explains his app-focused approach to structuring projects with `pyproject.toml` and `uv`.