how to use pydantic settings

10 Oct, 2024

note: i use os.getenv often, and my point is not that it's bad, it's just ∃ tools..

all hyperbole is for emphatic effect

...

each time I see code like this in the wild, I cry 1 tear. I've cried many tears

import os

CURRENT_USER = os.getenv("USER")
REDIS_HOST = os.getenv("REDIS_HOST")
REDIS_PORT = os.getenv("REDIS_PORT")

if not REDIS_HOST or not REDIS_PORT:
    raise ValueError("REDIS_HOST and REDIS_PORT must be set")

if not (OPENAI_API_KEY := os.getenv("OPENAI_API_KEY")):
    raise ValueError("OPENAI_API_KEY must be set")



print(f"""
Current user: {CURRENT_USER}
Redis host: {REDIS_HOST}
Redis port: {REDIS_PORT}
OpenAI API key: {OPENAI_API_KEY}
""")

yes, it does "work" (as long as your only stakeholders are your eyeballs right now)

Current user: nate
Redis host: localhost
Redis port: 6379
OpenAI API key: sk-yeah-right

and yes, i also love scripting and using raw os is fine to get things done in a pinch, but...

there's too much pain in the world for another being to embark on this defensive side-quest to get correct env var values for the thing they actually care about. we may be authors of python, but we can have validated types to avoid ambiguous values. TFCTMTT

when I look at the above code, it feels like it's going to cause downstream clutter because I'm deferring silly questions like "am I sure REDIS_PORT is a valid integer?" that I can definitely rely on myself to answer concisely in application code, right?

uv pip install pydantic-settings

details if you want to follow along

# you _can_ use pip, but I am impatient
uv pip install pydantic-settings
export REDIS_HOST="localhost"
export REDIS_PORT="6379"
export OPENAI_API_KEY="sk-yeah-right"

The same code, but with pydantic_settings (scoops env vars by field name or alias):

from pydantic import Field, SecretStr
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    current_user: str = Field(alias="user")
    redis_host: str
    redis_port: int = Field(ge=0)
    openai_api_key: SecretStr

print(f"""
Current user: {(settings := Settings()).current_user}
Redis host: {settings.redis_host}
Redis port: {settings.redis_port}
OpenAI API key: {(k := settings.openai_api_key)} value: {k.get_secret_value()}
""")

can produce the same output, but now you have more confidence in values being correct

Current user: nate
Redis host: localhost
Redis port: 6379
OpenAI API key: ********** value: sk-yeah-right

because if they weren't at least the right type... we throw ValidationError immediately

» export REDIS_PORT=trustmebro

» python posts/auxiliary/python/pydantic_settings_example.py
Traceback (most recent call last):
  File "/Users/nate/github.com/zzstoatzz/alternatebuild.dev/posts/auxiliary/python/pydantic_settings_example.py", line 35, in <module>
    print(Settings().to_env_vars())  # type: ignore
          ^^^^^^^^^^
...
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
redis_port
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='trustmebro', input_type=str]

identifying a few things we've implicitly gained here:

gained		why is that good?
type safety		immediate failure if env var values can't be cast to expected types
validation		easily define ranges for integers or regex for strings via `Field`
co-location		all config in one place -- easy to trace from usage in app code
extensibility		can write methods on `Settings` to prepare config for our app

...

getting specific on the last point, say we need to dump these settings to env vars for a subprocess we want to start in our app (this leads into a discussion of serialization).

class Settings(BaseSettings):
    ...

    def to_env_vars(self) -> dict[str, str]:
        return {
            k.upper(): str(v)
            for k, v in self.model_dump().items()
            if v is not None
        }

print(Settings().to_env_vars())

{
    'CURRENT_USER': 'nate',
    'REDIS_HOST': 'localhost',
    'REDIS_PORT': '6379',
    'OPENAI_API_KEY': '**********'
}

"Why is the value of OPENAI_API_KEY masked? openai is throwing errors now!" 😡

calm, it's ok. firstly, it's probably good that our secret values are masked by default.

secondly, we can fix this by (once again) talking about annotated types.

...

let's assume that, for us, env vars should always be unmasked - besides, values can't really be recovered once masked and dumped to env vars. (perhaps in a different use case, we'd want to encrypt values while serializing, or write them in a specific format)

to do this, we can use pydantic's PlainSerializer to customize the serialization of our "secret type" values and selectively reveal those values as we see fit based on context:

def maybe_unmask(v: Secret[T], info: SerializationInfo) -> T | Secret[T]:
    if info.context and info.context.get("unmask"):
        return v.get_secret_value()
    return v


class Settings(BaseSettings):
    ...
    openai_api_key: Annotated[SecretStr, PlainSerializer(maybe_unmask)]

    def to_env_vars(self) -> dict[str, str]:
        return {
            k.upper(): str(v)
            for k, v in self.model_dump(context={"unmask": True}).items()
            if v is not None
        }

notice we now pass context={"unmask": True} to model_dump when we dump to env vars. pydantic hands that context to maybe_unmask via SerializationInfo, which is how the function decides whether to reveal the SecretStr.

this is cool because the serialization logic is decoupled, such that if we had SecretBytes or any Secret[T] values, we could similarly annotate the "secret type" with PlainSerializer(maybe_unmask) to contextually reveal values. You can apply this pattern to other fields, like controlling how datetimes are serialized to strings.

the updated example

# /// script
# dependencies = [
#     "pydantic-settings>=2",
# ]
# ///
from typing import Annotated, TypeVar

from pydantic import Field, PlainSerializer, Secret, SecretStr, SerializationInfo
from pydantic_settings import BaseSettings

T = TypeVar("T")


def maybe_unmask(v: Secret[T], info: SerializationInfo) -> T | Secret[T]:
    if info.context and info.context.get("unmask"):
        return v.get_secret_value()
    return v


class Settings(BaseSettings):
    current_user: str = Field(alias="user")
    redis_host: str
    redis_port: int = Field(ge=0)
    openai_api_key: Annotated[SecretStr, PlainSerializer(maybe_unmask)]

    def to_env_vars(self) -> dict[str, str]:
        return {
            k.upper(): str(v)
            for k, v in self.model_dump(context={"unmask": True}).items()
            if v is not None
        }


print(Settings().to_env_vars())  # type: ignore

Q: "what is this # /// script nonsense?"

A: it's cool, it's inline metadata from PEP 723

» uv run posts/auxiliary/python/pydantic_settings_example.py
Reading inline script metadata: posts/auxiliary/python/pydantic_settings_example.py
Installed 6 packages in 16ms
{'CURRENT_USER': 'nate', 'REDIS_HOST': 'localhost', 'REDIS_PORT': '6379', 'OPENAI_API_KEY': 'sk-yeah-right'}
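to close the loop on the subprocess use case, a dict like the one printed above can be layered over the parent environment and handed to `subprocess.run`. this is a sketch with a hard-coded stand-in dict (so it runs without pydantic installed) and a throwaway child command that just echoes one variable:

```python
import os
import subprocess
import sys

# stand-in for Settings().to_env_vars(), hard-coded so the sketch runs on its own
child_env = {"REDIS_HOST": "localhost", "REDIS_PORT": "6379"}

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['REDIS_PORT'])"],
    env={**os.environ, **child_env},  # keep PATH etc., then layer our settings on top
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
```

merging with `os.environ` is a design choice: passing only `child_env` would give the child a nearly empty environment, which is sometimes what you want and sometimes a confusing failure.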

I hope that now you too have become a pydantic-settings evangelist, and that you join me in shedding tears of sorrow for those consuming raw os.getenv values in the wild (like me sometimes).

...

P.S.

Sure, yes, sometimes this level of type safety and validation is overkill - however I've found that after maintaining large library and application codebases for a while:

The cost of adding a standardization layer later rather than sooner can be very high.

The default experience of using pydantic-settings is consistently sane.

The default experience of using raw os depends on individual developers (i.e. insane).

and if you ever let me catch you using load_dotenv() I will stare at you like this

because look, we can just add .env as a source, or really any other source:

echo "THING_1_FROM_DOTENV=hello" >> .env
echo "THING_2_FROM_DOTENV=42" >> /tmp/.env

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=(".env", "/tmp/.env"))

    thing_1_from_dotenv: str
    thing_2_from_dotenv: int

assert Settings().model_dump() == {
    "thing_1_from_dotenv": "hello",
    "thing_2_from_dotenv": 42,
}

#open-source #pydantic #python #settings