strings matter
I recently contributed to MarshalX/atproto, a Python client for the ATProtocol.
The goal: introduce optional Pydantic validation for ATProtocol-specific strings like handles, at-uris, NSIDs, and more. These formats are defined in the spec (this is my current understanding):
Handle
- Format: handle
- Example: alice.bsky.social
- Constraints: 2+ segments, alphanumeric/hyphens, 1-63 chars/segment, max 253 total, last segment no leading digit
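As a rough sketch of how the handle constraints above could turn into a check (the names HANDLE_RE and is_valid_handle are mine, not from the library): one regex for the segment rules plus a length cap. I'm also assuming segments can't start or end with a hyphen, which is my reading of the spec rather than something listed above.

```python
import re

# Hypothetical helper for illustration; not the library's actual validator.
# Each segment: 1-63 chars, alphanumeric or hyphen, no leading/trailing hyphen
# (the hyphen-position rule is an assumption based on my reading of the spec).
# The last segment must start with a letter, so it cannot begin with a digit.
HANDLE_RE = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"[a-zA-Z](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
)


def is_valid_handle(value: str) -> bool:
    # 253 is the overall cap; fullmatch enforces the per-segment rules.
    return len(value) <= 253 and HANDLE_RE.fullmatch(value) is not None
```

With this, alice.bsky.social is accepted, while a bare alice (one segment) or alice.123 (last segment starts with a digit) is rejected.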
AT-URI
- Format: at-uri
- Example: at://alice.bsky.social/app.bsky.feed.post/...
- Constraints: at:// + handle/DID + optional /collection/rkey; strict format, max ~8KB
DateTime
- Format: datetime
- Example: 2024-11-24T06:02:00Z
- Constraints: ISO 8601/RFC 3339 with timezone, no -00:00
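A minimal sketch of the datetime rules, assuming a shape-only check is enough (it does not verify that the date exists on a calendar). The names DATETIME_RE and is_valid_datetime are hypothetical, and the exact regex is my approximation of the RFC 3339 subset, not a copy of the library's validator.

```python
import re

# Hypothetical helper for illustration. Requires an uppercase "T", seconds,
# optional fractional seconds, and an explicit timezone ("Z" or +/-HH:MM).
DATETIME_RE = re.compile(
    r"[0-9]{4}-[01][0-9]-[0-3][0-9]T[0-2][0-9]:[0-6][0-9]:[0-6][0-9]"
    r"(?:\.[0-9]{1,20})?(?:Z|[+-][0-2][0-9]:[0-5][0-9])"
)


def is_valid_datetime(value: str) -> bool:
    if DATETIME_RE.fullmatch(value) is None:
        return False
    # "-00:00" is explicitly disallowed even though it matches the offset shape.
    return not value.endswith("-00:00")
```

So 2024-11-24T06:02:00Z passes, while the same timestamp without a timezone, or with -00:00, does not.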
NSID
- Format: nsid
- Example: app.bsky.feed.post
- Constraints: reversed domain structure, lowercase alphanums/hyphens
TID
- Format: tid
- Example: 3jxtb5w2hkt2m
- Constraints: 13 chars [2-7a-z], bit constraint on first byte
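The TID rule is compact enough to sketch as a single regex. I'm assuming the "bit constraint" boils down to the first character being limited to 234567abcdefghij, which is how I read the spec; TID_RE and is_valid_tid are names I made up for this example.

```python
import re

# Hypothetical helper for illustration.
# 13 characters from the alphabet [2-7a-z]; the first character is limited to
# [2-7a-j] (assumption: this is what the first-byte bit constraint amounts to).
TID_RE = re.compile(r"[2-7a-j][2-7a-z]{12}")


def is_valid_tid(value: str) -> bool:
    return TID_RE.fullmatch(value) is not None
```

The example 3jxtb5w2hkt2m passes; a 12-character string, or one starting with z, does not.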
Record Key
- Format: record-key
- Example: 3jxtb5w2hkt2m
- Constraints: 1-512 chars [A-Za-z0-9._:~-], no "." or ".."
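The record key constraints above map almost directly onto a character class plus two special cases. A small sketch, with hypothetical names RECORD_KEY_RE and is_valid_record_key:

```python
import re

# Hypothetical helper for illustration.
# 1-512 characters drawn from A-Z, a-z, 0-9 and the punctuation . _ : ~ -
RECORD_KEY_RE = re.compile(r"[A-Za-z0-9._:~-]{1,512}")


def is_valid_record_key(value: str) -> bool:
    # "." and ".." match the character class but are explicitly forbidden.
    if value in {".", ".."}:
        return False
    return RECORD_KEY_RE.fullmatch(value) is not None
```

Note that a key like a.b is fine; only the exact strings "." and ".." are rejected.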
URI
- Format: uri
- Example: https://example.com/path
- Constraints: RFC 3986, letter scheme, no spaces, max ~8KB
DID:PLC
- Format: did:plc
- Example: did:plc:z72i7hdynmk6r22z27h6tvur
- Constraints: must start with did:plc:, followed by 24 chars of base32
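For did:plc, a sketch of the prefix-plus-24-characters rule. I'm assuming "base32" here means the lowercase alphabet a-z and 2-7; DID_PLC_RE and is_valid_did_plc are hypothetical names.

```python
import re

# Hypothetical helper for illustration.
# Assumption: the identifier uses the lowercase base32 alphabet (a-z, 2-7).
DID_PLC_RE = re.compile(r"did:plc:[a-z2-7]{24}")


def is_valid_did_plc(value: str) -> bool:
    return DID_PLC_RE.fullmatch(value) is not None
```

The example did:plc:z72i7hdynmk6r22z27h6tvur passes; other DID methods such as did:web are out of scope for this check and get rejected.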
Why optional validation? Strict checks slow things down, so MarshalX suggested we make them opt-in. By default, models ignore these strict format checks. If you want them, you provide a context flag to enforce validation.
an illustrative example:
from pydantic import BaseModel, BeforeValidator, ValidationInfo
from typing import Annotated, Mapping

PLS_BE_SERIOUS = "i am being so serious rn"


def maybe_validate_bespoke_format(
    v: str, info: ValidationInfo
) -> str:
    if (
        isinstance(info.context, Mapping)
        and info.context.get(PLS_BE_SERIOUS)
        and "lol" in v.lower()
    ):
        raise ValueError("this is serious business")
    return v


class BskyModel(BaseModel):
    handle: Annotated[str, BeforeValidator(maybe_validate_bespoke_format)]


# Without context: no strict check
BskyModel.model_validate({"handle": "alice.bsky.social"})  # passes

# With context: triggers strict check
BskyModel.model_validate(
    {"handle": "alice.bsky.social"}, context={PLS_BE_SERIOUS: True}
)  # passes

BskyModel.model_validate(
    {"handle": "lol @ whatever"}, context={PLS_BE_SERIOUS: True}
)  # raises ValidationError
the actual validators were a little more involved...
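for a rough idea of what "more involved" could look like, here's a sketch that wires a real-ish handle check into the same context-flag pattern. this is not the code that landed in the library: STRICT_KEY, HANDLE_RE, validate_handle_if_strict and Profile are all names i made up, and the regex is my approximation of the handle rules.

```python
import re
from typing import Annotated

from pydantic import BaseModel, BeforeValidator, ValidationError, ValidationInfo

# Hypothetical context key; the real library may use a different one.
STRICT_KEY = "strict_string_format"

# Approximation of the handle rules: 2+ segments of 1-63 chars each,
# alphanumeric/hyphen, last segment must not start with a digit.
HANDLE_RE = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"[a-zA-Z](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
)


def validate_handle_if_strict(v: object, info: ValidationInfo) -> object:
    # Only enforce the format when the caller opted in via context.
    strict = isinstance(info.context, dict) and info.context.get(STRICT_KEY)
    # A BeforeValidator sees raw input, so guard against non-string values
    # and let pydantic's own str validation report those.
    if strict and isinstance(v, str):
        if len(v) > 253 or HANDLE_RE.fullmatch(v) is None:
            raise ValueError(f"invalid handle: {v!r}")
    return v


class Profile(BaseModel):
    handle: Annotated[str, BeforeValidator(validate_handle_if_strict)]


def accepts(value: str, strict: bool) -> bool:
    """Return True if the model accepts `value` under the given mode."""
    context = {STRICT_KEY: True} if strict else None
    try:
        Profile.model_validate({"handle": value}, context=context)
    except ValidationError:
        return False
    return True
```

without the flag, accepts("not a handle", strict=False) goes through untouched; with strict=True the same input is rejected and alice.bsky.social still passes. the cost of the regex is only paid by callers who ask for it.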
i wrote about this more here, i think i wanna move writing here and refactor that place to just be zen mode :)