n8's blog

english to type-safe schemas

from the very beginning of LLM function calling, Jeremiah, Adam and I have been geeking out on ways of making schemas from hazy natural language.

it has truly never been easier to do so

for example

"a Movie with a title, release year, and a list of actors"

can be translated in a type-safe[1] manner into

class Movie(BaseModel):
  title: str
  release_year: int
  actors: list[str]

... without ever having to write the schema[2].

this model can then be used downstream to generate instances of a Movie, like:

[In]: "that one with the red or blue pill"
[Out]: title='The Matrix' release_year=1999 actors=['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss']

e2e in fewer than 50 lines

# /// script
# dependencies = ["marvin"]
# ///

from typing import Annotated, Any, TypedDict

from pydantic import ConfigDict, Field, create_model
import marvin

class FieldDefinition(TypedDict):
    type: Annotated[
        str,
        Field(
            description="a string that can be eval()'d into a Python type",
            examples=["str", "int", "list[str]", "dict[str, int]"],
        ),
    ]
    description: str
    properties: dict[str, Any]


class CreateModelInput(TypedDict):
    model_name: str
    fields: dict[str, FieldDefinition]
    description: str


create_model_input = marvin.cast(
    "a Movie with a title, release year, and a list of actors",
    target=CreateModelInput,
    instructions="suitable inputs for pydantic.create_model for the described schema",
)

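# build a real pydantic model class from the LLM-produced description
Movie = create_model(
    create_model_input["model_name"],
    __config__=None,
    __doc__=create_model_input["description"],
    __module__=__name__,
    __validators__=None,
    __cls_kwargs__=None,
    # each field is a (type, FieldInfo) tuple
    # NOTE: eval() runs whatever string the LLM returned, so only do this with output you trust
    **{
        k: (eval(v["type"]), Field(description=v["description"]))
        for k, v in create_model_input["fields"].items()
    },
)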

print(marvin.cast("red or blue pill", target=Movie).model_dump_json(indent=2))

there are clearly problems with the above example in practice (cleanly handling references to child models, annotated types, etc.), but tools like datamodel-code-generator do exist for generating models from schemas more robustly
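
one of those practical problems is the eval() call on the LLM-produced type string. here is a minimal sketch of one way around it, assuming you only need builtin types and simple generics: parse the string with the stdlib ast module and look every name up in an allowlist. resolve_type, _resolve_node and ALLOWED_TYPES are hypothetical names for illustration, not part of marvin or pydantic, and this needs Python 3.9+.

```python
import ast

# hypothetical allowlist: only these names may appear in a type string
ALLOWED_TYPES = {
    "str": str,
    "int": int,
    "float": float,
    "bool": bool,
    "list": list,
    "dict": dict,
    "set": set,
    "tuple": tuple,
}


def resolve_type(type_string: str):
    """Turn a string like 'dict[str, int]' into a type without calling eval()."""
    tree = ast.parse(type_string, mode="eval")
    return _resolve_node(tree.body)


def _resolve_node(node: ast.AST):
    # a bare name such as 'str' or 'list'
    if isinstance(node, ast.Name):
        if node.id not in ALLOWED_TYPES:
            raise ValueError(f"type {node.id!r} is not allowed")
        return ALLOWED_TYPES[node.id]
    # a subscript such as 'list[str]' or 'dict[str, int]'
    if isinstance(node, ast.Subscript):
        origin = _resolve_node(node.value)
        params = node.slice  # on Python 3.9+ this is the inner expression itself
        if isinstance(params, ast.Tuple):
            args = tuple(_resolve_node(elt) for elt in params.elts)
        else:
            args = _resolve_node(params)
        return origin[args]
    # calls, attribute access, etc. are rejected outright
    raise ValueError(f"unsupported syntax in type string: {ast.dump(node)}")


print(resolve_type("dict[str, list[int]]"))
```

swapping eval(v["type"]) for resolve_type(v["type"]) would keep the rest of the script the same. it still does not cover child models or Annotated types, so it narrows the problem rather than solving it.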

if you have an OPENAI_API_KEY and uv installed you can simply copy the above code to your clipboard and run the following command to try this:

» pbpaste | uv run -
Installed 42 packages in 81ms
[01/19/25 01:01:17] DEBUG    marvin.engine.end_turn: Agent "Marvin" (28f6ddd2): Marking Task "Cast    end_turn.py:45
                             Task" successful with result {'model_name': 'Movie', 'fields': {'title':
                             {'type': 'str', 'description': 'The title of the movie', 'properties':
                             {}}, 'release_year': {'type': 'int', 'description': 'The release year of
                             the movie', 'properties': {}}, 'actors': {'type': 'list[str]',
                             'description': 'List of actors in the movie', 'properties': {}}},
                             'description': 'A schema for representing a movie with a title, release
                             year, and a list of actors.'}
{
  "additionalProperties": false,
  "description": "A schema for representing a movie with a title, release year, and a list of actors.",
  "properties": {
    "title": {
      "description": "The title of the movie",
      "title": "Title",
      "type": "string"
    },
    "release_year": {
      "description": "The release year of the movie",
      "title": "Release Year",
      "type": "integer"
    },
    "actors": {
      "description": "List of actors in the movie",
      "items": {
        "type": "string"
      },
      "title": "Actors",
      "type": "array"
    }
  },
  "required": [
    "title",
    "release_year",
    "actors"
  ],
  "title": "Movie",
  "type": "object"
}
[01/19/25 01:01:18] DEBUG    marvin.engine.end_turn: Agent "Marvin" (28f6ddd2): Marking Task "Cast    end_turn.py:45
                             Task" successful with result title='The Matrix' release_year=1999
                             actors=['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss']
title='The Matrix' release_year=1999 actors=['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss']

you might be asking, "who cares? why would anyone want to do this?"

well, I'd say "it depends". i just imagine that at some point in the relatively near future, AI will be a better designer of my data model than me (my job is to articulate intent)

[image: rick rubin]

hopefully this demo react/shadcn UI on top of this idea illustrates why it's exciting to me.

  1. with respect to the static type-checker, obviously eval is its own problem

  2. you can get creative to get constrained types, but I'm not familiar with a single "best" way of doing this
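
as a sketch of one "creative" option for footnote 2 (not a claimed best way): assume the properties dict from FieldDefinition carries keyword arguments that pydantic.Field accepts, like ge or min_length, and splat them into Field. the fields dict below is hand-written for illustration in the same shape as the cast result, the bounds are arbitrary, and the types are already resolved so no eval is needed.

```python
from typing import Any

from pydantic import Field, ValidationError, create_model

# hand-written stand-in for the LLM output; `properties` is assumed to hold Field kwargs
fields: dict[str, dict[str, Any]] = {
    "title": {
        "type": str,
        "description": "The title of the movie",
        "properties": {"min_length": 1},
    },
    "release_year": {
        "type": int,
        "description": "The release year of the movie",
        "properties": {"ge": 1888},  # arbitrary lower bound for the example
    },
    "actors": {
        "type": list[str],
        "description": "List of actors in the movie",
        "properties": {},
    },
}

# constraints ride along with the description inside each FieldInfo
Movie = create_model(
    "Movie",
    **{
        name: (spec["type"], Field(description=spec["description"], **spec["properties"]))
        for name, spec in fields.items()
    },
)

try:
    Movie(title="The Matrix", release_year=1699, actors=["Keanu Reeves"])
except ValidationError as exc:
    # the out-of-range year is rejected at validation time
    print(exc)
```

the catch is that the LLM has to produce kwargs that Field actually understands, so in practice you would probably want to validate or allowlist the keys in properties first.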

#llm #open-source #pydantic #python #structured outputs