english to type-safe schemas
from the very beginning of LLM function calling, Jeremiah, Adam and I have been geeking out on ways of making schemas from hazy natural language.
it has truly never been easier to do so. for example,
"a Movie with a title, release year, and a list of actors"
can be translated in a type-safe[1] manner into
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    release_year: int
    actors: list[str]
... without ever having to write the schema yourself[2].
this model can then be used downstream to generate instances of a Movie, like:
[In]: "that one with the red or blue pill"
[Out]: title='The Matrix' release_year=1999 actors=['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss']
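the central trick, turning a description of fields into a class at runtime, doesn't strictly depend on pydantic. a minimal stdlib-only sketch: it swaps pydantic's create_model for the standard library's dataclasses.make_dataclass, and the spec dict is a hypothetical hand-written stand-in for what the LLM produces (with the type strings already resolved to real Python types).

```python
from dataclasses import fields, make_dataclass

# hypothetical spec, shaped like the LLM's CreateModelInput but with real types
# instead of type strings, so no eval() is needed here
spec = {
    "model_name": "Movie",
    "fields": {
        "title": str,
        "release_year": int,
        "actors": list[str],
    },
}

# make_dataclass(cls_name, fields) accepts (name, type) pairs and builds a class at runtime
Movie = make_dataclass(spec["model_name"], list(spec["fields"].items()))

print([f.name for f in fields(Movie)])  # ['title', 'release_year', 'actors']

movie = Movie(
    title="The Matrix",
    release_year=1999,
    actors=["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"],
)
print(movie)
```

unlike a pydantic model, this dataclass does no validation: `Movie(title=1, release_year="1999", actors=None)` is accepted as-is. that runtime checking, plus the JSON schema that goes with it, is what create_model adds.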
e2e in fewer than 50 lines
# /// script
# dependencies = ["marvin"]
# ///
from typing import Annotated, Any, TypedDict
from pydantic import Field, create_model
import marvin

class FieldDefinition(TypedDict):
    type: Annotated[
        str,
        Field(
            description="a string that can be eval()'d into a Python type",
            examples=["str", "int", "list[str]", "dict[str, int]"],
        ),
    ]
    description: str
    properties: dict[str, Any]

class CreateModelInput(TypedDict):
    model_name: str
    fields: dict[str, FieldDefinition]
    description: str

# step 1: the LLM turns plain English into arguments for pydantic.create_model
create_model_input = marvin.cast(
    "a Movie with a title, release year, and a list of actors",
    target=CreateModelInput,
    instructions="suitable inputs for pydantic.create_model for the described schema",
)

# step 2: build the model at runtime (caution: eval() executes whatever string the LLM returned)
Movie = create_model(
    create_model_input["model_name"],
    __config__=None,
    __doc__=create_model_input["description"],
    __module__=__name__,
    __validators__=None,
    __cls_kwargs__=None,
    **{
        k: (eval(v["type"]), Field(description=v["description"]))
        for k, v in create_model_input["fields"].items()
    },
)

# step 3: use the generated model as a structured-output target
print(marvin.cast("red or blue pill", target=Movie).model_dump_json(indent=2))
there are clearly problems with the above example in practice (cleanly handling references to child models, annotated types, etc.), but tools like datamodel-code-generator, which generates pydantic models from JSON Schema, do exist.
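another practical problem: the script calls eval() on whatever type string the model returns, which will happily execute arbitrary code. a minimal sketch of a safer alternative, assuming the model only ever needs builtin scalar types plus list and dict: parse the string with the standard ast module and resolve names against a whitelist. `resolve_type` and `ALLOWED` are hypothetical names for illustration, not part of marvin or pydantic.

```python
import ast

# hypothetical whitelist of names a "type" string may reference; extend as needed
ALLOWED = {"str": str, "int": int, "float": float, "bool": bool, "list": list, "dict": dict}


def resolve_type(expr: str) -> object:
    """Turn strings like 'list[str]' or 'dict[str, int]' into real types without eval()."""

    def build(node: ast.expr) -> object:
        # a bare name such as `str`: look it up in the whitelist
        if isinstance(node, ast.Name) and node.id in ALLOWED:
            return ALLOWED[node.id]
        # a subscript such as `dict[str, int]`: resolve the container, then its arguments
        if isinstance(node, ast.Subscript):
            origin = build(node.value)
            if isinstance(node.slice, ast.Tuple):
                args = tuple(build(elt) for elt in node.slice.elts)
            else:
                args = build(node.slice)
            return origin[args]
        # anything else (calls, attribute access, unknown names) is rejected
        raise ValueError(f"unsupported type expression: {ast.dump(node)}")

    return build(ast.parse(expr, mode="eval").body)


print(resolve_type("list[str]"))       # list[str]
print(resolve_type("dict[str, int]"))  # dict[str, int]
```

in the script, `eval(v["type"])` would become `resolve_type(v["type"])`; a string like `"__import__('os')"` then raises ValueError instead of running.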
if you have an OPENAI_API_KEY set and uv installed, you can copy the above code to your clipboard and run the following command to try it (pbpaste is macOS-specific):
» pbpaste | uv run -
Installed 42 packages in 81ms
[01/19/25 01:01:17] DEBUG marvin.engine.end_turn: Agent "Marvin" (28f6ddd2): Marking Task "Cast Task" successful with result {'model_name': 'Movie', 'fields': {'title': {'type': 'str', 'description': 'The title of the movie', 'properties': {}}, 'release_year': {'type': 'int', 'description': 'The release year of the movie', 'properties': {}}, 'actors': {'type': 'list[str]', 'description': 'List of actors in the movie', 'properties': {}}}, 'description': 'A schema for representing a movie with a title, release year, and a list of actors.'} end_turn.py:45
{
  "additionalProperties": false,
  "description": "A schema for representing a movie with a title, release year, and a list of actors.",
  "properties": {
    "title": {
      "description": "The title of the movie",
      "title": "Title",
      "type": "string"
    },
    "release_year": {
      "description": "The release year of the movie",
      "title": "Release Year",
      "type": "integer"
    },
    "actors": {
      "description": "List of actors in the movie",
      "items": {
        "type": "string"
      },
      "title": "Actors",
      "type": "array"
    }
  },
  "required": [
    "title",
    "release_year",
    "actors"
  ],
  "title": "Movie",
  "type": "object"
}
[01/19/25 01:01:18] DEBUG marvin.engine.end_turn: Agent "Marvin" (28f6ddd2): Marking Task "Cast Task" successful with result title='The Matrix' release_year=1999 actors=['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss'] end_turn.py:45
title='The Matrix' release_year=1999 actors=['Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss']
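the JSON schema in that run is what actually constrains the second cast. to make its rules concrete, here is a hand-rolled checker for only the keywords that appear in it (type, properties, items, required, additionalProperties). this is an illustration under that narrow assumption, not how marvin or pydantic validate; a real project would lean on pydantic itself or a full JSON Schema validator.

```python
import json

# the schema printed above, trimmed to the keywords this checker understands
SCHEMA = json.loads("""
{
  "additionalProperties": false,
  "properties": {
    "title": {"type": "string"},
    "release_year": {"type": "integer"},
    "actors": {"type": "array", "items": {"type": "string"}}
  },
  "required": ["title", "release_year", "actors"],
  "type": "object"
}
""")


def matches(value, schema) -> bool:
    t = schema["type"]
    if t == "string":
        return isinstance(value, str)
    if t == "integer":
        # bool subclasses int in Python, but JSON Schema treats booleans separately
        return isinstance(value, int) and not isinstance(value, bool)
    if t == "array":
        return isinstance(value, list) and all(matches(v, schema["items"]) for v in value)
    if t == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # every required key must be present
        if any(k not in value for k in schema.get("required", [])):
            return False
        # additionalProperties: false forbids keys not listed in properties
        if schema.get("additionalProperties") is False and any(k not in props for k in value):
            return False
        return all(matches(value[k], props[k]) for k in value if k in props)
    raise ValueError(f"unsupported schema type: {t}")


matrix = {
    "title": "The Matrix",
    "release_year": 1999,
    "actors": ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"],
}
print(matches(matrix, SCHEMA))                              # True
print(matches({**matrix, "release_year": "1999"}, SCHEMA))  # False: wrong type
print(matches({**matrix, "rating": 5}, SCHEMA))             # False: extra key
print(matches({"title": "The Matrix"}, SCHEMA))             # False: missing keys
```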
you might be asking, "who cares? why would anyone want to do this?"
well, I'd say "it depends". I just imagine that at some point in the relatively near future, AI will be a better designer of my data model than I am (my job is to articulate intent).
hopefully this demo of a React/shadcn UI built on top of this idea illustrates why it's exciting to me.