Simon Willison: Using DSPy to Evaluate LLM Prompts — Schema Without Column Names Causes Agent Retry Loops
Main idea
Willison used DSPy, a framework for systematically evaluating and optimizing LLM prompts, to test the system prompts for Datasette Agent's SQL feature. Key finding: a schema listing that shows only table names, without their column names, forces the model to guess columns blindly and fall into error-retry loops.
Context
This builds directly on his work on the llm library and Datasette. Willison delegated the research to Claude Code, which tested candidate prompt improvements by running GPT models against a live database and scoring them with custom metrics against auto-generated gold-standard datasets.
Why it matters
A concrete, replicable finding with direct impact on prompt engineering for SQL agents: explicitly including column names in schema listings cuts column-guessing errors and the retry loops they trigger. The methodology (DSPy driving real tools against a live database, scored against gold datasets) is a reusable template for systematically testing agent prompts.
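A custom metric of the kind described can be sketched as an execution-based comparison: run the gold-standard SQL and the agent's SQL against the same database and score 1.0 only if they return the same rows. This is a minimal sketch, assuming SQLite via Python's standard sqlite3 module and order-insensitive comparison; the function result_match and the plants table are hypothetical illustrations, not Willison's actual metric or data.

```python
import sqlite3
from collections import Counter


def result_match(conn: sqlite3.Connection, gold_sql: str, predicted_sql: str) -> float:
    """Hypothetical execution-based metric: 1.0 if both queries return the same rows.

    Rows are compared as a multiset, so row order is ignored
    but duplicate counts still matter.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails (e.g. "no such column" after a guessed name) scores zero.
        return 0.0
    return 1.0 if Counter(gold_rows) == Counter(predicted_rows) else 0.0


# Toy in-memory database standing in for the live Datasette instance (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (id INTEGER PRIMARY KEY, name TEXT, height_cm REAL)")
conn.executemany(
    "INSERT INTO plants (name, height_cm) VALUES (?, ?)",
    [("fern", 40.0), ("oak", 2000.0), ("moss", 2.5)],
)

gold = "SELECT name FROM plants WHERE height_cm > 10"
print(result_match(conn, gold, "SELECT name FROM plants WHERE height_cm > 10 ORDER BY name"))  # 1.0
print(result_match(conn, gold, "SELECT name FROM plants WHERE height > 10"))  # 0.0: guessed column
```

Scoring by executed results rather than by SQL text means two differently written but equivalent queries both pass, while a query that fails on a guessed column name scores zero.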
Details / arguments
- DSPy acted as a testing harness: agents invoked actual Datasette tools against a live database
- Problem: the instruction 'do not call describe_table if you already have the information' led the model to guess column names instead of looking them up, triggering errors and retry loops
- Solution: either include column names in the schema listing or reword the describe_table guidance
- Research delegated to Claude Code, tested via GPT models with auto-generated gold datasets
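A schema listing that includes column names can be built directly from SQLite's own metadata. This is a minimal sketch, assuming a SQLite database and Python's standard sqlite3 module; the function schema_listing, its include_columns flag, and the "table(column TYPE, ...)" output format are hypothetical illustrations, not Datasette Agent's actual prompt format.

```python
import sqlite3


def schema_listing(conn: sqlite3.Connection, include_columns: bool = True) -> str:
    """Hypothetical helper: list every user table, optionally with column names and types."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    lines = []
    for table in tables:
        if not include_columns:
            # Table names only: the style that left the model guessing columns.
            lines.append(table)
            continue
        # Escape embedded double quotes so the identifier is quoted safely.
        quoted = table.replace('"', '""')
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        cols = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plants (id INTEGER PRIMARY KEY, name TEXT, height_cm REAL)")
conn.execute("CREATE TABLE sightings (plant_id INTEGER, seen_on TEXT)")

print(schema_listing(conn, include_columns=False))
# plants
# sightings
print(schema_listing(conn))
# plants(id INTEGER, name TEXT, height_cm REAL)
# sightings(plant_id INTEGER, seen_on TEXT)
```

Keeping the table-names-only branch behind a flag makes it easy to run both prompt variants through the same evaluation and compare their scores.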