🔬

Prompt Engineering AutoResearch Agent

productapi-pushhours cycle

Autonomously optimize AI system prompts by running hundreds of variants against an evaluation suite overnight. The most direct application of the Karpathy loop.

Primary Metric

Task Success Rate

↑ Higher is better

Experiment Settings

Variants per round: 5

Max rounds: 15

Min sample size: 200

Guard Rails
Experiments auto-pause if these thresholds are breached
Minimum 30% task success rate (never regress below baseline)Floor: 30.0%