## Mission
Track, verify, and catalog AI benchmarks across all domains. Maintain the
aaas.blog leaderboard data with verified scores and methodology assessments.
Flag suspicious results and benchmark contamination.
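The verification rule in the mission can be sketched as a cross-check between a leaderboard score and the score reported in the original paper. This is a minimal sketch, not the pipeline's actual logic. `BenchmarkEntry`, `verification_status`, and the 0.5-point `tolerance` are hypothetical, and the sketch assumes scores are percentages. It covers only score mismatches. Detecting benchmark contamination needs separate evidence and is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkEntry:
    """Hypothetical record for one model's score on one benchmark."""
    benchmark: str
    model: str
    leaderboard_score: float
    paper_score: Optional[float]  # None when no original paper reports the score


def verification_status(entry: BenchmarkEntry, tolerance: float = 0.5) -> str:
    """Classify an entry as 'verified', 'unverified', or 'suspicious'.

    Assumption (hypothetical rule): scores are percentages, and a gap larger
    than `tolerance` percentage points between the leaderboard and the paper
    marks the result as suspicious.
    """
    if entry.paper_score is None:
        return "unverified"  # nothing to cross-reference against
    if abs(entry.leaderboard_score - entry.paper_score) > tolerance:
        return "suspicious"  # leaderboard disagrees with the original paper
    return "verified"
```

Returning `"unverified"` separately from `"suspicious"` keeps a missing source distinct from a score that contradicts its source, so the two cases can be reported differently.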
## Weekly Routine (Monday 08:00 UTC)
1. **Scan** — Check major benchmark leaderboards for updates
2. **Verify** — Cross-reference scores with original papers
3. **Evaluate** — Run the autoresearch loop on benchmark coverage
4. **Submit** — Push new/updated benchmark entities to aaas.blog
5. **Report** — Log verification results and coverage gaps
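The weekly routine above amounts to a fixed-order pipeline run on a schedule equivalent to the cron expression `0 8 * * 1`, assuming the host clock is UTC. The following is a minimal Python sketch that assumes each step is supplied as a function. `next_run`, `run_weekly_routine`, and the step hooks are hypothetical names, and nothing here calls a real leaderboard or aaas.blog API.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def next_run(now: datetime) -> datetime:
    """Next Monday 08:00 UTC at or after `now` (cron equivalent: 0 8 * * 1).

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    run = now.replace(hour=8, minute=0, second=0, microsecond=0)
    run += timedelta(days=(0 - run.weekday()) % 7)  # Monday has weekday() == 0
    if run < now:  # this Monday's 08:00 slot has already passed
        run += timedelta(days=7)
    return run


def run_weekly_routine(
    scan: Callable[[], list],
    verify: Callable[[dict], dict],
    evaluate: Callable[[list], list],
    submit: Callable[[list], int],
) -> dict:
    """Run steps 1-4 in order and return the step-5 report.

    All four callables are hypothetical hooks supplied by the caller.
    """
    updates = scan()                        # 1. Scan: new or changed leaderboard entries
    checked = [verify(u) for u in updates]  # 2. Verify: each result carries a "status" key
    gaps = evaluate(checked)                # 3. Evaluate: coverage gaps from the loop
    # Assumption: only verified entries are pushed; the rest are reported for review.
    verified = [c for c in checked if c["status"] == "verified"]
    submitted = submit(verified)            # 4. Submit: returns the number of entities pushed
    return {                                # 5. Report
        "scanned": len(updates),
        "submitted": submitted,
        "flagged": [c for c in checked if c["status"] != "verified"],
        "coverage_gaps": gaps,
    }
```

Passing the steps in as functions lets the ordering and the report shape be tested offline with stubs, for example `run_weekly_routine(scan=lambda: [], verify=dict, evaluate=lambda c: [], submit=len)`. Real scanners or an aaas.blog client would be plugged in as those hooks.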