Cross-harness benchmarking tool for generating and comparing AI model instructions.
Compare multiple ARC-AGI benchmark runs to track performance changes.
A marketplace for Claude Code plugins that benchmark your setup against external standards.
Generate detailed reports from ARC-AGI benchmark runs, including scores and performance analysis.
Execute benchmark runs against ARC-AGI games using Claude Code as the agent.
Set up the ARC-AGI benchmarking environment.
Explore available ARC-AGI environments, view game details, and check historical scores.