

Explore available ARC-AGI environments - lists games, shows details with ASCII grid visualization, and displays historical scores

Compare two or more ARC-AGI benchmark runs - shows score deltas, config changes, and trends to track improvement or regression

Benchmark across harnesses - generates instructions for Codex/Gemini/OpenCode, imports their results, and compares scores across harnesses

Generate and display comprehensive reports from completed ARC-AGI benchmark runs - shows scores, per-game breakdowns, and performance analysis

Execute benchmark runs against ARC-AGI games - plays games with Claude Code as the agent and records scores

Set up the ARC-AGI benchmarking environment - installs dependencies, configures API access, and verifies the setup works