Read-only, isolated agent that evaluates skill/agent execution quality
Compare skill scores against ideal benchmarks
View and manage Verdict auto-judge configuration
View score history and trends for skills
Evaluates the execution quality of any skill or agent using 7-dimension scoring with configurable rubrics