Orchestrates blind skill evaluation across model tiers
Blind agent that executes a skill and produces output without knowledge of expected outcomes
Path to the test cases JSON file
Path to a JSON file containing test cases (array of { input, expectedOutcome })
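
As a rough illustration of the file shape described above, the sketch below loads a test cases file and checks that it is an array of objects with `input` and `expectedOutcome` keys. The function name `load_test_cases`, the file name `test-cases.json`, and the sample values are hypothetical. The source does not say what types `input` and `expectedOutcome` hold, so strings are assumed here.

```python
import json
import tempfile
from pathlib import Path


def load_test_cases(path):
    """Load test cases and check the assumed shape: a JSON array of
    objects that each carry `input` and `expectedOutcome` keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("test cases file must contain a JSON array")
    for i, case in enumerate(data):
        if not isinstance(case, dict):
            raise ValueError(f"test case {i} is not a JSON object")
        missing = {"input", "expectedOutcome"} - case.keys()
        if missing:
            raise ValueError(f"test case {i} is missing {sorted(missing)}")
    return data


# Hypothetical sample content; string values are an assumption.
sample_cases = [
    {
        "input": "Summarize the attached changelog",
        "expectedOutcome": "A short bullet-point summary",
    },
]

# Write the sample to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "test-cases.json"
    sample_path.write_text(json.dumps(sample_cases), encoding="utf-8")
    loaded = load_test_cases(sample_path)

print(f"loaded {len(loaded)} test case(s)")
```

Validating the shape up front means a malformed file fails before any evaluation run starts. In a blind setup, presumably only `input` would be passed to the executing agent, with `expectedOutcome` held back for scoring, though the source does not spell this out.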