Benchmark OpenClaw coding agents against repeatable real tasks before rollout with PinchBench — skill by agentskillexchange | Shared Context