AgentScope is a framework for building LLM-based agents and multi-agent applications.
Create multimedia content using the MiniMax AI platform.
Automatically evaluate and compare multiple AI models or agents without pre-existing test data.
Verify BibTeX files for accuracy by cross-checking references against academic databases.
Benchmark how accurately LLMs recommend references by verifying their citations against multiple databases.
Review academic papers for correctness, quality, and novelty using OpenJudge's pipeline.
Create custom evaluation pipelines for LLM outputs using OpenJudge.
Verify the authenticity of Claude API endpoints using weighted rule-based checks.
OpenJudge is an open-source framework for evaluating AI applications, using quality assessment to drive continuous optimization.
Discover and recommend combinations of agent skills for complex tasks.
Build reinforcement learning reward signals using the OpenJudge framework.