Library of Evaluators to automate quality control
LLM as a Judge
Run mass experiments to compare a large number of prompts across different model configurations
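A minimal sketch of what such a prompt-by-configuration sweep might look like in plain Python; the prompts, model names, and the `complete` stub are illustrative placeholders, not orq.ai's actual API:

```python
from itertools import product

# Hypothetical prompt variants and model configurations to sweep over;
# names and fields are illustrative only.
prompts = [
    "Summarize the ticket in one sentence: {ticket}",
    "Write a short, friendly summary of this ticket: {ticket}",
]
configs = [
    {"model": "model-a", "temperature": 0.0},
    {"model": "model-a", "temperature": 0.7},
    {"model": "model-b", "temperature": 0.0},
]

def complete(prompt: str, config: dict) -> str:
    """Stand-in for whatever LLM client you use; returns a dummy response here."""
    return f"[{config['model']} @ T={config['temperature']}] summary of: {prompt[:40]}..."

# Run every prompt against every configuration and log each result,
# so the variants can be compared afterwards.
results = []
for prompt, config in product(prompts, configs):
    output = complete(prompt.format(ticket="Printer on floor 3 is jammed"), config)
    results.append({"prompt": prompt, **config, "output": output})
```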
Use orq.ai's pre-defined Evaluators in your Playgrounds and Experiments to automatically evaluate the quality and correctness of your Gen AI use cases
Leverage the power of LLMs to automatically classify, evaluate, and judge the quality of outcomes in large experiments.
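In its simplest form, LLM-as-a-judge means sending the output under test to a second model together with a grading rubric and parsing its verdict. The sketch below assumes a generic text-in/text-out client passed in as `call_llm`; the prompt and scoring scale are examples, not orq.ai's built-in judge:

```python
JUDGE_PROMPT = """You are a strict evaluator. Rate the answer below for factual
correctness on a scale of 1 to 5 and reply with only the number.

Question: {question}
Answer: {answer}"""

def judge(question: str, answer: str, call_llm) -> int:
    """Score an answer with a judge model; `call_llm` is any text-in/text-out client."""
    verdict = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return int(verdict.strip())

# Example with a stubbed judge model that always answers "4".
score = judge("In what year did Apollo 11 land on the Moon?", "1969", lambda prompt: "4")
print(score)  # -> 4
```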
Analyze granular logs of your experiments with a full breakdown of every transaction's cost, quality, and performance
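Once exported, such logs are easy to slice by variant. A small illustration with pandas, using made-up records and field names to stand in for whatever columns your export carries:

```python
import pandas as pd

# Illustrative per-transaction log records; the field names are assumptions.
logs = pd.DataFrame([
    {"variant": "prompt_a", "cost_usd": 0.0021, "latency_ms": 640, "judge_score": 4},
    {"variant": "prompt_a", "cost_usd": 0.0019, "latency_ms": 590, "judge_score": 5},
    {"variant": "prompt_b", "cost_usd": 0.0034, "latency_ms": 880, "judge_score": 3},
])

# Cost, performance, and quality broken down per prompt variant.
print(logs.groupby("variant").agg(
    mean_cost=("cost_usd", "mean"),
    p95_latency=("latency_ms", lambda s: s.quantile(0.95)),
    mean_quality=("judge_score", "mean"),
))
```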
Use Function Calling and Tools in large-scale experiments to generate structured outputs and evaluate them with our built-in JSON and JSON Schema evaluators
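Conceptually, a JSON Schema evaluator checks that a tool call's arguments parse as JSON and conform to a declared schema. A minimal sketch using the open-source `jsonschema` package, with an invented "create ticket" schema for illustration:

```python
import json
from jsonschema import ValidationError, validate

# Example schema for a structured "create ticket" tool call; the fields are illustrative.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "priority"],
}

def evaluate_tool_call(raw_arguments: str) -> bool:
    """Return True if the model's tool-call arguments are valid JSON matching the schema."""
    try:
        validate(instance=json.loads(raw_arguments), schema=TICKET_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

print(evaluate_tool_call('{"title": "Printer jam", "priority": "high"}'))  # True
print(evaluate_tool_call('{"title": "Printer jam"}'))                      # False: missing priority
```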
Generative AI Collaboration Platform
Full transparency on quality, performance and cost
Available as a stand-alone module for offline experiments
No-code operations
Collaborate with domain experts and product management
Seamlessly integrated workflow
Export capabilities for analysis and BI