Evaluate with Selene 1
Run evals with the world’s best LLM-as-a-Judge. Define your evaluation criteria and get precise judgments on how your AI apps are performing.
A new standard for AI evaluations
State-of-the-art models
Selene outperforms frontier models on commonly used evaluation benchmarks, making it the most accurate and reliable model for evaluation.
Customize to your use case
Make your evals more fine-grained, format scores as you wish, and fit eval criteria to your use case with few-shot examples in our Alignment Platform.
Accurate scores, actionable critiques
Selene is designed for straightforward integration into existing workflows. Use our API to generate accurate eval scores with actionable critiques.
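
As a rough illustration of what that integration can look like, here is a minimal sketch that sends one response to an eval endpoint over HTTP and reads back a score and critique. The endpoint URL, model identifier, field names, and response shape are illustrative assumptions, not the documented Atla API; consult the official docs for the real schema.

import os
import requests

# Hypothetical endpoint and payload schema -- adjust to the documented API.
API_URL = "https://api.atla-ai.example/v1/eval"

payload = {
    "model": "selene-1",  # hypothetical model identifier
    "criteria": "Is the response factually supported by the provided context?",
    "input": "What year was the Eiffel Tower completed?",
    "response": "The Eiffel Tower was completed in 1889.",
    "context": "Construction of the Eiffel Tower finished in March 1889.",
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['ATLA_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()

# Expecting a numeric score plus a written critique (shape assumed).
print(result.get("score"), result.get("critique"))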
Introducing Selene 1: the world’s best LLM-as-a-Judge

Evaluating with Selene
Custom metrics
Use our Alignment Platform to easily align eval prompts to your custom use case.

Pre-built metrics
Use these to get started on common eval use cases, such as detecting hallucinations in RAG applications or comparing answers to a ground truth.
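
For example, a hallucination check on RAG outputs might look like the following sketch, which scores each response against its retrieved context. The metric name, endpoint, and payload fields are assumptions for illustration, not the exact pre-built metric interface.

import os
import requests

API_URL = "https://api.atla-ai.example/v1/eval"  # hypothetical endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['ATLA_API_KEY']}"}

# Two RAG outputs checked against the same retrieved context:
# the first is grounded, the second contradicts it.
examples = [
    {
        "input": "Who wrote 'Dune'?",
        "response": "'Dune' was written by Frank Herbert.",
        "context": "Dune is a 1965 novel by American author Frank Herbert.",
    },
    {
        "input": "Who wrote 'Dune'?",
        "response": "'Dune' was written by Isaac Asimov.",
        "context": "Dune is a 1965 novel by American author Frank Herbert.",
    },
]

for ex in examples:
    payload = {
        "model": "selene-1",        # hypothetical model identifier
        "metric": "hallucination",  # hypothetical pre-built metric name
        **ex,
    }
    resp = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    result = resp.json()
    print(result.get("score"), "-", result.get("critique"))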