Evaluate with Selene

Get precise judgments on your AI app's performance. Run evals with the most accurate LLM Judges on the market.

Selene models

Explore the right size and implementation methods for your evaluation needs.
Optimized for speed
Selene 1 Mini
The best evaluation model of its size (8B). Suitable for running evals at inference time.
Industry-leading accuracy
Selene 1
The best model for evaluation on the market. Capable of accurately judging a wide variety of eval tasks, as well as adapting to custom eval criteria. Suitable for pre-production evals.
Chart: Selene models compared on cost vs. intelligence.

A new standard for AI evaluations

01

State-of-the-art models

Selene outperforms frontier models on commonly-used evaluation benchmarks, making it the most accurate and reliable model for evaluation.

02

Customize to your use case

Make your evals more fine-grained, format your score as you wish, and fit eval criteria to your use case with few-shot examples in our Eval Copilot (beta).

03

Accurate scores, actionable critiques

Designed for straightforward integration into existing workflows. Use our API to generate accurate eval scores with actionable critiques.
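To make the integration concrete, here is a minimal sketch of what a single eval request could look like from Python. The endpoint URL, model identifier, field names, and response shape below are illustrative assumptions, not the documented API schema; check the docs for the real one.

```python
# Illustrative only: the endpoint path, model id, request fields, and response
# shape are assumptions, not Atla's documented API. See the docs for the real schema.
import os
import requests

API_URL = "https://api.atla-ai.com/v1/eval"  # hypothetical endpoint

def evaluate(model_input: str, model_output: str, criteria: str) -> dict:
    """Send one input/output pair to Selene and return its score and critique."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['ATLA_API_KEY']}"},
        json={
            "model": "selene-1",       # assumed model identifier
            "input": model_input,      # the prompt your app received
            "output": model_output,    # the response your app produced
            "criteria": criteria,      # what Selene should judge
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"score": 4, "critique": "..."}

result = evaluate(
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    criteria="Is the answer factually correct and concise?",
)
print(result["score"], result["critique"])
```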

Introducing Selene 1: the world’s best LLM-as-a-Judge

Evaluating with Selene

Custom metrics

Use Eval Copilot (beta) to easily align eval prompts to your custom use case.

Read more in our docs
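As a rough illustration, a custom metric built this way boils down to a name, a score format, and a few labeled examples that show the judge what good and bad look like. The structure below is an assumed sketch, not Eval Copilot's actual output format.

```python
# Illustrative structure only: field names and layout are assumptions, not
# Eval Copilot's real export format.
custom_metric = {
    "name": "support_tone",
    "description": "Judge whether a customer-support reply is empathetic and on-brand.",
    "scoring": {"type": "integer", "min": 1, "max": 5},  # format the score as you wish
    "few_shot_examples": [
        {
            "input": "My order arrived damaged.",
            "output": "That's frustrating, and I'm sorry. A replacement ships today.",
            "score": 5,
            "critique": "Acknowledges the problem, apologizes, and resolves it.",
        },
        {
            "input": "My order arrived damaged.",
            "output": "Please consult the returns policy.",
            "score": 2,
            "critique": "No empathy; pushes the work back onto the customer.",
        },
    ],
}
```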

Pre-built metrics

Use these to get started on common eval use cases, such as detecting hallucinations in RAG applications or comparing answers to a ground truth.

Relevance
Correctness
Helpfulness
Faithfulness
Logical coherence
Conciseness
Read more in our docs
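For example, a faithfulness check on a RAG answer might look like the sketch below. The endpoint, payload fields, and metric identifier are illustrative assumptions; the docs list the real pre-built metric names.

```python
# Illustrative only: the endpoint, payload fields, and the "faithfulness" metric id
# are assumptions; consult the docs for the real pre-built metrics.
import os
import requests

context = "Selene 1 Mini is an 8B-parameter evaluation model."
answer = "Selene 1 Mini has 70B parameters."  # not supported by the context

response = requests.post(
    "https://api.atla-ai.com/v1/eval",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['ATLA_API_KEY']}"},
    json={
        "model": "selene-1",
        "metric": "faithfulness",  # pre-built metric (assumed identifier)
        "input": "How large is Selene 1 Mini?",
        "output": answer,
        "context": context,        # retrieved passages the answer should be grounded in
    },
    timeout=30,
)
response.raise_for_status()
# Expect a low score and a critique flagging the unsupported "70B" claim.
print(response.json())
```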

Boost your GenAI accuracy

Run evals with Selene 1 and Selene 1 Mini
Custom eval metric deployment using Eval Copilot (beta)
Free credits & usage-based pricing
Docs & guides