Evaluate with Selene 1

Run evals with the world’s best LLM-as-a-Judge. Define your evaluation criteria and get precise judgments on how your AI apps are performing.

A new standard for AI evaluations

01

State-of-the-art models

Selene outperforms frontier models on commonly used evaluation benchmarks, making it the most accurate and reliable model for evaluation.

02

Customize to your use case

Make your evals more fine-grained, format your score as you wish, and fit eval criteria to your use case with few-shot examples in our Alignment Platform.

03

Accurate scores, actionable critiques

Designed for straightforward integration into existing workflows. Use our API to generate accurate eval scores with actionable critiques.
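
To give a sense of what such an integration could look like, here is a minimal Python sketch. The endpoint URL, request fields, and response shape are assumptions made for illustration only; refer to the docs for the actual API.

```python
import os
import requests

# Minimal sketch of an eval request. The endpoint, field names, and response
# shape are hypothetical placeholders, not the actual Selene API.
API_URL = "https://api.example.com/v1/eval"  # placeholder URL

payload = {
    "evaluation_criteria": "Is the response faithful to the provided context?",
    "input": "What is the capital of France?",     # prompt sent to your app
    "output": "The capital of France is Paris.",   # your app's response to judge
    "context": "Paris is the capital of France.",  # retrieved context (for RAG evals)
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['SELENE_API_KEY']}"},
    json=payload,
    timeout=30,
)
result = response.json()

# Assumed response shape: a numeric score plus a natural-language critique.
print(result["score"])     # e.g. 0.9
print(result["critique"])  # e.g. "The answer is fully supported by the context."
```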


Introducing Selene 1: the world’s best LLM-as-a-Judge

Evaluating with Selene

Custom metrics

Use our Alignment Platform to easily align eval prompts to your custom use case.

Read more in our docs

Pre-built metrics

Use these to get started on common eval use cases, such as detecting hallucinations in RAG applications or comparing answers to a ground truth (a hypothetical request is sketched after the list below).

Relevance
Correctness
Helpfulness
Faithfulness
Logical coherence
Conciseness
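
For example, a ground-truth comparison using a pre-built Correctness metric might look like the sketch below. The endpoint and field names are illustrative assumptions, not the actual API.

```python
import os
import requests

# Hypothetical request using an assumed pre-built "correctness" metric.
# Endpoint and field names are illustrative only; see the docs for the real API.
response = requests.post(
    "https://api.example.com/v1/eval",  # placeholder URL
    headers={"Authorization": f"Bearer {os.environ['SELENE_API_KEY']}"},
    json={
        "metric": "correctness",                         # assumed metric identifier
        "input": "When was the Eiffel Tower completed?",
        "output": "It was finished in 1889.",            # your app's answer
        "expected_output": "The Eiffel Tower was completed in 1889.",  # ground truth
    },
    timeout=30,
)
print(response.json())  # assumed to return a score and a critique
```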
Read more in our docs

Boost your GenAI accuracy

Run evals with Selene 1
Custom eval metric deployment on the Alignment Platform
Free credits & usage-based pricing
Docs & guides