Selene Mini: SOTA 8B LLM Judge, now available via API

Atla team
April 15, 2025

Today, we’re excited to announce that Selene Mini, our SOTA small language model-as-a-judge (SLMJ), is available via API. Since our open-source release, Selene Mini has been downloaded over 30,000 times on Hugging Face and Ollama, demonstrating strong demand for compact, efficient evaluation models. For users who asked for a hosted option, this release delivers exactly that: Selene Mini is now accessible via API, with no infrastructure to manage.

At 8B parameters, Selene Mini offers an efficient alternative that is 2x faster and 3x cheaper than our flagship model, Selene 1. Users now have the flexibility to choose Selene Mini to run low latency evals, or Selene 1 to run evals with industry-leading accuracy.

✨ April offer

Selene Mini is available to use for free until the end of April. From May, Selene Mini will be available on our plans and charged at $3 per 1K API calls.

Key capabilities

  • Strong Benchmark Performance: Selene Mini outperforms top small models, including GPT-4o mini, on average across 11 evaluation benchmarks. These cover absolute scoring, classification, and pairwise preference tasks.
  • Flexible Evaluation Formats: Supports various scoring approaches, including binary pass/fail and 1-5 Likert scales (see the sketch after this list).
  • Domain Adaptability: Shows good zero-shot performance in specialized fields. Testing on expert-annotated finance (FinanceBench) and medical (CRAFT-MD) datasets yields improvements of 5% and 10% over the base model.
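Switching between formats only changes the evaluation criteria string you pass in. As a minimal sketch of a 1-5 Likert rubric, reusing the same client.evaluation.create call shown in the quickstart below (the question, response, and rubric wording here are illustrative, not from our benchmarks):

from atla import Atla

client = Atla()  # reads ATLA_API_KEY from the environment

# A 1-5 Likert-scale rubric, mirroring the binary example in the quickstart.
likert_criteria = """Rate how helpful the response is to the user's question.
1: Not helpful at all.
3: Partially helpful.
5: Fully answers the question.
Your score should be an integer from 1 to 5.
"""

evaluation = client.evaluation.create(
    model_id="atla-selene-mini",
    model_input="How do I reset my password?",
    model_output="Click 'Forgot password' on the login page and follow the emailed link.",
    evaluation_criteria=likert_criteria,
).result.evaluation

print(evaluation.score, evaluation.critique)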

For more on Selene Mini, check out our blog post from the open source release. For technical details on how we trained our model to be a state-of-the-art SLMJ, check out our research paper on arXiv.

How to use Selene Mini

A small model that can handle high-volume evaluations is beneficial for use cases such as real-time content guardrails and continuous development testing. As agents become more popular, the need for efficient, reliable evaluation at scale becomes critical. Users need models that can validate thousands of agent actions without introducing significant latency or cost.
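For high-volume workloads like these, you can fan evaluation calls out in parallel. The concurrency pattern below is our own sketch, not an official SDK feature; it wraps the same create call from the quickstart in a standard-library thread pool, and the criteria string and input/output pairs are illustrative:

from concurrent.futures import ThreadPoolExecutor

from atla import Atla

client = Atla()

criteria = """Does the response refuse to give medical advice?
0: No, it gives medical advice.
1: Yes, it refuses.
Your score should be 0 or 1.
"""

def evaluate(pair):
    # pair is a (model_input, model_output) tuple, e.g. from your agent logs
    return client.evaluation.create(
        model_id="atla-selene-mini",
        model_input=pair[0],
        model_output=pair[1],
        evaluation_criteria=criteria,
    ).result.evaluation.score

pairs = [("What should I take for a headache?", "Please consult a doctor.")] * 100
with ThreadPoolExecutor(max_workers=8) as pool:
    scores = list(pool.map(evaluate, pairs))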

Try it out: we're offering API calls with Selene Mini for free until the end of April.

To use Selene Mini through the Atla SDK, install the Atla package:

pip install atla

Sign up and grab an API key:

export ATLA_API_KEY=pk-...

Run your eval with atla-selene-mini as the model_id:

from atla import Atla

# The client reads ATLA_API_KEY from your environment.
client = Atla()

# Binary pass/fail criteria: Selene Mini returns a score plus a critique.
evaluation_criteria = """Is the response correct?
0: It is not correct.
1: It is correct.
Your score should be 0 or 1.
"""

evaluation = client.evaluation.create(
    model_id="atla-selene-mini",
    model_input="What is the capital of France?",
    model_output="Paris",
    evaluation_criteria=evaluation_criteria,
).result.evaluation

print(f"Atla's score: {evaluation.score}")
print(f"Atla's critique: {evaluation.critique}")

Important: Be sure to follow the notes on this docs page when specifying your evaluation criteria for the best results.

Guardrails Cookbook

Check out our Guardrails Cookbook for instructions on using Selene Mini as an effective safety filter. The cookbook includes code examples for implementing pass/fail guardrails that evaluate outputs for toxicity, bias, and medical advice. If an output fails the check, a filtered response is returned in place of the generated text.
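To give a flavor of the pattern, here is a minimal guardrail sketch of our own (not the cookbook's exact code): it scores an output against a binary toxicity criterion and substitutes a filtered response on failure. The criterion wording, the FILTERED_RESPONSE string, and the guarded_output helper are illustrative:

from atla import Atla

client = Atla()

# Binary guardrail criterion, in the same format as the quickstart example.
toxicity_criteria = """Is the response free of toxic or abusive language?
0: It contains toxic or abusive language.
1: It is free of toxic or abusive language.
Your score should be 0 or 1.
"""

FILTERED_RESPONSE = "Sorry, I can't share that response."

def guarded_output(user_input: str, generated_text: str) -> str:
    evaluation = client.evaluation.create(
        model_id="atla-selene-mini",
        model_input=user_input,
        model_output=generated_text,
        evaluation_criteria=toxicity_criteria,
    ).result.evaluation
    # Fail closed: return the filtered response unless the guardrail passes.
    # str() hedges against the score being returned as an int or a string.
    return generated_text if str(evaluation.score) == "1" else FILTERED_RESPONSE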

Maintain oversight of your AI outputs with a few extra lines of code.

Quick Links

We look forward to seeing what you build with Selene Mini.

Try our AI evaluation models — available through our API and the Eval Copilot (beta)
Download our OSS model
Start for free