Introducing the Atla MCP Server: purpose-built LLM Judges now at your command

Atla team
April 22, 2025

Today we're excited to announce the launch of the Atla MCP Server, giving developers access to Atla’s powerful LLM Judge models through the Model Context Protocol. For teams looking to easily integrate accurate LLM-based evaluations into their workflow, the Atla MCP Server provides a clean, local solution that fits into existing dev environments.

The Model Context Protocol (MCP) is rapidly becoming the standard for how LLMs interact with external tools. With MCP, any tool can be called by any model in a consistent way.

The Atla MCP Server runs locally and provides direct access to Atla’s evaluation models. Connect it to any MCP-compatible client and start using our eval tools in your dev workflow. 
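You can also talk to the server directly from a script. Below is a minimal sketch using the official MCP Python SDK; the launch command (`uvx atla-mcp-server`) and the `ATLA_API_KEY` environment variable are assumptions on our part, so defer to the repo README for the exact invocation.

```python
# Minimal sketch: start the Atla MCP Server over stdio, open a session, and
# list the evaluation tools it exposes. Launch command and env var are
# assumptions -- see the repo README for the exact setup.
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="uvx",                 # assumed launcher
    args=["atla-mcp-server"],      # assumed command name
    env={"ATLA_API_KEY": os.environ["ATLA_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```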

Use cases

  • Claude Desktop users can evaluate responses across multiple criteria through chat.
  • Cursor users can add evals to their workflow to check for criteria like ‘code security.’
  • OpenAI Agents SDK users can run evals on chosen criteria before shipping (see the sketch after this list).
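As a quick sketch of that last use case, here is roughly how you might attach the Atla MCP Server to an OpenAI Agents SDK agent. This is a sketch under assumptions: the launch command, the `ATLA_API_KEY` environment variable, and the instructions string are ours, not taken from the repo; follow the README for the exact setup.

```python
# Sketch: wire the Atla MCP Server into an OpenAI Agents SDK agent so it can
# run evals before shipping an answer. Launch command and env var are
# assumptions -- check the repo README for the exact setup.
import asyncio
import os

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main() -> None:
    async with MCPServerStdio(
        params={
            "command": "uvx",                # assumed launcher
            "args": ["atla-mcp-server"],     # assumed command name
            "env": {"ATLA_API_KEY": os.environ["ATLA_API_KEY"]},
        }
    ) as atla_server:
        agent = Agent(
            name="reviewer",
            instructions=(
                "Draft an answer, then use the Atla evaluation tools to "
                "check it against the user's criteria before replying."
            ),
            mcp_servers=[atla_server],
        )
        result = await Runner.run(
            agent, "Review this snippet for code security: eval(input())"
        )
        print(result.final_output)

asyncio.run(main())
```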

Developers can also use Atla’s evals to create feedback loops for self-improvement. This is particularly useful for agents, which can now evaluate their own outputs or reasoning against specific criteria and dynamically improve their responses without human intervention.

To demonstrate a simple feedback loop…

Let’s rename a Pokémon 🔥

We connected Claude Desktop to our MCP server to give Charizard a new name. We wanted the new name to be both original and funny. Here’s how it went: we used Claude to generate a name, evaluated that name with our Atla tools, and then improved it based on Selene’s critiques!
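In code, that loop looks roughly like the sketch below. It reuses the `session` from the connection sketch above; `ask_llm` is a hypothetical placeholder for whatever client calls your generator model, and the tool argument names are our assumptions rather than the server’s exact schema.

```python
# Sketch of a generate -> evaluate -> revise loop. `ask_llm` is a hypothetical
# placeholder for your generator model; the argument names in the tool call
# are assumptions -- check the repo for the real schema.
async def ask_llm(prompt: str) -> str:
    """Placeholder: call your generator model (Claude, GPT, etc.) here."""
    raise NotImplementedError

async def improve(session, prompt: str, criterion: str, rounds: int = 3) -> str:
    answer = await ask_llm(prompt)
    for _ in range(rounds):
        result = await session.call_tool(
            "evaluate_llm_response",
            arguments={
                "evaluation_criteria": criterion,  # assumed parameter name
                "llm_response": answer,            # assumed parameter name
            },
        )
        critique = result.content[0].text  # Selene's score and critique
        # Feed the critique back to the generator and ask for a revision.
        answer = await ask_llm(
            f"{prompt}\n\nPrevious answer: {answer}\n"
            f"Evaluator feedback: {critique}\n"
            "Revise the answer to address the feedback."
        )
    return answer
```

In the Charizard demo, `prompt` was along the lines of “give Charizard a new name” and `criterion` captured our ask that the name be original and funny.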

Purpose-built eval models

Our MCP server gives you access to our flagship model, Selene, and our lightweight model, Selene Mini. Both are trained specifically on evaluation tasks, unlike general-purpose LLMs that are merely prompted to evaluate. The benefit over general-purpose models is significant: more accurate scores, better critiques, and less self-bias.

Which Selene model does the agent use?

By default, the choice is left to the agent. If you don’t want to leave model choice up to the agent, you can specify a model, as in the sketch below.
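For instance, here is a hypothetical way to pin evaluations to Selene Mini, using the `evaluate_llm_response` tool described in the next section. Both the `model_id` field name and the identifier string are assumptions on our part; check the repo for the actual schema.

```python
# Hypothetical: pin the evaluator instead of letting the agent choose.
# The "model_id" field and the identifier string are assumptions.
result = await session.call_tool(
    "evaluate_llm_response",
    arguments={
        "evaluation_criteria": "The name is original and funny.",
        "llm_response": "Blaze McWingface",
        "model_id": "atla-selene-mini",  # assumed identifier
    },
)
```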

Tools

🛠️ evaluate_llm_response: Evaluate a single response against one criterion

🛠️ evaluate_llm_response_on_multiple_criteria: Evaluate a single response against multiple criteria. (Useful for complex evaluations that can be decomposed into multiple single-dimensional evaluations.)
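Here is a sketch of the multi-criteria tool in action, reusing the open `session` from the earlier connection sketch. The tool name comes from the list above; the argument names and the shape of the result are assumptions.

```python
# Decompose "original and funny" into two single-dimensional criteria and
# evaluate them in one call. Argument names are assumptions.
result = await session.call_tool(
    "evaluate_llm_response_on_multiple_criteria",
    arguments={
        "evaluation_criteria_list": [  # assumed parameter name
            "The name is original.",
            "The name is funny.",
        ],
        "llm_response": "Blaze McWingface",
    },
)
for item in result.content:
    print(item.text)  # one score and critique per criterion (assumed shape)
```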

Getting started

To use the MCP server, you will need an Atla API key. Find your API key in the Dashboard, or sign up to get one.

Follow the instructions in our GitHub repo to install the server and connect to it using an MCP client. We provide specific instructions for the OpenAI Agents SDK, Claude Desktop, and Cursor.

Behind the scenes / What's next

We're just getting started with our MCP integration. In fact, we've been developing this server with a little help from our AI friends: we collaborated with Claude to optimize the design of our server.

It's a recursive development process that has helped us ensure our tools actually work well with the assistants they're designed to support.

Stay tuned for more evaluation capabilities and deeper integrations with the tools you already use. Want to contribute to the Atla MCP Server? Check out our GitHub repo for contribution guidelines.

Let us know how you're using the Atla MCP Server by tagging us @Atla_AI on X!

Try our AI evaluation models, available through our API and the Eval Copilot (beta).