From RAG to Agentic Tool-Calling
Part III in our AI journey series
It’s been over a year since Part II of this series, and a lot has happened.
We’ve been building AI systems for almost two years now. Most of our client projects involve AI in some form. It started with the RAG approach I wrote about in Part II — retrieve documents, inject context, generate response. That worked, but it only went so far.
The problem was that RAG was a single-pass process. Complex questions often required multiple retrieval steps, but we were stuck with one upfront retrieval — either copying results manually from one query to the next, or building custom scripts to chain them programmatically. It was brittle and limited. The real breakthrough came from letting the LLM make these decisions autonomously: try one search approach, examine the results, decide what to try next, query multiple sources if needed, then synthesize everything together. The agentic flow — where the LLM recursively uses tools until it has what it needs — is fundamentally more effective than any amount of manual or scripted prompt chaining we could build.
Then MCP came along. To be clear, agentic tool-calling patterns existed before MCP — frameworks like LangGraph and various agent libraries had been doing this for a while. But MCP standardized it. When Anthropic released MCP, it was a bit of a wake-up call that we weren’t building with state-of-the-art approaches. We’d been so focused on our custom RAG implementation that we’d missed the broader shift toward letting LLMs autonomously orchestrate tools. MCP didn’t magically solve all our problems — we still had to build access control, multi-tenancy, data pipelines, and all the production concerns ourselves — but it gave us a standardized framework for the agentic tool-calling pattern we were missing.
The Shift: From One-Shot to Multi-Step
Before MCP, if a user asked “What’s the status of Project X?”, we’d retrieve relevant documents, inject them into the prompt, and send everything to the LLM. The LLM would respond with something like “Based on these documents, Project X is in progress…” and we’d be done.
With MCP, imagine a user asks “What’s the status of Project X and how does it compare to similar GitHub projects?” The LLM thinks through this step by step. First it calls `query_documents("Project X")`. It examines those results, sees it has the project status, but realizes it still needs GitHub data. So it calls `github:search_repositories("Project X similar projects")`. Now, with both pieces of information, it synthesizes a complete answer.
The LLM keeps calling tools — examining results and deciding what to do next — until it has what it needs. It’s not following a script we wrote. It’s reasoning about what information it needs and going to get it.
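Stripped of streaming and UI concerns, that loop is small. Here's a minimal, synchronous sketch — in our app the cycle actually runs through LiveView messages (shown later), and the response shape from `LanguageModel.chat/2` plus the `append_tool_results/3` helper are assumptions for illustration:

# A synchronous sketch of the agentic loop: call the LLM, execute any tools
# it requests, feed the results back, and repeat until it answers in prose.
# The response shape and append_tool_results/3 are illustrative assumptions.
def run_agent_loop(messages, mcp_client, tools) do
  case LanguageModel.chat(messages, tools: tools) do
    # No tool calls: the model has what it needs and answered directly
    {:ok, %{content: answer, tool_calls: []}} ->
      {:ok, answer}

    # The model asked for more information: run the tools, then loop
    {:ok, %{tool_calls: tool_calls} = assistant_message} ->
      results =
        Enum.map(tool_calls, fn call ->
          MCP.Client.execute_tool(mcp_client, call.name, call.arguments)
        end)

      messages
      |> append_tool_results(assistant_message, results)
      |> run_agent_loop(mcp_client, tools)
  end
end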
Building Our MCP Client
When we started exploring MCP in Elixir, we found Hermes MCP, an existing implementation that handles both client and server functionality. It’s solid. We’ve even written about integrating Hermes with Phoenix if you want to see that approach in action.
We chose to build our own because our multi-tenant architecture required tight coupling with our existing user and team permission system. MCP provided the tool-calling framework, but we needed to build the access control ourselves. We also wanted to understand MCP deeply rather than treating it as a black box, and we preferred full control without taking on another external dependency.
Both approaches are valid. This post shares what we learned — the patterns, the Elixir advantages, and why the BEAM is particularly well-suited for these multi-step, agentic workflows.
What We Actually Built
Here’s the thing: we haven’t actually shipped any external MCP servers to production. All of our tools are custom implementations — semantic document search through our vector database, web search with custom result formatting, full-text search across document collections, and context retrieval with permission checks.
This isn’t because external servers don’t work. It’s because the internal tools are all we’ve needed so far. When you have full control over the implementation, you can integrate directly with your security model, format results exactly how your LLM works best, and avoid running external code in your production environment.
In our old RAG system, document retrieval happened upfront based on the prompt template configuration. If you started a chat with a template that had document queries enabled, we’d retrieve and hydrate that context before the conversation even started. Once the chat began, the retrieval step was done.
With MCP, the LLM actively calls tools throughout the conversation. It might start by calling `query_documents`, then call `web_search` for more recent information, then circle back and call `get_document_context` for more detail. The LLM makes these decisions dynamically as the conversation unfolds.
This agentic capability — where the LLM calls a tool, examines the result, and decides the next step — is what makes it powerful.
User Control: Tool Configuration and Transparency
One crucial lesson: users need to know what tools are enabled and have control over them.
Giving an LLM autonomous tool-calling capability sounds great until you realize users might not want certain tools active (some users only want proprietary data without external web search, others need the opposite), different tasks need different tool sets, and transparency builds trust.
We handle this two ways. First, users can start a chat from a prompt template that pre-configures which tools are enabled. A “Document Analysis” template might enable document query tools but disable web search and GitHub integration.
Second, users can adjust tool configurations mid-conversation. They might start with a template, then enable web search on the fly when they realize they need external data. The UI shows which tools are enabled, lets users toggle them, and displays when tools are called.
This transparency is critical. It’s a best practice we’ve carried across all our MCP implementations.
Tracing and Source Attribution
Beyond showing which tools are enabled, we also trace every tool call and its results. This serves two purposes: developer auditability and user citations.
From a development standpoint, tracing tool execution gives you visibility into how the LLM is reasoning through queries. When a user reports an unexpected answer, you can trace back through the tool calls — what the LLM searched for, what results it got, and how it used them. This is invaluable for debugging and improving your prompts and tool implementations.
For users, source attribution is even more critical. When our document query tool returns results, it includes not just the content but the source document, page numbers, and relevance scores. The UI displays these as citations alongside the LLM’s response. Users can click through to see exactly where information came from. This builds trust and lets users verify answers against source material.
The pattern is straightforward: each tool’s `execute` function returns results with metadata. For document queries, that’s document IDs and page references. For web search, it’s URLs and timestamps. For database queries, it might be record IDs. The LiveView UI renders this metadata as citations, and we log the full tool call chain (query parameters, results, timing) for later analysis.
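On the rendering side, the citations are just ordinary HEEx over that metadata. A simplified sketch (the assigns and markup here are illustrative, not our exact components):

<%!-- Illustrative only: render tool-result metadata as clickable citations.
      @citations would be derived from the metadata on each tool result. --%>
<div :for={citation <- @citations} class="citation">
  <a href={citation.url}>
    <%= citation.document_title %>, p. <%= citation.page %>
  </a>
  <span class="relevance">score: <%= citation.score %></span>
</div>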
This traceability makes the system auditable in ways that simple RAG never was. You’re not just injecting context and hoping the LLM uses it well — you’re seeing exactly what the LLM asked for, what it received, and how it synthesized the response.
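In code, that tracing is a thin wrapper around the tool execution call. A sketch — `execute_and_trace/2` is an illustrative name, and the persistence layer (we store the chain per chat) is elided in favor of a structured log line:

# Wraps a tool call with timing and a structured log entry. Assumes
# `require Logger` in the enclosing module; persistence is elided.
defp execute_and_trace(mcp_client, call) do
  started_at = System.monotonic_time(:millisecond)
  result = MCP.Client.execute_tool(mcp_client, call.name, call.arguments)
  duration_ms = System.monotonic_time(:millisecond) - started_at

  Logger.info(
    "mcp_tool_call tool=#{call.name} status=#{elem(result, 0)} duration_ms=#{duration_ms}",
    tool_arguments: inspect(call.arguments)
  )

  result
end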
Implementing Local Tools
A local tool implements a simple Elixir behaviour. Here’s the contract:
defmodule Revelry.MCP.Tool do
@moduledoc """
Defines the callbacks that must be implemented by any MCP tool
"""
@type params :: map()
@type result :: %{
type: :text | :resource | :image,
content: term(), # Can be string, list, map, or any structure
metadata: map(),
status: :success | :error,
error_message: binary() | nil
}
@callback name() :: String.t()
@callback description() :: String.t()
@callback parameters() :: map()
@callback annotations() :: map()
@callback execute(params(), opts :: Keyword.t()) :: {:ok, result()} | {:error, term()}
@callback configure(params()) :: {:ok, result()} | {:error, term()}
@callback feature_flag() :: atom()
@optional_callbacks annotations: 0, configure: 1, feature_flag: 0
end
The core callbacks are `name/0`, `description/0`, `parameters/0`, and `execute/2`. Optional callbacks like `annotations/0` support advanced features from the MCP specification (like marking tools as read-only or destructive), while `feature_flag/0` lets you gate tools behind feature toggles.
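For a read-only search tool, those optional callbacks might look like this — the annotation keys follow the MCP specification’s tool annotation hints, and the feature flag atom is just an example name for whatever toggle system you use:

# Illustrative optional callbacks for a read-only tool.
def annotations do
  %{
    readOnlyHint: true,      # the tool only reads data
    destructiveHint: false   # it never modifies or deletes anything
  }
end

def feature_flag, do: :document_query_tool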
Here’s a complete implementation:
defmodule Tools.DocumentQuery do
@behaviour Revelry.MCP.Tool
def name, do: "query_documents"
def description do
"""
Search uploaded documents using semantic search. Returns relevant
passages with citations.
"""
end
def parameters do
%{
type: "object",
properties: %{
query: %{type: "string", description: "What to search for"},
limit: %{type: "integer", description: "Max results", default: 5}
},
required: ["query"]
}
end
def execute(%{"query" => query} = params, opts) do
user = Keyword.get(opts, :user)
team = Keyword.get(opts, :team)
limit = Map.get(params, "limit", 5)
{:ok, embedding} = Embeddings.generate(query)
results = VectorDB.search(
embedding: embedding,
limit: limit,
filters: %{
team_id: team.id, # Team isolation
accessible_by_user: user.id # Permission check
}
)
{:ok, %{
type: :text,
content: results, # Can be a string, list, map - whatever makes sense
metadata: %{result_count: length(results)},
status: :success,
error_message: nil
}}
end
end
Notice how the access control is implemented right in the tool? MCP gave us the tool-calling framework, but we built the security model ourselves. This example simplifies our actual implementation (in production we use a dedicated Permissions module with functions like `Permissions.enforce` and pass user context through various scoping functions), but the principle is the same: your tool implementations integrate directly with your existing authorization system. Whether that’s checking user groups, direct resource assignments, team memberships, or other permission layers, your tools respect the same access rules as the rest of your app. This level of control is essential when dealing with sensitive data.
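As a rough illustration of that pattern — `authorize/3` and `run_query/4` here are hypothetical stand-ins, not our actual Permissions API — the enforcement lives inside `execute/2`, and failures come back as a normal tool result the LLM can see:

# Hypothetical sketch: authorize/3 and run_query/4 stand in for your own
# authorization layer and query logic. The check runs before any data is read.
def execute(%{"query" => query} = params, opts) do
  user = Keyword.get(opts, :user)
  team = Keyword.get(opts, :team)

  case authorize(user, team, :query_documents) do
    :ok ->
      run_query(query, params, user, team)

    {:error, reason} ->
      {:ok, %{
        type: :text,
        content: "",
        metadata: %{},
        status: :error,
        error_message: "Not authorized: #{inspect(reason)}"
      }}
  end
end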
The MCP Client: Tool Registry and Routing
The MCP Client is a GenServer that manages the tool registry and routes execution. Here’s the core:
defmodule MCP.Client do
use GenServer
defstruct [
:local_tools, # %{name => tool_info}
:enabled_tools, # MapSet of enabled tool names
:user,
:team
]
# Client API
def start_link(opts) do
GenServer.start_link(__MODULE__, opts)
end
def list_enabled_tools(client) do
GenServer.call(client, :list_enabled_tools)
end
def execute_tool(client, tool_name, params) do
GenServer.call(client, {:execute_tool, tool_name, params}, 300_000)
end
def enable_tool(client, tool_name) do
GenServer.call(client, {:enable_tool, tool_name})
end
def disable_tool(client, tool_name) do
GenServer.call(client, {:disable_tool, tool_name})
end
# Server Callbacks
def init(opts) do
# Register all available tools
local_tools =
[Tools.DocumentQuery, Tools.FullTextSearch, Tools.DocumentContext, Tools.WebSearch]
|> Enum.map(fn module ->
{module.name(), %{
name: module.name(),
description: module.description(),
parameters: module.parameters(),
type: :local,
module: module
}}
end)
|> Map.new()
{:ok, %__MODULE__{
local_tools: local_tools,
enabled_tools: MapSet.new(), # Start with no tools enabled
user: opts[:user],
team: opts[:team]
}}
end
def handle_call(:list_enabled_tools, _from, state) do
enabled =
state.local_tools
|> Enum.filter(fn {name, _info} -> MapSet.member?(state.enabled_tools, name) end)
|> Map.new()
{:reply, {:ok, enabled}, state}
end
def handle_call({:execute_tool, tool_name, params}, _from, state) do
# Only execute if tool is enabled
if MapSet.member?(state.enabled_tools, tool_name) do
result = case Map.get(state.local_tools, tool_name) do
%{module: module, type: :local} ->
# Execute local tool with user/team context
module.execute(params, user: state.user, team: state.team)
nil ->
{:error, :tool_not_found}
end
{:reply, result, state}
else
{:reply, {:error, :tool_not_enabled}, state}
end
end
def handle_call({:enable_tool, tool_name}, _from, state) do
new_enabled = MapSet.put(state.enabled_tools, tool_name)
{:reply, :ok, %{state | enabled_tools: new_enabled}}
end
def handle_call({:disable_tool, tool_name}, _from, state) do
new_enabled = MapSet.delete(state.enabled_tools, tool_name)
{:reply, :ok, %{state | enabled_tools: new_enabled}}
end
end
The client registers all available tools but starts with none enabled. This separation is deliberate: registration defines what exists in the system, while enabling controls what this specific session can use. This pattern enables several key features:
- Users can toggle tools on/off mid-conversation without re-registering
- The UI can show available tools vs. enabled tools, letting users discover what they could use
- Prompt templates can enable different tool sets from the same registry
- You can show tools as available-but-disabled based on permissions, providing transparency
The LiveView session determines which tools to enable based on the user, team, and prompt template context. This gives you fine-grained control over what each user can do.
The timeout on `execute_tool` is set to 5 minutes because some tools (like complex document queries or web searches) can take a while. This is where Elixir shines — long-running operations don’t block anything else.
External MCP Servers: We also built a Transport layer that handles JSON-RPC 2.0 communication with external MCP servers via stdio. It’s a GenServer that spawns external processes using Erlang ports, manages request/response cycles with buffered I/O, and supervises the external process lifecycle. If you need to connect to external MCP servers (like GitHub’s official server), this pattern works well — though for this post we’re focusing on local tool implementation since that’s been our primary use case.
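For reference, the skeleton of that transport is roughly the following — heavily simplified, with JSON handled by Jason (an assumption), and request/response correlation, framing edge cases, and supervision omitted:

# Heavily simplified sketch of a stdio transport for external MCP servers.
defmodule MCP.Transport.Stdio do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def init(opts) do
    # Spawn the external MCP server (e.g. an npx command) via an Erlang port
    port =
      Port.open({:spawn, Keyword.fetch!(opts, :command)}, [
        :binary,
        :exit_status,
        line: 65_536
      ])

    {:ok, %{port: port}}
  end

  # Send a JSON-RPC 2.0 request to the external server over stdin
  def handle_call({:request, id, method, params}, _from, state) do
    payload = Jason.encode!(%{jsonrpc: "2.0", id: id, method: method, params: params})
    Port.command(state.port, payload <> "\n")
    {:reply, :ok, state}
  end

  # Complete lines arriving on the external server's stdout
  def handle_info({port, {:data, {:eol, line}}}, %{port: port} = state) do
    # Decode the JSON-RPC response and dispatch it to whoever is waiting on `id`
    _response = Jason.decode!(line)
    {:noreply, state}
  end

  # The external process exited - let the supervisor decide what to do
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:stop, {:external_server_exited, status}, state}
  end
end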
LiveView Integration
This is where it all comes together.
Each LiveView creates its own MCP client. The client is a separate GenServer rather than embedded in the LiveView’s state because we use MCP clients in multiple contexts: interactive chats, API endpoints that need tool access, and custom multi-step workflows where individual steps initialize their own clients with step-specific tools. This separation of concerns means the same MCP client architecture works everywhere — whether it’s attached to a LiveView, processing an API request, or orchestrating a background workflow that chains prompts together (combining prompt chaining with agentic tool-calling).
For LiveView specifically, this gives you clean per-chat isolation — enabling a tool in one chat doesn’t affect other open chats, and when the LiveView process terminates (user closes tab, navigates away), the MCP client automatically cleans up via process linking.
defmodule ChatLive do
use Phoenix.LiveView
def mount(_params, _session, socket) do
user = socket.assigns.current_user
team = socket.assigns.team
# Each LiveView gets its own MCP client
{:ok, mcp_client} = MCP.Client.start_link(
user: user,
team: team
)
# Configure which tools are enabled based on user/team/template
configure_tools_for_session(mcp_client, user, team, socket.assigns.prompt_template)
    {:ok, assign(socket,
      mcp_client: mcp_client,
      messages: [],
      streamed_response: "",
      generating: false,
      # Tracked for the tool toggle UI (kept in sync by refresh_tool_list/1)
      enabled_tool_names: []
    )}
end
defp configure_tools_for_session(mcp_client, user, team, prompt_template) do
# Enable tools based on team permissions, feature flags, prompt template
if team.web_search_enabled && (prompt_template == nil || prompt_template.web_search_enabled) do
MCP.Client.enable_tool(mcp_client, "web_search")
end
if has_document_access?(user, team) do
MCP.Client.enable_tool(mcp_client, "query_documents")
MCP.Client.enable_tool(mcp_client, "document_full_text_search")
end
end
  def handle_event("send_message", %{"message" => text}, socket) do
    {:ok, tools} = MCP.Client.list_enabled_tools(socket.assigns.mcp_client)
    messages = socket.assigns.messages ++ [%{role: "user", content: text}]

    # Capture the LiveView pid; inside the Task, self() is the Task process
    live_view_pid = self()

    Task.async(fn ->
      LanguageModel.chat(
        messages,
        tools: tools,
        stream: true,
        stream_to: live_view_pid
      )
    end)

    {:noreply, assign(socket, messages: messages, generating: true)}
  end
# Streaming chunks arrive as messages
def handle_info({:stream_chunk, chunk}, socket) do
updated_response = socket.assigns.streamed_response <> chunk
{:noreply, assign(socket, streamed_response: updated_response)}
end
# LLM requests tool execution
  def handle_info({:tool_calls, tool_calls}, socket) do
    # Execute all tools concurrently, but await them all before passing on to
    # `regenerate_with_tool_results`, which asks the LLM to synthesize the results.
    results =
      tool_calls
      |> Enum.map(fn call ->
        Task.async(fn ->
          MCP.Client.execute_tool(
            socket.assigns.mcp_client,
            call.name,
            call.arguments
          )
        end)
      end)
      # Task.await_many/1 defaults to 5 seconds; match the 5-minute tool timeout
      |> Task.await_many(300_000)
{:noreply, regenerate_with_tool_results(socket, results)}
end
  defp regenerate_with_tool_results(socket, results) do
    # Add tool results to the message history
    updated_messages =
      append_tool_results_to_history(socket.assigns.messages, results)

    # Call the LLM again - it will see the tool results and continue
    {:ok, tools} = MCP.Client.list_enabled_tools(socket.assigns.mcp_client)

    # Capture the LiveView pid; inside the Task, self() is the Task process
    live_view_pid = self()

    Task.async(fn ->
      LanguageModel.chat(
        updated_messages,
        tools: tools,
        stream: true,
        stream_to: live_view_pid
      )
    end)

    assign(socket, messages: updated_messages, streamed_response: "")
  end
# Users can toggle tools mid-conversation
def handle_event("toggle_tool_enabled", %{"tool" => tool}, socket) do
# In production, check permissions here (team settings, feature flags, etc.)
mcp_client = socket.assigns.mcp_client
enable? = tool not in socket.assigns.enabled_tool_names
if enable? do
MCP.Client.enable_tool(mcp_client, tool)
else
MCP.Client.disable_tool(mcp_client, tool)
end
{:noreply, refresh_tool_list(socket)}
end
def terminate(_reason, _socket) do
:ok # Process linking handles MCP client cleanup automatically
end
end
The flow is: user sends message → LLM generates → LLM requests tools → we execute them concurrently → append results to history → LLM continues. This cycle repeats until the LLM has everything it needs.
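For completeness, `append_tool_results_to_history/2` is roughly the following shape. The exact message format is provider-specific — most APIs expect one "tool"-role message per call, keyed by the tool call id, so in practice we carry the originating call alongside each result — and `format_tool_result/1` is a hypothetical helper that serializes a result’s content and metadata into text for the LLM:

# Sketch of appending tool results to the history. Message shape is
# provider-specific; format_tool_result/1 is a hypothetical serializer.
defp append_tool_results_to_history(messages, results) do
  tool_messages =
    Enum.map(results, fn {call, result} ->
      %{
        role: "tool",
        tool_call_id: call.id,
        content: format_tool_result(result)
      }
    end)

  messages ++ tool_messages
end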
The streaming is simple. When we update `streamed_response`, LiveView automatically pushes the change to the browser:
<div :if={@generating && @streamed_response != ""}>
<.message_component role="assistant" content={@streamed_response} streaming={true} />
</div>
No WebSocket boilerplate, no state synchronization, no manual cleanup. One MCP client per chat, streaming via `handle_info`, concurrent tool execution via `Task.async`, and automatic cleanup via process links.
Why Elixir Makes This Easy
Managing external processes? Erlang ports are built-in. Per-chat state isolation? A GenServer per LiveView. Concurrent tool execution? `Task.async`. Real-time streaming? LiveView with `handle_info`. Crash recovery? Supervision trees. Cleanup on disconnect? Process links.
Every hard part maps to something Elixir does well.
When the LLM calls multiple tools, each might take seconds. In Python, you’d need thread management or asyncio coordination. In Elixir:
tool_calls
|> Enum.map(fn call ->
  Task.async(fn -> execute_tool(call) end)
end)
# generous timeout - tool calls can take a while (Task.await_many/1 defaults to 5s)
|> Task.await_many(300_000)
Done. The LLM gets results back and continues its recursive reasoning.
Local vs External Tools
External MCP servers are compelling — they’re essentially pre-built SDKs for LLMs. Install GitHub’s MCP server and your LLM instantly gets dozens of GitHub operations. No need to write tool definitions, parameter schemas, or API integration code. It’s all there.
But here’s the thing: external MCP servers run as processes in your production environment. You’re spawning external code, piping data through it, and trusting it with your users’ requests. That’s a significant security consideration, especially in regulated industries or with sensitive data.
Consider that even GitHub — a reputable organization with an official MCP server — had a security vulnerability discovered in early 2025 involving prompt injection that could expose private repository data, along with a command injection flaw in their `git-mcp-server` package. If that can happen with a major vendor’s official implementation, the risks only increase as you move to less established community servers. Every external MCP server you run is external code executing in your production environment.
We’ve always built API wrappers and adapters for external systems. It’s second nature. When we need GitHub functionality, we build a local tool that wraps GitHub’s REST API with exactly the operations we need, formatted exactly how our LLM works best. Same for web search, database queries, or any other service. We don’t need a full SDK — we just need the REST (or GraphQL etc) API and we can create tools that give the LLM exactly the capabilities we want.
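As a sketch of what such a wrapper can look like — assuming the Req HTTP client and GitHub’s public repository search endpoint, with auth, pagination, and error handling trimmed:

defmodule Tools.GithubSearch do
  @behaviour Revelry.MCP.Tool

  # Sketch of a local tool wrapping GitHub's REST search API. Assumes the Req
  # HTTP client; auth, pagination, and error handling are trimmed.
  def name, do: "github_search_repositories"

  def description, do: "Search public GitHub repositories by keyword."

  def parameters do
    %{
      type: "object",
      properties: %{query: %{type: "string", description: "Search keywords"}},
      required: ["query"]
    }
  end

  def execute(%{"query" => query}, _opts) do
    %{status: 200, body: body} =
      Req.get!("https://api.github.com/search/repositories", params: [q: query, per_page: 5])

    results =
      Enum.map(body["items"], fn repo ->
        %{name: repo["full_name"], url: repo["html_url"], description: repo["description"]}
      end)

    {:ok, %{
      type: :text,
      content: results,
      metadata: %{source: "github", result_count: length(results)},
      status: :success,
      error_message: nil
    }}
  end
end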
When local tools make sense:
- Core business logic that touches sensitive data
- Security-conscious environments where external code is a concern
- Custom formatting for your specific LLM and use case
- Operations requiring integration with your auth/permission system
- High-volume operations where you need performance control
When external MCP servers make sense:
- Rapid prototyping and exploration
- Well-maintained community tools you trust
- Complex third-party APIs with frequently changing functionality
- Standardized operations where the pre-built tool definitions work well
Note: Emerging solutions like mcp.run offer WASM sandboxing with fine-grained permission control, letting you run external MCP servers in isolated environments. This approach allows patterns like running multiple GitHub MCP instances, each with single-repo access tokens, rather than one all-powerful server. These sandboxing approaches may address some of the security concerns around running external code in production.
Our approach: core capabilities as local tools, with the Transport layer architecture ready when an external server genuinely saves significant development time. So far, local tools have been sufficient. Building API wrappers is straightforward, and we get full control over security, formatting, and integration with our existing systems.
What We Learned
The agentic approach is fundamentally more effective. Moving from one-shot RAG to agentic tool-calling wasn’t just an incremental improvement — it changed what’s possible. The LLM autonomously deciding what information to gather and when, examining results and refining its approach, is qualitatively different from any amount of prompt chaining we could manually orchestrate.
MCP gave us a standardized path forward. Agentic patterns existed before MCP, but early implementations had… issues. While bleeding-edge adopters worked through prompt injection vulnerabilities and experimental frameworks, we focused on building solid RAG foundations. When MCP emerged with clear standards, we got the benefits of mature agentic tool-calling without having debugged the early chaos ourselves.
Local tools give you control where it matters. We’re comfortable building API wrappers — it’s what we do. When we can wrap GitHub’s API or any other service in a local tool, we get the exact operations we need, formatted how our LLM works best, with full integration into our auth system. External MCP servers are compelling (pre-built SDKs for LLMs), but they’re external code running in your production environment. The GitHub security vulnerabilities in early 2025 reinforced that even reputable vendors have risks. Building local tools is straightforward and keeps security under your control.
User control and transparency aren’t optional. Users need to know what tools are enabled and have control over them. Some users want only proprietary data, others need external sources. Template-based configuration handles common workflows, mid-conversation toggling gives fine-grained control, and showing tool activity in the UI builds trust. This has been a best practice across all our implementations.
Elixir’s architecture maps perfectly to this problem. The process model handles agentic tool execution naturally. LiveView makes streaming trivial. Concurrent tool execution via `Task.async` just works. GenServers provide per-user state isolation. Supervision trees handle crashes. Whether you build your own MCP client or use Hermes MCP, these advantages apply.
MCP solved one problem, not all of them. MCP gave us the tool-calling framework, but access control, multi-tenancy, rate limiting, and cost management are still on you. The separation of tool registration vs. enabling, integration with your permission system, timeout handling — these are production concerns you have to build. The good news is that Elixir’s architecture makes these problems manageable. The core MCP client is a few hundred lines. The production concerns were straightforward to layer on.
Should You Use Elixir for MCP?
If you’re building production AI systems with tool-calling capabilities, Elixir offers distinct advantages that map naturally to the problem space.
The streaming story alone is compelling. LiveView handles streaming LLM responses with minimal code — no WebSocket boilerplate, no manual state synchronization, just assign a value and it pushes to the browser. When you need per-chat isolation, each LiveView gets its own MCP client GenServer with its own tool configuration and state. The LLM calls multiple tools simultaneously? `Task.async` and `Task.await_many` make parallel execution trivial without thread management or asyncio coordination. Tool calls that take seconds or minutes don’t block anything else because Elixir’s lightweight processes handle long-running operations naturally.
Production reliability comes built-in. Supervision trees mean crashes don’t take down the system. Process links ensure cleanup. Erlang ports handle external process management with built-in supervision. If you’re building multi-tenant architecture — multiple teams or organizations with different tool access, permission checks, and configurations — Elixir’s process model makes isolation straightforward.
That said, if you’re building a simple proof-of-concept or single-user CLI tool, Python with existing MCP libraries might be faster to prototype. If your team doesn’t have Elixir experience and you’re not dealing with concurrent users, real-time streaming, or complex state management, the learning curve might not be worth it.
For getting started, check out Hermes MCP if you want a full-featured library. Build your own if you have specific integration needs like tight coupling with existing auth systems, as we did. Either way, you get Elixir’s process model advantages — the architecture naturally supports what MCP tool-calling systems need to do.
What’s Next
This is Part III in our series. Part I covered StoryBot with prompts and LangChain. Part II dove into RAG with embeddings and retrieval. This part covered MCP with tools and recursive reasoning.
Future topics: building an MCP server to expose our tools externally, multi-step workflows and orchestration patterns, and tool result caching and optimization strategies.
Resources
MCP and Elixir:
Recursive Tool Calling and Agentic Patterns:
- Master Recursive Prompting for Deeper AI Insights – Relevance AI
- Understanding Recursion in Systems Cognition & AI – Recursive Cognitive Architectures
- Agents and Tool Calling in Agentic Frameworks
Moving from one-shot RAG to agentic tool-calling changed what we could build. It’s not just better answers — it’s letting the LLM autonomously reason about what information it needs and go get it. MCP gave us the standardized framework. Local tools gave us security and control. Elixir made the hard parts — recursive execution, concurrent operations, real-time streaming, per-user isolation — surprisingly manageable.
The result is a system where users can toggle tools mid-conversation, watch the LLM reason through complex queries by recursively calling tools, and trust that their data stays secure. The core MCP client is a few hundred lines. The production concerns layered on cleanly.
If you’re building something similar — production AI with tool-calling, concurrent users, and security requirements — these patterns should translate. Elixir’s architecture maps naturally to the problem.
Questions about MCP, agentic workflows, or building AI tools in Elixir? We’re continuing to iterate on these patterns and would love to hear what you’re building.