From RAG to Agentic Tool-Calling
Part III in our AI journey series
It’s been over a year since Part II of this series, and a lot has happened.
We’ve been building AI systems for almost two years now. Most of our client projects involve AI in some form. It started with the RAG approach I wrote about in Part II — retrieve documents, inject context, generate response. That worked, but it only went so far.
The problem was that RAG was a single-pass process. Complex questions often required multiple retrieval steps, but we were stuck with one upfront retrieval — either copying results manually from one query to the next, or building custom scripts to chain them programmatically. It was brittle and limited. The real breakthrough came from letting the LLM make these decisions autonomously: try one search approach, examine the results, decide what to try next, query multiple sources if needed, then synthesize everything together. The agentic flow — where the LLM recursively uses tools until it has what it needs — is fundamentally more effective than any amount of manual or scripted prompt chaining we could build.
Then MCP came along. To be clear, agentic tool-calling patterns existed before MCP — frameworks like LangGraph and various agent libraries had been doing this for a while. But MCP standardized it. When Anthropic released MCP, it was a bit of a wake-up call that we weren’t building with state-of-the-art approaches. We’d been so focused on our custom RAG implementation that we’d missed the broader shift toward letting LLMs autonomously orchestrate tools. MCP didn’t magically solve all our problems — we still had to build access control, multi-tenancy, data pipelines, and all the production concerns ourselves — but it gave us a standardized framework for the agentic tool-calling pattern we were missing.
The Shift: From One-Shot to Multi-Step
Before MCP, if a user asked “What’s the status of Project X?”, we’d retrieve relevant documents, inject them into the prompt, and send everything to the LLM. The LLM would respond with something like “Based on these documents, Project X is in progress…” and we’d be done.
With MCP, imagine a user asks “What’s the status of Project X and how does it compare to similar GitHub projects?” The LLM thinks through this step by step. First it calls `query_documents("Project X")`. It examines those results, sees it has the project status, but realizes it still needs GitHub data. So it calls `github:search_repositories("Project X similar projects")`. Now, with both pieces of information, it synthesizes a complete answer.
The LLM keeps calling tools — examining results and deciding what to do next — until it has what it needs. It’s not following a script we wrote. It’s reasoning about what information it needs and going to get it.
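Stripped of streaming and UI concerns, that loop is small. Here's a minimal, synchronous sketch — in our app the cycle actually runs through LiveView messages (shown later), and the response shape from `LanguageModel.chat/2` plus the `append_tool_results/3` helper are assumptions for illustration:

# A synchronous sketch of the agentic loop: call the LLM, execute any tools
# it requests, feed the results back, and repeat until it answers in prose.
# The response shape and append_tool_results/3 are illustrative assumptions.
def run_agent_loop(messages, mcp_client, tools) do
  case LanguageModel.chat(messages, tools: tools) do
    # No tool calls: the model has what it needs and answered directly
    {:ok, %{content: answer, tool_calls: []}} ->
      {:ok, answer}

    # The model asked for more information: run the tools, then loop
    {:ok, %{tool_calls: tool_calls} = assistant_message} ->
      results =
        Enum.map(tool_calls, fn call ->
          MCP.Client.execute_tool(mcp_client, call.name, call.arguments)
        end)

      messages
      |> append_tool_results(assistant_message, results)
      |> run_agent_loop(mcp_client, tools)
  end
end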
Building Our MCP Client
When we started exploring MCP in Elixir, we found Hermes MCP, an existing implementation that handles both client and server functionality. It’s solid. We’ve even written about integrating Hermes with Phoenix if you want to see that approach in action.
We chose to build our own because our multi-tenant architecture required tight coupling with our existing user and team permission system. MCP provided the tool-calling framework, but we needed to build the access control ourselves. We also wanted to understand MCP deeply rather than treating it as a black box, and we preferred full control without taking on another external dependency.
Both approaches are valid. This post shares what we learned — the patterns, the Elixir advantages, and why the BEAM is particularly well-suited for these multi-step, agentic workflows.
What We Actually Built
Here’s the thing: we haven’t actually shipped any external MCP servers to production. All of our tools are custom implementations — semantic document search through our vector database, web search with custom result formatting, full-text search across document collections, and context retrieval with permission checks.
This isn’t because external servers don’t work. It’s because the internal tools are all we’ve needed so far. When you have full control over the implementation, you can integrate directly with your security model, format results exactly how your LLM works best, and avoid running external code in your production environment.
In our old RAG system, document retrieval happened upfront based on the prompt template configuration. If you started a chat with a template that had document queries enabled, we’d retrieve and hydrate that context before the conversation even started. Once the chat began, the retrieval step was done.
With MCP, the LLM actively calls tools throughout the conversation. It might start by calling `query_documents`, then call `web_search` for more recent information, then circle back and call `get_document_context` for more detail. The LLM makes these decisions dynamically as the conversation unfolds.
This agentic capability — where the LLM calls a tool, examines the result, and decides the next step — is what makes it powerful.
User Control: Tool Configuration and Transparency
One crucial lesson: users need to know what tools are enabled and have control over them.
Giving an LLM autonomous tool-calling capability sounds great until you realize users might not want certain tools active (some users only want proprietary data without external web search, others need the opposite), different tasks need different tool sets, and transparency builds trust.
We handle this two ways. First, users can start a chat from a prompt template that pre-configures which tools are enabled. A “Document Analysis” template might enable document query tools but disable web search and GitHub integration.
Second, users can adjust tool configurations mid-conversation. They might start with a template, then enable web search on the fly when they realize they need external data. The UI shows which tools are enabled, lets users toggle them, and displays when tools are called.
This transparency is critical. It’s a best practice we’ve carried across all our MCP implementations.
Tracing and Source Attribution
Beyond showing which tools are enabled, we also trace every tool call and its results. This serves two purposes: developer auditability and user citations.
From a development standpoint, tracing tool execution gives you visibility into how the LLM is reasoning through queries. When a user reports an unexpected answer, you can trace back through the tool calls — what the LLM searched for, what results it got, and how it used them. This is invaluable for debugging and improving your prompts and tool implementations.
For users, source attribution is even more critical. When our document query tool returns results, it includes not just the content but the source document, page numbers, and relevance scores. The UI displays these as citations alongside the LLM’s response. Users can click through to see exactly where information came from. This builds trust and lets users verify answers against source material.
The pattern is straightforward: each tool’s `execute` function returns results with metadata. For document queries, that’s document IDs and page references. For web search, it’s URLs and timestamps. For database queries, it might be record IDs. The LiveView UI renders this metadata as citations, and we log the full tool call chain (query parameters, results, timing) for later analysis.
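On the rendering side, the citations are just ordinary HEEx over that metadata. A simplified sketch (the assigns and markup here are illustrative, not our exact components):

<%!-- Illustrative only: render tool-result metadata as clickable citations.
      @citations would be derived from the metadata on each tool result. --%>
<div :for={citation <- @citations} class="citation">
  <a href={citation.url}>
    <%= citation.document_title %>, p. <%= citation.page %>
  </a>
  <span class="relevance">score: <%= citation.score %></span>
</div>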
This traceability makes the system auditable in ways that simple RAG never was. You’re not just injecting context and hoping the LLM uses it well — you’re seeing exactly what the LLM asked for, what it received, and how it synthesized the response.
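In code, that tracing is a thin wrapper around the tool execution call. A sketch — `execute_and_trace/2` is an illustrative name, and the persistence layer (we store the chain per chat) is elided in favor of a structured log line:

# Wraps a tool call with timing and a structured log entry. Assumes
# `require Logger` in the enclosing module; persistence is elided.
defp execute_and_trace(mcp_client, call) do
  started_at = System.monotonic_time(:millisecond)
  result = MCP.Client.execute_tool(mcp_client, call.name, call.arguments)
  duration_ms = System.monotonic_time(:millisecond) - started_at

  Logger.info(
    "mcp_tool_call tool=#{call.name} status=#{elem(result, 0)} duration_ms=#{duration_ms}",
    tool_arguments: inspect(call.arguments)
  )

  result
end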
Implementing Local Tools
A local tool implements a simple Elixir behaviour. Here’s the contract:
defmodule Revelry.MCP.Tool do
@moduledoc """
Defines the callbacks that must be implemented by any MCP tool
"""
@type params :: map()
@type result :: %{
type: :text | :resource | :image,
content: term(), # Can be string, list, map, or any structure
metadata: map(),
status: :success | :error,
error_message: binary() | nil
}
@callback name() :: String.t()
@callback description() :: String.t()
@callback parameters() :: map()
@callback annotations() :: map()
@callback execute(params(), opts :: Keyword.t()) :: {:ok, result()} | {:error, term()}
@callback configure(params()) :: {:ok, result()} | {:error, term()}
@callback feature_flag() :: atom()
@optional_callbacks annotations: 0, configure: 1, feature_flag: 0
end
The core callbacks are `name/0`, `description/0`, `parameters/0`, and `execute/2`. Optional callbacks like `annotations/0` support advanced features from the MCP specification (like marking tools as read-only or destructive), while `feature_flag/0` lets you gate tools behind feature toggles.
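For a read-only search tool, those optional callbacks might look like this — the annotation keys follow the MCP specification’s tool annotation hints, and the feature flag atom is just an example name for whatever toggle system you use:

# Illustrative optional callbacks for a read-only tool.
def annotations do
  %{
    readOnlyHint: true,      # the tool only reads data
    destructiveHint: false   # it never modifies or deletes anything
  }
end

def feature_flag, do: :document_query_tool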
Here’s a complete implementation:
defmodule Tools.DocumentQuery do
@behaviour Revelry.MCP.Tool
def name, do: "query_documents"
def description do
"""
Search uploaded documents using semantic search. Returns relevant
passages with citations.
"""
end
def parameters do
%{
type: "object",
properties: %{
query: %{type: "string", description: "What to search for"},
limit: %{type: "integer", description: "Max results", default: 5}
},
required: ["query"]
}
end
def execute(%{"query" => query} = params, opts) do
user = Keyword.get(opts, :user)
team = Keyword.get(opts, :team)
limit = Map.get(params, "limit", 5)
{:ok, embedding} = Embeddings.generate(query)
results = VectorDB.search(
embedding: embedding,
limit: limit,
filters: %{
team_id: team.id, # Team isolation
accessible_by_user: user.id # Permission check
}
)
{:ok, %{
type: :text,
content: results, # Can be a string, list, map - whatever makes sense
metadata: %{result_count: length(results)},
status: :success,
error_message: nil
}}
end
end
Notice how the access control is implemented right in the tool? MCP gave us the tool-calling framework, but we built the security model ourselves. This example simplifies our actual implementation (in production we use a dedicated Permissions module with functions like `Permissions.enforce` and pass user context through various scoping functions), but the principle is the same: your tool implementations integrate directly with your existing authorization system. Whether that’s checking user groups, direct resource assignments, team memberships, or other permission layers, your tools respect the same access rules as the rest of your app. This level of control is essential when dealing with sensitive data.
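As a rough illustration of that pattern — `authorize/3` and `run_query/4` here are hypothetical stand-ins, not our actual Permissions API — the enforcement lives inside `execute/2`, and failures come back as a normal tool result the LLM can see:

# Hypothetical sketch: authorize/3 and run_query/4 stand in for your own
# authorization layer and query logic. The check runs before any data is read.
def execute(%{"query" => query} = params, opts) do
  user = Keyword.get(opts, :user)
  team = Keyword.get(opts, :team)

  case authorize(user, team, :query_documents) do
    :ok ->
      run_query(query, params, user, team)

    {:error, reason} ->
      {:ok, %{
        type: :text,
        content: "",
        metadata: %{},
        status: :error,
        error_message: "Not authorized: #{inspect(reason)}"
      }}
  end
end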
The MCP Client: Tool Registry and Routing
The MCP Client is a GenServer that manages the tool registry and routes execution. Here’s the core:
defmodule MCP.Client do
use GenServer
defstruct [
:local_tools, # %{name => tool_info}
:enabled_tools, # MapSet of enabled tool names
:user,
:team
]
# Client API
def start_link(opts) do
GenServer.start_link(__MODULE__, opts)
end
def list_enabled_tools(client) do
GenServer.call(client, :list_enabled_tools)
end
def execute_tool(client, tool_name, params) do
GenServer.call(client, {:execute_tool, tool_name, params}, 300_000)
end
def enable_tool(client, tool_name) do
GenServer.call(client, {:enable_tool, tool_name})
end
def disable_tool(client, tool_name) do
GenServer.call(client, {:disable_tool, tool_name})
end
# Server Callbacks
def init(opts) do
# Register all available tools
local_tools =
[Tools.DocumentQuery, Tools.FullTextSearch, Tools.DocumentContext, Tools.WebSearch]
|> Enum.map(fn module ->
{module.name(), %{
name: module.name(),
description: module.description(),
parameters: module.parameters(),
type: :local,
module: module
}}
end)
|> Map.new()
{:ok, %__MODULE__{
local_tools: local_tools,
enabled_tools: MapSet.new(), # Start with no tools enabled
user: opts[:user],
team: opts[:team]
}}
end
def handle_call(:list_enabled_tools, _from, state) do
enabled =
state.local_tools
|> Enum.filter(fn {name, _info} -> MapSet.member?(state.enabled_tools, name) end)
|> Map.new()
{:reply, {:ok, enabled}, state}
end
def handle_call({:execute_tool, tool_name, params}, _from, state) do
# Only execute if tool is enabled
if MapSet.member?(state.enabled_tools, tool_name) do
result = case Map.get(state.local_tools, tool_name) do
%{module: module, type: :local} ->
# Execute local tool with user/team context
module.execute(params, user: state.user, team: state.team)
nil ->
{:error, :tool_not_found}
end
{:reply, result, state}
else
{:reply, {:error, :tool_not_enabled}, state}
end
end
def handle_call({:enable_tool, tool_name}, _from, state) do
new_enabled = MapSet.put(state.enabled_tools, tool_name)
{:reply, :ok, %{state | enabled_tools: new_enabled}}
end
def handle_call({:disable_tool, tool_name}, _from, state) do
new_enabled = MapSet.delete(state.enabled_tools, tool_name)
{:reply, :ok, %{state | enabled_tools: new_enabled}}
end
end
The client registers all available tools but starts with none enabled. This separation is deliberate: registration defines what exists in the system, while enabling controls what this specific session can use. This pattern enables several key features:
- Users can toggle tools on/off mid-conversation without re-registering
- The UI can show available tools vs. enabled tools, letting users discover what they could use
- Prompt templates can enable different tool sets from the same registry
- You can show tools as available-but-disabled based on permissions, providing transparency
The LiveView session determines which tools to enable based on the user, team, and prompt template context. This gives you fine-grained control over what each user can do.
The timeout on `execute_tool` is set to 5 minutes because some tools (like complex document queries or web searches) can take a while. This is where Elixir shines — long-running operations don’t block anything else.
External MCP Servers: We also built a Transport layer that handles JSON-RPC 2.0 communication with external MCP servers via stdio. It’s a GenServer that spawns external processes using Erlang ports, manages request/response cycles with buffered I/O, and supervises the external process lifecycle. If you need to connect to external MCP servers (like GitHub’s official server), this pattern works well — though for this post we’re focusing on local tool implementation since that’s been our primary use case.
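For reference, the skeleton of that transport is roughly the following — heavily simplified, with JSON handled by Jason (an assumption), and request/response correlation, framing edge cases, and supervision omitted:

# Heavily simplified sketch of a stdio transport for external MCP servers.
defmodule MCP.Transport.Stdio do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def init(opts) do
    # Spawn the external MCP server (e.g. an npx command) via an Erlang port
    port =
      Port.open({:spawn, Keyword.fetch!(opts, :command)}, [
        :binary,
        :exit_status,
        line: 65_536
      ])

    {:ok, %{port: port}}
  end

  # Send a JSON-RPC 2.0 request to the external server over stdin
  def handle_call({:request, id, method, params}, _from, state) do
    payload = Jason.encode!(%{jsonrpc: "2.0", id: id, method: method, params: params})
    Port.command(state.port, payload <> "\n")
    {:reply, :ok, state}
  end

  # Complete lines arriving on the external server's stdout
  def handle_info({port, {:data, {:eol, line}}}, %{port: port} = state) do
    # Decode the JSON-RPC response and dispatch it to whoever is waiting on `id`
    _response = Jason.decode!(line)
    {:noreply, state}
  end

  # The external process exited - let the supervisor decide what to do
  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    {:stop, {:external_server_exited, status}, state}
  end
end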
LiveView Integration
This is where it all comes together.
Each LiveView creates its own MCP client. The client is a separate GenServer rather than embedded in the LiveView’s state because we use MCP clients in multiple contexts: interactive chats, API endpoints that need tool access, and custom multi-step workflows where individual steps initialize their own clients with step-specific tools. This separation of concerns means the same MCP client architecture works everywhere — whether it’s attached to a LiveView, processing an API request, or orchestrating a background workflow that chains prompts together (combining prompt chaining with agentic tool-calling).
For LiveView specifically, this gives you clean per-chat isolation — enabling a tool in one chat doesn’t affect other open chats, and when the LiveView process terminates (user closes tab, navigates away), the MCP client automatically cleans up via process linking.
defmodule ChatLive do
use Phoenix.LiveView
def mount(_params, _session, socket) do
user = socket.assigns.current_user
team = socket.assigns.team
# Each LiveView gets its own MCP client
{:ok, mcp_client} = MCP.Client.start_link(
user: user,
team: team
)
# Configure which tools are enabled based on user/team/template
configure_tools_for_session(mcp_client, user, team, socket.assigns.prompt_template)
    {:ok, assign(socket,
      mcp_client: mcp_client,
      messages: [],
      streamed_response: "",
      generating: false,
      # Tracked for the tool toggle UI (kept in sync by refresh_tool_list/1)
      enabled_tool_names: []
    )}
end
defp configure_tools_for_session(mcp_client, user, team, prompt_template) do
# Enable tools based on team permissions, feature flags, prompt template
if team.web_search_enabled && (prompt_template == nil || prompt_template.web_search_enabled) do
MCP.Client.enable_tool(mcp_client, "web_search")
end
if has_document_access?(user, team) do
MCP.Client.enable_tool(mcp_client, "query_documents")
MCP.Client.enable_tool(mcp_client, "document_full_text_search")
end
end
  def handle_event("send_message", %{"message" => text}, socket) do
    {:ok, tools} = MCP.Client.list_enabled_tools(socket.assigns.mcp_client)
    messages = socket.assigns.messages ++ [%{role: "user", content: text}]

    # Capture the LiveView pid; inside the Task, self() is the Task process
    live_view_pid = self()

    Task.async(fn ->
      LanguageModel.chat(
        messages,
        tools: tools,
        stream: true,
        stream_to: live_view_pid
      )
    end)

    {:noreply, assign(socket, messages: messages, generating: true)}
  end
# Streaming chunks arrive as messages
def handle_info({:stream_chunk, chunk}, socket) do
updated_response = socket.assigns.streamed_response <> chunk
{:noreply, assign(socket, streamed_response: updated_response)}
end
# LLM requests tool execution
  def handle_info({:tool_calls, tool_calls}, socket) do
    # Execute all tools concurrently, but await them all before passing on to
    # `regenerate_with_tool_results`, which asks the LLM to synthesize the results.
    results =
      tool_calls
      |> Enum.map(fn call ->
        Task.async(fn ->
          MCP.Client.execute_tool(
            socket.assigns.mcp_client,
            call.name,
            call.arguments
          )
        end)
      end)
      # Task.await_many/1 defaults to 5 seconds; match the 5-minute tool timeout
      |> Task.await_many(300_000)
{:noreply, regenerate_with_tool_results(socket, results)}
end
  defp regenerate_with_tool_results(socket, results) do
    # Add tool results to the message history
    updated_messages =
      append_tool_results_to_history(socket.assigns.messages, results)

    # Call the LLM again - it will see the tool results and continue
    {:ok, tools} = MCP.Client.list_enabled_tools(socket.assigns.mcp_client)

    # Capture the LiveView pid; inside the Task, self() is the Task process
    live_view_pid = self()

    Task.async(fn ->
      LanguageModel.chat(
        updated_messages,
        tools: tools,
        stream: true,
        stream_to: live_view_pid
      )
    end)

    assign(socket, messages: updated_messages, streamed_response: "")
  end
# Users can toggle tools mid-conversation
def handle_event("toggle_tool_enabled", %{"tool" => tool}, socket) do
# In production, check permissions here (team settings, feature flags, etc.)
mcp_client = socket.assigns.mcp_client
enable? = tool not in socket.assigns.enabled_tool_names
if enable? do
MCP.Client.enable_tool(mcp_client, tool)
else
MCP.Client.disable_tool(mcp_client, tool)
end
{:noreply, refresh_tool_list(socket)}
end
def terminate(_reason, _socket) do
:ok # Process linking handles MCP client cleanup automatically
end
end
The flow is: user sends message → LLM generates → LLM requests tools → we execute them concurrently → append results to history → LLM continues. This cycle repeats until the LLM has everything it needs.
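For completeness, `append_tool_results_to_history/2` is roughly the following shape. The exact message format is provider-specific — most APIs expect one "tool"-role message per call, keyed by the tool call id, so in practice we carry the originating call alongside each result — and `format_tool_result/1` is a hypothetical helper that serializes a result’s content and metadata into text for the LLM:

# Sketch of appending tool results to the history. Message shape is
# provider-specific; format_tool_result/1 is a hypothetical serializer.
defp append_tool_results_to_history(messages, results) do
  tool_messages =
    Enum.map(results, fn {call, result} ->
      %{
        role: "tool",
        tool_call_id: call.id,
        content: format_tool_result(result)
      }
    end)

  messages ++ tool_messages
end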
The streaming is simple. When we update `streamed_response`, LiveView automatically pushes the change to the browser:
<div :if={@generating && @streamed_response != ""}>
<.message_component role="assistant" content={@streamed_response} streaming={true} />
</div>
No WebSocket boilerplate, no state synchronization, no manual cleanup. One MCP client per chat, streaming via `handle_info`, concurrent tool execution via `Task.async`, and automatic cleanup via process links.
Why Elixir Makes This Easy
Managing external processes? Erlang ports are built-in. Per-chat state isolation? A GenServer per LiveView. Concurrent tool execution? `Task.async`. Real-time streaming? LiveView with `handle_info`. Crash recovery? Supervision trees. Cleanup on disconnect? Process links.
Every hard part maps to something Elixir does well.
When the LLM calls multiple tools, each might take seconds. In Python, you’d need thread management or asyncio coordination. In Elixir:
tool_calls
|> Enum.map(fn call ->
  Task.async(fn -> execute_tool(call) end)
end)
# generous timeout - tool calls can take a while (Task.await_many/1 defaults to 5s)
|> Task.await_many(300_000)
Done. The LLM gets results back and continues its recursive reasoning.
Local vs External Tools
External MCP servers are compelling — they’re essentially pre-built SDKs for LLMs. Install GitHub’s MCP server and your LLM instantly gets dozens of GitHub operations. No need to write tool definitions, parameter schemas, or API integration code. It’s all there.
But here’s the thing: external MCP servers run as processes in your production environment. You’re spawning external code, piping data through it, and trusting it with your users’ requests. That’s a significant security consideration, especially in regulated industries or with sensitive data.
Consider that even GitHub — a reputable organization with an official MCP server — had a security vulnerability discovered in early 2025 involving prompt injection that could expose private repository data, along with a command injection flaw in their `git-mcp-server` package. If that can happen with a major vendor’s official implementation, the risks only increase as you move to less established community servers. Every external MCP server you run is external code executing in your production environment.
We’ve always built API wrappers and adapters for external systems. It’s second nature. When we need GitHub functionality, we build a local tool that wraps GitHub’s REST API with exactly the operations we need, formatted exactly how our LLM works best. Same for web search, database queries, or any other service. We don’t need a full SDK — we just need the REST (or GraphQL etc) API and we can create tools that give the LLM exactly the capabilities we want.
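As a sketch of what such a wrapper can look like — assuming the Req HTTP client and GitHub’s public repository search endpoint, with auth, pagination, and error handling trimmed:

defmodule Tools.GithubSearch do
  @behaviour Revelry.MCP.Tool

  # Sketch of a local tool wrapping GitHub's REST search API. Assumes the Req
  # HTTP client; auth, pagination, and error handling are trimmed.
  def name, do: "github_search_repositories"

  def description, do: "Search public GitHub repositories by keyword."

  def parameters do
    %{
      type: "object",
      properties: %{query: %{type: "string", description: "Search keywords"}},
      required: ["query"]
    }
  end

  def execute(%{"query" => query}, _opts) do
    %{status: 200, body: body} =
      Req.get!("https://api.github.com/search/repositories", params: [q: query, per_page: 5])

    results =
      Enum.map(body["items"], fn repo ->
        %{name: repo["full_name"], url: repo["html_url"], description: repo["description"]}
      end)

    {:ok, %{
      type: :text,
      content: results,
      metadata: %{source: "github", result_count: length(results)},
      status: :success,
      error_message: nil
    }}
  end
end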
When local tools make sense:
- Core business logic that touches sensitive data
- Security-conscious environments where external code is a concern
- Custom formatting for your specific LLM and use case
- Operations requiring integration with your auth/permission system
- High-volume operations where you need performance control
When external MCP servers make sense:
- Rapid prototyping and exploration
- Well-maintained community tools you trust
- Complex third-party APIs with frequently changing functionality
- Standardized operations where the pre-built tool definitions work well
Note: Emerging solutions like mcp.run offer WASM sandboxing with fine-grained permission control, letting you run external MCP servers in isolated environments. This approach allows patterns like running multiple GitHub MCP instances, each with single-repo access tokens, rather than one all-powerful server. These sandboxing approaches may address some of the security concerns around running external code in production.
Our approach: core capabilities as local tools, with the Transport layer architecture ready when an external server genuinely saves significant development time. So far, local tools have been sufficient. Building API wrappers is straightforward, and we get full control over security, formatting, and integration with our existing systems.
What We Learned
The agentic approach is fundamentally more effective. Moving from one-shot RAG to agentic tool-calling wasn’t just an incremental improvement — it changed what’s possible. The LLM autonomously deciding what information to gather and when, examining results and refining its approach, is qualitatively different from any amount of prompt chaining we could manually orchestrate.
MCP gave us a standardized path forward. Agentic patterns existed before MCP, but early implementations had… issues. While bleeding-edge adopters worked through prompt injection vulnerabilities and experimental frameworks, we focused on building solid RAG foundations. When MCP emerged with clear standards, we got the benefits of mature agentic tool-calling without having debugged the early chaos ourselves.
Local tools give you control where it matters. We’re comfortable building API wrappers — it’s what we do. When we can wrap GitHub’s API or any other service in a local tool, we get the exact operations we need, formatted how our LLM works best, with full integration into our auth system. External MCP servers are compelling (pre-built SDKs for LLMs), but they’re external code running in your production environment. The GitHub security vulnerabilities in early 2025 reinforced that even reputable vendors have risks. Building local tools is straightforward and keeps security under your control.
User control and transparency aren’t optional. Users need to know what tools are enabled and have control over them. Some users want only proprietary data, others need external sources. Template-based configuration handles common workflows, mid-conversation toggling gives fine-grained control, and showing tool activity in the UI builds trust. This has been a best practice across all our implementations.
Elixir’s architecture maps perfectly to this problem. The process model handles agentic tool execution naturally. LiveView makes streaming trivial. Concurrent tool execution via `Task.async` just works. GenServers provide per-user state isolation. Supervision trees handle crashes. Whether you build your own MCP client or use Hermes MCP, these advantages apply.
MCP solved one problem, not all of them. MCP gave us the tool-calling framework, but access control, multi-tenancy, rate limiting, and cost management are still on you. The separation of tool registration vs. enabling, integration with your permission system, timeout handling — these are production concerns you have to build. The good news is that Elixir’s architecture makes these problems manageable. The core MCP client is a few hundred lines. The production concerns were straightforward to layer on.
Should You Use Elixir for MCP?
If you’re building production AI systems with tool-calling capabilities, Elixir offers distinct advantages that map naturally to the problem space.
The streaming story alone is compelling. LiveView handles streaming LLM responses with minimal code — no WebSocket boilerplate, no manual state synchronization, just assign a value and it pushes to the browser. When you need per-chat isolation, each LiveView gets its own MCP client GenServer with its own tool configuration and state. The LLM calls multiple tools simultaneously? `Task.async` and `Task.await_many` make parallel execution trivial without thread management or asyncio coordination. Tool calls that take seconds or minutes don’t block anything else because Elixir’s lightweight processes handle long-running operations naturally.
Production reliability comes built-in. Supervision trees mean crashes don’t take down the system. Process links ensure cleanup. Erlang ports handle external process management with built-in supervision. If you’re building multi-tenant architecture — multiple teams or organizations with different tool access, permission checks, and configurations — Elixir’s process model makes isolation straightforward.
That said, if you’re building a simple proof-of-concept or single-user CLI tool, Python with existing MCP libraries might be faster to prototype. If your team doesn’t have Elixir experience and you’re not dealing with concurrent users, real-time streaming, or complex state management, the learning curve might not be worth it.
For getting started, check out Hermes MCP if you want a full-featured library. Build your own if you have specific integration needs like tight coupling with existing auth systems, as we did. Either way, you get Elixir’s process model advantages — the architecture naturally supports what MCP tool-calling systems need to do.
What’s Next
This is Part III in our series. Part I covered StoryBot with prompts and LangChain. Part II dove into RAG with embeddings and retrieval. This part covered MCP with tools and recursive reasoning.
Future topics: building an MCP server to expose our tools externally, multi-step workflows and orchestration patterns, and tool result caching and optimization strategies.
Resources
MCP and Elixir:
Recursive Tool Calling and Agentic Patterns:
- Master Recursive Prompting for Deeper AI Insights – Relevance AI
- Understanding Recursion in Systems Cognition & AI – Recursive Cognitive Architectures
- Agents and Tool Calling in Agentic Frameworks
Moving from one-shot RAG to agentic tool-calling changed what we could build. It’s not just better answers — it’s letting the LLM autonomously reason about what information it needs and go get it. MCP gave us the standardized framework. Local tools gave us security and control. Elixir made the hard parts — recursive execution, concurrent operations, real-time streaming, per-user isolation — surprisingly manageable.
The result is a system where users can toggle tools mid-conversation, watch the LLM reason through complex queries by recursively calling tools, and trust that their data stays secure. The core MCP client is a few hundred lines. The production concerns layered on cleanly.
If you’re building something similar — production AI with tool-calling, concurrent users, and security requirements — these patterns should translate. Elixir’s architecture maps naturally to the problem.
Questions about MCP, agentic workflows, or building AI tools in Elixir? We’re continuing to iterate on these patterns and would love to hear what you’re building.