
Monitoring Phoenix Applications and Recording Metrics

The BEAM community has lots of ways to monitor applications; part of the problem is picking the solution that works for you. Having gone through the exercise recently, I thought it would be a good idea to share the solution we came up with. At Revelry, we are heavy users of DataDog, so this solution leans heavily on putting metrics there.

Tools

      # Monitoring deps
      {:vmstats, "~> 2.3"},
      {:recon, "~> 2.3"},
      {:dogstatsd, "~> 0.0.4"}

These are the only dependencies we use to monitor our apps. vmstats collects statistics about the Erlang VM, such as memory usage. recon is also used for memory data, but per process, for our Phoenix responses. dogstatsd is what we use to ship the metrics to DataDog. It turns out there isn't much needed to get things going.

Stats Module

We have a GenServer set up just for taking in metric calls and forwarding them to DogStatsd. This does two things. One, it lets us call the DogStatsd functions without passing the handle around all the time. Two, it configures some app-level tags to be sent along with every metric.

defmodule App.Stats do
  @moduledoc """
  GenServer wrapper for DogStatsd.
  """
  use GenServer
  require DogStatsd

  def start_link() do
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end

  def init(_config) do
    %{host: host, port: port, app: app} =
      :app
      |> Confex.get_env(:stats)
      |> Enum.into(%{})

    DogStatsd.new(host, port, %{tags: ["app:#{app}"]})
  end

  def forward(func_name, args) when is_atom(func_name) and is_list(args) do
    GenServer.call(__MODULE__, {func_name, args})
  end

  def handle_call({function_name, args}, _from, state) do
    result = apply(DogStatsd, function_name, [state | args])
    {:reply, result, state}
  end

  # Wrap DogStatsd functions that you wish to use.

  def increment(name, opts \\ []), do: forward(:increment, [name, opts])
  def histogram(name, value, opts \\ []), do: forward(:histogram, [name, value, opts])
  def gauge(name, value, opts \\ []), do: forward(:gauge, [name, value, opts])
end
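
The :stats configuration that init/1 reads can live in config.exs. Here is a minimal sketch of it, plus a couple of typical call sites; the env var names, defaults, and metric names are just examples of ours, not anything prescribed.

# in config.exs -- keys match what App.Stats.init/1 reads;
# the env var names and defaults are just an example
config :app, :stats,
  host: {:system, "STATSD_HOST", "localhost"},
  port: {:system, :integer, "STATSD_PORT", 8125},
  app: {:system, "APP_NAME", "app"}

# anywhere in the app, once App.Stats is running
App.Stats.increment("app.signup.count", %{tags: ["source:web"]})
App.Stats.gauge("app.queue.depth", 42)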

Our application supervises it:

# In application.ex
    children = [
      worker(App.Stats, []),
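
For context, the surrounding Application module in the Phoenix 1.3 style looks roughly like this; App.Repo and AppWeb.Endpoint stand in for whatever children your app already supervises.

# application.ex -- a fuller sketch; the other children are the
# usual Phoenix-generated ones and will differ per app
defmodule App.Application do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec

    children = [
      supervisor(App.Repo, []),
      supervisor(AppWeb.Endpoint, []),
      # our stats GenServer from above
      worker(App.Stats, [])
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: App.Supervisor)
  end
end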

Erlang VM Monitoring

We use vmstats to collect info on the VM. In the example below, our App.Metrics module implements the :vmstats_sink behaviour; all it needs is a collect/3 function.

Ecto Monitoring

Next, let's get metrics out of Ecto.

To monitor Ecto queries, we needed to create a custom logger. Ecto loggers are, for now, configured at compile time. First, let's look at the module.

defmodule App.Metrics do
  @moduledoc """
  Collects metrics from the Erlang VM (via vmstats) and from Ecto.
  """
  @behaviour :vmstats_sink

  import App.Stats
  require Logger

  def collect(_type, name, value) do
    try do
      gauge(IO.iodata_to_binary(name), value)
    catch
      :exit, value ->
        Logger.error("Exited. Make sure the :app app is running. Value: #{inspect(value)}")
    end
  end

  def record_ecto_metric(entry) do
    try do
      opts = %{
        tags: []
      }

      queue_time = entry.queue_time || 0
      duration = entry.query_time + queue_time

      increment(
        "ecto.query.count",
        opts
      )

      histogram("ecto.query.exec.time", duration, opts)
      histogram("ecto.query.queue.time", queue_time, opts)
    catch
      :exit, value ->
        Logger.error("Exited. Make sure the :app app is running. Value: #{inspect(value)}")
    end
  end
end

Here we are collecting information about the query times and the queue times. Next, we update our configuration to tell Ecto about our logger and to tell vmstats about our sink module.

# in config.exs
config :app, App.Repo,
  # other configuration ...
  loggers: [{Ecto.LogEntry, :log, []}, {App.Metrics, :record_ecto_metric, []}]

config :vmstats,
  sink: App.Metrics,
  base_key: "app.erlang",
  key_separator: ".",
  interval: 1_000

Now Ecto will use our logger. Note that this code still runs during migrations, but Ecto's migration runner does not start our app; that is why we wrapped the calls in a try..catch.

Phoenix Request/Response Monitoring

We use a plug to get metrics about Phoenix requests and responses:

defmodule AppWeb.Stats do
  @behaviour Plug

  alias Plug.Conn
  import Plug.Conn, only: [register_before_send: 2]
  import App.Stats

  def init(opts), do: opts

  @doc """
  The Plug hook that records our metrics.
  """
  def call(%Conn{} = conn, _config) do
    opts = %{tags: standard_tags(conn)}

    # increment request count
    increment("phoenix.request.count", opts)

    req_start_time = :os.timestamp()

    register_before_send(conn, fn conn ->
      # increment response count
      increment("phoenix.response.count", opts)

      # log response time in microseconds
      req_end_time = :os.timestamp()
      duration = :timer.now_diff(req_end_time, req_start_time)
      histogram("phoenix.response.time", duration, opts)

      threshold = Confex.get_env(:app, :response_memory_capture_threshold)

      if :rand.uniform() <= threshold do
        [memory: memory] = :recon.info(self(), [:memory])

        histogram("phoenix.response.memory", memory, opts)
      end

      conn
    end)
  end
end
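
The standard_tags/1 helper isn't shown here; a minimal version dropped into AppWeb.Stats (our sketch, tagging by HTTP method and request path) could be as simple as:

  # Hypothetical helper -- not the original implementation.
  # Tags each metric with the request's HTTP method and path.
  defp standard_tags(%Conn{} = conn) do
    [
      "method:#{conn.method}",
      "path:#{conn.request_path}"
    ]
  end

Be careful with tags like the raw request path: paths that embed IDs can blow up tag cardinality in DataDog, so tagging by controller and action is usually safer.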

We collect response times as well as sample memory usage. This comes in handy when trying to figure out which responses take up the most memory. We add the plug to the controller function in our Web module.

defmodule AppWeb do

  def controller do
    quote do
      use Phoenix.Controller, namespace: AppWeb

      plug(AppWeb.Stats)

      import Plug.Conn
      import AppWeb.Router.Helpers
      import AppWeb.Gettext
    end
  end
  
  # ...
end
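
Placing the plug in controller/0 means it runs for every controller action. If you would rather capture metrics per pipeline, the same plug can also go in a router pipeline, for example:

# in router.ex
  pipeline :browser do
    plug(AppWeb.Stats)
    # ... the rest of the pipeline
  end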

All of this together gives us a nice picture of how our app is doing. It has already come in handy, giving us clues about where to improve things, which code paths are used the most, and so on.

Phoenix does have an instrumentation API now; it has actually been there since 1.3. We aren't using it yet only because we all thought it was coming in 1.4 instead. Oops! We will be experimenting with it soon to see if it works for us.
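
In the meantime, here is a rough sketch of how an instrumenter gets wired up, based on the Phoenix 1.3 Phoenix.Endpoint docs; the module name and metric name below are placeholders of our own.

# in config.exs -- register the instrumenter on the endpoint
config :app, AppWeb.Endpoint,
  instrumenters: [App.PhoenixInstrumenter]

# Instrumenter callbacks are named after instrumentation events,
# such as :phoenix_controller_call.
defmodule App.PhoenixInstrumenter do
  import App.Stats

  def phoenix_controller_call(:start, _compile_meta, _runtime_meta), do: :ok

  def phoenix_controller_call(:stop, time_diff, :ok) do
    # time_diff is in native time units
    duration_us = System.convert_time_unit(time_diff, :native, :microsecond)
    histogram("phoenix.controller.call.time", duration_us)
  end
end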
