The BEAM community has lots of ways to monitor applications. Part of the problem is picking which solution works for you. Having gone through that exercise recently, I thought it would be a good idea to share the solution we came up with. At Revelry, we are heavy users of DataDog, so this solution leans heavily into putting metrics there.
Tools
# Monitoring deps
{:vmstats, "~> 2.3"},
{:recon, "~> 2.3"},
{:dogstatsd, "~> 0.0.4"}
These are the only dependencies we use to monitor our apps. vmstats is used to collect memory data about the application. recon is also used to collect memory, but for our Phoenix responses. dogstatsd is what we use to send metrics to DataDog. It turns out there isn't much needed to get things going.
Stats Module
We have a GenServer set up just for taking in metrics calls and forwarding them to DogStatsd. This does two things. One, it allows us to call the DogStatsd functions without passing the pid around all the time. Two, it configures some app-level tags to be sent along with our metrics.
defmodule App.Stats do
  @moduledoc """
  GenServer wrapper for DogStatsd.
  """
  use GenServer

  require DogStatsd

  def start_link() do
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end

  def init(_config) do
    %{host: host, port: port, app: app} =
      :app
      |> Confex.get_env(:stats)
      |> Enum.into(%{})

    # The DogStatsd client becomes our GenServer state.
    DogStatsd.new(host, port, %{tags: ["app:#{app}"]})
  end

  def forward(func_name, args) when is_atom(func_name) and is_list(args) do
    GenServer.call(__MODULE__, {func_name, args})
  end

  def handle_call({function_name, args}, _from, state) do
    result = apply(DogStatsd, function_name, [state | args])
    {:reply, result, state}
  end

  # Wrap the DogStatsd functions that you wish to use.
  def increment(name, opts \\ []), do: forward(:increment, [name, opts])
  def histogram(name, value, opts \\ []), do: forward(:histogram, [name, value, opts])
  def gauge(name, value, opts \\ []), do: forward(:gauge, [name, value, opts])
end
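The init/1 above expects host, port, and app under a :stats key in our app's config, resolved through Confex. Here is a minimal sketch of that configuration, assuming a STATSD_HOST environment variable; plain values work just as well, since Confex passes them through unchanged.

# in config.exs (sketch): App.Stats.init/1 reads these keys via Confex
config :app, :stats,
  host: {:system, "STATSD_HOST", "localhost"},
  # 8125 is the default DogStatsD UDP port on the DataDog agent
  port: 8125,
  # used for the "app:#{app}" tag added to every metric
  app: "app"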
Our application supervises it:
# In application.ex
children = [
  worker(App.Stats, []),
  # ... the rest of our supervision tree
]
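With the worker in the supervision tree, anything in the app can report metrics through the named GenServer. For example (the metric and tag names here are made up for illustration):

# Hypothetical call sites elsewhere in the app
App.Stats.increment("jobs.processed", %{tags: ["queue:default"]})
App.Stats.histogram("jobs.duration", 125, %{tags: ["queue:default"]})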
Erlang VM Monitoring
We use vmstats to collect info on the VM. In the example below, our App.Metrics module implements the :vmstats_sink behavior. All it needs is a collect function implemented.
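For context, vmstats periodically invokes the sink's collect function with a metric type, a key built as iodata from the configured base key and separator (shown in the config further down), and a value, roughly like the call below. The exact key shape and the numbers are illustrative only.

# Illustrative only: roughly how vmstats calls the configured sink
App.Metrics.collect(:gauge, ["app.erlang", ".", "proc_count"], 42)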
Ecto Monitoring
Now let's get metrics from Ecto. To monitor Ecto queries, we needed to create a logger. Loggers, for now, are configured at compile time. First, let's look at the module.
defmodule App.Metrics do
  @moduledoc """
  Collects metrics from the Erlang system and Ecto.
  """
  import App.Stats

  require Logger

  # :vmstats_sink callback
  def collect(_type, name, value) do
    try do
      gauge(IO.iodata_to_binary(name), value)
    catch
      :exit, value ->
        Logger.error("Exited. Make sure the :app app is running. Value: #{inspect(value)}")
    end
  end

  # Ecto logger callback
  def record_ecto_metric(entry) do
    try do
      opts = %{tags: []}

      queue_time = entry.queue_time || 0
      duration = entry.query_time + queue_time

      increment("ecto.query.count", opts)
      histogram("ecto.query.exec.time", duration, opts)
      histogram("ecto.query.queue.time", queue_time, opts)
    catch
      :exit, value ->
        Logger.error("Exited. Make sure the :app app is running. Value: #{inspect(value)}")
    end
  end
end
Here we are collecting information about the query times and the queue times. Next, we must update our configuration to tell Ecto about our logger and vmstats about our module.
# in config.exs
config :app, App.Repo,
  # other configurations ...
  loggers: [{Ecto.LogEntry, :log, []}, {App.Metrics, :record_ecto_metric, []}]

config :vmstats,
  sink: App.Metrics,
  base_key: "app.erlang",
  key_separator: ".",
  interval: 1_000
Now Ecto will use our logger. Note: this code will still run when migrations are executed. Ecto's migration runner does not start up our app, however, which is why we wrapped our code in a try..catch.
Phoenix Request/Response Monitoring
We use a plug to get metrics about Phoenix requests and responses:
defmodule AppWeb.Stats do
  @behaviour Plug

  alias Plug.Conn

  import Plug.Conn, only: [register_before_send: 2]
  import App.Stats

  def init(opts), do: opts

  @doc """
  The Plug hook that records our metrics.
  """
  def call(%Conn{} = conn, _config) do
    # standard_tags/1 (not shown) builds the per-request tag list from the conn
    opts = %{tags: standard_tags(conn)}

    # increment request count
    increment("phoenix.request.count", opts)

    req_start_time = :os.timestamp()

    register_before_send(conn, fn conn ->
      # increment response count
      increment("phoenix.response.count", opts)

      # log response time in microseconds
      req_end_time = :os.timestamp()
      duration = :timer.now_diff(req_end_time, req_start_time)
      histogram("phoenix.response.time", duration, opts)

      # sample memory usage for a configurable fraction of responses
      threshold = Confex.get_env(:app, :response_memory_capture_threshold)

      if :rand.uniform() <= threshold do
        [memory: memory] = :recon.info(self(), [:memory])
        histogram("phoenix.response.memory", memory, opts)
      end

      conn
    end)
  end
end
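The memory sampling above is gated by a probability read from config via Confex and compared against :rand.uniform/0, so the value should fall between 0 and 1. A sketch, where the 0.05 (roughly 5% of responses) is just an example value:

# in config.exs (sketch): sample memory for roughly 5% of responses
config :app, :response_memory_capture_threshold, 0.05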
We collect response times as well as sample memory usage. This comes in handy when trying to figure out which responses take up the most memory. We add the plug to the controller function in our Web module.
defmodule AppWeb do
  def controller do
    quote do
      use Phoenix.Controller, namespace: AppWeb

      plug(AppWeb.Stats)

      import Plug.Conn
      import AppWeb.Router.Helpers
      import AppWeb.Gettext
    end
  end

  # ...
end
All of this together gives us a nice picture of how our app is doing. It has already come in handy by giving us clues about where to improve things, which code paths are used the most, and so on.
For Phoenix, there is now an Instrumentation API. It has been there since 1.3; we aren't using it only because we all thought it was supposed to arrive in 1.4 instead. Oops! We will be experimenting with it soon to see if it works for us.