A couple of weeks ago, a colleague asked a question in Slack that gave me a distinct feeling of déjà vu. They were testing out a production release and seeing some scary logs out of nowhere, seemingly totally unrelated to the changes they’d just deployed: a Phoenix.Socket check_origin error. The logs in question will probably look familiar to many readers who work with Phoenix…

        Could not check origin for Phoenix.Socket transport.

        Origin of the request: https://my-phoenix-app.fly.dev

        This happens when you are attempting a socket connection to
        a different host than the one configured in your config/
        files. For example, in development the host is configured
        to "localhost" but you may be trying to access it from
        "127.0.0.1". To fix this issue, you may either:

          1. update [url: [host: ...]] to your actual host in the
             config file for your current environment (recommended)

          2. pass the :check_origin option when configuring your
             endpoint or when configuring the transport in your
             UserSocket module, explicitly outlining which origins
             are allowed:

                check_origin: ["https://example.com",
                               "//another.com:888", "//other.com"]

This Phoenix.Socket check_origin error is far from the most esoteric message I’ve ever seen; in fact, it’s one of the most verbose and helpful! But I’ve seen it come up over and over again in cases where the suggested fixes aren’t really what we want.

Why is this even a thing?

You’re probably familiar with Cross Site Request Forgery (CSRF) and may have noticed that every new Phoenix app comes with a CSRF Prevention plug in the router’s browser pipeline by default. While CSRF pertains mainly to HTTP POST/PUT/DELETE requests (and sometimes other HTTP methods), the same kind of problem can arise with WebSockets: An attacker could use client-side code on their own site to get your user’s browser to create an authenticated WebSocket connection to your server and carry out various read and write operations over that socket without the user’s consent.

It turns out the way to protect against Cross-Site WebSocket Hijacking (CSWSH) is to check the Origin header of the initial WebSocket handshake. When that header matches the expected value (i.e., the URL of your site), the connection passes the check and we have some confidence the request is coming from a user who’s intentionally using your site. If it doesn’t match, the user may be on a different site where some malicious code is trying to trick them, so Phoenix rejects the WebSocket request and floods your logs with the message above.
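
To make the idea concrete, here’s a minimal sketch of that kind of check in plain Elixir. It is not Phoenix’s actual implementation (which, as the log above suggests, can also take schemes and ports into account); `OriginCheckSketch` and `allowed?/2` are names made up for illustration, and the sketch only compares the host of the Origin header against a list of allowed hosts.

```elixir
defmodule OriginCheckSketch do
  @moduledoc """
  Illustrative only: a simplified host comparison, not Phoenix's real check.
  """

  # Returns true when the host parsed from the Origin header value
  # is one of the hosts we expect users to connect from.
  def allowed?(origin_header, allowed_hosts) when is_binary(origin_header) do
    case URI.parse(origin_header) do
      # A well-formed origin such as "https://example.com" has a host.
      %URI{host: host} when is_binary(host) -> host in allowed_hosts
      # Anything we cannot parse a host out of is rejected in this sketch.
      _ -> false
    end
  end
end
```

With `["example.com"]` as the allowed list, a handshake from `https://example.com` passes, while one from `https://my-phoenix-app.fly.dev` is rejected, which is exactly the situation in the log above.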

The check_origin setting mentioned in the logs tells Phoenix which site (or sites) we expect users to connect from.
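
For reference, the setting might look something like this in an endpoint config. This is only a sketch: `:my_app`, `MyAppWeb.Endpoint`, and `example.com` are placeholders, and which config file it belongs in depends on how your app is set up.

```elixir
# config/runtime.exs or config/prod.exs (placeholder app and host names)
import Config

config :my_app, MyAppWeb.Endpoint,
  # The host Phoenix uses to build URLs and, by default, to check origins.
  url: [host: "example.com", port: 443, scheme: "https"],
  # Optional explicit allow-list, in the format shown in the log message.
  check_origin: ["https://example.com"]
```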

How do we keep stubbing our toes on this?

As a software agency, we start a lot of projects from scratch, and as Elixir fans, those that are web apps are going to be Phoenix apps. Before launch, the production environment is hosted on a temporary domain, whether that’s provided by our hosting service or a custom one we configure. At launch time, we do all the usual DNS and app config updates, test thoroughly on the shiny new production domain, and Revel in Victory when we’re sure it’s all working.

Two weeks later, after no hiccups or complaints, we’re suddenly getting a flurry of these errors in our logs! “Hey, I’m seeing all these socket errors but things seem fine to me… Is prod working for you?”

Why does the Phoenix.Socket check_origin error happen?

The simple thing that causes these errors on a site whose endpoint config is 100% correct is that somebody’s visiting it from the old domain! They bookmarked it back when we were testing it before launch, or clicked the helpful link in the hosting provider’s admin panel, and now their LiveView is totally broken. (It was even more insidious on pre-LiveView client-side Phoenix Channels integrations, when only a small part of the page might be broken!)

So what’s the right fix here? Usually for us, it’s to add a little plug to our browser pipeline that redirects anybody visiting on the wrong domain over to the correct one. It’s an easy bit of code, but somebody has to remember to add it to every new app that needs it.

We usually use something like this:

defmodule MyAppWeb.DomainRedirect do
  @moduledoc """
  If the hostname of the current request doesn't match the one configured
  on the endpoint, redirect... on GET requests only
  """
  @behaviour Plug

  import Plug.Conn

  alias Phoenix.Controller
  alias MyAppWeb.Endpoint

  # Evaluated at compile time: active by default only in :prod builds.
  @default_opts %{active: Mix.env() == :prod}

  @impl Plug
  def init(opts) do
    # Options passed to `plug` (e.g. `active: true`) override the defaults.
    Map.merge(@default_opts, Map.new(opts))
  end

  @impl Plug
  def call(%{method: "GET"} = conn, %{active: true}) do
    req_host = conn.host
    %{host: endpoint_host} = Endpoint.struct_url()

    if req_host == endpoint_host do
      conn
    else
      # Keep the path and query string; swap in the configured scheme/host/port.
      redirect_url = Endpoint.url() <> path(conn)

      conn
      |> Controller.redirect(external: redirect_url)
      |> halt()
    end
  end

  # Non-GET requests, and any request when the plug is inactive, pass through.
  def call(conn, _), do: conn

  defp path(%{request_path: path, query_string: ""}), do: path

  defp path(%{request_path: path, query_string: query}) do
    "#{path}?#{query}"
  end
end
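
Wiring it in is a one-liner in the router. The excerpt below is a sketch that assumes a fairly standard browser pipeline; the other plugs in yours will vary, and putting the redirect first is just one reasonable choice.

```elixir
# lib/my_app_web/router.ex (excerpt; your pipeline's other plugs will vary)
pipeline :browser do
  # Redirect before doing any other work for a request on the wrong host.
  plug MyAppWeb.DomainRedirect

  plug :accepts, ["html"]
  plug :fetch_session
  plug :protect_from_forgery
  # ...the rest of your existing plugs
end
```

Because `init/1` merges whatever options you pass over the defaults, `plug MyAppWeb.DomainRedirect, active: true` turns the redirect on outside of prod builds, which is handy for trying it out locally.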

And that’s it, our Phoenix.Socket check_origin error is fixed! With a simple redirect plug like this in your browser pipeline, you know your visitors are using the URLs you expect, and that their legitimate WebSocket connections will be accepted by the server. If you’re using Phoenix sockets from a separate client-side application or doing multi-tenancy across multiple subdomains, you’ll need something slightly more sophisticated, of course, but this approach has worked for us in most cases.
