Revelry

AI-Driven Custom Software Development

Revelry image chatgpt image apr 4 2025 03 46 59 pm

Preventing Race Conditions in Ecto with PostgreSQL Triggers

At Revelry, we’ve built all sorts of software for our partners over the years, which means we’ve encountered (and solved) a wide variety of problems. One persistent challenge? Race conditions — those pesky situations where multiple processes try to modify the same data simultaneously. At Revelry, we use Elixir and Ecto for our backend systems, which gives us powerful tools for managing data integrity. Still, race conditions are notorious for causing bugs that are hard to reproduce and debug, often only surfacing in production under heavy load.

We recently tackled an interesting race condition while building a text messaging platform. The platform lets organizations manage large-scale texting campaigns where live agents respond to incoming messages through a LiveView-powered inbox. Our challenge was making sure two agents couldn’t accidentally claim and respond to the same conversation.

Without proper safeguards, here’s what could happen:

  1. Agent A checks if a conversation is available
  2. Agent B checks the same conversation
  3. Both agents claim it and start responding
  4. Customer receives duplicate or conflicting messages – not great!

While there are several ways to prevent this, we found a particularly elegant solution by combining PostgreSQL triggers with Ecto constraints. This approach lets us enforce our business rules at the database level while keeping our Elixir code clean and predictable. Here’s how we implemented it.

The Problem: Race Conditions in Conversation Assignment

When an agent claims a conversation, we set a texter_id on the conversation record. However, without proper constraints, multiple agents could successfully claim the same conversation at nearly the same time, leading to confusion and duplicate responses.

Why Database-Level Constraints Matter

Race conditions are particularly challenging because they’re timing-dependent and often invisible during development. Application-level solutions frequently suffer from Time of Check to Time of Use (TOCTOU) vulnerabilities. Consider this naive approach:


def claim_conversation(conversation_id, texter_id) do
  conversation = Repo.get!(Conversation, conversation_id)

  if is_nil(conversation.texter_id) do
    conversation
    |> Conversation.changeset(%{texter_id: texter_id})
    |> Repo.update()
  end
end

This code has a fundamental flaw: the time between checking texter_id and performing the update creates a window where multiple processes could pass the check and attempt updates. Database constraints eliminate this vulnerability by handling concurrency at the transaction level.

Requirements

A conversation’s texter_id cannot be changed from one non-NULL value to another non-NULL value, but it must be possible to set it back to NULL from any state.

The Solution: Custom PostgreSQL Trigger

Standard PostgreSQL constraints couldn’t handle these requirements, so we implemented a custom trigger:

CREATE OR REPLACE FUNCTION check_texter_update()
RETURNS TRIGGER AS $$
BEGIN
  IF OLD.texter_id IS NOT NULL
     AND NEW.texter_id IS NOT NULL
     AND OLD.texter_id != NEW.texter_id THEN
      RAISE EXCEPTION 'Conversation is already assigned to a texter.'
      USING ERRCODE = 'check_violation',
            CONSTRAINT = 'conversation_texter_assignment_constraint';
  END IF;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

The migration to implement this:

defmodule MyApp.Repo.Migrations.AddTexterAssignmentConstraintTrigger do
  use Ecto.Migration

  def up do
    execute """
    CREATE OR REPLACE FUNCTION check_texter_update()
    RETURNS TRIGGER AS $$
    BEGIN
      IF OLD.texter_id IS NOT NULL
         AND NEW.texter_id IS NOT NULL
         AND OLD.texter_id != NEW.texter_id THEN
          RAISE EXCEPTION 'Conversation is already assigned to a texter.'
          USING ERRCODE = 'check_violation',
                CONSTRAINT = 'conversation_texter_assignment_constraint';
      END IF;

      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """

    execute """
    CREATE TRIGGER conversation_texter_assignment_constraint
    BEFORE UPDATE ON conversations
    FOR EACH ROW
    EXECUTE FUNCTION check_texter_update();
    """
  end

  def down do
    execute "DROP TRIGGER IF EXISTS conversation_texter_assignment_constraint ON conversations"
    execute "DROP FUNCTION IF EXISTS check_texter_update CASCADE"
  end
end

Integration with Phoenix and Ecto

The trigger integrates with our application through an Ecto changeset constraint:

def changeset(conversation, attrs) do
  conversation
  |> cast(attrs, [:texter_id, ...])
  |> check_constraint(:texter_id,
    name: :conversation_texter_assignment_constraint,
    message: "This conversation is already checked out"
  )
end

And the LiveView handles assignment attempts:

def handle_event("claim_conversation", %{"id" => conversation_id}, socket) do
  case Conversations.assign_to_texter(conversation_id, socket.assigns.current_user_id) do
    {:ok, conversation} ->
      {:noreply,
       socket
       |> assign(:current_conversation, conversation)
       |> put_flash(:info, "Conversation claimed successfully")}

    {:error, _changeset} ->
      {:noreply,
       socket
       |> put_flash(:error, "This conversation is already checked out")}
  end
end

Understanding the Trigger-to-Ecto Connection

The trigger’s exception is structured specifically to work with Ecto’s constraint system:

RAISE EXCEPTION 'Conversation is already assigned to a texter.'
USING ERRCODE = 'check_violation',
      CONSTRAINT = 'conversation_texter_assignment_constraint';

When PostgreSQL raises this exception, Postgrex (the PostgreSQL driver for Elixir) transforms the error code into the atom :check_violation. Ecto’s PostgreSQL adapter then matches on this atom to convert the database error into an Ecto constraint error:

def to_constraints(
  %Postgrex.Error{postgres: %{code: :check_violation, constraint: constraint}},
  _opts
),
do: [check: constraint]

from Ecto’s PostgreSQL adapter within ecto_sql

This connection between PostgreSQL’s error system and Ecto’s constraint system allows our database-level rules to seamlessly integrate with our application logic.

Trade-offs and Considerations

While triggers effectively solve our race condition, they come with important considerations:

1. Visibility and Documentation

Database triggers operate behind the scenes and can be easily overlooked since they only appear in migration files by default. To maintain transparency:

  • Document triggers thoroughly in schemas, changesets, and project documentation
  • Create Architectural Decision Records (ADRs) to explain the reasoning behind choosing triggers over application-level solutions. At Revelry, we’ve found ADRs particularly helpful in documenting why we chose what might look like over-engineered dark magic instead of simple application logic (spoiler: it’s because race conditions are nasty!)
  • Include clear comments in migrations explaining trigger behavior and purpose

2. Debugging Complexity

Trigger-related issues can be challenging to diagnose because they execute at the database level. To mitigate this:

  • Use consistent, descriptive naming conventions that connect triggers to their corresponding Ecto constraints
  • Include comprehensive trigger testing in your test suite, especially for concurrent operations
  • Log trigger actions appropriately for debugging purposes

3. Maintenance Considerations

Changes to triggers require careful planning and migration management. You’ll want to treat triggers as integral parts of your schema design and consider the impact on existing data when making modifications.

Speaking of trigger management, here’s a fun story: I remember a Phoenix project where we went all-in on triggers and ended up hacking together a custom mix task for generating trigger migrations. The cool (and admittedly hacky) part was that it would create a symlink for each generated trigger migration to a priv/triggers directory, with some fancy smart versioning to handle trigger updates. Now, this is probably overkill if you just need one or two triggers for specific reasons, but in that case we were managing a bunch of triggers that were changing often, so it made sense. The take-away here is: find a good way to document and manage database triggers if you’re going to make use of them.

While triggers might seem like overengineering compared to application-level solutions, they provide a robust defense against race conditions by enforcing rules at the database level. The key is to implement them thoughtfully and maintain clear documentation about their purpose and behavior.

Conclusion

Race conditions are an inevitable challenge in concurrent systems. While it’s tempting to handle these scenarios at the application level, database constraints provide a more robust solution by leveraging PostgreSQL’s built-in concurrency controls.

The combination of PostgreSQL triggers and Ecto constraints allows us to implement complex state transition rules that are enforced consistently across all database operations. When building systems where multiple users interact with the same data simultaneously, identifying and addressing potential race conditions early in the development process is crucial. Taking the time to implement proper database-level constraints might require more upfront effort, but it prevents subtle bugs that could be much more costly to diagnose and fix later.