Revelry


Today we’re excited to announce the open-source release of Delta Query! This library lets Elixir developers query Delta Sharing sources, and handles the filtering and parsing of Parquet files so you can just get on with working with your data.

You may have read my recent post about how NimbleParsec helped me build a predicate parser to filter Parquet files. This library is an attempt to wrap that up into a tool that should help you access data via Delta Sharing with less hassle.

What is Delta Sharing?

Delta Sharing is an open protocol for secure data sharing across organizations. It’s popular for sharing large datasets without copying data or managing complex access controls. Companies use it to share analytics data, collaborate on datasets, and build data pipelines that span organizational boundaries. The protocol is straightforward: you query a Delta Sharing server, it returns metadata about Parquet files, you download those files, parse them, and work with the data. Simples!

Well…it can be.

Whenever I’ve run across Delta Sharing data retrieval in the past, I’ve been much more likely to reach for Python than Elixir. After all, the Delta Lake framework spec and their Delta Sharing protocol seem to be explicitly built for it. It might be exhausting to slog through Python code like every other non-Pythonista (yeah, the AI can write it, but I still have to read it), but I can do it and be fairly confident everything is going to work as intended.

But sometimes, you just want to write some nice, clean, Elixir. Sometimes, you get tired of running serverless functions, and you just wish you could get your data first-hand, without having to transform it or pass it across various boundaries.

Why another library?

There is already an Elixir Delta Sharing client: instadeq/elixir-delta-sharing-client. It seems to be a solid (if dusty) library, and its stated goal is to “implement the whole (delta sharing) protocol”. And it works great!

But that’s not really the question this library was built to address: I don’t need anything beyond the POST .../query endpoint, and ideally, I’d like to get back Elixir data structures and not columnar Parquet files. All I want is to be able to query a table, and get back something I can use immediately in Elixir.

So DeltaQuery is intentionally opinionated and limited in scope. The idea is to abstract away some of the more esoteric things that you need to worry about when you’re working with this type of data, and make it easy.

The Problem, summed up:

Here’s what you’d need to do to query a Delta Sharing table in Elixir without DeltaQuery:

  1. Use elixir-delta-sharing-client or:
    • Implement a Delta Sharing protocol API (that passes along server-side hints)
    • Parse newline-delimited JSON
    • Download Parquet files
  2. Parse Parquet data
  3. Apply client-side filters and transformations
  4. Convert DataFrames to Elixir data structures
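
To give a feel for the middle of that list, here is a minimal sketch of the newline-delimited JSON step. It assumes Elixir 1.18+ for the built-in `JSON` module (Jason would work the same way on older versions). The `NdjsonSketch` module name and the sample payload are made up for illustration; the `protocol`, `metaData`, and `file` wrapper keys follow the Delta Sharing protocol's query response. This is not DeltaQuery's actual implementation:

```elixir
defmodule NdjsonSketch do
  @moduledoc "Hypothetical helper: splits a Delta Sharing query response into its parts."

  # Decode each non-empty line of the response body as its own JSON document.
  def parse(body) do
    body
    |> String.split("\n", trim: true)
    |> Enum.map(&JSON.decode!/1)
  end

  # Keep only the "file" entries and return their download URLs.
  def file_urls(lines) do
    for %{"file" => %{"url" => url}} <- lines, do: url
  end
end

# Made-up response body, shaped like the protocol's query response.
body = """
{"protocol":{"minReaderVersion":1}}
{"metaData":{"id":"t1","format":{"provider":"parquet"}}}
{"file":{"url":"https://example.com/part-0.parquet","id":"f0","partitionValues":{"library_id":"100"},"size":1024}}
{"file":{"url":"https://example.com/part-1.parquet","id":"f1","partitionValues":{"library_id":"200"},"size":2048}}
"""

urls = body |> NdjsonSketch.parse() |> NdjsonSketch.file_urls()
```

And that only gets you a list of URLs: each one still has to be downloaded and parsed as Parquet before you see a single row.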

Enter DeltaQuery

The idea of DeltaQuery is to handle all of that complexity for you. Here’s what the same query looks like:

"books"
|> DeltaQuery.query()
|> DeltaQuery.where("library_id = 100")
|> DeltaQuery.select(["book_id", "title", "author"])
|> DeltaQuery.limit(100)
|> DeltaQuery.execute!()
|> DeltaQuery.to_rows()

At least, that’s the idea. I wanted it to be easy to read through the queries and know exactly what you’re getting. And I wanted it to be pretty.

SQL-Like Filtering

Write filters using SQL-like predicates. DeltaQuery handles both server-side partition filtering (reducing the files you download) and client-side row filtering. Under the hood, these predicates are parsed using NimbleParsec for robust, extensible parsing with clear error messages:

"books"
|> DeltaQuery.query()
|> DeltaQuery.where("book_id = 123")
|> DeltaQuery.where("genre = 'Fiction'")
|> DeltaQuery.where("publication_date > '2024-01-01'")
|> DeltaQuery.execute!()
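
The partition-filtering half of that is worth a quick illustration. Below is a minimal sketch of the idea, assuming each file entry in the response carries a `partitionValues` map of strings (as the Delta Sharing protocol describes) and that the filter is a simple equality. The `PartitionPruneSketch` module and its `prune/3` function are hypothetical and are not how DeltaQuery is implemented:

```elixir
defmodule PartitionPruneSketch do
  # Keep files whose partition value for `column` equals `value`.
  # Files without that partition column are kept, since they can't be ruled out.
  def prune(files, column, value) do
    Enum.filter(files, fn %{"partitionValues" => parts} ->
      case Map.fetch(parts, column) do
        {:ok, partition_value} -> partition_value == to_string(value)
        :error -> true
      end
    end)
  end
end

# Made-up file entries; partition values are strings in the protocol.
files = [
  %{"id" => "f0", "partitionValues" => %{"library_id" => "100"}},
  %{"id" => "f1", "partitionValues" => %{"library_id" => "200"}},
  %{"id" => "f2", "partitionValues" => %{}}
]

kept = PartitionPruneSketch.prune(files, "library_id", 100)
kept_ids = Enum.map(kept, & &1["id"])
```

Pruning like this only decides which files get downloaded. The rows inside the surviving files still need the client-side filter, because a file can contain rows that don't match.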

Fast local processing (Explorer + Polars)

Delta Sharing gives you Parquet files. Parquet is columnar, and while that’s a good thing for a lot of reasons, understanding it and parsing it might give you a headache.

Luckily, we have Explorer, which uses Polars under the hood (so you know it’s good). In simple-ish terms, that means once the Parquet bytes are downloaded, local filtering, joins, and aggregations are handled by a vectorized, columnar runtime.
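
If “columnar” is an unfamiliar idea, a toy illustration in plain Elixir may help. A columnar layout stores one list per column, while the rows you usually want are one map per record. The `ColumnarSketch` module below is hypothetical and only shows the shape of the conversion; it is not how Explorer or DeltaQuery do it:

```elixir
defmodule ColumnarSketch do
  # Turn %{"col" => [v1, v2, ...]} into [%{"col" => v1}, %{"col" => v2}, ...].
  def to_rows(columns) do
    {names, values} = columns |> Map.to_list() |> Enum.unzip()

    # zip_with/2 walks all column lists in lockstep, one record at a time.
    Enum.zip_with(values, fn record ->
      names |> Enum.zip(record) |> Map.new()
    end)
  end
end

# Made-up columnar data: one list per column.
columns = %{
  "book_id" => [1, 2],
  "title" => ["Dune", "Emma"]
}

rows = ColumnarSketch.to_rows(columns)
```

The columnar form is what makes filtering and aggregating a single column cheap; the row form is what most application code wants to consume.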

Joins Across Tables

books = DeltaQuery.query("books") |> DeltaQuery.execute!()
publishers = DeltaQuery.query("publishers") |> DeltaQuery.execute!()

joined = DeltaQuery.join(books, publishers, on: "publisher_id", how: :inner)
DeltaQuery.to_rows(joined)

Post-Query Operations

Apply additional filters, search text, or aggregate data after fetching:

results = DeltaQuery.query("books") |> DeltaQuery.execute!()

# Additional filtering
{:ok, filtered} = DeltaQuery.filter(results, ["page_count > 300"])

# Text search across columns
{:ok, searched} = DeltaQuery.text_search(results, "science fiction", ["genre", "title"])

# Aggregations
DeltaQuery.aggregate_by_column(results, :status)
# => [%{status: "Approved", count: 42}, %{status: "Pending", count: 18}]

No DataFrame Knowledge Required

The library uses Explorer internally for efficient data processing, but you don’t need to know anything about DataFrames. Everything returns plain Elixir data structures.

That said, if you really want to, you can access the underlying DataFrame:

results = DeltaQuery.query("books") |> DeltaQuery.execute!()
df = results.dataframe  # Access the Explorer DataFrame directly

Tradeoffs

DeltaQuery is not trying to be a full Delta Sharing protocol implementation. If you need:

  • table metadata/version endpoints
  • pagination helpers
  • a client shaped around the full protocol surface

…you should probably start with instadeq/elixir-delta-sharing-client.

On the other hand, if you’re building an Elixir application and are quietly sobbing “just give me my data” while columns slither around your screen in Pythonic glee…this might be the library for you.

Expansion and Contribution

This library was laser-focused on simply querying Delta Sharing tables…

…aaand I already sullied this purity by adding the functionality to list schemas, tables and table columns for a given delta sharing endpoint. This was for two reasons: firstly, because this information is essential to the usage of the library, and secondly, because these can be obscure to figure out.

Therefore, please feel free to open issues or pull requests suggesting or implementing new features! While I began working on this with quite a narrow scope in mind, I don’t want to keep it narrow to the point of uselessness. If people find this library helpful and want to augment and/or expand it, I am absolutely open to doing so. I really do appreciate any and all contributions!

And of course, bug fixes/reports are always welcome – well, not *welcome* because it means I wrote bad code but…you know what I mean.

Getting Started

Add DeltaQuery to your `mix.exs`:

def deps do
  [{:delta_query, "~> 0.2.1"}]
end

Configure your Delta Sharing credentials. You have two options here: you can configure via the application environment, or you can create and pass the configuration at runtime:

# Option 1: Application config (config.exs or runtime.exs)

config :delta_query, :config,
  endpoint: System.get_env("DELTA_SHARING_ENDPOINT"),
  bearer_token: System.get_env("DELTA_SHARING_BEARER_TOKEN"),
  share: "my_share",
  schema: "public"

# Option 2: Pass config explicitly at runtime

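# The config is read from the application environment here for brevity;
# a config created at runtime is passed to execute!/2 the same way.
config = Application.get_env(:delta_query, :config)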
"my_table"
|> DeltaQuery.query()
|> DeltaQuery.execute!(config: config)
|> DeltaQuery.to_rows()

With this approach, you can use different credentials to query different delta sharing endpoints within the same application. This is the preferred way to do it – but I’m not going to tell you what to do!

Now go! Query delta sharing tables to your heart’s content – without the hassle, hopefully!

Try It Out

DeltaQuery is available on Hex and open source on GitHub.