
The Importance of Pluggability: Migrating Vectors Between Database Providers

Vector Databases at a Glance

Vector databases are software systems designed to store large amounts of vector data along with other properties related to those vectors. They have seen an uptick in demand since LLMs emerged, because you can use them to reduce LLM hallucinations, which occur when LLMs generate incorrect information. Hallucinations often happen because the correct response to the prompt isn't present in the model's training data. The LLM still has to output something, and since it has no concept of its own correctness, it produces plausible-sounding answers that have no basis in fact. One method for reducing hallucinations is to provide supplemental data to the LLM: data that isn't included in the training data set (e.g., proprietary data) that the model can draw from. You then instruct the LLM to use that information when generating its response.

This method of retrieving data, adding (or augmenting) it to the prompt, and generating output based on that additional information is called Retrieval-Augmented Generation, or RAG. RAG is widely used in LLM-enabled software systems because input prompt sizes are limited by the architecture of LLMs: put too much text in a prompt and the model requires too much memory to generate output, so it either fails to produce a response at all or produces one very slowly. RAG is the workaround for this limitation. It involves finding the parts of your data that are most semantically relevant to your input prompt and adding only those to the prompt, so the model produces output grounded in your proprietary data rather than hallucinations.
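To make the augmentation step concrete, here is a minimal sketch in Elixir (the language of our backend) of how retrieved chunks might be folded into a prompt. The module and function names are illustrative assumptions, not part of any particular library or our actual implementation.

defmodule RagSketch do
  @doc "Combines the user's question with retrieved context chunks into a single prompt."
  def build_prompt(question, retrieved_chunks) do
    # Separate each retrieved chunk so the model can tell them apart.
    context = Enum.join(retrieved_chunks, "\n---\n")

    """
    Answer the question using only the context below.

    Context:
    #{context}

    Question: #{question}
    """
  end
end

# Example usage (hypothetical data):
# RagSketch.build_prompt("What is our refund policy?", retrieved_chunks)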

But how do you determine which parts of your data are most semantically relevant to your prompt? One way is to convert your text into vectors that represent its semantic meaning and then use a technique called similarity search. With similarity search, you can determine which parts of your proprietary data are most semantically similar to your input prompt and return only those data points.
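As an illustration of the idea (not how a production vector database implements it; real systems rely on optimized indexes rather than scanning every entry), here is a small Elixir sketch that scores stored entries against a query embedding using cosine similarity and keeps the closest matches. All names here are hypothetical.

defmodule SimilaritySketch do
  # Cosine similarity between two embedding vectors of equal length.
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    dot / (norm.(a) * norm.(b))
  end

  # Returns the k stored entries most similar to the query embedding.
  # entries is a list of {text, embedding} tuples.
  def top_k(query_embedding, entries, k) do
    entries
    |> Enum.sort_by(fn {_text, embedding} -> cosine(query_embedding, embedding) end, :desc)
    |> Enum.take(k)
  end
end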

This is where a vector database fits into an LLM-enabled software system. To perform RAG successfully, you need a tool that can efficiently store, manage, and compare the vectors that represent your proprietary data. Vector databases are optimized for exactly this purpose.

Exploring Vector Database Options

There are many options when it comes to vector databases and cloud-hosted vector database providers. They run the gamut from open-source, on-prem-enabled solutions to closed-source, cloud-hosted proprietary systems, and they vary widely in popularity and maturity.

At Revelry, we initially chose Pinecone, a closed-source, cloud-hosted provider, for our vector database needs. The thinking was that we were building fast on a small team and didn't want the unnecessary technical overhead of maintaining the infrastructure and technology of a subsystem of our product. We wanted to defer to the experts in building, managing, and monitoring vector databases: simply send our vectors in via HTTP requests and pull out the semantic search results when we needed to perform RAG.

Down the line, after implementing the initial iteration of the system, we considered other vector database providers. One in particular, Qdrant, was attractive for several reasons: it is open source, it can run on-prem (with a managed, cloud-hosted option available), and it emphasizes speed and storage optimizations.

Because LLMs tend to take a long time to produce outputs (especially compared with the seemingly instantaneous response times that high-speed internet provides for loading many websites), any speed gains available in the flow from composing a prompt to sending it to the LLM are crucial. And as a small operation, lower storage overhead means less expensive resource requirements and overall cost savings for our team.

We decided we wanted to make the switch, but we already had thousands of vectors representing our data stored with our initial, cloud-hosted provider. To give our users a seamless transition between database providers, we needed a way to migrate all of our data from one vector database to another, and we needed our system to support both vector databases at the same time. We couldn't afford a drawn-out process of pulling in all the existing data, storing it indefinitely while new vector data was still being created, and then transferring it to the new provider; the migration had to be quick and seamless.

The Value of Pluggability

Because of these concerns, our application was purposefully built with pluggability in mind. We want all of our LLM infrastructure (from models, to vector databases, to data sources) to be as modular and pluggable as possible, so users have as many options as they need to configure their LLM-enabled workflows.

Our backend application is written in Elixir, and in it we have modules that interface with each vector database provider we support. Right now, we support both Pinecone and Qdrant as vector database providers, and we have data migration tasks that can copy data from one database to another as needed. If we wanted to adopt a new vector database provider, it would be as simple as building a module to support it and running a migration from one of the up-to-date databases to the new provider.
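A simplified sketch of what this pluggable pattern can look like in Elixir is shown below. The module names, callbacks, and config key are illustrative assumptions rather than our actual code: each provider gets an adapter module that implements a shared behaviour, and callers go through a single dispatch module that reads the active adapter from application configuration.

defmodule MyApp.VectorStore do
  @moduledoc "Behaviour that every vector database adapter implements."

  @callback upsert(collection :: String.t(), vectors :: [map()]) :: :ok | {:error, term()}
  @callback query(collection :: String.t(), embedding :: [float()], top_k :: pos_integer()) ::
              {:ok, [map()]} | {:error, term()}

  # The active adapter (e.g. MyApp.VectorStore.Pinecone or MyApp.VectorStore.Qdrant,
  # each wrapping that provider's HTTP API) is chosen via application config,
  # so callers never reference a provider directly.
  defp adapter, do: Application.fetch_env!(:my_app, :vector_store_adapter)

  def upsert(collection, vectors), do: adapter().upsert(collection, vectors)
  def query(collection, embedding, top_k), do: adapter().query(collection, embedding, top_k)
end

Adding a new provider then means writing one more adapter module that implements the behaviour; nothing else in the application has to change.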

The migration moved all of our vectors from Pinecone to Qdrant in about three minutes, with the majority of that time spent sending and receiving HTTP requests; transforming the data from a Pinecone-shaped vector to a Qdrant-shaped vector in Elixir was nearly instantaneous. We processed the vectors in batches of 100 to minimize the total number of HTTP requests while preventing the timeouts that could occur if we sent too many vectors at once. The three-minute migration was short enough that our users didn't experience any significant delay in application functionality. Without a pluggable architecture, the migration would have taken much longer.
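As a rough illustration (not our exact migration task), a batched copy along these lines might look like the following, assuming a hypothetical stream_vectors/1 that lazily pages vectors out of Pinecone and the upsert/2 callback sketched above.

defmodule MyApp.VectorMigration do
  @moduledoc "Hypothetical sketch of copying every vector from Pinecone to Qdrant."

  @batch_size 100

  alias MyApp.VectorStore.{Pinecone, Qdrant}

  def run(collection) do
    collection
    |> Pinecone.stream_vectors()         # hypothetical: lazily pages vectors out of Pinecone
    |> Stream.map(&to_qdrant_point/1)    # reshape each Pinecone record into a Qdrant point
    |> Stream.chunk_every(@batch_size)   # send 100 points per request to avoid timeouts
    |> Enum.each(fn batch -> Qdrant.upsert(collection, batch) end)
  end

  # Pinecone stores vectors as id/values/metadata; Qdrant expects id/vector/payload.
  defp to_qdrant_point(%{id: id, values: values, metadata: metadata}) do
    %{id: id, vector: values, payload: metadata}
  end
end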

By investing in an adaptable infrastructure, we can ensure our systems remain at the forefront of LLM-enabled software development while minimizing disruption to our users. To any development teams joining us on this journey of pushing the boundaries of LLM-enabled software, we strongly encourage you to architect your application with pluggability and modularity in mind.

We're building an AI-powered Product Operations Cloud, leveraging AI in almost every aspect of the software delivery lifecycle. Want to test drive it with us? Join the ProdOps party at ProdOps.ai.