Can Pinecone and Other Vector Databases Prevent LLM Hallucinations?

To a large extent, it appears so, though not completely.

In July, a group of software engineers and strategists from Revelry attended Pinecone’s AI Transformation Summit, a one-day conference featuring many of the partners and projects that use vector databases at their core.

The conference tagline was “AI Transformation without Hallucination” – and it’s certainly an important topic, because as we become increasingly reliant on the generated output of Large Language Models (LLMs), inaccurate outputs are already having real-world consequences.

Though LLMs appear to exhibit human (or superhuman) levels of reasoning, at their core most are probabilistic, autoregressive models designed to predict the next token: a word, a punctuation mark, or sometimes part of a word. They do this by looking at three things:

  • The system prompt: Instructions written by the model’s designers, which are generally hidden from the user.
  • The user prompt: The text the user has input.
  • All previously generated tokens: The output the model has produced so far in the current response.
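To make the autoregressive loop concrete, here is a minimal sketch in Python. It is a toy, not how any real LLM is implemented: the `NEXT_TOKEN_PROBS` table, the `generate` function, and the prompts are all made up for illustration, and the toy only looks at the most recent token where a real model would condition on the whole context.

```python
import random

# Hypothetical toy "model": for each token, the probabilities of what comes next.
# A real LLM computes this distribution from the entire context with a neural
# network; this lookup table only looks at the most recent token.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 1.0},
    "the": {"boots": 0.6, "sandals": 0.4},
    "boots": {"are": 1.0},
    "sandals": {"are": 1.0},
    "are": {"waterproof": 0.7, "edible": 0.3},
    "waterproof": {"<end>": 1.0},
    "edible": {"<end>": 1.0},
}


def generate(system_prompt, user_prompt, rng, max_tokens=10):
    """Autoregressive loop: sample one token at a time and feed it back in."""
    # The three inputs described above, joined into a single context.
    context = system_prompt.split() + user_prompt.split() + ["<start>"]
    generated = []
    for _ in range(max_tokens):
        distribution = NEXT_TOKEN_PROBS[context[-1]]
        tokens = list(distribution)
        weights = list(distribution.values())
        next_token = rng.choices(tokens, weights=weights, k=1)[0]
        if next_token == "<end>":
            break
        generated.append(next_token)
        context.append(next_token)  # the new token becomes part of the input
    return generated


rng = random.Random(0)  # fixed seed so the run is repeatable
print(" ".join(generate("You are a helpful shop assistant.",
                        "Tell me about footwear.", rng)))
```

Notice that "edible" is a perfectly legal next token in this toy table. Nothing in the sampling step checks whether the sentence is true, only whether it is likely.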

Though LLMs have been trained on massive amounts of text data, they can’t look anything up while they are generating their outputs (a process called inference); they rely only on patterns learned during training and the text in front of them. This leads to hallucinations, where the model predicts plausible-sounding tokens that state incorrect or fabricated information.

Pinecone and other vector databases say they have the answer to this, and to some extent they do. Their answer is a technique called Retrieval Augmented Generation (RAG). With RAG, the LLM generates a response based on the user’s prompt combined with relevant information retrieved from a vector database. If you ask a question of an LLM, and also give it accurate information to answer that question, the chance the LLM will hallucinate a response is extremely low. However, it’s still not zero, because the model could ignore the context, and in any probabilistic model involving randomness, unlikely outputs are still possible.

Vector databases let you run similarity search and retrieval across your text-based data. I covered vector embeddings and semantic search in more detail in a previous post, but the important thing to know is that you can compare a piece of input text, such as a user question, with information you have previously stored in the vector database, and rank the stored items by semantic similarity, that is, by how close they are in “meaning” rather than by shared keywords.
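As a rough sketch of what similarity search means, the Python below ranks stored texts by cosine similarity to a query vector. The three-dimensional vectors are hand-assigned stand-ins (real embeddings come from an embedding model and have hundreds or thousands of dimensions), and `cosine_similarity`, `top_k`, and `STORED` are hypothetical names for this example, not part of Pinecone's API. Cosine similarity is also only one common choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical hand-made "embeddings"; the dimensions loosely stand for
# [account topics, shipping topics, returns topics].
STORED = {
    "Resetting your password": [1.0, 0.0, 0.1],
    "Shipping times": [0.0, 1.0, 0.2],
    "Return policy": [0.1, 0.3, 1.0],
}

# Stand-in vector for a question like "I forgot my login".
query_vector = [0.9, 0.1, 0.0]
print(top_k(query_vector, STORED, k=1))
```

The query shares no keywords with "Resetting your password", but its vector points in nearly the same direction, so that entry ranks first. A vector database does the same kind of comparison, just with indexes built to keep it fast over millions of vectors.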

For example, let’s say you’re an outdoor retailer, and a user asks your website’s chat interface, “What shoes do you have that are good for water?” An LLM on its own is going to be pretty useless here: it may be able to give generic information, but it doesn’t know anything about your product catalog or what brands you carry, and it will be prone to hallucinating results. However, if we turn the user’s question into a vector and use it to retrieve relevant results from the product catalog in the vector database, we can send the user’s question along with a list of products and their descriptions to the LLM. The LLM becomes far more effective at answering, because it has the real information in front of it. Better yet, even though the user asked about “shoes”, semantic similarity means results for hiking boots, sandals, etc. will also be included, even though they don’t match the keyword directly.
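Here is a minimal sketch of the last step in that flow: combining the user’s question with the retrieved products into a single prompt. It assumes the retrieval has already happened. The product names and descriptions, the `build_rag_prompt` function, and the instruction wording are all invented for illustration, and the call that actually sends the prompt to an LLM is left out.

```python
def build_rag_prompt(question, products):
    """Combine retrieved catalog entries and the user's question into one prompt."""
    product_lines = "\n".join(
        f"- {product['name']}: {product['description']}" for product in products
    )
    return (
        "Answer the customer's question using only the products listed below.\n"
        "If none of them fit, say so instead of guessing.\n\n"
        f"Products:\n{product_lines}\n\n"
        f"Question: {question}"
    )


# Hypothetical results, as if returned by a vector database query.
retrieved_products = [
    {"name": "Ridgeline Hiking Boot",
     "description": "Waterproof leather boot for wet trails."},
    {"name": "Creekside Sandal",
     "description": "Quick-drying sandal with a grippy sole for river crossings."},
]

prompt = build_rag_prompt(
    "What shoes do you have that are good for water?", retrieved_products
)
print(prompt)
```

Telling the model to use only the listed products, and to say when nothing fits, is one common way to nudge it toward the supplied context. As noted earlier, it lowers the chance of a hallucinated answer but doesn’t remove it.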

How close Retrieval Augmented Generation comes to eliminating hallucinations is still an area of active research, but a recent paper found that using the RAG technique made it possible to eliminate about 98 percent of them. In high-stakes problem domains like medicine or the law, that remaining 2 percent still presents a significant problem. For many other use cases though, 98 percent is more than good enough.

Want help building AI-powered products? Get in touch with us!