In a recent article, I shared Revelry’s journey into building with generative AI. As a brief recap: since 2013, Revelry has been at the forefront of custom software development, consistently integrating cutting-edge technology. Fast forward to late 2022: with the release of GPT-3.5 and the rising prominence of ChatGPT, it became clear that the time had come to dive into this powerful new technology.
In my previous post, I detailed our first experiment: StoryBot, a CLI tool for generating user stories, built in Node with LangChain.js. While this was an exciting initial endeavor, it was merely the beginning of our journey.
Initially, we found it straightforward to connect LangChain to an LLM to generate content. However, this approach was quite basic: we were essentially sending hard-coded prompts, enhanced with user inputs, directly to OpenAI’s GPT-3.5-turbo.
Recognizing the need for more sophisticated methods, we turned our attention to what is now widely known as Retrieval Augmented Generation (RAG). RAG significantly enhances content generation by incorporating relevant information that may not be present in the model’s training data. In its simplest form, RAG involves retrieving pertinent data and injecting it into the prompt before it’s sent to the LLM. This additional context enables the LLM to generate responses that are more accurate and contextually relevant.
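To make that concrete, here is a minimal sketch of what “injecting retrieved data into the prompt” can look like. The document shape, function name, and prompt wording are illustrative, not our actual implementation:

```typescript
// Illustrative sketch only: assemble a RAG prompt from retrieved documents.
type RetrievedDoc = { content: string; source: string };

function buildRagPrompt(question: string, docs: RetrievedDoc[]): string {
  // Join the retrieved documents into a single context block.
  const context = docs
    .map((doc) => `Source: ${doc.source}\n${doc.content}`)
    .join("\n---\n");

  // The LLM sees the retrieved context alongside the user's question.
  return [
    "Answer the question using only the context below.",
    `Context:\n${context}`,
    `Question: ${question}`,
  ].join("\n\n");
}
```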
With this advanced technique in mind, we embarked on our second experimental project: RevBot. RevBot is a bot designed to answer questions about Revelry by leveraging our internal company playbook. Let’s delve into what went into building RevBot and how this approach has furthered our journey into generative AI.
RevBot: A RAG Prototype Built with LangChain
Given how easy it was to get up and running with LangChain during our StoryBot experiment, we decided to continue down this track. This time, however, we aimed to experiment with some of the more advanced features provided by LangChain. One feature that immediately caught our interest was Chains, described in the documentation as “sequences of calls – whether to an LLM, a tool, or a data preprocessing step.”
Exploring Chains and Vector Databases
LangChain provides several out-of-the-box chains tailored for various solutions. For our purposes, the obvious choice was the VectorDBQAChain (which is now deprecated in favor of createRetrievalChain). This chain facilitated querying an external vector database using semantic search and piping those relevant results into the prompt sent to the LLM.
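In the LangChain.js versions we were working with, basic usage of this chain looked roughly like the sketch below. Import paths and option names have shifted across releases since then, and `vectorStore` is assumed to be an already-populated store (see the population sketch later in this post):

```typescript
import { ChatOpenAI } from "langchain/chat_models/openai";
import { VectorDBQAChain } from "langchain/chains";
import type { VectorStore } from "langchain/vectorstores/base";

export async function answerQuestion(question: string, vectorStore: VectorStore) {
  const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });

  const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
    k: 4, // how many document chunks to retrieve via semantic search
    returnSourceDocuments: true,
  });

  // The chain embeds the question, runs the similarity search, stuffs the
  // retrieved chunks into its built-in QA prompt, and calls the LLM.
  const { text, sourceDocuments } = await chain.call({ query: question });
  return { answer: text, sources: sourceDocuments };
}
```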
Semantic search, at its core, leverages vector embeddings and cosine similarity:
- Vector Embeddings: Text data is transformed into high-dimensional vectors, which capture the semantic meaning of the text. These embeddings are generated by pre-trained models like OpenAI’s embeddings or other NLP (Natural Language Processing) models.
- Cosine Similarity: Once text data is converted into vectors, semantic similarity between queries and documents is measured using cosine similarity. Cosine similarity calculates the cosine of the angle between two vectors, effectively comparing their orientation in the vector space. Scores closer to 1 indicate higher similarity, whereas those closer to 0 indicate lower similarity.
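As a concrete illustration, cosine similarity is just the dot product of two vectors divided by the product of their magnitudes. The tiny vectors below are made up for the example; real embedding models return much larger vectors (OpenAI’s text-embedding-ada-002, for instance, returns 1,536 dimensions):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing in nearly the same direction score close to 1;
// dissimilar vectors score much lower.
cosineSimilarity([0.2, 0.8, 0.1], [0.25, 0.75, 0.05]); // ≈ 0.99, very similar
cosineSimilarity([0.2, 0.8, 0.1], [0.9, 0.05, 0.4]);   // ≈ 0.32, not very similar
```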
This technique makes it possible to get accurate answers to questions the LLM could not answer from its training data alone. It works by retrieving semantically relevant information from the vector database and injecting it into the prompt, giving the LLM the additional context it needs to generate a response.
Here’s a basic flow of how this works:
- A query (or user question) is converted into a vector embedding.
- The vector database (populated with document embeddings) is queried using cosine similarity to find the indexed document(s) most semantically similar to the query.
- These documents are then used to enrich the prompt sent to the LLM, enabling it to generate a more accurate and context-aware response.
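Stripped of any framework, that flow looks something like the sketch below. The OpenAI calls reflect their current Node SDK; `VectorSearch` is a stand-in for whatever vector database client you use:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stand-in for your vector database client (Pinecone, pgvector, etc.).
type VectorSearch = (
  vector: number[],
  topK: number
) => Promise<{ text: string; source: string }[]>;

export async function ragAnswer(question: string, search: VectorSearch): Promise<string> {
  // 1. Convert the question into a vector embedding.
  const embedding = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });
  const queryVector = embedding.data[0].embedding;

  // 2. Query the vector database for the most semantically similar chunks.
  const matches = await search(queryVector, 4);

  // 3. Enrich the prompt with those chunks and ask the LLM for an answer.
  const context = matches.map((match) => match.text).join("\n---\n");
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "Answer using only the provided context." },
      { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  return completion.choices[0].message.content ?? "";
}
```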
Key Steps for Implementing RevBot
To bring RevBot to life, we followed these high-level steps:
- Populate a Vector Database with Relevant Data
- LangChain offers various “Document Loaders” that extract documents from different data sources. The text in these documents gets “chunked” into semantically meaningful segments, which are then converted into vector embeddings and stored in a vector database for semantic search.
- We used the GithubRepoLoader to transform our company playbook, stored as markdown files in a GitHub repo, into searchable documents in a vector database for our Q&A app (a sketch of this population script appears after this list).
- For the vector database, we opted for Pinecone due to its fully managed solution and ease of use.
- Create a Web Application for User Interaction
- We chose Next.js owing to its compatibility with LangChain.js and its popularity as a modern JS web stack.
- The core of our app is a single page with a form that accepts user input (i.e., the question). This input is passed to the VectorDBQAChain to generate answers based on the playbook data from our vector database.
- Present the Generated Content
- The application then presents the generated answer to the user, along with the source documents retrieved via the semantic search.
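For reference, the one-off script that populated the vector database looked roughly like the following. Class names, import paths, and the Pinecone client setup have all changed across LangChain.js and Pinecone SDK releases, and the repo URL, index name, and chunking parameters here are placeholders:

```typescript
import { GithubRepoLoader } from "langchain/document_loaders/web/github";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { PineconeClient } from "@pinecone-database/pinecone";

async function populate() {
  // 1. Load the playbook's markdown files from GitHub as documents.
  const loader = new GithubRepoLoader("https://github.com/your-org/playbook", {
    branch: "main",
    recursive: true,
  });
  const docs = await loader.load();

  // 2. Chunk the documents into semantically meaningful segments.
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 100,
  });
  const chunks = await splitter.splitDocuments(docs);

  // 3. Embed the chunks and store them in a Pinecone index.
  const client = new PineconeClient();
  await client.init({
    apiKey: process.env.PINECONE_API_KEY!,
    environment: process.env.PINECONE_ENVIRONMENT!,
  });
  const pineconeIndex = client.Index("playbook");

  await PineconeStore.fromDocuments(chunks, new OpenAIEmbeddings(), {
    pineconeIndex,
  });
}

populate().catch(console.error);
```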
Considerations and Future Improvements
As this was a proof of concept for utilizing proprietary data in Retrieval Augmented Generation, we did not address several production-level concerns. For instance:
- Static Data in Vector Database: The data in our vector database is static because it was populated via a one-off script. To reflect changes in our playbook GitHub repo, we would need to manually rerun the script. This could be automated with a GitHub action to update the vector DB whenever a pull request is merged.
Advantages and Limitations of LangChain
Our journey with LangChain began with excitement due to its numerous out-of-the-box features, which made integrating LLMs into applications relatively straightforward. This experimentation led to proofs of concept like StoryBot—a simple CLI for generating user stories—and RevBot, a web app answering questions based on our company handbook.
Advantages of LangChain
- Out-of-the-Box Tooling:
- LangChain’s pre-built features, such as document loaders and QA chains, significantly accelerated our initial development. The ease of integrating these tools allowed us to quickly prototype and iterate on our ideas.
- Document Loaders: These tools extracted documents from various sources and chunked them into semantically meaningful segments. This process is essential for generating accurate vector embeddings used in semantic search.
- QA Chains: The Vector Database QA Chain made it easy to leverage semantic search. These chains come with pre-built prompts for the various steps in the chain, as well as logic to format prompts differently depending on the target LLM. This saved us time, but it also abstracted away much of the process, limiting our control and requiring deep dives to understand and modify the underlying logic.
- Open Source:
- Because LangChain is open source, we can read all of the underlying code, contribute back, and follow the project as it develops.
Limitations of LangChain
- Ecosystem Lock-In:
- One significant drawback was LangChain’s restriction to the Python and JavaScript ecosystems. Both are perfectly workable, but we have a strong preference for Elixir, which offers distinct advantages in scalability and concurrency. This lock-in was not a dealbreaker on its own, but it highlighted another layer of constraint we faced.
- Object-Oriented Architecture:
- LangChain was originally built in Python and later ported to JavaScript, resulting in a very object-oriented architecture with extensive inheritance and abstractions. As enthusiasts of Elixir and functional programming, this approach was not particularly appealing to us.
- Learning: The object-oriented nature made it difficult to understand how things worked under the hood and even more challenging to modify the base logic. The layers of abstraction, while offering convenience, obscured visibility and control over various components.
- Control and Customization:
- LangChain’s abstractions, though powerful, limited our ability to fine-tune specific processes, such as how different types of documents were semantically chunked or how individual steps in a QA chain behaved.
- Learning: This realization highlighted the trade-off: while LangChain offered rapid setup and ease of use, it sacrificed the granular control we needed for more complex and customized applications.
- Frequent API Changes:
- LangChain’s APIs frequently change, causing disruptions. For example, the chain we used for RevBot is now deprecated and will soon no longer be supported. These changes stem from shifts in LangChain’s abstractions rather than from the needs of underlying technologies like LLMs, vector databases, and embedding models. This instability forces us to constantly adapt our codebase even though our fundamental tasks—semantic search, prompt management, and LLM interfacing—remain stable.
Transitioning to Elixir
Given these insights, we decided to set LangChain aside and invest in building an LLM-powered application in Elixir, driven by several compelling reasons:
Why Elixir?
- Scalability:
- Elixir excels at managing real-time data and dynamically scaling to meet the demanding data processing needs of LLM applications, ensuring smooth performance even under heavy loads.
- Concurrency:
- Elixir’s lightweight process model efficiently handles AI workloads, enabling parallel processing of LLM tasks without overburdening resources.
- Fault Tolerance:
- Built on the BEAM, Elixir inherently provides high fault tolerance, allowing LLM applications to maintain consistent service by gracefully handling failures and self-recovering from errors.
- The Elixir Community:
- Libraries like Phoenix, Ecto, LiveView, and Oban offer robust frameworks to hit the ground running. They support various use cases, from fast iteration on interactive UIs (with LiveView) to resource-intensive ETL pipelines (with Oban).
- The Elixir community actively supports AI and ML initiatives through projects like Nx, Axon, and Bumblebee. Nx offers numerical computing, Axon facilitates neural network development, and Bumblebee supports pre-trained models and NLP tasks—all enhancing the ecosystem’s power and relevance for our needs.
- We wanted to give back to the Elixir community, having benefited so much from its amazing work over the years. If we can make it easier for others to build LLM-powered applications in Elixir, that’d be a huge win!
With these considerations, we felt confident that Elixir would provide the control and performance we needed. Our next post will dive into this new project, detailing how we approached building a fully customized solution from scratch in Elixir. Stay tuned!
We're building an AI-powered Product Operations Cloud, leveraging AI in almost every aspect of the software delivery lifecycle. Want to test drive it with us? Join the ProdOps party at ProdOps.ai.