
Our Journey: Building with Generative AI, Part I

Businesses of all sizes and industries are eager to take advantage of generative artificial intelligence (AI), so I’m going to share some details on Revelry’s journey with this emerging technology over the past year. Even if you’re already working in the space, I believe you’ll find something interesting and helpful about our experience. We’ve got a lot of learnings to share, so this will be the first of a series of posts.

In this first post, I’ll cover our early exploration of generative AI – when we were moving as fast as possible to learn as much as we could. Subsequent posts will get deeper into our learnings around building more complex generative AI systems, in particular diving into how to incorporate Retrieval Augmented Generation (RAG) into software systems (and how you don’t need LangChain to do it).

Some Background

Here’s a little backstory on Revelry. Since 2013, we’ve been building custom software. Our bread and butter has primarily been web and mobile app development. In the early days, it was all about Rails and Node, but we’ve played around with PHP, .NET, Java, Python, and more.

We’ve always had a bit of a thing for new and emerging tech. Take React, for instance. Back in 2014, when jQuery and Angular were the big names, we were already building apps with React. And we didn’t stop there – we jumped into React Native pretty early, too. Our first React Native app hit the App Store when the framework was only at version 0.21, and it’s now up to 0.73. (By the way, when are we getting a major version update? Looking at you too, LiveView 😉).

We still work across a variety of tech stacks, but have collectively fallen in love with the elegance, performance, and strong community around Elixir and Phoenix, which we adopted as our preferred stack around 2018. We were building sophisticated Phoenix LiveView apps before there was even an official LiveView hex package published. (Yes, we were just referencing a commit hash in our mix.exs file — don’t judge.) We have done a lot in the blockchain space too, but I’m definitely not going into that in this article.

This is all to give you a glimpse into how we at Revelry dive into new technologies. We’re not shy about exploring the bleeding edge, and it’s really paid off. Our early dives into spaces like React and Elixir have made us the experts we are today.

Where We Started

Thinking back on 2022, it’s been nothing short of a rollercoaster ride. GPT-3 had been available through OpenAI’s API since 2020, but over the course of 2022 its capabilities became impossible to ignore. We at Revelry, like many software development companies, quickly caught on that this was a game-changer for our industry. Sure, we had a bunch of engineers who were into machine learning, but AI-driven apps weren’t really our main gig. Our partners didn’t ask for them much, and they didn’t seem all that necessary… until GPT-3 came along.

By Fall 2022, we were all in, diving deep into the world of these large language models (LLMs). The pace at which things have evolved since then is mind-blowing. Back then, things weren’t quite ready for the big stage, but it was obvious this was just the start.

We saw a golden opportunity to weave generative AI into our tried-and-true software and product delivery processes. This wasn’t about replacing our team, but turbocharging their productivity. Imagine our folks focusing on the creative, problem-solving aspects of product and software design, while AI handles the tedious stuff – like writing user stories, plotting out product roadmaps, or drafting sprint reports. And what if getting up to speed on a new project could be quicker and smoother? If this could work for us, it’d surely catch on elsewhere, right?

So, we rolled up our sleeves and jumped into the nitty-gritty. It started as a research and development adventure, filled with questions, like:

  • Just how far can the capabilities of these LLMs go?
  • What’s the engineering effort needed to integrate generative AI into our custom software?
  • What does it take to set up an AI-powered app in a live environment?
  • Can LLMs genuinely enhance our team’s productivity? If so, in what ways?
  • Is it possible to create something other engineering teams would want to use as well?

Experimentation

So, everyone at Revelry began dabbling with ChatGPT, mostly just for kicks. Some of us were crafting Eminem-style raps to add a bit of flair to our company-wide All Hands meetings (We’ve got a different Reveler hosting each week.). Meanwhile, our CEO, Gerard Ramos – or G, as we call him – was tinkering with how ChatGPT could enhance our product delivery process.

G found out pretty fast – with some clever prompting – that ChatGPT could whip up some solid product roadmaps and user stories, and even spin out working code examples based on those stories. This was more than just cool – it was promising. So, he proposed we start building tools around these use cases. And that’s how the idea for our first proof of concept came about: an AI-powered app to create user stories from just a few inputs. Sure, it wasn’t a game-changer yet, but it was a great starting point – allowing us to dip our toes in the water, while simultaneously boosting our productivity.

Our First AI-Powered Toy: StoryBot

Enter StoryBot. This little gem was a straightforward CLI tool that we ended up releasing as an open-source NPM package. It’s essentially a single JavaScript file, leveraging LangChain to tap into GPT-3 via OpenAI’s API (This was before GPT-4.). We threw in some tailored prompts, injected the user input, and voilà – it started spitting out decent user stories right in the command line.

We went a bit further with it after that, letting the user refine their story through chat, still all in the command line. The cherry on top was the ability to export the story as an issue in a GitHub repo. (At Revelry, we not only use GitHub to store our code, but also for issue tracking, project management, and more.) Ultimately, we ended up with a neat little interactive tool.

StoryBot Under the Hood

Diving into the StoryBot repo, you’ll see the core functionality is in one JavaScript file. This file uses LangChain.js for communicating with the OpenAI API, generating user stories from command line inputs. We could have opted for LangChain’s Python library, but the two have close to feature parity, and our team works more in JavaScript than Python. At this point, it was all still experimentation, so we opted for whatever we could move the fastest in.

Technically, for such a straightforward use case, direct API calls to OpenAI would have done the trick. However, LangChain offered ease of setup and capabilities beyond just interfacing with an LLM. It’s packed with features for creating AI-powered apps, like Retrieval, Agents, and Chains, though we didn’t dive deep into any of these for StoryBot.

LangChain simplifies building AI applications, but it’s not without its complexities and limitations. It abstracts a lot, sometimes obscuring the underlying mechanics, and is currently limited to Python and JavaScript ecosystems. There is now an Elixir implementation of LangChain, which is exciting because we’re huge Elixir fans, but it isn’t nearly as far along as its Python and JS counterparts. This Elixir library also wasn’t around yet at this point in our journey.

Looking a Bit More Into StoryBot Code

The first code that actually gets executed when you run npx gen.story generates the initial prompt:

const generateInitialPrompt = () => {
  // parseArgs() (defined elsewhere in the same file) reads the feature text plus the
  // optional --stack and --context flags off the command line
  const { featureText, techStackText, contextText } = parseArgs();

  return `Context: Act as a product manager at a software development company. Write a user story for the 'Feature' defined below. Explain in detailed steps how to implement this in a section called 'Implementation Notes' at the end of the story. Please make sure that the implementation notes are complete; do not leave any incomplete sentences. ${contextText}

  ${featureText}

  ${techStackText}

  User Story Spec:
    overview:
      "The goal is to convert your response into a GitHub Issue that a software engineer can use to implement the feature. Start your response with a 'Background' section, with a few sentences about why this feature is valuable to the application and why we want the user story written. Follow with one or more 'Scenarios' containing the relevant Acceptance Criteria (AC). Use markdown format, with subheaders (e.g. '##' ) for each section (i.e. '## Background', '## Scenario - [Scenario 1]', '## Implementation Notes').",
    scenarios:
    "detailed stories covering the core loop of the feature requested",
    style:
      "Use BDD / gherkin style to describe the user scenarios, prefacing each line of acceptance criteria (AC) with a markdown checkbox (e.g. '- [ ]').",
  }`;
};

...

const prompt = generateInitialPrompt()

You can see that the prompt has specific instructions to format the story the way Revelry prefers, which may differ from what a lot of other teams want. That said, if anyone wanted to use this with different prompts, they could easily fork it and change them. The most important part here is that we are injecting user input into the prompt before we send it to the LLM. In this case, there are three potential user inputs (sketched just after this list):

  • The feature in question, which comes in as the 3rd command line argument (e.g. npx gen.story [feature]).
  • An optional --stack flag to specify the tech stack that the user story will need to be implemented in.
  • An optional --context flag to add some additional context around the feature you are writing a user story for.
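
For reference, the real argument parsing lives in the StoryBot repo; here’s a rough, hypothetical sketch of what pulling those three inputs out of process.argv could look like. The helper name parseArgs matches the snippet above, but the labels and details are illustrative, not the repo’s exact code:

// Hypothetical sketch -- the real parseArgs in the StoryBot repo may differ in its details.
const parseArgs = () => {
  const args = process.argv.slice(2);

  // Grab the value that follows a flag like --stack or --context, if present.
  const readFlag = (flag) => {
    const index = args.indexOf(flag);
    return index !== -1 && args[index + 1] ? args[index + 1] : "";
  };

  const stack = readFlag("--stack");
  const context = readFlag("--context");

  // Everything that isn't a flag (or a flag's value) is treated as the feature description.
  const feature = args
    .filter((arg, i) => !arg.startsWith("--") && !args[i - 1]?.startsWith("--"))
    .join(" ");

  return {
    featureText: feature ? `Feature: ${feature}` : "",
    techStackText: stack ? `Tech Stack: ${stack}` : "",
    contextText: context ? `Additional context: ${context}` : "",
  };
};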

Next, we take that hydrated prompt and send it to OpenAI using the tools provided by LangChain:

// OpenAI, BufferMemory, and ConversationChain all come from the langchain (LangChain.js) package
const model = new OpenAI({
  streaming: true,
  modelName: "gpt-3.5-turbo",
  callbacks: [
    {
      handleLLMNewToken(token) {
        // write each streamed token straight to the terminal as it arrives
        process.stdout.write(token)
      },
    },
  ],
});
const memory = new BufferMemory() // keeps the chat history between turns
const chain = new ConversationChain({llm: model, memory: memory})
const {response} = await chain.call({input: prompt}) // resolves once the full response has streamed

A few things happen at this point: we create a new ConversationChain object, a wrapper around LangChain’s Chain abstraction that we’ll use to send the prompt to the LLM. We also create a BufferMemory object, a LangChain Memory implementation that stores the conversation history so that each new message goes out with the full context of the chat so far.

Sidenote: If we were just using the OpenAI API directly instead of LangChain, it would be easy to pass the chat history alongside the prompt to the API call. (I’m just clarifying that LangChain isn’t necessary for this, even though it was very easy to set up.)
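
To make that concrete, here’s a minimal sketch of the same idea against the OpenAI API directly, using the current openai Node package (a v4-style client that postdates the StoryBot experiment). You hold the message history in a plain array and append each turn to it yourself:

// Minimal sketch: calling OpenAI directly and managing the chat history ourselves.
import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY from the environment

const messages = [{ role: "user", content: prompt }];

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages,
});

const reply = completion.choices[0].message;
messages.push(reply); // append the assistant's reply so the next turn has full context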

Because we set streaming: true when we initialized the OpenAI object, the LLM sends each token to the callback we set up earlier as it’s generated, rather than everything arriving at once when chain.call resolves. Since StoryBot is a CLI tool, we’re just outputting to process.stdout here. If you’re thinking about adapting this for a web app, you’d probably need to figure out how to send JSON responses or stream them to the client. We’ll get more into that later. The main takeaway? It doesn’t take much to start seeing some cool results by plugging user inputs into a well-crafted prompt template and sending it off to GPT-3.

So at this point, response holds the generated user story, and the whole story has also been streamed into the terminal, where it could easily be copy-pasted wherever it’s needed. However, there is no ability to make follow-up refinements yet. I’m not going to go line-for-line through the rest of the final result, but the long story short is that after we get the initial generated response back, we pass it to a function that creates a readline interface, prompts the user with questions in the terminal, and then sends the user’s response back to the LLM as another message in the chat history. We also added the ability to export the final result to GitHub if you have a GitHub API token set.
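
For a rough idea of the shape of that loop, here’s a condensed sketch reusing the chain and streaming callback from above. The real StoryBot code differs in its details (including the GitHub export and the actual prompts shown to the user):

// Condensed sketch of the refinement loop -- illustrative only.
import * as readline from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

const rl = readline.createInterface({ input, output });

while (true) {
  const feedback = await rl.question("\nAny refinements? (press enter to finish) ");
  if (!feedback.trim()) break;

  // BufferMemory already holds the earlier turns, so we only send the new message;
  // the streaming callback prints the revised story as it's generated.
  await chain.call({ input: feedback });
}

rl.close();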

That’s it, that’s StoryBot!

If you want to play around with it, you can install it via npm.

npm install -g storybot-ai
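
Based on the inputs described above, an invocation looks something like this (the feature text and flag values here are just made-up examples):

npx gen.story "Allow users to reset their password via email" --stack "Elixir/Phoenix LiveView" --context "The app already sends transactional email"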

Fun? Absolutely. Somewhat useful? Sure. But let’s be real – it was an experiment. The thing is, not everyone wants to write user stories via the command line. Plus, every team has its own style for these stories. Our hardcoded prompts were great for us, but might not hit the mark for teams outside of Revelry, especially since we often work in staff augmentation, where teams have their own preferences.

Once we had a proof of concept, we started to see the potential. We were able to get a lot of mileage out of it, and it was a great way to get started with generative AI. This got a lot of ideas spinning about how we could get better user stories based on relevant context, which ultimately led us to the next part of our journey: diving deeper into RAG (Retrieval Augmented Generation) application development.


This is the first post in a series about Revelry’s journey exploring and developing custom software powered by generative AI. The next post will dive into our next experiment: building a chatbot to answer questions about Revelry based on our company playbook. Stay tuned!

Until then, check out the other articles we’ve put out about AI on our blog.

We're building an AI-powered Product Operations Cloud, leveraging AI in almost every aspect of the software delivery lifecycle. Want to test drive it with us? Join the ProdOps party at ProdOps.ai.