Integrating Orq.ai for RAG Workflows Step-by-Step
January 29, 2024
Key Takeaways
This guide walks through building a Retrieval-Augmented Generation (RAG) pipeline step by step with the Orq.ai Python SDK, using an OpenAI LLM and embedding model, a Weaviate vector database, and LangChain for orchestration.
The pipeline follows the retrieve, augment, and generate pattern: document chunks are embedded and stored, retrieved by semantic similarity, and passed as variables to an Orq.ai Deployment that generates the response.
Orq.ai Deployments add prompt management, model fallbacks, and logging on top of the pipeline, and additional metadata and scores can be attached to each request with the add_metrics() method.
Orq.ai is a suite of collaboration and no-code building blocks for product teams building their custom technology solutions.
Our platform enables technical and non-technical team members to manage and run business rules, remote configurations, and AI prompts in an intuitive environment, supported by powerful tools such as versioning, simulators, code generators, and logs.
In this guide, we show you how to implement a RAG pipeline with our Python SDK, using an OpenAI LLM in combination with a Weaviate vector database and an OpenAI embedding model. LangChain is used for orchestration, while responses are generated through Orq.ai, which also logs additional data about each request.
Prerequisites
To follow along with this tutorial, you will need the following:
- Jupyter Notebook (or any IDE of your choice)
- `langchain` for orchestration
- `openai` for the embedding model and LLM
- `weaviate-client` for the vector database
- The Orquesta (Orq.ai) Python SDK
Install SDK and Packages
- Install the `orquesta-sdk` package
- Install the `langchain` package
- Install the `openai` package
- Install the `weaviate-client` package
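If you are following along in a Jupyter Notebook, the packages can be installed directly from a cell as sketched below; version pins are omitted, so adjust them as needed for your environment.

```python
# Run in a notebook cell; drop the "%" prefix if installing from a terminal
%pip install orquesta-sdk      # Orq.ai (Orquesta) Python SDK
%pip install langchain         # orchestration
%pip install openai            # embedding model and LLM
%pip install weaviate-client   # vector database
```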
Grab your OpenAI API keys.
Enable models in the Model Garden
Orq.ai allows you to pick and enable the models of your choice and work with them. Enabling a model is easy: navigate to the Model Garden and toggle on the models you want to use.
Collect and load data
The raw text document is available in LangChain’s GitHub repository.
- Import the `requests` library for making HTTP requests.
- Import the `TextLoader` class from `langchain.document_loaders` to load text data.
- Define the URL from which to fetch the text data.
- Make an HTTP GET request to fetch the content from the specified URL.
- Open the local file named `state_of_the_union.txt` in write mode (`'w'`) and save the fetched content.
- Create a `TextLoader` instance, specifying the path to the local text file.
- Load the text data.
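The code below is a minimal sketch of this step; the raw-file URL is assumed to point at the copy of `state_of_the_union.txt` in LangChain's GitHub repository and may need updating if the file has moved.

```python
import requests
from langchain.document_loaders import TextLoader

# Raw URL of the example document in LangChain's GitHub repository
# (placeholder: adjust the path if the file has been moved)
url = (
    "https://raw.githubusercontent.com/langchain-ai/langchain/"
    "master/docs/docs/modules/state_of_the_union.txt"
)

# Fetch the content with an HTTP GET request and write it to a local file
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

# Create a TextLoader pointing at the local file and load the text data
loader = TextLoader("./state_of_the_union.txt")
documents = loader.load()
```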
Chunk your documents
LangChain has many built-in text splitters for this purpose. For this example, you can use `CharacterTextSplitter` with a `chunk_size` of about 1000 and a `chunk_overlap` of 0; increase the overlap if you need more continuity between adjacent chunks.
- The code uses the `CharacterTextSplitter` class from the `langchain.text_splitter` module to split a given set of documents into smaller chunks.
- The `chunk_size` parameter determines the size of each chunk, and the `chunk_overlap` parameter specifies the overlap between adjacent chunks.
- Creating an instance of `CharacterTextSplitter` allows for customization of the chunking parameters based on the specific needs of the text data.
- The `split_documents` method is called to perform the actual splitting, and the result is stored in the `chunks` variable, which now holds a list of text chunks.
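Assuming the `documents` list produced by the loader above, the chunking step can be sketched as follows.

```python
from langchain.text_splitter import CharacterTextSplitter

# Split the loaded documents into ~1000-character chunks with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)

print(f"Split the document into {len(chunks)} chunks")
```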
Embed and store the chunks
To enable semantic search across the text chunks, you need to generate a vector embedding for each chunk and then store the chunks together with their embeddings.
- Initialize a Weaviate client with embedded options. This client will be used to interact with the Weaviate service.
- Use the `Weaviate` vector store from LangChain to create the store. This involves providing the Weaviate client, the documents (chunks) to be processed, specifying the OpenAI embeddings for vectorization, and setting `by_text=False` so that similarity search runs against the provided embeddings rather than raw text.
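Below is a sketch of this step using the v3-style `weaviate-client` API that was current at the time of writing; it assumes the `chunks` list from the previous section and an `OPENAI_API_KEY` environment variable for the embedding model.

```python
import weaviate
from weaviate.embedded import EmbeddedOptions
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate

# Start an embedded Weaviate instance and connect a client to it
weaviate_client = weaviate.Client(embedded_options=EmbeddedOptions())

# Embed the chunks with OpenAI and store them in the Weaviate vector store
vectorstore = Weaviate.from_documents(
    client=weaviate_client,
    documents=chunks,
    embedding=OpenAIEmbeddings(),  # requires OPENAI_API_KEY in the environment
    by_text=False,
)
```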
Step 1: Retrieve
With the vector database populated, define it as the retriever component, which fetches the additional context based on the semantic similarity between the user query and the embedded chunks.
- Convert the vector store into a retriever, enabling similarity searches.
- Then perform a similarity search using the provided query ("What did the president say about Justice Breyer"). The result, stored in the variable `docs`, is a list of documents ranked by similarity.
- Finally, extract the content of the most similar document in the search.
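In code, assuming the `vectorstore` created above, this step looks roughly as follows.

```python
# Convert the vector store into a retriever for use in downstream chains
retriever = vectorstore.as_retriever()

# Perform a similarity search with the example query
query = "What did the president say about Justice Breyer"
docs = vectorstore.similarity_search(query)

# Extract the content of the most similar document
print(docs[0].page_content)
```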
Step 2: Augment
Create a client instance for Orq.ai. You can instantiate as many client instances as necessary with the `OrquestaClient` class. You can find your API key in your workspace: `https://my.orquesta.dev/<workspace-name>/settings/develop`
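A minimal sketch of creating the client with the `orquesta-sdk` package is shown below; the exact constructor options may differ between SDK versions, so check the Orq.ai documentation for the current signature.

```python
from orquesta_sdk import OrquestaClient, OrquestaClientOptions

# API key from https://my.orquesta.dev/<workspace-name>/settings/develop
options = OrquestaClientOptions(api_key="YOUR_ORQUESTA_API_KEY")
client = OrquestaClient(options)
```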
Prepare a Deployment in Orq.ai and set up the primary model, fallback model, number of retries, and the prompt itself with variables. Whatever information comes out of the RAG process needs to be attached as a variable when you call the Deployment in Orq.ai. An example is shown below:
Request a variant by right-clicking on the row and generating the code snippet.
Invoke the Orq.ai Deployment; for the `context` variable, we set it to the similarity search result, chaining together the retriever and the prompt.
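The sketch below assumes a Deployment key of `rag-tutorial` and prompt variables named `context` and `question`; these are placeholders for whatever you configured in your own Deployment, and the `deployments.invoke` arguments follow the SDK examples available at the time of writing.

```python
# Invoke the Deployment, passing the most similar chunk as the `context` variable
deployment = client.deployments.invoke(
    key="rag-tutorial",                   # placeholder Deployment key
    context={"environments": []},         # deployment routing context (adjust to your setup)
    inputs={
        "context": docs[0].page_content,  # retrieved context from the similarity search
        "question": query,                # the user question, if your prompt uses it
    },
)
```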
Step 3: Generate
Your LLM response is generated from Orq.ai using the selected model from the Deployment, and you can print it out.
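Assuming the response object returned by the invocation above mirrors the OpenAI-style chat format, the answer can be printed like this.

```python
# Print the generated answer from the model selected in the Deployment
print(deployment.choices[0].message.content)
```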
Logging additional metrics to the request
After a successful query, Orq.ai generates a log with the evaluation result. You can add metadata and a score to the Deployment by using the `add_metrics()` method.
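The fields below are illustrative; the exact keyword arguments accepted by `add_metrics()` depend on your SDK version.

```python
# Attach custom metadata and a feedback score to the logged request
deployment.add_metrics(
    metadata={"custom": "custom_metadata"},  # illustrative metadata
    feedback={"score": 100},                 # illustrative feedback score
)
```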
You can also fetch the deployment configuration if you are using Orq.ai as a prompt management system.
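The sketch below assumes a `get_config` method on `client.deployments`, as in the SDK examples available at the time of writing; it returns the prompt and model configuration without invoking the model.

```python
# Fetch the Deployment configuration instead of invoking the model directly
config = client.deployments.get_config(
    key="rag-tutorial",                   # placeholder Deployment key
    context={"environments": []},         # adjust to your setup
    inputs={"context": docs[0].page_content},
)
print(config.to_dict())
```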
Finally, head over to your Deployment in the Orq.ai dashboard and click on Logs to see your LLM response and other information about the LLM interaction.