Integrating for RAG Workflows Step-by-Step

Post date: Jan 29, 2024

Orquesta is a suite of collaboration and no-code building blocks for product teams building their custom technology solutions.

Our platform enables technical and non-technical team members to manage and run business rules, remote configurations, and AI prompts in a highly intuitive environment. This process is supported by powerful tools such as versioning, simulators, code generators, and logs.

In this guide, we show you how to implement a RAG pipeline with our Python SDK, using an OpenAI LLM in combination with a Weaviate vector database and an OpenAI embedding model. LangChain is used for orchestration, while Orquesta handles generating responses and logging additional data.


To follow along with this tutorial, you will need the following:

  • Jupyter Notebook (or any IDE of your choice)

  • langchain for orchestration

  • OpenAI for the embedding model and LLM

  • weaviate-client for the vector database

  • Orquesta Python SDK

Install SDK and Packages

pip install orquesta-sdk langchain openai weaviate-client
  • Install the orquesta-sdk package

  • Install the langchain package

  • Install the openai package

  • Install the weaviate-client package

Grab your OpenAI API keys.
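The later code in this guide references an `OPENAI_API_KEY` variable; one way to set it, sketched below, is to read the key from an environment variable rather than hard-coding it (the environment variable name `OPENAI_API_KEY` is an assumption here):

```python
import os

# Read the OpenAI API key from the environment rather than hard-coding it.
# Export it first, e.g.: export OPENAI_API_KEY="sk-..."
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
```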


Enable models in the Model Garden

Orquesta's Model Garden allows you to pick and enable the models of your choice and work with them. Enabling a model is very easy; all you have to do is navigate to the Model Garden and toggle on the model of your choice.

Collect and load data

The raw text document is available in LangChain’s GitHub repository.

import requests
from langchain.document_loaders import TextLoader

url = ""
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)  # write the downloaded text to a local file

loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()
  • Import the requests library for making HTTP requests

  • Import the TextLoader module from langchain to load text data from langchain.document_loaders

  • Define the URL from which to fetch the text data

  • Make an HTTP GET request to fetch the content from the specified URL

  • Open the local file named 'state_of_the_union.txt' in write mode ('w')

  • Create a TextLoader instance, specifying the path to the local text file

  • Load the text data

Chunk your documents

LangChain has many built-in text splitters for this purpose. For this example, you can use CharacterTextSplitter with a chunk_size of 1000 and a chunk_overlap of 0 (a non-zero overlap would preserve text continuity between adjacent chunks).

from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
chunks = text_splitter.split_documents(documents)
  • The code uses the CharacterTextSplitter class from the langchain.text_splitter module to split a given set of documents into smaller chunks.

  • The chunk_size parameter determines the size of each chunk, and the chunk_overlap parameter specifies the overlap between adjacent chunks.

  • Creating an instance of CharacterTextSplitter allows for customization of chunking parameters based on the specific needs of the text data.

  • The split_documents method is called to perform the actual splitting, and the result is stored in the chunks variable, which now holds a list of text chunks.
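To make the chunking idea concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. This is an illustration only, not LangChain's actual implementation (which also splits on separators such as newlines):

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Step forward by (chunk_size - chunk_overlap) characters each time,
    # so consecutive chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With an overlap of 2, neighbouring chunks share their boundary characters.
demo_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```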

Embed and store the chunks

To enable semantic search across the text chunks, you need to generate the vector embeddings for each chunk and then store them together with their embeddings.

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

client = weaviate.Client(
    embedded_options=EmbeddedOptions()
)

vectorstore = Weaviate.from_documents(
    client=client,
    documents=chunks,
    embedding=OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
    by_text=False
)
  • Initialize a Weaviate client with embedded options. This client will be used to interact with the Weaviate service.

  • Utilize the Weaviate module from LangChain to create a Weaviate vector store. This involves providing the Weaviate client, the documents (chunks) to be processed, specifying the OpenAI embeddings for vectorization, and setting by_text=False so the chunks are stored and searched by their embedding vectors rather than raw text.

Step 1: Retrieve

Populate the vector database and define it as the retriever component, which fetches the additional context based on the semantic similarity between the user query and the embedded chunks.

retriever = vectorstore.as_retriever()
docs = vectorstore.similarity_search("What did the president say about Justice Breyer")
  • We convert the vector store into a retriever, enabling similarity searches.

  • Then perform a similarity search using the provided query ("What did the president say about Justice Breyer"). The result, stored in the variable docs, is a list of documents ranked by similarity.

  • Finally, the content of the most similar document can be extracted with docs[0].page_content, which we use as context in the next step.
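Under the hood, retrieval ranks the stored chunks by vector similarity to the embedded query. The toy sketch below uses cosine similarity on hand-made 3-dimensional vectors; real embeddings come from the OpenAI model and have many more dimensions:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Hypothetical "embeddings" for three stored chunks.
store = {
    "chunk about Justice Breyer": [0.9, 0.1, 0.0],
    "chunk about the economy":    [0.1, 0.9, 0.0],
    "chunk about foreign policy": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical embedding of the user question

# Rank chunks by similarity to the query, most similar first.
ranked = sorted(store, key=lambda k: cosine(store[k], query), reverse=True)
```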

Step 2: Augment

Create a client instance for Orquesta. You can instantiate as many client instances as necessary with the `OrquestaClient` class. You can find your API key in your workspace: `<workspace-name>/settings/develop`

from orquesta_sdk import Orquesta, OrquestaClientOptions

api_key = "ORQUESTA_API_KEY"

options = OrquestaClientOptions(
    api_key=api_key
)

client = Orquesta(options)

Prepare a Deployment in Orquesta and set up the primary model, fallback model, number of retries, and the prompt itself with variables. Whatever information comes out of the RAG process needs to be attached as a variable when you call a Deployment in Orquesta. An example is shown below:

Request a variant by right-clicking on the row and generating the code snippet.

Invoke the Deployment; for the context variable, we set it to the similarity search result, chaining together the retriever and the prompt.

deployment = client.deployments.invoke(
    key="<deployment-key>",  # placeholder: replace with your Deployment key
    context={
        "environments": [],
        "locale": []
    },
    inputs={
        "context": docs[0].page_content,
        "question": "What did the president say about Justice Breyer"
    }
)

Step 3: Generate

Your LLM response is generated by Orquesta using the selected model from the Deployment, and you can print it out, for example with `print(deployment.choices[0].message.content)`.


Logging additional metrics to the request

After a successful query, Orquesta will generate a log with the evaluation result. You can add metadata and a score to the Deployment log by using the add_metrics() method.

deployment.add_metrics(
    feedback={"score": 100},
    metadata={
        "custom": "custom_metadata",
        "chain_id": "ad1231xsdaABw",
    }
)

You can also fetch the Deployment configuration if you are using Orquesta as a prompt management system.

config = client.deployments.get_config(
    key="<deployment-key>",  # placeholder: replace with your Deployment key
    context={
        "environments": [],
        "locale": []
    },
    inputs={
        "context": docs[0].page_content,
        "question": "What did the president say about Justice Breyer"
    }
)

deployment_config = config.to_dict()

Finally, you can head over to your Deployment in the Orquesta dashboard and click on Logs; there you will see your LLM response and other information about the LLM interaction.