Enhancing Your Haystack Experience: The Unofficial Chatbot Guide

A Short Guide to Integrating LLM Agents with Haystack Documentation


Lately, I’ve been playing with Haystack, a framework for building end-to-end question answering systems. Haystack is a great tool for building QA systems: it gives you (customizable) building blocks that you can use to assemble your own QA system and bring it smoothly to production.

While it has great documentation, I found myself going back and forth between the docs and my code, so I decided to create a chatbot that I could use to ask questions about the documentation. Since I’ve been working a lot with acyclic RAG pipelines, I decided to change things up and use an agent structure instead.

I gave the agent a UI using Streamlit, so that I could use it in a browser, and I deployed it in a Docker container.

Setup

First of all, you need to install the dependencies. You can do so by running:

pip install farm-haystack[inference,preprocessing]
pip install streamlit==1.29.0

What is an agent?

To put it shortly, an agent is an LLM that can use a toolbox to perform actions. The toolbox is a set of functions the agent can call to interact with the external world. The agent repeatedly decides whether to call a tool or to produce a final answer; a simplified sketch of this loop follows the list below.

Our agent is composed of 3 main components:

  1. A retrieval tool: used to retrieve the documents relevant to the question asked by the user.
  2. A memory: where the history of the conversation is stored. It is used to keep track of the conversation and to understand the relations between its different steps.
  3. An LLM: the “brain” of the agent. It is responsible for understanding the user’s question, for using the tools, and for generating the answer.
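
Conceptually, the loop the agent runs looks roughly like this. This is a simplified sketch of a ReAct-style loop, not Haystack’s actual implementation; llm and tools are placeholders you would pass in:

import re

def agent_loop(question: str, llm, tools: dict, max_steps: int = 5) -> str:
    """Simplified sketch of a ReAct-style agent loop (not Haystack's actual code)."""
    transcript = f'Question: {question}\n'
    for _ in range(max_steps):
        # The LLM emits a thought and either an action or a final answer.
        step = llm(transcript)
        if 'Final Answer:' in step:
            return step.split('Final Answer:')[-1].strip()
        # Parse which tool to call and with what input.
        match = re.search(r'Tool: (.*)\nTool Input: (.*)', step)
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        # Call the chosen tool and feed its output back into the transcript.
        observation = tools[tool_name](tool_input)
        transcript += f'{step}\nObservation: {observation}\n'
    return 'No answer found within the step budget.'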

The toolbox of the agent

The easiest way to create a tool for our agent is to create a function that takes as input the question asked by the user and returns the documents that are relevant to the question.

First, we create a retriever:

import os

from haystack.nodes import PreProcessor, WebRetriever


def return_retriever() -> WebRetriever:
    """
    Returns the retriever.
    :return: the retriever
    """
    # Split retrieved pages into chunks before they reach the ranker.
    preprocessor = PreProcessor(
        split_by='word',
        split_length=4096,
        split_respect_sentence_boundary=True,
        split_overlap=40,
    )

    # Search the web, restricted to the Haystack docs domain,
    # and return the pages already preprocessed into Documents.
    return WebRetriever(
        api_key=os.environ['SERPERDEV_API_KEY'],
        allowed_domains=['docs.haystack.deepset.ai'],
        mode='preprocessed_documents',
        preprocessor=preprocessor,
        top_search_results=40,
        top_k=20,
    )

In this function we create a retriever able to fetch webpages from the Haystack documentation. To do so we use the WebRetriever class, which searches the web for pages and retrieves them.

Since our objective is to retrieve the Haystack documentation, we restrict the search to the documentation’s domain using the allowed_domains parameter.

We also define a preprocessor that prepares the documents returned by the retriever: it splits them into smaller chunks and applies some basic cleaning operations.
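
Used on its own, the retriever can be queried directly. The query below is just an illustration, and the exact metadata keys on the returned documents are an assumption:

retriever = return_retriever()

# Each result is a Haystack Document holding a chunk of a docs page.
docs = retriever.retrieve(query='How do I use a PromptNode?')
for doc in docs[:3]:
    # The 'url' metadata key is an assumption; inspect doc.meta to confirm.
    print(doc.meta.get('url'), '->', doc.content[:100])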

Having the core component of our tool, we can now create a function that takes the user’s question as input and returns the relevant documents:

from haystack import Pipeline
from haystack.agents import Tool
from haystack.nodes import SentenceTransformersRanker


def define_haystack_doc_searcher_tool() -> Tool:
    """
    Defines the tool for searching the Haystack documentation.
    :return: the Haystack documentation searcher tool
    """
    # Cross-encoder reranker: keeps only the 5 most relevant documents.
    ranker = SentenceTransformersRanker(model_name_or_path='cross-encoder/ms-marco-MiniLM-L-12-v2', top_k=5)
    retriever = return_retriever()

    # Retrieval pipeline: web retrieval followed by reranking.
    haystack_docs = Pipeline()
    haystack_docs.add_node(component=retriever, name='retriever', inputs=['Query'])
    haystack_docs.add_node(component=ranker, name='ranker', inputs=['retriever'])

    return Tool(
        name='haystack_documentation_search_tool',
        pipeline_or_node=haystack_docs,
        description='Searches the Haystack documentation for information.',
        output_variable='documents',
    )

The function above creates a tool. Our tool wraps a pipeline composed of a retriever and a ranker. The retriever is the one we defined above; after it we place a reranker, which reorders the retrieved documents by relevance. In this case we use a cross-encoder model trained on the MS MARCO dataset.

The pipeline is then passed to the Tool class, which wraps it and enriches it with the information the LLM agent needs to decide how to use it.
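
The wrapped pipeline can also be run on its own, which is handy for checking what the tool will hand back to the agent. The query is again just an illustration:

tool = define_haystack_doc_searcher_tool()

# Run the underlying pipeline directly; it returns the top 5 reranked documents
# under the 'documents' key, matching the tool's output_variable.
result = tool.pipeline_or_node.run(query='How do I create a custom node?')
for doc in result['documents']:
    print(doc.content[:100])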

The Memory of the Agent

The memory of the agent keeps track of the conversation and of the relations between its different steps. In our case, we want to record the users’ questions and the agent’s replies, so that the agent can answer follow-up questions that refer to earlier turns.

While in theory we could simply pass the agent a transcript of the previous conversation, in practice this would quickly exhaust its context window and cause it to start generating nonsense. To avoid this, we use a ConversationSummaryMemory, which summarizes the conversation and keeps track of its main topics.

from haystack.agents.memory import ConversationSummaryMemory
from haystack.nodes import PromptNode


def return_memory_node(openai_key: str) -> ConversationSummaryMemory:
    """
    Returns the memory node.
    :param openai_key: the OpenAI key
    :return: the memory node
    """
    # The PromptNode that produces the running conversation summary.
    memory_prompt_node = PromptNode('gpt-3.5-turbo-16k', api_key=openai_key, max_length=1024)
    return ConversationSummaryMemory(memory_prompt_node)

Our memory is composed of a single node: a PromptNode that generates the summary of the conversation.
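
A rough usage sketch follows. The 'input'/'output' keys passed to save() are an assumption based on how the Agent records conversation turns; check the Haystack source before relying on them:

import os

memory = return_memory_node(openai_key=os.environ['OPENAI_API_KEY'])

# The agent stores each turn; the key names here are an assumption.
memory.save(data={'input': 'What is a PromptNode?', 'output': 'A PromptNode is ...'})

# load() returns the running summary that gets injected into the prompt.
print(memory.load())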

The Agent

Before assembling the agent, we need a resolver function: at every step, it gathers the parameters that are injected into the agent’s prompt template.

from typing import Any, Dict

from haystack.agents import Agent
from haystack.agents.base import AgentStep


def resolver_function(
    query: str,
    agent: Agent,
    agent_step: AgentStep,
) -> Dict[str, Any]:
    """
    This function is used to resolve the parameters of the prompt template.
    :param query: the query
    :param agent: the agent
    :param agent_step: the agent step
    :return: a dictionary of parameters
    """
    return {
        'query': query,
        # Names and descriptions of the available tools.
        'tool_names_with_descriptions': agent.tm.get_tool_names_with_descriptions(),
        # Transcript of the reasoning steps taken so far.
        'transcript': agent_step.transcript,
        # Summarized conversation history.
        'memory': agent.memory.load(),
    }

We can now define the agent itself. Its code is rather simple: first we define a PromptNode, the LLM brain of the agent; then we instantiate the agent using the Agent class.

from haystack.agents import Agent, ToolsManager
from haystack.nodes import PromptNode


def return_haystack_documentation_agent(openai_key: str) -> Agent:
    """
    Returns an agent that can answer questions about the Haystack documentation.
    :param openai_key: the OpenAI key
    :return: the agent
    """
    # The LLM brain: low temperature for focused answers, and a stop word
    # at 'Observation:' so tool results are injected by the framework.
    agent_prompt_node = PromptNode(
        'gpt-3.5-turbo-16k',
        api_key=openai_key,
        stop_words=['Observation:'],
        model_kwargs={'temperature': 0.05},
        max_length=10000,
    )

    # agent_prompt is the agent's prompt template (see the sketch below).
    agent = Agent(
        agent_prompt_node,
        prompt_template=agent_prompt,
        prompt_parameters_resolver=resolver_function,
        memory=return_memory_node(openai_key),
        tools_manager=ToolsManager([define_haystack_doc_searcher_tool()]),
        final_answer_pattern=r"(?s)Final Answer\s*:\s*(.*)",
    )

    return agent
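
One piece the snippet above references but does not define is agent_prompt. A hypothetical ReAct-style template, using exactly the four parameters returned by resolver_function, could look like this; it is a sketch, not the template used in the actual project:

# Hypothetical prompt template; the real agent_prompt may differ.
agent_prompt = """You are a helpful assistant that answers questions about the
Haystack documentation. You have access to the following tools:

{tool_names_with_descriptions}

Summary of the conversation so far:
{memory}

Use the following format:
Thought: reason about what to do next
Tool: the name of the tool to use
Tool Input: the input to pass to the tool
Observation: the result returned by the tool
... (Thought/Tool/Tool Input/Observation can repeat)
Final Answer: the final answer to the question

Question: {query}
{transcript}"""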

Conclusions

The agent is now ready to be used. You can find the full code here and a demo on Hugging Face Spaces.
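
As for the Streamlit UI mentioned at the beginning, a minimal front end around the agent could look roughly like this. It is a sketch; the actual app in the repository is more elaborate:

import os

import streamlit as st

st.title('Haystack Documentation Chatbot')

# Build the agent once per session so its memory persists across turns.
if 'agent' not in st.session_state:
    st.session_state.agent = return_haystack_documentation_agent(
        openai_key=os.environ['OPENAI_API_KEY']
    )

if question := st.chat_input('Ask something about Haystack'):
    with st.chat_message('user'):
        st.write(question)
    # Agent.run returns a dict; the final answer sits under 'answers'.
    result = st.session_state.agent.run(query=question)
    with st.chat_message('assistant'):
        st.write(result['answers'][0].answer)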


A few disclaimers:

  • This is a toy project and could provide wrong answers.
  • The agent structure is probably not the best one for this use case; I used it because I wanted to play with it.
  • I am not affiliated with the Haystack project in any way.