Agentic RAG
In this tutorial we will build a retrieval agent. Retrieval agents are useful when you want an LLM to make a decision about whether to retrieve context from a vectorstore or respond to the user directly.
By the end of the tutorial we will have done the following:
- Fetch and preprocess documents that will be used for retrieval.
- Index those documents for semantic search and create a retriever tool for the agent.
- Build an agentic RAG system that can decide when to use the retriever tool.
Setup
Let's download the required packages and set our API keys:
%%capture --no-stderr
%pip install -U --quiet langgraph "langchain[openai]" langchain-community langchain-text-splitters
import getpass
import os
def _set_env(key: str):
    if key not in os.environ:
        os.environ[key] = getpass.getpass(f"{key}:")


_set_env("OPENAI_API_KEY")
Tip
Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph.
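If you want tracing enabled for this tutorial, one minimal sketch is to set the LangSmith environment variables before running the graph. The variable names below (LANGSMITH_TRACING, LANGSMITH_API_KEY) are assumptions based on common LangSmith setups; adjust them if your LangSmith configuration differs:

    # Hypothetical tracing setup, reusing the _set_env helper defined above.
    os.environ["LANGSMITH_TRACING"] = "true"  # turn on run tracing (assumed variable name)
    _set_env("LANGSMITH_API_KEY")             # prompt for the LangSmith API key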
1. Preprocess documents
- Fetch documents to use in our RAG system. We will use three of the most recent pages from Lilian Weng's excellent blog. We'll start by fetching the content of the pages using the WebBaseLoader utility:

    from langchain_community.document_loaders import WebBaseLoader

    urls = [
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination/",
        "https://lilianweng.github.io/posts/2024-04-12-diffusion-video/",
    ]

    docs = [WebBaseLoader(url).load() for url in urls]
- Split the fetched documents into smaller chunks for indexing into our vectorstore (a sketch of this step is shown below).
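The splitting code itself is not reproduced here; the following is a minimal sketch assuming LangChain's RecursiveCharacterTextSplitter, with illustrative chunk settings and a doc_splits name chosen to match the indexing step that follows:

    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Flatten the per-URL lists of documents into a single list.
    docs_list = [item for sublist in docs for item in sublist]

    # Character-based splitting; chunk_size and chunk_overlap are illustrative values.
    # (Token-based splitting via RecursiveCharacterTextSplitter.from_tiktoken_encoder
    # is another option if tiktoken is installed.)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    doc_splits = text_splitter.split_documents(docs_list)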
2. Create a retriever tool
Now that we have our split documents, we can index them into a vector store that we'll use for semantic search.
- Use an in-memory vector store and OpenAI embeddings (a combined sketch of these three steps is shown after this list).
- Create a retriever tool using LangChain's prebuilt create_retriever_tool.
- Test the tool.
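The code for these three steps is not reproduced here; the following is a minimal sketch assuming the doc_splits chunks from the previous section, LangChain's InMemoryVectorStore and OpenAIEmbeddings, and the retriever_tool / retrieve_blog_posts names used later in this tutorial:

    from langchain.tools.retriever import create_retriever_tool
    from langchain_core.vectorstores import InMemoryVectorStore
    from langchain_openai import OpenAIEmbeddings

    # Index the split documents into an in-memory vector store for semantic search.
    vectorstore = InMemoryVectorStore.from_documents(
        documents=doc_splits, embedding=OpenAIEmbeddings()
    )
    retriever = vectorstore.as_retriever()

    # Wrap the retriever as a tool the agent can decide to call.
    retriever_tool = create_retriever_tool(
        retriever,
        "retrieve_blog_posts",
        "Search and return information about Lilian Weng blog posts.",
    )

    # Test the tool with a sample query.
    retriever_tool.invoke({"query": "types of reward hacking"})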
3. Generate query
Now we will start building components (nodes and edges) for our agentic RAG graph. Note that the components will operate on the MessagesState, a graph state that contains a messages key with a list of chat messages.
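For reference, MessagesState is roughly equivalent to the following sketch (in practice you import it from langgraph.graph rather than defining it yourself): a TypedDict whose messages key accumulates chat messages via the add_messages reducer instead of being overwritten on each update.

    from typing import Annotated
    from typing_extensions import TypedDict

    from langchain_core.messages import AnyMessage
    from langgraph.graph.message import add_messages


    class MessagesState(TypedDict):
        # add_messages appends (or merges by ID) new messages returned by nodes.
        messages: Annotated[list[AnyMessage], add_messages]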
- Build a generate_query_or_respond node. It will call an LLM to generate a response based on the current graph state (list of messages). Given the input messages, it will decide to retrieve using the retriever tool, or respond directly to the user. Note that we're giving the chat model access to the retriever_tool we created earlier via .bind_tools:

    from langgraph.graph import MessagesState
    from langchain.chat_models import init_chat_model

    response_model = init_chat_model("openai:gpt-4.1", temperature=0)


    def generate_query_or_respond(state: MessagesState):
        """Call the model to generate a response based on the current state.

        Given the question, it will decide to retrieve using the retriever tool,
        or simply respond to the user.
        """
        response = (
            response_model
            .bind_tools([retriever_tool])
            .invoke(state["messages"])
        )
        return {"messages": [response]}
- Try it on a random input:

    input = {"messages": [{"role": "user", "content": "hello!"}]}
    generate_query_or_respond(input)["messages"][-1].pretty_print()
Output:
- Ask a question that requires semantic search:

    input = {
        "messages": [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            }
        ]
    }
    generate_query_or_respond(input)["messages"][-1].pretty_print()
Output:
4. Grade documents
- Add a conditional edge, grade_documents, to determine whether the retrieved documents are relevant to the question. We will use a model with a structured output schema, GradeDocuments, for document grading. The grade_documents function will return the name of the node to go to based on the grading decision (generate_answer or rewrite_question):

    from pydantic import BaseModel, Field
    from typing import Literal

    GRADE_PROMPT = (
        "You are a grader assessing relevance of a retrieved document to a user question. \n "
        "Here is the retrieved document: \n\n {context} \n\n"
        "Here is the user question: {question} \n"
        "If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n"
        "Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question."
    )


    class GradeDocuments(BaseModel):
        """Grade documents using a binary score for relevance check."""

        binary_score: str = Field(
            description="Relevance score: 'yes' if relevant, or 'no' if not relevant"
        )


    grader_model = init_chat_model("openai:gpt-4.1", temperature=0)


    def grade_documents(
        state: MessagesState,
    ) -> Literal["generate_answer", "rewrite_question"]:
        """Determine whether the retrieved documents are relevant to the question."""
        question = state["messages"][0].content
        context = state["messages"][-1].content
        prompt = GRADE_PROMPT.format(question=question, context=context)
        response = (
            grader_model
            .with_structured_output(GradeDocuments)
            .invoke([{"role": "user", "content": prompt}])
        )
        score = response.binary_score

        if score == "yes":
            return "generate_answer"
        else:
            return "rewrite_question"
- Run this with irrelevant documents in the tool response:

    from langchain_core.messages import convert_to_messages

    input = {
        "messages": convert_to_messages(
            [
                {
                    "role": "user",
                    "content": "What does Lilian Weng say about types of reward hacking?",
                },
                {
                    "role": "assistant",
                    "content": "",
                    "tool_calls": [
                        {
                            "id": "1",
                            "name": "retrieve_blog_posts",
                            "args": {"query": "types of reward hacking"},
                        }
                    ],
                },
                {"role": "tool", "content": "meow", "tool_call_id": "1"},
            ]
        )
    }
    grade_documents(input)
- Confirm that the relevant documents are classified as such:

    input = {
        "messages": convert_to_messages(
            [
                {
                    "role": "user",
                    "content": "What does Lilian Weng say about types of reward hacking?",
                },
                {
                    "role": "assistant",
                    "content": "",
                    "tool_calls": [
                        {
                            "id": "1",
                            "name": "retrieve_blog_posts",
                            "args": {"query": "types of reward hacking"},
                        }
                    ],
                },
                {
                    "role": "tool",
                    "content": "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
                    "tool_call_id": "1",
                },
            ]
        )
    }
    grade_documents(input)
5. Rewrite question
- Build the rewrite_question node. The retriever tool can return potentially irrelevant documents, which indicates a need to improve the original user question. To do so, we will call the rewrite_question node:

    REWRITE_PROMPT = (
        "Look at the input and try to reason about the underlying semantic intent / meaning.\n"
        "Here is the initial question:"
        "\n ------- \n"
        "{question}"
        "\n ------- \n"
        "Formulate an improved question:"
    )


    def rewrite_question(state: MessagesState):
        """Rewrite the original user question."""
        messages = state["messages"]
        question = messages[0].content
        prompt = REWRITE_PROMPT.format(question=question)
        response = response_model.invoke([{"role": "user", "content": prompt}])
        return {"messages": [{"role": "user", "content": response.content}]}
- Try it out:

    input = {
        "messages": convert_to_messages(
            [
                {
                    "role": "user",
                    "content": "What does Lilian Weng say about types of reward hacking?",
                },
                {
                    "role": "assistant",
                    "content": "",
                    "tool_calls": [
                        {
                            "id": "1",
                            "name": "retrieve_blog_posts",
                            "args": {"query": "types of reward hacking"},
                        }
                    ],
                },
                {"role": "tool", "content": "meow", "tool_call_id": "1"},
            ]
        )
    }

    response = rewrite_question(input)
    print(response["messages"][-1]["content"])
Output:
6. Generate an answer
- Build the generate_answer node: if we pass the grader checks, we can generate the final answer based on the original question and the retrieved context:

    GENERATE_PROMPT = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, just say that you don't know. "
        "Use three sentences maximum and keep the answer concise.\n"
        "Question: {question} \n"
        "Context: {context}"
    )


    def generate_answer(state: MessagesState):
        """Generate an answer."""
        question = state["messages"][0].content
        context = state["messages"][-1].content
        prompt = GENERATE_PROMPT.format(question=question, context=context)
        response = response_model.invoke([{"role": "user", "content": prompt}])
        return {"messages": [response]}
- Try it:

    input = {
        "messages": convert_to_messages(
            [
                {
                    "role": "user",
                    "content": "What does Lilian Weng say about types of reward hacking?",
                },
                {
                    "role": "assistant",
                    "content": "",
                    "tool_calls": [
                        {
                            "id": "1",
                            "name": "retrieve_blog_posts",
                            "args": {"query": "types of reward hacking"},
                        }
                    ],
                },
                {
                    "role": "tool",
                    "content": "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
                    "tool_call_id": "1",
                },
            ]
        )
    }

    response = generate_answer(input)
    response["messages"][-1].pretty_print()
Output:
================================== Ai Message ==================================

Lilian Weng categorizes reward hacking into two types: environment or goal misspecification, and reward tampering. She considers reward hacking as a broad concept that includes both of these categories. Reward hacking occurs when an agent exploits flaws or ambiguities in the reward function to achieve high rewards without performing the intended behaviors.
7. Assemble the graph
- Start with generate_query_or_respond and determine if we need to call retriever_tool.
- Route to the next step using tools_condition:
    - If generate_query_or_respond returned tool_calls, call retriever_tool to retrieve context.
    - Otherwise, respond directly to the user.
- Grade the retrieved document content for relevance to the question (grade_documents) and route to the next step:
    - If not relevant, rewrite the question using rewrite_question and then call generate_query_or_respond again.
    - If relevant, proceed to generate_answer and generate the final response using the ToolMessage with the retrieved document context.
API Reference: StateGraph | START | END | ToolNode | tools_condition
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition
workflow = StateGraph(MessagesState)
# Define the nodes we will cycle between
workflow.add_node(generate_query_or_respond)
workflow.add_node("retrieve", ToolNode([retriever_tool]))
workflow.add_node(rewrite_question)
workflow.add_node(generate_answer)
workflow.add_edge(START, "generate_query_or_respond")
# Decide whether to retrieve
workflow.add_conditional_edges(
    "generate_query_or_respond",
    # Assess the LLM decision (call the `retriever_tool` or respond to the user)
    tools_condition,
    {
        # Translate the condition outputs to nodes in our graph
        "tools": "retrieve",
        END: END,
    },
)

# Edges taken after the `retrieve` node is called.
workflow.add_conditional_edges(
    "retrieve",
    # Assess agent decision
    grade_documents,
)
workflow.add_edge("generate_answer", END)
workflow.add_edge("rewrite_question", "generate_query_or_respond")
# Compile
graph = workflow.compile()
Visualize the graph:
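The rendering code is not included above; a minimal sketch, assuming a notebook (IPython) environment with Mermaid PNG rendering available:

    from IPython.display import Image, display

    # Render the compiled graph's structure as a Mermaid diagram.
    display(Image(graph.get_graph().draw_mermaid_png()))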
8. Run the agentic RAG
for chunk in graph.stream(
    {
        "messages": [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            }
        ]
    }
):
    for node, update in chunk.items():
        print("Update from node", node)
        update["messages"][-1].pretty_print()
        print("\n\n")
Output:
Update from node generate_query_or_respond
================================== Ai Message ==================================
Tool Calls:
retrieve_blog_posts (call_NYu2vq4km9nNNEFqJwefWKu1)
Call ID: call_NYu2vq4km9nNNEFqJwefWKu1
Args:
query: types of reward hacking
Update from node retrieve
================================= Tool Message ==================================
Name: retrieve_blog_posts
(Note: Some work defines reward tampering as a distinct category of misalignment behavior from reward hacking. But I consider reward hacking as a broader concept here.)
At a high level, reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering.
Why does Reward Hacking Exist?#
Pan et al. (2022) investigated reward hacking as a function of agent capabilities, including (1) model size, (2) action space resolution, (3) observation space noise, and (4) training time. They also proposed a taxonomy of three types of misspecified proxy rewards:
Let's Define Reward Hacking#
Reward shaping in RL is challenging. Reward hacking occurs when an RL agent exploits flaws or ambiguities in the reward function to obtain high rewards without genuinely learning the intended behaviors or completing the task as designed. In recent years, several related concepts have been proposed, all referring to some form of reward hacking:
Update from node generate_answer
================================== Ai Message ==================================
Lilian Weng categorizes reward hacking into two types: environment or goal misspecification, and reward tampering. She considers reward hacking as a broad concept that includes both of these categories. Reward hacking occurs when an agent exploits flaws or ambiguities in the reward function to achieve high rewards without performing the intended behaviors.