Agent Execution Loop
Agents are the heart of Llama Stack applications. They combine inference, memory, safety, and tool usage into coherent workflows. At the core of every agent is a sophisticated execution loop that enables multi-step reasoning, tool use, and safety checks.
Steps in the Agent Workflow
Each agent turn follows these key steps:

1. Initial safety check: the user's input is first screened through the configured safety shields.
2. Context retrieval:
   - If RAG is enabled, the agent can choose to query relevant documents from memory banks. You can use the `instructions` field to steer the agent. New documents are first inserted into the memory bank.
   - Retrieved context is provided to the LLM as a tool response in the message history.
3. Inference loop: the agent enters its main execution loop:
   - The LLM receives the user prompt (along with any previous tool outputs).
   - The LLM generates a response, which may include tool calls.
   - If tool calls are present:
     - Tool inputs are safety-checked.
     - The tools are executed (e.g., web search, code execution).
     - Tool responses are fed back to the LLM for synthesis.
   - The loop continues until one of the following:
     - The LLM produces a final response without tool calls.
     - The maximum number of iterations is reached.
     - The token limit is exceeded.
4. Final safety check: the agent's final response is screened through the safety shields.
```mermaid
sequenceDiagram
    participant U as User
    participant E as Executor
    participant M as Memory Bank
    participant L as LLM
    participant T as Tools
    participant S as Safety Shield

    Note over U,S: Agent Turn Start
    U->>S: 1. Submit Prompt
    activate S
    S->>E: Input Safety Check
    deactivate S

    loop Inference Loop
        E->>L: 2.1 Augment with Context
        L-->>E: 2.2 Response (with/without tool calls)

        alt Has Tool Calls
            E->>S: Check Tool Input
            S->>T: 3.1 Execute Tool
            T-->>E: 3.2 Tool Response
            E->>L: 4.1 Tool Response
            L-->>E: 4.2 Synthesized Response
        end

        opt Stop Conditions
            Note over E: Break if:
            Note over E: - No tool calls
            Note over E: - Max iterations reached
            Note over E: - Token limit exceeded
        end
    end

    E->>S: Output Safety Check
    S->>U: 5. Final Response
```
Each step in this process can be monitored and controlled through configuration.
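The control flow described above can be sketched in plain Python. This is a minimal, simplified sketch of the loop's shape, not Llama Stack's actual implementation; `call_llm`, `run_tool`, and `check_shield` are hypothetical stand-ins for the inference, tool, and safety providers:

```python
def run_turn(user_input, call_llm, run_tool, check_shield, max_infer_iters=5):
    """Sketch of one agent turn; helper callables are hypothetical stand-ins."""
    # 1. Initial safety check on the user's input
    if not check_shield(user_input):
        return "Input blocked by safety shield"

    messages = [{"role": "user", "content": user_input}]

    # 3. Inference loop: call the LLM, execute any tool calls, repeat
    for _ in range(max_infer_iters):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls", [])

        if not tool_calls:
            # Final response without tool calls -> 4. final safety check
            content = reply["content"]
            return content if check_shield(content) else "Output blocked"

        for call in tool_calls:
            # Tool inputs are safety-checked before execution
            if not check_shield(str(call["args"])):
                return "Tool input blocked by safety shield"
            # Tool responses are fed back to the LLM for synthesis
            result = run_tool(call["name"], call["args"])
            messages.append({"role": "tool", "content": result})

    # Loop exhausted without a final answer
    return "Stopped: max iterations reached"
```

The real executor also enforces token limits and streams intermediate events, but the termination conditions mirror the list above: a tool-free response, the iteration cap, or a shield rejection ends the turn.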
Agent Execution Loop Example
Here is an example that demonstrates how to monitor the agent's execution:
```python
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
from rich.pretty import pprint

# Replace host and port
client = LlamaStackClient(base_url=f"http://{HOST}:{PORT}")

agent = Agent(
    client,
    # Check with `llama-stack-client models list`
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    # Enable both RAG and tool usage
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": ["my_docs"]},
        },
        "builtin::code_interpreter",
    ],
    # Configure safety (optional)
    input_shields=["llama_guard"],
    output_shields=["llama_guard"],
    # Control the inference loop
    max_infer_iters=5,
    sampling_params={
        "strategy": {"type": "top_p", "temperature": 0.7, "top_p": 0.95},
        "max_tokens": 2048,
    },
)

session_id = agent.create_session("monitored_session")

# Stream the agent's execution steps
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    documents=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
)

# Monitor each step of execution
for log in AgentEventLogger().log(response):
    log.print()

# Using the non-streaming API, the response contains input, steps, and output.
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    documents=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
    stream=False,
)

pprint(f"Input: {response.input_messages}")
pprint(f"Output: {response.output_message.content}")
pprint(f"Steps: {response.steps}")
```
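Beyond printing the steps, you can inspect them programmatically, for example to see how many inference calls, tool executions, and shield checks a turn required. The sketch below assumes each step object exposes a `step_type` attribute (as Llama Stack's step objects do); mock objects stand in for a real `response.steps` here:

```python
from collections import Counter
from types import SimpleNamespace

def summarize_steps(steps):
    """Count how many steps of each type a turn produced."""
    return Counter(step.step_type for step in steps)

# Mock steps standing in for response.steps from a real turn
steps = [
    SimpleNamespace(step_type="shield_call"),
    SimpleNamespace(step_type="inference"),
    SimpleNamespace(step_type="tool_execution"),
    SimpleNamespace(step_type="inference"),
    SimpleNamespace(step_type="shield_call"),
]

print(summarize_steps(steps))
# e.g. Counter({'shield_call': 2, 'inference': 2, 'tool_execution': 1})
```

A summary like this is a quick way to verify that `max_infer_iters` and the safety shields are behaving as configured.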