Quickstart

Get started with Llama Stack in minutes!

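Llama Stack is a stateful service that exposes REST APIs so that AI applications can move between environments without code changes. You can build and test against a local server first, then deploy to a hosted endpoint for production.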
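Because the service is reached over a REST API, moving from a local server to a hosted endpoint can come down to changing the base URL the client points at. The following is a minimal sketch of that idea, not part of Llama Stack: the environment variable name LLAMA_STACK_BASE_URL and the helper resolve_base_url are hypothetical, and the local default assumes a server on localhost at port 8321, the port used by the demo script in this guide.

```python
import os

# Hypothetical variable name chosen for this sketch; Llama Stack does not define it.
BASE_URL_ENV_VAR = "LLAMA_STACK_BASE_URL"
# Assumed local address; 8321 is the port the demo script in this guide connects to.
LOCAL_BASE_URL = "http://localhost:8321"


def resolve_base_url(env=None):
    """Return the configured hosted endpoint, or the local server if none is set."""
    env = os.environ if env is None else env
    url = env.get(BASE_URL_ENV_VAR, "").strip()
    # Drop a trailing slash so the value can be joined with API paths consistently.
    return url.rstrip("/") if url else LOCAL_BASE_URL


print(resolve_base_url({}))
```

The returned string could then be passed as base_url when constructing the client, so the same script runs unchanged in both environments.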

In this guide, we'll walk through how to build a RAG application locally with Llama Stack, using Ollama as the inference provider for a Llama model.

Step 1: Install and set up

Install uv
Run inference on a Llama model with Ollama

ollama run llama3.2:3b --keepalive 60m

Step 2: Run the Llama Stack server

We will use uv to run the Llama Stack server.

INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run
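The server can take a little while to start, and the demo in the next step fails if it runs before the server accepts connections. As an optional aid, the sketch below polls a TCP port until something is listening. It uses only the Python standard library; the helper name wait_for_port is hypothetical, and it only checks that the port is open, not that the Llama Stack API itself is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8321) would block for up to 30 seconds, assuming the server listens on localhost at port 8321 as the demo script expects.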

Step 3: Run the demo

Now open a new terminal and copy the following script into a file named demo_script.py.

from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

vector_db_id = "my_demo_vector_db"
# Assumes the server from Step 2 is running locally on port 8321
client = LlamaStackClient(base_url="http://localhost:8321")

models = client.models.list()

# Select the first LLM and first embedding models
model_id = next(m for m in models if m.model_type == "llm").identifier
embedding_model_id = (
    em := next(m for m in models if m.model_type == "embedding")
).identifier
embedding_dimension = em.metadata["embedding_dimension"]

# Register a vector database with the faiss provider to hold the document chunks
_ = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model=embedding_model_id,
    embedding_dimension=embedding_dimension,
    provider_id="faiss",
)

# Ingest a web page into the vector database as a RAG document
source = "https://www.paulgraham.com/greatwork.html"
print("rag_tool> Ingesting document:", source)
document = RAGDocument(
    document_id="document_1",
    content=source,
    mime_type="text/html",
    metadata={},
)
client.tool_runtime.rag_tool.insert(
    documents=[document],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=50,
)

# Create an agent that can query the vector database through the RAG tool
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": [vector_db_id]},
        }
    ],
)

prompt = "How do you do great work?"
print("prompt>", prompt)

# Run a single turn in a new session and stream the resulting events
response = agent.create_turn(
    messages=[{"role": "user", "content": prompt}],
    session_id=agent.create_session("rag_session"),
    stream=True,
)

for log in AgentEventLogger().log(response):
    log.print()

We will use uv to run the script.

uv run --with llama-stack-client demo_script.py

You should see output like the following.

rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html

prompt> How do you do great work?

inference> [knowledge_search(query="What is the key to doing great work")]

tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}

tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 2:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]

inference> Based on the search results, it seems that doing great work means doing something important so well that you expand people's ideas of what's possible. However, there is no clear threshold for importance, and it can be difficult to judge at the time.

To further clarify, I would suggest that doing great work involves:

* Completing tasks with high quality and attention to detail
* Expanding on existing knowledge or ideas
* Making a positive impact on others through your work
* Striving for excellence and continuous improvement

Ultimately, great work is about making a meaningful contribution and leaving a lasting impression.
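The script passes chunk_size_in_tokens=50 when inserting the document, which likely explains why each search result above is only a few lines long. As a rough illustration only, the sketch below splits text into fixed-size windows, using whitespace-separated words as a stand-in for tokens. The function name chunk_by_words is hypothetical, and the server's real tokenizer and any overlap between chunks are not modeled here.

```python
def chunk_by_words(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# 120 words split into windows of 50 leaves a shorter final chunk.
sample = " ".join(f"w{i}" for i in range(120))
chunks = chunk_by_words(sample, chunk_size=50)
print([len(c.split()) for c in chunks])  # [50, 50, 20]
```

Smaller chunks tend to make retrieval more precise but give the model less context per result, so the value is worth tuning for your own documents.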

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳

Next Steps

Now you're ready to dive deeper into Llama Stack!

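Explore the Detailed Tutorial.
Try the Getting Started Notebook.
Browse more Notebooks on GitHub.
Learn about Llama Stack Concepts.
Discover how to Build Llama Stacks.
Refer to our References for details on the Llama CLI and Python SDK.
Check out the llama-stack-apps repository for example applications and tutorials.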