Stateless-FileSystem-Agent

A serverless AI agent system built on the Claude Agent SDK. It keeps conversations stateful across stateless containers by persisting session data to S3 and DynamoDB.

English Documentation

Project Overview

Exploratory Project | This project explores how to achieve stateful AI agent sessions using a file system plus stateless containers (AWS Lambda, which runs on the Firecracker runtime). It demonstrates how to keep a conversation persistent across stateless function invocations.

Architecture

Telegram User → Bot API → API Gateway → Producer Lambda → SQS FIFO Queue → Consumer Lambda
                                              ↓                                  ↓
                                        Return 200                      agent-server Lambda
                                        immediately                              ↓
                                              DynamoDB (Session mapping) + S3 (Session files) + Bedrock (Claude)

Core Design:

Uses the Hybrid Sessions pattern recommended by the Claude Agent SDK
SQS FIFO Async Architecture: the Producer returns 200 to Telegram immediately, and the Consumer processes requests asynchronously with guaranteed message ordering

Features

Session Persistence: DynamoDB stores the session mapping, S3 stores the conversation history, and sessions can be restored across requests
Multi-tenant Isolation: Client isolation based on Telegram chat_id + thread_id
Forum Group Support: Topic-based conversation isolation with an automatic permission pre-check
User Whitelist: Control private chat and group invitation permissions
SubAgent Support: Configurable specialized Agents (e.g., AWS support) with example implementations
Skills Support: Reusable skill modules with hello-world example
MCP Integration: Support for HTTP and local command-based MCP servers (Node.js 20+)
Security: Telegram Webhook secret token verification (HMAC)
Auto Cleanup: 25-day TTL + S3 lifecycle management
SQS FIFO Queue: Ordered async processing + auto retry + dead letter queue
Quick Start: Provides example Skill/SubAgent/MCP configurations for adding other components

Commands

| Command | Description |
| --- | --- |
| `/newchat <message>` | Create a new Topic in a Forum group and start a conversation |
| `/debug` | Download current session files (conversation.jsonl, debug.txt, todos.json) |
| `/start` | Welcome message (private chat) |
| `/help` | Show help message |

Project Structure

├── agent-sdk-server/          # Agent Runtime (Docker Container)
│   ├── handler.py             # Lambda Entry Point
│   ├── agent_session.py       # SDK Wrapper
│   ├── session_store.py       # Session Persistence
│   └── claude-config/         # Configuration Files
│       ├── agents.json        # SubAgent Definitions
│       ├── mcp.json           # MCP Server Configuration
│       ├── skills/            # Skills Definitions
│       │   └── hello-world/   # Example Skill
│       └── system_prompt.md   # System Prompt
│
├── agent-sdk-client/          # Telegram Client (ZIP Deployment)
│   ├── handler.py             # Producer: Webhook receiver, writes to SQS
│   ├── consumer.py            # Consumer: SQS consumer, calls Agent
│   ├── config.py              # Configuration management
│   ├── config.toml            # Command configuration
│   └── security.py            # Security utilities
│
├── docs/                      # Documentation
│   └── anthropic-agent-sdk-official/  # SDK Official Docs Reference
│
├── template.yaml              # SAM Deployment Template
└── samconfig.toml             # SAM Configuration

Deployment

Prerequisites

AWS CLI + SAM CLI
Docker
Amazon Bedrock access (Claude models)
Telegram Bot Token

Configuration

Copy and modify configuration files:

cp .env.example .env
# Edit .env to fill in required environment variables

Build and deploy:

sam build
sam deploy --guided

Environment Variables

| Variable | Description |
| --- | --- |
| `SESSION_BUCKET` | S3 bucket name (auto-created) |
| `SESSION_TABLE` | DynamoDB table name (auto-created) |
| `BEDROCK_ACCESS_KEY_ID` | Bedrock access key |
| `BEDROCK_SECRET_ACCESS_KEY` | Bedrock secret key |
| `SDK_CLIENT_AUTH_TOKEN` | Internal authentication token |
| `TELEGRAM_BOT_TOKEN` | Telegram Bot Token |
| `TELEGRAM_WEBHOOK_SECRET` | (Optional) Webhook secret for security verification |
| `QUEUE_URL` | SQS queue URL (auto-created) |

Tech Stack

Runtime: Python 3.12 + Claude Agent SDK
Compute: AWS Lambda (ARM64)
Storage: S3 + DynamoDB
Message Queue: AWS SQS (FIFO Queue + DLQ)
AI: Claude via Amazon Bedrock
Deployment: AWS SAM
Integration: Telegram Bot API + MCP

SQS FIFO Async Architecture

Problem Solved: Telegram's webhook call times out and is retried after ~27s, while agent processing may take 30-70s. Handling the request synchronously therefore causes duplicate responses.

Solution:

Producer Lambda receives Webhook, writes to SQS FIFO, returns 200 immediately (<1s)
Consumer Lambda consumes from SQS, calls Agent Server, sends response to Telegram
FIFO queue ensures message ordering within same session (MessageGroupId = chat_id:thread_id)
A failing message is delivered at most 3 times (maxReceiveCount = 3), then moved to the dead letter queue (DLQ)
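
The Producer step can be sketched as a pure function that builds the arguments for boto3's `sqs.send_message()`. The function name `build_sqs_message`, the fallback thread id of `0` for chats without topics, and the use of Telegram's `update_id` as the deduplication id are assumptions for illustration; only the `chat_id:thread_id` group id comes from the description above.

```python
import json


def build_sqs_message(update: dict, queue_url: str) -> dict:
    """Build keyword arguments for boto3's sqs.send_message() (sketch)."""
    message = update.get("message", {})
    chat_id = message["chat"]["id"]
    # Forum topics carry message_thread_id; other chats do not (assumed 0 here).
    thread_id = message.get("message_thread_id", 0)
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(update),
        # One group per conversation keeps its messages in order.
        "MessageGroupId": f"{chat_id}:{thread_id}",
        # Assumption: a webhook retry repeats update_id, so SQS can drop it.
        "MessageDeduplicationId": str(update["update_id"]),
    }
```

With a boto3 client this would be used as `sqs.send_message(**build_sqs_message(update, queue_url))`, after which the handler can return 200 without waiting for the agent.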

Queue Configuration:

FifoQueue: true (ordered delivery per MessageGroupId)
VisibilityTimeout: 900s (= Lambda timeout)
maxReceiveCount: 3 (at most 3 delivery attempts before a message moves to the DLQ)
DLQ Alarm: CloudWatch alarm triggers when messages enter DLQ
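
For readers who prefer code to template syntax, the settings above map onto the `Attributes` argument of the SQS `CreateQueue` API roughly as follows. This is a sketch in Python rather than the project's `template.yaml`; the function name `fifo_queue_attributes` and the ARN in the usage note are placeholders.

```python
import json


def fifo_queue_attributes(dlq_arn: str, lambda_timeout_s: int = 900,
                          max_receive_count: int = 3) -> dict[str, str]:
    """Return an Attributes mapping for SQS CreateQueue (values are strings)."""
    return {
        "FifoQueue": "true",
        # Kept equal to the Lambda timeout, as in the configuration above.
        "VisibilityTimeout": str(lambda_timeout_s),
        # RedrivePolicy is itself a JSON document encoded as a string.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }
```

For example, `fifo_queue_attributes("arn:aws:sqs:us-east-1:123456789012:example-dlq.fifo")` yields the three attributes listed above. The dead letter queue of a FIFO queue must also be a FIFO queue.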

Session Management

Lifecycle:

New message → Query DynamoDB mapping
Mapping exists → Download conversation.jsonl from S3 → Restore session
No mapping → Create new session → Save mapping to DynamoDB
Processing done → Upload updates to S3
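
The lifecycle above can be sketched with in-memory stand-ins for the two stores. `InMemoryStore` and `handle_message` are hypothetical names; in the real system the mapping lives in DynamoDB, the files live in S3, and the agent (not a string append) produces the new conversation turns.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Stand-in for DynamoDB (mapping) and S3 (session files)."""
    mapping: dict[str, str] = field(default_factory=dict)  # client key -> session id
    files: dict[str, str] = field(default_factory=dict)    # session id -> conversation.jsonl


def handle_message(store: InMemoryStore, client_key: str, text: str) -> str:
    """Run one stateless invocation and return the session id it used."""
    session_id = store.mapping.get(client_key)       # query the mapping
    if session_id is not None:
        history = store.files.get(session_id, "")    # download and restore
    else:
        session_id = uuid.uuid4().hex                # create a new session
        store.mapping[client_key] = session_id       # save the mapping
        history = ""
    history += text + "\n"                           # placeholder for the agent's turn
    store.files[session_id] = history                # upload the updated file
    return session_id
```

Two calls with the same client key reuse one session, while a different `chat_id:thread_id` key gets its own, which is the isolation property the design relies on.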

Persistent Files:

conversation.jsonl - Conversation history (required for restoration)
debug.txt - Debug logs
todos.json - Task status

Configure Commands

Edit agent-sdk-client/config.toml:

[agent_commands]
commands = ["/custom-skill", "/hello-world"]

[local_commands]
# Static response
help = { type = "static", response = "Hello World" }
# Handler function
newchat = { type = "handler", handler = "newchat" }
debug = { type = "handler", handler = "debug" }

[security]
user_whitelist = ["all"]  # or [123456789, 987654321]

Configure SubAgents

Edit agent-sdk-server/claude-config/agents.json:

{
  "agent-name": {
    "description": "Agent description",
    "prompt_file": "agents/prompt.md",
    "tools": ["specific tool name"],
    "model": "haiku"
  }
}

Note: The tools field does not support wildcards; you must specify complete tool names.
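
Because wildcards are not expanded, a small pre-deployment check can catch them early. The sketch below is a hypothetical helper, not part of the repository, and the tool names in the usage note are illustrative.

```python
import json

# Characters that usually indicate a glob-style pattern rather than a full name.
WILDCARD_CHARS = set("*?[]")


def find_wildcard_tools(agents_json: str) -> dict[str, list[str]]:
    """Map each agent name to its tool entries that look like wildcards."""
    agents = json.loads(agents_json)
    problems: dict[str, list[str]] = {}
    for name, spec in agents.items():
        bad = [tool for tool in spec.get("tools", []) if WILDCARD_CHARS & set(tool)]
        if bad:
            problems[name] = bad
    return problems
```

For instance, an agent whose tools are `["Read", "mcp__aws__*"]` would be reported with `["mcp__aws__*"]`, while an agent listing only complete tool names produces no entry.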

Configure Skills

Create a new Skill in the agent-sdk-server/claude-config/skills/ directory:

Create a folder: skills/your-skill/
Create a SKILL.md file with YAML frontmatter and Markdown description
Claude Agent SDK will auto-discover and use these Skills

Example: skills/hello-world/SKILL.md
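
The two creation steps can be scripted. The sketch below assumes the frontmatter carries `name` and `description` fields; compare with `skills/hello-world/SKILL.md` before relying on it. `create_skill` is a hypothetical helper.

```python
from pathlib import Path


def create_skill(skills_dir: Path, name: str, description: str, body: str) -> Path:
    """Create skills/<name>/SKILL.md with YAML frontmatter and a Markdown body."""
    skill_dir = skills_dir / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # Assumed frontmatter fields: name and description.
    content = f"---\nname: {name}\ndescription: {description}\n---\n\n{body}\n"
    path = skill_dir / "SKILL.md"
    path.write_text(content, encoding="utf-8")
    return path
```

Running `create_skill(Path("agent-sdk-server/claude-config/skills"), "your-skill", "What the skill does", "Instructions for the agent.")` would produce the folder and file described in the steps above.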

Configure MCP Servers

Edit agent-sdk-server/claude-config/mcp.json, which supports two server types:

HTTP MCP: HTTP endpoint pointing to remote MCP servers
Command-line MCP: Start local MCP servers via command and args

The bundled configuration includes AWS knowledge base MCP servers. Use those entries as a template when adding more MCP servers.
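
To make the two types concrete, here is a sketch of what such a configuration could look like, expressed as a Python dict that serialises to JSON. The `mcpServers` layout with `type`/`url` and `command`/`args` keys follows the convention used by Claude Code's `.mcp.json`; whether this repository's `mcp.json` uses exactly that shape is an assumption, and the server names, URL, and package name are placeholders.

```python
import json

MCP_CONFIG = {
    "mcpServers": {
        # HTTP MCP: a remote endpoint (placeholder URL).
        "remote-docs": {"type": "http", "url": "https://mcp.example.invalid/mcp"},
        # Command-line MCP: a local process started via command and args
        # (placeholder package name).
        "local-tool": {"command": "npx", "args": ["-y", "example-mcp-server"]},
    }
}


def server_kind(entry: dict) -> str:
    """Classify a server entry as 'http' or 'command'."""
    if "url" in entry:
        return "http"
    if "command" in entry:
        return "command"
    raise ValueError("entry needs either 'url' or 'command'")


print(json.dumps(MCP_CONFIG, indent=2))
```

The `npx` launcher in the command-line entry is the kind of local server for which the Node.js 20+ requirement in the feature list would matter.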

Forum Group Setup

For Telegram Forum groups:

Enable the Topics feature in the group settings
Add the bot to the group (this must be done by a whitelisted user)
Promote the bot to admin with the "Manage Topics" permission
Use /newchat <message> to create new conversation topics

See docs/forum-group-security.md for details.

Quick Start Examples

The project includes the following example components; follow these examples to add other components:

SubAgent Example: aws-support Agent in agents.json
Skill Example: skills/hello-world/SKILL.md
MCP Example: AWS knowledge base and documentation MCP servers in mcp.json

TODO

Multi-tenant TenantID isolation

License

MIT


BukeLy/Stateless-FileSystem-Agent
