# Local GPU Backend
The Local backend runs SDPO training and vLLM inference on your own hardware. It requires a GPU with >= 24 GB VRAM.

## Requirements
- NVIDIA GPU with >= 24 GB VRAM (e.g. RTX 3090, RTX 4090, A5000, L40S)
- NVIDIA Container Toolkit (for Docker)
- Docker and Docker Compose
- Python 3.11+ and uv
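The prerequisites above can be sanity-checked from a shell before starting. This is a sketch; the CUDA image tag is just one example of a recent tag:

```shell
# GPU and driver visible to the host
nvidia-smi

# Docker can reach the GPU through the NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Python 3.11+ and uv on the PATH
python3 --version
uv --version
```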
## Installation
### Configure environment

Create a `.env` file and set `TELEGRAM_BOT_TOKEN` (required). Optionally set `HF_TOKEN` for gated models.

### Start the stack
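A typical way to bring the stack up, assuming the repository's Docker Compose file sits at the repo root:

```shell
# Start all services in the background
docker compose up -d

# Follow the vLLM logs until the model has finished loading
docker compose logs -f vllm
```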
## Services
| Service | Port | Description |
|---|---|---|
| `vllm` | 8000 | Qwen3-8B with LoRA serving and sleep/wake support |
| `claas-api` | 8080 | CLaaS feedback API and distill worker |
| `openclaw-local` | 18789 | OpenClaw gateway with Telegram bot |
| `init-local` | — | One-shot: creates LoRA adapter and writes OpenClaw config |
## Configuration
These variables are set in the `.env` file.
| Variable | Required | Default | Description |
|---|---|---|---|
| `TELEGRAM_BOT_TOKEN` | Yes | — | Bot token from @BotFather |
| `HF_TOKEN` | No | — | HuggingFace token (gated models only) |
| `MODEL` | No | `Qwen/Qwen3-8B` | Base model ID |
| `GPU_MEMORY_UTILIZATION` | No | 0.70 | VRAM fraction for vLLM |
| `MAX_MODEL_LEN` | No | 32768 | Max sequence length |
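A minimal example `.env`; the token value is a placeholder, and the remaining values are the defaults from the table above:

```shell
# Required: bot token from @BotFather (placeholder value)
TELEGRAM_BOT_TOKEN=123456:replace-with-your-token

# Optional: only needed for gated HuggingFace models
# HF_TOKEN=hf_xxx

# Optional overrides (defaults shown)
MODEL=Qwen/Qwen3-8B
GPU_MEMORY_UTILIZATION=0.70
MAX_MODEL_LEN=32768
```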
## Verification
After startup, confirm that vLLM is serving the `openclaw-assistant-latest` LoRA model.
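One way to check this is through vLLM's OpenAI-compatible endpoints on port 8000; the model name below is the adapter from the services table:

```shell
# List the models vLLM is serving; the LoRA adapter should appear
# alongside the base model
curl -s http://localhost:8000/v1/models | python3 -m json.tool

# Send a quick chat completion through the LoRA adapter
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openclaw-assistant-latest",
       "messages": [{"role": "user", "content": "ping"}]}'
```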
## Manual Setup (without Docker)
If you prefer not to use Docker, you can run each service manually:
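A rough sketch of the manual equivalent, assuming vLLM is installed locally. The adapter path is a placeholder, and the `claas-api` and `openclaw-local` entry points are assumptions; check the repository for the actual commands:

```shell
# 1. Serve the base model with LoRA support (standard vLLM flags;
#    the adapter path is a placeholder)
vllm serve Qwen/Qwen3-8B \
  --enable-lora \
  --lora-modules openclaw-assistant-latest=/path/to/adapter \
  --gpu-memory-utilization 0.70 \
  --max-model-len 32768 \
  --port 8000

# 2. Run the remaining services with uv (entry-point names are
#    hypothetical; consult the repository)
uv run claas-api --port 8080
uv run openclaw-local --port 18789
```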

