Local GPU Backend
The Local backend runs SDPO training and vLLM inference on your own hardware. It requires a GPU with >= 24 GB VRAM.
Requirements
- NVIDIA GPU with >= 24 GB VRAM (e.g. RTX 3090, RTX 4090, A5000, L40S)
- NVIDIA Container Toolkit (for Docker)
- Docker and Docker Compose
- Python 3.11+ and uv
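A quick preflight check can confirm the VRAM requirement before you install anything. The sketch below is not part of CLaaS. It assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu=memory.total` CSV output, which reports MiB. The function names and the `MIN_VRAM_MIB` threshold are illustrative assumptions; the threshold is set slightly below 24 × 1024 MiB because cards sold as 24 GB can report a little less than that.

```python
import subprocess

# Assumption: treat "24 GB" loosely, since cards marketed as 24 GB can
# report slightly under 24 * 1024 MiB of total memory.
MIN_VRAM_MIB = 24_000


def parse_vram_mib(csv_output: str) -> list[int]:
    """Parse one memory.total value (in MiB) per non-empty line."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def gpus_meeting_requirement(vram_mib: list[int], minimum: int = MIN_VRAM_MIB) -> list[int]:
    """Return the indices of GPUs whose total VRAM is at least `minimum` MiB."""
    return [index for index, mib in enumerate(vram_mib) if mib >= minimum]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_vram_mib(result.stdout)
```

On the host, `gpus_meeting_requirement(query_vram_mib())` returns the indices of the GPUs that qualify; an empty list means no visible GPU meets the requirement.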
Installation
Clone and install
git clone https://github.com/kfallah/CLaaS.git
cd CLaaS
uv sync --extra local
Configure environment
cd docker
cp .env.local.example .env
Edit .env and set TELEGRAM_BOT_TOKEN (required). Optionally set HF_TOKEN for gated models.
Start the stack
docker compose --profile local up --build
The first run downloads Qwen3-8B (~16 GB). The vLLM health check takes 10-20 minutes on first start.
Verify
curl http://localhost:8000/v1/models -H "Authorization: Bearer sk-local"
curl http://localhost:8080/
curl http://localhost:8080/v1/lora
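Because the first start can take 10-20 minutes, it may be convenient to poll the vLLM models endpoint rather than re-running curl by hand. The following is a minimal sketch using only the Python standard library. The helper names, the 30-minute default timeout, and the 15-second polling interval are assumptions chosen for illustration, not CLaaS settings.

```python
import time
import urllib.request
from typing import Callable


def vllm_is_ready(url: str = "http://localhost:8000/v1/models",
                  api_key: str = "sk-local") -> bool:
    """Return True if the models endpoint answers with HTTP 200."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError/HTTPError, refused connections, and timeouts
        # all derive from OSError; treat any of them as "not ready yet".
        return False


def wait_until_ready(check: Callable[[], bool],
                     timeout_s: float = 1800.0,
                     interval_s: float = 15.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready(vllm_is_ready)` blocks until vLLM answers or the timeout expires. The `check` and `sleep` parameters are injectable so the loop can be exercised without a running server.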
Services
| Service | Port | Description |
|---|---|---|
| vllm | 8000 | Qwen3-8B with LoRA serving and sleep/wake support |
| claas-api | 8080 | CLaaS feedback API and distill worker |
| openclaw-local | 18789 | OpenClaw gateway with Telegram bot |
| init-local | — | One-shot: creates LoRA adapter + writes OpenClaw config |
Configuration
These variables are set in the .env file.
| Variable | Required | Default | Description |
|---|---|---|---|
| TELEGRAM_BOT_TOKEN | Yes | — | Bot token from @BotFather |
| HF_TOKEN | No | — | HuggingFace token (gated models only) |
| MODEL | No | Qwen/Qwen3-8B | Base model ID |
| GPU_MEMORY_UTILIZATION | No | 0.70 | VRAM fraction for vLLM |
| MAX_MODEL_LEN | No | 32768 | Max sequence length |
For the full Hydra config and all environment variables, see the Configuration Reference.
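To make the defaults above concrete, here is a small sketch that reads `KEY=VALUE` text, applies the documented defaults, and flags a missing TELEGRAM_BOT_TOKEN. It is a deliberately simplified parser written for illustration (no quoting or variable interpolation), not the way Docker Compose or CLaaS reads the file, and every function name in it is hypothetical.

```python
# Defaults copied from the configuration table; the parser is a simplification.
DEFAULTS = {
    "MODEL": "Qwen/Qwen3-8B",
    "GPU_MEMORY_UTILIZATION": "0.70",
    "MAX_MODEL_LEN": "32768",
}


def load_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks usable."""
    problems = []
    if not values.get("TELEGRAM_BOT_TOKEN"):
        problems.append("TELEGRAM_BOT_TOKEN is required")
    fraction = float(values["GPU_MEMORY_UTILIZATION"])
    if not 0.0 < fraction <= 1.0:
        problems.append("GPU_MEMORY_UTILIZATION must be in (0, 1]")
    return problems


def vllm_vram_budget_gb(total_gb: float, fraction: float) -> float:
    """VRAM that vLLM may claim, given the card's total and the configured fraction."""
    return total_gb * fraction
```

As a worked example, the default fraction of 0.70 on a 24 GB card gives vLLM about 16.8 GB, leaving roughly 7.2 GB for other GPU work.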
Verification
# Check vLLM models
curl http://localhost:8000/v1/models -H "Authorization: Bearer sk-local"
# Check CLaaS API
curl http://localhost:8080/
# List LoRA adapters
curl http://localhost:8080/v1/lora
# Test feedback loop
curl -X POST http://localhost:8080/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "lora_id": "openclaw/assistant-latest",
    "prompt": "hi",
    "response": "hello",
    "feedback": "good",
    "training": {"teacher_mode": "self"}
  }'
Send a DM to your Telegram bot. It should respond using the openclaw-assistant-latest LoRA model.
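The same feedback request can be sent from Python, which is handy when scripting the loop. This is a minimal sketch using only the standard library; the payload fields mirror the curl example above, while the helper names and the 600-second timeout are assumptions. The response schema is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request


def build_feedback_request(lora_id: str, prompt: str, response: str, feedback: str,
                           teacher_mode: str = "self",
                           base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for /v1/feedback with the fields from the curl example."""
    payload = {
        "lora_id": lora_id,
        "prompt": prompt,
        "response": response,
        "feedback": feedback,
        "training": {"teacher_mode": teacher_mode},
    }
    return urllib.request.Request(
        f"{base_url}/v1/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_feedback(request: urllib.request.Request, timeout_s: float = 600.0) -> tuple[int, str]:
    """Send the request and return (HTTP status, raw response body)."""
    # Assumption: a generous timeout, in case the API responds only after
    # the training step has finished.
    with urllib.request.urlopen(request, timeout=timeout_s) as http_response:
        return http_response.status, http_response.read().decode("utf-8")
```

For example, `send_feedback(build_feedback_request("openclaw/assistant-latest", "hi", "hello", "good"))` issues the same call as the curl command.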
Manual Setup (without Docker)
If you prefer not to use Docker, you can run each service manually:
# 1. Start vLLM with LoRA support
vllm serve Qwen/Qwen3-8B --host 0.0.0.0 --port 8000 \
  --enable-lora --lora-modules my-lora=/loras/user/my-lora-init
# 2. Start the CLaaS API
uv run uvicorn claas.api:web_app --host 0.0.0.0 --port 8080
# 3. Initialize a LoRA adapter
curl -X POST http://localhost:8080/v1/lora/init \
  -H "Content-Type: application/json" \
  -d '{"lora_id": "user/my-lora"}'
# 4. Send feedback
curl -X POST http://localhost:8080/v1/feedback \
  -H "Content-Type: application/json" \
  -d '{
    "lora_id": "user/my-lora-init",
    "prompt": "Write a function to calculate factorial",
    "response": "def factorial(n): ...",
    "feedback": "Good recursive solution"
  }'
When running uvicorn directly, use claas.api:web_app, not claas.api:app. The app object is a Modal App and is not ASGI-compatible.