Quick Start
Prerequisites
- Python 3.11+
- uv, a fast Python package manager
- Docker (for Local and Tinker Docker deployments)
- GPU with >= 24 GB VRAM (Local backend only)
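The prerequisites above can be checked with a small preflight script before cloning. This is a minimal sketch and not part of CLaaS: the function name `missing_prerequisites` is hypothetical, it assumes the tools are installed under the executable names `uv` and `docker`, and it does not inspect GPU memory.

```python
import shutil
import sys


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `version` and `which` are injectable so the check can be exercised without
    touching the real interpreter or PATH.
    """
    version = sys.version_info if version is None else version
    problems = []
    # The quick start asks for Python 3.11 or newer.
    if tuple(version[:2]) < (3, 11):
        problems.append(f"Python 3.11+ required, found {version[0]}.{version[1]}")
    # Assumed executable names; adjust if your installation differs.
    for tool in ("uv", "docker"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the running interpreter and the current PATH. Docker is only needed for the Local and Tinker Docker deployments, so a missing `docker` entry may be acceptable for a Modal-only setup.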
git clone https://github.com/kfallah/CLaaS.git
cd CLaaS
Choose your backend
The Local backend runs SDPO training and vLLM inference on your own GPU. It requires >= 24 GB VRAM.
# Install dependencies
uv sync --extra local
# Docker setup (recommended)
cd docker
cp .env.local.example .env
# Edit .env -- set TELEGRAM_BOT_TOKEN
docker compose --profile local up --build
The first run downloads Qwen3-8B (~16 GB). Expect the vLLM health check to take 10–20 minutes on first start.
Full Local backend reference
Requirements, all config variables, services, and manual setup
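Because the first start of the Local backend can take 10–20 minutes, it may be convenient to poll the inference endpoint instead of retrying by hand. The sketch below is an illustration only: `http_ok` and `wait_until_ready` are hypothetical helper names, and the 30-minute budget and 15-second interval are assumed defaults, not values taken from CLaaS.

```python
import time
import urllib.request


def http_ok(url, headers=None, timeout=5.0):
    """Return True if a GET to `url` answers with a 2xx status, else False."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False


def wait_until_ready(probe, max_wait=30 * 60, interval=15.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call the zero-argument `probe` until it returns True or `max_wait` seconds elapse."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:8000/v1/models", {"Authorization": "Bearer sk-local"}))` polls the same inference endpoint and bearer token used in the verification commands of this page.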
The Tinker backend uses Tinker’s hosted inference and training. No GPU required.
# Install dependencies
uv sync --extra tinker
# Docker setup
cd docker
cp .env.tinker.example .env.tinker
# Edit .env.tinker -- set TELEGRAM_BOT_TOKEN + TINKER_API_KEY
docker compose -f docker-compose.tinker.yml --env-file .env.tinker up --build
Full Tinker backend reference
Tinker config, model naming gotchas, and services
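A common failure when starting the Tinker stack is an env file with a required variable left blank. The following sketch checks for that before running `docker compose`. It is an assumption-laden illustration: `unset_env_keys` and `check_env_file` are hypothetical names, and the parser only handles simple `KEY=value` lines, not the full dotenv syntax.

```python
from pathlib import Path


def unset_env_keys(env_text, required):
    """Return the names in `required` that are missing or empty in dotenv-style text."""
    values = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return [name for name in required if not values.get(name)]


def check_env_file(path, required):
    """Read `path` and report which required keys still need a value."""
    return unset_env_keys(Path(path).read_text(encoding="utf-8"), required)
```

Run from the repository root, `check_env_file("docker/.env.tinker", ["TELEGRAM_BOT_TOKEN", "TINKER_API_KEY"])` returns an empty list when both variables named in the setup comment have values. It cannot tell a real token from a placeholder string left over from the example file.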
The Modal backend runs distillation remotely on Modal (L40S GPUs). No local GPU needed.
# Install dependencies
uv sync --extra local
# Authenticate with Modal
uv run modal token new
# Deploy
export HF_TOKEN=...
export CLAAS_BASE_MODEL_ID=Qwen/Qwen3-8B
uv run modal deploy -m claas.deploy
The deployed app exposes the same API at https://your-app--claas-distill-fastapi-app.modal.run.
Full Modal backend reference
Deployment, config, and LoRA storage on Modal Volumes
Verify your setup
Once the stack is running, verify with:
# Check inference endpoint
curl http://localhost:8000/v1/models -H "Authorization: Bearer sk-local"
# Check CLaaS API
curl http://localhost:8080/
# List LoRA adapters
curl http://localhost:8080/v1/lora
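The three curl checks above can also be scripted, which helps when the same verification has to run against more than one deployment. This is a minimal sketch using only the Python standard library: `build_checks` and `run_checks` are hypothetical names, and the defaults simply mirror the URLs and bearer token shown in the curl commands.

```python
import urllib.request


def build_checks(inference_base="http://localhost:8000",
                 api_base="http://localhost:8080",
                 api_key="sk-local"):
    """Return (label, url, headers) tuples mirroring the three curl commands."""
    auth = {"Authorization": f"Bearer {api_key}"}
    api = api_base.rstrip("/")
    return [
        ("inference endpoint", f"{inference_base.rstrip('/')}/v1/models", auth),
        ("CLaaS API", f"{api}/", {}),
        ("LoRA adapters", f"{api}/v1/lora", {}),
    ]


def run_checks(checks, timeout=10.0):
    """Send each GET and return {label: HTTP status code, or the error text on failure}."""
    results = {}
    for label, url, headers in checks:
        req = urllib.request.Request(url, headers=headers)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[label] = resp.status
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            results[label] = str(exc)
    return results
```

`run_checks(build_checks())` reports a status code per endpoint for a local stack. Passing a different `api_base`, such as the URL of a Modal deployment, should work for the CLaaS API checks on the assumption that the deployed app serves the same paths.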