Quick Start

Prerequisites

Python 3.11+
uv, a fast Python package manager
Docker (for Local and Tinker Docker deployments)
GPU with >= 24 GB VRAM (Local backend only)
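
The prerequisites above can be checked from a terminal before cloning. The following is a minimal sketch, not a script shipped with CLaaS: `version_ge` and `check_prereqs` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS).

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check; helper names are illustrative, not part of CLaaS.

# Succeeds when version $1 is >= version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints one status line per prerequisite; returns non-zero if a required tool is missing.
check_prereqs() {
  local status=0 py_version tool
  if command -v python3 >/dev/null 2>&1; then
    py_version="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_ge "$py_version" "3.11"; then
      echo "ok: Python $py_version"
    else
      echo "missing: Python 3.11+ (found $py_version)"
      status=1
    fi
  else
    echo "missing: python3"
    status=1
  fi
  # Docker is treated as required here because this quick start uses the Docker setups.
  for tool in uv docker; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  # A GPU is only needed for the Local backend, so a missing nvidia-smi is a note, not a failure.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  else
    echo "note: nvidia-smi not found (fine for the Tinker backend)"
  fi
  return "$status"
}
```

Source the file and run `check_prereqs`. On a GPU machine the last line lists each GPU with its total memory, which can be compared against the 24 GB requirement.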

git clone https://github.com/kfallah/CLaaS.git
cd CLaaS

Choose your backend

Local GPU
Tinker SDK
Modal Cloud (Coming Soon)

Local GPU

Runs SDPO training and vLLM inference on your own GPU. Requires >= 24 GB VRAM.

# Install dependencies
uv sync --extra local

# Docker setup (recommended)
cd docker
cp .env.local.example .env
# Edit .env -- set TELEGRAM_BOT_TOKEN
docker compose --profile local up --build

The first run downloads Qwen3-8B (~16 GB). Expect the vLLM health check to take 10–20 minutes on first start.
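
Because the first start can take this long, it may help to poll until the inference endpoint responds instead of retrying by hand. The sketch below defines a generic retry helper; `wait_for` is a hypothetical name, and using the `/v1/models` request from the verification section as the readiness signal is an assumption, since the exact health check used by the compose stack is not shown here.

```shell
#!/usr/bin/env bash
# wait_for TIMEOUT_SECS INTERVAL_SECS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECS elapse.
# Hypothetical helper, not part of CLaaS.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  # SECONDS is a bash builtin counting seconds since the shell started.
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (assumed readiness probe): allow up to 30 minutes, polling every 15 seconds.
# wait_for 1800 15 curl -fsS http://localhost:8000/v1/models -H "Authorization: Bearer sk-local"
```

The 30-minute timeout in the example leaves headroom over the 10–20 minute estimate; adjust it to your connection speed.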

Full Local backend reference: requirements, all config variables, services, and manual setup.

Tinker SDK

Uses Tinker’s hosted inference and training. No GPU required.

# Install dependencies
uv sync --extra tinker

# Docker setup
cd docker
cp .env.tinker.example .env.tinker
# Edit .env.tinker -- set TELEGRAM_BOT_TOKEN + TINKER_API_KEY
docker compose -f docker-compose.tinker.yml --env-file .env.tinker up --build
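
A missing token is easy to overlook, so it can be worth confirming the env file before starting the stack. This is a minimal sketch: `require_vars` is a hypothetical helper, and it only checks that each variable has a non-empty value, not that the value is a valid token or key.

```shell
#!/usr/bin/env bash
# require_vars ENV_FILE VAR [VAR...]
# Reports each VAR that is absent, commented out, or empty in ENV_FILE.
# Hypothetical helper, not part of CLaaS.
require_vars() {
  local env_file=$1 var missing=0
  shift
  for var in "$@"; do
    # Match an uncommented "VAR=value" line whose value is not empty (or just empty quotes).
    if ! grep -Eq "^[[:space:]]*${var}=[\"']?[^\"'[:space:]]" "$env_file"; then
      echo "unset in ${env_file}: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# require_vars .env.tinker TELEGRAM_BOT_TOKEN TINKER_API_KEY
```

The same helper works for the Local backend with `require_vars .env TELEGRAM_BOT_TOKEN`.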

Full Tinker backend reference: Tinker config, model naming gotchas, and services.

Modal Cloud (Coming Soon)

Runs distillation remotely on Modal (L40S GPUs). No local GPU needed.

# Install dependencies
uv sync --extra local

# Authenticate with Modal
uv run modal token new

# Deploy
export HF_TOKEN=...
export CLAAS_BASE_MODEL_ID=Qwen/Qwen3-8B
uv run modal deploy -m claas.deploy

The deployed app exposes the same API at a URL of the form https://your-app--claas-distill-fastapi-app.modal.run, where your-app is a placeholder.

Full Modal backend reference: deployment, config, and LoRA storage on Modal Volumes.

Verify your setup

Once the stack is running, verify with:

# Check inference endpoint
curl http://localhost:8000/v1/models -H "Authorization: Bearer sk-local"

# Check CLaaS API
curl http://localhost:8080/

# List LoRA adapters
curl http://localhost:8080/v1/lora
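
The three requests above can be wrapped into one pass/fail check. The sketch below reuses the same URLs and the `sk-local` bearer token; `check` and `verify_stack` are hypothetical helper names, and treating any non-error HTTP response as a pass is an assumption about what counts as healthy.

```shell
#!/usr/bin/env bash
# check NAME COMMAND [ARGS...]
# Runs COMMAND quietly and prints a one-line result. Hypothetical helper.
check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   ${name}"
  else
    echo "FAIL ${name}"
    return 1
  fi
}

# Runs the three checks from this section.
# curl -f fails on HTTP error responses (status >= 400) instead of printing the error body.
verify_stack() {
  local failed=0
  check "inference endpoint" \
    curl -fsS http://localhost:8000/v1/models -H "Authorization: Bearer sk-local" || failed=1
  check "CLaaS API" curl -fsS http://localhost:8080/ || failed=1
  check "LoRA adapters" curl -fsS http://localhost:8080/v1/lora || failed=1
  return "$failed"
}
```

Run `verify_stack` once the containers are up. It returns non-zero if any check fails, so it can gate later steps in a script.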
