Quick Start

Prerequisites

  • Python 3.11+
  • uv, a fast Python package manager
  • Docker (for Local and Tinker Docker deployments)
  • GPU with >= 24 GB VRAM (Local backend only)
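Before cloning, you can confirm the tools above are on your PATH. This is a hypothetical helper, not part of the repo; the tool names are taken from the prerequisite list:

```shell
# Hypothetical helper: report whether each Quick Start prerequisite
# is installed and on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_tool python3
check_tool uv
check_tool docker   # needed for the Local and Tinker Docker deployments
```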
Clone the repository:

git clone https://github.com/kfallah/CLaaS.git
cd CLaaS

Choose your backend

Local backend: runs SDPO training and vLLM inference on your own GPU; requires >= 24 GB of VRAM.
# Install dependencies
uv sync --extra local

# Docker setup (recommended)
cd docker
cp .env.local.example .env
# Edit .env -- set TELEGRAM_BOT_TOKEN
docker compose --profile local up --build
The first run downloads Qwen3-8B (~16 GB). Expect the vLLM health check to take 10–20 minutes on first start.
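Since the first start can take 10–20 minutes, it can be convenient to poll the inference endpoint until it responds instead of checking manually. A minimal sketch, assuming the endpoint and `sk-local` key shown in the verification section below; the retry count and interval are arbitrary choices:

```shell
# Poll an HTTP endpoint until it responds or a retry budget is spent.
# Returns 0 once the endpoint answers, 1 on timeout.
wait_for() {
  url=$1
  tries=${2:-40}
  i=1
  while [ "$i" -le "$tries" ]; do
    if curl -sf -H "Authorization: Bearer sk-local" "$url" >/dev/null; then
      echo "up: $url"
      return 0
    fi
    sleep 1   # a real first start may warrant a longer interval, e.g. 30s
    i=$((i + 1))
  done
  echo "timed out: $url"
  return 1
}
```

For example, `wait_for http://localhost:8000/v1/models 40` blocks until vLLM passes its health check or 40 attempts have failed.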

For requirements, all configuration variables, services, and manual setup, see the full Local backend reference.

Verify your setup

Once the stack is running, verify with:
# Check inference endpoint
curl http://localhost:8000/v1/models -H "Authorization: Bearer sk-local"

# Check CLaaS API
curl http://localhost:8080/

# List LoRA adapters
curl http://localhost:8080/v1/lora
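The three checks above can also be run in one pass. A small sketch using only the endpoints and the `sk-local` key already shown; the `check` helper and its labels are illustrative:

```shell
# Run each verification curl and report PASS/FAIL per endpoint.
# -s silences progress output; -f makes curl exit nonzero on HTTP errors.
check() {
  name=$1
  shift
  if curl -sf "$@" >/dev/null; then
    echo "PASS $name"
  else
    echo "FAIL $name"
  fi
}

check "vLLM inference" -H "Authorization: Bearer sk-local" http://localhost:8000/v1/models
check "CLaaS API"      http://localhost:8080/
check "LoRA adapters"  http://localhost:8080/v1/lora
```

All three should print PASS once the stack is healthy; a FAIL on the first line usually just means the vLLM health check has not finished yet.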