
Modal Cloud Backend

The Modal backend is coming soon. The configuration and deployment instructions below reflect the planned design.
The Modal backend runs SDPO distillation remotely on Modal L40S GPUs, so no local GPU is needed. Modal provisions and manages all of the compute, so this backend does not use Docker Compose; deployment is handled entirely through the Modal CLI.

Requirements

  • Python 3.11+ and uv
  • A Modal account
  • HF_TOKEN (optional, for gated models)

Deployment

1. Install dependencies

git clone https://github.com/kfallah/CLaaS.git
cd CLaaS
uv sync --extra local

2. Authenticate with Modal

uv run modal token new

3. Deploy

export HF_TOKEN=...   # optional, for gated models
export CLAAS_BASE_MODEL_ID=Qwen/Qwen3-8B
uv run modal deploy -m claas.deploy
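The deployed app exposes the same API as the other CLaaS backends, at a URL of the form https://your-app--claas-distill-fastapi-app.modal.run. Replace your-app with your Modal workspace name, or copy the exact URL that modal deploy prints.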

4. Verify

curl https://your-app--claas-distill-fastapi-app.modal.run/
curl https://your-app--claas-distill-fastapi-app.modal.run/v1/lora
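If you want to script this check (for example in CI), the two curl requests above can be sketched with the Python standard library alone. BASE_URL is the placeholder from this page, and endpoint, check, and verify are hypothetical helper names. The 300-second timeout is an assumption meant to tolerate a slow first request while a container starts.

```python
import urllib.request

# Placeholder from this page; replace it with the URL printed by `modal deploy`.
BASE_URL = "https://your-app--claas-distill-fastapi-app.modal.run"

# Paths taken from the curl commands above.
VERIFY_PATHS = ("/", "/v1/lora")


def endpoint(base_url: str, path: str) -> str:
    """Join a deployment base URL and an API path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def check(url: str, timeout: float = 300.0) -> int:
    """GET the URL and return its HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError for connection problems, so a returned value
    means the request succeeded.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def verify(base_url: str = BASE_URL) -> dict[str, int]:
    """Request each verification path and map URL -> status code."""
    return {endpoint(base_url, p): check(endpoint(base_url, p)) for p in VERIFY_PATHS}
```

Against a live deployment, calling verify("https://<your URL>") returns one status code per path, or raises on the first failing request, which makes it easy to fail a CI job.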

Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| HF_TOKEN | No | (none) | Hugging Face token (gated models only) |
| CLAAS_BASE_MODEL_ID | No | Qwen/Qwen3-8B | Base model for LoRA training |
For the full Hydra config and all environment variables, see the Configuration Reference.

LoRA Storage

LoRAs are stored in the claas-loras Modal Volume, which persists across deployments. The volume is automatically created on first deploy.
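To see what has been written to the volume from your own machine, the Modal Python client can open it by name. This is a sketch that assumes the modal package is installed and you have authenticated with modal token new. The directory layout inside claas-loras is not documented here, so the hypothetical helper list_lora_files simply lists every entry.

```python
def list_lora_files(volume_name: str = "claas-loras", path: str = "/") -> list[str]:
    """Hypothetical helper: return every entry path (files and directories)
    stored under `path` in a Modal Volume.

    Requires the `modal` package and credentials from `modal token new`.
    """
    import modal  # imported lazily so this module loads without Modal installed

    # from_name looks up an existing volume; it does not create one by default.
    volume = modal.Volume.from_name(volume_name)
    return sorted(entry.path for entry in volume.listdir(path, recursive=True))
```

Calling print(list_lora_files()) after a deployment shows the stored adapters. The Modal CLI command modal volume ls claas-loras gives a similar listing without writing any code.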

Claude Code integration

If you use Claude Code, the /setup-modal slash command deploys the CLaaS distillation service to Modal automatically.