Modal Cloud Backend

The Modal backend is coming soon. The configuration and deployment instructions below reflect the planned design.

The Modal backend runs SDPO distillation remotely on Modal using L40S GPUs, so no local GPU is needed.

Modal manages the compute infrastructure, so this backend does not use Docker Compose. Deployment is handled entirely through the Modal CLI.

Requirements

Install dependencies

git clone https://github.com/kfallah/CLaaS.git
cd CLaaS
uv sync --extra local

Authenticate with Modal

uv run modal token new

Deploy

export HF_TOKEN=...   # optional, for gated models
export CLAAS_BASE_MODEL_ID=Qwen/Qwen3-8B
uv run modal deploy -m claas.deploy
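
Before deploying, it can help to confirm which values the two environment variables will resolve to. The helper below is a minimal sketch and is not part of the CLaaS repository: the function names claas_base_model and claas_hf_token_status are hypothetical, and the fallback to Qwen/Qwen3-8B only mirrors the documented default. It assumes an empty value is treated the same as an unset one, which the deployed app may or may not do.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy helpers; these names do not exist in CLaaS.

# Print the base model a deploy would use: the exported value if it is
# set and non-empty, otherwise the documented default.
claas_base_model() {
  printf '%s\n' "${CLAAS_BASE_MODEL_ID:-Qwen/Qwen3-8B}"
}

# Report whether a Hugging Face token is present, without printing it.
claas_hf_token_status() {
  if [ -n "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is set"
  else
    echo "HF_TOKEN is not set (fine for ungated models)"
  fi
}
```

After sourcing the file, running claas_base_model and claas_hf_token_status in the same shell that will run the deploy command shows what that command will see.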

The deployed app exposes the same API at https://your-app--claas-distill-fastapi-app.modal.run. Here your-app is a placeholder; use the URL that modal deploy prints when it finishes.

Verify

curl https://your-app--claas-distill-fastapi-app.modal.run/
curl https://your-app--claas-distill-fastapi-app.modal.run/v1/lora
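
The two checks above can be wrapped in a small script that fails loudly on HTTP errors. This is a sketch under stated assumptions: claas_endpoint and claas_smoke_check are hypothetical helper names, the base URL is whatever your deployment reports, and the 60-second timeout is a guess that may be too short for a cold start.

```shell
#!/usr/bin/env bash
# Hypothetical smoke-check helpers; these names do not exist in CLaaS.

# Join a base URL and a path with exactly one slash between them.
claas_endpoint() {
  local base="${1%/}"   # drop one trailing slash from the base, if present
  local path="${2#/}"   # drop one leading slash from the path, if present
  printf '%s/%s\n' "$base" "$path"
}

# Request the root and /v1/lora endpoints. Returns non-zero if any
# request fails to connect, times out, or gets an HTTP status >= 400.
claas_smoke_check() {
  local base="$1"
  local path
  local status=0
  for path in "" "v1/lora"; do
    if curl --fail --silent --show-error --max-time 60 \
         --output /dev/null "$(claas_endpoint "$base" "$path")"; then
      echo "ok: /$path"
    else
      echo "failed: /$path" >&2
      status=1
    fi
  done
  return "$status"
}
```

Call it as claas_smoke_check https://your-app--claas-distill-fastapi-app.modal.run, substituting your own deployment URL. The exit status makes it usable in CI or a post-deploy hook.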

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `HF_TOKEN` | No | none | Hugging Face token (gated models only) |
| `CLAAS_BASE_MODEL_ID` | No | `Qwen/Qwen3-8B` | Base model for LoRA training |

For the full Hydra config and all environment variables, see the Configuration Reference.

LoRAs are stored in the claas-loras Modal Volume, which persists across deployments. The volume is created automatically on the first deploy.

If you use Claude Code, the /setup-modal slash command deploys the CLaaS distillation service to Modal automatically.