Modal Cloud Backend
The Modal backend is coming soon. The configuration and deployment instructions below reflect the planned design.
The Modal backend runs SDPO distillation remotely on Modal using L40S GPUs, so no local GPU is needed.
Modal manages the compute infrastructure. You do not use Docker Compose for this backend; deployment is handled entirely through the Modal CLI.
Requirements
- Python 3.11+ and uv
- A Modal account
- HF_TOKEN (optional, for gated models)
Deployment
Install dependencies
git clone https://github.com/kfallah/CLaaS.git
cd CLaaS
uv sync --extra local
Deploy
export HF_TOKEN=... # optional, for gated models
export CLAAS_BASE_MODEL_ID=Qwen/Qwen3-8B
uv run modal deploy -m claas.deploy
The deployed app exposes the same API at https://your-app--claas-distill-fastapi-app.modal.run.
Verify
curl https://your-app--claas-distill-fastapi-app.modal.run/
curl https://your-app--claas-distill-fastapi-app.modal.run/v1/lora
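The two curl checks above can be wrapped in a small script that fails loudly when an endpoint returns an HTTP error. This is only a sketch: the helper names (`endpoint_url`, `check_endpoint`, `verify_deployment`) and the `CLAAS_URL` variable are hypothetical and not part of CLaaS, and the URL is the placeholder from this page, so replace it with your own deployment URL.

```shell
# Placeholder URL from this page; CLAAS_URL is a hypothetical override variable.
BASE_URL="${CLAAS_URL:-https://your-app--claas-distill-fastapi-app.modal.run}"

# Join the base URL and a path without doubling the slash between them.
endpoint_url() {
  printf '%s/%s\n' "${BASE_URL%/}" "${1#/}"
}

# Succeed only if the endpoint responds without an HTTP error status.
# --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
check_endpoint() {
  curl --fail --silent --show-error --max-time 120 --output /dev/null \
    "$(endpoint_url "$1")"
}

# Probe the two endpoints listed in this section and report each result.
verify_deployment() {
  verify_status=0
  for verify_path in / /v1/lora; do
    if check_endpoint "$verify_path"; then
      echo "ok   $(endpoint_url "$verify_path")"
    else
      echo "FAIL $(endpoint_url "$verify_path")"
      verify_status=1
    fi
  done
  return "$verify_status"
}
```

After deploying, calling `verify_deployment` prints one line per endpoint and returns a non-zero status if either check fails. The generous `--max-time` value is a guess intended to leave room for a cold start.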
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| HF_TOKEN | No | — | HuggingFace token (gated models only) |
| CLAAS_BASE_MODEL_ID | No | Qwen/Qwen3-8B | Base model for LoRA training |
For the full Hydra config and all environment variables, see the Configuration Reference.
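As an illustration of how the two variables above behave, the following sketch applies the documented default in the shell before deploying. It assumes nothing beyond the variable names and default listed in the table; the messages are illustrative and are not CLaaS output.

```shell
# Fall back to the documented default when the variable is unset or empty.
export CLAAS_BASE_MODEL_ID="${CLAAS_BASE_MODEL_ID:-Qwen/Qwen3-8B}"

# HF_TOKEN is optional, so only print a reminder when it is missing.
if [ -z "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is not set; this is fine unless the base model is gated." >&2
fi

echo "Base model: $CLAAS_BASE_MODEL_ID"
```

Setting the default explicitly in the shell is not required, since the table lists both variables as optional; it simply makes the value used for a given deploy visible in the terminal.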
LoRA Storage
LoRAs are stored in the claas-loras Modal Volume, which persists across deployments. The volume is automatically created on first deploy.
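To inspect what is stored, the Modal CLI's `modal volume ls` command lists the contents of a named volume. The sketch below assumes the `modal` CLI is on your PATH and authenticated, and that the volume already exists (that is, a deploy has run at least once); the directory layout inside the volume is not described on this page.

```shell
# Volume name as given in this section.
VOLUME_NAME="claas-loras"

# List the top level of the volume if the Modal CLI is available.
if command -v modal >/dev/null 2>&1; then
  modal volume ls "$VOLUME_NAME"
else
  echo "modal CLI not found on PATH; try running it through 'uv run modal' from the CLaaS checkout." >&2
fi
```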
Claude Code Integration
If you use Claude Code, the /setup-modal slash command deploys the CLaaS distillation service to Modal automatically.