# Training Backends
CLaaS supports three training backends. Each implements the same `TrainingEngine` abstract base class, so the feedback API works identically regardless of which backend you choose.
## Comparison
| Feature | Local | Tinker | Modal |
|---|---|---|---|
| GPU Required | Yes (>= 24 GB VRAM) | No | No (remote L40S) |
| Docker Support | Yes | Yes | No (CLI deploy) |
| Base Model | Qwen/Qwen3-8B | Qwen/Qwen3-30B-A3B | Qwen/Qwen3-8B |
| Inference | Local vLLM | Tinker proxy | Modal vLLM |
| LoRA Storage | Local filesystem | Tinker JSON state | Modal Volume |
| Cost | Own hardware | API credits | Modal compute |
| Status | Available | Available | Coming soon |
| Best For | Full control, low latency | No GPU, quick start | Scalable cloud training |
## Engine abstraction
All backends implement the `TrainingEngine` ABC defined in `claas/training/engine/base.py`. The backend is selected via the `CLAAS_DISTILL_EXECUTION_MODE` environment variable (`local`, `tinker`, or `modal`); the API reads this at startup and instantiates the corresponding engine.
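The shape of the abstraction can be sketched as follows. This is an illustrative sketch, not the actual code in `claas/training/engine/base.py`: the `distill` method name, the concrete engine classes, and the factory function are assumptions; only `TrainingEngine` and `CLAAS_DISTILL_EXECUTION_MODE` come from the source.

```python
import os
from abc import ABC, abstractmethod
from typing import Any


class TrainingEngine(ABC):
    """Common interface every backend implements (sketch)."""

    @abstractmethod
    def distill(self, payload: dict[str, Any]) -> dict[str, Any]:
        """Run one distillation step and return metrics."""


class LocalEngine(TrainingEngine):  # hypothetical concrete backend
    def distill(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"backend": "local", "lora_id": payload["lora_id"]}


class TinkerEngine(TrainingEngine):  # hypothetical concrete backend
    def distill(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"backend": "tinker", "lora_id": payload["lora_id"]}


_ENGINES: dict[str, type[TrainingEngine]] = {
    "local": LocalEngine,
    "tinker": TinkerEngine,
}


def make_engine() -> TrainingEngine:
    """Instantiate the engine named by CLAAS_DISTILL_EXECUTION_MODE."""
    mode = os.environ.get("CLAAS_DISTILL_EXECUTION_MODE", "local")
    try:
        return _ENGINES[mode]()
    except KeyError:
        raise ValueError(f"Unknown execution mode: {mode!r}")
```

Because callers only see the `TrainingEngine` interface, swapping backends is a configuration change rather than a code change.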
### DistillBatchRequestPayload

Typed batched payload forwarded to the training engine. Defined in `claas/core/types.py`.
| Field | Type | Description |
|---|---|---|
| `lora_id` | `str` | Target LoRA adapter identifier |
| `training` | `TrainingConfig` | Hyperparameters (learning rate, alpha, clip, grad norm, KL weight, teacher top-k) |
| `samples` | `list[DistillBatchItem]` | One or more cache-enriched training samples; each contains prompt, response, feedback, logprobs, and token IDs |
| `save_in_place` | `bool` | If `True`, overwrite the adapter in place instead of creating a new version |
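The fields above can be modeled roughly as plain dataclasses. This is a hedged sketch assuming the field names from the table; the real definitions in `claas/core/types.py` may use a different base (e.g. pydantic) and different defaults, and the hyperparameter values shown are placeholders, not CLaaS defaults.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    """Hyperparameters for one distillation step (illustrative values)."""
    learning_rate: float = 1e-5
    alpha: float = 0.5
    clip: float = 1.0
    max_grad_norm: float = 1.0
    kl_weight: float = 0.1
    teacher_top_k: int = 20


@dataclass
class DistillBatchItem:
    """One cache-enriched training sample."""
    prompt: str
    response: str
    feedback: str
    logprobs: list[float] = field(default_factory=list)
    token_ids: list[int] = field(default_factory=list)


@dataclass
class DistillBatchRequestPayload:
    """Batched payload forwarded to the training engine."""
    lora_id: str
    training: TrainingConfig
    samples: list[DistillBatchItem]
    save_in_place: bool = False
```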
### DistillResponse

Response returned after a distillation step completes. Defined in `claas/core/types.py`.
| Field | Type | Description |
|---|---|---|
| `lora_id` | `str` | Updated LoRA identifier (new version suffix after training) |
| `metadata` | `dict[str, Any]` | Training metrics and diagnostics (loss, grad norm, step timing) |
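As a minimal sketch, the response type and the version-suffix behavior might look like this. The dataclass shape, the `-vN` suffix convention, and the metric keys are all assumptions for illustration; consult `claas/core/types.py` for the real definition.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DistillResponse:
    """Result of one completed distillation step (sketch)."""
    lora_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical example: the returned lora_id carries a bumped version suffix,
# and metadata holds the step's diagnostics.
resp = DistillResponse(
    lora_id="support-bot-v3",  # e.g. trained from "support-bot-v2"
    metadata={"loss": 0.42, "grad_norm": 1.7, "step_ms": 850},
)
```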
## Hybrid engine (Local)

The local request path uses a hybrid engine that alternates between two modes:

- Serving mode - routes traffic through vLLM for low-latency generation
- Update mode - pauses serving, frees GPU memory, runs a single SDPO step, then resumes serving
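The mode-switching loop can be sketched as below. This is an illustrative state machine only: the class, method, and placeholder names are assumptions, and the real engine's vLLM pause/resume and SDPO step are stubbed out with comments.

```python
from enum import Enum
from typing import Any


class Mode(Enum):
    SERVING = "serving"
    UPDATING = "updating"


class HybridEngine:
    """Sketch of an engine that alternates between serving and updating."""

    def __init__(self) -> None:
        self.mode = Mode.SERVING

    def distill_step(self, payload: dict[str, Any]) -> dict[str, Any]:
        # 1) Pause serving and free the GPU memory held by inference.
        self.mode = Mode.UPDATING
        self._free_inference_memory()
        # 2) Run a single SDPO update against the target adapter.
        metrics = self._run_sdpo_step(payload)
        # 3) Restore inference and resume low-latency serving.
        self._reload_inference()
        self.mode = Mode.SERVING
        return metrics

    def _free_inference_memory(self) -> None:
        pass  # placeholder: would release vLLM weights / KV cache

    def _run_sdpo_step(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"loss": 0.0}  # placeholder: would return real training metrics

    def _reload_inference(self) -> None:
        pass  # placeholder: would bring vLLM back up for serving
```

The key invariant is that serving and training never hold the GPU at the same time, which is what lets a single >= 24 GB card cover both roles.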

