Training Backends
CLaaS supports three training backends. Each implements the same TrainingEngine abstract base class, so the feedback API works identically regardless of which backend you choose.
Comparison
| Feature | Local | Tinker | Modal |
|---|---|---|---|
| GPU Required | Yes (>= 24 GB VRAM) | No | No (remote L40S) |
| Docker Support | Yes | Yes | No (CLI deploy) |
| Base Model | Qwen/Qwen3-8B | Qwen/Qwen3-30B-A3B | Qwen/Qwen3-8B |
| Inference | Local vLLM | Tinker proxy | Modal vLLM |
| LoRA Storage | Local filesystem | Tinker JSON state | Modal Volume |
| Cost | Own hardware | API credits | Modal compute |
| Status | Available | Available | Coming soon |
| Best For | Full control, low latency | No GPU, quick start | Scalable cloud training |
Engine abstraction
All backends implement the TrainingEngine ABC defined in claas/training/engine/base.py. The key interface:
class TrainingEngine(ABC):
    @abstractmethod
    async def distill(self, payload: DistillBatchRequestPayload) -> DistillResponse:
        """Run one distillation step."""
        ...
The execution mode is selected via the CLAAS_DISTILL_EXECUTION_MODE environment variable (local, tinker, or modal). The API reads this at startup and instantiates the corresponding engine.
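As a rough illustration of the selection step described above, the sketch below reads CLAAS_DISTILL_EXECUTION_MODE and returns an engine that implements a single async distill method. Everything except the environment variable name and the three mode values is an assumption made for the example: Payload, Response, EchoEngine, and make_engine are hypothetical stand-ins, not the CLaaS classes, and rejecting an unset variable is a choice of this sketch rather than documented behavior.

```python
import asyncio
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional

ENV_VAR = "CLAAS_DISTILL_EXECUTION_MODE"
VALID_MODES = ("local", "tinker", "modal")


@dataclass
class Payload:
    """Hypothetical stand-in for DistillBatchRequestPayload."""
    lora_id: str
    samples: List[Any] = field(default_factory=list)


@dataclass
class Response:
    """Hypothetical stand-in for DistillResponse."""
    lora_id: str
    metadata: Dict[str, Any]


class TrainingEngine(ABC):
    @abstractmethod
    async def distill(self, payload: Payload) -> Response:
        """Run one distillation step."""


class EchoEngine(TrainingEngine):
    """Toy engine: does no training, only reports which mode handled the call."""

    def __init__(self, mode: str) -> None:
        self.mode = mode

    async def distill(self, payload: Payload) -> Response:
        metadata = {"backend": self.mode, "num_samples": len(payload.samples)}
        return Response(payload.lora_id, metadata)


def make_engine(env: Optional[Mapping[str, str]] = None) -> TrainingEngine:
    """Pick an engine from the environment once, e.g. at API startup."""
    env = os.environ if env is None else env
    mode = env.get(ENV_VAR)
    if mode not in VALID_MODES:
        # Assumption of this sketch: an unset or unknown mode is an error.
        raise ValueError(f"{ENV_VAR} must be one of {VALID_MODES}, got {mode!r}")
    return EchoEngine(mode)


if __name__ == "__main__":
    engine = make_engine({ENV_VAR: "tinker"})
    result = asyncio.run(engine.distill(Payload("demo", ["s1", "s2"])))
    print(result.metadata)
```

Because callers only depend on the abstract distill signature, swapping the mode changes which engine is constructed without touching the calling code.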
DistillBatchRequestPayload
Typed batched payload forwarded to the training engine. Defined in claas/core/types.py.
class DistillBatchRequestPayload(BaseModel):
    lora_id: str
    training: TrainingConfig
    samples: list[DistillBatchItem] = Field(min_length=1)
    save_in_place: bool = False
| Field | Type | Description |
|---|---|---|
| lora_id | str | Target LoRA adapter identifier |
| training | TrainingConfig | Hyperparameters (learning rate, alpha, clip, grad norm, KL weight, teacher top-k) |
| samples | list[DistillBatchItem] | One or more cache-enriched training samples. Each contains prompt, response, feedback, logprobs, and token IDs. |
| save_in_place | bool | If True, overwrite the adapter in place instead of creating a new version |
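The two constraints visible in the payload definition (at least one sample, and save_in_place defaulting to False) can be mirrored with standard-library dataclasses, as a minimal sketch. The real model is a Pydantic BaseModel; the classes below are hypothetical stand-ins, and the field names on TrainingConfigSketch and ItemSketch, as well as the sample values, are illustrative assumptions rather than the actual CLaaS schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingConfigSketch:
    """Hypothetical subset of the training hyperparameters."""
    learning_rate: float


@dataclass
class ItemSketch:
    """Hypothetical subset of a training sample's fields."""
    prompt: str
    response: str
    feedback: str


@dataclass
class PayloadSketch:
    lora_id: str
    training: TrainingConfigSketch
    samples: List[ItemSketch]
    save_in_place: bool = False  # same default as the documented field

    def __post_init__(self) -> None:
        # Stand-in for Field(min_length=1): an empty batch is rejected.
        if len(self.samples) < 1:
            raise ValueError("samples must contain at least one item")


if __name__ == "__main__":
    payload = PayloadSketch(
        lora_id="demo-adapter",  # illustrative identifier
        training=TrainingConfigSketch(learning_rate=1e-5),  # illustrative value
        samples=[ItemSketch("What is 2 + 2?", "5", "Incorrect, the answer is 4.")],
    )
    print(payload.save_in_place)
```

Validating the batch size when the payload is built means an empty request fails before it reaches the training engine.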
DistillResponse
Response returned after a distillation step completes. Defined in claas/core/types.py.
class DistillResponse(BaseModel):
    lora_id: str = Field(..., description="Updated LoRA identifier")
    metadata: dict[str, Any] = Field(..., description="Training metrics and diagnostics")
| Field | Type | Description |
|---|---|---|
| lora_id | str | Updated LoRA identifier (new version suffix after training) |
| metadata | dict[str, Any] | Training metrics and diagnostics (loss, grad norm, step timing) |
Hybrid engine (Local)
The locally hosted request path uses a hybrid engine that alternates between:
- Serving mode - routes traffic through vLLM for low-latency generation
- Update mode - pauses serving, frees GPU memory, runs a single SDPO step, then resumes
This sleep/wake mechanism ensures vLLM and CLaaS don’t compete for GPU memory.
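The alternation between the two modes can be pictured with a small asyncio sketch. It is not the CLaaS implementation: HybridEngineSketch and its methods are hypothetical, the vLLM call and the SDPO step are replaced by placeholder awaits, and the condition-variable gating is only one plausible way to keep generation and training from overlapping.

```python
import asyncio
from typing import List


class HybridEngineSketch:
    """Toy model of serving/update alternation (no real GPU work)."""

    def __init__(self) -> None:
        self._cond = asyncio.Condition()
        self._update_lock = asyncio.Lock()  # at most one update at a time
        self._serving = True
        self._in_flight = 0
        self.log: List[str] = []

    async def generate(self, prompt: str) -> str:
        # Serving mode: wait until no update is running, then count the request.
        async with self._cond:
            await self._cond.wait_for(lambda: self._serving)
            self._in_flight += 1
        try:
            await asyncio.sleep(0)  # placeholder for the inference call
            self.log.append("generate")
            return f"reply to {prompt}"
        finally:
            async with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    async def update(self) -> None:
        async with self._update_lock:
            # Update mode: stop admitting requests and drain in-flight ones.
            async with self._cond:
                self._serving = False
                await self._cond.wait_for(lambda: self._in_flight == 0)
            try:
                self.log.append("sleep")  # placeholder: pause serving, free memory
                await asyncio.sleep(0)    # placeholder for the single training step
                self.log.append("train_step")
            finally:
                async with self._cond:
                    self.log.append("wake")  # placeholder: resume serving
                    self._serving = True
                    self._cond.notify_all()


async def demo() -> List[str]:
    engine = HybridEngineSketch()
    await asyncio.gather(engine.generate("a"), engine.update(), engine.generate("b"))
    return engine.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, requests that arrive during an update wait until the wake step instead of failing, so no generation runs between sleep and wake.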