# Training Backends
CLaaS supports three training backends. Each implements the same `TrainingEngine` abstract base class, so the feedback API works identically regardless of which backend you choose.
## Comparison
| Feature | Local | Tinker | Modal |
|---|---|---|---|
| GPU Required | Yes (>= 24 GB VRAM) | No | No (remote L40S) |
| Docker Support | Yes | Yes | No (CLI deploy) |
| Base Model | Qwen/Qwen3-8B | Qwen/Qwen3-30B-A3B | Qwen/Qwen3-8B |
| Inference | Local vLLM | Tinker proxy | Modal vLLM |
| LoRA Storage | Local filesystem | Tinker JSON state | Modal Volume |
| Cost | Own hardware | API credits | Modal compute |
| Status | Available | Available | Coming soon |
| Best For | Full control, low latency | No GPU, quick start | Scalable cloud training |
## Engine abstraction
All backends implement the `TrainingEngine` ABC defined in `claas/training/engine/base.py`. The backend is selected via the `CLAAS_DISTILL_EXECUTION_MODE` environment variable (`local`, `tinker`, or `modal`); the API reads this at startup and instantiates the corresponding engine.
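The shape of the abstraction can be sketched as follows. This is an illustrative sketch, not the actual code in `claas/training/engine/base.py`: the `distill` method name, the concrete engine classes, and the factory function are assumptions; only `TrainingEngine` and `CLAAS_DISTILL_EXECUTION_MODE` come from the source.

```python
import os
from abc import ABC, abstractmethod
from typing import Any


class TrainingEngine(ABC):
    """Common interface every backend implements (sketch)."""

    @abstractmethod
    def distill(self, payload: dict[str, Any]) -> dict[str, Any]:
        """Run one distillation step and return metrics."""


class LocalEngine(TrainingEngine):  # hypothetical concrete backend
    def distill(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"backend": "local", "lora_id": payload["lora_id"]}


class TinkerEngine(TrainingEngine):  # hypothetical concrete backend
    def distill(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"backend": "tinker", "lora_id": payload["lora_id"]}


_ENGINES: dict[str, type[TrainingEngine]] = {
    "local": LocalEngine,
    "tinker": TinkerEngine,
}


def make_engine() -> TrainingEngine:
    """Instantiate the engine named by CLAAS_DISTILL_EXECUTION_MODE."""
    mode = os.environ.get("CLAAS_DISTILL_EXECUTION_MODE", "local")
    try:
        return _ENGINES[mode]()
    except KeyError:
        raise ValueError(f"Unknown execution mode: {mode!r}")
```

Because callers only see the `TrainingEngine` interface, swapping backends is a configuration change rather than a code change.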
### DistillBatchRequestPayload

Typed batched payload forwarded to the training engine. Defined in `claas/core/types.py`.
| Field | Type | Description |
|---|---|---|
| `lora_id` | `str` | Target LoRA adapter identifier |
| `training` | `TrainingConfig` | Hyperparameters (learning rate, alpha, clip, grad norm, KL weight, teacher top-k) |
| `samples` | `list[DistillBatchItem]` | One or more cache-enriched training samples; each contains prompt, response, feedback, logprobs, and token IDs |
| `save_in_place` | `bool` | If `True`, overwrite the adapter in place instead of creating a new version |
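The fields above can be modeled roughly as plain dataclasses. This is a hedged sketch assuming the field names from the table; the real definitions in `claas/core/types.py` may use a different base (e.g. pydantic) and different defaults, and the hyperparameter values shown are placeholders, not CLaaS defaults.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    """Hyperparameters for one distillation step (illustrative values)."""
    learning_rate: float = 1e-5
    alpha: float = 0.5
    clip: float = 1.0
    max_grad_norm: float = 1.0
    kl_weight: float = 0.1
    teacher_top_k: int = 20


@dataclass
class DistillBatchItem:
    """One cache-enriched training sample."""
    prompt: str
    response: str
    feedback: str
    logprobs: list[float] = field(default_factory=list)
    token_ids: list[int] = field(default_factory=list)


@dataclass
class DistillBatchRequestPayload:
    """Batched payload forwarded to the training engine."""
    lora_id: str
    training: TrainingConfig
    samples: list[DistillBatchItem]
    save_in_place: bool = False
```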
### DistillResponse

Response returned after a distillation step completes. Defined in `claas/core/types.py`.
| Field | Type | Description |
|---|---|---|
| `lora_id` | `str` | Updated LoRA identifier (new version suffix after training) |
| `metadata` | `dict[str, Any]` | Training metrics and diagnostics (loss, grad norm, step timing) |
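As a minimal sketch, the response type and the version-suffix behavior might look like this. The dataclass shape, the `-vN` suffix convention, and the metric keys are all assumptions for illustration; consult `claas/core/types.py` for the real definition.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DistillResponse:
    """Result of one completed distillation step (sketch)."""
    lora_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical example: the returned lora_id carries a bumped version suffix,
# and metadata holds the step's diagnostics.
resp = DistillResponse(
    lora_id="support-bot-v3",  # e.g. trained from "support-bot-v2"
    metadata={"loss": 0.42, "grad_norm": 1.7, "step_ms": 850},
)
```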
## Hybrid engine (Local)

The local request path uses a hybrid engine that alternates between two modes:

- Serving mode - routes traffic through vLLM for low-latency generation
- Update mode - pauses serving, frees GPU memory, runs a single SDPO step, then resumes serving
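The mode-switching loop can be sketched as below. This is an illustrative state machine only: the class, method, and placeholder names are assumptions, and the real engine's vLLM pause/resume and SDPO step are stubbed out with comments.

```python
from enum import Enum
from typing import Any


class Mode(Enum):
    SERVING = "serving"
    UPDATING = "updating"


class HybridEngine:
    """Sketch of an engine that alternates between serving and updating."""

    def __init__(self) -> None:
        self.mode = Mode.SERVING

    def distill_step(self, payload: dict[str, Any]) -> dict[str, Any]:
        # 1) Pause serving and free the GPU memory held by inference.
        self.mode = Mode.UPDATING
        self._free_inference_memory()
        # 2) Run a single SDPO update against the target adapter.
        metrics = self._run_sdpo_step(payload)
        # 3) Restore inference and resume low-latency serving.
        self._reload_inference()
        self.mode = Mode.SERVING
        return metrics

    def _free_inference_memory(self) -> None:
        pass  # placeholder: would release vLLM weights / KV cache

    def _run_sdpo_step(self, payload: dict[str, Any]) -> dict[str, Any]:
        return {"loss": 0.0}  # placeholder: would return real training metrics

    def _reload_inference(self) -> None:
        pass  # placeholder: would bring vLLM back up for serving
```

The key invariant is that serving and training never hold the GPU at the same time, which is what lets a single >= 24 GB card cover both roles.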

