
Training Backends

CLaaS supports three training backends. Each implements the same TrainingEngine abstract base class, so the feedback API works identically regardless of which backend you choose.

Comparison

| Feature | Local | Tinker | Modal |
| --- | --- | --- | --- |
| GPU Required | Yes (>= 24 GB VRAM) | No | No (remote L40S) |
| Docker Support | Yes | Yes | No (CLI deploy) |
| Base Model | Qwen/Qwen3-8B | Qwen/Qwen3-30B-A3B | Qwen/Qwen3-8B |
| Inference | Local vLLM | Tinker proxy | Modal vLLM |
| LoRA Storage | Local filesystem | Tinker JSON state | Modal Volume |
| Cost | Own hardware | API credits | Modal compute |
| Status | Available | Available | Coming soon |
| Best For | Full control, low latency | No GPU, quick start | Scalable cloud training |

Engine abstraction

All backends implement the TrainingEngine ABC defined in claas/training/engine/base.py. The key interface:
```python
class TrainingEngine(ABC):
    @abstractmethod
    async def distill(self, payload: DistillBatchRequestPayload) -> DistillResponse:
        """Run one distillation step."""
        ...
```
The execution mode is selected via the CLAAS_DISTILL_EXECUTION_MODE environment variable (local, tinker, or modal). The API reads this at startup and instantiates the corresponding engine.
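The mode-to-engine dispatch can be sketched as follows. This is a minimal illustration of reading the environment variable at startup; the engine class names in the registry are hypothetical placeholders, not the actual classes under claas/training/engine/.

```python
import os

# Hypothetical engine names keyed by execution mode; the real engine
# classes live under claas/training/engine/ and are not shown here.
ENGINE_REGISTRY = {
    "local": "HybridLocalEngine",
    "tinker": "TinkerEngine",
    "modal": "ModalEngine",
}

def resolve_engine_name(default: str = "local") -> str:
    """Map CLAAS_DISTILL_EXECUTION_MODE to an engine name."""
    mode = os.environ.get("CLAAS_DISTILL_EXECUTION_MODE", default).lower()
    if mode not in ENGINE_REGISTRY:
        raise ValueError(f"Unknown execution mode: {mode!r}")
    return ENGINE_REGISTRY[mode]
```

Because the lookup happens once at startup, switching backends is a deploy-time decision rather than a per-request one.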

DistillBatchRequestPayload

Typed batched payload forwarded to the training engine. Defined in claas/core/types.py.
```python
class DistillBatchRequestPayload(BaseModel):
    lora_id: str
    training: TrainingConfig
    samples: list[DistillBatchItem] = Field(min_length=1)
    save_in_place: bool = False
```
| Field | Type | Description |
| --- | --- | --- |
| lora_id | str | Target LoRA adapter identifier |
| training | TrainingConfig | Hyperparameters (learning rate, alpha, clip, grad norm, KL weight, teacher top-k) |
| samples | list[DistillBatchItem] | One or more cache-enriched training samples; each contains prompt, response, feedback, logprobs, and token IDs |
| save_in_place | bool | If True, overwrite the adapter in place instead of creating a new version |
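Constructing a payload looks roughly like the sketch below. The stand-in dataclasses here are deliberately simplified: the real TrainingConfig and DistillBatchItem are pydantic models in claas/core/types.py with more fields than shown, and their exact field names are assumptions.

```python
from dataclasses import dataclass

# Stand-in types: the real models are pydantic BaseModels in
# claas/core/types.py; fields shown here are illustrative only.
@dataclass
class TrainingConfig:
    learning_rate: float = 1e-4
    kl_weight: float = 0.1

@dataclass
class DistillBatchItem:
    prompt: str
    response: str
    feedback: str

@dataclass
class DistillBatchRequestPayload:
    lora_id: str
    training: TrainingConfig
    samples: list
    save_in_place: bool = False

    def __post_init__(self):
        # Mirrors pydantic's Field(min_length=1): at least one sample.
        if not self.samples:
            raise ValueError("samples must contain at least one item")

payload = DistillBatchRequestPayload(
    lora_id="my-adapter",
    training=TrainingConfig(),
    samples=[DistillBatchItem("Q?", "A.", "Prefer shorter answers")],
)
```

By default save_in_place is False, so each distillation step produces a new adapter version rather than mutating the one being served.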

DistillResponse

Response returned after a distillation step completes. Defined in claas/core/types.py.
```python
class DistillResponse(BaseModel):
    lora_id: str = Field(..., description="Updated LoRA identifier")
    metadata: dict[str, Any] = Field(..., description="Training metrics and diagnostics")
```
| Field | Type | Description |
| --- | --- | --- |
| lora_id | str | Updated LoRA identifier (new version suffix after training) |
| metadata | dict[str, Any] | Training metrics and diagnostics (loss, grad norm, step timing) |
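A caller typically consumes the response by picking up the new adapter identifier for subsequent requests and logging the metrics. The sample body below is illustrative; the exact metadata key names are assumptions, not documented fields.

```python
import json

# Illustrative serialized DistillResponse; real metric names may differ.
raw = json.dumps({
    "lora_id": "my-adapter-v2",
    "metadata": {"loss": 0.42, "grad_norm": 1.3, "step_ms": 850},
})

response = json.loads(raw)
new_lora_id = response["lora_id"]       # use this id in follow-up requests
loss = response["metadata"]["loss"]     # log or monitor training metrics
```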

Hybrid engine (Local)

The locally hosted request path uses a hybrid engine that alternates between two modes:

- Serving mode: routes traffic through vLLM for low-latency generation
- Update mode: pauses serving, frees GPU memory, runs a single SDPO step, then resumes

This sleep/wake mechanism ensures vLLM and CLaaS don't compete for GPU memory.
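The alternation above can be modeled as a lock-guarded state machine: whichever mode holds the GPU excludes the other. This is a toy sketch, not the real engine; the class and method names are hypothetical, and the training step is a placeholder.

```python
import asyncio

class HybridEngineSketch:
    """Toy model of the serve/update alternation (not the real engine)."""

    def __init__(self):
        self._gpu_lock = asyncio.Lock()  # only one mode holds the GPU
        self.mode = "serving"

    async def generate(self, prompt: str) -> str:
        async with self._gpu_lock:
            # Serving mode: in the real engine this is a vLLM call.
            return f"completion for: {prompt}"

    async def distill_step(self) -> None:
        async with self._gpu_lock:
            # Update mode: serving is blocked while the lock is held,
            # GPU memory is freed, and one SDPO step runs.
            self.mode = "updating"
            await asyncio.sleep(0)  # placeholder for the training step
            self.mode = "serving"   # resume serving

async def main():
    engine = HybridEngineSketch()
    out = await engine.generate("hello")
    await engine.distill_step()
    return out, engine.mode

result = asyncio.run(main())
```

The lock makes the mutual exclusion explicit: a generation request issued mid-update simply waits until the SDPO step releases the GPU, which is why the feedback API never sees a half-updated adapter.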