# Eval Harness
The CLaaS eval harness runs automated feedback loops against a live CLaaS stack and measures whether training shifts the model toward preferred behaviours without collapsing.

## What it does
- Initializes a fresh LoRA adapter for each preference being tested
- Runs N feedback steps, each sending a preference-targeted prompt and feedback to the CLaaS API
- Measures metrics at each step: logprob margins, compliance rates, capability retention, and collapse signals
- Generates plots (optional) showing learning curves over training steps
- Produces a summary with per-preference pass/fail verdicts
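The loop above can be sketched as follows. This is an illustrative outline, not the harness's actual API: `FakeClaasClient`, the method names, and the metric fields are all assumptions standing in for the live CLaaS stack.

```python
# Sketch of the per-preference feedback loop. All names here are
# illustrative assumptions; the real harness's client API may differ.
from dataclasses import dataclass


@dataclass
class StepMetrics:
    step: int
    logprob_margin: float  # preferred-vs-dispreferred logprob gap
    compliant: bool        # did the rollout follow the preference?


class FakeClaasClient:
    """Toy stand-in for the live stack: margin improves each feedback step."""

    def __init__(self):
        self.margin = -1.0

    def init_adapter(self, preference):
        # Fresh LoRA adapter per preference keeps runs independent.
        self.margin = -1.0

    def send_feedback(self, prompt, feedback):
        self.margin += 0.25  # pretend one SDPO training step helps a little

    def evaluate(self, preference):
        return StepMetrics(0, self.margin, self.margin > 0)


def run_preference_eval(client, preference, n_steps):
    client.init_adapter(preference)
    records = []
    for step in range(n_steps):
        client.send_feedback(f"probe prompt for {preference}",
                             f"feedback targeting {preference}")
        m = client.evaluate(preference)
        records.append(StepMetrics(step, m.logprob_margin, m.compliant))
    return records


records = run_preference_eval(FakeClaasClient(), "no_emoji", 8)
```

The shape to notice is that metrics are captured after every feedback step, which is what makes the learning curves and collapse detection possible.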
## Preferences
The eval harness ships with three built-in preferences:

| Preference | What it trains | Example feedback |
|---|---|---|
| `no_emoji` | Suppress emoji in responses | "Don't use any emoji in your responses" |
| `concise` | Keep responses short | "Be more concise, use 3 sentences or fewer" |
| `identity` | Adopt a persona | "Your name is Kuro" |
The active set is controlled by the `preferences` list in the Hydra config. You can run any subset:
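For example, a config selecting two of the three preferences might look like the following. The key names besides `preferences` are assumptions about the config layout, shown only to illustrate the idea:

```yaml
# Sketch of an eval config -- key names other than `preferences`
# are assumptions, not the harness's actual schema.
preferences: [no_emoji, concise]   # run only these two
n_steps: 20                        # feedback steps per preference
plots: true                        # emit learning-curve plots
```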
## Output structure
Each eval run produces a directory under `data/evals/<run-id>/`:
`steps.jsonl` contains: step number, timestamp, feedback given, SDPO training metrics, eval metrics (logprob margin, compliance, general capability, collapse), and rollout transcripts.
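A summariser over `steps.jsonl` might look like this minimal sketch. The field names follow the description above, but the exact record schema and the pass criterion are assumptions, not the harness's definitive logic:

```python
# Sketch: turning a steps.jsonl stream into a per-preference verdict.
# Field names and the pass rule are assumptions about the schema.
import io
import json

SAMPLE = "\n".join(json.dumps(r) for r in [
    {"step": 0, "logprob_margin": -0.4, "compliant": False, "collapsed": False},
    {"step": 1, "logprob_margin": 0.1,  "compliant": True,  "collapsed": False},
    {"step": 2, "logprob_margin": 0.6,  "compliant": True,  "collapsed": False},
])


def verdict(lines):
    steps = [json.loads(line) for line in lines if line.strip()]
    final = steps[-1]
    # Pass if the run ends compliant with a positive logprob margin
    # and no step ever tripped the collapse signal.
    return (final["compliant"]
            and final["logprob_margin"] > 0
            and not any(s["collapsed"] for s in steps))


result = verdict(io.StringIO(SAMPLE))
```

Reading the file as JSON Lines (one record per line) means partial runs are still summarisable: every completed step is already on disk.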

