Eval Harness

The CLaaS eval harness runs automated feedback loops against a live CLaaS stack and measures whether training shifts the model toward preferred behaviours without collapsing.

What it does

  1. Initializes a fresh LoRA adapter for each preference being tested
  2. Runs N feedback steps, each sending a preference-targeted prompt and feedback to the CLaaS API
  3. Measures metrics at each step: logprob margins, compliance rates, capability retention, and collapse signals
  4. Generates plots (optional) showing learning curves over training steps
  5. Produces a summary with per-preference pass/fail verdicts
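The loop above can be sketched in a few lines of Python. This is an illustrative stand-in, not the harness's real implementation: `FakeClient` and the method names (`init_lora`, `send_feedback`, `measure_metrics`) are hypothetical, and the pass/fail rule shown (final compliance above baseline) is a simplification.

```python
# Hypothetical sketch of the eval loop; the real CLaaS client API may differ.

class FakeClient:
    """Stand-in for the CLaaS API client, for illustration only."""
    def __init__(self):
        self.compliance = 0.2
    def init_lora(self, preference):
        return f"lora-{preference}"
    def send_feedback(self, lora_id, preference):
        self.compliance = min(1.0, self.compliance + 0.1)  # pretend training helps
    def measure_metrics(self, lora_id):
        return {"compliance": round(self.compliance, 2), "logprob_margin": 0.0}

def run_preference_eval(client, preference, num_steps):
    lora_id = client.init_lora(preference)         # 1. fresh adapter per preference
    baseline = client.measure_metrics(lora_id)     # pre-training metric snapshot
    steps = []
    for step in range(num_steps):                  # 2. N feedback steps
        client.send_feedback(lora_id, preference)  # preference-targeted feedback
        steps.append({"step": step, **client.measure_metrics(lora_id)})  # 3. metrics
    passed = steps[-1]["compliance"] > baseline["compliance"]  # 5. verdict
    return {"preference": preference, "passed": passed, "steps": steps}

result = run_preference_eval(FakeClient(), "no_emoji", num_steps=5)
print(result["passed"], len(result["steps"]))
```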

Preferences

The eval harness ships with three built-in preferences:
| Preference | What it trains | Example feedback |
| --- | --- | --- |
| `no_emoji` | Suppress emoji in responses | "Don't use any emoji in your responses" |
| `concise` | Keep responses short | "Be more concise, use 3 sentences or fewer" |
| `identity` | Adopt a persona | "Your name is Kuro" |
Preferences are configured via the preferences list in the Hydra config. You can run any subset:
uv run python -m claas.eval 'preferences=[concise]' num_steps=10
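Conceptually, each preference bundles a name, the feedback text sent to the API, and some notion of a compliance check. The sketch below is a hypothetical illustration of that shape; the `Preference` dataclass and the compliance lambdas are assumptions, not the harness's actual internals.

```python
# Hypothetical representation of a preference; the real harness may differ.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Preference:
    name: str
    feedback: str                    # feedback text sent with each step
    complies: Callable[[str], bool]  # per-response compliance check (illustrative)

no_emoji = Preference(
    name="no_emoji",
    feedback="Don't use any emoji in your responses",
    # crude check: flag codepoints in the main emoji blocks
    complies=lambda text: not any(0x1F300 <= ord(c) <= 0x1FAFF for c in text),
)

concise = Preference(
    name="concise",
    feedback="Be more concise, use 3 sentences or fewer",
    complies=lambda text: text.count(".") <= 3,  # rough sentence count
)

print(no_emoji.complies("Sure, here you go."))  # no emoji -> True
print(no_emoji.complies("Sure 🎉"))             # emoji present -> False
```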

Output structure

Each eval run produces a directory under data/evals/<run-id>/:
data/evals/<run-id>/
├── summary.json              # Per-preference pass/fail verdicts
└── <preference>/
    ├── metadata.json          # Run config + LoRA ID
    ├── baseline.json          # Pre-training metric snapshot
    └── steps.jsonl            # One JSON object per feedback step
Each line in steps.jsonl contains: step number, timestamp, feedback given, SDPO training metrics, eval metrics (logprob margin, compliance, general capability, collapse), and rollout transcripts.
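Because steps.jsonl is one JSON object per line, learning curves can be extracted with a few lines of Python. The nested field names below (`"eval"`, `"compliance"`, `"logprob_margin"`) follow the description above but are assumptions about the exact schema:

```python
# Sketch: extract a compliance learning curve from steps.jsonl.
# An in-memory file stands in for data/evals/<run-id>/<preference>/steps.jsonl;
# the exact field layout is an assumption.
import io
import json

steps_jsonl = io.StringIO(
    '{"step": 0, "eval": {"compliance": 0.3, "logprob_margin": -0.5}}\n'
    '{"step": 1, "eval": {"compliance": 0.6, "logprob_margin": 0.1}}\n'
)

curve = [json.loads(line)["eval"]["compliance"] for line in steps_jsonl]
print(curve)
```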

Viewing results

Results can be viewed in the browser via the built-in eval dashboard served by the CLaaS API:
GET http://localhost:8080/v1/eval?results_dir=./data/evals
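When pointing the dashboard at a different results directory, the query string can be built with the standard library rather than by hand. The endpoint path comes from the example above; the helper function itself is illustrative:

```python
# Sketch: build the eval dashboard URL; the /v1/eval path is from the docs,
# the helper function is a hypothetical convenience.
from urllib.parse import urlencode

def eval_dashboard_url(host, results_dir):
    query = urlencode({"results_dir": results_dir}, safe="/")
    return f"http://{host}/v1/eval?{query}"

print(eval_dashboard_url("localhost:8080", "./data/evals"))
```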