Metrics

The eval harness supports four metric types. Select which ones to run through the metrics list in the config or with a CLI override.

Metric descriptions

Metric	What it measures
`logprob`	Log-probability margin between the preferred and dispreferred response in each pair. A positive margin means the model favours the preferred response. The change in margin relative to the baseline tracks training progress.
`compliance`	Generates responses to probe prompts, runs a programmatic verifier on each (e.g. emoji count, sentence count, keyword presence), and averages the verifier scores.
`general`	A coding task (Fibonacci, checked with exec + verify) plus three IFEval-style instruction-following probes. Measures capability retention during training.
`collapse`	Three collapse detectors: token entropy (distribution confidence), self-ROUGE-L (output diversity across stochastic samples), and logprob drift (mean logprob shift from baseline).
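
The quantities named in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness's implementation: the function names are hypothetical, tokenization is plain whitespace splitting, entropy is computed for a single next-token distribution, and self-ROUGE-L is taken as the mean pairwise ROUGE-L F1 across samples.

```python
import math
from itertools import combinations


def logprob_margin(preferred_logprob, dispreferred_logprob):
    """Positive when the preferred response gets more log-probability."""
    return preferred_logprob - dispreferred_logprob


def logprob_drift(current, baseline):
    """Mean per-example shift in logprob relative to the baseline run."""
    return sum(c - b for c, b in zip(current, baseline)) / len(current)


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(a, b):
    """ROUGE-L F1 between two token lists, based on their LCS length."""
    lcs = lcs_length(a, b)
    if lcs == 0:
        return 0.0
    precision = lcs / len(b)
    recall = lcs / len(a)
    return 2 * precision * recall / (precision + recall)


def self_rouge_l(samples):
    """Mean pairwise ROUGE-L F1 over whitespace-tokenized samples (assumed pairing)."""
    tokenized = [s.split() for s in samples]
    pairs = list(combinations(tokenized, 2))
    if not pairs:
        return 0.0
    return sum(rouge_l_f1(a, b) for a, b in pairs) / len(pairs)
```

Under these definitions, entropy near zero means the distribution is concentrated on a single token, and a self-ROUGE-L near 1.0 means stochastic samples are nearly identical. Both are plausible warning signs of collapse, though the thresholds the harness applies are not specified here.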

The compliance metric uses programmatic verifiers to check whether generated responses match the trained preference:

Verifier	Preference	Pass condition
`no_emoji`	no_emoji	Zero emoji characters in response
`concise`	concise	Full score at 3 or fewer sentences; the score decays linearly to 0.0 at 9 or more sentences
`identity`	identity	"kuro" appears in response (case-insensitive)

Each verifier returns a score between 0.0 and 1.0. The compliance metric averages scores across all probe prompts for a given step.
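
The three verifiers and the averaging step might look roughly like the sketch below. It rests on several assumptions: the emoji pattern covers only a few common Unicode ranges, sentences are counted by splitting on `.`, `!`, and `?`, the `concise` score falls linearly from 1.0 at 3 sentences to 0.0 at 9, and the `VERIFIERS` mapping and `compliance_score` helper are hypothetical names used for illustration only.

```python
import re

# Rough emoji ranges (pictographs, misc symbols/dingbats, regional indicators).
# An assumption for illustration; the real detector may cover more.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)


def no_emoji(response):
    """1.0 if the response contains no emoji characters, else 0.0."""
    return 0.0 if EMOJI_RE.search(response) else 1.0


def count_sentences(response):
    """Naive sentence count: non-empty chunks between ., !, or ? runs."""
    return len([s for s in re.split(r"[.!?]+", response) if s.strip()])


def concise(response):
    """1.0 at 3 or fewer sentences, linear decay to 0.0 at 9 or more."""
    n = count_sentences(response)
    if n <= 3:
        return 1.0
    return max(0.0, (9 - n) / 6)


def identity(response):
    """1.0 if "kuro" appears anywhere in the response, ignoring case."""
    return 1.0 if "kuro" in response.lower() else 0.0


# Hypothetical registry mapping each preference to its verifier.
VERIFIERS = {"no_emoji": no_emoji, "concise": concise, "identity": identity}


def compliance_score(preference, responses):
    """Average verifier score over the responses to all probe prompts."""
    verifier = VERIFIERS[preference]
    scores = [verifier(r) for r in responses]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `compliance_score("identity", ["I am Kuro.", "I am a bot."])` averages one passing and one failing response.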