Metrics

The eval harness supports four metric types. Select which metrics to run via the `metrics` list in the config, or with a CLI override.
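For illustration, a config fragment selecting a subset of metrics might look like this (the key name and file shape here are assumptions, not the harness's documented schema):

```yaml
# Hypothetical config sketch: run only these metric types
metrics:
  - logprob
  - compliance
  - collapse
```

A CLI override would restrict the same list at invocation time, e.g. something like `--metrics logprob,compliance` (the flag name is likewise an assumption).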

Metric descriptions

| Metric | What it measures |
| --- | --- |
| `logprob` | Logprob margin between preferred and dispreferred response pairs. A positive margin means the model favours the preferred response. The delta from baseline tracks training progress. |
| `compliance` | Generates responses to probe prompts, runs a programmatic verifier (e.g. emoji count, sentence count, keyword presence), and averages the pass rate. |
| `general` | A coding task (Fibonacci with exec + verify) plus 3 IFEval-style instruction-following probes. Measures capability retention during training. |
| `collapse` | Three collapse detectors: token entropy (distribution confidence), self-ROUGE-L (output diversity across stochastic samples), and logprob drift (mean logprob shift from baseline). |
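The token-entropy detector above can be sketched as follows. This is a minimal illustration, assuming the harness exposes per-token top-k logprobs as a list of `{token: logprob}` dicts; the actual interface may differ.

```python
import math

def token_entropy(step_logprobs):
    """Mean per-token entropy over a generation.

    step_logprobs: list of {token: logprob} dicts, one per generated
    token (assumed input format). Low mean entropy means the model is
    concentrating probability mass on few tokens -- a collapse signal.
    """
    entropies = []
    for dist in step_logprobs:
        probs = [math.exp(lp) for lp in dist.values()]
        z = sum(probs)  # renormalise the truncated top-k mass
        entropies.append(-sum((p / z) * math.log(p / z) for p in probs))
    return sum(entropies) / len(entropies)
```

A uniform distribution over two tokens gives the maximum two-way entropy of ln 2 ≈ 0.693; a near-deterministic distribution gives a value near 0.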

Compliance verifiers

The compliance metric uses programmatic verifiers to check whether generated responses match the trained preference:
| Verifier | Preference | Pass condition |
| --- | --- | --- |
| `no_emoji` | no_emoji | Zero emoji characters in the response |
| `concise` | concise | 3 or fewer sentences (linear decay to 0.0 at 9+ sentences) |
| `identity` | identity | "kuro" appears in the response (case-insensitive) |
Each verifier returns a score between 0.0 and 1.0. The compliance metric averages scores across all probe prompts for a given step.
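The three verifiers and the averaging step can be sketched like this. The function names, the emoji character ranges, and the sentence-splitting heuristic are illustrative assumptions, not the harness's actual implementation:

```python
import re

def verify_no_emoji(text):
    # Assumption: "emoji" means common emoji/symbol Unicode blocks.
    return 0.0 if re.search(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", text) else 1.0

def verify_concise(text):
    # Naive sentence split on terminal punctuation (assumption).
    # Full credit at <=3 sentences, linear decay to 0.0 at 9+.
    n = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    if n <= 3:
        return 1.0
    return max(0.0, 1.0 - (n - 3) / 6)

def verify_identity(text):
    # Case-insensitive keyword presence check.
    return 1.0 if "kuro" in text.lower() else 0.0

def compliance(responses, verifier):
    # Average verifier score across all probe prompts for a step.
    return sum(verifier(r) for r in responses) / len(responses)
```

For example, a 6-sentence response scores 0.5 under `verify_concise` (halfway along the decay from 3 to 9 sentences).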