Configuration & Running

The eval harness uses Hydra for configuration. This page covers the key settings and a step-by-step guide for running evals in Tinker mode (no GPU required). For Local or Modal mode, swap the environment variables and dependencies accordingly.

Key Config Fields

| Field | Default | Description |
| --- | --- | --- |
| preferences | [no_emoji, concise, identity] | Preferences to train and evaluate |
| num_steps | 20 | Feedback steps per preference |
| batch_size | 4 | Samples per feedback batch |
| mode | tinker | Execution backend: local, tinker, or modal |
| base_model | Qwen/Qwen3-30B-A3B | Base model for LoRA init (use Tinker name in Tinker mode) |
| plots | true | Generate matplotlib plots |

For the full config field reference, see the Configuration Reference.

Secrets

| Variable | Required for | Purpose |
| --- | --- | --- |
| CLAAS_TINKER_API_KEY | Tinker mode | Tinker SDK authentication |
| GEMINI_API_KEY | general metric | Gemini-based capability evaluation |
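
A missing secret is easier to catch before a run starts than in the middle of one. The following is a minimal sketch of such a preflight check, not part of the harness: the helper `missing_secrets` and the labels `"tinker"` and `"general"` are hypothetical names chosen for illustration, and only the two environment variable names come from the table above.

```python
import os
from typing import Mapping

# Variable names come from the Secrets table; the dictionary keys are
# labels invented for this sketch, not identifiers used by the harness.
REQUIRED_SECRETS = {
    "tinker": ["CLAAS_TINKER_API_KEY"],  # needed when mode=tinker
    "general": ["GEMINI_API_KEY"],       # needed for the general metric
}


def missing_secrets(needs: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    missing = []
    for need in needs:
        for name in REQUIRED_SECRETS.get(need, []):
            if not env.get(name):
                missing.append(name)
    return missing
```

Calling `missing_secrets(["tinker", "general"])` before launching returns an empty list when both variables are set, and the names of the absent ones otherwise.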

CLI Overrides

Pass Hydra overrides as positional key=value arguments after python -m claas.eval. Quote any override that contains brackets so the shell does not try to expand them:
# Run only conciseness for 10 steps
uv run python -m claas.eval 'preferences=[concise]' num_steps=10

# Override base model and mode
uv run python -m claas.eval base_model=Qwen/Qwen3-30B-A3B mode=tinker

# Skip OpenClaw gateway, proxy completions through CLaaS directly
uv run python -m claas.eval openclaw_url=null

# Use a custom config directory
uv run python -m claas.eval --config-dir ./my_configs --config-name my_config
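
When launching runs from a script, the override strings can be generated rather than typed by hand. The sketch below assumes only simple values (numbers, booleans, None, plain strings, and flat lists of those); `to_hydra_overrides` and `eval_command` are hypothetical helpers, not part of claas or Hydra. String values containing spaces or special characters would need additional Hydra-level quoting that this sketch does not attempt.

```python
import shlex


def to_hydra_overrides(fields: dict) -> list[str]:
    """Render a flat dict as Hydra key=value override strings."""
    overrides = []
    for key, value in fields.items():
        if value is None:
            rendered = "null"  # Hydra spelling of None
        elif isinstance(value, bool):
            # bool is checked before the generic case because bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            rendered = "[" + ",".join(str(v) for v in value) + "]"
        else:
            rendered = str(value)
        overrides.append(f"{key}={rendered}")
    return overrides


def eval_command(fields: dict) -> str:
    """Build a shell-quoted command line for the eval entry point."""
    argv = ["uv", "run", "python", "-m", "claas.eval", *to_hydra_overrides(fields)]
    return shlex.join(argv)  # quotes arguments containing brackets
```

For example, `eval_command({"preferences": ["concise"], "num_steps": 10})` reproduces the first command shown above, including the quotes around the bracketed override.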

Programmatic Usage

import asyncio

from claas.eval.config import build_harness_config
from claas.eval.runner import run_harness
from claas.eval.types import EvalConfig

# Only the fields being overridden are passed to EvalConfig.
config = build_harness_config(
    EvalConfig(
        preferences=["concise"],
        num_steps=5,
    )
)

# run_harness is a coroutine, so it is driven with asyncio.run.
asyncio.run(run_harness(config))

Running the Eval

1. Install dependencies

uv sync --extra tinker --extra dev

2. Start the Tinker inference proxy

CLAAS_TINKER_API_KEY="tml-..." \
CLAAS_TINKER_BASE_MODEL="Qwen/Qwen3-30B-A3B" \
  uv run uvicorn claas.proxy.tinker_inference_proxy:app \
    --host 0.0.0.0 --port 8000

3. Start the CLaaS API

CLAAS_DISTILL_EXECUTION_MODE=tinker \
CLAAS_TINKER_API_KEY="tml-..." \
CLAAS_TINKER_BASE_MODEL="Qwen/Qwen3-30B-A3B" \
CLAAS_ALLOWED_INIT_BASE_MODELS="Qwen/Qwen3-30B-A3B" \
  uv run uvicorn claas.api:web_app \
    --host 0.0.0.0 --port 8080

Note: Use claas.api:web_app, not claas.api:app. The app object is a Modal App and is not ASGI-compatible.

4. Run the eval

CLAAS_DISTILL_EXECUTION_MODE=tinker \
CLAAS_TINKER_API_KEY="tml-..." \
CLAAS_TINKER_BASE_MODEL="Qwen/Qwen3-30B-A3B" \
  uv run python -m claas.eval
This runs with the default Hydra config (claas/eval/configs/base.yaml). Override any field via key=value arguments.

5. View results

Results are written to ./data/evals/<run-id>/. View them in the browser via the eval dashboard:
http://localhost:8080/v1/eval?results_dir=./data/evals
Or inspect the raw output:
cat data/evals/<run-id>/summary.json
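
To avoid looking up the run ID by hand, the newest summary can be located from Python. This is a minimal sketch that assumes only what is stated above: each run has its own directory under the results directory, containing a summary.json file. The helper name `latest_summary` is hypothetical, and the schema of summary.json is not assumed, so the parsed JSON is returned as-is.

```python
import json
from pathlib import Path
from typing import Any


def latest_summary(results_dir: str = "./data/evals") -> tuple[Path, Any]:
    """Return (run directory, parsed summary.json) for the newest run.

    "Newest" means the summary.json with the latest modification time.
    """
    root = Path(results_dir)
    summaries = [p for p in root.glob("*/summary.json") if p.is_file()]
    if not summaries:
        raise FileNotFoundError(f"no summary.json found under {root}")
    newest = max(summaries, key=lambda p: p.stat().st_mtime)
    with newest.open(encoding="utf-8") as fh:
        return newest.parent, json.load(fh)
```

A call such as `run_dir, summary = latest_summary()` then gives the run directory and its contents for further inspection.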

Known Gotchas

Tinker uses its own model identifiers, which differ from Hugging Face names. For example, the Hugging Face model Qwen/Qwen3-Coder-30B-A3B-Instruct is Qwen/Qwen3-30B-A3B in Tinker. Sampling works with either name, but LoRA training init rejects the Hugging Face name with a 400 error. Always use the Tinker name in base_model.
When running the CLaaS API with uvicorn directly (no Docker/Modal), use claas.api:web_app, not claas.api:app. The app object is a Modal App and is not ASGI-compatible.
The proxy reads CLAAS_TINKER_BASE_MODEL to initialize its sampling client, and the eval config’s base_model is passed to the API for LoRA init. If the two values differ, the proxy samples from one model while training updates another, so scoring no longer reflects the model being trained. Keep them identical.
The collapse metric generates multiple stochastic samples per step. To limit overhead, it runs only at the steps listed in collapse_steps (default [0, 5, 10, 15, 19]). To reduce cost further, shorten the list, for example with the override 'collapse_steps=[0,19]'.
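
Two of the gotchas above can be checked mechanically before a run. The sketch below is illustrative only: `preflight_warnings` is a hypothetical helper, and the range check assumes that steps are numbered from 0 to num_steps - 1, which is inferred from the defaults (num_steps of 20 with a last collapse step of 19) rather than stated by the harness.

```python
def preflight_warnings(
    proxy_model: str,
    base_model: str,
    num_steps: int,
    collapse_steps: list[int],
) -> list[str]:
    """Collect human-readable warnings for common misconfigurations."""
    warnings = []

    # The proxy samples from proxy_model while LoRA init uses base_model,
    # so the two values should be identical.
    if proxy_model != base_model:
        warnings.append(
            f"model mismatch: proxy uses {proxy_model!r}, "
            f"eval config uses {base_model!r}"
        )

    # Assumption: steps run from 0 to num_steps - 1, so any collapse step
    # outside that range would never be reached.
    unreachable = [s for s in collapse_steps if not 0 <= s < num_steps]
    if unreachable:
        warnings.append(
            f"collapse_steps outside 0..{num_steps - 1}: {unreachable}"
        )

    return warnings
```

In practice, proxy_model would come from the CLAAS_TINKER_BASE_MODEL environment variable and the remaining arguments from the eval config. An empty list means neither issue was detected.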