# Model selection

This document defines how TaskCLF selects a “default” model for inference (when `--model-dir` is omitted), how to rank candidate models, and what “compatible” means.
## Terms

- Model bundle: a directory containing `model.txt`, `metadata.json`, `metrics.json`, and other bundle files (see `docs/model_bundle_layout.md`).
- Promoted model: a model bundle that lives under `models/` (the `models_dir`).
- Rejected model: a model bundle moved to `<out_dir>/rejected_models/` by retrain.
- Compatible model: a promoted bundle that matches the current runtime schema and label set (defined below).
- Rankable model: a promoted and compatible bundle that has the files required to compute ranking metrics (today this always includes `metrics.json`).
- Default model: the model used by inference when `--model-dir` is not provided.
## Compatibility (hard requirement)

A model bundle is compatible with the current code if BOTH of the following hold:

1. `metadata.schema_hash` exactly matches the expected hash for `metadata.schema_version`.
2. `sorted(metadata.label_set) == sorted(LABEL_SET_V1)` (exact match).

This matches the behavior of `load_model_bundle(..., validate_schema=True, validate_labels=True)`.
If a bundle is not compatible, it MUST NOT be used as the default model (or auto-selected), and tooling should surface a clear reason.
### Where schema compatibility comes from

Each supported schema version (`v1`, `v2`, `v3`) has its own deterministic hash computed from the canonical column registry in `src/taskclf/core/schema.py`. The current default feature schema is `v3`, but bundle loading remains schema-version-aware, so older supported bundles can still be loaded explicitly.
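One way to get a deterministic hash from a column registry is to hash a canonical serialization of it. The sketch below assumes the registry is an ordered list of `(name, dtype)` pairs and uses SHA-256 over compact JSON. Both choices are assumptions for illustration, and the actual canonicalization in `schema.py` may differ.

```python
import hashlib
import json


def schema_hash_sketch(columns: list[tuple[str, str]]) -> str:
    """Hypothetical: hash an ordered (name, dtype) column registry."""
    # Column order is kept as-is because feature order is part of the schema.
    # sort_keys and fixed separators make the JSON text byte-for-byte stable.
    canonical = json.dumps(
        [{"name": name, "dtype": dtype} for name, dtype in columns],
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any change to a column name, dtype, or position produces a different hash. That is why the compatibility check can require an exact match.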
## Current state (today)

- The model registry (`model_registry.py`) scans, validates, ranks, and selects bundles.
- The active model pointer (`active.json`) is implemented: `read_active`, `write_active_atomic`, `append_active_history`, and `resolve_active_model` are available.
- `resolve_active_model` reads `active.json` if it is valid; otherwise it falls back to `find_best_model` and self-heals the pointer.
- Automatic best-model selection now prefers `v3` bundles and falls back to older supported schemas when no valid `v3` bundle exists.
- When `--model-dir` is omitted, inference auto-resolves the model via `resolve_model_dir()` in `taskclf.infer.resolve`. It reads `active.json` if it is valid; otherwise it falls back to best-model selection. Both `infer batch` and `infer online` accept an optional `--models-dir` (default `models/`).
- The online inference loop watches the `active.json` mtime and hot-reloads the model when it changes. It swaps models only after a successful load.
- Retrain resolves the champion via a priority chain: `read_active()` → `find_best_model()` → `find_latest_model()` (legacy fallback). See `_resolve_champion()` in `retrain.py`.
- After a successful promotion to `models/`, retrain does three things (the “overthrow” mechanism):
    - runs `find_best_model()`
    - writes `models/index.json` (cached ranking)
    - atomically updates `active.json` if the best model has changed and passes the hysteresis threshold
- Hysteresis: `SelectionPolicy.min_improvement` (default `0.0`) controls the macro-F1 improvement threshold for switching the active pointer. It is configured via `min_improvement` in `configs/retrain.yaml`.
- `taskclf model set-active --model-id ...` allows manual rollback by setting the active pointer to a specific bundle after validation.
- Regression gates are split into two groups:
    - Candidate-only gates (`check_candidate_gates`) evaluate the challenger in isolation: BreakIdle precision, per-class precision, and acceptance checks.
    - Comparative gates (`check_regression_gates`) add the macro-F1 no-regression check against the champion.
- `RetrainResult` surfaces `candidate_gates`, `champion_source`, and `active_updated` for observability.
- `metrics.json` inside the bundle contains only `macro_f1`, `weighted_f1`, `confusion_matrix`, and `label_names`.

Acceptance gates (BreakIdle precision, “no class below 0.50 precision”, etc.) are computed during evaluation. They are stored in `evaluation.json`, written under `--out-dir` and not inside the bundle.
## Default resolution precedence (implemented)
When `--model-dir` is omitted, inference resolves the model directory using the following precedence (implemented in `taskclf.infer.resolve.resolve_model_dir`):

1. If `models/active.json` exists and is valid: use the bundle it points to.
2. Else: select the best bundle from `models/` using the selection policy (below).
3. If no eligible bundle exists: fail loudly with a message listing why bundles were excluded, and require an explicit `--model-dir`.

`--model-dir` always overrides the default resolution.
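The precedence can be sketched with the pointer reader and best-model finder passed in as callables. This keeps the control flow visible without assuming their real signatures. `resolve_model_dir_sketch`, `NoEligibleModelError`, and both callable parameters are hypothetical and are not the actual `resolve_model_dir` API.

```python
from pathlib import Path
from typing import Callable, Optional


class NoEligibleModelError(RuntimeError):
    """Hypothetical error raised when no model can be resolved."""


def resolve_model_dir_sketch(
    model_dir: Optional[Path],
    models_dir: Path,
    read_valid_active: Callable[[Path], Optional[Path]],
    find_best: Callable[[Path], tuple[Optional[Path], list[str]]],
) -> Path:
    # An explicit --model-dir always wins.
    if model_dir is not None:
        return model_dir
    # 1. A valid models/active.json pointer.
    active = read_valid_active(models_dir)
    if active is not None:
        return active
    # 2. Best eligible bundle by selection policy, plus reasons for exclusions.
    best, exclusions = find_best(models_dir)
    if best is not None:
        return best
    # 3. Fail loudly and explain why each bundle was excluded.
    details = "\n".join(f"  - {reason}" for reason in exclusions) or "  (no bundles found)"
    raise NoEligibleModelError(
        f"No eligible model in {models_dir}:\n{details}\nPass --model-dir explicitly."
    )
```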
### Why an active pointer file
A pointer file makes “continuous best model” explicit and stable:
- training/promotion updates the pointer when a challenger overthrows the previous best
- inference reads a single small file, rather than re-scanning and re-ranking on every call
- updates can be atomic and auditable
## Selection policy v1 (ranking)

### Candidate set

Start from all promoted bundles under `models/`.

Filter to bundles that are:

- valid bundles (have the required files)
- compatible (`schema_hash` and `label_set` match)
### Score and tie-breakers

Rank compatible bundles using:

1. higher `metrics.macro_f1` wins
2. tie-break: higher `metrics.weighted_f1` wins
3. tie-break: newer `metadata.created_at` wins (ISO 8601 datetime with UTC offset)

Notes:

- `metadata.created_at` is produced by `datetime.now(UTC).isoformat()`, so it includes the offset.
- `metrics.json` stores the F1 metrics as floats.
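The three ordering rules map onto a single tuple sort key. The sketch below assumes the bundles have already been filtered to valid, compatible ones. The `Candidate` type and `rank` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of a valid, compatible bundle."""
    name: str
    macro_f1: float
    weighted_f1: float
    created_at: str  # ISO 8601 with offset, e.g. "2026-02-26T12:34:56.789123+00:00"


def rank_key(c: Candidate) -> tuple[float, float, datetime]:
    # Parse the timestamp instead of comparing strings, so offsets are handled correctly.
    return (c.macro_f1, c.weighted_f1, datetime.fromisoformat(c.created_at))


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Best first: higher macro_f1, then higher weighted_f1, then newer created_at.
    return sorted(candidates, key=rank_key, reverse=True)
```

Python compares tuples element by element, so the later fields act as tie-breakers only when the earlier ones are equal.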
### Constraints / gates

Policy v1 does NOT re-check acceptance gates by default, because acceptance checks are not stored in the bundle's `metrics.json` today.

If you want selection to apply gates, you must either:

- (preferred) persist a minimal `acceptance.json` (or equivalent) inside the bundle at promotion time, OR
- teach the selector to locate the corresponding `evaluation.json` in the `--out-dir` history. This is harder, because that file is not colocated with the bundle and is not part of the bundle contract.

Until that exists, promotion-time gating is the control point:

- retrain already rejects bundles that fail gates (they are moved to `rejected_models/`)
- selection chooses the best among the promoted bundles by F1
## Active pointer file: `models/active.json`

### Purpose

`models/active.json` is a small file that defines the default model bundle for inference when `--model-dir` is omitted.
### Schema

```json
{
  "model_dir": "models/<bundle_dir_name>",
  "model_id": "<optional stable id; may equal bundle dir name>",
  "selected_at": "2026-02-26T12:34:56.789123+00:00",
  "policy_version": 1,
  "reason": {
    "metric": "macro_f1",
    "macro_f1": 0.8123,
    "weighted_f1": 0.9012
  }
}
```
Required keys:
* `model_dir` (string): path to the promoted bundle directory. Prefer a relative path (from repo root or from `models/`), but be consistent.
* `selected_at` (string): ISO8601 datetime string.
* `policy_version` (int): selection policy version.
Optional keys:
* `model_id` (string)
* `reason` (object)
### Validation rules
On read, inference MUST:
* ensure file is valid JSON and required keys exist
* ensure `model_dir` exists and is a valid bundle
* ensure bundle is compatible (schema_hash + label_set match)
If the pointer file is invalid, inference falls back to selecting best-by-policy.
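The read-side rules can be sketched as a function that returns the parsed pointer, or `None` to signal "fall back to best-by-policy". The sketch makes several assumptions, all hypothetical:

- `model_dir` is resolved relative to a caller-supplied `base_dir`.
- Bundle validity and compatibility are delegated to an injected `is_valid_compatible_bundle` callable.
- `read_valid_pointer` itself is an illustrative name, not the real `read_active`.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional

REQUIRED_KEYS = ("model_dir", "selected_at", "policy_version")


def read_valid_pointer(
    pointer_path: Path,
    is_valid_compatible_bundle: Callable[[Path], bool],
    base_dir: Path,
) -> Optional[dict[str, Any]]:
    """Return the parsed pointer, or None so the caller falls back to best-by-policy."""
    try:
        data = json.loads(pointer_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing, unreadable, or not valid JSON
    if not isinstance(data, dict) or any(k not in data for k in REQUIRED_KEYS):
        return None  # required keys absent
    if not isinstance(data["model_dir"], str):
        return None
    bundle = base_dir / data["model_dir"]
    # The bundle must exist, be a valid bundle, and be compatible.
    if not bundle.is_dir() or not is_valid_compatible_bundle(bundle):
        return None
    return data
```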
### Atomic update requirement
Updates to `active.json` MUST be atomic to avoid partial reads:
* write to `active.json.tmp`
* `os.replace("active.json.tmp", "active.json")` on the same filesystem
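The write-then-rename pattern looks like this in a generic sketch. `write_json_atomic` is a hypothetical helper, not the actual `write_active_atomic`. Placing the temp file next to the target keeps both on the same filesystem, which `os.replace` needs for an atomic rename on POSIX.

```python
import json
import os
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, payload: dict[str, Any]) -> None:
    """Readers see either the old file or the new one, never a partial write."""
    tmp = path.with_name(path.name + ".tmp")  # same directory, so same filesystem
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
        f.flush()
        os.fsync(f.fileno())  # make the bytes durable before the rename
    os.replace(tmp, path)  # atomic rename on POSIX; overwrites any existing file
```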
### Audit log (recommended)
Append a line to `models/active_history.jsonl` on every active change:
```json
{"at":"...","old":{...},"new":{...}}
```

This enables rollback and debugging of “why did we switch models?”
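Appending one compact JSON object per line keeps the history file easy to tail and parse. In this sketch, `append_history` is a hypothetical helper rather than the real `append_active_history`. The `at` timestamp is assumed to follow the same `isoformat()` style used for `created_at`, and `old` is `None` when no pointer existed before.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


def append_history(history_path: Path, old: Optional[dict[str, Any]], new: dict[str, Any]) -> None:
    """Append one {"at", "old", "new"} record as a single JSON line."""
    record = {"at": datetime.now(timezone.utc).isoformat(), "old": old, "new": new}
    # Append mode never rewrites earlier lines, so past transitions stay intact.
    with open(history_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")
```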
## Overthrow flow: continuous best (implemented)

When `run_retrain_pipeline()` promotes a new model to `models/`:

1. `find_best_model()` scans and ranks the promoted compatible bundles using policy v1.
2. `write_index_cache()` writes `models/index.json` with the full ranking snapshot.
3. `should_switch_active()` applies the hysteresis check (see the Hysteresis section below).
4. If the best bundle differs from the currently active bundle and passes the hysteresis threshold:
    - `write_active_atomic()` atomically updates `models/active.json`
    - the transition is appended to `models/active_history.jsonl`
Champion resolution for regression gates uses a priority chain (`_resolve_champion()`):

1. `read_active()`: use the active pointer if valid.
2. `find_best_model()`: use the highest-ranked eligible bundle.
3. `find_latest_model()`: legacy fallback for repos that predate the registry.
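A priority chain like this reduces to "first resolver that returns something wins". The sketch also records which source supplied the champion, presumably in the spirit of the `champion_source` field on `RetrainResult`. `resolve_champion_sketch` and the source labels are hypothetical, not the real `_resolve_champion()`.

```python
from pathlib import Path
from typing import Callable, Optional

Resolver = Callable[[], Optional[Path]]


def resolve_champion_sketch(chain: list[tuple[str, Resolver]]) -> tuple[Optional[Path], Optional[str]]:
    """Try each resolver in priority order; return the first hit and its source label."""
    for source, resolve in chain:
        found = resolve()
        if found is not None:
            return found, source
    return None, None  # no resolver produced a champion
```

Usage would pass the three resolvers in order, for example `[("active", ...), ("best", ...), ("latest", ...)]`, each wrapping the corresponding registry function.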
## Index cache: `models/index.json` (implemented)

After every retrain promotion, `write_index_cache()` writes a snapshot of the current scan results to `models/index.json`. The cache is informational only: `find_best_model()` and `resolve_active_model()` always do a live scan. The cache is useful for:

- `taskclf train list` speed (optional fast path)
- operators who inspect `models/index.json` directly to see the current ranking and why bundles were excluded

The file is written atomically (temp file + `os.replace`). See `IndexCache` in the API reference for the schema.
## Hysteresis: `min_improvement` (implemented)

To avoid unnecessary model switches when the improvement is marginal, `SelectionPolicy` supports a `min_improvement` field (default `0.0`, meaning no hysteresis).

When `min_improvement > 0`, the candidate model's `macro_f1` must exceed the current active model's `macro_f1` by at least `min_improvement` before the active pointer is updated.

To enable hysteresis, set `min_improvement` to a positive value in `configs/retrain.yaml`.

The `should_switch_active(current, candidate, policy)` function implements this check. If the current pointer has no recorded `macro_f1` in its `reason` dict, the switch is always allowed (safe fallback).
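The decision rule can be sketched as follows. `Policy` is a hypothetical stand-in for `SelectionPolicy`, and unlike the real `should_switch_active`, the candidate is passed as a bare `macro_f1` float. Allowing a switch when no pointer exists at all is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for SelectionPolicy (only the field used here)."""
    min_improvement: float = 0.0


def should_switch(current: Optional[dict[str, Any]], candidate_macro_f1: float, policy: Policy) -> bool:
    if current is None:
        return True  # assumption: with no active pointer, any eligible best model may be set
    current_f1 = (current.get("reason") or {}).get("macro_f1")
    if current_f1 is None:
        return True  # safe fallback: nothing recorded to compare against
    # The candidate must beat the current pointer by at least min_improvement.
    return candidate_macro_f1 - current_f1 >= policy.min_improvement
```

Because the comparison uses floating-point subtraction, a candidate that beats the current model by exactly `min_improvement` in decimal terms may fall a hair short. Thresholds are best treated as approximate.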
## CLI commands (implemented)

- `taskclf train list`:
    - lists promoted bundles
    - shows ranking metrics and compatibility status
    - highlights the active bundle (if a pointer exists)
    - supports `--json`
- `taskclf model set-active --model-id <...>`:
    - validates that the bundle exists, is valid, and is compatible
    - writes `active.json` atomically after validation
    - appends to `active_history.jsonl`
    - is useful for rollback after a bad promotion