Inference Contract¶

Version: 1.0 Status: Stable Last Updated: 2026-03-28

This document defines the canonical inference pipeline order for taskclf and maps each stage to its implementation across the three runtime paths.

1. Canonical Pipeline Order¶

Every inference path must execute these stages in this order:

features → encode categoricals → impute missing → predict probabilities
→ calibrate → reject → smooth/aggregate → taxonomy map → UI label

#	Stage	Input	Output
1	Build features	Raw events	`FeatureRow` / feature DataFrame
2	Encode categoricals	String columns	Integer-encoded columns
3	Impute missing values	Encoded matrix (with gaps)	Dense numeric matrix
4	Predict probabilities	Dense matrix	`(N, n_classes)` probability matrix
5	Calibrate	Raw probabilities	Calibrated probabilities
6	Reject	Calibrated confidence	Rejection flags + `Mixed/Unknown` labels
7	Smooth / aggregate	Per-bucket labels	Rolling-majority-smoothed labels
8	Taxonomy map	Core labels + calibrated probs	Mapped labels + mapped probs
9	UI label	Mapped or smoothed label	Final user-facing string

2. Runtime Paths¶

2.1 Batch¶

Entry point: run_batch_inference() in src/taskclf/infer/batch.py.

Stage	Implementation
Build features	Caller supplies a pre-built `features_df`.
Encode categoricals	`predict_proba()` → `encode_categoricals()` from `train/lgbm.py`.
Impute missing	`predict_proba()` → `fillna(0)`.
Predict probabilities	`predict_proba()` → `model.predict(x)`.
Calibrate	`CalibratorStore.calibrate_batch()` (per-user) or single `Calibrator.calibrate()`.
Reject	`max(proba) < reject_threshold` → label becomes `Mixed/Unknown`.
Smooth / aggregate	`rolling_majority()` from `infer/smooth.py`, then `segmentize()` + `merge_short_segments()`.
Taxonomy map	`TaxonomyResolver.resolve_batch()` from `infer/taxonomy.py`.
UI label	`BatchInferenceResult.mapped_labels` (or `smoothed_labels` if no taxonomy).

Helper functions predict_proba() and predict_labels() cover only stages 2–4 (encode, impute, predict). They do not calibrate, reject, smooth, or map. They are used for evaluation and calibrator fitting, not end-user inference.

2.2 Online¶

Entry point: OnlinePredictor.predict_bucket() in src/taskclf/infer/online.py.

Stage	Implementation
Build features	`run_online_loop()` calls `build_features_from_aw_events()`.
Encode categoricals	`_encode_value()` per column (categorical → `LabelEncoder`, unknown → `-1`).
Impute missing	`_encode_value()` returns `float("nan")` for missing numerics.
Predict probabilities	`model.predict(x)` on the single-row array.
Calibrate	`CalibratorStore.get_calibrator(row.user_id)` or fallback `Calibrator.calibrate()`.
Reject	`confidence < reject_threshold` → label becomes `Mixed/Unknown`.
Smooth / aggregate	`rolling_majority()` over an internal deque buffer.
Taxonomy map	`TaxonomyResolver.resolve()` per bucket.
UI label	`WindowPrediction.mapped_label_name` (or `smoothed_label` if no taxonomy).

2.3 Tray¶

Entry point: _LabelSuggester.suggest() in src/taskclf/ui/runtime.py.

Delegates entirely to the online path:

Fetches AW events for the requested time window.
Builds features via build_features_from_aw_events().
Calls OnlinePredictor.predict_bucket() on the last row.
Returns (prediction.core_label_name, prediction.confidence).

The tray path inherits all online-path stages. Its return value currently exposes core_label_name, not mapped_label_name.

3. Known Deviations¶

These deviations from the canonical order are documented here and addressed in Phase 1.

3.1 Imputation mismatch (batch vs online)¶

Batch predict_proba() uses fillna(0) — missing numerics become zero.
Online _encode_value() returns float("nan") — missing numerics stay NaN.
Training prepare_xy() in train/lgbm.py uses fillna(0).

Batch matches training; online does not. LightGBM handles NaN natively (routes to best split), so the model tolerates this, but predictions may differ for identical inputs depending on path.

3.2 predict_proba / predict_labels skip calibration¶

predict_proba() and predict_labels() execute only encode → impute → predict. They do not calibrate. This is intentional for evaluation and calibrator fitting, but callers must not treat their output as production-grade calibrated predictions.

3.3 Taxonomy inputs are pre-smooth¶

In both batch and online, taxonomy mapping is sequenced after smoothing in the code flow. However, TaxonomyResolver.resolve() / resolve_batch() receive the per-bucket argmax index and calibrated probabilities — not the smoothed label. Taxonomy therefore maps from the raw (pre-smooth) prediction, not the smoothed one.

3.4 Tray suggest returns core_label_name¶

_LabelSuggester.suggest() returns core_label_name and confidence. It does not return mapped_label_name, so taxonomy configuration is invisible to the tray UI return value.