Inference Pipeline Overview
Version: 1.1 | Status: Stable | Last Updated: 2026-03-03
End-to-end reference for how a feature row becomes a labeled prediction. Each stage links to its authoritative documentation.
Pipeline stages
| Stage | What happens | Details |
|---|---|---|
| 1. Input | A single feature row enters the model. It must conform to the Feature Schema and include all columns from FEATURE_COLUMNS. | Core Types, Configuration |
| 2. Model prediction | LightGBM produces core_probs: float[8], ordered by label ID and summing to 1.0. | Task Labels (v1), LightGBM Trainer |
| 3. Calibration | A per-user or global calibrator adjusts the probabilities. Preserves ordering and sum constraint. | Personalization Sections 3–5, Calibration API |
| 4. Reject check | If max(calibrated_probs) < reject_threshold, the prediction is rejected and mapped to Mixed/Unknown. | Acceptance Criteria Section 4 |
| 5. Smoothing | A rolling majority window merges noisy per-minute predictions into stable blocks. | Prediction Smoothing |
| 6. Taxonomy mapping | Optional: maps core labels to user-defined buckets with aggregated probabilities. | Custom Taxonomy |
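Stages 4 and 5 can be illustrated with a minimal sketch. The function names, the centered window, the default window size, and the tie-breaking rule below are assumptions made for illustration only; the authoritative behavior is defined in Acceptance Criteria Section 4 and Prediction Smoothing.

```python
from collections import Counter
from typing import List, Sequence, Tuple

MIXED_UNKNOWN = "Mixed/Unknown"


def reject_check(
    calibrated_probs: Sequence[float],
    label_names: Sequence[str],
    reject_threshold: float,
) -> Tuple[str, bool]:
    """Return (label, is_rejected) for one calibrated probability vector."""
    best = max(range(len(calibrated_probs)), key=lambda i: calibrated_probs[i])
    if calibrated_probs[best] < reject_threshold:
        # Below threshold: the prediction is rejected and mapped to Mixed/Unknown.
        return MIXED_UNKNOWN, True
    return label_names[best], False


def rolling_majority(labels: Sequence[str], window: int = 5) -> List[str]:
    """Smooth per-minute labels with a centered majority window.

    Assumptions (not from the source): the window is centered and odd-sized,
    it is truncated at the edges, and on a tie the original label is kept
    if it is among the most frequent labels.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[str] = []
    for i, current in enumerate(labels):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        top = max(counts.values())
        winners = [name for name, count in counts.items() if count == top]
        smoothed.append(current if current in winners else winners[0])
    return smoothed
```

In this sketch a single stray label inside an otherwise uniform run is replaced by its neighbors' label, which is the "noisy per-minute predictions into stable blocks" effect described in stage 5.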
Final output object
Each prediction produces:
| Field | Type | Source |
|---|---|---|
| user_id | string | Feature row |
| bucket_start_ts | timestamp | Feature row |
| core_label_id | int | argmax(core_probs) |
| core_label_name | string | Label set lookup |
| core_probs | float[8] | Model output |
| confidence | float | max(core_probs) |
| is_rejected | boolean | Reject check |
| mapped_label_name | string | Taxonomy mapping (or core label) |
| mapped_probs | dict[str, float] | Taxonomy mapping |
| model_version | string | Model Bundle metadata |
| schema_version | string | Feature schema |
| label_version | string | Label schema |
See Prediction Types for the implementation.
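As a rough illustration of the output object, the sketch below mirrors the fields listed above as a Python dataclass and shows how the derived fields could be computed. The class name `Prediction`, the helpers `derive_core_fields` and `aggregate_probs`, and the choice to leave unmapped labels under their core name are all hypothetical; the real types live in Prediction Types and the real mapping rules in Custom Taxonomy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical mirror of the documented output fields."""
    user_id: str
    bucket_start_ts: datetime
    core_label_id: int
    core_label_name: str
    core_probs: List[float]
    confidence: float
    is_rejected: bool
    mapped_label_name: str
    mapped_probs: Dict[str, float]
    model_version: str
    schema_version: str
    label_version: str


def derive_core_fields(
    core_probs: Sequence[float], label_names: Sequence[str]
) -> Tuple[int, str, float]:
    """Return (core_label_id, core_label_name, confidence).

    core_label_id is argmax(core_probs) and confidence is max(core_probs),
    matching the Source column of the table.
    """
    if len(core_probs) != len(label_names):
        raise ValueError("core_probs and label_names must have the same length")
    label_id = max(range(len(core_probs)), key=lambda i: core_probs[i])
    return label_id, label_names[label_id], core_probs[label_id]


def aggregate_probs(
    core_probs: Sequence[float],
    label_names: Sequence[str],
    mapping: Mapping[str, str],
) -> Dict[str, float]:
    """Sum core probabilities into user-defined buckets.

    Assumption: a core label missing from `mapping` keeps its own name.
    """
    mapped: Dict[str, float] = {}
    for name, prob in zip(label_names, core_probs):
        bucket = mapping.get(name, name)
        mapped[bucket] = mapped.get(bucket, 0.0) + prob
    return mapped
```

Because each core probability is added to exactly one bucket, the aggregated probabilities in this sketch keep the same total as core_probs.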
Invariants
- Determinism: same model + calibrator + mapping + input = identical output. No stochastic inference.
- Error handling: missing features, unknown categoricals, or schema mismatches produce Mixed/Unknown; the pipeline never crashes. See Data Validation.
- Privacy: no raw titles, keystrokes, or URLs enter or leave the pipeline. See Privacy Model.
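The error-handling invariant can be pictured as a guard around the prediction call. This is only a sketch under stated assumptions: `safe_predict`, its parameters, and the set of caught exception types are hypothetical, and the actual validation rules are described in Data Validation.

```python
from typing import Any, Callable, Mapping, Sequence

MIXED_UNKNOWN = "Mixed/Unknown"


def safe_predict(
    row: Mapping[str, Any],
    feature_columns: Sequence[str],
    predict_fn: Callable[[Mapping[str, Any]], str],
) -> str:
    """Return a label for `row`, falling back to Mixed/Unknown on bad input.

    `predict_fn` stands in for the real model call (hypothetical).
    """
    # Missing or null features short-circuit to the fallback label.
    missing = [c for c in feature_columns if c not in row or row[c] is None]
    if missing:
        return MIXED_UNKNOWN
    try:
        return predict_fn(row)
    except (KeyError, ValueError, TypeError):
        # Assumed to cover unknown categoricals and schema mismatches.
        return MIXED_UNKNOWN
```

The fallback is a fixed label with no randomness involved, so the guard stays compatible with the determinism invariant: the same invalid row always yields Mixed/Unknown.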
Versioning
Changes to any of the following require a version bump:
- Label ordering or count
- Output structure
- Confidence definition
- Reject semantics
- Calibration insertion point
- Taxonomy mapping semantics