Inference Pipeline Overview

Version: 1.1 | Status: Stable | Last Updated: 2026-03-03

End-to-end reference for how a feature row becomes a labeled prediction. Each stage links to its authoritative documentation.


Pipeline stages

Feature Row → Model → core_probs → Calibration → Reject Check → Smoothing → Taxonomy Mapping → Final Output
| Stage | What happens | Details |
|---|---|---|
| 1. Input | A single feature row enters the model. Must conform to the Feature Schema and include all columns from FEATURE_COLUMNS. | Core Types, Configuration |
| 2. Model prediction | LightGBM produces core_probs: float[8], ordered by label ID, summing to 1.0. | Task Labels (v1), LightGBM Trainer |
| 3. Calibration | Per-user or global calibrator adjusts probabilities. Preserves ordering and sum constraint. | Personalization Sections 3–5, Calibration API |
| 4. Reject check | If max(calibrated_probs) < reject_threshold, the prediction is rejected and mapped to Mixed/Unknown. | Acceptance Criteria Section 4 |
| 5. Smoothing | Rolling majority window merges noisy per-minute predictions into stable blocks. | Prediction Smoothing |
| 6. Taxonomy mapping | Optional: maps core labels to user-defined buckets with aggregated probabilities. | Custom Taxonomy |
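
Stages 5 and 6 can be illustrated with a minimal sketch. This is not the project's implementation (see Prediction Smoothing and Custom Taxonomy for that). The function names, the centered window, the tie-breaking rule, and the fallback for unmapped labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def smooth_labels(labels, window=5):
    """Centered rolling majority vote over per-minute labels.

    Looks at window // 2 neighbors on each side. On a tie, the original
    label is kept if it is among the tied winners; otherwise the tied
    label seen first in the window wins (an assumed tie-breaking rule).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        counts = Counter(labels[max(0, i - half): i + half + 1])
        top = max(counts.values())
        winners = [label for label, n in counts.items() if n == top]
        smoothed.append(current if current in winners else winners[0])
    return smoothed


def map_to_buckets(core_probs, label_names, label_to_bucket):
    """Sum core-label probabilities into user-defined buckets."""
    bucket_probs = defaultdict(float)
    for name, prob in zip(label_names, core_probs):
        # Assumption: a core label without a bucket keeps its own name.
        bucket_probs[label_to_bucket.get(name, name)] += prob
    mapped_probs = dict(bucket_probs)
    mapped_label = max(mapped_probs, key=mapped_probs.get)
    return mapped_label, mapped_probs


# Hypothetical label names; the real set is defined in Task Labels (v1).
names = [f"label_{i}" for i in range(8)]
probs = [0.30, 0.25, 0.15, 0.10, 0.08, 0.06, 0.04, 0.02]
taxonomy = {"label_0": "focus", "label_1": "comms", "label_2": "comms"}

mapped_label, mapped_probs = map_to_buckets(probs, names, taxonomy)
stable = smooth_labels(["A", "A", "B", "A", "A"], window=3)
```

Because bucket probabilities are plain sums of the core probabilities, the mapped distribution still sums to 1.0 as long as the core distribution does.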

Final output object

Each prediction produces:

| Field | Type | Source |
|---|---|---|
| user_id | string | Feature row |
| bucket_start_ts | timestamp | Feature row |
| core_label_id | int | argmax(core_probs) |
| core_label_name | string | Label set lookup |
| core_probs | float[8] | Model output |
| confidence | float | max(core_probs) |
| is_rejected | boolean | Reject check |
| mapped_label_name | string | Taxonomy mapping (or core label) |
| mapped_probs | dict[str, float] | Taxonomy mapping |
| model_version | string | Model Bundle metadata |
| schema_version | string | Feature schema |
| label_version | string | Label schema |

See Prediction Types for the implementation.
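
As a rough illustration of how the fields above fit together, the sketch below models the output object as a frozen dataclass and fills it in from the two probability vectors. It follows the table literally: `confidence` is `max(core_probs)`, while the reject check uses `max(calibrated_probs)`. The class name `Prediction`, the helper `build_prediction`, writing Mixed/Unknown into `mapped_label_name` on rejection, and the no-taxonomy fallback for `mapped_probs` are assumptions; the real types live in Prediction Types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

REJECT_LABEL = "Mixed/Unknown"


@dataclass(frozen=True)
class Prediction:
    user_id: str
    bucket_start_ts: datetime
    core_label_id: int
    core_label_name: str
    core_probs: list
    confidence: float
    is_rejected: bool
    mapped_label_name: str
    mapped_probs: dict
    model_version: str
    schema_version: str
    label_version: str


def build_prediction(user_id, bucket_start_ts, core_probs, calibrated_probs,
                     label_names, reject_threshold,
                     model_version, schema_version, label_version):
    """Assemble the output object when no custom taxonomy is configured."""
    # argmax over label IDs; ties resolve to the lowest ID.
    core_label_id = max(range(len(core_probs)), key=core_probs.__getitem__)
    core_label_name = label_names[core_label_id]
    is_rejected = max(calibrated_probs) < reject_threshold
    return Prediction(
        user_id=user_id,
        bucket_start_ts=bucket_start_ts,
        core_label_id=core_label_id,
        core_label_name=core_label_name,
        core_probs=list(core_probs),
        confidence=max(core_probs),
        is_rejected=is_rejected,
        # Assumption: rejection is surfaced through the mapped label.
        mapped_label_name=REJECT_LABEL if is_rejected else core_label_name,
        # Assumption: without a taxonomy, mapped_probs mirrors core_probs.
        mapped_probs=dict(zip(label_names, core_probs)),
        model_version=model_version,
        schema_version=schema_version,
        label_version=label_version,
    )


# Hypothetical inputs for illustration only.
names = [f"label_{i}" for i in range(8)]
core = [0.05, 0.55, 0.10, 0.10, 0.05, 0.05, 0.05, 0.05]
calibrated = [0.10, 0.40, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05]
ts = datetime(2026, 3, 3, 12, 0, tzinfo=timezone.utc)

rejected = build_prediction("user-1", ts, core, calibrated, names, 0.5,
                            "model-x", "schema-x", "labels-x")
accepted = build_prediction("user-1", ts, core, calibrated, names, 0.3,
                            "model-x", "schema-x", "labels-x")
```

With a threshold of 0.5 the calibrated maximum of 0.40 falls short and the prediction is rejected; with 0.3 it is accepted and keeps its core label.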


Invariants

  • Determinism: same model + calibrator + mapping + input = identical output. No stochastic inference.
  • Error handling: missing features, unknown categoricals, or schema mismatches produce Mixed/Unknown — the pipeline never crashes. See Data Validation.
  • Privacy: no raw titles, keystrokes, or URLs enter or leave the pipeline. See Privacy Model.
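
The error-handling invariant can be pictured as a guard around the prediction call. The sketch below is one possible shape, not the project's validation code (see Data Validation). The column names in `FEATURE_COLUMNS`, the helper names, and the set of caught exceptions are hypothetical.

```python
REJECT_LABEL = "Mixed/Unknown"

# Hypothetical columns; the real list comes from the project's configuration.
FEATURE_COLUMNS = ("feature_a", "feature_b")


def missing_columns(row):
    """Return the required columns that are absent or None in the row."""
    return [col for col in FEATURE_COLUMNS if row.get(col) is None]


def safe_predict(row, predict_fn):
    """Return predict_fn(row), or Mixed/Unknown if the row is unusable."""
    if missing_columns(row):
        return REJECT_LABEL
    try:
        return predict_fn(row)
    except (KeyError, ValueError, TypeError):
        # Assumption: schema mismatches and unknown categoricals surface
        # as one of these exception types in the wrapped predictor.
        return REJECT_LABEL


def always_label_0(row):
    return "label_0"


def raises_on_unknown_category(row):
    raise ValueError("unknown categorical value")


ok = safe_predict({"feature_a": 1.0, "feature_b": 2.0}, always_label_0)
missing = safe_predict({"feature_a": 1.0}, always_label_0)
bad = safe_predict({"feature_a": 1.0, "feature_b": 2.0},
                   raises_on_unknown_category)
```

The guard contains no randomness, so repeated calls with the same row and the same predictor return the same label, which is consistent with the determinism invariant.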

Versioning

Changes to any of the following require a version bump:

  • Label ordering or count
  • Output structure
  • Confidence definition
  • Reject semantics
  • Calibration insertion point
  • Taxonomy mapping semantics