
train.calibrate

Per-user probability calibration: eligibility checks and calibrator fitting.

Overview

Training-side logic for the personalization pipeline. After a model is trained, this module fits probability calibrators on validation data so that predicted confidences better reflect true accuracy:

model + labeled_df → predict → fit global calibrator
                             → check each user's eligibility
                             → fit per-user calibrators (eligible users only)
                             → CalibratorStore + eligibility reports

The resulting CalibratorStore is used at inference time to adjust raw model probabilities before the reject decision.

Models

PersonalizationEligibility

Frozen Pydantic model reporting whether a user qualifies for per-user calibration.

Field            Type  Description
user_id          str   User identifier
labeled_windows  int   Number of labeled windows for this user
labeled_days     int   Number of distinct calendar days with labels
distinct_labels  int   Number of distinct core labels observed
is_eligible      bool  Whether all thresholds are met

Eligibility thresholds

A user must meet all three thresholds to receive a per-user calibrator. Defaults are defined in core.defaults:

Threshold    Default                            Description
min_windows  DEFAULT_MIN_LABELED_WINDOWS (200)  Minimum labeled window count
min_days     DEFAULT_MIN_LABELED_DAYS (3)       Minimum distinct calendar days
min_labels   DEFAULT_MIN_DISTINCT_LABELS (3)    Minimum distinct core labels

Ineligible users fall back to the global calibrator at inference time.

Functions

check_personalization_eligible

check_personalization_eligible(
    df: pd.DataFrame,
    user_id: str,
    *,
    min_windows: int = DEFAULT_MIN_LABELED_WINDOWS,
    min_days: int = DEFAULT_MIN_LABELED_DAYS,
    min_labels: int = DEFAULT_MIN_DISTINCT_LABELS,
) -> PersonalizationEligibility

Checks whether user_id has enough labeled data. Returns a PersonalizationEligibility report. If the user is not present in df, returns a report with all counts at 0 and is_eligible=False.
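The counting behind this check can be reproduced with plain pandas. A toy sketch (column names follow the docs above; thresholds are the documented defaults):

```python
import pandas as pd

# Toy labeled data for one user: 4 windows across 3 days, 3 distinct labels.
df = pd.DataFrame({
    "user_id": ["u1"] * 4,
    "bucket_start_ts": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 09:05",
         "2024-01-02 09:00", "2024-01-03 09:00"]
    ),
    "label": ["code", "code", "meet", "mail"],
})

user_df = df[df["user_id"] == "u1"]
n_windows = len(user_df)                               # 4
n_days = user_df["bucket_start_ts"].dt.date.nunique()  # 3
n_labels = user_df["label"].nunique()                  # 3

# All three thresholds must hold; here the window count (4 < 200) fails.
eligible = n_windows >= 200 and n_days >= 3 and n_labels >= 3
print(eligible)  # False
```

This user satisfies the day and label thresholds but falls back to the global calibrator because of the window count.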

fit_temperature_calibrator

fit_temperature_calibrator(
    y_true_indices: np.ndarray,
    y_proba: np.ndarray,
) -> TemperatureCalibrator

Finds the temperature scalar that minimizes negative log-likelihood on validation data. Uses a two-pass grid search:

  1. Coarse: 0.1 to 5.0, step 0.1
  2. Fine: best ± 0.1, step 0.01

Returns a TemperatureCalibrator with the optimal temperature.
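The transform that the grid search evaluates (and that a fitted calibrator applies) is a rescaling of the log-probabilities followed by a softmax. A minimal numpy sketch, with T > 1 softening the distribution:

```python
import numpy as np

eps = 1e-12
proba = np.array([[0.7, 0.2, 0.1]])  # raw model probabilities for one window
T = 2.0                              # a temperature > 1 reduces confidence

logits = np.log(np.clip(proba, eps, None))
scaled = logits / T
scaled -= scaled.max(axis=-1, keepdims=True)  # stabilize before exponentiating
calibrated = np.exp(scaled) / np.exp(scaled).sum(axis=-1, keepdims=True)
```

The result still sums to 1 and keeps the same top class; only the confidence spread changes, which is why a single scalar suffices.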

fit_isotonic_calibrator

fit_isotonic_calibrator(
    y_true_indices: np.ndarray,
    y_proba: np.ndarray,
    n_classes: int,
) -> IsotonicCalibrator

Fits per-class sklearn.isotonic.IsotonicRegression with y_min=0.0, y_max=1.0, out_of_bounds="clip". Returns an IsotonicCalibrator wrapping the fitted regressors.
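A standalone sketch of one per-class fit, using sklearn directly on synthetic scores (the wrapper class and real model outputs are omitted; the data here is simulated to be overconfident):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw = rng.uniform(size=500)  # raw predicted probabilities for one class
# Synthetic outcomes: the true hit rate is only ~80% of the raw score.
hits = (rng.uniform(size=500) < raw * 0.8).astype(np.float64)

# Same settings as the documented fit: outputs clipped to [0, 1].
reg = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
reg.fit(raw, hits)

calibrated = reg.predict([0.25, 0.5, 0.9])  # monotone, bounded in [0, 1]
```

Each class gets its own regressor, so one miscalibrated class can be corrected without disturbing the others.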

fit_calibrator_store

fit_calibrator_store(
    model: lgb.Booster,
    labeled_df: pd.DataFrame,
    *,
    cat_encoders: dict[str, LabelEncoder] | None = None,
    method: Literal["temperature", "isotonic"] = DEFAULT_CALIBRATION_METHOD,
    min_windows: int = DEFAULT_MIN_LABELED_WINDOWS,
    min_days: int = DEFAULT_MIN_LABELED_DAYS,
    min_labels: int = DEFAULT_MIN_DISTINCT_LABELS,
    model_bundle_id: str | None = None,
    model_schema_hash: str | None = None,
) -> tuple[CalibratorStore, list[PersonalizationEligibility]]

Orchestrates the full calibration flow:

  1. Predicts on labeled_df to get raw probabilities
  2. Fits a global calibrator on all validation data
  3. Checks each user's eligibility
  4. Fits per-user calibrators for qualifying users

Returns a (CalibratorStore, eligibility_reports) tuple. The default calibration method is "temperature" (from core.defaults).

Method comparison

Aspect       | Temperature                                      | Isotonic
Parameters   | Single scalar T                                  | Per-class non-parametric regression
Size         | Lightweight (one float)                          | Larger (one IsotonicRegression per class)
Flexibility  | Uniform scaling across all classes               | Independent adjustment per class
Best for     | Well-calibrated models needing minor adjustment  | Models with class-specific miscalibration
Risk         | Cannot fix per-class bias                        | Can overfit with small validation sets
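One practical consequence of this comparison: temperature scaling is a monotone transform of the logits, so it can never change which class is predicted for any window, while per-class isotonic regression carries no such guarantee. A quick numpy check of the temperature side:

```python
import numpy as np

proba = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3]])
logits = np.log(proba)

# Sharpening (T < 1) and softening (T > 1) both preserve the argmax.
for T in (0.5, 2.0):
    scaled = np.exp(logits / T)
    scaled /= scaled.sum(axis=-1, keepdims=True)
    assert (scaled.argmax(axis=-1) == proba.argmax(axis=-1)).all()
```

So temperature scaling only affects confidence (and thus the reject decision), never the predicted label itself.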

Usage

from pathlib import Path

from taskclf.train.calibrate import fit_calibrator_store
from taskclf.infer.calibration import save_calibrator_store

store, reports = fit_calibrator_store(
    model, val_df,
    cat_encoders=cat_encoders,
    method="temperature",
)

for r in reports:
    print(f"{r.user_id}: eligible={r.is_eligible}")

save_calibrator_store(store, Path("artifacts/calibrator_store"))

See the personalization guide for end-to-end setup and the infer.calibration page for runtime calibrator usage.

taskclf.train.calibrate

Per-user probability calibration: eligibility checks and calibrator fitting.

Provides the training-side logic for the personalization pipeline:

  • check_personalization_eligible — gate that ensures a user has enough labeled data before fitting a per-user calibrator.
  • fit_temperature_calibrator — optimizes a temperature scalar that minimizes NLL on held-out probabilities.
  • fit_isotonic_calibrator — fits per-class isotonic regression.
  • fit_calibrator_store — orchestrates the full flow: predict on validation data, fit a global calibrator, check each user's eligibility, and fit per-user calibrators for qualifying users.

PersonalizationEligibility

Bases: BaseModel

Result of checking whether a user qualifies for per-user calibration.

Source code in src/taskclf/train/calibrate.py
class PersonalizationEligibility(BaseModel, frozen=True):
    """Result of checking whether a user qualifies for per-user calibration."""

    user_id: str
    labeled_windows: int
    labeled_days: int
    distinct_labels: int
    is_eligible: bool

check_personalization_eligible(df, user_id, *, min_windows=DEFAULT_MIN_LABELED_WINDOWS, min_days=DEFAULT_MIN_LABELED_DAYS, min_labels=DEFAULT_MIN_DISTINCT_LABELS)

Check whether user_id has enough labeled data for per-user calibration.

The eligibility thresholds follow docs/guide/acceptance.md Section 8.

Parameters:

  df (DataFrame): Labeled DataFrame with user_id, bucket_start_ts, and label columns. Required.
  user_id (str): The user to check. Required.
  min_windows (int): Minimum labeled window count. Default: DEFAULT_MIN_LABELED_WINDOWS.
  min_days (int): Minimum number of distinct calendar days. Default: DEFAULT_MIN_LABELED_DAYS.
  min_labels (int): Minimum number of distinct core labels observed. Default: DEFAULT_MIN_DISTINCT_LABELS.

Returns:

  PersonalizationEligibility: A PersonalizationEligibility report.

Source code in src/taskclf/train/calibrate.py
def check_personalization_eligible(
    df: pd.DataFrame,
    user_id: str,
    *,
    min_windows: int = DEFAULT_MIN_LABELED_WINDOWS,
    min_days: int = DEFAULT_MIN_LABELED_DAYS,
    min_labels: int = DEFAULT_MIN_DISTINCT_LABELS,
) -> PersonalizationEligibility:
    """Check whether *user_id* has enough labeled data for per-user calibration.

    The eligibility thresholds follow ``docs/guide/acceptance.md`` Section 8.

    Args:
        df: Labeled DataFrame with ``user_id``, ``bucket_start_ts``, and
            ``label`` columns.
        user_id: The user to check.
        min_windows: Minimum labeled window count.
        min_days: Minimum number of distinct calendar days.
        min_labels: Minimum number of distinct core labels observed.

    Returns:
        A :class:`PersonalizationEligibility` report.
    """
    user_df = df[df["user_id"] == user_id]
    n_windows = len(user_df)

    if n_windows == 0:
        return PersonalizationEligibility(
            user_id=user_id,
            labeled_windows=0,
            labeled_days=0,
            distinct_labels=0,
            is_eligible=False,
        )

    n_days = user_df["bucket_start_ts"].dt.date.nunique()
    n_labels = user_df["label"].nunique()
    eligible = (
        n_windows >= min_windows and n_days >= min_days and n_labels >= min_labels
    )

    return PersonalizationEligibility(
        user_id=user_id,
        labeled_windows=n_windows,
        labeled_days=n_days,
        distinct_labels=n_labels,
        is_eligible=eligible,
    )

fit_temperature_calibrator(y_true_indices, y_proba)

Find the temperature that minimizes NLL on validation data.

Uses a two-pass grid search: coarse (0.1–5.0 step 0.1), then fine (±0.1 around best at step 0.01).

Parameters:

  y_true_indices (ndarray): Integer-encoded true labels, shape (n,). Required.
  y_proba (ndarray): Raw model probabilities, shape (n, n_classes). Required.

Returns:

  TemperatureCalibrator: A fitted TemperatureCalibrator.

Source code in src/taskclf/train/calibrate.py
def fit_temperature_calibrator(
    y_true_indices: np.ndarray,
    y_proba: np.ndarray,
) -> TemperatureCalibrator:
    """Find the temperature that minimizes NLL on validation data.

    Uses a two-pass grid search: coarse (0.1–5.0 step 0.1), then fine
    (±0.1 around best at step 0.01).

    Args:
        y_true_indices: Integer-encoded true labels, shape ``(n,)``.
        y_proba: Raw model probabilities, shape ``(n, n_classes)``.

    Returns:
        A fitted :class:`TemperatureCalibrator`.
    """
    eps = 1e-12
    logits = np.log(np.clip(y_proba, eps, None))

    def _apply_temp(t: float) -> float:
        scaled = logits / t
        shifted = scaled - scaled.max(axis=-1, keepdims=True)
        exp_vals = np.exp(shifted)
        probs = exp_vals / exp_vals.sum(axis=-1, keepdims=True)
        return _nll(probs, y_true_indices)

    # Coarse pass
    best_t = 1.0
    best_nll = _apply_temp(1.0)
    for t in np.arange(0.1, 5.05, 0.1):
        nll = _apply_temp(float(t))
        if nll < best_nll:
            best_nll = nll
            best_t = float(t)

    # Fine pass around best
    lo = max(0.01, best_t - 0.1)
    hi = best_t + 0.1
    for t in np.arange(lo, hi + 0.005, 0.01):
        nll = _apply_temp(float(t))
        if nll < best_nll:
            best_nll = nll
            best_t = float(t)

    return TemperatureCalibrator(temperature=round(best_t, 4))

fit_isotonic_calibrator(y_true_indices, y_proba, n_classes)

Fit per-class isotonic regression on validation data.

For each class c, fits IsotonicRegression on (y_proba[:, c], (y_true == c).astype(float)).

Parameters:

  y_true_indices (ndarray): Integer-encoded true labels, shape (n,). Required.
  y_proba (ndarray): Raw model probabilities, shape (n, n_classes). Required.
  n_classes (int): Number of classes. Required.

Returns:

  IsotonicCalibrator: A fitted IsotonicCalibrator.

Source code in src/taskclf/train/calibrate.py
def fit_isotonic_calibrator(
    y_true_indices: np.ndarray,
    y_proba: np.ndarray,
    n_classes: int,
) -> IsotonicCalibrator:
    """Fit per-class isotonic regression on validation data.

    For each class *c*, fits ``IsotonicRegression`` on
    ``(y_proba[:, c], (y_true == c).astype(float))``.

    Args:
        y_true_indices: Integer-encoded true labels, shape ``(n,)``.
        y_proba: Raw model probabilities, shape ``(n, n_classes)``.
        n_classes: Number of classes.

    Returns:
        A fitted :class:`IsotonicCalibrator`.
    """
    regressors: list[IsotonicRegression] = []
    for c in range(n_classes):
        binary_target = (y_true_indices == c).astype(np.float64)
        reg = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        reg.fit(y_proba[:, c], binary_target)
        regressors.append(reg)
    return IsotonicCalibrator(regressors)

fit_calibrator_store(model, labeled_df, *, cat_encoders=None, method=DEFAULT_CALIBRATION_METHOD, min_windows=DEFAULT_MIN_LABELED_WINDOWS, min_days=DEFAULT_MIN_LABELED_DAYS, min_labels=DEFAULT_MIN_DISTINCT_LABELS, model_bundle_id=None, model_schema_hash=None)

Fit a global calibrator and per-user calibrators for eligible users.

  1. Predicts on the full labeled_df to get raw probabilities.
  2. Fits a global calibrator on all validation data.
  3. Checks each user's eligibility; for eligible users fits a per-user calibrator.

Parameters:

  model (Booster): Trained LightGBM booster (frozen — not retrained). Required.
  labeled_df (DataFrame): Labeled validation DataFrame with user_id, bucket_start_ts, label, and FEATURE_COLUMNS. Required.
  cat_encoders (dict[str, LabelEncoder] | None): Pre-fitted categorical encoders from training. Default: None.
  method (Literal["temperature", "isotonic"]): Calibration method — "temperature" or "isotonic". Default: DEFAULT_CALIBRATION_METHOD.
  min_windows (int): Minimum labeled windows for per-user eligibility. Default: DEFAULT_MIN_LABELED_WINDOWS.
  min_days (int): Minimum distinct days for per-user eligibility. Default: DEFAULT_MIN_LABELED_DAYS.
  min_labels (int): Minimum distinct labels for per-user eligibility. Default: DEFAULT_MIN_DISTINCT_LABELS.
  model_bundle_id (str | None): Run directory name of the model bundle. Recorded in the store for traceability. Default: None.
  model_schema_hash (str | None): Schema hash of the model bundle. Recorded in the store so the inference policy can validate compatibility. Default: None.

Returns:

  tuple[CalibratorStore, list[PersonalizationEligibility]]: (store, eligibility_reports) — a CalibratorStore and a list of PersonalizationEligibility for every unique user in labeled_df.

Source code in src/taskclf/train/calibrate.py
def fit_calibrator_store(
    model: lgb.Booster,
    labeled_df: pd.DataFrame,
    *,
    cat_encoders: dict[str, LabelEncoder] | None = None,
    method: Literal["temperature", "isotonic"] = DEFAULT_CALIBRATION_METHOD,  # type: ignore[assignment]
    min_windows: int = DEFAULT_MIN_LABELED_WINDOWS,
    min_days: int = DEFAULT_MIN_LABELED_DAYS,
    min_labels: int = DEFAULT_MIN_DISTINCT_LABELS,
    model_bundle_id: str | None = None,
    model_schema_hash: str | None = None,
) -> tuple[CalibratorStore, list[PersonalizationEligibility]]:
    """Fit a global calibrator and per-user calibrators for eligible users.

    1. Predicts on the full *labeled_df* to get raw probabilities.
    2. Fits a global calibrator on all validation data.
    3. Checks each user's eligibility; for eligible users fits a
       per-user calibrator.

    Args:
        model: Trained LightGBM booster (frozen — not retrained).
        labeled_df: Labeled validation DataFrame with ``user_id``,
            ``bucket_start_ts``, ``label``, and ``FEATURE_COLUMNS``.
        cat_encoders: Pre-fitted categorical encoders from training.
        method: Calibration method — ``"temperature"`` or ``"isotonic"``.
        min_windows: Minimum labeled windows for per-user eligibility.
        min_days: Minimum distinct days for per-user eligibility.
        min_labels: Minimum distinct labels for per-user eligibility.
        model_bundle_id: Run directory name of the model bundle.
            Recorded in the store for traceability.
        model_schema_hash: Schema hash of the model bundle.  Recorded
            in the store so the inference policy can validate
            compatibility.

    Returns:
        ``(store, eligibility_reports)`` — a :class:`CalibratorStore`
        and a list of :class:`PersonalizationEligibility` for every
        unique user in *labeled_df*.
    """
    le = LabelEncoder()
    le.fit(sorted(LABEL_SET_V1))
    n_classes = len(le.classes_)

    y_proba = predict_proba(model, labeled_df, cat_encoders)
    y_true = le.transform(labeled_df["label"].values)

    global_cal: TemperatureCalibrator | IsotonicCalibrator
    if method == "isotonic":
        global_cal = fit_isotonic_calibrator(y_true, y_proba, n_classes)
    else:
        global_cal = fit_temperature_calibrator(y_true, y_proba)

    # Per-user eligibility and calibration
    user_ids = sorted(labeled_df["user_id"].unique())
    eligibility_reports: list[PersonalizationEligibility] = []
    user_calibrators: dict[str, Calibrator] = {}

    for uid in user_ids:
        elig = check_personalization_eligible(
            labeled_df,
            uid,
            min_windows=min_windows,
            min_days=min_days,
            min_labels=min_labels,
        )
        eligibility_reports.append(elig)

        if not elig.is_eligible:
            logger.info(
                "User %s ineligible (windows=%d, days=%d, labels=%d)",
                uid,
                elig.labeled_windows,
                elig.labeled_days,
                elig.distinct_labels,
            )
            continue

        mask = labeled_df["user_id"].values == uid
        user_proba = y_proba[mask]
        user_true = y_true[mask]

        user_cal: TemperatureCalibrator | IsotonicCalibrator
        if method == "isotonic":
            user_cal = fit_isotonic_calibrator(user_true, user_proba, n_classes)
        else:
            user_cal = fit_temperature_calibrator(user_true, user_proba)

        user_calibrators[uid] = user_cal
        logger.info("Fitted %s calibrator for user %s", method, uid)

    from datetime import UTC, datetime

    store = CalibratorStore(
        global_calibrator=global_cal,
        user_calibrators=user_calibrators,
        method=method,
        model_bundle_id=model_bundle_id,
        model_schema_hash=model_schema_hash,
        created_at=datetime.now(UTC).isoformat(),
    )
    return store, eligibility_reports