train.retrain

Retraining workflow: cadence scheduling, reproducible pipeline, regression gates.

Overview

The retrain module provides automated retraining with safety gates:

  • Cadence checks determine when a global retrain or calibrator update is due.
  • Dataset hashing ensures each model can be traced to its exact training data.
  • Regression gates prevent deploying a model that is worse than the champion.

Configuration

Retrain configuration lives in configs/retrain.yaml (versioned in git):

  global_retrain_cadence_days (int, default 7): Days between global model retrains
  calibrator_update_cadence_days (int, default 7): Days between calibrator updates
  data_lookback_days (int, default 30): Days of data to include in training
  regression_tolerance (float, default 0.02): Max allowed macro-F1 drop vs champion
  require_baseline_improvement (bool, default true): Challenger must beat rule baseline
  auto_promote (bool, default false): Auto-promote when all gates pass
  train_params.num_boost_round (int, default 100): LightGBM boosting rounds
  train_params.class_weight (str, default balanced): Class weight strategy
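Assuming these defaults, a matching configs/retrain.yaml could look like the sketch below (illustrative values mirroring the table above, not a file shipped verbatim):

```yaml
# Illustrative retrain config; keys mirror RetrainConfig fields.
global_retrain_cadence_days: 7
calibrator_update_cadence_days: 7
data_lookback_days: 30
regression_tolerance: 0.02
require_baseline_improvement: true
auto_promote: false
train_params:
  num_boost_round: 100
  class_weight: balanced
```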

Promotion Gates

Both candidate gates and comparative gates must pass for promotion.

Candidate Gates (check_candidate_gates)

Hard gates evaluated on the challenger model in isolation (no champion needed):

  1. breakidle_precision — BreakIdle precision >= 0.95
  2. no_class_below_50_precision — every class precision >= 0.50
  3. challenger_acceptance — all acceptance checks from docs/guide/acceptance.md pass

Comparative Gates (check_regression_gates)

Regression check comparing challenger against the champion:

  1. macro_f1_no_regression — challenger macro-F1 within regression_tolerance of champion

Comparative gates are only evaluated when a champion exists. If no champion is available, only candidate gates apply.

CLI

# Check if retraining is due
taskclf train check-retrain --models-dir models/

# Run full retrain pipeline
taskclf train retrain --config configs/retrain.yaml --force --synthetic

# Dry run (evaluate without promoting)
taskclf train retrain --config configs/retrain.yaml --dry-run --synthetic

taskclf.train.retrain

Retraining workflow: cadence scheduling, reproducible pipeline, regression gates.

Implements TODO 11 from the labelling roadmap:

  • Cadence — :func:check_retrain_due / :func:check_calibrator_update_due determine whether a global retrain or per-user calibrator update should run.
  • Reproducibility — :func:compute_dataset_hash produces a deterministic SHA-256 of the training data so each model bundle can be traced back to its exact dataset. :class:RetrainConfig is loadable from a versioned YAML file.
  • Regression gates — :func:check_regression_gates compares a challenger model against the current champion on key metrics (macro-F1, BreakIdle precision, per-class precision) and only promotes if no regression exceeds the configured tolerance.
  • Pipeline — :func:run_retrain_pipeline orchestrates the full flow from data loading through training, evaluation, gate checking, and promotion.

TrainParams

Bases: BaseModel

Hyperparameters forwarded to :func:~taskclf.train.lgbm.train_lgbm.

Source code in src/taskclf/train/retrain.py
class TrainParams(BaseModel):
    """Hyperparameters forwarded to :func:`~taskclf.train.lgbm.train_lgbm`."""

    num_boost_round: int = DEFAULT_NUM_BOOST_ROUND
    class_weight: Literal["balanced", "none"] = "balanced"

RetrainConfig

Bases: BaseModel

Cadence, gate thresholds, and training parameters for the retrain loop.

Source code in src/taskclf/train/retrain.py
class RetrainConfig(BaseModel):
    """Cadence, gate thresholds, and training parameters for the retrain loop."""

    global_retrain_cadence_days: int = DEFAULT_RETRAIN_CADENCE_DAYS
    calibrator_update_cadence_days: int = DEFAULT_CALIBRATOR_UPDATE_CADENCE_DAYS
    data_lookback_days: int = DEFAULT_DATA_LOOKBACK_DAYS
    regression_tolerance: float = DEFAULT_REGRESSION_TOLERANCE
    require_baseline_improvement: bool = True
    auto_promote: bool = False
    min_improvement: float = 0.0
    train_params: TrainParams = Field(default_factory=TrainParams)

DatasetSnapshot

Bases: BaseModel

Immutable record of the dataset used for a training run.

Source code in src/taskclf/train/retrain.py
class DatasetSnapshot(BaseModel, frozen=True):
    """Immutable record of the dataset used for a training run."""

    dataset_hash: str
    row_count: int
    date_from: str
    date_to: str
    user_count: int
    class_distribution: dict[str, int]

RegressionGate

Bases: BaseModel

Result of a single regression gate check.

Source code in src/taskclf/train/retrain.py
class RegressionGate(BaseModel, frozen=True):
    """Result of a single regression gate check."""

    name: str
    passed: bool
    detail: str

RegressionResult

Bases: BaseModel

Aggregate result of all regression gates.

Source code in src/taskclf/train/retrain.py
class RegressionResult(BaseModel, frozen=True):
    """Aggregate result of all regression gates."""

    all_passed: bool
    gates: list[RegressionGate]

RetrainResult

Bases: BaseModel

Output of :func:run_retrain_pipeline.

Source code in src/taskclf/train/retrain.py
class RetrainResult(BaseModel):
    """Output of :func:`run_retrain_pipeline`."""

    promoted: bool
    champion_macro_f1: float | None = None
    challenger_macro_f1: float
    regression: RegressionResult | None = None
    candidate_gates: RegressionResult | None = None
    dataset_snapshot: DatasetSnapshot
    run_dir: str
    reason: str
    champion_source: str | None = None
    active_updated: bool = False

load_retrain_config(path)

Load a :class:RetrainConfig from a YAML file.

Parameters:

  path (Path, required): Path to a YAML config file whose keys match :class:RetrainConfig fields.

Returns:

  RetrainConfig: A validated config instance.

Source code in src/taskclf/train/retrain.py
def load_retrain_config(path: Path) -> RetrainConfig:
    """Load a :class:`RetrainConfig` from a YAML file.

    Args:
        path: Path to a YAML config file whose keys match
            :class:`RetrainConfig` fields.

    Returns:
        A validated config instance.
    """
    raw = yaml.safe_load(path.read_text())
    return RetrainConfig.model_validate(raw)

compute_dataset_hash(features_df, labels)

Compute a deterministic SHA-256 of the training data.

The hash is built from a canonical JSON representation of:

  • Sorted feature column names
  • Feature values serialized row-by-row (sorted by primary key)
  • Label spans serialized in chronological order

Parameters:

  features_df (DataFrame, required): Feature DataFrame (pre-projection).
  labels (Sequence[LabelSpan], required): Label spans used for projection.

Returns:

  str: A hex-encoded SHA-256 digest (first 16 characters).

Source code in src/taskclf/train/retrain.py
def compute_dataset_hash(features_df: pd.DataFrame, labels: Sequence[LabelSpan]) -> str:
    """Compute a deterministic SHA-256 of the training data.

    The hash is built from a canonical JSON representation of:

    * Sorted feature column names
    * Feature values serialized row-by-row (sorted by primary key)
    * Label spans serialized in chronological order

    Args:
        features_df: Feature DataFrame (pre-projection).
        labels: Label spans used for projection.

    Returns:
        A hex-encoded SHA-256 digest (first 16 characters).
    """
    h = hashlib.sha256()

    cols = sorted(features_df.columns.tolist())
    h.update(json.dumps(cols, sort_keys=True).encode())

    sorted_df = features_df.sort_values(["user_id", "bucket_start_ts"]).reset_index(
        drop=True
    )
    h.update(sorted_df.to_csv(index=False).encode())

    sorted_spans = sorted(labels, key=lambda s: (s.start_ts, s.end_ts))
    for span in sorted_spans:
        h.update(span.model_dump_json().encode())

    return h.hexdigest()[:16]
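The same canonicalize-then-hash idea in miniature, using only the standard library (a sketch of the approach on plain dicts, not the actual taskclf code):

```python
import hashlib
import json

def dataset_hash(rows: list[dict], key: str) -> str:
    """Hash a deterministic serialization: sorted column names, rows sorted by key."""
    h = hashlib.sha256()
    cols = sorted({c for row in rows for c in row})
    h.update(json.dumps(cols, sort_keys=True).encode())
    for row in sorted(rows, key=lambda r: r[key]):
        h.update(json.dumps(row, sort_keys=True).encode())
    return h.hexdigest()[:16]

a = [{"user_id": "u1", "x": 1}, {"user_id": "u0", "x": 2}]
b = list(reversed(a))  # same data, different input order
print(dataset_hash(a, "user_id") == dataset_hash(b, "user_id"))  # input order does not matter
```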

find_latest_model(models_dir)

Return the path to the most recently created model bundle.

Scans models_dir for subdirectories containing metadata.json and picks the one with the latest created_at timestamp.

Parameters:

  models_dir (Path, required): Base directory containing model run folders.

Returns:

  Path | None: Path to the latest run directory, or None if none found.

Source code in src/taskclf/train/retrain.py
def find_latest_model(models_dir: Path) -> Path | None:
    """Return the path to the most recently created model bundle.

    Scans ``models_dir`` for subdirectories containing ``metadata.json``
    and picks the one with the latest ``created_at`` timestamp.

    Args:
        models_dir: Base directory containing model run folders.

    Returns:
        Path to the latest run directory, or ``None`` if none found.
    """
    if not models_dir.is_dir():
        return None

    best_path: Path | None = None
    best_ts: str = ""

    for candidate in models_dir.iterdir():
        meta_path = candidate / "metadata.json"
        if not meta_path.is_file():
            continue
        try:
            raw = json.loads(meta_path.read_text())
            created = raw.get("created_at", "")
            if created > best_ts:
                best_ts = created
                best_path = candidate
        except (json.JSONDecodeError, OSError):
            logger.debug("Skipping unreadable metadata at %s", meta_path, exc_info=True)
            continue

    return best_path
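Note that find_latest_model compares created_at values as plain strings. This is sound because fixed-width, zero-padded ISO-8601 timestamps sort lexicographically in chronological order (provided all timestamps share the same UTC offset):

```python
timestamps = [
    "2024-11-05T09:00:00+00:00",
    "2024-01-20T23:59:59+00:00",
    "2024-11-05T08:59:59+00:00",
]
# Lexicographic max of same-offset ISO-8601 strings is the chronological max.
print(max(timestamps))  # "2024-11-05T09:00:00+00:00"
```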

check_retrain_due(models_dir, cadence_days=DEFAULT_RETRAIN_CADENCE_DAYS)

Check whether a global retrain is due based on the last model's age.

Parameters:

  models_dir (Path, required): Base directory containing model run folders.
  cadence_days (int, default DEFAULT_RETRAIN_CADENCE_DAYS): Maximum age in days before a retrain is needed.

Returns:

  bool: True if retrain is due (or no model exists).

Source code in src/taskclf/train/retrain.py
def check_retrain_due(
    models_dir: Path,
    cadence_days: int = DEFAULT_RETRAIN_CADENCE_DAYS,
) -> bool:
    """Check whether a global retrain is due based on the last model's age.

    Args:
        models_dir: Base directory containing model run folders.
        cadence_days: Maximum age in days before a retrain is needed.

    Returns:
        ``True`` if retrain is due (or no model exists).
    """
    latest = find_latest_model(models_dir)
    if latest is None:
        return True

    raw = json.loads((latest / "metadata.json").read_text())
    created_at = raw.get("created_at", "")
    if not created_at:
        return True

    try:
        model_ts = ts_utc_aware_get(datetime.fromisoformat(created_at))
        age = datetime.now(UTC) - model_ts
        return age >= timedelta(days=cadence_days)
    except (ValueError, TypeError):
        logger.debug("Could not parse model created_at timestamp", exc_info=True)
        return True
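The age test reduces to a timedelta comparison. With the default 7-day cadence, for example (a self-contained sketch mirroring the check, not the taskclf function itself):

```python
from datetime import datetime, timedelta, timezone

def retrain_due(created_at: str, now: datetime, cadence_days: int = 7) -> bool:
    """Due when the newest model is at least cadence_days old."""
    model_ts = datetime.fromisoformat(created_at)
    return now - model_ts >= timedelta(days=cadence_days)

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
print(retrain_due("2024-06-01T00:00:00+00:00", now))  # 14 days old -> True
print(retrain_due("2024-06-10T00:00:00+00:00", now))  # 5 days old -> False
```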

check_calibrator_update_due(calibrator_store_dir, cadence_days=DEFAULT_CALIBRATOR_UPDATE_CADENCE_DAYS)

Check whether a calibrator store update is due.

Parameters:

  calibrator_store_dir (Path, required): Directory containing the calibrator store (expected to have a store.json with a created_at field).
  cadence_days (int, default DEFAULT_CALIBRATOR_UPDATE_CADENCE_DAYS): Maximum age in days before an update is needed.

Returns:

  bool: True if update is due (or no store exists).

Source code in src/taskclf/train/retrain.py
def check_calibrator_update_due(
    calibrator_store_dir: Path,
    cadence_days: int = DEFAULT_CALIBRATOR_UPDATE_CADENCE_DAYS,
) -> bool:
    """Check whether a calibrator store update is due.

    Args:
        calibrator_store_dir: Directory containing the calibrator store
            (expected to have a ``store.json`` with a ``created_at`` field).
        cadence_days: Maximum age in days before an update is needed.

    Returns:
        ``True`` if update is due (or no store exists).
    """
    store_file = calibrator_store_dir / "store.json"
    if not store_file.is_file():
        return True

    try:
        raw = json.loads(store_file.read_text())
        created = raw.get("created_at", "")
        if not created:
            return True
        store_ts = ts_utc_aware_get(datetime.fromisoformat(created))
        return datetime.now(UTC) - store_ts >= timedelta(days=cadence_days)
    except (json.JSONDecodeError, ValueError, TypeError, OSError):
        logger.debug("Could not parse calibrator store.json", exc_info=True)
        return True

check_candidate_gates(challenger_report)

Check candidate-only hard gates (no champion needed).

These gates evaluate the challenger model in isolation:

  1. breakidle_precision — BreakIdle precision >= 0.95.
  2. no_class_below_50_precision — no class may have precision < 0.50.
  3. challenger_acceptance — all of the challenger's own acceptance checks must pass.

Parameters:

  challenger_report (EvaluationReport, required): Evaluation report of the newly trained model.

Returns:

  RegressionResult: Per-gate pass/fail details.

Source code in src/taskclf/train/retrain.py
def check_candidate_gates(
    challenger_report: EvaluationReport,
) -> RegressionResult:
    """Check candidate-only hard gates (no champion needed).

    These gates evaluate the challenger model in isolation:

    1. **breakidle_precision** — BreakIdle precision >= 0.95.
    2. **no_class_below_50_precision** — no class may have precision < 0.50.
    3. **challenger_acceptance** — all of the challenger's own acceptance
       checks must pass.

    Args:
        challenger_report: Evaluation report of the newly trained model.

    Returns:
        A :class:`RegressionResult` with per-gate pass/fail details.
    """
    gates: list[RegressionGate] = []

    bi = challenger_report.per_class.get("BreakIdle", {})
    bi_prec = bi.get("precision", 0.0)
    gates.append(
        RegressionGate(
            name="breakidle_precision",
            passed=bi_prec >= 0.95,
            detail=f"BreakIdle precision={bi_prec:.4f} (>= 0.95 required)",
        )
    )

    low_classes = [
        (name, m["precision"])
        for name, m in challenger_report.per_class.items()
        if m.get("precision", 0.0) < 0.50
    ]
    gates.append(
        RegressionGate(
            name="no_class_below_50_precision",
            passed=len(low_classes) == 0,
            detail=(
                "all classes >= 0.50"
                if not low_classes
                else "FAIL: " + ", ".join(f"{n}={p:.4f}" for n, p in low_classes)
            ),
        )
    )

    all_acceptance = all(challenger_report.acceptance_checks.values())
    failed_checks = [k for k, v in challenger_report.acceptance_checks.items() if not v]
    gates.append(
        RegressionGate(
            name="challenger_acceptance",
            passed=all_acceptance,
            detail=(
                "all acceptance checks passed"
                if all_acceptance
                else "FAIL: " + ", ".join(failed_checks)
            ),
        )
    )

    return RegressionResult(
        all_passed=all(g.passed for g in gates),
        gates=gates,
    )

check_regression_gates(champion_report, challenger_report, config)

Comparative regression check: challenger vs champion on macro-F1.

This function is purely comparative — it checks only whether the challenger regresses relative to the champion. Candidate-only hard gates (BreakIdle precision, per-class precision floor, acceptance checks) are handled separately by :func:check_candidate_gates.

Gates:

  1. macro_f1_no_regression — challenger macro-F1 must be within regression_tolerance of the champion.

Parameters:

  champion_report (EvaluationReport, required): Evaluation report of the current deployed model.
  challenger_report (EvaluationReport, required): Evaluation report of the newly trained model.
  config (RetrainConfig, required): Retrain configuration with tolerance settings.

Returns:

  RegressionResult: Per-gate pass/fail details.

Source code in src/taskclf/train/retrain.py
def check_regression_gates(
    champion_report: EvaluationReport,
    challenger_report: EvaluationReport,
    config: RetrainConfig,
) -> RegressionResult:
    """Comparative regression check: challenger vs champion on macro-F1.

    This function is purely comparative — it checks only whether the
    challenger regresses relative to the champion.  Candidate-only hard
    gates (BreakIdle precision, per-class precision floor, acceptance
    checks) are handled separately by :func:`check_candidate_gates`.

    Gates:

    1. **macro_f1_no_regression** — challenger macro-F1 must be within
       ``regression_tolerance`` of the champion.

    Args:
        champion_report: Evaluation report of the current deployed model.
        challenger_report: Evaluation report of the newly trained model.
        config: Retrain configuration with tolerance settings.

    Returns:
        A :class:`RegressionResult` with per-gate pass/fail details.
    """
    gates: list[RegressionGate] = []

    delta = champion_report.macro_f1 - challenger_report.macro_f1
    passed = delta <= config.regression_tolerance
    gates.append(
        RegressionGate(
            name="macro_f1_no_regression",
            passed=passed,
            detail=(
                f"champion={champion_report.macro_f1:.4f} "
                f"challenger={challenger_report.macro_f1:.4f} "
                f"delta={delta:+.4f} tolerance={config.regression_tolerance}"
            ),
        )
    )

    return RegressionResult(
        all_passed=all(g.passed for g in gates),
        gates=gates,
    )
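Concretely, with the default regression_tolerance of 0.02: a champion at macro-F1 0.830 and a challenger at 0.815 gives a drop of 0.015, which passes; a challenger at 0.805 gives a drop of 0.025, which fails. The arithmetic in a hypothetical helper:

```python
def macro_f1_no_regression(champion: float, challenger: float,
                           tolerance: float = 0.02) -> bool:
    # The gate passes when the challenger's drop (if any) stays within tolerance.
    return champion - challenger <= tolerance

print(macro_f1_no_regression(0.830, 0.815))  # drop of 0.015 -> True
print(macro_f1_no_regression(0.830, 0.805))  # drop of 0.025 -> False
```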

run_retrain_pipeline(config, features_df, label_spans, *, models_dir=Path(DEFAULT_MODELS_DIR), out_dir=Path('artifacts'), force=False, dry_run=False, holdout_user_fraction=0.0, reject_threshold=DEFAULT_REJECT_THRESHOLD, data_provenance='real')

Run the full retrain pipeline: train, evaluate, gate, promote.

Steps:

  1. Optionally check cadence (skipped if force).
  2. Compute dataset snapshot hash.
  3. Project labels, split, and train a challenger model.
  4. Evaluate the challenger.
  5. If a champion exists, run regression gates.
  6. Promote the challenger if all gates pass (skipped if dry_run).

Parameters:

  config (RetrainConfig, required): Retrain configuration.
  features_df (DataFrame, required): Feature DataFrame covering the training range.
  label_spans (Sequence[LabelSpan], required): Label spans for projection.
  models_dir (Path, default Path(DEFAULT_MODELS_DIR)): Base directory for model bundles.
  out_dir (Path, default Path('artifacts')): Directory for evaluation artifacts.
  force (bool, default False): Skip cadence check.
  dry_run (bool, default False): Evaluate without promoting.
  holdout_user_fraction (float, default 0.0): Fraction of users to hold out for test.
  reject_threshold (float, default DEFAULT_REJECT_THRESHOLD): Reject threshold for evaluation.
  data_provenance (Literal['real', 'synthetic', 'mixed'], default 'real'): Origin of the training data.

Returns:

  RetrainResult: A summary of what happened.

Source code in src/taskclf/train/retrain.py
def run_retrain_pipeline(
    config: RetrainConfig,
    features_df: pd.DataFrame,
    label_spans: Sequence[LabelSpan],
    *,
    models_dir: Path = Path(DEFAULT_MODELS_DIR),
    out_dir: Path = Path("artifacts"),
    force: bool = False,
    dry_run: bool = False,
    holdout_user_fraction: float = 0.0,
    reject_threshold: float = DEFAULT_REJECT_THRESHOLD,
    data_provenance: Literal["real", "synthetic", "mixed"] = "real",
) -> RetrainResult:
    """Run the full retrain pipeline: train, evaluate, gate, promote.

    Steps:

    1. Optionally check cadence (skipped if *force*).
    2. Compute dataset snapshot hash.
    3. Project labels, split, and train a challenger model.
    4. Evaluate the challenger.
    5. If a champion exists, run regression gates.
    6. Promote the challenger if all gates pass (skipped if *dry_run*).

    Args:
        config: Retrain configuration.
        features_df: Feature DataFrame covering the training range.
        label_spans: Label spans for projection.
        models_dir: Base directory for model bundles.
        out_dir: Directory for evaluation artifacts.
        force: Skip cadence check.
        dry_run: Evaluate without promoting.
        holdout_user_fraction: Fraction of users to hold out for test.
        reject_threshold: Reject threshold for evaluation.
        data_provenance: Origin of the training data.

    Returns:
        A :class:`RetrainResult` summarizing what happened.
    """
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.train.dataset import split_by_time
    from taskclf.train.lgbm import train_lgbm

    if not force and not check_retrain_due(
        models_dir, config.global_retrain_cadence_days
    ):
        dataset_hash = compute_dataset_hash(features_df, label_spans)
        return RetrainResult(
            promoted=False,
            challenger_macro_f1=0.0,
            dataset_snapshot=DatasetSnapshot(
                dataset_hash=dataset_hash,
                row_count=len(features_df),
                date_from=str(features_df["bucket_start_ts"].min()),
                date_to=str(features_df["bucket_start_ts"].max()),
                user_count=features_df["user_id"].nunique(),
                class_distribution={},
            ),
            run_dir="",
            reason="Retrain not due yet",
        )

    # Dataset snapshot
    dataset_hash = compute_dataset_hash(features_df, label_spans)

    labeled_df = project_blocks_to_windows(features_df, label_spans)
    if labeled_df.empty:
        raise ValueError("No labeled rows after projection — cannot retrain")

    class_dist = labeled_df["label"].value_counts().to_dict()
    ts_col = labeled_df["bucket_start_ts"]

    snapshot = DatasetSnapshot(
        dataset_hash=dataset_hash,
        row_count=len(labeled_df),
        date_from=str(ts_col.min()),
        date_to=str(ts_col.max()),
        user_count=labeled_df["user_id"].nunique(),
        class_distribution={str(k): int(v) for k, v in class_dist.items()},
    )

    # Split
    splits = split_by_time(
        labeled_df,
        holdout_user_fraction=holdout_user_fraction,
    )
    train_df = labeled_df.iloc[splits["train"]].reset_index(drop=True)
    val_df = labeled_df.iloc[splits["val"]].reset_index(drop=True)
    test_df = labeled_df.iloc[splits["test"]].reset_index(drop=True)
    holdout_users: list[str] = splits.get("holdout_users", [])

    if train_df.empty or val_df.empty:
        raise ValueError("Train or validation split is empty — need more data")

    schema_version = LATEST_FEATURE_SCHEMA_VERSION
    if "schema_version" in labeled_df.columns and not labeled_df.empty:
        raw_schema_version = str(labeled_df["schema_version"].iloc[0]).strip()
        if raw_schema_version:
            schema_version = raw_schema_version

    # Train challenger
    logger.info(
        "Training challenger model (%d train, %d val rows)", len(train_df), len(val_df)
    )
    model, metrics, cm_df, params, cat_encoders = train_lgbm(
        train_df,
        val_df,
        num_boost_round=config.train_params.num_boost_round,
        class_weight=config.train_params.class_weight,
        schema_version=schema_version,
    )

    # Evaluate challenger
    eval_df = test_df if not test_df.empty else val_df
    challenger_report = evaluate_model(
        model,
        eval_df,
        cat_encoders=cat_encoders,
        holdout_users=holdout_users,
        reject_threshold=reject_threshold,
        schema_version=schema_version,
    )

    # Determine date range from features
    date_from = pd.Timestamp(ts_col.min()).date()
    date_to = pd.Timestamp(ts_col.max()).date()

    metadata = build_metadata(
        label_set=list(metrics["label_names"]),
        train_date_from=date_from,
        train_date_to=date_to,
        params=params,
        dataset_hash=dataset_hash,
        reject_threshold=reject_threshold,
        data_provenance=data_provenance,
        unknown_category_freq_threshold=params.get("unknown_category_freq_threshold"),
        unknown_category_mask_rate=params.get("unknown_category_mask_rate"),
        schema_version=schema_version,
    )

    # Candidate-only hard gates (no champion needed)
    candidate_gates = check_candidate_gates(challenger_report)

    # Regression gates against champion
    regression: RegressionResult | None = None
    champion_macro_f1: float | None = None
    champion_source: str | None = None
    champion_path = _resolve_champion(models_dir)

    if champion_path is not None:
        champion_source = champion_path.name
        try:
            champ_model, champ_meta, champ_encoders = load_model_bundle(champion_path)
            champion_report = evaluate_model(
                champ_model,
                eval_df,
                cat_encoders=champ_encoders,
                holdout_users=holdout_users,
                reject_threshold=reject_threshold,
                schema_version=champ_meta.schema_version,
            )
            champion_macro_f1 = champion_report.macro_f1
            regression = check_regression_gates(
                champion_report, challenger_report, config
            )
        except (ValueError, OSError) as exc:
            logger.warning(
                "Could not evaluate champion: %s — skipping regression gates", exc
            )

    # Promotion decision
    comparative_passed = regression is None or regression.all_passed
    candidate_passed = candidate_gates.all_passed
    should_promote = comparative_passed and candidate_passed and not dry_run

    active_updated = False

    if should_promote:
        run_dir = save_model_bundle(
            model,
            metadata,
            metrics,
            cm_df,
            models_dir,
            cat_encoders=cat_encoders,
        )
        reason = "Promoted: all gates passed"

        # Update active pointer if the newly promoted model is the best
        try:
            policy = SelectionPolicy(min_improvement=config.min_improvement)
            report = find_best_model(models_dir, policy)
            write_index_cache(models_dir, report)
            current = read_active(models_dir)
            if report.best is not None and should_switch_active(
                current,
                report.best,
                policy,
            ):
                current_model_dir = current.model_dir if current else None
                best_model_dir = str(report.best.path.relative_to(models_dir.parent))
                if current_model_dir != best_model_dir:
                    write_active_atomic(
                        models_dir,
                        report.best,
                        policy,
                        reason="retrain promotion",
                    )
                    active_updated = True
                    logger.info(
                        "Active model updated: %s -> %s",
                        current_model_dir,
                        best_model_dir,
                    )
        except Exception as exc:
            logger.warning("Failed to update active pointer: %s", exc)
    else:
        run_dir = save_model_bundle(
            model,
            metadata,
            metrics,
            cm_df,
            out_dir / "rejected_models",
            cat_encoders=cat_encoders,
        )
        if dry_run:
            reason = "Dry run: model saved to rejected_models"
        elif not candidate_passed:
            reason = "Rejected: candidate gates failed"
        else:
            reason = "Rejected: regression gates failed"

    return RetrainResult(
        promoted=should_promote,
        champion_macro_f1=champion_macro_f1,
        challenger_macro_f1=challenger_report.macro_f1,
        regression=regression,
        candidate_gates=candidate_gates,
        dataset_snapshot=snapshot,
        run_dir=str(run_dir),
        reason=reason,
        champion_source=champion_source,
        active_updated=active_updated,
    )