model_inspection¶
Read-only inspection of trained model bundles and optional replay of held-out test evaluation.
Bundle vs replay¶
- Bundle artifacts (`metrics.json`, `confusion_matrix.csv`) in `models/<run_id>/` record validation metrics from :func:`~taskclf.train.lgbm.train_lgbm`, not the chronological test split. From the saved confusion matrix alone, inspection can derive per-class support, per-class precision/recall/F1, and top confusion pairs (largest off-diagonal counts). Probability-based scores (ECE, Brier, log loss), slice metrics, and unknown-category rates are not persisted in the bundle; they need a dataframe replay.
- Held-out test evaluation replays the same pipeline as :func:`~taskclf.train.evaluate.evaluate_model` on labeled data for a date range (see :func:`~taskclf.model_inspection.replay_test_evaluation`). That path fills `replayed_test_evaluation.report` with a full :class:`~taskclf.train.evaluate.EvaluationReport` (including calibration scalars, slices, unknown rates, and test class distribution). The same enriched report is written to `evaluation.json` by `train evaluate`.
Public API¶
taskclf.model_inspection
¶
Read-only inspection of trained model bundles and optional test-set replay.
Bundle `metrics.json` / `confusion_matrix.csv` reflect validation metrics
from training (see :func:`~taskclf.train.lgbm.train_lgbm`). Held-out test
metrics and class distribution require replaying evaluation on labeled data for
a date range (same pipeline as `taskclf train evaluate`).
PredictionLogicInfo
¶
Bases: BaseModel
Stable description of how multiclass predictions are formed.
Source code in src/taskclf/model_inspection.py
BundleInspectionSection
¶
Bases: BaseModel
Metrics persisted in the bundle at train time (validation split).
Source code in src/taskclf/model_inspection.py
ReplayTestSection
¶
Bases: BaseModel
Held-out test replay (same pipeline as taskclf train evaluate).
Source code in src/taskclf/model_inspection.py
ModelInspectResult
¶
Bases: BaseModel
Complete output of :func:`inspect_model`.
Source code in src/taskclf/model_inspection.py
per_class_metrics_from_confusion_matrix(cm, label_names)
¶
Derive per-class precision, recall, and F1 from a square confusion matrix.
Uses the same layout as :func:`sklearn.metrics.confusion_matrix` with
`labels=label_names`: rows are true class, columns are predicted class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `cm` | `list[list[int]]` | Square matrix | *required* |
| `label_names` | `list[str]` | Ordered class names (row/column order). | *required* |
Returns:

| Type | Description |
|---|---|
| `dict[str, dict[str, float \| int]]` | Mapping each label to its precision, recall, F1, and support, rounded to 4 decimals. |
Source code in src/taskclf/model_inspection.py
build_labeled_dataframe(date_from, date_to, *, data_dir, synthetic)
¶
Load features and labels for [date_from, date_to], project to windows.
Raises:

| Type | Description |
|---|---|
| `ValueError` | If no feature rows exist or the labeled frame is empty. |
Source code in src/taskclf/model_inspection.py
prediction_logic_info()
¶
inspect_bundle_only(model_dir)
¶
Load bundle and build the validation-metrics inspection section.
Returns:

| Type | Description |
|---|---|
| `tuple[Path, ModelMetadata, BundleInspectionSection]` | Resolved bundle path, metadata, and `bundle_saved_validation` section. |
Source code in src/taskclf/model_inspection.py
replay_test_evaluation(model_dir, date_from, date_to, *, data_dir, synthetic=False, holdout_fraction=0.0, reject_threshold=DEFAULT_REJECT_THRESHOLD)
¶
Run held-out evaluation on the test split (same as train evaluate).
Source code in src/taskclf/model_inspection.py
inspect_model(model_dir, *, date_from=None, date_to=None, data_dir=None, synthetic=False, holdout_fraction=0.0, reject_threshold=DEFAULT_REJECT_THRESHOLD)
¶
Inspect a bundle; optionally replay test evaluation when dates are set.
When `date_from` and `date_to` are both provided, `data_dir` must be set
(or use synthetic data with `synthetic=True`).
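That precondition can be made concrete with a small validator. This helper is hypothetical, not part of `taskclf`, and the choice to reject a single-sided date range is an assumption here rather than documented behavior:

```python
def validate_inspect_args(date_from, date_to, data_dir, synthetic=False):
    """Return True when a held-out test replay should run.

    Hypothetical helper mirroring the documented rule: replay happens only
    when both dates are set, and then data_dir is required unless synthetic
    data is used.
    """
    # Assumption: supplying only one end of the date range is treated as an error.
    if (date_from is None) != (date_to is None):
        raise ValueError("date_from and date_to must be provided together")
    replay = date_from is not None
    if replay and data_dir is None and not synthetic:
        raise ValueError("data_dir must be set (or pass synthetic=True)")
    return replay
```

With no dates the function returns `False`, matching the bundle-only inspection path above.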