core.drift
Pure statistical functions for drift and distribution-shift detection.
Overview
All functions operate on numerical aggregates and prediction outputs only. No raw content (keystrokes, titles, URLs) is ever accessed or stored.
Thresholds follow acceptance.md section 7:
- Feature PSI > 0.2 triggers investigation
- Reject rate increase >= 10 percentage points (absolute) triggers investigation
- Class proportion shift > 15 percentage points (absolute) for any class triggers investigation
Result Models
| Model | Fields | Description |
|---|---|---|
| KsResult | statistic, p_value | Two-sample KS test output |
| FeatureDriftResult | feature, psi, ks_statistic, ks_p_value, is_drifted | Per-feature drift metrics |
| FeatureDriftReport | results, flagged_features, timestamp | Aggregated multi-feature report |
| RejectRateDrift | ref_rate, cur_rate, increase, is_flagged | Reject-rate comparison |
| EntropyDrift | ref_mean_entropy, cur_mean_entropy, ratio, is_flagged | Prediction entropy comparison |
| ClassShiftResult | ref_dist, cur_dist, max_shift, shifted_classes, is_flagged | Class distribution shift |
Functions
compute_psi
Population Stability Index between two 1-D distributions. Uses equal-frequency (quantile) binning derived from the reference set.
```python
from taskclf.core.drift import compute_psi
psi = compute_psi(reference_array, current_array, bins=10)
```
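To make the binning concrete, the following is a minimal, dependency-free sketch of a PSI calculation with quantile bins taken from the reference sample. It is not the taskclf implementation: the name `psi_sketch`, the nearest-rank quantile rule, the `eps` clipping for empty bins, and the use of plain Python lists are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def psi_sketch(reference, current, bins=10, eps=1e-6):
    """Illustrative PSI; quantile bin edges come from `reference` only."""
    # Documented behaviour: 0.0 when either input is empty or constant.
    if not reference or not current:
        return 0.0
    if min(reference) == max(reference) or min(current) == max(current):
        return 0.0

    ref_sorted = sorted(reference)
    n = len(ref_sorted)
    # Assumption: nearest-rank quantiles as interior edges, duplicates dropped.
    edges = sorted({ref_sorted[(i * n) // bins] for i in range(1, bins)})

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Assumption: clip at eps so empty bins avoid log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    # PSI = sum over bins of (cur - ref) * ln(cur / ref); every term is >= 0.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = list(range(100))
shifted = [x + 50 for x in reference]
print(psi_sketch(reference, reference))      # identical samples give 0.0
print(psi_sketch(reference, shifted) > 0.2)  # True: a large shift exceeds 0.2
```

Under the thresholds listed in the overview, a per-feature value above 0.2 would trigger investigation.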
compute_ks
Two-sample Kolmogorov-Smirnov test via scipy.stats.ks_2samp.
```python
from taskclf.core.drift import compute_ks
result = compute_ks(reference_array, current_array)
print(result.statistic, result.p_value, result.is_significant(alpha=0.05))
```
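For intuition about what the statistic measures, here is a small sketch that computes only the two-sample KS statistic (the largest gap between the two empirical CDFs) on plain Python lists. The function name `ks_statistic_sketch` is hypothetical, and the p-value is deliberately left out; in the library that part comes from scipy.stats.ks_2samp.

```python
from bisect import bisect_right


def ks_statistic_sketch(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at an observed value.
    for x in set(ref_sorted) | set(cur_sorted):
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        f_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        d = max(d, abs(f_ref - f_cur))
    return d


print(ks_statistic_sketch([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0
print(ks_statistic_sketch([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_statistic_sketch([1, 2, 3], [10, 11, 12]))     # 1.0
```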
feature_drift_report
Run PSI + KS across all numerical features and flag drifted ones.
```python
from taskclf.core.drift import feature_drift_report
report = feature_drift_report(ref_df, cur_df, numerical_features)
print(report.flagged_features)
```
detect_reject_rate_increase
Flag if reject rate increased by >= threshold (absolute).
```python
from taskclf.core.drift import detect_reject_rate_increase
result = detect_reject_rate_increase(ref_labels, cur_labels, threshold=0.10)
```
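The absolute comparison can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: `reject_rate_sketch` is a hypothetical name, `"REJECT"` is a placeholder for the MIXED_UNKNOWN constant (whose actual value is not shown here), and an empty window is treated as having a reject rate of 0.0. The returned keys mirror the RejectRateDrift fields.

```python
def reject_rate_sketch(ref_labels, cur_labels, threshold=0.10, reject_label="REJECT"):
    """Compare the share of reject labels between two windows (absolute difference)."""

    def rate(labels):
        # Assumption: an empty window has a reject rate of 0.0.
        return sum(1 for label in labels if label == reject_label) / len(labels) if labels else 0.0

    ref_rate = rate(ref_labels)
    cur_rate = rate(cur_labels)
    increase = cur_rate - ref_rate
    return {
        "ref_rate": ref_rate,
        "cur_rate": cur_rate,
        "increase": increase,
        "is_flagged": increase >= threshold,
    }


ref = ["REJECT"] * 1 + ["coding"] * 9   # 10% rejects
cur = ["REJECT"] * 3 + ["coding"] * 7   # 30% rejects
print(reject_rate_sketch(ref, cur)["is_flagged"])  # True: +20 points >= 10 points
```

Note that a drop in reject rate yields a negative increase and is never flagged in this sketch.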
detect_entropy_spike
Flag if mean prediction entropy spiked relative to reference.
```python
from taskclf.core.drift import detect_entropy_spike
result = detect_entropy_spike(ref_probs, cur_probs, spike_multiplier=2.0)
```
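A rough sketch of the idea, assuming each row of the probability matrix is one prediction's class distribution. Several details here are assumptions and may differ from the library: the names `mean_entropy` and `entropy_spike_sketch`, the natural-log base, the strict `>` comparison, and the small floor that guards against a zero reference entropy. The returned keys mirror the EntropyDrift fields.

```python
import math


def mean_entropy(probs):
    """Mean Shannon entropy (natural log) over the rows of a probability matrix."""
    row_entropies = [-sum(p * math.log(p) for p in row if p > 0) for row in probs]
    return sum(row_entropies) / len(row_entropies)


def entropy_spike_sketch(ref_probs, cur_probs, spike_multiplier=2.0):
    ref_mean = mean_entropy(ref_probs)
    cur_mean = mean_entropy(cur_probs)
    # Assumption: floor the denominator so a zero reference entropy cannot divide by zero.
    ratio = cur_mean / max(ref_mean, 1e-12)
    return {
        "ref_mean_entropy": ref_mean,
        "cur_mean_entropy": cur_mean,
        "ratio": ratio,
        "is_flagged": ratio > spike_multiplier,
    }


confident = [[0.9, 0.1], [0.9, 0.1]]   # low-entropy predictions
uncertain = [[0.5, 0.5], [0.5, 0.5]]   # maximum entropy for two classes
print(entropy_spike_sketch(confident, uncertain)["is_flagged"])  # True (ratio is about 2.13)
```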
detect_class_shift
Flag if any class proportion changed by more than threshold.
```python
from taskclf.core.drift import detect_class_shift
result = detect_class_shift(ref_labels, cur_labels, threshold=0.15)
```
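The per-class comparison can be illustrated as follows. This is a sketch on plain lists, not the library's implementation: `class_shift_sketch` is a hypothetical name, a class missing from one window is assumed to have proportion 0.0 there, and the returned keys mirror the ClassShiftResult fields.

```python
from collections import Counter


def class_shift_sketch(ref_labels, cur_labels, threshold=0.15):
    """Flag classes whose proportion moved by more than `threshold` (absolute)."""

    def distribution(labels):
        total = len(labels)
        return {k: v / total for k, v in Counter(labels).items()} if total else {}

    ref_dist = distribution(ref_labels)
    cur_dist = distribution(cur_labels)
    classes = sorted(set(ref_dist) | set(cur_dist))
    # Assumption: a class absent from one window counts as proportion 0.0.
    shifts = {c: abs(cur_dist.get(c, 0.0) - ref_dist.get(c, 0.0)) for c in classes}
    shifted = [c for c in classes if shifts[c] > threshold]
    return {
        "ref_dist": ref_dist,
        "cur_dist": cur_dist,
        "max_shift": max(shifts.values(), default=0.0),
        "shifted_classes": shifted,
        "is_flagged": bool(shifted),
    }


ref = ["a"] * 5 + ["b"] * 5   # 50% / 50%
cur = ["a"] * 8 + ["b"] * 2   # 80% / 20%
print(class_shift_sketch(ref, cur)["shifted_classes"])  # ['a', 'b']
```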
taskclf.core.drift
KsResult
Bases: BaseModel
Result of a two-sample Kolmogorov-Smirnov test.
Source code in src/taskclf/core/drift.py
FeatureDriftResult
FeatureDriftReport
RejectRateDrift
Bases: BaseModel
Reject-rate comparison between reference and current windows.
EntropyDrift
ClassShiftResult
Bases: BaseModel
Class-distribution shift between reference and current.
compute_psi(reference, current, bins=DEFAULT_PSI_BINS)
Population Stability Index between two 1-D distributions.
Uses equal-frequency (quantile) binning derived from reference. Returns 0.0 when either array is empty or constant.
compute_ks(reference, current)
Two-sample Kolmogorov-Smirnov test.
Returns KsResult with statistic and p_value.
Requires scipy (transitively available via scikit-learn).
feature_drift_report(ref_df, cur_df, numerical_features, *, psi_threshold=DEFAULT_PSI_THRESHOLD, ks_alpha=DEFAULT_KS_ALPHA)
Run PSI + KS across numerical_features and flag drifted ones.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref_df | pd.DataFrame | Reference-period feature DataFrame. | required |
| cur_df | pd.DataFrame | Current-period feature DataFrame. | required |
| numerical_features | Sequence[str] | Column names to test. | required |
| psi_threshold | float | PSI above this marks a feature as drifted. | DEFAULT_PSI_THRESHOLD |
| ks_alpha | float | Significance level for the KS test. | DEFAULT_KS_ALPHA |

Returns:
| Type | Description |
|---|---|
| FeatureDriftReport | A FeatureDriftReport (aggregated multi-feature report). |
detect_reject_rate_increase(ref_labels, cur_labels, *, threshold=DEFAULT_REJECT_RATE_INCREASE_THRESHOLD, reject_label=MIXED_UNKNOWN)
Flag if reject rate increased by >= threshold (absolute).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref_labels | Sequence[str] | Reference-period predicted labels. | required |
| cur_labels | Sequence[str] | Current-period predicted labels. | required |
| threshold | float | Absolute increase that triggers the flag. | DEFAULT_REJECT_RATE_INCREASE_THRESHOLD |
| reject_label | str | The label treated as a reject. | MIXED_UNKNOWN |

Returns:
| Type | Description |
|---|---|
| RejectRateDrift | A RejectRateDrift (reject-rate comparison between reference and current windows). |
detect_entropy_spike(ref_probs, cur_probs, *, spike_multiplier=DEFAULT_ENTROPY_SPIKE_MULTIPLIER)
Flag if mean prediction entropy spiked.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref_probs | ndarray | Reference probability matrix. | required |
| cur_probs | ndarray | Current probability matrix. | required |
| spike_multiplier | float | Multiplier applied to the reference mean entropy; current mean entropy must exceed the result to trigger the flag. | DEFAULT_ENTROPY_SPIKE_MULTIPLIER |

Returns:
| Type | Description |
|---|---|
| EntropyDrift | An EntropyDrift (prediction entropy comparison). |
detect_class_shift(ref_labels, cur_labels, *, threshold=DEFAULT_CLASS_SHIFT_THRESHOLD)
Flag if any class proportion changed by more than threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ref_labels | Sequence[str] | Reference-period labels. | required |
| cur_labels | Sequence[str] | Current-period labels. | required |
| threshold | float | Absolute proportion change that triggers the flag. | DEFAULT_CLASS_SHIFT_THRESHOLD |

Returns:
| Type | Description |
|---|---|
| ClassShiftResult | A ClassShiftResult (class-distribution shift between reference and current). |