infer.batch
Batch inference: predict, smooth, and segmentize over a feature DataFrame.
See Inference Contract for the canonical pipeline order and how this module fits the batch runtime path.
Overview
run_batch_inference executes the full post-training prediction
pipeline in a single call: predict, calibrate, reject, smooth,
segmentize, and apply hysteresis merging.
Each step is also available as a standalone function for granular use.
BatchInferenceResult
Frozen dataclass returned by run_batch_inference. Core prediction
fields are always populated; taxonomy-mapped fields are None when
no taxonomy config was provided.
| Field | Type | Description |
|---|---|---|
| raw_labels | list[str] | Pre-smoothing labels (rejected buckets become Mixed/Unknown) |
| smoothed_labels | list[str] | Post-smoothing labels |
| segments | list[Segment] | Hysteresis-merged contiguous segments |
| confidences | np.ndarray | max(proba) per row |
| is_rejected | np.ndarray | Boolean rejection flags |
| core_probs | np.ndarray | (N, 8) probability matrix |
| mapped_labels | list[str] \| None | Taxonomy-mapped labels (if taxonomy provided) |
| mapped_probs | list[dict] \| None | Per-bucket mapped probabilities (if taxonomy provided) |
Functions
predict_proba
Return the raw probability matrix (n_rows, n_classes) for a feature
DataFrame. Applies categorical encoding via encode_categoricals
before prediction.
predict_labels
Run the model and return predicted label strings. When
reject_threshold is set, predictions with max(proba) below the
threshold are replaced with Mixed/Unknown.
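The rejection rule described above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: apply_reject_threshold and the three-label vocabulary are hypothetical names, and plain Python lists stand in for the NumPy probability matrix.

```python
MIXED_UNKNOWN = "Mixed/Unknown"  # fallback label named in this page


def apply_reject_threshold(proba_rows, class_names, reject_threshold=None):
    """Hypothetical helper: return (labels, confidences, is_rejected)."""
    labels, confidences, is_rejected = [], [], []
    for row in proba_rows:
        confidence = max(row)              # max(proba) for this row
        best = row.index(confidence)       # index of the winning class
        rejected = reject_threshold is not None and confidence < reject_threshold
        labels.append(MIXED_UNKNOWN if rejected else class_names[best])
        confidences.append(confidence)
        is_rejected.append(rejected)
    return labels, confidences, is_rejected


class_names = ["Build", "Meet", "Write"]  # assumed vocabulary, for illustration only
proba_rows = [
    [0.70, 0.20, 0.10],
    [0.35, 0.33, 0.32],  # max(proba) = 0.35, below the 0.40 floor
    [0.10, 0.15, 0.75],
]
labels, confidences, is_rejected = apply_reject_threshold(
    proba_rows, class_names, reject_threshold=0.40
)
print(labels)  # ['Build', 'Mixed/Unknown', 'Write']
```

With reject_threshold left as None, no row is rejected and the second row keeps its argmax label.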
run_batch_inference
End-to-end pipeline that predicts, calibrates, rejects, smooths, segmentizes, and applies hysteresis merging.
Optional parameters:
- calibrator -- a single Calibrator instance applied to all rows.
- calibrator_store -- a CalibratorStore for per-user calibration; takes precedence over calibrator when user_id is present in the DataFrame.
- taxonomy -- a TaxonomyConfig to map core labels to user-defined buckets, populating mapped_labels and mapped_probs on the result.
- reject_threshold -- confidence floor; predictions below this become Mixed/Unknown before smoothing.
- smooth_window -- window size for rolling_majority (default from core.defaults).
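To make the effect of smooth_window concrete, here is a rough sketch of rolling-majority smoothing. It is an assumption about the general technique, not the actual rolling_majority code: the centered window, the tie-breaking rule (keep the current label), and the name rolling_majority_sketch are all illustrative.

```python
from collections import Counter


def rolling_majority_sketch(labels, window=3):
    """Hypothetical stand-in: replace each label with its window's majority."""
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        # Centered window, truncated at the sequence boundaries.
        counts = Counter(labels[max(0, i - half): i + half + 1])
        top = max(counts.values())
        # On ties, keep the current label rather than flipping it.
        if counts[current] == top:
            smoothed.append(current)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


raw = ["Build", "Build", "Meet", "Build", "Build"]
smoothed = rolling_majority_sketch(raw, window=3)
print(smoothed)  # ['Build', 'Build', 'Build', 'Build', 'Build']
```

The single Meet bucket is outvoted by its neighbours, which is the kind of one-bucket flicker that smoothing is meant to remove before segments are built.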
write_predictions_csv
Write per-bucket predictions to CSV with columns for timestamp, predicted label, confidence, rejection flag, mapped label, and core probabilities.
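As a rough picture of the output, the sketch below writes a reduced version of such a file with the standard csv module. The column names and the helper write_predictions_sketch are assumptions for illustration; the real writer's header and formatting may differ, and the mapped-label and core-probability columns are left out for brevity.

```python
import csv
import io


def write_predictions_sketch(timestamps, labels, confidences, is_rejected, out):
    """Hypothetical writer: one CSV row per bucket."""
    fieldnames = ["timestamp", "predicted_label", "confidence", "is_rejected"]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for ts, label, conf, rejected in zip(timestamps, labels, confidences, is_rejected):
        writer.writerow(
            {
                "timestamp": ts,
                "predicted_label": label,
                "confidence": f"{conf:.4f}",  # fixed precision keeps files diffable
                "is_rejected": rejected,
            }
        )


buffer = io.StringIO(newline="")
write_predictions_sketch(
    ["2024-01-01T09:00:00", "2024-01-01T09:01:00"],
    ["Build", "Mixed/Unknown"],
    [0.91, 0.35],
    [False, True],
    buffer,
)
print(buffer.getvalue())
```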
write_segments_json / read_segments_json
Serialize and deserialize Segment lists as JSON. Timestamps are
stored in ISO 8601 format.
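The round trip can be illustrated with the standard library alone. The sketch below assumes a simplified segment shape (start_ts, end_ts, label); SegmentSketch and both helper functions are hypothetical and are not the module's Segment type or JSON layout.

```python
import json
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical, simplified stand-in for Segment."""
    start_ts: datetime
    end_ts: datetime
    label: str


def write_segments_sketch(segments, path):
    # datetime.isoformat() yields ISO 8601 strings such as 2024-01-01T09:00:00+00:00.
    payload = [
        {"start_ts": s.start_ts.isoformat(), "end_ts": s.end_ts.isoformat(), "label": s.label}
        for s in segments
    ]
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_segments_sketch(path):
    rows = json.loads(path.read_text(encoding="utf-8"))
    return [
        SegmentSketch(
            datetime.fromisoformat(r["start_ts"]),
            datetime.fromisoformat(r["end_ts"]),
            r["label"],
        )
        for r in rows
    ]


segments = [
    SegmentSketch(
        datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
        datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc),
        "Build",
    )
]
with tempfile.TemporaryDirectory() as tmp:
    path = write_segments_sketch(segments, Path(tmp) / "segments.json")
    restored = read_segments_sketch(path)
print(restored == segments)  # True
```

Using timezone-aware datetimes keeps the UTC offset in the serialized string, so the values compare equal after reading them back.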
Usage

    from pathlib import Path

    from taskclf.infer.batch import run_batch_inference, write_segments_json

    # model, features_df, and cat_encoders are assumed to be loaded already.
    result = run_batch_inference(
        model,
        features_df,
        cat_encoders=cat_encoders,
        reject_threshold=0.40,
    )
    print(
        f"{len(result.segments)} segments, "
        f"{result.is_rejected.sum()} rejected buckets"
    )
    write_segments_json(result.segments, Path("artifacts/segments.json"))

taskclf.infer.batch
BatchInferenceResult (dataclass)
Container for batch inference outputs.
Always contains core prediction fields. Taxonomy-mapped fields
(mapped_labels, mapped_probs) are None when no taxonomy
config was provided.
Source code in src/taskclf/infer/batch.py
predict_proba(model, features_df, cat_encoders=None, *, schema_version=None)
Return raw probability matrix for features_df.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Booster | Trained LightGBM booster. | required |
| features_df | DataFrame | Feature DataFrame with the model feature columns for the selected schema. | required |
| cat_encoders | dict[str, LabelEncoder] \| None | Pre-fitted categorical encoders for string columns. | None |
| schema_version | str \| None | | None |
Returns:
| Type | Description |
|---|---|
| ndarray | Probability matrix of shape (n_rows, n_classes). |
predict_labels(model, features_df, label_encoder, cat_encoders=None, *, reject_threshold=None, schema_version=None)
Run the model on features_df and return predicted label strings.
When reject_threshold is set, any prediction whose maximum
probability falls below the threshold is replaced with
Mixed/Unknown.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Booster | Trained LightGBM booster. | required |
| features_df | DataFrame | Feature DataFrame with the model feature columns for the selected schema. | required |
| label_encoder | LabelEncoder | Encoder fitted on the canonical label vocabulary. | required |
| cat_encoders | dict[str, LabelEncoder] \| None | Pre-fitted categorical encoders for string columns. | None |
| reject_threshold | float \| None | If given, predictions with max(proba) below this value are replaced with Mixed/Unknown. | None |
Returns:
| Type | Description |
|---|---|
| list[str] | Predicted label per row. |
run_batch_inference(model, features_df, *, cat_encoders=None, smooth_window=DEFAULT_SMOOTH_WINDOW, bucket_seconds=DEFAULT_BUCKET_SECONDS, reject_threshold=None, taxonomy=None, calibrator=None, calibrator_store=None, schema_version=None)
Predict, smooth, segmentize, and apply hysteresis merging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Booster | Trained LightGBM booster. | required |
| features_df | DataFrame | Feature DataFrame with the selected schema's feature columns and … | required |
| cat_encoders | dict[str, LabelEncoder] \| None | Pre-fitted categorical encoders for string columns. | None |
| smooth_window | int | Window size for rolling-majority smoothing. | DEFAULT_SMOOTH_WINDOW |
| bucket_seconds | int | Width of each time bucket in seconds. | DEFAULT_BUCKET_SECONDS |
| reject_threshold | float \| None | If given, predictions with max(proba) below this value become Mixed/Unknown before smoothing. | None |
| taxonomy | TaxonomyConfig \| None | Optional taxonomy config. When provided, core labels are mapped to user-defined buckets and mapped_labels and mapped_probs are populated on the result. | None |
| calibrator | Calibrator \| None | Optional probability calibrator. When provided, raw model probabilities are calibrated before the reject decision. | None |
| calibrator_store | CalibratorStore \| None | Optional per-user calibrator store. When provided, per-user calibration is applied using the user_id column of the DataFrame. | None |
Returns:
| Type | Description |
|---|---|
| BatchInferenceResult | A BatchInferenceResult with core prediction fields, segments (hysteresis-merged), and optional taxonomy-mapped outputs. |
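The last two stages, segmentizing and hysteresis merging, can be pictured with the sketch below. It is an assumption about the general technique rather than this module's algorithm: SegmentSketch, segmentize_sketch, merge_short_segments, the min_buckets parameter, the 60-second bucket width, and the rule that short segments are absorbed into the preceding segment are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentSketch:
    """Hypothetical segment: [start, end) in seconds, plus its label."""
    start: int
    end: int
    label: str
    n_buckets: int


def segmentize_sketch(bucket_starts, labels, bucket_seconds=60):
    """Collapse runs of equal labels over contiguous buckets into segments."""
    segments = []
    for start, label in zip(bucket_starts, labels):
        last = segments[-1] if segments else None
        if last is not None and last.label == label and last.end == start:
            last.end = start + bucket_seconds   # extend the running segment
            last.n_buckets += 1
        else:
            segments.append(SegmentSketch(start, start + bucket_seconds, label, 1))
    return segments


def merge_short_segments(segments, min_buckets=2):
    """Absorb segments shorter than min_buckets into the preceding segment."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        contiguous = prev is not None and prev.end == seg.start
        if contiguous and (seg.n_buckets < min_buckets or prev.label == seg.label):
            prev.end = seg.end
            prev.n_buckets += seg.n_buckets
        else:
            # Copy so the input segments are left unmodified.
            merged.append(SegmentSketch(seg.start, seg.end, seg.label, seg.n_buckets))
    return merged


bucket_starts = [0, 60, 120, 180, 240, 300, 360]
labels = ["Build", "Build", "Meet", "Build", "Build", "Write", "Write"]
segments = segmentize_sketch(bucket_starts, labels)
merged = merge_short_segments(segments, min_buckets=2)
print([(s.start, s.end, s.label) for s in merged])
# [(0, 300, 'Build'), (300, 420, 'Write')]
```

In this example the one-bucket Meet segment is absorbed, after which the two Build runs join into a single segment, while the two-bucket Write segment is long enough to stand on its own.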
write_predictions_csv(features_df, labels, path, *, confidences=None, is_rejected=None, mapped_labels=None, core_probs=None)
Write per-bucket predictions to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features_df | DataFrame | Source feature DataFrame. | required |
| labels | Sequence[str] | Predicted (smoothed) label per row. | required |
| path | Path | Destination CSV path. | required |
| confidences | ndarray \| None | Optional max-probability per row. | None |
| is_rejected | ndarray \| None | Optional boolean rejection flag per row. | None |
| mapped_labels | Sequence[str] \| None | Optional taxonomy-mapped label per row. | None |
| core_probs | ndarray \| None | Optional probability matrix. | None |
Returns:
| Type | Description |
|---|---|
| Path | The path that was written. |
write_segments_json(segments, path)
Write segments to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| segments | Sequence[Segment] | Segment instances from … | required |
| path | Path | Destination JSON path. | required |
Returns:
| Type | Description |
|---|---|
| Path | The path that was written. |
read_segments_json(path)
Read segments from a JSON file written by write_segments_json.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to an existing segments JSON file. | required |
Returns:
| Type | Description |
|---|---|
| list[Segment] | List of Segment instances. |