infer.taxonomy¶
User-specific taxonomy mapping: core labels to user-defined buckets.
Overview¶
The taxonomy layer sits between the model's core 8-class predictions and the user-facing display. It maps one or more core labels into user-defined buckets via a YAML config, without altering the underlying core predictions.
TaxonomyResolver is implemented as a slotted dataclass and still
accepts the same constructor input (TaxonomyConfig).
See the taxonomy guide and
configs/user_taxonomy_example.yaml for configuration details.
Config model hierarchy¶
TaxonomyConfig¶
Top-level config loaded from YAML.
| Field | Type | Default | Description |
|---|---|---|---|
| `version` | `str` | `"1.0"` | Config schema version |
| `label_schema_version` | `str` | `"labels_v1"` | Expected label schema |
| `user_id` | `str \| None` | `None` | Optional user scope |
| `display` | `TaxonomyDisplay` | (defaults) | Display preferences |
| `reject` | `TaxonomyReject` | (defaults) | Rejection display settings |
| `buckets` | `list[TaxonomyBucket]` | (required) | At least one bucket |
| `advanced` | `TaxonomyAdvanced` | (defaults) | Tuning knobs |
TaxonomyBucket¶
A user-facing task category that aggregates one or more core labels.
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Unique display name |
| `description` | `str` | Human-readable description |
| `core_labels` | `list[str]` | Core labels mapped to this bucket (must be valid `LABEL_SET_V1` entries) |
| `color` | `str` | Hex color for display (`#RRGGBB`) |
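
The constraints in the tables above (at least one bucket, unique names, known core labels, `#RRGGBB` colors) can be sketched with the standard library alone. This is only an illustration: `Bucket`, `validate_buckets`, and `KNOWN_LABELS` are hypothetical names, the label names are placeholders for `LABEL_SET_V1`, and the real models are Pydantic `BaseModel` classes whose validation may differ.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for LABEL_SET_V1; the real label names are not shown here.
KNOWN_LABELS = frozenset({"label_a", "label_b", "label_c"})
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


@dataclass
class Bucket:
    """Illustrative stand-in for TaxonomyBucket (the real class is a BaseModel)."""

    name: str
    description: str
    core_labels: list[str]
    color: str


def validate_buckets(buckets: list[Bucket]) -> list[str]:
    """Return human-readable problems; an empty list means the buckets pass."""
    problems: list[str] = []
    if not buckets:
        problems.append("at least one bucket is required")
    seen: set[str] = set()
    for bucket in buckets:
        if bucket.name in seen:
            problems.append(f"duplicate bucket name: {bucket.name!r}")
        seen.add(bucket.name)
        unknown = [lab for lab in bucket.core_labels if lab not in KNOWN_LABELS]
        if unknown:
            problems.append(f"{bucket.name!r} has unknown core labels: {unknown}")
        if not HEX_COLOR.fullmatch(bucket.color):
            problems.append(f"{bucket.name!r} has an invalid color: {bucket.color!r}")
    return problems


good = Bucket("Deep work", "Focused tasks", ["label_a", "label_b"], "#1F77B4")
bad = Bucket("Deep work", "Duplicate name, bad label and color", ["label_z"], "blue")
problems = validate_buckets([good, bad])
```

Here `problems` collects three messages for `bad` (duplicate name, unknown label, invalid color), while `validate_buckets([good])` returns an empty list.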
TaxonomyDisplay¶
| Field | Type | Default | Description |
|---|---|---|---|
| `show_core_labels` | `bool` | `False` | Show underlying core labels in UI |
| `default_view` | `"mapped" \| "core"` | `"mapped"` | Default view mode |
| `color_theme` | `str` | `"default"` | Color theme name |
TaxonomyReject¶
| Field | Type | Default | Description |
|---|---|---|---|
| `mixed_label_name` | `str` | `"Mixed/Unknown"` | Label shown for rejected predictions |
| `include_rejected_in_reports` | `bool` | `False` | Include rejected buckets in reports |
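
One plausible reading of these two settings is sketched below. The helpers `display_label` and `report_labels` are hypothetical, and this page does not show how the real resolver or the reports apply the settings, so treat the logic as an assumption.

```python
def display_label(
    mapped_label: str,
    is_rejected: bool,
    mixed_label_name: str = "Mixed/Unknown",
) -> str:
    """Show the reject label for rejected predictions, else the mapped bucket."""
    return mixed_label_name if is_rejected else mapped_label


def report_labels(
    rows: list[tuple[str, bool]],
    include_rejected_in_reports: bool = False,
    mixed_label_name: str = "Mixed/Unknown",
) -> list[str]:
    """Turn (mapped_label, is_rejected) rows into the labels a report would list."""
    return [
        display_label(label, rejected, mixed_label_name)
        for label, rejected in rows
        if include_rejected_in_reports or not rejected
    ]


rows = [("Deep work", False), ("Email", True)]
default_report = report_labels(rows)
full_report = report_labels(rows, include_rejected_in_reports=True)
```

With the defaults, the rejected row is dropped from the report; with `include_rejected_in_reports=True` it appears under the `mixed_label_name` label.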
TaxonomyAdvanced¶
| Field | Type | Default | Description |
|---|---|---|---|
| `probability_aggregation` | `"sum" \| "max"` | `"sum"` | How core-label probs are combined per bucket |
| `min_confidence_for_mapping` | `float` | `0.55` | Minimum confidence for mapping |
| `reweight_core_labels` | `dict[str, float]` | `{}` | Per-label probability multipliers |
TaxonomyResolver¶
Stateless mapper from core predictions to user-defined buckets. Precomputes index lookups at construction time for fast per-row resolution.
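
    from pathlib import Path

    from taskclf.infer.taxonomy import load_taxonomy, TaxonomyResolver

    # Load and validate the user's YAML config, then build the resolver once.
    config = load_taxonomy(Path("configs/user_taxonomy.yaml"))
    resolver = TaxonomyResolver(config)

    # core_label_id and core_probs are the model's prediction for one window.
    result = resolver.resolve(core_label_id, core_probs)
    print(result.mapped_label, result.mapped_probs)
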
resolve_batch maps an entire batch at once:

    # pred_indices and proba_matrix hold the core label ids and probability rows for the batch.
    results = resolver.resolve_batch(pred_indices, proba_matrix)
    mapped_labels = [r.mapped_label for r in results]

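
To illustrate what "precomputes index lookups at construction time" can mean, here is a minimal pure-Python sketch. `LookupResolver` is a hypothetical class, not the real `TaxonomyResolver`: it uses plain lists instead of arrays, placeholder label names, and sum aggregation only.

```python
class LookupResolver:
    """Illustrative resolver that builds its index lookups once, up front."""

    def __init__(self, core_labels: list[str], buckets: dict[str, list[str]]) -> None:
        self.bucket_names = list(buckets)
        position = {label: i for i, label in enumerate(core_labels)}
        # Precompute, once, the core-label indices that feed each bucket.
        self._bucket_indices = [
            [position[label] for label in labels] for labels in buckets.values()
        ]

    def resolve(self, core_probs: list[float]) -> tuple[str, list[float]]:
        """Map one probability row to (winning bucket name, bucket probabilities)."""
        sums = [sum(core_probs[i] for i in idx) for idx in self._bucket_indices]
        total = sum(sums)
        mapped = [s / total for s in sums]
        best = max(range(len(mapped)), key=mapped.__getitem__)
        return self.bucket_names[best], mapped

    def resolve_batch(self, proba_matrix: list[list[float]]) -> list[tuple[str, list[float]]]:
        """Resolve every row; no per-row dictionary lookups are needed."""
        return [self.resolve(row) for row in proba_matrix]


# Placeholder label and bucket names, chosen only for the example.
sketch = LookupResolver(
    ["a", "b", "c", "d"], {"Focus": ["a", "b"], "Comms": ["c", "d"]}
)
batch = sketch.resolve_batch(
    [[0.5, 0.25, 0.125, 0.125], [0.125, 0.125, 0.25, 0.5]]
)
batch_labels = [label for label, _ in batch]
```

Because the label-to-index work happens in `__init__`, each call to `resolve` only indexes into the probability row.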
Aggregation modes¶
When a bucket contains multiple core labels, their probabilities are combined using the configured aggregation mode:
- `sum` (default) -- probabilities are summed, then the full vector is renormalized.
- `max` -- the maximum probability among the bucket's core labels is used, then renormalized.
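
The difference between the two modes is easiest to see on a small vector. The sketch below is an assumption-laden illustration: `aggregate` is a hypothetical helper, buckets are given as lists of core-label indices, and plain lists stand in for the arrays the real resolver receives.

```python
def aggregate(
    core_probs: list[float],
    bucket_indices: list[list[int]],
    mode: str = "sum",
) -> list[float]:
    """Combine core-label probabilities per bucket, then renormalize to sum to 1."""
    if mode not in ("sum", "max"):
        raise ValueError(f"unknown aggregation mode: {mode!r}")
    combine = sum if mode == "sum" else max
    raw = [combine(core_probs[i] for i in idx) for idx in bucket_indices]
    total = sum(raw)
    return [value / total for value in raw]


core_probs = [0.5, 0.25, 0.25]
bucket_indices = [[0, 1], [2]]  # bucket 0 holds two core labels, bucket 1 holds one

summed = aggregate(core_probs, bucket_indices, mode="sum")
maxed = aggregate(core_probs, bucket_indices, mode="max")
```

In this example `sum` gives the two-label bucket 0.75, while `max` gives it 0.5 before renormalization and about 0.667 after. Under these assumptions, `max` keeps a bucket from gaining probability simply because it groups many core labels.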
Fallback bucket¶
Core labels not assigned to any user bucket are automatically collected
into an "Other" fallback bucket. A log message lists the unmapped
labels when this occurs.
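
A minimal sketch of the fallback behaviour described above, using placeholder label names. `with_fallback` is a hypothetical helper, and the log level and message wording are assumptions, since the text only says that a log message is emitted.

```python
import logging

logger = logging.getLogger(__name__)


def with_fallback(
    core_labels: list[str],
    buckets: dict[str, list[str]],
    fallback_name: str = "Other",
) -> dict[str, list[str]]:
    """Return a copy of the buckets plus a fallback bucket for unmapped labels."""
    assigned = {label for labels in buckets.values() for label in labels}
    unmapped = [label for label in core_labels if label not in assigned]
    result = {name: list(labels) for name, labels in buckets.items()}
    if unmapped:
        # Assumed level and wording; the real message may differ.
        logger.info("Unmapped core labels routed to %r: %s", fallback_name, unmapped)
        result[fallback_name] = unmapped
    return result


partial = with_fallback(["a", "b", "c"], {"Focus": ["a"]})
complete = with_fallback(["a", "b"], {"Focus": ["a"], "Comms": ["b"]})
```

When every core label is already assigned, as in `complete`, no fallback bucket is added.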
Reweighting¶
advanced.reweight_core_labels adjusts core-label probabilities
before mapping. Each entry maps a core label to a weight: that
label's probability is multiplied by the weight, and the vector
is then renormalized. This can bias the mapping toward or away
from specific core labels without retraining.
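
The reweighting step can be sketched as a multiply-then-renormalize pass. `reweight` is a hypothetical helper with placeholder label names; treating labels that have no entry as weight 1.0 is an assumption, consistent with the empty default `{}` leaving probabilities unchanged.

```python
def reweight(
    core_probs: list[float],
    core_labels: list[str],
    weights: dict[str, float],
) -> list[float]:
    """Scale each label's probability by its weight, then renormalize."""
    scaled = [
        prob * weights.get(label, 1.0)  # assumed: missing labels keep weight 1.0
        for label, prob in zip(core_labels, core_probs)
    ]
    total = sum(scaled)
    if total <= 0:
        raise ValueError("reweighting removed all probability mass")
    return [value / total for value in scaled]


labels = ["a", "b", "c"]
probs = [0.5, 0.25, 0.25]

unchanged = reweight(probs, labels, {})
damped = reweight(probs, labels, {"a": 0.5})
```

Halving the weight of `"a"` turns the vector into three equal thirds, so `"a"` no longer dominates the mapping even though the model output is untouched.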
I/O helpers¶
- `load_taxonomy(path)` -- load and validate a YAML config.
- `save_taxonomy(config, path)` -- serialize a config to YAML.
- `default_taxonomy()` -- create an identity mapping (one bucket per core label) as a starting point for customisation.
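
The idea of an identity mapping can be sketched as follows. `identity_buckets` is a hypothetical helper that returns plain dictionaries; the real `default_taxonomy()` returns a `TaxonomyConfig`, and the descriptions and colors it assigns are not documented here, so the values below are placeholders.

```python
def identity_buckets(core_labels: list[str]) -> list[dict[str, object]]:
    """Build one bucket per core label, named after that label."""
    return [
        {
            "name": label,
            "description": f"Core label {label}",  # placeholder text
            "core_labels": [label],
            "color": "#808080",  # placeholder color
        }
        for label in core_labels
    ]


# Placeholder label names, used only for the example.
starter = identity_buckets(["label_a", "label_b", "label_c"])
```

Starting from such a mapping, a user would merge or rename buckets rather than write a config from scratch.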
taskclf.infer.taxonomy
¶
User-specific taxonomy mapping: core labels -> user-defined buckets.
This module implements the personalization mapping layer described in
docs/guide/model_io.md Section 5. It converts model predictions
(core label + probability vector) into user-facing bucket labels with
aggregated probabilities, without altering the underlying core predictions.
Typical flow::

    config = load_taxonomy(Path("configs/user_taxonomy.yaml"))
    resolver = TaxonomyResolver(config)
    result = resolver.resolve(core_label_id, core_probs)
    # result.mapped_label, result.mapped_probs

TaxonomyBucket
¶
Bases: BaseModel
A user-facing task category that aggregates one or more core labels.
Source code in src/taskclf/infer/taxonomy.py
TaxonomyDisplay
¶
Bases: BaseModel
User display preferences (not used by resolver logic).
TaxonomyReject
¶
TaxonomyAdvanced
¶
Bases: BaseModel
Advanced mapping tuning knobs.
TaxonomyConfig
¶
Bases: BaseModel
Full user-specific taxonomy mapping configuration.
Loaded from a YAML file matching the format in
configs/user_taxonomy_example.yaml.
TaxonomyResult
¶
Bases: BaseModel
Output of the taxonomy mapping resolver for a single window.
TaxonomyResolver
dataclass
¶
Stateless mapper from core predictions to user-defined buckets.
Precomputes index lookups at construction time so that per-row resolution is fast.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `TaxonomyConfig` | Validated taxonomy config. | required |
bucket_names
property
¶
Ordered list of bucket names (including fallback if present).
resolve(core_label_id, core_probs, *, is_rejected=False)
¶
Map a single window's core prediction to a user bucket.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `core_label_id` | `int` | Index of the predicted core label (unused directly -- probabilities drive the mapping). | required |
| `core_probs` | `ndarray` | Probability vector of shape … | required |
| `is_rejected` | `bool` | Whether the prediction was below the reject threshold. | `False` |
Returns:
| Type | Description |
|---|---|
| `TaxonomyResult` | A `TaxonomyResult` … |
resolve_batch(core_label_ids, core_probs, *, is_rejected=None)
¶
Map a batch of core predictions to user buckets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `core_label_ids` | `ndarray` | Shape … | required |
| `core_probs` | `ndarray` | Shape … | required |
| `is_rejected` | `ndarray \| None` | Optional boolean array of shape … | `None` |
Returns:
| Type | Description |
|---|---|
| `list[TaxonomyResult]` | List of … |
load_taxonomy(path)
¶
Load and validate a taxonomy config from a YAML file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to a YAML file matching the taxonomy config schema. | required |
Returns:
| Type | Description |
|---|---|
| `TaxonomyConfig` | Validated … |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If `path` does not exist. |
| `ValueError` / `ValidationError` | If the YAML is malformed or invalid. |
save_taxonomy(config, path)
¶
Serialize a taxonomy config to YAML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `TaxonomyConfig` | Validated taxonomy config to write. | required |
| `path` | `Path` | Destination file path. | required |
Returns:
| Type | Description |
|---|---|
| `Path` | The path that was written. |
default_taxonomy()
¶
Create an identity taxonomy: one bucket per core label.
Useful as a starting point for user customisation.