User-Specific Taxonomy Mapping
Version: 1.0 Status: Stable Last Updated: 2026-02-24
This document explains how users can map core classifier labels to their own task categories without retraining the model.
1. Overview
The global classifier always predicts one of 8 core labels (see
docs/guide/labels_v1.md). Different users may want to see different
categories in their time reports. The taxonomy mapping layer sits
after model prediction and converts core labels + probability
vectors into user-defined buckets with aggregated probabilities.
Key properties:
- Core predictions are never modified.
- Mapping is deterministic given the same config and input.
- Multiple core labels can map to the same user bucket (many-to-one).
- Unmapped core labels fall into an automatic "Other" bucket.
- Rejected predictions remain `Mixed/Unknown` regardless of mapping.
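The properties above can be illustrated with a minimal sketch. It assumes a taxonomy is held as a plain dict of bucket name to core labels and that a rejected prediction is represented by the label string `Mixed/Unknown`. The function names `build_lookup` and `map_label` are hypothetical and are not part of the documented API.

```python
# Hypothetical helper names; only "Other" and "Mixed/Unknown" come from this document.
REJECT_LABEL = "Mixed/Unknown"   # assumed representation of a rejected prediction
FALLBACK_BUCKET = "Other"        # implicit bucket for unmapped core labels


def build_lookup(buckets):
    """Invert {bucket name: [core labels]} into {core label: bucket name}."""
    lookup = {}
    for bucket_name, core_labels in buckets.items():
        for label in core_labels:
            lookup[label] = bucket_name  # many core labels may share one bucket
    return lookup


def map_label(core_label, lookup):
    """Map one core label to a user bucket without touching the core label itself."""
    if core_label == REJECT_LABEL:
        return REJECT_LABEL  # rejected predictions are never remapped
    return lookup.get(core_label, FALLBACK_BUCKET)


buckets = {
    "Deep Work": ["Build", "Debug", "Review", "Write"],
    "Research": ["ReadResearch"],
}
lookup = build_lookup(buckets)
```

With this lookup, `Debug` maps to `Deep Work`, an unmapped label such as `Meet` falls into `Other`, and `Mixed/Unknown` passes through unchanged.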
2. Config Format
Taxonomy configs are YAML files. See configs/user_taxonomy_example.yaml
for a complete example.
Minimal example
```yaml
version: "1.0"
label_schema_version: labels_v1
buckets:
  - name: "Deep Work"
    core_labels: [Build, Debug, Review, Write]
    color: "#2E86DE"
  - name: "Research"
    core_labels: [ReadResearch]
    color: "#9B59B6"
  - name: "Communication"
    core_labels: [Communicate, Meet]
    color: "#27AE60"
  - name: "Break"
    core_labels: [BreakIdle]
    color: "#7F8C8D"
```
Fields
| Field | Required | Description |
|---|---|---|
| `version` | Yes | Config format version (currently `"1.0"`) |
| `label_schema_version` | No | Must be `labels_v1` |
| `user_id` | No | Ties config to a specific user |
| `display` | No | Display preferences (`show_core_labels`, `default_view`) |
| `reject` | No | Reject label name, whether to include in reports |
| `buckets` | Yes | List of user-facing categories (at least one) |
| `advanced` | No | Aggregation mode, reweighting, min confidence |
Bucket fields
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique display name |
| `description` | No | Human-readable description |
| `core_labels` | Yes | List of core label names (at least one) |
| `color` | No | Hex color `#RRGGBB` (default: `#808080`) |
3. Validation Rules
The following rules are enforced when loading a taxonomy config:
- Every `core_label` in a bucket must be a valid `CoreLabel` member.
- Bucket names must be unique.
- Colors must be valid hex (`#RRGGBB`).
- Reweight keys must be valid `CoreLabel` values with weight > 0.
- At least one bucket is required.
If a core label is not in any bucket, it is automatically assigned to an implicit "Other" fallback bucket.
If validation fails, the system falls back to core labels only.
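The rules in this section could be checked along the following lines. This is a sketch, not the project's validator: `validate_taxonomy` is a hypothetical name, the set of core labels is assumed from the minimal example, and reweights are passed as a separate dict because the exact config key under `advanced` is not specified here.

```python
import re

# Assumed from the minimal example; the authoritative list is in docs/guide/labels_v1.md.
CORE_LABELS = {
    "Build", "Debug", "Review", "Write",
    "ReadResearch", "Communicate", "Meet", "BreakIdle",
}
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def validate_taxonomy(buckets, reweights=None):
    """Return a list of error messages; an empty list means the config is valid."""
    errors = []
    if not buckets:
        errors.append("at least one bucket is required")
    seen_names = set()
    for bucket in buckets or []:
        name = bucket.get("name")
        if name in seen_names:
            errors.append(f"duplicate bucket name: {name!r}")
        seen_names.add(name)
        for label in bucket.get("core_labels", []):
            if label not in CORE_LABELS:
                errors.append(f"{name!r}: unknown core label {label!r}")
        color = bucket.get("color", "#808080")  # documented default color
        if not isinstance(color, str) or not HEX_COLOR.fullmatch(color):
            errors.append(f"{name!r}: invalid color {color!r}")
    for label, weight in (reweights or {}).items():
        if label not in CORE_LABELS:
            errors.append(f"reweight key {label!r} is not a core label")
        elif not weight > 0:
            errors.append(f"reweight for {label!r} must be > 0")
    return errors


good = [{"name": "Deep Work", "core_labels": ["Build", "Debug"], "color": "#2E86DE"}]
bad = [
    {"name": "A", "core_labels": ["Build"], "color": "blue"},
    {"name": "A", "core_labels": ["Nap"]},
]
```

A caller following the fallback rule above would use the taxonomy only when the returned list is empty and otherwise report core labels directly.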
4. Probability Aggregation
When multiple core labels map to one bucket, their probabilities are combined.
Sum mode (default)
Example: if "Deep Work" maps Build + Debug + Write, and their core
probabilities are 0.4 + 0.1 + 0.2, then bucket_prob = 0.7.
Max mode
Set in the config:
After aggregation, bucket probabilities are normalized to sum to 1.0.
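Both modes and the final normalization can be sketched as follows. The function name `aggregate`, the `mode` argument, and the probabilities for labels outside "Deep Work" are illustrative assumptions. Only the 0.4 + 0.1 + 0.2 example comes from this document.

```python
def aggregate(core_probs, buckets, mode="sum"):
    """Combine core-label probabilities per bucket, then normalize to sum to 1.0."""
    if mode not in ("sum", "max"):
        raise ValueError(f"unknown aggregation mode: {mode!r}")
    combine = sum if mode == "sum" else max

    # Core labels not listed in any bucket go to the implicit "Other" bucket.
    mapped = {label for labels in buckets.values() for label in labels}
    groups = dict(buckets)
    leftovers = [label for label in core_probs if label not in mapped]
    if leftovers:
        groups["Other"] = leftovers

    raw = {
        name: combine(core_probs.get(label, 0.0) for label in labels)
        for name, labels in groups.items()
    }
    total = sum(raw.values())
    if total <= 0:
        return raw  # nothing to normalize
    return {name: prob / total for name, prob in raw.items()}


core_probs = {
    "Build": 0.4, "Debug": 0.1, "Write": 0.2,            # example from the text
    "Review": 0.0, "ReadResearch": 0.1, "Communicate": 0.1,
    "Meet": 0.05, "BreakIdle": 0.05,                      # illustrative values
}
buckets = {"Deep Work": ["Build", "Debug", "Write"]}

sum_probs = aggregate(core_probs, buckets, mode="sum")
max_probs = aggregate(core_probs, buckets, mode="max")
```

In sum mode "Deep Work" receives 0.7 and "Other" the remaining 0.3. In max mode the raw values are 0.4 and 0.1, which no longer sum to 1.0, so normalization rescales them to 0.8 and 0.2.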
5. Reweighting
Advanced users can adjust the contribution of individual core labels without retraining:
Reweights are multiplied into core_probs before aggregation, then
re-normalized. This lets users bias the mapping toward or away from
specific activities.
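A minimal sketch of the reweighting step, assuming reweights are a dict of core label to positive weight and that re-normalization happens on the core probabilities before they are aggregated into buckets. The name `apply_reweights` is hypothetical.

```python
def apply_reweights(core_probs, reweights):
    """Scale each core probability by its weight (default 1.0), then re-normalize."""
    weighted = {
        label: prob * reweights.get(label, 1.0)
        for label, prob in core_probs.items()
    }
    total = sum(weighted.values())
    if total <= 0:
        return dict(core_probs)  # degenerate input: leave probabilities unchanged
    return {label: prob / total for label, prob in weighted.items()}


# Halving the weight of "Meet" shifts probability mass toward "Build".
reweighted = apply_reweights({"Build": 0.5, "Meet": 0.5}, {"Meet": 0.5})
```

Here the weighted values are 0.5 and 0.25, so after re-normalization "Build" holds two thirds of the mass and "Meet" one third.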
6. CLI Commands
Validate a config
Show the mapping table
Generate a default config
This creates an identity mapping (one bucket per core label) as a starting point for customisation.
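As an illustration of what an identity mapping looks like, the sketch below builds one as a Python dict using the field names from the config format above. The function name `default_taxonomy`, the choice to name each bucket after its core label, and the label list (taken from the minimal example) are assumptions. The output of the actual CLI may differ.

```python
# Assumed from the minimal example; see docs/guide/labels_v1.md for the real list.
CORE_LABELS = [
    "Build", "Debug", "Review", "Write",
    "ReadResearch", "Communicate", "Meet", "BreakIdle",
]


def default_taxonomy(core_labels):
    """Build an identity mapping: one bucket per core label."""
    return {
        "version": "1.0",
        "label_schema_version": "labels_v1",
        "buckets": [
            {"name": label, "core_labels": [label]}
            for label in core_labels
        ],
    }


config = default_taxonomy(CORE_LABELS)
```

Customising then amounts to merging buckets, for example moving `Debug` into the `Build` bucket and renaming it.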
Use with batch inference
```bash
taskclf infer batch \
  --model-dir models/run_dir \
  --from 2025-06-10 --to 2025-06-15 \
  --taxonomy configs/user_taxonomy.yaml
```
Adds a mapped_label column to predictions.csv.
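Once the column is present, a per-bucket tally can be produced with the standard library. The sketch below reads an in-memory stand-in for `predictions.csv`. Only the `mapped_label` column name comes from this document; the other columns and all rows are illustrative assumptions.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for predictions.csv; columns other than mapped_label
# are assumptions, not the documented file layout.
SAMPLE = """bucket_start,core_label,mapped_label
2025-06-10T09:00,Build,Deep Work
2025-06-10T09:01,Debug,Deep Work
2025-06-10T09:02,Meet,Communication
"""


def count_mapped_labels(handle):
    """Count rows per user bucket from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return Counter(row["mapped_label"] for row in reader)


counts = count_mapped_labels(io.StringIO(SAMPLE))
```

For a real file, pass `open("predictions.csv", newline="")` instead of the `StringIO` object.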
Use with online inference
7. Integration Points
The taxonomy mapping layer fits into the inference pipeline as follows:
- Batch inference: `run_batch_inference()` accepts an optional `taxonomy` parameter. When set, `BatchInferenceResult` includes `mapped_labels` and `mapped_probs`.
- Online inference: `OnlinePredictor` accepts an optional `taxonomy` parameter. `predict_bucket()` returns the mapped label when set.
- Core label smoothing always operates on core labels; taxonomy mapping is applied afterwards.
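The ordering in the last point can be sketched as a two-stage pipeline. The smoother here is a hypothetical sliding-window majority vote chosen only for illustration; this document does not describe the project's actual smoothing method. The names `smooth` and `smooth_then_map` are likewise assumptions.

```python
from collections import Counter


def smooth(core_labels, window=3):
    """Hypothetical smoother: majority vote over a centered window of core labels."""
    half = window // 2
    smoothed = []
    for i in range(len(core_labels)):
        neighbourhood = core_labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


def smooth_then_map(core_labels, lookup, fallback="Other"):
    """Smooth on core labels first, then apply the taxonomy mapping."""
    return [lookup.get(label, fallback) for label in smooth(core_labels)]


lookup = {"Build": "Deep Work", "Meet": "Communication"}
mapped = smooth_then_map(["Build", "Build", "Meet", "Build", "Build"], lookup)
```

The isolated `Meet` prediction is smoothed away at the core-label level, so every entry in `mapped` is "Deep Work".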
8. Versioning
The taxonomy config has its own version field. Bump the version whenever any of the following change:
- Bucket definitions
- Aggregation semantics
- Reweight behavior
Taxonomy configs are user-editable and should be versioned alongside other project configs.