infer.smooth
Post-prediction smoothing and segmentization into contiguous spans.
Segment
Frozen dataclass representing a contiguous run of identical predicted labels.
| Field | Type | Description |
|---|---|---|
| start_ts | datetime | Start of the first bucket in the segment |
| end_ts | datetime | End of the last bucket (start + bucket width) |
| label | str | Predicted label for the entire segment |
| bucket_count | int | Number of time buckets in the segment |
rolling_majority
Centred sliding-window majority vote smoother.
For each position, the most common label within the window wins.
Ties are broken by keeping the original label, which biases toward
stability. The window is centred: for window=5, positions
[i-2, i+2] are considered (clamped at boundaries).
from taskclf.infer.smooth import rolling_majority
raw = ["Coding", "Coding", "Meeting", "Coding", "Coding"]
smoothed = rolling_majority(raw, window=3)
# smoothed == ["Coding", "Coding", "Coding", "Coding", "Coding"]
The default window size comes from DEFAULT_SMOOTH_WINDOW in
core.defaults.
segmentize
Merges consecutive identical labels into Segment spans. Every input
bucket is covered by exactly one segment; segments are ordered and
non-overlapping.
from taskclf.infer.smooth import segmentize
segments = segmentize(bucket_starts, smoothed_labels, bucket_seconds=60)
Raises ValueError if bucket_starts and labels differ in length
or are empty.
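To make the merging behaviour concrete, here is a rough sketch of run-length segmentization. `SegmentSketch` and `segmentize_sketch` are hypothetical stand-ins for the documented `Segment` and `segmentize`; the sketch assumes buckets are contiguous (no time gaps) and uses 60 seconds as an illustrative default rather than the real DEFAULT_BUCKET_SECONDS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical mirror of the documented Segment fields."""
    start_ts: datetime
    end_ts: datetime
    label: str
    bucket_count: int


def segmentize_sketch(
    bucket_starts: Sequence[datetime],
    labels: Sequence[str],
    bucket_seconds: int = 60,
) -> list[SegmentSketch]:
    if len(bucket_starts) != len(labels) or not labels:
        raise ValueError("bucket_starts and labels must be non-empty and equal in length")
    width = timedelta(seconds=bucket_seconds)
    segments: list[SegmentSketch] = []
    run_start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if i == len(labels) or labels[i] != labels[run_start]:
            segments.append(
                SegmentSketch(
                    start_ts=bucket_starts[run_start],
                    end_ts=bucket_starts[i - 1] + width,
                    label=labels[run_start],
                    bucket_count=i - run_start,
                )
            )
            run_start = i
    return segments


t0 = datetime(2024, 1, 1, 9, 0)
starts = [t0 + timedelta(minutes=m) for m in range(5)]
example_labels = ["Coding", "Coding", "Meeting", "Meeting", "Coding"]
example_segments = segmentize_sketch(starts, example_labels, bucket_seconds=60)
```

Five one-minute buckets collapse into three segments here, and the bucket counts sum to the number of input buckets, which is the coverage property stated above.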
flap_rate
Computes the label flap rate: label_changes / total_windows.
The acceptance guide (section 5) sets the following thresholds:
| Metric | Threshold |
|---|---|
| Raw flap rate | ≤ 0.25 |
| Smoothed flap rate | ≤ 0.15 |
Returns 0.0 for sequences of length ≤ 1.
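As a rough illustration of the formula, the sketch below counts adjacent label changes and divides by the number of buckets. Treating total_windows as `len(labels)` is an assumption made for this example; the page does not spell out the denominator, and `flap_rate_sketch` is a hypothetical name, not the library function.

```python
from typing import Sequence


def flap_rate_sketch(labels: Sequence[str]) -> float:
    """Hypothetical flap rate: adjacent label changes / number of buckets."""
    if len(labels) <= 1:
        return 0.0
    # Count positions where the label differs from the one before it.
    changes = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    # Assumption: total_windows is the number of labelled buckets.
    return changes / len(labels)


raw_labels = ["Coding", "Meeting"] + ["Coding"] * 6
smoothed_labels = ["Coding"] * 8

raw_rate = flap_rate_sketch(raw_labels)            # 2 changes over 8 buckets
smoothed_rate = flap_rate_sketch(smoothed_labels)  # no changes

# Compare against the documented thresholds (0.25 raw, 0.15 smoothed).
passes_acceptance = raw_rate <= 0.25 and smoothed_rate <= 0.15
```

Under this reading, a single one-bucket spike in eight buckets sits exactly at the raw threshold, and removing it by smoothing brings the rate to zero.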
merge_short_segments
Hysteresis rule that absorbs segments shorter than
MIN_BLOCK_DURATION_SECONDS (default 180 s / 3 minutes) into their
neighbours.
Strategy for each short segment:
- If either neighbour has the same label, merge into that neighbour.
- Otherwise merge into the longer neighbour.
- On a tie, prefer the preceding neighbour.
- First and last segments are never removed (but may absorb neighbours).
The function iterates until no more short segments can be merged, guaranteeing convergence because total segment count strictly decreases each pass.
from taskclf.infer.smooth import merge_short_segments
merged = merge_short_segments(segments, min_duration_seconds=180)
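The strategy listed above can be sketched as follows. This is an illustrative approximation rather than the library's code: `SegmentSketch` and `merge_short_segments_sketch` are hypothetical names, duration is assumed to be `bucket_count * bucket_seconds`, one merge is performed per pass, and the sketch does not coalesce equal-label neighbours that a merge may leave side by side.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical mirror of the documented Segment fields."""
    start_ts: datetime
    end_ts: datetime
    label: str
    bucket_count: int


def _join(left: SegmentSketch, right: SegmentSketch, label: str) -> SegmentSketch:
    # Span both segments and carry the surviving neighbour's label.
    return SegmentSketch(left.start_ts, right.end_ts, label,
                         left.bucket_count + right.bucket_count)


def merge_short_segments_sketch(
    segments: Sequence[SegmentSketch],
    *,
    min_duration_seconds: int = 180,
    bucket_seconds: int = 60,
) -> list[SegmentSketch]:
    result = list(segments)
    changed = True
    while changed:
        changed = False
        # Only interior segments are candidates: first and last are never removed.
        for i in range(1, len(result) - 1):
            seg = result[i]
            if seg.bucket_count * bucket_seconds >= min_duration_seconds:
                continue
            prev, nxt = result[i - 1], result[i + 1]
            if prev.label == seg.label:
                into_prev = True
            elif nxt.label == seg.label:
                into_prev = False
            else:
                # Longer neighbour wins; >= prefers the preceding one on a tie.
                into_prev = prev.bucket_count >= nxt.bucket_count
            if into_prev:
                result[i - 1:i + 1] = [_join(prev, seg, prev.label)]
            else:
                result[i:i + 2] = [_join(seg, nxt, nxt.label)]
            changed = True
            break  # the list shrank by one, so rescan from the start
    return result


base = datetime(2024, 1, 1, 9, 0)


def make(offset_min: int, count: int, label: str) -> SegmentSketch:
    # Helper for the example: one-minute buckets starting offset_min after base.
    return SegmentSketch(base + timedelta(minutes=offset_min),
                         base + timedelta(minutes=offset_min + count), label, count)


example = [make(0, 5, "Coding"), make(5, 1, "Meeting"), make(6, 4, "Email")]
merged_example = merge_short_segments_sketch(example, min_duration_seconds=180)
```

In the example, the one-minute "Meeting" block is shorter than 180 s and has no same-label neighbour, so it is absorbed into the longer preceding "Coding" segment. Each merge removes exactly one segment, which is why the loop terminates.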
taskclf.infer.smooth
Post-prediction smoothing and segmentization.
Smoothing reduces short prediction spikes via majority vote. Segmentization merges consecutive identical labels into contiguous spans.
Segment (dataclass)
rolling_majority(labels, window=DEFAULT_SMOOTH_WINDOW)
Smooth labels with a centred sliding-window majority vote.
For each position the most common label within the window wins. Ties are broken by keeping the original label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | Sequence[str] | Ordered sequence of predicted labels (one per bucket). | required |
| window | int | Odd window size centred on each position. | DEFAULT_SMOOTH_WINDOW |
Returns:
| Type | Description |
|---|---|
| list[str] | Smoothed label list of the same length as labels. |
Source code in src/taskclf/infer/smooth.py
segmentize(bucket_starts, labels, bucket_seconds=DEFAULT_BUCKET_SECONDS)
Merge runs of identical labels into Segment spans.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_starts | Sequence[datetime] | Sorted bucket-start timestamps (one per bucket). | required |
| labels | Sequence[str] | Predicted (or smoothed) label per bucket, same length. | required |
| bucket_seconds | int | Width of each bucket in seconds. | DEFAULT_BUCKET_SECONDS |
Returns:
| Type | Description |
|---|---|
| list[Segment] | Ordered, non-overlapping list of segments covering every input bucket. |
Raises:
| Type | Description |
|---|---|
| ValueError | If bucket_starts and labels differ in length or are empty. |
Source code in src/taskclf/infer/smooth.py
flap_rate(labels)
Compute label flap rate: label changes / total windows.
As defined in docs/guide/acceptance.md section 5.
Acceptance thresholds: raw <= 0.25, smoothed <= 0.15.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| labels | Sequence[str] | Ordered sequence of predicted labels (one per bucket). | required |
Returns:
| Type | Description |
|---|---|
| float | Flap rate in [0, 1]. Returns 0.0 for sequences of length <= 1. |
Source code in src/taskclf/infer/smooth.py
merge_short_segments(segments, *, min_duration_seconds=MIN_BLOCK_DURATION_SECONDS, bucket_seconds=DEFAULT_BUCKET_SECONDS)
Absorb segments shorter than min_duration_seconds into neighbours.
Implements the hysteresis rule from the time spec: label changes
lasting less than MIN_BLOCK_DURATION_SECONDS (default 180 s /
3 minutes) are smoothed into the surrounding label.
Strategy for each short segment:
- If either neighbour has the same label, merge into that neighbour.
- Otherwise merge into the longer neighbour (prefer the preceding one on ties).
- First and last segments are never removed (but may absorb neighbours).
The function iterates until no more short segments can be merged, guaranteeing convergence because total segment count strictly decreases each pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| segments | Sequence[Segment] | Ordered, non-overlapping segments. | required |
| min_duration_seconds | int | Minimum block duration in seconds. | MIN_BLOCK_DURATION_SECONDS |
| bucket_seconds | int | Width of each time bucket in seconds. | DEFAULT_BUCKET_SECONDS |
Returns:
| Type | Description |
|---|---|
| list[Segment] | A new list of segments with short blocks absorbed. |
Source code in src/taskclf/infer/smooth.py