labels.weak_rules
Heuristic weak-labeling rules that map feature rows to task-type labels.
Overview
Weak rules provide an automated, low-confidence alternative to manual
labeling. Each rule inspects a single feature column and, when matched,
proposes a LabelSpan with
provenance="weak:<rule_name>". Rules are evaluated in list order;
the first match wins.
Weak labels share the same LabelSpan structure as gold labels.
The provenance field distinguishes them: gold labels use "manual",
while weak labels use the "weak:<rule_name>" convention. This allows
downstream consumers (training, projection) to filter or weight labels
by origin.
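Gold and weak labels differ only in provenance, so a consumer can separate or down-weight them by checking that field's prefix. The sketch below rests on two assumptions. First, it uses a hypothetical `Span` stand-in that carries only `label` and `provenance`, not the real LabelSpan. Second, the 0.5 weight for weak labels is illustrative and is not taken from taskclf.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for LabelSpan; only the fields used here."""
    label: str
    provenance: str


def is_weak(span: Span) -> bool:
    # Weak labels follow the "weak:<rule_name>" provenance convention.
    return span.provenance.startswith("weak:")


def sample_weight(span: Span, weak_weight: float = 0.5) -> float:
    # Illustrative weighting: gold ("manual") labels count fully,
    # weak labels are down-weighted by an assumed factor.
    return weak_weight if is_weak(span) else 1.0


spans = [
    Span("Build", "manual"),
    Span("Write", "weak:figma"),
]
gold = [s for s in spans if not is_weak(s)]
weights = [sample_weight(s) for s in spans]
```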
WeakRule
Frozen dataclass representing a single heuristic rule.
| Field | Type | Description |
|---|---|---|
| name | str | Human-readable identifier, also used in provenance |
| field | str | Feature column to inspect (e.g. "app_id") |
| pattern | str | Value the column must equal for the rule to fire |
| label | str | Task-type label to assign (must be in LABEL_SET_V1) |
| confidence | float \| None | Optional confidence attached to produced spans |
Construction raises ValueError if label is not in LABEL_SET_V1.
Built-in rule maps
The module provides three dictionaries that feed build_default_rules().
They are ordered by specificity when assembled into the default list.
APP_ID_RULES (highest priority)
Maps reverse-domain app_id values to labels.
| app_id | Label |
|---|---|
| com.apple.Terminal | Build |
| com.microsoft.VSCode | Build |
| com.jetbrains.intellij | Build |
| com.googlecode.iterm2 | Build |
| org.mozilla.firefox | ReadResearch |
| com.google.Chrome | ReadResearch |
| com.apple.Safari | ReadResearch |
| com.apple.mail | Communicate |
| com.tinyspeck.slackmacgap | Communicate |
| us.zoom.xos | Meet |
| com.apple.Notes | Write |
| com.apple.finder | BreakIdle |
APP_CATEGORY_RULES
Maps app_category values to labels.
| app_category | Label |
|---|---|
| lockscreen | BreakIdle |
| meeting | Meet |
| chat | Communicate |
| email | Communicate |
| editor | Build |
| terminal | Build |
| devtools | Debug |
| docs | Write |
| design | Write |
| media | BreakIdle |
| file_manager | BreakIdle |
The lockscreen category covers OS lock/login screens (macOS loginwindow,
Windows LockApp.exe/LogonUI.exe, Linux screen lockers like i3lock,
swaylock, gnome-screensaver, etc.). No productive task is possible while
the screen is locked, so these are unconditionally labeled BreakIdle.
DOMAIN_CATEGORY_RULES (lowest priority)
Maps browser domain_category values to labels.
| domain_category | Label |
|---|---|
| code_hosting | Build |
| email_web | Communicate |
| chat | Communicate |
| social | BreakIdle |
| video | BreakIdle |
| news | ReadResearch |
| docs | ReadResearch |
| search | ReadResearch |
| productivity | Write |
Functions
build_default_rules
Builds the default rule list from the three built-in maps, ordered by
specificity: APP_ID_RULES first, then APP_CATEGORY_RULES, then
DOMAIN_CATEGORY_RULES.
match_rule
Matches a single feature row (as a dict) against an ordered list of
rules. Returns (label, rule_name) on the first match, or None
if no rule fires.
apply_weak_rules
```python
def apply_weak_rules(
    features_df: pd.DataFrame,
    rules: Sequence[WeakRule] | None = None,
    user_id: str | None = None,
    bucket_seconds: int = DEFAULT_BUCKET_SECONDS,
) -> list[LabelSpan]: ...
```
Applies rules to every row in features_df. Consecutive buckets
(by bucket_start_ts) that receive the same label are merged
into a single LabelSpan. A new span starts when the label changes
or there is a time gap between buckets.
| Parameter | Default | Description |
|---|---|---|
| features_df | required | DataFrame with bucket_start_ts, bucket_end_ts, and feature columns |
| rules | None | Rule list; defaults to build_default_rules() |
| user_id | None | User id attached to every produced span |
| bucket_seconds | DEFAULT_BUCKET_SECONDS (60) | Expected bucket duration for gap detection |
Usage
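```python
import pandas as pd
from taskclf.labels.weak_rules import apply_weak_rules, build_default_rules

# Feature frame with bucket_start_ts, bucket_end_ts and feature columns
# (the file path is a placeholder).
features_df = pd.read_parquet("features.parquet")
rules = build_default_rules()
weak_spans = apply_weak_rules(features_df, rules=rules, user_id="u1")
for span in weak_spans:
    print(f"{span.start_ts} → {span.end_ts} {span.label} ({span.provenance})")
```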
Custom rules can be mixed with or replace the defaults:
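```python
from taskclf.labels.weak_rules import WeakRule, apply_weak_rules, build_default_rules

custom_rules = [
    WeakRule(name="figma", field="app_id", pattern="com.figma.Desktop", label="Write"),
]
# Replace the defaults entirely:
spans = apply_weak_rules(features_df, rules=custom_rules)
# Or mix: put custom rules first so they take priority (first match wins).
spans = apply_weak_rules(features_df, rules=custom_rules + build_default_rules())
```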
See also
- labels.store: persisting label spans
- labels.projection: projecting spans onto feature windows
- core.types.LabelSpan: the shared label data model
taskclf.labels.weak_rules
Weak rules provide an automated, low-confidence alternative to manual
labeling. Each rule inspects a single feature column (e.g. app_id,
app_category, domain_category) and, when matched, proposes a
LabelSpan (taskclf.core.types.LabelSpan) with provenance="weak:<rule_name>".
Rules are evaluated in list order; the first match wins. The default
rule list is ordered by specificity: app_id rules first, then
app_category, then domain_category.
WeakRule (dataclass)
A single heuristic labeling rule.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Human-readable identifier (also used in provenance). |
| field | str | Feature column to inspect (e.g. app_id). |
| pattern | str | Value that the column must equal for the rule to fire. |
| label | str | Task-type label to assign (must be in LABEL_SET_V1). |
| confidence | float \| None | Optional confidence score attached to produced spans. |
Source code in src/taskclf/labels/weak_rules.py
build_default_rules()
Build the default rule list ordered by specificity.
Order: app_id rules, then app_category, then
domain_category. Within each group the iteration order of the
corresponding dict is preserved.
Returns:
| Type | Description |
|---|---|
| list[WeakRule] | List of WeakRule objects, ordered by specificity. |
match_rule(row, rules)
Match a single feature row against rules (first match wins).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| row | dict[str, Any] | Feature row as a dict (column name -> value). | required |
| rules | Sequence[WeakRule] | Ordered sequence of rules to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| tuple[str, str] \| None | (label, rule_name) for the first matching rule, or None if no rule fires. |
apply_weak_rules(features_df, rules=None, user_id=None, bucket_seconds=DEFAULT_BUCKET_SECONDS)
Apply weak rules to every row in features_df and merge spans.
Consecutive buckets (ordered by bucket_start_ts) that receive
the same label are merged into a single LabelSpan.
A new span starts whenever the label changes or there is a gap
between buckets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features_df | DataFrame | DataFrame with at least bucket_start_ts, bucket_end_ts, and feature columns. | required |
| rules | Sequence[WeakRule] \| None | Rule list to apply. Defaults to build_default_rules(). | None |
| user_id | str \| None | Optional user id attached to every produced span. | None |
| bucket_seconds | int | Expected bucket duration; used for gap detection. | DEFAULT_BUCKET_SECONDS |
Returns:
| Type | Description |
|---|---|
| list[LabelSpan] | List of merged LabelSpan objects. |