labels.store
Label span I/O, validation, and synthetic label generation.
Key functions
| Function | Description |
|---|---|
| `write_label_spans` | Serialize spans to parquet |
| `read_label_spans` | Deserialize spans from parquet (converts NaN values to None for nullable fields) |
| `import_labels_from_csv` | Read spans from CSV (supports optional `user_id` and `confidence` columns) |
| `append_label_span` | Append a single span to an existing parquet file with overlap validation. If the previous same-user label has `extend_forward=True`, its `end_ts` is stretched to the new span's `start_ts` for contiguous coverage. Before overlap checks, same-user boundaries within 1 ms are snapped together so JavaScript millisecond precision does not create false microsecond overlaps. Accepts `allow_overlap=True` to skip the overlap check and permit multiple labels on the same time range |
| `overwrite_label_span` | Append a span, resolving overlaps by truncating, splitting, or removing conflicting same-user spans. For `extend_forward` labels, overlap resolution uses the label's effective coverage through the next same-user label (or open-endedly when none exists), so retrospective inserts can split a running label into before/after fragments and preserve the resumed active fragment |
| `update_label_span` | Change the label, timestamps, and optionally `extend_forward` on an existing span identified by its `start_ts` and `end_ts`; validates the resulting span against `LABEL_SET_V1` |
| `delete_label_span` | Remove a label span identified by its `start_ts` and `end_ts` |
| `generate_label_summary` | Summarise features in a time range (top apps, input rates, session count) |
| `generate_dummy_labels` | Create synthetic spans for testing |
taskclf.labels.store
Label span I/O, validation, export, and synthetic label generation.
write_label_spans(spans, path)
Serialize spans to a parquet file at path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `spans` | `Sequence[LabelSpan]` | Label span instances to persist. | required |
| `path` | `Path` | Destination parquet file path. | required |

Returns:
| Type | Description |
|---|---|
| `Path` | The path that was written. |
Source code in src/taskclf/labels/store.py
read_label_spans(path)
Deserialize label spans from a parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to an existing parquet file. | required |

Returns:
| Type | Description |
|---|---|
| `list[LabelSpan]` | List of validated label spans. |
Source code in src/taskclf/labels/store.py
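The Key functions table notes that reading converts NaN values to None for nullable fields. The following is a minimal sketch of that normalization on a single row. It assumes `user_id` and `confidence` are the nullable fields (an assumption based on the optional CSV columns) and uses a plain dict in place of a parquet row; `nan_to_none` and `NULLABLE_FIELDS` are hypothetical names, not part of taskclf.

```python
import math

# Assumed nullable fields; the real list is not shown on this page.
NULLABLE_FIELDS = ("user_id", "confidence")


def nan_to_none(record: dict) -> dict:
    """Return a copy of record with NaN in nullable fields replaced by None."""
    cleaned = dict(record)
    for field in NULLABLE_FIELDS:
        value = cleaned.get(field)
        # Parquet round-trips often surface missing values as float NaN.
        if isinstance(value, float) and math.isnan(value):
            cleaned[field] = None
    return cleaned
```

The input record is left untouched, which keeps the helper safe to apply row by row.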
import_labels_from_csv(path)
Read label spans from a CSV file and validate each row.
Required columns: start_ts, end_ts, label, provenance.
Optional columns: user_id, confidence.
Timestamps are parsed via pd.to_datetime so ISO-8601 and common
date-time formats are accepted.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to an existing CSV file. | required |

Returns:
| Type | Description |
|---|---|
| `list[LabelSpan]` | List of validated label spans. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If required columns are missing or any row fails validation. |
Source code in src/taskclf/labels/store.py
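The required-column check and per-row validation described for `import_labels_from_csv` can be sketched with the standard library. This is an illustration under stated assumptions, not the taskclf implementation: `parse_label_rows` and `REQUIRED_COLUMNS` are hypothetical names, rows come back as plain dicts rather than `LabelSpan` instances, `datetime.fromisoformat` stands in for `pd.to_datetime` (so only ISO-8601 strings are accepted), and the "end after start" rule is an assumed validation.

```python
import csv
import io
from datetime import datetime

# Column names come from the description above; the constant name is hypothetical.
REQUIRED_COLUMNS = {"start_ts", "end_ts", "label", "provenance"}


def parse_label_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into span-like dicts, raising ValueError on bad input."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        start = datetime.fromisoformat(raw["start_ts"])  # ValueError if malformed
        end = datetime.fromisoformat(raw["end_ts"])
        if end <= start:  # assumed validation rule
            raise ValueError(f"line {line_no}: end_ts must be after start_ts")
        confidence = raw.get("confidence")
        rows.append({
            "start_ts": start,
            "end_ts": end,
            "label": raw["label"],
            "provenance": raw["provenance"],
            # Optional columns default to None when absent or empty.
            "user_id": raw.get("user_id") or None,
            "confidence": float(confidence) if confidence else None,
        })
    return rows
```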
export_labels_to_csv(parquet_path, csv_path)
Export label spans from a Parquet file to CSV.
Columns written: start_ts, end_ts, label, provenance,
user_id, confidence, extend_forward.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `parquet_path` | `Path` | Path to an existing labels Parquet file. | required |
| `csv_path` | `Path` | Destination CSV file path. | required |

Returns:
| Type | Description |
|---|---|
| `Path` | The `csv_path` that was written. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If the Parquet file does not exist or contains no spans. |
Source code in src/taskclf/labels/store.py
merge_label_spans(existing, imported)
Merge imported spans into existing, deduplicating and checking overlaps.
Deduplication key is (start_ts, end_ts, user_id); when a
collision occurs the imported span wins (newer provenance).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `existing` | `Sequence[LabelSpan]` | Currently stored label spans. | required |
| `imported` | `Sequence[LabelSpan]` | Newly imported label spans. | required |

Returns:
| Type | Description |
|---|---|
| `list[LabelSpan]` | Merged list of label spans. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If the merged set contains overlapping spans for the same user. |
Source code in src/taskclf/labels/store.py
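The merge rule for `merge_label_spans` (deduplicate on `(start_ts, end_ts, user_id)`, let the imported span win, then reject same-user overlaps) can be sketched as follows. `Span` is a hypothetical stand-in for `LabelSpan` and `merge_spans` is an illustrative name. The sketch also assumes spans are half-open, so two spans that merely touch at a boundary do not count as overlapping.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for LabelSpan
    start_ts: datetime
    end_ts: datetime
    label: str
    user_id: Optional[str] = None


def merge_spans(existing, imported):
    """Merge two span lists; imported spans win on key collisions."""
    by_key = {}
    for span in list(existing) + list(imported):
        # Later (imported) entries overwrite earlier ones with the same key.
        by_key[(span.start_ts, span.end_ts, span.user_id)] = span

    merged = sorted(
        by_key.values(),
        key=lambda s: (s.user_id is not None, s.user_id or "", s.start_ts),
    )
    # Once sorted per user by start time, any overlap appears between neighbours.
    for prev, cur in zip(merged, merged[1:]):
        if prev.user_id == cur.user_id and cur.start_ts < prev.end_ts:
            raise ValueError(f"overlapping spans for user {cur.user_id!r}")
    return merged
```

Checking only adjacent pairs is sufficient here because the list is sorted by start time within each user.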
append_label_span(span, path, *, allow_overlap=False)
Append a single label span to an existing (or new) parquet file.
If the most recent same-user label has extend_forward=True, its
end_ts is stretched (or truncated) to span.start_ts so labels
form contiguous coverage. Otherwise a plain overlap check is performed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `span` | `LabelSpan` | The label span to append. | required |
| `path` | `Path` | Parquet file to read-append-write. | required |
| `allow_overlap` | `bool` | When `True`, skip the overlap check and allow multiple labels to coexist on overlapping time ranges. | `False` |

Returns:
| Type | Description |
|---|---|
| `Path` | The path that was written. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If the span overlaps an existing span for the same user (after any extension) and `allow_overlap` is `False`. |
Source code in src/taskclf/labels/store.py
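The append rules for `append_label_span` can be illustrated on an in-memory list. This sketch makes several assumptions that the page does not confirm: `Span` stands in for `LabelSpan`, "most recent" means the same-user span with the latest start before the new span, snapping moves only the previous span's `end_ts`, and spans are half-open. `append_span` and `SNAP_TOLERANCE` are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

SNAP_TOLERANCE = timedelta(milliseconds=1)  # 1 ms, per the Key functions table


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for LabelSpan
    start_ts: datetime
    end_ts: datetime
    label: str
    user_id: Optional[str] = None
    extend_forward: bool = False


def append_span(spans, new, allow_overlap=False):
    """Return a new list with `new` appended, following the documented rules."""
    result = list(spans)
    earlier = [s for s in result
               if s.user_id == new.user_id and s.start_ts < new.start_ts]
    if earlier:
        prev = max(earlier, key=lambda s: s.start_ts)
        idx = result.index(prev)
        if prev.extend_forward:
            # Stretch or truncate the running label to end where the new one starts.
            result[idx] = replace(prev, end_ts=new.start_ts)
        elif abs(prev.end_ts - new.start_ts) <= SNAP_TOLERANCE:
            # Snap near-identical boundaries so sub-millisecond noise is not an overlap.
            result[idx] = replace(prev, end_ts=new.start_ts)
    if not allow_overlap:
        for s in result:
            if (s.user_id == new.user_id
                    and s.start_ts < new.end_ts and new.start_ts < s.end_ts):
                raise ValueError("span overlaps an existing span for the same user")
    result.append(new)
    return result
```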
overwrite_label_span(span, path)
Append a label span, resolving overlaps by truncating/splitting/removing existing same-user spans that conflict with the new one.
Extend-forward handling is identical to `append_label_span`.
For each same-user span that overlaps `span`:
- Fully contained (existing inside new) -- removed.
- Partial overlap at start (existing starts before new, ends inside new) -- existing `end_ts` truncated to `span.start_ts`.
- Partial overlap at end (existing starts inside new, ends after new) -- existing `start_ts` moved to `span.end_ts`.
- Fully contains (existing wraps around new) -- split into a before and after fragment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `span` | `LabelSpan` | The label span to insert. | required |
| `path` | `Path` | Parquet file to read-modify-write. | required |

Returns:
| Type | Description |
|---|---|
| `Path` | The path that was written. |
Source code in src/taskclf/labels/store.py
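The four overlap cases listed for `overwrite_label_span` can be sketched on an in-memory list. This is an illustration only: `Span` is a hypothetical stand-in for `LabelSpan`, `resolve_overlaps` is an invented name, extend-forward handling is left out, and spans are assumed to be half-open.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for LabelSpan
    start_ts: datetime
    end_ts: datetime
    label: str
    user_id: Optional[str] = None


def resolve_overlaps(existing, new):
    """Trim, split, or drop same-user spans that conflict with `new`, then add it."""
    kept = []
    for s in existing:
        overlaps = (s.user_id == new.user_id
                    and s.start_ts < new.end_ts and new.start_ts < s.end_ts)
        if not overlaps:
            kept.append(s)
            continue
        starts_before = s.start_ts < new.start_ts
        ends_after = s.end_ts > new.end_ts
        if starts_before and ends_after:
            # Existing wraps around new: split into before and after fragments.
            kept.append(replace(s, end_ts=new.start_ts))
            kept.append(replace(s, start_ts=new.end_ts))
        elif starts_before:
            # Partial overlap at start: truncate the existing end.
            kept.append(replace(s, end_ts=new.start_ts))
        elif ends_after:
            # Partial overlap at end: move the existing start.
            kept.append(replace(s, start_ts=new.end_ts))
        # Otherwise the existing span lies inside the new one and is dropped.
    kept.append(new)
    return sorted(kept, key=lambda s: s.start_ts)
```

Spans belonging to other users pass through unchanged, since only same-user conflicts are resolved.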
update_label_span(start_ts, end_ts, new_label, path, *, new_start_ts=None, new_end_ts=None, new_extend_forward=None)
Change the label and/or timestamps on an existing span.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_ts` | `datetime` | Original start timestamp of the span (lookup key). | required |
| `end_ts` | `datetime` | Original end timestamp of the span (lookup key). | required |
| `new_label` | `str` | Replacement label (must be in `LABEL_SET_V1`). | required |
| `path` | `Path` | Parquet file containing label spans. | required |
| `new_start_ts` | `datetime \| None` | If provided, replaces the span's start timestamp. | `None` |
| `new_end_ts` | `datetime \| None` | If provided, replaces the span's end timestamp. | `None` |
| `new_extend_forward` | `bool \| None` | If provided, replaces the span's `extend_forward` flag. | `None` |

Returns:
| Type | Description |
|---|---|
| `LabelSpan` | The updated label span. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If no matching span is found or the new label is invalid. |
Source code in src/taskclf/labels/store.py
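The lookup-by-timestamps behaviour of `update_label_span` can be sketched on an in-memory list. Everything here is illustrative: `Span` stands in for `LabelSpan`, `update_span` is a hypothetical name, and `ALLOWED_LABELS` is an invented placeholder because the contents of `LABEL_SET_V1` are not shown on this page.

```python
from dataclasses import dataclass, replace
from datetime import datetime

# Hypothetical label set; the real LABEL_SET_V1 values are not listed here.
ALLOWED_LABELS = {"coding", "meeting"}


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for LabelSpan
    start_ts: datetime
    end_ts: datetime
    label: str
    extend_forward: bool = False


def update_span(spans, start_ts, end_ts, new_label, *,
                new_start_ts=None, new_end_ts=None, new_extend_forward=None):
    """Update the span matching (start_ts, end_ts) in place and return it."""
    if new_label not in ALLOWED_LABELS:
        raise ValueError(f"invalid label: {new_label!r}")
    for i, s in enumerate(spans):
        if s.start_ts == start_ts and s.end_ts == end_ts:
            spans[i] = replace(
                s,
                label=new_label,
                # None means "keep the current value" for the optional overrides.
                start_ts=s.start_ts if new_start_ts is None else new_start_ts,
                end_ts=s.end_ts if new_end_ts is None else new_end_ts,
                extend_forward=(s.extend_forward if new_extend_forward is None
                                else new_extend_forward),
            )
            return spans[i]
    raise ValueError("no matching span found")
```

Comparing against `None` rather than truthiness matters for `new_extend_forward`, where `False` is a meaningful override.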
delete_label_span(start_ts, end_ts, path)
Remove a label span identified by its timestamps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `start_ts` | `datetime` | Start timestamp of the span to remove. | required |
| `end_ts` | `datetime` | End timestamp of the span to remove. | required |
| `path` | `Path` | Parquet file containing label spans. | required |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If no matching span is found. |
Source code in src/taskclf/labels/store.py
generate_label_summary(features_df, start_ts, end_ts)
Summarise feature rows within a time range for display in CLI / UI.
Returns a dict with top apps, aggregated interaction stats, and session count. Respects privacy: no raw titles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `features_df` | `DataFrame` | Feature DataFrame to summarise. | required |
| `start_ts` | `datetime` | Start of summary window (inclusive). | required |
| `end_ts` | `datetime` | End of summary window (exclusive). | required |

Returns:
| Type | Description |
|---|---|
| `dict` | Dict with top apps, aggregated interaction stats, and session count. |
Source code in src/taskclf/labels/store.py
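The half-open window (`start_ts` inclusive, `end_ts` exclusive) used by `generate_label_summary` can be illustrated with plain dicts in place of DataFrame rows. The field names `ts` and `app`, the output keys, and the function name `summarise_window` are all hypothetical, since the real column and key names are not shown on this page. Only the top-apps part of the summary is sketched.

```python
from collections import Counter
from datetime import datetime


def summarise_window(rows, start_ts, end_ts, top_n=3):
    """Summarise rows whose timestamp falls in [start_ts, end_ts)."""
    # Inclusive start, exclusive end, matching the parameter descriptions.
    in_window = [r for r in rows if start_ts <= r["ts"] < end_ts]
    # Only app identifiers are counted; no window titles are touched.
    app_counts = Counter(r["app"] for r in in_window)
    return {
        "top_apps": app_counts.most_common(top_n),
        "n_rows": len(in_window),
    }
```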
generate_dummy_labels(date, n_rows=DEFAULT_DUMMY_ROWS)
Create synthetic label spans aligned to the dummy feature timestamps.
Each span covers exactly one minute bucket, mirroring the timestamps
generated by features.build.generate_dummy_features so that every
feature row has a covering label.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `date` | `date` | Calendar date to generate spans for. | required |
| `n_rows` | `int` | Number of one-minute spans to create. | `DEFAULT_DUMMY_ROWS` |

Returns:
| Type | Description |
|---|---|
| `list[LabelSpan]` | List of synthetic label spans. |
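The one-minute bucket layout described for `generate_dummy_labels` can be sketched as a list of contiguous `(start_ts, end_ts)` pairs. The function name `one_minute_spans` and the 09:00 starting time are assumptions for illustration; the real generator mirrors whatever timestamps `features.build.generate_dummy_features` produces, and returns `LabelSpan` instances with labels attached.

```python
from datetime import date, datetime, time, timedelta


def one_minute_spans(day: date, n_rows: int, first_minute: time = time(9, 0)):
    """Return n_rows contiguous (start_ts, end_ts) pairs, one per minute bucket."""
    start = datetime.combine(day, first_minute)  # assumed starting time
    return [
        (start + timedelta(minutes=i), start + timedelta(minutes=i + 1))
        for i in range(n_rows)
    ]
```

Because each span ends exactly where the next begins, every minute-aligned feature row in the range falls inside one span.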