Time & Windowing Specification v1
Version: 1.1
Status: Stable
Last Updated: 2026-03-24
This document defines the canonical time semantics used across:
- Feature extraction
- Dataset construction
- Label projection
- Model training
- Inference
- Aggregation
- Time tracking summaries
All components MUST adhere to this specification.
1. Time Standard
1.1 Timestamp Format
All timestamps MUST be:
- UTC
- ISO 8601 format
- Stored as:
  - bucket_start_ts (inclusive)
  - bucket_end_ts (exclusive)
Example: bucket_start_ts = 2026-03-24T05:30:00+00:00, bucket_end_ts = 2026-03-24T05:31:00+00:00
Local time is for display only and must be derived from UTC using the user's stored timezone.
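The storage and display rules above can be sketched with the standard-library datetime module. The helper names (format_bucket, in_bucket, to_local_display) are hypothetical and are not part of taskclf. Note that datetime.isoformat() renders UTC as +00:00, which is one valid ISO 8601 spelling.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def format_bucket(bucket_start_ts: datetime, bucket_end_ts: datetime) -> dict[str, str]:
    """Serialize a bucket's bounds as ISO 8601 strings (inputs assumed UTC-aware)."""
    return {
        "bucket_start_ts": bucket_start_ts.isoformat(),
        "bucket_end_ts": bucket_end_ts.isoformat(),
    }


def in_bucket(ts: datetime, bucket_start_ts: datetime, bucket_end_ts: datetime) -> bool:
    """Half-open membership: start is inclusive, end is exclusive."""
    return bucket_start_ts <= ts < bucket_end_ts


def to_local_display(ts_utc: datetime, user_tz: tzinfo) -> str:
    """Derive a display-only local timestamp from a stored UTC value."""
    return ts_utc.astimezone(user_tz).isoformat()


start = datetime(2026, 3, 24, 5, 30, tzinfo=timezone.utc)
end = start + timedelta(minutes=1)
print(format_bucket(start, end))
# {'bucket_start_ts': '2026-03-24T05:30:00+00:00', 'bucket_end_ts': '2026-03-24T05:31:00+00:00'}
print(in_bucket(end, start, end))  # False: the end instant belongs to the next window
print(to_local_display(start, timezone(timedelta(hours=2))))  # 2026-03-24T07:30:00+02:00
```

In practice the user timezone would more likely be a zoneinfo.ZoneInfo instance, which also handles DST transitions. The fixed offset above only keeps the example self-contained.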
1.2 Internal Representation
All Python domain-model datetimes MUST be timezone-aware UTC
(tzinfo=datetime.timezone.utc).
The canonical normalizer is taskclf.core.time.ts_utc_aware_get():
- Naive datetimes are treated as UTC and tagged with timezone.utc.
- Aware non-UTC datetimes are converted to UTC.
- Aware UTC datetimes pass through unchanged.
Naive-UTC datetimes (tzinfo is None) MUST NOT be used in domain models.
Legacy naive inputs from external sources (CSV, API, CLI) are accepted at
ingestion boundaries and normalized immediately via ts_utc_aware_get().
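The three normalization rules fit in a few lines. This is a sketch of equivalent behavior, not the actual source of taskclf.core.time.ts_utc_aware_get(). The function name below is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ts_utc_aware_get_sketch(ts: datetime) -> datetime:
    """Hypothetical stand-in mirroring the documented rules of ts_utc_aware_get()."""
    if ts.tzinfo is None:
        # Naive input is assumed to already be UTC; only the tag is added.
        return ts.replace(tzinfo=timezone.utc)
    if ts.tzinfo is timezone.utc:
        # Aware UTC input passes through unchanged (the same object is returned).
        return ts
    # Aware non-UTC input is converted to the equivalent UTC instant.
    return ts.astimezone(timezone.utc)


naive = datetime(2026, 3, 24, 5, 30)
plus_two = datetime(2026, 3, 24, 7, 30, tzinfo=timezone(timedelta(hours=2)))
print(ts_utc_aware_get_sketch(naive))     # 2026-03-24 05:30:00+00:00
print(ts_utc_aware_get_sketch(plus_two))  # 2026-03-24 05:30:00+00:00
```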
Models that enforce this contract:
- FeatureRow.bucket_start_ts, FeatureRow.bucket_end_ts
- LabelSpan.start_ts, LabelSpan.end_ts
- LabelRequest.bucket_start_ts, LabelRequest.bucket_end_ts, LabelRequest.created_at
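One way a model can enforce this contract is to validate its timestamp fields when it is constructed. The sketch below uses a standard-library dataclass with hypothetical names (FeatureRowTimesSketch, _require_utc). The actual FeatureRow, LabelSpan, and LabelRequest models may use a different mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def _require_utc(field_name: str, ts: datetime) -> None:
    """Reject naive or non-UTC datetimes, per section 1.2."""
    if ts.tzinfo is None:
        raise ValueError(f"{field_name} is naive; normalize it at the ingestion boundary")
    if ts.tzinfo is not timezone.utc:
        raise ValueError(f"{field_name} must use tzinfo=timezone.utc, got {ts.tzinfo!r}")


@dataclass(frozen=True)
class FeatureRowTimesSketch:
    """Hypothetical model carrying only the two bucket timestamps."""

    bucket_start_ts: datetime
    bucket_end_ts: datetime

    def __post_init__(self) -> None:
        _require_utc("bucket_start_ts", self.bucket_start_ts)
        _require_utc("bucket_end_ts", self.bucket_end_ts)


start = datetime(2026, 3, 24, 5, 30, tzinfo=timezone.utc)
ok = FeatureRowTimesSketch(start, start + timedelta(minutes=1))
try:
    FeatureRowTimesSketch(datetime(2026, 3, 24, 5, 30), start + timedelta(minutes=1))
except ValueError as exc:
    print(exc)  # bucket_start_ts is naive; normalize it at the ingestion boundary
```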
2. Window Definition
2.1 Fixed Window Size
Window duration: 60 seconds (1 minute).
All feature aggregation operates over fixed, non-overlapping windows.
Windows are aligned to wall-clock boundaries.
Example:
05:30:00–05:31:00
05:31:00–05:32:00
Rolling windows are NOT allowed.
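Wall-clock alignment amounts to flooring a timestamp to a multiple of the window size since the Unix epoch. The sketch assumes a 60-second window, consistent with the example above. WINDOW and window_bounds are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=60)  # assumption: 1-minute windows, as in the example above
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_bounds(ts: datetime) -> tuple[datetime, datetime]:
    """Return the aligned [bucket_start_ts, bucket_end_ts) window containing ts.

    ts must be timezone-aware; the result is expressed in UTC.
    """
    # timedelta // timedelta is integer floor division, so no float rounding occurs.
    n = (ts - EPOCH) // WINDOW
    bucket_start_ts = EPOCH + n * WINDOW
    return bucket_start_ts, bucket_start_ts + WINDOW


start, end = window_bounds(datetime(2026, 3, 24, 5, 30, 42, 500000, tzinfo=timezone.utc))
print(start.time(), end.time())  # 05:30:00 05:31:00
```

A timestamp that falls exactly on a boundary, such as 05:31:00, starts the next window. This matches the exclusive bucket_end_ts.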
2.2 Window Identity
Each window is uniquely identified by:
- user_id
- bucket_start_ts
Optional:
- device_id
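In code, this identity can be a frozen (hashable) record, so duplicate windows collapse naturally in sets and dict keys. WindowKey is a hypothetical name. Typing user_id and device_id as str is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class WindowKey:
    """Hypothetical identity for one window; device_id is optional."""

    user_id: str
    bucket_start_ts: datetime
    device_id: Optional[str] = None


t = datetime(2026, 3, 24, 5, 30, tzinfo=timezone.utc)
keys = {WindowKey("u1", t), WindowKey("u1", t), WindowKey("u1", t, device_id="laptop")}
print(len(keys))  # 2
```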
3. Session Definition
A session is a contiguous period of activity separated by idle gaps.
3.1 Idle Threshold
If no interaction events are recorded for 5 minutes or more:
- The previous session ends
- A new session begins upon next activity
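The split rule can be sketched as follows, assuming event timestamps are already UTC-aware datetimes. segment_sessions and IDLE_THRESHOLD are hypothetical names. Sorting first keeps the result independent of input order, in line with the determinism requirements.

```python
from datetime import datetime, timedelta, timezone

IDLE_THRESHOLD = timedelta(minutes=5)  # from 3.1: a gap of 5 minutes or more ends a session


def segment_sessions(event_times: list[datetime]) -> list[tuple[datetime, datetime]]:
    """Group events into sessions; returns (first_event, last_event) per session."""
    if not event_times:
        return []
    times = sorted(event_times)
    sessions = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev >= IDLE_THRESHOLD:
            # The idle gap closes the previous session; t opens a new one.
            sessions.append((start, prev))
            start = t
        prev = t
    sessions.append((start, prev))
    return sessions


base = datetime(2026, 3, 24, 9, 0, tzinfo=timezone.utc)
events = [base + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]
print(len(segment_sessions(events)))  # 2: the gap from minute 2 to minute 7 is exactly 5 minutes
```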
3.2 session_length_so_far
Definition: minutes elapsed since the start of the current session, computed for each window.
If a session restarts, the value resets to 0.
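The spec does not pin down the reference instant, so the sketch below makes an explicit assumption. It measures session_length_so_far from the bucket_start_ts of the session's first window to the current window's bucket_start_ts. The input format (a starts_new_session flag per window, derived from the idle rule in 3.1) and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def session_length_so_far(windows: list[tuple[datetime, bool]]) -> list[float]:
    """Minutes since session start for each (bucket_start_ts, starts_new_session) window.

    Assumes windows are sorted by bucket_start_ts.
    """
    values = []
    session_start = None
    for bucket_start_ts, starts_new_session in windows:
        if session_start is None or starts_new_session:
            # A new session resets the counter to 0.
            session_start = bucket_start_ts
        values.append((bucket_start_ts - session_start).total_seconds() / 60)
    return values


t0 = datetime(2026, 3, 24, 9, 0, tzinfo=timezone.utc)
windows = [(t0 + timedelta(minutes=m), m in (0, 10)) for m in (0, 1, 2, 10, 11)]
print(session_length_so_far(windows))  # [0.0, 1.0, 2.0, 0.0, 1.0]
```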
4. Idle Detection
Idle detection is based on:
- No keyboard events
- No mouse movement
- No clicks
- No scroll
- No app switches
If a window contains no interaction events:
- It is marked as an idle candidate.
- If the idle gap is >= IDLE_GAP_THRESHOLD_SECONDS, the label becomes BreakIdle.
BreakIdle overrides the model prediction.
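The override can be sketched under several assumptions. Windows are contiguous and sorted, and each is 60 seconds long. The idle gap is the total duration of a run of consecutive zero-event windows. IDLE_GAP_THRESHOLD_SECONDS is taken as 300, matching the 5-minute threshold in 3.1. The label "Coding" and the function name are placeholders.

```python
WINDOW_SECONDS = 60               # assumption: 1-minute windows
IDLE_GAP_THRESHOLD_SECONDS = 300  # assumption: mirrors the 5-minute threshold in 3.1


def apply_break_idle(event_counts: list[int], predictions: list[str]) -> list[str]:
    """Replace predictions with BreakIdle for idle runs that reach the threshold."""
    if len(event_counts) != len(predictions):
        raise ValueError("event_counts and predictions must align window-for-window")
    labels = list(predictions)
    i, n = 0, len(event_counts)
    while i < n:
        if event_counts[i] != 0:
            i += 1
            continue
        # Extend the run of idle-candidate windows starting at i.
        j = i
        while j < n and event_counts[j] == 0:
            j += 1
        if (j - i) * WINDOW_SECONDS >= IDLE_GAP_THRESHOLD_SECONDS:
            # BreakIdle overrides whatever the model predicted.
            labels[i:j] = ["BreakIdle"] * (j - i)
        i = j
    return labels


counts = [3, 0, 0, 0, 0, 0, 2, 0, 0, 1]
result = apply_break_idle(counts, ["Coding"] * len(counts))
print(result)  # windows 1-5 become BreakIdle; the 2-window idle run at 7-8 does not
```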
5. Partial Windows
If activity starts mid-window:
- The window still exists.
- Features are computed from the available activity.
- Missing interaction counts are treated as zero.
Truncating partial windows is not allowed.
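Zero-filling for a partial window can be sketched as follows, assuming interaction events arrive as simple type strings. The event names in FEATURE_EVENTS are hypothetical and stand in for the interaction types listed in section 4.

```python
from collections import Counter

# Hypothetical event-type names for the interaction kinds listed in section 4.
FEATURE_EVENTS = ("key", "mouse_move", "click", "scroll", "app_switch")


def window_counts(events: list[str]) -> dict[str, int]:
    """Count interactions in one window; absent interaction types count as zero."""
    observed = Counter(events)
    # Counter returns 0 for missing keys, so every feature is always present.
    return {name: observed[name] for name in FEATURE_EVENTS}


# Activity began mid-window: only a few events were captured, but the window is kept.
print(window_counts(["key", "key", "click"]))
# {'key': 2, 'mouse_move': 0, 'click': 1, 'scroll': 0, 'app_switch': 0}
```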
6. Label Projection (Block → Window)
Manual labeling is done in time blocks.
Projection rules:
- A window is labeled if its entire duration falls within a labeled block.
- If a window overlaps multiple labeled blocks, drop it from the training set.
- If a window partially overlaps a block, drop it from the training set.
- Unlabeled windows are not used in supervised training.
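The projection rules reduce to an overlap count per window. The sketch treats both windows and blocks as half-open [start, end) intervals, so a block that merely touches a window's edge does not overlap it. The Block tuple layout and the project_label name are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

Block = tuple[datetime, datetime, str]  # (start_ts, end_ts, label), half-open


def project_label(win_start: datetime, win_end: datetime, blocks: list[Block]) -> Optional[str]:
    """Return the block label for a window, or None if the window is excluded."""
    overlapping = [b for b in blocks if b[0] < win_end and win_start < b[1]]
    if len(overlapping) != 1:
        # 0 overlaps: unlabeled; 2+ overlaps: ambiguous. Neither is used for training.
        return None
    start_ts, end_ts, label = overlapping[0]
    if start_ts <= win_start and win_end <= end_ts:
        return label  # the entire window lies inside a single labeled block
    return None  # partial overlap: dropped


def at(h: int, m: int, s: int = 0) -> datetime:
    """Helper for the example: a UTC time on an arbitrary fixed date."""
    return datetime(2026, 3, 24, h, m, s, tzinfo=timezone.utc)


blocks = [(at(9, 0), at(9, 30, 20), "Coding"), (at(9, 30, 20), at(10, 0), "Meeting")]
print(project_label(at(9, 29), at(9, 30), blocks))  # Coding
print(project_label(at(9, 30), at(9, 31), blocks))  # None: overlaps two blocks
print(project_label(at(10, 0), at(10, 1), blocks))  # None: unlabeled
```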
7. Aggregation Rules (Time Tracking)
7.1 Window → Block Merge
Adjacent windows with the same final label (after smoothing) must be merged into a single block.
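The merge step can be sketched as follows. It assumes windows arrive sorted by bucket_start_ts, and that "adjacent" means one window's bucket_end_ts equals the next window's bucket_start_ts. merge_windows is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

Span = tuple[datetime, datetime, str]  # (start_ts, end_ts, final_label)


def merge_windows(windows: list[Span]) -> list[Span]:
    """Merge runs of adjacent windows that share a final (post-smoothing) label."""
    blocks: list[Span] = []
    for start_ts, end_ts, label in windows:
        if blocks and blocks[-1][2] == label and blocks[-1][1] == start_ts:
            # Same label and no gap: extend the current block.
            blocks[-1] = (blocks[-1][0], end_ts, label)
        else:
            blocks.append((start_ts, end_ts, label))
    return blocks


t0 = datetime(2026, 3, 24, 9, 0, tzinfo=timezone.utc)
labels = ["Coding", "Coding", "Coding", "Meeting", "Meeting"]
windows = [(t0 + timedelta(minutes=i), t0 + timedelta(minutes=i + 1), lab) for i, lab in enumerate(labels)]
print([(b[0].strftime("%H:%M"), b[1].strftime("%H:%M"), b[2]) for b in merge_windows(windows)])
# [('09:00', '09:03', 'Coding'), ('09:03', '09:05', 'Meeting')]
```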
7.2 Minimum Block Duration
If a predicted label change lasts less than 3 minutes:
- It may be smoothed to the surrounding label.
- The exact smoothing strategy is defined in the inference layer.
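Because the exact strategy belongs to the inference layer, the following shows only one possible approach, not the project's own. A run shorter than 3 one-minute windows is relabeled when the runs on both sides share the same label. MIN_RUN_WINDOWS and smooth_short_runs are hypothetical names, and runs at the start or end of the sequence are left alone.

```python
MIN_RUN_WINDOWS = 3  # assumption: 3 minutes expressed as 1-minute windows


def smooth_short_runs(labels: list[str]) -> list[str]:
    """Relabel short runs that are sandwiched between two runs of the same label."""
    # Collapse the per-window labels into [label, length] runs.
    runs: list[list] = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    # Decide using the original labels so the result does not depend on scan order.
    original = [run[0] for run in runs]
    for i in range(1, len(runs) - 1):
        if runs[i][1] < MIN_RUN_WINDOWS and original[i - 1] == original[i + 1]:
            runs[i][0] = original[i - 1]
    # Expand the runs back to one label per window.
    return [label for label, length in runs for _ in range(length)]


smoothed = smooth_short_runs(["Coding"] * 4 + ["Chat"] * 2 + ["Coding"] * 4)
print(smoothed.count("Coding"))  # 10
```

Running the 7.1 merge afterwards turns the smoothed per-window labels into final blocks.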
8. Day Boundaries
Daily summaries use the user's local time.
Day boundary: local midnight (00:00 in the user's stored timezone).
Windows are grouped by local date after timezone conversion.
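Daily grouping can be sketched as follows, assuming each window is assigned to the local date of its bucket_start_ts. group_by_local_date is a hypothetical name. The fixed UTC-5 offset only keeps the example self-contained; a zoneinfo.ZoneInfo for the stored user timezone would also handle DST changes.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone, tzinfo


def group_by_local_date(bucket_starts: list[datetime], user_tz: tzinfo) -> dict[date, list[datetime]]:
    """Group UTC window starts by the user's local calendar date."""
    groups = defaultdict(list)
    for ts in sorted(bucket_starts):
        groups[ts.astimezone(user_tz).date()].append(ts)
    return dict(groups)


utc_minus_5 = timezone(timedelta(hours=-5))
starts = [
    datetime(2026, 3, 24, 3, 0, tzinfo=timezone.utc),  # 22:00 on 2026-03-23 locally
    datetime(2026, 3, 24, 6, 0, tzinfo=timezone.utc),  # 01:00 on 2026-03-24 locally
]
for day, windows in group_by_local_date(starts, utc_minus_5).items():
    print(day, len(windows))
# 2026-03-23 1
# 2026-03-24 1
```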
9. Feature Time Semantics
The following features depend on the window start time:
- hour_of_day
- day_of_week
They are computed from bucket_start_ts.
10. Determinism Requirements
Given identical raw event logs:
- Windowing MUST produce identical buckets
- Session segmentation MUST be deterministic
- Feature values MUST be reproducible
No randomness is allowed in the feature pipeline.
11. Versioning
Changing any of the following requires a version bump:
- Window size
- Idle threshold
- Session definition
- Block projection policy
- Minimum block duration