infer.baseline

Rule-based baseline classifier (no ML).

Applies heuristic rules to feature windows in priority order; the first rule that matches decides the label:

Rule 0 (BreakIdle): lock screen or OS login window in the foreground (`app_category == "lockscreen"`)
Rule 1 (BreakIdle): near-zero activity or a long idle run
Rule 2 (ReadResearch): browser in the foreground with high scrolling and low typing
Rule 3 (Build): editor or terminal in the foreground with high typing and frequent shortcuts
Rule 4 (Mixed/Unknown): fallback reject label

Rule 0 fires when the foreground app is an OS lock screen (macOS `loginwindow`, Windows `LockApp.exe`, or a Linux screen locker). No productive task is possible while the screen is locked, so this rule overrides all other signals unconditionally.

Because it needs no training data, this baseline establishes the cold-start performance floor that ML models must beat.
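The priority order can be sketched as an ordered list of `(label, predicate)` pairs in which the first match wins. This is a standalone illustration, not the `taskclf` implementation: the helper names (`num`, `RULES`, `classify`), the plain string labels standing in for `CoreLabel` and `MIXED_UNKNOWN`, and the simplified missing-value handling (absent numbers read as `0.0`) are assumptions. The threshold values are the defaults listed under Thresholds.

```python
from typing import Any, Callable, Mapping

# Default thresholds copied from the Thresholds table (illustrative only).
IDLE_ACTIVE_S = 5.0   # active_seconds_any below this -> idle
IDLE_RUN_S = 50.0     # max_idle_run_seconds above this -> idle
SCROLL_HIGH = 3.0     # scroll events/min
KEYS_LOW = 10.0       # keys/min
KEYS_HIGH = 30.0      # keys/min
SHORTCUT_HIGH = 1.0   # shortcuts/min

# Hypothetical string stand-in for the MIXED_UNKNOWN reject label.
FALLBACK = "Mixed/Unknown"

Row = Mapping[str, Any]


def num(row: Row, key: str) -> float:
    """Read a numeric feature, treating a missing value as 0.0 (simplified)."""
    value = row.get(key)
    return 0.0 if value is None else float(value)


# List position encodes rule priority: earlier entries win.
RULES: list[tuple[str, Callable[[Row], bool]]] = [
    # Rule 0: a lock screen overrides every other signal.
    ("BreakIdle", lambda r: r.get("app_category") == "lockscreen"),
    # Rule 1: too little activity, or one long idle stretch.
    ("BreakIdle", lambda r: num(r, "active_seconds_any") < IDLE_ACTIVE_S
        or num(r, "max_idle_run_seconds") > IDLE_RUN_S),
    # Rule 2: scrolling in a browser without much typing.
    ("ReadResearch", lambda r: bool(r.get("is_browser"))
        and num(r, "scroll_events_per_min") > SCROLL_HIGH
        and num(r, "keys_per_min") < KEYS_LOW),
    # Rule 3: heavy typing plus shortcuts in an editor or terminal.
    ("Build", lambda r: (bool(r.get("is_editor")) or bool(r.get("is_terminal")))
        and num(r, "keys_per_min") > KEYS_HIGH
        and num(r, "shortcut_rate") > SHORTCUT_HIGH),
]


def classify(row: Row) -> str:
    """Return the label of the first matching rule, else the fallback (Rule 4)."""
    for label, predicate in RULES:
        if predicate(row):
            return label
    return FALLBACK
```

Putting the lock-screen check first is what lets it override heavy typing in an editor, and putting Rule 1 before Rule 2 is why an idle browser window with a high scroll rate still comes out as BreakIdle.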

Thresholds

All thresholds are defined in `taskclf.core.defaults` and can be overridden per call through the matching keyword argument of each function:

Constant	Keyword argument	Default	Used by
`BASELINE_IDLE_ACTIVE_THRESHOLD`	`idle_active_threshold`	5.0 s	BreakIdle rule
`BASELINE_IDLE_RUN_THRESHOLD`	`idle_run_threshold`	50.0 s	BreakIdle rule
`BASELINE_SCROLL_HIGH`	`scroll_high`	3.0 events/min	ReadResearch rule
`BASELINE_KEYS_LOW`	`keys_low`	10.0 keys/min	ReadResearch rule
`BASELINE_KEYS_HIGH`	`keys_high`	30.0 keys/min	Build rule
`BASELINE_SHORTCUT_HIGH`	`shortcut_high`	1.0 shortcuts/min	Build rule
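The defaults appear as keyword-argument defaults in the function signatures on this page, and Python evaluates such defaults once, when the `def` statement runs (that is, at import time). If that holds for `taskclf.infer.baseline` as the signatures suggest, reassigning a constant in `taskclf.core.defaults` at runtime would not change the thresholds a function uses, while passing the keyword argument would. The sketch below demonstrates that language behavior with a hypothetical `defaults` namespace and a hypothetical `is_high_typing` function.

```python
import types

# Hypothetical stand-in for a defaults module such as taskclf.core.defaults.
defaults = types.SimpleNamespace(BASELINE_KEYS_HIGH=30.0)


def is_high_typing(keys_per_min: float, *, keys_high: float = defaults.BASELINE_KEYS_HIGH) -> bool:
    # The default for keys_high was evaluated once, when this `def` ran.
    return keys_per_min > keys_high


# Patching the constant afterwards does not reach the already-bound default.
defaults.BASELINE_KEYS_HIGH = 20.0

print(is_high_typing(25.0))                  # False: default is still 30.0
print(is_high_typing(25.0, keys_high=20.0))  # True: explicit per-call override
print(is_high_typing.__kwdefaults__)         # {'keys_high': 30.0}
```

Passing thresholds explicitly per call keeps each run's configuration visible at the call site instead of depending on global state.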

`taskclf.infer.baseline`

`classify_single_row(row, *, idle_active_threshold=BASELINE_IDLE_ACTIVE_THRESHOLD, idle_run_threshold=BASELINE_IDLE_RUN_THRESHOLD, scroll_high=BASELINE_SCROLL_HIGH, keys_low=BASELINE_KEYS_LOW, keys_high=BASELINE_KEYS_HIGH, shortcut_high=BASELINE_SHORTCUT_HIGH)`

Classify a single feature row using heuristic rules.

Rules are applied in strict priority order so that BreakIdle always overrides other signals (e.g. an idle browser window is BreakIdle, not ReadResearch).

Parameters:

Name	Type	Description	Default
`row`	`Series`	A pandas Series with feature columns from `features_v1`.	required
`idle_active_threshold`	`float`	A window is idle if `active_seconds_any` is strictly below this value.	`BASELINE_IDLE_ACTIVE_THRESHOLD`
`idle_run_threshold`	`float`	A window is idle if `max_idle_run_seconds` is strictly above this value.	`BASELINE_IDLE_RUN_THRESHOLD`
`scroll_high`	`float`	"High scroll" requires `scroll_events_per_min` strictly above this value.	`BASELINE_SCROLL_HIGH`
`keys_low`	`float`	"Low typing" requires `keys_per_min` strictly below this value.	`BASELINE_KEYS_LOW`
`keys_high`	`float`	"High typing" requires `keys_per_min` strictly above this value.	`BASELINE_KEYS_HIGH`
`shortcut_high`	`float`	"High shortcuts" requires `shortcut_rate` strictly above this value.	`BASELINE_SHORTCUT_HIGH`

Returns:

Type	Description
`str`	A label string: one of the `CoreLabel` values or `MIXED_UNKNOWN`.

Source code in src/taskclf/infer/baseline.py

```python
def classify_single_row(
    row: pd.Series,
    *,
    idle_active_threshold: float = BASELINE_IDLE_ACTIVE_THRESHOLD,
    idle_run_threshold: float = BASELINE_IDLE_RUN_THRESHOLD,
    scroll_high: float = BASELINE_SCROLL_HIGH,
    keys_low: float = BASELINE_KEYS_LOW,
    keys_high: float = BASELINE_KEYS_HIGH,
    shortcut_high: float = BASELINE_SHORTCUT_HIGH,
) -> str:
    """Classify a single feature row using heuristic rules.

    Rules are applied in strict priority order so that BreakIdle always
    overrides other signals (e.g. an idle browser window is BreakIdle,
    not ReadResearch).

    Args:
        row: A pandas Series with feature columns from ``features_v1``.
        idle_active_threshold: Max ``active_seconds_any`` to be considered idle.
        idle_run_threshold: Min ``max_idle_run_seconds`` to be considered idle.
        scroll_high: Min ``scroll_events_per_min`` for "high scroll".
        keys_low: Max ``keys_per_min`` for "low typing".
        keys_high: Min ``keys_per_min`` for "high typing".
        shortcut_high: Min ``shortcut_rate`` for "high shortcuts".

    Returns:
        A label string: one of the :class:`CoreLabel` values or
        :data:`MIXED_UNKNOWN`.
    """
    # Rule 0: BreakIdle — lockscreen / login window is unconditionally idle
    if row.get("app_category") == "lockscreen":
        return CoreLabel.BreakIdle

    active_any = row.get("active_seconds_any")
    max_idle = row.get("max_idle_run_seconds")

    # Rule 1: BreakIdle — near-zero activity or dominant idle run
    active_val = _safe_float(active_any, default=-1.0)
    idle_val = _safe_float(max_idle, default=0.0)

    if active_val < 0:
        # null active_seconds_any → no collector data → treat as idle
        return CoreLabel.BreakIdle
    if active_val < idle_active_threshold:
        return CoreLabel.BreakIdle
    if idle_val > idle_run_threshold:
        return CoreLabel.BreakIdle

    # Rule 2: ReadResearch — browser + high scroll + low keys
    is_browser = bool(row.get("is_browser", False))
    scroll = _safe_float(row.get("scroll_events_per_min"))
    keys = _safe_float(row.get("keys_per_min"))

    if is_browser and scroll > scroll_high and keys < keys_low:
        return CoreLabel.ReadResearch

    # Rule 3: Build — editor/terminal + high keys + shortcuts
    is_editor = bool(row.get("is_editor", False))
    is_terminal = bool(row.get("is_terminal", False))
    shortcut = _safe_float(row.get("shortcut_rate"))

    if (is_editor or is_terminal) and keys > keys_high and shortcut > shortcut_high:
        return CoreLabel.Build

    # Rule 4: fallback
    return MIXED_UNKNOWN
```
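Rule 1 depends on the private helper `_safe_float`, whose source is not shown on this page. Assuming it returns `default` for `None`, NaN, and unconvertible values (an assumption, consistent with the "null active_seconds_any" comment in the listing), a hypothetical stand-in `safe_float` and the isolated Rule 1 logic could look like this:

```python
import math
from typing import Any


def safe_float(value: Any, default: float = 0.0) -> float:
    """Hypothetical stand-in for the private ``_safe_float`` helper."""
    if value is None:
        return default
    try:
        result = float(value)
    except (TypeError, ValueError):
        return default
    # pandas represents missing numeric cells as NaN, which float() accepts.
    return default if math.isnan(result) else result


def rule1_is_idle(
    active_any: Any,
    max_idle: Any,
    *,
    idle_active_threshold: float = 5.0,
    idle_run_threshold: float = 50.0,
) -> bool:
    """Rule 1 in isolation, mirroring the sentinel logic in the listing."""
    active_val = safe_float(active_any, default=-1.0)  # -1.0 marks "no data"
    idle_val = safe_float(max_idle, default=0.0)
    if active_val < 0:
        return True  # no collector data -> treat as idle
    return active_val < idle_active_threshold or idle_val > idle_run_threshold
```

The `-1.0` sentinel matters when `idle_active_threshold` is lowered to `0.0`: a measured `0.0` then no longer passes the activity test (`0.0 < 0.0` is false), while a missing value is still treated as idle.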

`predict_baseline(features_df, *, idle_active_threshold=BASELINE_IDLE_ACTIVE_THRESHOLD, idle_run_threshold=BASELINE_IDLE_RUN_THRESHOLD, scroll_high=BASELINE_SCROLL_HIGH, keys_low=BASELINE_KEYS_LOW, keys_high=BASELINE_KEYS_HIGH, shortcut_high=BASELINE_SHORTCUT_HIGH)`

Apply heuristic rules to every row and return one label per window.

Parameters:

Name	Type	Description	Default
`features_df`	`DataFrame`	Feature DataFrame conforming to `features_v1`.	required
`idle_active_threshold`	`float`	Idle cutoff: `active_seconds_any` below this value.	`BASELINE_IDLE_ACTIVE_THRESHOLD`
`idle_run_threshold`	`float`	Idle cutoff: `max_idle_run_seconds` above this value.	`BASELINE_IDLE_RUN_THRESHOLD`
`scroll_high`	`float`	"High scroll": `scroll_events_per_min` above this value.	`BASELINE_SCROLL_HIGH`
`keys_low`	`float`	"Low typing": `keys_per_min` below this value.	`BASELINE_KEYS_LOW`
`keys_high`	`float`	"High typing": `keys_per_min` above this value.	`BASELINE_KEYS_HIGH`
`shortcut_high`	`float`	"High shortcuts": `shortcut_rate` above this value.	`BASELINE_SHORTCUT_HIGH`

Returns:

Type	Description
`list[str]`	Predicted label per row, in row order (same length as `features_df`).

Source code in src/taskclf/infer/baseline.py

```python
def predict_baseline(
    features_df: pd.DataFrame,
    *,
    idle_active_threshold: float = BASELINE_IDLE_ACTIVE_THRESHOLD,
    idle_run_threshold: float = BASELINE_IDLE_RUN_THRESHOLD,
    scroll_high: float = BASELINE_SCROLL_HIGH,
    keys_low: float = BASELINE_KEYS_LOW,
    keys_high: float = BASELINE_KEYS_HIGH,
    shortcut_high: float = BASELINE_SHORTCUT_HIGH,
) -> list[str]:
    """Apply heuristic rules to every row and return one label per window.

    Args:
        features_df: Feature DataFrame conforming to ``features_v1``.
        idle_active_threshold: Threshold for ``active_seconds_any``.
        idle_run_threshold: Threshold for ``max_idle_run_seconds``.
        scroll_high: Threshold for ``scroll_events_per_min``.
        keys_low: Upper bound on ``keys_per_min`` for "low typing".
        keys_high: Lower bound on ``keys_per_min`` for "high typing".
        shortcut_high: Lower bound on ``shortcut_rate``.

    Returns:
        Predicted label per row (same length as *features_df*).
    """
    return [
        classify_single_row(
            row,
            idle_active_threshold=idle_active_threshold,
            idle_run_threshold=idle_run_threshold,
            scroll_high=scroll_high,
            keys_low=keys_low,
            keys_high=keys_high,
            shortcut_high=shortcut_high,
        )
        for _, row in features_df.iterrows()
    ]
```
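Because `predict_baseline` returns one label per row in input order, its output can be compared position by position with gold labels to quantify the cold-start floor. A minimal sketch, assuming accuracy, reject rate, and accuracy on non-rejected windows are the metrics of interest (the project's actual evaluation metrics are not described on this page) and using a hypothetical `"Mixed/Unknown"` string for the reject label; `score_against_gold` is an illustrative name, not part of `taskclf`:

```python
def score_against_gold(
    predicted: list[str],
    gold: list[str],
    *,
    reject_label: str = "Mixed/Unknown",  # hypothetical MIXED_UNKNOWN value
) -> dict[str, float]:
    """Accuracy, reject rate, and accuracy on non-rejected windows."""
    if len(predicted) != len(gold):
        raise ValueError("expected one predicted label per gold window")
    n = len(gold)
    if n == 0:
        raise ValueError("no windows to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    # Windows where the baseline committed to a real label.
    accepted = [(p, g) for p, g in zip(predicted, gold) if p != reject_label]
    accepted_correct = sum(p == g for p, g in accepted)
    return {
        "accuracy": correct / n,
        "reject_rate": (n - len(accepted)) / n,
        "accepted_accuracy": accepted_correct / len(accepted) if accepted else 0.0,
    }
```

Scoring a trained model's predictions on the same windows with the same function gives a like-for-like comparison against the baseline.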

`run_baseline_inference(features_df, *, smooth_window=DEFAULT_SMOOTH_WINDOW, bucket_seconds=DEFAULT_BUCKET_SECONDS, idle_active_threshold=BASELINE_IDLE_ACTIVE_THRESHOLD, idle_run_threshold=BASELINE_IDLE_RUN_THRESHOLD, scroll_high=BASELINE_SCROLL_HIGH, keys_low=BASELINE_KEYS_LOW, keys_high=BASELINE_KEYS_HIGH, shortcut_high=BASELINE_SHORTCUT_HIGH)`

Predict, smooth, and segmentize using the rule baseline.

Mirrors `taskclf.infer.batch.run_batch_inference` but uses heuristic rules instead of a trained model.
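Smoothing replaces each raw label with the majority label among its neighbors, which removes isolated one-bucket flips. The exact window alignment and tie-breaking of `rolling_majority` are not shown on this page; the hypothetical `rolling_majority_sketch` below assumes a centered window, clipped at the edges, that keeps the original label on a tie.

```python
from collections import Counter


def rolling_majority_sketch(labels: list[str], window: int = 3) -> list[str]:
    """Centered rolling-majority smoothing (assumed semantics)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    before = (window - 1) // 2  # neighbors taken before position i
    smoothed: list[str] = []
    for i, current in enumerate(labels):
        lo = max(0, i - before)
        hi = min(len(labels), i - before + window)  # clipped at both edges
        neighborhood = labels[lo:hi]
        counts = Counter(neighborhood)
        top = max(counts.values())
        if counts[current] == top:
            # Tie or outright win: keep the original label.
            smoothed.append(current)
        else:
            # Otherwise take the winning label seen first in the window.
            smoothed.append(next(lbl for lbl in neighborhood if counts[lbl] == top))
    return smoothed
```

With `window=3`, a single BreakIdle bucket between two Build buckets is absorbed into Build, while runs of two or more buckets survive.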

Parameters:

Name	Type	Description	Default
`features_df`	`DataFrame`	Feature DataFrame (must contain feature columns and `bucket_start_ts`).	required
`smooth_window`	`int`	Window size for rolling-majority smoothing.	`DEFAULT_SMOOTH_WINDOW`
`bucket_seconds`	`int`	Width of each time bucket in seconds.	`DEFAULT_BUCKET_SECONDS`
`idle_active_threshold`	`float`	Idle cutoff: `active_seconds_any` below this value.	`BASELINE_IDLE_ACTIVE_THRESHOLD`
`idle_run_threshold`	`float`	Idle cutoff: `max_idle_run_seconds` above this value.	`BASELINE_IDLE_RUN_THRESHOLD`
`scroll_high`	`float`	"High scroll": `scroll_events_per_min` above this value.	`BASELINE_SCROLL_HIGH`
`keys_low`	`float`	"Low typing": `keys_per_min` below this value.	`BASELINE_KEYS_LOW`
`keys_high`	`float`	"High typing": `keys_per_min` above this value.	`BASELINE_KEYS_HIGH`
`shortcut_high`	`float`	"High shortcuts": `shortcut_rate` above this value.	`BASELINE_SHORTCUT_HIGH`

Returns:

Type	Description
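`tuple[list[str], list[Segment]]`	A `(smoothed_labels, segments)` tuple. Rows are sorted by `bucket_start_ts` before classification, so the labels are in chronological order, not necessarily the input row order.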

Source code in src/taskclf/infer/baseline.py

```python
def run_baseline_inference(
    features_df: pd.DataFrame,
    *,
    smooth_window: int = DEFAULT_SMOOTH_WINDOW,
    bucket_seconds: int = DEFAULT_BUCKET_SECONDS,
    idle_active_threshold: float = BASELINE_IDLE_ACTIVE_THRESHOLD,
    idle_run_threshold: float = BASELINE_IDLE_RUN_THRESHOLD,
    scroll_high: float = BASELINE_SCROLL_HIGH,
    keys_low: float = BASELINE_KEYS_LOW,
    keys_high: float = BASELINE_KEYS_HIGH,
    shortcut_high: float = BASELINE_SHORTCUT_HIGH,
) -> tuple[list[str], list[Segment]]:
    """Predict, smooth, and segmentize using the rule baseline.

    Mirrors :func:`taskclf.infer.batch.run_batch_inference` but uses
    heuristic rules instead of a trained model.

    Args:
        features_df: Feature DataFrame (must contain feature columns
            and ``bucket_start_ts``).
        smooth_window: Window size for rolling-majority smoothing.
        bucket_seconds: Width of each time bucket in seconds.
        idle_active_threshold: Threshold for ``active_seconds_any``.
        idle_run_threshold: Threshold for ``max_idle_run_seconds``.
        scroll_high: Threshold for ``scroll_events_per_min``.
        keys_low: Upper bound on ``keys_per_min`` for "low typing".
        keys_high: Lower bound on ``keys_per_min`` for "high typing".
        shortcut_high: Lower bound on ``shortcut_rate``.

    Returns:
        A ``(smoothed_labels, segments)`` tuple.
    """
    features_df = features_df.sort_values("bucket_start_ts").reset_index(drop=True)
    raw_labels = predict_baseline(
        features_df,
        idle_active_threshold=idle_active_threshold,
        idle_run_threshold=idle_run_threshold,
        scroll_high=scroll_high,
        keys_low=keys_low,
        keys_high=keys_high,
        shortcut_high=shortcut_high,
    )
    smoothed = rolling_majority(raw_labels, window=smooth_window)

    bucket_starts: list[datetime] = [
        ts_utc_aware_get(pd.Timestamp(ts).to_pydatetime())
        for ts in features_df["bucket_start_ts"].values
    ]
    segments = segmentize(bucket_starts, smoothed, bucket_seconds=bucket_seconds)

    return smoothed, segments
```
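The final `segmentize` step collapses consecutive buckets that share a label into segments. Its implementation and the fields of `Segment` are not shown here, so the sketch below uses a hypothetical `SegmentSketch` record and a hypothetical `segmentize_sketch` function, and it assumes that a time gap between buckets also starts a new segment. It expects `bucket_starts` in ascending order, which `run_baseline_inference` guarantees by sorting on `bucket_start_ts`.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SegmentSketch:
    """Hypothetical segment record; the real Segment fields may differ."""
    start_ts: datetime
    end_ts: datetime
    label: str
    bucket_count: int


def segmentize_sketch(
    bucket_starts: list[datetime],
    labels: list[str],
    *,
    bucket_seconds: int,
) -> list[SegmentSketch]:
    """Merge runs of equal, contiguous bucket labels into segments."""
    if len(bucket_starts) != len(labels):
        raise ValueError("need one label per bucket")
    width = timedelta(seconds=bucket_seconds)
    segments: list[SegmentSketch] = []
    for start, label in zip(bucket_starts, labels):
        prev = segments[-1] if segments else None
        if prev is not None and prev.label == label and prev.end_ts == start:
            # Same label and no gap: extend the open segment by one bucket.
            segments[-1] = replace(prev, end_ts=start + width, bucket_count=prev.bucket_count + 1)
        else:
            # Label change or time gap: open a new segment.
            segments.append(SegmentSketch(start, start + width, label, 1))
    return segments
```

Splitting on gaps keeps a segment from silently spanning time for which no feature window exists; whether `taskclf`'s `segmentize` does the same is not stated on this page.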
