
cli.main

Typer CLI entrypoint and commands.

Entry point

The console script (taskclf) is defined in cli.entry.cli_entry(). It intercepts --version / -v only when that is the first argument after the executable (for example taskclf --version or entry -v), before importing the full command module, so that version queries return instantly without loading the entire Typer command tree and its transitive dependencies. Invocations such as taskclf tray … --title-salt -v still load the full CLI, so -v is handled as an option value rather than a version shortcut. All other invocations delegate to cli_main() in this module.
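That first-argument rule can be sketched as a tiny predicate (illustrative only; the function name is invented, and the real check lives in cli.entry.cli_entry()):

```python
def should_short_circuit_version(args: list[str]) -> bool:
    # The shortcut fires only when -v/--version is the FIRST argument, so
    # "taskclf tray --title-salt -v" still reaches the full Typer app,
    # where -v is the value of --title-salt rather than a version request.
    return bool(args) and args[0] in ("--version", "-v")
```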

Crash handler

The CLI entry point (cli_main()) wraps all commands in a top-level crash handler. If an unhandled exception escapes, a timestamped crash report is written to <TASKCLF_HOME>/logs/crash_<YYYYMMDD_HHMMSS>.txt and the file path plus issue URL are printed to stderr. SystemExit and KeyboardInterrupt pass through normally.

See core.crash for crash file contents and privacy details.
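The wrapper behaviour described above can be sketched as follows (a minimal illustration; the function and argument names are invented, not the actual taskclf internals):

```python
import datetime as dt
import sys
import traceback
from pathlib import Path


def run_with_crash_handler(command, logs_dir: Path) -> None:
    """Run *command*, writing a timestamped crash report on unhandled errors."""
    try:
        command()
    except (SystemExit, KeyboardInterrupt):
        raise  # normal exits and Ctrl-C pass through untouched
    except Exception:
        stamp = dt.datetime.now().strftime("%Y%m%d_%H%M%S")
        logs_dir.mkdir(parents=True, exist_ok=True)
        report = logs_dir / f"crash_{stamp}.txt"
        report.write_text(traceback.format_exc())
        print(f"Crash report written to {report}", file=sys.stderr)
        raise
```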

Release version bumps (Makefile)

Targets at the repo root:

  • bump-patch / bump-minor / bump-major — bump pyproject.toml and uv.lock, commit, tag vX.Y.Z, and trigger .github/workflows/payload-release.yml. Use for a Python / sidecar payload release (payload zips and payload manifest.json on the v* release).
  • bump-launcher-patch / bump-launcher-minor / bump-launcher-major — bump electron/package.json, commit, tag launcher-vX.Y.Z, and trigger .github/workflows/electron-release.yml. Use for a desktop launcher release (installers + launcher compatibility manifest.json on the launcher-v* release only).

You only need bump-patch when you want a v* tag; you only need bump-launcher-* when you want a launcher-v* tag. They are independent unless you choose to run both.

Guards: By default, a bump aborts if there are no changes since the last matching tag under the paths that affect that release (see Makefile: PAYLOAD_BUMP_PATHS covers the Python sidecar tree; LAUNCHER_BUMP_PATHS is electron/ only). Set BUMP_FORCE=1 to tag anyway (e.g. to re-release the same tree, or when only docs changed).

CI: On tag push, workflows compare github.event.before to the tagged commit and skip the job when only unrelated files changed; workflow_dispatch always runs the full workflow.

Payload releases also refresh the GitHub Pages payload index at https://fruitiecutiepie.github.io/taskclf/payload-index.json, which the packaged Electron launcher uses to discover the newest compatible v* sidecar release. That file is regenerated from GitHub Releases during both the payload release workflow and ordinary docs deploys, but the docs workflow now skips automatic master runs when that commit is also tagged v*. That makes the payload release workflow the only GitHub Pages publisher for release commits, while later ordinary master pushes still refresh metadata from the same published releases. The payload release workflow also supports a manual index-refresh-only dispatch for repairing stale Pages metadata without rebuilding payload archives. All Pages publishers share the same pages concurrency group and queue behind each other so docs deploys and payload-metadata deploys serialize instead of canceling each other.

Commands

Command Description
taskclf ingest aw Ingest an ActivityWatch JSON export
taskclf features build Build per-minute feature rows by fetching from ActivityWatch (supports single date or date range)
taskclf labels import Import label spans from CSV
taskclf labels add-block Create a manual label block for a time range
taskclf labels label-now Label the last N minutes (no timestamps needed)
taskclf labels show-queue Show pending items in the active labeling queue
taskclf labels export Export labels.parquet to CSV
taskclf labels project Project label blocks onto feature windows
taskclf train build-dataset Build training dataset (X/y/splits artifacts)
taskclf train lgbm Train a LightGBM multiclass model
taskclf train evaluate Evaluate a trained model: metrics, calibration, acceptance checks
taskclf train tune-reject Sweep reject thresholds and recommend the best one
taskclf train calibrate Fit per-user probability calibrators
taskclf train retrain Run full retrain pipeline (train, evaluate, gate-check, promote)
taskclf train check-retrain Check whether retraining or calibrator update is due
taskclf train list List model bundles with ranking metrics and status
taskclf model set-active Manually set the active model pointer (rollback / override)
taskclf model inspect Inspect bundle metrics and optionally replay held-out test evaluation
taskclf policy show Print the current inference policy
taskclf policy create Create an inference policy binding model + calibrator + threshold
taskclf policy remove Remove the inference policy (falls back to active.json)
taskclf taxonomy validate Validate a user taxonomy YAML file
taskclf taxonomy show Display taxonomy mapping as a table
taskclf taxonomy init Generate a default taxonomy YAML
taskclf infer batch Run batch inference (supports --taxonomy)
taskclf infer baseline Run rule-based baseline inference (no ML)
taskclf infer compare Compare baseline vs ML model on labeled data
taskclf infer online Run online inference loop (supports --taxonomy, --label-queue)
taskclf report daily Generate a daily report
taskclf monitor drift-check Run drift detection (reference vs current)
taskclf monitor telemetry Compute and store a telemetry snapshot
taskclf monitor show Display recent telemetry snapshots
taskclf diagnostics Collect environment info for bug reports
taskclf ui Launch the labeling UI as a native floating window
taskclf electron Launch the Electron desktop shell with native tray + multi-window popups
taskclf tray Run system tray labeling app with activity transition detection

features build

Build per-minute feature rows by fetching from a running ActivityWatch server. Supports a single --date or a --date-from / --date-to range for backfilling multiple days.

# Single date
taskclf features build --date 2026-03-10

# Date range (backfill)
taskclf features build --date-from 2026-02-16 --date-to 2026-03-12

# Dummy/synthetic features for testing
taskclf features build --date 2026-03-10 --synthetic
Option Default Description
--date -- Single date (YYYY-MM-DD)
--date-from -- Start of date range (inclusive)
--date-to -- End of date range (inclusive; defaults to --date-from if omitted)
--aw-host http://localhost:5600 ActivityWatch server URL
--title-salt taskclf-default-salt Salt for hashing window titles
--synthetic False Generate dummy features instead of fetching from AW
--data-dir <TASKCLF_HOME>/data/processed Output directory for parquet files

labels add-block

Create a manual label block with a feature summary and optional predicted label display.

taskclf labels add-block \
  --start 2025-06-15T10:00:00 --end 2025-06-15T11:00:00 \
  --label Build --user-id my-user --confidence 0.9

labels label-now

Label the last N minutes without typing timestamps. Queries a running ActivityWatch server for a live summary of apps used in the window. Falls back gracefully if AW is unreachable.

taskclf labels label-now --minutes 10 --label Build

With options:

taskclf labels label-now \
  --minutes 15 --label Debug \
  --user-id my-user --confidence 0.9 \
  --aw-host http://localhost:5600

labels show-queue

Display pending labeling requests sorted by confidence (lowest first).

taskclf labels show-queue --user-id my-user --limit 10

labels export

Export label spans from labels.parquet to a CSV file. Columns written: start_ts, end_ts, label, provenance, user_id, confidence, extend_forward.

taskclf labels export

With a custom output path:

taskclf labels export --out ~/Desktop/my_labels.csv
Option Default Description
--out, -o labels.csv Destination CSV file path
--data-dir <TASKCLF_HOME>/data/processed Processed data directory (reads labels_v1/labels.parquet)

labels project

Run strict block-to-window projection and write per-window labels.

taskclf labels project --from 2025-06-10 --to 2025-06-15

train build-dataset

Joins features with labels via strict block-to-window projection, applies exclusion rules (short sessions, missing features), splits by time (70/15/15 per user), and writes X.parquet, y.parquet, and splits.json.

taskclf train build-dataset \
  --from 2025-06-10 --to 2025-06-15 \
  --synthetic \
  --holdout-fraction 0.1
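The per-user time split can be sketched like this (an illustrative reimplementation, not the project's actual code; real runs record the result in splits.json):

```python
from collections import defaultdict


def time_split_per_user(rows, frac=(0.70, 0.15, 0.15)):
    """rows: iterable of (user_id, timestamp) pairs. Returns {row_index: split},
    assigning the earliest ~70% of each user's rows to train, the next ~15%
    to val, and the most recent remainder to test."""
    by_user = defaultdict(list)
    for i, (user, ts) in enumerate(rows):
        by_user[user].append((ts, i))
    split = {}
    for items in by_user.values():
        items.sort()  # chronological: older windows train, newest test
        n = len(items)
        n_train = round(n * frac[0])
        n_val = round(n * frac[1])
        for rank, (_, i) in enumerate(items):
            if rank < n_train:
                split[i] = "train"
            elif rank < n_train + n_val:
                split[i] = "val"
            else:
                split[i] = "test"
    return split
```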

train evaluate

Run full evaluation of a trained model against labeled data. Outputs overall metrics, per-class precision/recall/F1, per-user macro-F1, and acceptance-check verdicts. Writes evaluation.json, calibration.json, confusion_matrix.csv, and calibration.png to the output directory.

taskclf train evaluate \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 --synthetic

With user holdout for seen/unseen evaluation:

taskclf train evaluate \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 \
  --holdout-fraction 0.1 --reject-threshold 0.55
Option Default Description
--model-dir (required) Path to a model run directory
--from (required) Start date (YYYY-MM-DD)
--to (required) End date (YYYY-MM-DD, inclusive)
--synthetic off Generate dummy features + labels
--data-dir data/processed Processed data directory
--out-dir artifacts Output directory for evaluation artifacts
--holdout-fraction 0.0 Fraction of users held out for unseen-user evaluation
--reject-threshold 0.55 Max-probability below which prediction is rejected
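The reject rule itself is simple to sketch (illustrative; assumes class probabilities arrive as a dict):

```python
def apply_reject(probs: dict[str, float], threshold: float = 0.55):
    """Return (top_label, rejected): the prediction is rejected when the
    maximum class probability falls below the threshold."""
    label, p_max = max(probs.items(), key=lambda kv: kv[1])
    return label, p_max < threshold
```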

train tune-reject

Sweep reject thresholds on a validation set and recommend the optimal value. Outputs a Rich table showing accuracy, reject rate, coverage, and macro-F1 at each threshold. Writes reject_tuning.json to the output directory.

taskclf train tune-reject \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 --synthetic
Option Default Description
--model-dir (required) Path to a model run directory
--from (required) Start date (YYYY-MM-DD)
--to (required) End date (YYYY-MM-DD, inclusive)
--synthetic off Generate dummy features + labels
--data-dir data/processed Processed data directory
--out-dir artifacts Output directory for tuning report
--write-policy / --no-write-policy off Write an inference policy binding model + calibrator + tuned threshold
--calibrator-store (none) Path to calibrator store (included in policy when --write-policy)
--models-dir models Base directory for model bundles (used for policy file location)

train calibrate

Fit per-user probability calibrators and save a calibrator store. Reports each user's eligibility (labeled windows, days, distinct labels) and writes the store (global + per-user calibrators) to disk.

taskclf train calibrate \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 --synthetic

With custom eligibility thresholds and isotonic method:

taskclf train calibrate \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 \
  --method isotonic \
  --min-windows 100 --min-days 2 --min-labels 2
Option Default Description
--model-dir (required) Path to a model run directory
--from (required) Start date (YYYY-MM-DD)
--to (required) End date (YYYY-MM-DD, inclusive)
--synthetic off Generate dummy features + labels
--data-dir data/processed Processed data directory
--out artifacts/calibrator_store Output directory for calibrator store
--method temperature Calibration method: temperature or isotonic
--min-windows 200 Minimum labeled windows for per-user calibration
--min-days 3 Minimum distinct days for per-user calibration
--min-labels 3 Minimum distinct core labels for per-user calibration
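For intuition, temperature calibration rescales logits before the softmax: T > 1 softens over-confident probabilities, T < 1 sharpens them. A minimal sketch (the fitted T would be chosen to minimise NLL on held-out labels, which is not shown here):

```python
import math


def temperature_scale(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / T, numerically stabilised by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```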

train retrain

Run the full retrain pipeline: build dataset, train a challenger model, evaluate it, run regression gates against the current champion, and promote if all gates pass. Uses configs/retrain.yaml for cadence, gate thresholds, and training parameters.

taskclf train retrain \
  --config configs/retrain.yaml \
  --from 2025-06-01 --to 2025-06-30 \
  --force --synthetic

Use --dry-run to evaluate without promoting:

taskclf train retrain --config configs/retrain.yaml --dry-run --synthetic

train check-retrain

Check whether a global retrain or calibrator update is due (read-only).

taskclf train check-retrain \
  --models-dir models/ \
  --calibrator-store artifacts/calibrator_store

train list

List all model bundles under models/ with ranking metrics, eligibility status, and active pointer. Columns include macro F1, weighted F1, BreakIdle precision (derived from confusion matrix), and minimum per-class precision.

taskclf train list

Filter to eligible bundles only (compatible schema + label set):

taskclf train list --eligible

Sort by a different metric:

taskclf train list --sort weighted_f1

Output as JSON for automation:

taskclf train list --json

Filter to a specific schema hash:

taskclf train list --schema-hash 740b4db787e9 --eligible
Option Default Description
--models-dir models Base directory for model bundles
--sort macro_f1 Sort column: macro_f1, weighted_f1, or created_at
--eligible off Show only eligible bundles (compatible schema + label set)
--schema-hash (current runtime) Filter to bundles matching this schema hash
--json off Output as JSON instead of a table

model inspect

Inspect a trained model bundle: metadata, bundle-saved validation metrics (with per-class precision/recall/F1 and support derived from the saved confusion matrix, plus top confusion pairs), and a short description of multiclass prediction code paths. Optionally replay held-out test evaluation for a date range (same pipeline as taskclf train evaluate), which adds calibration scalars (ECE, Brier, log loss), slice metrics (default: user_id, app_id, app_category, domain_category, hour_of_day), unknown-category rates vs the bundle encoders, and test-set class distribution. Use --json for the full structured payload (bundle_saved_validation vs replayed_test_evaluation.report).

Bundle-only (no labeled replay):

taskclf model inspect --model-dir models/2026-02-19_013000_run-0042

Machine-readable JSON:

taskclf model inspect --model-dir models/run_001 --json

Replay test metrics on synthetic data (no feature parquet required):

taskclf model inspect \
  --model-dir models/run_001 \
  --from 2026-02-01 --to 2026-02-16 \
  --synthetic

Replay using processed features under --data-dir (omit --synthetic):

taskclf model inspect \
  --model-dir models/run_001 \
  --from 2026-02-01 --to 2026-02-16 \
  --data-dir data/processed
Option Default Description
--model-dir (required) Path to a model run directory
--json off Emit structured JSON instead of tables
--from (none) Start date for test replay (YYYY-MM-DD); requires --to
--to (none) End date for test replay (inclusive); requires --from
--data-dir data/processed when replaying without --synthetic Processed data directory
--synthetic off Replay on dummy features + labels
--holdout-fraction 0.0 Fraction of users held out for unseen-user evaluation
--reject-threshold see defaults Max probability below which a prediction is rejected

model set-active

Manually set the active model pointer to a specific bundle. Useful for rollback (reverting to a known-good model) or manual override after a bad promotion. The bundle must be valid and compatible with the current schema and label set.

taskclf model set-active --model-id run_20260215_120000

With a custom models directory:

taskclf model set-active --model-id run_20260215_120000 --models-dir /data/models
Option Default Description
--model-id (required) Bundle directory name under models/
--models-dir models Base directory for model bundles

policy show

Print the current inference policy or report that none exists.

taskclf policy show
Option Default Description
--models-dir models Base directory for model bundles

policy create

Create an inference policy binding model + calibrator store + reject threshold. Validates all bindings before writing.

taskclf policy create \
  --model-dir models/run_001 \
  --calibrator-store artifacts/calibrator_store \
  --reject-threshold 0.55
Option Default Description
--model-dir (required) Path to a model run directory
--reject-threshold 0.55 Reject threshold for this model+calibration pair
--calibrator-store (none) Path to calibrator store directory
--models-dir models Base directory for model bundles

policy remove

Remove the inference policy file. Inference falls back to active.json resolution.

taskclf policy remove
Option Default Description
--models-dir models Base directory for model bundles

taxonomy validate

Validate a user taxonomy YAML file and report any errors.

taskclf taxonomy validate --config configs/user_taxonomy.yaml

taxonomy show

Display the taxonomy mapping as a Rich table showing bucket names, core labels, colors, and advanced settings.

taskclf taxonomy show --config configs/user_taxonomy.yaml

taxonomy init

Generate a default taxonomy YAML with one bucket per core label (identity mapping) as a starting point for customisation.

taskclf taxonomy init --out configs/user_taxonomy.yaml

infer batch

Run batch inference with optional taxonomy mapping. When --taxonomy is provided, a mapped_label column is added to predictions.csv.

--model-dir is optional. When omitted, the model is auto-resolved: first from models/active.json, then by best-model selection from --models-dir (default models/). See docs/guide/model_selection.md for the full resolution precedence.
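That precedence can be sketched as follows (illustrative; the active.json field and the best-model helper here are simplified stand-ins, not the documented schema):

```python
import json
from pathlib import Path


def pick_best_bundle(models_dir: Path) -> Path:
    # Stand-in for the real selection, which ranks eligible bundles by metrics.
    candidates = sorted(p for p in models_dir.iterdir() if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"no model bundles under {models_dir}")
    return candidates[-1]


def resolve_model_dir(model_dir, models_dir: Path) -> Path:
    """Explicit --model-dir wins, then models/active.json, then best-model pick."""
    if model_dir is not None:
        return Path(model_dir)
    active = models_dir / "active.json"
    if active.exists():
        pointer = json.loads(active.read_text())
        return models_dir / pointer["model_id"]  # assumed pointer field
    return pick_best_bundle(models_dir)
```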

# With explicit model directory:
taskclf infer batch \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 --synthetic \
  --taxonomy configs/user_taxonomy.yaml

# With auto-resolution (uses active.json or best model):
taskclf infer batch \
  --from 2025-06-10 --to 2025-06-15 --synthetic
Option Default Description
--model-dir (auto-resolved) Path to a model bundle directory
--models-dir models Base directory for model bundles (used for auto-resolution)

infer online (with taxonomy, calibrator, and label queue)

Run online inference with optional taxonomy mapping and probability calibration. Each prediction is written as a full WindowPrediction row (core label, core probs, confidence, rejection status, mapped label, mapped probs). Segments are hysteresis-merged so blocks shorter than MIN_BLOCK_DURATION_SECONDS (3 min) are absorbed by neighbours.
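A simplified version of that absorption step (illustrative; the real merger works on timestamped windows and may absorb a short block into either neighbour):

```python
def absorb_short_blocks(segments, min_seconds=180):
    """segments: list of (label, duration_seconds) in time order.
    Blocks shorter than min_seconds are folded into the previous block,
    and adjacent blocks with the same label coalesce."""
    merged = []
    for label, dur in segments:
        if merged and (dur < min_seconds or merged[-1][0] == label):
            prev_label, prev_dur = merged[-1]
            merged[-1] = (prev_label, prev_dur + dur)
        else:
            merged.append((label, dur))
    return merged
```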

--model-dir is optional (same auto-resolution as infer batch). When --models-dir is provided, the online loop watches models/active.json for changes and hot-reloads the model bundle without restarting. The swap only occurs after the new bundle loads successfully; on failure the current model is kept.

When --label-queue is enabled, predictions with confidence below --label-confidence (default 0.55) are auto-enqueued to the labeling queue for manual review. Enqueued items appear in labels show-queue and the web UI.

# With auto-resolution and model reload:
taskclf infer online \
  --taxonomy configs/user_taxonomy.yaml \
  --calibrator calibrators/user_cal.json

# With explicit model directory:
taskclf infer online \
  --model-dir models/run_20250615_120000 \
  --taxonomy configs/user_taxonomy.yaml \
  --calibrator calibrators/user_cal.json

With label queue integration:

taskclf infer online \
  --label-queue \
  --label-confidence 0.50
Option Default Description
--model-dir (auto-resolved) Path to a model bundle directory
--models-dir models Base directory for model bundles (used for auto-resolution and reload)

infer baseline

Run rule-based heuristic inference without a trained model. Produces baseline_predictions.csv and baseline_segments.json.

taskclf infer baseline \
  --from 2025-06-10 --to 2025-06-15 --synthetic

infer compare

Compare rule baseline vs a trained ML model on labeled data. Outputs a Rich summary table and writes baseline_vs_model.json.

taskclf infer compare \
  --model-dir models/run_20250615_120000 \
  --from 2025-06-10 --to 2025-06-15 --synthetic

monitor drift-check

Run drift detection comparing reference vs current prediction windows. Flags feature PSI/KS drift, reject-rate increases, entropy spikes, and class-distribution shifts. Optionally auto-creates labeling tasks.

taskclf monitor drift-check \
  --ref-features data/processed/features_ref.parquet \
  --cur-features data/processed/features_cur.parquet \
  --ref-predictions artifacts/predictions_ref.csv \
  --cur-predictions artifacts/predictions_cur.csv \
  --auto-label --auto-label-limit 50
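For intuition, PSI compares binned frequencies of one feature between the reference and current samples; values above roughly 0.2 are a common alarm level (a textbook sketch, not the project's implementation or its thresholds):

```python
import math


def psi(ref: list[float], cur: list[float], bins: int = 10) -> float:
    """Population Stability Index with bin edges from reference quantiles."""
    ref_sorted = sorted(ref)
    edges = [ref_sorted[int(i * (len(ref_sorted) - 1) / bins)] for i in range(1, bins)]

    def bin_fracs(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Small epsilon keeps the log finite for empty bins.
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    p, q = bin_fracs(ref), bin_fracs(cur)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```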

monitor telemetry

Compute a telemetry snapshot (feature missingness, confidence stats, reject rate, entropy, class distribution) and append to the store.

taskclf monitor telemetry \
  --features data/processed/features.parquet \
  --predictions artifacts/predictions.csv \
  --user-id user-1 \
  --store-dir artifacts/telemetry

monitor show

Display recent telemetry snapshots as a Rich table.

taskclf monitor show --store-dir artifacts/telemetry --last 10

diagnostics

Collect environment and runtime info for bug reports. Prints a human-readable summary by default, or JSON with --json.

taskclf diagnostics

Output as JSON:

taskclf diagnostics --json

Include log tail:

taskclf diagnostics --json --include-logs --log-lines 100

Write to a file:

taskclf diagnostics --json --out diagnostics.json

Output includes:

  • taskclf version, Python version, OS and architecture
  • Resolved TASKCLF_HOME path
  • ActivityWatch reachability (graceful on failure)
  • Available model bundles (from models/)
  • Config summary (user_id always redacted)
  • Disk usage of data/, models/, logs/
  • Last N log lines (when --include-logs is passed)

Option Default Description
--json off Output as JSON instead of human-readable text
--include-logs off Append the last N log lines to the output
--log-lines 50 Number of log lines to include (requires --include-logs)
--out (stdout) Write output to a file instead of stdout

tray

Run a persistent system tray app that polls ActivityWatch, detects activity transitions, and prompts for labels. It automatically starts the web UI server. Left-clicking the tray icon opens the web dashboard, where all labeling is done through the UI. Right-clicking shows the tray menu:

  • Toggle Dashboard, Pause / Resume, Show Status
  • Today's Labels, Import Labels, Export Labels
  • Prediction Model (submenu), Open Data Folder, Edit Config, Advanced (submenu: Edit Inference Policy), Report Issue
  • Quit

When --model-dir is provided, the app suggests labels using the trained model. To reduce cold-start latency, the web UI server starts before the optional model bundle finishes loading, so suggestions can appear a moment after the dashboard becomes reachable.

taskclf tray

With a model for label suggestions:

taskclf tray --model-dir models/run_20260226

With frontend hot reload for development:

taskclf tray --dev

Open in browser instead of native window (useful combined with --dev):

taskclf tray --dev --browser

Keep the browser server headless for another host shell (for example Electron):

taskclf tray --browser --no-open-browser --no-tray

Fully browser-based (no native tray icon, no pywebview):

taskclf tray --dev --browser --no-tray
Option Default Description
--model-dir (none) Model bundle for label suggestions
--aw-host http://localhost:5600 ActivityWatch server URL
--poll-seconds 60 Seconds between AW polls
--aw-timeout 10 Seconds to wait for AW API responses
--transition-minutes 2 Minutes a new app must persist before prompting
--data-dir data/processed (ephemeral in --dev) Processed data directory; omit with --dev for an auto-cleaned temp dir
--port 8741 Port for the embedded web UI server
--dev off Vite hot reload + ephemeral data dir (unless --data-dir is set)
--browser off Open UI in browser instead of native window
--open-browser / --no-open-browser --open-browser When --browser is set, open the dashboard automatically in the default browser
--no-tray off Skip the native tray icon (use with --browser for browser-only mode)

electron

Launch the optional Electron desktop shell. Electron owns the native tray icon and three frameless BrowserWindows (compact pill, label popup, state panel popup); the label and panel popups are created when first opened. The existing Python tray backend runs as a sidecar process in browser mode without opening a separate browser tab. Clicking the tray icon shows or focuses the dashboard; hiding remains an explicit tray-menu action via Toggle Dashboard.

taskclf electron

With a model for label suggestions:

taskclf electron --model-dir models/run_20260226

With frontend hot reload and an ephemeral data dir:

taskclf electron --dev
Option Default Description
--model-dir (none) Model bundle for label suggestions
--models-dir models Directory containing model bundles
--aw-host http://localhost:5600 ActivityWatch server URL
--poll-seconds 60 Seconds between AW polls
--title-salt taskclf-default-salt Salt for hashing window titles
--data-dir data/processed (ephemeral in --dev) Processed data directory; omit with --dev for an auto-cleaned temp dir
--transition-minutes 2 Minutes a new app must persist before prompting
--port 8741 Port for the embedded web UI server and Electron sidecar
--dev off Enable frontend hot reload inside Electron; uses an ephemeral data dir unless --data-dir is set
--username (none) Display name to persist in config.toml
--retrain-config (none) Retrain YAML config path passed through to the tray backend

In the packaged Electron app (installers / .app), the main process runs a UI port preflight on TASKCLF_ELECTRON_UI_PORT (default 8741) before starting the sidecar: it can stop a leftover taskclf listener automatically or prompt if another process owns the port. See electron_shell.

Optional launcher environment variables are documented under electron_shell (for example TASKCLF_ELECTRON_SHELL_WAIT_MS, TASKCLF_ELECTRON_TRAY_SYNC_MS, manifest timeouts).

ui

Launch the labeling UI as a native floating window with live prediction streaming. Starts a FastAPI server, an ActivityMonitor background thread, and a pywebview window. When --model-dir is provided, the app suggests labels on activity transitions using the trained model.

taskclf ui

With a model for live predictions:

taskclf ui --model-dir models/run_20260226

With frontend hot reload for development:

taskclf ui --dev

Open in browser instead of native window:

taskclf ui --browser

Combined dev + browser mode:

taskclf ui --dev --browser

In browser mode, --dev also enables backend auto-reload, so editing Python files under src/taskclf/ restarts the FastAPI process while Vite keeps frontend HMR active.

Before the backend starts, taskclf ui runs a local UI-port preflight on --port (default 8741). If the listener on that port confidently looks like another taskclf UI or tray backend, the CLI stops it and continues; if a different process owns the port, startup aborts with the PID and command line so you can stop it manually or choose another port.

Option Default Description
--port 8741 Port for the web UI server
--model-dir (none) Model bundle for live predictions
--aw-host http://localhost:5600 ActivityWatch server URL
--poll-seconds 60 Seconds between AW polling iterations
--title-salt taskclf-default-salt Salt for hashing window titles
--data-dir data/processed (ephemeral in --dev) Processed data directory; omit with --dev for an auto-cleaned temp dir
--transition-minutes 2 Minutes a new app must persist before suggesting a label
--browser off Open UI in browser instead of native window
--dev off Start Vite dev server for frontend hot reload; in browser mode, the FastAPI backend also runs with auto-reload. Uses an ephemeral data dir unless --data-dir is set

taskclf.cli.main

Typer CLI entrypoint and command definitions for taskclf.

main(version=typer.Option(False, '--version', '-v', callback=_version_callback, is_eager=True, help='Show version and exit.'), verbose=typer.Option(False, '--verbose', help='Enable DEBUG logging with file/line info.'))

Local-first task classifier CLI.

Source code in src/taskclf/cli/main.py
@app.callback()
def main(
    version: bool = typer.Option(
        False,
        "--version",
        "-v",
        callback=_version_callback,
        is_eager=True,
        help="Show version and exit.",
    ),
    verbose: bool = typer.Option(
        False,
        "--verbose",
        help="Enable DEBUG logging with file/line info.",
    ),
) -> None:
    """Local-first task classifier CLI."""
    level = logging.DEBUG if verbose else logging.WARNING
    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s %(pathname)s:%(lineno)d %(message)s",
    )
    ensure_taskclf_dirs()

    from taskclf.core.logging import setup_file_logging

    setup_file_logging()

ingest_aw_cmd(input_file=typer.Option(..., '--input', help='Path to an ActivityWatch export JSON file'), out_dir=typer.Option(DEFAULT_RAW_AW_DIR, help='Output directory for normalized events (partitioned by date)'), title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'))

Ingest an ActivityWatch JSON export into privacy-safe normalized events.

Source code in src/taskclf/cli/main.py
@ingest_app.command("aw")
def ingest_aw_cmd(
    input_file: str = typer.Option(
        ..., "--input", help="Path to an ActivityWatch export JSON file"
    ),
    out_dir: str = typer.Option(
        DEFAULT_RAW_AW_DIR,
        help="Output directory for normalized events (partitioned by date)",
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
) -> None:
    """Ingest an ActivityWatch JSON export into privacy-safe normalized events."""
    from collections import defaultdict

    import pandas as pd

    from taskclf.adapters.activitywatch.client import (
        parse_aw_export,
        parse_aw_input_export,
    )
    from taskclf.core.store import write_parquet

    input_path = Path(input_file)
    if not input_path.exists():
        typer.echo(f"File not found: {input_path}", err=True)
        raise typer.Exit(code=1)

    events = parse_aw_export(input_path, title_salt=title_salt)
    if not events:
        typer.echo("No window events found in the export file.", err=True)
        raise typer.Exit(code=1)

    by_date: dict[str, list] = defaultdict(list)
    for ev in events:
        day = ev.timestamp.date().isoformat()
        by_date[day].append(ev.model_dump())

    out_base = Path(out_dir)
    for day, rows in sorted(by_date.items()):
        df = pd.DataFrame(rows)
        out_path = out_base / day / "events.parquet"
        write_parquet(df, out_path)
        typer.echo(f"  {day}: {len(rows)} events -> {out_path}")

    unique_apps = {ev.app_id for ev in events}
    date_range = f"{events[0].timestamp.date()} to {events[-1].timestamp.date()}"
    typer.echo(
        f"Ingested {len(events)} events across {len(by_date)} day(s) "
        f"({date_range}), {len(unique_apps)} unique apps"
    )

    # Ingest input events (aw-watcher-input) if present
    input_events = parse_aw_input_export(input_path)
    if input_events:
        input_by_date: dict[str, list] = defaultdict(list)
        for ie in input_events:
            day = ie.timestamp.date().isoformat()
            input_by_date[day].append(ie.model_dump())

        input_out_base = Path(out_dir).parent / "aw-input"
        for day, rows in sorted(input_by_date.items()):
            df = pd.DataFrame(rows)
            out_path = input_out_base / day / "events.parquet"
            write_parquet(df, out_path)
            typer.echo(f"  {day}: {len(rows)} input events -> {out_path}")

        ie_range = (
            f"{input_events[0].timestamp.date()} to {input_events[-1].timestamp.date()}"
        )
        typer.echo(
            f"Ingested {len(input_events)} input events across "
            f"{len(input_by_date)} day(s) ({ie_range})"
        )

features_build(date=typer.Option(None, help='Single date in YYYY-MM-DD format (mutually exclusive with --date-from/--date-to)'), date_from=typer.Option(None, '--date-from', help='Start of date range (YYYY-MM-DD, inclusive)'), date_to=typer.Option(None, '--date-to', help='End of date range (YYYY-MM-DD, inclusive)'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), aw_host=typer.Option(DEFAULT_AW_HOST, '--aw-host', help='ActivityWatch server URL'), title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'), synthetic=typer.Option(False, '--synthetic', help='Generate dummy features instead of fetching from AW'))

Build per-minute feature rows by fetching from a running ActivityWatch server.

Supports a single --date or a --date-from / --date-to range. Use --synthetic to generate dummy data for testing.
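
A minimal sketch of the inclusive range semantics (illustrative only, not the implementation below — the real loop lives in the command body):

```python
import datetime as dt


def iter_dates(date_from: str, date_to: str) -> list[dt.date]:
    """Every date from date_from to date_to, both ends inclusive,
    mirroring how --date-from/--date-to are interpreted."""
    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)
    days = []
    current = start
    while current <= end:
        days.append(current)
        current += dt.timedelta(days=1)
    return days


days = iter_dates("2024-01-30", "2024-02-02")
```

Each produced date corresponds to one `build_features_for_date` call; a failure on one day does not abort the remaining days.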

Source code in src/taskclf/cli/main.py
@features_app.command("build")
def features_build(
    date: str = typer.Option(
        None,
        help="Single date in YYYY-MM-DD format (mutually exclusive with --date-from/--date-to)",
    ),
    date_from: str = typer.Option(
        None, "--date-from", help="Start of date range (YYYY-MM-DD, inclusive)"
    ),
    date_to: str = typer.Option(
        None, "--date-to", help="End of date range (YYYY-MM-DD, inclusive)"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    aw_host: str = typer.Option(
        DEFAULT_AW_HOST, "--aw-host", help="ActivityWatch server URL"
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features instead of fetching from AW"
    ),
) -> None:
    """Build per-minute feature rows by fetching from a running ActivityWatch server.

    Supports a single --date or a --date-from / --date-to range.
    Use --synthetic to generate dummy data for testing.
    """
    from taskclf.features.build import build_features_for_date

    if date and (date_from or date_to):
        typer.echo("Use either --date or --date-from/--date-to, not both.", err=True)
        raise typer.Exit(code=1)
    if not date and not date_from:
        typer.echo("Provide --date or --date-from/--date-to.", err=True)
        raise typer.Exit(code=1)

    if date:
        start = dt.date.fromisoformat(date)
        end = start
    else:
        start = dt.date.fromisoformat(date_from)  # type: ignore[arg-type]
        end = dt.date.fromisoformat(date_to) if date_to else start

    if end < start:
        typer.echo(f"--date-to ({end}) is before --date-from ({start}).", err=True)
        raise typer.Exit(code=1)

    effective_host = None if synthetic else aw_host
    data_path = Path(data_dir)
    current = start
    ok_days: list[dt.date] = []
    empty_days: list[dt.date] = []
    failed: list[tuple[dt.date, str]] = []

    while current <= end:
        try:
            out_path = build_features_for_date(
                current,
                data_path,
                aw_host=effective_host,
                title_salt=title_salt,
                synthetic=synthetic,
            )
            row_count = _parquet_row_count(out_path)
            if row_count == 0:
                empty_days.append(current)
            else:
                ok_days.append(current)
        except Exception as exc:
            failed.append((current, str(exc)))
        current += dt.timedelta(days=1)

    total = len(ok_days) + len(empty_days) + len(failed)
    out_dir = _feature_output_dir(data_path)
    typer.echo(
        f"Done — {total} day(s): {len(ok_days)} ok, "
        f"{len(empty_days)} empty, {len(failed)} failed."
    )
    typer.echo(f"  output: {out_dir}")
    if empty_days:
        dates_str = ", ".join(d.isoformat() for d in empty_days)
        typer.echo(f"  empty (no AW events): {dates_str}")
    if failed:
        for d, reason in failed:
            typer.echo(f"  FAILED {d}: {reason}", err=True)

labels_import_cmd(file=typer.Option(..., '--file', help='Path to labels CSV (start_ts, end_ts, label, provenance)'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'))

Import label spans from a CSV file and write to parquet.
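
The expected CSV shape can be sketched as follows (column names are taken from the `--file` help text; the example row and its values are hypothetical):

```python
import csv
import io

# One hypothetical label span: ISO-8601 UTC timestamps, a core label,
# and a provenance marker.
rows = [
    ("2024-05-01T09:00:00+00:00", "2024-05-01T09:30:00+00:00", "Build", "manual"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["start_ts", "end_ts", "label", "provenance"])
writer.writerows(rows)
csv_text = buf.getvalue()
```

A file with this layout passed via `--file` is validated by `import_labels_from_csv` before anything is written to parquet.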

Source code in src/taskclf/cli/main.py
@labels_app.command("import")
def labels_import_cmd(
    file: str = typer.Option(
        ..., "--file", help="Path to labels CSV (start_ts, end_ts, label, provenance)"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
) -> None:
    """Import label spans from a CSV file and write to parquet."""
    from taskclf.labels.store import import_labels_from_csv, write_label_spans

    csv_path = Path(file)
    if not csv_path.exists():
        typer.echo(f"File not found: {csv_path}", err=True)
        raise typer.Exit(code=1)

    spans = import_labels_from_csv(csv_path)
    typer.echo(f"Validated {len(spans)} label spans")

    out_path = Path(data_dir) / "labels_v1" / "labels.parquet"
    write_label_spans(spans, out_path)
    typer.echo(f"Wrote labels to {out_path}")

labels_add_block_cmd(start=typer.Option(..., '--start', help='Block start timestamp (ISO-8601 UTC)'), end=typer.Option(..., '--end', help='Block end timestamp (ISO-8601 UTC)'), label=typer.Option(..., '--label', help='Core label (Build, Debug, Review, Write, ReadResearch, Communicate, Meet, BreakIdle)'), user_id=typer.Option('default-user', '--user-id', help='User ID for this label'), confidence=typer.Option(None, '--confidence', min=0.0, max=1.0, help='Labeler confidence (0-1)'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), model_dir=typer.Option(None, '--model-dir', help='Model bundle directory (for predicted label display)'))

Create a manual label block for a time range.
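
The `--start`/`--end` flags are parsed with `datetime.fromisoformat`, as in the source below; an explicit UTC offset keeps the span timezone-aware (the timestamp values here are illustrative):

```python
import datetime as dt

# ISO-8601 strings with a +00:00 offset parse to timezone-aware datetimes.
start_ts = dt.datetime.fromisoformat("2024-05-01T09:00:00+00:00")
end_ts = dt.datetime.fromisoformat("2024-05-01T09:30:00+00:00")

span_minutes = (end_ts - start_ts).total_seconds() / 60
```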

Source code in src/taskclf/cli/main.py
@labels_app.command("add-block")
def labels_add_block_cmd(
    start: str = typer.Option(
        ..., "--start", help="Block start timestamp (ISO-8601 UTC)"
    ),
    end: str = typer.Option(..., "--end", help="Block end timestamp (ISO-8601 UTC)"),
    label: str = typer.Option(
        ...,
        "--label",
        help="Core label (Build, Debug, Review, Write, ReadResearch, Communicate, Meet, BreakIdle)",
    ),
    user_id: str = typer.Option(
        "default-user", "--user-id", help="User ID for this label"
    ),
    confidence: float | None = typer.Option(
        None, "--confidence", min=0.0, max=1.0, help="Labeler confidence (0-1)"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    model_dir: str | None = typer.Option(
        None, "--model-dir", help="Model bundle directory (for predicted label display)"
    ),
) -> None:
    """Create a manual label block for a time range."""
    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.store import read_parquet
    from taskclf.core.types import LabelSpan
    from taskclf.labels.store import append_label_span, generate_label_summary

    console = Console()

    start_ts = dt.datetime.fromisoformat(start)
    end_ts = dt.datetime.fromisoformat(end)
    effective_confidence = confidence if confidence is not None else 1.0

    span = LabelSpan(
        start_ts=start_ts,
        end_ts=end_ts,
        label=label,
        provenance="manual",
        user_id=user_id,
        confidence=effective_confidence,
    )

    features_dfs: list[pd.DataFrame] = []
    data_path = Path(data_dir)
    current_date = start_ts.date()
    while current_date <= end_ts.date():
        fp = _feature_parquet_for_date(data_path, current_date)
        if fp is not None and fp.exists():
            features_dfs.append(read_parquet(fp))
        current_date += dt.timedelta(days=1)

    if features_dfs:
        feat_df = pd.concat(features_dfs, ignore_index=True)
        summary = generate_label_summary(feat_df, start_ts, end_ts)

        table = Table(title="Block Summary")
        table.add_column("Metric", style="bold")
        table.add_column("Value")
        table.add_row("Time range", f"{start_ts} → {end_ts}")
        table.add_row("Buckets", str(summary["total_buckets"]))
        table.add_row("Sessions", str(summary["session_count"]))
        if summary["mean_keys_per_min"] is not None:
            table.add_row("Avg keys/min", str(summary["mean_keys_per_min"]))
        if summary["mean_clicks_per_min"] is not None:
            table.add_row("Avg clicks/min", str(summary["mean_clicks_per_min"]))
        for app_info in summary["top_apps"][:3]:
            table.add_row(
                f"App: {app_info['app_id']}", f"{app_info['buckets']} buckets"
            )
        console.print(table)

    if model_dir is not None and features_dfs:
        try:
            from taskclf.core.model_io import load_model_bundle
            from taskclf.infer.batch import predict_labels

            model, _meta, cat_encoders = load_model_bundle(Path(model_dir))
            from sklearn.preprocessing import LabelEncoder

            from taskclf.core.types import LABEL_SET_V1

            le = LabelEncoder()
            le.fit(sorted(LABEL_SET_V1))
            preds = predict_labels(model, feat_df, le, cat_encoders=cat_encoders)
            from collections import Counter

            top_pred = Counter(preds).most_common(1)[0][0]
            console.print(f"[bold]Top predicted label:[/bold] {top_pred}")
        except Exception as exc:
            console.print(f"[dim]Could not predict: {exc}[/dim]")

    labels_path = Path(data_dir) / "labels_v1" / "labels.parquet"
    try:
        append_label_span(span, labels_path)
    except ValueError as exc:
        typer.echo(f"Error: {exc}", err=True)
        raise typer.Exit(code=1)

    typer.echo(
        f"Added label block: {label} [{start_ts} → {end_ts}] "
        f"user={user_id} confidence={effective_confidence}"
    )

labels_label_now_cmd(minutes=typer.Option(10, '--minutes', min=1, help='How many minutes back from now to label'), label=typer.Option(..., '--label', help='Core label (Build, Debug, Review, Write, ReadResearch, Communicate, Meet, BreakIdle)'), user_id=typer.Option('default-user', '--user-id', help='User ID for this label'), confidence=typer.Option(None, '--confidence', min=0.0, max=1.0, help='Labeler confidence (0-1)'), aw_host=typer.Option(DEFAULT_AW_HOST, '--aw-host', help='ActivityWatch server URL for live summary'), title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'))

Label the last N minutes with a single command (no timestamps needed).

Source code in src/taskclf/cli/main.py
@labels_app.command("label-now")
def labels_label_now_cmd(
    minutes: int = typer.Option(
        10, "--minutes", min=1, help="How many minutes back from now to label"
    ),
    label: str = typer.Option(
        ...,
        "--label",
        help="Core label (Build, Debug, Review, Write, ReadResearch, Communicate, Meet, BreakIdle)",
    ),
    user_id: str = typer.Option(
        "default-user", "--user-id", help="User ID for this label"
    ),
    confidence: float | None = typer.Option(
        None, "--confidence", min=0.0, max=1.0, help="Labeler confidence (0-1)"
    ),
    aw_host: str = typer.Option(
        DEFAULT_AW_HOST, "--aw-host", help="ActivityWatch server URL for live summary"
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
) -> None:
    """Label the last N minutes with a single command (no timestamps needed)."""
    from collections import Counter

    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.store import read_parquet
    from taskclf.core.time import ts_utc_aware_get
    from taskclf.core.types import LabelSpan
    from taskclf.labels.store import append_label_span, generate_label_summary

    console = Console()

    end_ts = dt.datetime.now(dt.timezone.utc)
    start_ts = end_ts - dt.timedelta(minutes=minutes)

    console.print(
        f"[bold]Labeling last {minutes} minute(s):[/bold] "
        f"{start_ts:%H:%M:%S} → {end_ts:%H:%M:%S} UTC"
    )

    # Live AW summary (best-effort)
    try:
        from taskclf.adapters.activitywatch.client import (
            fetch_aw_events,
            find_window_bucket_id,
        )

        bucket_id = find_window_bucket_id(aw_host)
        events = fetch_aw_events(
            aw_host, bucket_id, start_ts, end_ts, title_salt=title_salt
        )
        if events:
            app_counts = Counter(ev.app_id for ev in events)
            table = Table(title="Live Activity (from ActivityWatch)")
            table.add_column("App", style="bold")
            table.add_column("Events", justify="right")
            for app, cnt in app_counts.most_common(5):
                table.add_row(app, str(cnt))
            table.add_row("Total events", str(len(events)), style="dim")
            console.print(table)
        else:
            console.print("[dim]No ActivityWatch events in this window.[/dim]")
    except Exception as exc:
        console.print(
            f"[dim]ActivityWatch not reachable ({exc}); skipping live summary.[/dim]"
        )

    start_utc = ts_utc_aware_get(start_ts)
    end_utc = ts_utc_aware_get(end_ts)

    # On-disk feature summary (best-effort)
    features_dfs: list[pd.DataFrame] = []
    data_path = Path(data_dir)
    current_date = start_utc.date()
    while current_date <= end_utc.date():
        fp = _feature_parquet_for_date(data_path, current_date)
        if fp is not None and fp.exists():
            features_dfs.append(read_parquet(fp))
        current_date += dt.timedelta(days=1)

    if features_dfs:
        feat_df = pd.concat(features_dfs, ignore_index=True)
        summary = generate_label_summary(feat_df, start_utc, end_utc)
        if summary["total_buckets"] > 0:
            table = Table(title="Feature Summary")
            table.add_column("Metric", style="bold")
            table.add_column("Value")
            table.add_row("Buckets", str(summary["total_buckets"]))
            table.add_row("Sessions", str(summary["session_count"]))
            if summary["mean_keys_per_min"] is not None:
                table.add_row("Avg keys/min", str(summary["mean_keys_per_min"]))
            if summary["mean_clicks_per_min"] is not None:
                table.add_row("Avg clicks/min", str(summary["mean_clicks_per_min"]))
            for app_info in summary["top_apps"][:3]:
                table.add_row(
                    f"App: {app_info['app_id']}", f"{app_info['buckets']} buckets"
                )
            console.print(table)

    effective_confidence = confidence if confidence is not None else 1.0
    span = LabelSpan(
        start_ts=start_utc,
        end_ts=end_utc,
        label=label,
        provenance="manual",
        user_id=user_id,
        confidence=effective_confidence,
    )

    labels_path = Path(data_dir) / "labels_v1" / "labels.parquet"
    try:
        append_label_span(span, labels_path)
    except ValueError as exc:
        typer.echo(f"Error: {exc}", err=True)
        raise typer.Exit(code=1)

    typer.echo(
        f"Labeled: {label} [{start_ts:%H:%M} → {end_ts:%H:%M} UTC] "
        f"user={user_id} confidence={effective_confidence}"
    )

labels_show_queue_cmd(user_id=typer.Option(None, '--user-id', help='Filter to a specific user'), limit=typer.Option(10, '--limit', help='Maximum items to show'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'))

Show pending items in the active labeling queue.

Source code in src/taskclf/cli/main.py
@labels_app.command("show-queue")
def labels_show_queue_cmd(
    user_id: str | None = typer.Option(
        None, "--user-id", help="Filter to a specific user"
    ),
    limit: int = typer.Option(10, "--limit", help="Maximum items to show"),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
) -> None:
    """Show pending items in the active labeling queue."""
    from rich.console import Console
    from rich.table import Table

    from taskclf.labels.queue import ActiveLabelingQueue

    console = Console()
    queue_path = Path(data_dir) / "labels_v1" / "queue.json"

    if not queue_path.exists():
        console.print("[dim]No labeling queue found.[/dim]")
        return

    queue = ActiveLabelingQueue(queue_path)
    pending = queue.get_pending(user_id=user_id, limit=limit)

    if not pending:
        console.print("[dim]No pending labeling requests.[/dim]")
        return

    table = Table(title=f"Pending Labeling Requests ({len(pending)})")
    table.add_column("ID", style="dim", max_width=8)
    table.add_column("User")
    table.add_column("Time Range")
    table.add_column("Reason")
    table.add_column("Predicted")
    table.add_column("Confidence", justify="right")

    for req in pending:
        table.add_row(
            req.request_id[:8],
            req.user_id,
            f"{req.bucket_start_ts:%H:%M} → {req.bucket_end_ts:%H:%M}",
            req.reason,
            req.predicted_label or "—",
            f"{req.confidence:.2f}" if req.confidence is not None else "—",
        )

    console.print(table)

labels_export_cmd(out=typer.Option('labels.csv', '--out', '-o', help='Destination CSV file path'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory'))

Export labels.parquet to CSV.

Source code in src/taskclf/cli/main.py
@labels_app.command("export")
def labels_export_cmd(
    out: str = typer.Option(
        "labels.csv", "--out", "-o", help="Destination CSV file path"
    ),
    data_dir: str = typer.Option(DEFAULT_DATA_DIR, help="Processed data directory"),
) -> None:
    """Export labels.parquet to CSV."""
    from taskclf.labels.store import export_labels_to_csv

    parquet_path = Path(data_dir) / "labels_v1" / "labels.parquet"
    csv_path = Path(out)
    try:
        result = export_labels_to_csv(parquet_path, csv_path)
        typer.echo(f"Exported labels -> {result}")
    except ValueError as exc:
        typer.echo(str(exc), err=True)
        raise typer.Exit(code=1)

labels_project_cmd(date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'), date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), out_dir=typer.Option(DEFAULT_DATA_DIR, '--out-dir', help='Output directory for projected labels'))

Project label blocks onto feature windows using strict containment rules.
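
The core containment rule can be sketched like this (illustrative only; the real rules, including any edge handling, live in `taskclf.labels.projection.project_blocks_to_windows`):

```python
import datetime as dt


def window_contained(
    win_start: dt.datetime,
    win_end: dt.datetime,
    span_start: dt.datetime,
    span_end: dt.datetime,
) -> bool:
    """Strict containment: a feature window inherits a label only when
    the whole window lies inside the label span."""
    return span_start <= win_start and win_end <= span_end


def t(h: int, m: int) -> dt.datetime:
    return dt.datetime(2024, 5, 1, h, m, tzinfo=dt.timezone.utc)


# Window fully inside a 09:00-09:30 span vs. one straddling its start.
inside = window_contained(t(9, 10), t(9, 11), t(9, 0), t(9, 30))
straddles = window_contained(t(8, 59), t(9, 0), t(9, 0), t(9, 30))
```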

Source code in src/taskclf/cli/main.py
@labels_app.command("project")
def labels_project_cmd(
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_DATA_DIR, "--out-dir", help="Output directory for projected labels"
    ),
) -> None:
    """Project label blocks onto feature windows using strict containment rules."""
    import pandas as pd

    from taskclf.core.store import read_parquet, write_parquet
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.labels.store import read_label_spans

    labels_path = Path(data_dir) / "labels_v1" / "labels.parquet"
    if not labels_path.exists():
        typer.echo(f"Labels file not found: {labels_path}", err=True)
        raise typer.Exit(code=1)

    spans = read_label_spans(labels_path)
    typer.echo(f"Loaded {len(spans)} label spans")

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    current = start
    while current <= end:
        fp = _feature_parquet_for_date(Path(data_dir), current)
        if fp is not None and fp.exists():
            all_features.append(read_parquet(fp))
        else:
            typer.echo(f"  skipping {current} (no features file)")
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    projected = project_blocks_to_windows(features_df, spans)

    out_path = Path(out_dir) / "labels_v1" / "projected_labels.parquet"
    write_parquet(projected, out_path)
    typer.echo(
        f"Projected {len(projected)} labeled windows "
        f"(from {len(features_df)} total) -> {out_path}"
    )

train_build_dataset_cmd(date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'), date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'), synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), out_dir=typer.Option(DEFAULT_DATA_DIR, '--out-dir', help='Output directory for X/y/splits'), holdout_fraction=typer.Option(0.0, '--holdout-fraction', help='Fraction of users to hold out for test'), train_ratio=typer.Option(0.7, '--train-ratio', help='Chronological train fraction per user'), val_ratio=typer.Option(0.15, '--val-ratio', help='Chronological val fraction per user'))

Build a training dataset: join features + labels, exclude, split, and write X/y/splits.
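
The chronological split behind `--train-ratio`/`--val-ratio` can be sketched as follows (rows assumed time-ordered; a simplified illustration, not the per-user logic in `taskclf.train.build_dataset`):

```python
def chrono_split(
    n_rows: int, train_ratio: float = 0.70, val_ratio: float = 0.15
) -> tuple[list[int], list[int], list[int]]:
    """Earliest train_ratio of rows train, the next val_ratio validate,
    and the remainder is held out as test."""
    train_end = int(n_rows * train_ratio)
    val_end = int(n_rows * (train_ratio + val_ratio))
    idx = list(range(n_rows))
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]


train, val, test = chrono_split(100)
```

Because the split is positional over time-ordered rows, later buckets never leak into earlier splits.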

Source code in src/taskclf/cli/main.py
@train_app.command("build-dataset")
def train_build_dataset_cmd(
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features + labels"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_DATA_DIR, "--out-dir", help="Output directory for X/y/splits"
    ),
    holdout_fraction: float = typer.Option(
        0.0, "--holdout-fraction", help="Fraction of users to hold out for test"
    ),
    train_ratio: float = typer.Option(
        0.70, "--train-ratio", help="Chronological train fraction per user"
    ),
    val_ratio: float = typer.Option(
        0.15, "--val-ratio", help="Chronological val fraction per user"
    ),
) -> None:
    """Build a training dataset: join features + labels, exclude, split, and write X/y/splits."""
    import pandas as pd

    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.labels.store import generate_dummy_labels
    from taskclf.train.build_dataset import build_training_dataset

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)

        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    typer.echo(
        f"Loaded {len(features_df)} feature rows across {len(all_features)} day(s)"
    )

    manifest = build_training_dataset(
        features_df,
        all_labels,
        output_dir=Path(out_dir) / "training_dataset",
        train_ratio=train_ratio,
        val_ratio=val_ratio,
        holdout_user_fraction=holdout_fraction,
    )

    typer.echo(
        f"Dataset: {manifest.total_rows} rows ({manifest.excluded_rows} excluded)"
    )
    typer.echo(
        f"  Train: {manifest.train_rows}  Val: {manifest.val_rows}  Test: {manifest.test_rows}"
    )
    typer.echo(f"  X -> {manifest.x_path}")
    typer.echo(f"  y -> {manifest.y_path}")
    typer.echo(f"  splits -> {manifest.splits_path}")

train_lgbm_cmd(date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'), date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'), synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels instead of reading from disk'), models_dir=typer.Option(DEFAULT_MODELS_DIR, help='Base directory for model bundles (derived from TASKCLF_HOME)'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), num_boost_round=typer.Option(DEFAULT_NUM_BOOST_ROUND, help='Number of boosting rounds'), class_weight=typer.Option('balanced', '--class-weight', help="Class-weight strategy: 'balanced' (inverse-frequency) or 'none'"))

Train a LightGBM multiclass model and save the model bundle.
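
The idea behind `--class-weight balanced` is inverse-frequency weighting: rarer classes get proportionally larger weights. A minimal sketch (the actual weighting is applied inside `taskclf.train.lgbm`):

```python
from collections import Counter


def balanced_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count),
    so a class half as frequent gets twice the weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


w = balanced_weights(["Build"] * 3 + ["Debug"])
```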

Source code in src/taskclf/cli/main.py
@train_app.command("lgbm")
def train_lgbm_cmd(
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False,
        "--synthetic",
        help="Generate dummy features + labels instead of reading from disk",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        help="Base directory for model bundles (derived from TASKCLF_HOME)",
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    num_boost_round: int = typer.Option(
        DEFAULT_NUM_BOOST_ROUND, help="Number of boosting rounds"
    ),
    class_weight: str = typer.Option(
        "balanced",
        "--class-weight",
        help="Class-weight strategy: 'balanced' (inverse-frequency) or 'none'",
    ),
) -> None:
    """Train a LightGBM multiclass model and save the model bundle."""
    import pandas as pd

    from taskclf.core.model_io import build_metadata, save_model_bundle
    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.labels.store import generate_dummy_labels
    from taskclf.train.dataset import split_by_time
    from taskclf.train.lgbm import train_lgbm

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)

        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    typer.echo(
        f"Loaded {len(features_df)} feature rows across {len(all_features)} day(s)"
    )

    labeled_df = project_blocks_to_windows(features_df, all_labels)
    typer.echo(f"Labeled {len(labeled_df)} / {len(features_df)} rows")

    if labeled_df.empty:
        typer.echo("No labeled rows — cannot train.", err=True)
        raise typer.Exit(code=1)

    splits = split_by_time(labeled_df)
    train_df = labeled_df.iloc[splits["train"]].reset_index(drop=True)
    val_df = labeled_df.iloc[splits["val"]].reset_index(drop=True)
    typer.echo(
        f"Train: {len(train_df)} rows, Val: {len(val_df)} rows, "
        f"Test: {len(splits['test'])} rows (held out)"
    )

    cw: Literal["balanced", "none"] = "none" if class_weight == "none" else "balanced"
    schema_version = (
        str(labeled_df["schema_version"].iloc[0]).strip()
        if "schema_version" in labeled_df.columns and not labeled_df.empty
        else LATEST_FEATURE_SCHEMA_VERSION
    )
    model, metrics, cm_df, params, cat_encoders = train_lgbm(
        train_df,
        val_df,
        num_boost_round=num_boost_round,
        class_weight=cw,
        schema_version=schema_version,
    )
    typer.echo(
        f"Macro F1: {metrics['macro_f1']}  Weighted F1: {metrics['weighted_f1']}"
    )

    from taskclf.train.retrain import compute_dataset_hash

    dataset_hash = compute_dataset_hash(features_df, all_labels)
    typer.echo(f"Dataset hash: {dataset_hash}")

    metadata = build_metadata(
        label_set=metrics["label_names"],
        train_date_from=start,
        train_date_to=end,
        params=params,
        dataset_hash=dataset_hash,
        data_provenance="synthetic" if synthetic else "real",
        unknown_category_freq_threshold=params.get("unknown_category_freq_threshold"),
        unknown_category_mask_rate=params.get("unknown_category_mask_rate"),
        schema_version=schema_version,
    )

    run_dir = save_model_bundle(
        model=model,
        metadata=metadata,
        metrics=metrics,
        confusion_df=cm_df,
        base_dir=Path(models_dir),
        cat_encoders=cat_encoders,
    )
    typer.echo(f"Model bundle saved to {run_dir}")

train_evaluate_cmd(model_dir=typer.Option(..., '--model-dir', help='Path to a model run directory'), date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'), date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'), synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for evaluation artifacts'), holdout_fraction=typer.Option(0.0, '--holdout-fraction', help='Fraction of users held out for unseen-user evaluation'), reject_threshold=typer.Option(DEFAULT_REJECT_THRESHOLD, '--reject-threshold', help='Max-probability below which prediction is rejected'))

Run full evaluation of a trained model: metrics, calibration, acceptance checks.

Source code in src/taskclf/cli/main.py
@train_app.command("evaluate")
def train_evaluate_cmd(
    model_dir: str = typer.Option(
        ..., "--model-dir", help="Path to a model run directory"
    ),
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features + labels"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for evaluation artifacts"
    ),
    holdout_fraction: float = typer.Option(
        0.0,
        "--holdout-fraction",
        help="Fraction of users held out for unseen-user evaluation",
    ),
    reject_threshold: float = typer.Option(
        DEFAULT_REJECT_THRESHOLD,
        "--reject-threshold",
        help="Max-probability below which prediction is rejected",
    ),
) -> None:
    """Run full evaluation of a trained model: metrics, calibration, acceptance checks."""
    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.model_io import load_model_bundle
    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.labels.store import generate_dummy_labels
    from taskclf.train.dataset import split_by_time
    from taskclf.train.evaluate import evaluate_model, write_evaluation_artifacts

    console = Console()

    model, metadata, cat_encoders = load_model_bundle(Path(model_dir))
    typer.echo(f"Loaded model from {model_dir} (schema={metadata.schema_hash})")

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)
        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    labeled_df = project_blocks_to_windows(features_df, all_labels)

    if labeled_df.empty:
        typer.echo("No labeled rows — cannot evaluate.", err=True)
        raise typer.Exit(code=1)

    splits = split_by_time(labeled_df, holdout_user_fraction=holdout_fraction)
    test_df = labeled_df.iloc[splits["test"]].reset_index(drop=True)
    holdout_users = splits.get("holdout_users", [])

    if test_df.empty:
        typer.echo("Test set is empty — cannot evaluate.", err=True)
        raise typer.Exit(code=1)

    typer.echo(
        f"Evaluating on {len(test_df)} test rows ({len(holdout_users)} holdout users)"
    )

    report = evaluate_model(
        model,
        test_df,
        cat_encoders=cat_encoders,
        holdout_users=holdout_users,
        reject_threshold=reject_threshold,
        schema_version=metadata.schema_version,
    )

    # -- Overall metrics table --
    overall = Table(title="Overall Metrics")
    overall.add_column("Metric", style="bold")
    overall.add_column("Value", justify="right")
    overall.add_row("Macro F1", f"{report.macro_f1:.4f}")
    overall.add_row("Weighted F1", f"{report.weighted_f1:.4f}")
    overall.add_row("Reject Rate", f"{report.reject_rate:.4f}")
    if report.seen_user_f1 is not None:
        overall.add_row("Seen-User F1", f"{report.seen_user_f1:.4f}")
    if report.unseen_user_f1 is not None:
        overall.add_row("Unseen-User F1", f"{report.unseen_user_f1:.4f}")
    console.print(overall)

    # -- Per-class table --
    pc_table = Table(title="Per-Class Metrics")
    pc_table.add_column("Class", style="bold")
    pc_table.add_column("Precision", justify="right")
    pc_table.add_column("Recall", justify="right")
    pc_table.add_column("F1", justify="right")
    for cls in report.label_names:
        m = report.per_class.get(cls, {})
        pc_table.add_row(
            cls,
            f"{m.get('precision', 0):.4f}",
            f"{m.get('recall', 0):.4f}",
            f"{m.get('f1', 0):.4f}",
        )
    console.print(pc_table)

    # -- Per-user table --
    pu_table = Table(title="Per-User Macro F1")
    pu_table.add_column("User", style="bold")
    pu_table.add_column("Rows", justify="right")
    pu_table.add_column("Macro F1", justify="right")
    for uid, um in report.per_user.items():
        pu_table.add_row(
            uid, str(int(um.get("count", 0))), f"{um.get('macro_f1', 0):.4f}"
        )
    console.print(pu_table)

    # -- Acceptance checks table --
    acc_table = Table(title="Acceptance Checks")
    acc_table.add_column("Check", style="bold")
    acc_table.add_column("Result")
    acc_table.add_column("Detail")
    for check_name, passed in report.acceptance_checks.items():
        status = "[green]PASS[/green]" if passed else "[red]FAIL[/red]"
        acc_table.add_row(
            check_name, status, report.acceptance_details.get(check_name, "")
        )
    console.print(acc_table)

    # -- Stratification warnings --
    if report.stratification.get("warnings"):
        for w in report.stratification["warnings"]:
            console.print(f"[yellow]WARNING:[/yellow] {w}")

    # -- Write artifacts --
    out = Path(out_dir)
    paths = write_evaluation_artifacts(report, out)
    for name, p in paths.items():
        typer.echo(f"  {name}: {p}")

    all_pass = all(report.acceptance_checks.values())
    if all_pass:
        console.print("[bold green]All acceptance checks passed.[/bold green]")
    else:
        console.print("[bold red]Some acceptance checks failed.[/bold red]")
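The "Reject Rate" reported above comes from max-probability rejection: a window's prediction is discarded when the model's top class probability falls below `--reject-threshold`. A minimal sketch of that rule (illustrative only; the real logic lives in `taskclf.train.evaluate.evaluate_model`, and the `reject_rate` helper here is hypothetical):

```python
# Max-probability rejection: a row is rejected when no class is
# predicted with probability >= threshold.

def reject_rate(probas: list[list[float]], threshold: float) -> float:
    """Fraction of rows whose top-class probability is below threshold."""
    rejected = sum(1 for p in probas if max(p) < threshold)
    return rejected / len(probas)

probas = [
    [0.90, 0.05, 0.05],  # confident -> accepted at threshold 0.5
    [0.40, 0.35, 0.25],  # uncertain -> rejected at threshold 0.5
]
print(reject_rate(probas, 0.5))  # 0.5
```

Raising the threshold trades coverage for accuracy on the accepted subset, which is why the seen/unseen-user F1 rows are reported alongside the reject rate.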

train_tune_reject_cmd(
    model_dir=typer.Option(..., '--model-dir', help='Path to a model run directory'),
    date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'),
    date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'),
    synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels'),
    data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for tuning report'),
    write_policy=typer.Option(False, '--write-policy/--no-write-policy', help='Write an inference policy binding model + calibrator + tuned threshold'),
    calibrator_store=typer.Option(None, '--calibrator-store', help='Path to calibrator store directory (included in policy when --write-policy)'),
    models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles (used for policy file location)'),
)

Sweep reject thresholds on a validation set and recommend the best one.

Source code in src/taskclf/cli/main.py
@train_app.command("tune-reject")
def train_tune_reject_cmd(
    model_dir: str = typer.Option(
        ..., "--model-dir", help="Path to a model run directory"
    ),
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features + labels"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for tuning report"
    ),
    write_policy: bool = typer.Option(
        False,
        "--write-policy/--no-write-policy",
        help="Write an inference policy binding model + calibrator + tuned threshold",
    ),
    calibrator_store: str | None = typer.Option(
        None,
        "--calibrator-store",
        help="Path to calibrator store directory (included in policy when --write-policy)",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles (used for policy file location)",
    ),
) -> None:
    """Sweep reject thresholds on a validation set and recommend the best one."""
    import json

    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.model_io import load_model_bundle
    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.labels.store import generate_dummy_labels
    from taskclf.train.dataset import split_by_time
    from taskclf.train.evaluate import tune_reject_threshold

    console = Console()

    model, metadata, cat_encoders = load_model_bundle(Path(model_dir))
    typer.echo(f"Loaded model from {model_dir} (schema={metadata.schema_hash})")

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)
        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    labeled_df = project_blocks_to_windows(features_df, all_labels)

    if labeled_df.empty:
        typer.echo("No labeled rows — cannot tune.", err=True)
        raise typer.Exit(code=1)

    splits = split_by_time(labeled_df)
    val_df = labeled_df.iloc[splits["val"]].reset_index(drop=True)
    if val_df.empty:
        typer.echo("Validation split is empty — using full dataset.", err=True)
        val_df = labeled_df

    typer.echo(f"Tuning reject threshold on {len(val_df)} validation rows")

    cal = None
    if calibrator_store is not None:
        from taskclf.infer.calibration import load_calibrator_store

        cal_store_obj = load_calibrator_store(Path(calibrator_store))
        cal = cal_store_obj.global_calibrator
        typer.echo(
            f"Calibrating scores with {type(cal).__name__} from {calibrator_store}"
        )

    result = tune_reject_threshold(
        model,
        val_df,
        cat_encoders=cat_encoders,
        calibrator=cal,
        schema_version=metadata.schema_version,
    )

    sweep_table = Table(title="Reject Threshold Sweep")
    sweep_table.add_column("Threshold", justify="right")
    sweep_table.add_column("Accuracy", justify="right")
    sweep_table.add_column("Reject Rate", justify="right")
    sweep_table.add_column("Coverage", justify="right")
    sweep_table.add_column("Macro F1", justify="right")

    for row in result.sweep:
        marker = " *" if row["threshold"] == result.best_threshold else ""
        sweep_table.add_row(
            f"{row['threshold']:.2f}{marker}",
            f"{row['accuracy_on_accepted']:.4f}",
            f"{row['reject_rate']:.4f}",
            f"{row['coverage']:.4f}",
            f"{row['macro_f1']:.4f}",
        )

    console.print(sweep_table)
    console.print(
        f"\n[bold]Recommended reject threshold:[/bold] {result.best_threshold:.4f}"
    )

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    report_path = out / "reject_tuning.json"
    report_path.write_text(json.dumps(result.model_dump(), indent=2))
    typer.echo(f"Tuning report: {report_path}")

    if write_policy:
        from taskclf.core.inference_policy import (
            build_inference_policy,
            save_inference_policy,
        )

        models_path = Path(models_dir)
        model_path = Path(model_dir)
        cal_store_rel: str | None = None
        cal_method: str | None = None
        if calibrator_store is not None:
            cal_store_path = Path(calibrator_store)
            cal_store_rel = str(cal_store_path.relative_to(models_path.parent))
            store_meta_path = cal_store_path / "store.json"
            if store_meta_path.is_file():
                cal_method = json.loads(store_meta_path.read_text()).get("method")

        policy = build_inference_policy(
            model_dir=str(model_path.relative_to(models_path.parent)),
            model_schema_hash=metadata.schema_hash,
            model_label_set=list(metadata.label_set),
            reject_threshold=result.best_threshold,
            calibrator_store_dir=cal_store_rel,
            calibration_method=cal_method,
            source="tune-reject",
        )
        policy_path = save_inference_policy(policy, models_path)
        typer.echo(f"Inference policy: {policy_path}")
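The sweep table above can be reproduced conceptually as follows: for each candidate threshold, accept the rows whose top probability clears it, then measure coverage and accuracy on the accepted subset. This is a hedged sketch; the actual selection rule in `tune_reject_threshold` may also weigh macro-F1, and the `sweep` helper here is hypothetical:

```python
# For each threshold, partition rows into accepted/rejected by max
# probability, then score only the accepted rows.

def sweep(probas, y_true, thresholds):
    rows = []
    for t in thresholds:
        # (predicted class, true class) for every accepted row
        accepted = [
            (p.index(max(p)), y) for p, y in zip(probas, y_true) if max(p) >= t
        ]
        coverage = len(accepted) / len(y_true)
        acc = (
            sum(1 for pred, y in accepted if pred == y) / len(accepted)
            if accepted
            else 0.0
        )
        rows.append(
            {
                "threshold": t,
                "coverage": coverage,
                "reject_rate": 1 - coverage,
                "accuracy_on_accepted": acc,
            }
        )
    return rows

probas = [[0.90, 0.10], [0.55, 0.45], [0.51, 0.49]]
y_true = [0, 1, 0]
for row in sweep(probas, y_true, [0.5, 0.6]):
    print(row)
```

At threshold 0.5 every row is accepted (coverage 1.0) but the uncertain middle row is misclassified; at 0.6 only the confident row survives, so accepted accuracy rises while coverage drops. That trade-off is exactly what the sweep table makes visible.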

train_calibrate_cmd(
    model_dir=typer.Option(..., '--model-dir', help='Path to a model run directory'),
    date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'),
    date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'),
    synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels'),
    data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'),
    out=typer.Option(DEFAULT_OUT_DIR + '/calibrator_store', '--out', help='Output directory for calibrator store'),
    method=typer.Option(DEFAULT_CALIBRATION_METHOD, '--method', help="Calibration method: 'temperature' or 'isotonic'"),
    min_windows=typer.Option(DEFAULT_MIN_LABELED_WINDOWS, '--min-windows', help='Minimum labeled windows for per-user calibration'),
    min_days=typer.Option(DEFAULT_MIN_LABELED_DAYS, '--min-days', help='Minimum distinct days for per-user calibration'),
    min_labels=typer.Option(DEFAULT_MIN_DISTINCT_LABELS, '--min-labels', help='Minimum distinct core labels for per-user calibration'),
)

Fit per-user probability calibrators and save a calibrator store.

Source code in src/taskclf/cli/main.py
@train_app.command("calibrate")
def train_calibrate_cmd(
    model_dir: str = typer.Option(
        ..., "--model-dir", help="Path to a model run directory"
    ),
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features + labels"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out: str = typer.Option(
        DEFAULT_OUT_DIR + "/calibrator_store",
        "--out",
        help="Output directory for calibrator store",
    ),
    method: str = typer.Option(
        DEFAULT_CALIBRATION_METHOD,
        "--method",
        help="Calibration method: 'temperature' or 'isotonic'",
    ),
    min_windows: int = typer.Option(
        DEFAULT_MIN_LABELED_WINDOWS,
        "--min-windows",
        help="Minimum labeled windows for per-user calibration",
    ),
    min_days: int = typer.Option(
        DEFAULT_MIN_LABELED_DAYS,
        "--min-days",
        help="Minimum distinct days for per-user calibration",
    ),
    min_labels: int = typer.Option(
        DEFAULT_MIN_DISTINCT_LABELS,
        "--min-labels",
        help="Minimum distinct core labels for per-user calibration",
    ),
) -> None:
    """Fit per-user probability calibrators and save a calibrator store."""
    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.model_io import load_model_bundle
    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.infer.calibration import save_calibrator_store
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.labels.store import generate_dummy_labels
    from taskclf.train.calibrate import fit_calibrator_store
    from taskclf.train.dataset import split_by_time

    console = Console()

    model, metadata, cat_encoders = load_model_bundle(Path(model_dir))
    typer.echo(f"Loaded model from {model_dir} (schema={metadata.schema_hash})")

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)

        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    labeled_df = project_blocks_to_windows(features_df, all_labels)

    if labeled_df.empty:
        typer.echo("No labeled rows — cannot calibrate.", err=True)
        raise typer.Exit(code=1)

    splits = split_by_time(labeled_df)
    val_df = labeled_df.iloc[splits["val"]].reset_index(drop=True)

    if val_df.empty:
        typer.echo("Validation split is empty — using full dataset.", err=True)
        val_df = labeled_df

    typer.echo(f"Fitting {method} calibrators on {len(val_df)} validation rows")

    cal_method: Literal["temperature", "isotonic"] = (
        "isotonic" if method == "isotonic" else "temperature"
    )
    store, eligibility = fit_calibrator_store(
        model,
        val_df,
        cat_encoders=cat_encoders,
        method=cal_method,
        min_windows=min_windows,
        min_days=min_days,
        min_labels=min_labels,
        model_bundle_id=Path(model_dir).name,
        model_schema_hash=metadata.schema_hash,
    )

    # Eligibility table
    elig_table = Table(title="Per-User Calibration Eligibility")
    elig_table.add_column("User", style="bold")
    elig_table.add_column("Windows", justify="right")
    elig_table.add_column("Days", justify="right")
    elig_table.add_column("Labels", justify="right")
    elig_table.add_column("Eligible")

    for e in eligibility:
        status = "[green]YES[/green]" if e.is_eligible else "[dim]no[/dim]"
        elig_table.add_row(
            e.user_id[:16],
            str(e.labeled_windows),
            str(e.labeled_days),
            str(e.distinct_labels),
            status,
        )
    console.print(elig_table)

    out_path = Path(out)
    save_calibrator_store(store, out_path)

    n_per_user = len(store.user_calibrators)
    typer.echo(
        f"Calibrator store saved to {out_path} "
        f"(global + {n_per_user} per-user calibrators, method={cal_method})"
    )
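For intuition on the `'temperature'` method: temperature scaling divides the model's logits by a fitted scalar T before the softmax, so T > 1 flattens over-confident probability distributions without changing the argmax. A minimal sketch with a fixed T (the project fits T on validation data inside `fit_calibrator_store`; this standalone version is illustrative):

```python
import math

def softmax(logits: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    exps = [math.exp(z - max(logits)) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_scale(logits: list[float], t: float) -> list[float]:
    # T > 1 softens the distribution; T = 1 is the identity.
    return softmax([z / t for z in logits])

raw = softmax([4.0, 1.0, 0.0])
cooled = temperature_scale([4.0, 1.0, 0.0], 2.0)
print(max(raw), max(cooled))  # calibrated max probability is smaller
```

Isotonic regression, the other `--method`, instead fits a monotone mapping from raw to calibrated scores and needs more labeled data, which is one reason the per-user eligibility gates above exist.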

train_retrain_cmd(
    config=typer.Option(None, '--config', help='Path to retrain YAML config'),
    date_from=typer.Option(None, '--from', help='Start date (YYYY-MM-DD); defaults to lookback from today'),
    date_to=typer.Option(None, '--to', help='End date (YYYY-MM-DD, inclusive); defaults to today'),
    synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels'),
    models_dir=typer.Option(DEFAULT_MODELS_DIR, help='Base directory for model bundles (derived from TASKCLF_HOME)'),
    data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for evaluation artifacts'),
    force=typer.Option(False, '--force', help='Skip cadence check and retrain immediately'),
    dry_run=typer.Option(False, '--dry-run', help='Evaluate but do not promote'),
    holdout_fraction=typer.Option(0.0, '--holdout-fraction', help='Fraction of users held out for test'),
    reject_threshold=typer.Option(DEFAULT_REJECT_THRESHOLD, '--reject-threshold', help='Reject threshold'),
    write_policy=typer.Option(False, '--write-policy/--no-write-policy', help='Write an inference policy on promotion'),
)

Run the full retrain pipeline: train, evaluate, gate-check, promote.

Source code in src/taskclf/cli/main.py
@train_app.command("retrain")
def train_retrain_cmd(
    config: str | None = typer.Option(
        None, "--config", help="Path to retrain YAML config"
    ),
    date_from: str | None = typer.Option(
        None, "--from", help="Start date (YYYY-MM-DD); defaults to lookback from today"
    ),
    date_to: str | None = typer.Option(
        None, "--to", help="End date (YYYY-MM-DD, inclusive); defaults to today"
    ),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features + labels"
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        help="Base directory for model bundles (derived from TASKCLF_HOME)",
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for evaluation artifacts"
    ),
    force: bool = typer.Option(
        False, "--force", help="Skip cadence check and retrain immediately"
    ),
    dry_run: bool = typer.Option(
        False, "--dry-run", help="Evaluate but do not promote"
    ),
    holdout_fraction: float = typer.Option(
        0.0, "--holdout-fraction", help="Fraction of users held out for test"
    ),
    reject_threshold: float = typer.Option(
        DEFAULT_REJECT_THRESHOLD, "--reject-threshold", help="Reject threshold"
    ),
    write_policy: bool = typer.Option(
        False,
        "--write-policy/--no-write-policy",
        help="Write an inference policy on promotion",
    ),
) -> None:
    """Run the full retrain pipeline: train, evaluate, gate-check, promote."""
    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.labels.store import generate_dummy_labels
    from taskclf.train.retrain import (
        RetrainConfig,
        load_retrain_config,
        run_retrain_pipeline,
    )

    console = Console()

    retrain_config = load_retrain_config(Path(config)) if config else RetrainConfig()

    if date_to is not None:
        end = dt.date.fromisoformat(date_to)
    else:
        end = dt.date.today()

    if date_from is not None:
        start = dt.date.fromisoformat(date_from)
    else:
        start = end - dt.timedelta(days=retrain_config.data_lookback_days)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)

        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    typer.echo(f"Loaded {len(features_df)} feature rows ({start} to {end})")

    result = run_retrain_pipeline(
        retrain_config,
        features_df,
        all_labels,
        models_dir=Path(models_dir),
        out_dir=Path(out_dir),
        force=force,
        dry_run=dry_run,
        holdout_user_fraction=holdout_fraction,
        reject_threshold=reject_threshold,
        data_provenance="synthetic" if synthetic else "real",
    )

    # Display results
    summary = Table(title="Retrain Result")
    summary.add_column("Metric", style="bold")
    summary.add_column("Value")
    summary.add_row("Dataset hash", result.dataset_snapshot.dataset_hash)
    summary.add_row("Dataset rows", str(result.dataset_snapshot.row_count))
    if result.champion_macro_f1 is not None:
        summary.add_row("Champion macro-F1", f"{result.champion_macro_f1:.4f}")
    summary.add_row("Challenger macro-F1", f"{result.challenger_macro_f1:.4f}")
    summary.add_row("Promoted", str(result.promoted))
    summary.add_row("Reason", result.reason)
    summary.add_row("Run dir", result.run_dir)
    console.print(summary)

    if result.regression is not None:
        gate_table = Table(title="Regression Gates")
        gate_table.add_column("Gate", style="bold")
        gate_table.add_column("Result")
        gate_table.add_column("Detail")
        for gate in result.regression.gates:
            status = "[green]PASS[/green]" if gate.passed else "[red]FAIL[/red]"
            gate_table.add_row(gate.name, status, gate.detail)
        console.print(gate_table)

    if result.promoted:
        console.print("[bold green]Model promoted successfully.[/bold green]")
        if write_policy and result.run_dir:
            import json

            from taskclf.core.inference_policy import (
                build_inference_policy,
                save_inference_policy,
            )

            run_path = Path(result.run_dir)
            models_path = Path(models_dir)
            meta_raw = json.loads((run_path / "metadata.json").read_text())
            policy = build_inference_policy(
                model_dir=str(run_path.relative_to(models_path.parent)),
                model_schema_hash=meta_raw["schema_hash"],
                model_label_set=meta_raw["label_set"],
                reject_threshold=reject_threshold,
                source="retrain",
            )
            policy_path = save_inference_policy(policy, models_path)
            console.print(f"Inference policy: {policy_path}")
    else:
        console.print(f"[bold yellow]Model not promoted: {result.reason}[/bold yellow]")

train_check_retrain_cmd(
    config=typer.Option(None, '--config', help='Path to retrain YAML config'),
    models_dir=typer.Option(DEFAULT_MODELS_DIR, help='Base directory for model bundles (derived from TASKCLF_HOME)'),
    calibrator_store=typer.Option(None, '--calibrator-store', help='Path to calibrator store directory'),
)

Check whether retraining or calibrator update is due (read-only).

Source code in src/taskclf/cli/main.py
@train_app.command("check-retrain")
def train_check_retrain_cmd(
    config: str | None = typer.Option(
        None, "--config", help="Path to retrain YAML config"
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        help="Base directory for model bundles (derived from TASKCLF_HOME)",
    ),
    calibrator_store: str | None = typer.Option(
        None, "--calibrator-store", help="Path to calibrator store directory"
    ),
) -> None:
    """Check whether retraining or calibrator update is due (read-only)."""
    from rich.console import Console
    from rich.table import Table

    from taskclf.train.retrain import (
        RetrainConfig,
        check_calibrator_update_due,
        check_retrain_due,
        find_latest_model,
        load_retrain_config,
    )

    console = Console()

    retrain_config = load_retrain_config(Path(config)) if config else RetrainConfig()

    table = Table(title="Retrain Status")
    table.add_column("Check", style="bold")
    table.add_column("Status")
    table.add_column("Detail")

    latest = find_latest_model(Path(models_dir))
    retrain_due = check_retrain_due(
        Path(models_dir),
        retrain_config.global_retrain_cadence_days,
    )

    if latest is not None:
        import json

        raw = json.loads((latest / "metadata.json").read_text())
        created = raw.get("created_at", "unknown")
        table.add_row("Latest model", str(latest.name), f"created {created}")
    else:
        table.add_row("Latest model", "none", "no models found")

    status = "[red]DUE[/red]" if retrain_due else "[green]OK[/green]"
    table.add_row(
        "Global retrain",
        status,
        f"cadence={retrain_config.global_retrain_cadence_days}d",
    )

    if calibrator_store is not None:
        cal_due = check_calibrator_update_due(
            Path(calibrator_store),
            retrain_config.calibrator_update_cadence_days,
        )
        cal_status = "[red]DUE[/red]" if cal_due else "[green]OK[/green]"
        table.add_row(
            "Calibrator update",
            cal_status,
            f"cadence={retrain_config.calibrator_update_cadence_days}d",
        )

    console.print(table)

train_list_cmd(
    models_dir=typer.Option(DEFAULT_MODELS_DIR, help='Base directory for model bundles (derived from TASKCLF_HOME)'),
    sort=typer.Option('macro_f1', '--sort', help='Sort column: macro_f1|weighted_f1|created_at'),
    eligible_only=typer.Option(False, '--eligible', help='Show only eligible bundles (compatible schema + label set)'),
    schema_hash=typer.Option(None, '--schema-hash', help='Filter to bundles matching this schema hash (default: current runtime hash)'),
    json_output=typer.Option(False, '--json', help='Output as JSON instead of a table'),
)

List model bundles with ranking metrics and status.

Source code in src/taskclf/cli/main.py
@train_app.command("list")
def train_list_cmd(
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        help="Base directory for model bundles (derived from TASKCLF_HOME)",
    ),
    sort: str = typer.Option(
        "macro_f1", "--sort", help="Sort column: macro_f1|weighted_f1|created_at"
    ),
    eligible_only: bool = typer.Option(
        False,
        "--eligible",
        help="Show only eligible bundles (compatible schema + label set)",
    ),
    schema_hash: str | None = typer.Option(
        None,
        "--schema-hash",
        help="Filter to bundles matching this schema hash (default: current runtime hash)",
    ),
    json_output: bool = typer.Option(
        False, "--json", help="Output as JSON instead of a table"
    ),
) -> None:
    """List model bundles with ranking metrics and status."""
    import json

    from rich.console import Console
    from rich.table import Table

    from taskclf.core.types import LABEL_SET_V1
    from taskclf.model_registry import (
        SelectionPolicy,
        is_compatible,
        list_bundles,
        passes_constraints,
        read_active,
    )

    console = Console()
    models_path = Path(models_dir)
    policy = SelectionPolicy()
    req_hash = schema_hash

    bundles = list_bundles(models_path)
    if not bundles:
        console.print("[yellow]No model bundles found.[/yellow]")
        raise typer.Exit()

    pointer = read_active(models_path)
    active_rel = pointer.model_dir if pointer else None

    def _per_class_precision(
        cm: list[list[int]],
        label_names: list[str],
    ) -> dict[str, float]:
        n = len(label_names)
        result: dict[str, float] = {}
        for j in range(n):
            col_sum = sum(cm[i][j] for i in range(n))
            result[label_names[j]] = cm[j][j] / col_sum if col_sum > 0 else 0.0
        return result

    rows: list[dict[str, object]] = []
    for b in bundles:
        if req_hash is not None:
            compat = is_compatible(b, req_hash, LABEL_SET_V1)
        elif b.valid and b.metadata is not None:
            compat = is_compatible(
                b,
                get_feature_schema(b.metadata.schema_version).SCHEMA_HASH,
                LABEL_SET_V1,
            )
        else:
            compat = False
        elig = compat and passes_constraints(b, policy)
        if eligible_only and not elig:
            continue

        is_active = False
        if active_rel is not None:
            active_path = (models_path / active_rel).resolve()
            is_active = b.path.resolve() == active_path

        row: dict[str, object] = {
            "model_id": b.model_id,
            "created_at": b.metadata.created_at if b.metadata else None,
            "schema_hash": b.metadata.schema_hash if b.metadata else None,
            "macro_f1": None,
            "weighted_f1": None,
            "bi_prec": None,
            "min_prec": None,
            "eligible": elig,
            "active": is_active,
            "notes": b.invalid_reason if not b.valid else None,
        }
        if b.metrics is not None:
            row["macro_f1"] = b.metrics.macro_f1
            row["weighted_f1"] = b.metrics.weighted_f1
            precs = _per_class_precision(
                b.metrics.confusion_matrix, b.metrics.label_names
            )
            row["bi_prec"] = precs.get("BreakIdle")
            row["min_prec"] = min(precs.values()) if precs else None
        rows.append(row)

    sort_keys: dict[str, object] = {
        "macro_f1": lambda r: (r["macro_f1"] is not None, r["macro_f1"] or 0),
        "weighted_f1": lambda r: (r["weighted_f1"] is not None, r["weighted_f1"] or 0),
        "created_at": lambda r: (r["created_at"] is not None, r["created_at"] or ""),
    }
    if sort not in sort_keys:
        console.print(
            f"[red]Unknown --sort value {sort!r}; choose macro_f1|weighted_f1|created_at[/red]"
        )
        raise typer.Exit(code=1)
    rows.sort(key=sort_keys[sort], reverse=True)  # type: ignore[arg-type]

    if json_output:
        console.print_json(json.dumps(rows, default=str))
        raise typer.Exit()

    table = Table(title="Model Bundles")
    table.add_column("Model ID", style="bold")
    table.add_column("Created", style="dim")
    table.add_column("Schema Hash")
    table.add_column("macro F1", justify="right")
    table.add_column("wt F1", justify="right")
    table.add_column("BI Prec", justify="right")
    table.add_column("Min Prec", justify="right")
    table.add_column("Eligible")
    table.add_column("Active")
    table.add_column("Notes")

    def _fmt(v: object, decimals: int = 4) -> str:
        if v is None:
            return "--"
        if isinstance(v, float):
            return f"{v:.{decimals}f}"
        return str(v)

    for r in rows:
        active_marker = "*" if r["active"] else ""
        elig_marker = "yes" if r["eligible"] else "no"
        table.add_row(
            str(r["model_id"]),
            _fmt(r["created_at"]),
            _fmt(r["schema_hash"]),
            _fmt(r["macro_f1"]),
            _fmt(r["weighted_f1"]),
            _fmt(r["bi_prec"]),
            _fmt(r["min_prec"]),
            elig_marker,
            active_marker,
            _fmt(r["notes"]),
        )

    console.print(table)
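The `BI Prec` and `Min Prec` columns are derived from the saved confusion matrix exactly as in `_per_class_precision` above: with rows as true labels and columns as predicted labels, precision for class j is the diagonal cell divided by the column sum (everything predicted as j). A worked example of the same arithmetic:

```python
def per_class_precision(cm: list[list[int]],
                        labels: list[str]) -> dict[str, float]:
    # Rows = true label, columns = predicted label, so precision for
    # class j is diagonal / column sum (all predictions of j).
    n = len(labels)
    out: dict[str, float] = {}
    for j in range(n):
        col_sum = sum(cm[i][j] for i in range(n))
        out[labels[j]] = cm[j][j] / col_sum if col_sum > 0 else 0.0
    return out

cm = [
    [8, 2],  # true A: 8 predicted A, 2 predicted B
    [1, 9],  # true B: 1 predicted A, 9 predicted B
]
print(per_class_precision(cm, ["A", "B"]))  # A: 8/9 ~ 0.889, B: 9/11 ~ 0.818
```

A class that is never predicted has an empty column and gets precision 0.0 rather than a division error, matching the guard in the source.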

model_set_active_cmd(model_id=typer.Option(..., '--model-id', help='Model bundle directory name (under models/)'), models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles (derived from TASKCLF_HOME)'))

Manually set the active model pointer (rollback / override).

Source code in src/taskclf/cli/main.py
@model_app.command("set-active")
def model_set_active_cmd(
    model_id: str = typer.Option(
        ..., "--model-id", help="Model bundle directory name (under models/)"
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles (derived from TASKCLF_HOME)",
    ),
) -> None:
    """Manually set the active model pointer (rollback / override)."""
    from rich.console import Console

    from taskclf.core.types import LABEL_SET_V1
    from taskclf.model_registry import (
        SelectionPolicy,
        is_compatible,
        list_bundles,
        write_active_atomic,
    )

    console = Console()
    models_path = Path(models_dir)
    bundle_path = models_path / model_id

    if not bundle_path.is_dir():
        console.print(f"[red]Bundle directory not found: {bundle_path}[/red]")
        raise typer.Exit(code=1)

    bundles = list_bundles(models_path)
    bundle = next((b for b in bundles if b.model_id == model_id), None)

    if bundle is None:
        console.print(f"[red]Could not parse bundle: {model_id}[/red]")
        raise typer.Exit(code=1)

    if not bundle.valid:
        console.print(f"[red]Bundle is invalid: {bundle.invalid_reason}[/red]")
        raise typer.Exit(code=1)

    metadata = bundle.metadata
    if metadata is None:
        console.print(f"[red]Bundle metadata is missing: {model_id}[/red]")
        raise typer.Exit(code=1)

    if not is_compatible(
        bundle,
        get_feature_schema(metadata.schema_version).SCHEMA_HASH,
        LABEL_SET_V1,
    ):
        console.print("[red]Bundle is incompatible with current schema/labels[/red]")
        raise typer.Exit(code=1)

    policy = SelectionPolicy()
    pointer = write_active_atomic(
        models_path, bundle, policy, reason="manual set-active"
    )

    macro_f1 = bundle.metrics.macro_f1 if bundle.metrics else "N/A"
    console.print(
        f"[green]Active model set to {model_id} "
        f"(macro_f1={macro_f1}, at={pointer.selected_at})[/green]"
    )

    from taskclf.core.inference_policy import load_inference_policy

    if load_inference_policy(models_path) is not None:
        console.print(
            "[yellow]Note: an inference_policy.json exists and takes "
            "precedence over active.json for inference.  Use "
            "'taskclf policy create' to update it, or "
            "'taskclf policy remove' to revert to active.json.[/yellow]"
        )
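`write_active_atomic` is not shown here; the standard way to make a pointer update like this atomic is to write a sibling temp file and `os.replace()` it over `active.json`, so readers never observe a half-written file. A minimal sketch under that assumption (hypothetical helper, not the library's actual implementation):

```python
import json
import os
import tempfile
from pathlib import Path

def write_pointer_atomic(models_path: Path, payload: dict) -> Path:
    """Atomically replace models/active.json with a new pointer payload."""
    target = models_path / "active.json"
    # The temp file must live in the same directory (same filesystem)
    # for os.replace to be an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=models_path, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    os.replace(tmp, target)  # atomic on POSIX and Windows
    return target
```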

model_inspect_cmd(model_dir=typer.Option(..., '--model-dir', help='Path to a model run directory'), json_output=typer.Option(False, '--json', help='Output structured inspection as JSON'), date_from=typer.Option(None, '--from', help='Start date for held-out test replay (YYYY-MM-DD); requires --to'), date_to=typer.Option(None, '--to', help='End date for held-out test replay (inclusive); requires --from'), data_dir=typer.Option(None, '--data-dir', help=f'Processed data directory for replay (default when omitted: {DEFAULT_DATA_DIR}); not needed with --synthetic'), synthetic=typer.Option(False, '--synthetic', help='Replay on dummy features + labels (no disk features required)'), holdout_fraction=typer.Option(0.0, '--holdout-fraction', help='Fraction of users held out for unseen-user evaluation (replay)'), reject_threshold=typer.Option(DEFAULT_REJECT_THRESHOLD, '--reject-threshold', help='Max-probability below which prediction is rejected (replay)'))

Inspect a model bundle and optionally replay held-out test evaluation.

Source code in src/taskclf/cli/main.py
@model_app.command("inspect")
def model_inspect_cmd(
    model_dir: str = typer.Option(
        ..., "--model-dir", help="Path to a model run directory"
    ),
    json_output: bool = typer.Option(
        False, "--json", help="Output structured inspection as JSON"
    ),
    date_from: str | None = typer.Option(
        None,
        "--from",
        help="Start date for held-out test replay (YYYY-MM-DD); requires --to",
    ),
    date_to: str | None = typer.Option(
        None,
        "--to",
        help="End date for held-out test replay (inclusive); requires --from",
    ),
    data_dir: str | None = typer.Option(
        None,
        "--data-dir",
        help="Processed data directory for replay (default when omitted: "
        f"{DEFAULT_DATA_DIR}); not needed with --synthetic",
    ),
    synthetic: bool = typer.Option(
        False,
        "--synthetic",
        help="Replay on dummy features + labels (no disk features required)",
    ),
    holdout_fraction: float = typer.Option(
        0.0,
        "--holdout-fraction",
        help="Fraction of users held out for unseen-user evaluation (replay)",
    ),
    reject_threshold: float = typer.Option(
        DEFAULT_REJECT_THRESHOLD,
        "--reject-threshold",
        help="Max-probability below which prediction is rejected (replay)",
    ),
) -> None:
    """Inspect a model bundle and optionally replay held-out test evaluation."""
    import json

    from rich.console import Console
    from rich.table import Table

    from taskclf.model_inspection import inspect_model

    console = Console()

    if (date_from is None) ^ (date_to is None):
        console.print(
            "[red]Both --from and --to are required for test replay, or omit both "
            "for bundle-only inspection.[/red]"
        )
        raise typer.Exit(code=1)

    start = dt.date.fromisoformat(date_from) if date_from else None
    end = dt.date.fromisoformat(date_to) if date_to else None
    dd = Path(data_dir) if data_dir is not None else None
    if dd is None and not synthetic and start is not None and end is not None:
        dd = Path(DEFAULT_DATA_DIR)

    try:
        result = inspect_model(
            model_dir,
            date_from=start,
            date_to=end,
            data_dir=dd,
            synthetic=synthetic,
            holdout_fraction=holdout_fraction,
            reject_threshold=reject_threshold,
        )
    except (FileNotFoundError, ValueError, OSError) as exc:
        console.print(f"[red]{exc}[/red]")
        raise typer.Exit(code=1) from exc

    if json_output:
        console.print_json(json.dumps(result.model_dump(), default=str))
        raise typer.Exit(code=0)

    console.print("[bold]Model bundle[/bold]")
    console.print(f"  Path: {result.bundle_path}")
    meta = result.metadata
    console.print(
        f"  Schema: {meta.get('schema_version')}  hash={meta.get('schema_hash')}"
    )
    console.print(
        f"  Train dates: {meta.get('train_date_from')} .. {meta.get('train_date_to')}"
    )
    console.print(f"  Dataset hash: {meta.get('dataset_hash')}")
    console.print(f"  Git commit: {meta.get('git_commit')}")

    bs = result.bundle_saved_validation
    console.print("\n[bold]Bundle-saved validation metrics[/bold]")
    console.print(
        "  [dim](from metrics.json — computed on the validation split at train time, "
        "not the held-out test split)[/dim]"
    )
    console.print(f"  Macro F1: {bs.macro_f1:.4f}  Weighted F1: {bs.weighted_f1:.4f}")

    pc_table = Table(title="Per-class (derived from saved confusion matrix)")
    pc_table.add_column("Class", style="bold")
    pc_table.add_column("P", justify="right")
    pc_table.add_column("R", justify="right")
    pc_table.add_column("F1", justify="right")
    pc_table.add_column("Supp", justify="right")
    for cls in bs.label_names:
        m = bs.per_class_derived.get(cls, {})
        pc_table.add_row(
            cls,
            f"{m.get('precision', 0):.4f}",
            f"{m.get('recall', 0):.4f}",
            f"{m.get('f1', 0):.4f}",
            str(int(m.get("support", 0))),
        )
    console.print(pc_table)

    if bs.top_confusion_pairs:
        tcp = Table(title="Top confusion pairs (validation, off-diagonal)")
        tcp.add_column("True", style="bold")
        tcp.add_column("Predicted", style="bold")
        tcp.add_column("Count", justify="right")
        for row in bs.top_confusion_pairs[:15]:
            tcp.add_row(
                str(row.get("true_label", "")),
                str(row.get("pred_label", "")),
                str(int(row.get("count", 0))),
            )
        console.print(tcp)

    pl = result.prediction_logic
    console.print("\n[bold]Prediction / metrics logic[/bold]")
    console.print(f"  Problem: {pl.problem_type} (multilabel={pl.multilabel})")
    console.print(f"  {pl.lightgbm_outputs}")
    console.print(f"  {pl.argmax_rule}")
    console.print(f"  {pl.evaluation_reject}")
    console.print(
        "  Code: " + ", ".join(f"{k}={v}" for k, v in pl.code_references.items())
    )

    if result.replayed_test_evaluation is not None:
        rt = result.replayed_test_evaluation
        console.print("\n[bold]Replayed held-out test evaluation[/bold]")
        console.print(
            f"  Rows: {rt.test_row_count}  holdout users: {len(rt.holdout_users)}"
        )
        console.print(
            f"  Dates: {rt.date_from} .. {rt.date_to}  synthetic={rt.synthetic}"
        )
        console.print(f"  data_dir={rt.data_dir}")
        dist_table = Table(title="Test set class distribution")
        dist_table.add_column("Class", style="bold")
        dist_table.add_column("Count", justify="right")
        dist_table.add_column("Fraction", justify="right")
        for cls, d in sorted(rt.test_class_distribution.items()):
            dist_table.add_row(
                cls,
                str(int(d.get("count", 0))),
                f"{float(d.get('fraction', 0)):.4f}",
            )
        console.print(dist_table)
        rep = rt.report
        console.print(
            f"  Macro F1: {rep['macro_f1']:.4f}  Weighted F1: {rep['weighted_f1']:.4f}  "
            f"Reject rate: {rep['reject_rate']:.4f}"
        )
        if rep.get("seen_user_f1") is not None:
            console.print(f"  Seen-user F1: {rep['seen_user_f1']:.4f}")
        if rep.get("unseen_user_f1") is not None:
            console.print(f"  Unseen-user F1: {rep['unseen_user_f1']:.4f}")

        console.print(
            "\n[bold]Calibration / probability scores (test, raw proba)[/bold]"
        )
        console.print(
            f"  ECE: {rep.get('expected_calibration_error', 0):.4f}  "
            f"Brier: {rep.get('multiclass_brier_score', 0):.4f}  "
            f"Log loss: {rep.get('multiclass_log_loss', 0):.4f}"
        )

        unk = rep.get("unknown_category_rates") or {}
        if unk.get("per_column"):
            console.print(
                "\n[bold]Unknown-category rate (raw value not in encoder vocab)[/bold]"
            )
            for col, rate in sorted(unk["per_column"].items()):
                console.print(f"  {col}: {rate:.4f}")
            if unk.get("overall_rate") is not None:
                console.print(f"  (mean across columns: {unk['overall_rate']:.4f})")

        top_rep = rep.get("top_confusion_pairs") or []
        if top_rep:
            trt = Table(title="Top confusion pairs (test)")
            trt.add_column("True", style="bold")
            trt.add_column("Predicted", style="bold")
            trt.add_column("Count", justify="right")
            for row in top_rep[:15]:
                trt.add_row(
                    str(row.get("true_label", "")),
                    str(row.get("pred_label", "")),
                    str(int(row.get("count", 0))),
                )
            console.print(trt)

        sm = rep.get("slice_metrics") or {}
        if sm:
            console.print("\n[bold]Slice metrics (top groups per column)[/bold]")
            for col, groups in sm.items():
                st = Table(title=f"Slice: {col}")
                st.add_column("Value", style="bold")
                st.add_column("n", justify="right")
                st.add_column("macro F1", justify="right")
                st.add_column("wt F1", justify="right")
                st.add_column("rej", justify="right")
                for gkey, g in list(groups.items())[:12]:
                    st.add_row(
                        gkey[:48] + ("…" if len(gkey) > 48 else ""),
                        str(int(g.get("row_count", 0))),
                        f"{float(g.get('macro_f1', 0)):.4f}",
                        f"{float(g.get('weighted_f1', 0)):.4f}",
                        f"{float(g.get('reject_rate', 0)):.4f}",
                    )
                console.print(st)
    elif result.replay_error:
        console.print("\n[yellow]Test replay skipped or failed[/yellow]")
        console.print(f"  {result.replay_error}")
    elif start is None:
        console.print(
            "\n[dim]Tip: pass --from and --to (and --data-dir) to replay held-out "
            "test metrics and class distribution.[/dim]"
        )

policy_show_cmd(models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles'))

Print the current inference policy or report that none exists.

Source code in src/taskclf/cli/main.py
@policy_app.command("show")
def policy_show_cmd(
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles",
    ),
) -> None:
    """Print the current inference policy or report that none exists."""
    from rich.console import Console

    from taskclf.core.inference_policy import load_inference_policy

    console = Console()
    policy = load_inference_policy(Path(models_dir))

    if policy is None:
        console.print("[yellow]No inference policy found.[/yellow]")
        console.print(f"  Looked in {Path(models_dir) / 'inference_policy.json'}")
        raise typer.Exit(code=0)

    console.print("[bold]Inference Policy[/bold]")
    console.print(f"  Policy version:  {policy.policy_version}")
    console.print(f"  Model dir:       {policy.model_dir}")
    console.print(f"  Schema hash:     {policy.model_schema_hash}")
    console.print(f"  Label set:       {policy.model_label_set}")
    console.print(f"  Reject threshold:{policy.reject_threshold:.4f}")
    if policy.calibrator_store_dir:
        console.print(f"  Calibrator store:{policy.calibrator_store_dir}")
        console.print(f"  Calibration:     {policy.calibration_method}")
    else:
        console.print("  Calibrator:      none (identity)")
    console.print(f"  Source:          {policy.source}")
    console.print(f"  Created at:      {policy.created_at}")
    console.print(f"  Git commit:      {policy.git_commit}")

policy_create_cmd(model_dir=typer.Option(..., '--model-dir', help='Path to a model run directory'), reject_threshold=typer.Option(DEFAULT_REJECT_THRESHOLD, '--reject-threshold', help='Reject threshold for this model+calibration pair'), calibrator_store_dir=typer.Option(None, '--calibrator-store', help='Path to calibrator store directory'), models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles'))

Create an inference policy binding model + calibrator + threshold.

Source code in src/taskclf/cli/main.py
@policy_app.command("create")
def policy_create_cmd(
    model_dir: str = typer.Option(
        ..., "--model-dir", help="Path to a model run directory"
    ),
    reject_threshold: float = typer.Option(
        DEFAULT_REJECT_THRESHOLD,
        "--reject-threshold",
        help="Reject threshold for this model+calibration pair",
    ),
    calibrator_store_dir: str | None = typer.Option(
        None,
        "--calibrator-store",
        help="Path to calibrator store directory",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles",
    ),
) -> None:
    """Create an inference policy binding model + calibrator + threshold."""
    import json

    from rich.console import Console

    from taskclf.core.inference_policy import (
        PolicyValidationError,
        build_inference_policy,
        save_inference_policy,
        validate_policy,
    )

    console = Console()
    models_path = Path(models_dir)
    model_path = Path(model_dir)

    meta_path = model_path / "metadata.json"
    if not meta_path.is_file():
        console.print(f"[red]No metadata.json in {model_path}[/red]")
        raise typer.Exit(code=1)

    meta = json.loads(meta_path.read_text())

    cal_store_rel: str | None = None
    cal_method: str | None = None
    if calibrator_store_dir is not None:
        cal_store_path = Path(calibrator_store_dir)
        cal_store_rel = str(cal_store_path.relative_to(models_path.parent))
        store_meta_path = cal_store_path / "store.json"
        if store_meta_path.is_file():
            cal_method = json.loads(store_meta_path.read_text()).get("method")

    policy = build_inference_policy(
        model_dir=str(model_path.relative_to(models_path.parent)),
        model_schema_hash=meta["schema_hash"],
        model_label_set=meta["label_set"],
        reject_threshold=reject_threshold,
        calibrator_store_dir=cal_store_rel,
        calibration_method=cal_method,
        source="manual",
    )

    try:
        validate_policy(policy, models_path)
    except PolicyValidationError as exc:
        console.print(f"[red]Validation failed: {exc}[/red]")
        raise typer.Exit(code=1) from exc

    policy_path = save_inference_policy(policy, models_path)
    console.print(f"[green]Inference policy written to {policy_path}[/green]")
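The policy stores paths relative to the parent of the models directory (i.e. `TASKCLF_HOME`) via `Path.relative_to`. Note that `relative_to` raises `ValueError` when the argument path is not under the anchor, so this command implicitly requires `--model-dir` and `--calibrator-store` to live beneath the same home tree. A small demonstration (paths illustrative):

```python
from pathlib import PurePosixPath

home = PurePosixPath("/home/u/.taskclf")   # illustrative TASKCLF_HOME
models = home / "models"
model = models / "run_20240101"

# Stored relative to the parent of models/, i.e. the home directory:
print(model.relative_to(models.parent))    # models/run_20240101

# Paths outside that anchor raise ValueError:
try:
    PurePosixPath("/tmp/elsewhere").relative_to(models.parent)
except ValueError:
    print("not under TASKCLF_HOME")
```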

policy_remove_cmd(models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles'))

Remove the inference policy file (falls back to active.json).

Source code in src/taskclf/cli/main.py
@policy_app.command("remove")
def policy_remove_cmd(
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles",
    ),
) -> None:
    """Remove the inference policy file (falls back to active.json)."""
    from taskclf.core.inference_policy import remove_inference_policy

    removed = remove_inference_policy(Path(models_dir))
    if removed:
        typer.echo("Inference policy removed.")
    else:
        typer.echo("No inference policy to remove.")

taxonomy_validate_cmd(config=typer.Option(..., '--config', help='Path to a taxonomy YAML file'))

Validate a user taxonomy YAML file and report errors.

Source code in src/taskclf/cli/main.py
@taxonomy_app.command("validate")
def taxonomy_validate_cmd(
    config: str = typer.Option(..., "--config", help="Path to a taxonomy YAML file"),
) -> None:
    """Validate a user taxonomy YAML file and report errors."""
    from pydantic import ValidationError

    from taskclf.infer.taxonomy import load_taxonomy

    config_path = Path(config)
    if not config_path.exists():
        typer.echo(f"File not found: {config_path}", err=True)
        raise typer.Exit(code=1)

    try:
        tax = load_taxonomy(config_path)
    except (ValidationError, ValueError) as exc:
        typer.echo(f"Validation failed:\n{exc}", err=True)
        raise typer.Exit(code=1) from exc

    typer.echo(f"Taxonomy valid: {len(tax.buckets)} buckets, version={tax.version}")
    for bucket in tax.buckets:
        typer.echo(f"  {bucket.name}: {', '.join(bucket.core_labels)}")

taxonomy_show_cmd(config=typer.Option(..., '--config', help='Path to a taxonomy YAML file'))

Display a taxonomy mapping as a Rich table.

Source code in src/taskclf/cli/main.py
@taxonomy_app.command("show")
def taxonomy_show_cmd(
    config: str = typer.Option(..., "--config", help="Path to a taxonomy YAML file"),
) -> None:
    """Display a taxonomy mapping as a Rich table."""
    from pydantic import ValidationError
    from rich.console import Console
    from rich.table import Table

    from taskclf.infer.taxonomy import load_taxonomy

    console = Console()
    config_path = Path(config)
    if not config_path.exists():
        typer.echo(f"File not found: {config_path}", err=True)
        raise typer.Exit(code=1)

    try:
        tax = load_taxonomy(config_path)
    except (ValidationError, ValueError) as exc:
        typer.echo(f"Validation failed:\n{exc}", err=True)
        raise typer.Exit(code=1) from exc

    table = Table(title=f"Taxonomy Mapping (v{tax.version})")
    table.add_column("Bucket", style="bold")
    table.add_column("Core Labels")
    table.add_column("Color")
    table.add_column("Description")

    for bucket in tax.buckets:
        table.add_row(
            bucket.name,
            ", ".join(bucket.core_labels),
            bucket.color,
            bucket.description,
        )

    console.print(table)

    info = Table(title="Advanced Settings")
    info.add_column("Setting", style="bold")
    info.add_column("Value")
    info.add_row("Aggregation", tax.advanced.probability_aggregation)
    info.add_row("Min confidence", str(tax.advanced.min_confidence_for_mapping))
    info.add_row("Reject label", tax.reject.mixed_label_name)
    if tax.user_id:
        info.add_row("User ID", tax.user_id)
    console.print(info)

taxonomy_init_cmd(out=typer.Option('configs/user_taxonomy.yaml', '--out', help='Output path for the generated taxonomy YAML'))

Generate a default taxonomy YAML (identity mapping: one bucket per core label).

Source code in src/taskclf/cli/main.py
@taxonomy_app.command("init")
def taxonomy_init_cmd(
    out: str = typer.Option(
        "configs/user_taxonomy.yaml",
        "--out",
        help="Output path for the generated taxonomy YAML",
    ),
) -> None:
    """Generate a default taxonomy YAML (identity mapping: one bucket per core label)."""
    from taskclf.infer.taxonomy import default_taxonomy, save_taxonomy

    out_path = Path(out)
    if out_path.exists():
        typer.echo(f"File already exists: {out_path}", err=True)
        raise typer.Exit(code=1)

    config = default_taxonomy()
    save_taxonomy(config, out_path)
    typer.echo(
        f"Default taxonomy written to {out_path} ({len(config.buckets)} buckets)"
    )
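Taken together, the validate/show/init commands above read a specific set of fields: `buckets` entries with `name`, `core_labels`, `color`, and `description`, a `reject.mixed_label_name`, `advanced` settings, a `version`, and an optional `user_id`. A hedged sketch of what such a file might look like — field names come from the commands above, every value is illustrative, and `taskclf taxonomy init` generates the authoritative starting point:

```yaml
version: 1
user_id: alice                     # optional
buckets:
  - name: Focus
    core_labels: [Build, Debug]    # illustrative core-label names
    color: "#4caf50"
    description: Heads-down work
reject:
  mixed_label_name: Mixed/Unknown
advanced:
  probability_aggregation: sum     # illustrative value
  min_confidence_for_mapping: 0.5
```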

infer_batch_cmd(model_dir=typer.Option(None, '--model-dir', help='Path to a model run directory (auto-resolved from models/ if omitted)'), models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles (used for auto-resolution)'), date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'), date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'), synthetic=typer.Option(False, '--synthetic', help='Generate dummy features instead of reading from disk'), data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'), out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for predictions and segments'), smooth_window=typer.Option(DEFAULT_SMOOTH_WINDOW, help='Rolling majority smoothing window size'), reject_threshold=typer.Option(DEFAULT_REJECT_THRESHOLD, '--reject-threshold', help='Max-probability below which prediction is rejected as Mixed/Unknown'), taxonomy_config=typer.Option(None, '--taxonomy', help='Path to a taxonomy YAML file for user-specific label mapping'), calibrator_store_dir=typer.Option(None, '--calibrator-store', help='Path to a calibrator store directory for per-user calibration'))

Run batch inference: predict, smooth, and segmentize.

Source code in src/taskclf/cli/main.py
@infer_app.command("batch")
def infer_batch_cmd(
    model_dir: str | None = typer.Option(
        None,
        "--model-dir",
        help="Path to a model run directory (auto-resolved from models/ if omitted)",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles (used for auto-resolution)",
    ),
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False,
        "--synthetic",
        help="Generate dummy features instead of reading from disk",
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for predictions and segments"
    ),
    smooth_window: int = typer.Option(
        DEFAULT_SMOOTH_WINDOW, help="Rolling majority smoothing window size"
    ),
    reject_threshold: float = typer.Option(
        DEFAULT_REJECT_THRESHOLD,
        "--reject-threshold",
        help="Max-probability below which prediction is rejected as Mixed/Unknown",
    ),
    taxonomy_config: str | None = typer.Option(
        None,
        "--taxonomy",
        help="Path to a taxonomy YAML file for user-specific label mapping",
    ),
    calibrator_store_dir: str | None = typer.Option(
        None,
        "--calibrator-store",
        help="Path to a calibrator store directory for per-user calibration",
    ),
) -> None:
    """Run batch inference: predict, smooth, and segmentize."""
    import pandas as pd

    from taskclf.core.defaults import MIXED_UNKNOWN
    from taskclf.core.metrics import reject_rate
    from taskclf.core.model_io import load_model_bundle
    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.infer.batch import (
        run_batch_inference,
        write_predictions_csv,
        write_segments_json,
    )
    from taskclf.infer.resolve import ModelResolutionError, resolve_model_dir

    try:
        resolved_dir = resolve_model_dir(model_dir, Path(models_dir))
    except ModelResolutionError as exc:
        typer.echo(str(exc), err=True)
        raise typer.Exit(code=1) from exc

    taxonomy = None
    if taxonomy_config is not None:
        from taskclf.infer.taxonomy import load_taxonomy

        taxonomy = load_taxonomy(Path(taxonomy_config))
        typer.echo(f"Loaded taxonomy from {taxonomy_config}")

    cal_store = None
    if calibrator_store_dir is not None:
        from taskclf.infer.calibration import load_calibrator_store

        cal_store = load_calibrator_store(Path(calibrator_store_dir))
        typer.echo(
            f"Loaded calibrator store from {calibrator_store_dir} "
            f"({len(cal_store.user_calibrators)} per-user calibrators)"
        )

    model, metadata, cat_encoders = load_model_bundle(resolved_dir)
    typer.echo(f"Loaded model from {resolved_dir} (schema={metadata.schema_hash})")

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    current = start
    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)
        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    features_df = features_df.sort_values("bucket_start_ts").reset_index(drop=True)
    typer.echo(f"Loaded {len(features_df)} feature rows")

    result = run_batch_inference(
        model,
        features_df,
        cat_encoders=cat_encoders,
        smooth_window=smooth_window,
        reject_threshold=reject_threshold,
        taxonomy=taxonomy,
        calibrator_store=cal_store,
        schema_version=metadata.schema_version,
    )
    rr = reject_rate(result.smoothed_labels, MIXED_UNKNOWN)
    typer.echo(
        f"Predicted {len(result.smoothed_labels)} buckets -> {len(result.segments)} segments "
        f"(reject rate: {rr:.1%})"
    )

    out = Path(out_dir)
    pred_path = write_predictions_csv(
        features_df,
        result.smoothed_labels,
        out / "predictions.csv",
        confidences=result.confidences,
        is_rejected=result.is_rejected,
        mapped_labels=result.mapped_labels,
        core_probs=result.core_probs,
    )
    seg_path = write_segments_json(result.segments, out / "segments.json")

    typer.echo(f"Predictions: {pred_path}")
    typer.echo(f"Segments:    {seg_path}")
    if result.mapped_labels is not None:
        typer.echo(
            "Taxonomy mapping applied: mapped_label column included in predictions"
        )

    # -- Daily report with full data --
    from taskclf.infer.smooth import flap_rate
    from taskclf.report.daily import build_daily_report
    from taskclf.report.export import export_report_json

    app_switch_counts: list[float | int | None] | None = None
    if "app_switch_count_last_5m" in features_df.columns:
        app_switch_counts = list(features_df["app_switch_count_last_5m"].values)

    if result.segments:
        report = build_daily_report(
            result.segments,
            raw_labels=result.raw_labels,
            smoothed_labels=result.smoothed_labels,
            mapped_labels=result.mapped_labels,
            app_switch_counts=app_switch_counts,
        )
        report_path = export_report_json(report, out / f"report_{report.date}.json")
        typer.echo(f"Report:      {report_path}")
        fr_raw = flap_rate(result.raw_labels)
        fr_smooth = flap_rate(result.smoothed_labels)
        typer.echo(f"Flap rate: raw={fr_raw:.4f}  smoothed={fr_smooth:.4f}")
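
The smooth_window option above drives a rolling-majority pass over the per-bucket predictions, and flap rate measures how often adjacent labels disagree. A minimal sketch of both ideas, with hypothetical helper names (the real implementations live in taskclf.infer.smooth):

```python
from collections import Counter

def smooth_labels(labels: list[str], window: int = 5) -> list[str]:
    """Rolling majority vote: each position takes the most common
    label within a centered window of size `window`."""
    half = window // 2
    out: list[str] = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        out.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return out

def flap_rate(labels: list[str]) -> float:
    """Fraction of adjacent label pairs that differ (0.0 = no flapping)."""
    if len(labels) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(labels, labels[1:]))
    return changes / (len(labels) - 1)
```

Smoothing should never increase the flap rate, which is why the command prints both the raw and smoothed values.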

infer_online_cmd(
    model_dir=typer.Option(None, '--model-dir', help='Path to a model run directory (auto-resolved from models/ if omitted)'),
    models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Base directory for model bundles (used for auto-resolution and reload)'),
    poll_seconds=typer.Option(DEFAULT_POLL_SECONDS, '--poll-seconds', help='Seconds between polling iterations'),
    aw_host=typer.Option(DEFAULT_AW_HOST, '--aw-host', help='ActivityWatch server URL'),
    smooth_window=typer.Option(DEFAULT_SMOOTH_WINDOW, '--smooth-window', help='Rolling majority smoothing window size'),
    title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for predictions and segments'),
    reject_threshold=typer.Option(DEFAULT_REJECT_THRESHOLD, '--reject-threshold', help='Max-probability below which prediction is rejected as Mixed/Unknown'),
    taxonomy_config=typer.Option(None, '--taxonomy', help='Path to a taxonomy YAML file for user-specific label mapping'),
    calibrator_config=typer.Option(None, '--calibrator', help='Path to a calibrator JSON file for probability calibration'),
    calibrator_store_dir=typer.Option(None, '--calibrator-store', help='Path to a calibrator store directory for per-user calibration'),
    label_queue=typer.Option(False, '--label-queue/--no-label-queue', help='Auto-enqueue low-confidence buckets for manual labeling'),
    label_confidence=typer.Option(DEFAULT_LABEL_CONFIDENCE_THRESHOLD, '--label-confidence', help='Confidence threshold below which buckets are enqueued for labeling'),
    data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (used for label queue path)'),
)


Run online inference: poll ActivityWatch, predict, smooth, and report.

Source code in src/taskclf/cli/main.py
@infer_app.command("online")
def infer_online_cmd(
    model_dir: str | None = typer.Option(
        None,
        "--model-dir",
        help="Path to a model run directory (auto-resolved from models/ if omitted)",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Base directory for model bundles (used for auto-resolution and reload)",
    ),
    poll_seconds: int = typer.Option(
        DEFAULT_POLL_SECONDS,
        "--poll-seconds",
        help="Seconds between polling iterations",
    ),
    aw_host: str = typer.Option(
        DEFAULT_AW_HOST, "--aw-host", help="ActivityWatch server URL"
    ),
    smooth_window: int = typer.Option(
        DEFAULT_SMOOTH_WINDOW,
        "--smooth-window",
        help="Rolling majority smoothing window size",
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for predictions and segments"
    ),
    reject_threshold: float = typer.Option(
        DEFAULT_REJECT_THRESHOLD,
        "--reject-threshold",
        help="Max-probability below which prediction is rejected as Mixed/Unknown",
    ),
    taxonomy_config: str | None = typer.Option(
        None,
        "--taxonomy",
        help="Path to a taxonomy YAML file for user-specific label mapping",
    ),
    calibrator_config: str | None = typer.Option(
        None,
        "--calibrator",
        help="Path to a calibrator JSON file for probability calibration",
    ),
    calibrator_store_dir: str | None = typer.Option(
        None,
        "--calibrator-store",
        help="Path to a calibrator store directory for per-user calibration",
    ),
    label_queue: bool = typer.Option(
        False,
        "--label-queue/--no-label-queue",
        help="Auto-enqueue low-confidence buckets for manual labeling",
    ),
    label_confidence: float = typer.Option(
        DEFAULT_LABEL_CONFIDENCE_THRESHOLD,
        "--label-confidence",
        help="Confidence threshold below which buckets are enqueued for labeling",
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (used for label queue path)"
    ),
) -> None:
    """Run online inference: poll ActivityWatch, predict, smooth, and report."""
    from taskclf.infer.online import run_online_loop
    from taskclf.infer.resolve import ModelResolutionError, resolve_model_dir

    try:
        resolved_dir = resolve_model_dir(model_dir, Path(models_dir))
    except ModelResolutionError as exc:
        typer.echo(str(exc), err=True)
        raise typer.Exit(code=1) from exc

    queue_path = Path(data_dir) / "labels_v1" / "queue.json" if label_queue else None

    run_online_loop(
        model_dir=resolved_dir,
        models_dir=Path(models_dir),
        aw_host=aw_host,
        poll_seconds=poll_seconds,
        smooth_window=smooth_window,
        title_salt=title_salt,
        out_dir=Path(out_dir),
        reject_threshold=reject_threshold,
        taxonomy_path=Path(taxonomy_config) if taxonomy_config else None,
        calibrator_path=Path(calibrator_config) if calibrator_config else None,
        calibrator_store_path=Path(calibrator_store_dir)
        if calibrator_store_dir
        else None,
        label_queue_path=queue_path,
        label_confidence_threshold=label_confidence,
    )

infer_baseline_cmd(
    date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'),
    date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'),
    synthetic=typer.Option(False, '--synthetic', help='Generate dummy features instead of reading from disk'),
    data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for predictions and segments'),
    smooth_window=typer.Option(DEFAULT_SMOOTH_WINDOW, help='Rolling majority smoothing window size'),
)

Run rule-based baseline inference (no ML model required).

Source code in src/taskclf/cli/main.py
@infer_app.command("baseline")
def infer_baseline_cmd(
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False,
        "--synthetic",
        help="Generate dummy features instead of reading from disk",
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for predictions and segments"
    ),
    smooth_window: int = typer.Option(
        DEFAULT_SMOOTH_WINDOW, help="Rolling majority smoothing window size"
    ),
) -> None:
    """Run rule-based baseline inference (no ML model required)."""
    import pandas as pd

    from taskclf.core.defaults import MIXED_UNKNOWN
    from taskclf.core.store import read_parquet
    from taskclf.features.build import generate_dummy_features
    from taskclf.infer.baseline import run_baseline_inference
    from taskclf.infer.batch import write_predictions_csv, write_segments_json

    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    current = start
    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)
        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    features_df = features_df.sort_values("bucket_start_ts").reset_index(drop=True)
    typer.echo(f"Loaded {len(features_df)} feature rows")

    smoothed_labels, segments = run_baseline_inference(
        features_df,
        smooth_window=smooth_window,
    )

    from taskclf.core.metrics import reject_rate

    rr = reject_rate(smoothed_labels, MIXED_UNKNOWN)
    typer.echo(
        f"Baseline predicted {len(smoothed_labels)} buckets -> "
        f"{len(segments)} segments (reject rate: {rr:.1%})"
    )

    out = Path(out_dir)
    pred_path = write_predictions_csv(
        features_df, smoothed_labels, out / "baseline_predictions.csv"
    )
    seg_path = write_segments_json(segments, out / "baseline_segments.json")

    typer.echo(f"Predictions: {pred_path}")
    typer.echo(f"Segments:    {seg_path}")
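
The baseline needs no trained model because it maps each window to a label with simple rules. A toy sketch of the idea: a lookup table from app name to label, falling back to the reject label on no match. The table entries here are hypothetical; the actual rules live in taskclf.infer.baseline:

```python
# Hypothetical rule table for illustration only.
RULES = {
    "code": "Build",
    "terminal": "Build",
    "slack": "Communicate",
    "chrome": "Browse",
}
MIXED_UNKNOWN = "Mixed/Unknown"

def predict_baseline_row(app_name: str) -> str:
    """Look the lowercased app name up in the rule table; anything
    unmatched is rejected as Mixed/Unknown."""
    return RULES.get(app_name.lower(), MIXED_UNKNOWN)
```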

infer_compare_cmd(
    model_dir=typer.Option(..., '--model-dir', help='Path to a model run directory'),
    date_from=typer.Option(..., '--from', help='Start date (YYYY-MM-DD)'),
    date_to=typer.Option(..., '--to', help='End date (YYYY-MM-DD, inclusive)'),
    synthetic=typer.Option(False, '--synthetic', help='Generate dummy features + labels'),
    data_dir=typer.Option(DEFAULT_DATA_DIR, help='Processed data directory (derived from TASKCLF_HOME)'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for comparison report'),
)

Compare rule baseline vs ML model on labeled data.

Source code in src/taskclf/cli/main.py
@infer_app.command("compare")
def infer_compare_cmd(
    model_dir: str = typer.Option(
        ..., "--model-dir", help="Path to a model run directory"
    ),
    date_from: str = typer.Option(..., "--from", help="Start date (YYYY-MM-DD)"),
    date_to: str = typer.Option(..., "--to", help="End date (YYYY-MM-DD, inclusive)"),
    synthetic: bool = typer.Option(
        False, "--synthetic", help="Generate dummy features + labels"
    ),
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR, help="Processed data directory (derived from TASKCLF_HOME)"
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for comparison report"
    ),
) -> None:
    """Compare rule baseline vs ML model on labeled data."""
    import json

    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.defaults import MIXED_UNKNOWN
    from taskclf.core.metrics import compare_baselines
    from taskclf.core.model_io import load_model_bundle
    from taskclf.core.store import read_parquet
    from taskclf.core.types import LABEL_SET_V1
    from taskclf.features.build import generate_dummy_features
    from taskclf.infer.baseline import predict_baseline
    from taskclf.infer.batch import predict_labels
    from taskclf.labels.projection import project_blocks_to_windows
    from taskclf.labels.store import generate_dummy_labels

    console = Console()
    start = dt.date.fromisoformat(date_from)
    end = dt.date.fromisoformat(date_to)

    all_features: list[pd.DataFrame] = []
    all_labels: list = []
    current = start

    if not synthetic:
        all_labels = _load_real_labels(data_dir, start, end)
        typer.echo(f"Loaded {len(all_labels)} label spans from disk")
        if not all_labels:
            typer.echo(
                "WARNING: no label spans found — labeled dataset will be empty",
                err=True,
            )

    while current <= end:
        if synthetic:
            rows = generate_dummy_features(current, n_rows=60)
            df = pd.DataFrame([r.model_dump() for r in rows])
            labels = generate_dummy_labels(current, n_rows=60)
            all_labels.extend(labels)
        else:
            parquet_path = _feature_parquet_for_date(Path(data_dir), current)
            if parquet_path is None or not parquet_path.exists():
                typer.echo(f"  skipping {current} (no features file)")
                current += dt.timedelta(days=1)
                continue
            df = read_parquet(parquet_path)
        all_features.append(df)
        current += dt.timedelta(days=1)

    if not all_features:
        typer.echo("No feature data found for the given date range.", err=True)
        raise typer.Exit(code=1)

    features_df = pd.concat(all_features, ignore_index=True)
    labeled_df = project_blocks_to_windows(features_df, all_labels)

    if labeled_df.empty:
        typer.echo("No labeled rows — cannot compare.", err=True)
        raise typer.Exit(code=1)

    y_true = list(labeled_df["label"].values)
    typer.echo(f"Comparing on {len(y_true)} labeled windows")

    baseline_preds = predict_baseline(labeled_df)

    from sklearn.preprocessing import LabelEncoder

    model, _meta, cat_encoders = load_model_bundle(Path(model_dir))
    le = LabelEncoder()
    le.fit(sorted(LABEL_SET_V1))
    model_preds = predict_labels(model, labeled_df, le, cat_encoders=cat_encoders)

    label_names = sorted(LABEL_SET_V1)
    results = compare_baselines(
        y_true,
        {"baseline": baseline_preds, "model": model_preds},
        label_names,
        reject_label=MIXED_UNKNOWN,
    )

    summary_table = Table(title="Baseline vs Model Comparison")
    summary_table.add_column("Metric", style="bold")
    summary_table.add_column("Baseline", justify="right")
    summary_table.add_column("Model", justify="right")
    summary_table.add_column("Delta", justify="right")

    bl = results["baseline"]
    ml = results["model"]
    for metric in ("macro_f1", "weighted_f1", "reject_rate"):
        delta = ml[metric] - bl[metric]
        sign = "+" if delta >= 0 else ""
        summary_table.add_row(
            metric,
            f"{bl[metric]:.4f}",
            f"{ml[metric]:.4f}",
            f"{sign}{delta:.4f}",
        )
    console.print(summary_table)

    detail_table = Table(title="Per-Class F1")
    detail_table.add_column("Class", style="bold")
    detail_table.add_column("Baseline F1", justify="right")
    detail_table.add_column("Model F1", justify="right")

    all_labels_list = bl.get("label_names", label_names)
    for cls in all_labels_list:
        bl_f1 = bl["per_class"].get(cls, {}).get("f1", 0.0)
        ml_f1 = ml["per_class"].get(cls, {}).get("f1", 0.0)
        detail_table.add_row(cls, f"{bl_f1:.4f}", f"{ml_f1:.4f}")
    console.print(detail_table)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    report_path = out / "baseline_vs_model.json"
    report_path.write_text(json.dumps(results, indent=2))
    typer.echo(f"Full comparison report: {report_path}")
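
The summary table rows (macro_f1, weighted_f1, reject_rate) follow standard definitions. A self-contained sketch of how such numbers are computed, not the actual compare_baselines implementation:

```python
def macro_f1(y_true: list[str], y_pred: list[str], labels: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro average)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

def reject_rate(y_pred: list[str], reject_label: str = "Mixed/Unknown") -> float:
    """Fraction of predictions that fell back to the reject label."""
    return sum(p == reject_label for p in y_pred) / len(y_pred)
```

The delta column in the summary table is then simply model minus baseline for each metric.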

report_daily_cmd(
    segments_file=typer.Option(..., '--segments-file', help='Path to segments.json'),
    predictions_file=typer.Option(None, '--predictions-file', help='Path to predictions CSV (for flap rates and mapped breakdown)'),
    features_dir=typer.Option(None, '--features-dir', help='Path to features data dir (for context-switching stats)'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for report files'),
    fmt=typer.Option('json', '--format', help='Output format: json, csv, parquet, or all'),
)

Generate a daily report from a segments JSON file.

When --predictions-file is provided, flap rates and mapped-label breakdown are included. When --features-dir is provided, context-switching statistics are computed from app_switch_count_last_5m.

Source code in src/taskclf/cli/main.py
@report_app.command("daily")
def report_daily_cmd(
    segments_file: str = typer.Option(
        ..., "--segments-file", help="Path to segments.json"
    ),
    predictions_file: str | None = typer.Option(
        None,
        "--predictions-file",
        help="Path to predictions CSV (for flap rates and mapped breakdown)",
    ),
    features_dir: str | None = typer.Option(
        None,
        "--features-dir",
        help="Path to features data dir (for context-switching stats)",
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for report files"
    ),
    fmt: str = typer.Option(
        "json", "--format", help="Output format: json, csv, parquet, or all"
    ),
) -> None:
    """Generate a daily report from a segments JSON file.

    When --predictions-file is provided, flap rates and mapped-label
    breakdown are included.  When --features-dir is provided,
    context-switching statistics are computed from app_switch_count_last_5m.
    """
    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.store import read_parquet
    from taskclf.infer.batch import read_segments_json
    from taskclf.report.daily import build_daily_report
    from taskclf.report.export import (
        export_report_csv,
        export_report_json,
        export_report_parquet,
    )

    console = Console()

    seg_path = Path(segments_file)
    if not seg_path.exists():
        typer.echo(f"Segments file not found: {seg_path}", err=True)
        raise typer.Exit(code=1)

    segments = read_segments_json(seg_path)
    typer.echo(f"Loaded {len(segments)} segments")

    raw_labels: list[str] | None = None
    smoothed_labels: list[str] | None = None
    mapped_labels: list[str] | None = None

    if predictions_file is not None:
        pred_path = Path(predictions_file)
        if not pred_path.exists():
            typer.echo(f"Predictions file not found: {pred_path}", err=True)
            raise typer.Exit(code=1)
        pred_df = pd.read_csv(pred_path)
        if "predicted_label" in pred_df.columns:
            smoothed_labels = list(pred_df["predicted_label"].values)
        if "core_label" in pred_df.columns:
            raw_labels = list(pred_df["core_label"].values)
            if smoothed_labels is None:
                smoothed_labels = raw_labels
        if "mapped_label" in pred_df.columns:
            from taskclf.core.defaults import MIXED_UNKNOWN as _MU

            mapped_labels = list(pred_df["mapped_label"].fillna(_MU).values)

    app_switch_counts: list[float | int | None] | None = None
    if features_dir is not None:
        feat_base = Path(features_dir)
        if segments:
            report_date = segments[0].start_ts.date()
            feat_path = _feature_parquet_for_date(feat_base, report_date)
            if feat_path is not None and feat_path.exists():
                feat_df = read_parquet(feat_path)
                if "app_switch_count_last_5m" in feat_df.columns:
                    app_switch_counts = list(feat_df["app_switch_count_last_5m"].values)

    report = build_daily_report(
        segments,
        raw_labels=raw_labels,
        smoothed_labels=smoothed_labels,
        mapped_labels=mapped_labels,
        app_switch_counts=app_switch_counts,
    )

    # -- Display --
    summary = Table(title=f"Daily Report: {report.date}")
    summary.add_column("Metric", style="bold")
    summary.add_column("Value", justify="right")
    summary.add_row("Total minutes", f"{report.total_minutes:.1f}")
    summary.add_row("Segments", str(report.segments_count))
    if report.flap_rate_raw is not None:
        summary.add_row("Flap rate (raw)", f"{report.flap_rate_raw:.4f}")
    if report.flap_rate_smoothed is not None:
        summary.add_row("Flap rate (smoothed)", f"{report.flap_rate_smoothed:.4f}")
    console.print(summary)

    breakdown_table = Table(title="Core Label Breakdown")
    breakdown_table.add_column("Label", style="bold")
    breakdown_table.add_column("Minutes", justify="right")
    for label, minutes in sorted(report.core_breakdown.items()):
        breakdown_table.add_row(label, f"{minutes:.1f}")
    console.print(breakdown_table)

    if report.mapped_breakdown is not None:
        mapped_table = Table(title="Mapped Label Breakdown")
        mapped_table.add_column("Label", style="bold")
        mapped_table.add_column("Minutes", justify="right")
        for label, minutes in sorted(report.mapped_breakdown.items()):
            mapped_table.add_row(label, f"{minutes:.1f}")
        console.print(mapped_table)

    if report.context_switch_stats is not None:
        ctx = report.context_switch_stats
        ctx_table = Table(title="Context Switching")
        ctx_table.add_column("Metric", style="bold")
        ctx_table.add_column("Value", justify="right")
        ctx_table.add_row("Mean switches/bucket", f"{ctx.mean:.1f}")
        ctx_table.add_row("Median switches/bucket", f"{ctx.median:.1f}")
        ctx_table.add_row("Max switches", str(ctx.max_value))
        ctx_table.add_row("Total switches", str(ctx.total_switches))
        ctx_table.add_row("Buckets counted", str(ctx.buckets_counted))
        console.print(ctx_table)

    # -- Export --
    out = Path(out_dir)
    formats = [fmt] if fmt != "all" else ["json", "csv", "parquet"]
    for f in formats:
        base = f"report_{report.date}"
        if f == "json":
            p = export_report_json(report, out / f"{base}.json")
        elif f == "csv":
            p = export_report_csv(report, out / f"{base}.csv")
        elif f == "parquet":
            p = export_report_parquet(report, out / f"{base}.parquet")
        else:
            typer.echo(f"Unknown format: {f}", err=True)
            continue
        typer.echo(f"Report written to {p}")
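
The "Core Label Breakdown" table above sums minutes per label across segments. A minimal sketch with a simplified stand-in Segment class (the real segment model has more fields; build_daily_report in taskclf.report.daily does the actual aggregation):

```python
import datetime as dt
from dataclasses import dataclass

@dataclass
class Segment:
    """Simplified stand-in for the real segment model."""
    label: str
    start_ts: dt.datetime
    end_ts: dt.datetime

def core_breakdown(segments: list[Segment]) -> dict[str, float]:
    """Total minutes spent per label, summed over all segments."""
    out: dict[str, float] = {}
    for seg in segments:
        minutes = (seg.end_ts - seg.start_ts).total_seconds() / 60.0
        out[seg.label] = out.get(seg.label, 0.0) + minutes
    return out
```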

monitor_drift_check_cmd(
    ref_features=typer.Option(..., '--ref-features', help='Path to reference features parquet'),
    cur_features=typer.Option(..., '--cur-features', help='Path to current features parquet'),
    ref_predictions=typer.Option(..., '--ref-predictions', help='Path to reference predictions CSV'),
    cur_predictions=typer.Option(..., '--cur-predictions', help='Path to current predictions CSV'),
    psi_threshold=typer.Option(DEFAULT_PSI_THRESHOLD, '--psi-threshold', help='PSI threshold for feature drift'),
    ks_alpha=typer.Option(DEFAULT_KS_ALPHA, '--ks-alpha', help='KS significance level'),
    reject_increase=typer.Option(DEFAULT_REJECT_RATE_INCREASE_THRESHOLD, '--reject-increase', help='Reject-rate increase threshold'),
    entropy_multiplier=typer.Option(DEFAULT_ENTROPY_SPIKE_MULTIPLIER, '--entropy-multiplier', help='Entropy spike multiplier'),
    class_shift=typer.Option(DEFAULT_CLASS_SHIFT_THRESHOLD, '--class-shift', help='Class distribution shift threshold'),
    auto_label=typer.Option(True, '--auto-label/--no-auto-label', help='Auto-create labeling tasks on drift'),
    auto_label_limit=typer.Option(DEFAULT_DRIFT_AUTO_LABEL_LIMIT, '--auto-label-limit', help='Max buckets to auto-enqueue'),
    queue_path=typer.Option('data/processed/labels_v1/queue.json', '--queue-path', help='Path to labeling queue JSON'),
    out_dir=typer.Option(DEFAULT_OUT_DIR, help='Output directory for drift report'),
)

Run drift detection comparing reference vs current prediction windows.

Source code in src/taskclf/cli/main.py
@monitor_app.command("drift-check")
def monitor_drift_check_cmd(
    ref_features: str = typer.Option(
        ..., "--ref-features", help="Path to reference features parquet"
    ),
    cur_features: str = typer.Option(
        ..., "--cur-features", help="Path to current features parquet"
    ),
    ref_predictions: str = typer.Option(
        ..., "--ref-predictions", help="Path to reference predictions CSV"
    ),
    cur_predictions: str = typer.Option(
        ..., "--cur-predictions", help="Path to current predictions CSV"
    ),
    psi_threshold: float = typer.Option(
        DEFAULT_PSI_THRESHOLD, "--psi-threshold", help="PSI threshold for feature drift"
    ),
    ks_alpha: float = typer.Option(
        DEFAULT_KS_ALPHA, "--ks-alpha", help="KS significance level"
    ),
    reject_increase: float = typer.Option(
        DEFAULT_REJECT_RATE_INCREASE_THRESHOLD,
        "--reject-increase",
        help="Reject-rate increase threshold",
    ),
    entropy_multiplier: float = typer.Option(
        DEFAULT_ENTROPY_SPIKE_MULTIPLIER,
        "--entropy-multiplier",
        help="Entropy spike multiplier",
    ),
    class_shift: float = typer.Option(
        DEFAULT_CLASS_SHIFT_THRESHOLD,
        "--class-shift",
        help="Class distribution shift threshold",
    ),
    auto_label: bool = typer.Option(
        True, "--auto-label/--no-auto-label", help="Auto-create labeling tasks on drift"
    ),
    auto_label_limit: int = typer.Option(
        DEFAULT_DRIFT_AUTO_LABEL_LIMIT,
        "--auto-label-limit",
        help="Max buckets to auto-enqueue",
    ),
    queue_path: str = typer.Option(
        "data/processed/labels_v1/queue.json",
        "--queue-path",
        help="Path to labeling queue JSON",
    ),
    out_dir: str = typer.Option(
        DEFAULT_OUT_DIR, help="Output directory for drift report"
    ),
) -> None:
    """Run drift detection comparing reference vs current prediction windows."""
    import json

    import numpy as np
    import pandas as pd
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.store import read_parquet
    from taskclf.infer.monitor import (
        auto_enqueue_drift_labels,
        run_drift_check,
        write_drift_report,
    )

    console = Console()

    ref_df = read_parquet(Path(ref_features))
    cur_df = read_parquet(Path(cur_features))
    typer.echo(f"Reference: {len(ref_df)} rows  Current: {len(cur_df)} rows")

    ref_pred = pd.read_csv(ref_predictions)
    cur_pred = pd.read_csv(cur_predictions)

    ref_labels = list(ref_pred["predicted_label"].values)
    cur_labels = list(cur_pred["predicted_label"].values)

    ref_probs: np.ndarray | None = None
    cur_probs: np.ndarray | None = None
    if "core_probs" in ref_pred.columns and "core_probs" in cur_pred.columns:
        ref_probs = np.array([json.loads(p) for p in ref_pred["core_probs"]])
        cur_probs = np.array([json.loads(p) for p in cur_pred["core_probs"]])

    cur_confidences: np.ndarray | None = None
    if "confidence" in cur_pred.columns:
        cur_confidences = cur_pred["confidence"].to_numpy(dtype=np.float64)

    report = run_drift_check(
        ref_df,
        cur_df,
        ref_labels,
        cur_labels,
        ref_probs=ref_probs,
        cur_probs=cur_probs,
        cur_confidences=cur_confidences,
        psi_threshold=psi_threshold,
        ks_alpha=ks_alpha,
        reject_increase_threshold=reject_increase,
        entropy_multiplier=entropy_multiplier,
        class_shift_threshold=class_shift,
    )

    if not report.alerts:
        console.print("[green]No drift detected.[/green]")
    else:
        table = Table(title=f"Drift Alerts ({len(report.alerts)})")
        table.add_column("Trigger", style="bold")
        table.add_column("Severity")
        table.add_column("Details")
        table.add_column("Features")

        for alert in report.alerts:
            sev = (
                "[red]CRITICAL[/red]"
                if alert.severity == "critical"
                else "[yellow]WARNING[/yellow]"
            )
            details_str = ", ".join(f"{k}={v}" for k, v in alert.details.items())
            table.add_row(
                alert.trigger.value,
                sev,
                details_str,
                ", ".join(alert.affected_features) or "-",
            )
        console.print(table)

    out = Path(out_dir)
    report_path = write_drift_report(report, out / "drift_report.json")
    typer.echo(f"Drift report: {report_path}")

    if auto_label and report.alerts:
        enqueued = auto_enqueue_drift_labels(
            report,
            cur_df,
            Path(queue_path),
            cur_confidences=cur_confidences,
            limit=auto_label_limit,
        )
        typer.echo(f"Auto-enqueued {enqueued} labeling tasks")

    console.print(f"\n[bold]Summary:[/bold] {report.summary}")
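
The --psi-threshold option gates feature drift on the Population Stability Index. A standard PSI sketch for one numeric feature, with bins taken from the reference sample (the actual detector in taskclf.infer.monitor may bin differently):

```python
import numpy as np

def psi(ref: np.ndarray, cur: np.ndarray, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between reference and current samples.
    A value above ~0.2 is a common rule of thumb for meaningful drift."""
    edges = np.histogram_bin_edges(ref, bins=bins)
    # eps avoids log(0) / division by zero for empty bins.
    ref_frac = np.histogram(ref, bins=edges)[0] / len(ref) + eps
    cur_frac = np.histogram(cur, bins=edges)[0] / len(cur) + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

Note that current values outside the reference range fall out of the histogram entirely, which is acceptable for a sketch but something a production detector would handle explicitly.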

monitor_telemetry_cmd(
    features=typer.Option(..., '--features', help='Path to features parquet'),
    predictions=typer.Option(..., '--predictions', help='Path to predictions CSV'),
    user_id=typer.Option(None, '--user-id', help='Scope to a specific user'),
    store_dir=typer.Option(DEFAULT_TELEMETRY_DIR, '--store-dir', help='Telemetry store directory'),
)

Compute a telemetry snapshot and append to the store.

Source code in src/taskclf/cli/main.py
@monitor_app.command("telemetry")
def monitor_telemetry_cmd(
    features: str = typer.Option(..., "--features", help="Path to features parquet"),
    predictions: str = typer.Option(
        ..., "--predictions", help="Path to predictions CSV"
    ),
    user_id: str | None = typer.Option(
        None, "--user-id", help="Scope to a specific user"
    ),
    store_dir: str = typer.Option(
        DEFAULT_TELEMETRY_DIR, "--store-dir", help="Telemetry store directory"
    ),
) -> None:
    """Compute a telemetry snapshot and append to the store."""
    import json

    import numpy as np
    import pandas as pd

    from taskclf.core.store import read_parquet
    from taskclf.core.telemetry import TelemetryStore, compute_telemetry

    feat_df = read_parquet(Path(features))
    pred_df = pd.read_csv(predictions)

    labels = (
        list(pred_df["predicted_label"].values)
        if "predicted_label" in pred_df.columns
        else None
    )
    confidences: np.ndarray | None = None
    if "confidence" in pred_df.columns:
        confidences = pred_df["confidence"].to_numpy(dtype=np.float64)
    core_probs: np.ndarray | None = None
    if "core_probs" in pred_df.columns:
        core_probs = np.array([json.loads(p) for p in pred_df["core_probs"]])

    snapshot = compute_telemetry(
        feat_df,
        labels=labels,
        confidences=confidences,
        core_probs=core_probs,
        user_id=user_id,
    )

    store = TelemetryStore(store_dir)
    path = store.append(snapshot)
    typer.echo(f"Telemetry snapshot appended to {path}")
    typer.echo(
        f"  Windows: {snapshot.total_windows}  Reject rate: {snapshot.reject_rate:.2%}"
    )
    if snapshot.confidence_stats:
        cs = snapshot.confidence_stats
        typer.echo(
            f"  Confidence: mean={cs.mean:.3f}  median={cs.median:.3f}  p5={cs.p5:.3f}  p95={cs.p95:.3f}"
        )
    typer.echo(f"  Mean entropy: {snapshot.mean_entropy:.4f}")

monitor_show_cmd(store_dir=typer.Option(DEFAULT_TELEMETRY_DIR, '--store-dir', help='Telemetry store directory'), user_id=typer.Option(None, '--user-id', help='Filter to a specific user'), last=typer.Option(10, '--last', help='Number of recent snapshots to show'))

Display recent telemetry snapshots.

Source code in src/taskclf/cli/main.py
@monitor_app.command("show")
def monitor_show_cmd(
    store_dir: str = typer.Option(
        DEFAULT_TELEMETRY_DIR, "--store-dir", help="Telemetry store directory"
    ),
    user_id: str | None = typer.Option(
        None, "--user-id", help="Filter to a specific user"
    ),
    last: int = typer.Option(10, "--last", help="Number of recent snapshots to show"),
) -> None:
    """Display recent telemetry snapshots."""
    from rich.console import Console
    from rich.table import Table

    from taskclf.core.telemetry import TelemetryStore

    console = Console()
    store = TelemetryStore(store_dir)
    snapshots = store.read_recent(last, user_id=user_id)

    if not snapshots:
        console.print("[dim]No telemetry snapshots found.[/dim]")
        return

    table = Table(title=f"Recent Telemetry ({len(snapshots)} snapshots)")
    table.add_column("Timestamp", style="dim")
    table.add_column("User")
    table.add_column("Windows", justify="right")
    table.add_column("Reject Rate", justify="right")
    table.add_column("Mean Conf", justify="right")
    table.add_column("Mean Entropy", justify="right")
    table.add_column("Missing Features", justify="right")

    for snap in snapshots:
        n_missing = sum(1 for v in snap.feature_missingness.values() if v > 0.0)
        table.add_row(
            snap.timestamp.strftime("%Y-%m-%d %H:%M"),
            snap.user_id or "global",
            str(snap.total_windows),
            f"{snap.reject_rate:.2%}",
            f"{snap.confidence_stats.mean:.3f}" if snap.confidence_stats else "-",
            f"{snap.mean_entropy:.4f}",
            str(n_missing),
        )

    console.print(table)
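The Missing Features column counts features with any nonzero missingness in a snapshot's `feature_missingness` mapping. The same reduction in isolation (the feature names and fractions are illustrative):

```python
# feature_missingness maps feature name -> fraction of windows missing it.
feature_missingness = {
    "keys_per_min": 0.0,
    "mouse_dist": 0.12,
    "app_switches": 0.03,
}

# Count features with any missing values, as monitor_show_cmd does per row.
n_missing = sum(1 for v in feature_missingness.values() if v > 0.0)
print(n_missing)  # → 2
```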

migrate_audit_cmd(data_dir=typer.Option(DEFAULT_DATA_DIR, '--data-dir', help='Root directory to scan for datetime artifacts'))

Scan data files for naive (tz-unaware) datetime columns.

Source code in src/taskclf/cli/main.py
@migrate_app.command("audit")
def migrate_audit_cmd(
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR,
        "--data-dir",
        help="Root directory to scan for datetime artifacts",
    ),
) -> None:
    """Scan data files for naive (tz-unaware) datetime columns."""
    from rich.console import Console
    from rich.table import Table

    from taskclf.migrate.timestamps import audit_data_dir

    console = Console()
    dir_path = Path(data_dir)

    if not dir_path.exists():
        typer.echo(f"Data directory not found: {dir_path}")
        raise typer.Exit(code=1)

    results = audit_data_dir(dir_path)

    if not results:
        typer.echo("No parquet or queue.json files found.")
        return

    table = Table(title="Timestamp Audit")
    table.add_column("File", style="cyan")
    table.add_column("Type")
    table.add_column("Column")
    table.add_column("Total", justify="right")
    table.add_column("Naive", justify="right")
    table.add_column("Aware", justify="right")
    table.add_column("Status")

    needs_any = False
    for fa in results:
        if not fa.columns:
            table.add_row(
                str(fa.path),
                fa.file_type,
                "-",
                "-",
                "-",
                "-",
                "[dim]no datetime cols[/dim]",
            )
            continue
        for ca in fa.columns:
            status = (
                "[red]needs migration[/red]"
                if ca.naive_count > 0
                else "[green]ok[/green]"
            )
            if ca.naive_count > 0:
                needs_any = True
            table.add_row(
                str(fa.path),
                fa.file_type,
                ca.column,
                str(ca.total),
                str(ca.naive_count),
                str(ca.aware_count),
                status,
            )

    console.print(table)

    if needs_any:
        typer.echo("\nRun 'taskclf migrate rewrite' to normalize naive timestamps.")
    else:
        typer.echo("\nAll datetime columns are already timezone-aware UTC.")

migrate_rewrite_cmd(data_dir=typer.Option(DEFAULT_DATA_DIR, '--data-dir', help='Root directory to scan and rewrite'), no_backup=typer.Option(False, '--no-backup', help='Skip creating .bak backup files'))

Rewrite naive datetime columns/fields to timezone-aware UTC.

Source code in src/taskclf/cli/main.py
@migrate_app.command("rewrite")
def migrate_rewrite_cmd(
    data_dir: str = typer.Option(
        DEFAULT_DATA_DIR,
        "--data-dir",
        help="Root directory to scan and rewrite",
    ),
    no_backup: bool = typer.Option(
        False,
        "--no-backup",
        help="Skip creating .bak backup files",
    ),
) -> None:
    """Rewrite naive datetime columns/fields to timezone-aware UTC."""
    from taskclf.migrate.timestamps import rewrite_data_dir

    dir_path = Path(data_dir)

    if not dir_path.exists():
        typer.echo(f"Data directory not found: {dir_path}")
        raise typer.Exit(code=1)

    results = rewrite_data_dir(dir_path, backup=not no_backup)

    if not results:
        typer.echo("No parquet or queue.json files found.")
        return

    migrated = 0
    skipped = 0
    for r in results:
        if r.already_migrated:
            skipped += 1
            typer.echo(f"  [skip] {r.path} (already migrated)")
        else:
            migrated += 1
            cols = ", ".join(r.columns_migrated)
            typer.echo(f"  [done] {r.path}{cols} ({r.rows_affected} rows)")
            if r.backup_path:
                typer.echo(f"         backup: {r.backup_path}")

    typer.echo(f"\nMigrated: {migrated}  Skipped: {skipped}")

tray_cmd(model_dir=typer.Option(None, '--model-dir', help='Path to a model run directory (enables label suggestions)'), models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Directory containing model bundles (enables Model submenu for hot-swapping)'), aw_host=typer.Option(DEFAULT_AW_HOST, '--aw-host', help='ActivityWatch server URL'), poll_seconds=typer.Option(DEFAULT_POLL_SECONDS, '--poll-seconds', help='Seconds between polling iterations'), aw_timeout=typer.Option(DEFAULT_AW_TIMEOUT_SECONDS, '--aw-timeout', help='Seconds to wait for ActivityWatch API responses'), title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'), data_dir=typer.Option(None, '--data-dir', help='Processed data directory (default: TASKCLF_HOME/data/processed; ephemeral in --dev)'), transition_minutes=typer.Option(DEFAULT_TRANSITION_MINUTES, '--transition-minutes', help='Minutes a new app must persist before prompting to label'), port=typer.Option(8741, '--port', help='Port for the embedded web UI server'), dev=typer.Option(False, '--dev', help='Start Vite dev server for frontend hot reload'), browser=typer.Option(False, '--browser', help='Open UI in browser instead of native window'), open_browser=typer.Option(True, '--open-browser/--no-open-browser', help='When --browser is set, open the dashboard in the default browser'), no_tray=typer.Option(False, '--no-tray', help='Skip the native tray icon (use with --browser for browser-only mode)'), username=typer.Option(None, '--username', help='Display name (persisted in config.json; does not affect label identity)'), retrain_config=typer.Option(None, '--retrain-config', help='Path to retrain YAML config (enables Retrain Status in Prediction Model submenu)'))

Run a system tray labeling app with activity transition detection.

Left-clicking the tray icon opens the web dashboard where all labeling is done. Right-click shows the full tray menu (Toggle Dashboard, Pause/Resume, Show Status, label import/export, Prediction Model, etc.).

Source code in src/taskclf/cli/main.py
@app.command("tray")
def tray_cmd(
    model_dir: str | None = typer.Option(
        None,
        "--model-dir",
        help="Path to a model run directory (enables label suggestions)",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Directory containing model bundles (enables Model submenu for hot-swapping)",
    ),
    aw_host: str = typer.Option(
        DEFAULT_AW_HOST, "--aw-host", help="ActivityWatch server URL"
    ),
    poll_seconds: int = typer.Option(
        DEFAULT_POLL_SECONDS,
        "--poll-seconds",
        help="Seconds between polling iterations",
    ),
    aw_timeout: int = typer.Option(
        DEFAULT_AW_TIMEOUT_SECONDS,
        "--aw-timeout",
        help="Seconds to wait for ActivityWatch API responses",
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
    data_dir: str | None = typer.Option(
        None,
        "--data-dir",
        help="Processed data directory (default: TASKCLF_HOME/data/processed; ephemeral in --dev)",
    ),
    transition_minutes: int = typer.Option(
        DEFAULT_TRANSITION_MINUTES,
        "--transition-minutes",
        help="Minutes a new app must persist before prompting to label",
    ),
    port: int = typer.Option(
        8741, "--port", help="Port for the embedded web UI server"
    ),
    dev: bool = typer.Option(
        False, "--dev", help="Start Vite dev server for frontend hot reload"
    ),
    browser: bool = typer.Option(
        False, "--browser", help="Open UI in browser instead of native window"
    ),
    open_browser: bool = typer.Option(
        True,
        "--open-browser/--no-open-browser",
        help="When --browser is set, open the dashboard in the default browser",
    ),
    no_tray: bool = typer.Option(
        False,
        "--no-tray",
        help="Skip the native tray icon (use with --browser for browser-only mode)",
    ),
    username: str | None = typer.Option(
        None,
        "--username",
        help="Display name (persisted in config.json; does not affect label identity)",
    ),
    retrain_config: str | None = typer.Option(
        None,
        "--retrain-config",
        help="Path to retrain YAML config (enables Retrain Status in Prediction Model submenu)",
    ),
) -> None:
    """Run a system tray labeling app with activity transition detection.

    Left-clicking the tray icon opens the web dashboard where all
    labeling is done.  Right-click shows the full tray menu (Toggle Dashboard,
    Pause/Resume, Show Status, label import/export, Prediction Model, etc.).
    """
    import shutil
    import tempfile

    from taskclf.ui.tray import run_tray

    if dev:
        logging.getLogger("taskclf").setLevel(logging.DEBUG)

    tmp_dir: str | None = None
    if data_dir is not None:
        resolved_data_dir = data_dir
    elif dev:
        tmp_dir = tempfile.mkdtemp(prefix="taskclf-dev-")
        resolved_data_dir = tmp_dir
        typer.echo(f"Dev mode: using ephemeral data dir → {tmp_dir}")
    else:
        resolved_data_dir = DEFAULT_DATA_DIR

    try:
        run_tray(
            model_dir=Path(model_dir) if model_dir else None,
            models_dir=Path(models_dir),
            aw_host=aw_host,
            poll_seconds=poll_seconds,
            aw_timeout_seconds=aw_timeout,
            title_salt=title_salt,
            data_dir=Path(resolved_data_dir),
            transition_minutes=transition_minutes,
            ui_port=port,
            dev=dev,
            browser=browser,
            open_browser=open_browser,
            no_tray=no_tray,
            username=username,
            retrain_config=Path(retrain_config) if retrain_config else None,
        )
    finally:
        if tmp_dir is not None:
            shutil.rmtree(tmp_dir, ignore_errors=True)
            typer.echo(f"Dev mode: cleaned up ephemeral data dir → {tmp_dir}")

electron_cmd(model_dir=typer.Option(None, '--model-dir', help='Path to a model run directory (enables label suggestions)'), models_dir=typer.Option(DEFAULT_MODELS_DIR, '--models-dir', help='Directory containing model bundles'), aw_host=typer.Option(DEFAULT_AW_HOST, '--aw-host', help='ActivityWatch server URL'), poll_seconds=typer.Option(DEFAULT_POLL_SECONDS, '--poll-seconds', help='Seconds between polling iterations'), title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'), data_dir=typer.Option(None, '--data-dir', help='Processed data directory (default: TASKCLF_HOME/data/processed; ephemeral in --dev)'), transition_minutes=typer.Option(DEFAULT_TRANSITION_MINUTES, '--transition-minutes', help='Minutes a new app must persist before prompting to label'), port=typer.Option(8741, '--port', help='Port for the embedded web UI server'), dev=typer.Option(False, '--dev', help='Enable frontend hot reload inside the Electron shell'), username=typer.Option(None, '--username', help='Display name (persisted in config.json; does not affect label identity)'), retrain_config=typer.Option(None, '--retrain-config', help='Path to retrain YAML config passed through to the tray backend'))

Launch the Electron desktop shell with a native tray and floating dashboard.

Source code in src/taskclf/cli/main.py
@app.command("electron")
def electron_cmd(
    model_dir: str | None = typer.Option(
        None,
        "--model-dir",
        help="Path to a model run directory (enables label suggestions)",
    ),
    models_dir: str = typer.Option(
        DEFAULT_MODELS_DIR,
        "--models-dir",
        help="Directory containing model bundles",
    ),
    aw_host: str = typer.Option(
        DEFAULT_AW_HOST, "--aw-host", help="ActivityWatch server URL"
    ),
    poll_seconds: int = typer.Option(
        DEFAULT_POLL_SECONDS,
        "--poll-seconds",
        help="Seconds between polling iterations",
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
    data_dir: str | None = typer.Option(
        None,
        "--data-dir",
        help="Processed data directory (default: TASKCLF_HOME/data/processed; ephemeral in --dev)",
    ),
    transition_minutes: int = typer.Option(
        DEFAULT_TRANSITION_MINUTES,
        "--transition-minutes",
        help="Minutes a new app must persist before prompting to label",
    ),
    port: int = typer.Option(
        8741, "--port", help="Port for the embedded web UI server"
    ),
    dev: bool = typer.Option(
        False, "--dev", help="Enable frontend hot reload inside the Electron shell"
    ),
    username: str | None = typer.Option(
        None,
        "--username",
        help="Display name (persisted in config.json; does not affect label identity)",
    ),
    retrain_config: str | None = typer.Option(
        None,
        "--retrain-config",
        help="Path to retrain YAML config passed through to the tray backend",
    ),
) -> None:
    """Launch the Electron desktop shell with a native tray and floating dashboard."""
    import shutil
    import subprocess
    import tempfile

    from taskclf.ui.electron_shell import ElectronLaunchConfig, launch_electron_shell

    if dev:
        logging.getLogger("taskclf").setLevel(logging.DEBUG)

    tmp_dir: str | None = None
    if data_dir is not None:
        resolved_data_dir = data_dir
    elif dev:
        tmp_dir = tempfile.mkdtemp(prefix="taskclf-dev-")
        resolved_data_dir = tmp_dir
        typer.echo(f"Dev mode: using ephemeral data dir → {tmp_dir}")
    else:
        resolved_data_dir = DEFAULT_DATA_DIR

    try:
        launch_electron_shell(
            ElectronLaunchConfig(
                model_dir=Path(model_dir) if model_dir else None,
                models_dir=Path(models_dir),
                aw_host=aw_host,
                poll_seconds=poll_seconds,
                title_salt=title_salt,
                data_dir=Path(resolved_data_dir),
                transition_minutes=transition_minutes,
                ui_port=port,
                dev=dev,
                username=username,
                retrain_config=Path(retrain_config) if retrain_config else None,
            )
        )
    except RuntimeError as exc:
        typer.echo(f"Error: {exc}", err=True)
        raise typer.Exit(1) from exc
    except subprocess.CalledProcessError as exc:
        typer.echo(
            f"Electron shell exited with status {exc.returncode}",
            err=True,
        )
        raise typer.Exit(exc.returncode) from exc
    finally:
        if tmp_dir is not None:
            shutil.rmtree(tmp_dir, ignore_errors=True)
            typer.echo(f"Dev mode: cleaned up ephemeral data dir → {tmp_dir}")

ui_serve_cmd(port=typer.Option(8741, '--port', help='Port for the web UI server'), model_dir=typer.Option(None, '--model-dir', help='Path to a model run directory (enables live predictions)'), aw_host=typer.Option(DEFAULT_AW_HOST, '--aw-host', help='ActivityWatch server URL'), poll_seconds=typer.Option(DEFAULT_POLL_SECONDS, '--poll-seconds', help='Seconds between AW polling iterations'), title_salt=typer.Option(DEFAULT_TITLE_SALT, '--title-salt', help='Salt for hashing window titles'), data_dir=typer.Option(None, '--data-dir', help='Processed data directory (default: TASKCLF_HOME/data/processed; ephemeral in --dev)'), transition_minutes=typer.Option(DEFAULT_TRANSITION_MINUTES, '--transition-minutes', help='Minutes a new app must persist before suggesting a label'), browser=typer.Option(False, '--browser', help='Open in browser instead of native window'), dev=typer.Option(False, '--dev', help='Start Vite dev server for frontend hot reload (browser mode also reloads the backend)'))

Launch the labeling UI as a native floating window with live prediction streaming.

Source code in src/taskclf/cli/main.py
@app.command("ui")
def ui_serve_cmd(
    port: int = typer.Option(8741, "--port", help="Port for the web UI server"),
    model_dir: str | None = typer.Option(
        None,
        "--model-dir",
        help="Path to a model run directory (enables live predictions)",
    ),
    aw_host: str = typer.Option(
        DEFAULT_AW_HOST, "--aw-host", help="ActivityWatch server URL"
    ),
    poll_seconds: int = typer.Option(
        DEFAULT_POLL_SECONDS,
        "--poll-seconds",
        help="Seconds between AW polling iterations",
    ),
    title_salt: str = typer.Option(
        DEFAULT_TITLE_SALT, "--title-salt", help="Salt for hashing window titles"
    ),
    data_dir: str | None = typer.Option(
        None,
        "--data-dir",
        help="Processed data directory (default: TASKCLF_HOME/data/processed; ephemeral in --dev)",
    ),
    transition_minutes: int = typer.Option(
        DEFAULT_TRANSITION_MINUTES,
        "--transition-minutes",
        help="Minutes a new app must persist before suggesting a label",
    ),
    browser: bool = typer.Option(
        False, "--browser", help="Open in browser instead of native window"
    ),
    dev: bool = typer.Option(
        False,
        "--dev",
        help="Start Vite dev server for frontend hot reload (browser mode also reloads the backend)",
    ),
) -> None:
    """Launch the labeling UI as a native floating window with live prediction streaming."""
    import os
    import shutil
    import subprocess
    import tempfile
    import threading

    if dev:
        logging.getLogger("taskclf").setLevel(logging.DEBUG)

    tmp_dir: str | None = None
    if data_dir is not None:
        resolved_data_dir = data_dir
    elif dev:
        tmp_dir = tempfile.mkdtemp(prefix="taskclf-dev-")
        resolved_data_dir = tmp_dir
        typer.echo(f"Dev mode: using ephemeral data dir → {tmp_dir}")
    else:
        resolved_data_dir = DEFAULT_DATA_DIR

    _ensure_ui_port_available(port)

    backend_proc = None
    server_thread = None
    win_api = None
    use_reloadable_backend = dev and browser

    if use_reloadable_backend:
        backend_proc = _start_reloadable_ui_backend(
            port=port,
            data_dir=resolved_data_dir,
            aw_host=aw_host,
            poll_seconds=poll_seconds,
            title_salt=title_salt,
            transition_minutes=transition_minutes,
            model_dir=model_dir,
        )
        typer.echo(f"taskclf API on http://127.0.0.1:{port} (backend hot reload)")
        if not _wait_for_local_http(
            f"http://127.0.0.1:{port}",
            proc=backend_proc,
        ):
            if backend_proc.poll() is not None:
                typer.echo("Error: backend dev server exited unexpectedly", err=True)
                raise typer.Exit(1)
            typer.echo(
                "Warning: backend dev server not responding, opening anyway",
                err=True,
            )
    else:
        import uvicorn

        from taskclf.ui.events import EventBus
        from taskclf.ui.runtime import ActivityMonitor, _LabelSuggester
        from taskclf.ui.server import create_app
        from taskclf.ui.window import WindowAPI

        bus = EventBus()
        win_api = WindowAPI()

        suggester: _LabelSuggester | None = None
        if model_dir is not None:
            try:
                suggester = _LabelSuggester(Path(model_dir))
                suggester._aw_host = aw_host
                suggester._title_salt = title_salt
            except Exception as exc:
                typer.echo(f"Warning: could not load model — {exc}", err=True)

        def on_transition(
            prev: str,
            new: str,
            start: dt.datetime,
            end: dt.datetime,
        ) -> None:
            if suggester is not None:
                result = suggester.suggest(start, end)
                if result is not None:
                    label, confidence = result
                    bus.publish_threadsafe(
                        {
                            "type": "suggest_label",
                            "reason": "app_switch",
                            "old_label": prev,
                            "suggested": label,
                            "confidence": confidence,
                            "block_start": start.isoformat(),
                            "block_end": end.isoformat(),
                        }
                    )
                    return
            bus.publish_threadsafe(
                {
                    "type": "prediction",
                    "label": "unknown",
                    "confidence": 0.0,
                    "ts": end.isoformat(),
                    "mapped_label": "unknown",
                    "current_app": new,
                }
            )

        monitor = ActivityMonitor(
            aw_host=aw_host,
            title_salt=title_salt,
            poll_seconds=poll_seconds,
            transition_minutes=transition_minutes,
            on_transition=on_transition,
            event_bus=bus,
        )
        monitor_thread = threading.Thread(target=monitor.run, daemon=True)
        monitor_thread.start()

        fastapi_app = create_app(
            data_dir=Path(resolved_data_dir),
            aw_host=aw_host,
            title_salt=title_salt,
            event_bus=bus,
            window_api=win_api,
            get_activity_provider_status=lambda: monitor.activity_provider_status,
        )

        uvicorn_config = uvicorn.Config(
            fastapi_app,
            host="127.0.0.1",
            port=port,
            log_level="warning",
            ws_ping_interval=30,
            ws_ping_timeout=30,
        )
        server = uvicorn.Server(uvicorn_config)
        server_thread = threading.Thread(target=server.run, daemon=True)
        server_thread.start()
        typer.echo(f"taskclf API on http://127.0.0.1:{port}")

    vite_proc: subprocess.Popen[bytes] | None = None
    ui_port = port

    if dev:
        import taskclf.ui.server as _ui_srv

        frontend_dir = Path(_ui_srv.__file__).resolve().parent / "frontend"
        if not frontend_dir.is_dir():
            typer.echo(
                "Error: frontend source not found (--dev requires a repo checkout)",
                err=True,
            )
            raise typer.Exit(1)

        if not (frontend_dir / "node_modules").is_dir():
            typer.echo("Installing frontend dependencies…")
            subprocess.run(["pnpm", "install"], cwd=frontend_dir, check=True)

        vite_env = {**os.environ, "TASKCLF_PORT": str(port)}
        vite_proc = subprocess.Popen(
            ["pnpm", "run", "dev"],
            cwd=frontend_dir,
            env=vite_env,
        )
        ui_port = 5173
        typer.echo(f"Vite dev server → http://127.0.0.1:{ui_port} (hot reload)")

        if not _wait_for_local_http(f"http://127.0.0.1:{ui_port}", proc=vite_proc):
            if vite_proc.poll() is not None:
                typer.echo("Error: Vite dev server exited unexpectedly", err=True)
                raise typer.Exit(1)
            typer.echo(
                "Warning: Vite dev server not responding, opening anyway", err=True
            )

    try:
        if browser:
            import webbrowser

            webbrowser.open(f"http://127.0.0.1:{ui_port}")
            if backend_proc is not None:
                backend_proc.wait()
            elif server_thread is not None:
                server_thread.join()
        else:
            from taskclf.ui.window_run import window_run

            window_run(port=ui_port, window_api=win_api)
    except KeyboardInterrupt:
        pass
    finally:
        if vite_proc is not None:
            vite_proc.terminate()
            vite_proc.wait(timeout=5)
        if backend_proc is not None:
            backend_proc.terminate()
            backend_proc.wait(timeout=5)
        if tmp_dir is not None:
            shutil.rmtree(tmp_dir, ignore_errors=True)
            typer.echo(f"Dev mode: cleaned up ephemeral data dir → {tmp_dir}")

diagnostics(output_json=typer.Option(False, '--json', help='Output as JSON instead of human-readable text.'), include_logs=typer.Option(False, '--include-logs/--no-include-logs', help='Append the last N log lines to the output.'), log_lines=typer.Option(50, '--log-lines', help='Number of log lines to include (requires --include-logs).'), out=typer.Option(None, '--out', help='Write output to a file instead of stdout.'))

Collect environment info for bug reports.

Source code in src/taskclf/cli/main.py
@app.command()
def diagnostics(
    output_json: bool = typer.Option(
        False,
        "--json",
        help="Output as JSON instead of human-readable text.",
    ),
    include_logs: bool = typer.Option(
        False,
        "--include-logs/--no-include-logs",
        help="Append the last N log lines to the output.",
    ),
    log_lines: int = typer.Option(
        50,
        "--log-lines",
        help="Number of log lines to include (requires --include-logs).",
    ),
    out: Path | None = typer.Option(
        None,
        "--out",
        help="Write output to a file instead of stdout.",
    ),
) -> None:
    """Collect environment info for bug reports."""
    import json

    info = _collect_diagnostics(
        aw_host=DEFAULT_AW_HOST,
        data_dir=DEFAULT_DATA_DIR,
        models_dir=DEFAULT_MODELS_DIR,
        include_logs=include_logs,
        log_lines=log_lines,
    )

    if output_json:
        text = json.dumps(info, indent=2, default=str)
    else:
        text = _format_diagnostics_text(info)

    if out is not None:
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(text, "utf-8")
        typer.echo(f"Diagnostics written to {out}")
    else:
        typer.echo(text)
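The `--json` branch relies on `default=str` so values JSON cannot serialize natively (paths, datetimes) degrade to strings instead of raising `TypeError`. In isolation (the dict contents are illustrative; `PurePosixPath` is used so the rendered path is platform-independent):

```python
import datetime as dt
import json
from pathlib import PurePosixPath

# default=str converts non-JSON-native values to strings on the fly,
# e.g. path and datetime objects in a diagnostics dict.
info = {
    "data_dir": PurePosixPath("/tmp/taskclf/data"),
    "collected_at": dt.datetime(2024, 5, 1, 9, 30, tzinfo=dt.timezone.utc),
    "python": "3.12",
}
text = json.dumps(info, indent=2, default=str)
parsed = json.loads(text)
print(parsed["data_dir"])  # → /tmp/taskclf/data
```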

cli_main()

Entry-point wrapper that catches unhandled crashes.

Source code in src/taskclf/cli/main.py
def cli_main() -> None:
    """Entry-point wrapper that catches unhandled crashes."""
    try:
        app()
    except KeyboardInterrupt:
        raise SystemExit(130)
    except SystemExit:
        raise
    except Exception as exc:
        import sys as _sys

        from taskclf.core.crash import ISSUE_URL, write_crash_report

        try:
            path = write_crash_report(exc)
            print(
                f"taskclf crashed. Details saved to {path}\n"
                f"Please report at {ISSUE_URL}",
                file=_sys.stderr,
            )
        except Exception:
            traceback.print_exc()
        raise SystemExit(1) from exc
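The exception-clause ordering above is load-bearing: `KeyboardInterrupt` and `SystemExit` must be handled before the catch-all `Exception` so only genuinely unhandled errors produce crash reports. A self-contained sketch of the same pattern (`run_with_crash_report` is a hypothetical helper that returns exit codes instead of raising `SystemExit`, and writes its report to a temp file rather than `<TASKCLF_HOME>/logs/`):

```python
import sys
import tempfile
import traceback
from pathlib import Path

def run_with_crash_report(fn) -> int:
    """Minimal sketch of cli_main's crash handling.

    Ctrl-C maps to 130, deliberate exits pass through, and any other
    exception is written to a report file before exiting 1.
    """
    try:
        fn()
        return 0
    except KeyboardInterrupt:
        return 130
    except SystemExit as exc:
        return int(exc.code or 0)
    except Exception:
        report = Path(tempfile.mkdtemp()) / "crash.txt"
        report.write_text(traceback.format_exc(), encoding="utf-8")
        print(f"crashed; details saved to {report}", file=sys.stderr)
        return 1

def boom() -> None:
    raise ValueError("unhandled")

code = run_with_crash_report(boom)
print(code)  # → 1
```

If the crash reporter itself fails, the original traceback is still printed, so no error is ever silently swallowed.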