features/domain

Privacy-preserving browser domain classification. Maps eTLD+1 domains to semantic categories without storing raw URLs.

taskclf.features.domain

Maps eTLD+1 domains (e.g. "github.com") to semantic categories without storing full URLs, paths, or query strings. Only the domain category string is persisted; the raw domain and URL are never stored.

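Non-browser apps always map to "non_browser", whether or not a domain is supplied. For browser apps, when no domain information is available (e.g. no aw-watcher-web integration), the classifier falls back to "unknown". A browser domain that matches no rule maps to "other".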

See docs/guide/privacy.md §3.4 for the data-handling contract.

classify_domain(domain, *, is_browser=True)

Map a domain string to a privacy-safe category.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| domain | str \| None | An eTLD+1 or subdomain string (e.g. "github.com"). None when domain information is unavailable. | required |
| is_browser | bool | Whether the foreground app is a browser. | True |

Returns:

| Type | Description |
| --- | --- |
| str | One of DOMAIN_CATEGORIES. |

Source code in src/taskclf/features/domain.py
def classify_domain(domain: str | None, *, is_browser: bool = True) -> str:
    """Map a domain string to a privacy-safe category.

    Args:
        domain: An eTLD+1 or subdomain string (e.g. ``"github.com"``).
            ``None`` when domain information is unavailable.
        is_browser: Whether the foreground app is a browser.

    Returns:
        One of :data:`DOMAIN_CATEGORIES`.
    """
    if not is_browser:
        return "non_browser"
    if domain is None:
        return "unknown"
    domain = domain.lower().strip()
    if not domain:
        return "unknown"

    if domain in _DOMAIN_RULES:
        return _DOMAIN_RULES[domain]

    # Try parent domain (e.g. "mail.google.com" -> "google.com")
    parts = domain.split(".")
    if len(parts) > 2:
        parent = ".".join(parts[-2:])
        if parent in _DOMAIN_RULES:
            return _DOMAIN_RULES[parent]

    return "other"