core.store

Parquet (and, in future, DuckDB) I/O primitives.

`pandas` is imported inside `read_parquet()` and `write_parquet()` rather than at module level, so importing `taskclf.core.store` does not eagerly load the full dataframe stack. Callers that only need other modules avoid that cost until Parquet I/O actually runs.
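The deferred-import pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `load_frame` is hypothetical, and the `TYPE_CHECKING` guard is an assumption about how the type annotations could be kept without a runtime import.

```python
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime.
    import pandas as pd


def load_frame(path: Path) -> "pd.DataFrame":
    """Hypothetical reader that defers the pandas import to call time."""
    # The import runs on the first call; later calls reuse the module
    # already cached in sys.modules, so the repeated cost is small.
    import pandas as pd

    return pd.read_parquet(path, engine="pyarrow")
```

Defining `load_frame` does not import pandas; only calling it does, which is the cost the module-level docstring says is being avoided.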

`taskclf.core.store`

Parquet I/O primitives for persisting DataFrames.

`write_parquet(df, path)`

Write `df` to a Parquet file at `path` atomically.

Writes to a temporary file in the same directory first, then atomically replaces the target via `os.replace`. This prevents readers from ever seeing a partially written file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | DataFrame to persist. | required |
| `path` | `Path` | Destination file path (e.g. `data/processed/.../features.parquet`). | required |

Returns:

| Type | Description |
| --- | --- |
| `Path` | The path that was written, for convenient chaining. |

Source code in src/taskclf/core/store.py

```python
def write_parquet(df: pd.DataFrame, path: Path) -> Path:
    """Write *df* to a parquet file at *path* atomically.

    Writes to a temporary file in the same directory first, then
    atomically replaces the target via :func:`os.replace`.  This
    prevents readers from ever seeing a partially-written file.

    Args:
        df: DataFrame to persist.
        path: Destination file path (e.g. ``data/processed/.../features.parquet``).

    Returns:
        The *path* that was written, for convenient chaining.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".parquet.tmp")
    try:
        os.close(fd)
        df.to_parquet(tmp, engine="pyarrow", index=False)
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(OSError):
            os.unlink(tmp)
        raise
    return path
```
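To make the atomicity guarantee concrete, the sketch below reduces the same write-then-replace pattern to the standard library and simulates a failure partway through a write. The helper `atomic_write` and the two writer callbacks are hypothetical names introduced for illustration only; they are not part of `taskclf.core.store`.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def atomic_write(path: Path, writer: Callable[[str], None]) -> Path:
    """Hypothetical helper: run *writer* on a temp file, then swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        os.close(fd)
        writer(tmp)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file; the existing target is left untouched.
        with contextlib.suppress(OSError):
            os.unlink(tmp)
        raise
    return path


def write_v1(tmp: str) -> None:
    Path(tmp).write_text("v1")


def write_then_fail(tmp: str) -> None:
    Path(tmp).write_text("partial")
    raise RuntimeError("simulated crash mid-write")


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "nested" / "out.txt"
    atomic_write(target, write_v1)
    try:
        atomic_write(target, write_then_fail)
    except RuntimeError:
        pass
    content_after_failure = target.read_text()
    leftover_files = sorted(p.name for p in target.parent.iterdir())
```

After the failed second write, `content_after_failure` is still `"v1"` and `leftover_files` contains only `out.txt`. A reader sees either the old complete file or the new complete file, and no temporary file is left behind.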

`read_parquet(path)`

Read a parquet file into a DataFrame.