astrodata.data package

Subpackages

Submodules

astrodata.data.pipeline module

class astrodata.data.pipeline.DataPipeline(config_path, loader, processors)

Bases: object

A pipeline for processing data using a loader and a series of processors.

loader

The data loader responsible for loading raw data.

Type:

BaseLoader

processors

A list of processors to process the data.

Type:

list[AbstractProcessor]

run(path: str) -> ProcessedData: Executes the pipeline by loading data from the given path, applying the processors sequentially, and converting the result into a ProcessedData object.

run(path, dump_output=True)

Executes the data pipeline.

Parameters:

path (str) – The file path to load the raw data from.

dump_output (bool, optional) – Defaults to True.

Returns:

The processed data after applying all processors.

Return type:

ProcessedData
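The load, process, convert sequence described for run() can be illustrated with a minimal, dependency-free sketch. Everything here is a stand-in: SketchPipeline, SketchProcessedData, and the callable loader and processors are hypothetical names, rows are plain lists of dicts rather than a Pandas DataFrame, and the config_path and dump_output arguments are left out because their behavior is not described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Rows = list[dict]  # stand-in for pd.DataFrame, to keep the sketch dependency-free


@dataclass
class SketchProcessedData:
    """Hypothetical stand-in for ProcessedData."""
    data: Rows
    metadata: Optional[dict] = None


class SketchPipeline:
    """Hypothetical stand-in for DataPipeline (no config_path, no dump_output)."""

    def __init__(self, loader: Callable[[str], Rows],
                 processors: list[Callable[[Rows], Rows]]):
        self.loader = loader
        self.processors = processors

    def run(self, path: str) -> SketchProcessedData:
        rows = self.loader(path)           # 1. load raw data from the path
        for processor in self.processors:  # 2. apply processors in order
            rows = processor(rows)
        # 3. wrap the result in the processed-data container
        return SketchProcessedData(data=rows, metadata={"source": path})


def fake_loader(path: str) -> Rows:
    # Assumption: a real loader would read the file at `path`.
    return [{"flux": 1.0}, {"flux": None}, {"flux": 3.0}]


def drop_missing(rows: Rows) -> Rows:
    return [row for row in rows if row["flux"] is not None]


def double_flux(rows: Rows) -> Rows:
    return [{**row, "flux": row["flux"] * 2} for row in rows]


result = SketchPipeline(fake_loader, [drop_missing, double_flux]).run("catalog.csv")
print(result.data)
```

Because the processors run in list order, drop_missing removes the incomplete row before double_flux sees it; reversing the list would make double_flux fail on the None value.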

astrodata.data.schemas module

class astrodata.data.schemas.ProcessedData(**data)

Bases: BaseModel

Represents processed data after transformations.

data

The actual data as a Pandas DataFrame.

Type:

pd.DataFrame

metadata

Additional metadata about the processed data.

Type:

Optional[dict]

class Config

Bases: object

arbitrary_types_allowed = True

data: DataFrame

dump_parquet(path)

Dumps the processed data to a Parquet file.

Parameters:

path (Path, optional) – The file path to save the Parquet file. If None, uses ‘processed_data.parquet’.
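The default-path rule for dump_parquet can be sketched as a small helper. This is an illustration only: resolve_parquet_path and DEFAULT_PARQUET_NAME are hypothetical names, and the sketch covers just the path fallback, not the Parquet write itself.

```python
from pathlib import Path
from typing import Optional, Union

DEFAULT_PARQUET_NAME = "processed_data.parquet"  # default named in the docstring


def resolve_parquet_path(path: Optional[Union[str, Path]] = None) -> Path:
    """Return the target file path, falling back to the documented default."""
    if path is None:
        return Path(DEFAULT_PARQUET_NAME)
    return Path(path)


print(resolve_parquet_path())
print(resolve_parquet_path("out/run1.parquet"))
```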

metadata: Optional[dict]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class astrodata.data.schemas.RawData(**data)

Bases: BaseModel

Represents raw input data loaded from a source file.

source

The source of the data (e.g., file path or URL).

Type:

Path | str

format

The format of the data (e.g., “fits”, “hdf5”, “csv”, “parquet”).

Type:

Literal['fits', 'hdf5', 'csv', 'parquet']

data

The actual data as a Pandas DataFrame.

Type:

pd.DataFrame

class Config

Bases: object

arbitrary_types_allowed = True

data: DataFrame

format: Literal['fits', 'hdf5', 'csv', 'parquet']

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

source: Path | str
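The Literal annotation on format restricts the field to four values. On the real model Pydantic enforces this during validation; the standard-library sketch below only illustrates the constraint itself. DataFormat and check_format are hypothetical names, not part of astrodata.

```python
from typing import Literal, get_args

# Same set of values as the `format` field annotation above.
DataFormat = Literal["fits", "hdf5", "csv", "parquet"]


def check_format(value: str) -> str:
    """Return `value` if it is one of the allowed formats, else raise ValueError."""
    allowed = get_args(DataFormat)  # ('fits', 'hdf5', 'csv', 'parquet')
    if value not in allowed:
        raise ValueError(f"format must be one of {allowed}, got {value!r}")
    return value


print(check_format("csv"))
```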

astrodata.data.utils module

astrodata.data.utils.convert_to_processed_data(data)

Convert RawData to ProcessedData using specified feature and target columns.

Return type:

ProcessedData

astrodata.data.utils.extract_format(path)

Return type:

str
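The entry for extract_format gives no description beyond its return type. One plausible reading, stated here purely as an assumption, is that it maps a file extension to one of the format names accepted by RawData. The sketch below shows that idea; guess_format and the suffix table (including the ".h5" alias) are hypothetical and may not match the actual implementation.

```python
from pathlib import Path
from typing import Union

# Assumed mapping from file suffix to format name; not taken from astrodata.
_SUFFIX_TO_FORMAT = {
    ".fits": "fits",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".csv": "csv",
    ".parquet": "parquet",
}


def guess_format(path: Union[str, Path]) -> str:
    """Infer a format name from the file suffix (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


print(guess_format("data/catalog.FITS"))
```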

Module contents