astrodata.data package

Submodules

astrodata.data.pipeline module

class astrodata.data.pipeline.DataPipeline(config_path, loader, processors)

Bases: object

A pipeline for processing data using a loader and a series of processors.

loader

The data loader responsible for loading raw data.

Type:: BaseLoader

processors

A list of processors to process the data.

Type:: list[AbstractProcessor]

run(path: str) -> ProcessedData: Executes the pipeline by loading data from the given path, applying the processors sequentially, and converting the result into a ProcessedData object.

run(path, dump_output=True)

Executes the data pipeline.

Parameters:: path (str) – The file path to load the raw data from.
Returns:: The processed data after applying all processors.
Return type:: ProcessedData
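The load, process, convert sequence described for run can be illustrated with a minimal, standard-library-only sketch. Everything in it is a hypothetical stand-in rather than astrodata's actual code: the class names, the load and process method names, and the use of plain lists of dicts in place of DataFrames are all assumptions. The config_path and dump_output arguments are left out because their behavior is not described here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins: none of these names come from astrodata itself.
Rows = list[dict]


@dataclass
class SketchRaw:
    source: str
    data: Rows


@dataclass
class SketchProcessed:
    data: Rows
    metadata: Optional[dict] = None


class SketchLoader:
    """Stand-in loader that returns fixed rows instead of reading a file."""

    def load(self, path: str) -> SketchRaw:
        rows = [{"flux": 1.0}, {"flux": None}, {"flux": 3.0}]
        return SketchRaw(source=path, data=rows)


class DropMissing:
    """Stand-in processor that removes rows containing a missing value."""

    def process(self, raw: SketchRaw) -> SketchRaw:
        kept = [r for r in raw.data if all(v is not None for v in r.values())]
        return SketchRaw(source=raw.source, data=kept)


class ScaleFlux:
    """Stand-in processor that multiplies the 'flux' column by a factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, raw: SketchRaw) -> SketchRaw:
        scaled = [{**r, "flux": r["flux"] * self.factor} for r in raw.data]
        return SketchRaw(source=raw.source, data=scaled)


class SketchPipeline:
    def __init__(self, loader, processors) -> None:
        self.loader = loader
        self.processors = processors

    def run(self, path: str) -> SketchProcessed:
        raw = self.loader.load(path)          # 1. load raw data
        for processor in self.processors:     # 2. apply processors in order
            raw = processor.process(raw)
        # 3. convert the final raw object into the processed container
        return SketchProcessed(data=raw.data, metadata={"source": raw.source})


pipeline = SketchPipeline(SketchLoader(), [DropMissing(), ScaleFlux(2.0)])
result = pipeline.run("catalog.csv")
```

Because the processors run sequentially, their order matters: here ScaleFlux would fail on the missing value if it ran before DropMissing.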

astrodata.data.schemas module

class astrodata.data.schemas.ProcessedData(**data)

Bases: BaseModel

Represents processed data after transformations.

data

The actual data as a Pandas DataFrame.

Type:: pd.DataFrame

metadata

Additional metadata about the processed data.

Type:: Optional[dict]

class Config

Bases: object

arbitrary_types_allowed = True

data: DataFrame

dump_parquet(path)

Dumps the processed data to a Parquet file.

Parameters:: path (Path, optional) – The file path to save the Parquet file. If None, uses ‘processed_data.parquet’.

metadata: Optional[dict]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
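The arbitrary_types_allowed setting is what lets a Pydantic model hold a pandas DataFrame field. The sketch below is a hypothetical model, not the astrodata class: the name ProcessedDataSketch and the None default for metadata are assumptions, and it requires pandas and Pydantic v2 to be installed.

```python
from typing import Optional

import pandas as pd
from pydantic import BaseModel, ConfigDict, ValidationError


class ProcessedDataSketch(BaseModel):
    # Without this setting, Pydantic cannot build a schema for pd.DataFrame.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    data: pd.DataFrame
    metadata: Optional[dict] = None  # default is an assumption of this sketch


frame = pd.DataFrame({"flux": [1.0, 2.0]})
ok = ProcessedDataSketch(data=frame, metadata={"units": "Jy"})

# Arbitrary types are validated with an isinstance check, so a plain list
# is rejected rather than coerced into a DataFrame.
try:
    ProcessedDataSketch(data=[1, 2, 3])
    rejected = False
except ValidationError:
    rejected = True
```

The check is only on the type of the object; Pydantic does not inspect the DataFrame's columns or dtypes.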

class astrodata.data.schemas.RawData(**data)

Bases: BaseModel

Represents raw input data loaded from a source file.

source

The source of the data (e.g., file path or URL).

Type:: Path | str

format

The format of the data (e.g., “fits”, “hdf5”, “csv”, “parquet”).

Type:: Literal['fits', 'hdf5', 'csv', 'parquet']

data

The actual data as a Pandas DataFrame.

Type:: pd.DataFrame

class Config

Bases: object

arbitrary_types_allowed = True

data: DataFrame

format: Literal['fits', 'hdf5', 'csv', 'parquet']

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

source: Path | str

astrodata.data.utils module

astrodata.data.utils.convert_to_processed_data(data)

Convert RawData to ProcessedData using specified feature and target columns.

Return type:: ProcessedData

astrodata.data.utils.extract_format(path)

Return type:: str
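No description is given for extract_format, so the following is only a guess at the kind of behavior its name and return type suggest: deriving one of the RawData format names from a file path. The function name extract_format_sketch, the suffix mapping, and the ValueError for unknown suffixes are all assumptions, not astrodata's implementation.

```python
from pathlib import Path

# Assumed mapping from file suffix to the format names listed for RawData.format.
_SUFFIX_TO_FORMAT = {
    ".fits": "fits",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".csv": "csv",
    ".parquet": "parquet",
}


def extract_format_sketch(path) -> str:
    """Return a format name based on the path's final suffix (hypothetical)."""
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


fmt = extract_format_sketch("data/catalog.FITS")
```

Only the last suffix is inspected, so a compressed file such as catalog.fits.gz would be rejected by this sketch.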
