astrodata.data package
Subpackages
Submodules
astrodata.data.pipeline module
- class astrodata.data.pipeline.DataPipeline(config_path, loader, processors)
Bases:
object
A pipeline for processing data using a loader and a series of processors.
- loader
The data loader responsible for loading raw data.
- Type:
- processors
A list of processors to process the data.
- Type:
list[AbstractProcessor]
- run(path: str) -> ProcessedData
Executes the pipeline by loading data from the given path, applying the processors sequentially, and converting the result into a ProcessedData object.
- run(path, dump_output=True)
Executes the data pipeline.
- Parameters:
path (str) – The file path to load the raw data from.
- Returns:
The processed data after applying all processors.
- Return type:
ProcessedData
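The flow described for run (load, apply the processors in order, return the result) can be sketched as below. This is a stand-alone illustration, not the astrodata implementation: SketchPipeline, DropMissing, ScaleColumn, fake_loader, the process method name, and the use of plain lists of dicts in place of DataFrames are all hypothetical stand-ins chosen so the example runs without third-party packages.

```python
from typing import Callable, Protocol

Rows = list[dict]  # stand-in for a DataFrame; the real package uses pandas


class Processor(Protocol):
    """Hypothetical processor interface (the real base is AbstractProcessor)."""

    def process(self, rows: Rows) -> Rows: ...


class DropMissing:
    """Hypothetical processor: drop rows that contain a None value."""

    def process(self, rows: Rows) -> Rows:
        return [r for r in rows if None not in r.values()]


class ScaleColumn:
    """Hypothetical processor: multiply one column by a constant factor."""

    def __init__(self, column: str, factor: float) -> None:
        self.column = column
        self.factor = factor

    def process(self, rows: Rows) -> Rows:
        return [{**r, self.column: r[self.column] * self.factor} for r in rows]


class SketchPipeline:
    """Load raw rows, then apply each processor in the order given."""

    def __init__(self, loader: Callable[[str], Rows], processors: list[Processor]) -> None:
        self.loader = loader
        self.processors = processors

    def run(self, path: str) -> Rows:
        rows = self.loader(path)
        for processor in self.processors:
            rows = processor.process(rows)
        return rows


def fake_loader(path: str) -> Rows:
    # Hypothetical loader: ignores the path and returns fixed rows.
    return [{"flux": 1.0}, {"flux": None}, {"flux": 2.5}]


pipeline = SketchPipeline(fake_loader, [DropMissing(), ScaleColumn("flux", 2.0)])
result = pipeline.run("catalog.csv")
```

Because each processor receives the output of the previous one, the order of the list matters: here the row with a missing value is dropped before scaling, so ScaleColumn never sees a None.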
astrodata.data.schemas module
- class astrodata.data.schemas.ProcessedData(**data)
Bases:
BaseModel
Represents processed data after transformations.
- data
The actual data as a Pandas DataFrame.
- Type:
pd.DataFrame
- metadata
Additional metadata about the processed data.
- Type:
Optional[dict]
- data: DataFrame
- dump_parquet(path)
Dumps the processed data to a Parquet file.
- Parameters:
path (Path, optional) – The file path to save the Parquet file. If None, uses ‘processed_data.parquet’.
- metadata: Optional[dict]
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class astrodata.data.schemas.RawData(**data)
Bases:
BaseModel
Represents raw input data loaded from a source file.
- source
The source of the data (e.g., file path or URL).
- Type:
Path | str
- format
The format of the data (e.g., “fits”, “hdf5”, “csv”, “parquet”).
- Type:
Literal
- data
The actual data as a Pandas DataFrame.
- Type:
pd.DataFrame
- data: DataFrame
- format: Literal['fits', 'hdf5', 'csv', 'parquet']
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- source: Path | str
astrodata.data.utils module
- astrodata.data.utils.convert_to_processed_data(data)
Convert RawData to ProcessedData using specified feature and target columns.
- Return type:
ProcessedData
- astrodata.data.utils.extract_format(path)
- Return type:
str
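extract_format is listed without a description. One plausible reading, given the formats accepted by RawData.format, is that it maps a file extension to a format name. The sketch below illustrates that idea only; extract_format_sketch, the extension table, and the ValueError behavior are assumptions and may not match the actual function.

```python
from pathlib import Path

# Assumed extension-to-format mapping. The format names come from the
# RawData.format field; which extensions map to them is a guess.
_EXTENSION_TO_FORMAT = {
    ".fits": "fits",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".csv": "csv",
    ".parquet": "parquet",
}


def extract_format_sketch(path: str) -> str:
    """Return a format name based on the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return _EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        # Assumed behavior: reject anything outside the known formats.
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
```

Under these assumptions, extract_format_sketch("obs/catalog.FITS") yields "fits", and a path such as "notes.txt" raises ValueError.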