astrodata.data package
Subpackages
Submodules
astrodata.data.pipeline module
- class astrodata.data.pipeline.DataPipeline(config_path, loader, processors)
Bases:
object
A pipeline for processing data using a loader and a series of processors.
- config
Path to the configuration file.
- Type:
str
- loader
The data loader responsible for loading raw data.
- Type:
- processors
A list of processors to process the data.
- Type:
list[AbstractProcessor]
- run(path: str) -> ProcessedData: Executes the pipeline by loading data from the given path, applying the processors sequentially, and converting the result into a ProcessedData object.
- run(path, dump_output=True)
Executes the data pipeline.
- Parameters:
path (str) – The file path to load the raw data from, relative to the project path.
dump_output (bool, optional) – Whether to dump the processed data to a Parquet file. Defaults to True.
- Returns:
The processed data after applying all processors.
- Return type:
ProcessedData
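The flow that run describes (load raw data, apply each processor in order, package the result) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: load_rows, drop_missing, scale_flux, and run_pipeline are hypothetical stand-ins, rows are plain dictionaries rather than a DataFrame, the returned dictionary only mimics the data and metadata fields of ProcessedData, and the Parquet dump is omitted.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Processor = Callable[[List[Row]], List[Row]]


def load_rows(path: str) -> List[Row]:
    """Hypothetical loader: returns fixed rows instead of reading `path`."""
    return [{"flux": 1.0}, {"flux": None}, {"flux": 3.0}]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Hypothetical processor: discard rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def scale_flux(rows: List[Row]) -> List[Row]:
    """Hypothetical processor: double every flux value."""
    return [{**r, "flux": r["flux"] * 2} for r in rows]


def run_pipeline(path: str, loader, processors: List[Processor]) -> dict:
    """Load data, apply each processor in list order, and wrap the result."""
    data = loader(path)
    for process in processors:
        # Each processor receives the output of the previous one.
        data = process(data)
    # Stand-in for a ProcessedData object (assumed shape: data plus metadata).
    return {"data": data, "metadata": {"source": path}}


result = run_pipeline("catalog.csv", load_rows, [drop_missing, scale_flux])
print(result["data"])  # [{'flux': 2.0}, {'flux': 6.0}]
```

Because the processors run sequentially, their order matters: here the row with a missing value is dropped before the scaling step would try to multiply it.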
astrodata.data.schemas module
- class astrodata.data.schemas.ProcessedData(**data)
Bases:
BaseModel
Represents processed data after transformations.
- data
The actual data as a Pandas DataFrame.
- Type:
pd.DataFrame
- metadata
Additional metadata about the processed data.
- Type:
Optional[dict]
- data: DataFrame
- dump_parquet(path)
Dumps the processed data to a Parquet file.
- Parameters:
path (Path) – The file path to save the Parquet file.
- metadata: Optional[dict]
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class astrodata.data.schemas.RawData(**data)
Bases:
BaseModel
Represents raw input data loaded from a source file.
- source
The source of the data (e.g., file path or URL).
- Type:
Path | str
- format
The format of the data: one of “fits”, “hdf5”, “csv”, or “parquet”.
- Type:
Literal['fits', 'hdf5', 'csv', 'parquet']
- data
The actual data as a Pandas DataFrame.
- Type:
pd.DataFrame
- data: DataFrame
- format: Literal['fits', 'hdf5', 'csv', 'parquet']
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- source: Path | str
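Because format is annotated as a Literal, a pydantic model rejects any value outside the four listed strings at validation time. The sketch below imitates that check with the standard library only, so it runs without pydantic installed; DataFormat and check_format are hypothetical names for illustration and are not part of astrodata.

```python
from typing import Literal, get_args

# Same set of allowed values as the `format` field of RawData.
DataFormat = Literal["fits", "hdf5", "csv", "parquet"]


def check_format(value: str) -> str:
    """Return `value` if it is an allowed format, otherwise raise ValueError."""
    allowed = get_args(DataFormat)  # tuple of the literal strings
    if value not in allowed:
        raise ValueError(f"format must be one of {allowed}, got {value!r}")
    return value


print(check_format("csv"))
try:
    check_format("json")
except ValueError as err:
    print(err)
```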
astrodata.data.utils module
- astrodata.data.utils.convert_to_processed_data(data)
Convert RawData to ProcessedData using specified feature and target columns.
- Return type:
ProcessedData
- astrodata.data.utils.extract_format(path)
- Return type:
str
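The reference gives no description for extract_format. Given the str return type and the formats that RawData accepts, one plausible reading is that it derives the format name from the file extension. The sketch below illustrates that idea only: guess_format and the suffix mapping are assumptions made for this example, not the library's behavior.

```python
from pathlib import Path

# Assumed mapping from file suffix to format name; the real function may differ.
SUFFIX_TO_FORMAT = {
    ".fits": "fits",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".csv": "csv",
    ".parquet": "parquet",
}


def guess_format(path) -> str:
    """Hypothetical helper: map a path's suffix to a format name."""
    suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
    try:
        return SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix!r}") from None


print(guess_format("data/catalog.FITS"))
print(guess_format("tables/run1.parquet"))
```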