astrodata.preml package
Subpackages
Submodules
astrodata.preml.pipeline module
- class astrodata.preml.pipeline.PremlPipeline(config_path, processors=None)
Bases:
object
Pipeline for processing data using a configurable sequence of processors.
- Features:
Requires either a config_path or a processors list (not both None).
Merges processors from config and argument, with argument processors taking priority.
Ensures the first processor is a TrainTestSplitter.
- Parameters:
config_path (str, optional) – Path to the pipeline configuration file.
processors (list[PremlProcessor], optional) – List of processor instances.
- run(processeddata: ProcessedData) -> Premldata
Executes the pipeline, applying processors in order and returning the final Premldata.
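The merge and ordering rules listed under Features can be illustrated with a minimal, self-contained sketch. It is not the astrodata implementation: the Processor, TrainTestSplitter and Scaler classes and the merge_processors and run helpers are hypothetical stand-ins. It also assumes that processors are matched by class name and that a missing leading splitter raises an error; the real pipeline may instead handle either case differently.

```python
from typing import Optional


class Processor:
    """Hypothetical stand-in for a PremlProcessor."""

    def process(self, data: dict) -> dict:
        return data


class TrainTestSplitter(Processor):
    """Hypothetical splitter: only marks the data as split."""

    def process(self, data: dict) -> dict:
        return {**data, "split": True}


class Scaler(Processor):
    """Hypothetical processor that records a scaling factor."""

    def __init__(self, factor: float = 1.0):
        self.factor = factor

    def process(self, data: dict) -> dict:
        return {**data, "scale": self.factor}


def merge_processors(from_config: Optional[list], from_args: Optional[list]) -> list:
    """Merge two processor lists; argument processors override config ones.

    Assumption: two processors collide when they share a class name.
    """
    if from_config is None and from_args is None:
        raise ValueError("Provide a config or a processors list.")
    merged = {}
    for proc in (from_config or []) + (from_args or []):
        # Later entries (the argument processors) replace earlier ones
        # while keeping the position of the first occurrence.
        merged[type(proc).__name__] = proc
    ordered = list(merged.values())
    if not ordered or not isinstance(ordered[0], TrainTestSplitter):
        raise ValueError("The first processor must be a TrainTestSplitter.")
    return ordered


def run(processors: list, data: dict) -> dict:
    """Apply each processor in order and return the final result."""
    for proc in processors:
        data = proc.process(data)
    return data


pipeline = merge_processors(
    from_config=[TrainTestSplitter(), Scaler(factor=1.0)],
    from_args=[Scaler(factor=2.0)],
)
result = run(pipeline, {"rows": 3})
```

Here the Scaler passed as an argument replaces the one from the config, so the final result carries the factor 2.0, and the splitter still runs first.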
astrodata.preml.schemas module
- class astrodata.preml.schemas.Premldata(**data)
Bases:
BaseModel
Represents processed data after transformations.
- data
The actual data as a Pandas DataFrame.
- Type:
pd.DataFrame
- metadata
Additional metadata about the processed data.
- Type:
Optional[dict]
- dump_parquet(path)
Dumps the processed data to a Parquet file.
- Parameters:
path (Path) – The file path to save the Parquet file.
- dump_supervised_ML_format()
Returns the data split into training and testing sets.
- Returns:
A tuple containing the training and testing features and targets.
- Return type:
tuple
- metadata: Optional[dict]
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- test_features: DataFrame
- test_targets: DataFrame | Series
- train_features: DataFrame
- train_targets: DataFrame | Series
- val_features: Optional[DataFrame]
- val_targets: Union[DataFrame, Series, None]
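As a rough illustration of how the split fields relate to the tuple returned by dump_supervised_ML_format(), the sketch below uses a hypothetical SplitData dataclass with plain lists in place of DataFrame and Series objects. The tuple order (train features, test features, train targets, test targets) is an assumption; the documentation above does not state it, so check the real return value before unpacking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitData:
    """Hypothetical stand-in for Premldata; lists replace DataFrame/Series."""

    train_features: list
    train_targets: list
    test_features: list
    test_targets: list
    val_features: Optional[list] = None  # validation split is optional
    val_targets: Optional[list] = None

    def dump_supervised_ml_format(self) -> tuple:
        # Assumed order: X_train, X_test, y_train, y_test.
        return (
            self.train_features,
            self.test_features,
            self.train_targets,
            self.test_targets,
        )


split = SplitData(
    train_features=[[1.0], [2.0], [3.0]],
    train_targets=[0, 1, 0],
    test_features=[[4.0]],
    test_targets=[1],
)
X_train, X_test, y_train, y_test = split.dump_supervised_ml_format()
```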
astrodata.preml.utils module
- astrodata.preml.utils.instantiate_processors(config, ignore_unknown=True, defaults=None)
Given a config dict, returns a dict mapping processor names to instances. Validates processor names, catches instantiation errors, and allows for defaults.
- Parameters:
config (dict) – The ‘preml’ section of the configuration.
ignore_unknown (bool) – If True, unknown processors are ignored. If False, a ValueError is raised.
defaults (dict) – Optional default parameters for processors.
- Returns:
Dictionary mapping processor names to their instances.
- Return type:
dict
- Raises:
ValueError – If an unknown processor is found and ignore_unknown is False.
RuntimeError – If processor instantiation fails.
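The behaviour described for instantiate_processors can be sketched with a small registry-based function. This is an illustration under stated assumptions, not the library code: REGISTRY, Scaler, Imputer and instantiate_sketch are hypothetical names, and the config is assumed to map each processor name to a dict of constructor parameters (or None), with defaults keyed the same way.

```python
from typing import Optional


class Scaler:
    """Hypothetical processor."""

    def __init__(self, factor: float = 1.0):
        self.factor = factor


class Imputer:
    """Hypothetical processor."""

    def __init__(self, strategy: str = "mean"):
        self.strategy = strategy


# Hypothetical registry of known processor names.
REGISTRY = {"Scaler": Scaler, "Imputer": Imputer}


def instantiate_sketch(
    config: dict,
    ignore_unknown: bool = True,
    defaults: Optional[dict] = None,
) -> dict:
    """Return a dict mapping processor names to instances."""
    defaults = defaults or {}
    instances = {}
    for name, params in config.items():
        if name not in REGISTRY:
            if ignore_unknown:
                continue  # silently skip unknown names
            raise ValueError(f"Unknown processor: {name}")
        # Parameters from the config override the defaults.
        kwargs = {**defaults.get(name, {}), **(params or {})}
        try:
            instances[name] = REGISTRY[name](**kwargs)
        except Exception as exc:
            raise RuntimeError(f"Could not instantiate {name}") from exc
    return instances


config = {"Scaler": {"factor": 2.0}, "Imputer": None, "Mystery": {}}
processors = instantiate_sketch(
    config, defaults={"Imputer": {"strategy": "median"}}
)
```

With ignore_unknown left at True, the "Mystery" entry is skipped; passing ignore_unknown=False for the same config raises ValueError, and a bad constructor argument surfaces as RuntimeError.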