astrodata.preml package

Subpackages

Submodules

astrodata.preml.pipeline module

class astrodata.preml.pipeline.PremlPipeline(config_path, processors=None)

Bases: object

Pipeline for processing data using a configurable sequence of processors.

Features:
  • Requires a config_path, a processors list, or both; they cannot both be None.

  • Merges processors from config and argument, with argument processors taking priority.

  • Ensures the first processor is a TrainTestSplitter.

Parameters:
  • config_path (str, optional) – Path to the pipeline configuration file.

  • processors (list[PremlProcessor], optional) – List of processor instances.
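
The merge-and-validate behaviour listed under Features can be illustrated with a minimal, self-contained sketch. It does not import astrodata: Processor, TrainTestSplitter, Scaler, merge_processors and run below are hypothetical stand-ins, not the library's classes. Two details are assumptions that the text above does not confirm: processors are matched by class name when merging, and a pipeline whose first processor is not a TrainTestSplitter is rejected with a ValueError rather than reordered.

```python
from typing import Dict, List, Optional


class Processor:
    """Hypothetical stand-in for PremlProcessor."""

    def process(self, data: dict) -> dict:
        return data


class TrainTestSplitter(Processor):
    """Stand-in splitter: cuts a list of rows into train and test parts."""

    def __init__(self, test_size: float = 0.25):
        self.test_size = test_size

    def process(self, data: dict) -> dict:
        rows = data["rows"]
        cut = round(len(rows) * (1 - self.test_size))
        return {"train": rows[:cut], "test": rows[cut:]}


class Scaler(Processor):
    """Stand-in processor: multiplies every value by a constant factor."""

    def __init__(self, factor: float = 1.0):
        self.factor = factor

    def process(self, data: dict) -> dict:
        return {key: [v * self.factor for v in values] for key, values in data.items()}


def merge_processors(
    from_config: Optional[Dict[str, Processor]],
    from_argument: Optional[List[Processor]],
) -> List[Processor]:
    """Merge both sources; argument processors replace config ones (assumed: by class name)."""
    if not from_config and not from_argument:
        raise ValueError("Provide a config, a processors list, or both.")
    merged = dict(from_config or {})
    for proc in from_argument or []:
        # Re-assigning an existing key keeps its position, so the config order survives.
        merged[type(proc).__name__] = proc
    ordered = list(merged.values())
    # Assumption: an invalid ordering is an error, not something to fix silently.
    if not isinstance(ordered[0], TrainTestSplitter):
        raise ValueError("The first processor must be a TrainTestSplitter.")
    return ordered


def run(processors: List[Processor], data: dict) -> dict:
    """Apply each processor in order, feeding each output into the next."""
    for proc in processors:
        data = proc.process(data)
    return data


config_processors = {"TrainTestSplitter": TrainTestSplitter(0.25), "Scaler": Scaler(1.0)}
pipeline = merge_processors(config_processors, [Scaler(2.0)])  # argument Scaler wins
result = run(pipeline, {"rows": [1, 2, 3, 4]})
print(result)
```

Keying the merge on a name keeps the override rule easy to reason about: a processor passed as an argument takes the slot of the config entry it replaces, and new ones are appended at the end.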

run(processeddata: ProcessedData) -> Premldata: Executes the pipeline, applying processors in order and returning the final Premldata.

run(processeddata, dump_output=True)

Executes the data pipeline.

Return type:

Premldata

astrodata.preml.schemas module

class astrodata.preml.schemas.Premldata(**data)

Bases: BaseModel

Represents processed data after transformations.

data

The actual data as a Pandas DataFrame.

Type:

pd.DataFrame

metadata

Additional metadata about the processed data.

Type:

Optional[dict]

class Config

Bases: object

arbitrary_types_allowed = True

dump_parquet(path)

Dumps the processed data to a Parquet file.

Parameters:

path (Path) – The file path where the Parquet file is saved.

dump_supervised_ML_format()

Returns the data split into training and testing sets.

Returns:

A tuple containing the training and testing features and targets.

Return type:

tuple

metadata: Optional[dict]
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

test_features: DataFrame
test_targets: DataFrame | Series
train_features: DataFrame
train_targets: DataFrame | Series
val_features: Optional[DataFrame]
val_targets: Union[DataFrame, Series, None]

astrodata.preml.utils module

astrodata.preml.utils.instantiate_processors(config, ignore_unknown=True, defaults=None)

Given a config dict, returns a dict mapping processor names to instances. Validates processor names, catches instantiation errors, and allows for defaults.

Parameters:
  • config (dict) – The ‘preml’ section of the configuration.

  • ignore_unknown (bool) – If True, unknown processors are ignored. If False, a ValueError is raised.

  • defaults (dict) – Optional default parameters for processors.

Returns:

Dictionary mapping processor names to their instances.

Return type:

dict

Raises:
  • ValueError – If an unknown processor is found and ignore_unknown is False.

  • RuntimeError – If processor instantiation fails.
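
The behaviour described above (name validation, defaults, and the two error cases) might look roughly like the following sketch. It is not the library's implementation: REGISTRY, Scaler and instantiate_processors_sketch are hypothetical names, and the shapes of config and defaults (processor name mapped to a dict of keyword arguments) are assumptions, since the documentation does not specify them.

```python
from typing import Any, Dict, Optional


class Scaler:
    """Hypothetical processor used only to make the sketch runnable."""

    def __init__(self, factor: float = 1.0, offset: float = 0.0):
        self.factor = factor
        self.offset = offset


# Hypothetical registry of known processor names.
REGISTRY: Dict[str, type] = {"Scaler": Scaler}


def instantiate_processors_sketch(
    config: Dict[str, Optional[Dict[str, Any]]],
    ignore_unknown: bool = True,
    defaults: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build one instance per known processor named in config."""
    defaults = defaults or {}
    instances: Dict[str, Any] = {}
    for name, params in config.items():
        cls = REGISTRY.get(name)
        if cls is None:
            if ignore_unknown:
                continue  # skip names that are not registered
            raise ValueError(f"Unknown processor: {name!r}")
        # Assumed precedence: explicit config values override the defaults.
        kwargs = {**defaults.get(name, {}), **(params or {})}
        try:
            instances[name] = cls(**kwargs)
        except Exception as exc:
            raise RuntimeError(f"Could not instantiate {name!r}") from exc
    return instances


processors = instantiate_processors_sketch(
    {"Scaler": {"factor": 3.0}, "Mystery": {}},
    defaults={"Scaler": {"factor": 1.0, "offset": 0.5}},
)
print(sorted(processors))
```

Wrapping the constructor call in a RuntimeError keeps the caller's error handling independent of whatever exception a particular processor happens to raise, while `from exc` preserves the original traceback.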

Module contents