astrodata.preml.processors package
Submodules
astrodata.preml.processors.MissingImputator module
- class astrodata.preml.processors.MissingImputator.MissingImputator(categorical_columns=None, numerical_columns=None, artifact_path=None)
Bases:
PremlProcessor
Missing-value imputer for handling missing data in datasets.
This class provides functionality to impute missing values in numerical and categorical columns using specified strategies. It supports saving and loading imputation artifacts for reuse.
- process(preml)
Imputes missing values in the dataset.
This method imputes missing values in numerical columns using the mean and in categorical columns using the mode. If an artifact path is provided, it loads the imputation artifact and applies it to the test features. Otherwise, it fits new imputers on the training features, transforms both training and test features, and saves the artifact for reuse.
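The fit-on-train, apply-to-test pattern described above can be sketched in plain Python. This is an illustrative sketch, not astrodata's implementation: the helper names `fit_imputer` and `apply_imputer`, the list-of-dicts row format, and the use of `None` to mark a missing value are all assumptions made for the example.

```python
from statistics import mean, mode


def fit_imputer(rows, numerical_columns, categorical_columns):
    """Learn one fill value per column from the training rows (hypothetical helper)."""
    fill = {}
    for col in numerical_columns:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = mean(observed)  # numerical columns: mean of observed values
    for col in categorical_columns:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = mode(observed)  # categorical columns: most frequent value
    return fill


def apply_imputer(rows, fill):
    """Return new rows with missing values replaced by the learned fill values."""
    return [
        {col: (fill[col] if val is None and col in fill else val)
         for col, val in r.items()}
        for r in rows
    ]


train = [
    {"mag": 10.0, "band": "g"},
    {"mag": None, "band": "g"},
    {"mag": 14.0, "band": None},
    {"mag": 12.0, "band": "r"},
]
test = [{"mag": None, "band": None}]

# Fit on the training rows only, then reuse the same fill values for the test rows.
fill_values = fit_imputer(train, ["mag"], ["band"])
train_imputed = apply_imputer(train, fill_values)
test_imputed = apply_imputer(test, fill_values)
```

Here `fill_values` plays the role of the imputation artifact: it is computed once from the training rows and can be stored and reapplied later without refitting.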
astrodata.preml.processors.Ohe module
- class astrodata.preml.processors.Ohe.OHE(categorical_columns=None, numerical_columns=None, artifact_path=None)
Bases:
PremlProcessor
OneHotEncoder (OHE) processor for encoding categorical features.
This class provides functionality to one-hot encode categorical columns in a dataset, while optionally retaining numerical columns. It supports saving and loading encoding artifacts using pickle for reuse.
- process(preml)
One-hot encodes categorical features in the data.
This method encodes categorical columns in the dataset using one-hot encoding. If an artifact path is provided, it loads the encoding artifact and applies it to the test features. Otherwise, it fits a new encoder on the training features, transforms both training and test features, and saves the artifact for reuse.
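As a rough illustration of one-hot encoding with categories learned from the training data, the following plain-Python sketch may help. It is not the OHE class itself: `fit_encoder`, `apply_encoder`, the `column_category` naming of output columns, and the all-zeros treatment of unseen categories are assumptions chosen for the example.

```python
def fit_encoder(rows, categorical_columns):
    """Collect the sorted categories seen in each column (hypothetical helper)."""
    return {col: sorted({r[col] for r in rows}) for col in categorical_columns}


def apply_encoder(rows, categories):
    """One-hot encode the fitted columns; all other columns pass through unchanged."""
    encoded = []
    for r in rows:
        out = {col: val for col, val in r.items() if col not in categories}
        for col, cats in categories.items():
            for cat in cats:
                # A category not seen during fitting yields zeros in every indicator.
                out[f"{col}_{cat}"] = 1 if r[col] == cat else 0
        encoded.append(out)
    return encoded


train = [{"mag": 10.0, "band": "g"}, {"mag": 12.0, "band": "r"}]
test = [{"mag": 11.0, "band": "r"}, {"mag": 13.0, "band": "i"}]

# The category lists are learned from the training rows and reused for the test rows.
categories = fit_encoder(train, ["band"])
train_encoded = apply_encoder(train, categories)
test_encoded = apply_encoder(test, categories)
```

Because the category list is fixed at fit time, training and test outputs always share the same columns, which is the main reason to persist the encoder as an artifact.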
astrodata.preml.processors.Standardizer module
- class astrodata.preml.processors.Standardizer.Standardizer(numerical_columns=None, artifact_path=None, save_path=None)
Bases:
PremlProcessor
Standardizer for scaling numerical features.
This class provides functionality to standardize numerical columns in a dataset by scaling them to have a mean of 0 and a standard deviation of 1. It supports saving and loading scaling artifacts for reuse.
- process(preml)
Standardizes numerical features in the dataset.
This method scales numerical columns to have a mean of 0 and a standard deviation of 1. If an artifact path is provided, it loads the scaling artifact and applies it to the test features. Otherwise, it fits a new scaler on the training features, transforms both training and test features, and saves the artifact for reuse.
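The scaling step can be sketched as follows. This is a minimal plain-Python illustration rather than the Standardizer's actual code: `fit_scaler` and `apply_scaler` are hypothetical helpers, and the use of the population standard deviation and the fallback to 1.0 for constant columns are assumptions.

```python
from statistics import mean, pstdev


def fit_scaler(rows, numerical_columns):
    """Learn (mean, std) per column from the training rows (hypothetical helper)."""
    stats = {}
    for col in numerical_columns:
        values = [r[col] for r in rows]
        # Assumption: population standard deviation; constant columns fall back to 1.0.
        stats[col] = (mean(values), pstdev(values) or 1.0)
    return stats


def apply_scaler(rows, stats):
    """Return new rows with each fitted column mapped to (value - mean) / std."""
    return [
        {col: ((val - stats[col][0]) / stats[col][1] if col in stats else val)
         for col, val in r.items()}
        for r in rows
    ]


train = [{"mag": 10.0}, {"mag": 12.0}, {"mag": 14.0}]
test = [{"mag": 16.0}]

# Statistics come from the training rows only; test rows reuse them unchanged.
stats = fit_scaler(train, ["mag"])
train_scaled = apply_scaler(train, stats)
test_scaled = apply_scaler(test, stats)
```

Only the training column ends up with mean 0 and standard deviation 1. Test values are scaled with the training statistics, so they generally will not be exactly standardized.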
astrodata.preml.processors.TrainTestSplitter module
- class astrodata.preml.processors.TrainTestSplitter.TrainTestSplitter(**kwargs)
Bases:
PremlProcessor
Processor to convert ProcessedData to Premldata.
This processor splits the input ProcessedData into training, testing, and optionally validation sets according to the configuration provided. It supports specifying target columns, test size, random state, and validation split. The output is a Premldata object containing the split datasets and metadata.
- process(data, **kwargs)
Converts a ProcessedData object to a Premldata object.
This method splits the input ProcessedData into training, testing, and optionally validation sets using scikit-learn’s train_test_split. The configuration determines the target columns, test size, random state, and validation split. The resulting Premldata object contains the split features, targets, and metadata.
- Parameters:
data (ProcessedData) – The input processed data to be split.
- Returns:
The resulting Premldata object containing the split datasets.
- Return type:
Premldata
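The splitting logic can be illustrated with a plain-Python stand-in for scikit-learn's train_test_split. This sketch is not the TrainTestSplitter implementation: `split_rows`, its parameter names, the dict it returns, and the interpretation of `val_size` as a fraction of the rows left after the test split are all assumptions made for the example.

```python
import random


def split_rows(rows, test_size=0.25, val_size=None, random_state=None):
    """Shuffle rows and split them into train/test and optional val lists."""
    shuffled = list(rows)
    # A seeded generator makes the shuffle, and therefore the split, reproducible.
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    if val_size is None:
        return {"train": rest, "test": test}
    # Assumption: val_size is a fraction of the rows remaining after the test split.
    n_val = round(len(rest) * val_size)
    return {"train": rest[n_val:], "val": rest[:n_val], "test": test}


rows = [{"x": i, "y": i % 2} for i in range(20)]
splits = split_rows(rows, test_size=0.2, val_size=0.25, random_state=42)

# Separate the feature column from the target column for the training split.
X_train = [{"x": r["x"]} for r in splits["train"]]
y_train = [r["y"] for r in splits["train"]]
```

With 20 rows, a test size of 0.2, and a validation fraction of 0.25, this yields 12 training, 4 validation, and 4 test rows. Fixing `random_state` reproduces the same partition on every run.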
astrodata.preml.processors.base module
- class astrodata.preml.processors.base.PremlProcessor(artifact_path=None, **kwargs)
Bases:
ABC
An abstract base class for preml processors.
Subclasses must implement the process method to define how the input Premldata is processed.
- process(preml: Premldata) -> Premldata: Abstract method to process the input Premldata and return a new Premldata object.
- load_artifact(path)
Loads an artifact from a specified path.
- Parameters:
path (str) – The path from where the artifact should be loaded.
- abstractmethod process(preml, artifact=None, **kwargs)
Processes the input Premldata and returns a new Premldata object.
- Return type:
Premldata
- save_artifact(artifact)
Saves an artifact to a specified path.
- Parameters:
artifact (Any) – The artifact to be saved, which can be any object.
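To make the abstract-base-class and artifact pattern concrete, here is a small self-contained sketch. It does not reproduce PremlProcessor: `SketchProcessor` and `Offset` are hypothetical classes, the data is a plain dict of lists rather than a Premldata object, `save_artifact` takes an explicit path (unlike the signature documented above), and pickle is assumed as the storage format.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod


class SketchProcessor(ABC):
    """Hypothetical stand-in for an artifact-aware processor base class."""

    def __init__(self, artifact_path=None):
        self.artifact_path = artifact_path

    def save_artifact(self, artifact, path):
        # Assumption: artifacts are pickled; the path is passed explicitly here.
        with open(path, "wb") as fh:
            pickle.dump(artifact, fh)

    def load_artifact(self, path):
        # Only unpickle files from a trusted source.
        with open(path, "rb") as fh:
            return pickle.load(fh)

    @abstractmethod
    def process(self, data):
        """Subclasses return a new, processed copy of the data."""


class Offset(SketchProcessor):
    """Toy subclass: subtracts the training minimum from every value."""

    def __init__(self, artifact_path=None, save_path=None):
        super().__init__(artifact_path)
        self.save_path = save_path

    def process(self, data):
        if self.artifact_path is not None:
            # Reuse a previously fitted artifact instead of refitting.
            offset = self.load_artifact(self.artifact_path)
        else:
            offset = min(data["train"])
            if self.save_path is not None:
                self.save_artifact(offset, self.save_path)
        return {name: [v - offset for v in values] for name, values in data.items()}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "offset.pkl")
    fitted = Offset(save_path=path).process({"train": [3, 5], "test": [4]})
    reused = Offset(artifact_path=path).process({"test": [10]})
```

The first call fits the offset on the training values and saves it. The second call loads the saved offset and applies it to new data without seeing any training values, which mirrors the artifact reuse the processors in this package describe.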