Data Pipeline
The DataPipeline class in the astrodata.data module orchestrates the loading and processing of data through a sequence of processors. It is designed to standardize and streamline data preparation workflows, making it easy to apply a series of transformations to your data.
Overview
The pipeline consists of two main components:
Loader: Responsible for loading raw data (e.g., from CSV files) into a standardized format.
Processors: A list of processors that sequentially transform the data; a minimal processor sketch follows this list.
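The processor interface is not spelled out in this section, so the following is only a rough sketch. It assumes each processor exposes a single process method that receives and returns the pipeline's data object, with the underlying table available as a pandas DataFrame on a data attribute (as the .head() call in the example below suggests). CustomProcessor and its dropna step are illustrative, not part of astrodata.

class CustomProcessor:
    # Hypothetical processor: the method name (process) and the data
    # wrapper's .data attribute are assumptions, not astrodata's
    # documented interface.
    def process(self, data):
        # Example transformation: drop rows with missing values.
        data.data = data.data.dropna()
        return data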
Example Usage
from astrodata.data import CsvLoader, DataPipeline
# Initialize the loader and the list of processors
loader = CsvLoader()
processors = [CustomProcessor()]  # e.g. the CustomProcessor sketched above
# Create the pipeline, pointing it at a YAML configuration file
pipeline = DataPipeline(
config_path="example_config.yaml",
loader=loader,
processors=processors,
)
# Run the pipeline on your data file
processed_data = pipeline.run("your_data.csv", dump_output=False)
print(processed_data.data.head())
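As the final line suggests, the processed result appears to wrap a pandas DataFrame in its data attribute. The dump_output=False flag presumably keeps the result in memory rather than also writing it to disk, though the exact behavior depends on the DataPipeline implementation.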