Getting started
We provide example Python and JSON files that can be used to create the following pipelines:
- 🧠 ML pipelines:
  - MLPipelineSimple: Loads a CSV dataset, concatenates selected features, splits the data into training and testing sets, trains a Support Vector Classifier (SVC) model, tests the model, calculates performance metrics (accuracy, F1 score, precision, and recall), and visualizes the results in bar plots.
  - MLPipelineCrossValidation: An extended version of MLPipelineSimple that adds a data-splitting step for Stratified K-Fold Cross-Validation. It then trains and tests the model using cross-validation and visualizes the validation and test F1 scores in bar plots.
  - MLPipelineModelSelection: A modified version of MLPipelineSimple that replaces the training step with a model selection step. Rather than training a single fixed model, this pipeline trains and cross-validates a Support Vector Classifier (SVC) with various hyperparameter settings to optimize performance.
- 📊 Statistics pipeline:
  - StatsPipeline: Loads a specific feature from a CSV dataset, calculates its mean and standard deviation, and visualizes the feature's values in a line plot and the calculated statistics in a bar plot.
- 📈 Visualization pipeline:
  - VisuPipeline: Loads two numerical features from a CSV dataset and visualizes each feature's values in a separate line plot.
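The metrics step of MLPipelineSimple reports accuracy, F1 score, precision, and recall. To make those four numbers concrete, here is a minimal standalone sketch that computes them for binary labels from confusion-matrix counts. It is not ExeKGLib's implementation: the function name binary_metrics is hypothetical, and it assumes label 1 is the positive class and that a ratio with a zero denominator should be reported as 0.0.

```python
from __future__ import annotations

from typing import Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (hypothetical helper).

    Assumes `positive` marks the positive class; a ratio whose denominator
    is zero is reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]) counts 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 2/3, and accuracy is 4/6.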
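MLPipelineCrossValidation splits the data with Stratified K-Fold Cross-Validation, meaning each fold keeps roughly the same class proportions as the full dataset. The sketch below illustrates that property with a simplified, deterministic splitter (no shuffling) that deals the indices of each class round-robin across the k folds. The function stratified_k_fold is hypothetical and may differ in detail from the splitter ExeKGLib actually uses.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Hashable, Iterator, Sequence


def stratified_k_fold(
    labels: Sequence[Hashable], k: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, validation_indices) for k stratified folds.

    Simplified sketch: samples are grouped by class, then dealt round-robin
    to the k folds, so every class is spread evenly (within one sample).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class: dict[Hashable, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: list[list[int]] = [[] for _ in range(k)]
    counter = 0  # keeps dealing across classes so fold sizes stay balanced
    for indices in by_class.values():
        for idx in indices:
            folds[counter % k].append(idx)
            counter += 1

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation
```

With labels [0] * 4 + [1] * 6 and k = 2, each validation fold receives two samples of class 0 and three of class 1, mirroring the 40/60 class ratio of the whole dataset.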
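MLPipelineModelSelection replaces fixed training with a search over SVC hyperparameters, where each candidate is scored by cross-validation. A minimal sketch of that exhaustive grid-search pattern follows. The function grid_search and the toy scoring function in the usage note are hypothetical; in a real pipeline, score would train and cross-validate an SVC with the given settings (for instance returning its mean validation F1), and the hyperparameter grid used by the example files is not specified here.

```python
from __future__ import annotations

from itertools import product
from typing import Any, Callable, Mapping, Sequence


def grid_search(
    param_grid: Mapping[str, Sequence[Any]],
    score: Callable[[dict[str, Any]], float],
) -> tuple[dict[str, Any], float]:
    """Return the hyperparameter combination with the highest score.

    `score` is a stand-in for "train and cross-validate a model with these
    hyperparameters and return a performance metric".
    """
    names = list(param_grid)
    best_params: dict[str, Any] | None = None
    best_score = float("-inf")
    # Try every combination in the Cartesian product of the candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    if best_params is None:
        raise ValueError("param_grid must contain at least one combination")
    return best_params, best_score
```

For example, with the toy grid {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]} and a toy score that peaks at C = 1.0 and slightly favors "rbf", the search returns {"C": 1.0, "kernel": "rbf"}. Exhaustive search is simple and predictable, but its cost grows multiplicatively with each added hyperparameter.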
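StatsPipeline loads one feature from a CSV dataset and computes its mean and standard deviation. The following stdlib-only sketch does the same for CSV text held in memory. The function name feature_stats and the column names in the usage note are hypothetical, and because this section does not say whether the pipeline uses the population or the sample standard deviation, both are returned.

```python
from __future__ import annotations

import csv
import io
import statistics


def feature_stats(csv_text: str, feature: str) -> dict[str, float]:
    """Load one numeric column from CSV text and return its summary statistics."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[feature]) for row in reader]
    return {
        "mean": statistics.fmean(values),
        # Population standard deviation (divides by n).
        "pstdev": statistics.pstdev(values),
        # Sample standard deviation (divides by n - 1); needs at least two values.
        "stdev": statistics.stdev(values),
    }
```

For a column x holding 2, 4, 4, 4, 5, 5, 7, 9, the mean is 5.0, the population standard deviation is 2.0, and the sample standard deviation is √(32/7) ≈ 2.14.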
💡 Tip: To fetch the examples into your working directory for easy access, run:
typer exe_kg_lib.cli.main run get-examples
🗒️ Note: The naming convention for output names (used as inputs for subsequent tasks) in .json files can be found in exe_kg_lib/utils/string_utils.py. Look for TASK_OUTPUT_NAME_REGEX.