Getting started

We provide example Python and JSON files that can be used to create the following pipelines:

🧠 ML pipelines:
1. MLPipelineSimple: Loads a CSV dataset, concatenates selected features, splits the data into training and testing sets, trains a Support Vector Classifier (SVC) model, tests the model, calculates performance metrics (accuracy, F1 score, precision, and recall), and visualizes the results in bar plots.
2. MLPipelineCrossValidation: An extended version of MLPipelineSimple that adds a data splitting step for Stratified K-Fold Cross-Validation. Then, it trains and tests the model using the cross-validation technique and visualizes the validation and test F1 scores in bar plots.
3. MLPipelineModelSelection: A modified version of MLPipelineSimple that replaces the training step with a model selection step. Rather than using a fixed model, this pipeline trains and cross-validates a Support Vector Classifier (SVC) model with various hyperparameters to find the best-performing configuration.
📊 Statistics pipeline:
- StatsPipeline: Loads a specific feature from a CSV dataset, calculates its mean and standard deviation, and visualizes the feature's values using a line plot and the calculated statistics using a bar plot.
📈 Visualization pipeline:
- VisuPipeline: Loads two numerical features from a CSV dataset and visualizes each feature's values using separate line plots.
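
To make the steps of MLPipelineSimple more concrete, here is a minimal sketch of the same sequence written directly with scikit-learn. It is not the ExeKGLib API and does not reproduce the bundled example files. The column names, the synthetic in-memory CSV, and the helper names `make_demo_csv` and `run_simple_pipeline` are assumptions made for illustration, and the plotting step is omitted.

```python
import csv
import io
import random

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def make_demo_csv(n_rows=100, seed=0):
    """Build a small synthetic CSV in memory (a stand-in for a real dataset file)."""
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["feature_1", "feature_2", "label"])
    for i in range(n_rows):
        label = i % 2
        # Class 1 is shifted by +3 on both features so the two classes are separable.
        writer.writerow([rng.gauss(3 * label, 1), rng.gauss(3 * label, 1), label])
    buffer.seek(0)
    return buffer


def run_simple_pipeline(csv_file, feature_columns, label_column):
    # 1. Load the CSV dataset.
    rows = list(csv.DictReader(csv_file))
    # 2. Concatenate the selected features into one feature matrix.
    X = [[float(row[col]) for col in feature_columns] for row in rows]
    y = [int(row[label_column]) for row in rows]
    # 3. Split the data into training and testing sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    # 4. Train a Support Vector Classifier.
    model = SVC()
    model.fit(X_train, y_train)
    # 5. Test the model on the held-out data.
    y_pred = model.predict(X_test)
    # 6. Calculate the performance metrics.
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    }


metrics = run_simple_pipeline(make_demo_csv(), ["feature_1", "feature_2"], "label")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The returned dictionary holds the four scores that the pipeline would go on to visualize in bar plots.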

💡 Tip: To fetch the examples into your working directory for easy access, run the following command: typer exe_kg_lib.cli.main run get-examples

🗒️ Note: The naming convention for task output names in the .json files (these names are used as inputs for subsequent tasks) is defined by TASK_OUTPUT_NAME_REGEX in exe_kg_lib/utils/string_utils.py.
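
To inspect the convention without opening the file, the constant can be imported and printed. This is a minimal sketch, assuming ExeKGLib is installed in the active environment and that the constant is importable from the module path matching the file above. The helper name `load_output_name_regex` is hypothetical.

```python
import importlib


def load_output_name_regex():
    """Return TASK_OUTPUT_NAME_REGEX, or None if it cannot be loaded."""
    try:
        # Module path assumed from the file location exe_kg_lib/utils/string_utils.py.
        string_utils = importlib.import_module("exe_kg_lib.utils.string_utils")
    except ImportError:
        return None
    return getattr(string_utils, "TASK_OUTPUT_NAME_REGEX", None)


pattern = load_output_name_regex()
if pattern is None:
    print("TASK_OUTPUT_NAME_REGEX could not be loaded in this environment")
else:
    print(pattern)
```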