Robot Control - ML Homework 1

This project is the first homework of the Machine Learning 2025-2026 course at the MSc in Artificial Intelligence and Robotics, Sapienza University of Rome.

The goal of this project is to learn a function for position control in the joint space of a robotic arm without using the arm's model.

Target Function

The target function to learn is:

f(x, q, dx) = dq

where:

Input:
- Current end-effector (EE) position x
- Current joint angles q
- Desired EE displacement dx
Output:
- Joint angles displacement dq

Problems Overview

Problem	EE position dimension (x)	Joint angles dimension (q)	Input space dimension	Output space dimension
1	ℝ²	ℝ³	7	3
2	ℝ³	ℝ⁴	10	4
3	ℝ³	ℝ⁶	12	6
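
The input and output dimensions in the table follow directly from the target function: the input concatenates x, q and dx, and the output is dq. A minimal sketch of that arithmetic (the helper name io_dimensions is only illustrative and does not come from the project code):

```python
def io_dimensions(ee_dim, n_joints):
    """Return (input_dim, output_dim) for f(x, q, dx) = dq."""
    input_dim = ee_dim + n_joints + ee_dim  # x, q and dx concatenated
    output_dim = n_joints                   # one displacement per joint
    return input_dim, output_dim


# (EE dimension, number of joints) for each problem in the table
PROBLEMS = {1: (2, 3), 2: (3, 4), 3: (3, 6)}

for problem, (ee_dim, n_joints) in PROBLEMS.items():
    print(problem, io_dimensions(ee_dim, n_joints))
```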

Dataset

Dataset files are in CSV format, with a comma (,) as the separator.
Each CSV file contains columns in the following order:
1. End-effector (EE) position (e.g. x, y, z)
2. Joint angles (e.g. q1, q2, q3)
3. EE displacement (e.g. dx, dy, dz)
4. Joint angles displacement (e.g. dq1, dq2, dq3)
Raw datasets are stored in dataset/raw/
Processed datasets are stored in dataset/processed/
Merged raw datasets are stored in dataset/merge/.
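
Given the column order described above, each row can be split into model inputs (x, q, dx) and targets (dq) by position. The sketch below uses only the standard library and an in-memory CSV with made-up values. The presence of a header row and the helper name split_columns are assumptions, not taken from the project code.

```python
import csv
import io


def split_columns(row, ee_dim, n_joints):
    """Split one CSV row into (features, targets) by column position."""
    values = [float(v) for v in row]
    expected = 2 * ee_dim + 2 * n_joints
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    x = values[:ee_dim]
    q = values[ee_dim:ee_dim + n_joints]
    dx = values[ee_dim + n_joints:2 * ee_dim + n_joints]
    dq = values[2 * ee_dim + n_joints:]
    return x + q + dx, dq


# Made-up sample for problem 1 (2D end-effector, 3 joints)
SAMPLE = (
    "x,y,q1,q2,q3,dx,dy,dq1,dq2,dq3\n"
    "0.1,0.2,0.0,0.5,-0.5,0.01,-0.02,0.03,0.0,-0.01\n"
)

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # assumption: the first line holds column names
features, targets = zip(*(split_columns(row, 2, 3) for row in reader))
print(len(features[0]), len(targets[0]))
```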

File naming convention:

All dataset files follow this naming format: <processing_type><dataset_type>_<id_dataset>.csv

where

processing_type: clean, addnoise, augnoise, merge or None
robot: reacher3, reacher4 or reacher6
dataset_type: train or test or val
id_dataset: integer identifier (datasets acquired from my scripts start at 1000)

Project Structure

marr_gz/
│
├── dataset/ # Dataset files
│ └── raw/ # Raw CSV files
│ └── processed/ # Cleaned and/or augmented datasets
│ └── merge/ # Merged dataset from multiple sources
│
├── plots/ # Containing plot images (PNG)
│ └── utils/ # Utility files for plot data and information
│
├── ml_models/ # Scripts for machine learning models
│ └── saved_models/ # Trained models 
│   └── lr/ # Linear Regression models
│   └── NN/ # Neural Network models
│     └── <robot>/ # Models trained for a specific robot
│       └── <dataset_name>/ # Models trained on a specific dataset
│         └── model_config_id_<id>.keras # Best trained model (with cross-validation)
│         └── best_params.json # Best hyperparameters (NN only; LR & SVM embed in model name)
│         └── best_history.json # Training history (NN only)
│         └── scaler_X.joblib # Saved StandardScaler for input features, used for future transformations and inverse transformations
│         └── scaler_Y.joblib # Saved StandardScaler for output targets, used for future transformations and inverse transformations
│         └── saved_config/ # All saved configurations
│           └── config_<id>/ # Individual hyperparameter configuration
│             └── model.keras
│             └── history.json
│             └── results.json
│           └── all_results.json #  All configuration results
│   └── svm/ # Support Vector Regressor models
│   └── mvpr/ # Multivariate Polynomial Regression models
│   └── rf/ # Random Forest Regressor models
│ └── data_processor.py # Data preprocessing script
│ └── linear_regression.py # Linear Regression model script
│ └── mvpr.py # Multivariate Polynomial Regression model script
│ └── svm.py # Support Vector Regressor script
│ └── random_forest.py # Random Forest Regressor model script
│ └── NN.py # Neural Network model script
│ └── tuning.py # Hyperparameter tuning scripts (LR & SVR & MVPR & RFR)
│ └── NN_utils.py # Utility functions for Neural Networks
│ └── nested_main.py # Script that performs validation by splitting the test set using multiple random seeds
│ └── perc_train.py # Train models with different training set sizes (subsets) using hyperparameters optimized on the full dataset
│ └── noise_train.py # Train models on a small dataset augmented with Gaussian noise, using hyperparameters optimized on the full dataset
│ └── plot_utils.py # Script for generating plots
│ └── my_test.py # Script for testing trained models
│
└── marr_gz/ # Robot code, configurations, and simulations
  └── data_collector.py # Script to acquire new data
  └── robot_controller.py # Script to evaluate the learned model and control the robot toward a desired end-effector position
  └── data_config.yaml # Configuration file for data collection

Installation

Clone the repository:

  git clone https://github.com/yiz-bit/marr_gz
  cd marr_gz

Install dependencies:

  # Using the setup script
  bash setup.sh

  # Or manually using requirements.txt
  pip install -r requirements.txt

How to Use (Ready-to-Use Models)

1. Global Model Parameters

Use my_test.py to load a pretrained model and evaluate its performance. Modify the global configuration section at the top of the file:

# ================= Global Model Parameters =================

ROBOT = "reacher3"          # Options: "reacher3", "reacher4", "reacher6"
FOLDER = "raw"              # Options: "raw" or "processed" or "merge"
ID_DATASET_TRAIN = 12       # Train dataset ID
                            # Also indicates the dataset on which the model to be tested was trained
ID_DATASET_VAL = 12         # Validation dataset: None or an integer ID
ID_DATASET_TEST = 12        # Test dataset ID
                            # Check which datasets are available in the 'dataset/' folder
PROCESSING_TYPE = None      # Options: None, "clean", "addnoise", "augnoise"

MODEL_LIST = ["svm"]        # Options: "lr", "mvpr", "svm", "rf", "NN"
                            # To evaluate a single model, provide only one model name
                            # To evaluate multiple models, provide a list of model names

NORMALIZATION = True        # Enable data normalization with StandardScaler
                            # Automatically saves the scaler in joblib format

PLOT = False                # Enable a comparative bar plot for R2 and RMSE scores
                            # Useful and recommended when evaluating multiple models

# ---------------- Warning ----------------
# There are other global parameters in this file that can be modified,
# but changing them is NOT recommended. Doing so may break the configuration
# and workspace setup illustrated previously.

Dataset paths and model directories are automatically generated based on these parameters.

Changing these values is enough to switch robot, dataset, or preprocessing type.
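
For readers unfamiliar with the NORMALIZATION option: StandardScaler standardizes each column to zero mean and unit variance, and the saved scaler is what later allows predictions to be mapped back to the original units. The sketch below reproduces that arithmetic for a single column using only the standard library. It illustrates the idea and is not the project's code; the function names and sample values are made up.

```python
import statistics


def fit_scaler(column):
    """Return (mean, std) for one column, using the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # A constant column would give std == 0; fall back to 1 to avoid dividing by zero
    return mean, (std if std > 0 else 1.0)


def transform(column, mean, std):
    return [(v - mean) / std for v in column]


def inverse_transform(scaled, mean, std):
    return [z * std + mean for z in scaled]


dq1 = [0.1, 0.2, 0.4, 0.5]           # made-up joint displacements
mean, std = fit_scaler(dq1)
scaled = transform(dq1, mean, std)    # what the model is trained on
restored = inverse_transform(scaled, mean, std)  # back to original units
print(scaled, restored)
```

The same mean and std must be reused at evaluation time, which is why the scalers are stored next to the model.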

2. Load a Pretrained Model and Evaluate on Test Dataset

Once the global parameters are set, running my_test.py automatically loads the pretrained model and evaluates it. No manual dataset loading or path editing is required.

After running a model, you will see terminal output similar to the following:

Results for ROBOT: reacher3, MODEL: NN

         Train  Validation      Test
R2    0.779257    0.743557  0.702483
MAE   0.019598    0.020142  0.024341
RMSE  0.027233    0.028334  0.032877

How to Use (Using the Models from Scratch)

1. Global Model Parameters

All models (svm.py, NN.py, linear_regression.py, mvpr.py, random_forest.py) use a global configuration section at the top of the file:

# ================= Global Model Parameters =================

ROBOT = "reacher3"          # Options: "reacher3", "reacher4", "reacher6"
FOLDER = "raw"              # Options: "raw" or "processed" or "merge"
ID_DATASET_TRAIN = 12       # Train dataset ID
                            # Also indicates the dataset on which the model to be tested was trained
ID_DATASET_VAL = 12         # Validation dataset: None or an integer ID
ID_DATASET_TEST = 12        # Test dataset ID
                            # Check which datasets are available in the 'dataset/' folder
PROCESSING_TYPE = None      # Options: None, "clean", "addnoise", "augnoise"

# Hyperparameter tuning flags
MORE_TUNE = True            # Enable coarse-to-fine tuning (only for SVM & LR & MVPR)
TUNE = False                # Enable single-level cross-validation (only for SVM & LR & RF & MVPR)
                            # For NN models, single-level tuning is automatically applied

SEARCH_TYPE = "random"      # Type of hyperparameters search used during cross-validation
                            # Options: "random" or "grid"
                            # For NN models, this parameter is referred to as TUNING

NORMALIZATION = True        # Enable data normalization with StandardScaler
                            # Automatically saves the scaler in joblib format

NESTED = False              # Use this when no independent validation set is available
                            # It automatically uses different SEEDS to create a validation set from the test set for cross-validation
                            # If the validation IDs (ID_DATASET_VAL) are None, it is recommended to enable this
                            # If a validation set is already provided, keep this disabled

SEEDS = [6, 20, 42, 64, 306] # List of random seeds to use for the nested process

# ---------------- Warning ----------------
# There are other global parameters in this file that can be modified,
# but changing them is NOT recommended. Doing so may break the configuration
# and workspace setup illustrated previously.

Dataset paths and model directories are automatically generated based on these parameters.

Changing these values is enough to switch robot, dataset, or preprocessing type.

After training and cross-validation, the trained model is saved in the saved_models/ folder. Inside this folder:

Each robot has its own subfolder.
Inside the robot folder, there is a subfolder named after the training CSV file.

This structure makes it easy to remember which training data were used to train each model.

After training and evaluation, the following metrics are reported:

R² (Train and Test) - Coefficient of determination, shows how well the model fits the data. It is the main metric of the whole process
RMSE (Test) - Root Mean Squared Error, emphasizes larger errors
MAE (Test) - Mean Absolute Error, measures the average magnitude of errors

Example output:

TRAIN RESULTS:
R2 Train: 0.92

TEST RESULTS:
RMSE Test: 0.13
MAE Test: 0.10
R2 Test: 0.89
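
The three metrics listed above can be computed as in the following sketch. It handles a single output column in plain Python with made-up values. How the project aggregates the metrics across the several dq outputs is not stated here, so that part is left out.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, MAE and RMSE for one output column."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,                 # 1 is a perfect fit
        "MAE": sum(abs(r) for r in residuals) / n,   # average error size
        "RMSE": math.sqrt(ss_res / n),               # penalizes large errors
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```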

Trained models and scalers are automatically saved in:

saved_models/<model_type>/<ROBOT>/<dataset_name>/...

You can use these saved models for evaluation or prediction with the my_test.py script.
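
The directory layout above can be reproduced with pathlib, which may help when locating a saved model by hand. This is a sketch: the helper name model_dir and the example file name are hypothetical, and it assumes the dataset folder uses the CSV file name without its extension.

```python
from pathlib import Path


def model_dir(model_type, robot, train_csv, root="saved_models"):
    """Build saved_models/<model_type>/<ROBOT>/<dataset_name>."""
    dataset_name = Path(train_csv).stem  # assumption: extension is dropped
    return Path(root) / model_type / robot / dataset_name


# "example_train_file.csv" is a placeholder, not a real dataset name
path = model_dir("svm", "reacher3", "example_train_file.csv")
print(path.as_posix())
```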

2. Hyperparameters

All models (svm.py, NN.py, linear_regression.py, mvpr.py, random_forest.py) use a hyperparameter grid defined inside the code. Unlike previous cases, there is no global hyperparameter dictionary. You can modify these values directly in the code, for example:

param_grid = {
    'kernel': ['rbf'],
    'gamma': ['scale', 'auto'],
    'C': [0.001, 0.01, 0.1, 1.0, 10.0],
    'epsilon': [0.01]
}
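
To get a feel for how large a grid is before launching a search, the combinations can be enumerated with the standard library. The sketch below expands the example grid into explicit candidates, as a grid search would, and then draws a random subset, as a random search would. The helper name expand_grid and the subset size are illustrative choices.

```python
import itertools
import random

param_grid = {
    'kernel': ['rbf'],
    'gamma': ['scale', 'auto'],
    'C': [0.001, 0.01, 0.1, 1.0, 10.0],
    'epsilon': [0.01],
}


def expand_grid(grid):
    """Return every combination of values as a list of dicts."""
    keys = list(grid)
    value_lists = (grid[key] for key in keys)
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


candidates = expand_grid(param_grid)   # grid search: try all of them
rng = random.Random(42)
random_subset = rng.sample(candidates, k=4)  # random search: try only a few
print(len(candidates), len(random_subset))
```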

By default, the code only handles the hyperparameters already present in each model's grid. To add a new one:

1. Add the parameter name to the grid inside the model file.
2. Update the fixed_params_map dictionary in tuning.py:

fixed_params_map = {
    "SVR": ["kernel", "gamma", "C", "epsilon"],
    "SGDRegressor": ["alpha", "max_iter", "tol", "penalty", "learning_rate", "eta0", "random_state"],
    "RandomForestRegressor": ["n_estimators", "max_depth", "min_samples_split", "min_samples_leaf", "max_features", "bootstrap"]
}

Add the new parameter name here as well to ensure it is included in the tuning process.
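
A quick consistency check can catch a parameter that was added to a grid but not to fixed_params_map. The sketch below is a hypothetical helper, not part of tuning.py; how tuning.py actually consumes the map is not shown in this README. It uses degree, a real SVR parameter, as the example of a newly added name.

```python
fixed_params_map = {
    "SVR": ["kernel", "gamma", "C", "epsilon"],
    "SGDRegressor": ["alpha", "max_iter", "tol", "penalty",
                     "learning_rate", "eta0", "random_state"],
    "RandomForestRegressor": ["n_estimators", "max_depth", "min_samples_split",
                              "min_samples_leaf", "max_features", "bootstrap"],
}


def missing_from_map(estimator_name, grid, params_map):
    """Return grid keys that are not listed for the given estimator."""
    allowed = set(params_map.get(estimator_name, []))
    return sorted(set(grid) - allowed)


# A grid extended with 'degree' but without updating the map
new_grid = {"kernel": ["rbf", "poly"], "C": [1.0], "degree": [2, 3]}
print(missing_from_map("SVR", new_grid, fixed_params_map))  # ['degree']
```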

3. Cross-Validation Settings

The type of cross-validation can be controlled via the following global parameters:

MORE_TUNE = True            # Enable coarse-to-fine tuning (only for SVM & LR & MVPR)
TUNE = False                # Enable single-level cross-validation (only for SVM & LR & RF & MVPR)
                            # For NN models, single-level tuning is automatically applied

SEARCH_TYPE = "random"      # Type of hyperparameters search used during cross-validation
                            # Options: "random" or "grid"
                            # For NN models, this parameter is referred to as TUNING

NESTED = False              # Use this when no independent validation set is available
                            # It automatically uses different SEEDS to create a validation set from the test set for cross-validation
                            # If the validation IDs (ID_DATASET_VAL) are None, it is recommended to enable this
                            # If a validation set is already provided, keep this disabled

SEEDS = [6, 20, 42, 64, 306] # List of random seeds to use for the nested process

Notes:

The first three parameters (MORE_TUNE, TUNE, SEARCH_TYPE) determine the type of cross-validation: coarse-to-fine or single-level, using either random search or grid search.
NESTED and SEEDS are used when there is no independent validation set. In this case, the code automatically splits the original test set into validation and test sets, repeating the process for each seed to get a robust estimate.
If a validation set exists, NESTED is automatically set to False.
If no validation set exists and NESTED is not enabled, a single validation set is generated from the test set using a fixed random seed (42).

This procedure is necessary because the collected data are derived from a series of configurations in which each configuration displaces the robot multiple times with random offsets. As a result, the data are correlated, so classical cross-validation (splitting the train set into train and validation randomly) is not appropriate. Nested or controlled splitting ensures that validation data remain independent enough to provide a reliable estimate of model performance.
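
The seeded splitting described above can be sketched as follows. For each seed, the test indices are shuffled and divided into a validation part and a remaining test part. The 50/50 split fraction and the function name are assumptions made for illustration; the actual proportions used by nested_main.py are not stated in this README.

```python
import random

SEEDS = [6, 20, 42, 64, 306]


def split_val_test(n_samples, seed, val_fraction=0.5):
    """Split test-set indices into (validation, test) for one seed."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible for a given seed
    n_val = int(n_samples * val_fraction)
    return sorted(indices[:n_val]), sorted(indices[n_val:])


# One (validation, test) pair per seed, here for a 10-sample test set
splits = {seed: split_val_test(10, seed) for seed in SEEDS}
for seed, (val_idx, test_idx) in splits.items():
    print(seed, val_idx, test_idx)
```

Scores obtained on each split can then be averaged across seeds, so the estimate depends less on any single partition.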

Model Evaluation

You can perform model evaluation as follows:

Open a terminal and run:

cd marr_gz
source ~/ros2_ws/install/setup.bash

Select the robot of interest:

reacher3 -> problem1
reacher4 -> problem2
reacher6 -> problem3

Launch the simulation:

python3 marr_launch.py ../config/my_problem1_config.yaml

To evaluate the model (currently only NN is fully supported), open another terminal:

cd marr_gz
source ~/ros2_ws/install/setup.bash
python3 robot_controller.py

To change the target points for testing, edit the main() function in robot_controller.py:

if ROBOT == "reacher3":
    test_target = [0.00929, -0.00136]
else:
    test_target = [0.011, -0.080, -0.095]
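
To illustrate how a function f(x, q, dx) = dq can drive the arm toward a target, the sketch below runs a simple closed loop on a toy planar 3-joint arm. Everything in it is an assumption made for illustration: the unit link lengths, the step limit, the start pose and target, and above all stand_in_model, which uses a Jacobian-transpose step in place of the learned model. It is not the logic of robot_controller.py.

```python
import math

LINKS = [1.0, 1.0, 1.0]  # assumed link lengths of a toy planar arm


def joint_angles_cumulative(q):
    total, angles = 0.0, []
    for qi in q:
        total += qi
        angles.append(total)
    return angles


def forward_kinematics(q):
    angles = joint_angles_cumulative(q)
    x = sum(l * math.cos(a) for l, a in zip(LINKS, angles))
    y = sum(l * math.sin(a) for l, a in zip(LINKS, angles))
    return [x, y]


def stand_in_model(x, q, dx, gain=0.1):
    """Placeholder for the learned f(x, q, dx): a Jacobian-transpose step."""
    angles = joint_angles_cumulative(q)
    dq = []
    for j in range(len(q)):
        # Column j of the Jacobian: effect of joint j on the EE position
        jx = -sum(LINKS[k] * math.sin(angles[k]) for k in range(j, len(q)))
        jy = sum(LINKS[k] * math.cos(angles[k]) for k in range(j, len(q)))
        dq.append(gain * (jx * dx[0] + jy * dx[1]))
    return dq


def distance_to(q, target):
    x = forward_kinematics(q)
    return math.hypot(target[0] - x[0], target[1] - x[1])


def reach(q, target, predict_dq, max_step=0.05, tol=1e-3, max_iters=2000):
    """Repeatedly ask the model for dq until the EE is close to the target."""
    for _ in range(max_iters):
        x = forward_kinematics(q)
        error = [t - xi for t, xi in zip(target, x)]
        dist = math.hypot(*error)
        if dist < tol:
            break
        scale = min(1.0, max_step / dist)  # request only small displacements
        dx = [scale * e for e in error]
        dq = predict_dq(x, q, dx)
        q = [qi + dqi for qi, dqi in zip(q, dq)]
    return q


start_q = [0.2, 0.8, 0.8]
target = [0.4, 2.4]
initial_error = distance_to(start_q, target)
final_q = reach(start_q, target, stand_in_model)
final_error = distance_to(final_q, target)
print(initial_error, final_error)
```

Limiting the requested displacement per step keeps the inputs close to the small displacements a model of this kind is trained on, which is one plausible reason to iterate rather than ask for the full motion at once.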
