carvalho7976/ChangeProneTools

Supplementary Material - Machine Learning for Change-Prone Class Prediction: A History-Based Approach

This repository contains the data used in the article "Machine Learning for Change-Prone Class Prediction: A History-Based Approach".

Abstract

Classes have a very dynamic life cycle in object-oriented software projects. They can be created, modified, or removed for different reasons. Predicting change-prone classes in the early stages of a project can positively impact the team’s productivity, the allocation of resources, and the quality of the software developed. Existing work uses Machine Learning (ML) and different kinds of class metrics. A limitation of existing work, however, is that it does not consider the temporal dependency between instances in the datasets. To fill this gap, this work introduces an approach based on the change history of the class across different releases from public repositories. The approach uses the Sliding Window method and adopts as predictors structural and evolutionary metrics, as well as the frequency and diversity of smells. Five projects and four ML algorithms are used in the evaluation. In the great majority of cases, our approach outperforms a traditional approach considering all the indicators. Random Forest presents the best performance, and the use of smell-related information does not impact the results.

Folders

  • Datasets: Contains the datasets used in the project
  • Docs: Contains the results generated by the scripts
  • Results: Contains spreadsheets with processed results used in the article

Scripts

Each script starts with "main_" and is responsible for some individual part of the process.

  • main_organic: read the JSON generated by Organic and convert it to CSV
  • main_evolutiveMetricsExtractor: extract evolutionary metrics from the repository
  • main_changeDistillerWrapper: execute ChangeDistiller to extract code changes
  • main_historyBasedApproach: run the history-based approach
  • main_tradicionalApproach: run the traditional approach
  • main_joinMetrics: join the metrics from Organic, Understand, CK, the evolutionary metrics, and ChangeDistiller
  • main_analysis: statistical analysis of the datasets
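
The abstract states that the history-based approach uses the Sliding Window method over the change history of each class across releases. The sketch below illustrates that general idea only and is not the code in main_historyBasedApproach. The function name `sliding_windows`, the shape of `history`, and the choice to label each window with the change flag of the release that follows it are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One release of one class: (metric values, 1 if the class changed in that release else 0).
# This record layout is hypothetical; the repository's datasets may be organized differently.
Release = Tuple[List[float], int]


def sliding_windows(history: Sequence[Release], window_size: int) -> List[Tuple[List[float], int]]:
    """Turn one class's per-release history into (features, label) samples.

    Features: the metrics of `window_size` consecutive releases, concatenated.
    Label (assumption): the change flag of the release right after the window.
    """
    samples = []
    for start in range(len(history) - window_size):
        window = history[start:start + window_size]
        features = [value for metrics, _ in window for value in metrics]
        label = history[start + window_size][1]
        samples.append((features, label))
    return samples


# Toy history of a single class over four releases, with two made-up metrics per release.
history = [([10.0, 2.0], 0), ([12.0, 2.0], 1), ([15.0, 3.0], 1), ([15.0, 3.0], 0)]
samples = sliding_windows(history, window_size=2)
```

With a window of two releases, the four-release toy history yields two samples, each carrying four feature values. Samples built this way keep the temporal order of a class's releases, which is the dependency the abstract says a traditional, per-instance dataset ignores.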

Folders

  • dataset: raw datasets
  • extra tools: scripts for charts, feature selection, table creation, pca analysis and information gain.
  • metrics definition: description of metrics with references
  • results: results of each approach
  • statistical analysis: statistical analysis

Extras

To run the script: python3 main_changeDistillerWrapper.py --pathA --pathB --commits --projectName --absolutePath --mode

  • --pathA: path of the project to check out
  • --pathB: secondary path of the project (each path will be checked out via git)
  • --commits: CSV file of commits to be compared (commitA,commitB)
  • --projectName: name of the project
  • --absolutePath: absolute path of the main folder of the script
  • --mode: if set to tag, the script compares all tagged commits and ignores the CSV file

Example:

python3 changeDistiller.py --pathA "/mnt/sda4/projects-smells/results/changeDistiller/projectA/jgit/" --pathB "/mnt/sda4/projects-smells/results/changeDistiller/projectB/jgit/" --commits commits.txt --projectName "jgit" --absolutePath "/mnt/sda4/projects-smells/results/changeDistiller/" --mode "tag"
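
For readers who want to see how such a command line could be handled, here is a minimal sketch using Python's standard argparse and csv modules. It is not the repository's implementation: the helper names `build_parser` and `read_commit_pairs` are hypothetical, and which flags are required is an assumption, since the README only lists the options.

```python
import argparse
import csv
import io


def build_parser() -> argparse.ArgumentParser:
    """Declare the six options listed in this README (required/optional split is assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the ChangeDistiller wrapper CLI")
    parser.add_argument("--pathA", required=True, help="path of the project to check out")
    parser.add_argument("--pathB", required=True, help="secondary path of the project")
    parser.add_argument("--commits", help="CSV file of commit pairs (commitA,commitB)")
    parser.add_argument("--projectName", required=True, help="name of the project")
    parser.add_argument("--absolutePath", required=True, help="main folder of the script")
    parser.add_argument("--mode", help='"tag" compares tagged commits and ignores the CSV')
    return parser


def read_commit_pairs(handle) -> list:
    """Read (commitA, commitB) pairs from an open CSV file, skipping incomplete rows."""
    return [(row[0].strip(), row[1].strip()) for row in csv.reader(handle) if len(row) >= 2]


# Parse an explicit argument list shaped like the example above (paths shortened to placeholders).
args = build_parser().parse_args([
    "--pathA", "projectA/jgit/",
    "--pathB", "projectB/jgit/",
    "--commits", "commits.txt",
    "--projectName", "jgit",
    "--absolutePath", "changeDistiller/",
    "--mode", "tag",
])

# In a real run the pairs would come from the file named by --commits;
# an in-memory file with placeholder commit ids stands in for it here.
pairs = read_commit_pairs(io.StringIO("commitA1,commitB1\ncommitA2,commitB2\n"))
```

In this sketch the commit pairs would only be consulted when --mode is not "tag", matching the description of --mode given above.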
