UCD-BDLab · ramosv · Nov 25, 2025 · Nov 13, 2025 · Nov 23, 2025 · Nov 24, 2025
diff --git a/.gitignore b/.gitignore
@@ -56,10 +56,10 @@ TCGA_BRCA_DATA/
 !bioneuralnet/utils/
 !bioneuralnet/utils/rdata_to_df.R
 
-# Allowing .csv files in datasets/example1 and its subdirectories
+# Allowing .csv files in datasets/example and its subdirectories
 !bioneuralnet/datasets/
-!bioneuralnet/datasets/example1/
-!bioneuralnet/datasets/example1/**/*.csv
+!bioneuralnet/datasets/example/
+!bioneuralnet/datasets/example/**/*.csv
 
 !bioneuralnet/datasets/
 !bioneuralnet/datasets/monet/
@@ -72,8 +72,8 @@ TCGA_BRCA_DATA/
 !bioneuralnet/datasets/kipan/
 !bioneuralnet/datasets/kipan/**/*.csv
 
-!bioneuralnet/datasets/gbmlgg/
-!bioneuralnet/datasets/gbmlgg/**/*.csv
+!bioneuralnet/datasets/lgg/
+!bioneuralnet/datasets/lgg/**/*.csv
 
 
 feature_testing
@@ -136,3 +136,9 @@ GBMLGG
 PAAN
 dpmon_cv_results_GAT_FINAL
 docs_notebooks
+dpmon_results
+dpmon_results_GAT_FINAL
+dpmon_results_GCN_FINAL
+dpmon_results_SAGE_FINAL
+dpmon_results_GIN_FINAL
+TCGA-Notebooks-data
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -48,7 +48,7 @@ repos:
             echo "Forbidden file types detected (RData)!"
             exit 1
           fi
-          if echo "$FILES" | grep -E "\.csv$" | grep -vE "^bioneuralnet/datasets/(example1|brca|monet)/"; then
+          if echo "$FILES" | grep -E "\.csv$" | grep -vE "^bioneuralnet/datasets/(example|brca|monet)/"; then
             echo "Forbidden CSV files detected (outside allowed folders)!"
             exit 1
           fi
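
Reviewer note on the renamed allow-list: the CSV guard in this hook can be mirrored in a few lines of Python to see which staged paths it would reject. This is a minimal sketch, not part of the hook itself. The function name `forbidden_csvs` and the sample paths are hypothetical; the two patterns are copied from the `grep` calls on the updated line. Because the pattern names only `example`, `brca`, and `monet`, a staged CSV under any other dataset folder would be reported.

```python
import re

# Same patterns as the two grep calls in the hook.
CSV_RE = re.compile(r"\.csv$")
ALLOWED_RE = re.compile(r"^bioneuralnet/datasets/(example|brca|monet)/")


def forbidden_csvs(files):
    """Return the CSV paths that fall outside the allowed dataset folders."""
    return [f for f in files if CSV_RE.search(f) and not ALLOWED_RE.search(f)]


# Hypothetical staged file list, for illustration only.
staged = [
    "bioneuralnet/datasets/example/X1.csv",
    "bioneuralnet/datasets/lgg/rna.csv",
    "docs/notes.txt",
]
print(forbidden_csvs(staged))  # ['bioneuralnet/datasets/lgg/rna.csv']
```

If CSVs under other dataset folders are meant to be committable, the alternation in the hook would need to list those folders as well.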

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -147,3 +147,56 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
 ### **Changed**
 
   - **Documentation Update**: Updated the online documentation (Read the Docs/API Reference) to include the new TCGA datasets and their respective classification results using the **DPMON**.
+
+## [1.2.0] - 2025-11-23
+
+### API and Architecture Refactoring
+- **Namespace Hierarchy Overhaul**: Transitioned from a flat namespace to a hybrid hierarchical structure to enhance modularity and prevent namespace pollution.
+    - **Core Classes**: `DPMON`, `GNNEmbedding`, `SubjectRepresentation`, `SmCCNet`, and `DatasetLoader` remain accessible at the top level (e.g., `bnn.DPMON`).
+    - **Utilities and Metrics**: Functional tools are now scoped to their respective submodules (e.g., `bnn.metrics.plot_network`, `bnn.utils.preprocess_clinical`).
+- **Utils Module Restructuring**: Decomposed the monolithic `utils` module into specialized submodules for improved maintainability:
+    - `utils.data`: Contains summary statistics functions (e.g., `variance_summary`).
+    - `utils.preprocess`: Contains data transformation functions (e.g., `impute_omics`, `normalize_omics`).
+    - `utils.reproducibility`: Dedicated module for seeding functions (`set_seed`).
+
+### New Features
+- **Graph Engineering Module (`graph_tools`)**: Introduced a new module for the diagnosis and repair of network topology issues.
+    - `repair_graph_connectivity`: Implemented an algorithm to reconnect fragmented network components (islands) to the global network using eigenvector centrality hubs or omics-driven correlation.
+    - `find_optimal_graph`: Added an AutoML-style search function that benchmarks various graph construction strategies (Gaussian, Correlation, Threshold) using a structural proxy task to optimize downstream stability.
+    - `graph_analysis`: Added diagnostic utilities to log topological metrics (clustering coefficient, average degree) and identify isolated subgraphs broken down by omics modality.
+- **DPMON Enhancements**: Expanded the `NeuralNetwork` backbone to support multiple dimensionality reduction strategies beyond the standard AutoEncoder.
+    - **Linear Projection**: Added `ScalarProjection`, utilizing a linear layer to map embeddings to feature weights.
+    - **MLP Projection**: Added `MLPProjection`, utilizing a non-linear Multilayer Perceptron for complex feature weighting.
+- **Dataset Loaders**:
+    - Implemented functional loaders (`load_brca`, `load_kipan`, `load_lgg`, `load_paad`, `load_monet`, `load_example`) to provide immediate access to data dictionaries, aligning with `scikit-learn` conventions.
+    - Added `__getitem__` support to the `DatasetLoader` class for direct key access (e.g., `loader['rna']`).
+
+### Data Standardization
+- **BRCA Clinical Update**: Removed 15 duplicated columns from the BRCA clinical dataset, reducing the feature dimensionality from 118 to 103 to ensure data uniqueness.
+- **Dataset Renaming**:
+    - Renamed the synthetic dataset `example1` to `example`.
+    - Renamed `gbmlgg` to `lgg` (Brain Lower Grade Glioma).
+- **Target Variable Update**: Updated the target variable for the `lgg` dataset from 'histological type' to 'vital_status' to better align with prognostic prediction tasks.
+- **Key Standardization**: Removed redundant `_data` suffixes from dataset dictionary keys (e.g., `monet['mirna_data']` is now `monet['mirna']`).
+- **Dataset Specifications**: Updated documentation to explicitly define the dimensions (samples × features) for all included datasets:
+    - **BRCA**: miRNA (769, 503), Target (769, 1), Clinical (769, 103), RNA (769, 2500), Meth (769, 2203).
+    - **LGG**: miRNA (511, 548), Target (511, 1), Clinical (511, 13), RNA (511, 2127), Meth (511, 1823).
+    - **PAAD**: CNV (177, 1035), Target (177, 1), Clinical (177, 19), RNA (177, 1910), Meth (177, 1152).
+    - **KIPAN**: miRNA (658, 472), Target (658, 1), Clinical (658, 19), RNA (658, 2284), Meth (658, 2102).
+    - **Monet**: Gene (107, 5039), miRNA (107, 789), Phenotype (106, 1), RPPA (107, 175), Clinical (107, 5).
+    - **Example**: X1 (358, 500), X2 (358, 100), Y (358, 1), Clinical (358, 6).
+
+### Improvements and Fixes
+- **Documentation**: Refactored all docstrings across the library to adhere to strict Google Style formatting (Args/Returns) to ensure consistent API documentation generation.
+- **Clustering**:
+    - **Hybrid Louvain**: Corrected the parameter tuning logic for the `k3` and `k4` weights and improved the iterative refinement loop that identifies phenotype-associated subgraphs.
+    - **Correlated PageRank**: Enhanced input validation to ensure proper alignment between graph nodes and omics features.
+
+### Removed
+- **Metrics Evaluation**: Removed the `metrics.evaluation` module. Its functionality has been consolidated into the `metrics` module or deprecated in favor of external validation workflows.
+
+### Left to Do
+- **(DONE) Test Suite Completion**: Refactor remaining tests (`gnn_embedding`, `subject_representation`, `hybrid_louvain`) to align with new `utils` imports and other major changes.
+- **Documentation**: Update ReadTheDocs API reference and `README.md` to reflect the split `utils` submodules and new `graph_tools`. All Jupyter notebook examples will need to be updated.
+- **(DONE) Release Prep**: Bump version to `1.2.0` in `setup.py`.
+- **(DONE) Errors**: Errors in the tests and doc build are expected. They will be addressed in subsequent patch releases (`1.2.1` and so on).
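
Reviewer note on the `__getitem__` and key-standardization entries in the changelog: a small stand-in may make both easier to picture. This is a sketch under stated assumptions, not the library's implementation. `ToyLoader` and `strip_data_suffix` are hypothetical names, and the code assumes that key access simply delegates to the loader's `data` dictionary and that standardization drops one trailing `_data` from each key.

```python
def strip_data_suffix(key: str) -> str:
    """Drop one trailing '_data' from a key, e.g. 'mirna_data' -> 'mirna'."""
    suffix = "_data"
    return key[: -len(suffix)] if key.endswith(suffix) else key


class ToyLoader:
    """Hypothetical stand-in for a loader that exposes a `data` dict."""

    def __init__(self, raw: dict):
        # Normalize keys once so both access styles see the same names.
        self.data = {strip_data_suffix(name): table for name, table in raw.items()}

    def __getitem__(self, key: str):
        # Assumed behavior: key access delegates to the data dictionary.
        return self.data[key]


# Placeholder values stand in for the real tables.
loader = ToyLoader({"mirna_data": [[0.1, 0.2]], "rna": [[1.0, 2.0]]})
print(sorted(loader.data))  # ['mirna', 'rna']
print(loader["mirna"] is loader.data["mirna"])  # True
```

Under these assumptions, `loader["rna"]` and `loader.data["rna"]` are interchangeable, so existing code that reads from `.data` keeps working alongside the shorter form.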
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -12,10 +12,11 @@ recursive-include bioneuralnet/external_tools *.R *.r
 recursive-include bioneuralnet/utils *.R *.r
 
 recursive-include bioneuralnet/datasets/monet *.csv
-recursive-include bioneuralnet/datasets/example1 *.csv
+recursive-include bioneuralnet/datasets/example *.csv
 recursive-include bioneuralnet/datasets/brca *.csv
-recursive-include bioneuralnet/datasets/gbmlgg *.csv
+recursive-include bioneuralnet/datasets/lgg *.csv
 recursive-include bioneuralnet/datasets/kipan *.csv
+recursive-include bioneuralnet/datasets/paad *.csv
 
 # Include documentation source files
 prune docs

diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@
 [![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue.svg)](https://bioneuralnet.readthedocs.io/en/latest/)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17503083.svg)](https://doi.org/10.5281/zenodo.17503083)
 
-## Welcome to BioNeuralNet 1.1.4
+## Welcome to BioNeuralNet 1.2.0
 
 ![BioNeuralNet Logo](assets/LOGO_WB.png)
 
@@ -172,7 +172,7 @@ from bioneuralnet.downstream_task import DPMON
 from bioneuralnet.datasets import DatasetLoader
 
 # Load the dataset and access individual omics modalities
-example = DatasetLoader("example1")
+example = DatasetLoader("example")
 omics_genes = example.data["X1"]
 omics_proteins = example.data["X2"]
 phenotype = example.data["Y"]