16 changes: 11 additions & 5 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -56,10 +56,10 @@ TCGA_BRCA_DATA/
 !bioneuralnet/utils/
 !bioneuralnet/utils/rdata_to_df.R

-# Allowing .csv files in datasets/example1 and its subdirectories
+# Allowing .csv files in datasets/example and its subdirectories
 !bioneuralnet/datasets/
-!bioneuralnet/datasets/example1/
-!bioneuralnet/datasets/example1/**/*.csv
+!bioneuralnet/datasets/example/
+!bioneuralnet/datasets/example/**/*.csv

 !bioneuralnet/datasets/
 !bioneuralnet/datasets/monet/
@@ -72,8 +72,8 @@ TCGA_BRCA_DATA/
 !bioneuralnet/datasets/kipan/
 !bioneuralnet/datasets/kipan/**/*.csv

-!bioneuralnet/datasets/gbmlgg/
-!bioneuralnet/datasets/gbmlgg/**/*.csv
+!bioneuralnet/datasets/lgg/
+!bioneuralnet/datasets/lgg/**/*.csv


 feature_testing
@@ -136,3 +136,9 @@ GBMLGG
 PAAN
 dpmon_cv_results_GAT_FINAL
 docs_notebooks
+dpmon_results
+dpmon_results_GAT_FINAL
+dpmon_results_GCN_FINAL
+dpmon_results_SAGE_FINAL
+dpmon_results_GIN_FINAL
+TCGA-Notebooks-data
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -48,7 +48,7 @@ repos:
 echo "Forbidden file types detected (RData)!"
 exit 1
 fi
-if echo "$FILES" | grep -E "\.csv$" | grep -vE "^bioneuralnet/datasets/(example1|brca|monet)/"; then
+if echo "$FILES" | grep -E "\.csv$" | grep -vE "^bioneuralnet/datasets/(example|brca|monet)/"; then
 echo "Forbidden CSV files detected (outside allowed folders)!"
 exit 1
 fi
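
The CSV allow-list check in this hook can be sketched in Python to make its behaviour easier to reason about. This is a minimal sketch, not the hook itself: the function name `forbidden_csvs` and the sample paths are hypothetical, and the pattern is copied from the updated `grep -vE` expression. Under that pattern, CSV files in `lgg/` (and likewise `kipan/` or `paad/`) would be reported even though `.gitignore` and `MANIFEST.in` allow them, which may or may not be intended.

```python
import re

# Assumption: same allow-list as the updated hook regex in this diff.
ALLOWED_CSV = re.compile(r"^bioneuralnet/datasets/(example|brca|monet)/")


def forbidden_csvs(files):
    """Return the .csv paths that fall outside the allowed dataset folders."""
    return [
        path
        for path in files
        if path.endswith(".csv") and not ALLOWED_CSV.match(path)
    ]


# Hypothetical staged files, for illustration only.
staged = [
    "bioneuralnet/datasets/example/X1.csv",   # allowed
    "bioneuralnet/datasets/example1/X1.csv",  # old folder name, now rejected
    "bioneuralnet/datasets/lgg/rna.csv",      # not in the hook's allow-list
    "notes/readme.md",                        # not a CSV, ignored
]

print(forbidden_csvs(staged))
# ['bioneuralnet/datasets/example1/X1.csv', 'bioneuralnet/datasets/lgg/rna.csv']
```

A non-empty result corresponds to the branch where the hook prints the "Forbidden CSV files detected" message and exits with status 1.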
53 changes: 53 additions & 0 deletions CHANGELOG.md
@@ -147,3 +147,56 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
### **Changed**

- **Documentation Update**: Updated the online documentation (Read the Docs/API Reference) to include the new TCGA datasets and their respective classification results using the **DPMON**.

## [1.2.0] - 2025-11-23

### API and Architecture Refactoring
- **Namespace Hierarchy Overhaul**: Transitioned from a flat namespace to a hybrid hierarchical structure to enhance modularity and prevent namespace pollution.
  - **Core Classes**: `DPMON`, `GNNEmbedding`, `SubjectRepresentation`, `SmCCNet`, and `DatasetLoader` remain accessible at the top level (e.g., `bnn.DPMON`).
  - **Utilities and Metrics**: Functional tools are now scoped to their respective submodules (e.g., `bnn.metrics.plot_network`, `bnn.utils.preprocess_clinical`).
- **Utils Module Restructuring**: Decomposed the monolithic `utils` module into specialized submodules for improved maintainability:
  - `utils.data`: Contains summary statistics functions (e.g., `variance_summary`).
  - `utils.preprocess`: Contains data transformation functions (e.g., `impute_omics`, `normalize_omics`).
  - `utils.reproducibility`: Dedicated module for seeding functions (`set_seed`).

### New Features
- **Graph Engineering Module (`graph_tools`)**: Introduced a new module for the diagnosis and repair of network topology issues.
  - `repair_graph_connectivity`: Implemented an algorithm to reconnect fragmented network components (islands) to the global network using eigenvector centrality hubs or omics-driven correlation.
  - `find_optimal_graph`: Added an AutoML-style search function that benchmarks various graph construction strategies (Gaussian, Correlation, Threshold) using a structural proxy task to optimize downstream stability.
  - `graph_analysis`: Added diagnostic utilities to log topological metrics (clustering coefficient, average degree) and identify isolated subgraphs broken down by omics modality.
- **DPMON Enhancements**: Expanded the `NeuralNetwork` backbone to support multiple dimensionality reduction strategies beyond the standard AutoEncoder.
  - **Linear Projection**: Added `ScalarProjection`, utilizing a linear layer to map embeddings to feature weights.
  - **MLP Projection**: Added `MLPProjection`, utilizing a non-linear Multilayer Perceptron for complex feature weighting.
- **Dataset Loaders**:
  - Implemented functional loaders (`load_brca`, `load_kipan`, `load_lgg`, `load_paad`, `load_monet`, `load_example`) to provide immediate access to data dictionaries, aligning with `scikit-learn` conventions.
  - Added `__getitem__` support to the `DatasetLoader` class for direct key access (e.g., `loader['rna']`).
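
The direct key access described for `DatasetLoader` can be illustrated with a minimal sketch. The class `MiniLoader` below is a hypothetical stand-in, not the library's implementation; it only assumes that the loader keeps its tables in a `data` dictionary, as the README example (`example.data["X1"]`) suggests.

```python
class MiniLoader:
    """Hypothetical stand-in for a dataset loader; not the library's class."""

    def __init__(self, data):
        # `data` maps modality names (e.g. "rna") to tables.
        self.data = dict(data)

    def __getitem__(self, key):
        # Delegate to the underlying dictionary so loader["rna"] works,
        # and list the available keys when the lookup fails.
        try:
            return self.data[key]
        except KeyError:
            available = ", ".join(sorted(self.data))
            raise KeyError(
                f"Unknown key {key!r}; available keys: {available}"
            ) from None


# Placeholder tables; a real loader would hold data frames here.
loader = MiniLoader({"rna": [[1.0, 2.0]], "clinical": [[0]]})

# Both access styles return the same object.
assert loader["rna"] is loader.data["rna"]
```

Delegating to the existing dictionary keeps `loader.data[...]` working for older code while allowing the shorter `loader[...]` form.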

### Data Standardization
- **BRCA Clinical Update**: Removed 15 duplicated columns from the BRCA clinical dataset, reducing the feature dimensionality from 118 to 103 to ensure data uniqueness.
- **Dataset Renaming**:
  - Renamed the synthetic dataset `example1` to `example`.
  - Renamed `gbmlgg` to `lgg` (Brain Lower Grade Glioma).
- **Target Variable Update**: Updated the target variable for the `lgg` dataset from 'histological type' to 'vital_status' to better align with prognostic prediction tasks.
- **Key Standardization**: Removed redundant `_data` suffixes from dataset dictionary keys (e.g., `monet['mirna_data']` is now `monet['mirna']`).
- **Dataset Specifications**: Updated documentation to explicitly define the dimensions (samples × features) for all included datasets:
  - **BRCA**: miRNA (769, 503), Target (769, 1), Clinical (769, 103), RNA (769, 2500), Meth (769, 2203).
  - **LGG**: miRNA (511, 548), Target (511, 1), Clinical (511, 13), RNA (511, 2127), Meth (511, 1823).
  - **PAAD**: CNV (177, 1035), Target (177, 1), Clinical (177, 19), RNA (177, 1910), Meth (177, 1152).
  - **KIPAN**: miRNA (658, 472), Target (658, 1), Clinical (658, 19), RNA (658, 2284), Meth (658, 2102).
  - **Monet**: Gene (107, 5039), miRNA (107, 789), Phenotype (106, 1), RPPA (107, 175), Clinical (107, 5).
  - **Example**: X1 (358, 500), X2 (358, 100), Y (358, 1), Clinical (358, 6).
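
For downstream code that still holds dictionaries with the old key names, the key standardization above can be sketched as a small migration helper. This is an illustrative sketch only: `strip_data_suffix` is a hypothetical name, and apart from `mirna_data` the sample keys are assumptions rather than the library's actual keys.

```python
def strip_data_suffix(tables):
    """Return a copy of `tables` with a trailing "_data" removed from each key."""
    suffix = "_data"
    renamed = {}
    for key, value in tables.items():
        new_key = key[: -len(suffix)] if key.endswith(suffix) else key
        if new_key in renamed:
            # Guard against two old keys mapping onto the same new key.
            raise ValueError(f"Key collision after renaming: {new_key!r}")
        renamed[new_key] = value
    return renamed


# Hypothetical old-style dictionary; the values stand in for data frames.
legacy = {
    "mirna_data": "miRNA table",
    "gene_data": "gene table",
    "clinical": "clinical table",
}

current = strip_data_suffix(legacy)
print(sorted(current))  # ['clinical', 'gene', 'mirna']
```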

### Improvements and Fixes
- **Documentation**: Refactored all docstrings across the library to adhere to strict Google Style formatting (Args/Returns) to ensure consistent API documentation generation.
- **Clustering**:
  - **Hybrid Louvain**: Corrected the parameter tuning logic for the `k3` and `k4` weights and improved the iterative refinement loop for identifying phenotype-associated subgraphs.
  - **Correlated PageRank**: Enhanced input validation to ensure proper alignment between graph nodes and omics features.

### Removed
- **Metrics Evaluation**: Removed the `metrics.evaluation` module. Its functionality has been consolidated into the `metrics` module or deprecated in favor of external validation workflows.

### Left to Do
- **(DONE) Test Suite Completion**: Refactor remaining tests (`gnn_embedding`, `subject_representation`, `hybrid_louvain`) to align with new `utils` imports and other major changes.
- **Documentation**: Update the ReadTheDocs API reference and `README.md` to reflect the split `utils` submodules and the new `graph_tools` module. All Jupyter notebook examples will need to be updated.
- **(DONE) Release Prep**: Bump version to `1.2.0` in `setup.py`.
- **(DONE) Errors**: Errors with tests and doc-build are expected. They will be addressed in subsequent patch releases (`1.2.1` and so on).
5 changes: 3 additions & 2 deletions MANIFEST.in
@@ -12,10 +12,11 @@ recursive-include bioneuralnet/external_tools *.R *.r
 recursive-include bioneuralnet/utils *.R *.r

 recursive-include bioneuralnet/datasets/monet *.csv
-recursive-include bioneuralnet/datasets/example1 *.csv
+recursive-include bioneuralnet/datasets/example *.csv
 recursive-include bioneuralnet/datasets/brca *.csv
-recursive-include bioneuralnet/datasets/gbmlgg *.csv
+recursive-include bioneuralnet/datasets/lgg *.csv
 recursive-include bioneuralnet/datasets/kipan *.csv
+recursive-include bioneuralnet/datasets/paad *.csv

 # Include documentation source files
 prune docs
4 changes: 2 additions & 2 deletions README.md
@@ -8,7 +8,7 @@
 [![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue.svg)](https://bioneuralnet.readthedocs.io/en/latest/)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17503083.svg)](https://doi.org/10.5281/zenodo.17503083)

-## Welcome to BioNeuralNet 1.1.4
+## Welcome to BioNeuralNet 1.2.0

 ![BioNeuralNet Logo](assets/LOGO_WB.png)

@@ -172,7 +172,7 @@ from bioneuralnet.downstream_task import DPMON
 from bioneuralnet.datasets import DatasetLoader

 # Load the dataset and access individual omics modalities
-example = DatasetLoader("example1")
+example = DatasetLoader("example")
 omics_genes = example.data["X1"]
 omics_proteins = example.data["X2"]
 phenotype = example.data["Y"]