Ceren dev snapshot #4
Open: CerenPaja wants to merge 35 commits into develop from ceren_dev
Commits (35)
2f9d3b8 per n epoch snapshot tsne (CerenPaja)
4e13749 per n epoch snapshot, error (CerenPaja)
2c52f64 snapshot tsne script with errors (CerenPaja)
281f6b6 fix parameter (dn070017)
7d36a71 per epoch and prior prints (CerenPaja)
dd8eafe Bug Fix v1 (CerenPaja)
97d7962 Bug Fix v2 (CerenPaja)
30ef9af Bug Fix v3 (CerenPaja)
6af0aee Bug Fix v4 fin (CerenPaja)
0ef959c Bug Fix v5 fin (CerenPaja)
43d7d52 Bug Fix v6 API (CerenPaja)
85b3574 Bug Fix v7 API (CerenPaja)
29db121 Updated from develop branch plus my adds PeriodicEpoch and Grid Prior (CerenPaja)
61b7e51 disable kl and other add-ins (CerenPaja)
7b00112 Vanilla + GMM KL
73bbe7d Vanilla + GMM + PeriodicTSNE (CerenPaja)
caee09c Vanilla + GMM + PeriodicTSNE small fix (CerenPaja)
9c49869 Fix exploding gradients and PeriodicTSNECallback
f9f1225 Student-t dist added to Vanilla + GMM KL
d592b6f Student-t dist added to Vanilla + GMM KL + BugFix
4a6c407 Annealing between vanilla and gmm
179b382 Annealing Fix v1 (CerenPaja)
d8ba296 Crossfade Annealing works (CerenPaja)
b533088 Kmeans++ Seeding for prior initialization (CerenPaja)
9777666 Kmeans and 3 Phases KL (CerenPaja)
8d5c16c 3 phases merged into one big annealing (CerenPaja)
b9c5235 Per Sample Dispersion (CerenPaja)
d6c3e59 New KLAnnealing and Kmeans added to multi-modal architecture (CerenPaja)
c3d4305 fix merge conflict (dn070017)
a68ffe2 New fixes on scheduler (CerenPaja)
a51990d add encode and decode functions (dn070017)
23a1d84 use pixi to manage environment (dn070017)
9fd557e Merge pull request #5 from dn070017/feature/layered_io (CerenPaja)
5302357 Clustering Assignment for z_hat (CerenPaja)
b1969fb Changes made according to comments (CerenPaja)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -24,4 +24,6 @@ Thumbs.db |
| | |  *.html |
| | |  *.png |
| | |  mlruns |
| | | -outputs |
| | | +outputs# pixi environments |
| | | +.pixi/* |
| | | +!.pixi/config.toml |
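One detail in this hunk is worth checking. The added line reads `outputs# pixi environments`, and the hunk header (4 old lines, 6 new lines) is consistent with that being a single line in the file. In gitignore syntax a `#` starts a comment only at the beginning of a line, so a fused line like this would be read as one literal pattern and `outputs` would no longer be ignored. A sketch of what the tail of the file was presumably meant to contain, assuming the fusion is a missing newline rather than intent:

```gitignore
mlruns
outputs
# pixi environments
.pixi/*
!.pixi/config.toml
```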
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,33 @@ |
| | | #!/bin/bash |
| | | #SBATCH --job-name=Bulk_DNAmethylation |
| | | #SBATCH --account=project_2015212 |
| | | #SBATCH --partition=gpu |
| | | #SBATCH --gres=gpu:v100:1 |
| | | #SBATCH --cpus-per-task=8 |
| | | #SBATCH --mem=160G |
| | | #SBATCH --time=05:00:00 |
| | | #SBATCH --output=/scratch/project_2015212/ceren/runs/bulk/%x-%j.out |
| | | #SBATCH --error=/scratch/project_2015212/ceren/runs/bulk/%x-%j.err |
| | | |
| | | set -euo pipefail |
| | | |
| | | module load tensorflow/2.18 |
| | | source /projappl/project_2015212/cavachon/envs/ceren/.venv/bin/activate |
| | | |
| | | export MLFLOW_TRACKING_URI="file:///scratch/project_2015212/ceren/mlruns" |
| | | export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} |
| | | export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK} |
| | | export PYTHONUNBUFFERED=1 |
| | | |
| | | # Make sure key dirs exist |
| | | mkdir -p /scratch/project_2015212/ceren/runs/bulk/embeddings |
| | | mkdir -p /scratch/project_2015212/ceren/checkpoints |
| | | |
| | | cd /projappl/project_2015212/cavachon/CAVACHON |
| | | |
| | | python - << 'PY' |
| | | from cavachon.workflow import Workflow |
| | | CFG = "/projappl/project_2015212/cavachon/configs/ceren/DNAmethyl_small_run.yaml" |
| | | wf = Workflow(CFG) |
| | | wf.run() |
| | | PY |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,33 @@ |
| | | #!/bin/bash |
| | | #SBATCH --job-name=Bulk_DNAmethylation |
| | | #SBATCH --account=project_2015212 |
| | | #SBATCH --partition=gpu |
| | | #SBATCH --gres=gpu:v100:1 |
| | | #SBATCH --cpus-per-task=8 |
| | | #SBATCH --mem=90G |
| | | #SBATCH --time=01:00:00 |
| | | #SBATCH --output=/scratch/project_2015212/ceren/runs/bulk2/%x-%j.out |
| | | #SBATCH --error=/scratch/project_2015212/ceren/runs/bulk2/%x-%j.err |
| | | |
| | | set -euo pipefail |
| | | |
| | | module load tensorflow/2.18 |
| | | source /projappl/project_2015212/cavachon/envs/ceren/.venv/bin/activate |
| | | |
| | | export MLFLOW_TRACKING_URI="file:///scratch/project_2015212/ceren/mlruns2" |
| | | export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} |
| | | export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK} |
| | | export PYTHONUNBUFFERED=1 |
| | | |
| | | # Make sure key dirs exist |
| | | mkdir -p /scratch/project_2015212/ceren/runs/bulk2/embeddings |
| | | mkdir -p /scratch/project_2015212/ceren/checkpoints2 |
| | | |
| | | cd /projappl/project_2015212/cavachon/CAVACHON |
| | | |
| | | python - << 'PY' |
| | | from cavachon.workflow import Workflow |
| | | CFG = "/projappl/project_2015212/cavachon/configs/ceren/DNAm_second.yaml" |
| | | wf = Workflow(CFG) |
| | | wf.run() |
| | | PY |
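Both batch scripts above embed the same Python heredoc and differ only in the config path, resource limits and output directories. One way to remove the duplication, sketched here under assumptions, is a small launcher that takes the config path as an argument. The `--config` flag and the `CAVACHON_CONFIG` environment variable are hypothetical names chosen for illustration; only `Workflow(CFG)` and `wf.run()` come from the scripts themselves.

```python
import argparse
import os
import sys


def parse_args(argv):
    """Parse the launcher's command line (flag name is illustrative)."""
    parser = argparse.ArgumentParser(description="Run a CAVACHON workflow.")
    parser.add_argument(
        "--config",
        default=os.environ.get("CAVACHON_CONFIG"),  # hypothetical env var
        help="Path to the workflow YAML config.",
    )
    args = parser.parse_args(argv)
    if not args.config:
        parser.error("no config given: pass --config or set CAVACHON_CONFIG")
    return args


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    # Imported lazily so argument errors surface before TensorFlow loads.
    from cavachon.workflow import Workflow

    wf = Workflow(args.config)
    wf.run()

# In a real script, call main() under the usual `if __name__ == "__main__":` guard.
```

With a launcher like this saved as, say, `run_workflow.py` (a hypothetical file name), the heredoc in each sbatch script could shrink to a single `python run_workflow.py --config <path to YAML>` line, leaving only the `#SBATCH` headers and paths to differ between runs.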
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,6 +1,12 @@ |
| | | -from .independent_bernoulli_data_modifier import ( |
| | | -    IndependentBernoulliDataModifier as IndependentBernoulliDataModifier, |
| | | -) |
| | | -from .independent_zero_inflated_negative_binomial_data_modifier import ( |
| | | -    IndependentZeroInflatedNegativeBinomialDataModifier as IndependentZeroInflatedNegativeBinomialDataModifier, |
| | | -) |
| | | +from .independent_bernoulli_data_modifier import ( |
| | | +    IndependentBernoulliDataModifier as IndependentBernoulliDataModifier, |
| | | +) |
| | | +from .independent_zero_inflated_negative_binomial_data_modifier import ( |
| | | +    IndependentZeroInflatedNegativeBinomialDataModifier as IndependentZeroInflatedNegativeBinomialDataModifier, |
| | | +) |
| | | +from .multivariate_normal_diag_data_modifier import ( |
| | | +    MultivariateNormalDiagDataModifier as MultivariateNormalDiagDataModifier, |
| | | +) |
| | | +from .studentt_data_modifier import ( |
| | | +    StudenttDataModifier as StudenttDataModifier, |
| | | +) |
cavachon/dataloader/modifiers/multivariate_normal_diag_data_modifier.py (74 changes: 74 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,74 @@ |
| | | import functools |
| | | from typing import Any, Mapping |
| | | |
| | | import tensorflow as tf |
| | | |
| | | from cavachon.environment.constants import Constants |
| | | from cavachon.layers.modifiers.to_dense import ToDense |
| | | |
| | | |
| | | class MultivariateNormalDiagDataModifier(tf.keras.Model): |
| | |     """MultivariateNormalDiagDataModifier |
| | | |
| | |     Modifiers for the modality which follows a MultivariateNormalDiag |
| | |     distribution (Normal distribution with diagonal covariance). |
| | |     The instance will be used right after the tf.data.Dataset is |
| | |     created using the DataLoader. |
| | | |
| | |     Attributes |
| | |     ---------- |
| | |     modality_name: str |
| | |         modality name. |
| | | |
| | |     modality_key: str |
| | |         the key used to access the mapping of data created from |
| | |         tf.data.Dataset. Defaults to `modality_name`_matrix. |
| | | |
| | |     modifiers: List[tf.keras.layers.Layer] |
| | |         list of modifiers that will be applied to the data created from |
| | |         tf.data.Dataset. Defaults to [ToDense]. |
| | | |
| | |     See Also |
| | |     -------- |
| | |     DataLoader: used to create tf.data.Dataset from MuData. |
| | | |
| | |     """ |
| | | |
| | |     def __init__(self, modality_name: str): |
| | |         """Constructor for MultivariateNormalDiag data modifier |
| | | |
| | |         Parameters |
| | |         ---------- |
| | |         modality_name: str |
| | |             the name of modality that needs to be processed. |
| | |         """ |
| | |         super().__init__() |
| | |         self.modality_name: str = modality_name |
| | |         self.modality_key: str = f"{modality_name}_{Constants.TENSOR_NAME_X}" |
| | |         # For continuous normalized data (CNV, normalized RNA, etc.) |
| | |         # we only need to convert sparse matrices to dense tensors |
| | |         self.modifiers = [ToDense(self.modality_key)] |
| | | |
| | |     def call(self, inputs: Mapping[Any, tf.Tensor], training=None, mask=None): |
| | |         """Process the data created from tf.data.Dataset. |
| | | |
| | |         Parameters |
| | |         ---------- |
| | |         inputs: |
| | |             Mapping of tf.Tensor, where the keys contain |
| | |             self.modality_key. |
| | | |
| | |         training: bool, optional |
| | |             Not used (kept for tf.keras.Model API). |
| | | |
| | |         mask: tf.Tensor, optional |
| | |             Not used (kept for tf.keras.Model API). |
| | | |
| | |         Returns |
| | |         ------- |
| | |         Mapping[Any, tf.Tensor] |
| | |             processed data. |
| | | |
| | |         """ |
| | |         modifiers = self.modifiers |
| | |         return functools.reduce(lambda x, modifier: modifier(x), modifiers, inputs) |
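The `call` method above threads the input mapping through each modifier in order using `functools.reduce`. A minimal sketch of that chaining pattern in plain Python, without TensorFlow; the stand-in modifiers `scale_values` and `add_flag` and the key `rna_matrix` are hypothetical and exist only to show how the output of one step feeds the next:

```python
import functools
from typing import Any, Callable, Dict, List

Batch = Dict[str, Any]


def scale_values(key: str, factor: float) -> Callable[[Batch], Batch]:
    """Return a stand-in modifier that scales the list stored under `key`."""
    def modifier(batch: Batch) -> Batch:
        out = dict(batch)  # shallow copy so the input mapping is not mutated
        out[key] = [v * factor for v in batch[key]]
        return out
    return modifier


def add_flag(name: str) -> Callable[[Batch], Batch]:
    """Return a stand-in modifier that records that it ran."""
    def modifier(batch: Batch) -> Batch:
        out = dict(batch)
        out[name] = True
        return out
    return modifier


def apply_modifiers(batch: Batch, modifiers: List[Callable[[Batch], Batch]]) -> Batch:
    # Same shape as call() above: the output of one modifier becomes
    # the input of the next, starting from `batch`.
    return functools.reduce(lambda x, modifier: modifier(x), modifiers, batch)


batch = {"rna_matrix": [1.0, 2.0, 3.0]}
result = apply_modifiers(
    batch, [scale_values("rna_matrix", 2.0), add_flag("processed")]
)
```

With a single `ToDense` modifier, as in the class above, the reduce amounts to one call; the pattern becomes useful once further steps are appended to `self.modifiers`.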
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,17 @@ |
| | | import functools |
| | | from typing import Any, Mapping |
| | | |
| | | import tensorflow as tf |
| | | |
| | | from cavachon.environment.constants import Constants |
| | | from cavachon.layers.modifiers.to_dense import ToDense |
| | | |
| | | |
| | | class StudenttDataModifier(tf.keras.Model): |
| | |     def __init__(self, modality_name: str): |
| | |         super().__init__() |
| | |         self.modality_key = f"{modality_name}_{Constants.TENSOR_NAME_X}" |
| | |         self.modifiers = [ToDense(self.modality_key)] |
| | | |
| | |     def call(self, inputs: Mapping[Any, tf.Tensor], **kwargs): |
| | |         return functools.reduce(lambda x, mod: mod(x), self.modifiers, inputs) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,29 @@ |
| | | import tensorflow as tf |
| | | import tensorflow_probability as tfp |
| | | |
| | | from cavachon.distributions.distribution import Distribution |
| | | |
| | | |
| | | class StudenttDistribution(Distribution, tfp.distributions.StudentT): |
| | |     """StudentT distribution for continuous data with heavy tails (e.g. CNV).""" |
| | | |
| | |     def __init__(self, *args, **kwargs): |
| | |         super().__init__(*args, **kwargs) |
| | | |
| | |     @classmethod |
| | |     def from_parameterizer_output(cls, params: tf.Tensor, **kwargs): |
| | |         """ |
| | |         Creates distribution from a single tensor. |
| | |         The last dimension is split into 3: loc, scale, and df. |
| | |         """ |
| | |         # Split into 3 equal parts |
| | |         loc, scale_raw, df_raw = tf.split(params, 3, axis=-1) |
| | | |
| | |         # Scale (sigma) must be positive |
| | |         scale = tf.math.softplus(scale_raw) + 1e-7 |
| | | |
| | |         # Degrees of freedom (nu) must be > 0. |
| | |         # Adding 2.0 keeps df above 2, where the variance is finite. |
| | |         df = tf.math.softplus(df_raw) + 2.0 |
| | | |
| | |         return cls(df=df, loc=loc, scale=scale, **kwargs) |
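The constraints applied in `from_parameterizer_output` can be checked numerically without TensorFlow. The sketch below mirrors the same transforms in plain Python, using `softplus(x) = log(1 + exp(x))` and the standard Student-t variance `scale**2 * df / (df - 2)` for `df > 2`. The helper names are illustrative and are not part of CAVACHON.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def constrain(loc: float, scale_raw: float, df_raw: float):
    """Map unconstrained outputs to Student-t parameters, as in the diff."""
    scale = softplus(scale_raw) + 1e-7  # strictly positive
    df = softplus(df_raw) + 2.0         # at least 2; above 2 unless softplus underflows
    return loc, scale, df


def student_t_variance(scale: float, df: float) -> float:
    """Variance of a Student-t distribution.

    Finite only for df > 2. For df <= 2 this returns inf as a simple
    sentinel (the variance is infinite for 1 < df <= 2 and undefined below).
    """
    if df <= 2.0:
        return math.inf
    return scale ** 2 * df / (df - 2.0)


loc, scale, df = constrain(loc=0.3, scale_raw=-1.0, df_raw=0.0)
variance = student_t_variance(scale, df)
```

For strongly negative `df_raw`, `softplus` underflows to zero in floating point, so `df` can come out as exactly 2.0 and the variance is then not finite. If that edge case matters, a slightly larger offset than 2.0 would avoid it.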
[MUST] Move the HPC-related scripts into a separate directory, such as scripts or scripts/sbatch.