Closed
50 commits
8c04859
Unify code branding (#26)
croots May 15, 2025
1ad0111
remove hyphenation in urls
croots May 15, 2025
d18185c
Merge branch 'main' of github.com:barricklab/efm-calculator2
croots May 15, 2025
4146ddb
Create requirements.txt
croots May 15, 2025
ff422af
Merge main to production (#27)
croots May 15, 2025
75f198c
Possible fix for streamlit dependency issue
croots May 15, 2025
5766ba7
Update requirements.txt
croots May 15, 2025
941182f
Fix bage pointing to old repo
croots May 16, 2025
94dff30
possible fix for download button
croots May 16, 2025
2eaf7df
fix warning about literal comparison
croots May 16, 2025
5a0b8c5
Merge branch 'production' into main
croots May 16, 2025
bcd956c
remove extra print statement
croots May 19, 2025
208a2cd
fix bugged top table assignment with rmds
croots May 19, 2025
e8effa8
1 index for CLI and webapp outputs
croots May 21, 2025
915ac32
Better filepath sanity checking
croots May 21, 2025
0ce9bad
Allow unrealistically high probabilities
croots May 21, 2025
171486d
Fix rel_rate error
croots May 21, 2025
bac42d8
Fix text input webapp not responding
croots May 21, 2025
3cc6b89
fixed filtering of nested SSRs
kevin99111 May 22, 2025
e79b6a9
possible fix for windows filepath issues
croots May 22, 2025
23d1168
Fix order of operations and statemachine clearing for multi input
croots May 22, 2025
f22fe2f
fix filtering of nested RMDs
kevin99111 May 26, 2025
8a6d793
fix filtering of nested SSRs
kevin99111 May 27, 2025
c62ba34
changes to RMD filtering to reduce memory usage
kevin99111 May 27, 2025
a6ba42c
Fix webapp forgetting incremental new files
croots May 28, 2025
e9724db
Merge branch 'main' of github.com:barricklab/efmcalculator2
croots May 28, 2025
06a5e2d
Fix wonky ssr glyphs
croots May 28, 2025
050460a
Make webapp sequence columns editable
croots May 28, 2025
727318e
fixed overfiltering of nested RMDs
kevin99111 Jun 9, 2025
76b2e16
remove debugging print statemetns
kevin99111 Jun 9, 2025
a062a65
srs/rmd filtering fix
kevin99111 Jun 11, 2025
f84366a
Remove logo and move text to bottom
croots Jun 11, 2025
cea1dfa
Slightly better column sorting
croots Jun 11, 2025
4a2464a
Remove badge, add back logo, push down right column
croots Jun 11, 2025
5afc53d
Uncolumn the input layout.
croots Jun 11, 2025
0d62195
fixed filtered out "TACTAGA"
kevin99111 Jun 16, 2025
246ac3d
Truncate very long sequences in mouseover
croots Jun 25, 2025
c60f4b3
Fix cant assign tuple when filtering by annotation
croots Jul 7, 2025
7df64b3
One indexing for annotation data
croots Jul 7, 2025
5feb78f
Classify SRS and RMD as "Tandem Repeat" when appropriate
kevin99111 Jul 23, 2025
ae9a99d
only run tandem repeat assignment code on SRS/RMD with distance of 1
kevin99111 Jul 24, 2025
2738574
update tandem repeat assignment to prevent error when second repeat i…
kevin99111 Jul 25, 2025
7d0e633
Pin streamlit-aggrid==1.1.7
croots Aug 22, 2025
e90b48d
Enforce showing TR column and order
croots Aug 25, 2025
2b8ed3c
Fix: Annotations overriding TR column
croots Aug 26, 2025
0224a8b
add new gam mut rates with updated numbers
kevin99111 Oct 27, 2025
4d88b5c
change name of file
kevin99111 Oct 27, 2025
9587b0b
change name of file
kevin99111 Oct 27, 2025
1966c3e
fix column names of gam_df_new
kevin99111 Oct 27, 2025
8e283a9
change code to use new gam mut rates
kevin99111 Oct 27, 2025
4 changes: 2 additions & 2 deletions .github/workflows/package_and_test.yml
@@ -28,10 +28,10 @@ jobs:
         with:
           python-version: 3.12

-      - name: install efmcalculator
+      - name: install efmcalculator2
         run: |
           pip install ./

-      - name: test efmcalculator
+      - name: test efmcalculator2
         run: |
           python -m unittest
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,5 +1,5 @@
 # Auto-generated files
-efmcalculator/_version.py
+efmcalculator2/_version.py

 # Byte-compiled / optimized / DLL files
 __pycache__/
20 changes: 10 additions & 10 deletions README.md
@@ -1,16 +1,16 @@
-[![Status](https://github.com/barricklab/efm-calculator2/actions/workflows/package_and_test.yml/badge.svg)](https://github.com/barricklab/efm-calculator2/actions/workflows/package_and_test.yml)
+[![Stantus](https://github.com/barricklab/efmcalculator2/actions/workflows/package_and_test.yml/badge.svg)](https://github.com/barricklab/efmcalculator2/actions/workflows/package_and_test.yml)

-`efmcalculator` is a Python package or web tool for detecting mutational hotspots. It predicts the mutation rates associated with each hotspot and combines them into a relative instability score. These hotspots include simple sequence repeats, repeat mediated deletions, and short repeat sequences. This code updates and improves upon the last version of the [EFM calculator](https://github.com/barricklab/efm-calculator).
+`efmcalculator2` is a Python package or web tool for detecting mutational hotspots. It predicts the mutation rates associated with each hotspot and combines them into a relative instability score. These hotspots include simple sequence repeats, repeat mediated deletions, and short repeat sequences. This code updates and improves upon the last version of the [EFM calculator](https://github.com/barricklab/efm-calculator).

-`efmcalculator` supports multifasta, genbank, or csv files as input and accepts parameters from the command line. It also supports the scanning of both linear and circular sequences. It defaults to a pairwise comparison strategy (all occurrences of a repeat are compared with all other occurrences), but it also contains an option for a linear comparison strategy (each occurrence of a repeat is only compared with the next occurrence in the sequence) to accelerate the analysis of large sequences.
+`efmcalculator2` supports multifasta, genbank, or csv files as input and accepts parameters from the command line. It also supports the scanning of both linear and circular sequences. It defaults to a pairwise comparison strategy (all occurrences of a repeat are compared with all other occurrences), but it also contains an option for a linear comparison strategy (each occurrence of a repeat is only compared with the next occurrence in the sequence) to accelerate the analysis of large sequences.


 # Installation
 The EFM Calculator can be accessed as a free web tool at efm2-beta.streamlit.app. It is limited to 50000 bases to ensure the app remains performant for other users.
 It can be installed and run locally below without such base restriction.

 ## From pip:
-`pip install efmcalculator` or clone this repository and `pip install ./` from the root of the repository.
+`pip install efmcalculator2` or clone this repository and `pip install ./` from the root of the repository.

 # Command Line Usage
 - -h: help
@@ -24,17 +24,17 @@ It can be installed and run locally below without such base restriction.
 - -v: verbose. 0 (silent), 1 (basic information), 2 (debug)
 - --summary: saves only aggrigate results, useful for very tall inputs

-Print efmcalculator help:
+Print efmcalculator2 help:
 ```
-efmcalculator -h
+efmcalculator2 -h
 ```

-Run efmcalculator on all sequences in a FASTA file using the pairwise strategy and print output to csv files within an output folder:
+Run efmcalculator2 on all sequences in a FASTA file using the pairwise strategy and print output to csv files within an output folder:
 ```
-efmcalculator -i “input.fasta” -o “output_folder”
+efmcalculator2 -i “input.fasta” -o “output_folder”
 ```

-Run efmcalculator on all sequences in a FASTA file, outputing to the folder output_folder, while treating the input as circular, searching with a linear pattern, and printing debug information:
+Run efmcalculator2 on all sequences in a FASTA file, outputing to the folder output_folder, while treating the input as circular, searching with a linear pattern, and printing debug information:
 ```
-efmcalculator -i “input.fasta” -o “output_folder” -c -s “linear” -v 2
+efmcalculator2 -i “input.fasta” -o “output_folder” -c -s “linear” -v 2
 ```
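The README text in this diff describes two comparison strategies: pairwise (every occurrence of a repeat is compared with every other occurrence) and linear (each occurrence is compared only with the next one in the sequence). A minimal Python sketch of that difference is given below. The function names and sample positions are hypothetical and are not taken from the efmcalculator2 source, and circular wrap-around is not handled.

```python
from itertools import combinations


def pairwise_pairs(positions):
    """All unordered pairs of occurrences (hypothetical helper)."""
    return list(combinations(sorted(positions), 2))


def linear_pairs(positions):
    """Each occurrence paired only with the next one (hypothetical helper)."""
    ordered = sorted(positions)
    return list(zip(ordered, ordered[1:]))


# Made-up start positions of one repeat within a sequence.
positions = [3, 40, 75, 120]
print(len(pairwise_pairs(positions)), "pairwise comparisons")
print(len(linear_pairs(positions)), "linear comparisons")
```

For n occurrences the pairwise sketch produces n(n-1)/2 pairs and the linear sketch produces n-1, which is consistent with the README's statement that the linear strategy accelerates the analysis of large sequences.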
15 changes: 12 additions & 3 deletions efmcalculator/StateMachine.py → efmcalculator2/StateMachine.py
@@ -10,6 +10,8 @@
 import multiprocessing as mp
 from .pipeline.mutation_rates import rip_score
 from .webapp.SequenceState import SequenceState
+from .utilities import sanitize_filename
+from copy import deepcopy

 class ThreadSafeBar(Bar):
     def __init__(self, *args, **kwargs):
@@ -53,10 +55,13 @@ def import_sequences(self, sequences, max_size=None, webapp = False):
         """Import newly uploaded sequences while retaining state of existing sequences"""
         # Import sequences without overwriting old ones
         new = {seq._originhash: seq for seq in sequences}
+        retained_states = {}
         for key in new:
             if key in self.user_sequences:
                 new[key] = self.user_sequences[key]
-        if new == self.user_sequences:
+                if webapp:
+                    retained_states[key] = deepcopy(self.sequencestates[key])
+        if new.keys() == self.user_sequences.keys():
             return
         self.user_sequences = new

@@ -65,7 +70,9 @@ def import_sequences(self, sequences, max_size=None, webapp = False):

         # Make webapp states
         if webapp:
-            self.sequencestates = {key: SequenceState(value) for key, value in self.user_sequences.items()}
+            self.sequencestates = {key: SequenceState(value) for key, value in self.user_sequences.items() if key not in retained_states.keys()}
+            self.sequencestates.update(retained_states)
+

         # Update sequence names
         self.named_sequences = {}
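The `import_sequences` change above keeps the webapp state of sequences that were already loaded and builds fresh state only for newly uploaded ones. The same pattern can be sketched in isolation as follows. `merge_states`, `make_state`, and the plain-dict states are hypothetical stand-ins for `StateMachine` and `SequenceState`, so this illustrates the idea rather than the project's code.

```python
from copy import deepcopy


def merge_states(old_states, new_keys, make_state):
    """Build a state dict for new_keys, reusing states of keys seen before.

    old_states: dict mapping key -> previously built state
    new_keys:   keys present after the latest upload
    make_state: callable that builds a fresh state for an unseen key
    """
    # Deep-copy retained states so later edits do not alias the old dict.
    retained = {k: deepcopy(old_states[k]) for k in new_keys if k in old_states}
    # Only keys without a retained state get a freshly built one.
    merged = {k: make_state(k) for k in new_keys if k not in retained}
    merged.update(retained)
    return merged


old = {"hashA": {"edited": True}}
merged = merge_states(old, ["hashA", "hashB"], lambda key: {"edited": False})
print(merged)
```

Keys that disappear from the upload are dropped, because only `new_keys` are carried into the result.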
@@ -78,6 +85,7 @@ def import_sequences(self, sequences, max_size=None, webapp = False):
             self.named_sequences[sequence_name] = seqhash

     def predict_tall(self, outpath, strategy, filetype, threads, keepmem=False, summaryonly=False):
+        outpath = sanitize_filename(outpath)
         samples = []
         for seqname in self.named_sequences:
             seqhash = self.named_sequences[seqname]
@@ -117,6 +125,7 @@ def predict_tall(self, outpath, strategy, filetype, threads, keepmem=False, summ
         summary_df.write_csv(summarypath)

     def save_results(self, folderpath, prediction_style = None, filetype = "parquet", summaryonly=False):
+        folderpath = sanitize_filename(folderpath)
         summary_df = pl.DataFrame([
             pl.Series("name", [], dtype=pl.String),
             pl.Series("ssr_sum", [], dtype=pl.Float64),
@@ -157,7 +166,7 @@ def save_results(self, folderpath, prediction_style = None, filetype = "parquet"
             srss = seqobj.srss.select(pl.exclude(["predid", "annotationobjects"]))
             rmds = seqobj.rmds.select(pl.exclude(["predid", "annotationobjects"]))

-            folder = os.path.join(folderpath, f"{seqname}")
+            folder = os.path.join(folderpath, sanitize_filename(f"{seqname}"))
             path = pathlib.Path(folder)
             path.mkdir(parents=True)
             if filetype == "parquet":
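The implementation of `sanitize_filename` is not part of this diff, so the following is only a guess at the kind of cleaning that makes a sequence name safe to use as a folder name. `sanitize_component` and its character set are assumptions for illustration. The diff also passes whole output paths through `sanitize_filename`, so the real helper presumably treats path separators differently from this single-component sketch.

```python
import re

# Characters that are reserved in Windows file names, plus ASCII control codes.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_component(name, replacement="_"):
    """Make ONE path component safe to use as a file or folder name.

    Hypothetical helper, not the efmcalculator2 implementation.
    """
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Windows rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Never return an empty name.
    return cleaned or replacement


print(sanitize_component('plasmid|v2: "final"'))
```

Sanitizing each component separately, as the `os.path.join(folderpath, sanitize_filename(...))` line in the diff does for the sequence name, prevents a sequence name containing a separator from creating unintended subfolders.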
File renamed without changes.
9 changes: 6 additions & 3 deletions efmcalculator/cli.py → efmcalculator2/cli.py
@@ -10,7 +10,6 @@
 from Bio.SeqRecord import SeqRecord

 from .utilities import (
-    is_path_creatable,
     is_pathname_valid,
 )
 from .ingest.EFMSequence import EFMSequence
@@ -160,7 +159,9 @@ def main():
     elif not is_pathname_valid(args.outpath):
         logger.error(f"File {args.outpath} is not a valid path.")
         exit(1)
-    elif not is_path_creatable(args.outpath):
+    try:
+        os.makedirs(args.outpath, exist_ok=True)
+    except:
         logger.error(f"Cannot write to {args.outpath}")
         exit(1)
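The hunk above replaces the `is_path_creatable` pre-check with an attempt to create the directory and a handler for failure. A bare `except:` also catches `KeyboardInterrupt` and `SystemExit`, so a narrower handler may be preferable. The sketch below shows the same approach with `except OSError`; `ensure_output_dir` is a hypothetical name and is not part of the CLI.

```python
import os
import tempfile


def ensure_output_dir(path):
    """Create path (and any parents) if needed.

    Returns True on success, False if the directory cannot be created.
    Hypothetical helper for illustration only.
    """
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:
        # Covers permission errors, a file already at that path, bad names, etc.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(ensure_output_dir(os.path.join(tmp, "results", "run1")))
```

Attempting the operation and handling the error avoids the gap between checking a path and using it, which is one reason this style is often preferred over a separate creatability check.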

@@ -201,6 +202,8 @@ def main():

     # Unpack sequences into list ---------
     sequences = list(sequences)
+    for seq in sequences:
+        seq.oneindex = True

     # Run EFM Calculator ----------------
     statemachine = StateMachine()
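The diff only shows the `oneindex` flag being set on each sequence; how the flag is consumed is not visible here. One plausible reading, given the commit message "1 index for CLI and webapp outputs", is that internal 0-based positions are shifted by one when reported. The sketch below illustrates that assumption with a hypothetical `report_position` helper.

```python
def report_position(zero_based_index, oneindex=True):
    """Convert an internal 0-based index to the coordinate shown to the user.

    Hypothetical helper; the real handling of `oneindex` is not in this diff.
    """
    return zero_based_index + 1 if oneindex else zero_based_index


sequence = "GATTACA"
hit = sequence.find("TTA")  # Python string indices are 0-based
print(report_position(hit))
```

Converting only at the output boundary keeps internal slicing in native 0-based form while reported coordinates match the 1-based convention used by formats such as GenBank.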
@@ -234,7 +237,7 @@ def main():
     t_min, t_sec = divmod(t_sec, 60)
     t_hour, t_min = divmod(t_min, 60)
     logger.info(
-        f"EFMCalculator completed in {t_hour:02d}h:{t_min:02d}m:{t_sec:02d}s:{t_msec:02d}ms"
+        f"EFMCalculator2 completed in {t_hour:02d}h:{t_min:02d}m:{t_sec:02d}s:{t_msec:02d}ms"
     )

 if __name__ == "__main__":
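The log line in this hunk splits an elapsed time into hours, minutes, seconds, and milliseconds with chained `divmod` calls. The lines that compute `t_sec` and `t_msec` fall outside the hunk, so the sketch below is one way the whole conversion could look. `format_elapsed` is a hypothetical name, and it pads milliseconds to three digits instead of the two used in the diff.

```python
def format_elapsed(seconds):
    """Format a duration in seconds as HHh:MMm:SSs:mmmms (hypothetical helper)."""
    # Work in whole milliseconds so the :02d / :03d integer formats are valid.
    total_ms = int(round(seconds * 1000))
    t_sec, t_msec = divmod(total_ms, 1000)
    t_min, t_sec = divmod(t_sec, 60)
    t_hour, t_min = divmod(t_min, 60)
    return f"{t_hour:02d}h:{t_min:02d}m:{t_sec:02d}s:{t_msec:03d}ms"


print(format_elapsed(3725.5))
```

The `d` format code raises `ValueError` for floats, so converting to integers before formatting, as done here, is required for this style of format string.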
File renamed without changes.
File renamed without changes.
File renamed without changes.