Webservice

http://sbi.upf.edu/modcre

Configuration

Set configuration file parameters (i.e. ./scripts/config.ini).

Substitute PATH_OF_MODCRE for the address where modcre has been installed (i.e. /sbi/users/modcre). Executable binary programs of DSSP, X3DNA, TMalign, ghostscript, BLAST+ (ncbi-blast-2.2.31+) and HMMER (3) will be installed or linked in folder "src" of modcre. Modify the addresses in the scripts if you wish to use other:
```
 program_path = os.path.join(src_path, config.get("Paths", "program_path"))
```
to
```
 program_path = config.get("Paths", "program_path")
```
Check the path loacation of these other programs and modify the config file accordingly:

modppi (from MODPIN)

Clustal-Omega

EMBOSS

MEME

MMSeqs2

CD-HIT

MODELLER

WEBLOGO (installed in Python)
Necessary commands to submit to queues of a cluster are specified in file:

command_queues_cluster.txt
Folder files contains files as used in the webservice http://sbi.upf.edu/modcre. But these files can be modified under the criteria of the users

[Negative/Positive]_keywords are used to select TFs

[TF/NonTF]_biological_process.txt and [TF/NonTF]_molecular_function.txt are used to restrict the selection of TFs

nucleosomal_dna.pdb is used as template for nucleosome structures

species* files will be generated with the file "speclist.txt" from Uniprot DB

TcoF_merged.fa and TcoF_merged.txt are the sequences and interactions of co-transcriptoion factors (see reference in manuscript)

Download files

make folders for the database:

mkdir pdb

mkdir pbm

mkdir uniprot

PDB:

Go to "http://www.rcsb.org/pdb/search/advSearch.do?search=new".
Choose "All/Experimental Type/Molecule Type".
"Ignore" Experimental Method and select Molecule Type "Protein/NA complex".
Click on "Submit Query".
Click on "Download Results".
Uncheck all but "PDB".
Select Compresion Type "uncompressed".
Click on "Launch Download Application" and download "download_rcsb.jnlp".
Execute script and download files under "./pdb/all/".
Now, uncheck all but "Biological Assemblies".
Select Compresion Type "uncompressed".
Click on "Launch Download Application" and download "download_rcsb.jnlp".
Execute script and download files under "./pdb/biounits/".

Cis-BP:

Go to "http://cisbp.ccbr.utoronto.ca/index.php".
Select "PBM" in By Evidence Type" and press "GO" button.
Add all TFs to cart and click on "View cart" in the left menu.
Click on "Download TFs in cart" and only leave a [check] on "TF info" and "Protein features", then click on "Download TFs in cart!".
Go to "http://cisbp.ccbr.utoronto.ca/entireDownload.php".
Click on "E-scores" and download "Escores.txt.zip".
Unzip downloaded files and you should obtain the following: "TF_Information.txt", "prot_seq.txt" and "Escores.txt".
Put the files of CisBP in pbm folder: pbm/CisBP_2019

UniProt:

Go to "http://www.uniprot.org/".
Move to folder uniprot to download files
Download UniProtKB Swiss-Prot and TrEMBL proteins in FASTA format.
Gunzip downloaded files, and concatenate them as: "uniprot_sprot_trembl.fasta" (file is ~40G).
Go to "http://www.uniprot.org/docs/speclist".
Right click and "Save Page As..." "speclist.txt" in folder "uniprot"
On command line, run: grep ">" uniprot_sprot_trembl.fasta > headers.txt -*
Get uniprot_sprot and uniref50/90
In folder uniprot execute:

curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz

gunzip uniref50.fasta.gz

curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz

gunzip uniref90.fasta.gz

curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

gunzip uniprot_sprot.fasta.gz

Reformat UniRef in Sprot format

python ../scripts/uniref2SP.py -u uniref50.fasta -o uniref50_sprot.fasta

python ../scripts/uniref2SP.py -u uniref90.fasta -o uniref90_sprot.fasta

Get the header we wish to use

grep ">" uniref90_sprot.fasta>header.txt

Make the BLAST database

../src/ncbi-blast-2.2.31+/bin/makeblastdb -in uniref90_sprot.fasta -dbtype prot -title uniref90 -out uniref90_sprot.fasta

Get idmapping.dat from Uniprot to get cross-references

curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz

gunzip idmapping.dat.gz

Get TFs

If SQL files are not available (i.e downloading full SQL is incomplete), try getting at least Cisbp_2.00.motifs.sql and families:

Cisbp_2.00.motifs.sql

cisbp_2.0.tf_families.sql
Get TF information for all TFs plus TF info and proteins for PBM:

prot_seq_PBM.txt

TF_Information_PBM.txt
Modify prot_seq_PBM.txt: Species must be TF_Species and Protein_seq must be Protein_Sequence
Then extract SQL files and TF files from them

python ../../scripts/get_CisBP_Tables.py --sql Cisbp_2.00.motifs.sql --tf TF_Information_all_motifs.txt --ps prot_seq_PBM.txt --select TF_Information_PBM.txt -o CisBP_2.00.PBM

Then, get TF files (you can later select whether you plan to use all TFs downloaded or only those with PBM data extracted as above)

python scripts/tfinder2.py -o tfinder2019 -p pbm/CisBP_2019/CisBP_2.00.PBM.proteins.sql --pdb=pdb/all -t pbm/CisBP_2019/CisBP_2.00.PBM.tfs.sql -u uniprot/uniprot_sprot+trembl.fasta -f pbm/CisBP_2019/cisbp_2.0.tf_families.sql --dummy=dummy_tfinder -v

python scripts/tfinder2.py -o tfinder_all -p pbm/CisBP_2019/CisBP_2.00.all.proteins.sql --pdb=pdb/all -t pbm/CisBP_2019/CisBP_2.00.all.tfs.sql -u uniprot/uniprot_sprot+trembl.fasta -f pbm/CisBP_2019/CisBP_2.00.all.tf_families.sql --map uniprot/idmapping.dat --dummy=dummy_tfinder -v

[warning] Modify the scripts to load the program versions that you have installed, according to the definitions used in the configuration file

Parse PDB data

All in a run
with all the pdb

python scripts/pdb.py --dummy=./dummy -o pdb -p pdb -t tfinder/pdb.txt -v
with all TFs

python scripts/pdb.py --dummy=./dummy_tf -o pdb_tf -p pdb -t tfinder2/tfs.txt -v
Step by step DB construction:

sh pdb_1.sh

sh pdb_2.sh

sh pdb_3.sh

sh pdb_4.sh

sh pdb_5.sh

sh pdb_6.sh

sh pdb_7.sh

sh pdb_7r.sh

sh pdb_8.sh

Parse PBM data

All in a run to generate data from 2016 (for all data)
PBM of CisBP 2016:

python scripts/pbm.py -e pbm/CisBP_2016/Escores.txt -f pbm/CisBP_2016/cisbp_1.02.tf_families.sql -m pbm/CisBP_2016/cisbp_1.02.motifs.sql -s pbm/CisBP_2016/cisbp_1.02.motif_sources.sql -t pbm/CisBP_2016/cisbp_1.02.tfs.sql -u uniprot/uniprot_sprot+trembl.fasta --dummy=./dummy_tf --pdb=pdb_tf -o pbm_tf --pwm=pbm/CisBP_2016/PWM_2016/pwms/ -v

sh pbm_1.1.sh
Step by step DB construction for 2019 data:

sh pbm_1.sh

sh pbm_2.sh

sh pbm_3.sh

sh pbm_4.sh

sh pbm_5.sh

sh pbm_6.sh

sh pbm_7.sh

sh pbm_8.sh

sh pbm_9.sh

sh pbm_10.sh

sh pbm_11.sh

Grid Search to get best parameters per family

Get parameters with script run_grid_search
Parameters are inserted in config.ini

sh run_grid_search.sh

Examples of USE

Protein to DNA: applications of the model.py, pwm.py and score.py programs

Model a monomer:

[example] submit the script "sh model_atoh1.sh" or:

python scripts/model_protein.py -i example/protein2dna/atoh1.fa -l ATOH1 -o example/protein2dna/ATOH1 --pdb=pdb/ -v --dummy=./dummy

Model mix of monomer and dimers:

python scripts/model_protein.py -i example/protein2dna/esr1.fa -l ESR1 -o example/protein2dna/ESR1 --pdb=pdb/ -v --dummy=./dummy --full --all --renumerate --chains_fixed --dummy=./dummy --unrestrictive

python scripts/model_protein.py -i example/protein2dna/err1.fa -l ERR1 -o example/protein2dna/ERR1 --pdb=pdb/ -v --dummy=./dummy --full --all --renumerate --chains_fixed --dummy=./dummy

Model a heterodimer:

[warning] if the tfs do not dimerize, it won't model anything

python scripts/model_protein.py -i example/protein2dna/hetero.fa -l HETERO -o example/protein2dna/HETERO/ --pdb=pdb/ -v --dummy=./dummy

Model several protein-dna structures from threading files:

inputs a file with a list of threading files for several proteins and make the models for all parallelizing in a cluster

python scripts/model_multiple_proteins.py -i example/protein2dna/threads.list --threading --pdb=pdb/ -v --parallel --dna -o example/protein2dna/THREAD_LIST --dummy=./dummy

Model a DNA sequence:

[warning] first, calculate DNA interface with interface.py (if not provided, model.py will calculate it)

python scripts/interface.py -i example/protein2dna/ERR1/ERR1:73:153_1by4_B_1.pdb -o example/protein2dna/ERR1/ERR1:73:153_1by4_B_1.interface --dummy=./dummy

python scripts/model_dna.py -p example/protein2dna/ERR1/ERR1:73:153_1by4_B_1.pdb -o example/protein2dna/ERR1/ERR1:73:153_1by4_B_1.AAAAAAAAAAAAAAA --pdb=pdb/ -s AAAAAAAAAAAAAAA -i example/protein2dna/ERR1/ERR1:73:153_1by4_B_1.interface

Predict PWM of a single model:

[warning] not all combinations of statistical potentials exist (should you want them modify pdb.py and pbm.py)

python scripts/pwm_pbm.py -i examples/protein2dna/ATOH1/ATOH1:160:216_1mdy_A_1.pdb -o examples/protein2dna/ATOH1/ATOH1:160:216_1mdy_A_1 --pbm=pbm/ --pdb=pdb/ -v --auto

Predict PWM of a folder with models:

[warning] not all combinations of statistical potentials exist (should you want them modify pdb.py and pbm.py)

python scripts/pwm_pbm.py -i examples/protein2dna/ATOH1 --pbm=pbm/ --pdb=pdb/ -v --auto --parallel

Score an interaction:

[warning] not all combinations of statistical potentials exist (should you want them modify pdb.py and pbm.py)

python scripts/scorer.py -i example/protein2dna/ATOH1/ATOH1:160:216_1mdy_A_1.pdb --dummy=dummy -o example/protein2dna/ATOH1/ATOH1:160:216_1mdy_A_1.score --pbm=pbm/ --pdb=pdb/ --norm -v -a -s allexample/protein2dna/ATOH1/ATOH1:160:216_1mdy_A_1.score

Profile an interaction:

Make a profile of several TF listed in the input

echo "example/protein2dna/ERR1" > example/protein2dna/TF_profiler_ESR1_ERR1.dat

echo "example/protein2dna/ESR1" >> example/protein2dna/TF_profiler_ESR1_ERR1.dat

python scripts/xprofiler.py --chains_fixed --dummy=./dummy -d example/protein2dna/ERR1P/dna_large.fa -i example/protein2dna/TF_profiler_ESR1_ERR1.dat
--pbm=./pbm_2.0 --pdb=./pdb -v --plot --html --html_types normal,energy,energy_per_nucleotide,energy_best,fimo_log_score,fimo_binding,fimo_score --html_energies s3dc_dd,s3dc,pair -o ESR1_ERR1.profile -l thread --auto --parallel --radius 22.0 --reuse

DNA to protein: characterization of the human enhanceosome at the IFN-BETA gene promoter

Scan a DNA sequence:

[warning] execution time grows with the length of the input nucleotide sequence

python scripts/scanner.py --dummy=./dummy_scan -i example/enhancer/dna.fa -l IFNB_HUMAN --pbm=./pbm --pdb=./pdb -v -o example/enhancer/IFNB_HUMAN --parallel --reuse --max 100 --ft 0.005 -s 9606 --rank

Build models from threading files:

python scripts/model_protein.py -i example/enhancer/IFNB_HUMAN/aux_files/D3DR86.2i9t_B.203-216.txt -l NFKB2.2i9t_B.203-216 -o example/enhancer/IFNB_HUMAN/models/ --pbm=pbm/ --pdb=pdb/ -t -v

python scripts/model_protein.py -i example/enhancer/IFNB_HUMAN/aux_files/Q04864.2ram_A.208-220.txt -l REL.2ram_A.208-220 -o example/enhancer/IFNB_HUMAN/models/ --pbm=pbm/ --pdb=pdb/ -t -v

Compare the PWM predictions with the nearest-neighbour approach

Option 1, single run:

sh run_nearest_neighbour.sh

Option 2, step by step:

Get data of TF sequences: input sequences and separated Fasta files

nn_get_data.sh

Obtain models

nn_model_tfs.sh

Copy sequences in pbm/sequences folder

\cp nearest_neighbour/sequences/* pbm/sequences

Obtain theoretical PWMs of each TF

nn_pwm_tfs.sh

Get the input file for models of PWMs

nn_get_data_mdl.sh

Calculate the nearest neighbours in a cluster

nn_parallel.sh

Runs each confdition of NN
Compare by ranks

*results for the comparison are in nearest_neighbour/PLOT_NORMAL_RANK

nn_s3N.sh

Compare by rank-enrichment
Results for the comparison are in nearest_neighbour/PLOT_NORMAL_RANKENRICHED

nn_s5N.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Webservice

Configuration

Download files

PDB:

Cis-BP:

UniProt:

Get TFs

[warning] Modify the scripts to load the program versions that you have installed, according to the definitions used in the configuration file

Parse PDB data

Parse PBM data

Grid Search to get best parameters per family

Examples of USE

Protein to DNA: applications of the model.py, pwm.py and score.py programs

DNA to protein: characterization of the human enhanceosome at the IFN-BETA gene promoter

Compare the PWM predictions with the nearest-neighbour approach

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
benchmark		benchmark
example		example
files		files
nearest_neighbour		nearest_neighbour
scripts		scripts
src/modpy		src/modpy
README.md		README.md
model_atoh1.sh		model_atoh1.sh
nn_get_data.sh		nn_get_data.sh
nn_get_data_mdl.sh		nn_get_data_mdl.sh
nn_model_tfs.sh		nn_model_tfs.sh
nn_parallel.sh		nn_parallel.sh
nn_pwm_tfs.sh		nn_pwm_tfs.sh
nn_s3N.sh		nn_s3N.sh
nn_s5N.sh		nn_s5N.sh
pbm_1.1.sh		pbm_1.1.sh
pbm_1.sh		pbm_1.sh
pbm_10.sh		pbm_10.sh
pbm_11.sh		pbm_11.sh
pbm_2.sh		pbm_2.sh
pbm_3.sh		pbm_3.sh
pbm_4.sh		pbm_4.sh
pbm_5.sh		pbm_5.sh
pbm_6.sh		pbm_6.sh
pbm_7.sh		pbm_7.sh
pbm_8.sh		pbm_8.sh
pbm_9.sh		pbm_9.sh
pdb_1.sh		pdb_1.sh
pdb_2.sh		pdb_2.sh
pdb_3.sh		pdb_3.sh
pdb_4.sh		pdb_4.sh
pdb_5.sh		pdb_5.sh
pdb_6.sh		pdb_6.sh
pdb_7.sh		pdb_7.sh
pdb_7r.sh		pdb_7r.sh
pdb_8.sh		pdb_8.sh
run_grid_search.sh		run_grid_search.sh
run_nearest_neighbour.sh		run_nearest_neighbour.sh

Folders and files

Latest commit

History

Repository files navigation

Webservice

Configuration

Download files

PDB:

Cis-BP:

UniProt:

Get TFs

[warning] Modify the scripts to load the program versions that you have installed, according to the definitions used in the configuration file

Parse PDB data

Parse PBM data

Grid Search to get best parameters per family

Examples of USE

Protein to DNA: applications of the model.py, pwm.py and score.py programs

DNA to protein: characterization of the human enhanceosome at the IFN-BETA gene promoter

Compare the PWM predictions with the nearest-neighbour approach

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages