GitHub - aidanborkan/FASTA: FASTA Query for Proteomics

UniProt provides extremely powerful query capabilities, but the output is typically a raw FASTA file. FASTA files are ideal for sequence-based tools, but they are difficult to work with in downstream data analysis, annotation, and integration workflows.

This application fills the gap between UniProt FASTA datasets and proteomics analysis tables.

Specifically, it is designed for situations where you want to answer questions like:

- Which proteins from a curated UniProt set are actually present in my experiment?
- How many of my significant MS hits belong to a specific functional class (e.g., kinases, metal-binding proteins, reviewed human proteins)?
- Which UniProt proteins should I carry forward into structural, enrichment, or targeted analyses?

What problem does this solve?

In many proteomics workflows, you work with two disconnected data sources:

A UniProt-derived FASTA set. Examples:
- Reviewed human proteome
- Proteins annotated with a specific keyword (e.g., copper-binding)
- Proteins from a specific organism or proteome
- Proteins matching a functional or annotation-based query

An experimental or analytical protein list. Examples:
- MS-identified proteins
- Differential abundance hits
- LiP-MS or TPP significant proteins
- Curated protein sets from prior analyses

This app:

- Fetches FASTA data directly from UniProt using a reproducible query
- Converts semi-structured FASTA headers into a tidy, joinable table
- Allows you to intersect UniProt-derived protein sets with your own data
- Produces outputs that are immediately usable in downstream analysis
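The header-to-table step can be illustrated with a minimal Python sketch. This is not the app's own code (the repository ships FASTA.R); it assumes the standard UniProtKB header layout (`>db|accession|entry_name protein name OS=... OX=... GN=... PE=... SV=...`), and the function names `parse_header` and `fasta_to_rows` as well as the output column names are hypothetical.

```python
import re

# Assumed UniProtKB header layout:
#   >db|accession|entry_name protein name OS=... OX=... [GN=...] PE=... SV=...
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s*(?P<rest>.*)$"
)
# KEY=value fields that follow the free-text protein name.
FIELD_RE = re.compile(r"(?:^|\s)(OS|OX|GN|PE|SV)=")

# Hypothetical column names for the tidy table.
FIELD_NAMES = {
    "OS": "organism",
    "OX": "taxon_id",
    "GN": "gene",
    "PE": "protein_existence",
    "SV": "sequence_version",
}


def parse_header(line):
    """Turn one UniProt FASTA header line into a flat dict (one table row)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Not a UniProtKB-style header: {line!r}")
    # re.split with a capturing group yields: [name, key1, value1, key2, value2, ...]
    parts = FIELD_RE.split(match.group("rest"))
    row = {
        "db": match.group("db"),
        "accession": match.group("accession"),
        "entry_name": match.group("entry_name"),
        "protein_name": parts[0].strip(),
    }
    # Every column is always present so rows line up; missing fields stay None.
    row.update({name: None for name in FIELD_NAMES.values()})
    for key, value in zip(parts[1::2], parts[2::2]):
        row[FIELD_NAMES[key]] = value.strip()
    return row


def fasta_to_rows(text):
    """Parse FASTA text into a list of row dicts, one per record."""
    rows = []
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            current = parse_header(line)
            current["sequence"] = ""
            rows.append(current)
        elif current is not None and line.strip():
            # Sequence lines are wrapped; concatenate them per record.
            current["sequence"] += line.strip()
    return rows
```

Because every row carries the same keys, the result can be passed to a data-frame constructor or written out as CSV and joined on `accession`.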

Why not just download FASTA manually?

Manually downloading FASTA files and inspecting headers is:

- Error-prone
- Difficult to reproduce
- Hard to integrate with data frames and pipelines
- Unscalable for iterative analysis

This app:

- Makes UniProt queries explicit and documented
- Ensures identifier parsing is consistent and deterministic
- Eliminates copy-paste and manual filtering
- Encourages reproducible, query-driven workflows

When is this especially useful?

This tool is particularly helpful when you want to:

- Filter large UniProt protein sets down to experimentally observed proteins
- Validate whether expected functional classes appear in your data
- Prepare protein subsets for:
  - Structural modeling
  - Enrichment analysis
  - Targeted follow-up experiments
  - Custom FASTA databases
- Rapidly iterate on different UniProt queries without re-downloading files manually
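Filtering a UniProt set down to experimentally observed proteins amounts to an inner join on accession. The Python sketch below shows one way to do that; it is an illustration rather than the app's implementation. The helper names, the `accession` key, the sample rows, and the choice to strip isoform suffixes (e.g. `-2`) before matching are all assumptions.

```python
def normalize_accession(accession):
    """Trim whitespace, upper-case, and drop an isoform suffix such as '-2'.

    Assumption: isoform-level hits should match the canonical entry.
    """
    return accession.strip().upper().split("-")[0]


def intersect_by_accession(uniprot_rows, experiment_rows, key="accession"):
    """Inner join: keep experiment rows whose accession is in the UniProt set.

    Each output row merges the UniProt annotation with the experimental
    columns; experimental values win if a column name collides.
    """
    annotations = {normalize_accession(row[key]): row for row in uniprot_rows}
    joined = []
    for hit in experiment_rows:
        annotation = annotations.get(normalize_accession(hit[key]))
        if annotation is not None:
            joined.append({**annotation, **hit})
    return joined


# Hypothetical inputs: rows parsed from UniProt headers, plus an MS hit list.
uniprot_rows = [
    {"accession": "P00001", "gene": "GENE1"},
    {"accession": "P00002", "gene": "GENE2"},
]
experiment_rows = [
    {"accession": "p00002-2 ", "log2fc": 1.8},
    {"accession": "Q99999", "log2fc": -0.4},
]
observed = intersect_by_accession(uniprot_rows, experiment_rows)
```

Only the hit that maps to an entry in the UniProt set is kept in `observed`. Whether isoform suffixes should be collapsed like this depends on how the search database was built.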

UniProt query examples

Below are example queries that can be entered directly into the app:

Human reviewed proteome
proteome:UP000005640 AND reviewed:true
Human kinases (by protein name)
organism_id:9606 AND (protein_name:kinase)
Copper-binding proteins
keyword:"Copper-binding" AND reviewed:true

The application uses UniProt’s REST stream endpoint (uniprotkb/stream) to retrieve FASTA records directly from UniProt, so results reflect the current UniProt release and the query string itself documents how the set was obtained.
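As a rough sketch of what such a request looks like outside the app, the Python snippet below builds a stream URL from a query string and downloads the FASTA text. It uses only the standard library; the function names are hypothetical, and this is not the code in FASTA.R.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# UniProt REST stream endpoint for UniProtKB entries.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def build_stream_url(query, fmt="fasta"):
    """Return the stream URL for a UniProt query, with the query URL-encoded."""
    return UNIPROT_STREAM + "?" + urlencode({"query": query, "format": fmt})


def fetch_fasta(query, timeout=60):
    """Download all FASTA records matching the query (requires network access)."""
    with urlopen(build_stream_url(query), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access and can be logged for provenance.
url = build_stream_url("organism_id:9606 AND (protein_name:kinase)")
```

Saving the generated URL alongside the results records exactly which query produced a given protein set. Large result sets can take a while to stream, so the timeout may need adjusting.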

