BLAST_primer_filter parses primer-pair BLASTN hits against a genome assembly,
filters likely PCR products, and writes browser- and plotting-friendly outputs.
It is meant for marker validation workflows where you already have primer FASTA,
genome FASTA, and BLAST tabular output.
- *_amplicons.tsv: viable amplicons, coordinates, identity metrics, and sequences.
- *_amplicons.gff3: IGV/JBrowse-ready PCR product annotations.
- *_failed.tsv: primer hits or primer pairs that failed filters, with reasons.
- *_plot.pdf: contig-scaled overview of viable products and failed hits.
The plot uses a black horizontal bar for each chromosome or contig, dark green for normal viable amplicons, light green for inverted viable amplicons, red vertical ticks for failed primer hits, and numbered labels tied to primer names in the legend.
- Handles normal and inverted primer orientations.
- Filters partial alignments so amplicons are built from full-length primer hits.
- Exports viable amplicons, failed hits, GFF3 browser tracks, and PDF plots.
- Supports configurable axis units: bp, kb, Mb, and Gb.
- Annotates plots with primer-number legends for compact multi-marker views.
Create the conda/mamba environment from the project file:
mamba env create -f environment.yml
conda activate blast-primer-filter

If you prefer to keep the environment inside the repository:
mamba env create --prefix ./.conda -f environment.yml
conda activate ./.conda

The environment includes Python, Biopython, Matplotlib, BLAST+, and pytest.
Primer IDs must end in F or R. The shared prefix is treated as the primer
pair name. The recommended style is primername_F and primername_R.
>MyPrimer_F
ATGCGTACGTTAGC
>MyPrimer_R
CGTACGACTTACGA
With this style, the internal pair name is MyPrimer_; GFF3 display names trim
the trailing underscore for readability.
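
As a rough illustration of the naming rule above, the following sketch splits a primer ID into a pair name and a direction. The function names `split_primer_id` and `display_name` are hypothetical and are not taken from the script, and the sketch assumes the final letter is an uppercase F or R.

```python
def split_primer_id(primer_id):
    """Return (pair_name, direction) for a primer ID ending in F or R.

    Hypothetical helper: the real script may handle this differently.
    """
    direction = primer_id[-1:]
    if direction not in ("F", "R"):
        raise ValueError(f"primer ID must end in F or R: {primer_id!r}")
    # Everything before the final letter is the shared pair name.
    return primer_id[:-1], direction


def display_name(pair_name):
    """Trim one trailing underscore, as described for GFF3 display names."""
    return pair_name[:-1] if pair_name.endswith("_") else pair_name


pair, direction = split_primer_id("MyPrimer_F")
print(pair, direction, display_name(pair))  # MyPrimer_ F MyPrimer
```

Both MyPrimer_F and MyPrimer_R map to the same pair name, which is what lets the forward and reverse hits be matched up later.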
A reference genome or assembly FASTA used for sequence extraction and plotting contig lengths.
Run BLASTN with this exact tabular output layout:
makeblastdb -in genome.fasta -dbtype nucl
blastn -query primers.fasta -db genome.fasta \
-outfmt "6 qseqid sseqid sstart send sstrand pident length mismatch gapopen evalue bitscore sseq" \
-out blast.tsv

Then run the analysis script on the BLAST output:

python blast_primer_analysis-v3.py \
--primers primers.fasta \
--blast blast.tsv \
--genome genome.fasta \
--out_prefix results \
--min_len 80 \
--max_len 3000 \
--tick_units Mb \
--tick_step 1

By default this writes results_amplicons.tsv, results_amplicons.gff3,
results_failed.tsv, and results_plot.pdf.
For output columns, GFF3 attributes, and plot legend details, see docs/outputs.md.
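
To make the required BLAST layout concrete, here is a minimal sketch of reading the 12-column table requested by the -outfmt string above. The column names come from that command. The function name `read_blast_hits` and the example row are made up for illustration and do not come from the script.

```python
import csv
import io

# Column order taken from the -outfmt string in the blastn command.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "sstart", "send", "sstrand", "pident",
    "length", "mismatch", "gapopen", "evalue", "bitscore", "sseq",
]
INT_COLUMNS = {"sstart", "send", "length", "mismatch", "gapopen"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def read_blast_hits(handle):
    """Parse tab-separated BLAST rows into dictionaries with typed values."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != len(BLAST_COLUMNS):
            raise ValueError(f"expected {len(BLAST_COLUMNS)} columns, got {len(row)}")
        hit = dict(zip(BLAST_COLUMNS, row))
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits


# Illustrative row only; the values are invented for this example.
example = "MyPrimer_F\tchr1\t101\t114\tplus\t100.000\t14\t0\t0\t1.2e-04\t28.2\tATGCGTACGTTAGC\n"
hits = read_blast_hits(io.StringIO(example))
print(hits[0]["sseqid"], hits[0]["sstart"], hits[0]["send"])  # chr1 101 114
```

A table produced with a different column order would pass the column-count check but yield wrong values, which is why the exact -outfmt string matters.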
You can also convert an existing amplicon table to GFF3 without the original genome FASTA:
python blast_primer_analysis-v3.py \
--amplicons_tsv examples/second/Blast_SRR_primers_amplicons.tsv \
--gff3 converted_amplicons.gff3

- examples/synthetic/ contains a tiny, fully runnable example with a synthetic genome and demo_amplicon_F/demo_amplicon_R primers. Run it with: make synthetic
- examples/second/ contains a soybean SRR marker example with the primer FASTA, BLAST TSV, expected amplicon table, expected failed-hit table, expected GFF3 track, and PDF plot.
- The original soybean genome FASTA is not committed because it is a large external reference. Use the committed amplicon table to smoke-test GFF3 conversion, or provide the genome FASTA locally to rerun the full analysis.
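
To picture what a GFF3 conversion produces, the sketch below builds one nine-column GFF3 line for an amplicon. It follows the general GFF3 column layout only. The function name, the source label, the PCR_product feature type, and the ID/Name attributes are assumptions for illustration; the attributes the script actually writes are described in docs/outputs.md.

```python
from urllib.parse import quote


def amplicon_to_gff3_line(seqid, start, end, strand, name,
                          source="BLAST_primer_filter",
                          feature_type="PCR_product"):
    """Format one amplicon as a tab-separated GFF3 feature line.

    Hypothetical helper: the source, type, and attributes are assumptions.
    """
    # GFF3 coordinates are 1-based and start must not exceed end.
    left, right = sorted((start, end))
    # Percent-encode the name so characters such as ';' or '=' stay safe.
    safe_name = quote(name, safe="")
    attributes = f"ID={safe_name};Name={safe_name}"
    fields = [seqid, source, feature_type, str(left), str(right),
              ".", strand, ".", attributes]
    return "\t".join(fields)


print("##gff-version 3")
print(amplicon_to_gff3_line("chr1", 101, 600, "+", "MyPrimer"))
```

Sorting the coordinates keeps the line valid even when an inverted amplicon is recorded with its end before its start.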
Required for full analysis:
- --primers: primer FASTA file.
- --blast: BLAST results in the required format.
- --genome: genome FASTA file.
General:
- --out_prefix: output prefix. Default: results.
- --gff3: custom GFF3 output path. Default: <out_prefix>_amplicons.gff3.
- --no_gff3: skip GFF3 output.
- --amplicons_tsv: convert an existing amplicon TSV to GFF3 and exit.
Filtering:
- --min_len: minimum allowed amplicon length. Default: 80.
- --max_len: maximum allowed amplicon length. Default: 3000.
- --require_3p: number of 3-prime bases that must match exactly. Default: 3.
- --max_mismatches: maximum mismatches allowed in primer hits. Default: 10.
- --min_pident: minimum percent identity required. Default: 0.0.
- --len_tolerance: allowed difference between primer length and hit length. Default: 5.
- --min_fail_len_frac: minimum failed-hit alignment length fraction to report. Default: 0.8.
- --min_fail_pident: minimum failed-hit percent identity to report. Default: 70.0.
Plotting:
- --tick_units: one of bp, kb, Mb, or Gb. Default: Mb.
- --tick_step: spacing between x-axis ticks in the chosen units. Default: 1.0.
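
The interaction between --tick_units and --tick_step can be sketched as a conversion from the chosen unit to base pairs. The helper names `tick_positions_bp` and `tick_label` are hypothetical, and the script's real plotting code may place or label ticks differently.

```python
# Base pairs per unit for each supported --tick_units value.
UNIT_SCALE = {"bp": 1, "kb": 1_000, "Mb": 1_000_000, "Gb": 1_000_000_000}


def tick_positions_bp(contig_length_bp, tick_units="Mb", tick_step=1.0):
    """Return tick positions in base pairs from 0 up to the contig length."""
    if tick_units not in UNIT_SCALE:
        raise ValueError(f"unknown tick unit: {tick_units!r}")
    if tick_step <= 0:
        raise ValueError("tick_step must be positive")
    step_bp = tick_step * UNIT_SCALE[tick_units]
    n_steps = int(contig_length_bp // step_bp)
    return [round(i * step_bp) for i in range(n_steps + 1)]


def tick_label(position_bp, tick_units):
    """Format a base-pair position in the chosen unit."""
    return f"{position_bp / UNIT_SCALE[tick_units]:g} {tick_units}"


ticks = tick_positions_bp(3_500_000, "Mb", 1.0)
print([tick_label(t, "Mb") for t in ticks])  # ['0 Mb', '1 Mb', '2 Mb', '3 Mb']
```

For small assemblies, a unit such as kb with a larger step keeps the axis readable, since a 1 Mb step would leave a short contig with a single tick at zero.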
The script applies filters in this order:
- --len_tolerance silently discards partial primer alignments that are too short or too long compared with the primer length.
- --max_mismatches and --min_pident mark hits as failed, but keep them available for failed-hit reporting.
- --require_3p enforces exact matching at the primer 3-prime end.
- Primer hits are paired only when they land on the same contig and have a valid normal or inverted orientation.
- --min_len and --max_len filter the final PCR product span.
- --min_fail_len_frac and --min_fail_pident control how much failed-hit detail is written to *_failed.tsv.
For a fuller walkthrough of each filter, including common gotchas, see docs/filtering.md.
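
The difference between a silently discarded hit and a failed hit can be sketched as below, using the default thresholds listed under Filtering. This is a simplified illustration, not the script's code: `classify_hit` and `product_length_ok` are hypothetical names, the 3-prime check and the pairing step are left out, and the inclusive span calculation is an assumption.

```python
def classify_hit(primer_len, hit, len_tolerance=5, max_mismatches=10,
                 min_pident=0.0):
    """Return (status, reason); status is 'discarded', 'failed', or 'ok'.

    `hit` is a dict with 'length', 'mismatch', and 'pident' keys.
    """
    # Length check first: these hits are dropped and never reported.
    if abs(hit["length"] - primer_len) > len_tolerance:
        return "discarded", "alignment length outside --len_tolerance"
    # Identity checks: these hits stay available for failed-hit reporting.
    if hit["mismatch"] > max_mismatches:
        return "failed", "more mismatches than --max_mismatches"
    if hit["pident"] < min_pident:
        return "failed", "percent identity below --min_pident"
    return "ok", ""


def product_length_ok(start, end, min_len=80, max_len=3000):
    """Check the PCR product span, assuming inclusive 1-based coordinates."""
    span = abs(end - start) + 1
    return min_len <= span <= max_len


print(classify_hit(20, {"length": 20, "mismatch": 1, "pident": 95.0}))
# ('ok', '')
print(classify_hit(24, {"length": 12, "mismatch": 0, "pident": 100.0})[0])
# discarded
print(product_length_ok(1001, 1500))  # True
```

The second call shows why a perfect but partial hit never appears in *_failed.tsv: it is removed by the length check before any failure reason is recorded.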
- If exact short primers are missing from BLAST output, run BLAST with blastn -task blastn-short.
- If expected partial or distant hits disappear completely, increase --len_tolerance.
- If *_failed.tsv is emptier than expected, lower --min_fail_pident or --min_fail_len_frac.
Run the tests from an activated environment:
python -m pytest

The current tests verify soybean amplicon-to-GFF3 conversion, synthetic fixture formats, PNG plot previews, and the fully runnable synthetic BLAST example. They are split by purpose:
- tests/test_core.py: direct function-level tests for blast_primer_analysis-v3.py.
- tests/test_cli.py: command-line behavior and argument validation.
- tests/test_examples.py: committed example fixtures and full synthetic BLAST run.
Conrad R. (2025). BLAST_primer_filter. GitHub repository:
https://github.com/rotheconrad/BLAST_primer_filter
