DI-2978 eggd_pytmb v1.0.0 release by Aisha-D · Pull Request #4 · eastgenomics/eggd_pyTMB

Aisha-D · 2026-06-19T10:16:37Z

This change is

Summary by CodeRabbit

Release Notes

New Features
- Introduced a complete Tumour Mutational Burden (TMB) calculation application accepting VEP-annotated VCF, BAM files, and optional BED/GTF inputs with configurable filtering parameters (VAF, MAF, depth thresholds).
Documentation
- Updated documentation with usage examples, input/output specifications, and typical workflow instructions.
Chores
- Configured asset dependencies and environment settings for application deployment.

…ndecies

… rebuilt to match DNAnexus runtime, pyEffGenomeSize to preserve full logs, .dxiignore aded, project.toml updated

…s and source code

Di 2978 eggd py tmb v1.0.0 dev (#1) Co-Authored-By: ayana-manoj <ayana.manoj@github.com>

…e' into DI-2978_dev

DI-2978 READMe (#2)

DI-2978 remove default input files (#3)

coderabbitai · 2026-06-19T10:16:51Z

Walkthrough

Transforms a DNAnexus app template into a fully implemented eggd_pyTMB app. dxapp.json is rewritten with pyTMB-specific inputs/outputs and pinned asset dependencies; src/code.sh becomes a complete pipeline driver running mosdepth-backed effective genome size computation and TMB calculation; pyTMB YAML configs, conda environment, ignore files, and README are all added.

Changes

eggd_pyTMB App Implementation

Layer / File(s)	Summary
App manifest and asset declarations `dxapp.json`, `assets/mosdepth/dxasset.json`, `assets/pytmb/dxasset.json`	`dxapp.json` is rewritten with pyTMB-specific `inputSpec`/`outputSpec`, `mosdepth` and `pytmb` pinned asset dependencies, and instance type upgraded to `mem1_ssd1_v2_x4`; `dxasset.json` manifests added for both assets.
pyTMB variant caller and VEP configs `resources/home/dnanexus/pytmb/config/tnhaplotyper.yml`, `resources/home/dnanexus/pytmb/config/vep.yml`	`tnhaplotyper.yml` defines `freq`/`depth`/`altDepth` field mappings; `vep.yml` configures CSQ tag parsing, consequence classification, splicing terms, and `gnomad` population database subfield indices.
Main Bash pipeline driver `src/code.sh`	Rewritten to download inputs, derive sample name, relocate BAM index, run `pytmb.cli.run_effgenomesize` extracting `EFF_GENOME_SIZE`, run `pytmb.cli.run_tmb` with computed size and config paths, validate the output report, and upload via `dx-upload-all-outputs`.
Conda environment, ignore files, and README `resources/home/dnanexus/environment.yml`, `.dxignore`, `resources/home/dnanexus/.dxignore`, `.gitignore`, `README.md`	`environment.yml` pins Python, `cyvcf2`, `pandas`, `mosdepth`, and other runtime/test dependencies; ignore files exclude build artefacts and bytecode; README fully rewritten with inputs, outputs, and a `dx run` example.

Sequence Diagram

sequenceDiagram
  participant User
  participant code.sh
  participant run_effgenomesize as pytmb.cli.run_effgenomesize
  participant run_tmb as pytmb.cli.run_tmb
  participant DNAnexus as DNAnexus Platform

  User->>DNAnexus: dx run eggd_pyTMB (VCF, BAM, BAI, BED, filters)
  DNAnexus->>code.sh: execute with downloaded inputs
  code.sh->>code.sh: derive SAMPLE from vcf_prefix, mv bam_bai to ${bam_path}.bai
  code.sh->>run_effgenomesize: BAM + BED → compute effective genome size
  run_effgenomesize-->>code.sh: stdout/stderr containing EFF_GENOME_SIZE
  code.sh->>code.sh: grep/awk extract + validate EFF_GENOME_SIZE
  code.sh->>run_tmb: VCF + configs + EFF_GENOME_SIZE + vaf/maf/depth filters
  run_tmb-->>code.sh: ${SAMPLE}_TMB.txt report
  code.sh->>code.sh: validate report non-empty
  code.sh->>DNAnexus: dx-upload-all-outputs (tmb_report)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

eastgenomics/eggd_pyTMB#1: Introduces the same core src/code.sh linear pipeline implementing run_effgenomesize → run_tmb with BAM index handling.
eastgenomics/eggd_pyTMB#3: Modifies dxapp.json with the same bed/gtf input adjustments and instance type upgrade to mem1_ssd1_v2_x4.
eastgenomics/eggd_pyTMB#2: Updates README.md to the same eggd_pyTMB-specific documentation with matching inputs/outputs and dx run example.

Suggested reviewers

RSWilson1

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the primary objective: a v1.0.0 release of eggd_pyTMB, which aligns with the substantial changes across configuration files, documentation, and implementation scripts.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch DI-2978_eggd_pyTMB_v1.0.0_release

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

README.md (1)

43-53: ⚡ Quick win

Add language specifier to code fence.

The fenced code block lacks a language identifier. Add bash to improve syntax highlighting and conformance with markdown best practices.

Proposed fix

-```
+```bash
 dx run eggd_pyTMB \
   -ivcf="sample.vep_annotated.vcf.gz" \

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 43 - 53, The code fence for the dx run command
example is missing a language specifier, which prevents proper syntax
highlighting and violates markdown best practices. Add the bash language
identifier to the opening triple backticks (change ``` to ```bash) on the line
immediately before the dx run eggd_pyTMB command to enable syntax highlighting
and conform to markdown standards.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/code.sh`:
- Around line 37-47: The EFF_GENOME_SIZE extraction pipeline is vulnerable when
pipefail is enabled because grep exits with code 1 when no match is found,
causing premature script termination before validation can occur. Additionally,
the empty check on line 39 does not validate whether the parsed value is
numeric, allowing non-numeric strings to reach the arithmetic comparison on line
44. Fix this by either: allowing grep to return empty output without triggering
pipefail (such as using grep with a fallback pattern or piping through a
non-failing command), and then enhance the validation logic to check both that
EFF_GENOME_SIZE is non-empty AND matches a numeric pattern before performing the
arithmetic comparison with -eq. This ensures the script handles missing matches,
empty results, and non-numeric values gracefully.
- Around line 17-21: The error checking happens after the move operation, but
with set -e enabled, a failing mv command will exit the script immediately,
bypassing the custom error message. Add validation checks BEFORE the mv command
instead of after it. Verify that the source file bam_bai_path exists and that
the destination BAI_PATH conditions are valid BEFORE attempting the move
operation. This ensures proper guards are in place to prevent the mv command
from failing with a hard exit, allowing your custom error handling to work as
intended.

---

Nitpick comments:
In `@README.md`:
- Around line 43-53: The code fence for the dx run command example is missing a
language specifier, which prevents proper syntax highlighting and violates
markdown best practices. Add the bash language identifier to the opening triple
backticks (change ``` to ```bash) on the line immediately before the dx run
eggd_pyTMB command to enable syntax highlighting and conform to markdown
standards.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c9843ae1-be11-4bb4-8e46-3cf31d6902bb

📥 Commits

Reviewing files that changed from the base of the PR and between 773f61a and 9bc5acd.

📒 Files selected for processing (12)

.dxignore
.gitignore
README.md
assets/mosdepth/dxasset.json
assets/pytmb/dxasset.json
dxapp.json
resources/home/dnanexus/.dxignore
resources/home/dnanexus/environment.yml
resources/home/dnanexus/python_script.py
resources/home/dnanexus/pytmb/config/tnhaplotyper.yml
resources/home/dnanexus/pytmb/config/vep.yml
src/code.sh

coderabbitai · 2026-06-19T10:20:50Z

+mv "${bam_bai_path}" "${BAI_PATH}"
+if [[ ! -f "${BAI_PATH}" ]]; then
+    echo "Error: BAM index not found at ${BAI_PATH}" >&2
+    exit 1
+fi


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if src/code.sh exists and read the relevant section if [[ -f src/code.sh ]]; then echo "=== Full file content ===" wc -l src/code.sh echo "=== Lines 1-30 (context around lines 17-21) ===" head -30 src/code.sh | cat -n else echo "File not found: src/code.sh" find . -name "code.sh" -type f fi

Repository: eastgenomics/eggd_pyTMB

Length of output: 1278

Guard index placement before moving the file.

With set -e enabled, a failing mv exits the script immediately, bypassing the custom error message. The same-file move case also triggers a hard failure.

Suggested patch

BAI_PATH="${bam_path}.bai" -mv "${bam_bai_path}" "${BAI_PATH}" +if [[ ! -f "${bam_bai_path}" ]]; then + echo "Error: BAM index input not found at ${bam_bai_path}" >&2 + exit 1 +fi +if [[ "${bam_bai_path}" != "${BAI_PATH}" ]]; then + cp -f "${bam_bai_path}" "${BAI_PATH}" +fi if [[ ! -f "${BAI_PATH}" ]]; then echo "Error: BAM index not found at ${BAI_PATH}" >&2 exit 1 fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

mv "${bam_bai_path}" "${BAI_PATH}"

if [[ ! -f "${BAI_PATH}" ]]; then

echo "Error: BAM index not found at ${BAI_PATH}" >&2

exit 1

fi

if [[ ! -f "${bam_bai_path}" ]]; then

echo "Error: BAM index input not found at ${bam_bai_path}" >&2

exit 1

fi

if [[ "${bam_bai_path}" != "${BAI_PATH}" ]]; then

cp -f "${bam_bai_path}" "${BAI_PATH}"

fi

if [[ ! -f "${BAI_PATH}" ]]; then

echo "Error: BAM index not found at ${BAI_PATH}" >&2

exit 1

fi

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 17-17: bam_bai_path is referenced but not assigned.

(SC2154)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/code.sh` around lines 17 - 21, The error checking happens after the move operation, but with set -e enabled, a failing mv command will exit the script immediately, bypassing the custom error message. Add validation checks BEFORE the mv command instead of after it. Verify that the source file bam_bai_path exists and that the destination BAI_PATH conditions are valid BEFORE attempting the move operation. This ensures proper guards are in place to prevent the mv command from failing with a hard exit, allowing your custom error handling to work as intended.

coderabbitai · 2026-06-19T10:20:50Z

+EFF_GENOME_SIZE=$(echo "${EFF_OUTPUT}" | grep "Effective Genome Size" | awk '{print $(NF-1)}')
+
+if [[ -z "${EFF_GENOME_SIZE}" ]]; then
+    echo "Error: pyEffGenomeSize did not return an effective genome size. Check logs above." >&2
+    exit 1
+fi
+
+if [[ "${EFF_GENOME_SIZE}" -eq 0 ]]; then
+    echo "Error: Effective genome size is 0 bp" >&2
+    exit 1
+fi


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if file exists and examine the relevant lines if [[ -f "src/code.sh" ]]; then echo "=== File exists ===" wc -l src/code.sh echo "" echo "=== Lines 1-50 to see context ===" sed -n '1,50p' src/code.sh else echo "File src/code.sh not found" git ls-files | grep -i "code\\.sh" fi

Repository: eastgenomics/eggd_pyTMB

Length of output: 1640

Make EFF_GENOME_SIZE parsing resilient under pipefail.

The pipeline on line 37 fails prematurely: if grep finds no match, it exits with code 1, triggering script termination before the validation block ever runs. Additionally, non-numeric values bypass the empty check and crash the arithmetic comparison on line 44.

Suggested patch

-EFF_GENOME_SIZE=$(echo "${EFF_OUTPUT}" | grep "Effective Genome Size" | awk '{print $(NF-1)}') +EFF_GENOME_SIZE="$(awk '/Effective Genome Size/ {print $(NF-1)}' <<< "${EFF_OUTPUT}" | tail -n1 || true)" if [[ -z "${EFF_GENOME_SIZE}" ]]; then echo "Error: pyEffGenomeSize did not return an effective genome size. Check logs above." >&2 exit 1 fi +if [[ ! "${EFF_GENOME_SIZE}" =~ ^[0-9]+$ ]]; then + echo "Error: Effective genome size is not a valid integer: ${EFF_GENOME_SIZE}" >&2 + exit 1 +fi + if [[ "${EFF_GENOME_SIZE}" -eq 0 ]]; then echo "Error: Effective genome size is 0 bp" >&2 exit 1 fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/code.sh` around lines 37 - 47, The EFF_GENOME_SIZE extraction pipeline is vulnerable when pipefail is enabled because grep exits with code 1 when no match is found, causing premature script termination before validation can occur. Additionally, the empty check on line 39 does not validate whether the parsed value is numeric, allowing non-numeric strings to reach the arithmetic comparison on line 44. Fix this by either: allowing grep to return empty output without triggering pipefail (such as using grep with a fallback pattern or piping through a non-failing command), and then enhance the validation logic to check both that EFF_GENOME_SIZE is non-empty AND matches a numeric pattern before performing the arithmetic comparison with -eq. This ensures the script handles missing matches, empty results, and non-numeric values gracefully.

RSWilson1

@RSWilson1 reviewed 12 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on Aisha-D).

src/code.sh line 31 at r1 (raw file):

  --minCoverage 100 \
  --minMapq 50 \
  --oprefix "/home/dnanexus/out/${SAMPLE}" 2>&1); then

should it be --oprefix ?

src/code.sh line 68 at r1 (raw file):

fi

dx-upload-all-outputs

could add --parallel here but probably won't make much difference

ayana-manoj and others added 21 commits May 22, 2026 14:25

Implement eggd_pyTMB v1.0.0 by bundling pytmb 1.3.0 with runtime depe…

028a341

…ndecies

environment version changes

cedc3cf

pysam replaced with a wheel built from source

8b67de5

remove python_packages

c301ac8

bundles pytmb as a wheel, so it nstalls offline, pybedtools and pysam…

03d325e

… rebuilt to match DNAnexus runtime, pyEffGenomeSize to preserve full logs, .dxiignore aded, project.toml updated

reviewable changes

19749c2

Error handling for low effective genome size

9c8fb5e

built pytmb and mosdepth as DNA assets, and removed all bundled wheel…

c4b666b

…s and source code

fix pyEffGenomeSize failure no longer swallowed by || true

49ba4be

Merge pull request #1 from eastgenomics/DI-2978_eggd_pyTMB_v1.0.0_dev

c45ce50

Di 2978 eggd py tmb v1.0.0 dev (#1) Co-Authored-By: ayana-manoj <ayana.manoj@github.com>

DI-2978 update readme

7f302bd

DI-2978 update readme

6dd8b59

Merge remote-tracking branch 'origin/DI-2978_eggd_pyTMB_v1.0.0_releas…

35dd2af

…e' into DI-2978_dev

DI-2978 update readme

b457970

DI-2978 update readme

453b5fd

DI-2978 update readme

cc38e6d

DI-2978 update optional inputs

7ad1c40

Merge pull request #2 from eastgenomics/DI-2978_dev

a116466

DI-2978 READMe (#2)

remove default input files

3bae38b

increase instance type

0077d1a

Merge pull request #3 from eastgenomics/DI-2978_dev2

9bc5acd

DI-2978 remove default input files (#3)

Aisha-D requested a review from RSWilson1 June 19, 2026 10:16

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

RSWilson1 requested changes Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DI-2978 eggd_pytmb v1.0.0 release#4

DI-2978 eggd_pytmb v1.0.0 release#4
Aisha-D wants to merge 21 commits into
mainfrom
DI-2978_eggd_pyTMB_v1.0.0_release

Aisha-D commented Jun 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Uh oh!

coderabbitai Bot Jun 19, 2026

Uh oh!

RSWilson1 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Aisha-D commented Jun 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

RSWilson1 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Aisha-D commented Jun 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading