DI-2978 eggd_pytmb v1.0.0 release#4
Conversation
… rebuilt to match DNAnexus runtime, pyEffGenomeSize to preserve full logs, .dxiignore aded, project.toml updated
…s and source code
Di 2978 eggd py tmb v1.0.0 dev (#1) Co-Authored-By: ayana-manoj <ayana.manoj@github.com>
…e' into DI-2978_dev
DI-2978 READMe (#2)
DI-2978 remove default input files (#3)
WalkthroughTransforms a DNAnexus app template into a fully implemented Changeseggd_pyTMB App Implementation
Sequence DiagramsequenceDiagram
participant User
participant code.sh
participant run_effgenomesize as pytmb.cli.run_effgenomesize
participant run_tmb as pytmb.cli.run_tmb
participant DNAnexus as DNAnexus Platform
User->>DNAnexus: dx run eggd_pyTMB (VCF, BAM, BAI, BED, filters)
DNAnexus->>code.sh: execute with downloaded inputs
code.sh->>code.sh: derive SAMPLE from vcf_prefix, mv bam_bai to ${bam_path}.bai
code.sh->>run_effgenomesize: BAM + BED → compute effective genome size
run_effgenomesize-->>code.sh: stdout/stderr containing EFF_GENOME_SIZE
code.sh->>code.sh: grep/awk extract + validate EFF_GENOME_SIZE
code.sh->>run_tmb: VCF + configs + EFF_GENOME_SIZE + vaf/maf/depth filters
run_tmb-->>code.sh: ${SAMPLE}_TMB.txt report
code.sh->>code.sh: validate report non-empty
code.sh->>DNAnexus: dx-upload-all-outputs (tmb_report)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
README.md (1)
43-53: ⚡ Quick winAdd language specifier to code fence.
The fenced code block lacks a language identifier. Add
bashto improve syntax highlighting and conformance with markdown best practices.Proposed fix
-``` +```bash dx run eggd_pyTMB \ -ivcf="sample.vep_annotated.vcf.gz" \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 43 - 53, The code fence for the dx run command example is missing a language specifier, which prevents proper syntax highlighting and violates markdown best practices. Add the bash language identifier to the opening triple backticks (change ``` to ```bash) on the line immediately before the dx run eggd_pyTMB command to enable syntax highlighting and conform to markdown standards.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/code.sh`:
- Around line 37-47: The EFF_GENOME_SIZE extraction pipeline is vulnerable when
pipefail is enabled because grep exits with code 1 when no match is found,
causing premature script termination before validation can occur. Additionally,
the empty check on line 39 does not validate whether the parsed value is
numeric, allowing non-numeric strings to reach the arithmetic comparison on line
44. Fix this by either: allowing grep to return empty output without triggering
pipefail (such as using grep with a fallback pattern or piping through a
non-failing command), and then enhance the validation logic to check both that
EFF_GENOME_SIZE is non-empty AND matches a numeric pattern before performing the
arithmetic comparison with -eq. This ensures the script handles missing matches,
empty results, and non-numeric values gracefully.
- Around line 17-21: The error checking happens after the move operation, but
with set -e enabled, a failing mv command will exit the script immediately,
bypassing the custom error message. Add validation checks BEFORE the mv command
instead of after it. Verify that the source file bam_bai_path exists and that
the destination BAI_PATH conditions are valid BEFORE attempting the move
operation. This ensures proper guards are in place to prevent the mv command
from failing with a hard exit, allowing your custom error handling to work as
intended.
---
Nitpick comments:
In `@README.md`:
- Around line 43-53: The code fence for the dx run command example is missing a
language specifier, which prevents proper syntax highlighting and violates
markdown best practices. Add the bash language identifier to the opening triple
backticks (change ``` to ```bash) on the line immediately before the dx run
eggd_pyTMB command to enable syntax highlighting and conform to markdown
standards.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c9843ae1-be11-4bb4-8e46-3cf31d6902bb
📒 Files selected for processing (12)
.dxignore.gitignoreREADME.mdassets/mosdepth/dxasset.jsonassets/pytmb/dxasset.jsondxapp.jsonresources/home/dnanexus/.dxignoreresources/home/dnanexus/environment.ymlresources/home/dnanexus/python_script.pyresources/home/dnanexus/pytmb/config/tnhaplotyper.ymlresources/home/dnanexus/pytmb/config/vep.ymlsrc/code.sh
| mv "${bam_bai_path}" "${BAI_PATH}" | ||
| if [[ ! -f "${BAI_PATH}" ]]; then | ||
| echo "Error: BAM index not found at ${BAI_PATH}" >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, check if src/code.sh exists and read the relevant section
if [[ -f src/code.sh ]]; then
echo "=== Full file content ==="
wc -l src/code.sh
echo "=== Lines 1-30 (context around lines 17-21) ==="
head -30 src/code.sh | cat -n
else
echo "File not found: src/code.sh"
find . -name "code.sh" -type f
fiRepository: eastgenomics/eggd_pyTMB
Length of output: 1278
Guard index placement before moving the file.
With set -e enabled, a failing mv exits the script immediately, bypassing the custom error message. The same-file move case also triggers a hard failure.
Suggested patch
BAI_PATH="${bam_path}.bai"
-mv "${bam_bai_path}" "${BAI_PATH}"
+if [[ ! -f "${bam_bai_path}" ]]; then
+ echo "Error: BAM index input not found at ${bam_bai_path}" >&2
+ exit 1
+fi
+if [[ "${bam_bai_path}" != "${BAI_PATH}" ]]; then
+ cp -f "${bam_bai_path}" "${BAI_PATH}"
+fi
if [[ ! -f "${BAI_PATH}" ]]; then
echo "Error: BAM index not found at ${BAI_PATH}" >&2
exit 1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mv "${bam_bai_path}" "${BAI_PATH}" | |
| if [[ ! -f "${BAI_PATH}" ]]; then | |
| echo "Error: BAM index not found at ${BAI_PATH}" >&2 | |
| exit 1 | |
| fi | |
| if [[ ! -f "${bam_bai_path}" ]]; then | |
| echo "Error: BAM index input not found at ${bam_bai_path}" >&2 | |
| exit 1 | |
| fi | |
| if [[ "${bam_bai_path}" != "${BAI_PATH}" ]]; then | |
| cp -f "${bam_bai_path}" "${BAI_PATH}" | |
| fi | |
| if [[ ! -f "${BAI_PATH}" ]]; then | |
| echo "Error: BAM index not found at ${BAI_PATH}" >&2 | |
| exit 1 | |
| fi |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 17-17: bam_bai_path is referenced but not assigned.
(SC2154)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/code.sh` around lines 17 - 21, The error checking happens after the move
operation, but with set -e enabled, a failing mv command will exit the script
immediately, bypassing the custom error message. Add validation checks BEFORE
the mv command instead of after it. Verify that the source file bam_bai_path
exists and that the destination BAI_PATH conditions are valid BEFORE attempting
the move operation. This ensures proper guards are in place to prevent the mv
command from failing with a hard exit, allowing your custom error handling to
work as intended.
| EFF_GENOME_SIZE=$(echo "${EFF_OUTPUT}" | grep "Effective Genome Size" | awk '{print $(NF-1)}') | ||
|
|
||
| if [[ -z "${EFF_GENOME_SIZE}" ]]; then | ||
| echo "Error: pyEffGenomeSize did not return an effective genome size. Check logs above." >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| if [[ "${EFF_GENOME_SIZE}" -eq 0 ]]; then | ||
| echo "Error: Effective genome size is 0 bp" >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if file exists and examine the relevant lines
if [[ -f "src/code.sh" ]]; then
echo "=== File exists ==="
wc -l src/code.sh
echo ""
echo "=== Lines 1-50 to see context ==="
sed -n '1,50p' src/code.sh
else
echo "File src/code.sh not found"
git ls-files | grep -i "code\\.sh"
fiRepository: eastgenomics/eggd_pyTMB
Length of output: 1640
Make EFF_GENOME_SIZE parsing resilient under pipefail.
The pipeline on line 37 fails prematurely: if grep finds no match, it exits with code 1, triggering script termination before the validation block ever runs. Additionally, non-numeric values bypass the empty check and crash the arithmetic comparison on line 44.
Suggested patch
-EFF_GENOME_SIZE=$(echo "${EFF_OUTPUT}" | grep "Effective Genome Size" | awk '{print $(NF-1)}')
+EFF_GENOME_SIZE="$(awk '/Effective Genome Size/ {print $(NF-1)}' <<< "${EFF_OUTPUT}" | tail -n1 || true)"
if [[ -z "${EFF_GENOME_SIZE}" ]]; then
echo "Error: pyEffGenomeSize did not return an effective genome size. Check logs above." >&2
exit 1
fi
+if [[ ! "${EFF_GENOME_SIZE}" =~ ^[0-9]+$ ]]; then
+ echo "Error: Effective genome size is not a valid integer: ${EFF_GENOME_SIZE}" >&2
+ exit 1
+fi
+
if [[ "${EFF_GENOME_SIZE}" -eq 0 ]]; then
echo "Error: Effective genome size is 0 bp" >&2
exit 1
fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/code.sh` around lines 37 - 47, The EFF_GENOME_SIZE extraction pipeline is
vulnerable when pipefail is enabled because grep exits with code 1 when no match
is found, causing premature script termination before validation can occur.
Additionally, the empty check on line 39 does not validate whether the parsed
value is numeric, allowing non-numeric strings to reach the arithmetic
comparison on line 44. Fix this by either: allowing grep to return empty output
without triggering pipefail (such as using grep with a fallback pattern or
piping through a non-failing command), and then enhance the validation logic to
check both that EFF_GENOME_SIZE is non-empty AND matches a numeric pattern
before performing the arithmetic comparison with -eq. This ensures the script
handles missing matches, empty results, and non-numeric values gracefully.
RSWilson1
left a comment
There was a problem hiding this comment.
@RSWilson1 reviewed 12 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on Aisha-D).
src/code.sh line 31 at r1 (raw file):
--minCoverage 100 \ --minMapq 50 \ --oprefix "/home/dnanexus/out/${SAMPLE}" 2>&1); then
should it be --oprefix ?
src/code.sh line 68 at r1 (raw file):
fi dx-upload-all-outputs
could add --parallel here but probably won't make much difference
This change is
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores