
CDS phase (offset for e.g. ribosomal slippage) #76

Description

@holtgrewe

I'm puzzled and need help understanding this. Consider NM_015068.3. I think the exons given in the cdot JSON skip one base that needs to be in the CDS because it is read twice (ribosomal slippage). What do you think?

Here is how it looks in cdot-0.2.24.refseq.grch37.json:

# jq '.transcripts | to_entries[] | (select(.key == "NM_015068.3"))' cdot-0.2.24.refseq.grch37.json
{
  "key": "NM_015068.3",
  "value": {
    "biotype": [
      "mRNA"
    ],
    "gene_name": "PEG10",
    "gene_version": "23089",
    "genome_builds": {
      "GRCh37": {
        "cds_end": 94294994,
        "cds_start": 94292868,
        "contig": "NC_000007.13",
        "exons": [
          [
            94285636,
            94285892,
            0,
            1,
            256,
            null
          ],
          [
            94292645,
            94299007,
            1,
            257,
            6618,
            null
          ]
        ],
        "note": "protein translation is dependent on -1 ribosomal frameshift%3B isoform 1 is encoded by transcript variant 1",
        "strand": "+",
        "url": "https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
      }
    },
    "hgnc": "14005",
    "id": "NM_015068.3",
    "protein": "NP_055883.2",
    "start_codon": 479,
    "stop_codon": 2605
  }
}
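As a sanity check on the record above, the sketch below maps start_codon and stop_codon onto the genome through the exons list. It assumes, from the numbers in this one record rather than from cdot documentation, that each exon is [genomic_start (0-based), genomic_end (exclusive), ordinal, tx_start (1-based), tx_end (inclusive), gap] and that start_codon/stop_codon are 0-based half-open transcript offsets. The helper tx_to_genomic is hypothetical and handles the + strand only.

```python
# Exon tuples copied from the cdot record above. The field layout is an
# assumption inferred from these numbers, not taken from cdot documentation.
EXONS = [
    (94285636, 94285892, 0, 1, 256, None),
    (94292645, 94299007, 1, 257, 6618, None),
]
START_CODON = 479  # assumed 0-based transcript offset of the first CDS base
STOP_CODON = 2605  # assumed 0-based exclusive transcript end of the CDS


def tx_to_genomic(tx_pos_1based, exons):
    """Hypothetical helper: map a 1-based transcript position to a 0-based
    genomic position, for a + strand transcript without alignment gaps."""
    for g_start, _g_end, _ord, tx_start, tx_end, _gap in exons:
        if tx_start <= tx_pos_1based <= tx_end:
            return g_start + (tx_pos_1based - tx_start)
    raise ValueError(f"transcript position {tx_pos_1based} not in any exon")


# The first CDS base is transcript position START_CODON + 1 (1-based).
cds_start = tx_to_genomic(START_CODON + 1, EXONS)
# The last CDS base is transcript position STOP_CODON (1-based); +1 makes the end exclusive.
cds_end = tx_to_genomic(STOP_CODON, EXONS) + 1
# Contiguous CDS length implied by start_codon/stop_codon (no duplicated base).
cds_len = STOP_CODON - START_CODON

print(cds_start, cds_end, cds_len, cds_len % 3)
```

Under these assumptions it prints 94292868 94294994 2126 2: the derived coordinates match cds_start and cds_end, and the 2126-base span leaves a remainder of 2 when divided by 3, so it cannot be a whole number of codons.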

NCBI Gene says:

[...]
     CDS             join(480..1436,1436..2605)
[...]
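The arithmetic behind the question can be made explicit. The sketch below parses the join(...) location above, sums the segment lengths (GenBank coordinates are 1-based and inclusive) and reports positions where a segment starts at or before the end of the previous one. It is a minimal parser for plain ranges only, not a general GenBank location parser; join_segments is a hypothetical name.

```python
import re


def join_segments(location):
    """Parse a simple GenBank join(a..b,c..d,...) location into (start, end) pairs.

    Only plain ranges are handled; partial markers such as < or > and
    complement(...) are not supported in this sketch.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


segments = join_segments("join(480..1436,1436..2605)")
# 1-based inclusive coordinates: each segment has end - start + 1 bases.
lengths = [end - start + 1 for start, end in segments]
total = sum(lengths)

# Start positions of segments that overlap the previous one (bases read twice).
overlap_starts = [
    b_start for (_, a_end), (b_start, _) in zip(segments, segments[1:]) if b_start <= a_end
]

print(lengths, total, total % 3, overlap_starts)
```

This prints [957, 1170] 2127 0 [1436]: with base 1436 counted twice the CDS is 2127 bases (709 codons), one base more than the 2126 bases between start_codon and stop_codon in the cdot record.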

The original GFF3 says:

 egrep -w 'ID=gene14272|Parent=rna16954|NM_015068.3' ref_GRCh37.p10_top_level.gff3
NC_000007.13    RefSeq  gene    94285637        94299007        .       +       .       ID=gene14272;Name=PEG10;Dbxref=GeneID:23089,HGNC:14005,MIM:609810;description=paternally expressed 10;gbkey=Gene;gene=PEG10;gene_synonym=EDR,HB-1,Mar2,Mart2,MEF3L,RGAG3
NC_000007.13    RefSeq  mRNA    94285637        94299007        .       +       .       ID=rna16953;Name=NM_015068.3;Parent=gene14272;Dbxref=GeneID:23089,Genbank:NM_015068.3,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 1;transcript_id=NM_015068.3
NC_000007.13    RefSeq  exon    94285682        94285903        .       +       .       ID=id161180;Parent=rna16954;Dbxref=GeneID:23089,Genbank:NM_001172437.1,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 2;transcript_id=NM_001172437.1
NC_000007.13    RefSeq  exon    94292646        94299007        .       +       .       ID=id161181;Parent=rna16954;Dbxref=GeneID:23089,Genbank:NM_001172437.1,HGNC:14005,MIM:609810;gbkey=mRNA;gene=PEG10;product=paternally expressed 10%2C transcript variant 2;transcript_id=NM_001172437.1
NC_000007.13    RefSeq  CDS     94285899        94285903        .       +       0       ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
NC_000007.13    RefSeq  CDS     94292646        94293825        .       +       1       ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
NC_000007.13    RefSeq  CDS     94293825        94294994        .       +       0       ID=cds13063;Name=NP_001165908.1;Parent=rna16954;Note=isoform 3 is encoded by transcript variant 2;Dbxref=GeneID:23089,Genbank:NP_001165908.1,HGNC:14005,MIM:609810;exception=ribosomal slippage;gbkey=CDS;product=retrotransposon-derived protein PEG10 isoform 3;protein_id=NP_001165908.1
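The phase column of the GFF3 CDS lines encodes the same slippage. Below is a minimal sketch, assuming a + strand feature and the GFF3 definition of phase (bases to skip at the start of a feature to reach the next codon boundary), that recomputes each CDS line's phase from the previous line's length and lists genomic positions shared by consecutive CDS lines. Note that the CDS lines matched by this grep belong to rna16954 (NM_001172437.1, transcript variant 2), so the numbers describe that transcript rather than NM_015068.3. expected_phases is a hypothetical helper name.

```python
# 1-based inclusive CDS segments and phases copied from the three CDS lines
# above (cds13063, NP_001165908.1 on rna16954 / NM_001172437.1).
CDS = [
    (94285899, 94285903, 0),
    (94292646, 94293825, 1),
    (94293825, 94294994, 0),
]


def expected_phases(segments):
    """Recompute GFF3 phases for + strand CDS segments from the first phase.

    The leftover bases at the end of each segment (after skipping its phase)
    form a partial codon that the next segment must complete.
    """
    phases = [segments[0][2]]
    for start, end, _ in segments[:-1]:
        length = end - start + 1
        phases.append((phases[-1] - length) % 3)
    return phases


lengths = [end - start + 1 for start, end, _ in CDS]
# Genomic positions where a CDS line starts at or before the previous line's end.
shared = [b[0] for a, b in zip(CDS, CDS[1:]) if b[0] <= a[1]]
recorded = [phase for _, _, phase in CDS]

print(lengths, sum(lengths), sum(lengths) % 3)
print(expected_phases(CDS), recorded, shared)
```

With base 94293825 counted in both of the last two segments, the CDS is 2355 bases (785 codons) and the recorded phases 0, 1, 0 agree with the recomputed ones; dropping the duplicated base would leave 2354 bases, which is not a multiple of 3.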
