This solver needs samples of at least 2 classes in the data #12

Description

@skudashev

My issue is very similar to #9 (comment)

Parsing BAM file: chr22_alignments.sorted.bam
Identified 182998 introns
Annotated introns file /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed provided
Identified 402454 annotated introns
debug: Tree structure:
debug: |--- jad <= 71.50
debug: |   |--- class: 0
debug: |--- jad >  71.50
debug: |   |--- is_canonical_motif <= 0.50
debug: |   |   |--- class: 0
debug: |   |--- is_canonical_motif >  0.50
debug: |   |   |--- class: 0
debug: Decision tree 1 confusion matrix:
debug: [[177013      0]
debug:  [  5985      0]]
Fetching junction sequences from /ei/projects/3/31655266-640a-41d2-8663-59bba38bc3c4/data/data/References/hg38_sequin.fa
Identified 132451 unique donors and 127498 unique acceptors
Scoring donor sequences with LR...
pgrep: /nbi/software/production/bin/core/../..//hpccore/5/x86_64/lib/liblzma.so.5: no version information available (required by /lib64/libsystemd.so.0)
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/lib2pass/seqlr.py", line 39, in train_and_predict
    lr.fit(X_train, y_train)
  File "/ei/software/testing/python_miniconda/4.10.3_py3.9_sk/x86_64/envs/2passtools/lib/python3.6/site-packages/sklearn/linear_model/_logistic.py", line 1376, in fit
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
"""
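
The debug confusion matrix above may already explain the error. Below is a minimal sketch of how it can be read, assuming rows are true labels and columns are predicted labels (scikit-learn's convention, not confirmed against the 2passtools source). The names `confusion` and `summarize` are illustrative only. If the LR step is trained on labels derived from this tree's predictions (also an assumption), then a tree that never predicts class 1 would leave only class 0 in the training data.

```python
# Counts copied from the "Decision tree 1 confusion matrix" debug lines above.
# Assumption: rows are true labels and columns are predicted labels
# (scikit-learn's convention); 2passtools may order them differently.
confusion = [[177013, 0],
             [5985, 0]]


def summarize(matrix):
    """Return per-class totals for true labels (rows) and predictions (columns)."""
    true_totals = [sum(row) for row in matrix]
    predicted_totals = [sum(col) for col in zip(*matrix)]
    return true_totals, predicted_totals


true_totals, predicted_totals = summarize(confusion)

# Classes that the tree predicted at least once.
predicted_classes = [i for i, n in enumerate(predicted_totals) if n > 0]

print("true label counts:     ", true_totals)
print("predicted label counts:", predicted_totals)
print("classes ever predicted:", predicted_classes)
```

Under that reading, 5985 introns carry the positive label but all 182998 introns are assigned to class 0, which matches the "only one class: 0" message.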

I followed the instructions and then ran 2passtools with DEBUG on.

paftools.js gff2bed -j gencode.v44.annotation.gtf > gencode.v44.annotated_juncs.bed 
2passtools score -v DEBUG -f /ei/projects/3/31655266-640a-41d2-8663-59bba38bc3c4/data/data/References/hg38_sequin.fa -p 24 \
    -a /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed --classifier-type decision_tree \
    -m "GTAG|GCAG|ATAG" -j 4 --keep-all-annot -o iPSC.merged.juncs.all.bed $subset_bam 
head -n 5  /ei/projects/8/8289c66d-2d56-4706-a307-5a9a3eb3747e/data/Annotations/gencode.v44.annotated_juncs.bed
chr1	12227	12612	ENST00000456328.2|lncRNA|DDX11L2	1000	+
chr1	12721	13220	ENST00000456328.2|lncRNA|DDX11L2	1000	+
chr1	12057	12178	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
chr1	12227	12612	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
chr1	12697	12974	ENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1	1000	+
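
Because the BAM here is a chr22 subset while the first annotation records are on chr1, one quick sanity check is to count annotated junctions per chromosome and confirm that chr22 is present under the same name. This is a minimal sketch using only the standard library. The helper name `count_junctions_by_chrom` is hypothetical, and the sample input is just the five records shown above.

```python
from collections import Counter


def count_junctions_by_chrom(bed_lines):
    """Count BED records per chromosome (column 1), skipping blanks and headers."""
    counts = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# The five records shown by `head -n 5` above, used here as sample input.
sample = [
    "chr1\t12227\t12612\tENST00000456328.2|lncRNA|DDX11L2\t1000\t+",
    "chr1\t12721\t13220\tENST00000456328.2|lncRNA|DDX11L2\t1000\t+",
    "chr1\t12057\t12178\tENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1\t1000\t+",
    "chr1\t12227\t12612\tENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1\t1000\t+",
    "chr1\t12697\t12974\tENST00000450305.2|transcribed_unprocessed_pseudogene|DDX11L1\t1000\t+",
]

counts = count_junctions_by_chrom(sample)
print(dict(counts))
print("chr22 records in sample:", counts.get("chr22", 0))
```

To check the real annotation, pass an open file handle for gencode.v44.annotated_juncs.bed instead of `sample`; the function accepts any iterable of lines.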

Could this have something to do with my canonical motifs? Also, my JAD threshold is set to 4 (-j 4), but the tree structure says jad <= 71.50. Is this correct?

Kind regards,
Sofia
