CNV frequencies halved in gene_cnv_frequencies_advanced with nobs_mode="fixed" #1019

### Summary
When `gene_cnv_frequencies_advanced` is called with `nobs_mode="fixed"`, all returned CNV frequencies are exactly **half** the correct value. The `nobs` denominator is multiplied by 2 when it shouldn't be.

### The Problem

In `cnv_frq.py` lines 587-588:

```python
if nobs_mode == "called":
    nobs[:, cohort_index] = np.repeat(cohort_n_called, 2)
else:
    assert nobs_mode == "fixed"
    nobs[:, cohort_index] = cohort.size * 2  # ← BUG: should not multiply by 2
```

The issue: `count` holds the **number of samples** with an amplification or deletion (lines 580-581):
```python
count[::2, cohort_index] = np.sum(cohort_is_amp, axis=1)   # sample count
count[1::2, cohort_index] = np.sum(cohort_is_del, axis=1)  # sample count
```

So `frequency = count / nobs` becomes `sample_count / (cohort.size × 2)`, which is exactly half of the per-sample frequency.
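The arithmetic can be reproduced outside the library with a small NumPy sketch. The toy arrays and the names `cohort_size`, `freq_current`, and `freq_expected` are hypothetical; only the interleaved amp/del layout of `count` and the `× 2` denominator mirror the snippets above.

```python
import numpy as np

# Hypothetical toy data: 3 genes x 4 samples in a single cohort.
cohort_is_amp = np.array([
    [True, True, True, True],      # gene 0: amplified in every sample
    [True, False, False, False],   # gene 1: amplified in one sample
    [False, False, False, False],  # gene 2: never amplified
])
cohort_is_del = np.zeros_like(cohort_is_amp)  # no deletions, for simplicity
n_genes, cohort_size = cohort_is_amp.shape

# Same layout as the snippet above: even rows are amp, odd rows are del.
count = np.zeros(n_genes * 2, dtype=int)
count[::2] = np.sum(cohort_is_amp, axis=1)
count[1::2] = np.sum(cohort_is_del, axis=1)

# Denominator as currently computed in "fixed" mode vs. a per-sample denominator.
freq_current = count / (cohort_size * 2)
freq_expected = count / cohort_size

print(freq_current[::2])   # amp frequencies with the doubled denominator
print(freq_expected[::2])  # amp frequencies with one observation per sample
```

Gene 0 is amplified in all four samples, yet the doubled denominator reports it at 0.5 rather than 1.0.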

### Why this is wrong

1. **The "called" mode proves the bug.** It uses:
   ```python
   nobs[:, cohort_index] = np.repeat(cohort_n_called, 2)
   ```
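   `np.repeat(x, 2)` repeats each element twice in place; it does not double the values. For `x = [10, 20, 30]` it produces `[10, 10, 20, 20, 30, 30]`, NOT `[20, 40, 60]`. The two copies fill the amp and del rows for each gene, so `nobs` is the number of called **samples**, not twice that number.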

2. **The basic version confirms it.** `gene_cnv_frequencies` (non-advanced) computes:
   ```python
   frequency = amp_count_coh / called_count_coh  # sample count / sample count ✅
   nobs = called_count_coh                        # NO multiply by 2
   ```

3. **CNV calls are per-sample, not per-allele.** Unlike SNPs where each diploid sample contributes 2 alleles, CNV `CN_mode` gives one copy number classification per sample. 
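The `np.repeat` behaviour from point 1 is easy to check directly. This is a minimal sketch; the values in `cohort_n_called` are made up for illustration.

```python
import numpy as np

# Hypothetical number of called samples for each of 3 genes.
cohort_n_called = np.array([10, 20, 30])

# What "called" mode does: one copy for the amp row, one for the del row.
nobs_called = np.repeat(cohort_n_called, 2)

# What a per-allele denominator would look like instead.
nobs_doubled = cohort_n_called * 2

print(nobs_called)   # each value appears twice, unchanged
print(nobs_doubled)  # each value multiplied by 2
```

The repeated array has twice as many entries, but each entry is still a sample count, which is the unit `count` is measured in.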

### Impact

- All CNV frequencies from `gene_cnv_frequencies_advanced(..., nobs_mode="fixed")` are exactly **half** the true value
- Downstream confidence intervals from `_add_frequency_ci` are artificially narrow (the inflated `nobs` overstates precision)
- Any spatial/temporal/population analyses using these frequencies systematically underestimate CNV rates
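To illustrate the confidence-interval effect, here is a rough sketch using a hand-written Wilson score interval. This is an assumption for illustration only: the method actually used by `_add_frequency_ci` is not shown in this issue, and `wilson_interval` is a hypothetical helper, not a library function.

```python
import math

def wilson_interval(count, nobs, z=1.96):
    """Wilson score interval for a binomial proportion (approx. 95% for z=1.96)."""
    p = count / nobs
    denom = 1 + z**2 / nobs
    centre = (p + z**2 / (2 * nobs)) / denom
    half = z * math.sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return centre - half, centre + half

# Hypothetical cohort: 5 of 10 samples carry an amplification.
count, cohort_size = 5, 10

lo_ok, hi_ok = wilson_interval(count, cohort_size)        # nobs = samples
lo_bug, hi_bug = wilson_interval(count, cohort_size * 2)  # nobs = samples * 2

width_ok = hi_ok - lo_ok
width_bug = hi_bug - lo_bug
print(f"per-sample nobs: [{lo_ok:.3f}, {hi_ok:.3f}] width={width_ok:.3f}")
print(f"doubled nobs:    [{lo_bug:.3f}, {hi_bug:.3f}] width={width_bug:.3f}")
```

Under these assumptions the interval built from the doubled `nobs` is both narrower and centred on the halved frequency, so it excludes the observed per-sample frequency of 0.5.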

### Example

A cohort of 10 samples, all carrying an amplification, with `nobs_mode="fixed"`:
- Current (wrong): frequency = 10 / (10 × 2) = 0.50 → suggests 50% frequency
- Correct: frequency = 10 / 10 = 1.0 → suggests 100% frequency (all samples have it)





