Add API to move values from X to obs/var/obsm/varm and vice versa #655

Hi,

For ehrapy, we need the ability to move values from X to obs/var/obsm/varm and back. Quoting myself:

> Would be very useful to have a method to move data from X to obs and the other way around. I am sure that when loading complex data one sometimes forgets to add a column to "obs_only" and instead of loading the complete (large) file again it would help to just move columns.

@ivirshup already kindly drafted an API for this:


<details>
<summary> Drafts: </summary>

```python
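from typing import Optional, Union

import anndata as ad
import numpy as np
import pandas as pd


def split_out(adata: ad.AnnData, idx: np.ndarray, *, axis: int = 1) -> None:
    """Move the entries selected by the 1-D boolean mask `idx` out of X, in place.

    axis=1 moves variables into obs columns; axis=0 moves observations into var columns.
    """
    idxs = [slice(None), slice(None)]
    idxs[axis] = idx
    df = adata[tuple(idxs)].to_df()  # rows are obs_names, columns are var_names
    if axis == 1:
        adata._inplace_subset_var(~idx)
        adata.obs = adata.obs.join(df)
    elif axis == 0:
        adata._inplace_subset_obs(~idx)
        # Transpose so the rows line up with var_names before joining.
        adata.var = adata.var.join(df.T)


def splice_in(
    adata: ad.AnnData,
    *,
    obs: Optional[Union[str, list[str]]] = None,
    var: Optional[Union[str, list[str]]] = None,
) -> ad.AnnData:
    """Move var columns back into X as observations (`obs=`), or obs columns as variables (`var=`)."""
    # Exactly one of `obs` and `var` must be given.
    assert (obs is None) + (var is None) == 1
    # merge="first" keeps the annotations on the other axis, which ad.concat drops by default.
    if obs is not None:
        if isinstance(obs, str):
            obs = [obs]
        # Transpose so the selected var columns become rows (observations).
        res = ad.concat([adata, ad.AnnData(adata.var[obs].T)], axis=0, merge="first")
        res.var.drop(columns=obs, inplace=True)
        return res
    elif var is not None:
        if isinstance(var, str):
            var = [var]
        res = ad.concat([adata, ad.AnnData(adata.obs[var])], axis=1, merge="first")
        res.obs.drop(columns=var, inplace=True)
        return res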


from anndata.tests.helpers import gen_adata

a = gen_adata((20, 10))
b = gen_adata((20, 5))
c = ad.concat({"a": a, "b": b}, axis=1, index_unique="-", label="vartype")
d = c.copy()
d
```

```
AnnData object with n_obs × n_vars = 20 × 15
    var: 'var_cat', 'cat_ordered', 'int64', 'float64', 'uint8', 'vartype'
    varm: 'array', 'sparse', 'df'
    layers: 'array', 'sparse'
```

```python
removed_var = c.var_names[c.var["vartype"] == "b"]

split_out(d, d.var_names.isin(removed_var))  # convert removed_var to mask
d
```

```
AnnData object with n_obs × n_vars = 20 × 10
    obs: 'gene0-b', 'gene1-b', 'gene2-b', 'gene3-b', 'gene4-b'
    var: 'var_cat', 'cat_ordered', 'int64', 'float64', 'uint8', 'vartype'
    varm: 'array', 'sparse', 'df'
    layers: 'array', 'sparse'
```

```python
splice_in(d, var=removed_var)
```

```
AnnData object with n_obs × n_vars = 20 × 15
```

</details>

@ivirshup would you be up for implementing this yourself, or should @imipenem have a go at it?

CC @giovp @mbuttner because I was told that this might be useful for you.

Cheers
