Add API to move values from X to obs/var/obsm/varm and vice versa #655

Description

@Zethson

Hi,

for ehrapy we need the ability to move values from X to obs/var/obsm/varm and the other way around. Quoting myself:

Would be very useful to have a method to move data from X to obs and the other way around. I am sure that when loading complex data one sometimes forgets to add a column to "obs_only" and instead of loading the complete (large) file again it would help to just move columns.

@ivirshup already kindly drafted an API for this:

Drafts:
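from typing import Optional, Union

import anndata as ad
import numpy as np
import pandas as pd


def split_out(adata: ad.AnnData, idx: "np.ndarray[1, bool]", *, axis=1):
    """Move the part of X selected by the boolean mask `idx` into obs (axis=1) or var (axis=0), in place."""
    idxs = [slice(None), slice(None)]
    idxs[axis] = idx
    df = adata[tuple(idxs)].to_df()
    if axis == 1:
        # df is n_obs x n_selected_vars, so it joins onto obs directly
        adata._inplace_subset_var(~idx)
        adata.obs = adata.obs.join(df)
    elif axis == 0:
        # df is n_selected_obs x n_vars: transpose so its rows align with var
        adata._inplace_subset_obs(~idx)
        adata.var = adata.var.join(df.T)


def splice_in(
    adata: ad.AnnData,
    *,
    obs: Optional[Union[str, list[str]]] = None,
    var: Optional[Union[str, list[str]]] = None,
) -> ad.AnnData:
    """Return a new AnnData with var columns appended as observations (`obs=`) or obs columns appended as variables (`var=`)."""
    assert (obs is None) + (var is None) == 1, "pass exactly one of `obs` or `var`"
    if obs is not None:
        if isinstance(obs, str):
            obs = [obs]
        # adata.var[obs] is n_vars x n_new_obs: transpose before stacking rows
        new = ad.AnnData(adata.var[obs].T)
        # merge="first" keeps the var columns; with the default merge=None they are not carried over
        res = ad.concat([adata, new], axis=0, merge="first")
        res.var.drop(columns=obs, inplace=True)
        return res
    elif var is not None:
        if isinstance(var, str):
            var = [var]
        new = ad.AnnData(adata.obs[var])
        # merge="first" keeps the obs columns so the moved ones can be dropped below
        res = ad.concat([adata, new], axis=1, merge="first")
        res.obs.drop(columns=var, inplace=True)
        return res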


from anndata.tests.helpers import gen_adata

a = gen_adata((20, 10))
b = gen_adata((20, 5))
c = ad.concat({"a": a, "b": b}, axis=1, index_unique="-", label="vartype")
d = c.copy()
d
AnnData object with n_obs × n_vars = 20 × 15
    var: 'var_cat', 'cat_ordered', 'int64', 'float64', 'uint8', 'vartype'
    varm: 'array', 'sparse', 'df'
    layers: 'array', 'sparse'
removed_var = c.var_names[c.var["vartype"] == "b"]

split_out(d, d.var_names.isin(removed_var))  # convert removed_var to mask
d
AnnData object with n_obs × n_vars = 20 × 10
    obs: 'gene0-b', 'gene1-b', 'gene2-b', 'gene3-b', 'gene4-b'
    var: 'var_cat', 'cat_ordered', 'int64', 'float64', 'uint8', 'vartype'
    varm: 'array', 'sparse', 'df'
    layers: 'array', 'sparse'
splice_in(d, var=removed_var)
AnnData object with n_obs × n_vars = 20 × 15

@ivirshup would you be up to implementing this yourself or should @Imipenem have a go at this?

CC @giovp @mbuttner because I was told that this might be useful for you.

Cheers
