diff --git a/.gitignore b/.gitignore index a168fb481..a3491066b 100644 --- a/.gitignore +++ b/.gitignore @@ -19,6 +19,7 @@ __pycache__/ # Test results (nunit/junit) and coverage /test-data/ /*coverage* +tests/repr_html_visual_test.html # jupyter .ipynb_checkpoints diff --git a/.vscode/settings.json b/.vscode/settings.json index c92d8b3ed..a60c24e2b 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -1,5 +1,5 @@ { - "[python][toml][json][jsonc]": { + "[python][javascript][toml][json][jsonc]": { "editor.formatOnSave": true, "editor.codeActionsOnSave": { "source.organizeImports": "explicit", @@ -12,7 +12,7 @@ "[toml]": { "editor.defaultFormatter": "tamasfe.even-better-toml", }, - "[json][jsonc]": { + "[javascript][json][jsonc]": { "editor.defaultFormatter": "biomejs.biome", }, "python.analysis.typeCheckingMode": "basic", diff --git a/biome.jsonc b/biome.jsonc index 3c34a2071..3f3c15941 100644 --- a/biome.jsonc +++ b/biome.jsonc @@ -1,6 +1,18 @@ { - "$schema": "https://biomejs.dev/schemas/2.1.1/schema.json", + "$schema": "https://biomejs.dev/schemas/2.3.12/schema.json", "formatter": { "useEditorconfig": true }, + "linter": { + "rules": { + "complexity": { + "noForEach": "on", + }, + }, + }, + "javascript": { + "formatter": { + "semicolons": "asNeeded", + }, + }, "overrides": [ { "includes": ["./.vscode/*.json", "**/*.jsonc", "**/asv.conf.json"], diff --git a/docs/release-notes/2236.feat.md b/docs/release-notes/2236.feat.md new file mode 100644 index 000000000..38093264d --- /dev/null +++ b/docs/release-notes/2236.feat.md @@ -0,0 +1 @@ +Add rich HTML representation for {class}`~anndata.AnnData` objects in Jupyter notebooks with foldable sections, search/filter, category color visualization, dark mode support, and configurable settings via {attr}`anndata.settings` diff --git a/pyproject.toml b/pyproject.toml index 200f926e8..fd4b98f97 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -266,7 +266,7 @@ max-positional-args = 5 [tool.codespell] skip = 
".git,*.pdf,*.svg" -ignore-words-list = "theis,coo,homogenous,GroupT" +ignore-words-list = "theis,coo,homogenous,vart,GroupT" [tool.towncrier] package = "anndata" diff --git a/src/anndata/_core/anndata.py b/src/anndata/_core/anndata.py index a2d0ade75..e476c2729 100644 --- a/src/anndata/_core/anndata.py +++ b/src/anndata/_core/anndata.py @@ -556,6 +556,64 @@ def __repr__(self) -> str: else: return self._gen_repr(self.n_obs, self.n_vars) + def _repr_html_(self) -> str | None: + """Rich HTML representation for Jupyter notebooks. + + Returns an interactive HTML representation with: + + - Foldable sections for each attribute (auto-collapse for large sections) + - Search/filter functionality across all fields + - Copy-to-clipboard buttons for field names + - Color visualization for categorical data with color palettes + - Serialization warnings for non-serializable types + - Memory usage and version information + - Dark mode support (auto-detects Jupyter/VS Code themes) + - Graceful degradation when JavaScript is disabled + + The representation can be configured via :attr:`anndata.settings`: + + - ``repr_html_enabled``: Enable/disable HTML repr (default: True) + - ``repr_html_fold_threshold``: Auto-fold sections with more entries than this (default: 5) + - ``repr_html_max_depth``: Max recursion depth for nested AnnData (default: 3) + - ``repr_html_max_items``: Max items to show per section (default: 200) + - ``repr_html_max_categories``: Max category values to display inline (default: 100) + - ``repr_html_unique_limit``: Max rows for unique count computation (default: 1M) + - ``repr_html_max_field_width``: Max width in pixels for field name column (default: 400) + - ``repr_html_type_width``: Width in pixels for type column (default: 220) + + Examples + -------- + Disable HTML representation globally: + + >>> import anndata + >>> anndata.settings.repr_html_enabled = False + + Temporarily change settings using a context manager:: + + with 
anndata.settings.override(repr_html_fold_threshold=10): + display(adata) # Sections fold only when >10 items + + Returns + ------- + str | None + HTML string if enabled, None otherwise (falls back to text repr). + """ + if not settings.repr_html_enabled: + return None + + try: + from anndata._repr import generate_repr_html + + return generate_repr_html(self) + except Exception as e: # noqa: BLE001 + # Intentional broad catch: HTML repr should never crash the notebook + # Fall back to text repr if HTML generation fails, but log the error + warn( + f"HTML repr failed, falling back to text repr: {e}", + UserWarning, + ) + return None + def __eq__(self, other): """Equality testing""" msg = ( diff --git a/src/anndata/_repr/__init__.py b/src/anndata/_repr/__init__.py new file mode 100644 index 000000000..6c39c6394 --- /dev/null +++ b/src/anndata/_repr/__init__.py @@ -0,0 +1,443 @@ +""" +Rich HTML representation for AnnData objects in Jupyter notebooks. + +Module Architecture +------------------- +This package uses a layered import hierarchy to avoid circular imports. +When modifying imports, maintain this order: + +.. code-block:: text + + _repr_constants.py (outside _repr/, no internal imports) + │ Constants only. Imported by _settings.py at anndata import time. + │ Must not import anything from anndata. + │ + └─► utils.py (depends only on external: numpy, pandas) + │ HTML escaping, formatting, serialization checks. + │ + └─► components.py (depends on: utils) + │ UI building blocks: badges, buttons, icons. + │ + └─► registry.py (depends on: _repr_constants) + │ Formatter registry, TypeFormatter, SectionFormatter. + │ NOTE: formatters.py imports registry for registration. + │ + └─► core.py (depends on: components, registry, utils) + │ Shared rendering primitives: render_section(). + │ + ├─► sections.py (depends on: core, components, registry, utils) + │ │ Section-specific renderers (obs, var, uns, etc.). + │ │ Uses late import of html.render_formatted_entry. 
+ │ │ + │ └─► html.py (depends on: core, sections, components, registry, utils) + │ Main orchestrator: generate_repr_html(). + │ Side-effect import of formatters.py for registration. + │ + └─► formatters.py (depends on: registry, components, utils) + Built-in type formatters. Auto-registers on import. + Late import of html.generate_repr_html for AnnDataFormatter. + + __init__.py (imports from all modules for public API) + +Key patterns for avoiding circular imports: +1. ``_repr_constants.py`` is outside ``_repr/`` - safe to import anywhere +2. Side-effect imports use ``from . import formatters as _formatters # noqa: F401`` +3. Late imports inside functions for cross-module dependencies +4. Type hints use ``if TYPE_CHECKING:`` blocks + +This module provides an extensible HTML representation system with: +- Foldable sections with auto-collapse +- Search/filter functionality +- Color visualization for categorical data +- Value previews for simple types in uns (strings, numbers, dicts, lists) +- Serialization warnings +- Support for nested AnnData objects +- Graceful handling of unknown types + +Extensibility +------------- +The system is designed to be extensible via two registry patterns: + +**TypeFormatter** (for custom visualization of values): + Register a formatter to customize how specific types are displayed. + Can match by Python type OR by embedded type hints in data. 
+ + Attributes: + - ``priority``: Higher priority formatters are checked first (default: 0) + - ``sections``: Tuple of section names to restrict formatter to (default: None = all) + + Example - format by Python type:: + + from anndata._repr import register_formatter, TypeFormatter, FormattedOutput + + + @register_formatter + class MyArrayFormatter(TypeFormatter): + sections = ("obsm", "varm") # Only apply to obsm/varm + + def can_format(self, obj, context): + return isinstance(obj, MyArrayType) + + def format(self, obj, context): + return FormattedOutput( + type_name=f"MyArray {obj.shape}", + css_class="anndata-dtype--myarray", + # preview_html provides HTML for the preview column (rightmost) + preview_html=f'({obj.n_items} items)', + ) + + **Error handling**: Formatters can signal errors in two ways: + + 1. **Raise an exception** - The registry catches it, emits a warning with the + full error message (for debugging), and continues to try other formatters. + If all formatters fail, the fallback formatter is used with the accumulated + errors (showing only exception types in HTML to avoid long messages). + + 2. **Set ``error`` field explicitly** - For expected errors, set + ``FormattedOutput(error="reason")`` directly. The row will be highlighted + red and the error shown in the preview column. + + When ``error`` is set, it takes precedence over ``preview`` and ``preview_html``. 
+ + Example - format by embedded type hint (for tagged data in uns):: + + from anndata._repr import register_formatter, TypeFormatter, FormattedOutput + from anndata._repr import extract_uns_type_hint + + + @register_formatter + class MyConfigFormatter(TypeFormatter): + priority = 100 # Check before fallback + sections = ("uns",) # Only apply to uns + + def can_format(self, obj, context): + hint, _ = extract_uns_type_hint(obj) + return hint == "mypackage.config" + + def format(self, obj, context): + hint, data = extract_uns_type_hint(obj) + return FormattedOutput( + type_name="config", + preview_html="Custom config preview", + ) + + Data structure for type hints (works in any section):: + + adata.uns["my_config"] = { + "__anndata_repr__": "mypackage.config", + "data": '{"setting": "value"}', + } + + When a package registers a formatter and the user imports that package, + the formatter will automatically handle matching tagged data. Without + the import, a fallback shows: "[mypackage.config] (import mypackage)". + + See :func:`extract_uns_type_hint` for full documentation on this pattern. + + **The context parameter**: Both ``can_format()`` and ``format()`` receive a + :class:`FormatterContext` with useful attributes: + + - ``context.section``: Current section ("obs", "var", "uns", etc.) + - ``context.key``: Current entry key (column name for obs/var, dict key for uns, etc.) + - ``context.adata_ref``: Reference to root AnnData (for uns lookups) + + This enables context-aware formatting, e.g., looking up metadata in + ``context.adata_ref.uns`` based on ``context.key``. See + :class:`FormatterContext` for all available attributes. + +**SectionFormatter** (for adding new sections): + Register a formatter to add entirely new sections (like TreeData's obst/vart). 
+ + Example:: + + from anndata._repr import register_formatter, SectionFormatter + from anndata._repr import FormattedEntry, FormattedOutput + + + @register_formatter + class ObstSectionFormatter(SectionFormatter): + section_name = "obst" + after_section = "obsm" # Position after obsm + + def should_show(self, obj): + return hasattr(obj, "obst") and len(obj.obst) > 0 + + def get_entries(self, obj, context): + return [ + FormattedEntry( + key=k, + output=FormattedOutput(type_name=f"Tree ({v.n_nodes} nodes)"), + ) + for k, v in obj.obst.items() + ] + +Building Custom _repr_html_ +--------------------------- +For packages with AnnData-adjacent objects (like SpatialData, MuData) that need +their own ``_repr_html_``, you can reuse anndata's CSS, JavaScript, and helpers. + +**Basic structure**:: + + from anndata._repr import get_css, get_javascript + + + class MyData: + def _repr_html_(self): + container_id = f"mydata-{id(self)}" + return f''' + {get_css()} +
+
+ MyData + 100 items +
+
+ +
+
+ {get_javascript(container_id)} + ''' + +**CSS classes** (stable, can be used directly): + + Classes follow `BEM naming convention `_: + ``anndata-{block}__{element}--{modifier}`` + + **Blocks** (top-level components): + - ``anndata-repr``: Main container (required for JS and styling) + - ``anndata-header``: Header row (flexbox, contains type/shape/badges) + - ``anndata-footer``: Footer row (version, memory info) + - ``anndata-section``: Individual section wrapper + - ``anndata-entry``: Table row for data entries + - ``anndata-badge``: Status badges (View, Backed, Lazy) + - ``anndata-dtype``: Data type indicators + + **Elements** (parts of blocks, use ``__``): + - ``anndata-header__type``: Type name span in header + - ``anndata-header__shape``: Shape/dimensions span in header + - ``anndata-section__content``: Section content (rows) + - ``anndata-entry__name``: Entry name cell + - ``anndata-entry__type``: Entry type cell + - ``anndata-entry__preview``: Entry preview cell + + **Modifiers** (variants, use ``--``): + - ``anndata-section--collapsed``: Collapsed section state + - ``anndata-badge--view``: View badge variant + - ``anndata-dtype--category``: Categorical dtype styling + +**CSS variables** (set on ``.anndata-repr`` element): + + - ``--anndata-name-col-width``: Width of name column (default: 150px) + - ``--anndata-type-col-width``: Width of type column (default: 220px) + +**Using render helpers** for consistent section rendering:: + + from anndata._repr import ( + CSS_DTYPE_NDARRAY, + get_css, + get_javascript, + render_section, + render_formatted_entry, + render_badge, + render_search_box, + FormattedEntry, + FormattedOutput, + ) + + + def _repr_html_(self): + container_id = f"mydata-{id(self)}" + parts = [get_css()] + + # Header + parts.append(f''' +
+
+ MyData + {render_badge("Zarr", "anndata-badge--backed")} + + {render_search_box(container_id)} +
+
+ ''') + + # Build section entries + entries = [] + for key, value in self.items.items(): + entry = FormattedEntry( + key=key, + output=FormattedOutput( + type_name=f"array {value.shape}", + css_class=CSS_DTYPE_NDARRAY, + ), + ) + entries.append(render_formatted_entry(entry)) + + # Render section + parts.append( + render_section( + "items", + "\\n".join(entries), + n_items=len(self.items), + ) + ) + + parts.append("
") + parts.append(get_javascript(container_id)) + return "\\n".join(parts) + +**Embedding nested AnnData** with full interactivity:: + + from anndata._repr import generate_repr_html, FormattedEntry, FormattedOutput + + nested_html = generate_repr_html(adata, depth=1, max_depth=3) + entry = FormattedEntry( + key="table", + output=FormattedOutput( + type_name=f"AnnData ({adata.n_obs} x {adata.n_vars})", + expanded_html=nested_html, # Collapsible content below the row + ), + ) + +**Complete example**: See ``MockSpatialData`` in ``tests/visual_inspect_repr_html.py`` +for a full implementation with images, labels, points, shapes, and nested tables. +""" + +from __future__ import annotations + +# Import constants from dedicated module (single source of truth) +# Note: _repr_constants is outside _repr/ to avoid loading the full _repr +# package when _settings.py imports constants at anndata import time. +from .._repr_constants import ( + CSS_DTYPE_ANNDATA, + CSS_DTYPE_NDARRAY, + DEFAULT_FOLD_THRESHOLD, + DEFAULT_MAX_CATEGORIES, + DEFAULT_MAX_DEPTH, + DEFAULT_MAX_FIELD_WIDTH, + DEFAULT_MAX_ITEMS, + DEFAULT_MAX_LAZY_CATEGORIES, + DEFAULT_MAX_STRING_LENGTH, + DEFAULT_PREVIEW_ITEMS, + DEFAULT_TYPE_WIDTH, + DEFAULT_UNIQUE_LIMIT, + NOT_SERIALIZABLE_MSG, +) + +# Documentation base URL +DOCS_BASE_URL = "https://anndata.readthedocs.io/en/latest/" + + +def get_section_doc_url(section: str) -> str: + """Get documentation URL for a section. + + Centralizes URL generation so the pattern can be changed in one place. + Uses /en/latest/ for discoverability (users can navigate to their version). 
+ + Parameters + ---------- + section + Section name (e.g., "obs", "var", "uns", "obsm") + + Returns + ------- + URL to the section's documentation page + """ + return f"{DOCS_BASE_URL}generated/anndata.AnnData.{section}.html" + + +# Import main functionality +# Inline styles for graceful degradation (from single source of truth) +from .._repr_constants import STYLE_HIDDEN # noqa: E402 + +# Building blocks for packages that want to create their own _repr_html_ +# These allow reusing anndata's styling while building custom representations +from .components import ( # noqa: E402 + TypeCellConfig, + render_badge, + render_copy_button, + render_header_badges, + render_search_box, + render_warning_icon, +) +from .css import get_css # noqa: E402 +from .html import ( # noqa: E402 + generate_repr_html, + render_formatted_entry, + render_section, +) +from .javascript import get_javascript # noqa: E402 +from .registry import ( # noqa: E402 + UNS_TYPE_HINT_KEY, + FormattedEntry, + FormattedOutput, + FormatterContext, + # Type formatter registry + FormatterRegistry, + SectionFormatter, + TypeFormatter, + # Type hint extraction (for tagged data in uns) + extract_uns_type_hint, + formatter_registry, + register_formatter, +) + +# HTML rendering helpers for building custom sections +from .utils import ( # noqa: E402 + escape_html, + format_memory_size, + format_number, + validate_key, +) + +__all__ = [ # noqa: RUF022 # organized by category, not alphabetically + # Constants + "DEFAULT_FOLD_THRESHOLD", + "DEFAULT_MAX_DEPTH", + "DEFAULT_MAX_ITEMS", + "DEFAULT_MAX_STRING_LENGTH", + "DEFAULT_PREVIEW_ITEMS", + "DEFAULT_MAX_CATEGORIES", + "DEFAULT_MAX_LAZY_CATEGORIES", + "DEFAULT_UNIQUE_LIMIT", + "DEFAULT_MAX_FIELD_WIDTH", + "DEFAULT_TYPE_WIDTH", + "DOCS_BASE_URL", + "get_section_doc_url", + "NOT_SERIALIZABLE_MSG", + # CSS dtype constants for custom formatters + "CSS_DTYPE_NDARRAY", + "CSS_DTYPE_ANNDATA", + # Main function + "generate_repr_html", + # Registry for extensibility + 
"FormatterRegistry", + "formatter_registry", + "register_formatter", + "SectionFormatter", + "TypeFormatter", + "FormattedOutput", + "FormattedEntry", + "FormatterContext", + # Type hint extraction (for tagged data in uns) + "extract_uns_type_hint", + "UNS_TYPE_HINT_KEY", + # Building blocks for custom _repr_html_ implementations + "get_css", + "get_javascript", + "escape_html", + "format_number", + "format_memory_size", + "render_section", + "render_formatted_entry", + "STYLE_HIDDEN", + # UI component helpers + "render_search_box", + "render_copy_button", + "render_badge", + "render_header_badges", + "render_warning_icon", + "TypeCellConfig", + # Validation helpers + "validate_key", +] diff --git a/src/anndata/_repr/components.py b/src/anndata/_repr/components.py new file mode 100644 index 000000000..db2530117 --- /dev/null +++ b/src/anndata/_repr/components.py @@ -0,0 +1,608 @@ +""" +Reusable UI components for HTML representation. + +This module provides building blocks for creating consistent HTML representations: +- Warning/error icons with tooltips +- Search box with filter toggles +- Fold/expand icons for collapsible sections +- Copy-to-clipboard buttons +- Status badges (view, backed, sparse, etc.) + +These components are designed to be used by both anndata's internal repr +and by external packages (MuData, SpatialData, TreeData) that want to +build compatible representations. +""" + +from __future__ import annotations + +from dataclasses import dataclass, field + +from .._repr_constants import ( + CSS_ENTRY, + CSS_TEXT_MUTED, + NOT_SERIALIZABLE_MSG, + STYLE_HIDDEN, +) +from .utils import escape_html, sanitize_css_color + + +def render_entry_row_open( + key: str, + dtype: str, + *, + has_warnings: bool = False, + is_error: bool = False, + has_expandable_content: bool = False, + extra_classes: str = "", +) -> str: + """Render the opening tag for an entry row. + + For regular entries, returns ``
``. + For expandable entries, returns ``
`` + so the whole row acts as the disclosure toggle. + + Parameters + ---------- + key + The entry key (column name, field name, etc.) + dtype + The data type string (for data-dtype attribute) + has_warnings + Whether the entry has warnings + is_error + Whether the entry has errors (not serializable, invalid key) + has_expandable_content + Whether this entry has nested content (uses ``
``/````) + extra_classes + Additional CSS classes to include + + Returns + ------- + Opening tag(s) with class and data attributes + """ + # Build CSS class string + classes = [CSS_ENTRY] + if extra_classes: + classes.append(extra_classes) + if has_warnings: + classes.append("warning") + if is_error: + classes.append("error") + css_class = " ".join(classes) + + escaped_key = escape_html(key) + escaped_dtype = escape_html(dtype) + + if has_expandable_content: + return ( + f'
' + f'' + ) + return f'
' + + +def render_warning_icon( + warnings: list[str], *, is_not_serializable: bool = False +) -> str: + """Render warning icon with tooltip if there are warnings or serialization issues. + + Parameters + ---------- + warnings + List of warning messages to show in tooltip. + is_not_serializable + If True, prepends "Not serializable to H5AD/Zarr" to warnings. + + Returns + ------- + HTML string for warning icon, or empty string if no warnings. + """ + if not warnings and not is_not_serializable: + return "" + + # Build the tooltip message + if is_not_serializable: + if warnings: + # "Not serializable: reason1; reason2" + reasons = "; ".join(warnings) + title = f"{NOT_SERIALIZABLE_MSG}: {reasons}" + else: + # Just "Not serializable to H5AD/Zarr" + title = NOT_SERIALIZABLE_MSG + else: + # Independent warnings joined with ";" + title = "; ".join(warnings) + + title = escape_html(title) + return f'(!)' + + +def render_search_box(container_id: str = "") -> str: + """ + Render a search box with filter indicator and search mode toggles. + + The search box is hidden by default and shown when JavaScript is enabled. + It filters entries across all sections by key, type, or content. + Includes toggle buttons for case-sensitive search and regex mode. + + Parameters + ---------- + container_id + Unique ID for the container (used for label association) + + Returns + ------- + HTML string for the search box + + Example + ------- + >>> container_id = "spatialdata-123" + >>> parts = ['
'] + >>> parts.append('SpatialData') + >>> parts.append('') # Spacer + >>> parts.append(render_search_box(container_id)) + >>> parts.append("
") + """ + search_id = f"{container_id}-search" if container_id else "anndata-search" + return ( + f'' + f'' + f'' + f'' + f'' + f"" + f"" + f'' + ) + + +def render_copy_button(text: str, tooltip: str = "Copy") -> str: + """ + Render a copy-to-clipboard button. + + The button is hidden by default and shown when JavaScript is enabled. + When clicked, it copies the specified text to the clipboard. + + Parameters + ---------- + text + The text to copy when clicked + tooltip + Tooltip text (default: "Copy") + + Returns + ------- + HTML string for the copy button + + Example + ------- + >>> name = "gene_expression" + >>> html = f"{name}{render_copy_button(name, 'Copy name')}" + """ + escaped_text = escape_html(text) + escaped_tooltip = escape_html(tooltip) + return ( + f'' + ) + + +def _render_wrap_button(css_class: str) -> str: + """Render a wrap toggle button with the specified CSS class. + + Internal helper used by render_categories_wrap_button and render_columns_wrap_button. + """ + return f'' + + +def render_categories_wrap_button() -> str: + """Render a button to toggle category list between single-line and multi-line. + + Returns + ------- + HTML string for the wrap button (▼ expands, ▲ collapses) + """ + return _render_wrap_button("anndata-categories__wrap") + + +def render_columns_wrap_button() -> str: + """Render a button to toggle column list between single-line and multi-line. + + Returns + ------- + HTML string for the wrap button (▼ expands, ▲ collapses) + """ + return _render_wrap_button("anndata-columns__wrap") + + +def render_muted_span(text: str) -> str: + """Render text in a muted span (gray color). + + Parameters + ---------- + text + Text to render (will be HTML-escaped) + + Returns + ------- + HTML string with muted styling + """ + return f'{escape_html(text)}' + + +def render_nested_content(html_content: str) -> str: + """Render nested/expanded content inside an expandable entry. 
+ + The entry must have been opened with ``has_expandable_content=True`` + (which makes it a ``
`` with ````). This function + closes the ```` and adds the nested content. The caller + must close the entry with ``
``. + + Parameters + ---------- + html_content + The HTML content to display when expanded + + Returns + ------- + HTML closing the summary and wrapping nested content + """ + return ( + f"
" + f'
' + f'
{html_content}
' + f"
" + ) + + +def render_badge( + text: str, + variant: str = "", + tooltip: str = "", +) -> str: + """ + Render a badge (pill-shaped label). + + Parameters + ---------- + text + Badge text + variant + Variant class for styling. Built-in variants: + - "" (default gray) + - "anndata-badge--view" (blue, for views) + - "anndata-badge--backed" (orange, for backed mode) + - "anndata-badge--sparse" (green, for sparse matrices) + - "anndata-badge--dask" (purple, for Dask arrays) + - "anndata-badge--extension" (for extension types) + tooltip + Tooltip text on hover + + Returns + ------- + HTML string for the badge + + Example + ------- + >>> badge = render_badge("Zarr", "anndata-badge--backed", "Backed by Zarr store") + """ + escaped_text = escape_html(text) + title_attr = f' title="{escape_html(tooltip)}"' if tooltip else "" + # Always include base class, optionally add variant + css_class = f"anndata-badge {variant}".strip() if variant else "anndata-badge" + return f'{escaped_text}' + + +def render_header_badges( + *, + is_view: bool = False, + is_backed: bool = False, + is_lazy: bool = False, + backing_path: str | None = None, + backing_format: str | None = None, +) -> str: + """ + Render standard header badges for view/backed/lazy status. + + Parameters + ---------- + is_view + Whether this is a view + is_backed + Whether this is backed by a file + is_lazy + Whether this uses lazy loading (experimental read_lazy) + backing_path + Path to the backing file (for tooltip) + backing_format + Format of the backing file ("H5AD", "Zarr", etc.) + + Returns + ------- + HTML string with badges + + Example + ------- + >>> badges = render_header_badges( + ... is_backed=True, + ... backing_path="/data/sample.zarr", + ... backing_format="Zarr", + ... 
) + """ + parts = [] + if is_view: + parts.append( + render_badge( + "View", "anndata-badge--view", "This is a view of another object" + ) + ) + if is_backed: + tooltip = f"Backed by {backing_path}" if backing_path else "Backed mode" + label = backing_format or "Backed" + parts.append(render_badge(label, "anndata-badge--backed", tooltip)) + if is_lazy: + parts.append( + render_badge( + "Lazy", "anndata-badge--lazy", "Lazy loading (experimental read_lazy)" + ) + ) + return "".join(parts) + + +def render_name_cell(name: str) -> str: + """Render a name cell with copy button and tooltip for truncated names. + + The structure uses flexbox so the copy button stays visible even when + the name text overflows and shows ellipsis. + + Parameters + ---------- + name + The field name to display + + Returns + ------- + HTML string for the cell div + """ + escaped_name = escape_html(name) + return ( + f'' + f'' + f'{escaped_name}' + f"{render_copy_button(name, 'Copy name')}" + f"" + f"" + ) + + +def render_category_list( + categories: list, + colors: list[str] | None, + max_cats: int, + *, + n_hidden: int = 0, +) -> str: + """Render a list of category values with optional color dots. + + Parameters + ---------- + categories + List of category values to display + colors + Optional list of colors matching categories + max_cats + Maximum number of categories to show + n_hidden + Number of additional hidden categories (for lazy truncation). + These are added to any truncation from max_cats. 
+ + Returns + ------- + HTML string for the category list + """ + parts = [''] + for i, cat in enumerate(categories[:max_cats]): + if i > 0: + parts.append(', ') + cat_name = escape_html(str(cat)) + color = colors[i] if colors and i < len(colors) else None + parts.append('') + if color: + # Sanitize color to prevent CSS injection + safe_color = sanitize_css_color(str(color)) + if safe_color: + parts.append( + f'' + ) + # Skip color dot if color is invalid/unsafe + parts.append(f"{cat_name}") + parts.append("") + + # Calculate total hidden: from max_cats truncation + lazy truncation + hidden_from_max_cats = max(0, len(categories) - max_cats) + total_hidden = hidden_from_max_cats + n_hidden + + if total_hidden > 0: + parts.append(f'...+{total_hidden}') + parts.append("") + return "".join(parts) + + +@dataclass +class TypeCellConfig: + """Configuration for rendering a type cell. + + Groups the many parameters of render_entry_type_cell into a single object, + making call sites cleaner and easier to understand. + + Attributes + ---------- + type_name + The type name to display (e.g., "ndarray (100, 50) float32") + css_class + CSS class for the type span (e.g., "anndata-dtype--ndarray") + type_html + Optional custom HTML content for the type cell + tooltip + Optional tooltip for the type label + warnings + List of warning messages + is_not_serializable + Whether the data cannot be serialized to H5AD/Zarr + has_columns_list + Whether to show columns wrap button + has_categories_list + Whether to show categories wrap button + append_type_html + If True, type_html is appended below type_name instead of replacing it + + Examples + -------- + >>> config = TypeCellConfig( + ... type_name="ndarray (100, 50) float32", + ... css_class="anndata-dtype--ndarray", + ... tooltip="Dense array", + ... ) + >>> html = render_entry_type_cell(config) + + With warnings:: + + >>> config = TypeCellConfig( + ... type_name="object", + ... css_class="anndata-dtype--object", + ... 
warnings=["Custom warning"], + ... is_not_serializable=True, + ... ) + """ + + type_name: str + css_class: str + type_html: str | None = None + tooltip: str = "" + warnings: list[str] = field(default_factory=list) + is_not_serializable: bool = False + has_columns_list: bool = False + has_categories_list: bool = False + append_type_html: bool = False + + +def render_entry_type_cell(config: TypeCellConfig) -> str: + """Render the type cell for an entry row. + + This is a unified helper that handles all type cell variations: + - Type label with optional tooltip + - Custom type_html (as replacement or appended content) + - Warning icon + - Expand/wrap buttons + + The type_html and append_type_html config fields control content rendering: + + 1. No type_html: Shows type_name in a styled span + ``type_name`` + + 2. type_html with append_type_html=False (default): type_html REPLACES type_name + Used for fully custom type content (e.g., category swatches instead of text) + + 3. type_html with append_type_html=True: type_html is shown BELOW type_name + Used to add extra content while keeping the type label + (e.g., showing category list below "categorical" label) + + Parameters + ---------- + config + TypeCellConfig object with all rendering options + + Returns + ------- + HTML string for the complete type cell + + Examples + -------- + >>> config = TypeCellConfig( + ... type_name="ndarray (100, 50) float32", + ... css_class="anndata-dtype--ndarray", + ... tooltip="Dense array", + ... 
) + >>> html = render_entry_type_cell(config) + """ + type_name = config.type_name + css_class = config.css_class + type_html = config.type_html + tooltip = config.tooltip + warnings = config.warnings + is_not_serializable = config.is_not_serializable + has_columns_list = config.has_columns_list + has_categories_list = config.has_categories_list + append_type_html = config.append_type_html + + parts = [ + '' + ] + + # Type content: handle different cases + if type_html and not append_type_html: + # type_html replaces the type label entirely + parts.append(type_html) + elif tooltip: + parts.append( + f'' + f"{escape_html(type_name)}" + ) + else: + parts.append(f'{escape_html(type_name)}') + + # Warning icon + parts.append( + render_warning_icon(warnings or [], is_not_serializable=is_not_serializable) + ) + + # Wrap buttons + if has_columns_list: + parts.append(render_columns_wrap_button()) + if has_categories_list: + parts.append(render_categories_wrap_button()) + + # Appended type_html (for custom inline rendering below the type) + if type_html and append_type_html: + parts.append(f'{type_html}') + + parts.append("") + return "".join(parts) + + +def render_entry_preview_cell( + preview_html: str | None = None, + preview_text: str | None = None, +) -> str: + """Render the preview cell (third column) for an entry row. + + Formatters are responsible for producing complete preview content. + This function just wraps it in the appropriate cell element. 
+ + Parameters + ---------- + preview_html + Raw HTML content for preview (highest priority) + preview_text + Plain text preview (will be escaped and muted) + + Returns + ------- + HTML string for the preview cell + """ + parts = [ + '' + ] + + if preview_html: + parts.append(preview_html) + elif preview_text: + parts.append(render_muted_span(preview_text)) + + parts.append("") + return "".join(parts) diff --git a/src/anndata/_repr/core.py b/src/anndata/_repr/core.py new file mode 100644 index 000000000..3546a0183 --- /dev/null +++ b/src/anndata/_repr/core.py @@ -0,0 +1,402 @@ +""" +Core rendering primitives for AnnData HTML representation. + +This module contains shared rendering functions used by both: +- html.py (main orchestration) +- sections.py (section-specific renderers) + +By extracting these to a separate module, we avoid circular imports +between html.py and sections.py. +""" + +from __future__ import annotations + +from typing import TYPE_CHECKING + +from .._repr_constants import ( + CSS_DTYPE_CATEGORY, + CSS_DTYPE_DATAFRAME, + CSS_TEXT_ERROR, + CSS_TEXT_MUTED, +) +from .components import ( + TypeCellConfig, + render_entry_preview_cell, + render_entry_row_open, + render_entry_type_cell, + render_name_cell, + render_nested_content, +) +from .registry import formatter_registry +from .utils import escape_html, format_number + +if TYPE_CHECKING: + from .registry import FormattedEntry, FormatterContext + + +def render_section( # noqa: PLR0913 + name: str, + entries_html: str, + *, + n_items: int, + doc_url: str | None = None, + tooltip: str = "", + should_collapse: bool = False, + section_id: str | None = None, + count_str: str | None = None, +) -> str: + """ + Render a complete section with header and content. + + This is a public API for packages building their own _repr_html_. + It is also used internally for consistency. 
+ + Parameters + ---------- + name + Display name for the section header (e.g., 'images', 'tables') + entries_html + HTML content for the section body (table rows) + n_items + Number of items (used for empty check and default count string) + doc_url + URL for the help link (? icon) + tooltip + Tooltip text for the help link + should_collapse + Whether this section should start collapsed + section_id + ID for the section in data-section attribute (defaults to name) + count_str + Custom count string for header (defaults to "(N items)") + + Returns + ------- + HTML string for the complete section + + Examples + -------- + :: + + from anndata._repr import ( + CSS_DTYPE_NDARRAY, + FormattedEntry, + FormattedOutput, + render_formatted_entry, + render_section, + ) + + rows = [] + for key, info in items.items(): + entry = FormattedEntry( + key=key, + output=FormattedOutput( + type_name=info["type"], css_class=CSS_DTYPE_NDARRAY + ), + ) + rows.append(render_formatted_entry(entry)) + + html = render_section( + "images", + "\\n".join(rows), + n_items=len(items), + doc_url="https://docs.example.com/images", + tooltip="Image data", + ) + """ + if section_id is None: + section_id = name + + if n_items == 0: + return render_empty_section(name, doc_url, tooltip) + + if count_str is None: + count_str = f"({n_items} items)" + + open_attr = " open" if not should_collapse else "" + parts = [ + f'
' + ] + + # Header + parts.append(_render_section_header(name, count_str, doc_url, tooltip)) + + # Content + parts.append('
') + parts.append('
') + parts.append(entries_html) + parts.append("
") + + return "\n".join(parts) + + +def _render_section_header( + name: str, + count_str: str, + doc_url: str | None, + tooltip: str, +) -> str: + """Render a section header as - native disclosure triangle replaces fold icon.""" + parts = [""] + parts.append(f'{escape_html(name)}') + parts.append( + f'{escape_html(count_str)}' + ) + if doc_url: + parts.append( + f'?' + ) + parts.append("") + return "\n".join(parts) + + +def render_empty_section( + name: str, + doc_url: str | None = None, + tooltip: str = "", +) -> str: + """Render an empty section indicator.""" + # Build help link if doc_url provided + help_link = "" + if doc_url: + help_link = f'?' + + return f""" +
+ + {escape_html(name)} + (empty) + {help_link} + +
+
No entries
+
+
+""" + + +def render_truncation_indicator(remaining: int) -> str: + """Render a truncation indicator.""" + return f'
... and {format_number(remaining)} more
' + + +def get_section_tooltip(section: str) -> str: + """Get tooltip text for a section.""" + tooltips = { + "obs": "Observation (cell) annotations", + "var": "Variable (gene) annotations", + "uns": "Unstructured annotation", + "obsm": "Multi-dimensional observation annotations", + "varm": "Multi-dimensional variable annotations", + "layers": "Additional data layers (same shape as X)", + "obsp": "Pairwise observation annotations", + "varp": "Pairwise variable annotations", + "raw": "Raw data (original unprocessed)", + } + return tooltips.get(section, "") + + +def render_x_entry(obj: object, context: FormatterContext) -> str: + """Render X as a single compact entry row. + + Works with AnnData, Raw, and any object with an X attribute. + Handles missing or broken X attributes gracefully. + """ + parts = ['
'] + parts.append("X") + + try: + X = obj.X + except Exception as e: # noqa: BLE001 + # Handle missing or broken X attribute gracefully + error_msg = f"error: {type(e).__name__}" + parts.append( + f'({escape_html(error_msg)})' + ) + parts.append("
") + return "\n".join(parts) + + if X is None: + parts.append("None") + else: + # Format the X matrix (formatter includes all info like sparsity, on disk, etc.) + try: + output = formatter_registry.format_value(X, context) + parts.append( + f'{escape_html(output.type_name)}' + ) + except Exception as e: # noqa: BLE001 + error_msg = f"error formatting: {type(e).__name__}" + parts.append( + f'({escape_html(error_msg)})' + ) + + parts.append("
") + return "\n".join(parts) + + +def render_formatted_entry( + entry: FormattedEntry, + section: str = "", + *, + extra_warnings: list[str] | None = None, + append_type_html: bool = False, + preview_note: str | None = None, +) -> str: + """ + Render a FormattedEntry as a table row. + + This is the unified entry renderer used both internally and as a public API + for packages building their own _repr_html_. + + Parameters + ---------- + entry + A FormattedEntry containing the key and FormattedOutput + section + Optional section name (used for meta column rendering) + extra_warnings + Additional warnings to display (e.g., key validation warnings) + append_type_html + If True, append type_html below type_name instead of replacing it. + Used for mapping entries (obsm, varm, etc.) to show extra content. + preview_note + Optional note to prepend to preview text (for type hints in uns) + + Returns + ------- + HTML string for the table row(s) + + Examples + -------- + :: + + from anndata._repr import ( + CSS_DTYPE_ANNDATA, + CSS_DTYPE_NDARRAY, + FormattedEntry, + FormattedOutput, + render_formatted_entry, + ) + + entry = FormattedEntry( + key="my_array", + output=FormattedOutput( + type_name="ndarray (100, 50) float32", + css_class=CSS_DTYPE_NDARRAY, + tooltip="My custom array", + warnings=["Some warning"], + ), + ) + html = render_formatted_entry(entry) + + With expandable nested content:: + + nested_html = generate_repr_html(adata, depth=1) + entry = FormattedEntry( + key="cell_table", + output=FormattedOutput( + type_name="AnnData (150 × 30)", + css_class=CSS_DTYPE_ANNDATA, + expanded_html=nested_html, + ), + ) + html = render_formatted_entry(entry) + + With key validation warnings:: + + entry = FormattedEntry( + key="bad/key", + output=FormattedOutput(...), + ) + html = render_formatted_entry( + entry, extra_warnings=["Contains '/' (deprecated)"] + ) + + With explicit error:: + + entry = FormattedEntry( + key="broken_data", + output=FormattedOutput( + 
type_name="MyType", + error="Failed to load: file not found", + ), + ) + html = render_formatted_entry(entry) + """ + output = entry.output + extra_warnings = extra_warnings or [] + + # Compute entry CSS classes + # Both hard errors and serialization issues get red background + all_warnings = extra_warnings + list(output.warnings) + has_error = output.error is not None or not output.is_serializable + + has_expandable_content = output.expanded_html is not None + # Detect wrap button needs from output css_class + has_categories = output.css_class == CSS_DTYPE_CATEGORY and bool( + output.preview_html + ) + has_columns_list = output.css_class == CSS_DTYPE_DATAFRAME and bool( + output.preview_html + ) + + # Build row using consolidated helper + parts = [ + render_entry_row_open( + entry.key, + output.type_name, + has_warnings=bool(all_warnings), + is_error=has_error, + has_expandable_content=has_expandable_content, + ) + ] + + # Name cell + parts.append(render_name_cell(entry.key)) + + # Type cell + type_cell_config = TypeCellConfig( + type_name=output.type_name, + css_class=output.css_class, + type_html=output.type_html if append_type_html else None, + tooltip=output.tooltip, + warnings=all_warnings, + is_not_serializable=not output.is_serializable, + has_columns_list=has_columns_list, + has_categories_list=has_categories, + append_type_html=append_type_html, + ) + parts.append(render_entry_type_cell(type_cell_config)) + + # Preview cell + # Error takes precedence over preview/preview_html + preview_html = output.preview_html + preview_text = output.preview + + if output.error and not preview_html: + # Generate error preview if error is set but no preview_html provided + error_text = escape_html(output.error) + preview_html = f'{error_text}' + + if preview_note and preview_text: + preview_text = f"{preview_note} {preview_text}" + elif preview_note: + preview_text = preview_note + + parts.append( + render_entry_preview_cell( + preview_html=preview_html, + 
preview_text=preview_text, + ) + ) + + # Expandable entries use <details>/<summary>; render_nested_content + # closes the <summary> and adds the nested content div. + if has_expandable_content: + parts.append(render_nested_content(output.expanded_html)) + parts.append("
") + else: + parts.append("") + + return "\n".join(parts) diff --git a/src/anndata/_repr/css.py b/src/anndata/_repr/css.py new file mode 100644 index 000000000..04c1645d2 --- /dev/null +++ b/src/anndata/_repr/css.py @@ -0,0 +1,17 @@ +"""CSS styles for AnnData HTML representation.""" + +from __future__ import annotations + +from functools import cache +from importlib.resources import files + + +@cache +def get_css() -> str: + """Get the complete CSS for the HTML representation. + + Dark/light theming is handled entirely in CSS via ``light-dark()`` + and ``color-scheme`` — no Python-side substitution needed. + """ + css = files("anndata._repr.static").joinpath("repr.css").read_text(encoding="utf-8") + return f"" diff --git a/src/anndata/_repr/formatters.py b/src/anndata/_repr/formatters.py new file mode 100644 index 000000000..8407455f3 --- /dev/null +++ b/src/anndata/_repr/formatters.py @@ -0,0 +1,1172 @@ +""" +Built-in formatters for common types. + +This module registers formatters for: +- NumPy arrays (dense and masked) +- SciPy sparse matrices +- Pandas DataFrames, Series, Categorical +- Dask arrays +- Awkward arrays +- AnnData objects (for recursive display in .uns) +- Python built-in types +- Color lists + +The formatters are registered automatically when this module is imported. 
+""" + +from __future__ import annotations + +import contextlib +from typing import TYPE_CHECKING + +import numpy as np +import pandas as pd + +from .._repr_constants import ( + COLOR_PREVIEW_LIMIT, + CSS_COLORS, + CSS_COLORS_SWATCH, + CSS_COLORS_SWATCH_INVALID, + CSS_DTYPE_ANNDATA, + CSS_DTYPE_ARRAY_API, + CSS_DTYPE_AWKWARD, + CSS_DTYPE_BOOL, + CSS_DTYPE_CATEGORY, + CSS_DTYPE_DASK, + CSS_DTYPE_DATAFRAME, + CSS_DTYPE_FLOAT, + CSS_DTYPE_GPU, + CSS_DTYPE_INT, + CSS_DTYPE_OBJECT, + CSS_DTYPE_SPARSE, + CSS_DTYPE_STRING, + CSS_DTYPE_TPU, + CSS_DTYPE_UNKNOWN, + CSS_NESTED_ANNDATA, + CSS_TEXT_MUTED, +) +from ..compat import has_xp +from .components import render_category_list +from .lazy import get_lazy_categorical_info, is_lazy_column +from .registry import ( + FormattedOutput, + TypeFormatter, + formatter_registry, +) +from .utils import ( + check_color_category_mismatch, + check_invalid_colors, + escape_html, + format_invalid_colors_warning, + format_number, + get_categories_for_display, + get_matching_column_colors, + get_setting, + is_color_list, + is_serializable, + preview_dict, + preview_number, + preview_sequence, + preview_string, + sanitize_css_color, + should_warn_string_column, +) + +if TYPE_CHECKING: + from typing import ClassVar, TypeGuard + + from .registry import FormatterContext + + +def _check_array_has_writer(array: object) -> bool: + """Check if an array type has a registered IO writer. + + This uses the actual IO registry, making it future-proof: if a writer + is registered for a new type (e.g., datetime64), this will detect it. + """ + try: + from .._io.specs.registry import _REGISTRY + + _REGISTRY.get_spec(array) + return True + except (KeyError, TypeError): + return False + + +def _check_series_backing_array(series: pd.Series) -> tuple[bool, str]: + """Check if a Series' backing array type can be serialized. + + Uses the IO registry to check the underlying array. 
This is future-proof: + if anndata adds support for datetime64/timedelta64/etc, this will detect it. + + Returns (is_serializable, reason_if_not). + """ + # Standard numpy dtypes are always serializable (no registry check needed) + # This covers: float16/32/64, int8/16/32/64, uint*, bool, complex*, bytes, str + if series.dtype.kind in ("f", "i", "u", "b", "c", "S", "U"): + return True, "" + + # Get the backing array for extension dtypes + backing_array = series.array + + # NumpyExtensionArray wraps numpy arrays - check the underlying numpy array + if type(backing_array).__name__ == "NumpyExtensionArray": + # The underlying numpy array is serializable + return True, "" + + # For other extension arrays (DatetimeArray, ArrowStringArray, etc.), + # check the IO registry. This is future-proof: if anndata adds support + # for datetime64, the registry will have a writer and this returns True. + if _check_array_has_writer(backing_array): + return True, "" + + # No writer registered - provide a helpful message + dtype_name = str(series.dtype) + return False, f"{dtype_name} not serializable" + + +def _check_series_serializability(series: pd.Series) -> tuple[bool, str]: + """ + Check if an object-dtype Series contains serializable values. + + For object dtype columns, checks the first non-null value to determine + if the column can be written to H5AD/Zarr. This is a type-based + heuristic on that single value; no trial write is performed. 
+ + Parameters + ---------- + series + Pandas Series with object dtype + + Returns + ------- + tuple of (is_serializable, reason_if_not) + """ + if len(series) == 0: + return True, "" + + # Get first non-null value + first_valid_idx = series.first_valid_index() + if first_valid_idx is None: + return True, "" # All null + + value = series.loc[first_valid_idx] + + # Object dtype columns with non-string/numeric values are problematic + # Check if value is a type that anndata can serialize in a DataFrame column + if isinstance(value, (list, tuple)): + # Lists/tuples in DataFrame columns are not directly serializable + # (they work in uns as arrays, but not as DataFrame cell values) + # NOTE: If https://github.com/scverse/anndata/issues/1923 is resolved, + # lists of strings may become serializable - update this check accordingly + return False, f"Contains {type(value).__name__}" + elif isinstance(value, dict): + return False, "Contains dict" + elif not isinstance(value, str | bytes | np.generic | int | float | bool): + # Custom objects are not serializable + return False, f"Contains {type(value).__name__}" + + return True, "" + + +class NumpyArrayFormatter(TypeFormatter[np.ndarray]): + """Formatter for numpy.ndarray.""" + + priority = 100 + + def can_format( + self, obj: object, context: FormatterContext + ) -> TypeGuard[np.ndarray]: + return isinstance(obj, np.ndarray) + + def format(self, obj: np.ndarray, context: FormatterContext) -> FormattedOutput: + arr = obj + shape_str = " × ".join(format_number(s) for s in arr.shape) + dtype_str = str(arr.dtype) + + # Determine CSS class based on dtype + css_class = _get_dtype_css_class(arr.dtype) + + if arr.ndim == 2: + type_name = f"ndarray ({shape_str}) {dtype_str}" + elif arr.ndim == 1: + type_name = f"ndarray ({shape_str},) {dtype_str}" + else: + type_name = f"ndarray {arr.shape} {dtype_str}" + + # For obsm/varm sections, show number of columns in preview + preview = None + if context.section in ("obsm", "varm") and 
arr.ndim == 2: + n_cols = arr.shape[1] + preview = f"({format_number(n_cols)} columns)" + + return FormattedOutput( + type_name=type_name, + css_class=css_class, + preview=preview, + is_serializable=True, + ) + + +class NumpyMaskedArrayFormatter(TypeFormatter[np.ma.MaskedArray]): + """Formatter for numpy.ma.MaskedArray.""" + + priority = 110 + + def can_format( + self, obj: object, context: FormatterContext + ) -> TypeGuard[np.ma.MaskedArray]: + return isinstance(obj, np.ma.MaskedArray) + + def format( + self, obj: np.ma.MaskedArray, context: FormatterContext + ) -> FormattedOutput: + arr = obj + shape_str = " × ".join(format_number(s) for s in arr.shape) + dtype_str = str(arr.dtype) + n_masked = int(np.sum(arr.mask)) if arr.mask is not np.ma.nomask else 0 + + # For obsm/varm sections, show number of columns in preview + preview = None + if context.section in ("obsm", "varm") and arr.ndim == 2: + n_cols = arr.shape[1] + preview = f"({format_number(n_cols)} columns)" + + return FormattedOutput( + type_name=f"MaskedArray ({shape_str}) {dtype_str}", + css_class=_get_dtype_css_class(arr.dtype), + tooltip=f"{n_masked} masked values" if n_masked > 0 else "", + preview=preview, + is_serializable=True, + ) + + +class SparseMatrixFormatter(TypeFormatter[object]): + """ + Formatter for scipy.sparse matrices and arrays. 
+ + Future-proofing notes: + - PR #1927 (https://github.com/scverse/anndata/pull/1927) removes scipy sparse inheritance + - Uses duck typing as fallback to detect sparse-like objects without relying on isinstance() + - Handles both scipy.sparse (CPU) and cupyx.scipy.sparse (GPU) sparse arrays + """ + + priority = 100 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + # First try scipy.sparse.issparse() if available (backward compatibility) + try: + import scipy.sparse as sp + + if sp.issparse(obj): + return True + except ImportError: + pass + + # Fallback: Duck typing for sparse-like objects + # Future-proof against PR #1927 removing scipy sparse inheritance + # A sparse object should have: nnz (non-zero count), shape, dtype, and sparse conversion methods + module = type(obj).__module__ + is_sparse_module = module.startswith(("scipy.sparse", "cupyx.scipy.sparse")) + has_sparse_attrs = ( + hasattr(obj, "nnz") + and hasattr(obj, "shape") + and hasattr(obj, "dtype") + and (hasattr(obj, "tocsr") or hasattr(obj, "tocsc")) + ) + + return is_sparse_module and has_sparse_attrs + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: # noqa: PLR0912 + # Duck-typed: can_format() guarantees shape/dtype/nnz attrs + shape_str = " × ".join(format_number(s) for s in obj.shape) # type: ignore[attr-defined] + dtype_str = str(obj.dtype) # type: ignore[attr-defined] + + # Calculate sparsity + n_elements = obj.shape[0] * obj.shape[1] if len(obj.shape) == 2 else 1 # type: ignore[attr-defined] + if n_elements > 0: + sparsity = 1 - (obj.nnz / n_elements) # type: ignore[attr-defined] + sparsity_str = f"{sparsity:.1%} sparse" + else: + sparsity_str = "" + + # Determine format name + # Try scipy-specific checks first (backward compatibility) + format_name = None + try: + import scipy.sparse as sp + + if sp.isspmatrix_csr(obj): + format_name = "csr_matrix" + elif sp.isspmatrix_csc(obj): + format_name = "csc_matrix" + elif 
sp.isspmatrix_coo(obj): + format_name = "coo_matrix" + elif sp.isspmatrix_lil(obj): + format_name = "lil_matrix" + elif sp.isspmatrix_dok(obj): + format_name = "dok_matrix" + elif sp.isspmatrix_dia(obj): + format_name = "dia_matrix" + elif sp.isspmatrix_bsr(obj): + format_name = "bsr_matrix" + except (ImportError, TypeError): + # ImportError: scipy not available + # TypeError: isspmatrix_* functions may fail on new sparse array types (PR #1927) + pass + + # Fallback: Use type name (works for new sparse array classes like csr_array, csc_array) + if format_name is None: + format_name = type(obj).__name__ + + # Build type_name with sparsity info inline + nnz_formatted = format_number(obj.nnz) # type: ignore[attr-defined] + if sparsity_str: + type_name = ( + f"{format_name} ({shape_str}) {dtype_str} · " + f"{sparsity_str} ({nnz_formatted} stored)" + ) + else: + type_name = f"{format_name} ({shape_str}) {dtype_str}" + + return FormattedOutput( + type_name=type_name, + css_class=CSS_DTYPE_SPARSE, + tooltip=f"{nnz_formatted} stored elements", + is_serializable=True, + ) + + +class BackedSparseDatasetFormatter(TypeFormatter[object]): + """Formatter for anndata's backed sparse datasets (_CSRDataset, _CSCDataset). + + These are HDF5/Zarr-backed sparse matrices that stay on disk. + Only metadata (shape, dtype, format) is read — no data is loaded. 
+ """ + + priority = 110 # Higher than SparseMatrixFormatter to check first + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + # Check for anndata's backed sparse dataset classes + module = type(obj).__module__ + return module.startswith("anndata._core.sparse_dataset") and hasattr( + obj, "format" + ) + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: + # Duck-typed: can_format() guarantees shape/dtype attrs + shape_str = " × ".join(format_number(s) for s in obj.shape) # type: ignore[attr-defined] + dtype_str = str(obj.dtype) # type: ignore[attr-defined] + format_name = getattr(obj, "format", "sparse") + + return FormattedOutput( + type_name=f"{format_name}_matrix ({shape_str}) {dtype_str} · on disk", + css_class=CSS_DTYPE_SPARSE, + tooltip="Backed sparse matrix (data stays on disk)", + is_serializable=True, + ) + + +class DataFrameFormatter(TypeFormatter[pd.DataFrame]): + """Formatter for pandas.DataFrame. + + Shows column names in the meta column. Can optionally show full DataFrame + as expandable content via pandas ``_repr_html_()`` - controlled by setting + ``anndata.settings.repr_html_dataframe_expand`` (default: False). + + When expanded, uses the rich Jupyter-style output from pandas. 
Configure + the display with pandas options:: + + pd.set_option("display.max_rows", 10) + pd.set_option("display.max_columns", 5) + """ + + priority = 100 + + def can_format( + self, obj: object, context: FormatterContext + ) -> TypeGuard[pd.DataFrame]: + return isinstance(obj, pd.DataFrame) + + def format(self, obj: pd.DataFrame, context: FormatterContext) -> FormattedOutput: + df = obj + n_rows, n_cols = len(df), len(df.columns) + cols = list(df.columns) + + # Build preview_html with column list for obsm/varm sections + # Uses anndata-columns class for CSS truncation and JS wrap button + preview_html = None + if n_cols > 0 and context.section in ("obsm", "varm"): + col_str = ", ".join(escape_html(str(c)) for c in cols) + preview_html = f'[{col_str}]' + + # Check if expandable _repr_html_ is enabled + expand_dataframes = get_setting("repr_html_dataframe_expand", default=False) + + expanded_html = None + if expand_dataframes and n_rows > 0 and n_cols > 0: + # Use pandas _repr_html_() for native Jupyter-style output + # Respects pd.options.display settings (max_rows, max_columns, etc.) + # Intentional broad catch: _repr_html_() can fail in many ways + # (memory, recursion, custom dtypes, etc.) 
- gracefully degrade + with contextlib.suppress(Exception): + expanded_html = df._repr_html_() + + return FormattedOutput( + type_name=f"DataFrame ({format_number(n_rows)} × {format_number(n_cols)})", + css_class=CSS_DTYPE_DATAFRAME, + expanded_html=expanded_html, + preview_html=preview_html, + is_serializable=True, + ) + + +class SeriesFormatter(TypeFormatter[pd.Series]): + """Formatter for pandas.Series.""" + + priority = 100 + + def can_format( + self, obj: object, context: FormatterContext + ) -> TypeGuard[pd.Series]: + return isinstance(obj, pd.Series) and not isinstance( + obj.dtype, pd.CategoricalDtype + ) + + def format(self, obj: pd.Series, context: FormatterContext) -> FormattedOutput: + series = obj + dtype_str = str(series.dtype) + css_class = _get_dtype_css_class(series.dtype) + + # Check serializability using the IO registry (future-proof) + is_serial = True + warnings = [] + + # For non-object dtypes, check if the backing array has a registered writer + # This is future-proof: if anndata adds datetime64 support, this will detect it + if series.dtype != np.dtype("object"): + is_serial, reason = _check_series_backing_array(series) + if not is_serial: + warnings.append(reason) + + # Object dtype columns need value-level checking + elif len(series) > 0: + is_serial, reason = _check_series_serializability(series) + if not is_serial: + warnings.append(reason) + + # Compute unique count for preview column (only for obs/var sections) + preview = None + n_unique = None + if context.section in ("obs", "var"): + if context.unique_limit > 0 and len(series) <= context.unique_limit: + # nunique() fails on unhashable types (e.g., lists/dicts) + with contextlib.suppress(TypeError): + n_unique = series.nunique() + if n_unique is not None: + preview = f"({n_unique} unique)" + + # Check for string->category conversion warning + should_warn, warn_msg = should_warn_string_column(series, n_unique) + if should_warn: + warnings.append(warn_msg) + + return FormattedOutput( + 
type_name=f"{dtype_str}", + css_class=css_class, + preview=preview, + is_serializable=is_serial, + warnings=warnings, + ) + + +class CategoricalFormatter(TypeFormatter[pd.Categorical | pd.Series]): + """Formatter for pandas.Categorical, categorical Series, and xarray DataArrays.""" + + priority = 110 + + def can_format( + self, obj: object, context: FormatterContext + ) -> TypeGuard[pd.Categorical | pd.Series]: + # pandas Categorical + if isinstance(obj, pd.Categorical): + return True + # pandas Series with categorical dtype + if isinstance(obj, pd.Series) and hasattr(obj, "cat"): + return True + # Check for lazy categorical (CategoricalArray) without accessing dtype + # which would trigger loading + try: + from anndata.experimental.backed._lazy_arrays import CategoricalArray + + if hasattr(obj, "variable") and hasattr(obj.variable, "_data"): + lazy_indexed = obj.variable._data + if hasattr(lazy_indexed, "array") and isinstance( + lazy_indexed.array, CategoricalArray + ): + return True + except ImportError: + pass + # Fallback: xarray DataArray with categorical dtype (will load data) + return ( + hasattr(obj, "dtype") + and isinstance(obj.dtype, pd.CategoricalDtype) + and not isinstance(obj, pd.Series | pd.Categorical) + ) + + def format( # noqa: PLR0912 + self, obj: pd.Categorical | pd.Series, context: FormatterContext + ) -> FormattedOutput: + # Determine if this is a lazy (xarray DataArray) categorical + is_lazy = is_lazy_column(obj) + n_categories = 0 + + # Get number of categories based on object type + if isinstance(obj, pd.Series): + n_categories = len(obj.cat.categories) + elif isinstance(obj, pd.Categorical): + n_categories = len(obj.categories) + else: + # Try to get info from lazy categorical without loading + lazy_count, _lazy_ordered = get_lazy_categorical_info(obj) + if lazy_count is not None: + n_categories = lazy_count + elif hasattr(obj, "dtype") and hasattr(obj.dtype, "categories"): + # Fallback: access dtype.categories (will load data) + 
n_categories = len(obj.dtype.categories) + + # Format type name - indicate lazy for xarray DataArrays + type_name = ( + f"category ({n_categories}, lazy)" + if is_lazy + else f"category ({n_categories})" + ) + + # Build preview_html with category list and colors + preview_html = None + error = None + if context.section in ("obs", "var") and context.key is not None: + try: + # Get categories (respecting lazy loading limits) + categories, was_truncated, n_total = get_categories_for_display( + obj, context, is_lazy=is_lazy + ) + + if len(categories) == 0: + # Metadata-only mode or no categories: show just count + if n_total is not None: + preview_html = f'({n_total} categories)' + else: + preview_html = ( + f'(categories)' + ) + else: + # Get colors for categories + colors = None + if context.adata_ref is not None: + max_cats = context.max_categories + n_cats_to_show = min(len(categories), max_cats) + # For lazy with truncation, only load colors we need + color_limit = ( + n_cats_to_show if (is_lazy and was_truncated) else None + ) + colors = get_matching_column_colors( + context.adata_ref, # type: ignore[arg-type] + context.key, + limit=color_limit, + ) + + # Render category list with colors + n_hidden = ( + (n_total - len(categories)) + if (n_total and was_truncated) + else 0 + ) + preview_html = render_category_list( + categories, colors, context.max_categories, n_hidden=n_hidden + ) + except Exception as e: # noqa: BLE001 + # Never let preview generation crash the repr + # Set error field - renderer will escape and display it + error = f"error: {type(e).__name__}" + + # Check for color warnings + warnings = [] + if context.adata_ref is not None and context.key is not None: + # Check for color count mismatch + if n_categories > 0: + color_warning = check_color_category_mismatch( + context.adata_ref, # type: ignore[arg-type] + context.key, + n_categories, + ) + if color_warning: + warnings.append(color_warning) + # Check for invalid/unsafe colors (limited to 
previewed categories) + invalid_warning = check_invalid_colors( + context.adata_ref, # type: ignore[arg-type] + context.key, + limit=context.max_categories, + n_total=n_categories, + ) + if invalid_warning: + warnings.append(invalid_warning) + + return FormattedOutput( + type_name=type_name, + css_class=CSS_DTYPE_CATEGORY, + preview_html=preview_html, + is_serializable=True, + warnings=warnings, + error=error, + ) + + +class LazyColumnFormatter(TypeFormatter[object]): + """ + Formatter for lazy obs/var columns (xarray DataArray) from read_lazy(). + + For lazy AnnData, obs/var columns are xarray DataArrays instead of pandas Series. + This formatter shows the dtype with "(lazy)" indicator, without the shape since + all columns in obs/var have the same length (n_obs or n_var). + + Note: Categorical columns are handled by CategoricalFormatter (higher priority). + """ + + priority = ( + 60 # Higher than ArrayAPIFormatter (50), lower than CategoricalFormatter (110) + ) + sections = ("obs", "var") # Only apply to obs/var sections + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + # xarray DataArray (categoricals already handled by higher-priority CategoricalFormatter) + if not ( + hasattr(obj, "dtype") and hasattr(obj, "shape") and hasattr(obj, "ndim") + ): + return False + + # Exclude already-handled types + if isinstance(obj, np.ndarray | pd.DataFrame | pd.Series | pd.Categorical): + return False + + # Check if it looks like an xarray DataArray (has .data attribute) + # This is a good heuristic for lazy obs/var columns + return hasattr(obj, "data") + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: + # Duck-typed: can_format() guarantees dtype attr (xarray DataArray) + dtype_str = str(obj.dtype) # type: ignore[attr-defined] + dtype_lower = dtype_str.lower() + + # Map common dtypes to CSS classes + if "int" in dtype_lower: + css_class = CSS_DTYPE_INT + elif "float" in dtype_lower: + css_class = 
CSS_DTYPE_FLOAT + elif "bool" in dtype_lower: + css_class = CSS_DTYPE_BOOL + elif "str" in dtype_lower or dtype_lower == "object": + css_class = CSS_DTYPE_STRING + else: + css_class = CSS_DTYPE_UNKNOWN + + # For lazy non-categorical columns, we can't compute unique count + # without loading data, so we just indicate it's lazy + preview = None + if context.section in ("obs", "var"): + preview = "(lazy)" + + return FormattedOutput( + type_name=f"{dtype_str} (lazy)", + css_class=css_class, + preview=preview, + is_serializable=True, + ) + + +class DaskArrayFormatter(TypeFormatter[object]): + """Formatter for dask.array.Array.""" + + priority = 120 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + try: + import dask.array as da + + return isinstance(obj, da.Array) + except ImportError: + return False + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: + # Duck-typed: can_format() verifies isinstance(obj, dask.array.Array) + dtype_str = str(obj.dtype) # type: ignore[attr-defined] + + # Get chunk info + chunks_str = str(obj.chunksize) if hasattr(obj, "chunksize") else "unknown" # type: ignore[attr-defined] + + # In obsm/varm/obsp/varp sections, don't show shape in type (redundant) + # - obsp/varp: always n_obs × n_obs or n_var × n_var + # - obsm/varm: preview column shows number of columns + if context.section in ("obsm", "varm", "obsp", "varp"): + type_name = f"dask.array {dtype_str} · chunks={chunks_str}" + else: + shape_str = " × ".join(format_number(s) for s in obj.shape) # type: ignore[attr-defined] + type_name = f"dask.array ({shape_str}) {dtype_str} · chunks={chunks_str}" + + # For obsm/varm sections, show number of columns in preview + preview = None + if context.section in ("obsm", "varm") and len(obj.shape) == 2: # type: ignore[attr-defined] + n_cols = obj.shape[1] # type: ignore[attr-defined] + preview = f"({format_number(n_cols)} columns)" + + return FormattedOutput( + type_name=type_name, + 
css_class=CSS_DTYPE_DASK, + tooltip=f"{obj.npartitions} partitions", # type: ignore[attr-defined] + preview=preview, + is_serializable=True, + ) + + +class AwkwardArrayFormatter(TypeFormatter[object]): + """Formatter for awkward.Array (ragged/jagged arrays).""" + + priority = 120 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + return type(obj).__module__.startswith("awkward") + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: + # Duck-typed: can_format() verifies module starts with "awkward" + length: int | None = None + try: + length = len(obj) # type: ignore[arg-type] + type_str = str(obj.type) if hasattr(obj, "type") else "unknown" + except Exception: # noqa: BLE001 + # Intentional broad catch: awkward arrays can fail on len/type access + # in edge cases (lazy evaluation, corrupt data) - show placeholder + type_str = "unknown" + + length_str = str(length) if length is not None else "?" + return FormattedOutput( + type_name=f"awkward.Array ({length_str} records)", + css_class=CSS_DTYPE_AWKWARD, + tooltip=f"Type: {type_str}", + is_serializable=True, + ) + + +class ArrayAPIFormatter(TypeFormatter[object]): + """ + Formatter for Array-API compatible arrays (JAX, CuPy, PyTorch, TensorFlow, etc.). + + Detection strategy (two tiers): + + 1. :func:`~anndata.compat.has_xp` — canonical check for arrays implementing the + `Array API standard `_ (e.g. JAX, CuPy). + 2. Duck-typing fallback — catches arrays with ``shape``/``dtype``/``ndim`` that do + not (yet) implement the full protocol (e.g. PyTorch tensors, TensorFlow tensors). + + CuPy arrays (≥12) implement the full Array API protocol, so they are handled + here. CuPy's device object exposes ``.id``; the formatter renders it as + ``GPU:{device.id}`` for a clean label. + + Low priority (50) ensures specific formatters (numpy, dask, etc.) + are tried first. 
+ """ + + priority = 50 # Lower than specific formatters (numpy=110); builtins also use 50 but are registered after this formatter + + _FRIENDLY_NAMES: ClassVar[dict[str, str]] = { + "jax": "JAX", + "jaxlib": "JAX", + "torch": "PyTorch", + "tensorflow": "TensorFlow", + "tf": "TensorFlow", + "mxnet": "MXNet", + "cupy": "CuPy", + } + + # Modules already handled by dedicated formatters (defensive guard; + # the priority system normally prevents reaching this formatter). + _HANDLED_MODULES: ClassVar[tuple[str, ...]] = ( + "numpy", + "pandas", + "scipy.sparse", + "awkward", + "dask", + ) + + @staticmethod + def _device_css_class(device_str: str) -> str: + """Map device string to a CSS class: GPU (green), TPU (teal), CPU/other (amber).""" + lower = device_str.lower() + if "cuda" in lower or "gpu" in lower: + return CSS_DTYPE_GPU + if "tpu" in lower: + return CSS_DTYPE_TPU + return CSS_DTYPE_ARRAY_API + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + # Tier 1: full Array API protocol (JAX, CuPy ≥12, numpy ≥2.0, …) + if has_xp(obj): + # numpy has its own formatter + return not isinstance(obj, np.ndarray) + + # Tier 2: duck-typing for arrays that expose shape/dtype/ndim + # but don't implement the full protocol (PyTorch, TensorFlow, …) + if not ( + hasattr(obj, "shape") and hasattr(obj, "dtype") and hasattr(obj, "ndim") + ): + return False + + # Exclude types that have dedicated formatters + module = type(obj).__module__ + return not isinstance(obj, np.ndarray) and not module.startswith( + self._HANDLED_MODULES + ) + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: + # Duck-typed: can_format() guarantees shape/dtype/ndim attrs + type_name = type(obj).__name__ + shape_str = " × ".join(format_number(s) for s in obj.shape) # type: ignore[attr-defined] + dtype_str = str(obj.dtype) # type: ignore[attr-defined] + + # Derive backend label: prefer __array_namespace__, fall back to module name + backend_label = type(obj).__module__.split(".")[0] +
with contextlib.suppress(Exception): + xp = obj.__array_namespace__() # type: ignore[attr-defined] + ns_name = getattr(xp, "__name__", "") or type(xp).__module__ + backend_label = ns_name.split(".")[0] + backend_label = self._FRIENDLY_NAMES.get(backend_label, backend_label) + + # Device info (present on array-api arrays; also on PyTorch/CuPy) + # CuPy's device is a cupy.cuda.Device object — use GPU:{id} for a clean label + device_str = "" + with contextlib.suppress(Exception): + if hasattr(obj.device, "id"): # type: ignore[attr-defined] + device_str = f"GPU:{obj.device.id}" # type: ignore[attr-defined] + else: + device_str = str(obj.device) # type: ignore[attr-defined] + + # Color by device type: GPU (green), TPU (teal), CPU/other (amber) + css_class = self._device_css_class(device_str) + + # Surface device in type_name so it's visible without hovering + if device_str: + type_display = f"{type_name} ({shape_str}) {dtype_str} · {device_str}" + else: + type_display = f"{type_name} ({shape_str}) {dtype_str}" + + # For obsm/varm sections, show number of columns in preview + preview = None + if context.section in ("obsm", "varm") and obj.ndim == 2: # type: ignore[attr-defined] + n_cols = obj.shape[1] # type: ignore[attr-defined] + preview = f"({format_number(n_cols)} columns)" + + return FormattedOutput( + type_name=type_display, + css_class=css_class, + tooltip=f"{backend_label} array", + preview=preview, + is_serializable=True, + ) + + +class AnnDataFormatter(TypeFormatter[object]): + """Formatter for nested AnnData objects.""" + + priority = 150 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + # Check by class name to avoid circular imports + return type(obj).__name__ == "AnnData" and hasattr(obj, "n_obs") + + def format(self, obj: object, context: FormatterContext) -> FormattedOutput: + # Duck-typed: can_format() checks class name + n_obs attr (avoids circular import) + shape_str = f"{format_number(obj.n_obs)} × 
{format_number(obj.n_vars)}" # type: ignore[attr-defined] + + # Generate expanded HTML if within depth limit + expanded_html = None + if context.depth < context.max_depth - 1: + # Lazy import to avoid circular dependency + from .html import generate_repr_html + + nested_html = generate_repr_html( + obj, # type: ignore[arg-type] + depth=context.depth + 1, + max_depth=context.max_depth, + show_header=True, + show_search=False, + ) + expanded_html = f'
{nested_html}
' + + return FormattedOutput( + type_name=f"AnnData ({shape_str})", + css_class=CSS_DTYPE_ANNDATA, + tooltip="Nested AnnData object", + expanded_html=expanded_html, + is_serializable=True, + ) + + +class NoneFormatter(TypeFormatter[None]): + """Formatter for None.""" + + priority = 50 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[None]: + return obj is None + + def format(self, obj: None, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name="NoneType", + css_class=CSS_DTYPE_OBJECT, + preview="None", + is_serializable=True, + ) + + +class BoolFormatter(TypeFormatter[bool]): + """Formatter for bool.""" + + priority = 50 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[bool]: + return isinstance(obj, bool) + + def format(self, obj: bool, context: FormatterContext) -> FormattedOutput: # noqa: FBT001 + return FormattedOutput( + type_name="bool", + css_class=CSS_DTYPE_BOOL, + preview=preview_number(obj), + is_serializable=True, + ) + + +class IntFormatter(TypeFormatter[int]): + """Formatter for int.""" + + priority = 50 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[int]: + return isinstance(obj, (int, np.integer)) and not isinstance(obj, bool) + + def format(self, obj: int, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name="int", + css_class=CSS_DTYPE_INT, + preview=preview_number(obj), + is_serializable=True, + ) + + +class FloatFormatter(TypeFormatter[float]): + """Formatter for float.""" + + priority = 50 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[float]: + return isinstance(obj, (float, np.floating)) + + def format(self, obj: float, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name="float", + css_class=CSS_DTYPE_FLOAT, + preview=preview_number(obj), + is_serializable=True, + ) + + +class StringFormatter(TypeFormatter[str]): + """Formatter 
for str.""" + + priority = 50 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[str]: + return isinstance(obj, str) + + def format(self, obj: str, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name="str", + css_class=CSS_DTYPE_STRING, + preview=preview_string(obj, context.max_string_length), + is_serializable=True, + ) + + +class DictFormatter(TypeFormatter[dict]): + """Formatter for dict.""" + + priority = 50 + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[dict]: + return isinstance(obj, dict) + + def format(self, obj: dict, context: FormatterContext) -> FormattedOutput: + # Check serializability of contents + is_serial, reason = is_serializable(obj) + warnings = [] if is_serial else [reason] + + return FormattedOutput( + type_name="dict", + css_class=CSS_DTYPE_OBJECT, + preview=preview_dict(obj), + is_serializable=is_serial, + warnings=warnings, + ) + + +class ColorListFormatter(TypeFormatter[list]): + """Formatter for color lists (uns entries ending in _colors).""" + + priority = 60 # Higher than ListFormatter to check first + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[list]: + """Check if this is a color list based on key name and value.""" + key = context.key + return key is not None and is_color_list(key, obj) + + def format(self, obj: list, context: FormatterContext) -> FormattedOutput: + colors = obj + n_colors = len(colors) + + # Build color swatch HTML with sanitized colors, counting invalid ones + swatches = [] + invalid_count = 0 + for color in colors[:COLOR_PREVIEW_LIMIT]: + # Sanitize color to prevent CSS injection + safe_color = sanitize_css_color(str(color)) + if safe_color: + swatches.append( + f'' + ) + else: + # Invalid/unsafe color - show as text only, no style + invalid_count += 1 + swatches.append( + f'?""" + ) + if n_colors > COLOR_PREVIEW_LIMIT: + swatches.append( + f'+{n_colors - COLOR_PREVIEW_LIMIT}' + ) + + 
preview_html = f'{"".join(swatches)}' + + # Build warnings list (only for colors within preview limit) + warnings = [] + if invalid_count > 0: + has_more = n_colors > COLOR_PREVIEW_LIMIT + warnings.append( + format_invalid_colors_warning(invalid_count, has_more=has_more) + ) + + return FormattedOutput( + type_name=f"colors ({n_colors})", + css_class=CSS_DTYPE_OBJECT, + preview_html=preview_html, + is_serializable=True, + warnings=warnings, + ) + + +class ListFormatter(TypeFormatter[list | tuple]): + """Formatter for list and tuple.""" + + priority = 50 + + def can_format( + self, obj: object, context: FormatterContext + ) -> TypeGuard[list | tuple]: + return isinstance(obj, (list, tuple)) + + def format(self, obj: list | tuple, context: FormatterContext) -> FormattedOutput: + type_name = "list" if isinstance(obj, list) else "tuple" + + # Check serializability + is_serial, reason = is_serializable(obj) + warnings = [] if is_serial else [reason] + + return FormattedOutput( + type_name=type_name, + css_class=CSS_DTYPE_OBJECT, + preview=preview_sequence(obj), + is_serializable=is_serial, + warnings=warnings, + ) + + +def _get_dtype_css_class(dtype: np.dtype | pd.api.types.CategoricalDtype) -> str: # noqa: PLR0911 + """Get CSS class for a numpy or pandas dtype.""" + # Check for pandas CategoricalDtype first (has kind="O" but is special) + dtype_name = str(dtype) + if dtype_name == "category": + return CSS_DTYPE_CATEGORY + + # Try numpy dtype.kind (most reliable for standard dtypes) + kind = getattr(dtype, "kind", None) + if kind is not None: + if kind in ("i", "u"): + return CSS_DTYPE_INT + if kind == "f": + return CSS_DTYPE_FLOAT + if kind == "b": + return CSS_DTYPE_BOOL + if kind in ("U", "S", "O"): + return CSS_DTYPE_STRING + if kind == "c": + return CSS_DTYPE_FLOAT # complex + + # Fallback to string matching for pandas extension dtypes + if "int" in dtype_name: + return CSS_DTYPE_INT + if "float" in dtype_name: + return CSS_DTYPE_FLOAT + if "bool" in dtype_name: + 
return CSS_DTYPE_BOOL + if "object" in dtype_name or "string" in dtype_name: + return CSS_DTYPE_STRING + return CSS_DTYPE_OBJECT + + +def _register_builtin_formatters() -> None: + """Register all built-in formatters with the global registry.""" + formatters: list[TypeFormatter] = [ + # High priority (specific types) + AnnDataFormatter(), + DaskArrayFormatter(), + AwkwardArrayFormatter(), + NumpyMaskedArrayFormatter(), + CategoricalFormatter(), + BackedSparseDatasetFormatter(), # Before SparseMatrixFormatter (backed sparse) + # Medium priority + NumpyArrayFormatter(), + SparseMatrixFormatter(), + DataFrameFormatter(), + SeriesFormatter(), + # Lazy obs/var columns (xarray DataArray) - must come before ArrayAPIFormatter + LazyColumnFormatter(), + # Low-medium priority (Array-API compatible arrays) + # Must come after specific array formatters (numpy, dask, etc.) but before builtins + # Handles JAX, CuPy, PyTorch, TensorFlow arrays (PR #2071 added array-api support) + ArrayAPIFormatter(), + # Low priority (builtins) + NoneFormatter(), + BoolFormatter(), + IntFormatter(), + FloatFormatter(), + StringFormatter(), + DictFormatter(), + ColorListFormatter(), # Before ListFormatter (higher priority for *_colors keys) + ListFormatter(), + ] + + for formatter in formatters: + formatter_registry.register_type_formatter(formatter) + + +# Auto-register on import +_register_builtin_formatters() diff --git a/src/anndata/_repr/html.py b/src/anndata/_repr/html.py new file mode 100644 index 000000000..6dc30c4f3 --- /dev/null +++ b/src/anndata/_repr/html.py @@ -0,0 +1,650 @@ +""" +Main HTML generator for AnnData representation. + +This module generates the complete HTML representation by: +1. Building the header with badges +2. Rendering the search box +3. Generating metadata (version, memory) +4. Rendering each section (X, obs, var, uns, etc.) +5. 
Handling nested objects recursively +""" + +from __future__ import annotations + +import uuid +from typing import TYPE_CHECKING + +from .._repr_constants import ( + CSS_BADGE_BACKED, + CSS_BADGE_EXTENSION, + CSS_BADGE_LAZY, + CSS_BADGE_VIEW, + DEFAULT_MAX_README_SIZE, + TOOLTIP_TRUNCATE_LENGTH, +) +from .._types import AnnDataElem +from ..utils import get_literal_members +from . import ( + DEFAULT_FOLD_THRESHOLD, + DEFAULT_MAX_CATEGORIES, + DEFAULT_MAX_DEPTH, + DEFAULT_MAX_FIELD_WIDTH, + DEFAULT_MAX_ITEMS, + DEFAULT_MAX_LAZY_CATEGORIES, + DEFAULT_MAX_STRING_LENGTH, + DEFAULT_PREVIEW_ITEMS, + DEFAULT_TYPE_WIDTH, + DEFAULT_UNIQUE_LIMIT, +) +from .components import ( + render_badge, + render_search_box, +) +from .core import ( + render_formatted_entry, + render_section, + render_truncation_indicator, + render_x_entry, +) +from .css import get_css +from .javascript import get_javascript +from .lazy import get_lazy_backing_info, is_lazy_adata +from .registry import ( + FormatterContext, + formatter_registry, +) +from .sections import ( + _detect_unknown_sections, + _render_dataframe_section, + _render_error_entry, + _render_mapping_section, + _render_raw_section, + _render_unknown_sections, + _render_uns_section, +) +from .utils import ( + escape_html, + format_index_preview, + format_memory_size, + format_number, + get_anndata_version, + get_backing_info, + get_setting, + is_backed, + is_view, +) + +if TYPE_CHECKING: + from anndata import AnnData + + from .registry import SectionFormatter + +# Import formatters to register them (side-effect import) +from .._repr_constants import ( + CHAR_WIDTH_PX, + COPY_BUTTON_PADDING_PX, + DEFAULT_FIELD_WIDTH_PX, + MIN_FIELD_WIDTH_PX, +) +from . import formatters as _formatters # noqa: F401 + + +def _collect_all_field_names(adata: AnnData) -> list[str]: + """ + Collect all field names from standard and custom sections. 
+ + Returns field names from obs/var columns and keys from mapping sections + (uns, obsm, varm, layers, obsp, varp) plus any registered custom sections. + """ + all_names: list[str] = [] + standard_sections = set(get_literal_members(AnnDataElem)) + + for section in get_literal_members(AnnDataElem): + if section in {"X", "raw"}: + continue + try: + attr = getattr(adata, section) + if attr is None: + continue + if section in {"obs", "var"}: + if hasattr(attr, "columns"): + all_names.extend(attr.columns.tolist()) + elif hasattr(attr, "keys"): + all_names.extend(attr.keys()) + except Exception: # noqa: BLE001 + # Broken section — skip for width calculation, error placeholder is + # rendered separately by _render_section. + pass + + # Registered custom sections (e.g., TreeData's obst/vart) + for section_name in formatter_registry.get_registered_sections(): + if section_name in standard_sections: + continue + try: + attr = getattr(adata, section_name, None) + if attr is not None and hasattr(attr, "keys"): + all_names.extend(attr.keys()) + except Exception: # noqa: BLE001 + pass + + return all_names + + +def _calculate_field_name_width(adata: AnnData, max_width: int) -> int: + """ + Calculate the optimal field name column width based on longest field name. + + Uses _collect_all_field_names() to gather names from all sections, + then converts the longest name to a pixel width (up to max_width). + + Uses constants from _repr_constants.py tuned for the default 13px monospace font. 
+ """ + all_names = _collect_all_field_names(adata) + + if not all_names: + return DEFAULT_FIELD_WIDTH_PX + + # Find longest name and convert to pixels + max_len = max(len(str(name)) for name in all_names) + width_px = (max_len * CHAR_WIDTH_PX) + COPY_BUTTON_PADDING_PX + + # Clamp to reasonable range (max_width from user setting always wins) + return min(max(MIN_FIELD_WIDTH_PX, width_px), max_width) + + +def _resolve_setting(override: int | None, setting_name: str, default: int) -> int: + """Resolve a setting value with priority: explicit override > anndata.settings > default. + + Parameters + ---------- + override + Explicit value passed to generate_repr_html (highest priority) + setting_name + Name of the anndata.settings attribute to check + default + Fallback default value (lowest priority) + """ + if override is not None: + return override + return get_setting(setting_name, default=default) + + +def _create_formatter_context( + adata: AnnData, + *, + depth: int = 0, + max_depth: int | None = None, + fold_threshold: int | None = None, + max_items: int | None = None, + max_lazy_categories: int | None = None, +) -> FormatterContext: + """Create a FormatterContext with settings resolution. + + Parameters with function overrides use _resolve_setting() (override > settings > default). + Settings-only parameters use get_setting() directly (settings > default). 
+ """ + return FormatterContext( + depth=depth, + # Overridable parameters (passed to generate_repr_html) + max_depth=_resolve_setting(max_depth, "repr_html_max_depth", DEFAULT_MAX_DEPTH), + fold_threshold=_resolve_setting( + fold_threshold, "repr_html_fold_threshold", DEFAULT_FOLD_THRESHOLD + ), + max_items=_resolve_setting(max_items, "repr_html_max_items", DEFAULT_MAX_ITEMS), + max_lazy_categories=_resolve_setting( + max_lazy_categories, + "repr_html_max_lazy_categories", + DEFAULT_MAX_LAZY_CATEGORIES, + ), + # Settings-only parameters (not overridable at call time) + max_categories=get_setting( + "repr_html_max_categories", default=DEFAULT_MAX_CATEGORIES + ), + max_string_length=get_setting( + "repr_html_max_string_length", default=DEFAULT_MAX_STRING_LENGTH + ), + unique_limit=get_setting( + "repr_html_unique_limit", default=DEFAULT_UNIQUE_LIMIT + ), + adata_ref=adata, + ) + + +def generate_repr_html( # noqa: PLR0913 + adata: AnnData, + *, + depth: int = 0, + max_depth: int | None = None, + fold_threshold: int | None = None, + max_items: int | None = None, + max_lazy_categories: int | None = None, + show_header: bool = True, + show_search: bool = True, + _container_id: str | None = None, +) -> str: + """ + Generate HTML representation for an AnnData object. + + Parameters + ---------- + adata + The AnnData object to represent + depth + Current recursion depth (for nested AnnData in .uns) + max_depth + Maximum recursion depth. Uses settings/default if None. + fold_threshold + Auto-fold sections with more entries than this. Uses settings/default if None. + max_items + Maximum items to show per section. Uses settings/default if None. + max_lazy_categories + Maximum categories to load for lazy categoricals. Set to 0 to disable + loading categories entirely (metadata-only mode). Uses settings/default if None. 
+ show_header + Whether to show the header (for nested display) + show_search + Whether to show the search box (only at top level) + _container_id + Internal: container ID for scoping + + Returns + ------- + HTML string + """ + # Check if HTML repr is enabled + if not get_setting("repr_html_enabled", default=True): + return f"
{escape_html(repr(adata))}
" + + # Create formatter context (resolves settings) + context = _create_formatter_context( + adata, + depth=depth, + max_depth=max_depth, + fold_threshold=fold_threshold, + max_items=max_items, + max_lazy_categories=max_lazy_categories, + ) + + # Check max depth + if depth >= context.max_depth: + return _render_max_depth_indicator(adata) + + # Generate unique container ID + container_id = _container_id or f"anndata-repr-{uuid.uuid4().hex[:8]}" + + # Build HTML parts + parts = [] + + # CSS and JS only at top level + if depth == 0: + parts.append(get_css()) + + # Calculate field name column width based on content + max_field_width = get_setting( + "repr_html_max_field_width", default=DEFAULT_MAX_FIELD_WIDTH + ) + field_width = _calculate_field_name_width(adata, max_field_width) + + # Get type column width from settings + type_width = get_setting("repr_html_type_width", default=DEFAULT_TYPE_WIDTH) + + # Container with computed column widths as CSS variables. + # Inline font-family:monospace provides readable fallback when CSS is stripped + # (GitHub, untrusted notebooks). CSS overrides with its own font stack. + # Inline min-width on cells + CSS custom properties give column alignment + # even without a stylesheet. + style = f"font-family: monospace; --anndata-name-col-width: {field_width}px; --anndata-type-col-width: {type_width}px;" + parts.append( + f'
' + ) + + # Header (with search box integrated on the right) + if show_header: + parts.append( + _render_header( + adata, show_search=show_search and depth == 0, container_id=container_id + ) + ) + + # Index preview (only at top level) + if depth == 0: + parts.append(_render_index_preview(adata)) + + # Sections container + parts.append('
') + parts.extend(_render_all_sections(adata, context)) + parts.append("
") # anndata-repr__sections + + # Footer with metadata (only at top level) + if depth == 0: + parts.append(_render_footer(adata)) + # Degradation hints: visible only when CSS or JS is missing. + # No-CSS hint: visible by default, hidden by CSS. + parts.append( + '
' + "Styled representation available in Jupyter and trusted notebooks " + "(colors, search, type highlighting)." + "
" + ) + # No-JS hint: hidden by default (no-CSS case already has its own hint), + # shown by CSS (for static HTML with styles but no JS), + # hidden again by JS on init. + parts.append( + '" + ) + + parts.append("
") # anndata-repr + + # JavaScript (only at top level) + if depth == 0: + parts.append(get_javascript(container_id)) + + return "\n".join(parts) + + +def _render_all_sections( + adata: AnnData, + context: FormatterContext, +) -> list[str]: + """Render all standard and custom sections.""" + parts: list[str] = [] + custom_sections_after = _get_custom_sections_by_position(adata) + + for section in get_literal_members(AnnDataElem): + parts.append(_render_section(adata, section, context)) + + # Render custom sections after this section + if section in custom_sections_after: + parts.extend( + _render_custom_section(adata, section_formatter, context) + for section_formatter in custom_sections_after[section] + ) + + # Custom sections at end (no specific position) + if None in custom_sections_after: + parts.extend( + _render_custom_section(adata, section_formatter, context) + for section_formatter in custom_sections_after[None] + ) + + # Detect and show unknown sections (attributes not in AnnDataElem) + unknown_sections = _detect_unknown_sections(adata) + if unknown_sections: + parts.append(_render_unknown_sections(unknown_sections)) + + return parts + + +def _render_section( + adata: AnnData, + section: str, + context: FormatterContext, +) -> str: + """Render a single standard section. + + Attribute access happens inside the try/except so a broken section (one + whose ``getattr`` raises — e.g. a corrupt aligned mapping or a subclass + with a crashing property) renders as an error placeholder instead of + aborting the whole repr. This is why we iterate section names directly + via ``get_literal_members(AnnDataElem)`` rather than delegating to + ``iter_outer``, which propagates the first exception it hits. 
+ """ + try: + if section == "X": + return render_x_entry(adata, context) + elem = getattr(adata, section) + if section == "raw": + return _render_raw_section(elem, context) + if section in ("obs", "var"): + return _render_dataframe_section(section, elem, context) + if section == "uns": + return _render_uns_section(elem, context) + return _render_mapping_section(section, elem, context) + except Exception as e: # noqa: BLE001 + # Show error instead of hiding the section + return _render_error_entry(section, f"{type(e).__name__}: {e}") + + +def _get_custom_sections_by_position( + adata: object, +) -> dict[str | None, list[SectionFormatter]]: + """ + Get registered custom section formatters grouped by their position. + + Returns a dict mapping after_section -> list of formatters. + None key contains formatters that should appear at the end. + """ + from collections import defaultdict + + result = defaultdict(list) + standard_section_names = set(get_literal_members(AnnDataElem)) + + for section_name in formatter_registry.get_registered_sections(): + formatter = formatter_registry.get_section_formatter(section_name) + if formatter is None: + continue + + # Skip standard sections (they're handled separately) + if section_name in standard_section_names: + continue + + # Check if this section should be shown for this object + try: + if not formatter.should_show(adata): + continue + except Exception: # noqa: BLE001 + # Intentional broad catch: custom formatters shouldn't break the repr + continue + + # Group by position + after = getattr(formatter, "after_section", None) + result[after].append(formatter) + + return dict(result) + + +def _render_custom_section( + adata: AnnData, + formatter: SectionFormatter, + context: FormatterContext, +) -> str: + """Render a custom section using its registered formatter. + + If the formatter defines ``render_html(obj, context)``, it is tried + first and the result is used as-is (no ``
`` wrapping). + If ``render_html`` fails, falls back to the standard ``get_entries`` + path so formatters can provide both an enhanced and a safe representation. + """ + # Allow formatters to produce raw HTML (e.g., compact inline rows) + if hasattr(formatter, "render_html"): + try: + return formatter.render_html(adata, context) + except Exception as e: # noqa: BLE001 + from .._warnings import warn + + warn( + f"Custom section formatter '{formatter.section_name}' render_html failed, " + f"falling back to get_entries: {e}", + UserWarning, + ) + # Fall through to get_entries below + + try: + entries = formatter.get_entries(adata, context) + except Exception as e: # noqa: BLE001 + # Intentional broad catch: custom formatters shouldn't crash the entire repr + from .._warnings import warn + + warn( + f"Custom section formatter '{formatter.section_name}' failed: {e}", + UserWarning, + ) + return "" + + if not entries: + return "" + + n_items = len(entries) + section_name = formatter.section_name + + # Render entries (with truncation) + rows = [] + for i, entry in enumerate(entries): + if i >= context.max_items: + rows.append(render_truncation_indicator(n_items - context.max_items)) + break + rows.append(render_formatted_entry(entry, section_name)) + + # Use render_section for consistent structure + return render_section( + getattr(formatter, "display_name", section_name), + "\n".join(rows), + n_items=n_items, + doc_url=getattr(formatter, "doc_url", None), + tooltip=getattr(formatter, "tooltip", ""), + should_collapse=n_items > context.fold_threshold, + section_id=section_name, + ) + + +def _render_header( + adata: AnnData, *, show_search: bool = False, container_id: str = "" +) -> str: + """Render the header with type, shape, badges, and optional search box.""" + parts = ['
'] + + # Type name - allow for extension types + type_name = type(adata).__name__ + parts.append(f'{escape_html(type_name)}') + + # Shape + shape_str = f"{format_number(adata.n_obs)} obs × {format_number(adata.n_vars)} vars" + parts.append(f'{shape_str}') + + # Badges - use render_badge() helper + if is_view(adata): + parts.append(render_badge("View", CSS_BADGE_VIEW)) + + if is_backed(adata): + backing = get_backing_info(adata) + filename = backing.get("filename", "") + format_str = backing.get("format", "") + status = "Open" if backing.get("is_open") else "Closed" + parts.append(render_badge(f"{format_str} ({status})", CSS_BADGE_BACKED)) + # Inline file path (full path, no truncation) + if filename: + parts.append( + f'{escape_html(filename)}' + ) + + if is_lazy_adata(adata): + lazy_info = get_lazy_backing_info(adata) + lazy_format = lazy_info.get("format", "") + if lazy_format: + parts.append(render_badge(f"Lazy ({lazy_format})", CSS_BADGE_LAZY)) + else: + parts.append(render_badge("Lazy", CSS_BADGE_LAZY)) + # Show file path for lazy AnnData (similar to backed) + lazy_filename = lazy_info.get("filename", "") + if lazy_filename: + path_style = ( + "font-family:ui-monospace,monospace;font-size:11px;" + "color:var(--anndata-text-secondary, #6c757d);" + ) + parts.append( + f'' + f"{escape_html(lazy_filename)}" + f"" + ) + + # Check for extension type (not standard AnnData) + if type_name != "AnnData": + parts.append(render_badge(type_name, CSS_BADGE_EXTENSION)) + + # README icon if uns["README"] exists with a string + readme_content = adata.uns.get("README") if hasattr(adata, "uns") else None + if isinstance(readme_content, str) and readme_content.strip(): + # Check max README size setting (0 means no limit) + max_readme_size = get_setting( + "repr_html_max_readme_size", default=DEFAULT_MAX_README_SIZE + ) + original_len = len(readme_content) + if max_readme_size > 0 and original_len > max_readme_size: + # Truncate and add note + readme_content = 
readme_content[:max_readme_size] + truncation_note = ( + f"\n\n---\n*README truncated: showing {max_readme_size:,} of " + f"{original_len:,} characters*" + ) + readme_content += truncation_note + + escaped_readme = escape_html(readme_content) + # Truncate for no-JS tooltip (first 500 chars) + tooltip_text = readme_content[:TOOLTIP_TRUNCATE_LENGTH] + if len(readme_content) > TOOLTIP_TRUNCATE_LENGTH: + tooltip_text += "..." + escaped_tooltip = escape_html(tooltip_text) + + parts.append( + f'' + f"ⓘ" + f"" + ) + + # Search box on the right (spacer pushes it right) - use render_search_box() helper + if show_search: + parts.append('') + parts.append(render_search_box(container_id)) + + parts.append("
") + return "\n".join(parts) + + +def _render_footer(adata: AnnData) -> str: + """Render the footer with version and memory info.""" + parts = ['") + return "\n".join(parts) + + +def _render_index_preview(adata: AnnData) -> str: + """Render preview of obs_names and var_names.""" + parts = ['
'] + + # obs_names preview + obs_preview = format_index_preview(adata.obs_names, DEFAULT_PREVIEW_ITEMS) + parts.append(f"
obs_names: {obs_preview}
") + + # var_names preview + var_preview = format_index_preview(adata.var_names, DEFAULT_PREVIEW_ITEMS) + parts.append(f"
var_names: {var_preview}
") + + parts.append("
") + return "\n".join(parts) + + +def _render_max_depth_indicator(adata: AnnData) -> str: + """Render indicator when max depth is reached.""" + n_obs = getattr(adata, "n_obs", "?") + n_vars = getattr(adata, "n_vars", "?") + return f'
AnnData ({format_number(n_obs)} × {format_number(n_vars)}) - max depth reached
' diff --git a/src/anndata/_repr/javascript.py b/src/anndata/_repr/javascript.py new file mode 100644 index 000000000..a86057a38 --- /dev/null +++ b/src/anndata/_repr/javascript.py @@ -0,0 +1,59 @@ +""" +JavaScript for AnnData HTML representation interactivity. + +Provides: +- Section folding/unfolding +- Search/filter functionality across all levels +- Copy to clipboard +- Nested content expansion +- README modal with plain text display + +The JavaScript is loaded from static/repr.js and wrapped in an IIFE +that scopes it to a specific container element. +""" + +from __future__ import annotations + +from functools import cache +from importlib.resources import files + + +@cache +def _load_js_content() -> str: + """Load main JS content from static file (cached).""" + return files("anndata._repr.static").joinpath("repr.js").read_text(encoding="utf-8") + + +def get_javascript(container_id: str) -> str: + """ + Get the JavaScript code for a specific container. + + Each rendered repr ships the full source so that any cell is + self-sufficient (surviving deletion, reorder, or notebook reopen), + but only the first to execute installs ``window.anndataRepr`` — + subsequent cells reuse the installed ``init`` for their own container. + + Parameters + ---------- + container_id + Unique ID for the container element + + Returns + ------- + JavaScript code wrapped in script tags + """ + js_content = _load_js_content() + return f"""""" diff --git a/src/anndata/_repr/lazy.py b/src/anndata/_repr/lazy.py new file mode 100644 index 000000000..f79b6b686 --- /dev/null +++ b/src/anndata/_repr/lazy.py @@ -0,0 +1,346 @@ +""" +Lazy loading utilities for AnnData HTML representation. + +This module consolidates all logic related to detecting and handling lazy AnnData +objects (from read_lazy()). Lazy AnnData uses xarray-backed storage and requires +special handling to avoid triggering data loading during repr generation. 
+ +Key concepts: +- Lazy AnnData: Created by read_lazy(), obs/var are Dataset2D (xarray-backed) +- Lazy series: Individual columns from Dataset2D, implemented as xarray DataArrays +- CategoricalArray: anndata's lazy categorical implementation for zarr/h5 storage + +Usage: + from .lazy import ( + is_lazy_adata, + is_lazy_column, + get_lazy_category_count, + get_lazy_categories, + get_lazy_categorical_info, + ) +""" + +from __future__ import annotations + +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from .registry import FormatterContext + + +def _get_categorical_array(col: object) -> object | None: + """ + Get the underlying CategoricalArray from a lazy xarray DataArray. + + Navigates through the xarray structure: + DataArray -> Variable -> LazilyIndexedArray -> CategoricalArray + + Parameters + ---------- + col + The column (potentially an xarray DataArray) to extract from + + Returns + ------- + CategoricalArray if found, None otherwise + """ + try: + from anndata.experimental.backed._lazy_arrays import CategoricalArray + + # Navigate through xarray structure to find CategoricalArray + # DataArray -> Variable -> LazilyIndexedArray -> CategoricalArray + if hasattr(col, "variable") and hasattr(col.variable, "_data"): + lazy_indexed = col.variable._data + if hasattr(lazy_indexed, "array"): + arr = lazy_indexed.array + if isinstance(arr, CategoricalArray): + return arr + except ImportError: + pass + return None + + +def is_lazy_adata(obj: object) -> bool: + """Check if an AnnData uses lazy loading (experimental read_lazy). + + Lazy AnnData has Dataset2D (xarray-backed) obs/var instead of regular DataFrames. + + Parameters + ---------- + obj + Object to check (typically an AnnData) + + Returns + ------- + True if obj is a lazy AnnData + + Notes + ----- + This function accesses the .obs attribute which may trigger I/O for some + objects. If .obs raises an exception, returns False. 
+ """ + try: + obs = getattr(obj, "obs", None) + if obs is None: + return False + # Dataset2D has a different class name than DataFrame + return obs.__class__.__name__ == "Dataset2D" + except Exception: # noqa: BLE001 + # Intentional broad catch: .obs access may raise anything + return False + + +def _extract_path_from_lazy_array(arr: object) -> dict[str, str] | None: + """Extract file path and format from a lazy array (CategoricalArray/MaskedArray).""" + from pathlib import Path + + base_path = arr.base_path_or_zarr_group + file_format = getattr(arr, "file_format", "") + + # H5AD files have a Path as base_path + if isinstance(base_path, Path): + fmt = "H5AD" if file_format == "h5" else "Zarr" + return {"filename": str(base_path), "format": fmt} + + # For zarr groups, extract the store path (v2 uses .path, v3 uses .root) + if hasattr(base_path, "store"): + store = base_path.store + store_path = getattr(store, "path", None) or getattr(store, "root", None) + if store_path is not None: + return {"filename": str(store_path), "format": "Zarr"} + return {"filename": "", "format": "Zarr"} + + return None + + +def get_lazy_backing_info(obj: object) -> dict[str, str]: + """Get backing file information from a lazy AnnData. + + Extracts the file path and format from the underlying lazy arrays + (CategoricalArray or MaskedArray) in obs/var columns. 
+ + Parameters + ---------- + obj + A lazy AnnData object (from read_lazy()) + + Returns + ------- + Dictionary with: + - 'filename': str - path to the backing file (empty if not found) + - 'format': str - 'H5AD' or 'Zarr' (empty if not found) + """ + empty_result: dict[str, str] = {"filename": "", "format": ""} + + if not is_lazy_adata(obj): + return empty_result + + # Try to get path from adata.file (set for H5AD files opened via path) + file_obj = getattr(obj, "file", None) + if file_obj is not None: + filename = getattr(file_obj, "filename", None) + if filename is not None: + filename_str = str(filename) + fmt = "H5AD" if filename_str.endswith(".h5ad") else "Zarr" + return {"filename": filename_str, "format": fmt} + + # Try to extract from underlying lazy arrays in obs/var + obs = getattr(obj, "obs", None) + ds = getattr(obs, "ds", None) if obs is not None and hasattr(obs, "ds") else None + if ds is None: + return empty_result + + # Search through columns for a backing array with path info + try: + from anndata.experimental.backed._lazy_arrays import ( + CategoricalArray, + MaskedArray, + ) + + for col_name in ds.data_vars: + col = ds[col_name] + # Navigate: DataArray -> Variable -> LazilyIndexedArray -> BackingArray + if not (hasattr(col, "variable") and hasattr(col.variable, "_data")): + continue + lazy_indexed = col.variable._data + if not hasattr(lazy_indexed, "array"): + continue + arr = lazy_indexed.array + if isinstance(arr, (CategoricalArray, MaskedArray)): + result = _extract_path_from_lazy_array(arr) + if result is not None: + return result + except ImportError: + pass + + return empty_result + + +def is_lazy_column(series: object) -> bool: + """ + Check if a Series-like object is lazy (backed by remote/lazy storage). + + This detects Series from Dataset2D (xarray-backed DataFrames used in + lazy AnnData) to prevent operations that would trigger data loading. + + Note: We avoid accessing .data as that triggers loading for lazy + CategoricalArrays. 
Instead we check for xarray-specific attributes. + + Parameters + ---------- + series + The column/series to check + + Returns + ------- + True if series is a lazy column (xarray DataArray) + """ + # Check for xarray DataArray structure without triggering data loading + # xarray DataArrays have 'variable' and 'dims' attributes + if hasattr(series, "variable") and hasattr(series, "dims"): + return True + # Check for xarray Variable backing + return hasattr(series, "_variable") + + +def get_lazy_category_count(col: object) -> int | None: + """ + Get the number of categories for a lazy categorical without loading them. + + For lazy categoricals, we access the underlying CategoricalArray directly + and read the category count from the zarr/h5 storage metadata, avoiding + any data loading. + + Parameters + ---------- + col + The lazy categorical column (xarray DataArray) + + Returns + ------- + Number of categories, or None if cannot be determined + """ + # Try to get category count from CategoricalArray without loading + cat_arr = _get_categorical_array(col) + if cat_arr is not None: + try: + # Access the raw _categories group/array shape + # For zarr: _categories is a Group with 'values' array + # For h5: similar structure + cats = cat_arr._categories + if hasattr(cats, "keys"): # It's a group + values = cats["values"] + return values.shape[0] + elif hasattr(cats, "shape"): # It's an array directly + return cats.shape[0] + except Exception: # noqa: BLE001 + pass + return None + + +def get_lazy_categorical_info(obj: object) -> tuple[int | None, bool]: + """ + Get category count and ordered flag from a lazy categorical without loading data. + + For lazy categoricals (xarray DataArray backed by CategoricalArray), + this accesses the underlying storage metadata directly to get the count + without loading the actual category values. 
+ + Parameters + ---------- + obj + The object to check (xarray DataArray backed by CategoricalArray) + + Returns + ------- + tuple of (n_categories, ordered) + n_categories: Number of categories, or None if cannot be determined + ordered: Whether the categorical is ordered + """ + try: + from anndata.experimental.backed._lazy_arrays import CategoricalArray + + # Navigate through xarray structure to find CategoricalArray + if hasattr(obj, "variable") and hasattr(obj.variable, "_data"): + lazy_indexed = obj.variable._data + if hasattr(lazy_indexed, "array"): + arr = lazy_indexed.array + if isinstance(arr, CategoricalArray): + # Get count from storage metadata without loading + cats = arr._categories + if hasattr(cats, "keys"): # It's a group (zarr) + values = cats["values"] # type: ignore[index] + return values.shape[0], arr._ordered # type: ignore[union-attr] + elif hasattr(cats, "shape"): # It's an array directly + return cats.shape[0], arr._ordered # type: ignore[union-attr] + except (ImportError, Exception): # noqa: BLE001 + pass + return None, False + + +def get_lazy_categories( + col: object, context: FormatterContext +) -> tuple[list, bool, int | None]: + """ + Get categories for a lazy categorical column, respecting limits. + + For lazy AnnData (from read_lazy()), this accesses the underlying + CategoricalArray directly and reads only the needed categories from + storage, avoiding loading the full categorical data. 
+ + Parameters + ---------- + col + Column (lazy xarray DataArray) to get categories from + context + FormatterContext with max_lazy_categories limit + + Returns + ------- + tuple of (categories_list, was_truncated, n_categories) + categories_list: List of category values (empty if skipped) + was_truncated: True if categories were truncated due to limit + n_categories: Total number of categories (if known) + """ + # Import here to avoid circular imports + from .utils import _get_categories_from_column + + # Try to get category count without loading + n_cats = get_lazy_category_count(col) + + # If max_lazy_categories is 0, skip loading entirely (metadata-only mode) + if context.max_lazy_categories == 0: + return [], True, n_cats + + # Determine if we need to truncate + should_truncate = n_cats is not None and n_cats > context.max_lazy_categories + n_to_read = context.max_lazy_categories if should_truncate else n_cats + + # Try to read categories directly from CategoricalArray storage. + # We access _categories (private) to bypass the @cached_property which loads + # ALL categories. Instead, we use read_elem_partial (official API) to read + # only the first N categories. This is intentional - for large categoricals, + # loading everything defeats the purpose of lazy loading. 
+ cat_arr = _get_categorical_array(col) + if cat_arr is not None: + try: + from anndata._io.specs.registry import read_elem, read_elem_partial + + cats = cat_arr._categories + # Get values array: zarr uses group with "values" key, h5 uses array directly + values = cats["values"] if hasattr(cats, "keys") else cats + if n_to_read is not None and n_to_read < (n_cats or float("inf")): + categories = list( + read_elem_partial(values, indices=slice(0, n_to_read)) + ) + else: + categories = list(read_elem(values)) # type: ignore[arg-type] + return categories, should_truncate, n_cats + except Exception: # noqa: BLE001 + pass + + # Fallback to unified accessor (will trigger loading) + try: + return _get_categories_from_column(col), False, n_cats + except Exception: # noqa: BLE001 + return [], True, n_cats diff --git a/src/anndata/_repr/registry.py b/src/anndata/_repr/registry.py new file mode 100644 index 000000000..c09a7cb04 --- /dev/null +++ b/src/anndata/_repr/registry.py @@ -0,0 +1,1044 @@ +""" +Registry pattern for extensible HTML formatting. + +This module provides a registry system that allows: +1. New data types (TreeData, MuData, SpatialData) to register custom formatters +2. Graceful fallback for unknown types +3. Override of default formatters +4. Section-level and type-level customization + +Usage for extending to new types: + + from anndata._repr import register_formatter, TypeFormatter, FormattedOutput + + # Format by Python type (e.g., custom array in obsm) + @register_formatter + class MyArrayFormatter(TypeFormatter): + def can_format(self, obj, context): + return isinstance(obj, MyArrayType) + + def format(self, obj, context): + return FormattedOutput( + type_name=f"MyArray {obj.shape}", + css_class="anndata-dtype--myarray", + # preview_html for rightmost column (data preview, counts, etc.) 
+ preview_html=f'({obj.n_items} items)', + ) + + # Format by embedded type hint (e.g., tagged data in uns) + from anndata._repr import extract_uns_type_hint + + @register_formatter + class MyConfigFormatter(TypeFormatter): + priority = 100 # Higher priority to check before fallback + + def can_format(self, obj, context): + hint, _ = extract_uns_type_hint(obj) + return hint == "mypackage.config" + + def format(self, obj, context): + hint, data = extract_uns_type_hint(obj) + return FormattedOutput( + type_name="config", + preview_html='Custom config preview', + ) +""" + +from __future__ import annotations + +from abc import ABC, abstractmethod +from dataclasses import dataclass, field, replace +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from typing import TypeGuard + +from .._repr_constants import ( + CSS_DTYPE_EXTENSION, + CSS_DTYPE_UNKNOWN, + CSS_TEXT_ERROR, + CSS_TEXT_WARNING, + DEFAULT_FOLD_THRESHOLD, + DEFAULT_MAX_CATEGORIES, + DEFAULT_MAX_DEPTH, + DEFAULT_MAX_ITEMS, + DEFAULT_MAX_LAZY_CATEGORIES, + DEFAULT_MAX_STRING_LENGTH, + DEFAULT_UNIQUE_LIMIT, +) +from .utils import escape_html, validate_key + + +@dataclass +class FormattedOutput: + """Output from a formatter. 
+ + Visual structure of an entry row:: + + ┌─────────────┬────────────────────────────┬─────────────────┐ + │ Name │ Type │ Preview │ + ├─────────────┼────────────────────────────┼─────────────────┤ + │ (from key) │ type_html or type_name │ preview_html or │ + │ │ + warnings + [Expand ▼] │ preview (text) │ + └─────────────┴────────────────────────────┴─────────────────┘ + │ (if expanded_html provided and clicked) + ▼ + ┌─────────────────────────────────────────────────┐ + │ expanded_html content (collapsible row) │ + └─────────────────────────────────────────────────┘ + + Field precedence rules + ---------------------- + Some fields have precedence relationships when multiple are provided: + + **Type column** (``type_name`` vs ``type_html``): + - ``type_name`` is always required and used for search/filter (data-dtype) + - If ``type_html`` is provided, it replaces the visual display + - ``type_name`` is still used for the data-dtype attribute regardless + + **Preview column** (``preview`` vs ``preview_html``): + - If ``preview_html`` is provided, it is used (raw HTML) + - Otherwise, ``preview`` is used as plain text (auto-escaped) + - A warning is logged if both are provided + + Field naming convention + ----------------------- + - ``*_html`` fields contain raw HTML (caller responsible for escaping) + - Other string fields are plain text (auto-escaped when rendered) + + Available CSS classes + --------------------- + Built-in dtype classes for ``css_class`` (BEM modifiers on ``anndata-dtype``): + - ``anndata-dtype--category``: Categorical data (purple) + - ``anndata-dtype--int``, ``anndata-dtype--float``: Numeric types (blue) + - ``anndata-dtype--bool``: Boolean (red) + - ``anndata-dtype--string``: String data (dark blue) + - ``anndata-dtype--sparse``: Sparse matrices (green) + - ``anndata-dtype--array``, ``anndata-dtype--ndarray``: Arrays (blue) + - ``anndata-dtype--dataframe``: DataFrames (purple) + - ``anndata-dtype--anndata``: Nested AnnData (red, bold) + - 
``anndata-dtype--dask``: Dask arrays (orange) + - ``anndata-dtype--gpu``: GPU arrays (lime) + - ``anndata-dtype--awkward``: Awkward arrays (orange-red) + - ``anndata-dtype--unknown``: Unknown types (gray, italic) + - ``anndata-dtype--extension``: Extension types (purple) + """ + + type_name: str = "unknown" + """Text for the type column (e.g., 'ndarray (100, 50) float32'). + Always used for data-dtype attribute (search/filter). Auto-escaped. + Defaults to 'unknown' for resilience when type extraction fails.""" + + type_html: str | None = None + """Optional. Raw HTML to render in type column instead of type_name. + If provided, replaces the visual rendering but type_name still used for data-dtype.""" + + css_class: str = CSS_DTYPE_UNKNOWN + """CSS class for styling the type column.""" + + tooltip: str = "" + """Tooltip text on hover.""" + + warnings: list[str] = field(default_factory=list) + """Warning messages to display with warning icon.""" + + preview: str | None = None + """Optional. Plain text for preview column (rightmost). Auto-escaped. + Mutually exclusive with preview_html.""" + + preview_html: str | None = None + """Optional. Raw HTML for preview column (e.g., category pills with colors). + Takes precedence over preview if both provided (with warning).""" + + expanded_html: str | None = None + """Optional. Raw HTML for expandable content shown in collapsible row below. + If provided, an 'Expand ▼' button is added to the type column.""" + + is_serializable: bool = True + """Whether this type can be serialized to H5AD/Zarr.""" + + error: str | None = None + """Hard error message. If set, row is highlighted red and error shown in preview. + + **Precedence**: If ``error`` is set, it takes precedence over ``preview`` and + ``preview_html`` - the error message is displayed instead of any preview content. + + Used for: formatter failures, key validation errors, property access failures. 
+ Ecosystem packages can set this explicitly or just raise (caught by registry).""" + + +@dataclass +class FormattedEntry: + """A single entry in a section (e.g., one column in obs).""" + + key: str + """The key/name of this entry""" + + output: FormattedOutput + """Formatted output for this entry""" + + +@dataclass +class FormatterContext: + """Context passed to formatters for stateful formatting.""" + + depth: int = 0 + """Current recursion depth""" + + max_depth: int = DEFAULT_MAX_DEPTH + """Maximum recursion depth""" + + fold_threshold: int = DEFAULT_FOLD_THRESHOLD + """Auto-fold sections with more than this many entries""" + + max_items: int = DEFAULT_MAX_ITEMS + """Maximum items to show per section""" + + max_categories: int = DEFAULT_MAX_CATEGORIES + """Maximum category values to display inline""" + + max_lazy_categories: int = DEFAULT_MAX_LAZY_CATEGORIES + """Maximum categories to load for lazy categoricals. + + For lazy AnnData (from read_lazy()), loading categories requires reading + from disk. This limit prevents loading categories for columns with many + unique values (which would be slow and produce cluttered output). + Set to 0 to skip loading categories entirely for lazy columns. + """ + + max_string_length: int = DEFAULT_MAX_STRING_LENGTH + """Truncate strings longer than this in previews""" + + unique_limit: int = DEFAULT_UNIQUE_LIMIT + """Max rows to compute unique counts (0 to disable)""" + + parent_keys: tuple[str, ...] 
= () + """Keys of parent objects (for building access paths)""" + + adata_ref: object = None + """Reference to the root AnnData object (for color lookups etc.)""" + + section: str = "" + """Current section being formatted (obs, var, uns, etc.)""" + + key: str | None = None + """Key/name of the current entry being formatted (column name, uns key, etc.)""" + + def child(self, key: str) -> FormatterContext: + """Create a child context for nested formatting.""" + return replace( + self, + depth=self.depth + 1, + parent_keys=(*self.parent_keys, key), + ) + + @property + def access_path(self) -> str: + """Build Python access path string.""" + if not self.parent_keys: + return "" + parts = [] + for key in self.parent_keys: + if key.isidentifier(): + parts.append(f".{key}") + else: + parts.append(f"[{key!r}]") + return "".join(parts) + + +class TypeFormatter[T](ABC): + """ + Base class for type-specific formatters. + + Subclass this to add support for new types. The formatter will be + called when can_format() returns True for an object. + + The generic parameter ``T`` specifies the type that this formatter handles. + When ``can_format()`` returns ``True``, it narrows the type of ``obj`` to ``T`` + via :class:`~typing.TypeGuard`, so ``format()`` receives ``obj: T`` without + manual casts. For duck-typed formatters, use ``TypeFormatter[object]``. + + Attributes + ---------- + priority : int + Determines order of checking (higher = checked first). Default: 0. + sections : tuple[str, ...] | None + If set, this formatter only applies to the specified sections. + Use standard section names: "obs", "var", "uns", "obsm", "varm", + "layers", "obsp", "varp", "raw", "X". + If None (default), applies to all sections. 
+ + Examples + -------- + Formatter that only applies to uns section:: + + @register_formatter + class MyUnsFormatter(TypeFormatter[MySpecialType]): + sections = ("uns",) + + def can_format(self, obj, context): + return isinstance(obj, MySpecialType) + + def format(self, obj, context): + return FormattedOutput(type_name="MySpecial") + + Formatter that applies to obsm and varm:: + + @register_formatter + class MyMatrixFormatter(TypeFormatter[MyMatrixType]): + sections = ("obsm", "varm") + + def can_format(self, obj, context): + return isinstance(obj, MyMatrixType) + + def format(self, obj, context): + return FormattedOutput(type_name=f"MyMatrix {obj.shape}") + + Formatter that uses context for metadata lookup:: + + @register_formatter + class AnnotatedCategoricalFormatter(TypeFormatter[pd.Series]): + priority = 115 # Higher than default CategoricalFormatter + sections = ("obs", "var") + + def can_format(self, obj, context): + # Check if column has metadata in uns + if not (isinstance(obj, pd.Series) and hasattr(obj, "cat")): + return False + if context.adata_ref is None or context.key is None: + return False + annotations = context.adata_ref.uns.get("__annotations__", {}) + return context.key in annotations.get(context.section, {}) + + def format(self, obj, context): + # Use metadata to render enhanced output + return FormattedOutput(type_name="category[annotated]") + """ + + priority: int = 0 + sections: tuple[str, ...] | None = None + + @abstractmethod + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[T]: + """Return True if this formatter can handle the given object. + + When this returns ``True``, the type checker narrows ``obj`` to ``T``. + + Parameters + ---------- + obj + The object to check. + context + Formatter context (see :class:`FormatterContext`). Key attributes: + + - ``section``: Current section ("obs", "var", "uns", etc.) + - ``key``: Current entry key (column name for obs/var, dict key for uns, etc.) 
+ - ``adata_ref``: Reference to root AnnData (for uns lookups) + + Use these for context-aware decisions, e.g., looking up metadata + in ``context.adata_ref.uns`` based on ``context.key``. + See the "Formatter that uses context" example in the class docstring. + """ + ... + + @abstractmethod + def format(self, obj: T, context: FormatterContext) -> FormattedOutput: + """Format the object and return FormattedOutput.""" + ... + + +class SectionFormatter(ABC): + """ + Base class for section-specific formatters. + + Subclass this to customize how entire sections (obs, var, uns, etc.) + are formatted. This allows packages like TreeData, MuData, SpatialData + to add custom sections (e.g., obst, vart, mod, spatial). + + A single SectionFormatter can handle multiple sections by setting + ``section_names`` (tuple). Use ``should_show()`` returning False to + suppress sections entirely (they won't appear in "other" either). + + Example usage:: + + from anndata._repr import ( + register_formatter, + SectionFormatter, + FormattedEntry, + FormattedOutput, + ) + + + @register_formatter + class ObstSectionFormatter(SectionFormatter): + section_name = "obst" + after_section = "obsm" # Position after obsm + + def should_show(self, obj): + return hasattr(obj, "obst") and len(obj.obst) > 0 + + def get_entries(self, obj, context): + entries = [] + for key, value in obj.obst.items(): + output = FormattedOutput( + type_name=f"Tree ({value.n_nodes} nodes)", + css_class="anndata-dtype--tree", + ) + entries.append(FormattedEntry(key=key, output=output)) + return entries + + Example - suppress multiple sections:: + + @register_formatter + class SuppressInternalSections(SectionFormatter): + section_names = ("obsmap", "varmap", "axis") + + @property + def section_name(self) -> str: + return self.section_names[0] + + def should_show(self, obj) -> bool: + return False # Never show + + def get_entries(self, obj, context): + return [] + """ + + @property + @abstractmethod + def section_name(self) 
-> str: + """Primary name of the section this formatter handles.""" + ... + + @property + def section_names(self) -> tuple[str, ...]: + """ + All section names this formatter handles. + + Override this to handle multiple sections with one formatter. + Defaults to a tuple containing just section_name. + """ + return (self.section_name,) + + @property + def display_name(self) -> str: + """Display name (defaults to section_name).""" + return self.section_name + + @property + def after_section(self) -> str | None: + """ + Section after which this custom section should appear. + + If None, appears at the end. Valid values are standard sections: + "X", "obs", "var", "uns", "obsm", "varm", "layers", "obsp", "varp", "raw" + """ + return None + + @property + def doc_url(self) -> str | None: + """URL to documentation for this section.""" + return None + + @property + def tooltip(self) -> str: + """Tooltip text for section header.""" + return "" + + @abstractmethod + def get_entries( + self, obj: object, context: FormatterContext + ) -> list[FormattedEntry]: + """Get all entries for this section.""" + ... + + def should_show(self, obj: object) -> bool: + """Return True if this section should be displayed.""" + return True + + +class FallbackFormatter(TypeFormatter[object]): + """ + Fallback formatter for unknown types. + + This is the last line of defense - it must NEVER raise an exception. + Every single attribute access is wrapped defensively because objects may have + malicious __getattr__, broken properties, or custom metaclasses that fail. + """ + + priority: int = -1000 # Lowest priority, always checked last + + def can_format(self, obj: object, context: FormatterContext) -> TypeGuard[object]: + return True # Can format anything + + def format( # noqa: PLR0912, PLR0915 + self, + obj: object, + context: FormatterContext, + *, + outer_error: str | None = None, + ) -> FormattedOutput: + """Format any object defensively, never raising exceptions. 
+ + Parameters + ---------- + obj + Object to format + context + Formatter context + outer_error + Error message from a failed formatter (passed by registry) + """ + # === Type name (with fallback) === + type_name = "unknown" + try: # noqa: SIM105 + type_name = type(obj).__name__ + except Exception: # noqa: BLE001 + pass + + # === Module (with fallback) === + module = None + try: # noqa: SIM105 + module = type(obj).__module__ + except Exception: # noqa: BLE001 + pass + + # === Build full name safely === + full_name = type_name + try: + if module and module != "builtins": + full_name = f"{module}.{type_name}" + except Exception: # noqa: BLE001 + pass + + # === Gather info defensively === + tooltip_parts: list[str] = [] + access_errors: list[str] = [] + + # Type info for tooltip + try: # noqa: SIM105 + tooltip_parts.append(f"Type: {full_name}") + except Exception: # noqa: BLE001 + pass + + # Shape + try: + if hasattr(obj, "shape"): + shape = obj.shape + tooltip_parts.append(f"Shape: {shape}") + except Exception as e: # noqa: BLE001 + try: + access_errors.append(f".shape raised {type(e).__name__}") + except Exception: # noqa: BLE001 + access_errors.append(".shape failed") + + # dtype + try: + if hasattr(obj, "dtype"): + dtype = obj.dtype + tooltip_parts.append(f"Dtype: {dtype}") + except Exception as e: # noqa: BLE001 + try: + access_errors.append(f".dtype raised {type(e).__name__}") + except Exception: # noqa: BLE001 + access_errors.append(".dtype failed") + + # len + try: + if hasattr(obj, "__len__"): + length = len(obj) + tooltip_parts.append(f"Length: {length}") + if length > 1_000_000_000: + access_errors.append(f"len() = {length:,} (suspicious)") + except Exception as e: # noqa: BLE001 + try: + access_errors.append(f"len() raised {type(e).__name__}") + except Exception: # noqa: BLE001 + access_errors.append("len() failed") + + # repr (for tooltip only) + repr_str = None + try: + repr_str = repr(obj) + if repr_str and not repr_str.startswith("<"): + 
tooltip_parts.append(f"Repr: {repr_str[:100]}") + except Exception as e: # noqa: BLE001 + try: + access_errors.append(f"repr() raised {type(e).__name__}") + except Exception: # noqa: BLE001 + access_errors.append("repr() failed") + + # str + try: + str_val = str(obj) + # Just checking it doesn't fail, not using the result + _ = str_val + except Exception as e: # noqa: BLE001 + try: + access_errors.append(f"str() raised {type(e).__name__}") + except Exception: # noqa: BLE001 + access_errors.append("str() failed") + + # === Combine all errors === + all_errors: list[str] = [] + if outer_error: + all_errors.append(outer_error) + all_errors.extend(access_errors) + + error = "; ".join(all_errors) if all_errors else None + + # === Check serializability === + is_serial = True + serial_reason = "" + try: + from .utils import is_serializable + + is_serial, serial_reason = is_serializable(obj) + except Exception: # noqa: BLE001 + pass + + # === Build preview_html for errors === + # SECURITY: All text must be HTML-escaped to prevent XSS + preview_html = None + warnings: list[str] = [] + + # Add serialization reason to warnings if not serializable + if not is_serial and serial_reason: + warnings.append(serial_reason) + + if all_errors: + try: + error_text = escape_html(", ".join(all_errors)) + preview_html = f'{error_text}' + except Exception: # noqa: BLE001 + preview_html = f'Error' + else: + # No errors - check if unknown type warning needed + try: + is_extension = module and not module.startswith(( + "anndata", + "numpy", + "pandas", + "scipy", + )) + if not is_extension: + warnings.append(f"Unknown type: {full_name}") + warning_text = escape_html(f"Unknown type: {full_name}") + preview_html = ( + f'{warning_text}' + ) + except Exception: # noqa: BLE001 + pass + + # === Build tooltip safely === + tooltip = "" + try: # noqa: SIM105 + tooltip = "\n".join(tooltip_parts) + except Exception: # noqa: BLE001 + pass + + # === Determine CSS class === + css_class = CSS_DTYPE_UNKNOWN + 
try: + is_extension = module and not module.startswith(( + "anndata", + "numpy", + "pandas", + "scipy", + )) + if is_extension: + css_class = CSS_DTYPE_EXTENSION + except Exception: # noqa: BLE001 + pass + + return FormattedOutput( + type_name=type_name, + css_class=css_class, + tooltip=tooltip, + warnings=warnings, + preview_html=preview_html, + is_serializable=is_serial, + error=error, + ) + + +def _apply_key_validation( + result: FormattedOutput, context: FormatterContext +) -> FormattedOutput: + """Apply key validation to a formatted result. + + Checks if context.key is valid for HDF5/Zarr serialization and adds + warnings to the result if not. This centralizes key validation so + all formatters benefit automatically. + + Parameters + ---------- + result + The FormattedOutput from a formatter + context + The formatter context (uses context.key) + + Returns + ------- + FormattedOutput, possibly with added warnings and updated is_serializable + """ + if context.key is None: + return result + + key_valid, key_reason, is_hard_error = validate_key(context.key) + if key_valid: + return result + + # Key has issues - add warning and possibly mark as not serializable + new_warnings = [*result.warnings, key_reason] + new_serializable = result.is_serializable and not is_hard_error + + return replace(result, warnings=new_warnings, is_serializable=new_serializable) + + +class FormatterRegistry: + """ + Registry for type and section formatters. + + This is the central registry that manages all formatters. It supports: + - Registering new formatters at runtime + - Priority-based formatter selection + - Graceful fallback for unknown types + - Thread-safe operation + """ + + def __init__(self) -> None: + self._type_formatters: list[TypeFormatter] = [] + self._section_formatters: dict[str, SectionFormatter] = {} + self._fallback = FallbackFormatter() + + def register_type_formatter(self, formatter: TypeFormatter) -> None: + """ + Register a type formatter. 
+ + Formatters are checked in priority order (highest first). + """ + self._type_formatters.append(formatter) + # Keep sorted by priority (highest first) + self._type_formatters.sort(key=lambda f: -f.priority) + + def register_section_formatter(self, formatter: SectionFormatter) -> None: + """Register a section formatter for all its section_names.""" + for name in formatter.section_names: + self._section_formatters[name] = formatter + + def unregister_type_formatter(self, formatter: TypeFormatter) -> bool: + """Unregister a type formatter. Returns True if found and removed.""" + try: + self._type_formatters.remove(formatter) + return True + except ValueError: + return False + + def format_value(self, obj: object, context: FormatterContext) -> FormattedOutput: + """ + Format a value using the appropriate formatter. + + Tries each registered formatter in priority order, falling back + to the fallback formatter if none match. Formatters with a `sections` + property are only checked if the current section matches. + + If a formatter raises an exception, we continue to try other formatters. + Failed formatters are accumulated and: + - If a later formatter succeeds: warn about the failures + - If all fail: pass accumulated errors to fallback + """ + from .._warnings import warn + + current_section = context.section + # Track failures: (full_msg_for_warning, short_msg_for_html) + failed_formatters: list[tuple[str, str]] = [] + + for formatter in self._type_formatters: + # Check if formatter is restricted to specific sections + if ( + formatter.sections is not None + and current_section not in formatter.sections + ): + continue + + try: + if formatter.can_format(obj, context): + result = formatter.format(obj, context) + # Success! 
Warn about any earlier failures + if failed_formatters: + warn_msgs = [f[0] for f in failed_formatters] + try: + success_name = type(formatter).__name__ + except Exception: # noqa: BLE001 + success_name = "Formatter" + warn( + f"Formatters failed before {success_name} succeeded: " + f"{'; '.join(warn_msgs)}", + UserWarning, + ) + return _apply_key_validation(result, context) + except Exception as e: # noqa: BLE001 + # Formatter failed - record and continue to next + # Build both messages defensively + try: + formatter_name = type(formatter).__name__ + except Exception: # noqa: BLE001 + formatter_name = "Formatter" + + # Full message for warning (debugging) + try: + full_msg = f"{formatter_name}: {e}" + except Exception: # noqa: BLE001 + full_msg = f"{formatter_name} failed" + + # Short message for HTML (exception type only) + try: + error_type = type(e).__name__ + short_msg = f"{formatter_name} raised {error_type}" + except Exception: # noqa: BLE001 + short_msg = f"{formatter_name} failed" + + failed_formatters.append((full_msg, short_msg)) + # Warn immediately for debugging + warn(f"Formatter {full_msg}", UserWarning) + + # No formatter succeeded - pass accumulated errors to fallback + if failed_formatters: + short_msgs = [f[1] for f in failed_formatters] + outer_error = "; ".join(short_msgs) + else: + outer_error = None + + result = self._fallback.format(obj, context, outer_error=outer_error) + return _apply_key_validation(result, context) + + def get_section_formatter(self, section: str) -> SectionFormatter | None: + """Get the formatter for a section, or None if not registered.""" + return self._section_formatters.get(section) + + def get_registered_sections(self) -> list[str]: + """Get list of registered section names.""" + return list(self._section_formatters.keys()) + + def get_formatter_for( + self, obj: object, context: FormatterContext | None = None + ) -> tuple[TypeFormatter | None, str]: + """ + Debug helper: find which formatter would handle an object. 
+ + This is useful for understanding formatter priority and debugging + why a specific formatter is or isn't being selected. + + Parameters + ---------- + obj + The object to find a formatter for + context + Optional FormatterContext. If None, creates a minimal context. + + Returns + ------- + tuple of (formatter, reason) + formatter: The TypeFormatter that would handle this object, or None + reason: String explaining why this formatter was selected or why none matched + + Examples + -------- + >>> import numpy as np + >>> from anndata._repr import formatter_registry, FormatterContext + >>> arr = np.array([1, 2, 3]) + >>> formatter, reason = formatter_registry.get_formatter_for(arr) + >>> print(f"{type(formatter).__name__}: {reason}") + NumpyArrayFormatter: can_format returned True (priority=100) + """ + if context is None: + context = FormatterContext() + + current_section = context.section + for formatter in self._type_formatters: + # Check section restriction + if ( + formatter.sections is not None + and current_section not in formatter.sections + ): + continue + + try: + if formatter.can_format(obj, context): + sections_str = ( + f", sections={formatter.sections}" if formatter.sections else "" + ) + return ( + formatter, + f"can_format returned True (priority={formatter.priority}{sections_str})", + ) + except Exception as e: # noqa: BLE001 + return ( + None, + f"{type(formatter).__name__} raised {type(e).__name__}: {e}", + ) + + return (self._fallback, "No formatter matched, using fallback") + + def list_formatters(self) -> list[dict[str, str | int | tuple[str, ...] | None]]: + """ + List all registered type formatters with their properties. + + Returns a list of dicts with formatter info, sorted by priority (highest first). + Useful for debugging formatter registration and priority ordering. 
+ + Returns + ------- + List of dicts with keys: name, priority, sections, module + + Examples + -------- + >>> from anndata._repr import formatter_registry + >>> for f in formatter_registry.list_formatters()[:5]: + ... print(f"{f['priority']:4d} {f['name']}") + 150 AnnDataFormatter + 120 DaskArrayFormatter + 120 AwkwardArrayFormatter + 110 NumpyMaskedArrayFormatter + 110 CategoricalFormatter + """ + return [ + { + "name": type(formatter).__name__, + "priority": formatter.priority, + "sections": formatter.sections, + "module": type(formatter).__module__, + } + for formatter in self._type_formatters + ] + + +# Global registry instance +formatter_registry = FormatterRegistry() + + +# Type hint key used in uns dicts to indicate custom rendering +UNS_TYPE_HINT_KEY = "__anndata_repr__" + + +def extract_uns_type_hint(value: object) -> tuple[str | None, object]: + """ + Extract type hint from data if present. + + This is a utility function for TypeFormatter implementations that need + to handle tagged data. Data can be tagged with a type hint to indicate + which package should render it, without requiring that package to be + installed or imported. + + Supports two formats: + + 1. Dict with __anndata_repr__ key:: + + {"__anndata_repr__": "mypackage.mytype", "data": {"key": "value"}} + # Returns ("mypackage.mytype", {"data": {"key": "value"}}) + + 2. String with prefix:: + + "__anndata_repr__:mypackage.mytype::actual content here" + # Returns ("mypackage.mytype", "actual content here") + + If no type hint is found, returns (None, original_value). + + How to make a formatter available + --------------------------------- + When data contains a type hint but no formatter is registered for it, + anndata shows a fallback message: "[mypackage.mytype] (import mypackage to enable)". + This tells users they need to import the package to see the custom rendering. + + To register a formatter that handles tagged data: + + 1. 
In your package (e.g., mypackage/__init__.py), register a TypeFormatter:: + + from anndata._repr import ( + register_formatter, + TypeFormatter, + FormattedOutput, + extract_uns_type_hint, + ) + + + @register_formatter + class MyTypeFormatter(TypeFormatter): + priority = 100 # Check before fallback formatters + sections = ("uns",) # Only apply to uns section + + def can_format(self, obj, context): + hint, _ = extract_uns_type_hint(obj) + return hint == "mypackage.mytype" + + def format(self, obj, context): + hint, data = extract_uns_type_hint(obj) + # Render your custom visualization + return FormattedOutput( + type_name="mytype", + preview_html="Custom rendering", + ) + + 2. When the user imports your package, the formatter is registered + and will be used automatically for any data tagged with your hint. + + Parameters + ---------- + value + The value to check for a type hint + + Returns + ------- + tuple of (type_hint or None, cleaned_value) + If a type hint is found, returns (hint_string, data_without_hint). + Otherwise returns (None, original_value). + """ + # Check dict format + if isinstance(value, dict) and UNS_TYPE_HINT_KEY in value: + hint = value.get(UNS_TYPE_HINT_KEY) + if isinstance(hint, str): + # Return value without the type hint key + cleaned = {k: v for k, v in value.items() if k != UNS_TYPE_HINT_KEY} + return hint, cleaned + + # Check string prefix format + if isinstance(value, str) and value.startswith(f"{UNS_TYPE_HINT_KEY}:"): + # Format: "__anndata_repr__:type.hint::content" + rest = value[len(UNS_TYPE_HINT_KEY) + 1 :] # After "__anndata_repr__:" + if "::" in rest: + hint, content = rest.split("::", 1) + return hint, content + + return None, value + + +def register_formatter( + formatter: TypeFormatter | SectionFormatter, +) -> TypeFormatter | SectionFormatter: + """ + Register a formatter with the global registry. + + Can be used as a decorator: + + @register_formatter + class MyFormatter(TypeFormatter): + ... 
+ + Or called directly: + + register_formatter(MyFormatter()) + """ + if isinstance(formatter, type): + # Called with a class (decorator form): register and return an instance + formatter = formatter() + + if isinstance(formatter, TypeFormatter): + formatter_registry.register_type_formatter(formatter) + elif isinstance(formatter, SectionFormatter): + formatter_registry.register_section_formatter(formatter) + else: + msg = f"Expected TypeFormatter or SectionFormatter, got {type(formatter)}" + raise TypeError(msg) + + return formatter diff --git a/src/anndata/_repr/sections.py b/src/anndata/_repr/sections.py new file mode 100644 index 000000000..866c2d9e1 --- /dev/null +++ b/src/anndata/_repr/sections.py @@ -0,0 +1,586 @@ +""" +Section-specific renderers for AnnData HTML representation. + +This module contains renderers for each section type: +- DataFrame sections (obs, var) +- Mapping sections (obsm, varm, layers, obsp, varp) +- Uns section (unstructured annotations) +- Raw section (unprocessed data) +- Unknown sections (extension attributes) + +Error Handling Policy +--------------------- +This module uses broad exception handling (``except Exception``) in several places. +This is intentional - user data may contain arbitrary objects that raise unexpected +exceptions when accessed. The repr should never crash; instead, each handler should: + +1. Carry ``# noqa: BLE001`` to acknowledge the broad catch +2. Provide a fallback (e.g., show "?" or skip the problematic item) +3. Continue rendering the rest of the representation + +The rationale: a partially rendered repr is always better than a crashed cell. +""" + +from __future__ import annotations + +from dataclasses import replace +from typing import TYPE_CHECKING + +from .._repr_constants import ( + CSS_DTYPE_ANNDATA, + CSS_DTYPE_UNKNOWN, + ERROR_TRUNCATE_LENGTH, + INTERNAL_ANNDATA_ATTRS, +) +from .._types import AnnDataElem +from ..utils import get_literal_members +from . 
import ( + get_section_doc_url, +) +from .components import ( + TypeCellConfig, + render_entry_preview_cell, + render_entry_row_open, + render_entry_type_cell, + render_name_cell, + render_nested_content, +) +from .core import ( + get_section_tooltip, + render_empty_section, + render_formatted_entry, + render_section, + render_truncation_indicator, + render_x_entry, +) +from .registry import ( + FormattedEntry, + extract_uns_type_hint, + formatter_registry, +) +from .utils import ( + escape_html, + format_index_preview, + format_number, +) + +if TYPE_CHECKING: + import pandas as pd + + from anndata import AnnData + + from .registry import FormattedOutput, FormatterContext + + +def _render_entry_row( + key: str, + output: FormattedOutput, + *, + append_type_html: bool = False, + preview_note: str | None = None, +) -> str: + """Render an entry row for DataFrame, mapping, or uns sections. + + Key validation is handled by FormatterRegistry.format_value() via context.key, + so output already contains any key-related warnings and serialization flags. 
+ + Parameters + ---------- + key + Entry key/name to display + output + FormattedOutput from a TypeFormatter (already includes key validation) + append_type_html + If True, append type_html below type_name (for mapping entries) + preview_note + Optional note to prepend to preview (for type hints in uns) + + Returns + ------- + HTML string for the entry row (and optional expandable content row) + """ + entry = FormattedEntry(key=key, output=output) + return render_formatted_entry( + entry, + append_type_html=append_type_html, + preview_note=preview_note, + ) + + +# ----------------------------------------------------------------------------- +# DataFrame Section (obs, var) +# ----------------------------------------------------------------------------- + + +def _render_dataframe_section( + section: str, + df: pd.DataFrame, + context: FormatterContext, +) -> str: + """Render obs or var section.""" + n_cols = len(df.columns) + + # Doc URL and tooltip for this section + doc_url = get_section_doc_url(section) + tooltip = "Observation annotations" if section == "obs" else "Variable annotations" + + if n_cols == 0: + return render_empty_section(section, doc_url, tooltip) + + # Set section for section-specific formatters (e.g., LazyColumnFormatter) + section_context = replace(context, section=section) + + # Render entries (with truncation) + rows = [] + for i, col_name in enumerate(df.columns): + if i >= context.max_items: + rows.append(render_truncation_indicator(n_cols - context.max_items)) + break + col = df[col_name] + col_context = replace(section_context, key=col_name) + output = formatter_registry.format_value(col, col_context) + rows.append(_render_entry_row(col_name, output)) + + return render_section( + section, + "\n".join(rows), + n_items=n_cols, + doc_url=doc_url, + tooltip=tooltip, + should_collapse=n_cols > context.fold_threshold, + count_str=f"({n_cols} columns)", + ) + + +# ----------------------------------------------------------------------------- +# 
Mapping Section (obsm, varm, layers, obsp, varp) +# ----------------------------------------------------------------------------- + + +def _render_mapping_section( + section: str, + mapping: object, + context: FormatterContext, +) -> str: + """Render obsm, varm, layers, obsp, varp sections.""" + if mapping is None: + return "" + + # Get count without creating full list (O(1) for most mappings) + n_items = len(mapping) + + # Doc URL and tooltip for this section + doc_url = get_section_doc_url(section) + tooltip = get_section_tooltip(section) + + if n_items == 0: + return render_empty_section(section, doc_url, tooltip) + + # Set section for section-specific formatters (e.g., DaskArrayFormatter) + section_context = replace(context, section=section) + + # Render entries (with truncation) - iterate lazily, stop at max_items + rows = [] + for i, key in enumerate(mapping.keys()): + if i >= context.max_items: + rows.append(render_truncation_indicator(n_items - context.max_items)) + break + value = mapping[key] + key_context = replace(section_context, key=key) + output = formatter_registry.format_value(value, key_context) + rows.append(_render_entry_row(key, output, append_type_html=True)) + + return render_section( + section, + "\n".join(rows), + n_items=n_items, + doc_url=doc_url, + tooltip=tooltip, + should_collapse=n_items > context.fold_threshold, + ) + + +# ----------------------------------------------------------------------------- +# Uns Section (unstructured annotations) +# ----------------------------------------------------------------------------- + + +def _render_uns_section( + uns: object, + context: FormatterContext, +) -> str: + """Render the uns section with special handling.""" + # Get count without creating full list (O(1) for dict) + n_items = len(uns) + + # Doc URL and tooltip + doc_url = get_section_doc_url("uns") + tooltip = "Unstructured annotation" + + if n_items == 0: + return render_empty_section("uns", doc_url, tooltip) + + # Render entries 
(with truncation) - iterate lazily, stop at max_items + rows = [] + for i, key in enumerate(uns.keys()): + if i >= context.max_items: + rows.append(render_truncation_indicator(n_items - context.max_items)) + break + value = uns[key] + rows.append(_render_uns_entry(key, value, context)) + + return render_section( + "uns", + "\n".join(rows), + n_items=n_items, + doc_url=doc_url, + tooltip=tooltip, + should_collapse=n_items > context.fold_threshold, + ) + + +def _render_uns_entry( + key: str, + value: object, + context: FormatterContext, +) -> str: + """Render a single uns entry with special type handling. + + Rendering priority: + 1. Custom TypeFormatter (may handle type hints, color lists, AnnData) + 2. Unhandled type hint (show import suggestion) + 3. Default formatter + """ + # Pass key to context for key-based detection (e.g., color lists) + key_context = replace(context, key=key) + + # 1. Try formatter first - handles type hints, color lists, AnnData + output = formatter_registry.format_value(value, key_context) + + # If a custom formatter produced preview_html, use it directly + if output.preview_html: + return _render_entry_row(key, output) + + # 2. Check for unhandled type hint (basic formatter matched, not custom) + type_hint, cleaned_value = extract_uns_type_hint(value) + if type_hint is not None: + # Type hint present but no custom formatter - show import suggestion + # (split returns the whole string when the hint contains no dot) + package_name = type_hint.split(".")[0] + cleaned_output = formatter_registry.format_value(cleaned_value, key_context) + return _render_entry_row( + key, + cleaned_output, + preview_note=f"[{type_hint}] (import {package_name} to enable)", + ) + + # 3. 
Use formatter output + return _render_entry_row(key, output) + + +# ----------------------------------------------------------------------------- +# Unknown Sections (extension attributes) +# ----------------------------------------------------------------------------- + + +def _detect_unknown_sections(adata: AnnData) -> list[tuple[str, str]]: + """Detect mapping-like attributes not surfaced by the standard section list. + + Returns list of (attr_name, type_description) tuples for unknown sections. + """ + from collections.abc import Mapping + + # See INTERNAL_ANNDATA_ATTRS docstring for why the internal list is explicit. + known = set(get_literal_members(AnnDataElem)) | INTERNAL_ANNDATA_ATTRS + + # Also exclude sections with registered custom formatters + # (including should_show=False ones that suppress display). + known |= set(formatter_registry.get_registered_sections()) + + unknown = [] + for attr in dir(adata): + # Skip private, known, and callable attributes + if attr.startswith("_") or attr in known: + continue + + try: + val = getattr(adata, attr) + # Check if it's a data container (mapping-like or has keys()) + if isinstance(val, Mapping) or ( + hasattr(val, "keys") + and hasattr(val, "__getitem__") + and not callable(val) + ): + # Get type description + type_name = type(val).__name__ + try: + n_items = len(val) + type_desc = f"{type_name} ({n_items} items)" + except Exception: # noqa: BLE001 + type_desc = type_name + unknown.append((attr, type_desc)) + except Exception: # noqa: BLE001 + # If we can't even access the attribute, note it as inaccessible + unknown.append((attr, "inaccessible")) + + return unknown + + +def _render_unknown_sections(unknown_sections: list[tuple[str, str]]) -> str: + """Render a section showing unknown/unrecognized attributes.""" + parts = [ + '
' + ] + parts.append("") + parts.append('other') + parts.append( + f'({len(unknown_sections)})' + ) + parts.append("") + + parts.append('
') + parts.append('
') + + for attr_name, type_desc in unknown_sections: + parts.append(render_entry_row_open(attr_name, type_desc)) + parts.append(render_name_cell(attr_name)) + parts.append('') + parts.append( + f'' + f"{escape_html(type_desc)}" + ) + parts.append("") + parts.append('') + parts.append("
") + + parts.append("
") + parts.append("") + parts.append("
") + + return "\n".join(parts) + + +def _render_error_entry(section: str, error: str) -> str: + """Render an error indicator for a section that failed to render.""" + error_str = str(error) + if len(error_str) > ERROR_TRUNCATE_LENGTH: + error_str = error_str[:ERROR_TRUNCATE_LENGTH] + "..." + error_escaped = escape_html(error_str) + return f""" +
+ + {escape_html(section)} + (error) + +
+
+ Failed to render: {error_escaped} +
+
+
+""" + + +# ----------------------------------------------------------------------------- +# Raw Section +# ----------------------------------------------------------------------------- + + +def _safe_get_attr(obj: object, attr: str, default: object = "?") -> object: + """Safely get an attribute with fallback. + + Parameters + ---------- + obj + Object to get attribute from + attr + Attribute name + default + Default value if attribute is missing or access raises exception + + Returns + ------- + Attribute value or default + """ + try: + val = getattr(obj, attr, None) + return val if val is not None else default + except Exception: # noqa: BLE001 + return default + + +def _get_raw_meta_parts(raw: object) -> list[str]: + """Build meta info parts for raw section. + + Parameters + ---------- + raw + Raw object to extract metadata from + + Returns + ------- + List of metadata strings like ["var: 5 cols", "varm: 2"] + """ + meta_parts = [] + try: + if hasattr(raw, "var") and raw.var is not None and len(raw.var.columns) > 0: + meta_parts.append(f"var: {len(raw.var.columns)} cols") + except Exception: # noqa: BLE001 + pass + try: + if hasattr(raw, "varm") and raw.varm is not None and len(raw.varm) > 0: + meta_parts.append(f"varm: {len(raw.varm)}") + except Exception: # noqa: BLE001 + pass + return meta_parts + + +def _render_raw_section( + raw: object, + context: FormatterContext, +) -> str: + """Render the raw section as a single expandable row. + + The raw section shows unprocessed data that was saved before filtering/normalization. + It contains raw.X (the matrix), raw.var (variable annotations), and raw.varm + (multi-dimensional variable annotations). + + Unlike the main AnnData, raw shares obs with the parent but has its own var + (which may have more variables than the filtered main data). + + Rendered as a single row with an expand button (no section header). + When expanded, shows a full AnnData-like repr for Raw contents (X, var, varm). 
+ The depth parameter prevents infinite recursion. + """ + if raw is None: + return "" + + # Safely get dimensions with fallbacks + n_obs = _safe_get_attr(raw, "n_obs", "?") + n_vars = _safe_get_attr(raw, "n_vars", "?") + + # Check if we can expand (same logic as nested AnnData) + can_expand = context.depth < context.max_depth - 1 + + # Build meta info string safely + meta_parts = _get_raw_meta_parts(raw) + meta_text = ", ".join(meta_parts) if meta_parts else "" + + # Single row container (like a minimal section with just one entry) + parts = ['
'] + parts.append('
') + + # Single row with raw info + type_str = f"{format_number(n_obs)} obs × {format_number(n_vars)} var" + parts.append(render_entry_row_open("raw", "Raw", has_expandable_content=can_expand)) + parts.append(render_name_cell("raw")) + type_cell_config = TypeCellConfig( + type_name=type_str, + css_class=CSS_DTYPE_ANNDATA, + ) + parts.append(render_entry_type_cell(type_cell_config)) + parts.append(render_entry_preview_cell(preview_text=meta_text)) + + # Nested content (entry is
/ when can_expand) + if can_expand: + nested_html = _generate_raw_repr_html(raw, context.child("raw")) + # Wrap in anndata-entry__nested-anndata for specific styling + wrapped_html = f'
{nested_html}
' + parts.append(render_nested_content(wrapped_html)) + parts.append("
") # close expandable entry + else: + parts.append("
") # close plain entry + + parts.append("
") # close entries grid + parts.append("") # close section + + return "\n".join(parts) + + +def _generate_raw_repr_html( + raw, + context: FormatterContext, +) -> str: + """Generate HTML repr for a Raw object. + + This renders X, var, and varm sections similar to AnnData, + but without obs, obsm, layers, obsp, varp, uns, or raw sections. + + Parameters + ---------- + raw + Raw object to render + context + FormatterContext with depth, max_depth, fold_threshold, max_items + """ + # Safely get dimensions + n_obs = _safe_get_attr(raw, "n_obs", "?") + n_vars = _safe_get_attr(raw, "n_vars", "?") + + parts = [] + + # Container with header showing Raw shape + container_id = f"raw-repr-{id(raw)}" + parts.append(f'
') + + # Header for Raw - same structure as AnnData header + parts.append('
') + parts.append('Raw') + shape_str = f"{format_number(n_obs)} obs × {format_number(n_vars)} var" + parts.append(f'{shape_str}') + parts.append("
") + + # Index preview (obs_names and var_names) + parts.append('
') + try: + obs_names = getattr(raw, "obs_names", None) + if obs_names is not None: + parts.append( + f"
obs_names: {format_index_preview(obs_names)}
" + ) + else: + parts.append( + "
obs_names: not available
" + ) + except Exception: # noqa: BLE001 + parts.append("
obs_names: not available
") + try: + var_names = getattr(raw, "var_names", None) + if var_names is not None: + parts.append( + f"
var_names: {format_index_preview(var_names)}
" + ) + else: + parts.append( + "
var_names: not available
" + ) + except Exception: # noqa: BLE001 + parts.append("
var_names: not available
") + parts.append("
") + + # X section - show matrix info (with error handling) + try: + if hasattr(raw, "X") and raw.X is not None: + parts.append(render_x_entry(raw, context)) + except Exception as e: # noqa: BLE001 + parts.append(_render_error_entry("X", str(e))) + + # var section (like AnnData's var) + try: + if hasattr(raw, "var") and raw.var is not None and len(raw.var.columns) > 0: + # Raw doesn't have the same structure as AnnData, so clear adata_ref + var_context = replace(context, adata_ref=None, section="var") + parts.append(_render_dataframe_section("var", raw.var, var_context)) + except Exception as e: # noqa: BLE001 + parts.append(_render_error_entry("var", str(e))) + + # varm section (like AnnData's varm) + try: + if hasattr(raw, "varm") and raw.varm is not None and len(raw.varm) > 0: + varm_context = replace(context, adata_ref=None, section="varm") + parts.append(_render_mapping_section("varm", raw.varm, varm_context)) + except Exception as e: # noqa: BLE001 + parts.append(_render_error_entry("varm", str(e))) + + parts.append("
") + + return "\n".join(parts) diff --git a/src/anndata/_repr/static/__init__.py b/src/anndata/_repr/static/__init__.py new file mode 100644 index 000000000..c4da67a5e --- /dev/null +++ b/src/anndata/_repr/static/__init__.py @@ -0,0 +1 @@ +"""Static assets for AnnData HTML representation.""" diff --git a/src/anndata/_repr/static/css_colors.txt b/src/anndata/_repr/static/css_colors.txt new file mode 100644 index 000000000..6be2db7bf --- /dev/null +++ b/src/anndata/_repr/static/css_colors.txt @@ -0,0 +1,197 @@ +# CSS3 Named Colors for AnnData HTML Representation +# ================================================== +# +# This file contains CSS named colors used to detect color lists in AnnData +# .uns entries (e.g., cluster_colors, batch_colors). When a key ends with +# "_colors" and contains values from this list (or hex/rgb values), it's +# rendered with color swatches in the HTML repr. +# +# Source: CSS Color Module Level 3 (W3C Recommendation) +# https://www.w3.org/TR/css-color-3/#svg-color +# +# Relation to Matplotlib +# ---------------------- +# Matplotlib's CSS4_COLORS dictionary contains the same 148 colors (147 CSS3 +# + rebeccapurple from CSS4). Scanpy and other tools use matplotlib to generate +# colors stored in adata.uns["{column}_colors"]. These are typically: +# +# - Hex colors: "#1f77b4", "#ff7f0e" (from matplotlib's default palettes) +# - Named colors: "red", "blue", "cornflowerblue" (from CSS/matplotlib) +# +# This file ensures named colors are recognized WITHOUT requiring matplotlib +# as a dependency. Hex and rgb() colors are always recognized regardless of +# this file. 
+# +# Format +# ------ +# - One color name per line, lowercase +# - Lines starting with # are comments (ignored) +# - Empty lines are ignored +# +# How to Update +# ------------- +# Option 1: From W3C specification +# Visit https://www.w3.org/TR/css-color-3/#svg-color +# +# Option 2: From MDN (more readable) +# Visit https://developer.mozilla.org/en-US/docs/Web/CSS/named-color +# +# Option 3: From matplotlib (if installed) +# python -c "from matplotlib.colors import CSS4_COLORS; print('\n'.join(sorted(CSS4_COLORS.keys())))" +# +# After updating, run: pytest tests/repr/ -x -q +# +# Note: Hex (#RGB, #RRGGBB, #RRGGBBAA) and functional (rgb(), rgba()) color +# formats are always recognized and don't need to be listed here. + +# Basic colors (HTML 4.01 keywords, plus grey, cyan, magenta, and orange) +black +white +gray +grey +silver +red +green +blue +yellow +cyan +magenta +maroon +navy +olive +purple +teal +aqua +lime +fuchsia +orange + +# Extended CSS3 colors, plus rebeccapurple from CSS4 (roughly alphabetical) +aliceblue +antiquewhite +aquamarine +azure +beige +bisque +blanchedalmond +blueviolet +brown +burlywood +cadetblue +chartreuse +chocolate +coral +cornflowerblue +cornsilk +crimson +darkblue +darkcyan +darkgoldenrod +darkgray +darkgrey +darkgreen +darkkhaki +darkmagenta +darkolivegreen +darkorange +darkorchid +darkred +darksalmon +darkseagreen +darkslateblue +darkslategray +darkslategrey +darkturquoise +darkviolet +deeppink +deepskyblue +dimgray +dimgrey +dodgerblue +firebrick +floralwhite +forestgreen +gainsboro +ghostwhite +gold +goldenrod +greenyellow +honeydew +hotpink +indianred +indigo +ivory +khaki +lavender +lavenderblush +lawngreen +lemonchiffon +lightblue +lightcoral +lightcyan +lightgoldenrodyellow +lightgray +lightgrey +lightgreen +lightpink +lightsalmon +lightseagreen +lightskyblue +lightslategray +lightslategrey +lightsteelblue +lightyellow +limegreen +linen +mediumaquamarine +mediumblue +mediumorchid +mediumpurple +mediumseagreen 
+mediumslateblue +mediumspringgreen +mediumturquoise +mediumvioletred +midnightblue 
+mintcream +mistyrose +moccasin +navajowhite +oldlace +olivedrab +orangered +orchid +palegoldenrod +palegreen +paleturquoise +palevioletred +papayawhip +peachpuff +peru +pink +plum +powderblue +rebeccapurple +rosybrown +royalblue +saddlebrown +salmon +sandybrown +seagreen +seashell +sienna +skyblue +slateblue +slategray +slategrey +snow +springgreen +steelblue +tan +thistle +tomato +turquoise +violet +wheat +whitesmoke +yellowgreen diff --git a/src/anndata/_repr/static/repr.css b/src/anndata/_repr/static/repr.css new file mode 100644 index 000000000..4c49bb19c --- /dev/null +++ b/src/anndata/_repr/static/repr.css @@ -0,0 +1,1097 @@ +/* AnnData HTML Representation Styles */ +/* Scoped to .anndata-repr to avoid conflicts */ +/* Uses native CSS nesting (Chrome 120+, Firefox 117+, Safari 17.2+) */ +/* Uses subgrid (Chrome 117+, Firefox 71+, Safari 16+) */ +/* Uses light-dark() (Chrome 123+, Firefox 120+, Safari 17.5+) — the bottleneck */ + +.anndata-repr { + /* Hide the no-CSS hint when styles are loaded */ + .anndata-repr__hint-nocss { + display: none; + } + + /* Show the no-JS hint only when CSS loads but JS hasn't run. + JS adds .anndata-repr--js on init, so this rule only matches + in CSS-but-no-JS environments (e.g., nbconvert static HTML). + The hint is hidden by default (inline display:none) for the no-CSS case. */ + &:not(.anndata-repr--js) .anndata-repr__hint-nojs { + display: block !important; + padding: 4px 12px; + font-size: 11px; + color: var(--anndata-text-muted); + } + + /* Opt into light-dark(): responds to used color scheme, defaulting to light */ + color-scheme: light dark; + + /* --- Theme overrides --- + When an app explicitly sets a theme, override the color scheme so + light-dark() picks the correct value regardless of OS preference. 
*/ + + body.light-mode &, + [data-theme="light"] &, + html[data-theme="light"] &, + [data-jp-theme-light="true"] &, + .jp-Theme-Light &, + body.vscode-light &, + body[data-vscode-theme-kind="vscode-light"] & { + color-scheme: light; + } + + body.dark-mode &, + [data-theme="dark"] &, + html[data-theme="dark"] &, + [data-jp-theme-light="false"] &, + .jp-Theme-Dark &, + body.vscode-dark &, + body[data-vscode-theme-kind="vscode-dark"] & { + color-scheme: dark; + } + + /* CSS Variables — each defined once via light-dark(light, dark) */ + --anndata-bg-primary: light-dark(#ffffff, #1e1e1e); + --anndata-bg-secondary: light-dark(#f8f9fa, #252526); + --anndata-bg-tertiary: light-dark(#e9ecef, #2d2d2d); + --anndata-highlight: light-dark(#e7f1ff, #264f78); + --anndata-text-primary: light-dark(#212529, #e0e0e0); + --anndata-text-secondary: light-dark(#6c757d, #a0a0a0); + --anndata-text-muted: light-dark(#adb5bd, #707070); + --anndata-border-color: light-dark(#dee2e6, #404040); + --anndata-border-light: light-dark(#e9ecef, #333333); + --anndata-accent-color: light-dark(#0d6efd, #58a6ff); + --anndata-warning-color: light-dark(#ffc107, #d29922); + --anndata-warning-bg: light-dark(#fff3cd, #3d3200); + --anndata-error-color: light-dark(#dc3545, #f85149); + --anndata-error-bg: light-dark(#f8d7da, #3d1a1a); + --anndata-success-color: light-dark(#198754, #3fb950); + --anndata-info-color: light-dark(#0dcaf0, #58a6ff); + --anndata-link-color: light-dark(#0d6efd, #58a6ff); + --anndata-code-bg: light-dark(#f8f9fa, #2d2d2d); + --anndata-radius: 4px; + --anndata-font-mono: + ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas, monospace; + --anndata-font-size: 13px; + --anndata-line-height: 1.4; + /* Dtype colors */ + --anndata-dtype-category: light-dark(#8250df, #d2a8ff); + --anndata-dtype-int: light-dark(#0550ae, #79c0ff); + --anndata-dtype-float: light-dark(#0550ae, #79c0ff); + --anndata-dtype-bool: light-dark(#cf222e, #ff7b72); + --anndata-dtype-string: light-dark(#0a3069, #a5d6ff); 
+ --anndata-dtype-object: light-dark(#6e7781, #a0a0a0); + --anndata-dtype-sparse: light-dark(#1a7f37, #7ee787); + --anndata-dtype-array: light-dark(#0550ae, #79c0ff); + --anndata-dtype-dataframe: light-dark(#8250df, #d2a8ff); + --anndata-dtype-anndata: light-dark(#cf222e, #ff7b72); + --anndata-dtype-unknown: light-dark(#6e7781, #a0a0a0); + --anndata-dtype-extension: light-dark(#8250df, #d2a8ff); + --anndata-dtype-dask: light-dark(#fb8500, #ffc168); + --anndata-dtype-gpu: light-dark(#76b900, #a0db63); + --anndata-dtype-tpu: light-dark(#0891b2, #67e8f9); + --anndata-dtype-awkward: light-dark(#e85d04, #ff9d76); + --anndata-dtype-array-api: light-dark(#9a6700, #e6c400); + /* Column widths are set dynamically via inline style on container */ + + font-family: + -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, + sans-serif; + font-size: var(--anndata-font-size); + line-height: var(--anndata-line-height); + color: var(--anndata-text-primary); + background: var(--anndata-bg-primary); + border: 1px solid var(--anndata-border-color); + border-radius: var(--anndata-radius); + padding: 0; + margin: 8px 0; + max-width: 100%; + overflow: hidden; + + /* --- JS-enabled overrides --- */ + + &.anndata-repr--js { + .anndata-entry__preview { + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + } + + .anndata-categories { + display: inline; + white-space: nowrap; + word-break: normal; + + &.anndata-categories--wrapped { + display: inline; + max-width: none; + white-space: normal; + word-break: break-word; + overflow: visible; + text-overflow: clip; + } + } + + .anndata-columns { + display: inline; + white-space: nowrap; + word-break: normal; + + &.anndata-columns--wrapped { + display: inline; + max-width: none; + white-space: normal; + word-break: break-word; + overflow: visible; + text-overflow: clip; + } + } + + /* Allow preview cell to expand when wrap button is toggled */ + .anndata-entry__preview.anndata-entry--expanded { + white-space: 
normal; + overflow: visible; + text-overflow: clip; + } + + /* Wrap buttons: visibility is managed by JS (updateWrapButtonVisibility). + JS sets inline style.display based on overflow detection. + No CSS override needed here — just ensure they're not display:none + from the base rule, so JS can take over. */ + } + + /* --- Header --- */ + + .anndata-header { + display: flex; + flex-wrap: wrap; + align-items: center; + gap: 8px; + padding: 10px 12px; + background: var(--anndata-bg-secondary, #f8f9fa); + border-bottom: 1px solid var(--anndata-border-color, #dee2e6); + } + + .anndata-header__type { + font-weight: 600; + font-size: 14px; + color: var(--anndata-text-primary, #212529); + } + + .anndata-header__shape { + font-family: var( + --anndata-font-mono, + ui-monospace, + SFMono-Regular, + "SF Mono", + Menlo, + Consolas, + monospace + ); + font-size: 12px; + color: var(--anndata-text-secondary, #6c757d); + } + + .anndata-header__index { + padding: 8px 12px; + font-size: 11px; + font-family: var( + --anndata-font-mono, + ui-monospace, + SFMono-Regular, + "SF Mono", + Menlo, + Consolas, + monospace + ); + color: var(--anndata-text-secondary, #6c757d); + background: var(--anndata-bg-primary, #ffffff); + border-bottom: 1px solid var(--anndata-border-light, #e9ecef); + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + + strong { + color: var(--anndata-text-primary, #212529); + font-weight: 500; + } + } + + /* --- Badges --- */ + + .anndata-badge { + display: inline-flex; + align-items: center; + gap: 4px; + padding: 2px 8px; + font-size: 11px; + font-weight: 500; + border-radius: 10px; + white-space: nowrap; + } + + /* Badge modifiers must be siblings, not nested children. + In CSS nesting, &--modifier inside a doubly-nested rule produces + :is(parent child)--modifier which is an invalid selector. 
*/ + .anndata-badge--view { + background: var(--anndata-info-color); + color: white; + } + + .anndata-badge--backed { + background: var(--anndata-success-color); + color: white; + } + + .anndata-badge--lazy { + background: var(--anndata-warning-color); + color: white; + } + + .anndata-badge--extension { + background: var(--anndata-accent-color); + color: white; + } + + /* --- README --- */ + + .anndata-readme__icon { + cursor: pointer; + font-size: 14px; + opacity: 0.7; + transition: opacity 0.15s; + margin-left: 4px; + + &:hover { + opacity: 1; + } + } + + .anndata-readme__overlay { + position: fixed; + top: 0; + left: 0; + right: 0; + bottom: 0; + background: rgba(0, 0, 0, 0.5); + display: flex; + align-items: center; + justify-content: center; + z-index: 10000; + padding: 20px; + } + + .anndata-readme__modal { + background: var(--anndata-bg-primary); + border: 1px solid var(--anndata-border-color); + border-radius: 8px; + box-shadow: 0 4px 20px rgba(0, 0, 0, 0.2); + max-width: 700px; + max-height: 80vh; + width: 100%; + display: flex; + flex-direction: column; + overflow: hidden; + } + + .anndata-readme__header { + display: flex; + align-items: center; + justify-content: space-between; + padding: 12px 16px; + border-bottom: 1px solid var(--anndata-border-color); + background: var(--anndata-bg-secondary); + + h3 { + margin: 0; + font-size: 14px; + font-weight: 600; + color: var(--anndata-text-primary); + } + } + + .anndata-readme__close { + background: none; + border: none; + font-size: 20px; + cursor: pointer; + color: var(--anndata-text-secondary); + padding: 0 4px; + line-height: 1; + + &:hover { + color: var(--anndata-text-primary); + } + } + + .anndata-readme__content { + padding: 16px; + overflow-y: auto; + font-size: 13px; + line-height: 1.6; + color: var(--anndata-text-primary); + + pre { + margin: 0; + white-space: pre-wrap; + word-wrap: break-word; + font-family: var(--anndata-font-mono); + font-size: 0.9em; + line-height: 1.5; + } + } + + /* --- Search 
box --- */ + + .anndata-search__box { + display: inline-flex; + align-items: center; + max-width: 300px; + border: 1px solid var(--anndata-border-color); + border-radius: var(--anndata-radius); + background: var(--anndata-bg-primary); + transition: border-color 0.15s; + + &:focus-within { + border-color: var(--anndata-accent-color); + } + + &.anndata-search__box--error { + border-color: #dc3545; + + .anndata-search__input { + background: rgba(220, 53, 69, 0.05); + } + } + } + + .anndata-search__input { + flex: 1; + min-width: 120px; + padding: 6px 8px; + font-size: 12px; + border: none; + background: transparent; + color: var(--anndata-text-primary); + outline: none; + + &::placeholder { + color: var(--anndata-text-muted); + } + } + + .anndata-search__indicator { + display: none; + margin-left: 8px; + font-size: 11px; + color: var(--anndata-accent-color); + + /* JS toggles .anndata--active (not BEM --active modifier) */ + &.anndata--active { + display: inline; + } + } + + .anndata-search__toggles { + display: flex; + gap: 1px; + padding-right: 4px; + border-left: 1px solid var(--anndata-border-light); + margin-left: 4px; + padding-left: 4px; + } + + .anndata-search__toggle { + display: none; /* Hidden until JS enables */ + align-items: center; + justify-content: center; + width: 20px; + height: 20px; + padding: 0; + border: none; + border-radius: 3px; + background: transparent; + color: var(--anndata-text-muted); + font-size: 10px; + font-family: var(--anndata-font-mono); + font-weight: 600; + cursor: pointer; + transition: all 0.15s; + + &:hover { + background: var(--anndata-bg-secondary); + color: var(--anndata-text-primary); + } + + /* JS toggles .anndata--active (not BEM --active modifier) */ + &.anndata--active { + background: var(--anndata-accent-color); + color: white; + } + } + + /* --- Sections --- */ + + .anndata-section { + border-bottom: 1px solid var(--anndata-border-light, #e9ecef); + + &:last-child { + border-bottom: none; + } + } + + 
.anndata-section > summary { + display: flex; + align-items: center; + gap: 8px; + padding: 8px 12px; + user-select: none; + background: var(--anndata-bg-primary, #ffffff); + transition: background-color 0.15s; + list-style: none; + + &::-webkit-details-marker { + display: none; + } + + &::before { + content: "\25BC"; /* ▼ */ + display: inline-flex; + align-items: center; + justify-content: center; + width: 16px; + height: 16px; + font-size: 10px; + color: var(--anndata-text-muted, #adb5bd); + transition: transform 0.15s; + transform-origin: center; + flex-shrink: 0; + } + + &:hover { + background: var(--anndata-bg-secondary, #f8f9fa); + } + } + + .anndata-section:not([open]) > summary::before { + transform: rotate(-90deg); + } + + .anndata-section__name { + font-weight: 600; + color: var(--anndata-text-primary, #212529); + } + + .anndata-section__count { + font-size: 11px; + color: var(--anndata-text-secondary, #6c757d); + } + + .anndata-section__help { + margin-left: auto; + padding: 2px 6px; + font-size: 11px; + color: var(--anndata-text-muted, #adb5bd); + text-decoration: none; + border-radius: var(--anndata-radius, 4px); + transition: + color 0.15s, + background-color 0.15s; + + &:hover { + color: var(--anndata-accent-color, #0d6efd); + background: var(--anndata-bg-tertiary, #e9ecef); + } + } + + .anndata-section__content { + padding: 0; + overflow: hidden; + } + + .anndata-section__entries { + display: grid; + grid-template-columns: var(--anndata-name-col-width, 150px) var( + --anndata-type-col-width, + 180px + ) 1fr; + font-size: 12px; + width: 100%; + } + + .anndata-section__truncated { + grid-column: 1 / -1; + padding: 8px 12px; + font-size: 11px; + color: var(--anndata-text-muted); + text-align: center; + font-style: italic; + } + + .anndata-section__empty { + padding: 8px 12px; + font-size: 11px; + color: var(--anndata-text-muted); + font-style: italic; + } + + /* --- Entries --- */ + + .anndata-entry { + grid-column: 1 / -1; + transition: 
background-color 0.1s; + + &:nth-child(even of .anndata-entry:not(.anndata-entry--hidden)) { + background: var(--anndata-bg-secondary); + } + + &:hover { + background: var(--anndata-bg-tertiary, #e9ecef); + } + + &.anndata-entry--hidden { + display: none; + } + + &.warning { + background: var(--anndata-warning-bg) !important; + } + + &.error { + background: var(--anndata-error-bg) !important; + } + } + + /* Copy button visibility on hover. + Using `.anndata-entry:hover` would propagate from any nested child row + up through every ancestor entry, revealing every ancestor's copy button + when a deeply-nested row is hovered. Scope the trigger to: + - the entry itself for plain (non-expandable) rows, and + - the <summary> for expandable rows, which doesn't contain nested + child entries (those live in `.anndata-entry__nested-content`). */ + div.anndata-entry:hover > .anndata-entry__name .anndata-entry__copy, + details.anndata-entry + > summary.anndata-entry__summary:hover + .anndata-entry__copy { + opacity: 1; + } + + /* Regular (non-expandable) entries use subgrid for column alignment */ + div.anndata-entry { + display: grid; + grid-template-columns: subgrid; + } + + /* Expandable entry summary — uses explicit columns (not subgrid) + so nested content below is a normal block child at full width */ + details.anndata-entry > summary.anndata-entry__summary { + display: grid; + grid-template-columns: var(--anndata-name-col-width, 150px) var( + --anndata-type-col-width, + 180px + ) 1fr; + list-style: none; + cursor: pointer; + position: relative; + + &::-webkit-details-marker { + display: none; + } + + /* Expand indicator arrow */ + &::after { + content: "\25B6"; /* ▶ */ + position: absolute; + right: 12px; + top: 50%; + transform: translateY(-50%); + font-size: 8px; + color: var(--anndata-accent-color); + transition: transform 0.15s; + } + } + + /* When open, rotate the arrow */ + details.anndata-entry[open] > summary.anndata-entry__summary::after { + transform: translateY(-50%)
rotate(90deg); + } + + /* Reserve space for the expand indicator */ + details.anndata-entry > summary.anndata-entry__summary + > .anndata-entry__preview { + padding-right: 24px; + } + + /* Reset inline fallback styles on entry cells — grid controls sizing */ + .anndata-entry__name, + .anndata-entry__type, + .anndata-entry__preview { + display: block !important; + min-width: 0 !important; + vertical-align: baseline !important; + } + + .anndata-entry__name { + font-family: var( + --anndata-font-mono, + ui-monospace, + SFMono-Regular, + "SF Mono", + Menlo, + Consolas, + monospace + ); + font-size: var(--anndata-font-size, 13px); + font-weight: 500; + color: var(--anndata-text-primary, #212529); + white-space: nowrap; + text-align: left; + padding: 6px 12px; + align-self: center; + } + + .anndata-entry__name-inner { + display: flex; + align-items: center; + gap: 4px; + min-width: 0; /* Allow flex child to shrink below content size */ + } + + .anndata-entry__name-text { + overflow: hidden; + text-overflow: ellipsis; + white-space: nowrap; + flex: 1; + min-width: 0; /* Allow text to shrink and show ellipsis */ + font-size: inherit; /* Prevent mobile browsers from auto-sizing truncated text */ + } + + .anndata-entry__type { + font-family: var( + --anndata-font-mono, + ui-monospace, + SFMono-Regular, + "SF Mono", + Menlo, + Consolas, + monospace + ); + font-size: 11px; + color: var(--anndata-text-secondary, #6c757d); + text-align: left; + white-space: nowrap; + padding: 6px 12px; + align-self: center; + } + + .anndata-entry__preview { + font-size: 11px; + color: var(--anndata-text-muted, #adb5bd); + text-align: right; + /* Default: allow wrapping for graceful no-JS degradation */ + white-space: normal; + word-break: break-word; + padding: 6px 12px; + align-self: center; + min-width: 0; + } + + /* Copy button */ + .anndata-entry__copy { + display: inline-flex; + align-items: center; + justify-content: center; + width: 16px; + height: 16px; + padding: 0; + background: 
transparent; + border: none; + border-radius: 2px; + cursor: pointer; + opacity: 0; + transition: + opacity 0.15s, + background-color 0.15s; + flex-shrink: 0; + position: relative; + + &::before, + &::after { + content: ""; + position: absolute; + width: 7px; + height: 8px; + border: 1.5px solid var(--anndata-text-muted); + border-radius: 1px; + background: var(--anndata-bg-primary); + } + + &::before { + top: 2px; + left: 2px; + } + + &::after { + top: 5px; + left: 5px; + } + + &:hover { + &::before, + &::after { + border-color: var(--anndata-accent-color); + } + } + + /* When copied, hide squares and show checkmark */ + &.anndata-entry__copy--copied { + &::before, + &::after { + display: none; + } + + &::before { + display: block; + content: "\2713"; + width: auto; + height: auto; + border: none; + background: none; + color: var(--anndata-success-color); + font-size: 12px; + font-weight: bold; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + } + } + } + + .anndata-entry__warning { + font-size: 10px; + font-weight: 600; + color: var(--anndata-warning-color); + cursor: pointer; + margin-left: 2px; + } + + /* Nested content area (inside expandable entries) */ + .anndata-entry__nested-content { + margin-left: 0; + padding: 12px; + background: var(--anndata-bg-secondary); + overflow: hidden; + } + + .anndata-entry__expanded { + display: block; + overflow-x: auto; + overflow-y: hidden; + padding: 12px; + box-sizing: border-box; + -webkit-overflow-scrolling: touch; + + /* Jupyter-like table styling for embedded DataFrames */ + > div { + display: inline-block; + } + + table { + border-collapse: collapse; + border-spacing: 0; + border: none; + font-size: 12px; + font-family: var(--anndata-font-mono); + table-layout: auto; + margin: 0; + white-space: nowrap; + } + + thead { + border-bottom: 1px solid var(--anndata-border-color); + vertical-align: bottom; + } + + th, + td { + vertical-align: middle; + padding: 6px 10px; + line-height: normal; + border: none; + 
text-align: right; + } + + th { + font-weight: 600; + color: var(--anndata-text-primary); + background: var(--anndata-bg-secondary); + } + + td { + color: var(--anndata-text-primary); + } + + tbody tr { + &:nth-child(odd) { + background: var(--anndata-bg-primary); + } + + &:nth-child(even) { + background: var(--anndata-bg-secondary); + } + + &:hover { + background: var(--anndata-highlight); + } + } + + /* Nested AnnData wrapper — fill width, no extra padding */ + &:has(> .anndata-entry__nested-anndata) { + padding: 0; + text-align: left; + } + + > .anndata-entry__nested-anndata { + display: block; + width: 100%; + margin: 0; + box-sizing: border-box; + } + } + + /* Nested .anndata-repr must fill its container */ + .anndata-entry__nested-anndata > .anndata-repr { + margin: 0; + width: 100%; + box-sizing: border-box; + } + + /* --- Dtype colors --- */ + + .anndata-dtype--category { + color: var(--anndata-dtype-category); + } + .anndata-dtype--int { + color: var(--anndata-dtype-int); + } + .anndata-dtype--float { + color: var(--anndata-dtype-float); + } + .anndata-dtype--bool { + color: var(--anndata-dtype-bool); + } + .anndata-dtype--string { + color: var(--anndata-dtype-string); + } + .anndata-dtype--object { + color: var(--anndata-dtype-object); + } + .anndata-dtype--sparse { + color: var(--anndata-dtype-sparse); + } + .anndata-dtype--ndarray { + color: var(--anndata-dtype-array); + } + .anndata-dtype--dataframe { + color: var(--anndata-dtype-dataframe); + } + .anndata-dtype--anndata { + color: var(--anndata-dtype-anndata); + font-weight: 600; + } + .anndata-dtype--unknown { + color: var(--anndata-dtype-unknown); + font-style: italic; + } + .anndata-dtype--extension { + color: var(--anndata-dtype-extension); + } + .anndata-dtype--dask { + color: var(--anndata-dtype-dask); + } + .anndata-dtype--gpu { + color: var(--anndata-dtype-gpu); + } + .anndata-dtype--tpu { + color: var(--anndata-dtype-tpu); + } + .anndata-dtype--awkward { + color: var(--anndata-dtype-awkward); 
+ } + .anndata-dtype--array-api { + color: var(--anndata-dtype-array-api); + } + + /* --- Color swatches --- */ + + .anndata-colors { + display: inline-flex; + gap: 2px; + margin-left: 6px; + vertical-align: middle; + } + + .anndata-colors__swatch { + display: inline-block; + width: 12px; + height: 12px; + border-radius: 2px; + border: 1px solid var(--anndata-border-color); + } + + .anndata-colors__swatch--invalid { + display: inline-flex; + align-items: center; + justify-content: center; + font-size: 10px; + font-weight: bold; + color: var(--anndata-text-muted); + background: var(--anndata-bg-tertiary); + cursor: help; + } + + /* --- Category / Columns lists --- */ + + .anndata-categories { + display: inline; + white-space: normal; + word-break: break-word; + color: var(--anndata-text-muted); + } + + .anndata-categories__sep { + display: none; + } + + .anndata-categories__item { + display: inline-flex; + align-items: center; + gap: 3px; + margin-right: 8px; + } + + .anndata-categories__wrap, + .anndata-columns__wrap { + display: none; + background: transparent; + border: none; + color: var(--anndata-text-muted); + cursor: pointer; + font-size: 11px; + padding: 0 4px; + margin-left: 4px; + transition: color 0.15s; + vertical-align: middle; + + &:hover { + color: var(--anndata-accent-color); + } + } + + .anndata-columns { + display: inline; + white-space: normal; + word-break: break-word; + color: var(--anndata-text-muted); + } + + /* --- X entry row --- */ + + .anndata-x__entry { + display: flex; + align-items: center; + gap: 12px; + padding: 6px 12px; + border-bottom: 1px solid var(--anndata-border-light, #e9ecef); + color: var(--anndata-text-secondary, #6c757d); + + > span:first-child { + font-family: var(--anndata-font-mono); + font-weight: 600; + min-width: 60px; + } + + > span:last-child { + font-family: var(--anndata-font-mono); + font-size: 11px; + } + } + + /* --- Depth limit --- */ + + .anndata-depth-limit { + padding: 8px 12px; + font-size: 11px; + color: 
var(--anndata-text-muted); + background: var(--anndata-bg-tertiary); + border-radius: var(--anndata-radius); + text-align: center; + } + + /* --- Header filepath --- */ + + .anndata-header__filepath { + font-family: var(--anndata-font-mono); + font-size: 11px; + color: var(--anndata-text-secondary, #6c757d); + } + + /* --- Spacer (flex-grow pushes siblings apart) --- */ + + .anndata-spacer { + flex-grow: 1; + } + + /* --- Category dot (color indicator) --- */ + + .anndata-categories__dot { + width: 8px; + height: 8px; + border-radius: 50%; + display: inline-block; + } + + /* --- Custom type content --- */ + + .anndata-entry__custom { + margin-top: 4px; + } + + /* --- Error entry --- */ + + .anndata-entry--error { + color: var(--anndata-error-color, #dc3545); + padding: 4px 8px; + font-size: 12px; + } + + .anndata-badge--error { + color: var(--anndata-error-color, #dc3545); + } + + /* --- Footer --- */ + + .anndata-footer { + display: flex; + justify-content: space-between; + padding: 4px 12px; + font-size: 10px; + font-family: + -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; + color: var(--anndata-text-muted); + border-top: 1px solid var(--anndata-border-light); + } + + /* --- Text helpers --- */ + + .anndata-text--muted { + color: var(--anndata-text-muted); + } + + .anndata-text--error { + color: var(--anndata-error-color); + font-weight: 500; + } + + .anndata-text--warning { + color: var(--anndata-warning-color); + } +} diff --git a/src/anndata/_repr/static/repr.js b/src/anndata/_repr/static/repr.js new file mode 100644 index 000000000..e202be002 --- /dev/null +++ b/src/anndata/_repr/static/repr.js @@ -0,0 +1,477 @@ +// AnnData HTML Representation JavaScript +// This file provides interactivity for the HTML repr. +// The {container_id} placeholder is replaced at runtime. 
+ +// Mark container as JS-enabled (shows interactive elements) +container.classList.add("anndata-repr--js") + +// Show interactive elements (hidden by default for no-JS graceful degradation) +for (const btn of container.querySelectorAll(".anndata-entry__copy")) { + btn.style.display = "inline-flex" +} +for (const box of container.querySelectorAll(".anndata-search__box")) { + box.style.display = "inline-flex" +} +for (const btn of container.querySelectorAll(".anndata-search__toggle")) { + btn.style.display = "inline-flex" +} +// Filter indicator is shown via the CSS .anndata--active class, no need to set display here + +// Hide the no-JS hint now that JavaScript is running +for (const el of container.querySelectorAll(".anndata-repr__hint-nojs")) { + el.style.display = "none" +} + +// Section collapse is handled natively by
<details>/<summary> elements. +// No JS needed for section toggle — the browser handles open/close state. + +// Search/filter functionality +const searchBox = container.querySelector(".anndata-search__box") +const searchInput = container.querySelector(".anndata-search__input") +const filterIndicator = container.querySelector(".anndata-search__indicator") +const caseToggle = container.querySelector(".anndata-search__toggle--case") +const regexToggle = container.querySelector(".anndata-search__toggle--regex") + +// Search state +let caseSensitive = false +let useRegex = false + +if (searchInput) { + let debounceTimer + + const triggerFilter = () => { + clearTimeout(debounceTimer) + debounceTimer = setTimeout(() => { + filterEntries(searchInput.value.trim()) + }, 150) + } + + searchInput.addEventListener("input", triggerFilter) + + // Clear on Escape + searchInput.addEventListener("keydown", (e) => { + if (e.key === "Escape") { + searchInput.value = "" + filterEntries("") + } + }) + + // Toggle button handlers + if (caseToggle) { + caseToggle.addEventListener("click", (e) => { + e.stopPropagation() + caseSensitive = !caseSensitive + caseToggle.classList.toggle("anndata--active", caseSensitive) + caseToggle.setAttribute("aria-pressed", caseSensitive) + triggerFilter() + }) + } + + if (regexToggle) { + regexToggle.addEventListener("click", (e) => { + e.stopPropagation() + useRegex = !useRegex + regexToggle.classList.toggle("anndata--active", useRegex) + regexToggle.setAttribute("aria-pressed", useRegex) + triggerFilter() + }) + } +} + +// Helper: test if text matches query (respects case sensitivity and regex mode) +function matchesQuery(text, query) { + if (!query) return true + if (useRegex) { + try { + const flags = caseSensitive ?
"" : "i" + const regex = new RegExp(query, flags) + if (searchBox) + searchBox.classList.remove("anndata-search__box--error") + return regex.test(text) + } catch { + // Invalid regex - show error state but don't crash + if (searchBox) searchBox.classList.add("anndata-search__box--error") + return false + } + } else { + if (searchBox) searchBox.classList.remove("anndata-search__box--error") + if (caseSensitive) { + return text.includes(query) + } else { + return text.toLowerCase().includes(query.toLowerCase()) + } + } +} + +function filterEntries(query) { + let totalMatches = 0 + let totalEntries = 0 + + // First pass: mark all entries as hidden or not based on direct match + const entries = container.querySelectorAll(".anndata-entry") + const directMatches = new Set() + + for (const entry of entries) { + totalEntries++ + + const key = entry.dataset.key || "" + const dtype = entry.dataset.dtype || "" + const text = entry.textContent + + const matches = + !query || + matchesQuery(key, query) || + matchesQuery(dtype, query) || + matchesQuery(text, query) + + if (matches) { + directMatches.add(entry) + entry.classList.remove("anndata-entry--hidden") + totalMatches++ + + // Expand parent sections to show match + const section = entry.closest(".anndata-section") + if (section && !section.open) { + section.open = true + } + + // Expand nested content if match is inside nested area + const nestedContent = entry.closest( + ".anndata-entry__nested-content", + ) + if (nestedContent) { + const expandableEntry = nestedContent.closest( + "details.anndata-entry", + ) + if (expandableEntry && !expandableEntry.open) { + expandableEntry.open = true + } + } + } else { + entry.classList.add("anndata-entry--hidden") + } + } + + // Second pass: if a nested entry matches, show all ancestor entry rows + // This ensures that when searching for something inside a nested AnnData, + // all parent rows remain visible so the user can expand them to see the match + if (query) { + for (const 
matchedEntry of directMatches) { + // Walk up the DOM tree to find and show all parent entry rows + // Safety limit prevents infinite loops (max nesting depth is typically 3) + let element = matchedEntry + let iterations = 0 + const maxIterations = 20 + + while ( + element && + element !== container && + iterations < maxIterations + ) { + iterations++ + // Check if we're inside a nested content container + const nestedContainer = element.closest( + ".anndata-entry__nested-content", + ) + if (!nestedContainer) break + + // Find the parent entry that contains this nested content + // Structure: details.anndata-entry > .anndata-entry__nested-content + const parentEntry = nestedContainer.closest(".anndata-entry") + if (!parentEntry) break + + if (parentEntry.classList.contains("anndata-entry--hidden")) { + parentEntry.classList.remove("anndata-entry--hidden") + totalMatches++ + } + + // Open the expandable entry so nested content is visible + const expandableEntry = nestedContainer.closest( + "details.anndata-entry", + ) + if (expandableEntry && !expandableEntry.open) { + expandableEntry.open = true + } + + // Continue searching from the parent entry's container + element = parentEntry.parentElement + } + } + } + + // Also filter X entries in nested AnnData (they use anndata-x__entry class, not anndata-entry) + // This prevents orphaned X rows from showing when their sibling entries are hidden + if (query) { + for (const xEntry of container.querySelectorAll( + ".anndata-entry__nested-content .anndata-x__entry", + )) { + // Check if the nested AnnData has any visible entries + const nestedRepr = xEntry.closest(".anndata-repr") + if (nestedRepr) { + const hasVisibleEntries = nestedRepr.querySelector( + ".anndata-entry:not(.anndata-entry--hidden)", + ) + xEntry.style.display = hasVisibleEntries ? 
"" : "none" + } + } + } else { + // Reset X entries when no query + for (const xEntry of container.querySelectorAll( + ".anndata-entry__nested-content .anndata-x__entry", + )) { + xEntry.style.display = "" + } + } + + // Update filter indicator + if (filterIndicator) { + if (query) { + filterIndicator.classList.add("anndata--active") + filterIndicator.textContent = `Showing ${totalMatches} of ${totalEntries}` + } else { + filterIndicator.classList.remove("anndata--active") + } + } + + // Hide sections with no visible entries + for (const section of container.querySelectorAll(".anndata-section")) { + const visibleEntries = section.querySelectorAll( + ".anndata-entry:not(.anndata-entry--hidden)", + ) + + if (query && visibleEntries.length === 0) { + section.style.display = "none" + } else { + section.style.display = "" + } + } +} + +// Copy to clipboard +for (const btn of container.querySelectorAll(".anndata-entry__copy")) { + btn.addEventListener("click", async (e) => { + e.stopPropagation() + + const text = btn.dataset.copy + if (!text) return + + try { + await navigator.clipboard.writeText(text) + + // Visual feedback (icon turns green via CSS) + btn.classList.add("anndata-entry__copy--copied") + setTimeout( + () => btn.classList.remove("anndata-entry__copy--copied"), + 1500, + ) + } catch { + // Fallback for older browsers + const textarea = document.createElement("textarea") + textarea.value = text + textarea.style.position = "fixed" + textarea.style.opacity = "0" + document.body.appendChild(textarea) + textarea.select() + + try { + document.execCommand("copy") + btn.classList.add("anndata-entry__copy--copied") + setTimeout( + () => btn.classList.remove("anndata-entry__copy--copied"), + 1500, + ) + } catch (e) { + console.error("Copy failed:", e) + } + + document.body.removeChild(textarea) + } + }) +} + +// Helper to check if element is overflowing +function isOverflowing(el) { + return el.scrollWidth > el.clientWidth +} + +// Helper to update wrap button 
visibility based on overflow +function updateWrapButtonVisibility(btn, list, metaCell, wrappedClass) { + if (!list || !metaCell) { + btn.style.display = "none" + return + } + // Show button only if content is overflowing or currently wrapped + const isWrapped = list.classList.contains(wrappedClass) + const overflows = isOverflowing(metaCell) + btn.style.display = overflows || isWrapped ? "inline" : "none" +} + +// Factory function to set up wrap button handlers (DRY pattern for cats/cols buttons) +function setupWrapButtons(buttonSelector, listSelector, wrappedClass) { + for (const btn of container.querySelectorAll(buttonSelector)) { + const entry = btn.closest(".anndata-entry") + const metaCell = entry + ? entry.querySelector(".anndata-entry__preview") + : null + const list = metaCell ? metaCell.querySelector(listSelector) : null + + // Initial visibility check + updateWrapButtonVisibility(btn, list, metaCell, wrappedClass) + + btn.addEventListener("click", (e) => { + e.stopPropagation() + if (!list || !metaCell) return + + const isWrapped = list.classList.toggle(wrappedClass) + metaCell.classList.toggle("anndata-entry--expanded", isWrapped) + btn.textContent = isWrapped ? "▲" : "▼" + btn.title = isWrapped + ? 
"Collapse to single line" + : "Expand to multi-line view" + // Always show button when wrapped + btn.style.display = "inline" + }) + } +} + +// Set up wrap buttons for categories and columns lists +setupWrapButtons( + ".anndata-categories__wrap", + ".anndata-categories", + "anndata-categories--wrapped", +) +setupWrapButtons( + ".anndata-columns__wrap", + ".anndata-columns", + "anndata-columns--wrapped", +) + +// Update button visibility on container resize (works for JupyterLab panes too) +// Uses the same selector pairs as setupWrapButtons for consistency +function updateAllWrapButtons() { + for (const [btnSel, listSel, wrappedClass] of [ + [ + ".anndata-categories__wrap", + ".anndata-categories", + "anndata-categories--wrapped", + ], + [ + ".anndata-columns__wrap", + ".anndata-columns", + "anndata-columns--wrapped", + ], + ]) { + for (const btn of container.querySelectorAll(btnSel)) { + const entry = btn.closest(".anndata-entry") + const metaCell = entry + ? entry.querySelector(".anndata-entry__preview") + : null + const list = metaCell ? 
metaCell.querySelector(listSel) : null + updateWrapButtonVisibility(btn, list, metaCell, wrappedClass) + } + } +} + +// Use ResizeObserver for robust resize detection (pane resizes, not just window) +if (typeof ResizeObserver !== "undefined") { + let resizeTimer + const resizeObserver = new ResizeObserver(() => { + clearTimeout(resizeTimer) + resizeTimer = setTimeout(updateAllWrapButtons, 100) + }) + resizeObserver.observe(container) +} else { + // Fallback for older browsers + let resizeTimer + window.addEventListener("resize", () => { + clearTimeout(resizeTimer) + resizeTimer = setTimeout(updateAllWrapButtons, 100) + }) +} + +// README modal functionality +const readmeIcon = container.querySelector(".anndata-readme__icon") +if (readmeIcon) { + // Ensure accessibility attributes + readmeIcon.setAttribute("role", "button") + readmeIcon.setAttribute("tabindex", "0") + readmeIcon.setAttribute("aria-label", "View README") + + readmeIcon.addEventListener("click", (e) => { + e.stopPropagation() + const readmeContent = readmeIcon.dataset.readme + if (!readmeContent) return + + // Create modal overlay + const overlay = document.createElement("div") + overlay.className = "anndata-readme__overlay" + + // Create modal with accessibility attributes + // Use container.id to make IDs unique across multiple cells + const modalTitleId = `${container.id}-readme-modal-title` + const modal = document.createElement("div") + modal.className = "anndata-readme__modal" + modal.setAttribute("role", "dialog") + modal.setAttribute("aria-modal", "true") + modal.setAttribute("aria-labelledby", modalTitleId) + + // Header + const header = document.createElement("div") + header.className = "anndata-readme__header" + + const title = document.createElement("h3") + title.id = modalTitleId + title.textContent = "README" + header.appendChild(title) + + const closeBtn = document.createElement("button") + closeBtn.className = "anndata-readme__close" + closeBtn.textContent = "×" + 
closeBtn.setAttribute("aria-label", "Close") + header.appendChild(closeBtn) + + // Content: plain text (no markdown parsing, XSS-safe via textContent) + const content = document.createElement("div") + content.className = "anndata-readme__content" + const pre = document.createElement("pre") + pre.textContent = readmeContent + content.appendChild(pre) + + modal.appendChild(header) + modal.appendChild(content) + overlay.appendChild(modal) + + // Add to container (scoped styles apply) + container.appendChild(overlay) + + // Close handlers (closeModal also detaches the Escape listener on every close path) + const closeModal = () => { + overlay.remove() + document.removeEventListener("keydown", escHandler) + } + + closeBtn.addEventListener("click", closeModal) + overlay.addEventListener("click", (e) => { + if (e.target === overlay) closeModal() + }) + + // Escape key closes modal + const escHandler = (e) => { + if (e.key === "Escape") { + closeModal() + } + } + document.addEventListener("keydown", escHandler) + + // Move initial focus to the close button (not a full focus trap) + closeBtn.focus() + }) + + // Keyboard accessibility for the icon + readmeIcon.addEventListener("keydown", (e) => { + if (e.key === "Enter" || e.key === " ") { + e.preventDefault() + readmeIcon.click() + } + }) +} diff --git a/src/anndata/_repr/utils.py b/src/anndata/_repr/utils.py new file mode 100644 index 000000000..2a87f2595 --- /dev/null +++ b/src/anndata/_repr/utils.py @@ -0,0 +1,835 @@ +""" +Utility functions for HTML representation.
+ +This module provides: +- Serialization checking using the anndata IO registry +- String-to-category warning detection +- Color list detection and validation +- HTML escaping and sanitization +- Memory size formatting +""" + +from __future__ import annotations + +import html +import re +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from collections.abc import Sequence + +import numpy as np + +from .._repr_constants import ( + DICT_PREVIEW_KEYS, + DICT_PREVIEW_KEYS_LARGE, + LIST_PREVIEW_ITEMS, + STRING_INLINE_LIMIT, +) + +if TYPE_CHECKING: + import pandas as pd + + from anndata import AnnData + + from .registry import FormatterContext + + +def _check_serializable_single(obj: object) -> tuple[bool, str]: + """Check if a single (non-container) object is serializable.""" + # Handle None + if obj is None: + return True, "" + + # Use the actual IO registry + try: + from .._io.specs.registry import _REGISTRY + + _REGISTRY.get_spec(obj) + return True, "" + except (KeyError, TypeError): + pass + + # Check for basic Python types that are serializable + if isinstance(obj, (bool, int, float, str, bytes)): + return True, "" + + # Check numpy scalar types + if isinstance(obj, np.generic): + return True, "" + + return ( + False, + f"Type '{type(obj).__module__}.{type(obj).__name__}' has no registered writer", + ) + + +def is_serializable( + obj: object, + *, + _depth: int = 0, + _max_depth: int = 10, +) -> tuple[bool, str]: + """ + Check if an object can be serialized to H5AD/Zarr. + + Uses the actual anndata IO registry to check if a type has a registered writer. + For containers (dict, list), recursively checks all elements. 
+ + Parameters + ---------- + obj + Object to check + _depth + Current recursion depth (internal) + _max_depth + Maximum recursion depth to prevent infinite loops + + Returns + ------- + tuple of (is_serializable, reason_if_not) + """ + if _depth > _max_depth: + return False, "Maximum nesting depth exceeded" + + # Check containers recursively + if isinstance(obj, dict): + for k, v in obj.items(): + ok, reason = is_serializable(v, _depth=_depth + 1, _max_depth=_max_depth) + if not ok: + return False, f"Key '{k}': {reason}" + return True, "" + + if isinstance(obj, (list, tuple)): + for i, v in enumerate(obj): + ok, reason = is_serializable(v, _depth=_depth + 1, _max_depth=_max_depth) + if not ok: + return False, f"Index {i}: {reason}" + return True, "" + + return _check_serializable_single(obj) + + +def should_warn_string_column( + series: pd.Series, n_unique: int | None +) -> tuple[bool, str]: + """ + Check if a string column will be auto-converted to categorical on save. + + This replicates the logic from AnnData.strings_to_categoricals() + (see _core/anndata.py:1249-1259): + - Column must be string type (infer_dtype == "string") + - Number of unique values must be less than total values + + Parameters + ---------- + series + Pandas Series to check + n_unique + Pre-computed nunique value (None if skipped due to unique_limit or lazy) + + Returns + ------- + tuple of (should_warn, warning_message) + """ + # Can't check if n_unique wasn't computed + if n_unique is None: + return False, "" + + from pandas.api.types import infer_dtype + + # Same check as AnnData.strings_to_categoricals() + dtype_str = infer_dtype(series) + if dtype_str != "string": + return False, "" + + n_total = len(series) + if n_unique < n_total: + return ( + True, + f"String column ({n_unique} unique). 
" + f"Will be converted to categorical on save.", + ) + + return False, "" + + +def _is_color_string(s: str) -> bool: + """Check if a string looks like a color value.""" + if s.startswith("#"): + return True + s_lower = s.lower() + if s_lower in _NAMED_COLORS: + return True + return s_lower.startswith(("rgb(", "rgba(")) + + +def sanitize_css_color(color: str) -> str | None: # noqa: PLR0911 + """ + Sanitize a color string for safe use in CSS style attributes. + + Returns the sanitized color if valid, or None if the color is invalid + or potentially dangerous (contains CSS injection attempts). + + This is critical for security - color values go into style attributes + and must not allow CSS injection (e.g., "red; background-image: url(...)"). + + Note: Multiple returns are intentional for clarity in validating different + color formats (hex, named, rgb/rgba). + + Parameters + ---------- + color + The color string to sanitize + + Returns + ------- + The sanitized color string, or None if invalid/unsafe + """ + if not isinstance(color, str): + return None + + color = color.strip() + if not color: + return None + + # Length limit to prevent DoS via very long strings + if len(color) > 50: + return None + + # Hex colors: #RGB, #RRGGBB, or #RRGGBBAA (strict whitelist) + if color.startswith("#"): + hex_part = color[1:] + if len(hex_part) in (3, 4, 6, 8) and all( + c in "0123456789abcdefABCDEF" for c in hex_part + ): + return color + return None + + # Named colors - must exactly match a known CSS color name (whitelist) + color_lower = color.lower() + if color_lower in _NAMED_COLORS: + return color_lower + + # rgb() and rgba() - WHITELIST approach: only allow safe characters + if color_lower.startswith("rgb"): + # Only these characters can appear in valid rgb/rgba colors + safe_chars = set("rgbaRGBA0123456789(),. 
%") + if not all(c in safe_chars for c in color): + return None + # Validate rgb/rgba format strictly with regex + rgb_pattern = r"^rgba?\(\s*\d{1,3}%?\s*,\s*\d{1,3}%?\s*,\s*\d{1,3}%?\s*(,\s*(0|1|0?\.\d+))?\s*\)$" + if re.match(rgb_pattern, color_lower): + return color + return None + + # Reject everything else - no hsl(), var(), url(), expression(), etc. + return None + + +def is_color_list(key: str, value: object) -> bool: + """ + Check if a value is a color list following the *_colors convention. + + Parameters + ---------- + key + The key name (should end with '_colors') + value + The value to check + + Returns + ------- + True if this appears to be a color list + """ + if not isinstance(key, str) or not key.endswith("_colors"): + return False + if not isinstance(value, (list, np.ndarray, tuple)): + return False + # Empty list is valid + if len(value) == 0: + return True + # Check first element + first = value[0] + return isinstance(first, str) and _is_color_string(first) + + +def _get_categories_from_column(col: object) -> list: + """ + Get categories from a categorical column. + + Works for both pandas Series (.cat.categories) and xarray DataArray + (dtype.categories). Returns empty list if categories cannot be extracted. + """ + try: + # Pandas Series + if hasattr(col, "cat"): + return list(col.cat.categories) + + # xarray DataArray or other objects with CategoricalDtype + if hasattr(col, "dtype") and hasattr(col.dtype, "categories"): + return list(col.dtype.categories) + except Exception as e: # noqa: BLE001 + from .._warnings import warn + + warn( + f"Failed to extract categories from column: {type(e).__name__}: {e}", + UserWarning, + ) + + return [] + + +def get_categories_for_display( + col: object, + context: FormatterContext, + *, + is_lazy: bool, +) -> tuple[list, bool, int | None]: + """ + Get categories for a column, handling lazy loading appropriately. 
+ + Parameters + ---------- + col + The column to get categories from + context + FormatterContext with display settings + is_lazy + Whether this is a lazy column (from read_lazy()) + + Returns + ------- + tuple of (categories_list, was_truncated, n_categories) + categories_list: List of category values + was_truncated: True if categories were truncated for lazy columns + n_categories: Total number of categories (if known) + """ + if is_lazy: + from .lazy import get_lazy_categories + + return get_lazy_categories(col, context) + + # Non-lazy categorical - use unified accessor + categories = _get_categories_from_column(col) + return categories, False, len(categories) if categories else None + + +def _compute_if_dask(obj: object) -> object: + """ + Compute a dask array/object if it is one, otherwise return as-is. + + For lazy AnnData, uns values may be dask arrays that need to be + computed to get the actual values. + """ + if hasattr(obj, "compute"): + return obj.compute() + return obj + + +def get_matching_column_colors( + adata: AnnData, + column_name: str, + *, + limit: int | None = None, +) -> list[str] | None: + """ + Get colors for a column from uns if they exist. + + This function is called by CategoricalFormatter which already verified + the column is categorical. It just looks up and returns the colors. + Color count validation is done separately by check_color_category_mismatch. + + Parameters + ---------- + adata + AnnData object + column_name + Name of the column to get colors for + limit + If provided, only load the first `limit` colors. This avoids loading + all colors from disk when only displaying partial categories. 
+ + Returns + ------- + List of color strings if colors exist, None otherwise + """ + colors = _get_colors_from_uns(adata, column_name, limit=limit) + return list(colors) if colors is not None else None + + +def check_color_category_mismatch( + adata: AnnData, + column_name: str, + n_categories: int, +) -> str | None: + """ + Check if colors exist but don't match category count. + + Called by _render_dataframe_entry for categorical columns. The caller + already knows this is categorical and has the category count. + + Parameters + ---------- + adata + AnnData object (or object with .uns attribute) + column_name + Name of the column to check + n_categories + Number of categories in the column + + Returns + ------- + Warning message if mismatch, None otherwise + """ + colors = _get_colors_from_uns(adata, column_name) + if colors is None: + return None + + if len(colors) != n_categories: + return f"Color mismatch: {len(colors)} colors for {n_categories} categories" + + return None + + +def count_invalid_colors(colors: Sequence) -> int: + """ + Count colors that fail sanitization. + + Parameters + ---------- + colors + Sequence of color values to check + + Returns + ------- + Number of colors that fail sanitize_css_color validation + """ + return sum(1 for c in colors if sanitize_css_color(str(c)) is None) + + +def format_invalid_colors_warning(invalid_count: int, *, has_more: bool = False) -> str: + """ + Format a warning message for invalid colors. 
+ + Parameters + ---------- + invalid_count + Number of invalid colors found + has_more + If True, adds "+" suffix to indicate more unchecked colors + + Returns + ------- + Formatted warning message like "2 invalid colors" or "2+ invalid colors" + """ + suffix = "+" if has_more else "" + s = "s" if invalid_count > 1 else "" + return f"{invalid_count}{suffix} invalid color{s}" + + +def check_invalid_colors( + adata: AnnData, + column_name: str, + limit: int | None = None, + n_total: int | None = None, +) -> str | None: + """ + Check if any colors in the color list are invalid or unsafe. + + Called by CategoricalFormatter for categorical columns that have associated + colors in .uns. + + Parameters + ---------- + adata + AnnData object (or object with .uns attribute) + column_name + Name of the column to check + limit + If provided, only check the first `limit` colors (for lazy loading). + n_total + Total number of colors expected (e.g., n_categories). Used to determine + if there are unchecked colors beyond the limit. + + Returns + ------- + Warning message if invalid colors found, None otherwise + """ + colors = _get_colors_from_uns(adata, column_name, limit=limit) + if colors is None: + return None + + invalid_count = count_invalid_colors(colors) + if invalid_count == 0: + return None + + has_more = n_total is not None and limit is not None and n_total > limit + return format_invalid_colors_warning(invalid_count, has_more=has_more) + + +def _get_colors_from_uns( + adata: AnnData, + column_name: str, + limit: int | None = None, +) -> object | None: + """Get colors from uns for a column, handling lazy loading. 
+ + Parameters + ---------- + adata + AnnData object (or object with .uns attribute) + column_name + Name of the column (colors key will be "{column_name}_colors") + limit + If provided, only load the first `limit` colors (for dask arrays) + + Returns + ------- + Colors array/list if found, None otherwise + """ + # Handle objects without .uns (e.g., Raw) + if not hasattr(adata, "uns"): + return None + + color_key = f"{column_name}_colors" + if color_key not in adata.uns: + return None + + colors = adata.uns[color_key] + + # For lazy AnnData with dask arrays, slice before computing + if limit is not None and hasattr(colors, "compute"): + return colors[:limit].compute() + + # Compute if dask array (for lazy AnnData) + return _compute_if_dask(colors) + + +def format_index_preview(index: pd.Index, preview_n: int = 5) -> str: + """Format a preview of a pandas Index. + + Shows first and last items with ellipsis in between for long indices. + Handles bytes index values (from older h5ad files) by decoding them. + + Parameters + ---------- + index + The pandas Index to preview + preview_n + Number of items to show at the start and end + + Returns + ------- + Comma-separated preview string, or ``empty`` for empty indices. + """ + n = len(index) + if n == 0: + return "empty" + + def _format_value(x: object) -> str: + """Format a single index value, decoding bytes if needed.""" + if isinstance(x, bytes): + try: + return x.decode("utf-8") + except UnicodeDecodeError: + return x.decode("latin-1") + return str(x) + + if n <= preview_n * 2: + items = [escape_html(_format_value(x)) for x in index] + else: + first = [escape_html(_format_value(x)) for x in index[:preview_n]] + last = [escape_html(_format_value(x)) for x in index[-preview_n:]] + items = [*first, "...", *last] + + return ", ".join(items) + + +def escape_html(text: str) -> str: + """Escape HTML special characters and replace null bytes. 
+ + Null bytes in user data (e.g., column names like ``"null\\x00byte"``) + break HTML parsers and cause truncated rendering. They are replaced + with the Unicode replacement character U+FFFD. + """ + return html.escape(str(text).replace("\x00", "\ufffd")) + + +def sanitize_for_id(text: str) -> str: + """Sanitize a string for use as an HTML id attribute.""" + # Replace everything except ASCII letters, digits, "_" and "-" with underscore + sanitized = re.sub(r"[^a-zA-Z0-9_-]", "_", str(text)) + # Ensure it starts with a letter + if sanitized and not sanitized[0].isalpha(): + sanitized = "id_" + sanitized + return sanitized + + +def truncate_string(text: str, max_length: int = 100) -> str: + """Truncate a string and add ellipsis if needed.""" + text = str(text) + if len(text) <= max_length: + return text + return text[: max_length - 3] + "..." + + +def format_memory_size(size_bytes: float) -> str: + """Format memory size in human-readable form.""" + if size_bytes < 0: + return "Unknown" + + for unit in ("B", "KB", "MB", "GB", "TB"): + if abs(size_bytes) < 1024: + if unit == "B": + return f"{int(size_bytes)} {unit}" + return f"{size_bytes:.1f} {unit}" + size_bytes /= 1024 + + return f"{size_bytes:.1f} PB" + + +def format_number(n: float | str) -> str: + """Format a number with thousand separators. + + Accepts int, float, or str (for fallback values like "?").
+ """ + if isinstance(n, str): + return n + if isinstance(n, float): + if n == int(n): + n = int(n) + else: + return f"{n:,.2f}" + return f"{n:,}" + + +def get_anndata_version() -> str: + """Get the anndata version string.""" + try: + from importlib.metadata import PackageNotFoundError, version + + return version("anndata") + except PackageNotFoundError: + return "unknown" + + +def is_view(obj: object) -> bool: + """Check if an object is a view (for AnnData-like objects).""" + try: + return getattr(obj, "is_view", False) + except Exception: # noqa: BLE001 + return False + + +def is_backed(obj: object) -> bool: + """Check if an object is backed (for AnnData-like objects).""" + try: + return getattr(obj, "isbacked", False) + except Exception: # noqa: BLE001 + return False + + +def get_backing_info(obj: object) -> dict[str, bool | str | None]: + """Get information about backing for an AnnData-like object.""" + try: + if not is_backed(obj): + return {"backed": False} + + filename = str(getattr(obj, "filename", None) or "") + info: dict[str, bool | str | None] = { + "backed": True, + "filename": filename, + } + + # Try to get file status + file_obj = getattr(obj, "file", None) + if file_obj is not None: + info["is_open"] = getattr(file_obj, "is_open", None) + + # Detect format from filename + if filename: + if filename.endswith(".h5ad"): + info["format"] = "H5AD" + elif ".zarr" in filename: + info["format"] = "Zarr" + else: + info["format"] = "Unknown" + + return info + except Exception: # noqa: BLE001 + return {"backed": False} + + +def _load_css_colors() -> frozenset[str]: + """Load CSS named colors from static file. + + The colors are loaded from static/css_colors.txt which contains the + 147 CSS3 named colors. This file can be easily updated if needed. 
+ + Returns + ------- + frozenset of lowercase color names + """ + from functools import cache + from importlib.resources import files + + @cache + def _load() -> frozenset[str]: + content = ( + files("anndata._repr.static") + .joinpath("css_colors.txt") + .read_text(encoding="utf-8") + ) + colors = set() + for line in content.splitlines(): + line = line.strip() + if line and not line.startswith("#"): + colors.add(line.lower()) + return frozenset(colors) + + return _load() + + +# CSS named colors for color detection in _is_color_string(). +# Loaded from static/css_colors.txt - see that file for the full list. +# Colors can also be specified as hex (#RGB, #RRGGBB), rgb(), or rgba(). +_NAMED_COLORS = _load_css_colors() + + +# ----------------------------------------------------------------------------- +# Value preview functions +# ----------------------------------------------------------------------------- + + +def preview_string(value: str, max_len: int) -> str: + """Preview a string value.""" + if len(value) <= max_len: + return f'"{value}"' + return f'"{value[:max_len]}..."' + + +def preview_number(value: float | np.integer | np.floating) -> str: + """Preview a numeric value.""" + if isinstance(value, bool): + return str(value) + if isinstance(value, (int, np.integer)): + return str(value) + # Float - format nicely (int() raises on NaN/inf, so those fall through to .6g) + if np.isfinite(value) and value == int(value): + return str(int(value)) + return f"{value:.6g}" + + +def preview_dict(value: dict) -> str: + """Preview a dict value.""" + n_keys = len(value) + if n_keys == 0: + return "{}" + if n_keys <= DICT_PREVIEW_KEYS: + keys_preview = ", ".join(str(k) for k in list(value.keys())[:DICT_PREVIEW_KEYS]) + return f"{{{keys_preview}}}" + keys_preview = ", ".join( + str(k) for k in list(value.keys())[:DICT_PREVIEW_KEYS_LARGE] + ) + return f"{{{keys_preview}, ...}} ({n_keys} keys)" + + +def preview_sequence(value: list | tuple) -> str: + """Preview a list or tuple value.""" + n_items = len(value) + bracket = "[]" if isinstance(value, list)
else "()" + if n_items == 0: + return bracket + if n_items <= LIST_PREVIEW_ITEMS: + try: + items = [preview_item(v) for v in value[:LIST_PREVIEW_ITEMS]] + if all(items): + return f"{bracket[0]}{', '.join(items)}{bracket[1]}" + except Exception: # noqa: BLE001 + # Intentional broad catch: preview generation is best-effort + pass + return f"({n_items} items)" + + +def preview_item(value: object) -> str: + """Generate a short preview for a single item (for list/tuple previews).""" + if isinstance(value, str): + if len(value) <= STRING_INLINE_LIMIT: + return f'"{value}"' + truncate_at = STRING_INLINE_LIMIT - 3 # Leave room for "..." + return f'"{value[:truncate_at]}..."' + if isinstance(value, bool): + return str(value) + if isinstance(value, (int, float, np.integer, np.floating)): + return str(value) + if value is None: + return "None" + return "" # Empty string means skip + + +def generate_value_preview(value: object, max_len: int = 100) -> str: + """Generate a human-readable preview of a value. + + Returns empty string if no meaningful preview can be generated. + """ + if value is None: + return "None" + if isinstance(value, str): + return preview_string(value, max_len) + if isinstance(value, (bool, int, float, np.integer, np.floating)): + return preview_number(value) + if isinstance(value, dict): + return preview_dict(value) + if isinstance(value, (list, tuple)): + return preview_sequence(value) + # No preview for complex types + return "" + + +def get_setting(name: str, *, default: object) -> object: + """Get a setting value from anndata.settings, falling back to default. 
+ + Parameters + ---------- + name + The setting name (e.g., "repr_html_max_items") + default + Default value if setting is not available + + Returns + ------- + The setting value or default + """ + try: + from anndata import settings + + return getattr(settings, name, default) + except (ImportError, AttributeError): + return default + + +def validate_key(key: str) -> tuple[bool, str, bool]: + """Check if a key name is valid for HDF5/Zarr serialization. + + Key names (column names, uns keys, etc.) are validated because certain + characters cause issues with the underlying storage formats (HDF5 and Zarr). + + Parameters + ---------- + key + Key name to validate + + Returns + ------- + tuple of (is_valid, reason, is_hard_error) + is_valid: False if there's an issue + reason: Description of the issue + is_hard_error: True means write fails NOW, False means deprecation warning + """ + if not isinstance(key, str): + return False, f"Non-string key ({type(key).__name__})", True + # Slashes will be disallowed in h5 stores (FutureWarning) + if "/" in key: + return False, "Contains '/' (deprecated)", False + return True, "", False diff --git a/src/anndata/_repr_constants.py b/src/anndata/_repr_constants.py new file mode 100644 index 000000000..9c4d0bfbe --- /dev/null +++ b/src/anndata/_repr_constants.py @@ -0,0 +1,157 @@ +""" +Constants for HTML representation. + +This module contains default values for repr_html settings. +It is located outside the _repr/ package to avoid loading the full +_repr module when _settings.py imports these constants at anndata +import time. Python loads parent packages before submodules, so +importing from _repr.constants would trigger _repr/__init__.py. 
+""" + +from __future__ import annotations + +# Display behavior +DEFAULT_FOLD_THRESHOLD = 5 # Auto-fold sections with more than N entries +DEFAULT_MAX_DEPTH = 3 # Maximum recursion depth for nested objects +DEFAULT_MAX_ITEMS = 200 # Maximum items to show per section +DEFAULT_MAX_STRING_LENGTH = 100 # Truncate strings longer than this +DEFAULT_PREVIEW_ITEMS = 5 # Number of items to show in previews (first/last) +# Max category values to display inline (used by render_category_list in components.py) +# Note: DataFrame columns in obsm/varm have no limit - see DF_COLS_PREVIEW_LIMIT comment +DEFAULT_MAX_CATEGORIES = 100 +DEFAULT_MAX_LAZY_CATEGORIES = ( + 100 # Max categories to load for lazy categoricals (0 to skip) +) +DEFAULT_UNIQUE_LIMIT = 1_000_000 # Max rows to compute unique counts (0 to disable) +DEFAULT_MAX_README_SIZE = 100_000 # Max README size in chars (100KB, 0 to disable) + +# Column widths (pixels) +DEFAULT_MAX_FIELD_WIDTH = 400 # Max width for field name column +DEFAULT_TYPE_WIDTH = 220 # Width for type column + +# Field name column width calculation constants +# These values are empirically tuned for the default 13px monospace font +CHAR_WIDTH_PX = 8 # Average character width for monospace at 13px font-size +COPY_BUTTON_PADDING_PX = ( + 54 # Extra space for copy button + cell padding (24px for grid border-box) +) +MIN_FIELD_WIDTH_PX = 104 # Minimum column width (includes 24px cell padding) +DEFAULT_FIELD_WIDTH_PX = ( + 104 # Default when no field names exist (matches MIN_FIELD_WIDTH_PX) +) + +# Inline style for graceful degradation (hidden until JS enables). +# JS sets different display values per element, so this must stay inline. 
+STYLE_HIDDEN = "display:none;" + +# Warning messages +NOT_SERIALIZABLE_MSG = "Not serializable to H5AD/Zarr" + +# Preview truncation limits +TOOLTIP_TRUNCATE_LENGTH = 500 # Max chars for tooltip full text +ERROR_TRUNCATE_LENGTH = 200 # Max chars for error messages +COLOR_PREVIEW_LIMIT = 15 # Max color swatches to show +DICT_PREVIEW_KEYS = 3 # Keys to show in dict preview (small dicts) +DICT_PREVIEW_KEYS_LARGE = 2 # Keys to show in dict preview (large dicts) +LIST_PREVIEW_ITEMS = 3 # Items to show in list preview +STRING_INLINE_LIMIT = 20 # Max string length before truncating inline + +# DataFrame column preview limits (for DataFrames in uns only) +# These control the compact inline preview shown in the type cell, e.g. "[col1, col2, ...]" +# Used by DataFrameFormatter in formatters.py for uns entries. +# +# Note: DataFrames in obsm/varm render ALL columns with CSS truncation + wrap button +# (render_entry_preview_cell() in components.py). No constant limits this - all columns +# are in the HTML, CSS truncates the display, and the wrap button expands to show all. +# This differs from categories which are limited by DEFAULT_MAX_CATEGORIES below. 
+DF_COLS_PREVIEW_LIMIT = 5 # Max columns to show in compact preview +DF_COLS_PREVIEW_MAX_LEN = 40 # Max total chars for column list string + +# CSS class names for entry rows (BEM: anndata-entry block) +CSS_ENTRY = "anndata-entry" +CSS_ENTRY_NAME = "anndata-entry__name" +CSS_ENTRY_TYPE = "anndata-entry__type" +CSS_ENTRY_PREVIEW = "anndata-entry__preview" +CSS_TEXT_MUTED = "anndata-text--muted" +CSS_TEXT_ERROR = "anndata-text--error" +CSS_TEXT_WARNING = "anndata-text--warning" +CSS_NESTED_CONTENT = "anndata-entry__nested-content" +CSS_NESTED_ANNDATA = "anndata-entry__nested-anndata" + +# CSS class names for dtype spans (BEM: anndata-dtype block with modifiers) +# These provide visual differentiation for different data types +# Basic types +CSS_DTYPE_INT = "anndata-dtype--int" +CSS_DTYPE_FLOAT = "anndata-dtype--float" +CSS_DTYPE_BOOL = "anndata-dtype--bool" +CSS_DTYPE_STRING = "anndata-dtype--string" +CSS_DTYPE_OBJECT = "anndata-dtype--object" +# Container/structured types +CSS_DTYPE_CATEGORY = "anndata-dtype--category" +CSS_DTYPE_DATAFRAME = "anndata-dtype--dataframe" +CSS_DTYPE_ANNDATA = "anndata-dtype--anndata" +# Array types +CSS_DTYPE_NDARRAY = "anndata-dtype--ndarray" +CSS_DTYPE_SPARSE = "anndata-dtype--sparse" +# Specialized array types +CSS_DTYPE_DASK = "anndata-dtype--dask" +CSS_DTYPE_GPU = "anndata-dtype--gpu" +CSS_DTYPE_TPU = "anndata-dtype--tpu" +CSS_DTYPE_AWKWARD = "anndata-dtype--awkward" +CSS_DTYPE_ARRAY_API = "anndata-dtype--array-api" +# Extension/unknown +CSS_DTYPE_EXTENSION = "anndata-dtype--extension" +CSS_DTYPE_UNKNOWN = "anndata-dtype--unknown" + +# CSS class names for badges (BEM: anndata-badge block with modifiers) +CSS_BADGE = "anndata-badge" +CSS_BADGE_VIEW = "anndata-badge--view" +CSS_BADGE_BACKED = "anndata-badge--backed" +CSS_BADGE_LAZY = "anndata-badge--lazy" +CSS_BADGE_EXTENSION = "anndata-badge--extension" + +# CSS class names for color swatches (BEM: anndata-colors block) +CSS_COLORS = "anndata-colors" +CSS_COLORS_SWATCH = 
"anndata-colors__swatch" +CSS_COLORS_SWATCH_INVALID = "anndata-colors__swatch--invalid" + +# Section name constants (canonical strings used for data-section attributes and dispatch) +SECTION_X = "X" +SECTION_OBS = "obs" +SECTION_VAR = "var" +SECTION_UNS = "uns" +SECTION_OBSM = "obsm" +SECTION_VARM = "varm" +SECTION_LAYERS = "layers" +SECTION_OBSP = "obsp" +SECTION_VARP = "varp" +SECTION_RAW = "raw" + +# Internal AnnData attributes to skip when detecting unknown sections. +# +# Standard data sections (obs, var, uns, obsm, etc.) are discovered via +# `anndata.utils.iter_outer`. Custom sections come from registered +# SectionFormatter extensions. This frozenset lists non-data attributes +# that would otherwise appear as "unknown sections" — shape/size metadata, +# index accessors, file/backing info, and the transpose accessor. +# +# Keep NEW data slots OUT of this list. They should surface as "unknown" +# until a proper renderer is implemented. +INTERNAL_ANNDATA_ATTRS = frozenset({ + # Shape/size metadata + "shape", + "n_obs", + "n_vars", + # Index accessors + "obs_names", + "var_names", + # File/backing info + "filename", + "file", + "isbacked", + # View status + "is_view", + "isview", # Deprecated alias, triggers warning if accessed + # Transpose accessor + "T", +}) diff --git a/src/anndata/_settings.py b/src/anndata/_settings.py index a08c80d4f..1d22c4fff 100644 --- a/src/anndata/_settings.py +++ b/src/anndata/_settings.py @@ -12,6 +12,17 @@ from types import GenericAlias, NoneType from typing import TYPE_CHECKING, Any, NamedTuple, cast +from ._repr_constants import ( + DEFAULT_FOLD_THRESHOLD, + DEFAULT_MAX_CATEGORIES, + DEFAULT_MAX_DEPTH, + DEFAULT_MAX_FIELD_WIDTH, + DEFAULT_MAX_ITEMS, + DEFAULT_MAX_LAZY_CATEGORIES, + DEFAULT_MAX_README_SIZE, + DEFAULT_TYPE_WIDTH, + DEFAULT_UNIQUE_LIMIT, +) from ._warnings import warn from .compat import old_positionals @@ -528,5 +539,109 @@ def validate_sparse_settings(val: Any, settings: SettingsManager) -> None: ) +# HTML 
representation settings +settings.register( + "repr_html_enabled", + default_value=True, + description="Whether to use rich HTML representation in Jupyter notebooks. Set to False to use plain text repr.", + validate=validate_bool, + get_from_env=check_and_get_bool, +) + +settings.register( + "repr_html_fold_threshold", + default_value=DEFAULT_FOLD_THRESHOLD, + description="Auto-fold sections in HTML repr when they have more than this many entries.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_max_depth", + default_value=DEFAULT_MAX_DEPTH, + description="Maximum recursion depth for nested AnnData objects in HTML repr.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_max_items", + default_value=DEFAULT_MAX_ITEMS, + description="Maximum number of items to show per section in HTML repr.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_max_categories", + default_value=DEFAULT_MAX_CATEGORIES, + description="Maximum number of category values to display inline in HTML repr.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_max_lazy_categories", + default_value=DEFAULT_MAX_LAZY_CATEGORIES, + description=( + "Maximum categories to load for lazy categoricals in HTML repr. " + "For lazy AnnData (from read_lazy()), loading categories requires reading " + "from disk. This limit prevents loading too many categories. " + "Set to 0 to disable loading categories entirely (metadata-only mode)." + ), + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_unique_limit", + default_value=DEFAULT_UNIQUE_LIMIT, + description="Maximum number of rows to compute unique counts for in HTML repr. 
Set to 0 to disable.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_dataframe_expand", + default_value=False, + description=( + "Whether to show expandable pandas DataFrame previews in HTML repr. " + "When enabled, DataFrames in obsm/varm can be expanded to show their content " + "using pandas _repr_html_() (rich Jupyter-style output). Configure pandas " + "display options to control output: pd.set_option('display.max_rows', 10)" + ), + validate=validate_bool, + get_from_env=check_and_get_bool, +) + +settings.register( + "repr_html_max_field_width", + default_value=DEFAULT_MAX_FIELD_WIDTH, + description="Maximum width in pixels for the field name column in HTML repr.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_type_width", + default_value=DEFAULT_TYPE_WIDTH, + description="Width in pixels for the type column in HTML repr.", + validate=validate_int, + get_from_env=check_and_get_int, +) + +settings.register( + "repr_html_max_readme_size", + default_value=DEFAULT_MAX_README_SIZE, + description=( + "Maximum size in characters for README content in HTML repr. " + "READMEs larger than this will be truncated with a note. " + "Set to 0 to disable truncation (not recommended for very large READMEs)." 
+ ), + validate=validate_int, + get_from_env=check_and_get_int, +) + + ################################################################################## ################################################################################## diff --git a/src/anndata/_settings.pyi b/src/anndata/_settings.pyi index 775f20ba7..9139cfd77 100644 --- a/src/anndata/_settings.pyi +++ b/src/anndata/_settings.pyi @@ -47,5 +47,16 @@ class _AnnDataSettingsManager(SettingsManager): disallow_forward_slash_in_h5ad: bool = False write_csr_csc_indices_with_min_possible_dtype: bool = False auto_shard_zarr_v3: bool | None = None + repr_html_enabled: bool = True + repr_html_fold_threshold: int = 5 + repr_html_max_depth: int = 3 + repr_html_max_items: int = 200 + repr_html_max_categories: int = 20 + repr_html_unique_limit: int = 1_000_000 + repr_html_dataframe_expand: bool = False + repr_html_max_field_width: int = 400 + repr_html_type_width: int = 220 + repr_html_max_lazy_categories: int = 100 + repr_html_max_readme_size: int = 100_000 settings: _AnnDataSettingsManager diff --git a/tests/repr/__init__.py b/tests/repr/__init__.py new file mode 100644 index 000000000..ebe87200d --- /dev/null +++ b/tests/repr/__init__.py @@ -0,0 +1 @@ +"""Tests for anndata._repr module.""" diff --git a/tests/repr/conftest.py b/tests/repr/conftest.py new file mode 100644 index 000000000..019ea654f --- /dev/null +++ b/tests/repr/conftest.py @@ -0,0 +1,272 @@ +""" +Shared fixtures for repr tests. 
+""" + +from __future__ import annotations + +import re + +import numpy as np +import pandas as pd +import pytest +import scipy.sparse as sp + +from anndata import AnnData + +# Import HTML validation utilities from separate module +from .html_validator import ( + HTMLValidator, + StrictHTMLParser, + validate_html5_strict, +) + +# Re-export validation utilities for use by other test modules +__all__ = ["HTMLValidator", "StrictHTMLParser", "validate_html5_strict"] + + +def pytest_configure(config): + """Suppress ImplicitModificationWarning for all repr tests. + + This warning is expected when AnnData transforms indices internally + during singledispatch in functools. + """ + import warnings + + from anndata._warnings import ImplicitModificationWarning + + warnings.filterwarnings("ignore", category=ImplicitModificationWarning) + + +# ============================================================================= +# Optional Dependencies +# ============================================================================= + +try: + import dask.array as da # noqa: F401 + + HAS_DASK = True +except ImportError: + HAS_DASK = False + +try: + import cupy as cp # noqa: F401 + + HAS_CUPY = True +except ImportError: + HAS_CUPY = False + +try: + import awkward as ak # noqa: F401 + + HAS_AWKWARD = True +except ImportError: + HAS_AWKWARD = False + +try: + import xarray # noqa: F401 + + HAS_XARRAY = True +except ImportError: + HAS_XARRAY = False + + +# ============================================================================= +# HTML5 Validation Fixture +# ============================================================================= + + +@pytest.fixture +def validate_html5(): + """ + Fixture for strict HTML5 validation using Nu Html Checker. + + Usage: + def test_valid_html5(validate_html5): + html = adata._repr_html_() + errors = validate_html5(html) + assert not errors, f"HTML5 validation errors: {errors}" + + Skips validation if vnu is not installed. 
+ Install via: brew install vnu OR pip install html5-validator + """ + + def _validate(html: str) -> list[str]: + return validate_html5_strict(html) + + return _validate + + +# ============================================================================= +# Optional JavaScript Validation +# ============================================================================= + +# Check for esprima (pure Python JS parser) availability +try: + import esprima # noqa: F401 + + HAS_ESPRIMA = True +except ImportError: + HAS_ESPRIMA = False + + +def validate_javascript_syntax(html: str) -> list[str]: + """ + Validate JavaScript syntax in HTML script tags. + + Returns list of syntax errors. + Requires: pip install esprima + + This is a lightweight alternative to ESLint that doesn't require Node.js. + """ + if not HAS_ESPRIMA: + return [] + + import esprima + + errors = [] + script_pattern = r"<script[^>]*>(.*?)</script>" + scripts = re.findall(script_pattern, html, re.DOTALL | re.I) + + for i, script in enumerate(scripts): + if not script.strip(): + continue + try: + esprima.parseScript(script, tolerant=True) + except esprima.Error as e: + errors.append(f"Script {i + 1}: {e}") + + return errors + + +@pytest.fixture +def validate_js(): + """ + Fixture for JavaScript syntax validation. + + Usage: + def test_valid_js(validate_js): + html = adata._repr_html_() + errors = validate_js(html) + assert not errors, f"JavaScript errors: {errors}" + + Skips validation if esprima is not installed. 
+ Install via: pip install esprima + """ + + def _validate(html: str) -> list[str]: + return validate_javascript_syntax(html) + + return _validate + + +# ============================================================================= +# Fixtures +# ============================================================================= + + +@pytest.fixture +def adata(): + """Basic AnnData for testing.""" + return AnnData( + np.random.randn(100, 50).astype(np.float32), + obs=pd.DataFrame( + {"batch": ["A", "B"] * 50}, index=[f"cell_{i}" for i in range(100)] + ), + var=pd.DataFrame( + {"gene_name": [f"gene_{i}" for i in range(50)]}, + index=[f"gene_{i}" for i in range(50)], + ), + ) + + +@pytest.fixture +def adata_full(): + """AnnData with all attributes populated.""" + n_obs, n_vars = 100, 50 + adata = AnnData( + sp.random(n_obs, n_vars, density=0.1, format="csr", dtype=np.float32), + obs=pd.DataFrame({ + "batch": pd.Categorical(["A", "B"] * (n_obs // 2)), + "n_counts": np.random.randint(1000, 10000, n_obs), + "cell_type": pd.Categorical( + ["T", "B", "NK"] * (n_obs // 3) + ["T"] * (n_obs % 3) + ), + }), + var=pd.DataFrame({ + "gene_name": [f"gene_{i}" for i in range(n_vars)], + "highly_variable": np.random.choice([True, False], n_vars), + }), + ) + adata.uns["neighbors"] = {"params": {"n_neighbors": 15}} + adata.uns["batch_colors"] = ["#FF0000", "#00FF00"] + adata.obsm["X_pca"] = np.random.randn(n_obs, 50).astype(np.float32) + adata.obsm["X_umap"] = np.random.randn(n_obs, 2).astype(np.float32) + adata.varm["PCs"] = np.random.randn(n_vars, 50).astype(np.float32) + adata.layers["raw"] = sp.random(n_obs, n_vars, density=0.1, format="csr") + adata.obsp["distances"] = sp.random(n_obs, n_obs, density=0.01, format="csr") + adata.varp["gene_corr"] = sp.random(n_vars, n_vars, density=0.1, format="csr") + return adata + + +@pytest.fixture +def adata_with_colors(): + """AnnData with color annotations.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["cluster"] = pd.Categorical(["A", 
"B", "C"] * 3 + ["A"]) + adata.uns["cluster_colors"] = ["#FF0000", "#00FF00", "#0000FF"] + return adata + + +@pytest.fixture +def adata_with_nested(): + """AnnData with nested AnnData in uns.""" + inner = AnnData(np.zeros((5, 3))) + outer = AnnData(np.zeros((10, 5))) + outer.uns["nested_adata"] = inner + return outer + + +@pytest.fixture +def adata_with_special_chars(): + """AnnData with special characters in names.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["col" + scripts = re.findall(script_pattern, self.html, re.DOTALL | re.I) + + for script in scripts: + if js_fragment in script: + return self + + raise AssertionError( + msg or f"JavaScript fragment '{js_fragment}' not found in scripts" + ) + + def assert_collapse_functionality_present( + self, *, msg: str | None = None + ) -> HTMLValidator: + """Assert collapse/expand functionality is present in HTML. + + Section collapse uses native
<details>/<summary> elements. + Entry-level expand uses JS classList.toggle. + """ + has_details = " or entry JS) not found in HTML" + ) + return self + + def assert_section_initially_collapsed( + self, section_name: str, *, msg: str | None = None + ) -> HTMLValidator: + """Assert a section starts collapsed (<details> without open attribute).""" + # Match <details> without an open attribute + pattern = rf'<details[^>]*data-section="{re.escape(section_name)}"[^>]*>' + match = re.search(pattern, self.html) + if not match: + raise AssertionError( + msg or f"Section '{section_name}' not found as <details> element" + ) + tag = match.group(0) + # The section should NOT have the open attribute + if re.search(r"\bopen\b", tag): + raise AssertionError( + msg + or f"Section '{section_name}' has 'open' attribute (expected collapsed)" + ) + return self + + def assert_section_not_initially_collapsed( + self, section_name: str, *, msg: str | None = None + ) -> HTMLValidator: + """Assert a section starts expanded (<details> with open attribute).""" + self.assert_section_exists(section_name) + # Match <details> with an open attribute + pattern = rf'<details[^>]*data-section="{re.escape(section_name)}"[^>]*>' + match = re.search(pattern, self.html) + if not match: + raise AssertionError( + msg or f"Section '{section_name}' not found as <details>
element" + ) + tag = match.group(0) + if not re.search(r"\bopen\b", tag): + raise AssertionError( + msg + or f"Section '{section_name}' missing 'open' attribute (expected expanded)" + ) + return self + + def assert_has_event_handler( + self, event: str, *, msg: str | None = None + ) -> HTMLValidator: + """Assert elements have specific event handlers.""" + inline_pattern = rf"on{event}\s*=" + listener_pattern = rf"addEventListener\s*\(\s*['\"]?{event}['\"]?" + + has_handler = re.search(inline_pattern, self.html, re.I) or re.search( + listener_pattern, self.html + ) + + if not has_handler: + raise AssertionError( + msg or f"Event handler for '{event}' not found in HTML" + ) + return self + + def assert_has_data_attribute( + self, + attr_name: str, + expected_value: str | None = None, + *, + msg: str | None = None, + ) -> HTMLValidator: + """Assert data-* attributes exist with optional value check.""" + if expected_value is not None: + pattern = ( + rf'data-{re.escape(attr_name)}=["\']?{re.escape(expected_value)}["\']?' + ) + else: + pattern = rf"data-{re.escape(attr_name)}=" + + if not re.search(pattern, self.html): + if expected_value: + raise AssertionError( + msg or f"data-{attr_name}='{expected_value}' not found in HTML" + ) + raise AssertionError(msg or f"data-{attr_name} attribute not found in HTML") + return self + + def assert_truncation_indicator(self, *, msg: str | None = None) -> HTMLValidator: + """Assert truncation is indicated with specific patterns. + + Checks for actual truncation indicators used by the repr system: + - `...+{number}` pattern (e.g., "...+20" for categories) + - `... and {number} more` pattern (e.g., "... and 100 more" for rows) + - CSS class `anndata-section__truncated` + - Escaped ellipsis in category display + """ + truncation_patterns = [ + r"\.\.\.\+\d+", # ...+N pattern (categories) + r"\.\.\.\s+and\s+[\d,]+\s+more", # "... 
and N more" (rows) + r"anndata-section__truncated", # CSS class for truncation + r"&hellip;", # HTML entity for ellipsis + r"&#8230;", # Numeric HTML entity for ellipsis + ] + has_truncation = any( + re.search(pattern, self.html, re.I) for pattern in truncation_patterns + ) + if not has_truncation: + raise AssertionError(msg or "No truncation indicator found in HTML") + return self + + def assert_error_shown( + self, error_text: str | None = None, *, msg: str | None = None + ) -> HTMLValidator: + """Assert an error message is shown in the HTML output. + + The repr system should show errors visibly, not hide them. + Errors are typically shown as: + - Text containing 'error:' or 'Error' + - Elements with 'error' class + - Text with exception names (e.g., 'AttributeError', 'RuntimeError') + """ + if error_text: + if error_text not in self.html: + raise AssertionError( + msg or f"Error text '{error_text}' not found in HTML" + ) + return self + + # Check for generic error indicators + has_error = ( + "error:" in self.html.lower() + or re.search(r"\bError\b", self.html) + or "anndata-entry--error" in self.html + or ("anndata-text--muted" in self.html and "error" in self.html.lower()) + ) + if not has_error: + raise AssertionError(msg or "No error indicator found in HTML") + return self + + def assert_no_raw_xss(self, *, msg: str | None = None) -> HTMLValidator: + """Assert no raw XSS payloads exist in executable context. + + Checks that common XSS attack vectors are properly escaped. + Note: Escaped content showing XSS strings as visible text is OK. + We only flag actual executable payloads (unescaped in HTML attributes). 
+ """ + # Get content after the style tag (where user content would be) + style_end = self.html.find("") + content = self.html[style_end:] if style_end > 0 else self.html + + # Check for actual executable script injection + # Raw ", content, re.DOTALL | re.I + ) + for script in scripts: + # Our own scripts have specific patterns + is_our_script = ( + "anndata" in script.lower() or "toggle" in script.lower() + ) + has_xss_pattern = "alert" in script or "eval(" in script + if not is_our_script and has_xss_pattern: + raise AssertionError( + msg or "Potential XSS: executable script tag found" + ) + + # Check for event handlers in HTML tags (actual attributes, not text) + # Pattern: ]+\s{handler}\s*=" + if re.search(pattern, content, re.I): + # Verify it's not in our own legitimate HTML + match = re.search(pattern, content, re.I) + if match: + # Check if it's part of user content (escaped) or real attribute + context = content[max(0, match.start() - 50) : match.end() + 50] + if "<" not in context and """ not in context: + raise AssertionError( + msg or f"Potential XSS: {handler} handler in tag attribute" + ) + + # Check for javascript: URLs in href attributes + if re.search(r'href\s*=\s*["\']?\s*javascript:', content, re.I): + raise AssertionError(msg or "Potential XSS: javascript: URL in href") + + return self + + def assert_html_well_formed(self, *, msg: str | None = None) -> HTMLValidator: + """Assert HTML is well-formed (balanced tags, no duplicate IDs).""" + parser = StrictHTMLParser() + parser.feed(self.html) + if parser.errors: + raise AssertionError(msg or f"HTML is malformed: {parser.errors}") + if parser.tag_stack: + raise AssertionError(msg or f"Unclosed tags: {parser.tag_stack}") + return self + + def assert_accessibility_attribute( + self, attr: str, *, msg: str | None = None + ) -> HTMLValidator: + """Assert accessibility attributes exist (aria-*, role, etc.).""" + pattern = rf"{re.escape(attr)}=" + if not re.search(pattern, self.html): + raise 
AssertionError( + msg or f"Accessibility attribute '{attr}' not found in HTML" + ) + return self + + def count_elements(self, selector: str) -> int: + """Count elements matching selector.""" + pattern = self._selector_to_pattern(selector) + return len(re.findall(pattern, self.html)) + + def get_text_content(self) -> str: + """Get visible text content (stripped of tags).""" + visible_html = re.sub( + r"<(style|script)[^>]*>.*?", "", self.html, flags=re.DOTALL | re.I + ) + return re.sub(r"<[^>]+>", " ", visible_html) + + def _selector_to_pattern(self, selector: str) -> str: # noqa: PLR0911 + """Convert CSS-like selector to regex pattern.""" + if selector.startswith("#"): + id_val = selector[1:] + return rf'<[^>]+id=["\']?{re.escape(id_val)}["\']?[^>]*>' + elif selector.startswith("."): + class_val = selector[1:] + return ( + rf'<[^>]+class=["\'][^"\']*\b{re.escape(class_val)}\b[^"\']*["\'][^>]*>' + ) + elif "[" in selector: + match = re.match(r"\[(\w+)(?:=([^\]]+))?\]", selector) + if match: + attr, val = match.groups() + if val: + val = val.strip("\"'") + return rf'<[^>]+{attr}=["\']?{re.escape(val)}["\']?[^>]*>' + return rf"<[^>]+{attr}=[^>]*>" + elif "." 
in selector: + tag, class_val = selector.split(".", 1) + return rf'<{tag}[^>]+class=["\'][^"\']*\b{re.escape(class_val)}\b[^"\']*["\'][^>]*>' + else: + return rf"<{selector}[^>]*>" + + return selector # Fallback + + def _selector_to_content_pattern(self, selector: str) -> str: + """Convert selector to pattern that captures element content.""" + if selector.startswith("."): + class_val = selector[1:] + return ( + rf'<[^>]+class=["\'][^"\']*\b{re.escape(class_val)}\b[^"\']*["\'][^>]*>' + rf"(.*?)</[^>]+>" + ) + elif selector.startswith("#"): + id_val = selector[1:] + return rf'<[^>]+id=["\']?{re.escape(id_val)}["\']?[^>]*>(.*?)</[^>]+>' + else: + return rf"<{selector}[^>]*>(.*?)</{selector}>" + + +# ============================================================================= +# HTML Structure Validation +# ============================================================================= + +VOID_ELEMENTS = frozenset({ + "area", + "base", + "br", + "col", + "embed", + "hr", + "img", + "input", + "link", + "meta", + "param", + "source", + "track", + "wbr", +}) + + +class StrictHTMLParser(HTMLParser): + """Validates HTML structure and catches common errors.""" + + def __init__(self): + super().__init__() + self.tag_stack = [] + self.errors = [] + self.ids_seen = set() + + def handle_starttag(self, tag, attrs): + if tag not in VOID_ELEMENTS: + self.tag_stack.append(tag) + + # Check for duplicate IDs + for name, value in attrs: + if name == "id": + if value in self.ids_seen: + self.errors.append(f"Duplicate ID: {value}") + self.ids_seen.add(value) + + def handle_endtag(self, tag): + if tag not in VOID_ELEMENTS: + if not self.tag_stack or self.tag_stack[-1] != tag: + self.errors.append(f"Mismatched tag: ") + else: + self.tag_stack.pop() + + +# ============================================================================= +# Optional W3C HTML5 Validation +# ============================================================================= + +# Check for vnu (Nu Html Checker) 
availability +try: + import 
subprocess + + result = subprocess.run( + ["vnu", "--version"], + check=False, + capture_output=True, + text=True, + timeout=5, + ) + HAS_VNU = result.returncode == 0 +except (FileNotFoundError, subprocess.TimeoutExpired, OSError): + HAS_VNU = False + + +def validate_html5_strict(html: str) -> list[str]: + """ + Validate HTML5 using Nu Html Checker (vnu) if available. + + Returns list of validation errors/warnings. + Requires: pip install html5-validator OR system vnu installation. + """ + if not HAS_VNU: + return [] # Skip if not available + + import json + import tempfile + + with tempfile.NamedTemporaryFile(mode="w", suffix=".html", delete=False) as f: + full_html = f""" + +Test +{html} +""" + f.write(full_html) + f.flush() + + try: + result = subprocess.run( + ["vnu", "--format", "json", f.name], + check=False, + capture_output=True, + text=True, + timeout=30, + ) + if result.stderr: + data = json.loads(result.stderr) + return [ + f"{msg['type']}: {msg['message']}" + for msg in data.get("messages", []) + ] + except (subprocess.TimeoutExpired, json.JSONDecodeError): + pass + finally: + from pathlib import Path + + Path(f.name).unlink() + + return [] diff --git a/tests/repr/test_html_validator.py b/tests/repr/test_html_validator.py new file mode 100644 index 000000000..d77c3b21c --- /dev/null +++ b/tests/repr/test_html_validator.py @@ -0,0 +1,734 @@ +""" +Tests for HTMLValidator and examples of proper HTML testing patterns. 
+ +These tests demonstrate the recommended approach for testing HTML repr output: +- Use structured assertions instead of string matching +- Validate element presence and attributes +- Check text appears in correct elements +- Verify sections and entries are properly rendered +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest +import scipy.sparse as sp + +from anndata import AnnData + +from .conftest import HTMLValidator + + +def _get_top_level_selectors(css: str) -> list[str]: + """Extract only top-level CSS selectors (brace depth 0). + + With native CSS nesting, nested selectors inherit scope from + their parent and don't need individual scope checking. + """ + selectors = [] + depth = 0 + current: list[str] = [] + for char in css: + if char == "{": + if depth == 0: + selector = "".join(current).strip() + if selector: + selectors.append(selector) + depth += 1 + current = [] + elif char == "}": + depth -= 1 + depth = max(depth, 0) + current = [] + elif depth == 0: + current.append(char) + return selectors + + +class TestHTMLValidatorBasics: + """Tests for HTMLValidator functionality.""" + + def test_assert_element_exists_by_class(self): + """Test finding elements by class selector.""" + html = '
content
' + v = HTMLValidator(html) + v.assert_element_exists(".my-class") + + def test_assert_element_exists_by_id(self): + """Test finding elements by ID selector.""" + html = '
content
' + v = HTMLValidator(html) + v.assert_element_exists("#my-id") + + def test_assert_element_exists_by_tag(self): + """Test finding elements by tag selector.""" + html = "content" + v = HTMLValidator(html) + v.assert_element_exists("span") + + def test_assert_element_exists_by_attribute(self): + """Test finding elements by attribute selector.""" + html = '
content
' + v = HTMLValidator(html) + v.assert_element_exists("[data-section=obs]") + + def test_assert_element_not_exists(self): + """Test asserting element doesn't exist.""" + html = '
content
' + v = HTMLValidator(html) + v.assert_element_not_exists(".missing-class") + + def test_assert_element_exists_fails(self): + """Test assertion fails when element missing.""" + html = "
content
" + v = HTMLValidator(html) + with pytest.raises(AssertionError, match="not found"): + v.assert_element_exists(".missing") + + def test_assert_text_visible(self): + """Test asserting text is visible (not in style/script).""" + html = "
visible text
" + v = HTMLValidator(html) + v.assert_text_visible("visible text") + with pytest.raises(AssertionError): + v.assert_text_visible("hidden") + + def test_assert_text_in_element(self): + """Test asserting text appears in specific element.""" + html = '
expected text
other
' + v = HTMLValidator(html) + v.assert_text_in_element(".target", "expected text") + + def test_assert_section_exists(self): + """Test asserting data section exists.""" + html = '
content
' + v = HTMLValidator(html) + v.assert_section_exists("obs") + + def test_assert_section_contains_entry(self): + """Test asserting section contains entry.""" + html = '
batch cell_type
' + v = HTMLValidator(html) + v.assert_section_contains_entry("obs", "batch") + + def test_assert_badge_shown(self): + """Test asserting badge is displayed.""" + html = 'View' + v = HTMLValidator(html) + v.assert_badge_shown("view") + + def test_assert_badge_not_shown(self): + """Test asserting badge is not displayed.""" + html = 'content' + v = HTMLValidator(html) + v.assert_badge_not_shown("view") + + def test_assert_shape_displayed(self): + """Test asserting shape values are displayed.""" + html = "
100 obs × 50 var
" + v = HTMLValidator(html) + v.assert_shape_displayed(100, 50) + + def test_assert_dtype_displayed(self): + """Test asserting dtype is displayed.""" + html = 'float32' + v = HTMLValidator(html) + v.assert_dtype_displayed("float32") + + def test_count_elements(self): + """Test counting matching elements.""" + html = '
1
2
3
' + v = HTMLValidator(html) + assert v.count_elements(".item") == 2 + + def test_chaining(self): + """Test method chaining works.""" + html = '
text
'
+        v = HTMLValidator(html)
+        (
+            v
+            .assert_element_exists(".a")
+            .assert_section_exists("obs")
+            .assert_text_visible("text")
+        )
+
+
+class TestHTMLValidatorWithAnnData:
+    """Tests demonstrating HTMLValidator with actual AnnData repr."""
+
+    def test_basic_structure(self, validate_html):
+        """Test basic AnnData repr structure."""
+        adata = AnnData(np.zeros((100, 50)))
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        # Validate structure, not just string presence
+        v.assert_element_exists(".anndata-repr")
+        v.assert_shape_displayed(100, 50)
+
+    def test_sections_present(self, validate_html):
+        """Test all populated sections are present."""
+        adata = AnnData(
+            np.zeros((10, 5)),
+            obs=pd.DataFrame({"batch": ["A", "B"] * 5}),
+            var=pd.DataFrame({"gene": range(5)}),
+        )
+        adata.uns["key"] = "value"
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_section_exists("obs")
+        v.assert_section_exists("var")
+        v.assert_section_exists("uns")
+
+    def test_obs_entries_in_correct_section(self, validate_html):
+        """Test obs column names appear in obs section."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.obs["cell_type"] = pd.Categorical(["A", "B"] * 5)
+        adata.obs["n_counts"] = range(10)
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_section_contains_entry("obs", "cell_type")
+        v.assert_section_contains_entry("obs", "n_counts")
+
+    def test_view_badge_displayed_correctly(self, validate_html):
+        """Test View badge is shown for views."""
+        adata = AnnData(np.zeros((10, 5)))
+        view = adata[0:5, :]
+        html = view._repr_html_()
+        v = validate_html(html)
+
+        v.assert_badge_shown("view")
+        v.assert_shape_displayed(5, 5)  # View shape, not original
+
+    def test_view_badge_not_shown_for_non_view(self, validate_html):
+        """Test View badge is NOT shown for non-views."""
+        adata = AnnData(np.zeros((10, 5)))
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_badge_not_shown("view")
+
+    def test_sparse_matrix_dtype_displayed(self, validate_html):
+        """Test sparse matrix shows dtype."""
+        X = sp.random(100, 50, density=0.1, format="csr", dtype=np.float32)
+        adata = AnnData(X)
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_dtype_displayed("float32")
+        v.assert_text_visible("csr")
+
+    def test_categorical_with_colors(self, validate_html):
+        """Test categorical with colors shows color values."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.obs["cluster"] = pd.Categorical(["A", "B"] * 5)
+        adata.uns["cluster_colors"] = ["#FF0000", "#00FF00"]
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_section_contains_entry("obs", "cluster")
+        v.assert_color_swatch("#FF0000")
+        v.assert_color_swatch("#00FF00")
+
+    def test_warning_for_unserializable(self, validate_html):
+        """Test warning indicator for unserializable objects."""
+
+        class CustomClass:
+            pass
+
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["custom"] = CustomClass()
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_section_contains_entry("uns", "custom")
+        v.assert_warning_indicator()
+
+    def test_backed_badge(self, validate_html, tmp_path):
+        """Test backed badge is shown for backed AnnData with sparse X."""
+        from scipy import sparse
+
+        import anndata as ad
+
+        # Use sparse matrix to cover BackedSparseDatasetFormatter
+        adata = AnnData(sparse.random(100, 50, density=0.1, format="csr"))
+        path = tmp_path / "test.h5ad"
+        adata.write_h5ad(path)
+
+        backed = ad.read_h5ad(path, backed="r")
+        html = backed._repr_html_()
+        v = validate_html(html)
+
+        v.assert_badge_shown("backed")
+        # Verify sparse matrix is shown with "on disk" indicator
+        v.assert_text_visible("on disk")
+        backed.file.close()
+
+    def test_raw_section_visible(self, validate_html):
+        """Test raw section is visible when raw is set."""
+        adata = AnnData(np.zeros((10, 20)))
+        adata.raw = adata.copy()
+        adata = adata[:, :5]
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_section_exists("raw")
+        # Raw should show original var count
+        v.assert_text_visible("20")
+
+    def test_layers_section(self, validate_html):
+        """Test layers section shows layer names."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.layers["counts"] = np.ones((10, 5))
+        adata.layers["normalized"] = np.zeros((10, 5))
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        v.assert_section_exists("layers")
+        v.assert_section_contains_entry("layers", "counts")
+        v.assert_section_contains_entry("layers", "normalized")
+
+
+class TestMigrationExamples:
+    """Examples showing how to migrate old-style tests to HTMLValidator.
+
+    OLD STYLE (string matching):
+        html = adata._repr_html_()
+        assert "batch" in html
+        assert "View" in html
+
+    NEW STYLE (structured validation):
+        v = validate_html(html)
+        v.assert_section_contains_entry("obs", "batch")
+        v.assert_badge_shown("view")
+    """
+
+    def test_old_vs_new_section_check(self, validate_html):
+        """Compare old vs new section checking."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.obs["batch"] = ["A", "B"] * 5
+        html = adata._repr_html_()
+
+        # OLD: String in HTML (could match CSS class name, comment, etc.)
+        assert "batch" in html  # Weak assertion
+
+        # NEW: Entry in correct section
+        v = validate_html(html)
+        v.assert_section_contains_entry("obs", "batch")  # Strong assertion
+
+    def test_old_vs_new_badge_check(self, validate_html):
+        """Compare old vs new badge checking."""
+        adata = AnnData(np.zeros((10, 5)))
+        view = adata[0:5, :]
+        html = view._repr_html_()
+
+        # OLD: String matching (could match "View" in documentation, comments, etc.)
+        assert "View" in html  # Weak assertion
+
+        # NEW: Check for actual badge element
+        v = validate_html(html)
+        v.assert_badge_shown("view")  # Strong assertion - checks CSS class
+
+    def test_old_vs_new_shape_check(self, validate_html):
+        """Compare old vs new shape checking."""
+        adata = AnnData(np.zeros((123, 456)))
+        html = adata._repr_html_()
+
+        # OLD: Numbers anywhere in HTML
+        assert "123" in html  # Could match anything
+        assert "456" in html
+
+        # NEW: Explicit shape check
+        v = validate_html(html)
+        v.assert_shape_displayed(123, 456)
+
+
+class TestJupyterNotebookCompatibility:
+    """Tests for Jupyter Notebook/Lab HTML compatibility.
+
+    These tests ensure the HTML repr works correctly when embedded
+    in Jupyter output cells, including:
+    - CSS scoping (no style leakage)
+    - JavaScript isolation (no global pollution)
+    - Multiple cell compatibility (unique IDs)
+    - Jupyter theming support
+    """
+
+    def test_css_scoped_to_anndata_repr(self, adata_full):
+        """Test CSS rules are scoped to .anndata-repr container.
+
+        Unscoped CSS could affect other notebook cells or UI elements.
+        With native CSS nesting, only top-level selectors need checking
+        since nested selectors inherit scope from their parent.
+        """
+        import re
+
+        html = adata_full._repr_html_()
+        style_match = re.search(r"<style[^>]*>(.*?)</style>", html, re.DOTALL)
+
+        if style_match:
+            css = style_match.group(1)
+
+            # Remove CSS comments
+            css_clean = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
+
+            # Extract only top-level selectors (brace depth 0).
+            # With native CSS nesting, nested selectors inherit scope
+            # from their parent and don't need individual checking.
+            selectors = _get_top_level_selectors(css_clean)
+
+            for selector in selectors:
+                # Skip :root (used for CSS variables)
+                if ":root" in selector:
+                    continue
+                # Skip Jupyter theme selectors (these are intentionally global)
+                if "[data-jp-theme" in selector or ".jp-" in selector:
+                    continue
+                # Skip @media / @keyframes (parsed separately)
+                if selector.startswith("@"):
+                    continue
+
+                # All other selectors should be scoped to anndata-repr
+                assert ".anndata-repr" in selector or "anndata" in selector.lower(), (
+                    f"CSS selector '{selector}' is not scoped to .anndata-repr. "
+                    "This could affect other Jupyter cells."
+                )
+
+    def test_no_global_element_selectors(self, adata_full):
+        """Test no unscoped element selectors like 'div', 'span', etc.
+
+        Global element selectors would style ALL divs/spans in the notebook.
+        With native CSS nesting, only top-level selectors are checked since
+        nested element selectors (e.g., `td` inside `.anndata-repr`) are scoped.
+        """
+        import re
+
+        html = adata_full._repr_html_()
+        style_match = re.search(r"<style[^>]*>(.*?)</style>", html, re.DOTALL)
+
+        if style_match:
+            css = style_match.group(1)
+            css_clean = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
+
+            # Only check top-level selectors (brace depth 0)
+            selectors = _get_top_level_selectors(css_clean)
+
+            bare_elements = {
+                "div",
+                "span",
+                "table",
+                "tr",
+                "td",
+                "th",
+                "ul",
+                "li",
+                "p",
+                "a",
+                "button",
+            }
+            global_elements = [
+                s
+                for s in selectors
+                if s.strip().split(",")[0].strip().split()[0] in bare_elements
+            ]
+            assert not global_elements, (
+                f"Found global element selectors: {global_elements}. "
+                "These would affect the entire notebook."
+            )
+
+    def test_javascript_uses_iife_or_closure(self, adata_full):
+        """Test JavaScript is wrapped to avoid global scope pollution."""
+        import re
+
+        html = adata_full._repr_html_()
+        script_matches = re.findall(
+            r"<script[^>]*>(.*?)</script>", html, re.DOTALL | re.I
+        )
+
+        for script in script_matches:
+            script = script.strip()
+            if not script:
+                continue
+
+            # Check for IIFE pattern: (function() { ... })() or (() => { ... })()
+            has_iife = bool(
+                re.search(r"\(\s*function\s*\([^)]*\)\s*\{", script)
+                or re.search(r"\(\s*\([^)]*\)\s*=>\s*\{", script)
+            )
+
+            # Check for block scope (const/let at top level within braces)
+            has_block_scope = bool(re.search(r"\{\s*(const|let)\s+", script))
+
+            # Check for event handler inline (scoped to element)
+            is_event_handler = bool(re.search(r"\.addEventListener\s*\(", script))
+
+            # Should use one of these isolation patterns
+            assert has_iife or has_block_scope or is_event_handler, (
+                "JavaScript should use IIFE, block scope, or event handlers "
+                "to avoid polluting global scope in Jupyter."
+            )
+
+    def test_no_global_function_declarations(self, adata_full):
+        """Test no unscoped 'function name()' declarations.
+
+        Named function declarations at top level would pollute global scope.
+        """
+        import re
+
+        html = adata_full._repr_html_()
+        script_matches = re.findall(
+            r"<script[^>]*>(.*?)</script>", html, re.DOTALL | re.I
+        )
+
+        for script in script_matches:
+            # Look for "function name(" not inside another function/IIFE
+            # This is a simplified check - looks for function at start of line
+            global_funcs = re.findall(
+                r"^\s*function\s+(\w+)\s*\(", script, re.MULTILINE
+            )
+
+            # Filter out functions that are clearly inside IIFEs
+            # (script starts with "(function" or "(() =>")
+            if script.strip().startswith("("):
+                continue  # Inside IIFE, OK
+
+            assert not global_funcs, (
+                f"Found global function declarations: {global_funcs}. "
+                "Use const name = function() or wrap in IIFE."
+            )
+
+    def test_unique_ids_per_render(self, adata):
+        """Test each render produces unique element IDs.
+
+        Multiple AnnData cells in same notebook must not have ID collisions.
+        """
+        import re
+
+        html1 = adata._repr_html_()
+        html2 = adata._repr_html_()
+
+        ids1 = set(re.findall(r'id=["\']([^"\']+)["\']', html1))
+        ids2 = set(re.findall(r'id=["\']([^"\']+)["\']', html2))
+
+        # If both have IDs, they should be different (use unique prefixes)
+        if ids1 and ids2:
+            # At least the container IDs should be different
+            overlap = ids1 & ids2
+            # Allow empty overlap or ensure IDs use unique suffixes
+            for id_val in overlap:
+                # IDs like "anndata-repr" without unique suffix are problematic
+                assert re.search(r"[a-f0-9]{6,}|_\d+$", id_val), (
+                    f"ID '{id_val}' appears in both renders without unique suffix. "
+                    "This could cause conflicts in Jupyter notebooks with multiple cells."
+                )
+
+    def test_jupyter_dark_mode_support(self, adata, validate_html):
+        """Test dark mode CSS uses Jupyter-compatible selectors."""
+        html = adata._repr_html_()
+
+        # Should use light-dark() with color-scheme for theming
+        assert "light-dark(" in html, "CSS should use light-dark() for theming"
+
+        # Should support at least one Jupyter dark mode detection method
+        has_jp_theme = "[data-jp-theme-light" in html
+        has_jp_dark_class = ".jp-Theme-Dark" in html
+
+        assert has_jp_theme or has_jp_dark_class, (
+            "HTML should support Jupyter dark mode via "
+            "[data-jp-theme-light] or .jp-Theme-Dark"
+        )
+
+    def test_no_document_level_operations(self, adata_full):
+        """Test JavaScript doesn't use document-level operations unsafely.
+
+        Operations like document.querySelector without scoping could
+        affect elements in other cells.
+        """
+        import re
+
+        html = adata_full._repr_html_()
+        script_matches = re.findall(
+            r"<script[^>]*>(.*?)</script>", html, re.DOTALL | re.I
+        )
+
+        for script in script_matches:
+            # Check for document.querySelector without container scoping
+            # OK: container.querySelector, element.querySelector
+            # Risky: document.querySelector(".class") without context
+
+            # This is a heuristic check - look for document.querySelector
+            # that doesn't immediately follow a variable assignment from
+            # a scoped search
+            if "document.querySelector" in script:
+                # Should be immediately scoped, e.g.:
+                # const container = document.getElementById('unique-id')
+                # container.querySelector(...)
+                has_scoped_search = bool(
+                    re.search(r"getElementById\s*\(['\"][\w-]+['\"]", script)
+                )
+                assert has_scoped_search, (
+                    "document.querySelector should be scoped to a container "
+                    "obtained via getElementById with unique ID"
+                )
+
+    def test_html_valid_as_fragment(self, adata_full, validate_html5):
+        """Test HTML is valid when embedded in a typical Jupyter output div.
+
+        Jupyter wraps output in:
...
+ """ + html = adata_full._repr_html_() + + # Wrap in typical Jupyter output structure + jupyter_wrapper = f""" +
+
+
+
+ {html} +
+
+
+
+ """ + + errors = validate_html5(jupyter_wrapper) + # Filter expected fragment-related warnings + critical = [ + e + for e in errors + if not e.startswith("info:") + and "style" not in e.lower() + and "script" not in e.lower() + # vnu's CSS parser doesn't support native CSS nesting + # https://github.com/w3c/css-validator/issues/431 + and "parse error" not in e.lower() + ] + assert not critical, "HTML invalid in Jupyter context:\n" + "\n".join(critical) + + def test_multiple_cells_valid_html(self, validate_html5): + """Test multiple AnnData reprs together produce valid HTML. + + Simulates having multiple AnnData output cells in one notebook view. + """ + # Create different AnnData objects + adata1 = AnnData(np.zeros((10, 5))) + adata1.obs["batch"] = ["A", "B"] * 5 + + adata2 = AnnData(np.zeros((20, 8))) + adata2.uns["key"] = "value" + + adata3 = AnnData(sp.random(100, 50, density=0.1, format="csr")) + + html1 = adata1._repr_html_() + html2 = adata2._repr_html_() + html3 = adata3._repr_html_() + + # Combine as if in multiple notebook cells + combined = f""" +
{html1}
+
{html2}
+
{html3}
+ """ + + errors = validate_html5(combined) + critical = [ + e + for e in errors + if not e.startswith("info:") + and "style" not in e.lower() + and "script" not in e.lower() + # Duplicate styles are OK in fragments + and "duplicate" not in e.lower() + # vnu's CSS parser doesn't support native CSS nesting + # https://github.com/w3c/css-validator/issues/431 + and "parse error" not in e.lower() + ] + assert not critical, "Combined cells invalid:\n" + "\n".join(critical) + + def test_no_id_collisions_multiple_cells(self): + """Test no ID collisions when multiple cells are rendered.""" + import re + + adata1 = AnnData(np.zeros((10, 5))) + adata2 = AnnData(np.zeros((20, 8))) + adata3 = AnnData(np.zeros((15, 6))) + + html1 = adata1._repr_html_() + html2 = adata2._repr_html_() + html3 = adata3._repr_html_() + + ids1 = re.findall(r'id=["\']([^"\']+)["\']', html1) + ids2 = re.findall(r'id=["\']([^"\']+)["\']', html2) + ids3 = re.findall(r'id=["\']([^"\']+)["\']', html3) + + all_ids = ids1 + ids2 + ids3 + if all_ids: + # Check for duplicates + seen = set() + duplicates = [] + for id_val in all_ids: + if id_val in seen: + duplicates.append(id_val) + seen.add(id_val) + + assert not duplicates, ( + f"Duplicate IDs across cells: {duplicates}. " + "Each cell must have unique IDs." + ) + + def test_css_variables_prefixed(self, adata): + """Test CSS variables use anndata prefix to avoid conflicts.""" + import re + + html = adata._repr_html_() + + # Find all CSS variable definitions (not BEM modifiers like --copied) + # CSS vars are defined as: --name: value; (note single colon) + # BEM modifiers are: .block--modifier (in class names) + css_vars = re.findall(r"(? with unique ID prefix + # 3. 
Use CSS variables that cascade properly
+
+        # Check that the main container uses scoped class
+        assert 'class="anndata-repr' in html or "class='anndata-repr" in html, (
+            "Main container should use .anndata-repr class for CSS scoping"
+        )
+
+    def test_works_without_jupyter_css_variables(self, adata, validate_html):
+        """Test repr works even without Jupyter CSS variables defined.
+
+        The repr should have sensible defaults when --jp-* variables
+        are not available (e.g., when viewed outside Jupyter).
+        """
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        # Should still render all key elements
+        v.assert_element_exists(".anndata-repr")
+        v.assert_shape_displayed(100, 50)
+
+        # Should have fallback colors defined (not relying solely on --jp-* vars)
+        # Check that colors are defined in the CSS, not just referenced
+        assert "#" in html or "rgb" in html.lower(), (
+            "Should define fallback colors for non-Jupyter environments"
+        )
diff --git a/tests/repr/test_repr_core.py b/tests/repr/test_repr_core.py
new file mode 100644
index 000000000..63f8b8bd2
--- /dev/null
+++ b/tests/repr/test_repr_core.py
@@ -0,0 +1,1292 @@
+"""
+Core HTML repr tests: validation, basic generation, settings, header/footer.
+"""
+
+from __future__ import annotations
+
+import re
+
+import numpy as np
+import pandas as pd
+import scipy.sparse as sp
+
+from anndata import AnnData
+
+from .conftest import StrictHTMLParser
+
+
+class TestHTMLValidation:
+    """Validate generated HTML is well-formed and standards-compliant."""
+
+    def test_html_well_formed(self, adata):
+        """Test HTML is parseable and well-formed."""
+        html = adata._repr_html_()
+        parser = StrictHTMLParser()
+        parser.feed(html)
+        assert not parser.errors, f"HTML errors: {parser.errors}"
+        assert not parser.tag_stack, f"Unclosed tags: {parser.tag_stack}"
+
+    def test_no_duplicate_ids(self, adata_full):
+        """Test no duplicate element IDs within same repr."""
+        html = adata_full._repr_html_()
+        ids = re.findall(r'id=["\']([^"\']+)["\']', html)
+        assert len(ids) == len(set(ids)), f"Duplicate IDs found: {ids}"
+
+    def test_all_links_have_href(self, adata_full):
+        """Test all anchor tags have href attribute."""
+        html = adata_full._repr_html_()
+        a_without_href = re.search(r"]*href=)[^>]*>", html)
+        assert a_without_href is None, "Found without href"
+
+    def test_style_tag_valid_css(self, adata):
+        """Test inline CSS has balanced braces."""
+        html = adata._repr_html_()
+        style_match = re.search(r"<style[^>]*>(.*?)</style>", html, re.DOTALL)
+        if style_match:
+            css = style_match.group(1)
+            assert css.count("{") == css.count("}"), "Unbalanced CSS braces"
+
+    def test_escaped_user_content(self, adata_with_special_chars):
+        """Test that user-provided content is properly escaped."""
+        html = adata_with_special_chars._repr_html_()
+        # Should not contain raw ", "", stripped, flags=re.DOTALL)
+
+        # Rich HTML is visible (no display:none)
+        assert '
 for folding
+        assert "alert" not in html
+
+    def test_readme_tooltip_truncated(self, validate_html):
+        """Test long README is truncated in tooltip."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.uns["README"] = "x" * 1000
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-readme__icon")
+        v.assert_text_visible("...")
+
+    def test_readme_data_attribute_contains_content(self, validate_html):
+        """Test data-readme attribute contains full content."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.uns["README"] = "Test content"
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-readme__icon")
+        v.assert_attribute_value(".anndata-readme__icon", "data-readme", "Test content")
+
+    def test_readme_icon_accessibility(self, validate_html):
+        """Test readme icon has accessibility attributes."""
+        adata = AnnData(np.zeros((10, 5)))
+        adata.uns["README"] = "Description"
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-readme__icon")
+        v.assert_attribute_value(".anndata-readme__icon", "aria-label", "View README")
+        v.assert_attribute_value(".anndata-readme__icon", "tabindex", "0")
+
+
+class TestGenerateReprHtmlDirectly:
+    """Test generate_repr_html function directly."""
+
+    def test_html_disabled_returns_pre(self):
+        """Test disabled HTML returns pre-formatted text."""
+        from anndata import settings
+        from anndata._repr.html import generate_repr_html
+
+        adata = AnnData(np.zeros((10, 5)))
+        with settings.override(repr_html_enabled=False):
+            html = generate_repr_html(adata)
+            assert "<pre>" in html
+
+    def test_max_depth_reached_at_depth_zero(self):
+        """Test max depth indicator at depth 0."""
+        from anndata._repr.html import generate_repr_html
+
+        adata = AnnData(np.zeros((10, 5)))
+        html = generate_repr_html(adata, depth=0, max_depth=0)
+        assert "max depth" in html.lower()
+
+    def test_nested_anndata_not_expandable_at_max_depth(self):
+        """Test nested AnnData not expandable when at max depth."""
+        from anndata._repr.html import generate_repr_html
+
+        inner = AnnData(np.zeros((5, 3)))
+        outer = AnnData(np.zeros((10, 5)))
+        outer.uns["nested"] = inner
+
+        html = generate_repr_html(outer, max_depth=1)
+        assert "nested" in html
+
+
+class TestPublicAPIExports:
+    """Test that public API items are properly exported."""
+
+    def test_css_js_exports(self):
+        """Test CSS and JS functions are exported."""
+        from anndata._repr import get_css, get_javascript
+
+        css = get_css()
+        assert ".anndata-repr" in css
+
+        js = get_javascript("test-id")
+        assert "test-id" in js
+
+    def test_section_rendering_exports(self):
+        """Test section rendering functions are exported."""
+        from anndata._repr import (
+            render_formatted_entry,
+            render_section,
+        )
+
+        assert callable(render_section)
+        assert callable(render_formatted_entry)
+
+    def test_ui_helper_exports(self):
+        """Test UI helper functions are exported."""
+        from anndata._repr import (
+            render_badge,
+            render_copy_button,
+            render_header_badges,
+            render_search_box,
+            render_warning_icon,
+        )
+
+        assert callable(render_badge)
+        assert callable(render_copy_button)
+        assert callable(render_header_badges)
+        assert callable(render_search_box)
+        assert callable(render_warning_icon)
+
+    def test_utility_exports(self):
+        """Test utility functions are exported."""
+        from anndata._repr import escape_html, format_memory_size, format_number
+
+        assert escape_html("<test>") == "&lt;test&gt;"
+        assert "KB" in format_memory_size(1024)
+        assert format_number(1000) == "1,000"
+
+    def test_registry_exports(self):
+        """Test registry classes are exported."""
+        from anndata._repr import (
+            FormattedEntry,
+            FormattedOutput,
+            FormatterContext,
+            SectionFormatter,
+            TypeFormatter,
+            formatter_registry,
+            register_formatter,
+        )
+
+        assert FormattedOutput is not None
+        assert FormattedEntry is not None
+        assert FormatterContext is not None
+        assert TypeFormatter is not None
+        assert SectionFormatter is not None
+        assert formatter_registry is not None
+        assert callable(register_formatter)
+
+    def test_generate_repr_html_export(self):
+        """Test generate_repr_html is exported."""
+        from anndata._repr import generate_repr_html
+
+        adata = AnnData(np.zeros((10, 5)))
+        html = generate_repr_html(adata)
+        assert "anndata-repr" in html
+
+
+class TestNeverCrash:
+    """Tests ensuring repr never crashes regardless of data content."""
+
+    def test_none_in_obs_column(self, validate_html):
+        """Test repr handles None values in obs columns."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obs["with_none"] = [None, "a", None, "b", None]
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obs", "with_none")
+
+    def test_nan_in_obs_column(self, validate_html):
+        """Test repr handles NaN values in obs columns."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obs["with_nan"] = [np.nan, 1.0, np.nan, 2.0, np.nan]
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obs", "with_nan")
+
+    def test_inf_in_obs_column(self, validate_html):
+        """Test repr handles inf values in obs columns."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obs["with_inf"] = [np.inf, -np.inf, 0, 1, 2]
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obs", "with_inf")
+
+    def test_empty_string_column(self, validate_html):
+        """Test repr handles empty string columns."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obs["empty_strings"] = ["", "", "", "", ""]
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obs", "empty_strings")
+
+    def test_mixed_type_list_in_uns(self, validate_html):
+        """Test repr handles mixed type lists in uns."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["mixed"] = [1, "string", None, 3.14, True]
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("uns", "mixed")
+
+    def test_circular_reference_like_structure(self, validate_html):
+        """Test repr handles deeply nested dicts that resemble self-references."""
+        adata = AnnData(np.zeros((5, 3)))
+        d = {"a": 1}
+        d["self_like"] = {"nested": d.copy()}  # Not circular but deep
+        adata.uns["deep"] = d
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+
+    def test_very_long_string_in_uns(self, validate_html):
+        """Test repr handles very long strings."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["long_string"] = "x" * 10000
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("uns", "long_string")
+
+    def test_special_characters_in_keys(self, validate_html):
+        """Test repr handles special characters in keys."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["keyspecial&chars"] = "value"
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        # Special chars should be escaped in section
+        v.assert_section_exists("uns")
+
+    def test_unicode_in_data(self, validate_html):
+        """Test repr handles unicode in data."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obs["unicode"] = ["日本語", "émoji🧬", "中文", "العربية", "עברית"]
+        adata.uns["unicode_key_日本語"] = "value"
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obs", "unicode")
+
+    def test_empty_categorical(self, validate_html):
+        """Test repr handles empty categorical."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obs["empty_cat"] = pd.Categorical([None] * 5, categories=["a", "b", "c"])
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obs", "empty_cat")
+
+    def test_zero_size_array_in_obsm(self, validate_html):
+        """Test repr handles zero-size arrays in obsm."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.obsm["empty"] = np.zeros((5, 0))
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("obsm", "empty")
+
+    def test_object_with_failing_repr(self, validate_html):
+        """Test repr handles objects whose __repr__ fails."""
+
+        class FailingRepr:
+            def __repr__(self):
+                msg = "Repr failed"
+                raise RuntimeError(msg)
+
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["failing"] = FailingRepr()
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_contains_entry("uns", "failing")
+
+    def test_object_with_failing_sizeof(self, validate_html):
+        """Test repr handles objects whose __sizeof__ fails."""
+
+        class FailingSizeof:
+            def __sizeof__(self):
+                msg = "Sizeof failed"
+                raise RuntimeError(msg)
+
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["failing_size"] = FailingSizeof()
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+
+
+class TestViewAndBackedModes:
+    """Tests for view and backed mode handling."""
+
+    def test_view_shows_badge(self, validate_html):
+        """Test view shows View badge."""
+        adata = AnnData(np.zeros((10, 5)))
+        view = adata[0:5, :]
+        html = view._repr_html_()
+        v = validate_html(html)
+        v.assert_badge_shown("view")
+
+    def test_backed_shows_badge(self, tmp_path, validate_html):
+        """Test backed mode shows badge."""
+        import anndata as ad
+
+        adata = AnnData(np.zeros((10, 5)))
+        path = tmp_path / "test.h5ad"
+        adata.write_h5ad(path)
+
+        backed = ad.read_h5ad(path, backed="r")
+        html = backed._repr_html_()
+        v = validate_html(html)
+        v.assert_badge_shown("backed")
+        backed.file.close()
+
+    def test_view_of_backed_shows_both_badges(self, tmp_path, validate_html):
+        """Test view of backed shows both badges."""
+        import anndata as ad
+
+        adata = AnnData(np.zeros((10, 5)))
+        path = tmp_path / "test.h5ad"
+        adata.write_h5ad(path)
+
+        backed = ad.read_h5ad(path, backed="r")
+        view = backed[0:5, :]
+        html = view._repr_html_()
+        v = validate_html(html)
+        v.assert_badge_shown("view")
+        v.assert_badge_shown("backed")
+        backed.file.close()
+
+
+class TestComprehensiveAnnData:
+    """Test with comprehensive AnnData similar to visual_inspect."""
+
+    def test_comprehensive_anndata_renders_completely(self, validate_html):
+        """Test comprehensive AnnData with all features renders without error."""
+        n_obs, n_vars = 100, 50
+
+        adata = AnnData(
+            sp.random(n_obs, n_vars, density=0.1, format="csr", dtype=np.float32),
+            obs=pd.DataFrame({
+                "cell_type": pd.Categorical(
+                    ["T cell", "B cell", "NK cell", "Monocyte", "DC"] * (n_obs // 5)
+                ),
+                "louvain": pd.Categorical([
+                    f"cluster_{i}" for i in np.random.randint(0, 8, n_obs)
+                ]),
+                "n_counts": np.random.randint(1000, 50000, n_obs),
+                "percent_mito": np.random.uniform(0, 15, n_obs).astype(np.float32),
+                "is_doublet": np.random.choice([True, False], n_obs, p=[0.1, 0.9]),
+            }),
+            var=pd.DataFrame({
+                "gene_symbol": [f"GN{i}" for i in range(n_vars)],
+                "highly_variable": np.random.choice(
+                    [True, False], n_vars, p=[0.2, 0.8]
+                ),
+                "means": np.random.exponential(1, n_vars).astype(np.float32),
+            }),
+        )
+
+        # Colors
+        adata.uns["cell_type_colors"] = [
+            "#FF6B6B",
+            "#4ECDC4",
+            "#45B7D1",
+            "#96CEB4",
+            "#FFEAA7",
+        ]
+        adata.uns["louvain_colors"] = [
+            "#1f77b4",
+            "#ff7f0e",
+            "#2ca02c",
+            "#d62728",
+            "#9467bd",
+            "#8c564b",
+            "#e377c2",
+            "#7f7f7f",
+        ]
+
+        # Uns
+        adata.uns["neighbors"] = {"params": {"n_neighbors": 15}}
+        adata.uns["experiment_id"] = "EXP_001"
+        adata.uns["steps"] = ["filter", "normalize", "pca", "umap"]
+
+        # Nested AnnData
+        inner = AnnData(np.zeros((10, 5)))
+        adata.uns["subset"] = inner
+
+        # Obsm
+        adata.obsm["X_pca"] = np.random.randn(n_obs, 50).astype(np.float32)
+        adata.obsm["X_umap"] = np.random.randn(n_obs, 2).astype(np.float32)
+
+        # Varm
+        adata.varm["PCs"] = np.random.randn(n_vars, 50).astype(np.float32)
+
+        # Layers
+        adata.layers["counts"] = sp.random(n_obs, n_vars, density=0.1, format="csr")
+        adata.layers["normalized"] = np.random.randn(n_obs, n_vars).astype(np.float32)
+
+        # Obsp/Varp
+        adata.obsp["distances"] = sp.random(n_obs, n_obs, density=0.05, format="csr")
+        adata.varp["correlations"] = sp.random(
+            n_vars, n_vars, density=0.1, format="csr"
+        )
+
+        # Raw
+        raw = AnnData(
+            sp.random(n_obs, n_vars + 20, density=0.1, format="csr"),
+            var=pd.DataFrame({"gene": [f"G{i}" for i in range(n_vars + 20)]}),
+        )
+        adata.raw = raw
+
+        # Generate HTML and validate structure
+        html = adata._repr_html_()
+        v = validate_html(html)
+
+        # Validate key sections are present
+        v.assert_element_exists(".anndata-repr")
+        v.assert_section_exists("obs")
+        v.assert_section_exists("var")
+        v.assert_section_exists("uns")
+        v.assert_section_exists("obsm")
+        v.assert_section_exists("varm")
+        v.assert_section_exists("layers")
+        v.assert_section_exists("obsp")
+        v.assert_section_exists("varp")
+        v.assert_section_exists("raw")
+
+        # Validate key entries in correct sections
+        v.assert_section_contains_entry("obs", "cell_type")
+        v.assert_section_contains_entry("obsm", "X_pca")
+        v.assert_section_contains_entry("obsm", "X_umap")
+        v.assert_section_contains_entry("uns", "neighbors")
+        v.assert_section_contains_entry("layers", "counts")
+        v.assert_section_contains_entry("obsp", "distances")
+
+        # Validate shape is displayed
+        v.assert_shape_displayed(100, 50)
+
+        # Colors should be shown
+        v.assert_color_swatch("#FF6B6B")
+
+
+class TestIndexPreview:
+    """Tests for obs_names and var_names preview."""
+
+    def test_obs_names_preview_shown(self, validate_html):
+        """Test obs_names preview is shown."""
+        adata = AnnData(
+            np.zeros((10, 5)),
+            obs=pd.DataFrame(index=[f"cell_{i}" for i in range(10)]),
+        )
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_text_visible("cell_0")
+
+    def test_var_names_preview_shown(self, validate_html):
+        """Test var_names preview is shown."""
+        adata = AnnData(
+            np.zeros((10, 5)),
+            var=pd.DataFrame(index=[f"gene_{i}" for i in range(5)]),
+        )
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_text_visible("gene_0")
+
+    def test_long_index_names_truncated(self, validate_html):
+        """Test that very long index names do not break rendering."""
+        long_name = "very_long_cell_name_" * 10
+        adata = AnnData(
+            np.zeros((3, 2)),
+            obs=pd.DataFrame(index=[long_name, "short", "medium_name"]),
+        )
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_shape_displayed(3, 2)
+
+    def test_numeric_index_shown(self, validate_html):
+        """Test numeric index is shown correctly."""
+        adata = AnnData(np.zeros((10, 5)))
+        # Default index is 0, 1, 2, ...
+        html = adata._repr_html_()
+        v = validate_html(html)
+        v.assert_element_exists(".anndata-repr")
+        v.assert_text_visible("0")
diff --git a/tests/repr/test_repr_formatters.py b/tests/repr/test_repr_formatters.py
new file mode 100644
index 000000000..de7a66347
--- /dev/null
+++ b/tests/repr/test_repr_formatters.py
@@ -0,0 +1,1066 @@
+"""
+Type formatter tests for the _repr module.
+
+Tests for NumpyArrayFormatter, SparseMatrixFormatter, CategoricalFormatter,
+DaskArrayFormatter, AwkwardArrayFormatter, ArrayAPIFormatter,
+and all built-in type formatters.
+"""
+
+from __future__ import annotations
+
+import re
+
+import numpy as np
+import pandas as pd
+import pytest
+import scipy.sparse as sp
+
+from anndata import AnnData
+
+from .conftest import HAS_AWKWARD, HAS_DASK
+
+
+def _make_array_api_mock(module: str, *, shape, dtype, device="cpu"):
+    """Create a mock array satisfying the SupportsArrayApi protocol.
+
+    The mock has all attributes required by ``anndata.types.SupportsArrayApi``
+    (``shape``, ``device``, ``__array_namespace__``, ``to_device``,
+    ``__dlpack__``, ``__dlpack_device__``) so that ``has_xp()`` returns True.
+    """
+    ns_module = type("Namespace", (), {"__name__": module.split(".", maxsplit=1)[0]})()
+
+    cls = type(
+        "MockArrayAPI",
+        (),
+        {
+            "shape": shape,
+            "dtype": dtype,
+            "ndim": len(shape),
+            "device": device,
+            "__array_namespace__": lambda self, **kw: ns_module,
+            "to_device": lambda self, dev, /, **kw: self,
+            "__dlpack__": lambda self, **kw: None,
+            "__dlpack_device__": lambda self: (1, 0),
+        },
+    )
+    cls.__module__ = module
+    return cls()
+
+
+class TestNumpyFormatters:
+    """Tests for NumPy array formatters."""
+
+    def test_numpy_array_formatter(self):
+        """Test numpy array formatting."""
+        from anndata._repr.formatters import NumpyArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyArrayFormatter()
+        arr = np.zeros((100, 50), dtype=np.float32)
+
+        assert formatter.can_format(arr, FormatterContext())
+        result = formatter.format(arr, FormatterContext())
+
+        assert "100" in result.type_name
+        assert "50" in result.type_name
+        assert "float32" in result.type_name
+
+    def test_numpy_array_3d(self):
+        """Test numpy array formatter with 3D+ arrays."""
+        from anndata._repr.formatters import NumpyArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyArrayFormatter()
+        arr = np.zeros((10, 5, 3))
+        result = formatter.format(arr, FormatterContext())
+
+        assert "(10, 5, 3)" in result.type_name
+
+    def test_masked_array_formatter(self):
+        """Test MaskedArray formatter."""
+        from anndata._repr.formatters import NumpyMaskedArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyMaskedArrayFormatter()
+        arr = np.ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 1])
+
+        assert formatter.can_format(arr, FormatterContext())
+        result = formatter.format(arr, FormatterContext())
+
+        assert "MaskedArray" in result.type_name
+        assert "2 masked values" in result.tooltip
+
+    def test_masked_array_no_mask(self):
+        """Test MaskedArray formatter with no masked values."""
+        from anndata._repr.formatters import NumpyMaskedArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyMaskedArrayFormatter()
+        arr = np.ma.array([1, 2, 3, 4, 5])
+
+        result = formatter.format(arr, FormatterContext())
+        assert result.tooltip == ""  # No masked values means empty tooltip
+
+
+class TestSparseFormatters:
+    """Tests for sparse matrix formatters."""
+
+    def test_sparse_matrix_formatter(self):
+        """Test sparse matrix formatting."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.random(1000, 500, density=0.1, format="csr")
+
+        assert formatter.can_format(mat, FormatterContext())
+        result = formatter.format(mat, FormatterContext())
+
+        assert "csr" in result.type_name.lower()
+        assert "sparse" in result.type_name.lower()
+        assert "stored" in result.type_name.lower()
+
+    def test_sparse_csc_formatter(self):
+        """Test sparse formatter with CSC matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.csc_matrix([[1, 0], [0, 2]])
+
+        result = formatter.format(mat, FormatterContext())
+        assert "csc" in result.type_name.lower()
+
+    def test_sparse_coo_formatter(self):
+        """Test sparse formatter with COO matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.coo_matrix([[1, 0], [0, 2]])
+
+        result = formatter.format(mat, FormatterContext())
+        assert "coo" in result.type_name.lower()
+
+    def test_sparse_lil_formatter(self):
+        """Test sparse formatter with LIL matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.lil_matrix((10, 10))
+        mat[0, 0] = 1
+        mat[5, 5] = 2
+
+        assert formatter.can_format(mat, FormatterContext())
+        result = formatter.format(mat, FormatterContext())
+        assert "lil" in result.type_name.lower()
+
+    def test_sparse_dok_formatter(self):
+        """Test sparse formatter with DOK matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.dok_matrix((10, 10))
+        mat[0, 0] = 1
+        mat[5, 5] = 2
+
+        assert formatter.can_format(mat, FormatterContext())
+        result = formatter.format(mat, FormatterContext())
+        assert "dok" in result.type_name.lower()
+
+    def test_sparse_dia_formatter(self):
+        """Test sparse formatter with DIA matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
+        offsets = np.array([0, 1])
+        mat = sp.dia_matrix((data, offsets), shape=(4, 4))
+
+        assert formatter.can_format(mat, FormatterContext())
+        result = formatter.format(mat, FormatterContext())
+        assert "dia" in result.type_name.lower()
+
+    def test_sparse_bsr_formatter(self):
+        """Test sparse formatter with BSR matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.bsr_matrix(
+            np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [4, 0, 0, 0]])
+        )
+
+        assert formatter.can_format(mat, FormatterContext())
+        result = formatter.format(mat, FormatterContext())
+        assert "bsr" in result.type_name.lower()
+
+    def test_sparse_zero_elements(self):
+        """Test sparse formatter with zero-element matrix."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SparseMatrixFormatter()
+        mat = sp.csr_matrix((0, 0))
+
+        result = formatter.format(mat, FormatterContext())
+        assert (
+            "sparse" not in result.type_name.lower() or "stored" not in result.type_name
+        )
+
+    def test_sparse_formatter_duck_typing_fallback(self):
+        """Test sparse formatter uses duck typing when scipy checks fail."""
+        from anndata._repr.formatters import SparseMatrixFormatter
+        from anndata._repr.registry import FormatterContext
+
+        class MockSparseArray:
+            def __init__(self):
+                self.nnz = 10
+                self.shape = (5, 5)
+                self.dtype = np.float64
+
+            def tocsr(self):
+                pass
+
+        MockSparseArray.__module__ = "scipy.sparse._csr"
+
+        formatter = SparseMatrixFormatter()
+        mock_sparse = MockSparseArray()
+
+        assert formatter.can_format(mock_sparse, FormatterContext())
+        result = formatter.format(mock_sparse, FormatterContext())
+        assert "MockSparseArray" in result.type_name
+
+
+class TestPandasFormatters:
+    """Tests for pandas formatters."""
+
+    def test_categorical_formatter(self):
+        """Test categorical formatting."""
+        from anndata._repr.formatters import CategoricalFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = CategoricalFormatter()
+        cat_series = pd.Series(pd.Categorical(["A", "B", "C"] * 10))
+
+        assert formatter.can_format(cat_series, FormatterContext())
+        result = formatter.format(cat_series, FormatterContext())
+
+        assert "category" in result.type_name.lower()
+        assert "(3)" in result.type_name
+
+    def test_categorical_direct_object(self):
+        """Test CategoricalFormatter with direct pd.Categorical object."""
+        from anndata._repr.formatters import CategoricalFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = CategoricalFormatter()
+        cat = pd.Categorical(["A", "B", "A", "C"])
+
+        assert formatter.can_format(cat, FormatterContext())
+        result = formatter.format(cat, FormatterContext())
+        assert "category" in result.type_name.lower()
+        assert "(3)" in result.type_name
+
+    def test_series_formatter_simple(self):
+        """Test SeriesFormatter with simple numeric series."""
+        from anndata._repr.formatters import SeriesFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = SeriesFormatter()
+        series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
+
+        assert formatter.can_format(series, FormatterContext())
+        result = formatter.format(series, FormatterContext())
+        assert "float64" in result.type_name
+
+    def test_dataframe_formatter(self):
+        """Test DataFrameFormatter."""
+        from anndata._repr.formatters import DataFrameFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = DataFrameFormatter()
+        df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+
+        assert formatter.can_format(df, FormatterContext())
+        result = formatter.format(df, FormatterContext())
+        assert "3 × 2" in result.type_name
+
+        result_obsm = formatter.format(df, FormatterContext(section="obsm"))
+        assert result_obsm.preview_html is not None
+        assert "a" in result_obsm.preview_html
+        assert "b" in result_obsm.preview_html
+
+    def test_dataframe_formatter_expandable(self):
+        """Test DataFrameFormatter with expandable to_html enabled."""
+        import anndata
+        from anndata._repr.formatters import DataFrameFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = DataFrameFormatter()
+        ctx = FormatterContext()
+        df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+
+        result = formatter.format(df, ctx)
+        assert result.expanded_html is None
+
+        original = anndata.settings.repr_html_dataframe_expand
+        try:
+            anndata.settings.repr_html_dataframe_expand = True
+            result_expanded = formatter.format(df, ctx)
+            assert result_expanded.expanded_html is not None
+            assert " 0
+        assert (
+            "FutureArray" in html
+            or "future" in html.lower()
+            or "object" in html.lower()
+        )
+
+    def test_css_array_api_styling_exists(self):
+        """Test that CSS styling for Array-API arrays is present."""
+        from anndata._repr.css import get_css
+
+        css = get_css()
+
+        assert "anndata-dtype--array-api" in css
+        pattern = r"\.anndata-dtype--array-api\s*\{[^}]*color:"
+        assert re.search(pattern, css)
+
+
+class TestCustomHtmlContent:
+    """Tests for custom HTML content in Type Formatters."""
+
+    def test_inline_html_content(self):
+        """Test inline (non-expandable) custom HTML content in preview column."""
+        from anndata._repr.registry import (
+            FormattedOutput,
+            TypeFormatter,
+            formatter_registry,
+        )
+
+        # Custom array class that supports inline HTML preview
+        class CustomArray(np.ndarray):
+            _test_inline_html = True
+
+        class InlineHtmlFormatter(TypeFormatter):
+            priority = 2000  # High priority to be checked first
+
+            def can_format(self, obj, context):
+                return isinstance(obj, np.ndarray) and getattr(
+                    obj, "_test_inline_html", False
+                )
+
+            def format(self, obj, context):
+                return FormattedOutput(
+                    type_name="CustomInline",
+                    css_class="anndata-dtype--custom",
+                    preview_html='<span class="test-inline">Inline Preview</span>',
+                )
+
+        formatter = InlineHtmlFormatter()
+        formatter_registry.register_type_formatter(formatter)
+
+        try:
+            adata = AnnData(np.zeros((5, 3)))
+            # Create custom array with proper shape
+            custom = np.zeros((5, 2)).view(CustomArray)
+            adata.obsm["custom_data"] = custom
+
+            html = adata._repr_html_()
+
+            assert "CustomInline" in html
+            assert "Inline Preview" in html
+            assert "test-inline" in html
+        finally:
+            formatter_registry.unregister_type_formatter(formatter)
+
+    def test_expandable_html_content(self):
+        """Test expandable custom HTML content (e.g., for TreeData visualization)."""
+        from anndata._repr.registry import (
+            FormattedOutput,
+            TypeFormatter,
+            formatter_registry,
+        )
+
+        # Custom array class that supports expandable HTML (like TreeData)
+        class TreeArray(np.ndarray):
+            _test_expandable_html = True
+
+        class ExpandableHtmlFormatter(TypeFormatter):
+            priority = 2000
+
+            def can_format(self, obj, context):
+                return isinstance(obj, np.ndarray) and getattr(
+                    obj, "_test_expandable_html", False
+                )
+
+            def format(self, obj, context):
+                tree_html = """
+                <div class="test-tree">
+                    <ul>
+                        <li>Root
+                            <ul>
+                                <li>Child 1</li>
+                                <li>Child 2</li>
+                            </ul>
+                        </li>
+                    </ul>
+                </div>
+                """
+                return FormattedOutput(
+                    type_name="TreeData (3 nodes)",
+                    css_class="anndata-dtype--tree",
+                    expanded_html=tree_html,
+                )
+
+        formatter = ExpandableHtmlFormatter()
+        formatter_registry.register_type_formatter(formatter)
+
+        try:
+            adata = AnnData(np.zeros((5, 3)))
+            # Create tree array with proper shape
+            tree = np.zeros((5, 4)).view(TreeArray)
+            adata.obsm["tree"] = tree
+
+            html = adata._repr_html_()
+
+            # Should have the type name
+            assert "TreeData (3 nodes)" in html
+            # Should have expand button
+            assert "expand" in html.lower() or "Expand" in html
+            # Should have the tree content somewhere
+            assert "test-tree" in html
+        finally:
+            formatter_registry.unregister_type_formatter(formatter)
+
+    def test_expand_button_for_expandable_content(self):
+        """Test expand button appears for expandable content."""
+        from anndata import settings
+
+        adata = AnnData(np.zeros((10, 5)))
+        # DataFrames with expand setting enabled show expand button
+        adata.uns["df"] = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+
+        with settings.override(repr_html_dataframe_expand=True):
+            html = adata._repr_html_()
+            # Should have expand functionality
+            assert "expand" in html.lower() or "Expand" in html
+
+
+class TestReadmeIcon:
+    """Tests for README icon functionality."""
+
+    def test_readme_icon_shown_with_content(self):
+        """Test README icon appears when README is set."""
+        adata = AnnData(np.zeros((5, 3)))
+        adata.uns["README"] = "This is a test dataset with important documentation."
+        html = adata._repr_html_()
+        # Should show the README content or icon
+        assert "README" in html or "readme" in html.lower()
+
+    def test_readme_content_accessible(self):
+        """Test README content is accessible."""
+        adata = AnnData(np.zeros((5, 3)))
+        readme_text = "Dataset contains single-cell RNA-seq data."
+        adata.uns["README"] = readme_text
+        html = adata._repr_html_()
+        # The readme text should be somewhere in the HTML
+        assert readme_text in html or "README" in html
+
+
+class TestObsmVarmPreviewConsistency:
+    """Tests for consistent column preview in obsm/varm sections across array formatters.
+
+    Array formatters should show "(N columns)" preview for 2D arrays in obsm/varm.
+    This test class ensures all array-like formatters implement this consistently.
+    """
+
+    def test_numpy_array_obsm_preview(self):
+        """Test NumpyArrayFormatter shows column count in obsm."""
+        from anndata._repr.formatters import NumpyArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyArrayFormatter()
+        arr = np.zeros((100, 25), dtype=np.float32)
+
+        # No preview outside obsm/varm
+        result = formatter.format(arr, FormatterContext(section="uns"))
+        assert result.preview is None
+
+        # Shows columns in obsm
+        result = formatter.format(arr, FormatterContext(section="obsm"))
+        assert result.preview == "(25 columns)"
+
+        # Shows columns in varm
+        result = formatter.format(arr, FormatterContext(section="varm"))
+        assert result.preview == "(25 columns)"
+
+    def test_numpy_masked_array_obsm_preview(self):
+        """Test NumpyMaskedArrayFormatter shows column count in obsm."""
+        from anndata._repr.formatters import NumpyMaskedArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyMaskedArrayFormatter()
+        arr = np.ma.array(np.zeros((100, 25)), mask=False)
+
+        # No preview outside obsm/varm
+        result = formatter.format(arr, FormatterContext(section="uns"))
+        assert result.preview is None
+
+        # Shows columns in obsm
+        result = formatter.format(arr, FormatterContext(section="obsm"))
+        assert result.preview == "(25 columns)"
+
+        # Shows columns in varm
+        result = formatter.format(arr, FormatterContext(section="varm"))
+        assert result.preview == "(25 columns)"
+
+    def test_array_api_cupy_obsm_preview(self):
+        """Test ArrayAPIFormatter shows column count in obsm for CuPy-like arrays."""
+        from anndata._repr.formatters import ArrayAPIFormatter
+        from anndata._repr.registry import FormatterContext
+
+        arr = _make_array_api_mock(
+            "cupy._core.core", shape=(100, 30), dtype=np.float32, device="gpu:0"
+        )
+
+        formatter = ArrayAPIFormatter()
+
+        # No preview outside obsm/varm
+        result = formatter.format(arr, FormatterContext(section="uns"))
+        assert result.preview is None
+
+        # Shows columns in obsm
+        result = formatter.format(arr, FormatterContext(section="obsm"))
+        assert result.preview == "(30 columns)"
+
+        # Shows columns in varm
+        result = formatter.format(arr, FormatterContext(section="varm"))
+        assert result.preview == "(30 columns)"
+
+    @pytest.mark.skipif(not HAS_DASK, reason="dask not installed")
+    def test_dask_array_obsm_preview(self):
+        """Test DaskArrayFormatter shows column count in obsm."""
+        import dask.array as da
+
+        from anndata._repr.formatters import DaskArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = DaskArrayFormatter()
+        arr = da.zeros((100, 20), chunks=(50, 10))
+
+        # No preview outside obsm/varm
+        result = formatter.format(arr, FormatterContext(section="uns"))
+        assert result.preview is None
+
+        # Shows columns in obsm
+        result = formatter.format(arr, FormatterContext(section="obsm"))
+        assert result.preview == "(20 columns)"
+
+        # Shows columns in varm
+        result = formatter.format(arr, FormatterContext(section="varm"))
+        assert result.preview == "(20 columns)"
+
+    def test_array_api_formatter_obsm_preview(self):
+        """Test ArrayAPIFormatter shows column count in obsm."""
+        from anndata._repr.formatters import ArrayAPIFormatter
+        from anndata._repr.registry import FormatterContext
+
+        arr = _make_array_api_mock(
+            "jax.numpy", shape=(100, 15), dtype=np.float32, device="cpu"
+        )
+
+        formatter = ArrayAPIFormatter()
+
+        # No preview outside obsm/varm
+        result = formatter.format(arr, FormatterContext(section="uns"))
+        assert result.preview is None
+
+        # Shows columns in obsm
+        result = formatter.format(arr, FormatterContext(section="obsm"))
+        assert result.preview == "(15 columns)"
+
+        # Shows columns in varm
+        result = formatter.format(arr, FormatterContext(section="varm"))
+        assert result.preview == "(15 columns)"
+
+    def test_dataframe_obsm_preview_html(self):
+        """Test DataFrameFormatter shows column list in obsm preview_html."""
+        from anndata._repr.formatters import DataFrameFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = DataFrameFormatter()
+        df = pd.DataFrame({"col_a": [1, 2, 3], "col_b": [4, 5, 6], "col_c": [7, 8, 9]})
+
+        # No preview_html outside obsm/varm
+        result = formatter.format(df, FormatterContext(section="uns"))
+        assert result.preview_html is None
+
+        # Shows column names in obsm
+        result = formatter.format(df, FormatterContext(section="obsm"))
+        assert result.preview_html is not None
+        assert "col_a" in result.preview_html
+        assert "col_b" in result.preview_html
+        assert "col_c" in result.preview_html
+
+        # Shows column names in varm
+        result = formatter.format(df, FormatterContext(section="varm"))
+        assert result.preview_html is not None
+        assert "col_a" in result.preview_html
+
+    def test_1d_arrays_no_preview(self):
+        """Test that 1D arrays don't show column preview in obsm/varm."""
+        from anndata._repr.formatters import (
+            NumpyArrayFormatter,
+            NumpyMaskedArrayFormatter,
+        )
+        from anndata._repr.registry import FormatterContext
+
+        # 1D numpy array
+        np_formatter = NumpyArrayFormatter()
+        arr_1d = np.zeros((100,), dtype=np.float32)
+        result = np_formatter.format(arr_1d, FormatterContext(section="obsm"))
+        assert result.preview is None
+
+        # 1D masked array
+        ma_formatter = NumpyMaskedArrayFormatter()
+        ma_1d = np.ma.array(np.zeros((100,)))
+        result = ma_formatter.format(ma_1d, FormatterContext(section="obsm"))
+        assert result.preview is None
+
+    def test_large_column_count_formatting(self):
+        """Test that large column counts are formatted with thousands separators."""
+        from anndata._repr.formatters import NumpyArrayFormatter
+        from anndata._repr.registry import FormatterContext
+
+        formatter = NumpyArrayFormatter()
+        arr = np.zeros((100, 12345), dtype=np.float32)
+
+        result = formatter.format(arr, FormatterContext(section="obsm"))
+        # format_number adds thousands separators
+        assert "12,345" in result.preview or "12345" in result.preview
diff --git a/tests/repr/test_repr_lazy.py b/tests/repr/test_repr_lazy.py
new file mode 100644
index 000000000..786159e98
--- /dev/null
+++ b/tests/repr/test_repr_lazy.py
@@ -0,0 +1,826 @@
+"""
+Lazy loading tests for the _repr module.
+
+Tests for module lazy loading, lazy categorical handling,
+and lazy AnnData representation.
+"""
+
+from __future__ import annotations
+
+import numpy as np
+import pandas as pd
+import pytest
+import scipy.sparse as sp
+
+import anndata as ad
+from anndata import AnnData
+
+from .conftest import HAS_XARRAY
+
+
+class TestModuleLazyLoading:
+    """Tests to ensure _repr module doesn't load on import anndata."""
+
+    def test_repr_module_not_loaded_on_import(self):
+        """Verify that importing anndata doesn't load the full _repr module."""
+        import subprocess
+        import sys
+
+        code = """
+import sys
+import anndata
+repr_modules = [m for m in sys.modules if 'anndata._repr' in m and m != 'anndata._repr']
+repr_modules = [m for m in repr_modules if '_repr_constants' not in m]
+if repr_modules:
+    print(f"FAIL: These _repr modules were loaded on import: {repr_modules}")
+    sys.exit(1)
+else:
+    print("OK: _repr module not loaded on import")
+    sys.exit(0)
+"""
+        result = subprocess.run(
+            [sys.executable, "-c", code],
+            check=False,
+            capture_output=True,
+            text=True,
+        )
+        assert result.returncode == 0, (
+            f"Lazy loading failed: {result.stdout}\n{result.stderr}"
+        )
+
+    def test_repr_module_loads_on_repr_html_call(self):
+        """Verify that _repr module loads when _repr_html_() is called."""
+        import subprocess
+        import sys
+
+        code = """
+import sys
+import anndata as ad
+import numpy as np
+
+adata = ad.AnnData(np.eye(3))
+_ = adata._repr_html_()
+
+repr_modules = [m for m in sys.modules if 'anndata._repr.' in m]
+if not repr_modules:
+    print("FAIL: _repr modules not loaded after _repr_html_()")
+    sys.exit(1)
+else:
+    print(f"OK: _repr modules loaded: {len(repr_modules)} submodules")
+    sys.exit(0)
+"""
+        result = subprocess.run(
+            [sys.executable, "-c", code],
+            check=False,
+            capture_output=True,
+            text=True,
+        )
+        assert result.returncode == 0, (
+            f"Module loading failed: {result.stdout}\n{result.stderr}"
+        )
+
+
+class TestLazyCategoryLoading:
+    """Tests for lazy category loading in HTML repr."""
+
+    def test_get_lazy_category_count(self):
+        """Test get_lazy_category_count returns None for non-lazy columns."""
+        from anndata._repr.lazy import get_lazy_category_count
+
+        series = pd.Series(pd.Categorical(["a", "b", "c"]))
+        result = get_lazy_category_count(series)
+        assert result is None
+
+        class MockCol:
+            pass
+
+        assert get_lazy_category_count(MockCol()) is None
+
+        non_cat = pd.Series([1, 2, 3])
+        assert get_lazy_category_count(non_cat) is None
+
+    def test_get_lazy_categories_max_zero_skips(self):
+        """Test that max_lazy_categories=0 skips loading entirely."""
+        from anndata._repr.lazy import get_lazy_categories
+        from anndata._repr.registry import FormatterContext
+
+        context = FormatterContext(max_lazy_categories=0)
+
+        class MockCol:
+            pass
+
+        categories, skipped, n_cats = get_lazy_categories(MockCol(), context)
+        assert categories == []
+        assert skipped is True
+        assert n_cats is None
+
+    def test_get_categories_for_display_non_lazy(self):
+        """Test get_categories_for_display with regular (non-lazy) categorical."""
+        from anndata._repr.registry import FormatterContext
+        from anndata._repr.utils import get_categories_for_display
+
+        context = FormatterContext()
+        series = pd.Series(pd.Categorical(["a", "b", "c", "a"]))
+
+        categories, skipped, n_cats = get_categories_for_display(
+            series, context, is_lazy=False
+        )
+        assert set(categories) == {"a", "b", "c"}
+        assert skipped is False
+        assert n_cats == 3
+
+    def test_default_max_lazy_categories_export(self):
+        """Test that DEFAULT_MAX_LAZY_CATEGORIES is properly exported."""
+        from anndata._repr import DEFAULT_MAX_LAZY_CATEGORIES
+
+        assert isinstance(DEFAULT_MAX_LAZY_CATEGORIES, int)
+        assert DEFAULT_MAX_LAZY_CATEGORIES > 0
+
+    def test_formatter_context_has_max_lazy_categories(self):
+        """Test that FormatterContext has max_lazy_categories attribute."""
+        from anndata._repr import DEFAULT_MAX_LAZY_CATEGORIES
+        from anndata._repr.registry import FormatterContext
+
+        context = FormatterContext()
+        assert hasattr(context, "max_lazy_categories")
+        assert context.max_lazy_categories == DEFAULT_MAX_LAZY_CATEGORIES
+
+    def test_formatter_context_propagates_max_lazy_categories(self):
+        """Test that FormatterContext.child() propagates max_lazy_categories."""
+        from anndata._repr.registry import FormatterContext
+
+        context = FormatterContext(max_lazy_categories=50)
+        child = context.child("test_key")
+        assert child.max_lazy_categories == 50
+
+    @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed")
+    def test_get_lazy_category_count_does_not_load_data(self, tmp_path):
+        """Test that get_lazy_category_count reads from storage metadata only."""
+        from anndata._repr.lazy import get_lazy_category_count
+
+        adata = AnnData(
+            sp.random(100, 50, density=0.1, format="csr", dtype=np.float32),
+            obs=pd.DataFrame({"cat_col": pd.Categorical(["a", "b", "c"] * 33 + ["a"])}),
+        )
+        path = tmp_path / "test.zarr"
+        adata.write_zarr(path)
+
+        lazy = ad.experimental.read_lazy(path)
+        col = lazy.obs._ds["cat_col"]
+
+        cat_arr = col.variable._data.array
+        assert "categories" not in cat_arr.__dict__
+
+        n_cats = get_lazy_category_count(col)
+        assert n_cats == 3
+
+        assert "categories" not in cat_arr.__dict__
+
+    @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed")
+    def test_get_lazy_categories_does_not_load_data(self, tmp_path):
+        """Test that get_lazy_categories reads from storage directly."""
+        from anndata._repr.lazy import get_lazy_categories
+        from anndata._repr.registry import FormatterContext
+
+        adata = AnnData(
+            sp.random(100, 50, density=0.1, format="csr", dtype=np.float32),
+            obs=pd.DataFrame({"cat_col": pd.Categorical(["x", "y", "z"] * 33 + ["x"])}),
+        )
+        path = tmp_path / "test.zarr"
+        adata.write_zarr(path)
+
+        lazy = ad.experimental.read_lazy(path)
+        col = lazy.obs._ds["cat_col"]
+
+        cat_arr = col.variable._data.array
+        assert "categories" not in cat_arr.__dict__
+
+        context = FormatterContext(max_lazy_categories=100)
+        categories, skipped, n_cats = get_lazy_categories(col, context)
+
+        assert set(categories) == {"x", "y", "z"}
+        assert not skipped
+        assert n_cats == 3
+
+        assert "categories" not in cat_arr.__dict__
+
+    @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed")
+    def test_get_lazy_categories_skipping_does_not_load_categories(self, tmp_path):
+        """Test that when skipping (too many cats), we don't load category values."""
+        from anndata._repr.lazy import get_lazy_categories
+        from anndata._repr.registry import FormatterContext
+
+        large_cats = [f"cat_{i}" for i in range(150)]
+        adata = AnnData(
+            sp.random(150, 50, density=0.1, format="csr", dtype=np.float32),
+            obs=pd.DataFrame({"big_cat": pd.Categorical(large_cats)}),
+        )
+        path = tmp_path / "test.zarr"
+        adata.write_zarr(path)
+
+        lazy = ad.experimental.read_lazy(path)
+        col = lazy.obs._ds["big_cat"]
+
+        cat_arr = col.variable._data.array
+        assert "categories" not in cat_arr.__dict__
+
+        context = FormatterContext(max_lazy_categories=100)
+        categories, truncated, n_cats = get_lazy_categories(col, context)
+
+        assert len(categories) == 100
+        assert categories[0] == "cat_0"
+        assert truncated is True
+        assert n_cats == 150
+
+        assert "categories" not in cat_arr.__dict__
+
+    @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed")
+ def test_get_lazy_categories_h5ad(self, tmp_path): + """Test get_lazy_categories works with H5AD files.""" + import h5py + + from anndata._repr.lazy import get_lazy_categories, get_lazy_category_count + from anndata._repr.registry import FormatterContext + + adata = AnnData(np.random.randn(100, 50).astype(np.float32)) + adata.obs["cat_col"] = pd.Categorical(["x", "y", "z"] * 33 + ["x"]) + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy = ad.experimental.read_lazy(f) + col = lazy.obs._ds["cat_col"] + + n_cats = get_lazy_category_count(col) + assert n_cats == 3 + + context = FormatterContext(max_lazy_categories=100) + categories, skipped, n = get_lazy_categories(col, context) + + assert set(categories) == {"x", "y", "z"} + assert not skipped + assert n == 3 + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_repr_html_does_not_load_lazy_categorical_data(self, tmp_path): + """Test that generating HTML repr doesn't trigger loading of lazy categorical data.""" + from anndata._repr import DEFAULT_MAX_LAZY_CATEGORIES + + adata = AnnData( + sp.random(100, 50, density=0.1, format="csr", dtype=np.float32), + obs=pd.DataFrame({ + "small_cat": pd.Categorical(["A", "B", "C"] * 33 + ["A"]), + "large_cat": pd.Categorical( + [f"cat_{i}" for i in range(100)], + categories=[ + f"cat_{i}" for i in range(DEFAULT_MAX_LAZY_CATEGORIES + 20) + ], + ), + }), + ) + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy = ad.experimental.read_lazy(path) + + small_cat_arr = lazy.obs._ds["small_cat"].variable._data.array + large_cat_arr = lazy.obs._ds["large_cat"].variable._data.array + + assert "categories" not in small_cat_arr.__dict__ + assert "categories" not in large_cat_arr.__dict__ + + html = lazy._repr_html_() + + assert "small_cat" in html + assert "large_cat" in html + assert "category" in html + + assert "categories" not in small_cat_arr.__dict__ + assert "categories" not in 
large_cat_arr.__dict__ + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_lazy_categorical_repr_integration(self, tmp_path): + """Integration test: verify lazy categoricals display correctly in repr.""" + import h5py + + from anndata._repr import DEFAULT_MAX_LAZY_CATEGORIES + from anndata.experimental import read_lazy + + n_large = DEFAULT_MAX_LAZY_CATEGORIES + 20 + + adata = AnnData(np.random.randn(100, 50).astype(np.float32)) + adata.obs["small_cat"] = pd.Categorical(["A", "B", "C"] * 33 + ["A"]) + adata.obs["large_cat"] = pd.Categorical( + [f"cat_{i}" for i in range(100)], + categories=[f"cat_{i}" for i in range(n_large)], + ) + adata.uns["small_cat_colors"] = ["#ff0000", "#00ff00", "#0000ff"] + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + html = lazy_adata._repr_html_() + + assert "small_cat" in html + assert "A" in html + assert "#ff0000" in html or "ff0000" in html + + assert "large_cat" in html + assert "cat_0" in html + assert "...+20" in html + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_metadata_only_mode_no_disk_loading(self, tmp_path): + """Test that max_lazy_categories=0 shows counts without loading category labels.""" + import h5py + + from anndata._repr.html import generate_repr_html + from anndata.experimental import read_lazy + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + adata.obs["cat1"] = pd.Categorical(["A", "B", "C"] * 16 + ["A", "A"]) + adata.obs["cat2"] = pd.Categorical(["X", "Y"] * 25) + adata.uns["cat1_colors"] = ["#ff0000", "#00ff00", "#0000ff"] + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + + html = generate_repr_html(lazy_adata, max_lazy_categories=0) + + assert "(3 categories)" in html + assert "(2 categories)" in html + + assert "#ff0000" not in html + assert "#00ff00" not in html + + 
@pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_metadata_only_vs_default_mode(self, tmp_path): + """Compare metadata-only mode vs default mode output.""" + import h5py + + from anndata._repr.html import generate_repr_html + from anndata.experimental import read_lazy + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + adata.obs["small_cat"] = pd.Categorical(["A", "B"] * 25) + adata.uns["small_cat_colors"] = ["#ff0000", "#00ff00"] + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + + html_default = generate_repr_html(lazy_adata) + assert "#ff0000" in html_default + + html_metadata = generate_repr_html(lazy_adata, max_lazy_categories=0) + assert "#ff0000" not in html_metadata + assert "(2 categories)" in html_metadata + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_lazy_non_categorical_columns_repr(self, tmp_path): + """Test that non-categorical lazy columns display correctly in repr. + + This exercises the LazyColumnFormatter which handles int, float, and + string columns in lazy AnnData (as opposed to CategoricalFormatter + which handles categorical columns). 
+ """ + import h5py + + from anndata._repr_constants import ( + CSS_DTYPE_BOOL, + CSS_DTYPE_CATEGORY, + CSS_DTYPE_FLOAT, + CSS_DTYPE_INT, + CSS_DTYPE_STRING, + ) + from anndata.experimental import read_lazy + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + # Add various non-categorical column types + adata.obs["int_col"] = np.arange(50) + adata.obs["float_col"] = np.random.randn(50) + adata.obs["bool_col"] = np.array([True, False] * 25) + adata.obs["str_col"] = [f"cell_{i}" for i in range(50)] + # Also add a categorical for comparison + adata.obs["cat_col"] = pd.Categorical(["A", "B"] * 25) + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + html = lazy_adata._repr_html_() + + # All column names should appear + assert "int_col" in html + assert "float_col" in html + assert "bool_col" in html + assert "str_col" in html + assert "cat_col" in html + + # Non-categorical columns should show "(lazy)" indicator + assert "(lazy)" in html + + # Categorical column should show category info, not "(lazy)" + assert "category" in html + + # Check that dtype info is shown for lazy columns + assert "int64" in html or "int32" in html + assert "float64" in html + assert "bool" in html + + # Verify correct CSS classes are applied for each dtype + # This catches bugs like case-sensitive dtype matching + assert CSS_DTYPE_INT in html, "int column should have int CSS class" + assert CSS_DTYPE_FLOAT in html, "float column should have float CSS class" + assert CSS_DTYPE_BOOL in html, "bool column should have bool CSS class" + assert CSS_DTYPE_STRING in html, ( + "string column should have string CSS class" + ) + assert CSS_DTYPE_CATEGORY in html, ( + "categorical column should have category CSS class" + ) + + +class TestIsLazyColumn: + """Tests for is_lazy_column detection.""" + + def test_is_lazy_column_regular_series(self): + """Test that regular pandas Series is not detected as lazy.""" + from 
anndata._repr.lazy import is_lazy_column + + series = pd.Series([1, 2, 3]) + assert is_lazy_column(series) is False + + def test_is_lazy_column_categorical_series(self): + """Test that categorical pandas Series is not detected as lazy.""" + from anndata._repr.lazy import is_lazy_column + + series = pd.Series(pd.Categorical(["a", "b", "c"])) + assert is_lazy_column(series) is False + + def test_is_lazy_column_numpy_array(self): + """Test that numpy array is not detected as lazy.""" + from anndata._repr.lazy import is_lazy_column + + arr = np.array([1, 2, 3]) + assert is_lazy_column(arr) is False + + def test_is_lazy_column_mock_xarray_like(self): + """Test that object with xarray-like attributes is detected as lazy.""" + from anndata._repr.lazy import is_lazy_column + + class MockXarrayColumn: + variable = "something" + dims = ("x",) + + assert is_lazy_column(MockXarrayColumn()) is True + + def test_is_lazy_column_mock_variable_backed(self): + """Test that object with _variable attribute is detected as lazy.""" + from anndata._repr.lazy import is_lazy_column + + class MockVariableBacked: + _variable = "something" + + assert is_lazy_column(MockVariableBacked()) is True + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_is_lazy_column_real_xarray(self, tmp_path): + """Test is_lazy_column with real xarray DataArray from lazy AnnData.""" + from anndata._repr.lazy import is_lazy_column + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B"] * 25) + + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy = ad.experimental.read_lazy(path) + col = lazy.obs._ds["cat"] + + assert is_lazy_column(col) is True + + +class TestGetLazyCategoricalInfo: + """Tests for get_lazy_categorical_info function.""" + + def test_get_lazy_categorical_info_non_lazy(self): + """Test that non-lazy objects return (None, False).""" + from anndata._repr.lazy import get_lazy_categorical_info + + series = 
pd.Series(pd.Categorical(["a", "b", "c"])) + n_cats, ordered = get_lazy_categorical_info(series) + assert n_cats is None + assert ordered is False + + def test_get_lazy_categorical_info_plain_object(self): + """Test that plain objects return (None, False).""" + from anndata._repr.lazy import get_lazy_categorical_info + + n_cats, ordered = get_lazy_categorical_info("not a categorical") + assert n_cats is None + assert ordered is False + + def test_get_lazy_categorical_info_mock_without_categorical(self): + """Test object with xarray structure but no CategoricalArray.""" + from anndata._repr.lazy import get_lazy_categorical_info + + class MockVariable: + _data = None + + class MockCol: + variable = MockVariable() + + n_cats, ordered = get_lazy_categorical_info(MockCol()) + assert n_cats is None + assert ordered is False + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_get_lazy_categorical_info_zarr(self, tmp_path): + """Test get_lazy_categorical_info with Zarr-backed categorical.""" + from anndata._repr.lazy import get_lazy_categorical_info + + adata = AnnData(sp.random(100, 50, density=0.1, format="csr", dtype=np.float32)) + adata.obs["cat"] = pd.Categorical(["a", "b", "c", "d", "e"] * 20) + + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy = ad.experimental.read_lazy(path) + col = lazy.obs._ds["cat"] + + n_cats, ordered = get_lazy_categorical_info(col) + assert n_cats == 5 + assert not ordered + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_get_lazy_categorical_info_h5ad(self, tmp_path): + """Test get_lazy_categorical_info with H5AD-backed categorical.""" + import h5py + + from anndata._repr.lazy import get_lazy_categorical_info + + adata = AnnData(np.random.randn(100, 50).astype(np.float32)) + adata.obs["cat"] = pd.Categorical(["x", "y", "z"] * 33 + ["x"], ordered=False) + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy = 
ad.experimental.read_lazy(f) + col = lazy.obs._ds["cat"] + + n_cats, ordered = get_lazy_categorical_info(col) + assert n_cats == 3 + assert not ordered + + +class TestGetCategoricalArrayHelper: + """Tests for _get_categorical_array helper function.""" + + def test_get_categorical_array_non_lazy(self): + """Test that non-lazy objects return None.""" + from anndata._repr.lazy import _get_categorical_array + + series = pd.Series(pd.Categorical(["a", "b", "c"])) + assert _get_categorical_array(series) is None + + def test_get_categorical_array_plain_object(self): + """Test that plain objects return None.""" + from anndata._repr.lazy import _get_categorical_array + + assert _get_categorical_array("string") is None + assert _get_categorical_array(123) is None + assert _get_categorical_array(None) is None + + def test_get_categorical_array_mock_structure_no_categorical(self): + """Test object with partial xarray structure returns None.""" + from anndata._repr.lazy import _get_categorical_array + + class MockLazyIndexed: + array = "not a CategoricalArray" + + class MockVariable: + _data = MockLazyIndexed() + + class MockCol: + variable = MockVariable() + + assert _get_categorical_array(MockCol()) is None + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_get_categorical_array_real(self, tmp_path): + """Test _get_categorical_array with real lazy categorical.""" + from anndata._repr.lazy import _get_categorical_array + from anndata.experimental.backed._lazy_arrays import CategoricalArray + + adata = AnnData(sp.random(50, 20, density=0.1, format="csr", dtype=np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B", "C"] * 16 + ["A", "A"]) + + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy = ad.experimental.read_lazy(path) + col = lazy.obs._ds["cat"] + + result = _get_categorical_array(col) + assert isinstance(result, CategoricalArray) + + +class TestLazyAdataDetection: + """Tests for is_lazy_adata detection with edge cases.""" 
+ + def test_is_lazy_adata_none(self): + """Test that None returns False.""" + from anndata._repr.lazy import is_lazy_adata + + assert is_lazy_adata(None) is False + + def test_is_lazy_adata_no_obs(self): + """Test object without obs attribute returns False.""" + from anndata._repr.lazy import is_lazy_adata + + class NoObs: + pass + + assert is_lazy_adata(NoObs()) is False + + def test_is_lazy_adata_obs_raises(self): + """Test object where .obs raises returns False.""" + from anndata._repr.lazy import is_lazy_adata + + class RaisingObs: + @property + def obs(self): + msg = "Cannot access obs" + raise RuntimeError(msg) + + assert is_lazy_adata(RaisingObs()) is False + + def test_is_lazy_adata_obs_none(self): + """Test object with obs=None returns False.""" + from anndata._repr.lazy import is_lazy_adata + + class NoneObs: + obs = None + + assert is_lazy_adata(NoneObs()) is False + + +class TestLazyBackingInfo: + """Tests for lazy AnnData backing file info extraction.""" + + def test_get_lazy_backing_info_non_lazy_returns_empty(self): + """Test that non-lazy AnnData returns empty backing info.""" + from anndata._repr.lazy import get_lazy_backing_info + + adata = AnnData(np.random.randn(10, 5).astype(np.float32)) + info = get_lazy_backing_info(adata) + + assert info == {"filename": "", "format": ""} + + def test_is_lazy_adata_false_for_regular(self): + """Test that is_lazy_adata returns False for regular AnnData.""" + from anndata._repr.lazy import is_lazy_adata + + adata = AnnData(np.random.randn(10, 5).astype(np.float32)) + assert is_lazy_adata(adata) is False + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_get_lazy_backing_info_h5ad(self, tmp_path): + """Test backing info extraction from lazy H5AD.""" + import h5py + + from anndata._repr.lazy import get_lazy_backing_info, is_lazy_adata + from anndata.experimental import read_lazy + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + adata.obs["cat"] = 
pd.Categorical(["A", "B", "C"] * 16 + ["A", "A"]) + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + + assert is_lazy_adata(lazy_adata) is True + + info = get_lazy_backing_info(lazy_adata) + assert info["format"] == "H5AD" + assert str(path) in info["filename"] or info["filename"] != "" + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_get_lazy_backing_info_zarr(self, tmp_path): + """Test backing info extraction from lazy Zarr.""" + from anndata._repr.lazy import get_lazy_backing_info, is_lazy_adata + from anndata.experimental import read_lazy + + adata = AnnData(sp.random(50, 20, density=0.1, format="csr", dtype=np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B", "C"] * 16 + ["A", "A"]) + + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy_adata = read_lazy(path) + + assert is_lazy_adata(lazy_adata) is True + + info = get_lazy_backing_info(lazy_adata) + assert info["format"] == "Zarr" + # Zarr path should be extracted + assert str(path) in info["filename"] or info["filename"] != "" + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_lazy_badge_h5ad_in_html(self, tmp_path, validate_html): + """Test that lazy H5AD shows correct badge in HTML repr.""" + import h5py + + from anndata.experimental import read_lazy + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B"] * 25) + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + html = lazy_adata._repr_html_() + + v = validate_html(html) + # Should show "Lazy (H5AD)" badge with lazy badge styling + v.assert_badge_shown("lazy") + v.assert_text_visible("Lazy") + v.assert_text_visible("H5AD") + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_lazy_badge_zarr_in_html(self, tmp_path, validate_html): + """Test that 
lazy Zarr shows correct badge in HTML repr.""" + from anndata.experimental import read_lazy + + adata = AnnData(sp.random(50, 20, density=0.1, format="csr", dtype=np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B"] * 25) + + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy_adata = read_lazy(path) + html = lazy_adata._repr_html_() + + v = validate_html(html) + # Should show "Lazy (Zarr)" badge with lazy badge styling + v.assert_badge_shown("lazy") + v.assert_text_visible("Lazy") + v.assert_text_visible("Zarr") + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_lazy_file_path_in_html_h5ad(self, tmp_path, validate_html): + """Test that lazy H5AD file path appears in HTML repr.""" + import h5py + + from anndata.experimental import read_lazy + + adata = AnnData(np.random.randn(50, 20).astype(np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B"] * 25) + + path = tmp_path / "test.h5ad" + adata.write_h5ad(path) + + with h5py.File(path, "r") as f: + lazy_adata = read_lazy(f) + html = lazy_adata._repr_html_() + + v = validate_html(html) + # File path should appear in visible content + v.assert_text_visible(str(path)) + # Also verify it's in the file path element + v.assert_element_exists(".anndata-header__filepath") + + @pytest.mark.skipif(not HAS_XARRAY, reason="xarray not installed") + def test_lazy_file_path_in_html_zarr(self, tmp_path, validate_html): + """Test that lazy Zarr file path appears in HTML repr.""" + from anndata.experimental import read_lazy + + adata = AnnData(sp.random(50, 20, density=0.1, format="csr", dtype=np.float32)) + adata.obs["cat"] = pd.Categorical(["A", "B"] * 25) + + path = tmp_path / "test.zarr" + adata.write_zarr(path) + + lazy_adata = read_lazy(path) + html = lazy_adata._repr_html_() + + v = validate_html(html) + # File path should appear in visible content + v.assert_text_visible(str(path)) + # Also verify it's in the file path element + 
v.assert_element_exists(".anndata-header__filepath") diff --git a/tests/repr/test_repr_registry.py b/tests/repr/test_repr_registry.py new file mode 100644 index 000000000..a4af35eec --- /dev/null +++ b/tests/repr/test_repr_registry.py @@ -0,0 +1,920 @@ +""" +Registry pattern tests for the _repr module. + +Tests for FormatterRegistry, TypeFormatter, SectionFormatter, +custom formatter registration, and uns type hints. +""" + +from __future__ import annotations + +from typing import TYPE_CHECKING + +import numpy as np +import pytest + +from anndata import AnnData + +if TYPE_CHECKING: + from typing import Any + + +class TestFormatterRegistry: + """Test the formatter registry pattern for extensibility.""" + + def test_registry_has_formatters(self): + """Test registry contains registered formatters.""" + from anndata._repr.registry import formatter_registry + + assert len(formatter_registry._type_formatters) > 0 + + def test_custom_formatter_registration(self): + """Test registering a custom formatter.""" + from anndata._repr.registry import ( + FormattedOutput, + FormatterContext, + TypeFormatter, + formatter_registry, + ) + + class CustomType: + pass + + class CustomTypeFormatter(TypeFormatter): + priority = 500 + + def can_format(self, obj: Any, context) -> bool: + return isinstance(obj, CustomType) + + def format(self, obj: Any, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name="CustomType", + css_class="anndata-dtype--custom", + is_serializable=False, + ) + + formatter = CustomTypeFormatter() + formatter_registry.register_type_formatter(formatter) + + try: + obj = CustomType() + context = FormatterContext() + result = formatter_registry.format_value(obj, context) + assert result.type_name == "CustomType" + assert result.css_class == "anndata-dtype--custom" + assert result.is_serializable is False + finally: + formatter_registry.unregister_type_formatter(formatter) + + def test_fallback_formatter_for_unknown_types(self): + """Test 
fallback formatter handles unknown types gracefully.""" + from anndata._repr.registry import FormatterContext, formatter_registry + + class UnknownType: + pass + + obj = UnknownType() + context = FormatterContext() + result = formatter_registry.format_value(obj, context) + + assert result is not None + assert "UnknownType" in result.type_name + assert "unknown" in result.css_class or "extension" in result.css_class + + def test_formatter_priority_order(self): + """Test formatters are checked in priority order.""" + from anndata._repr.registry import formatter_registry + + priorities = [f.priority for f in formatter_registry._type_formatters] + assert priorities == sorted(priorities, reverse=True) + + def test_formatter_sections_filtering(self): + """Test formatters are only applied to specified sections.""" + from anndata._repr.registry import ( + FormattedOutput, + FormatterContext, + TypeFormatter, + formatter_registry, + ) + + class SectionSpecificType: + pass + + class UnsOnlyFormatter(TypeFormatter): + priority = 600 + sections = ("uns",) + + def can_format(self, obj: Any, context) -> bool: + return isinstance(obj, SectionSpecificType) + + def format(self, obj: Any, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name="UnsSpecificType", + css_class="anndata-dtype--uns-specific", + ) + + formatter = UnsOnlyFormatter() + formatter_registry.register_type_formatter(formatter) + + try: + obj = SectionSpecificType() + + context_uns = FormatterContext(section="uns") + result_uns = formatter_registry.format_value(obj, context_uns) + assert result_uns.type_name == "UnsSpecificType" + + context_obsm = FormatterContext(section="obsm") + result_obsm = formatter_registry.format_value(obj, context_obsm) + assert result_obsm.type_name != "UnsSpecificType" + assert "SectionSpecificType" in result_obsm.type_name + finally: + formatter_registry.unregister_type_formatter(formatter) + + def test_formatter_sections_none_applies_everywhere(self): + 
"""Test formatters with sections=None apply to all sections.""" + from anndata._repr.registry import ( + FormattedOutput, + FormatterContext, + TypeFormatter, + formatter_registry, + ) + + class UniversalType: + pass + + class UniversalFormatter(TypeFormatter): + priority = 600 + sections = None + + def can_format(self, obj: Any, context) -> bool: + return isinstance(obj, UniversalType) + + def format(self, obj: Any, context: FormatterContext) -> FormattedOutput: + return FormattedOutput(type_name="UniversalType") + + formatter = UniversalFormatter() + formatter_registry.register_type_formatter(formatter) + + try: + obj = UniversalType() + for section in ["uns", "obsm", "varm", "layers", "obs", "var"]: + context = FormatterContext(section=section) + result = formatter_registry.format_value(obj, context) + assert result.type_name == "UniversalType" + finally: + formatter_registry.unregister_type_formatter(formatter) + + def test_extension_type_graceful_handling(self): + """Test extension types are handled gracefully.""" + from anndata._repr.registry import FormatterContext, formatter_registry + + class ExtensionData: + def __init__(self): + self.n_obs = 100 + self.n_vars = 50 + self.shape = (100, 50) + self.dtype = np.float32 + + ExtensionData.__module__ = "treedata.core" + + obj = ExtensionData() + context = FormatterContext() + result = formatter_registry.format_value(obj, context) + + assert result is not None + assert "ExtensionData" in result.type_name + assert "Shape:" in result.tooltip # Shape info in tooltip for fallback + + def test_anndata_in_uns_detected(self): + """Test nested AnnData in .uns is properly detected.""" + inner = AnnData(np.zeros((5, 3))) + outer = AnnData(np.zeros((10, 5))) + outer.uns["inner_adata"] = inner + + html = outer._repr_html_() + + assert "inner_adata" in html + assert "AnnData" in html + + def test_registry_formatter_exception_continues_to_next(self): + """Test registry continues to next formatter on exception.""" + from 
anndata._repr.registry import ( + FormattedOutput, + FormatterContext, + FormatterRegistry, + TypeFormatter, + ) + + class FailingFormatter(TypeFormatter): + priority = 1000 + + def can_format(self, obj, context): + return True + + def format(self, obj, context): + msg = "Intentional failure" + raise RuntimeError(msg) + + class BackupFormatter(TypeFormatter): + priority = 500 + + def can_format(self, obj, context): + return True + + def format(self, obj, context): + return FormattedOutput(type_name="Backup", css_class="backup") + + registry = FormatterRegistry() + failing = FailingFormatter() + backup = BackupFormatter() + registry.register_type_formatter(failing) + registry.register_type_formatter(backup) + + try: + # Two warnings: one for the failure, one summarizing failures before success + with pytest.warns(UserWarning, match=r"Formatter") as warnings: + result = registry.format_value("test", FormatterContext()) + # BackupFormatter succeeds + assert result.type_name == "Backup" + # No error in result (backup succeeded) + assert result.error is None + # Warnings were emitted about the failure + warning_messages = [str(w.message) for w in warnings] + assert any("FailingFormatter" in msg for msg in warning_messages) + assert any("Intentional failure" in msg for msg in warning_messages) + finally: + registry.unregister_type_formatter(failing) + registry.unregister_type_formatter(backup) + + def test_register_formatter_decorator_with_class(self): + """Test register_formatter works as decorator with class.""" + from anndata._repr.registry import ( + FormattedOutput, + FormatterContext, + TypeFormatter, + formatter_registry, + register_formatter, + ) + + class DecoratorTestFormatter(TypeFormatter): + priority = 999 + + def can_format(self, obj, context): + return ( + isinstance(obj, tuple) + and len(obj) == 3 + and obj[0] == "decorator_test" + ) + + def format(self, obj, context): + return FormattedOutput(type_name="DecoratorTest", css_class="test") + + 
formatter_instance = DecoratorTestFormatter() + register_formatter(formatter_instance) + + try: + result = formatter_registry.format_value( + ("decorator_test", 1, 2), FormatterContext() + ) + assert result.type_name == "DecoratorTest" + finally: + formatter_registry.unregister_type_formatter(formatter_instance) + + +class TestFormatterContext: + """Tests for FormatterContext.""" + + def test_context_child_creates_nested_context(self): + """Test FormatterContext.child() creates proper nested context.""" + from anndata._repr.registry import FormatterContext + + parent = FormatterContext( + depth=0, + max_depth=5, + parent_keys=(), + adata_ref=None, + section="uns", + ) + + child = parent.child("nested_key") + + assert child.depth == 1 + assert child.max_depth == 5 + assert child.parent_keys == ("nested_key",) + assert child.section == "uns" + + grandchild = child.child("deeper") + assert grandchild.depth == 2 + assert grandchild.parent_keys == ("nested_key", "deeper") + + def test_context_access_path_empty(self): + """Test access_path returns empty string for no parent keys.""" + from anndata._repr.registry import FormatterContext + + context = FormatterContext(parent_keys=()) + assert context.access_path == "" + + def test_context_access_path_identifier_keys(self): + """Test access_path with valid Python identifiers.""" + from anndata._repr.registry import FormatterContext + + context = FormatterContext(parent_keys=("uns", "neighbors", "params")) + assert context.access_path == ".uns.neighbors.params" + + def test_context_access_path_non_identifier_keys(self): + """Test access_path with non-identifier keys.""" + from anndata._repr.registry import FormatterContext + + context = FormatterContext(parent_keys=("uns", "key with spaces", "123numeric")) + path = context.access_path + assert ".uns" in path + assert "['key with spaces']" in path + assert "['123numeric']" in path + + def test_context_access_path_mixed_keys(self): + """Test access_path with mixed identifier 
and non-identifier keys.""" + from anndata._repr.registry import FormatterContext + + context = FormatterContext(parent_keys=("valid", "has-hyphen", "also_valid")) + path = context.access_path + assert ".valid" in path + assert "['has-hyphen']" in path + assert ".also_valid" in path + + +class TestSectionFormatter: + """Tests for SectionFormatter abstract class.""" + + def test_section_formatter_default_methods(self): + """Test SectionFormatter default method implementations.""" + from anndata._repr.registry import SectionFormatter + + class TestSectionFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "test_section" + + def get_entries(self, obj, context): + return [] + + formatter = TestSectionFormatter() + + assert formatter.display_name == "test_section" + assert formatter.doc_url is None + assert formatter.tooltip == "" + assert formatter.should_show(None) is True + + +class TestFallbackFormatter: + """Tests for FallbackFormatter edge cases.""" + + def test_fallback_extension_type_no_unknown_warning(self): + """Test fallback formatter for extension types doesn't add 'Unknown type' warning.""" + from anndata._repr.registry import FallbackFormatter, FormatterContext + + class ExtensionType: + pass + + ExtensionType.__module__ = "treedata.core" + + formatter = FallbackFormatter() + obj = ExtensionType() + context = FormatterContext() + + result = formatter.format(obj, context) + + assert result.type_name == "ExtensionType" + assert result.css_class == "anndata-dtype--extension" + # Extension types don't get "Unknown type" warning, but do get serialization warning + assert not any("Unknown type" in w for w in result.warnings) + # Serialization reason is included + assert any("no registered writer" in w for w in result.warnings) + + def test_fallback_with_shape_and_dtype(self): + """Test fallback formatter extracts shape and dtype.""" + from anndata._repr.registry import FallbackFormatter, FormatterContext + + class ShapedType: + 
shape = (10, 5) + dtype = "float32" + + formatter = FallbackFormatter() + obj = ShapedType() + context = FormatterContext() + + result = formatter.format(obj, context) + + assert "Shape: (10, 5)" in result.tooltip + assert "Dtype: float32" in result.tooltip + + def test_fallback_with_len(self): + """Test fallback formatter extracts length.""" + from anndata._repr.registry import FallbackFormatter, FormatterContext + + class LengthType: + def __len__(self): + return 42 + + formatter = FallbackFormatter() + obj = LengthType() + context = FormatterContext() + + result = formatter.format(obj, context) + + assert "Length: 42" in result.tooltip + + def test_fallback_len_raises_error(self): + """Test fallback formatter handles __len__ errors gracefully.""" + from anndata._repr.registry import FallbackFormatter, FormatterContext + + class BrokenLenType: + def __len__(self): + msg = "Cannot get length" + raise TypeError(msg) + + formatter = FallbackFormatter() + obj = BrokenLenType() + context = FormatterContext() + + result = formatter.format(obj, context) + assert "Length:" not in result.tooltip + + +class TestRegistryAbstractMethods: + """Tests for registry abstract method checks.""" + + def test_type_formatter_is_abstract(self): + """Test TypeFormatter requires abstract methods.""" + from anndata._repr.registry import TypeFormatter + + # Only checks that the abstract interface is declared + assert hasattr(TypeFormatter, "can_format") + assert hasattr(TypeFormatter, "format") + + def test_section_formatter_is_abstract(self): + """Test SectionFormatter requires abstract methods.""" + from anndata._repr.registry import SectionFormatter + + assert hasattr(SectionFormatter, "section_name") + assert hasattr(SectionFormatter, "get_entries") + + +class TestUnsRendererRegistry: + """Test the uns renderer registry for custom serialized data visualization.""" + + def test_extract_type_hint_dict_format(self): + """Test extracting type hint from dict format.""" + from anndata._repr.registry import 
UNS_TYPE_HINT_KEY, extract_uns_type_hint + + value = { + UNS_TYPE_HINT_KEY: "mypackage.config", + "data": {"setting": "value"}, + "version": "1.0", + } + hint, cleaned = extract_uns_type_hint(value) + + assert hint == "mypackage.config" + assert UNS_TYPE_HINT_KEY not in cleaned + assert cleaned["data"] == {"setting": "value"} + assert cleaned["version"] == "1.0" + + def test_extract_type_hint_string_format(self): + """Test extracting type hint from string prefix format.""" + from anndata._repr.registry import UNS_TYPE_HINT_KEY, extract_uns_type_hint + + value = f'{UNS_TYPE_HINT_KEY}:mypackage.config::{{"setting": "value"}}' + hint, cleaned = extract_uns_type_hint(value) + + assert hint == "mypackage.config" + assert cleaned == '{"setting": "value"}' + + def test_extract_type_hint_no_hint_returns_none(self): + """Test that values without type hints return (None, original_value).""" + from anndata._repr.registry import extract_uns_type_hint + + value = {"data": "value"} + hint, cleaned = extract_uns_type_hint(value) + assert hint is None + assert cleaned == value + + value = "just a string" + hint, cleaned = extract_uns_type_hint(value) + assert hint is None + assert cleaned == value + + value = 42 + hint, cleaned = extract_uns_type_hint(value) + assert hint is None + assert cleaned == value + + def test_extract_type_hint_invalid_dict_hint_type(self): + """Test that non-string type hints in dict are ignored.""" + from anndata._repr.registry import UNS_TYPE_HINT_KEY, extract_uns_type_hint + + value = {UNS_TYPE_HINT_KEY: 123, "data": "value"} + hint, cleaned = extract_uns_type_hint(value) + assert hint is None + assert cleaned == value + + def test_extract_type_hint_malformed_string_format(self): + """Test that malformed string format returns no hint.""" + from anndata._repr.registry import UNS_TYPE_HINT_KEY, extract_uns_type_hint + + value = f"{UNS_TYPE_HINT_KEY}:mypackage.config:data" + hint, cleaned = extract_uns_type_hint(value) + assert hint is None + assert 
cleaned == value + + def test_type_formatter_for_tagged_uns_data(self): + """Test using TypeFormatter to handle tagged data in uns.""" + from anndata._repr import ( + FormattedOutput, + TypeFormatter, + extract_uns_type_hint, + formatter_registry, + register_formatter, + ) + + class TestConfigFormatter(TypeFormatter): + priority = 100 + + def can_format(self, obj, context): + hint, _ = extract_uns_type_hint(obj) + return hint == "test.config_format" + + def format(self, obj, context): + _hint, data = extract_uns_type_hint(obj) + items = data.get("data", {}) + return FormattedOutput( + type_name="test config", + preview_html=f'Items: {len(items)}', + ) + + formatter = TestConfigFormatter() + register_formatter(formatter) + + try: + adata = AnnData(np.zeros((5, 3))) + adata.uns["my_config"] = { + "__anndata_repr__": "test.config_format", + "data": {"a": 1, "b": 2, "c": 3}, + } + + html = adata._repr_html_() + + assert "Items: 3" in html + assert "test config" in html + finally: + formatter_registry.unregister_type_formatter(formatter) + + def test_unregistered_type_hint_shows_import_message(self): + """Test that unregistered type hints show helpful import message.""" + adata = AnnData(np.zeros((5, 3))) + adata.uns["external_data"] = { + "__anndata_repr__": "externalpackage.customtype", + "data": {"key": "value"}, + } + + html = adata._repr_html_() + + assert "externalpackage.customtype" in html + assert "import externalpackage" in html + + def test_formatter_error_handled_gracefully(self): + """Test that TypeFormatter errors don't crash the repr.""" + from anndata._repr import ( + TypeFormatter, + extract_uns_type_hint, + formatter_registry, + register_formatter, + ) + + class FailingFormatter(TypeFormatter): + priority = 100 + + def can_format(self, obj, context): + hint, _ = extract_uns_type_hint(obj) + return hint == "test.failing_format" + + def format(self, obj, context): + msg = "Intentional test error" + raise ValueError(msg) + + formatter = FailingFormatter() 
+ register_formatter(formatter) + + try: + adata = AnnData(np.zeros((5, 3))) + adata.uns["will_fail"] = { + "__anndata_repr__": "test.failing_format", + "data": "test", + } + + with pytest.warns(UserWarning, match="Formatter.*:"): + html = adata._repr_html_() + + assert html is not None + assert "will_fail" in html + finally: + formatter_registry.unregister_type_formatter(formatter) + + def test_string_format_type_hint_in_html(self): + """Test string format type hints work in HTML output.""" + adata = AnnData(np.zeros((5, 3))) + adata.uns["string_hint"] = ( + "__anndata_repr__:somepackage.config::actual content here" + ) + + html = adata._repr_html_() + + assert "somepackage.config" in html + assert "import somepackage" in html + + def test_type_hint_key_constant_exported(self): + """Test that UNS_TYPE_HINT_KEY constant is properly exported.""" + from anndata._repr import UNS_TYPE_HINT_KEY + + assert UNS_TYPE_HINT_KEY == "__anndata_repr__" + + def test_security_data_never_triggers_import(self): + """Test that data in uns NEVER triggers imports or code execution.""" + import sys + + fake_package = "definitely_not_a_real_package_12345" + assert fake_package not in sys.modules + + adata = AnnData(np.zeros((5, 3))) + adata.uns["malicious"] = { + "__anndata_repr__": f"{fake_package}.evil", + "data": "some data", + } + + html = adata._repr_html_() + + assert fake_package not in sys.modules + assert html is not None + assert "malicious" in html + + +class TestCustomSectionFormatterCodePaths: + """Tests for custom section formatter code paths to improve coverage.""" + + def test_custom_section_with_entries(self): + """Test custom section formatter with entries.""" + from anndata._repr.registry import ( + FormattedEntry, + FormattedOutput, + FormatterContext, # noqa: TC001 + SectionFormatter, + formatter_registry, + ) + + class TestSectionFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "test_custom_section_entries" + + @property + def 
display_name(self) -> str: + return "Test Custom Entries" + + @property + def doc_url(self) -> str: + return "https://example.com/docs" + + @property + def tooltip(self) -> str: + return "A test section" + + def should_show(self, obj) -> bool: + return hasattr(obj, "_test_marker") + + def get_entries(self, obj, context: FormatterContext): + return [ + FormattedEntry( + key="entry1", + output=FormattedOutput(type_name="TestType", css_class="test"), + ), + FormattedEntry( + key="entry2", + output=FormattedOutput( + type_name="TestType2", + css_class="test", + warnings=["Test warning"], + ), + ), + ] + + formatter = TestSectionFormatter() + formatter_registry.register_section_formatter(formatter) + + try: + adata = AnnData(np.zeros((10, 5))) + adata._test_marker = True + + html = adata._repr_html_() + assert ( + "Test Custom Entries" in html or "test_custom_section_entries" in html + ) + assert "entry1" in html + assert "entry2" in html + finally: + formatter_registry._section_formatters.pop( + "test_custom_section_entries", None + ) + + def test_custom_section_exception_handling(self): + """Test custom section formatter handles exceptions gracefully.""" + from anndata._repr.registry import ( + FormatterContext, # noqa: TC001 + SectionFormatter, + formatter_registry, + ) + + class FailingSectionFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "failing_section" + + def should_show(self, obj) -> bool: + return True + + def get_entries(self, obj, context: FormatterContext): + msg = "Intentional failure" + raise ValueError(msg) + + formatter = FailingSectionFormatter() + formatter_registry.register_section_formatter(formatter) + + try: + adata = AnnData(np.zeros((10, 5))) + # Should not crash, just skip the failing section + with pytest.warns(UserWarning, match="Custom section.*failed"): + html = adata._repr_html_() + assert html is not None + finally: + formatter_registry._section_formatters.pop("failing_section", None) + + def 
test_custom_section_should_show_exception(self): + """Test custom section handles should_show exception.""" + from anndata._repr.registry import ( + FormatterContext, # noqa: TC001 + SectionFormatter, + formatter_registry, + ) + + class ShouldShowFailingFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "should_show_failing" + + def should_show(self, obj) -> bool: + msg = "Intentional failure in should_show" + raise RuntimeError(msg) + + def get_entries(self, obj, context: FormatterContext): + return [] + + formatter = ShouldShowFailingFormatter() + formatter_registry.register_section_formatter(formatter) + + try: + adata = AnnData(np.zeros((10, 5))) + # Should not crash + html = adata._repr_html_() + assert html is not None + finally: + formatter_registry._section_formatters.pop("should_show_failing", None) + + +class TestFormattedEntryRendering: + """Tests for FormattedEntry rendering in custom sections.""" + + def test_formatted_entry_with_expandable_html(self): + """Test formatted entry with expandable HTML content.""" + from anndata._repr.registry import ( + FormattedEntry, + FormattedOutput, + FormatterContext, # noqa: TC001 + SectionFormatter, + formatter_registry, + ) + + class ExpandableEntryFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "expandable_section" + + def should_show(self, obj) -> bool: + return hasattr(obj, "_expandable_marker") + + def get_entries(self, obj, context: FormatterContext): + return [ + FormattedEntry( + key="expandable_entry", + output=FormattedOutput( + type_name="Expandable", + css_class="test", + expanded_html="
<div>Expanded content here</div>
", + ), + ), + ] + + formatter = ExpandableEntryFormatter() + formatter_registry.register_section_formatter(formatter) + + try: + adata = AnnData(np.zeros((10, 5))) + adata._expandable_marker = True + + html = adata._repr_html_() + assert "expandable_entry" in html + assert "Expand" in html or "expand" in html.lower() + finally: + formatter_registry._section_formatters.pop("expandable_section", None) + + def test_formatted_entry_with_inline_html(self): + """Test formatted entry with inline (non-expandable) HTML.""" + from anndata._repr.registry import ( + FormattedEntry, + FormattedOutput, + FormatterContext, # noqa: TC001 + SectionFormatter, + formatter_registry, + ) + + class InlineEntryFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "inline_section" + + def should_show(self, obj) -> bool: + return hasattr(obj, "_inline_marker") + + def get_entries(self, obj, context: FormatterContext): + return [ + FormattedEntry( + key="inline_entry", + output=FormattedOutput( + type_name="Inline", + css_class="test", + preview_html="Preview content", + ), + ), + ] + + formatter = InlineEntryFormatter() + formatter_registry.register_section_formatter(formatter) + + try: + adata = AnnData(np.zeros((10, 5))) + adata._inline_marker = True + + html = adata._repr_html_() + assert "inline_entry" in html + assert "Preview content" in html + finally: + formatter_registry._section_formatters.pop("inline_section", None) + + def test_formatted_entry_not_serializable(self): + """Test formatted entry with not serializable marker.""" + from anndata._repr.registry import ( + FormattedEntry, + FormattedOutput, + FormatterContext, # noqa: TC001 + SectionFormatter, + formatter_registry, + ) + + class NotSerializableFormatter(SectionFormatter): + @property + def section_name(self) -> str: + return "not_serializable_section" + + def should_show(self, obj) -> bool: + return hasattr(obj, "_not_serializable_marker") + + def get_entries(self, obj, context: 
FormatterContext): + return [ + FormattedEntry( + key="bad_entry", + output=FormattedOutput( + type_name="BadType", + css_class="test", + is_serializable=False, + warnings=["Cannot be saved"], + ), + ), + ] + + formatter = NotSerializableFormatter() + formatter_registry.register_section_formatter(formatter) + + try: + adata = AnnData(np.zeros((10, 5))) + adata._not_serializable_marker = True + + html = adata._repr_html_() + assert "bad_entry" in html + assert "⚠" in html or "warning" in html.lower() + finally: + formatter_registry._section_formatters.pop("not_serializable_section", None) diff --git a/tests/repr/test_repr_robustness.py b/tests/repr/test_repr_robustness.py new file mode 100644 index 000000000..209c93846 --- /dev/null +++ b/tests/repr/test_repr_robustness.py @@ -0,0 +1,1511 @@ +""" +Adversarial robustness tests for the HTML repr module. + +These tests verify that the repr system handles malformed, broken, and +adversarial objects gracefully without crashing. The design philosophy is +"report what is there and what could not be done" - errors should be visible +in the output, not hidden or causing crashes. + +Test categories: +- Escaping coverage (verify html.escape at every insertion point) +- Unicode edge cases (emoji, CJK, RTL, Zalgo) +- Huge data (large strings, many categories, deep nesting) +- Broken objects (properties that raise, missing attributes) +- Type confusion and lying hasattr +- Circular references +- Thread safety + +All tests use the HTMLValidator to ensure proper HTML output and error reporting. +""" + +# ruff: noqa: EM101, RUF003 +# EM101: Exception string literals are used intentionally in test fixtures to create +# identifiable error messages that can be verified in test assertions. +# RUF003: Unicode lookalike characters in comments are intentional - we're testing +# that the repr handles confusable characters correctly (e.g., Cyrillic 'а' vs Latin 'a'). 
+ +from __future__ import annotations + +import threading +from typing import TYPE_CHECKING + +import numpy as np + +if TYPE_CHECKING: + from typing import Any +import pandas as pd +import pytest +import scipy.sparse as sp + +import anndata as ad +from anndata import AnnData +from anndata._repr.core import render_x_entry +from anndata._repr.html import generate_repr_html +from anndata._repr.lazy import is_lazy_adata +from anndata._repr.registry import FormatterContext, formatter_registry +from anndata._repr.utils import ( + _get_categories_from_column, + get_backing_info, + is_backed, + is_serializable, + is_view, + sanitize_css_color, +) + +# ============================================================================= +# Evil object fixtures - the most absurd data and behavior +# ============================================================================= + + +class PropertyBomb: + """Object where every property access raises an exception.""" + + @property + def X(self) -> None: + raise RuntimeError("X exploded") + + @property + def obs(self) -> None: + raise MemoryError("obs exploded") + + @property + def is_view(self) -> None: + raise MemoryError("is_view exploded") + + @property + def isbacked(self) -> None: + raise RuntimeError("isbacked exploded") + + @property + def shape(self) -> None: + raise TypeError("shape exploded") + + @property + def dtype(self) -> None: + raise TypeError("dtype exploded") + + def __len__(self) -> int: + raise MemoryError("len exploded") + + +class LyingHasattr: + """Object where hasattr returns True but getattr fails.""" + + def __getattribute__(self, name: str) -> Any: + if name in ("X", "obs", "var", "uns"): + msg = f"Gotcha! 
{name} doesn't really exist" + raise AttributeError(msg) + return object.__getattribute__(self, name) + + +class BrokenRepr: + """Object where __repr__ and __str__ crash.""" + + def __repr__(self) -> str: + msg = "repr is broken" + raise ValueError(msg) + + def __str__(self) -> str: + msg = "str is broken" + raise TypeError(msg) + + +class RecursiveDict(dict): + """Dict that contains itself.""" + + def __init__(self) -> None: + super().__init__() + self["self"] = self + self["deeper"] = {"even_deeper": self} + + +class BrokenCategories: + """Object with broken categorical accessor.""" + + @property + def cat(self) -> Any: + class FakeCat: + @property + def categories(self) -> None: + raise RuntimeError("categories exploded") + + return FakeCat() + + +class ZalgoText: + """Generator for Zalgo (heavily combined) text.""" + + @staticmethod + def generate(base: str = "EVIL") -> str: + """Generate Zalgo text with many combining characters.""" + combiners = [ + "\u0300", # grave + "\u0301", # acute + "\u0302", # circumflex + "\u0303", # tilde + "\u0304", # macron + "\u0305", # overline + "\u0306", # breve + "\u0307", # dot above + "\u0308", # diaeresis + "\u0309", # hook above + "\u030a", # ring above + "\u030b", # double acute + "\u030c", # caron + "\u030d", # vertical line above + "\u030e", # double vertical line above + "\u030f", # double grave + ] + result = "" + for char in base: + result += char + # Add random combiners + for combiner in combiners[:8]: + result += combiner + return result + + +# ============================================================================= +# Tests for escaping coverage — verify html.escape() at every insertion point +# ============================================================================= + + +class TestEscapingCoverage: + """Verify html.escape() is applied at every user-data insertion point. + + We trust html.escape() (stdlib) — we only need to verify it's called. 
+ Each test puts a single HTML marker in one insertion point and verifies + it appears escaped, not raw. + """ + + MARKER = "<b>MARKER</b>" + ESCAPED = "&lt;b&gt;MARKER&lt;/b&gt;" + + def test_obs_column_name_escaped(self, validate_html): + """obs column names are escaped.""" + adata = AnnData(np.zeros((3, 3))) + adata.obs[self.MARKER] = [1, 2, 3] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in obs column name" + assert self.ESCAPED in html + + def test_var_column_name_escaped(self, validate_html): + """var column names are escaped.""" + adata = AnnData(np.zeros((3, 5))) + adata.var[self.MARKER] = range(5) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in var column name" + assert self.ESCAPED in html + + def test_uns_key_escaped(self, validate_html): + """uns dictionary keys are escaped.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns[self.MARKER] = "value" + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in uns key" + assert self.ESCAPED in html + + def test_category_values_escaped(self, validate_html): + """Categorical preview values are escaped.""" + adata = AnnData(np.zeros((3, 3))) + adata.obs["cats"] = pd.Categorical([self.MARKER, "normal", "other"]) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in category values" + assert self.ESCAPED in html + + def test_dataframe_columns_escaped(self, validate_html): + """DataFrame column names in obsm are escaped.""" + adata = AnnData(np.zeros((5, 3))) + evil_df = pd.DataFrame( + {self.MARKER: np.random.rand(5), "normal": np.random.rand(5)}, + index=adata.obs_names, + ) + adata.obsm["X_evil"] = evil_df + + html = adata._repr_html_() + v = validate_html(html) + +
v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in DataFrame column" + assert self.ESCAPED in html + + def test_readme_content_escaped(self, validate_html): + """README content in data attribute is escaped.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["README"] = f"# Title\n{self.MARKER}" + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_element_exists(".anndata-readme__icon") + # README content goes into data-readme attribute (HTML-escaped) + assert self.MARKER not in html, "Raw HTML marker in README content" + + def test_type_name_escaped(self, validate_html): + """type(obj).__name__ in uns display is escaped.""" + + class MaliciousType: + pass + + MaliciousType.__name__ = self.MARKER + + adata = AnnData(np.zeros((5, 3))) + adata.uns["evil_type"] = MaliciousType() + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in type name" + assert self.ESCAPED in html + + def test_exception_name_escaped(self, validate_html): + """Exception class __name__ in error display is escaped.""" + + class XSSException(Exception): + pass + + XSSException.__name__ = self.MARKER + + class MaliciousObject: + @property + def shape(self): + raise XSSException() + + adata = AnnData(np.zeros((5, 3))) + adata.uns["attack"] = MaliciousObject() + + with pytest.warns(UserWarning, match="Formatter.*:"): + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert self.MARKER not in html, "Raw HTML marker in exception name" + assert self.ESCAPED in html + + def test_style_breakout_escaped(self, validate_html): + """</style> in user data doesn't break out of style block.""" + adata = AnnData(np.zeros((3, 3))) + adata.var[""] = range(3) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + # Only 1 legitimate </style> tag + assert html.lower().count("</style>") == 1 + assert ""
not in html + + def test_div_breakout_escaped(self, validate_html): + """
</div> in user data doesn't break out of container.""" + adata = AnnData(np.zeros((3, 3))) + adata.obs[""] = [1, 2, 3] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert "" not in html + + def test_css_colors_sanitized(self): + """sanitize_css_color() blocks injection vectors.""" + # Semicolons (CSS property separator) + assert sanitize_css_color("red;padding:100px") is None + assert sanitize_css_color("blue; font-size:100px") is None + # url() and expression() + assert sanitize_css_color("url(https://evil.com)") is None + assert sanitize_css_color("expression(alert(1))") is None + # hsl() and var() not whitelisted + assert sanitize_css_color("hsl(120, 100%, 50%)") is None + assert sanitize_css_color("var(--user-color)") is None + # Very long strings rejected + assert sanitize_css_color("red" + "x" * 1000) is None + # Valid colors pass + assert sanitize_css_color("#ff0000") == "#ff0000" + assert sanitize_css_color("red") == "red" + assert sanitize_css_color("rgb(0,255,0)") == "rgb(0,255,0)" + + def test_special_chars_in_keys_escaped(self, validate_html): + """<, >, &, quotes in uns keys are escaped.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["key<with>special&chars"] = "value" + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert "<with>" not in html, "Raw angle brackets in key" + assert "&lt;with&gt;" in html + + +# ============================================================================= +# Tests for Unicode edge cases +# ============================================================================= + + +class TestUnicodeEdgeCases: + """Test handling of Unicode edge cases.""" + + def test_emoji_in_column_names(self, validate_html): + """Emoji in column names should work.""" + adata = AnnData(np.zeros((3, 3))) + adata.obs["emoji_\U0001f4a9_poop"] = [1, 2, 3] + adata.obs["\U0001f600\U0001f601\U0001f602"] = [4, 5, 6] # Multiple emoji + + html = adata._repr_html_() + v = validate_html(html)
+ + v.assert_html_well_formed() + v.assert_section_exists("obs") + # Emoji column should be visible (emoji preserved or the word "emoji") + assert "emoji" in html.lower() or "\U0001f4a9" in html + + def test_cjk_characters(self, validate_html): + """CJK characters in data should work.""" + adata = AnnData(np.zeros((4, 3))) + adata.obs["chinese_中文"] = ["猫", "狗", "鸟", "魚"] + adata.var["日本語"] = ["遺伝子", "発現", "解析"] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("obs") + v.assert_section_exists("var") + + def test_rtl_override_character(self, validate_html): + """RTL override characters should be handled.""" + adata = AnnData(np.zeros((3, 3))) + # U+202E is right-to-left override + adata.obs["rtl_\u202eEVIL\u202c_text"] = [1, 2, 3] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("obs") + + def test_zalgo_text(self, validate_html): + """Zalgo (heavily combined) text should not crash.""" + adata = AnnData(np.zeros((3, 3))) + zalgo = ZalgoText.generate("EVIL") + adata.obs[f"zalgo_{zalgo}"] = [1, 2, 3] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("obs") + + def test_null_byte_in_string(self, validate_html): + """Null bytes in strings should be replaced, not leak into HTML.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["null_byte"] = "before\x00after" + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + assert "\x00" not in html + + def test_null_byte_in_column_name(self, validate_html): + """Null bytes in column names must not leak into HTML output.""" + adata = AnnData( + np.zeros((3, 2)), + obs=pd.DataFrame( + {"null\x00col": [1, 2, 3]}, + index=["a", "b", "c"], + ), + ) + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + assert "\x00" not in html + # Null 
byte replaced with U+FFFD (replacement character) + assert "\ufffd" in html + + def test_mixed_unicode_categories(self, validate_html): + """Mixed unicode in categorical should work.""" + adata = AnnData(np.zeros((6, 3))) + adata.obs["mixed"] = pd.Categorical([ + "English", + "日本語", + "العربية", + "עברית", + "emoji\U0001f600", + "Ελληνικά", + ]) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "mixed") + + def test_zero_width_characters_in_names(self, validate_html): + """Zero-width chars create identical-looking but different columns.""" + adata = AnnData(np.zeros((3, 3))) + # These columns look identical but are different keys + adata.obs["gene"] = [1, 2, 3] + adata.obs["gene\u200b"] = [4, 5, 6] # Zero-width space + adata.obs["gene\u200d"] = [7, 8, 9] # Zero-width joiner + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("obs") + # All three columns should be shown as distinct entries + # They look the same but are different keys + assert html.count("gene") >= 3, "All 3 'gene' columns should be shown" + + +# ============================================================================= +# Tests for huge/large data handling +# ============================================================================= + + +class TestHugeDataHandling: + """Test handling of extremely large data.""" + + def test_huge_categorical_truncated(self, validate_html): + """Categoricals with many categories should be truncated.""" + adata = AnnData(np.zeros((50, 3))) + # Use 500 categories (still triggers truncation, but faster than 10000) + cats = [f"category_{i}" for i in range(500)] + adata.obs["huge_cat"] = pd.Categorical( + np.random.choice(cats[:50], size=50), categories=cats + ) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "huge_cat") + 
v.assert_truncation_indicator() + # Should not show all 500 categories + category_count = html.count("category_") + assert category_count < 200, f"Too many categories shown: {category_count}" + + def test_giant_string_in_uns_truncated(self, validate_html): + """Giant strings (100KB) in uns should be truncated.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["giant"] = "x" * 100_000 # 100KB string (faster than 1MB) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + # HTML should not be bloated (base template is ~50KB CSS/JS) + # 100KB string should be truncated to add only ~10-20KB content + assert len(html) < 80_000, f"HTML too large: {len(html)} chars" + # The string should be truncated (not all 100K x's present) + assert html.count("x") < 10_000, "Giant string should be truncated" + + def test_deeply_nested_uns(self, validate_html): + """Deeply nested structures (20 levels) should be handled.""" + adata = AnnData(np.zeros((3, 3))) + nested: dict = {} + current = nested + # 20 levels is enough to test depth limiting (reduced from 100) + for i in range(20): + current["level"] = {"depth": i} + current = current["level"] + adata.uns["deep"] = nested + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + + def test_many_uns_keys_truncated(self, validate_html): + """Many uns keys (300) should be truncated.""" + adata = AnnData(np.zeros((3, 3))) + # Use batch update instead of individual assignments (faster) + adata.uns.update({f"key_{i}": i for i in range(300)}) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + v.assert_truncation_indicator() + # Should not show all 300 keys + key_count = html.count("key_") + assert key_count < 1500, f"Way too many keys shown: {key_count}" + + def test_wide_array_in_obsm(self, validate_html): + """Wide array (500 
columns) in obsm should be handled.""" + adata = AnnData(np.zeros((10, 5))) + # 500 columns is enough to test (reduced from 1000) + adata.obsm["wide"] = np.random.rand(10, 500) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("obsm") + v.assert_section_contains_entry("obsm", "wide") + + +# ============================================================================= +# Tests for broken/adversarial objects +# ============================================================================= + + +class TestBrokenObjects: + """Test handling of objects with broken attributes.""" + + @pytest.fixture + def context(self) -> FormatterContext: + return FormatterContext() + + def test_is_view_with_exploding_property(self) -> None: + """is_view should return False when property raises.""" + result = is_view(PropertyBomb()) + assert result is False + + def test_is_backed_with_exploding_property(self) -> None: + """is_backed should return False when property raises.""" + result = is_backed(PropertyBomb()) + assert result is False + + def test_get_backing_info_with_exploding_property(self) -> None: + """get_backing_info should return default when property raises.""" + result = get_backing_info(PropertyBomb()) + assert result == {"backed": False} + + def test_is_lazy_adata_with_exploding_obs(self) -> None: + """is_lazy_adata should return False when .obs raises.""" + result = is_lazy_adata(PropertyBomb()) + assert result is False + + def test_get_categories_with_exploding_cat_accessor(self) -> None: + """_get_categories_from_column should return [] when .cat raises.""" + with pytest.warns(UserWarning, match="Failed to extract categories"): + result = _get_categories_from_column(BrokenCategories()) + assert result == [] + + def test_is_serializable_with_circular_reference(self) -> None: + """is_serializable should detect circular references.""" + result = is_serializable(RecursiveDict()) + assert isinstance(result, 
tuple) + assert result[0] is False + + def test_render_x_entry_missing_x(self, context, validate_html) -> None: + """render_x_entry should show error for missing X attribute.""" + + class NoX: + pass + + result = render_x_entry(NoX(), context) + v = validate_html(result) + + v.assert_error_shown("AttributeError") + + def test_render_x_entry_x_raises(self, context, validate_html) -> None: + """render_x_entry should show error when X property raises.""" + result = render_x_entry(PropertyBomb(), context) + v = validate_html(result) + + v.assert_error_shown("RuntimeError") + + def test_fallback_formatter_len_raises(self, context) -> None: + """FallbackFormatter should handle __len__ raising.""" + + class LenRaises: + def __len__(self) -> int: + raise MemoryError("len exploded") + + output = formatter_registry.format_value(LenRaises(), context) + # Errors are now in output.error, not output.warnings + assert output.error is not None + assert "len()" in output.error + + def test_fallback_formatter_shape_raises(self, context) -> None: + """FallbackFormatter should handle .shape raising.""" + + class ShapeRaises: + @property + def shape(self) -> None: + raise TypeError("shape exploded") + + with pytest.warns(UserWarning, match="shape exploded"): + output = formatter_registry.format_value(ShapeRaises(), context) + # Errors are now in output.error, not output.warnings + assert output.error is not None + assert ".shape" in output.error + + def test_fallback_formatter_dtype_raises(self, context) -> None: + """FallbackFormatter should handle .dtype raising.""" + + class DtypeRaises: + shape = (3, 3) + + @property + def dtype(self) -> None: + raise TypeError("dtype exploded") + + with pytest.warns(UserWarning, match="dtype exploded"): + output = formatter_registry.format_value(DtypeRaises(), context) + # Errors are now in output.error, not output.warnings + assert output.error is not None + assert ".dtype" in output.error + + def test_fallback_formatter_broken_repr(self, 
context) -> None: + """FallbackFormatter should handle broken __repr__.""" + output = formatter_registry.format_value(BrokenRepr(), context) + assert output.type_name == "BrokenRepr" + + def test_object_with_failing_repr_in_uns(self, validate_html) -> None: + """Objects with failing __repr__ in uns should show type name.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["broken"] = BrokenRepr() + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "broken") + v.assert_text_visible("BrokenRepr") + + def test_object_with_failing_sizeof_in_uns(self, validate_html) -> None: + """Objects with failing __sizeof__ should still render.""" + + class FailingSizeof: + def __sizeof__(self): + msg = "Sizeof failed" + raise RuntimeError(msg) + + adata = AnnData(np.zeros((3, 3))) + adata.uns["failing_size"] = FailingSizeof() + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + + +# ============================================================================= +# Tests for concurrent access (thread safety) +# ============================================================================= + + +class TestThreadSafety: + """Test that repr generation doesn't crash under concurrent access. + + Note: This tests crash resistance, not full thread safety with synchronization. + Concurrent modification of AnnData while generating repr may raise exceptions, + but should never cause memory corruption or segfaults. 
+ """ + + def test_concurrent_repr_same_object(self) -> None: + """Multiple threads generating reprs of the SAME object should work.""" + adata = ad.AnnData( + X=np.random.rand(50, 20), + obs=pd.DataFrame({"cat": pd.Categorical(["a", "b"] * 25)}), + var=pd.DataFrame({"gene": [f"g{i}" for i in range(20)]}), + ) + adata.obsm["X_pca"] = np.random.rand(50, 10) + adata.uns["params"] = {"k": 10} + + errors: list[Exception] = [] + results: list[str] = [] + + def generate_reprs() -> None: + try: + for _ in range(5): + html = generate_repr_html(adata) + results.append(html) + except Exception as e: # noqa: BLE001 + errors.append(e) + + threads = [threading.Thread(target=generate_reprs) for _ in range(4)] + for t in threads: + t.start() + for t in threads: + t.join(timeout=30) + + assert len(errors) == 0, f"Errors in concurrent repr: {errors}" + # All results should be valid HTML with expected content + assert len(results) == 20 # 4 threads × 5 iterations + for html in results: + assert "anndata-repr" in html + assert "50 × 20" in html or "50 ×" in html # shape in header + + def test_concurrent_repr_different_objects(self) -> None: + """Multiple threads generating reprs of different objects should work.""" + errors: list[Exception] = [] + + def generate_reprs() -> None: + try: + adata = ad.AnnData(X=np.random.rand(10, 10)) + for _ in range(5): + generate_repr_html(adata) + except Exception as e: # noqa: BLE001 + errors.append(e) + + threads = [threading.Thread(target=generate_reprs) for _ in range(4)] + for t in threads: + t.start() + for t in threads: + t.join(timeout=30) + + assert len(errors) == 0, f"Errors in concurrent repr: {errors}" + + def test_concurrent_repr_with_modifications(self) -> None: + """Concurrent repr generation while modifying should not crash.""" + adata = ad.AnnData(X=np.random.rand(10, 10)) + errors: list[Exception] = [] + successful_reprs: list[str] = [] + stop_flag = threading.Event() + + def modify_adata() -> None: + """Modify AnnData while repr 
is being generated.""" + i = 0 + # Limit iterations to avoid infinite loop if stop_flag never set + while not stop_flag.is_set() and i < 100: + try: + adata.obs[f"col_{i % 5}"] = np.random.rand(10) + i += 1 + except Exception: # noqa: BLE001 + pass # Expected during concurrent access + + def generate_reprs() -> None: + for _ in range(2): # Reduced from 5 to 2 + try: + html = generate_repr_html(adata) + if html: + successful_reprs.append(html) + except Exception as e: # noqa: BLE001 + errors.append(e) + + modifier = threading.Thread(target=modify_adata) + generators = [threading.Thread(target=generate_reprs) for _ in range(2)] + + modifier.start() + for g in generators: + g.start() + for g in generators: + g.join(timeout=5) # Reduced from 30 to 5 + stop_flag.set() + modifier.join(timeout=2) + + # Critical errors that indicate memory corruption or crashes - must never happen + # (a real segfault kills the process and cannot be caught as a Python exception) + critical_errors = [ + e + for e in errors + if isinstance(e, (SystemError, MemoryError)) + ] + assert not critical_errors, ( + f"Critical errors during concurrent repr: {critical_errors}" + ) + + # At least some reprs should succeed even with concurrent modification + assert len(successful_reprs) > 0, ( + f"No successful reprs generated. 
Errors: {errors}" + ) + + # Successful reprs should be valid HTML (not corrupted) + for html in successful_reprs: + # HTML can start with "] = np.random.rand(30) + adata.var["breakout"] = np.random.rand(30) + + # uns with various evil content + adata.uns["normal"] = {"key": "value", "number": 42} + adata.uns["nested_deep"] = { + "l1": {"l2": {"l3": {"l4": {"l5": {"l6": "deep"}}}}} + } + adata.uns["giant_string"] = "x" * 10000 + adata.uns[""] = "xss_attempt" + adata.uns["null_byte"] = "before\x00after" + + # Many uns keys - use batch update (faster than loop) + adata.uns.update({f"spam_key_{i}": {"value": i} for i in range(50)}) + + # Categorical with many categories + big_cats = [f"category_{i}" for i in range(300)] + adata.obs["huge_categorical"] = pd.Categorical( + np.random.choice(big_cats[:50], size=50), categories=big_cats + ) + + # obsm/varm/layers + adata.obsm["X_pca"] = np.random.rand(50, 10) + adata.obsm["X_umap"] = np.random.rand(50, 2) + adata.layers["raw"] = sp.random(50, 30, density=0.1, format="csr") + adata.obsp["distances"] = sp.random(50, 50, density=0.05, format="csr") + + return adata + + def test_evil_adata_renders_safely(self, evil_adata, validate_html) -> None: + """Evil AnnData renders without crash, well-formed HTML, no raw XSS.""" + html = evil_adata._repr_html_() + v = validate_html(html) + + # Doesn't crash + well-formed HTML + v.assert_html_well_formed() + v.assert_element_exists(".anndata-repr") + v.assert_shape_displayed(50, 30) + + # No raw XSS + v.assert_no_raw_xss() + assert '' not in html + assert " tag + assert html.lower().count("") == 1 + + +# ============================================================================= +# Tests for arbitrary object types +# ============================================================================= + + +class TestArbitraryObjects: + """Test that functions handle completely arbitrary objects.""" + + @pytest.fixture + def context(self) -> FormatterContext: + return FormatterContext() + + 
@pytest.mark.parametrize( + "obj", + [ + None, + 42, + 3.14, + True, + "string", + b"bytes", + [], + [1, 2, 3], + {}, + {"a": 1}, + (), + (1, 2), + set(), + {1, 2}, + lambda x: x, + type("Empty", (), {})(), + ], + ) + def test_is_view_arbitrary(self, obj: Any) -> None: + """is_view should handle arbitrary objects without crash.""" + result = is_view(obj) + assert result is False + + @pytest.mark.parametrize( + "obj", + [None, 42, "string", [], {}, type("Empty", (), {})()], + ) + def test_is_backed_arbitrary(self, obj: Any) -> None: + """is_backed should handle arbitrary objects without crash.""" + result = is_backed(obj) + assert result is False + + @pytest.mark.parametrize( + "obj", + [None, 42, "string", [], {}, type("Empty", (), {})()], + ) + def test_is_lazy_adata_arbitrary(self, obj: Any) -> None: + """is_lazy_adata should handle arbitrary objects without crash.""" + result = is_lazy_adata(obj) + assert result is False + + @pytest.mark.parametrize( + "obj", + [None, 42, "string", [], {}, type("Empty", (), {})()], + ) + def test_get_categories_arbitrary(self, obj: Any) -> None: + """_get_categories_from_column should return [] for arbitrary objects.""" + result = _get_categories_from_column(obj) + assert result == [] + + @pytest.mark.parametrize( + "obj", + [ + None, + 42, + "string", + [], + {}, + np.array([1, 2, 3]), + pd.DataFrame({"a": [1, 2, 3]}), + ], + ) + def test_format_value_arbitrary(self, obj: Any, context: FormatterContext) -> None: + """format_value should handle arbitrary objects without crash.""" + output = formatter_registry.format_value(obj, context) + assert output.type_name is not None + + +# ============================================================================= +# Tests combining errors with real AnnData +# ============================================================================= + + +class TestRealAnnDataWithErrors: + """Test real AnnData objects with various problematic data.""" + + def test_circular_reference_in_uns(self, 
validate_html) -> None: + """Circular references in uns should not crash.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["circular"] = {"ref": adata.uns} + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + + def test_anndata_self_reference_in_uns(self, validate_html) -> None: + """AnnData that contains itself in uns should not crash.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["self"] = adata # AnnData containing itself + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + # Should show nested AnnData info + assert "AnnData" in html + + def test_none_values_in_obs(self, validate_html) -> None: + """None values in obs columns should be handled.""" + adata = AnnData(np.zeros((5, 3))) + adata.obs["with_none"] = [None, "a", None, "b", None] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "with_none") + + def test_nan_and_inf_in_obs(self, validate_html) -> None: + """NaN and inf values in obs columns should be handled.""" + adata = AnnData(np.zeros((5, 3))) + adata.obs["with_nan"] = [np.nan, 1.0, np.nan, 2.0, np.nan] + adata.obs["with_inf"] = [np.inf, -np.inf, 0, 1, 2] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "with_nan") + v.assert_section_contains_entry("obs", "with_inf") + + def test_empty_categorical(self, validate_html) -> None: + """Empty categorical (all None values) should be handled.""" + adata = AnnData(np.zeros((5, 3))) + adata.obs["empty_cat"] = pd.Categorical([None] * 5, categories=["a", "b", "c"]) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "empty_cat") + + def test_zero_size_array_in_obsm(self, validate_html) -> None: + """Zero-size arrays in obsm should 
be handled.""" + adata = AnnData(np.zeros((5, 3))) + adata.obsm["empty"] = np.zeros((5, 0)) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obsm", "empty") + + def test_special_chars_in_keys(self, validate_html) -> None: + """Special characters in keys should be escaped.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["keyspecial&chars"] = "value" + adata.uns["quotes\"and'apostrophes"] = "value" + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + + def test_mixed_type_list_in_uns(self, validate_html) -> None: + """Mixed type lists in uns should be handled.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["mixed"] = [1, "string", None, 3.14, True, [1, 2], {"a": 1}] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "mixed") + + +# ============================================================================= +# Tests for error visibility (crashing objects should show error messages) +# ============================================================================= + + +class TestErrorVisibility: + """Test that errors from crashing objects are visible in the repr. + + Error messages should be visible in the HTML output, not just in tooltips. + This is important for users to understand why their data might not display correctly. + """ + + def test_crashing_repr_shows_error_visibly(self, validate_html) -> None: + """Objects with crashing __repr__ should show error info visibly in the preview.""" + + class ExplodingRepr: + def __repr__(self): + raise RuntimeError("BOOM! 
__repr__ exploded") + + adata = AnnData(np.zeros((3, 3))) + adata.uns["exploding"] = ExplodingRepr() + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "exploding") + # Type name should still be shown (as fallback) + v.assert_text_visible("ExplodingRepr") + # Error should be VISIBLE in the text (not just tooltip) with red styling + v.assert_text_visible("RuntimeError") + v.assert_text_visible("repr()") + # Should use error text CSS class (red color) + v.assert_element_exists(".anndata-text--error") + + def test_crashing_len_shows_error_visibly(self, validate_html) -> None: + """Objects with crashing __len__ should show error info visibly.""" + + class ExplodingLen: + def __len__(self): + raise MemoryError("BOOM! __len__ exploded") + + adata = AnnData(np.zeros((3, 3))) + adata.uns["exploding_len"] = ExplodingLen() + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "exploding_len") + # Type name should still be shown + v.assert_text_visible("ExplodingLen") + # Error should be VISIBLE in the text (not just tooltip) + v.assert_text_visible("MemoryError") + v.assert_text_visible("len()") + + def test_crashing_shape_shows_error_visibly(self, validate_html) -> None: + """Objects with crashing .shape property should show error info visibly.""" + + class ExplodingShape: + @property + def shape(self): + raise TypeError("BOOM! shape exploded") + + adata = AnnData(np.zeros((3, 3))) + adata.uns["exploding_shape"] = ExplodingShape() + + with pytest.warns(UserWarning, match="BOOM! 
shape exploded"): + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "exploding_shape") + v.assert_text_visible("ExplodingShape") + # Error should be VISIBLE in the text (not just tooltip) + v.assert_text_visible("TypeError") + v.assert_text_visible(".shape") + + def test_very_long_error_message_not_in_html(self, validate_html) -> None: + """Very long error messages should NOT appear in HTML (only exception type).""" + + class VeryLongError: + @property + def shape(self): + # Create a very long error message (2KB+) + raise TypeError( + "LONG_ERROR_MSG " * 100 + "This is additional context. " * 50 + ) + + adata = AnnData(np.zeros((3, 3))) + adata.uns["long_error"] = VeryLongError() + + with pytest.warns(UserWarning, match="LONG_ERROR_MSG"): + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "long_error") + v.assert_text_visible("VeryLongError") + # Only the exception TYPE should be in HTML, not the full message + # The error display shows ".shape raised TypeError", not the message content + assert ".shape raised TypeError" in html, "Error indicator should be visible" + # The full error message content should NOT be in HTML + assert "LONG_ERROR_MSG" not in html, ( + "Full error message should not appear in HTML (only type name)" + ) + + def test_section_error_truncation_shows_ellipsis(self, validate_html) -> None: + """Section rendering errors with long messages should show '...' truncation.""" + from anndata._repr.sections import _render_error_entry + + # Create a very long error message (>200 chars) + long_error = "X" * 300 + + html = _render_error_entry("test_section", long_error) + + # Should be truncated with "..." + assert "..." in html, "Truncated error should show '...' 
indicator" + # Should not contain the full 300 X's + assert "X" * 250 not in html, "Error message should be truncated" + # Should contain some of the error + assert "X" * 50 in html, "Some error content should be visible" + + +# ============================================================================= +# Tests for section truncation with many entries +# ============================================================================= + + +class TestSectionTruncation: + """Test that sections with many entries are truncated properly.""" + + def test_varp_with_many_entries_truncated(self, validate_html) -> None: + """varp with many entries should show truncation indicator.""" + adata = AnnData(np.zeros((30, 30))) + + # Create one valid entry, then populate internal store directly + tiny_sparse = sp.csr_matrix(([1.0], ([0], [0])), shape=(30, 30)) + adata.varp["varp_000"] = tiny_sparse + # Add more entries directly to bypass validation + for i in range(1, 250): + adata.varp._data[f"varp_{i:03d}"] = tiny_sparse + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("varp") + # Should show truncation (default max_items=200) + v.assert_truncation_indicator() + # Should show "(250 items)" or similar count + assert "250" in html or "items" in html.lower() + + def test_uns_with_many_keys_truncated(self, validate_html) -> None: + """uns with many keys should be truncated.""" + adata = AnnData(np.zeros((3, 3))) + # Use batch update instead of loop (much faster) + adata.uns.update({f"key_{i:04d}": i for i in range(300)}) + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + v.assert_truncation_indicator() + # Should not show all 300 keys (allowing 4x for key appearing in multiple places: + # data-key, data-copy, visible text, tooltip) + key_count = html.count("key_") + assert key_count < 1200, f"Too many keys in HTML: {key_count}" + # Also verify that 
key_0299 (the last one) is NOT shown (truncation) + assert "key_0299" not in html, "Last key should not be shown (truncation)" + + +# ============================================================================= +# Tests for nested object visibility +# ============================================================================= + + +class TestNestedObjectVisibility: + """Test that nested/deeply nested objects are visible in the repr.""" + + def test_nested_dict_visible(self, validate_html) -> None: + """Nested dicts in uns should be visible.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["nested"] = { + "level1": { + "level2": { + "level3": "deep_value", + } + } + } + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("uns", "nested") + # The nested structure should be visible (type dict shown) + assert "dict" in html.lower() + + def test_multiple_references_to_same_anndata(self, validate_html) -> None: + """Multiple references to same AnnData should work.""" + adata = AnnData(np.zeros((3, 3))) + shared = AnnData(np.zeros((2, 2))) + # Same AnnData referenced multiple times + adata.uns["ref1"] = shared + adata.uns["ref2"] = shared + adata.uns["nested"] = {"ref3": shared} + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + + +# ============================================================================= +# Additional crashing object tests +# ============================================================================= + + +class LyingObject: + """Object where shape/dtype/len/str all raise via explicit properties. + + Note: Using explicit properties because Python's special method lookup + bypasses __getattr__. This simulates objects that claim to have properties + but crash when accessed. 
+ """ + + @property + def shape(self): + raise AttributeError("I have no shape") + + @property + def dtype(self): + raise AttributeError("I have no dtype") + + def __len__(self): + raise AttributeError("I have no length") + + def __repr__(self): + return "LyingObject(all properties lie)" + + def __str__(self): + raise AttributeError("I have no str") + + +# ============================================================================= +# Bad color array tests +# ============================================================================= + + +class TestBadColorArrays: + """Tests for malformed color arrays in uns.""" + + def test_too_many_colors(self, validate_html) -> None: + """More colors than categories should be handled.""" + adata = AnnData(np.zeros((10, 3))) + adata.obs["cat"] = pd.Categorical(np.random.choice(["A", "B", "C"], size=10)) + # 6 colors for 3 categories + adata.uns["cat_colors"] = ["red", "green", "blue", "yellow", "purple", "orange"] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "cat") + + def test_too_few_colors(self, validate_html) -> None: + """Fewer colors than categories should be handled.""" + adata = AnnData(np.zeros((10, 3))) + adata.obs["cat"] = pd.Categorical( + np.random.choice(["X", "Y", "Z", "W"], size=10) + ) + # 1 color for 4 categories + adata.uns["cat_colors"] = ["red"] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "cat") + + def test_invalid_color_strings(self, validate_html) -> None: + """Invalid color strings should be handled gracefully.""" + adata = AnnData(np.zeros((10, 3))) + adata.obs["cat"] = pd.Categorical(np.random.choice(["alpha", "beta"], size=10)) + adata.uns["cat_colors"] = ["not_a_color", "also_invalid"] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "cat") + + def 
test_strange_color_formats(self, validate_html) -> None: + """Various color formats (hex, rgb, rgba) should work.""" + adata = AnnData(np.zeros((10, 3))) + adata.obs["cat"] = pd.Categorical( + np.random.choice(["one", "two", "three"], size=10) + ) + adata.uns["cat_colors"] = [ + "#FF0000", # Valid hex + "rgb(0,255,0)", # Valid RGB + "rgba(0,0,255,0.5)", # Valid RGBA + ] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "cat") + + def test_empty_colors_array(self, validate_html) -> None: + """Empty colors array should be handled.""" + adata = AnnData(np.zeros((10, 3))) + adata.obs["cat"] = pd.Categorical(np.random.choice(["p", "q"], size=10)) + adata.uns["cat_colors"] = [] + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_contains_entry("obs", "cat") + + +# ============================================================================= +# Nested AnnData with errors tests +# ============================================================================= + + +class TestNestedAnnDataWithErrors: + """Tests for nested AnnData objects containing broken objects.""" + + def test_nested_anndata_with_broken_objects(self, validate_html) -> None: + """Nested AnnData with broken objects should render gracefully.""" + parent = AnnData(np.zeros((5, 5))) + child = AnnData(np.zeros((3, 3))) + + # Add broken objects to child + child.uns["broken_repr"] = BrokenRepr() + child.uns["lying"] = LyingObject() + + parent.uns["nested"] = child + + html = parent._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_section_exists("uns") + # Should show the nested AnnData + assert "AnnData" in html + + +# ============================================================================= +# README data attribute edge cases +# ============================================================================= + + +class TestReadmeEdgeCases: + 
"""Tests for README data attribute with edge-case content. + + README is displayed as plain text via textContent (not innerHTML), + so XSS vectors cannot fire. These tests verify that edge-case content + in the data-readme attribute doesn't break HTML well-formedness. + """ + + def test_large_readme_handled(self, validate_html) -> None: + """Large README (50KB+) should not bloat HTML.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["README"] = "Big README\n" + "A" * 50000 + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_element_exists(".anndata-readme__icon") + + def test_unicode_in_readme(self, validate_html) -> None: + """Unicode edge cases in data-readme attribute.""" + adata = AnnData(np.zeros((3, 3))) + adata.uns["README"] = """# Unicode Chaos +RTL: \u202eSIHT DAER\u202c +Null: before\x00after +Zalgo: H̸̡̪̯ͨ͊̽̅̾ḛ̫̞̜̹̙̈́͊̓̑̄̏ c̷̶̻̠̜̲̗̠̪o̶̜̹̠̺̗m̴̨̙̝̯͕̥̞̥͉̲e̴͕̫͉̮͇̣̮̼̱̤s̵̨͖̖̱̻̣͙̥̱͓ +Emoji: 💀💀💀💀💀 +""" + + html = adata._repr_html_() + v = validate_html(html) + + v.assert_html_well_formed() + v.assert_element_exists(".anndata-readme__icon") diff --git a/tests/repr/test_repr_sections.py b/tests/repr/test_repr_sections.py new file mode 100644 index 000000000..93eaa5b47 --- /dev/null +++ b/tests/repr/test_repr_sections.py @@ -0,0 +1,1319 @@ +""" +Section rendering tests for the _repr module. + +Tests for obs, var, uns, obsm, varm, obsp, varp, layers, and raw section rendering, +as well as custom section formatters. 
+""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest +import scipy.sparse as sp + +from anndata import AnnData + + +class TestRawSection: + """Test .raw section display.""" + + def test_raw_section_present(self, validate_html): + """Test raw section appears when raw is set.""" + adata = AnnData(np.zeros((10, 20))) + adata.raw = adata.copy() + adata = adata[:, :5] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("raw") + + def test_raw_none_no_section(self, validate_html): + """Test no raw section when raw is None.""" + adata = AnnData(np.zeros((10, 5))) + html = adata._repr_html_() + v = validate_html(html) + v.assert_element_exists(".anndata-repr") + + def test_raw_section_with_var(self, validate_html): + """Test raw section shows var info.""" + adata = AnnData(np.zeros((10, 20))) + adata.raw = adata.copy() + adata = adata[:, :5] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("raw") + + def test_raw_section_with_varm(self, validate_html): + """Test raw section shows varm info.""" + adata = AnnData(np.zeros((10, 20))) + adata.varm["test"] = np.zeros((20, 3)) + adata.raw = adata.copy() + adata = adata[:, :5] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("raw") + + def test_raw_index_preview_obs_names(self, validate_html): + """Test raw expanded view shows obs_names preview.""" + adata = AnnData( + np.zeros((5, 10)), + obs=pd.DataFrame(index=["alpha", "beta", "gamma", "delta", "epsilon"]), + ) + adata.raw = adata.copy() + html = adata._repr_html_() + v = validate_html(html) + v.assert_text_visible("obs_names") + v.assert_text_visible("alpha") + v.assert_text_visible("epsilon") + + def test_raw_index_preview_var_names(self, validate_html): + """Test raw expanded view shows var_names preview.""" + adata = AnnData( + np.zeros((5, 3)), + var=pd.DataFrame(index=["geneA", "geneB", "geneC"]), + ) + adata.raw = adata.copy() + 
html = adata._repr_html_() + v = validate_html(html) + v.assert_text_visible("var_names") + v.assert_text_visible("geneA") + v.assert_text_visible("geneC") + + def test_raw_index_preview_truncation(self): + """Test raw index preview truncates long indices with ellipsis.""" + adata = AnnData( + np.zeros((20, 15)), + obs=pd.DataFrame(index=[f"cell_{i}" for i in range(20)]), + var=pd.DataFrame(index=[f"gene_{i}" for i in range(15)]), + ) + adata.raw = adata.copy() + html = adata._repr_html_() + # First and last items should appear with ellipsis in between + assert "cell_0" in html + assert "cell_19" in html + assert "gene_0" in html + assert "gene_14" in html + assert "..." in html + + def test_raw_index_preview_differs_from_parent(self, validate_html): + """Test raw var_names preview shows original vars, not subsetted parent.""" + adata = AnnData( + np.zeros((5, 20)), + var=pd.DataFrame(index=[f"gene_{i}" for i in range(20)]), + ) + adata.raw = adata.copy() + adata = adata[:, :3] # Subset parent to 3 vars + html = adata._repr_html_() + # Raw should still show original var_names (20 genes) + assert "gene_19" in html + v = validate_html(html) + v.assert_text_visible("var_names") + + def test_raw_index_preview_absent_gracefully(self): + """Test raw handles missing obs_names/var_names gracefully.""" + from anndata._repr.registry import FormatterContext + from anndata._repr.sections import _generate_raw_repr_html + + class FakeRaw: + n_obs = 5 + n_vars = 3 + X = None + + @property + def obs_names(self): + msg = "no obs_names" + raise AttributeError(msg) + + @property + def var_names(self): + msg = "no var_names" + raise AttributeError(msg) + + ctx = FormatterContext(depth=1, max_depth=3, section="raw") + html = _generate_raw_repr_html(FakeRaw(), ctx) + assert "not available" in html + assert "obs_names" in html + assert "var_names" in html + + +class TestRepresentationCompleteness: + """Verify all data is accurately represented.""" + + def test_all_obs_columns_shown(self, 
validate_html): + """Test all obs columns appear in repr.""" + obs_cols = ["col_a", "col_b", "col_c", "col_d", "col_e"] + adata = AnnData( + np.zeros((10, 5)), obs=pd.DataFrame({c: list(range(10)) for c in obs_cols}) + ) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("obs") + for col in obs_cols: + v.assert_section_contains_entry("obs", col) + + def test_all_var_columns_shown(self, validate_html): + """Test all var columns appear in repr.""" + var_cols = ["gene_name", "gene_id", "highly_variable"] + adata = AnnData( + np.zeros((10, 5)), var=pd.DataFrame({c: list(range(5)) for c in var_cols}) + ) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("var") + for col in var_cols: + v.assert_section_contains_entry("var", col) + + def test_all_uns_keys_shown(self, validate_html): + """Test all uns keys appear in repr.""" + adata = AnnData(np.zeros((10, 5))) + adata.uns["key1"] = "value1" + adata.uns["key2"] = 42 + adata.uns["nested_dict"] = {"a": 1, "b": 2} + adata.uns["array_data"] = np.array([1, 2, 3]) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("uns") + for key in ["key1", "key2", "nested_dict", "array_data"]: + v.assert_section_contains_entry("uns", key) + + def test_all_obsm_keys_shown(self, validate_html): + """Test all obsm keys appear in repr.""" + adata = AnnData(np.zeros((10, 5))) + adata.obsm["X_pca"] = np.random.randn(10, 50).astype(np.float32) + adata.obsm["X_umap"] = np.random.randn(10, 2).astype(np.float32) + adata.obsm["X_tsne"] = np.random.randn(10, 2).astype(np.float32) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("obsm") + for key in ["X_pca", "X_umap", "X_tsne"]: + v.assert_section_contains_entry("obsm", key) + + def test_all_layers_shown(self, validate_html): + """Test all layers appear in repr.""" + adata = AnnData(np.zeros((10, 5))) + adata.layers["raw"] = np.random.randn(10, 5) + adata.layers["normalized"] = 
np.random.randn(10, 5) + adata.layers["scaled"] = np.random.randn(10, 5) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("layers") + for layer in ["raw", "normalized", "scaled"]: + v.assert_section_contains_entry("layers", layer) + + def test_correct_shape_values(self, validate_html): + """Test shape values are accurate.""" + adata = AnnData(np.zeros((123, 456))) + html = adata._repr_html_() + v = validate_html(html) + v.assert_shape_displayed(123, 456) + + +class TestEdgeCases: + """Test edge cases and error handling.""" + + def test_very_large_keys(self, validate_html): + """Test handling of very long key names.""" + adata = AnnData(np.zeros((10, 5))) + long_key = "a" * 500 + adata.uns[long_key] = "value" + html = adata._repr_html_() + v = validate_html(html) + v.assert_element_exists(".anndata-repr") + v.assert_section_exists("uns") + + def test_unicode_keys(self, validate_html): + """Test handling of unicode key names.""" + adata = AnnData(np.zeros((10, 5))) + adata.uns["日本語"] = "value" + adata.uns["émojis_🧬"] = "dna" + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("uns") + v.assert_text_visible("日本語") + v.assert_text_visible("émojis") + + def test_empty_sections(self, validate_html): + """Test handling of empty sections.""" + adata = AnnData(np.zeros((10, 5))) + html = adata._repr_html_() + v = validate_html(html) + v.assert_element_exists(".anndata-repr") + + def test_very_nested_uns(self, validate_html): + """Test handling of deeply nested uns.""" + adata = AnnData(np.zeros((10, 5))) + nested = {"level": 0} + current = nested + for i in range(10): + current["nested"] = {"level": i + 1} + current = current["nested"] + adata.uns["deep"] = nested + html = adata._repr_html_() + v = validate_html(html) + v.assert_element_exists(".anndata-repr") + v.assert_section_contains_entry("uns", "deep") + + def test_mixed_types_in_uns(self, validate_html): + """Test handling of mixed types in uns.""" + adata 
= AnnData(np.zeros((10, 5))) + adata.uns["string"] = "value" + adata.uns["int"] = 42 + adata.uns["float"] = 3.14 + adata.uns["list"] = [1, 2, 3] + adata.uns["dict"] = {"a": 1} + adata.uns["array"] = np.array([1, 2, 3]) + adata.uns["sparse"] = sp.csr_matrix([[1, 0], [0, 1]]) + adata.uns["df"] = pd.DataFrame({"a": [1, 2]}) + html = adata._repr_html_() + v = validate_html(html) + v.assert_element_exists(".anndata-repr") + v.assert_section_exists("uns") + + +class TestMappingSectionEdgeCases: + """Tests for mapping section edge cases.""" + + def test_obsm_shape_meta_display(self, validate_html): + """Test obsm shows shape metadata.""" + adata = AnnData(np.zeros((10, 5))) + adata.obsm["X_pca"] = np.random.randn(10, 50) + adata.obsm["X_umap"] = np.random.randn(10, 2) + + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("obsm") + v.assert_section_contains_entry("obsm", "X_pca") + v.assert_section_contains_entry("obsm", "X_umap") + + def test_layers_truncation(self, validate_html): + """Test layers section truncates appropriately.""" + from anndata import settings + + adata = AnnData(np.zeros((10, 5))) + for i in range(20): + adata.layers[f"layer_{i}"] = np.random.randn(10, 5) + + with settings.override(repr_html_max_items=10): + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("layers") + v.assert_section_contains_entry("layers", "layer_0") + v.assert_truncation_indicator() + + +class TestUnsEntryRendering: + """Tests for uns entry rendering.""" + + def test_uns_entry_with_type_hint_preview_note(self, validate_html): + """Test uns entry with type hint shows preview note.""" + adata = AnnData(np.zeros((5, 3))) + adata.uns["typed_data"] = { + "__anndata_repr__": "somepackage.type", + "data": "value", + } + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("uns", "typed_data") + + def test_uns_entry_with_tuple(self, validate_html): + """Test uns entry with tuple.""" + adata = 
AnnData(np.zeros((5, 3))) + adata.uns["tuple_val"] = (1, 2, 3) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("uns", "tuple_val") + + def test_uns_entry_with_empty_list(self, validate_html): + """Test uns entry with empty list.""" + adata = AnnData(np.zeros((5, 3))) + adata.uns["empty"] = [] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("uns", "empty") + + def test_uns_entry_with_none(self, validate_html): + """Test uns entry with None.""" + adata = AnnData(np.zeros((5, 3))) + adata.uns["null_val"] = None + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("uns", "null_val") + + +class TestCustomSectionFormatters: + """Tests for custom section formatters.""" + + def test_custom_section_appears_after_specified_section(self): + """Test custom section appears in correct position.""" + from anndata._repr import FormattedEntry, FormattedOutput, SectionFormatter + from anndata._repr.registry import formatter_registry, register_formatter + + class TestCustomSection(SectionFormatter): + @property + def section_name(self): + return "test_custom" + + @property + def display_name(self): + return "Test Custom" + + @property + def after_section(self): + return "obs" + + def should_show(self, adata): + return True + + def get_entries(self, adata, context): + return [ + FormattedEntry( + key="custom_entry", + output=FormattedOutput(type_name="custom_type"), + ) + ] + + formatter = TestCustomSection() + register_formatter(formatter) + + try: + adata = AnnData( + np.zeros((10, 5)), + obs=pd.DataFrame({"a": list(range(10))}), + ) + html = adata._repr_html_() + + assert "Test Custom" in html + assert "custom_entry" in html + assert "custom_type" in html + finally: + if "test_custom" in formatter_registry._section_formatters: + del formatter_registry._section_formatters["test_custom"] + + def test_custom_section_not_shown_when_should_show_false(self): + """Test 
custom section hidden when should_show returns False.""" + from anndata._repr import FormattedEntry, FormattedOutput, SectionFormatter + from anndata._repr.registry import formatter_registry, register_formatter + + class HiddenCustomSection(SectionFormatter): + @property + def section_name(self): + return "hidden_section" + + def should_show(self, adata): + return False + + def get_entries(self, adata, context): + return [ + FormattedEntry( + key="should_not_appear", + output=FormattedOutput(type_name="hidden"), + ) + ] + + formatter = HiddenCustomSection() + register_formatter(formatter) + + try: + adata = AnnData(np.zeros((5, 3))) + html = adata._repr_html_() + + assert "hidden_section" not in html + assert "should_not_appear" not in html + finally: + if "hidden_section" in formatter_registry._section_formatters: + del formatter_registry._section_formatters["hidden_section"] + + +class TestUnknownSectionsDetection: + """Tests for unknown/custom attribute detection.""" + + def test_standard_sections_not_in_other(self, validate_html): + """Test standard sections don't appear in 'other' section.""" + adata = AnnData( + np.zeros((10, 5)), + obs=pd.DataFrame({"a": list(range(10))}), + var=pd.DataFrame({"b": list(range(5))}), + ) + adata.uns["test"] = "value" + adata.obsm["X_pca"] = np.zeros((10, 2)) + + html = adata._repr_html_() + v = validate_html(html) + # Standard sections should appear normally + v.assert_section_exists("obs") + v.assert_section_exists("var") + v.assert_section_exists("uns") + v.assert_section_exists("obsm") + + def test_registered_section_not_in_other(self): + """Test registered custom sections don't appear in 'other'.""" + from anndata._repr import FormattedEntry, FormattedOutput, SectionFormatter + from anndata._repr.registry import formatter_registry, register_formatter + + class RegisteredSection(SectionFormatter): + @property + def section_name(self): + return "registered_section" + + def should_show(self, adata): + return True + + def 
get_entries(self, adata, context): + return [ + FormattedEntry( + key="entry", output=FormattedOutput(type_name="type") + ) + ] + + formatter = RegisteredSection() + register_formatter(formatter) + + try: + adata = AnnData(np.zeros((5, 3))) + html = adata._repr_html_() + + assert "registered_section" in html + finally: + if "registered_section" in formatter_registry._section_formatters: + del formatter_registry._section_formatters["registered_section"] + + +class TestCoverageEdgeCases: + """Tests for edge cases to improve code coverage.""" + + def test_category_column_overflow(self): + """Test rendering categorical column with more than max categories.""" + from anndata import settings + + categories = [f"cat_{i}" for i in range(50)] + adata = AnnData( + np.zeros((100, 5)), + obs=pd.DataFrame({ + "many_cats": pd.Categorical( + np.random.choice(categories, 100), categories=categories + ) + }), + ) + + with settings.override(repr_html_max_categories=10): + html = adata._repr_html_() + assert "cat_0" in html + assert "...+" in html or "more" in html.lower() + + def test_max_depth_with_multiple_nested_anndata(self): + """Test max depth indicator with deeply nested AnnData.""" + from anndata import settings + + level2 = AnnData(np.zeros((5, 3))) + level1 = AnnData(np.zeros((7, 4))) + level1.uns["nested"] = level2 + level0 = AnnData(np.zeros((10, 5))) + level0.uns["nested"] = level1 + + with settings.override(repr_html_max_depth=0): + html = level0._repr_html_() + assert "max depth" in html.lower() or "depth" in html.lower() + + def test_dataframe_entry_nunique_exception(self): + """Test nunique() exception handling for dataframe columns.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["unhashable"] = [[i] for i in range(10)] + + html = adata._repr_html_() + assert html is not None + assert "unhashable" in html + + def test_very_large_dataframe_skips_nunique(self): + """Test that nunique() is skipped for very large columns.""" + from anndata import settings + + adata 
= AnnData(np.zeros((100000, 5)), obs=pd.DataFrame({"col": range(100000)})) + + with settings.override(repr_html_unique_limit=1000): + html = adata._repr_html_() + assert "col" in html + + def test_empty_category_counts(self): + """Test rendering category column with all unique values.""" + adata = AnnData( + np.zeros((10, 5)), + obs=pd.DataFrame({ + "unique_cats": pd.Categorical( + [f"cat_{i}" for i in range(10)], + categories=[f"cat_{i}" for i in range(10)], + ) + }), + ) + + html = adata._repr_html_() + assert "unique_cats" in html + assert "cat_0" in html + + +class TestCompleteDataVisibility: + """Tests ensuring all data is visible or truncation is indicated. + + Scientific display requirement: Nothing should be hidden from the user. + """ + + def test_all_varp_keys_shown(self, validate_html): + """Test all varp keys appear in repr.""" + adata = AnnData(np.zeros((10, 5))) + keys = ["corr", "covariance", "pvals"] + for key in keys: + adata.varp[key] = sp.random(5, 5, density=0.3, format="csr") + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("varp") + for key in keys: + v.assert_section_contains_entry("varp", key) + + def test_all_obsp_keys_shown(self, validate_html): + """Test all obsp keys appear in repr.""" + adata = AnnData(np.zeros((10, 5))) + keys = ["distances", "connectivities", "weights"] + for key in keys: + adata.obsp[key] = sp.random(10, 10, density=0.1, format="csr") + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("obsp") + for key in keys: + v.assert_section_contains_entry("obsp", key) + + def test_all_varm_keys_shown(self, validate_html): + """Test all varm keys appear in repr.""" + adata = AnnData(np.zeros((10, 5))) + keys = ["PCs", "loadings", "gene_embeddings"] + for key in keys: + adata.varm[key] = np.random.randn(5, 3) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("varm") + for key in keys: + v.assert_section_contains_entry("varm", key) + + def 
test_truncation_indicator_when_many_items(self, validate_html): + """Test truncation indicator appears for many items.""" + from anndata import settings + + adata = AnnData(np.zeros((10, 5))) + for i in range(100): + adata.uns[f"key_{i}"] = i + + with settings.override(repr_html_max_items=10): + html = adata._repr_html_() + v = validate_html(html) + # Should indicate more items exist with specific truncation pattern + v.assert_truncation_indicator() + + def test_category_truncation_indicator(self): + """Test category truncation indicator appears.""" + from anndata import settings + + categories = [f"cat_{i}" for i in range(50)] + adata = AnnData( + np.zeros((100, 5)), + obs=pd.DataFrame({ + "many_cats": pd.Categorical( + np.random.choice(categories, 100), categories=categories + ) + }), + ) + + with settings.override(repr_html_max_categories=10): + html = adata._repr_html_() + assert "...+" in html or "more" in html.lower() + + +class TestSpecificInfoDisplay: + """Tests for specific information displayed in repr.""" + + def test_shape_values_accurate(self, validate_html): + """Test shape values are accurate.""" + adata = AnnData(np.zeros((123, 456))) + html = adata._repr_html_() + v = validate_html(html) + v.assert_shape_displayed(123, 456) + + def test_sparse_matrix_format_shown(self, validate_html): + """Test sparse matrix format is shown.""" + # Only CSR and CSC are supported by AnnData + for fmt in ["csr", "csc"]: + X = sp.random(100, 50, density=0.1, format=fmt) + adata = AnnData(X) + html = adata._repr_html_() + v = validate_html(html) + v.assert_text_visible(fmt) + + def test_sparse_matrix_density_or_nnz_shown(self, validate_html): + """Test sparse matrix shows density or nnz info.""" + X = sp.random(100, 50, density=0.1, format="csr") + adata = AnnData(X) + html = adata._repr_html_() + v = validate_html(html) + v.assert_element_exists(".anndata-repr") + # Should show either density percentage or nnz count + assert "%" in html or "nnz" in html.lower() or 
str(X.nnz) in html + + def test_dtype_shown_for_arrays(self, validate_html): + """Test dtype is shown for arrays.""" + adata = AnnData(np.zeros((10, 5), dtype=np.float32)) + html = adata._repr_html_() + v = validate_html(html) + v.assert_dtype_displayed("float32") + + def test_obsm_shape_shown(self, validate_html): + """Test obsm array shapes are shown.""" + adata = AnnData(np.zeros((10, 5))) + adata.obsm["X_pca"] = np.zeros((10, 50)) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("obsm", "X_pca") + v.assert_text_visible("50") # Second dimension shown + + def test_category_count_shown(self, validate_html): + """Test category count is shown for categoricals.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["cat"] = pd.Categorical(["A", "B", "C"] * 3 + ["A"]) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("obs", "cat") + # Should show category count or list categories + assert "3" in html or ("A" in html and "B" in html and "C" in html) + + def test_dataframe_in_obsm_shows_columns(self, validate_html): + """Test DataFrame in obsm shows column count or names.""" + adata = AnnData(np.zeros((10, 5))) + # DataFrame index must match adata.obs_names + adata.obsm["spatial"] = pd.DataFrame( + {"x": np.zeros(10), "y": np.zeros(10), "z": np.zeros(10)}, + index=adata.obs_names, + ) + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("obsm", "spatial") + # Should indicate it's a DataFrame with columns + assert "DataFrame" in html or "3" in html or "x" in html + + def test_nested_dict_shows_key_count(self, validate_html): + """Test nested dict shows key count.""" + adata = AnnData(np.zeros((10, 5))) + adata.uns["params"] = {"a": 1, "b": 2, "c": 3} + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("uns", "params") + # Should indicate dict with keys + assert "3" in html or "dict" in html.lower() or "a" in html + + def 
test_raw_section_shows_var_count(self, validate_html): + """Test raw section shows different var count.""" + adata = AnnData(np.zeros((10, 20))) + adata.raw = adata.copy() + adata = adata[:, :5] # Subset vars + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_exists("raw") + # Main shows 5 vars, raw should show 20 + v.assert_text_visible("20") + v.assert_text_visible("5") + + def test_list_shows_item_count(self, validate_html): + """Test lists show item count.""" + adata = AnnData(np.zeros((10, 5))) + adata.uns["steps"] = ["step1", "step2", "step3", "step4", "step5"] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("uns", "steps") + assert "5" in html or "items" in html.lower() + + +class TestUnknownSectionsAndErrorHandling: + """Tests for unknown sections and error handling in section rendering.""" + + def test_unknown_attribute_detected(self): + """Test that unknown attributes on AnnData are handled gracefully.""" + adata = AnnData(np.zeros((5, 3))) + # Set a custom attribute that's not a standard AnnData attribute + adata._custom_unknown_attr = {"key": "value"} + html = adata._repr_html_() + # Should not crash + assert html is not None + + def test_raw_section_with_empty_var(self): + """Test raw section renders with empty var columns.""" + # Create adata with no var columns, then set raw + adata = AnnData(np.zeros((10, 5))) + # raw.var is read-only, so we need to create raw from adata without var columns + adata_no_cols = AnnData(np.zeros((10, 5))) + adata.raw = adata_no_cols + html = adata._repr_html_() + assert html is not None + assert "raw" in html.lower() + + def test_safe_get_attr_with_normal_object(self): + """Test _safe_get_attr returns attribute value.""" + from anndata._repr.sections import _safe_get_attr + + class Obj: + attr = "value" + + assert _safe_get_attr(Obj(), "attr", "default") == "value" + + def test_safe_get_attr_with_missing_attr(self): + """Test _safe_get_attr returns default 
for missing attribute.""" + from anndata._repr.sections import _safe_get_attr + + class Obj: + pass + + assert _safe_get_attr(Obj(), "missing", "default") == "default" + + def test_safe_get_attr_with_exception(self): + """Test _safe_get_attr returns default when property raises.""" + from anndata._repr.sections import _safe_get_attr + + class FailingObj: + @property + def bad_attr(self): + msg = "Access failed" + raise RuntimeError(msg) + + result = _safe_get_attr(FailingObj(), "bad_attr", "fallback") + assert result == "fallback" + + def test_get_raw_meta_parts_with_var(self): + """Test _get_raw_meta_parts extracts var column count.""" + from anndata._repr.sections import _get_raw_meta_parts + + adata = AnnData(np.zeros((5, 3))) + adata.var["gene_name"] = ["A", "B", "C"] + parts = _get_raw_meta_parts(adata) + assert any("var" in p for p in parts) + + def test_get_raw_meta_parts_with_varm(self): + """Test _get_raw_meta_parts extracts varm count.""" + from anndata._repr.sections import _get_raw_meta_parts + + adata = AnnData(np.zeros((5, 3))) + adata.varm["PCs"] = np.zeros((3, 2)) + parts = _get_raw_meta_parts(adata) + assert any("varm" in p for p in parts) + + def test_get_raw_meta_parts_exception_handling(self): + """Test _get_raw_meta_parts handles exceptions gracefully.""" + from anndata._repr.sections import _get_raw_meta_parts + + class BadRaw: + @property + def var(self): + msg = "var access failed" + raise RuntimeError(msg) + + @property + def varm(self): + msg = "varm access failed" + raise RuntimeError(msg) + + # Should not crash, returns empty list + parts = _get_raw_meta_parts(BadRaw()) + assert parts == [] + + def test_render_error_entry(self): + """Test _render_error_entry produces valid HTML.""" + from anndata._repr.sections import _render_error_entry + + html = _render_error_entry("test_section", "Test error message") + assert "test_section" in html + assert "error" in html.lower() + # Should be valid HTML structure + assert " 1000 + + +class 
TestSectionTooltips: + """Tests for section tooltips.""" + + def test_get_section_tooltip_all_sections(self): + """Test tooltips exist for all standard sections.""" + from anndata._repr.core import get_section_tooltip + + sections = [ + "obs", + "var", + "uns", + "obsm", + "varm", + "obsp", + "varp", + "layers", + "raw", + ] + for section in sections: + tooltip = get_section_tooltip(section) + assert isinstance(tooltip, str) + + def test_get_section_tooltip_unknown(self): + """Test tooltip for unknown section.""" + from anndata._repr.core import get_section_tooltip + + tooltip = get_section_tooltip("unknown_section") + assert tooltip == "" + + +class TestColorSwatchesAndCategories: + """Tests for color swatch display and category handling.""" + + def test_matching_colors_show_swatches(self, validate_html): + """Test matching *_colors show color swatches.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["cluster"] = pd.Categorical(["A", "B"] * 5) + adata.uns["cluster_colors"] = ["#FF0000", "#00FF00"] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("obs", "cluster") + v.assert_color_swatch("#FF0000") + + def test_color_mismatch_shows_warning(self, validate_html): + """Test color count mismatch shows warning.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["cluster"] = pd.Categorical(["A", "B", "C"] * 3 + ["A"]) + # Only 2 colors for 3 categories + adata.uns["cluster_colors"] = ["#FF0000", "#00FF00"] + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("obs", "cluster") + # Should indicate mismatch warning + assert "mismatch" in html.lower() or "⚠" in html or "warning" in html.lower() + + def test_categories_shown_inline(self, validate_html): + """Test category values are shown inline.""" + adata = AnnData(np.zeros((10, 5))) + adata.obs["cluster"] = pd.Categorical( + ["TypeA", "TypeB", "TypeC"] * 3 + ["TypeA"] + ) + html = adata._repr_html_() + v = validate_html(html) + 
v.assert_section_contains_entry("obs", "cluster") + v.assert_text_visible("TypeA") + v.assert_text_visible("TypeB") + v.assert_text_visible("TypeC") + + def test_many_categories_truncated_with_count(self, validate_html): + """Test many categories are truncated with remaining count.""" + from anndata import settings + + categories = [f"Type_{i}" for i in range(30)] + adata = AnnData( + np.zeros((30, 5)), + obs=pd.DataFrame({"cat": pd.Categorical(categories)}), + ) + + with settings.override(repr_html_max_categories=10): + html = adata._repr_html_() + v = validate_html(html) + v.assert_section_contains_entry("obs", "cat") + v.assert_text_visible("Type_0") + assert "...+" in html or "20" in html or "more" in html.lower() + + +class TestNestedStructuresAndDepth: + """Tests for nested structures and depth handling.""" + + def test_nested_anndata_shows_expand_button(self): + """Test nested AnnData has expand functionality.""" + inner = AnnData(np.zeros((5, 3))) + outer = AnnData(np.zeros((10, 5))) + outer.uns["nested"] = inner + html = outer._repr_html_() + # Should have expand functionality + assert "expand" in html.lower() or "Expand" in html + + def test_max_depth_shows_indicator(self): + """Test max depth shows indicator.""" + from anndata import settings + + level2 = AnnData(np.zeros((5, 3))) + level1 = AnnData(np.zeros((7, 4))) + level1.uns["inner"] = level2 + level0 = AnnData(np.zeros((10, 5))) + level0.uns["inner"] = level1 + + with settings.override(repr_html_max_depth=1): + html = level0._repr_html_() + # Should indicate max depth reached + assert "max depth" in html.lower() or "depth" in html + + def test_deeply_nested_dict_handled(self): + """Test deeply nested dicts are handled.""" + adata = AnnData(np.zeros((5, 3))) + nested = {"level": 0} + current = nested + for i in range(20): + current["nested"] = {"level": i + 1} + current = current["nested"] + adata.uns["deep"] = nested + html = adata._repr_html_() + assert html is not None + assert "deep" in html + + 
+# Fixtures needed by tests above + + +@pytest.fixture +def adata(): + """Basic AnnData for testing.""" + return AnnData( + np.random.randn(100, 50).astype(np.float32), + obs=pd.DataFrame( + {"batch": ["A", "B"] * 50}, index=[f"cell_{i}" for i in range(100)] + ), + var=pd.DataFrame( + {"gene_name": [f"gene_{i}" for i in range(50)]}, + index=[f"gene_{i}" for i in range(50)], + ), + ) + + +@pytest.fixture +def adata_full(): + """AnnData with all attributes populated.""" + import scipy.sparse as sp + + n_obs, n_vars = 100, 50 + adata = AnnData( + sp.random(n_obs, n_vars, density=0.1, format="csr", dtype=np.float32), + obs=pd.DataFrame({ + "batch": pd.Categorical(["A", "B"] * (n_obs // 2)), + "n_counts": np.random.randint(1000, 10000, n_obs), + "cell_type": pd.Categorical( + ["T", "B", "NK"] * (n_obs // 3) + ["T"] * (n_obs % 3) + ), + }), + var=pd.DataFrame({ + "gene_name": [f"gene_{i}" for i in range(n_vars)], + "highly_variable": np.random.choice([True, False], n_vars), + }), + ) + adata.uns["neighbors"] = {"params": {"n_neighbors": 15}} + adata.uns["batch_colors"] = ["#FF0000", "#00FF00"] + adata.obsm["X_pca"] = np.random.randn(n_obs, 50).astype(np.float32) + adata.obsm["X_umap"] = np.random.randn(n_obs, 2).astype(np.float32) + adata.varm["PCs"] = np.random.randn(n_vars, 50).astype(np.float32) + adata.layers["raw"] = sp.random(n_obs, n_vars, density=0.1, format="csr") + adata.obsp["distances"] = sp.random(n_obs, n_obs, density=0.01, format="csr") + adata.varp["gene_corr"] = sp.random(n_vars, n_vars, density=0.1, format="csr") + return adata diff --git a/tests/repr/test_repr_utils.py b/tests/repr/test_repr_utils.py new file mode 100644 index 000000000..e3b500927 --- /dev/null +++ b/tests/repr/test_repr_utils.py @@ -0,0 +1,510 @@ +""" +Utility function tests for the _repr module. + +Tests for serialization checks, color detection, formatting helpers, +preview functions, and other utilities. 
+""" + +from __future__ import annotations + +import numpy as np +import pandas as pd + +from anndata import AnnData + + +class TestSerializability: + """Tests for serialization detection utilities.""" + + def test_is_serializable_basic_types(self): + """Test serialization detection for basic types.""" + from anndata._repr.utils import is_serializable + + assert is_serializable(None)[0] + assert is_serializable(True)[0] # noqa: FBT003 + assert is_serializable(42)[0] + assert is_serializable(3.14)[0] + assert is_serializable("string")[0] + assert is_serializable(np.array([1, 2, 3]))[0] + assert is_serializable({"key": "value"})[0] + assert is_serializable([1, 2, 3])[0] + + def test_is_serializable_custom_class(self): + """Test custom classes are not serializable.""" + from anndata._repr.utils import is_serializable + + class CustomClass: + pass + + is_ok, reason = is_serializable(CustomClass()) + assert not is_ok + assert "CustomClass" in reason + + def test_is_serializable_nested_check(self): + """Test nested unserializable object is detected.""" + from anndata._repr.utils import is_serializable + + class CustomClass: + pass + + obj = {"valid": 1, "invalid": CustomClass()} + is_ok, reason = is_serializable(obj) + assert not is_ok + assert "invalid" in reason + + def test_is_serializable_list_with_unserializable(self): + """Test is_serializable catches unserializable item in list.""" + from anndata._repr.utils import is_serializable + + class BadType: + pass + + obj = [1, 2, BadType(), 4] + is_ok, reason = is_serializable(obj) + assert not is_ok + assert "Index 2" in reason + + def test_is_serializable_max_depth_exceeded(self): + """Test is_serializable handles deep nesting.""" + from anndata._repr.utils import is_serializable + + nested = {"level": 0} + current = nested + for i in range(15): + current["nested"] = {"level": i + 1} + current = current["nested"] + + is_ok, reason = is_serializable(nested, _max_depth=10) + assert not is_ok + assert "depth" in 
reason.lower() + + def test_is_serializable_numpy_scalar(self): + """Test is_serializable handles numpy scalar types.""" + from anndata._repr.utils import is_serializable + + assert is_serializable(np.int64(42))[0] + assert is_serializable(np.float32(3.14))[0] + assert is_serializable(np.bool_(True))[0] # noqa: FBT003 + + +class TestStringWarnings: + """Tests for string-to-category warning detection.""" + + def test_should_warn_string_column(self): + """Test string-to-category warning detection.""" + from anndata._repr.utils import should_warn_string_column + + s = pd.Series(["A", "B", "A", "C", "B", "A"]) + warn, msg = should_warn_string_column(s, s.nunique()) + assert warn + assert "3" in msg + + s = pd.Series(["A", "B", "C", "D", "E"]) + warn, msg = should_warn_string_column(s, s.nunique()) + assert not warn + + s = pd.Series([1, 2, 1, 2, 1]) + warn, msg = should_warn_string_column(s, s.nunique()) + assert not warn + + s = pd.Series(["A", "B", "A"]) + warn, msg = should_warn_string_column(s, None) + assert not warn + + def test_should_warn_string_column_with_none_nunique(self): + """Test should_warn_string_column handles None n_unique gracefully.""" + from anndata._repr.utils import should_warn_string_column + + s = pd.Series(["A", "B", "A", "C"]) + warn, _msg = should_warn_string_column(s, None) + assert not warn + + +class TestColorDetection: + """Tests for color list detection.""" + + def test_is_color_list(self): + """Test color list detection.""" + from anndata._repr.utils import is_color_list + + assert is_color_list("cluster_colors", ["#FF0000", "#00FF00", "#0000FF"]) + assert is_color_list("leiden_colors", np.array(["#123456", "#ABCDEF"])) + assert is_color_list("cluster_colors", []) + + assert not is_color_list("cluster", ["#FF0000"]) + assert not is_color_list("colors", ["#FF0000"]) + assert not is_color_list("cluster_colors", "#FF0000") + assert not is_color_list("cluster_colors", ["not_a_color", "also_not"]) + + def 
test_is_color_list_named_colors(self): + """Test is_color_list detects named colors.""" + from anndata._repr.utils import is_color_list + + assert is_color_list("cluster_colors", ["red", "blue", "green"]) + assert is_color_list("batch_colors", ["crimson", "navy"]) + + def test_is_color_list_rgb_format(self): + """Test is_color_list detects RGB/RGBA format.""" + from anndata._repr.utils import is_color_list + + assert is_color_list("cluster_colors", ["rgb(255, 0, 0)", "rgb(0, 255, 0)"]) + assert is_color_list("batch_colors", ["rgba(255, 0, 0, 0.5)"]) + + def test_is_color_list_none_first_element(self): + """Test is_color_list handles None in first element.""" + from anndata._repr.utils import is_color_list + + assert not is_color_list("cluster_colors", [None, "#FF0000"]) + + def test_get_matching_column_colors_var(self): + """Test get_matching_column_colors finds colors for var columns.""" + from anndata._repr.utils import get_matching_column_colors + + adata = AnnData(np.zeros((10, 5))) + adata.var["gene_type"] = pd.Categorical(["A", "B"] * 2 + ["A"]) + adata.uns["gene_type_colors"] = ["#FF0000", "#00FF00"] + + colors = get_matching_column_colors(adata, "gene_type") + assert colors == ["#FF0000", "#00FF00"] + + def test_get_matching_column_colors_no_uns_key(self): + """Test get_matching_column_colors returns None when no colors in uns.""" + from anndata._repr.utils import get_matching_column_colors + + adata = AnnData(np.zeros((10, 5))) + adata.obs["cell_type"] = pd.Categorical(["A", "B"] * 5) + + colors = get_matching_column_colors(adata, "cell_type") + assert colors is None + + +class TestFormatting: + """Tests for formatting utilities.""" + + def test_escape_html(self): + """Test HTML escaping.""" + from anndata._repr.utils import escape_html + + assert escape_html("' + return f"
{escape_html(val)}
" + + try: + html = adata._repr_html_() + # The raw script tag must NOT appear + assert '' not in html + # The escaped version should appear + assert "<script>" in html + finally: + from anndata._repr.registry import formatter_registry + + formatter_registry._section_formatters.pop("_test_escaped", None) + + +def test_repr_html_section_formatter_render_html_crash_fallback(adata): + """Crashing render_html falls back to get_entries.""" + import warnings + + from anndata._repr import ( + FormattedEntry, + FormattedOutput, + SectionFormatter, + register_formatter, + ) + + @register_formatter + class TestCrashFallbackSection(SectionFormatter): + section_names = ("_test_crash_fb",) + + @property + def section_name(self): + return "_test_crash_fb" + + @property + def after_section(self): + return "obs" + + def get_entries(self, obj, context): + return [ + FormattedEntry( + key="fallback_key", + output=FormattedOutput(type_name="str", preview="fallback_value"), + ) + ] + + def render_html(self, obj, context): + msg = "intentional crash" + raise RuntimeError(msg) + + try: + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + html = adata._repr_html_() + # Should still produce valid HTML + assert html is not None + assert "anndata-repr" in html + # Should warn about the crash + crash_warnings = [x for x in w if "render_html failed" in str(x.message)] + assert len(crash_warnings) == 1 + # Should fall back to get_entries + assert "fallback_key" in html + assert "fallback_value" in html + finally: + from anndata._repr.registry import formatter_registry + + formatter_registry._section_formatters.pop("_test_crash_fb", None) diff --git a/tests/visual_inspect_repr_html.py b/tests/visual_inspect_repr_html.py new file mode 100644 index 000000000..4bbd5da7e --- /dev/null +++ b/tests/visual_inspect_repr_html.py @@ -0,0 +1,3500 @@ +""" +Visual inspection script for AnnData HTML representation. 
+ +Run this script to generate an HTML file that can be opened in a browser +to visually inspect the _repr_html_ output. + +Usage: + python tests/visual_inspect_repr_html.py + +Then open tests/repr_html_visual_test.html in your browser. + +Key extensibility examples: +- Test 12: Uns type hints (TypeFormatter for tagged data in uns) +- Test 14: TreeData custom sections (SectionFormatter for new sections) +- Test 19: MuData (SectionFormatter for .mod section) +- Test 20: SpatialData (custom _repr_html_ using building blocks) +- Test 25: Ecosystem package extensibility (TypeFormatter for obs/var columns) + +See also: +- src/anndata/_repr/registry.py: TypeFormatter and SectionFormatter APIs +- Reviewer's Guide gist for architecture overview +""" + +# ruff: noqa: EM101 +# EM101: Exception string literals are used intentionally in evil test objects +# RUF001/RUF003: Unicode lookalike characters are intentional - testing confusable chars + +from __future__ import annotations + +import tempfile +import warnings +from pathlib import Path + +import numpy as np +import pandas as pd +import scipy.sparse as sp + +import anndata as ad + +# Suppress anndata warning about string index transformation (not relevant for visual tests) +from anndata._warnings import ImplicitModificationWarning + +warnings.filterwarnings( + "ignore", + message="Transforming to str index", + category=ImplicitModificationWarning, +) +from anndata import AnnData # noqa: E402 +from anndata._repr import ( # noqa: E402 + FormattedOutput, + TypeFormatter, + escape_html, + extract_uns_type_hint, + register_formatter, +) + +# Check optional dependencies +try: + import dask.array as da + + HAS_DASK = True +except ImportError: + HAS_DASK = False + +try: + import xarray # noqa: F401 + + from anndata.experimental import read_lazy + + HAS_XARRAY = True +except ImportError: + HAS_XARRAY = False + +try: + import networkx as nx + from treedata import TreeData + + from anndata._repr import ( + FormattedEntry, + 
FormattedOutput, + FormatterContext, + SectionFormatter, + register_formatter, + ) + + HAS_TREEDATA = True + + def _render_tree_svg( + tree: nx.DiGraph, max_leaves: int = 30, width: int = 300, height: int = 150 + ) -> str: + """Render a tree as an SVG visualization. + + Uses a simple top-down layout similar to pycea's approach but generates SVG. + """ + # Find root and leaves + roots = [n for n in tree.nodes() if tree.in_degree(n) == 0] + if not roots: + return ( + "Invalid tree (no root)" + ) + root = roots[0] + + leaves = [n for n in tree.nodes() if tree.out_degree(n) == 0] + n_leaves = len(leaves) + + # Truncate if too many leaves + if n_leaves > max_leaves: + return ( + f"" + f"Tree with {n_leaves} leaves (too large to preview)" + ) + + # Compute depths using BFS + depths = {root: 0} + queue = [root] + while queue: + node = queue.pop(0) + for child in tree.successors(node): + depths[child] = depths[node] + 1 + queue.append(child) + + max_depth = max(depths.values()) if depths else 0 + if max_depth == 0: + return "Single node tree" + + # Assign y-coordinates (leaves get sequential positions) + y_coords = {} + leaf_idx = 0 + + def assign_y(node): + nonlocal leaf_idx + children = list(tree.successors(node)) + if not children: # leaf + y_coords[node] = leaf_idx + leaf_idx += 1 + else: + for child in children: + assign_y(child) + # Internal node: average of children + y_coords[node] = sum(y_coords[c] for c in children) / len(children) + + assign_y(root) + + # Scale coordinates + margin = 15 + x_scale = (width - 2 * margin) / max_depth if max_depth > 0 else 1 + y_scale = (height - 2 * margin) / (n_leaves - 1) if n_leaves > 1 else 1 + + def get_x(node): + return margin + depths[node] * x_scale + + def get_y(node): + return margin + y_coords[node] * y_scale + + # Generate SVG + svg_parts = [ + f'' + ] + + # Draw branches (parent -> child) + for parent, child in tree.edges(): + px, py = get_x(parent), get_y(parent) + cx, cy = get_x(child), get_y(child) + # Draw elbow 
connector (horizontal then vertical) + svg_parts.append( + f'' + ) + + # Draw nodes + for node in tree.nodes(): + x, y = get_x(node), get_y(node) + is_leaf = tree.out_degree(node) == 0 + r = 3 if is_leaf else 4 + fill = "#4a90d9" if is_leaf else "#333" + svg_parts.append( + f'' + ) + + svg_parts.append("") + return "".join(svg_parts) + + # TreeData documentation URL + TREEDATA_DOCS = "https://treedata.readthedocs.io/en/latest/" + + # Register TreeData section formatters + @register_formatter + class ObstSectionFormatter(SectionFormatter): + """Section formatter for obst (observation trees).""" + + @property + def section_name(self) -> str: + return "obst" + + @property + def after_section(self) -> str: + return "obsm" + + @property + def doc_url(self) -> str: + return TREEDATA_DOCS + + @property + def tooltip(self) -> str: + return "Tree annotation of observations (TreeData)" + + def should_show(self, obj) -> bool: + return hasattr(obj, "obst") and len(obj.obst) > 0 + + def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]: + entries = [] + for key, tree in obj.obst.items(): + n_nodes = tree.number_of_nodes() + n_leaves = sum(1 for n in tree.nodes() if tree.out_degree(n) == 0) + # Generate SVG preview + svg_html = _render_tree_svg(tree) + output = FormattedOutput( + type_name=f"DiGraph ({n_nodes} nodes, {n_leaves} leaves)", + css_class="anndata-dtype--tree", + tooltip=f"Phylogenetic tree with {n_nodes} total nodes", + expanded_html=svg_html, + ) + entries.append(FormattedEntry(key=key, output=output)) + return entries + + @register_formatter + class VartSectionFormatter(SectionFormatter): + """Section formatter for vart (variable trees).""" + + @property + def section_name(self) -> str: + return "vart" + + @property + def after_section(self) -> str: + return "varm" + + @property + def doc_url(self) -> str: + return TREEDATA_DOCS + + @property + def tooltip(self) -> str: + return "Tree annotation of variables (TreeData)" + + def 
should_show(self, obj) -> bool: + return hasattr(obj, "vart") and len(obj.vart) > 0 + + def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]: + entries = [] + for key, tree in obj.vart.items(): + n_nodes = tree.number_of_nodes() + n_leaves = sum(1 for n in tree.nodes() if tree.out_degree(n) == 0) + # Generate SVG preview + svg_html = _render_tree_svg(tree) + output = FormattedOutput( + type_name=f"DiGraph ({n_nodes} nodes, {n_leaves} leaves)", + css_class="anndata-dtype--tree", + tooltip=f"Phylogenetic tree with {n_nodes} total nodes", + expanded_html=svg_html, + ) + entries.append(FormattedEntry(key=key, output=output)) + return entries + + @register_formatter + class TreeMetadataSectionFormatter(SectionFormatter): + """Section formatter for TreeData metadata as a compact inline row. + + This demonstrates a fully custom section representation using + ``render_html()`` instead of the standard ``get_entries()`` path. + When ``render_html()`` is defined, it takes precedence and the + returned HTML is inserted directly — no ``
`` wrapping, + no entry grid. This is useful for compact metadata that doesn't + fit the "list of entries" pattern. + + Renders like the X entry — a single non-foldable line showing + key=value pairs for label, alignment, and allow_overlap. + All values are escaped via ``escape_html(repr(val))``. + """ + + @property + def section_name(self) -> str: + return "tree_metadata" + + @property + def display_name(self) -> str: + return "tree" + + @property + def after_section(self) -> str: + return "X" + + @property + def doc_url(self) -> str: + return TREEDATA_DOCS + + @property + def tooltip(self) -> str: + return "Tree configuration parameters (TreeData)" + + def should_show(self, obj) -> bool: + return hasattr(obj, "_tree_label") + + def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]: + """Fallback if render_html fails (e.g., missing optional dependency).""" + entries = [] + for attr, label in [ + ("_tree_label", "label"), + ("_alignment", "alignment"), + ("_allow_overlap", "allow_overlap"), + ]: + val = getattr(obj, attr, None) + if val is not None: + output = FormattedOutput( + type_name=type(val).__name__, + preview=repr(val), + ) + entries.append(FormattedEntry(key=label, output=output)) + return entries + + def render_html(self, obj, context: FormatterContext) -> str: + """Render as a compact line instead of a foldable section.""" + from anndata._repr.utils import escape_html + + pairs = [] + for attr, label in [ + ("_tree_label", "label"), + ("_alignment", "alignment"), + ("_allow_overlap", "allow_overlap"), + ]: + val = getattr(obj, attr, None) + if val is not None: + pairs.append( + f'{label}=' + f"{escape_html(repr(val))}" + ) + summary = "   ".join(pairs) + return ( + '
' + f"tree" + f"{summary}" + "
" + ) + +except (ImportError, AttributeError): + # AttributeError can occur on Python 3.14+ with incompatible networkx versions + HAS_TREEDATA = False + +# Check for MuData +try: + from mudata import MuData + + from anndata._repr import ( + FormattedEntry, + FormattedOutput, + FormatterContext, + SectionFormatter, + register_formatter, + ) + from anndata._repr.html import generate_repr_html + from anndata._repr.utils import format_number + + HAS_MUDATA = True + + # Suppress MuData's internal mapping attributes using a SectionFormatter + # that handles multiple sections and returns empty (suppresses them) + @register_formatter + class MuDataInternalSectionsFormatter(SectionFormatter): + """Suppress MuData's internal mapping attributes.""" + + section_names = ("obsmap", "varmap", "axis") + + @property + def section_name(self) -> str: + return self.section_names[0] # Primary name for compatibility + + def should_show(self, obj) -> bool: + return False # Never show these sections + + def get_entries(self, obj, context): + return [] # No entries + + # Register a SectionFormatter for MuData's .mod section + # This allows generate_repr_html() to work directly on MuData objects + @register_formatter + class ModSectionFormatter(SectionFormatter): + """ + SectionFormatter for MuData's .mod attribute. + + This demonstrates how external packages (like mudata) can extend + anndata's HTML repr to add new sections. The .mod section contains + AnnData objects for each modality, similar to how .uns can contain + nested AnnData objects. 
+ """ + + section_name = "mod" + priority = 200 # High priority to show before other sections + + @property + def after_section(self) -> str: + return "X" # Show right after X (before obs) + + @property + def doc_url(self) -> str: + return "https://mudata.readthedocs.io/en/latest/api/generated/mudata.MuData.html" + + @property + def tooltip(self) -> str: + return "Modalities (MuData)" + + def should_show(self, obj) -> bool: + return hasattr(obj, "mod") and len(obj.mod) > 0 + + def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]: + entries = [] + for mod_name, adata in obj.mod.items(): + shape_str = ( + f"{format_number(adata.n_obs)} × {format_number(adata.n_vars)}" + ) + # Generate nested HTML for expandable content + can_expand = context.depth < context.max_depth + nested_html = None + if can_expand: + nested_html = generate_repr_html( + adata, + depth=context.depth + 1, + max_depth=context.max_depth, + show_header=True, + show_search=False, + ) + output = FormattedOutput( + type_name=f"AnnData ({shape_str})", + css_class="anndata-dtype--anndata", + tooltip=f"Modality: {mod_name}", + expanded_html=nested_html if can_expand else None, + is_serializable=True, + ) + entries.append(FormattedEntry(key=mod_name, output=output)) + return entries + +except ImportError: + HAS_MUDATA = False + MuData = None # type: ignore[assignment,misc] + + +# ============================================================================= +# SpatialData Example: Building custom _repr_html_ using anndata's building blocks +# ============================================================================= +# This demonstrates how packages like SpatialData can create their own _repr_html_ +# while reusing anndata's CSS, JavaScript, and rendering helpers. 
+# +# KEY BUILDING BLOCKS USED: +# - get_css() : Reuse anndata's CSS (dark mode, styling) +# - get_javascript(id) : Reuse anndata's JS (fold, search, copy) +# - render_section() : Render a collapsible section +# - render_formatted_entry() : Render a table row +# - FormattedEntry/Output : Data classes for entry configuration +# - generate_repr_html() : Embed nested AnnData objects +# - FormatterRegistry : (Optional) Allow third-party extensions + +try: + import uuid + + from anndata._repr import ( + FormattedEntry, + FormattedOutput, + FormatterContext, + FormatterRegistry, + SectionFormatter, + TypeFormatter, + escape_html, + format_number, + get_css, + get_javascript, + render_badge, + render_formatted_entry, + render_search_box, + render_section, + ) + from anndata._repr.html import generate_repr_html + + HAS_SPATIALDATA_EXAMPLE = True + + # ========================================================================= + # MockSpatialData: Minimal example of custom _repr_html_ + # ========================================================================= + + class MockSpatialData: + """ + Mock SpatialData demonstrating custom _repr_html_ with anndata's building blocks. + + This is a simplified example showing the essential pattern. A real + implementation would have more complex data structures. + """ + + def __init__( + self, + *, + images: dict | None = None, + labels: dict | None = None, + points: dict | None = None, + shapes: dict | None = None, + tables: dict | None = None, # Contains AnnData objects + coordinate_systems: list | None = None, + path: str | None = None, + ): + self.images = images or {} + self.labels = labels or {} + self.points = points or {} + self.shapes = shapes or {} + self.tables = tables or {} + self.coordinate_systems = coordinate_systems or [] + self.path = path + + def _repr_html_(self) -> str: + """ + Build HTML using anndata's building blocks. + + Pattern: + 1. get_css() - include styling + 2. Container div with unique ID + 3. 
Custom header (optional) + 4. Coordinate systems preview (like obs_names/var_names in AnnData) + 5. Sections using render_section() + render_formatted_entry() + 6. Custom sections via FormatterRegistry (optional) + 7. get_javascript(id) - include interactivity + """ + container_id = f"spatialdata-{uuid.uuid4().hex[:8]}" + parts = [] + + # --- STEP 1: Include anndata's CSS --- + parts.append(get_css()) + + # --- STEP 2: Container with anndata-repr class --- + parts.append( + f'
' + ) + + # --- STEP 3: Custom header (SpatialData has no shape) --- + parts.append(self._build_header(container_id)) + + # --- STEP 4: Coordinate systems preview (alternative to obs_names/var_names) --- + parts.append(self._build_coordinate_systems_preview()) + + # --- STEP 5: Sections using render_section() --- + parts.append('
') + parts.append(self._build_images_section()) + parts.append(self._build_labels_section()) + parts.append(self._build_points_section()) + parts.append(self._build_shapes_section()) + parts.append(self._build_tables_section()) # Nested AnnData + # --- STEP 6: Custom sections from FormatterRegistry --- + parts.append(self._build_custom_sections()) + parts.append("
") + + parts.append("
") + + # --- STEP 7: Include anndata's JavaScript --- + parts.append(get_javascript(container_id)) + + return "\n".join(parts) + + def _build_header(self, container_id: str) -> str: + """Custom header - shows 'SpatialData' with Zarr badge and file path.""" + parts = ['
'] + parts.append('SpatialData') + + # Zarr badge using render_badge() helper + if self.path: + parts.append( + render_badge( + "Zarr", "anndata-badge--backed", "Backed by Zarr storage" + ) + ) + parts.append( + f'' + f"{escape_html(self.path)}" + ) + + # Search box using render_search_box() helper + parts.append('') + parts.append(render_search_box(container_id)) + parts.append("
") + return "\n".join(parts) + + def _build_coordinate_systems_preview(self) -> str: + """ + Build coordinate systems preview - SpatialData's equivalent to obs_names/var_names. + + Simple list of coordinate system names with element details in tooltips. + """ + if not self.coordinate_systems: + return "" + + # Collect element names for tooltips + all_elements = [] + if self.images: + all_elements.extend([f"{k} (Images)" for k in self.images]) + if self.labels: + all_elements.extend([f"{k} (Labels)" for k in self.labels]) + if self.points: + all_elements.extend([f"{k} (Points)" for k in self.points]) + if self.shapes: + all_elements.extend([f"{k} (Shapes)" for k in self.shapes]) + + elements_str = ", ".join(all_elements) if all_elements else "no elements" + + # Build simple inline list + parts = ['
'] + parts.append( + 'coordinate_systems: ' + ) + + # Render coordinate systems as simple badges with tooltips + cs_parts = [] + for cs_name in self.coordinate_systems: + tooltip = f"Elements: {elements_str}" + cs_parts.append( + f'' + f"'{escape_html(cs_name)}'" + ) + + parts.append(", ".join(cs_parts)) + parts.append("
") + return "".join(parts) + + def _build_images_section(self) -> str: + """ + Build images section using render_section() + render_formatted_entry(). + + This is the core pattern: create FormattedEntry objects and render them. + """ + rows = [] + for name, info in self.images.items(): + # Build meta content (dimensions info) for the META column + dims_str = ", ".join(info.get("dims", ["y", "x"])) + meta = f'[{dims_str}]' + + # Create a FormattedEntry with FormattedOutput + entry = FormattedEntry( + key=name, + output=FormattedOutput( + type_name=f"DataArray {info['shape']} {info['dtype']}", + css_class="anndata-dtype--ndarray", + preview_html=meta, # Content in preview column (rightmost) + ), + ) + # render_formatted_entry() creates the table row HTML + rows.append(render_formatted_entry(entry)) + + # render_section() wraps rows in a collapsible section + return render_section( + "images", + "\n".join(rows), + n_items=len(self.images), + tooltip="Image data (xarray.DataArray)", + ) + + def _build_labels_section(self) -> str: + """Build labels section - same pattern as images.""" + rows = [] + for name, info in self.labels.items(): + dims_str = ", ".join(info.get("dims", ["y", "x"])) + meta = f'[{dims_str}]' + + entry = FormattedEntry( + key=name, + output=FormattedOutput( + type_name=f"Labels {info['shape']} {info['dtype']}", + css_class="anndata-dtype--ndarray", + preview_html=meta, + ), + ) + rows.append(render_formatted_entry(entry)) + + return render_section( + "labels", + "\n".join(rows), + n_items=len(self.labels), + tooltip="Segmentation masks (xarray.DataArray)", + ) + + def _build_points_section(self) -> str: + """Build points section.""" + rows = [] + for name, info in self.points.items(): + meta = f'{info["n_dims"]}D coordinates' + + entry = FormattedEntry( + key=name, + output=FormattedOutput( + type_name=f"dask.DataFrame ({format_number(info['n_points'])} × {info['n_dims']})", + css_class="anndata-dtype--dataframe", + preview_html=meta, + ), + ) + 
rows.append(render_formatted_entry(entry)) + + return render_section( + "points", + "\n".join(rows), + n_items=len(self.points), + tooltip="Point annotations (dask.DataFrame)", + ) + + def _build_shapes_section(self) -> str: + """Build shapes section.""" + rows = [] + for name, info in self.shapes.items(): + meta = f'{info["geometry_type"]}' + + entry = FormattedEntry( + key=name, + output=FormattedOutput( + type_name=f"GeoDataFrame ({format_number(info['n_shapes'])} shapes)", + css_class="anndata-dtype--dataframe", + preview_html=meta, + ), + ) + rows.append(render_formatted_entry(entry)) + + return render_section( + "shapes", + "\n".join(rows), + n_items=len(self.shapes), + tooltip="Vector shapes (geopandas.GeoDataFrame)", + ) + + def _build_tables_section(self) -> str: + """ + Build tables section with NESTED AnnData objects. + + Uses generate_repr_html() to embed full AnnData representations + that are expandable with all standard features. + """ + rows = [] + for name, adata in self.tables.items(): + # generate_repr_html() creates nested AnnData HTML + nested_html = generate_repr_html( + adata, + depth=1, # Nested level + max_depth=3, + show_header=True, + show_search=False, + ) + + # FormattedOutput with expanded_html makes it collapsible + entry = FormattedEntry( + key=name, + output=FormattedOutput( + type_name=f"AnnData ({adata.n_obs} × {adata.n_vars})", + css_class="anndata-dtype--anndata", + expanded_html=nested_html, # Makes the nested content collapsible + ), + ) + rows.append(render_formatted_entry(entry)) + + return render_section( + "tables", + "\n".join(rows), + n_items=len(self.tables), + tooltip="Annotation tables (AnnData)", + ) + + def _build_custom_sections(self) -> str: + """ + Render custom sections from FormatterRegistry. + + This demonstrates how third-party packages can add new sections + by registering SectionFormatters with spatialdata_formatter_registry. 
+ """ + parts = [] + context = FormatterContext() + + for ( + section_name + ) in spatialdata_formatter_registry.get_registered_sections(): + formatter = spatialdata_formatter_registry.get_section_formatter( + section_name + ) + if formatter is None or not formatter.should_show(self): + continue + + entries = formatter.get_entries(self, context) + if not entries: + continue + + rows = [render_formatted_entry(entry) for entry in entries] + section_html = render_section( + formatter.section_name, + "\n".join(rows), + n_items=len(entries), + tooltip=getattr(formatter, "tooltip", ""), + ) + parts.append(section_html) + + return "\n".join(parts) + + # ========================================================================= + # OPTIONAL: FormatterRegistry for third-party extensibility + # ========================================================================= + # SpatialData can create its own registry to allow plugins to add + # custom type formatters or new sections. This mirrors anndata's pattern. 
+ + # Create SpatialData's own formatter registry + spatialdata_formatter_registry = FormatterRegistry() + + # Example: TypeFormatter for custom value rendering + class DataTreeFormatter(TypeFormatter): + """Example: format xarray DataTree objects.""" + + priority = 100 + + def can_format(self, obj, context) -> bool: + return isinstance(obj, dict) and "shape" in obj and "dtype" in obj + + def format(self, obj, context: FormatterContext) -> FormattedOutput: + return FormattedOutput( + type_name=f"DataTree {obj['shape']} {obj['dtype']}", + css_class="anndata-dtype--ndarray", + ) + + spatialdata_formatter_registry.register_type_formatter(DataTreeFormatter()) + + # Example: SectionFormatter to add new sections + class TransformsSectionFormatter(SectionFormatter): + """Example: add a 'transforms' section.""" + + section_name = "transforms" + + def should_show(self, obj) -> bool: + return ( + hasattr(obj, "coordinate_systems") and len(obj.coordinate_systems) > 1 + ) + + def get_entries(self, obj, context: FormatterContext) -> list[FormattedEntry]: + cs = list(obj.coordinate_systems) + return [ + FormattedEntry( + key=f"{cs[i]} → {cs[i + 1]}", + output=FormattedOutput(type_name="Affine (3×3)"), + ) + for i in range(len(cs) - 1) + ] + + spatialdata_formatter_registry.register_section_formatter( + TransformsSectionFormatter() + ) + + # ========================================================================= + # Test data factory + # ========================================================================= + + def create_test_spatialdata(): + """Create a mock SpatialData object for testing.""" + # Create nested AnnData tables + cell_table = AnnData( + np.random.randn(150, 30).astype(np.float32), + obs=pd.DataFrame({ + "cell_type": pd.Categorical(["Tumor", "Immune", "Stromal"] * 50), + "area": np.random.uniform(100, 1000, 150), + }), + ) + cell_table.obsm["spatial"] = np.random.randn(150, 2).astype(np.float32) + + transcript_table = AnnData( + np.random.randn(80, 
10).astype(np.float32), + obs=pd.DataFrame({ + "gene": pd.Categorical( + np.random.choice([f"gene_{i}" for i in range(10)], 80) + ), + }), + ) + + return MockSpatialData( + images={ + "raw_image": { + "shape": (3, 2048, 2048), + "dims": ("c", "y", "x"), + "dtype": "uint16", + }, + "processed": { + "shape": (3, 1024, 1024), + "dims": ("c", "y", "x"), + "dtype": "float32", + }, + }, + labels={ + "cell_segmentation": { + "shape": (2048, 2048), + "dims": ("y", "x"), + "dtype": "int32", + }, + "nucleus_segmentation": { + "shape": (2048, 2048), + "dims": ("y", "x"), + "dtype": "int32", + }, + }, + points={ + "transcripts": {"n_points": 50000, "n_dims": 3}, + }, + shapes={ + "cell_boundaries": {"n_shapes": 150, "geometry_type": "Polygon"}, + "roi_annotations": {"n_shapes": 5, "geometry_type": "Polygon"}, + }, + tables={ + "cell_annotations": cell_table, + "transcript_counts": transcript_table, + }, + coordinate_systems=["global", "aligned", "microscope"], + path="/data/experiment_001.zarr", + ) + +except (ImportError, AttributeError): + HAS_SPATIALDATA_EXAMPLE = False + + +def create_test_mudata(): + """Create a comprehensive test MuData with multiple modalities.""" + if not HAS_MUDATA: + return None + + np.random.seed(42) + + # RNA modality + n_cells = 100 + n_genes = 50 + rna = AnnData( + np.random.randn(n_cells, n_genes).astype(np.float32), + obs=pd.DataFrame({ + "cell_type": pd.Categorical( + ["T cell", "B cell", "NK cell"] * 33 + ["T cell"] + ), + "n_counts": np.random.randint(1000, 10000, n_cells), + }), + var=pd.DataFrame({ + "gene_name": [f"gene_{i}" for i in range(n_genes)], + "highly_variable": np.random.choice([True, False], n_genes), + }), + ) + rna.uns["cell_type_colors"] = ["#e41a1c", "#377eb8", "#4daf4a"] + rna.obsm["X_pca"] = np.random.randn(n_cells, 10).astype(np.float32) + rna.obsm["X_umap"] = np.random.randn(n_cells, 2).astype(np.float32) + rna.layers["raw"] = np.random.randn(n_cells, n_genes).astype(np.float32) + + # ATAC modality (same cells, 
different features) + n_peaks = 30 + atac = AnnData( + np.random.randn(n_cells, n_peaks).astype(np.float32), + obs=pd.DataFrame({ + "peak_count": np.random.randint(500, 5000, n_cells), + "tss_enrichment": np.random.uniform(2, 10, n_cells), + }), + var=pd.DataFrame({ + "peak_name": [f"peak_{i}" for i in range(n_peaks)], + "chr": [f"chr{i % 22 + 1}" for i in range(n_peaks)], + }), + ) + atac.obsm["X_lsi"] = np.random.randn(n_cells, 15).astype(np.float32) + + # Protein modality (subset of cells) + n_prot_cells = 80 + n_proteins = 20 + prot = AnnData( + np.random.randn(n_prot_cells, n_proteins).astype(np.float32), + obs=pd.DataFrame({ + "protein_count": np.random.randint(100, 1000, n_prot_cells), + }), + var=pd.DataFrame({ + "protein_name": [f"CD{i}" for i in range(n_proteins)], + "isotype_control": [i < 3 for i in range(n_proteins)], + }), + ) + + # Create MuData + import warnings + + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + mdata = MuData({"rna": rna, "atac": atac, "prot": prot}) + + # Add shared annotations + mdata.uns["experiment"] = "multiome_sample_001" + mdata.uns["processing_date"] = "2024-03-15" + + return mdata + + +def create_test_treedata(): + """Create a TreeData object with observation and variable trees.""" + if not HAS_TREEDATA: + return None + + np.random.seed(42) + n_obs = 24 # Small enough for SVG preview (< 30 leaves) + n_vars = 45 # Large enough to trigger "too large to preview" (> 30 leaves) + obs_names = [f"cell_{i}" for i in range(n_obs)] + var_names = [f"gene_{i}" for i in range(n_vars)] + + # Create observation tree (phylogenetic-like structure) + obs_tree = nx.DiGraph() + obs_tree.add_edges_from([ + ("root", "clade_A"), + ("root", "clade_B"), + ("clade_A", "subA1"), + ("clade_A", "subA2"), + ("clade_B", "subB1"), + ("clade_B", "subB2"), + ]) + for i, name in enumerate(obs_names): + parent = ["subA1", "subA2", "subB1", "subB2"][i % 4] + obs_tree.add_edge(parent, name) + + # Create variable tree (gene ontology-like 
structure, >30 leaves) + var_tree = nx.DiGraph() + var_tree.add_edges_from([ + ("all_genes", "pathway_X"), + ("all_genes", "pathway_Y"), + ("all_genes", "pathway_Z"), + ("pathway_X", "module_1"), + ("pathway_X", "module_2"), + ("pathway_Y", "module_3"), + ("pathway_Y", "module_4"), + ("pathway_Z", "module_5"), + ]) + for i, name in enumerate(var_names): + parent = ["module_1", "module_2", "module_3", "module_4", "module_5"][i % 5] + var_tree.add_edge(parent, name) + + # Create TreeData with explicit metadata values + tdata = TreeData( + X=np.random.randn(n_obs, n_vars).astype(np.float32), + obs=pd.DataFrame( + {"cell_type": pd.Categorical(["T cell", "B cell"] * (n_obs // 2))}, + index=obs_names, + ), + var=pd.DataFrame({"gene_name": var_names}, index=var_names), + obst={"phylogeny": obs_tree}, + vart={"gene_ontology": var_tree}, + label="phylogeny", + alignment="leaves", + allow_overlap=False, + ) + + # Add standard annotations + tdata.uns["cell_type_colors"] = ["#e41a1c", "#377eb8"] + tdata.obsm["X_pca"] = np.random.randn(n_obs, 10).astype(np.float32) + tdata.layers["raw"] = np.random.randn(n_obs, n_vars).astype(np.float32) + + return tdata + + +def create_test_anndata() -> AnnData: + """Create a comprehensive test AnnData with all features. 
+ + This showcases common patterns from real single-cell analysis workflows: + - Sparse X matrix with typical density + - Categorical columns with color annotations + - Numeric QC metrics + - String columns (some will trigger serialization warnings) + - Datetime columns (will trigger serialization warnings) + - Boolean columns + - Cluster assignments (louvain, leiden) + - Dimensionality reductions (PCA, UMAP, t-SNE) + - Neighbor graphs + - Layers (raw counts, normalized) + - Raw attribute (unprocessed data) + - Various uns types (dicts, arrays, nested AnnData) + """ + n_obs, n_vars = 100, 50 + + # Main AnnData with sparse X + # obs: 5 columns (stays expanded below fold_threshold) + # var: more columns to demonstrate folding and various types + adata = AnnData( + sp.random(n_obs, n_vars, density=0.1, format="csr", dtype=np.float32), + obs=pd.DataFrame({ + # Categorical with colors (5 categories) + "cell_type": pd.Categorical( + ["T cell", "B cell", "NK cell", "Monocyte", "DC"] * (n_obs // 5) + ), + # Categorical with colors (8 clusters) + "louvain": pd.Categorical([ + f"cluster_{i}" for i in (np.random.randint(0, 8, n_obs)) + ]), + # Numeric QC metric + "n_counts": np.random.randint(1000, 50000, n_obs), + # Float QC metric + "percent_mito": np.random.uniform(0, 15, n_obs).astype(np.float32), + # Boolean column + "is_doublet": np.random.choice([True, False], n_obs, p=[0.1, 0.9]), + }), + var=pd.DataFrame({ + # Basic gene info + "gene_symbol": [f"GN{i}" for i in range(n_vars)], + "highly_variable": np.random.choice([True, False], n_vars, p=[0.2, 0.8]), + "means": np.random.exponential(1, n_vars).astype(np.float32), + "dispersions": np.random.exponential(0.5, n_vars).astype(np.float32), + # Categorical column + "chromosome": pd.Categorical([f"chr{i % 22 + 1}" for i in range(n_vars)]), + # String column (will trigger categorical conversion warning) + "gene_biotype": ["protein_coding"] * (n_vars - 5) + ["lncRNA"] * 5, + # Datetime column (will trigger serialization 
warning) + "annotation_date": pd.to_datetime(["2024-01-15"] * n_vars), + # All unique strings (no warning - too many unique values) + "ensembl_id": [f"ENSG{i:011d}" for i in range(n_vars)], + }), + ) + + # === Color annotations === + # Matching colors for cell_type (5 categories) + adata.uns["cell_type_colors"] = [ + "#FF6B6B", + "#4ECDC4", + "#45B7D1", + "#96CEB4", + "#FFEAA7", + ] + # Matching colors for louvain (8 clusters) + adata.uns["louvain_colors"] = [ + "#1f77b4", + "#ff7f0e", + "#2ca02c", + "#d62728", + "#9467bd", + "#8c564b", + "#e377c2", + "#7f7f7f", + ] + + # === Uns: Analysis results (typical scanpy output) === + adata.uns["neighbors"] = { + "connectivities_key": "connectivities", + "distances_key": "distances", + "params": {"n_neighbors": 15, "method": "umap", "metric": "euclidean"}, + } + adata.uns["pca"] = { + "variance": np.random.exponential(10, 50).astype(np.float32), + "variance_ratio": np.sort(np.random.uniform(0, 0.1, 50))[::-1].astype( + np.float32 + ), + } + adata.uns["umap"] = {"params": {"min_dist": 0.5, "spread": 1.0}} + adata.uns["louvain"] = {"params": {"resolution": 1.0, "random_state": 0}} + + # === Uns: Simple values === + adata.uns["experiment_id"] = "EXP_2024_001" + adata.uns["n_highly_variable"] = int(adata.var["highly_variable"].sum()) + adata.uns["total_counts"] = float(adata.obs["n_counts"].sum()) + adata.uns["processing_steps"] = [ + "filtering", + "normalization", + "hvg", + "pca", + "neighbors", + "umap", + "clustering", + ] + + # === Uns: Nested AnnData === + inner_adata = AnnData(np.zeros((10, 5))) + inner_adata.obs["inner_cluster"] = pd.Categorical(["A", "B"] * 5) + inner_adata.var["gene"] = [f"gene_{i}" for i in range(5)] + adata.uns["subset_adata"] = inner_adata + + # === Uns: Unserializable type (will warn) === + class CustomAnalysisResult: + def __repr__(self): + return "CustomAnalysisResult(n_clusters=8)" + + adata.uns["custom_result"] = CustomAnalysisResult() + + # === Obsm: Embeddings and metadata === + 
adata.obsm["X_pca"] = np.random.randn(n_obs, 50).astype(np.float32) + adata.obsm["X_umap"] = np.random.randn(n_obs, 2).astype(np.float32) + adata.obsm["X_tsne"] = np.random.randn(n_obs, 2).astype(np.float32) + # DataFrame in obsm (spatial coordinates) + adata.obsm["spatial"] = pd.DataFrame( + { + "x": np.random.uniform(0, 1000, n_obs), + "y": np.random.uniform(0, 1000, n_obs), + "z": np.random.uniform(0, 100, n_obs), + "area": np.random.uniform(50, 500, n_obs), + "perimeter": np.random.uniform(20, 100, n_obs), + }, + index=adata.obs_names, + ) + + # === Varm: Gene loadings === + adata.varm["PCs"] = np.random.randn(n_vars, 50).astype(np.float32) + + # === Layers: Different normalizations === + adata.layers["counts"] = sp.random( + n_obs, n_vars, density=0.1, format="csr", dtype=np.float32 + ) + adata.layers["normalized"] = np.random.randn(n_obs, n_vars).astype(np.float32) + adata.layers["log1p"] = np.log1p(np.abs(np.random.randn(n_obs, n_vars))).astype( + np.float32 + ) + + # === Obsp/Varp: Graphs === + adata.obsp["distances"] = sp.random( + n_obs, n_obs, density=0.05, format="csr", dtype=np.float32 + ) + adata.obsp["connectivities"] = sp.random( + n_obs, n_obs, density=0.05, format="csr", dtype=np.float32 + ) + adata.varp["gene_correlation"] = sp.random( + n_vars, n_vars, density=0.1, format="csr", dtype=np.float32 + ) + + # === Raw: Unprocessed data (common in scanpy workflows) === + raw_adata = AnnData( + sp.random(n_obs, n_vars + 20, density=0.1, format="csr", dtype=np.float32), + var=pd.DataFrame({ + "gene_name": [f"Gene_{i}" for i in range(n_vars + 20)], + "n_cells": np.random.randint(1, n_obs, n_vars + 20), + }), + ) + adata.raw = raw_adata + + return adata + + +def create_html_page(sections: list[tuple[str, str, str | None]]) -> str: + """Create a full HTML page with multiple test cases. + + Parameters + ---------- + sections + List of (title, html_content, description) tuples. + Description can be None for no description box. 
+ """ + # Generate TOC entries + toc_items = [] + for item in sections: + title = item[0] + # Create anchor ID from title + anchor_id = title.lower().replace(" ", "-").replace("(", "").replace(")", "") + anchor_id = "".join(c for c in anchor_id if c.isalnum() or c == "-") + toc_items.append(f'
{title}') + + toc_html = "\n ".join(toc_items) + + html_parts = [ + f""" + + + + + + AnnData _repr_html_ Visual Test + + + + + + + +

AnnData _repr_html_ Visual Test

+

This page displays various AnnData configurations to visually verify the HTML representation.

+""" + ] + + for item in sections: + title = item[0] + html_content = item[1] + description = item[2] if len(item) > 2 else None + + # Create anchor ID from title (same logic as TOC generation) + anchor_id = title.lower().replace(" ", "-").replace("(", "").replace(")", "") + anchor_id = "".join(c for c in anchor_id if c.isalnum() or c == "-") + + desc_html = "" + if description: + desc_html = f'
{description}
' + + html_parts.append(f""" +
+

{title}

+ {desc_html} +
+ {html_content} +
+
+""") + html_parts.append(""" + + +""") + return "".join(html_parts) + + +def strip_script_tags(html: str) -> str: + """Remove <script> tags from HTML to simulate no-JS environment.""" + import re + + return re.sub(r"<script[^>]*>.*?</script>", "", html, flags=re.DOTALL) + + +def strip_style_and_script_tags(html: str) -> str: + """Remove <style> and <script> tags from HTML to simulate a no-CSS, no-JS environment.""" + import re + + html = re.sub(r"<style[^>]*>.*?</style>", "", html, flags=re.DOTALL) + html = re.sub(r"<script[^>]*>.*?</script>", "", html, flags=re.DOTALL) + return html + + +def main(): # noqa: PLR0915, PLR0912 + """Generate visual test HTML file.""" + print("Generating visual test cases...") + + sections = [] + + # Test 1: Full AnnData + print(" 1. Full AnnData with all features") + adata_full = create_test_anndata() + sections.append(( + "1. Full AnnData (all features)", + adata_full._repr_html_(), + "A comprehensive AnnData with all standard attributes populated: X (sparse matrix), " + "obs/var with multiple columns including categoricals with colors, " + "obsm/varm with embeddings, uns with nested data, layers, and obsp/varp. " + "Use this as the baseline reference for a typical annotated dataset. " + "Each section header has a ? icon that links to the relevant anndata documentation, " + "and hovering over the section name shows a tooltip describing that attribute.", + )) + + # Test 2: Empty AnnData + print(" 2. Empty AnnData") + adata_empty = AnnData() + sections.append(( + "2. Empty AnnData", + adata_empty._repr_html_(), + "An AnnData with no data (0 × 0). Tests graceful handling of the edge case " + "where all sections are empty. Should show the header with shape and no sections.", + )) + + # Test 3: Minimal AnnData + print(" 3. Minimal AnnData (just X)") + adata_minimal = AnnData(np.zeros((10, 5))) + sections.append(( + "3. Minimal AnnData (just X matrix)", + adata_minimal._repr_html_(), + "Only an X matrix with no annotations. Tests the minimal case where only X section " + "is shown. obs/var exist with default integer indices but have no columns.", + )) + + # Test 4: View + print(" 4. 
AnnData View") + view = adata_full[0:20, 0:10] + sections.append(( + "4. AnnData View (subset)", + view._repr_html_(), + "A view (subset) of Test 1. Should display a 'View' badge in the header indicating " + "this is a reference to underlying data, not a copy. The shape shows the subset dimensions.", + )) + + # Test 5: Dense matrix + print(" 5. Dense matrix") + adata_dense = AnnData(np.random.randn(50, 30).astype(np.float32)) + adata_dense.obs["cluster"] = pd.Categorical(["A", "B", "C", "D", "E"] * 10) + adata_dense.uns["cluster_colors"] = [ + "#e41a1c", + "#377eb8", + "#4daf4a", + "#984ea3", + "#ff7f00", + ] + sections.append(( + "5. Dense Matrix with Categories", + adata_dense._repr_html_(), + "Dense numpy array X (not sparse). The X section shows 'ndarray' instead of CSR/CSC. " + "Also demonstrates categorical column with associated colors from uns (color dots appear).", + )) + + # Test 6: Many columns (collapsed sections) + print(" 6. Many columns (tests folding)") + adata_many = AnnData(np.zeros((20, 10))) + for i in range(15): + adata_many.obs[f"column_{i}"] = list(range(20)) + for i in range(12): + adata_many.obsm[f"X_embedding_{i}"] = np.random.randn(20, 2).astype(np.float32) + sections.append(( + "6. Many Columns (tests auto-folding)", + adata_many._repr_html_(), + "Sections with many entries (15 obs columns, 12 obsm embeddings) to test auto-folding. " + "Sections with more than 5 items (the default repr_html_fold_threshold) collapse by default and show a fold indicator. " + "Click the section header or fold icon to expand/collapse.", + )) + + # Test 7: Special characters + print(" 7. Special characters in names") + adata_special = AnnData(np.zeros((5, 3))) + adata_special.obs["columnhtml"] = list(range(5)) + adata_special.obs["column&ampersand"] = list(range(5)) + adata_special.uns["key\"with'quotes"] = "value" + adata_special.uns["unicode_日本語"] = "japanese" + sections.append(( + "7. Special Characters (XSS/Unicode test)", + adata_special._repr_html_(), + "Tests proper HTML escaping and Unicode handling. 
Column names with <html> tags, " + "ampersands, quotes, and Japanese characters should render correctly without breaking " + "the layout or causing XSS vulnerabilities.", + )) + + # Test 8a: Dask array (if available) - demonstrates lazy loading safety + if HAS_DASK: + print(" 8a. Dask array (lazy loading safety)") + # Create Dask arrays in multiple sections to show lazy handling + X_dask = da.random.random((1000, 500), chunks=(100, 100)) + adata_dask = AnnData(X_dask) + adata_dask.obs["cluster"] = pd.Categorical(["A", "B", "C"] * 333 + ["A"]) + adata_dask.var["gene_name"] = [f"gene_{i}" for i in range(500)] + # Dask arrays in layers and obsm + adata_dask.layers["counts"] = da.random.randint( + 0, 100, (1000, 500), chunks=(100, 100) + ) + adata_dask.obsm["X_pca"] = da.random.random((1000, 50), chunks=(100, 50)) + adata_dask.varm["loadings"] = da.random.random((500, 50), chunks=(100, 50)) + sections.append(( + "8a. Dask Arrays (Lazy Loading Safety)", + adata_dask._repr_html_(), + "Regular AnnData with Dask arrays — no .compute() triggered!
" + "

This is a normal (in-memory) AnnData where X, layers, obsm, and varm " + "are Dask arrays. The repr reads only metadata attributes:

" + "
    " + "
  • X: shape, dtype, chunks from Dask's lazy metadata
  • " + "
  • layers['counts']: Same — no computation
  • " + "
  • obsm['X_pca'], varm['loadings']: shape from .shape
  • " + "
" + "

Key distinction from 8b/8c: This object is not backed by " + "a file. The obs/var DataFrames are regular pandas objects in memory. " + "The 'lazy' aspect here refers only to Dask not computing array values.

", + )) + + # Test 8b: Lazy AnnData (experimental) - fully lazy obs/var + # Tests the lazy category loading behavior: + # - Categorical columns with few categories: load and display categories + # - Categorical columns with too many categories: show "(lazy)" to avoid loading + # - Categorical columns with colors in uns: display color swatches + # - Non-categorical columns: show "(lazy)" for all + if HAS_XARRAY: + print(" 8b. Lazy AnnData (experimental read_lazy)") + with tempfile.NamedTemporaryFile(suffix=".h5ad", delete=False) as tmp: + tmp_path = tmp.name + adata_lazy = None + h5_file = None + try: + import h5py + + # Create a comprehensive test file for lazy loading behavior + adata_to_save = AnnData( + sp.random(1000, 500, density=0.1, format="csr", dtype=np.float32) + ) + + # --- Categorical columns --- + # 1. Small categorical WITH colors (should show categories + color dots) + adata_to_save.obs["cell_type"] = pd.Categorical( + np.random.choice(["T cell", "B cell", "Monocyte", "NK cell"], 1000) + ) + adata_to_save.uns["cell_type_colors"] = [ + "#e41a1c", # T cell - red + "#377eb8", # B cell - blue + "#4daf4a", # Monocyte - green + "#984ea3", # NK cell - purple + ] + + # 2. Small categorical WITHOUT colors (should show categories only) + adata_to_save.obs["cluster"] = pd.Categorical( + np.random.choice(["C0", "C1", "C2", "C3", "C4"], 1000) + ) + + # 3. 
Medium categorical (50 cats) - will show truncation with max_lazy_categories=30 + medium_categories = [f"sample_{i}" for i in range(50)] + adata_to_save.obs["sample_id"] = pd.Categorical( + np.random.choice(medium_categories, 1000), + categories=medium_categories, # Ensure all 50 categories exist + ) + + # --- Non-categorical columns (all should show "(lazy)") --- + adata_to_save.obs["n_genes"] = np.random.randint(500, 5000, 1000) + adata_to_save.obs["total_counts"] = np.random.randint(1000, 50000, 1000) + + # --- var columns --- + adata_to_save.var["gene_symbol"] = [f"GENE{i}" for i in range(500)] + adata_to_save.var["highly_variable"] = np.random.choice([True, False], 500) + adata_to_save.var["mean_expression"] = np.random.uniform(0, 10, 500) + + # --- obsm/varm --- + adata_to_save.obsm["X_pca"] = np.random.randn(1000, 50).astype(np.float32) + adata_to_save.obsm["X_umap"] = np.random.randn(1000, 2).astype(np.float32) + adata_to_save.varm["PCs"] = np.random.randn(500, 50).astype(np.float32) + + # --- uns with array (to show dask array WITH size in uns) --- + adata_to_save.uns["neighbors"] = { + "connectivities_key": "connectivities", + "distances_key": "distances", + } + adata_to_save.uns["pca_variance"] = np.random.rand(50).astype(np.float32) + + adata_to_save.write_h5ad(tmp_path) + + # Read with experimental lazy loading + h5_file = h5py.File(tmp_path, "r") + adata_lazy = read_lazy(h5_file) + + # Use setting to demonstrate truncation behavior (default is 100) + # - cell_type (4 cats): all shown + # - cluster (5 cats): all shown + # - sample_id (50 cats): first 30 shown + "...+20" + original_max_lazy_cats = ad.settings.repr_html_max_lazy_categories + ad.settings.repr_html_max_lazy_categories = 30 + custom_lazy_html = adata_lazy._repr_html_() + ad.settings.repr_html_max_lazy_categories = original_max_lazy_cats + + sections.append(( + "8b. Lazy AnnData (Experimental)", + custom_lazy_html, + "anndata.experimental.read_lazy()
" + "

File-backed lazy AnnData — category labels loaded from disk!

" + "

" + "The header shows a Lazy (H5AD) badge and the file path (similar to backed mode). " + "Unlike 8a (in-memory) and 8c (metadata-only), this repr actually reads data from the HDF5 file:

" + "

What IS loaded from disk:

" + "
    " + "
  • cell_type: 4 category labels + 4 colors from uns
  • " + "
  • cluster: 5 category labels (no colors)
  • " + "
  • sample_id: first 30 of 50 category labels (truncated by max_lazy_categories=30)
  • " + "
" + "

What is NOT loaded:

" + "
    " + "
  • Numeric data (dask arrays not computed)
  • " + "
  • Category codes (only the labels are read, not each cell's category assignment)
  • " + "
  • Categories beyond the max_lazy_categories limit
  • " + "
  • Non-categorical column values (show as '(lazy)')
  • " + "
" + "

" + "Compare with 8c to see the same object with zero disk I/O.

", + )) + + # Test 8c: Lazy AnnData with max_lazy_categories=0 (metadata-only mode) + print(" 8c. Lazy AnnData (metadata-only mode)") + + # Use setting to disable category loading (instead of parameter) + original_max_lazy_cats = ad.settings.repr_html_max_lazy_categories + ad.settings.repr_html_max_lazy_categories = 0 + metadata_only_html = adata_lazy._repr_html_() + ad.settings.repr_html_max_lazy_categories = original_max_lazy_cats + + sections.append(( + "8c. Lazy AnnData (Metadata-Only Mode)", + metadata_only_html, + "ad.settings.repr_html_max_lazy_categories = 0
" + "

Same object as 8b, but with zero disk I/O!

" + "

" + "Compare this output to 8b — this is the exact same lazy AnnData object, " + "but with max_lazy_categories=0 to prevent any data loading. " + "The header still shows the Lazy (H5AD) badge and file path.

" + "

What's NOT loaded (unlike 8b):

" + "
    " + "
  • Category labels — only shows (N categories) count from dtype metadata
  • " + "
  • Colors from uns — no color dots displayed
  • " + "
" + "

What IS shown (from already-loaded metadata):

" + "
    " + "
  • Category count (e.g., '4 categories') from the dtype (already in memory)
  • " + "
  • Column names and types
  • " + "
  • Array shapes and dtypes
  • " + "
" + "

Use case: Fastest possible repr when you want to avoid " + "all disk access (e.g., network-mounted storage, very large files).

", + )) + + except (OSError, ImportError, TypeError) as e: + print(f" Warning: Failed to create lazy example: {e}") + finally: + if h5_file is not None: + h5_file.close() + Path(tmp_path).unlink() + + # Test 8d: Lazy AnnData with Zarr format + print(" 8d. Lazy AnnData (Zarr format)") + import shutil + + zarr_path = Path(tempfile.mkdtemp(suffix=".zarr")) + try: + import zarr + + # Create test data for zarr + adata_zarr_save = AnnData( + sp.random(800, 400, density=0.1, format="csr", dtype=np.float32) + ) + adata_zarr_save.obs["tissue"] = pd.Categorical( + np.random.choice(["Brain", "Heart", "Liver", "Lung", "Kidney"], 800) + ) + adata_zarr_save.uns["tissue_colors"] = [ + "#e41a1c", + "#377eb8", + "#4daf4a", + "#984ea3", + "#ff7f00", + ] + adata_zarr_save.obs["donor"] = pd.Categorical( + np.random.choice([f"D{i}" for i in range(10)], 800) + ) + adata_zarr_save.obs["n_counts"] = np.random.randint(1000, 50000, 800) + adata_zarr_save.var["gene_name"] = [f"GENE{i}" for i in range(400)] + adata_zarr_save.obsm["X_umap"] = np.random.randn(800, 2).astype(np.float32) + + # Write to zarr + adata_zarr_save.write_zarr(zarr_path) + + # Read lazily from zarr + zarr_store = zarr.open_group(zarr_path, mode="r") + adata_lazy_zarr = read_lazy(zarr_store) + + sections.append(( + "8d. Lazy AnnData (Zarr Format)", + adata_lazy_zarr._repr_html_(), + "anndata.experimental.read_lazy(zarr_store)
" + "

Lazy AnnData backed by Zarr storage

" + "

" + "The header shows a Lazy (Zarr) badge and the zarr directory path. " + "Zarr is particularly useful for cloud storage (S3, GCS) and parallel access.

" + "

Same lazy behavior as 8b/8c:

" + "
    " + "
  • Category labels loaded on demand (respects max_lazy_categories)
  • " + "
  • Numeric columns show '(lazy)'
  • " + "
  • Arrays show shape/dtype without loading data
  • " + "
" + "

Zarr advantages: chunked storage, cloud-native, " + "supports concurrent reads and consolidated metadata.

", + )) + + except (OSError, ImportError, TypeError) as e: + print(f" Warning: Failed to create zarr lazy example: {e}") + finally: + shutil.rmtree(zarr_path, ignore_errors=True) + else: + print(" 8b. Lazy AnnData (skipped - xarray not installed)") + + # Test 9: Backed AnnData (H5AD file) - demonstrates on-disk safety + print(" 9. Backed AnnData (H5AD file)") + with tempfile.NamedTemporaryFile(suffix=".h5ad", delete=False) as tmp: + tmp_path = tmp.name + adata_backed = None + try: + adata_to_save = AnnData( + sp.random(500, 200, density=0.1, format="csr", dtype=np.float32) + ) + adata_to_save.obs["cluster"] = pd.Categorical( + ["A", "B", "C"] * 166 + ["A", "B"] + ) + adata_to_save.obs["n_counts"] = np.random.randint(1000, 10000, 500) + adata_to_save.var["gene_name"] = [f"gene_{i}" for i in range(200)] + adata_to_save.var["highly_variable"] = np.random.choice([True, False], 200) + adata_to_save.obsm["X_pca"] = np.random.randn(500, 50).astype(np.float32) + adata_to_save.write_h5ad(tmp_path) + adata_backed = ad.read_h5ad(tmp_path, backed="r") + sections.append(( + "9. Backed AnnData (H5AD File)", + adata_backed._repr_html_(), + "File-backed mode via read_h5ad(backed='r')
" + f"{tmp_path}

" + "

Key difference from 8b (lazy): Backed mode loads obs/var " + "DataFrames fully into memory, while lazy mode keeps them as dask-backed xarray.

" + "

What the repr reads:

" + "
    " + "
  • X.shape, X.dtype, X.nnz — from HDF5 attributes
  • " + "
  • obs/var DataFrames — fully loaded in memory
  • " + "
  • obsm/varm shapes — from HDF5 dataset attributes
  • " + "
" + "

What stays on disk:

" + "
    " + "
  • The actual X matrix data (read from the HDF5 file on demand, not loaded into memory)
  • " + "
", + )) + finally: + if adata_backed is not None: + adata_backed.file.close() + Path(tmp_path).unlink() + + # Test 10: Nested AnnData at depth + print(" 10. Deeply nested AnnData") + inner3 = AnnData(np.zeros((3, 2))) + inner2 = AnnData(np.zeros((5, 3))) + inner2.uns["level3"] = inner3 + inner1 = AnnData(np.zeros((10, 5))) + inner1.uns["level2"] = inner2 + outer = AnnData(np.zeros((20, 10))) + outer.uns["level1"] = inner1 + sections.append(( + "10. Deeply Nested AnnData (tests max depth)", + outer._repr_html_(), + "AnnData with 3 levels of nesting in uns (outer → level1 → level2 → level3). " + "Tests the max_depth limit for nested repr. By default, nesting stops at depth 3, " + "so level3 should show as a collapsed entry without further expansion. " + "Click expand arrows to drill into the nested structure.", + )) + + # Test 11: Many categories (tests truncation and wrap button) + # Default max_categories is 100, but we set it to 20 here to test truncation + print(" 11. Many categories (tests category truncation)") + adata_many_cats = AnnData(np.zeros((100, 10))) + # 30 categories - with max_categories=20 should show first 20 + '...+10' + many_cat_values = [f"type_{i}" for i in range(30)] * (100 // 30) + [ + f"type_{i}" for i in range(100 % 30) + ] + adata_many_cats.obs["cell_type"] = pd.Categorical(many_cat_values) + # Add colors for the categories + adata_many_cats.uns["cell_type_colors"] = [ + "#e41a1c", + "#377eb8", + "#4daf4a", + "#984ea3", + "#ff7f00", + "#ffff33", + "#a65628", + "#f781bf", + "#999999", + "#66c2a5", + "#fc8d62", + "#8da0cb", + "#e78ac3", + "#a6d854", + "#ffd92f", + "#e5c494", + "#b3b3b3", + "#1b9e77", + "#d95f02", + "#7570b3", + "#e7298a", + "#66a61e", + "#e6ab02", + "#a6761d", + "#666666", + "#8dd3c7", + "#ffffb3", + "#bebada", + "#fb8072", + "#80b1d3", + ] + # Also add a column with exactly 20 categories + adata_many_cats.obs["batch"] = pd.Categorical([f"batch_{i}" for i in range(20)] * 5) + adata_many_cats.uns["batch_colors"] = [ + 
"#1f77b4", + "#ff7f0e", + "#2ca02c", + "#d62728", + "#9467bd", + "#8c564b", + "#e377c2", + "#7f7f7f", + "#bcbd22", + "#17becf", + "#aec7e8", + "#ffbb78", + "#98df8a", + "#ff9896", + "#c5b0d5", + "#c49c94", + "#f7b6d2", + "#c7c7c7", + "#dbdb8d", + "#9edae5", + ] + # Use lower max_categories (default is 100) to demonstrate truncation + original_max_cats = ad.settings.repr_html_max_categories + ad.settings.repr_html_max_categories = 20 + sections.append(( + "11. Many Categories (tests truncation)", + adata_many_cats._repr_html_(), + "

Category truncation with max_categories=20 (default: 100)

" + "
    " + "
  • cell_type (30 cats): shows first 20 with colors, then '...+10' indicator
  • " + "
  • batch (20 cats): shows all 20 (exactly at limit)
  • " + "
" + "

Click the arrow button to expand and see all categories. " + "The expand button appears only when categories are truncated. " + "Colors are shown for all displayed categories from uns['{col}_colors'].

", + )) + ad.settings.repr_html_max_categories = original_max_cats + + # Test 12: Uns value previews and custom TypeFormatter + print(" 12. Uns value previews and type hints") + + # Register a custom TypeFormatter for tagged data in uns + @register_formatter + class AnalysisHistoryFormatter(TypeFormatter): + """Example TypeFormatter for analysis history data with embedded type hint.""" + + priority = 100 # High priority to check before fallback + + def can_format(self, obj, context): + hint, _ = extract_uns_type_hint(obj) + return hint == "example.history" + + def format(self, obj, context): + import json + + _hint, value = extract_uns_type_hint(obj) + + # Parse JSON if string, otherwise use as-is + if isinstance(value, str): + try: + data = json.loads(value) + except json.JSONDecodeError: + data = {"raw": value} + else: + data = value if isinstance(value, dict) else {"data": value} + + # Build a rich HTML preview + runs = data.get("runs", []) + params = data.get("params", {}) + + html_parts = ['
'] + if runs: + html_parts.append(f"{len(runs)} runs") + if params: + param_str = ", ".join(f"{k}={v}" for k, v in list(params.items())[:3]) + if len(params) > 3: + param_str += "..." + html_parts.append(f" · params: {param_str}") + html_parts.append("
") + + return FormattedOutput( + type_name="analysis history", + preview_html="".join(html_parts), # Use preview_html for inline preview + ) + + adata_uns = AnnData(np.zeros((10, 5))) + # Simple types with previews + adata_uns.uns["string_param"] = "A short string value" + adata_uns.uns["long_string"] = ( + "This is a very long string that should be truncated in the preview because it exceeds the maximum length allowed for display in the meta column" + ) + adata_uns.uns["int_param"] = 42 + adata_uns.uns["float_param"] = 3.14159265359 + adata_uns.uns["bool_param"] = True + adata_uns.uns["none_param"] = None + adata_uns.uns["small_list"] = [1, 2, 3] + adata_uns.uns["small_dict"] = {"a": 1, "b": 2} + adata_uns.uns["larger_dict"] = { + "key1": "val1", + "key2": "val2", + "key3": "val3", + "key4": "val4", + "key5": "val5", + } + + # Type hint WITH registered renderer (shows custom HTML) + adata_uns.uns["analysis_history"] = { + "__anndata_repr__": "example.history", + "runs": [{"id": 1}, {"id": 2}, {"id": 3}], + "params": {"method": "umap", "n_neighbors": 15, "metric": "euclidean"}, + } + + # Type hint WITHOUT registered renderer (shows fallback with import hint) + adata_uns.uns["unregistered_data"] = { + "__anndata_repr__": "otherpackage.custom_type", + "data": {"some": "data", "values": [1, 2, 3]}, + } + # String format type hint (also unregistered) + adata_uns.uns["string_hint"] = ( + "__anndata_repr__:otherpackage.config::{'setting': 'value'}" + ) + + sections.append(( + "12. Uns Value Previews and Type Hints", + adata_uns._repr_html_(), + "

Uns entries with value previews and type hint system

" + "
    " + "
  • Simple types: strings, ints, floats, bools, None show inline previews
  • " + "
  • long_string: truncated with ellipsis when exceeding max length
  • " + "
  • small_list/dict: shows content preview; larger_dict shows key count
  • " + "
  • analysis_history: custom TypeFormatter renders '3 runs · params: ...'
  • " + "
  • unregistered_data: has __anndata_repr__ hint but no formatter → shows 'import X to enable'
  • " + "
" + "

The __anndata_repr__ type hint system allows packages to register " + "custom renderers for their data types stored in uns.

", + )) + + # Test 13: No JavaScript (graceful degradation) + print(" 13. No JavaScript (graceful degradation)") + adata_nojs = AnnData(np.random.randn(30, 15).astype(np.float32)) + adata_nojs.obs["group"] = pd.Categorical(["X", "Y", "Z"] * 10) + adata_nojs.uns["group_colors"] = ["#e41a1c", "#377eb8", "#4daf4a"] + for i in range(8): + adata_nojs.obs[f"metric_{i}"] = np.random.randn(30) + adata_nojs.obsm["X_pca"] = np.random.randn(30, 10).astype(np.float32) + adata_nojs.layers["raw"] = np.random.randn(30, 15).astype(np.float32) + # Add nested AnnData to test native
expand without JS + adata_nojs.uns["nested_adata"] = AnnData( + np.zeros((5, 3)), + obs=pd.DataFrame({"label": ["A", "B", "C", "D", "E"]}), + ) + # Add raw section to test raw rendering without JS + adata_nojs.raw = adata_nojs.copy() + # Add a DataFrame with many columns to test column list wrapping without JS + adata_nojs.obsm["cell_measurements"] = pd.DataFrame( + { + "area": np.random.rand(30) * 500, + "perimeter": np.random.rand(30) * 100, + "circularity": np.random.rand(30), + "eccentricity": np.random.rand(30), + "solidity": np.random.rand(30), + "extent": np.random.rand(30), + "major_axis_length": np.random.rand(30) * 50, + "minor_axis_length": np.random.rand(30) * 30, + "orientation": np.random.rand(30) * 180, + "mean_intensity": np.random.rand(30) * 255, + "max_intensity": np.random.rand(30) * 255, + "min_intensity": np.random.rand(30) * 50, + "std_intensity": np.random.rand(30) * 30, + "centroid_x": np.random.randn(30) * 100, + "centroid_y": np.random.randn(30) * 100, + "bbox_area": np.random.rand(30) * 600, + "convex_area": np.random.rand(30) * 550, + "euler_number": np.random.randint(-2, 3, 30), + "equivalent_diameter": np.random.rand(30) * 25, + "filled_area": np.random.rand(30) * 500, + }, + index=adata_nojs.obs_names, + ) + # Strip script tags to simulate no-JS environment + nojs_html = strip_script_tags(adata_nojs._repr_html_()) + sections.append(( + "13. No JavaScript (graceful degradation)", + nojs_html, + "This example has script tags removed to simulate environments where JS is disabled. " + "All content should be visible, sections should be expanded, category lists and " + "DataFrame column lists should wrap naturally to multiple lines, and interactive buttons " + "(fold icons, copy buttons, search, wrap toggle) should be hidden. " + "The obsm 'cell_measurements' DataFrame has 20 columns to test column list wrapping. 
" + "Includes a nested AnnData in uns and a raw section — both use native <details> " + "for expand/collapse which works without JS.", + )) + + # Test 13b: No CSS (GitHub / untrusted notebook fallback) + # Reuse the full AnnData from test 1 (has nested adata, raw, many sections) + print(" 13b. No CSS (GitHub / untrusted notebook fallback)") + nocss_html = strip_style_and_script_tags(adata_full._repr_html_()) + # Wrap in an iframe (srcdoc) so it's fully isolated from the page's CSS. + # Without isolation, the , , css expressions) + # - Unicode bombs (emoji, CJK, RTL override, Zalgo, null bytes) + # - CRASHING OBJECTS (broken __repr__, __len__, __str__, properties) + # - Circular references (dict containing itself) + # - Infinite-like data (len = 10^18) + # - Huge categoricals (10,000 categories!) + # - Giant strings (50KB+) + # - Deeply nested structures (15+ levels) + print(" 24. Evil AnnData (Adversarial Robustness)") + + # === DEFINE CRASHING/EVIL OBJECT CLASSES === + + class ExplodingRepr: + """Object whose __repr__ crashes.""" + + def __repr__(self): + raise RuntimeError("BOOM! __repr__ exploded") + + class ExplodingLen: + """Object whose __len__ crashes.""" + + def __len__(self): + raise MemoryError("BOOM! __len__ exploded") + + class ExplodingStr: + """Object whose __str__ crashes.""" + + def __str__(self): + raise ValueError("BOOM! 
__str__ exploded") + + def __repr__(self): + return "ExplodingStr(str crashes)" + + class LyingObject: + """Object that lies about all its properties.""" + + @property + def shape(self): + raise AttributeError("I have no shape") + + @property + def dtype(self): + raise AttributeError("I have no dtype") + + def __len__(self): + raise AttributeError("I have no length") + + def __repr__(self): + return "LyingObject(all properties lie)" + + def __str__(self): + raise AttributeError("I have no str") + + class InfiniteLen: + """Object claiming impossibly large length.""" + + def __len__(self): + return 10**18 # 1 quintillion items + + def __repr__(self): + return "InfiniteLen(10^18 items)" + + class ExplodingShape: + """Object whose shape property explodes.""" + + @property + def shape(self): + raise TypeError("BOOM! shape exploded") + + def __repr__(self): + return "ExplodingShape(.shape crashes)" + + class ExplodingDtype: + """Object whose dtype property explodes.""" + + shape = (10, 10) # Normal shape + + @property + def dtype(self): + raise TypeError("BOOM! 
dtype exploded") + + def __repr__(self): + return "ExplodingDtype(.dtype crashes)" + + # === THE EVIL ANNDATA === + adata_evil = ad.AnnData(X=np.random.rand(50, 30).astype(np.float32)) + + # XSS injection attempts (6 variants) + adata_evil.obs["normal_column"] = np.random.choice(["A", "B", "C"], size=50) + adata_evil.obs[''] = np.random.randint(0, 10, size=50) + adata_evil.obs[""] = np.random.rand(50) + adata_evil.obs['onclick="evil()"'] = np.random.rand(50) + adata_evil.obs[""] = np.random.rand(50) + adata_evil.obs["javascript:alert(1)"] = np.random.rand(50) + + # Unicode bombs + adata_evil.obs["emoji_\U0001f4a9_poop"] = np.random.rand(50) + adata_evil.obs["chinese_\u4e2d\u6587"] = pd.Categorical( + np.random.choice(["cat", "dog", "bird"], size=50) + ) + adata_evil.obs["rtl_\u202eEVIL\u202c_override"] = np.random.rand(50) + adata_evil.obs["null\x00byte\x00col"] = np.random.rand(50) + + # XSS in CATEGORY VALUES (not just column names) + # Tests that category preview properly escapes malicious category names + xss_categories = [ + '', + "", + '', + "normal_category", + "
", + "javascript:void(0)", + ] + adata_evil.obs["xss_category_values"] = pd.Categorical( + np.random.choice(xss_categories, size=50), categories=xss_categories + ) + + # HTML/CSS breakout + adata_evil.var["gene_normal"] = [f"gene_{i}" for i in range(30)] + adata_evil.var[""] = np.random.rand(30) + adata_evil.var["
breakout"] = np.random.rand(30) + + # CRASHING OBJECTS in uns + adata_evil.uns["normal"] = {"key": "value", "nested": {"a": 1, "b": 2}} + adata_evil.uns["exploding_repr"] = ExplodingRepr() + adata_evil.uns["exploding_len"] = ExplodingLen() + adata_evil.uns["exploding_str"] = ExplodingStr() + adata_evil.uns["lying_object"] = LyingObject() + adata_evil.uns["infinite_len"] = InfiniteLen() + adata_evil.uns["exploding_shape"] = ExplodingShape() + adata_evil.uns["exploding_dtype"] = ExplodingDtype() + + # XSS VIA EXCEPTION CLASS NAME - tests that error messages escape __name__ + class XSSException(Exception): + pass + + XSSException.__name__ = "" + + class XSSViaException: + """Object whose .shape raises exception with XSS payload in __name__.""" + + @property + def shape(self): + raise XSSException("gotcha") + + adata_evil.uns["xss_via_exception"] = XSSViaException() + + # XSS VIA TYPE NAME - tests that type names are escaped + class XSSViaTypeName: + pass + + XSSViaTypeName.__name__ = "" + adata_evil.uns["xss_via_type_name"] = XSSViaTypeName() + + # UNKNOWN TYPE WARNING (orange text) - object pretending to be from anndata package + class FakeAnndataType: + """Unknown type from anndata package triggers warning (not error).""" + + __module__ = "anndata.experimental.fake" + + def __repr__(self): + return "FakeAnndataType()" + + adata_evil.uns["unknown_anndata_type"] = FakeAnndataType() + + # EVIL README - tests the readme modal with adversarial content. + # README is displayed as plain text via textContent (not innerHTML), + # so none of these vectors can fire. This just verifies the data-readme + # attribute handles edge-case content without breaking HTML structure. 
+ evil_readme = ( + """Evil README - XSS and Injection Test + + + + + + +Unicode: \u202eSIHT DAER\u202c +Null bytes: before\x00after +Emoji bomb: \U0001f480\U0001f480\U0001f480\U0001f480\U0001f480 + +{{constructor.constructor('alert(1)')()}} +${alert('template_literal')} + +Size bomb below (50KB): +""" + + "A" * 50000 + ) + adata_evil.uns["README"] = evil_readme + + # CIRCULAR REFERENCE (dict) + circular_dict = {"level1": {"level2": {}}} + circular_dict["level1"]["level2"]["back_to_start"] = circular_dict + adata_evil.uns["circular_dict"] = circular_dict + + # CIRCULAR REFERENCE (AnnData self-reference) + adata_evil.uns["self_reference"] = adata_evil + + # CIRCULAR REFERENCE (nested AnnData with parent reference) + child_adata = ad.AnnData(np.zeros((5, 5))) + child_adata.uns["parent_ref"] = adata_evil + adata_evil.uns["child_with_parent_ref"] = child_adata + + # DEEPLY NESTED - 15 levels + deeply_nested = {} + current = deeply_nested + for i in range(15): + current[f"level_{i}"] = {} + current = current[f"level_{i}"] + current["bottom"] = "reached the bottom!" + adata_evil.uns["deeply_nested_15_levels"] = deeply_nested + + # XSS in uns keys + adata_evil.uns[""] = "XSS key" + + # GIANT STRING - 50KB + adata_evil.uns["giant_string_50kb"] = "X" * 50_000 + + # MANY ITEMS - 500 items + adata_evil.uns["many_items_500"] = {f"item_{i:04d}": i for i in range(500)} + + # HUGE categorical (10,000 categories!) 
+ huge_cats = [f"category_{i:05d}" for i in range(10000)] + adata_evil.obs["huge_categorical_10k"] = pd.Categorical( + np.random.choice(huge_cats[:50], size=50), categories=huge_cats + ) + + # === MANY ENTRIES IN ONE SECTION === + # Test that sections with many entries are truncated (default max is 200) + # Create one valid entry, then directly populate internal store to skip validation + tiny_sparse = sp.csr_matrix(([1.0], ([0], [0])), shape=(30, 30)) + adata_evil.varp["varp_000"] = tiny_sparse # One valid entry to initialize + # Directly add to internal store (bypasses slow validation) + for i in range(1, 300): + adata_evil.varp._data[f"varp_{i:03d}"] = tiny_sparse + + # BAD COLORS - various malformed color arrays + # Too many colors (more than categories) + adata_evil.obs["cat_too_many_colors"] = pd.Categorical( + np.random.choice(["A", "B", "C"], size=50) + ) + adata_evil.uns["cat_too_many_colors_colors"] = [ + "red", + "green", + "blue", + "yellow", + "purple", + "orange", + ] + + # Too few colors (fewer than categories) + adata_evil.obs["cat_too_few_colors"] = pd.Categorical( + np.random.choice(["X", "Y", "Z", "W"], size=50) + ) + adata_evil.uns["cat_too_few_colors_colors"] = ["red"] + + # Non-colors (invalid color strings) + adata_evil.obs["cat_bad_colors"] = pd.Categorical( + np.random.choice(["alpha", "beta"], size=50) + ) + adata_evil.uns["cat_bad_colors_colors"] = ["not_a_color", "also_invalid"] + + # Strange formats + adata_evil.obs["cat_strange_colors"] = pd.Categorical( + np.random.choice(["one", "two", "three"], size=50) + ) + adata_evil.uns["cat_strange_colors_colors"] = [ + "#FF0000", # Valid hex + "rgb(0,255,0)", # Valid RGB + "rgba(0,0,255,0.5)", # Valid RGBA + ] + + # Empty colors array + adata_evil.obs["cat_empty_colors"] = pd.Categorical( + np.random.choice(["p", "q"], size=50) + ) + adata_evil.uns["cat_empty_colors_colors"] = [] + + # CSS injection attempts (must be blocked by whitelist) + adata_evil.obs["cat_css_injection"] = pd.Categorical( 
+ np.random.choice(["x", "y", "z"], size=50) + ) + adata_evil.uns["cat_css_injection_colors"] = [ + "#ff0000", # Valid (should render) + "blue; } .adata-table { display:none } .x {", # CSS injection + "red; background-image: url(https://evil.com/steal)", # Data exfil + ] + + # URL/expression injection (must be blocked) + adata_evil.obs["cat_url_injection"] = pd.Categorical( + np.random.choice(["a", "b"], size=50) + ) + adata_evil.uns["cat_url_injection_colors"] = [ + "url(https://evil.com/track)", # URL injection + "expression(alert(1))", # IE expression injection + ] + + # Very long color strings (DoS protection) + adata_evil.obs["cat_long_colors"] = pd.Categorical( + np.random.choice(["m", "n"], size=50) + ) + adata_evil.uns["cat_long_colors_colors"] = [ + "red" + "x" * 1000, # Very long string + "blue", + ] + + # NESTED ANNDATA WITH ERRORS (should show yellow/red rows in nested content) + nested_with_errors = ad.AnnData(np.zeros((10, 10))) + nested_with_errors.uns["bad_obj_in_nested"] = ExplodingRepr() + nested_with_errors.uns["another_bad"] = LyingObject() + adata_evil.uns["nested_adata_with_errors"] = nested_with_errors + + # OBJECT WITH VERY LONG ERROR MESSAGE (in uns to test truncation) + class VeryLongErrorObject: + """Object that produces a very long error message.""" + + @property + def shape(self): + raise TypeError( + "This is a VERY LONG ERROR MESSAGE that should be properly truncated. " + * 10 + + "It contains lots of details about what went wrong: " + + "ValueError: The input array has shape (100, 200, 300) but expected (50, 100). " + + "Additional context: This error occurred while processing the data matrix. " + + "Stack trace would go here with many lines of debugging information. 
" + * 5 + ) + + def __repr__(self): + return "VeryLongErrorObject(produces long error)" + + adata_evil.uns["long_error_object_uns"] = VeryLongErrorObject() + + # SVG XSS - SVG elements with scripts (must be escaped) + adata_evil.uns["svg_script"] = "" + adata_evil.uns["svg_onload"] = '' + + # MUTATION XSS (mXSS) - malformed HTML that could mutate during parsing + adata_evil.uns["mxss_unclosed"] = "alert("col")': np.random.rand(50), + "": np.random.rand(50), + "normal_col": np.random.rand(50), + }, + index=adata_evil.obs_names, + ) + adata_evil.obsm["X_evil_df_cols"] = evil_df + + # Standard sections to show they still work + adata_evil.obsm["X_pca"] = np.random.rand(50, 10) + adata_evil.obsm["X_umap"] = np.random.rand(50, 2) + adata_evil.layers["raw_counts"] = np.random.randint(0, 100, (50, 30)) + adata_evil.obsp["connectivities"] = sp.random(50, 50, density=0.1, format="csr") + + # Suppress warnings from crashing objects during repr generation + # (The warnings are expected - we're testing that repr handles them gracefully) + with warnings.catch_warnings(): + warnings.simplefilter("ignore") + evil_html = adata_evil._repr_html_() + + sections.append(( + "24. Evil AnnData - Adversarial Robustness", + evil_html, + "Comprehensive adversarial testing: XSS injection, HTML/CSS breakout, " + "Unicode bombs, crashing objects, circular references, size bombs, SVG XSS, " + "mutation XSS, and encoding attacks.

" + "Crashing objects in uns (errors shown in red in preview column):
" + "
    " + "
  • exploding_repr - __repr__ raises RuntimeError
  • " + "
  • exploding_len - __len__ raises MemoryError
  • " + "
  • exploding_str - __str__ raises ValueError
  • " + "
  • lying_object - shape/dtype/len/str all raise AttributeError
  • " + "
  • infinite_len - len() returns 10^18 (suspicious)
  • " + "
  • exploding_shape - .shape property raises TypeError
  • " + "
  • exploding_dtype - .dtype property raises TypeError
  • " + "
  • xss_via_exception - exception with XSS in __name__ (must be escaped)
  • " + "
  • xss_via_type_name - type with XSS in __name__ (must be escaped)
  • " + "
  • long_error_object_uns - very long error message (should truncate)
  • " + "
  • unknown_anndata_type - unknown type warning shown in orange
  • " + "
" + "Evil README (click icon to open modal):
" + "
    " + "
  • README is displayed as plain text via textContent, so no vectors can fire
  • " + "
  • Contains: script tags, event handlers, style injection, closing tags
  • " + "
  • Unicode: RTL override, null bytes, emoji
  • " + "
  • Template injection attempts, 50KB size bomb
  • " + "
" + "Crashing object in varm:
" + "
    " + "
  • long_error_object - very long error message (should truncate)
  • " + "
" + "Circular references:
" + "
    " + "
  • circular_dict - dict that contains itself
  • " + "
  • self_reference - AnnData references itself
  • " + "
  • child_with_parent_ref - nested AnnData with circular parent reference
  • " + "
" + "Extreme nesting:
" + "
  • deeply_nested_15_levels - 15 levels of nested dicts
" + "XSS injection in obs column names:
" + "
    " + "
  • <script>alert('XSS')</script>
  • " + "
  • <img onerror=alert(1)>
  • " + "
  • onclick='evil()'
  • " + "
  • <svg onload=alert(1)>
  • " + "
  • javascript:alert(1)
  • " + "
" + "XSS in uns keys:
" + "
  • <script>evil()</script>
" + "XSS in category VALUES (not just column names):
" + "
    " + "
  • xss_category_values - category names contain XSS payloads
  • " + "
  • Tests that category preview escapes: script, img onerror, svg onload, onclick
  • " + "
" + "XSS in DataFrame COLUMN NAMES (obsm preview):
" + "
    " + "
  • X_evil_df_cols - DataFrame column names contain XSS payloads
  • " + "
  • Tests that obsm DataFrame column preview escapes malicious column names
  • " + "
" + "HTML/CSS breakout in var:
" + "
    " + "
  • </style><script>bad()</script>
  • " + "
  • </div></div></div>breakout
  • " + "
" + "Unicode stress in obs:
" + "
    " + "
  • emoji_poop - emoji character
  • " + "
  • chinese_中文 - CJK characters
  • " + "
  • rtl_LIVE_override - RTL override
  • " + "
  • null\\x00byte\\x00col - null bytes
  • " + "
" + "Size bombs:
" + "
    " + "
  • huge_categorical_10k - 10,000 categories in obs
  • " + "
  • giant_string_50kb - 50KB string in uns
  • " + "
  • many_items_500 - dict with 500 items in uns
  • " + "
" + "Many entries in varp (tests truncation):
" + "
    " + "
  • varp_000 to varp_299 - 300 arrays (truncated to 200)
  • " + "
" + "Bad colors in obs (with _colors in uns):
" + "
    " + "
  • cat_too_many_colors - 6 colors for 3 categories
  • " + "
  • cat_too_few_colors - 1 color for 4 categories
  • " + "
  • cat_bad_colors - invalid color strings
  • " + "
  • cat_strange_colors - hex, rgb(), rgba() formats
  • " + "
  • cat_empty_colors - empty colors array
  • " + "
  • cat_css_injection - CSS injection via semicolons (blocked by whitelist)
  • " + "
  • cat_url_injection - url()/expression() injection (blocked)
  • " + "
  • cat_long_colors - very long strings (DoS protection, blocked)
  • " + "
" + "Nested AnnData with errors:
" + "
    " + "
  • nested_adata_with_errors - contains bad_obj_in_nested and another_bad
  • " + "
  • Should show yellow/red row backgrounds in nested content
  • " + "
" + "SVG XSS (must be escaped):
" + "
    " + "
  • svg_script - SVG with embedded script tag
  • " + "
  • svg_onload - SVG with onload handler
  • " + "
" + "Mutation XSS (malformed HTML):
" + "
    " + "
  • mxss_unclosed - unclosed img tag with onerror
  • " + "
  • mxss_nested - malformed nested tag
  • " + "
" + "Encoding attacks:
" + "
    " + "
  • utf7_script - UTF-7 encoded script tag
  • " + "
  • bom_prefix - BOM character prefix
  • " + "
" + "Expected behavior:
" + "
    " + "
  • Errors shown in red in preview column
  • " + "
  • Warnings shown in orange in preview column
  • " + "
  • Warning/error rows have colored backgrounds (yellow/red)
  • " + "
  • All XSS attempts are escaped (no script execution)
  • " + "
  • Large data is truncated
  • " + "
  • No crashes
  • " + "
", + )) + + # Test 25: Ecosystem Package Extensibility - Modifying Known Sections + # This demonstrates how external packages (bionty, lamindb, cellxgene, etc.) + # can customize how data in obs/var columns is rendered by: + # 1. Storing semantic metadata in uns + # 2. Registering a TypeFormatter with sections=("obs", "var") and higher priority + # 3. Using FormatterContext.adata_ref and key to look up metadata + # + # See also: LAMINDB_COMPARISON_REPORT.md for detailed analysis of this pattern + print(" 25. Ecosystem Package Extensibility (Customizing obs/var columns)") + + # === STEP 1: Define a convention for storing semantic metadata in uns === + # This is what a package like bionty or lamindb would store when annotating data + ONTOLOGY_METADATA_KEY = "__ontology_annotations__" + + # === STEP 2: Create a TypeFormatter that reads this metadata === + # This would be in the ecosystem package (e.g., bionty/_repr.py) + + @register_formatter + class OntologyAnnotatedCategoricalFormatter(TypeFormatter): + """ + Example TypeFormatter for columns annotated with ontology metadata. + + This demonstrates how ecosystem packages can enhance the HTML repr for + categorical columns in obs/var by: + 1. Checking if the column has ontology metadata in uns + 2. Rendering enhanced type info (registry name, validation status) + 3. Adding tooltips with ontology IDs + + To use this pattern in your package: + 1. Define a metadata convention (e.g., uns["__mypackage_annotations__"]) + 2. Create a TypeFormatter with sections=("obs", "var") + 3. Use context.adata_ref and context.key to look up metadata + + See: src/anndata/_repr/registry.py for TypeFormatter API + """ + + priority = 115 # Higher than CategoricalFormatter (110) + sections = ("obs", "var") # Only apply to obs/var columns + + def can_format(self, obj, context) -> bool: + """ + Check if this column has ontology metadata. 
+ + The context parameter provides access to: + - context.adata_ref: reference to root AnnData for uns lookups + - context.key: current entry key being formatted + - context.section: current section ("obs", "var", etc.) + """ + # Basic type check - is this a categorical? + if not (isinstance(obj, pd.Series) and hasattr(obj, "cat")): + return False + + # Context check - do we have the info needed to look up metadata? + if context.adata_ref is None or context.key is None: + return False + + # Check if this column has ontology annotation + annotations = context.adata_ref.uns.get(ONTOLOGY_METADATA_KEY, {}) + section_annotations = annotations.get(context.section, {}) + return context.key in section_annotations + + def format(self, obj, context): + """ + Render the categorical with ontology information. + + This produces a FormattedOutput with: + - type_name: "category[registry] (n)" instead of just "category (n)" + - preview_html: Category values with validation indicators + - tooltip: Shows ontology ID and validation status + - warnings: If unmapped values exist + """ + # Get ontology metadata for this column + annotations = context.adata_ref.uns[ONTOLOGY_METADATA_KEY] + col_info = annotations[context.section][context.key] + + registry = col_info.get("registry", "unknown") + ontology_id = col_info.get("ontology_id", "") + validated = col_info.get("validated", True) + unmapped_count = col_info.get("unmapped_count", 0) + + # Build enhanced type name with registry + n_cats = len(obj.cat.categories) + type_name = f"category[{registry}] ({n_cats})" + + # Build preview with validation status + categories = list(obj.cat.categories[:5]) + if validated: + # All values mapped - show green checkmark + cat_html = ", ".join( + f'{escape_html(str(c))}' + for c in categories + ) + if n_cats > 5: + cat_html += f' ...+{n_cats - 5}' + cat_html += ' ' + else: + # Some unmapped values - show warning + cat_html = ", ".join( + f'{escape_html(str(c))}' + for c in categories + ) + if n_cats > 
5: + cat_html += f' ...+{n_cats - 5}' + cat_html += f' ⚠ {unmapped_count} unmapped' + + # Build tooltip with full metadata + tooltip_parts = [f"Registry: {registry}"] + if ontology_id: + tooltip_parts.append(f"Ontology: {ontology_id}") + tooltip_parts.append(f"Validated: {'Yes' if validated else 'No'}") + if not validated: + tooltip_parts.append(f"Unmapped: {unmapped_count} values") + + return FormattedOutput( + type_name=type_name, + css_class="anndata-dtype--category", + tooltip="\n".join(tooltip_parts), + preview_html=cat_html, + warnings=[] + if validated + else [f"{unmapped_count} values not mapped to ontology"], + ) + + # === STEP 3: Create test AnnData with ontology-annotated columns === + adata_ontology = AnnData( + np.random.randn(100, 50).astype(np.float32), + obs=pd.DataFrame({ + # Fully validated cell types + "cell_type": pd.Categorical( + np.random.choice(["T cell", "B cell", "NK cell", "Monocyte"], 100) + ), + # Partially validated tissue types (some unmapped) + "tissue": pd.Categorical( + np.random.choice(["blood", "spleen", "lymph_node", "bone_marrow"], 100) + ), + # Assay with ontology + "assay": pd.Categorical(np.random.choice(["10x 3' v3", "Smart-seq2"], 100)), + # Regular categorical (no ontology annotation) + "batch": pd.Categorical( + np.random.choice(["batch_1", "batch_2", "batch_3"], 100) + ), + # Non-categorical column + "n_counts": np.random.randint(1000, 10000, 100), + }), + var=pd.DataFrame({ + # Gene annotations with Ensembl + "gene_symbol": pd.Categorical([f"GENE{i}" for i in range(50)]), + # Regular numeric column + "mean_expression": np.random.randn(50).astype(np.float32), + }), + ) + + # Add ontology metadata to uns (this is what bionty/lamindb would do) + adata_ontology.uns[ONTOLOGY_METADATA_KEY] = { + "obs": { + "cell_type": { + "registry": "bionty.CellType", + "ontology_id": "cl", + "validated": True, + "unmapped_count": 0, + }, + "tissue": { + "registry": "bionty.Tissue", + "ontology_id": "uberon", + "validated": False, # Some 
values not in ontology + "unmapped_count": 2, + }, + "assay": { + "registry": "bionty.ExperimentalFactor", + "ontology_id": "efo", + "validated": True, + "unmapped_count": 0, + }, + # Note: "batch" is NOT in this dict, so it uses default formatting + }, + "var": { + "gene_symbol": { + "registry": "bionty.Gene", + "ontology_id": "ensembl", + "validated": True, + "unmapped_count": 0, + }, + }, + } + + # Add some standard sections to show they still work + adata_ontology.obsm["X_pca"] = np.random.randn(100, 10).astype(np.float32) + adata_ontology.obsm["X_umap"] = np.random.randn(100, 2).astype(np.float32) + adata_ontology.uns["cell_type_colors"] = [ + "#e41a1c", + "#377eb8", + "#4daf4a", + "#984ea3", + ] + + sections.append(( + "25. Ecosystem Package Extensibility (obs/var customization)", + adata_ontology._repr_html_(), + """

Demonstrates how external packages can customize + rendering of data in known sections (obs, var)

+ +

This example shows the pattern used by ecosystem packages + like bionty or + lamindb to add semantic + annotations to AnnData columns.

+ +

How it works:

+
    +
  1. Store metadata in uns: uns["__ontology_annotations__"] contains registry info per column
  2. Register TypeFormatter: with sections=("obs", "var") and priority=115 (higher than default CategoricalFormatter at 110)
  3. Use can_format(obj, context): to check if the entry has metadata via context.adata_ref and context.key
  4. Render enhanced output: type shows registry, preview shows validation status
+ +

Columns with ontology annotations:

+
    +
  • cell_type: category[bionty.CellType] - fully validated ✓
  • tissue: category[bionty.Tissue] - 2 unmapped values ⚠
  • assay: category[bionty.ExperimentalFactor] - validated ✓
  • gene_symbol (var): category[bionty.Gene] - validated ✓
+ +

Columns without annotations (default rendering):

+
    +
  • batch: regular category (3) - no metadata in uns
  • n_counts: regular int64 - not categorical
  • mean_expression (var): regular float32
+ +

Key API points:

+
    +
  • TypeFormatter.sections: restrict to specific sections
  • TypeFormatter.priority: higher priority overrides default formatters
  • can_format(obj, context): receives full context for metadata lookups
  • context.adata_ref: reference to root AnnData for uns lookups
  • context.key: current entry key being formatted
  • context.section: current section ("obs", "var", etc.)
+ +

See also:

+
    +
  • src/anndata/_repr/registry.py: TypeFormatter API and FormatterContext
  • Test 12: Uns type hints (similar pattern for uns entries)
  • Test 14: TreeData custom sections (SectionFormatter pattern)
+ +

Hover over annotated columns to see the tooltip with full metadata.

+ """, + )) + + # Test 26: Array-API arrays with device info + # Uses mock objects — no GPU or JAX installation required + print(" 26. Array-API arrays with device info") + + def _make_visual_array_api_mock(module, *, shape, dtype, device="cpu"): + """Create a mock array satisfying the SupportsArrayApi protocol.""" + ns_module = type("Namespace", (), {"__name__": module.split(".")[0]})() + cls = type( + "MockArrayAPI", + (), + { + "shape": shape, + "dtype": dtype, + "ndim": len(shape), + "size": int(np.prod(shape)), + "device": device, + "__array_namespace__": lambda self, **kw: ns_module, + "to_device": lambda self, dev, /, **kw: self, + "__dlpack__": lambda self, **kw: None, + "__dlpack_device__": lambda self: (1, 0), + }, + ) + cls.__module__ = module + return cls() + + n_obs_api, n_vars_api = 100, 50 + adata_arrayapi = AnnData( + np.random.randn(n_obs_api, n_vars_api).astype(np.float32), + obs=pd.DataFrame( + {"cell_type": pd.Categorical(["T cell", "B cell"] * (n_obs_api // 2))}, + index=[f"cell_{i}" for i in range(n_obs_api)], + ), + var=pd.DataFrame( + {"gene_name": [f"gene_{i}" for i in range(n_vars_api)]}, + index=[f"gene_{i}" for i in range(n_vars_api)], + ), + ) + # JAX array on GPU in obsm + adata_arrayapi.obsm["X_jax_gpu"] = _make_visual_array_api_mock( + "jax.numpy", shape=(n_obs_api, 30), dtype=np.dtype("float32"), device="cuda:0" + ) + # JAX array on TPU in obsm + adata_arrayapi.obsm["X_jax_tpu"] = _make_visual_array_api_mock( + "jax.numpy", shape=(n_obs_api, 10), dtype=np.dtype("float16"), device="tpu:0" + ) + # JAX array on CPU in obsm (device should still show) + adata_arrayapi.obsm["X_jax_cpu"] = _make_visual_array_api_mock( + "jax.numpy", shape=(n_obs_api, 50), dtype=np.dtype("float64"), device="cpu" + ) + + # CuPy-like array on GPU (handled by ArrayAPIFormatter with GPU-green styling) + class _MockGPUDevice: + id = 0 + + cupy_mock = _make_visual_array_api_mock( + "cupy._core.core", + shape=(n_obs_api, 20), + dtype=np.dtype("float32"), + 
device=_MockGPUDevice(), + ) + adata_arrayapi.obsm["X_cupy_gpu"] = cupy_mock + # Array-API array in uns + adata_arrayapi.uns["gpu_embedding"] = _make_visual_array_api_mock( + "jax.numpy", shape=(20, 5), dtype=np.dtype("float32"), device="cuda:1" + ) + + sections.append(( + "26. Array-API Arrays with Device Info", + adata_arrayapi._repr_html_(), + "

Demonstrates Array-API formatted arrays with device " + "info visible inline (no hover needed).

" + "

Uses mock objects that satisfy the " + "SupportsArrayApi protocol — no GPU or JAX installation required.

" + "
    " + "
  • X_jax_gpu: MockArrayAPI on cuda:0
  • " + "
  • X_jax_tpu: MockArrayAPI on tpu:0
  • " + "
  • X_jax_cpu: MockArrayAPI on cpu
  • " + "
  • X_cupy_gpu: CuPy-like on GPU:0
  • " + "
  • uns['gpu_embedding']: MockArrayAPI on cuda:1
  • " + "
" + "

Device appears as dtype · device in the type " + "column, visible without hovering.

", + )) + + # Generate HTML file + output_path = Path(__file__).parent / "repr_html_visual_test.html" + html_content = create_html_page(sections) + # Explicit UTF-8: the page contains emoji, CJK and other non-ASCII test data + output_path.write_text(html_content, encoding="utf-8") + + print(f"\nVisual test file generated: {output_path}") + print("Open this file in a browser to inspect the HTML representation.") + + +if __name__ == "__main__": + main()
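
Reviewer note on the "bad colors" cases in Test 24: the comments say CSS injection, url()/expression() payloads and overlong strings "must be blocked by whitelist". The sketch below shows one way such a whitelist check could look. It is not the implementation in this PR: `is_safe_css_color`, `NAMED_COLORS` and `MAX_COLOR_LENGTH` are hypothetical names, the named-color set is a small illustrative subset, and the 50-character cap is an assumed value.

```python
import re

# Hypothetical subset of CSS named colors; a real whitelist would be longer.
NAMED_COLORS = frozenset(
    {"red", "green", "blue", "yellow", "purple", "orange", "black", "white"}
)

# Assumed cap on string length, so very long inputs are rejected cheaply.
MAX_COLOR_LENGTH = 50

# #rgb, #rgba, #rrggbb or #rrggbbaa
_HEX_RE = re.compile(r"#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})")

# rgb(r, g, b) or rgba(r, g, b, a); channel values are not range-checked here.
_RGB_RE = re.compile(
    r"rgba?\(\s*\d{1,3}\s*,\s*\d{1,3}\s*,\s*\d{1,3}\s*"
    r"(?:,\s*(?:0|1|0?\.\d+|1\.0+)\s*)?\)"
)


def is_safe_css_color(value: object) -> bool:
    """Return True only if value matches one of the whitelisted color forms."""
    if not isinstance(value, str) or len(value) > MAX_COLOR_LENGTH:
        return False
    candidate = value.strip()
    if candidate.lower() in NAMED_COLORS:
        return True
    # fullmatch ensures nothing (e.g. "; } .x {") can trail a valid prefix.
    return bool(_HEX_RE.fullmatch(candidate) or _RGB_RE.fullmatch(candidate))
```

Because the check accepts only complete matches of known forms, anything containing semicolons, braces, `url(` or `expression(` is rejected without needing a blacklist. A caller would typically fall back to a neutral default color when the check fails.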