torch.AcceleratorError in surya layout encoder on MPS (Apple Silicon) with 66-page PDF #993

@DanielRamosAcosta

🧨 Describe the Bug

marker_single crashes with torch.AcceleratorError: index 4755 is out of bounds: 0, range 0 to 1 during the layout recognition phase when processing a 66-page PDF on Apple Silicon (MPS backend). The crash is reproducible and consistently occurs between pages 14 and 16 (the progress bar in the trace below stops at 16/66).

📄 Input Document

The PDF that triggers the crash is publicly available:
https://www.defensordelpueblo.es/wp-content/uploads/2025/03/Defensor-del-Pueblo_Informe-anual-2024.pdf

(2.7 MB, 66 pages — Spanish Ombudsman Annual Report 2024)

📤 Output Trace / Stack Trace

2026-03-01 09:40:26,850 [WARNING] surya: `TableRecEncoderDecoderModel` is not compatible with mps backend. Defaulting to cpu instead
Recognizing Layout:  24%|██▍       | 16/66 [01:12<03:47,  4.55s/it]
Traceback (most recent call last):
  File "marker/scripts/convert_single.py", line 38, in convert_single_cli
    rendered = converter(fpath)
  File "marker/converters/pdf.py", line 195, in __call__
    document = self.build_document(temp_path)
  File "marker/converters/pdf.py", line 182, in build_document
    document = DocumentBuilder(self.config)(provider, layout_builder, line_builder, ocr_builder)
  File "marker/builders/document.py", line 33, in __call__
    layout_builder(document, provider)
  File "marker/builders/layout.py", line 56, in __call__
    layout_results = self.surya_layout(document.pages)
  File "marker/builders/layout.py", line 88, in surya_layout
    layout_results = self.layout_model(
        [p.get_image(highres=False) for p in pages],
        batch_size=int(self.get_batch_size()),
    )
  File "surya/layout/__init__.py", line 51, in __call__
    self.foundation_predictor.prediction_loop(...)
  File "surya/foundation/__init__.py", line 780, in prediction_loop
    updated_inputs, outputs, merge_idxs = self.prefill(
        current_inputs, max_lookahead_tokens=0
    )
  File "surya/foundation/__init__.py", line 556, in prefill
    image_embeddings = self.model.get_image_embeddings(
        pixel_values=image_tiles, ...
    )
  File "surya/common/surya/__init__.py", line 258, in get_image_embeddings
    chunk_embeddings = self.vision_encoder.embed_images(
        image_batch=chunk_pixels.unsqueeze(0).to(device=self.device),
        grid_thw=chunk_grid_thw.unsqueeze(0).to(device=self.device),
    )
  File "surya/common/surya/encoder/__init__.py", line 798, in embed_images
    return super().forward(hidden_states=image_batch, grid_thw=grid_thw)
  File "surya/common/surya/encoder/__init__.py", line 765, in forward
    hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, position_embeddings=position_embeddings)
  File "surya/common/surya/encoder/__init__.py", line 595, in forward
    hidden_states = hidden_states + self.attn(self.norm1(hidden_states), ...)
  File "surya/common/surya/encoder/__init__.py", line 544, in forward
    self.unpack_qkv_with_mask(q, k, v, cu_seqlens)
  File "surya/common/surya/encoder/__init__.py", line 438, in unpack_qkv_with_mask
    max_seq_len = seq_lengths.max().item()
torch.AcceleratorError: index 4755 is out of bounds: 0, range 0 to 1

⚙️ Environment

  • Marker version: 1.10.2
  • Surya version: 0.17.1
  • Python version: 3.13.5
  • PyTorch version: 2.10.0
  • Transformers version: 4.57.6
  • Operating System: macOS 15.5 (Darwin 24.6.0), Apple Silicon (MPS)

✅ Expected Behavior

The PDF should convert to Markdown without crashing. A shorter 29-page PDF from the same source (the Anexo version) converts successfully with the same setup.

📟 Command or Code Used

curl -sL -o /tmp/test.pdf "https://www.defensordelpueblo.es/wp-content/uploads/2025/03/Defensor-del-Pueblo_Informe-anual-2024.pdf"
marker_single /tmp/test.pdf --output_format markdown --disable_image_extraction --output_dir /tmp/output
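
To check whether the crash is specific to the MPS backend, the same command could be rerun with the device forced to CPU. The sketch below is only an illustration: it assumes that marker honours a `TORCH_DEVICE` environment variable as a device override (worth verifying against the installed version), and the helper names `build_command`, `cpu_env`, and `run_on_cpu` are hypothetical, not part of marker or surya.

```python
import os
import subprocess
from typing import Mapping, Optional


def build_command(pdf_path: str, output_dir: str) -> list[str]:
    """Return the marker_single invocation from this report as an argv list."""
    return [
        "marker_single", pdf_path,
        "--output_format", "markdown",
        "--disable_image_extraction",
        "--output_dir", output_dir,
    ]


def cpu_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and set TORCH_DEVICE=cpu.

    Assumption: marker/surya read TORCH_DEVICE to choose the torch device.
    """
    env = dict(os.environ if base is None else base)
    env["TORCH_DEVICE"] = "cpu"
    return env


def run_on_cpu(pdf_path: str = "/tmp/test.pdf",
               output_dir: str = "/tmp/output") -> int:
    """Run the conversion on CPU and return the process exit code."""
    result = subprocess.run(build_command(pdf_path, output_dir), env=cpu_env())
    return result.returncode
```

Calling `run_on_cpu()` and getting exit code 0 would suggest that the failure is tied to MPS rather than to the document itself. A CPU run is expected to be considerably slower.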

📎 Additional Context

  • The error is reproducible across multiple runs, always crashing at the same point (~page 14–16 of 66).
  • surya logs a warning at startup: TableRecEncoderDecoderModel is not compatible with mps backend. Defaulting to cpu instead — but the layout model still runs on MPS and crashes.
  • The traceback ends in the vision encoder's attention code (unpack_qkv_with_mask, at seq_lengths.max().item()), so the crash is likely triggered by a specific page layout (for example a complex table or chart) that pushes a tensor index out of bounds there.
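
One way to test the "specific page" hypothesis above is to bisect the page range. The sketch below is a rough illustration under several assumptions: that `marker_single` accepts a `--page_range` option with 0-based, inclusive ranges such as `0-15`; that a crash yields a non-zero exit code; and that a crashing range keeps crashing when more pages are appended to it. The helper names are hypothetical. Because pages are processed in batches, the result may also depend on batch composition, so it should be read as a hint rather than proof.

```python
import subprocess
from typing import Callable, Optional


def page_range_args(first: int, last: int) -> list[str]:
    """Argv fragment restricting a run to pages first..last (assumed 0-based, inclusive)."""
    return ["--page_range", f"{first}-{last}"]


def crashes_with_marker(first: int, last: int) -> bool:
    """Run marker_single on the given pages; True if it exits with an error."""
    cmd = [
        "marker_single", "/tmp/test.pdf",
        "--output_format", "markdown",
        "--disable_image_extraction",
        "--output_dir", "/tmp/output",
        *page_range_args(first, last),
    ]
    return subprocess.run(cmd).returncode != 0


def find_first_failing_page(
    crashes: Callable[[int, int], bool], first: int, last: int
) -> Optional[int]:
    """Binary-search the smallest page p such that the range first..p crashes.

    Returns None if the full range first..last does not crash at all.
    """
    if not crashes(first, last):
        return None
    lo, hi = first, last
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(first, mid):
            hi = mid       # the failing page is at or before mid
        else:
            lo = mid + 1   # the failing page is after mid
    return lo
```

For the 66-page PDF in this report, `find_first_failing_page(crashes_with_marker, 0, 65)` would need at most about eight conversion runs, and the page it returns could then be extracted and attached as a minimal reproduction.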
