🧨 Describe the Bug
`marker_single` crashes with `torch.AcceleratorError: index 4755 is out of bounds: 0, range 0 to 1` during the layout recognition phase when processing a 66-page PDF on Apple Silicon (MPS backend). The error is reproducible and occurs consistently around pages 14–16.
📄 Input Document
The PDF that triggers the crash is publicly available:
https://www.defensordelpueblo.es/wp-content/uploads/2025/03/Defensor-del-Pueblo_Informe-anual-2024.pdf
(2.7 MB, 66 pages — Spanish Ombudsman Annual Report 2024)
📤 Output Trace / Stack Trace
```
2026-03-01 09:40:26,850 [WARNING] surya: `TableRecEncoderDecoderModel` is not compatible with mps backend. Defaulting to cpu instead
Recognizing Layout: 24%|██▍ | 16/66 [01:12<03:47, 4.55s/it]
Traceback (most recent call last):
  File "marker/scripts/convert_single.py", line 38, in convert_single_cli
    rendered = converter(fpath)
  File "marker/converters/pdf.py", line 195, in __call__
    document = self.build_document(temp_path)
  File "marker/converters/pdf.py", line 182, in build_document
    document = DocumentBuilder(self.config)(provider, layout_builder, line_builder, ocr_builder)
  File "marker/builders/document.py", line 33, in __call__
    layout_builder(document, provider)
  File "marker/builders/layout.py", line 56, in __call__
    layout_results = self.surya_layout(document.pages)
  File "marker/builders/layout.py", line 88, in surya_layout
    layout_results = self.layout_model(
        [p.get_image(highres=False) for p in pages],
        batch_size=int(self.get_batch_size()),
    )
  File "surya/layout/__init__.py", line 51, in __call__
    self.foundation_predictor.prediction_loop(...)
  File "surya/foundation/__init__.py", line 780, in prediction_loop
    updated_inputs, outputs, merge_idxs = self.prefill(
        current_inputs, max_lookahead_tokens=0
    )
  File "surya/foundation/__init__.py", line 556, in prefill
    image_embeddings = self.model.get_image_embeddings(
        pixel_values=image_tiles, ...
    )
  File "surya/common/surya/__init__.py", line 258, in get_image_embeddings
    chunk_embeddings = self.vision_encoder.embed_images(
        image_batch=chunk_pixels.unsqueeze(0).to(device=self.device),
        grid_thw=chunk_grid_thw.unsqueeze(0).to(device=self.device),
    )
  File "surya/common/surya/encoder/__init__.py", line 798, in embed_images
    return super().forward(hidden_states=image_batch, grid_thw=grid_thw)
  File "surya/common/surya/encoder/__init__.py", line 765, in forward
    hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, position_embeddings=position_embeddings)
  File "surya/common/surya/encoder/__init__.py", line 595, in forward
    hidden_states = hidden_states + self.attn(self.norm1(hidden_states), ...)
  File "surya/common/surya/encoder/__init__.py", line 544, in forward
    self.unpack_qkv_with_mask(q, k, v, cu_seqlens)
  File "surya/common/surya/encoder/__init__.py", line 438, in unpack_qkv_with_mask
    max_seq_len = seq_lengths.max().item()
torch.AcceleratorError: index 4755 is out of bounds: 0, range 0 to 1
```
⚙️ Environment
- Marker version: 1.10.2
- Surya version: 0.17.1
- Python version: 3.13.5
- PyTorch version: 2.10.0
- Transformers version: 4.57.6
- Operating System: macOS 15.5 (Darwin 24.6.0), Apple Silicon (MPS)
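The version details above can be gathered with a short script like the one below. This is a minimal sketch, not a marker or surya tool. It assumes the PyPI distribution names `marker-pdf`, `surya-ocr`, `torch` and `transformers`, and it reports MPS availability only when PyTorch is importable.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the packages listed in the Environment section
PACKAGES = ["marker-pdf", "surya-ocr", "torch", "transformers"]


def collect_environment():
    """Return a dict of Python/OS details and installed package versions."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),  # e.g. "arm64" on Apple Silicon
    }
    for name in PACKAGES:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    # Report whether PyTorch sees the MPS backend; None if torch is missing
    try:
        import torch

        info["mps_available"] = torch.backends.mps.is_available()
    except ImportError:
        info["mps_available"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```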
✅ Expected Behavior
The PDF should convert to markdown without crashing. A shorter 29-page PDF from the same source (the Anexo version) converts successfully.
📟 Command or Code Used
```shell
curl -sL -o /tmp/test.pdf "https://www.defensordelpueblo.es/wp-content/uploads/2025/03/Defensor-del-Pueblo_Informe-anual-2024.pdf"
marker_single /tmp/test.pdf --output_format markdown --disable_image_extraction --output_dir /tmp/output
```
📎 Additional Context
- The error is reproducible across multiple runs, always crashing at the same point (around pages 14–16 of 66).
- surya logs a warning at startup: `TableRecEncoderDecoderModel is not compatible with mps backend. Defaulting to cpu instead`. That fallback applies only to the table recognition model; the layout model still runs on MPS and crashes.
- The crash is likely triggered by a specific page layout (e.g. a complex table or chart) that produces an out-of-bounds tensor index in the vision encoder's attention code (`unpack_qkv_with_mask`).
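For context on the failing line (`max_seq_len = seq_lengths.max().item()`): in Qwen2-VL-style vision encoders, `cu_seqlens` holds cumulative token boundaries for the image tiles packed into one batch, and per-tile sequence lengths are the differences between consecutive boundaries. Whether surya's encoder follows exactly this convention is an assumption, not something verified against surya's source. The pure-Python sketch below uses hypothetical helper names (`build_cu_seqlens`, `seq_lengths_from_cu`) to show that computation and the consistency checks whose violation could plausibly lead to an out-of-bounds index. Because accelerator kernels run asynchronously, an error may only surface at a synchronization point such as `.item()`, so the faulting operation may sit earlier than line 438.

```python
from itertools import accumulate


def build_cu_seqlens(grid_thw):
    """Hypothetical helper: cumulative token boundaries for packed tiles.

    grid_thw: list of (t, h, w) patch-grid sizes, one per image tile.
    Assumes (Qwen2-VL style) that each of the t frames contributes h * w
    tokens and gets its own boundary; the result always starts at 0.
    """
    per_frame = [h * w for t, h, w in grid_thw for _ in range(t)]
    return [0] + list(accumulate(per_frame))


def seq_lengths_from_cu(cu_seqlens, total_tokens):
    """Hypothetical helper: per-frame lengths, validated against the token count."""
    if not cu_seqlens or cu_seqlens[0] != 0:
        raise ValueError("cu_seqlens must start at 0")
    if any(b < a for a, b in zip(cu_seqlens, cu_seqlens[1:])):
        raise ValueError("cu_seqlens must be non-decreasing")
    if cu_seqlens[-1] != total_tokens:
        # A mismatch here means boundaries point past (or short of) the
        # packed hidden states, i.e. the kind of indexing bug seen above
        raise ValueError(
            f"last boundary {cu_seqlens[-1]} != token count {total_tokens}"
        )
    return [b - a for a, b in zip(cu_seqlens, cu_seqlens[1:])]


if __name__ == "__main__":
    cu = build_cu_seqlens([(1, 4, 6), (1, 2, 3)])
    lengths = seq_lengths_from_cu(cu, total_tokens=30)
    print(cu, lengths, max(lengths))
```

With two tiles of 4×6 and 2×3 patches, the boundaries are `[0, 24, 30]` and the lengths `[24, 6]`; passing a token count of 31 instead raises `ValueError` rather than indexing out of range.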