
Fix Googlebot robots.txt blocking of public read-only API endpoints #165

Merged: vincentmakes merged 1 commit into main from claude/fix-robots-api-blocking-roXWK on May 6, 2026
Conversation

@vincentmakes
Owner

Description

Googlebot was reporting "Blocked by robots.txt" for public read-only API endpoints (e.g. /api/datasets/id/:id, /api/settings/language). The public site hydrates client-side from /api/* JSON endpoints, so a JS-rendering crawler that can't fetch those endpoints sees only the SSR shell and skips the second render pass, which degrades indexing quality.

The public server's robots.txt previously had a blanket Disallow: /api/, which blocked crawlers from every API path. However, the public server only ever exposes a curated set of GET-only, rate-limited, sensitive-field-filtered endpoints, so it's safe to let crawlers fetch them.

This PR updates robots.txt to emit explicit Allow: rules for each public read-only API path before the trailing Disallow: /api/. Per Google's robots.txt spec, the most specific matching rule (the one with the longest path) wins regardless of order, so each listed path is allowed while any other path under /api/ matches only Disallow: /api/ and stays blocked by default.
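
To illustrate the precedence rule, here is a minimal sketch of longest-match evaluation. It is not code from this PR: `isAllowed` and the rule objects are hypothetical, `/api/admin/users` is a made-up path standing in for anything off the allow-list, and wildcard handling (`*`, `$`) is left out.

```javascript
// Hypothetical helper (not part of this PR): decides whether a path may be
// crawled under a list of rules, using longest-match precedence.
// Wildcards (* and $) are deliberately not handled in this sketch.
function isAllowed(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    // When two matching rules have the same length, the less restrictive
    // (Allow) rule wins.
    const tieAllow =
      best !== null &&
      rule.path.length === best.path.length &&
      rule.type === 'allow';
    if (longer || tieAllow) best = rule;
  }
  // A path matched by no rule is crawlable.
  return best === null || best.type === 'allow';
}

const rules = [
  { type: 'allow', path: '/api/settings' },
  { type: 'allow', path: '/api/datasets/id/' },
  { type: 'disallow', path: '/api/' },
];

console.log(isAllowed(rules, '/api/settings/language')); // true
console.log(isAllowed(rules, '/api/datasets/id/42'));    // true
console.log(isAllowed(rules, '/api/admin/users'));       // false
console.log(isAllowed(rules, '/about'));                 // true
```

Because precedence depends on path length rather than position, placing the Allow: lines before Disallow: /api/ is a readability choice and does not change the outcome for Googlebot.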

Changes

  • Added PUBLIC_API_ALLOW_PATHS constant listing all safe public API endpoints
  • Extracted buildRobotsTxt(req) helper function to unify both robots.txt handlers (dual-server and PUBLIC_ONLY paths) so they can't drift
  • Updated robots.txt output to include explicit Allow: rules for: /api/profile, /api/sections, /api/settings, /api/experiences, /api/certifications, /api/education, /api/skills, /api/projects, /api/timeline, /api/custom-sections, /api/layout-types, /api/social-platforms, /api/cv, /api/datasets/slug/, /api/datasets/id/
  • Added regression tests covering both indexable and noindex branches
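
As a rough sketch of the shape these changes describe, the generator could look something like the following. The real `buildRobotsTxt(req)` is not shown on this page, so `buildRobotsTxtSketch`, its `{ indexable }` argument, and the `User-agent: *` line are assumptions; only the path list and the two branches come from the description.

```javascript
// Path list taken from the Changes section above.
const PUBLIC_API_ALLOW_PATHS = [
  '/api/profile', '/api/sections', '/api/settings', '/api/experiences',
  '/api/certifications', '/api/education', '/api/skills', '/api/projects',
  '/api/timeline', '/api/custom-sections', '/api/layout-types',
  '/api/social-platforms', '/api/cv', '/api/datasets/slug/',
  '/api/datasets/id/',
];

// Hypothetical stand-in for buildRobotsTxt(req). The real helper takes the
// request object; here the indexable/noindex decision is passed in directly.
function buildRobotsTxtSketch({ indexable }) {
  if (!indexable) {
    // noindex branch: block the whole site.
    return ['User-agent: *', 'Disallow: /', ''].join('\n');
  }
  return [
    'User-agent: *', // assumed; the PR does not show the user-agent line
    ...PUBLIC_API_ALLOW_PATHS.map((p) => `Allow: ${p}`),
    'Disallow: /api/', // everything not listed above stays blocked
    '',
  ].join('\n');
}

console.log(buildRobotsTxtSketch({ indexable: true }));
```

Routing both handlers through a single function like this means a path added to the list reaches the dual-server and PUBLIC_ONLY outputs at the same time.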

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Checklist

Required for all code changes

  • I have tested my changes locally (npm test passes)
  • Version has been bumped in all 3 files (package.json, package-lock.json, version.json)
  • CHANGELOG.md has been updated with a new entry under the correct version

If adding or changing user-visible strings

  • N/A — no user-visible strings added

If documentation-only change

  • N/A — this is a code change

Test Plan

Added two new test cases in tests/backend.test.js:

  1. Verifies that all required public API paths have explicit Allow: rules in robots.txt when robotsMeta is set to index
  2. Verifies that robots.txt emits a global Disallow: / when robotsMeta is set to noindex

Both handlers (dual-server and PUBLIC_ONLY) now share the same buildRobotsTxt() function, eliminating the risk of divergence.

https://claude.ai/code/session_01EahTJ7kaq3g5ze6F3MmNn6

The public server's robots.txt had a blanket Disallow: /api/, which
blocked Googlebot from fetching the JSON the public site hydrates from.
JS-rendering crawlers that can't load /api/* fall back to the SSR shell
and skip the second render pass, hurting indexing.

The public server only exposes GET-only, rate-limited, sensitive-field-
filtered endpoints by design, so emit explicit Allow: rules for each
public read-only API path before the trailing Disallow: /api/ — the
longest-prefix-wins rule keeps anything not on the allow-list blocked.
Extracted a shared buildRobotsTxt(req) helper so the dual-server and
PUBLIC_ONLY robots.txt handlers can't drift apart.
@vincentmakes vincentmakes merged commit 872766e into main May 6, 2026
3 checks passed
@vincentmakes vincentmakes deleted the claude/fix-robots-api-blocking-roXWK branch May 6, 2026 13:24