feat: suggest dataservice description#924

Open
bolinocroustibat wants to merge 7 commits into main from feat/suggest-dataservice-description

bolinocroustibat commented Feb 5, 2026

Add AI-powered description suggestion for dataservices (external APIs) via the Albert API.

When editing a dataservice, a "Suggérer une description" ("Suggest a description") button generates a French description. The button is enabled only when:

  • title is filled
  • at least one of these is filled: technical documentation URL or machine documentation URL (OpenAPI/Swagger)
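
The enabling rule above can be sketched as a small predicate. This is a minimal illustration, not the component's actual code: the `DataserviceForm` shape and the `isFilled` and `canSuggestDescription` names are assumptions; only the field names `title`, `technicalDocumentationUrl` and `machineDocumentationUrl` come from this PR.

```typescript
// Hypothetical form shape; field names follow the endpoint parameters named in this PR.
interface DataserviceForm {
  title?: string | null
  technicalDocumentationUrl?: string | null
  machineDocumentationUrl?: string | null
}

// True when a (possibly missing) string has non-whitespace content.
function isFilled(value?: string | null): boolean {
  return typeof value === 'string' && value.trim().length > 0
}

// The button is enabled only with a title and at least one documentation URL.
function canSuggestDescription(form: DataserviceForm): boolean {
  return isFilled(form.title)
    && (isFilled(form.technicalDocumentationUrl) || isFilled(form.machineDocumentationUrl))
}
```

Treating whitespace-only values as empty keeps the button state consistent with a server that trims its inputs before validating them.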

Changes:

  • DescribeDataservice.vue: suggestion button with loading/disabled states and tooltip; enabled when title + at least one doc URL are present
  • generate-dataservice-description.post.ts (new): Nitro endpoint that fetches documentation from the given URLs, inlines it into the prompt, and calls Albert API. Requires title and at least one of technicalDocumentationUrl or machineDocumentationUrl
  • fetch-documentation.ts (new): utility to fetch doc content from URLs (HTML stripped, JSON/YAML as-is; 15s timeout, 120k char cap)

EDIT (2026-02-17):

  • generate-dataservice-description.post.ts uses shared callAlbertAPI (albert-helpers), same pattern as other Albert endpoints.
  • Doc content is now fetched from the URLs and inlined into the prompt (new fetch-documentation.ts: HTML stripped, JSON/YAML as-is, 15s timeout, 120k char cap). Previously only the URLs were sent.
  • Fixed 422 "description too short" being rethrown as 500.
  • Removed redundant validateAlbertConfig from helper and all Albert endpoints.
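
The 422 fix can be illustrated with the sketch below. It assumes errors carry a numeric `statusCode`, as h3/Nitro errors do; `StatusError`, `hasStatusCode`, `validateDescription`, `handle` and the minimum length of 20 are hypothetical and are not taken from this PR.

```typescript
// Hypothetical shape of an error that already carries an HTTP status.
interface StatusError {
  statusCode: number
  message: string
}

function hasStatusCode(err: unknown): err is StatusError {
  return typeof err === 'object' && err !== null
    && typeof (err as { statusCode?: unknown }).statusCode === 'number'
    && typeof (err as { message?: unknown }).message === 'string'
}

// Throws a 422-style error when the generated text is too short.
// The threshold is passed in; the value used below is an assumption.
function validateDescription(description: string, minLength: number): string {
  const trimmed = description.trim()
  if (trimmed.length < minLength) {
    const error: StatusError = { statusCode: 422, message: 'description too short' }
    throw error
  }
  return trimmed
}

// Handler-shaped usage: errors that already have a status are passed through,
// and only unknown failures are reported as 500.
function handle(description: string): { status: number; body: string } {
  try {
    return { status: 200, body: validateDescription(description, 20) }
  }
  catch (err) {
    if (hasStatusCode(err)) return { status: err.statusCode, body: err.message }
    return { status: 500, body: 'Unexpected error' }
  }
}
```

The point of the guard is that a blanket `catch` which always builds a 500 hides validation errors from the client; checking for an existing status first preserves them.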

bolinocroustibat self-assigned this Feb 5, 2026
bolinocroustibat marked this pull request as draft February 5, 2026 16:30
bolinocroustibat force-pushed the feat/suggest-dataservice-description branch 2 times, most recently from 550dbdb to df4c507 February 16, 2026 16:48
bolinocroustibat marked this pull request as ready for review February 16, 2026 16:49
+ `Here is the API information:\n`
+ `Title: ${title.trim()}\n`
+ (hasTechnical ? `Technical documentation URL: ${technicalDocumentationUrl.trim()}\n` : '')
+ (hasMachine ? `Machine documentation URL (OpenAPI/Swagger): ${machineDocumentationUrl.trim()}\n` : '')
Contributor

Design: The openweight-small model can't browse these URLs — it only sees the URL strings. The prompt asks to "mention key endpoints, data types, and use cases" but the model has no access to the actual documentation content. This will likely produce hallucinated descriptions about specific endpoints.

Possible alternatives:

  • Fetch the documentation content server-side and include it in the prompt
  • Use createAgentCompletion if the Albert agent API supports web browsing
  • Adjust the prompt to only describe what can be reasonably inferred from a title + URL patterns
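
The first alternative could look roughly like the sketch below. It assumes the documentation text has already been fetched; `FetchedDoc`, `buildPrompt` and the section layout are illustrative names and choices, and the 120,000 character cap is the figure quoted in this PR's description, applied here across all documents as an assumption.

```typescript
interface FetchedDoc {
  label: string // e.g. "Technical documentation"
  url: string
  content: string // text already extracted from the URL
}

// Cap quoted in the PR description; sharing it across all docs is an assumption.
const MAX_DOC_CHARS = 120000

function buildPrompt(title: string, docs: FetchedDoc[], maxDocChars: number = MAX_DOC_CHARS): string {
  let remaining = maxDocChars
  const sections: string[] = []
  for (const doc of docs) {
    if (remaining <= 0) break
    // Take only as much of each document as the remaining budget allows.
    const excerpt = doc.content.trim().slice(0, remaining)
    if (excerpt.length === 0) continue
    remaining -= excerpt.length
    sections.push(`${doc.label} (${doc.url}):\n${excerpt}`)
  }
  return [
    'Here is the API information:',
    `Title: ${title.trim()}`,
    ...sections,
  ].join('\n')
}
```

With the content inlined, the model describes endpoints it can actually read rather than guessing from the URL string.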

Contributor Author

Damn, I was tricked by the model's hallucinations, which made me think it was actually browsing the URLs! Good call.

I would go for option 1: fetch and format the content from those URLs, with guardrails on the maximum prompt size and on how long a prompt these models can reasonably handle. I'll push a commit soon.

bolinocroustibat force-pushed the feat/suggest-dataservice-description branch from 7c4bf25 to 3c20ce2 February 17, 2026 12:51
bolinocroustibat moved this from 🛠 Doing to 👀 Review in 🚀 Produit data.gouv.fr Feb 17, 2026
bolinocroustibat force-pushed the feat/suggest-dataservice-description branch 3 times, most recently from 494af3c to ba1f4fa February 17, 2026 19:53
const timeoutId = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS)

try {
const response = await $fetch<string>(url, {
Contributor

Security: SSRF vulnerability. This $fetch call will follow any URL provided by the user, including internal network addresses. An attacker could probe:

  • http://169.254.169.254/latest/meta-data/ (cloud metadata — AWS, GCP)
  • http://localhost:3000/... or http://127.0.0.1/... (internal endpoints)
  • http://10.x.x.x/... (private network)

At a minimum, validate that the URL scheme is http/https and that the resolved hostname is not a private/reserved IP (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16).
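
A minimal sketch of such a pre-check follows. The function names are hypothetical, and it only covers the scheme and literal IPv4 ranges listed above: a complete fix would also resolve the hostname through DNS and check every resolved address, handle IPv6, and re-validate after redirects.

```typescript
// Parse a dotted-quad IPv4 string into four octets, or null if it is not one.
function parseIPv4(host: string): number[] | null {
  const parts = host.split('.')
  if (parts.length !== 4) return null
  const octets = parts.map(p => (/^\d{1,3}$/.test(p) ? Number(p) : NaN))
  return octets.every(o => o >= 0 && o <= 255) ? octets : null
}

// Ranges from the review comment: 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12,
// 192.168.0.0/16 and 169.254.0.0/16.
function isPrivateOrReservedIPv4(ip: string): boolean {
  const octets = parseIPv4(ip)
  if (!octets) return false
  const [a, b] = octets
  return a === 127
    || a === 10
    || (a === 172 && b >= 16 && b <= 31)
    || (a === 192 && b === 168)
    || (a === 169 && b === 254)
}

// Pre-check only: rejects non-http(s) schemes, localhost names and literal private IPs.
function isUrlAllowed(rawUrl: string): boolean {
  let parsed: URL
  try {
    parsed = new URL(rawUrl)
  }
  catch {
    return false
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return false
  const host = parsed.hostname.toLowerCase()
  if (host === 'localhost' || host.endsWith('.localhost')) return false
  return !isPrivateOrReservedIPv4(host)
}
```

Parsing with `URL` rather than a regular expression matters here because the parser normalizes the hostname before the range check runs.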


Security: no response size limit. The MAXIMUM_PROMPT_LENGTH check in the endpoint happens after the full body has been downloaded into memory. A malicious URL could return gigabytes of data and exhaust server memory before the check kicks in.

Mitigation options:

  • Check Content-Length header before reading the body and reject if too large
  • Stream the response and abort once a byte threshold is reached
  • Or at the very least, truncate raw immediately after reception (before formatDocumentationContent)
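
The three mitigations can be sketched as below. All names and the idea of passing the body as an async iterable of chunks are assumptions for illustration; none of this is the PR's code. Note that `Content-Length` may be absent or wrong, and that truncating after reception limits prompt size but not memory use, so the streaming variant is the one that actually bounds memory.

```typescript
// Option 1: reject early when the server declares an oversized body.
function declaredSizeExceeds(contentLength: string | null | undefined, maxBytes: number): boolean {
  if (contentLength == null) return false // header absent: cannot decide here
  const declared = Number(contentLength)
  return Number.isFinite(declared) && declared > maxBytes
}

// Option 2: consume the body chunk by chunk and abort once the threshold is crossed.
// Throwing inside `for await` asks the iterator to close, which lets the source stop.
async function readWithLimit(chunks: AsyncIterable<Uint8Array>, maxBytes: number): Promise<Uint8Array> {
  const parts: Uint8Array[] = []
  let total = 0
  for await (const chunk of chunks) {
    total += chunk.byteLength
    if (total > maxBytes) throw new Error(`Response exceeds ${maxBytes} bytes`)
    parts.push(chunk)
  }
  // Concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    out.set(part, offset)
    offset += part.byteLength
  }
  return out
}

// Option 3: truncate the raw text immediately after reception.
function truncateRaw(raw: string, maxChars: number): string {
  return raw.length > maxChars ? raw.slice(0, maxChars) : raw
}
```

In practice options 1 and 2 combine well: the header check rejects honest oversized responses cheaply, and the streaming limit covers servers that omit or misreport the header.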

text = text
  .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
  .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '')
  .replace(/<[^>]+>/g, ' ')
Contributor

Quality: HTML entities are not decoded after stripping tags. After removing HTML tags, entities like &amp;, &rsquo;, &mdash;, &nbsp; remain as-is in the text. The LLM will see raw entity strings in the prompt, which degrades description quality.

A simple entity decode pass after the tag strip would help (e.g. using a lightweight lib like he, or a manual replacement of the most common entities).
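
A manual decode pass could look like the sketch below. The `NAMED_ENTITIES` table and `decodeHtmlEntities` are hypothetical; the table covers only a handful of common entities, and unknown names are left untouched. A library such as he would cover the full set.

```typescript
// Small, non-exhaustive table of common named entities (names are case-sensitive).
const NAMED_ENTITIES: Record<string, string> = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: '\'',
  nbsp: ' ', // deliberately a regular space, since the text feeds a prompt
  rsquo: '\u2019',
  lsquo: '\u2018',
  mdash: '\u2014',
  ndash: '\u2013',
  hellip: '\u2026',
}

// Decodes named, decimal (&#233;) and hexadecimal (&#xE9;) entities in a single pass,
// so already-decoded output is never scanned again ("&amp;lt;" becomes "&lt;", not "<").
function decodeHtmlEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match: string, body: string) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X'
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      const valid = Number.isInteger(code) && code > 0 && code <= 0x10FFFF
      return valid ? String.fromCodePoint(code) : match
    }
    const named = NAMED_ENTITIES[body]
    return named !== undefined ? named : match
  })
}
```

The pass should run after the tag strip, so that decoded `<` and `>` characters are not mistaken for markup.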

/**
* Trims and formats content: strips HTML tags if present, normalizes whitespace.
*/
function formatDocumentationContent(raw: string, _url: string): string {
Contributor

Nit: _url parameter is declared but unused. Either use it (e.g. to infer content type from the URL extension) or remove it.
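
If the parameter is kept, inferring a content kind from the extension might look like this sketch. `DocKind` and `inferDocKind` are hypothetical names; the response's Content-Type header, when available, is usually a more reliable signal than the URL.

```typescript
type DocKind = 'json' | 'yaml' | 'html' | 'unknown'

// Guess the document kind from the extension of the URL's last path segment.
function inferDocKind(url: string): DocKind {
  let pathname: string
  try {
    // Using the parsed pathname ignores the query string, fragment and hostname.
    pathname = new URL(url).pathname
  }
  catch {
    return 'unknown'
  }
  const lastSegment = pathname.slice(pathname.lastIndexOf('/') + 1)
  const dot = lastSegment.lastIndexOf('.')
  const ext = dot >= 0 ? lastSegment.slice(dot + 1).toLowerCase() : ''
  if (ext === 'json') return 'json'
  if (ext === 'yaml' || ext === 'yml') return 'yaml'
  if (ext === 'html' || ext === 'htm') return 'html'
  return 'unknown'
}
```

A caller could then skip the HTML strip for `json` and `yaml` and apply it otherwise, which matches the "HTML stripped, JSON/YAML as-is" behaviour described in this PR.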

{{ $t('Suggérer une description') }}
</template>
</BrandedButton>
<CdataLink
Contributor

UX: the feedback link is visible before any suggestion has been generated. "Comment avez-vous trouvé cette suggestion ?" ("How did you find this suggestion?") doesn't make sense when no description has been suggested yet. Consider conditioning it on a hasGeneratedDescription boolean (set to true after a successful generation).

(Same issue exists in DescribeDataset.vue but no need to reproduce it here.)

bolinocroustibat and others added 7 commits February 23, 2026 15:17
Remove explicit ref import as it's auto-imported by Nuxt

Co-authored-by: Cursor <cursoragent@cursor.com>
- Use callAlbertAPI from albert-helpers in generate-dataservice-description
  (align with other Albert endpoints)
- Fix 422 for 'description too short' so it is returned to client instead of 500
- Remove validateAlbertConfig; useAlbertConfig() already throws when API key
  is missing
- Drop redundant error logging in dataservice-description handler

Co-authored-by: Cursor <cursoragent@cursor.com>
…ateDescriptionFeedbackUrl

Co-authored-by: Cursor <cursoragent@cursor.com>
bolinocroustibat force-pushed the feat/suggest-dataservice-description branch from ba1f4fa to d99abad February 23, 2026 14:17
Projects

Status: 👀 Review

2 participants