Emmanuel S. edited this page Apr 15, 2026 · 20 revisions

ROADMAP

Goal: this project should eventually be replaced by Geoplateforme API calls providing a keyword-based search engine for collections (like https://www.data.gouv.fr/api/1/datasets/?q=ecole&page=1&page_size=20) and detailed schemas for collections (like Table Schema).

0.0.x - PoC

0.1.x - MVP - allow MCP integration

  • #4 - Improve data management to ease change detection and overwrite updates (create unique file for each WFS FeatureType)
  • #5 - Review available data on data.geopf.fr and improve filtering to keep only relevant ones (remove gpf publication test datasets, local data,...)
  • #6 - Integrate the lightweight search engine (search(q: string)) based on MiniSearch from the MCP ignfab/geocontext
  • #8 - Improve logging to avoid problems in the MCP
  • #6 - Add functional tests for the search, e.g.:

```yaml
- query: "bâtiment"
  expected: ["BDTOPO_V3:batiment", "BDCARTO_V5:batiment"]
...
```
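Such test cases can be run with a small harness. The sketch below is illustrative only: the `search()` stand-in and its keyword index are assumptions, not the project's actual MiniSearch-backed API.

```python
# Minimal sketch of a functional test runner for the search tool.
# search() and its index are illustrative stand-ins, not the real API.

def search(q: str) -> list[str]:
    # Stand-in index: collection ids mapped to indexed keywords.
    index = {
        "BDTOPO_V3:batiment": ["bâtiment", "construction"],
        "BDCARTO_V5:batiment": ["bâtiment"],
        "BDTOPO_V3:troncon_de_route": ["route", "tronçon"],
    }
    return [cid for cid, keywords in index.items() if q.lower() in keywords]

# Test cases in the same shape as the YAML spec above.
CASES = [
    {"query": "bâtiment",
     "expected": ["BDTOPO_V3:batiment", "BDCARTO_V5:batiment"]},
]

for case in CASES:
    got = search(case["query"])
    assert sorted(got) == sorted(case["expected"]), (case, got)
print("all search cases passed")
```

In practice the harness would load the YAML file and call the real MCP search tool instead of the stand-in.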

0.2.x - use more data sources

Use more WFS info:

  • #6 - Define a first working strategy for the search to match expectations (weight results between datasets, ...)
  • Retrieve keywords from DescribeFeatureType
  • Parse namespace to extract version ("ADMINEXPRESS-COG.2026" -> {"version": "2026"})
  • #16 Gather more internal metadata. Revisit the first naive metadata extraction.
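The version extraction above can be sketched with a regular expression. The pattern is an assumption generalised from the single `ADMINEXPRESS-COG.2026` example in this roadmap, not a documented Géoplateforme naming rule.

```python
import re

# Hedged sketch: extract a version suffix from a WFS namespace such as
# "ADMINEXPRESS-COG.2026". The "<name>.<4 digits>" pattern is an
# assumption based on the roadmap example, not a documented rule.

def parse_namespace(namespace: str) -> dict:
    match = re.match(r"^(?P<name>.+?)\.(?P<version>\d{4})$", namespace)
    if match is None:
        # No recognisable version suffix: return the namespace unchanged.
        return {"name": namespace, "version": None}
    return {"name": match["name"], "version": match["version"]}

print(parse_namespace("ADMINEXPRESS-COG.2026"))
# → {'name': 'ADMINEXPRESS-COG', 'version': '2026'}
```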

Integrate validation schemas :

  • #17 - Retrieve relevant information from the ISO 19115 metadata available via https://data.geopf.fr/csw ⚠️ Table schemas are not included in these metadata ⚠️
=> ISO 19115 is not trivial! The MetadataURL of each feature type can be listed from the capabilities document:

```shell
curl -sS "https://data.geopf.fr/wfs?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetCapabilities" | xmllint --format - | grep MetadataURL
```
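The same extraction can be done programmatically. The sketch below parses a trimmed, inline capabilities sample (the sample UUID and href are illustrative placeholders, not real records); element and namespace names follow the OGC WFS 2.0 schema.

```python
import xml.etree.ElementTree as ET

# Sketch: extract MetadataURL links from a WFS 2.0 GetCapabilities
# response, using a trimmed inline sample instead of a live request.
# The href value is an illustrative placeholder, not a real record.

SAMPLE = """<wfs:WFS_Capabilities xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <wfs:FeatureTypeList>
    <wfs:FeatureType>
      <wfs:Name>BDTOPO_V3:batiment</wfs:Name>
      <wfs:MetadataURL xlink:href="https://data.geopf.fr/csw/EXAMPLE-UUID"/>
    </wfs:FeatureType>
  </wfs:FeatureTypeList>
</wfs:WFS_Capabilities>"""

NS = {"wfs": "http://www.opengis.net/wfs/2.0",
      "xlink": "http://www.w3.org/1999/xlink"}

def metadata_urls(capabilities_xml: str) -> dict[str, str]:
    """Map each feature type name to its MetadataURL href."""
    root = ET.fromstring(capabilities_xml)
    result = {}
    for ft in root.iterfind(".//wfs:FeatureType", NS):
        name = ft.findtext("wfs:Name", namespaces=NS)
        url = ft.find("wfs:MetadataURL", NS)
        if name and url is not None:
            result[name] = url.get("{http://www.w3.org/1999/xlink}href")
    return result

print(metadata_urls(SAMPLE))
# → {'BDTOPO_V3:batiment': 'https://data.geopf.fr/csw/EXAMPLE-UUID'}
```

Fetching the referenced ISO 19115 records and extracting fields from them is the non-trivial part.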

Ideas for the next steps

Prepare Géoplateforme integration

Use abstractions to prepare the replacement?
  • Use an existing metamodel (Table Schema or IGN Validator) instead of the current src model to align with validation requirements (not required for now, as an LLM doesn't parse data and doesn't care about model changes)
  • Illustrate the expected service at Géoplateforme level with a Lightweight REST API :
    • Get all collections (/api/collections) - too large for an LLM (as seen in the GeoServer implementation)
    • Get collections by id (/api/collections/{id}) - required to allow the MCP to query features
    • Search collection (/api/collections/search?q={text}) - required to allow the MCP to find data
    • Get collections by namespace (aka serie) (/api/collections?namespace=BDTOPO_V3) - not required for MCP

Experiment with using an LLM to generate schemas

  • Input: data/wfs/{namespace}/{name}.json + document (HTML/PDF)
  • Output: data/overwrite/{namespace}/{name}.json (that can be reviewed / completed)
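The overwrite file would then be merged onto the raw WFS description. The sketch below assumes a simple "non-null overwrite wins" rule and illustrative field names; the actual schema of these JSON files is not defined here.

```python
# Sketch of merging an LLM-generated (then human-reviewed) overwrite
# file onto the raw WFS description. The merge rule ("non-null
# overwrite value wins") and the field names are assumptions.

def merge(base: dict, overwrite: dict) -> dict:
    merged = dict(base)
    merged.update({k: v for k, v in overwrite.items() if v is not None})
    return merged

# Illustrative stand-ins for data/wfs/... and data/overwrite/... content.
base = {"name": "batiment", "description": None,
        "fields": ["geometrie", "hauteur"]}
overwrite = {"description": "Bâtiments issus de la BD TOPO (generated, reviewed)"}

print(merge(base, overwrite))
```

Keeping the overwrite in a separate file (rather than editing the WFS dump) preserves the roadmap's change-detection goal: regenerating data/wfs never clobbers reviewed content.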

Create a lightweight viewer

  • Display collections grouped by product (a personal experiment is available here: https://www.quadtreeworld.net/geekeries/wfs-explorer/)
  • Allow users to search collections with a form (as having to use an LLM-based tool to find available data is not very eco-friendly...)
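The product grouping the viewer needs can be derived from the collection ids themselves, assuming the `namespace:name` convention seen throughout this roadmap (e.g. `BDTOPO_V3:batiment`):

```python
from itertools import groupby

# Sketch: group collection ids by their namespace (the product / serie),
# assuming the "namespace:name" id convention used in this roadmap.

def group_by_product(collection_ids: list[str]) -> dict[str, list[str]]:
    ids = sorted(collection_ids)  # groupby needs sorted input
    return {ns: list(items)
            for ns, items in groupby(ids, key=lambda cid: cid.split(":", 1)[0])}

ids = ["BDTOPO_V3:batiment", "BDCARTO_V5:batiment", "BDTOPO_V3:troncon_de_route"]
print(group_by_product(ids))
# → {'BDCARTO_V5': ['BDCARTO_V5:batiment'],
#    'BDTOPO_V3': ['BDTOPO_V3:batiment', 'BDTOPO_V3:troncon_de_route']}
```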