
cycling-data-lab

Open research lab on cycling data: bike-sharing demand, mobility-justice diagnostics, GBFS infrastructure audits, and cycling-environment composite indicators.

Cycling Data Lab

A research program on structural lower bounds for graph-supervised learning, instantiated empirically in materials informatics, urban mobility, bike-share demand and mobility justice.


By the numbers. 34,858 French communes mapped · 1,509 global GBFS feeds audited · 37 M bike-share trip observations processed · 27 bike-share networks benchmarked · 322 cycling-poverty deserts identified · 8-task MatBench applicability-domain panel · empirically validated across 24 international networks and a 34,858-commune multi-modal panel · one open-source Python package implementing the bound and its augmentation · one structural lower bound that connects them all.

Theoretical program

Our central research goal is a universal spectral lower bound on the generalisation error of any graph-supervised learner. The bound depends only on three objects — the graph Laplacian, the target signal, and the learner's reachable feature subspace — and is independent of the regressor choice. Each empirical application in this organisation is, at the methodological level, a corollary or instantiation of this single bound.

flowchart TD
    P5["<b>structural-bounds-framework</b><br/>Universal spectral lower bound<br/>+ C1, C2, C3 proofs absorbed<br/><i>JMLR / FoCM · v0.4 working draft (20 pp main + 5 pp SI)</i>"]
    P1["<b>materials-applicability-bound</b><br/>Corollary C1 — encoding gap under LSO<br/><i>MLST v1.0-rc.5 · draft ready (not yet submitted)</i>"]
    P4["<b>mobility-applicability-bound</b><br/>Empirical instantiation of C1 on<br/>34,858 French commune mobility panel<br/><i>TR-B target · early draft</i>"]
    P6["<b>topological-localization-mobility</b><br/>Bare-Laplacian eigenvector localization<br/>predicts the bound on bike-share<br/><i>EPJ Data Science / Applied Network Sci · v0.1 working draft</i>"]
    TOOL["<b>spectral-mobility</b><br/>Open-source Python package (MIT)<br/>operationalising the bound + augmentation<br/><i>v0.4.0 · 72 tests · CitySpectralProfile + SpectralAugmentedRegressor</i>"]
    P2["<b>(planned) negative-transfer-corollary</b><br/>C2 empirical anchor on QM9 → MatBench<br/><i>theory done in P5; experiments TBD</i>"]
    P3["<b>(planned) active-learning-corollary</b><br/>C3 empirical anchor: leverage-score vs uniform<br/><i>theory done in P5; experiments TBD</i>"]

    P5 -->|C1| P1
    P5 -->|C2| P2
    P5 -->|C3| P3
    P5 -.->|implemented in| TOOL
    P1 -.->|empirical sibling| P4
    P1 -.->|empirical sibling| P6
    P6 -.->|productised as| TOOL

Shared theoretical signature. All bounds in the program take the form

expected loss under evaluation protocol Π ≥ (1 − R²_spec(𝒮_𝒜, y)) · Var(y) − slack(Π, 𝒮_𝒜),

where R²_spec is the projection-R² of the target signal y onto the learner's reachable feature subspace 𝒮_𝒜, computed in the eigenbasis of the graph Laplacian, and the slack term is controlled by the Pesenson sampling quality of the protocol on that subspace (Pesenson 2008; Anis–Gadde–Ortega 2016; the extension to arbitrary feature subspaces follows Chepuri–Leus 2018, Tanaka–Eldar 2020).
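To make R²_spec concrete, here is a minimal numpy sketch under one simplifying assumption that is ours, not the program's: the reachable subspace 𝒮_𝒜 is taken to be the span of the K smoothest eigenvectors of the combinatorial Laplacian L = D − W, and the protocol-dependent slack term is dropped. The function names are hypothetical. With these choices the floor (1 − R²_spec) · Var(y) reduces to the mean squared residual of the orthogonal projection of y onto 𝒮_𝒜.

```python
import numpy as np


def laplacian(W):
    """Combinatorial graph Laplacian L = D - W of a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W


def spectral_r2_floor(W, y, K):
    """Return (R2_spec, (1 - R2_spec) * Var(y)) for the K smoothest Laplacian modes.

    Illustrative assumption: the reachable subspace is the span of the first K
    eigenvectors of L; the protocol-dependent slack term is ignored.
    """
    _, U = np.linalg.eigh(laplacian(W))   # ascending eigenvalues, orthonormal columns
    U_K = U[:, :K]
    resid = y - U_K @ (U_K.T @ y)         # part of y outside the subspace
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    floor = (1.0 - r2) * y.var()          # equals mean(resid ** 2)
    return r2, floor
```

A signal built from modes inside the subspace has a floor of zero; a Laplacian eigenvector outside it keeps its full variance as the floor.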

Minimax-tight efficiency. In the realisable case, an ERM-on-projection witness saturates the bound up to an O(M² / √(N−n)) slack via Berry–Esseen anticoncentration (Theorem 2 of structural-bounds-framework). This establishes the Pesenson-ridge estimator on the restricted feature subspace as efficient, in a sense analogous to an estimator that attains the classical Cramér–Rao bound. Sharpening the constant via a multipoint Fano refinement remains open.
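A generic illustration of an ERM-on-projection witness: ordinary ridge regression restricted to a basis of the reachable subspace, fitted on the n labelled nodes and evaluated on the remaining N − n. This is a sketch under assumptions (the basis U_K, the small penalty alpha and the function names are ours); it is not claimed to be the paper's Pesenson-ridge estimator. In the realisable case, where y lies in span(U_K) and the labelled rows of U_K have full column rank, the out-of-sample error goes to zero as alpha → 0, matching a slack-free floor of zero.

```python
import numpy as np


def projection_ridge_predict(U_K, y, labelled, alpha=1e-6):
    """Ridge fit on the labelled rows of a subspace basis; predictions for all N nodes.

    U_K      : (N, K) basis of the reachable subspace (e.g. Laplacian eigenvectors)
    y        : (N,) target; only the entries at `labelled` are used for fitting
    labelled : integer indices of the n observed nodes
    alpha    : small ridge penalty (illustrative value)
    """
    X = U_K[labelled]
    K = U_K.shape[1]
    beta = np.linalg.solve(X.T @ X + alpha * np.eye(K), X.T @ y[labelled])
    return U_K @ beta


def out_of_sample_mse(U_K, y, labelled, alpha=1e-6):
    """Mean squared error of the witness on the N - n unlabelled nodes."""
    mask = np.ones(len(y), dtype=bool)
    mask[labelled] = False
    pred = projection_ridge_predict(U_K, y, labelled, alpha)
    return np.mean((y[mask] - pred[mask]) ** 2)
```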

Why the program is organised this way. Publishing several focused corollaries alongside the universal framework yields both tactical impact (each corollary stands on its own) and strategic coherence (the program builds a recognisable theoretical lane, much as Cramér–Rao bounds organise classical estimation theory). Cross-domain controls (cycling networks, MovieLens, materials) are an intrinsic part of the methodology, not an afterthought: every corollary is to be validated on at least two unrelated domains (for C2 and C3 the experiments are still planned), to confirm that it reflects a property of graph-supervised learning rather than an artefact of any single field.

What we work on

We measure cycling environments, bike-share demand and the social distribution of both at the granularity at which French transport policy is actually decided: the commune (n = 34,858) and the station (n ≈ 50,000 across France and 6 international networks). Methodologically, the graph-signal-processing tools we built for cycling-network expansion turn out to apply far beyond that setting: to materials informatics (materials-applicability-bound, MLST target), to urban-mobility transferability (mobility-applicability-bound, TR-B target) and ultimately to a unified theoretical statement (structural-bounds-framework) of which the others are corollaries.

Three open data substrates meet in a single research pipeline: OpenStreetMap infrastructure, GBFS station feeds and INSEE social statistics. MatBench DFT panels increasingly complement them for the materials-side methodology work. The pipeline produces:

  1. A reproducible audit of 1,509 GBFS feeds worldwide that exposes semantic ambiguities in the standard and releases a 46-column certified catalogue across 48 countries.
  2. A commune-level, supply-side composite indicator (the IMD-4) that improves on the de facto French standard (Cerema cycling-infrastructure density) by +18 points of R² in predicting realised commuting share.
  3. A demand-prediction benchmark on 27 dock-based networks across two continents, with paired bootstrap CIs and an explicit decomposition of the +0.27 headline ΔR² into transferable spatial and station-fingerprint components.
  4. A mobility-justice diagnostic that turns the indicator into a ranked, intersectional priority list of 322 cycling-poverty deserts for the 2023–2027 Plan Vélo.
  5. A graph-signal-processing toolkit that develops the spectral bounds, sampling-theoretic siting and empirical learning curves underpinning the prediction work.
  6. A structural lower bound on the applicability-domain gap in materials property prediction, derived from the same GSP framework as the bike-share siting bounds, validated on 8 MatBench tasks with a foundation-model encoder-discrimination oracle test (CHGNet).
  7. An empirical instantiation of that same bound in urban mobility, on the 34,858 French commune panel — answering the question "why do mode-choice models trained in city A fail in city B?" in the same language as the materials encoding gap.
  8. A unified theoretical framework that contains all of the above as corollaries of a single regressor-independent spectral inequality.

All outputs are released as code, data and reproducible LaTeX, with code under MIT (data products inherit their upstream licenses) and Zenodo DOIs minted on every versioned release.
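The paired bootstrap CIs on ΔR² mentioned in item 3 can be sketched as follows. Assumptions: predictions from both models already exist for every (station, time) row; whole stations are resampled with replacement so that within-station dependence is preserved; and a percentile interval is reported. B = 1000 and seed 42 follow the defaults quoted elsewhere on this page; the function names are hypothetical and the benchmark's own code may differ.

```python
import numpy as np


def r2(y, p):
    """Coefficient of determination of predictions p against targets y."""
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def paired_station_bootstrap(y, p_base, p_aug, station, B=1000, level=0.95, seed=42):
    """Point estimate and percentile CI for dR2 = R2(aug) - R2(base), resampling stations."""
    rng = np.random.default_rng(seed)
    stations = np.unique(station)
    rows = {s: np.flatnonzero(station == s) for s in stations}  # row indices per station
    point = r2(y, p_aug) - r2(y, p_base)
    deltas = np.empty(B)
    for b in range(B):
        draw = rng.choice(stations, size=stations.size, replace=True)
        idx = np.concatenate([rows[s] for s in draw])
        # both models are scored on the same resample: this is what makes it "paired"
        deltas[b] = r2(y[idx], p_aug[idx]) - r2(y[idx], p_base[idx])
    lo, hi = np.quantile(deltas, [(1 - level) / 2, (1 + level) / 2])
    return point, (lo, hi)
```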

Repository map

flowchart TD
    A[gbfs-audit-catalogue<br/><i>1,509 feeds · 46,307 stations · 48 countries</i>]
    B[imd-national-catalogue<br/><i>IMD-4 + IES on 34,858 French communes</i>]
    C[bikeshare-demand-forecasting<br/><i>Prediction + leave-station-out siting</i>]
    D[bikeshare-gsp-tools<br/><i>Spectral bounds · D-optimal siting · learning curves</i>]
    E[penality-analysis<br/><i>Triple-penalty mobility-justice diagnostic</i>]
    F[materials-applicability-bound<br/><i>Corollary C1 · 8-task MatBench · CHGNet oracle test</i>]
    M[mobility-applicability-bound<br/><i>Empirical instantiation of C1 on 34,858 communes</i>]
    U[structural-bounds-framework<br/><i>Universal spectral lower bound · contains C1–C3 as corollaries</i>]
    T[topological-localization-mobility<br/><i>Eigenvector localization predicts the bound · 9-city panel</i>]
    S[spectral-mobility<br/><i>Python package · structural bound + spectral augmentation</i>]
    G[paper-template<br/><i>Starter repo for new papers</i>]

    A --> B
    A --> C
    B --> C
    B --> D
    B --> E
    B --> M
    C -.->|nine-city panel| T
    D -.->|GSP framework transfers| F
    F -.->|methodology contributes back| D
    F -.->|sibling empirical| M
    U ==>|C1| F
    U ==>|empirical sibling of C1| M
    U ==>|empirical sibling of C1| T
    G -.->|template for| F
    G -.->|template for| M
    G -.->|template for| U
    G -.->|template for| C
    G -.->|template for| T
    U -.->|implemented in| S
    T -.->|productised in| S
    S -.->|used by| C
    S -.->|used by| T
| Repository | Contribution | Method | Status |
|---|---|---|---|
| structural-bounds-framework | Universal spectral lower bound on graph-supervised learning (contains C1–C3 as corollaries with full proofs) | Transductive Rademacher (El-Yaniv–Pechyony 2006) + Hoeffding–Serfling + Berry–Esseen anticoncentration; sharp matching constant via Le Cam two-point; ERM-on-projection witness saturates the bound | v0.4 working draft (20 pp main + 5 pp SI + 5 research notes + cover letter), JMLR / FoCM target |
| materials-applicability-bound | Corollary C1: first regressor-independent structural lower bound on the applicability-domain gap in materials property prediction | Cochran finite-population identity + Talagrand-contraction Rademacher + Pesenson sampling, validated on 8 MatBench panels with CIG = 18× to 145× above the shuffled-kNN null | v1.0-rc.5, draft complete and ready for submission to MLST (Zenodo DOI 10.5281/zenodo.20355996); not yet submitted |
| mobility-applicability-bound | Empirical instantiation of C1 in urban mobility (why mode-choice models trained in city A fail in city B) | Same framework as materials-applicability-bound, applied to the 34,858 French commune mobility panel | Early draft, TR-B target |
| topological-localization-mobility | Empirical instantiation of C1 on the nine-city bike-share panel: bare-Laplacian eigenvector localization predicts the bound at the level of individual modes; two falsification tests rule out centrality-artefact and disorder-driven alternative mechanisms | Inverse-Participation-Ratio mode-by-mode bridge on n = 6,923 (city, eigenmode) pairs (ρ = −0.30, p = 2.3 × 10⁻¹⁴⁴); degree-preserving permutation null; on-site potential perturbation falsification | v0.1 working draft (9 pp main + 3 pp SI), EPJ Data Science / Applied Network Science target |
| imd-national-catalogue | IMD-4 cycling-environment composite on 34,858 French communes | Bayesian simplex MCMC calibrated on FUB and EMP panels | v0.2 beta (Hugging Face and Zenodo releases planned) |
| bikeshare-demand-forecasting | IMD-augmented bike-share demand prediction (temporal and leave-station-out) | LightGBM with paired station bootstrap (B = 1000) on a 9-network LSO panel | Working draft, pre-submission |
| bikeshare-gsp-tools | Graph-signal-processing foundations for cycling-network expansion | Symmetric-Laplacian spectral bounds and D-optimal greedy submodular siting (Nemhauser 1−1/e guarantee) | Early draft, theory development in progress |
| penality-analysis | Triple-penalty mobility-justice diagnostic | Deterministic intersection of three vulnerability layers on the IMD-4 substrate | Working draft, pre-submission |
| gbfs-audit-catalogue | Reproducible audit of 1,509 GBFS bike-share feeds across 48 countries | 46-column reference schema with an anomaly-detection layer | Stable, Zenodo-archived |
| spectral-mobility | Open-source Python package operationalising the structural bound and spectral augmentation: CitySpectralProfile, SpectralAugmentedRegressor, compare_cities, plot helpers | 9 modules, 72 unit tests, transductive + Nyström-inductive cross-validation, Gaussian-RBF k-NN graphs (geographic or feature), MIT licensed | v0.4.0, alpha release; API stabilising around SpectralAugmentedRegressor + CitySpectralProfile |
| paper-template | Starter directory for new papers in this organisation | LaTeX + iopjournal.cls + numbered experiment scripts + reproducibility infrastructure + Zenodo metadata, all wired by default | Template repo |
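The bikeshare-gsp-tools entry cites D-optimal greedy submodular siting with the Nemhauser (1 − 1/e) guarantee. A minimal sketch, assuming each candidate site is described by a feature row x_i (for instance its coordinates in a truncated Laplacian eigenbasis) and using a small ridge δ so that log det(δI + X_Sᵀ X_S) is defined for the empty set. By the matrix determinant lemma the gain from adding x is log(1 + xᵀ M⁻¹ x), and M⁻¹ is updated with Sherman–Morrison. The function name and δ are illustrative, not the repository's API.

```python
import numpy as np


def greedy_d_optimal(X, k, delta=1e-3):
    """Greedily pick k rows of X maximising log det(delta * I + X_S^T X_S)."""
    n, d = X.shape
    M_inv = np.eye(d) / delta                      # inverse information matrix for the empty set
    chosen = []
    for _ in range(k):
        # q_i = x_i^T M^{-1} x_i; the gain log(1 + q_i) is increasing in q_i
        gains = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gains[chosen] = -np.inf                    # never pick a site twice
        i = int(np.argmax(gains))
        chosen.append(i)
        x = X[i]
        Mx = M_inv @ x
        M_inv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)  # Sherman-Morrison rank-one update
    return chosen
```

Duplicated candidates stop paying off once one copy is chosen, which is the diminishing-returns behaviour the greedy guarantee relies on.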

Status note (May 2026). No manuscript from this organisation has been submitted to a journal yet. Several drafts have reached the point where submission is technically possible — materials-applicability-bound v1.0-rc.5 (MLST), gbfs-audit-catalogue v1.0.1 (data paper), structural-bounds-framework v0.4 (JMLR / FoCM), bikeshare-demand-forecasting, penality-analysis, and topological-localization-mobility v0.1 — but the program is deliberately paced (see the Submission roadmap below). The framework draft includes full proofs of Corollaries C2 (negative transfer) and C3 (active learning) absorbed inside the main paper; planned standalone follow-up repositories negative-transfer-corollary and active-learning-corollary will host the empirical-validation experiments. Working drafts are released openly during the writing process so that feedback can shape the eventual submission.

Provenance note. The topological-localization-mobility paper started life under an Anderson-localization framing that was empirically falsified by our own disorder-robustness placebo tests; the precursor repository is archived (read-only) at cycling-data-lab/anderson-localization-mobility, tag v0.1-anderson-falsified, for transparency.
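Eigenvector localization, as used in the topological-localization-mobility entry of the repository table, is conventionally quantified by the inverse participation ratio IPR_k = Σ_i u_{ik}⁴ of a unit-norm eigenvector u_k; it ranges from 1/N (spread evenly over all N nodes) to 1 (concentrated on a single node). A minimal sketch, assuming the combinatorial Laplacian; the repository may use a different normalisation, and the mode-by-mode correlation with the bound is not reproduced here.

```python
import numpy as np


def laplacian_ipr(W):
    """Eigenvalues and inverse participation ratios of the Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    evals, U = np.linalg.eigh(L)       # columns of U have unit L2 norm
    ipr = np.sum(U ** 4, axis=0)       # one IPR value per eigenmode
    return evals, ipr
```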

Tool release. The spectral-mobility Python package operationalises the structural-bound and spectral-augmentation methodologies of the program. v0.4.0 ships SpectralAugmentedRegressor (sklearn-style, transductive + Nyström-inductive), CitySpectralProfile (a self-contained spectral signature of a single network) and compare_cities / cross_city_similarity_matrix (multi-city pairwise comparison); all 72 unit tests pass. On real Boston Bluebikes data, R² rises from +0.05 (baseline) to +0.46 (augmented; inductive, strict, K=16), roughly a ninefold increase.
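A numpy-only sketch of transductive spectral feature augmentation in the spirit of SpectralAugmentedRegressor: build a Gaussian-RBF k-NN graph on station coordinates, take the K smoothest eigenvectors of the symmetric normalised Laplacian, and append them as extra columns to the tabular features. This is not the spectral-mobility API; the function names, the k = 8 neighbours and the median-distance bandwidth are our assumptions, and the Nyström-inductive extension to unseen stations is omitted.

```python
import numpy as np


def knn_rbf_graph(coords, k=8, sigma=None):
    """Symmetric Gaussian-RBF weights kept only between k-nearest neighbours."""
    n = len(coords)
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                     # a node is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]               # indices of the k nearest neighbours
    rows = np.arange(n)[:, None]
    if sigma is None:
        sigma = np.sqrt(np.median(d2[rows, nn]))     # median k-NN distance as bandwidth
    W = np.zeros((n, n))
    W[rows, nn] = np.exp(-d2[rows, nn] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                        # keep an edge if either end chose it


def spectral_features(W, K=16):
    """K smoothest eigenvectors of L_sym = I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, U = np.linalg.eigh(L_sym)                     # ascending eigenvalues
    return U[:, :K]


def augment(X, coords, K=16, k=8):
    """Append K graph-spectral coordinates to the tabular features X (transductive)."""
    return np.hstack([X, spectral_features(knn_rbf_graph(coords, k), K)])
```

Any regressor can then be fitted on the augmented matrix. Because the eigenvectors are computed from all stations, including those later held out, this variant is transductive; an inductive evaluation needs an out-of-sample extension such as Nyström.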

Submission roadmap

Working drafts are available openly in this organisation as they mature; formal journal submissions are being staged across Q3/Q4 2026. arXiv preprints will resolve cross-paper citations as drafts go out.

Papers (existing drafts + planned follow-ups)

| Paper | Target venue | Status |
|---|---|---|
| gbfs-audit-catalogue | Computer Standards & Interfaces / Scientific Data | draft v1.0.1, foundational data substrate |
| materials-applicability-bound | Machine Learning: Science and Technology (IOP) | draft v1.0-rc.5, cleanest empirical application of Corollary C1 |
| structural-bounds-framework | JMLR (preferred over FoCM for review speed) | draft v0.4, contains C1–C3 with proofs |
| bikeshare-demand-forecasting | transport-data journal (TBD) | draft, IMD-augmented demand prediction on the 27-network panel |
| penality-analysis | Transport Reviews / Journal of Transport Geography | draft, triple-penalty mobility-justice diagnostic |
| topological-localization-mobility | EPJ Data Science / Applied Network Science | draft v0.1, phenomenological diagnostic + Anderson falsification |
| (planned) National-scale cross-mode spectral analysis | Transportation Research Part C | empirical work done on the 34,858-commune × 4-INSEE-mode panel; writing TBD |
| (planned) Spectral feature augmentation for graph-structured prediction | ICLR / NeurIPS / hybrid ML-mobility | method paper around SpectralAugmentedRegressor; Boston validation R² = +0.05 → +0.46; writing TBD |
| mobility-applicability-bound | Transportation Research Part B | early draft, needs further work |
| bikeshare-gsp-tools | TBD | early draft, theory tightening + multi-city replication still needed |

The two planned follow-ups build on topological-localization-mobility and spectral-mobility but are kept as separate manuscripts: the first is a diagnostic across modes and scales, the second is the algorithm that operationally closes the gap measured by the first. Merging them would dilute both narratives.

Open data and reproducibility

Every result in every repo can be reproduced from the raw open data sources.

Each repo ships:

  • a requirements.txt pinning the Python stack;
  • fixed random seeds (typically 42) and explicit per-script RAM and wall-time budgets;
  • precomputed intermediate parquets to bypass long recomputations;
  • a CITATION.cff for machine-readable citation;
  • a .zenodo.json for automatic DOI minting on each GitHub Release;
  • an MIT LICENSE for the code (data products inherit upstream licenses).
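A minimal seeding helper of the kind these scripts rely on, assuming a plain Python + NumPy stack. The name seed_everything is hypothetical, and libraries with their own random number generators (for example LightGBM) still need the seed passed to them explicitly.

```python
import os
import random

import numpy as np

SEED = 42  # the conventional seed quoted above


def seed_everything(seed: int = SEED) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs and return a fresh NumPy Generator."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started afterwards
    random.seed(seed)
    np.random.seed(seed)
    return np.random.default_rng(seed)
```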

New repos in this organisation should be created from paper-template, which ships all of the above plus a starter LaTeX manuscript in the IOP iopjournal.cls style.

How to cite

@misc{cyclingDataLab,
  author       = {Foss\'e, Rohan and Pallares, Ga\"el},
  title        = {{cycling-data-lab}: a research program on structural lower bounds
                  for graph-supervised learning, with empirical instantiations in
                  materials informatics, urban mobility, bike share demand and
                  mobility justice},
  year         = {2026},
  howpublished = {\url{https://github.com/cycling-data-lab}}
}

Per-repo BibTeX entries are in the corresponding README.md.

People

Rohan Fossé · Teaching Lead (Enseignant Responsable Pédagogique), CESI École d'Ingénieurs, Montpellier · email · ORCID

Gaël Pallares · Lecturer-Researcher (Enseignant Chercheur), CESI LINEACT (EA 7527) · ORCID

Affiliated with CESI LINEACT (EA 7527), Montpellier, France.

Funding

To be completed.

Contributing

Issues and pull requests are welcome on any of the repos. We follow a publish-then-discuss model: drafts are released openly during the writing process so that external feedback can shape the eventual submission.

For larger collaborations (joint papers, data sharing, code contributions), email Rohan directly.

One spectral lower bound, several empirical domains — and the data, code and LaTeX to reproduce every claim.

Pinned

  1. gbfs-audit-catalogue (Public)

    A reproducible audit of 1,509 GBFS bike-sharing feeds across 48 countries. 46-column reference dataset for 46,307 certified stations.

    Python

  2. bikeshare-demand-forecasting (Public)

    Paper: IMD-augmented bike-share demand forecasting. Spatio-temporal benchmark on 27 networks across 2 continents, with leave-station-out spatial generalisation. Companion to imd-national-catalogue.

    Python

  3. bikeshare-gsp-tools (Public)

    Paper (working draft): Graph-Signal-Processing foundations for cycling-network expansion. Spectral bounds, optimal siting (D-optimal greedy, MCLP, k-median), empirical learning curves on dock-based…

    Python

  4. imd-national-catalogue (Public)

    A reproducible commune-level IMD (Indice de Mobilité Douce) and IES catalogue for 34,858 French communes. Companion to Paper 04 of the BikeShare-ICT series.

    Python

  5. penality-analysis (Public)

    Paper: Triple-penalty mobility-justice diagnostic on 34,858 French communes. Identifies cycling-poverty deserts at the intersection of cycling-environment deprivation, monetary poverty and structur…

    TeX

