Visual portfolio of my PhD research at Monash University (Transport Engineering, 2021–2026).
This repo presents the frameworks, workflows, and key results from three case studies on active travel and shared micromobility in Melbourne, using social media data, urban heat island modelling, and interpretable machine learning.
📄 Publications: see Google Scholar
🔗 Profile: LinkedIn · Main GitHub
| # | Topic | Data | Methods |
|---|---|---|---|
| 1 | NLP for active-travel extraction from social media | Twitter, Greater Melbourne, 2018–2021 | BERT classification · Named Entity Matching · Location fusion |
| 2 | Spatially heterogeneous impacts of urban heat island (UHI) on active travel | Mesh Block UHI index + social-media-derived trips | Multiscale Geographically Weighted Regression (MGWR) |
| 3 | Non-linear effects on shared e-scooter speed | Lime e-scooter & e-bike GPS trajectories (Melbourne CBD) | Map-matching · XGBoost / Random Forest / CatBoost · SHAP |
All case studies are situated in Greater Melbourne. The maps below — produced in ArcGIS Pro and QGIS — characterise the study area's geography, population, and multi-modal transport networks. They demonstrate cartographic design choices made for the thesis: layer hierarchy, classification schemes, label placement, and consistent visual identity across figures.
Local Government Area (LGA) boundaries grouped into Inner / Middle / Outer Melbourne to support multi-scale analysis.
Eight infrastructure types classified across Greater Melbourne (top) with a City-of-Melbourne zoom (bottom) showing the protected-lane backbone in the inner city.
Tram stops and routes overlaid on satellite imagery, with an inset locator. Service is concentrated in the inner and middle rings.
📂 More maps: Population, Train, Bus & hourly trip patterns
Two complementary views of population: per-km² density (top) and absolute count (bottom). Density highlights inner-city intensity; count highlights outer suburbs with large populations.
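Density is simply count divided by area, which is why the two maps can rank the same areas so differently. A toy illustration with two hypothetical LGAs (made-up figures, not census or thesis data):

```python
# Hypothetical figures for illustration only (NOT real ABS or thesis data).
lgas = {
    "Inner LGA (hypothetical)": {"population": 50_000, "area_km2": 10.0},
    "Outer LGA (hypothetical)": {"population": 300_000, "area_km2": 400.0},
}

def density_per_km2(population: int, area_km2: float) -> float:
    """Persons per square kilometre."""
    return population / area_km2

densities = {name: density_per_km2(v["population"], v["area_km2"]) for name, v in lgas.items()}

# Ranking by density puts the small, compact LGA first ...
by_density = sorted(lgas, key=lambda n: densities[n], reverse=True)
# ... while ranking by absolute count puts the large outer LGA first.
by_count = sorted(lgas, key=lambda n: lgas[n]["population"], reverse=True)

print(densities)  # Inner: 5000.0 persons/km², Outer: 750.0 persons/km²
```

The outer LGA has six times the population but under a sixth of the density, so neither map alone tells the full story.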
Trips per hour for four day types. Clear AM (7am) and PM (4–5pm) commuter peaks on weekdays; flatter midday-dominant profile on weekends.
Metropolitan rail network across Greater Melbourne, with radial extensions to outer LGAs (Wyndham, Casey, Cardinia, Mornington Peninsula).
Distinct weekday commuter peaks (7am, 4–5pm) contrast with the flatter, daytime-dominant pattern on weekends.
Greater Melbourne bus stop coverage — denser than tram and rail, extending into outer suburbs where rail does not reach.
Bus shows the strongest weekday/weekend contrast of all three modes, with a sharp 7am peak and a broad afternoon peak.
Problem. Social media is a rich but noisy source of active-travel signals. Most tweets lack precise geo-coordinates, so spatial analysis is usually limited to the small fraction of geotagged posts.
Contribution. A pipeline that combines BERT-based tweet classification with content-based location extraction to turn ungeotagged tweets into mappable active-travel trips, substantially expanding the usable dataset.
End-to-end pipeline: data cleansing → tweet classification → location extraction (content-based Con-Loc + geotag-based Geo-Loc) → information fusion.
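The cleansing step is not detailed here. A minimal sketch, assuming typical tweet-cleaning rules (drop a leading `RT`, URLs and `@mentions`, keep hashtag words, collapse whitespace); the rule set and the `clean_tweet` name are illustrative, not the thesis implementation:

```python
import re

# Assumed cleansing rules (hypothetical; the thesis's exact rules are not stated).
RT_RE = re.compile(r"^RT\s+")          # leading retweet marker
URL_RE = re.compile(r"https?://\S+")   # links carry no travel content
MENTION_RE = re.compile(r"@\w+")       # user handles
WS_RE = re.compile(r"\s+")             # runs of whitespace

def clean_tweet(text: str) -> str:
    """Strip RT prefix, URLs and @mentions; keep hashtag words; collapse whitespace."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")       # keep the hashtag word, drop the symbol
    return WS_RE.sub(" ", text).strip()

print(clean_tweet("RT @friend Rode my #bike to Flinders St this morning https://t.co/abc"))
# -> Rode my bike to Flinders St this morning
```

Case is left untouched on the assumption that a cased BERT model may follow; an uncased model would make lowercasing harmless.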
Transformer-based encoder + linear classification head for identifying active-travel-related tweets.
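The classification head itself is small. As a sketch (not the thesis code), assume the fine-tuned encoder has already produced a pooled `[CLS]` vector `h` for a tweet; the head is then one affine map followed by softmax. The 4-dimensional vector, the weights and the class labels below are hand-picked toy values (BERT-base actually outputs 768-dimensional vectors, and the head weights are learned during fine-tuning):

```python
import math
from typing import List, Sequence

def linear_head(cls_embedding: Sequence[float],
                weights: List[List[float]],
                bias: List[float]) -> List[float]:
    """Logits = W·h + b, with one row of W per class."""
    return [sum(w * x for w, x in zip(row, cls_embedding)) + b
            for row, b in zip(weights, bias)]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy [CLS] vector and weights (hypothetical values, for illustration only).
h = [0.2, -1.0, 0.5, 0.3]
W = [[0.1, 0.4, -0.2, 0.0],     # class 0: not active travel (assumed label)
     [-0.3, -0.5, 0.6, 0.2]]    # class 1: active travel (assumed label)
b = [0.0, 0.1]

logits = linear_head(h, W, b)    # approximately [-0.48, 0.9]
probs = softmax(logits)
label = max(range(len(probs)), key=probs.__getitem__)
```

Keeping the head linear means nearly all of the modelling capacity sits in the pretrained encoder, which is the usual design for fine-tuning on a modest labelled set.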
Reconciles content-based locations (Con-Loc, extracted from tweet text via Named Entity Matching, NEM) with geotag-based locations (Geo-Loc), applying conflict-resolution rules when the two disagree.
(a) Decision flow for fusing Con-Loc and Geo-Loc:
(b) Worked example, from location-bearing tweets to trip tabulation:
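The actual decision flow is given in the figure. As a rough sketch of its two ingredients, the code below pairs a gazetteer-based matcher (one simple way to do NEM) with a fusion rule. Every specific is an assumption: the two-entry gazetteer and its approximate coordinates, the longest-match rule, the 1 km agreement threshold, and the choice to keep Con-Loc when both sources agree and to flag conflicts rather than resolve them.

```python
import math
import re
from typing import Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical mini-gazetteer (approximate coordinates, illustration only).
GAZETTEER = {
    "flinders street station": (-37.8183, 144.9671),
    "royal botanic gardens": (-37.8304, 144.9796),
}

def match_poi(text: str) -> Optional[Tuple[str, LatLon]]:
    """Return the longest gazetteer name found in the text (case-insensitive, whole words)."""
    lowered = text.lower()
    hits = [name for name in GAZETTEER
            if re.search(r"\b" + re.escape(name) + r"\b", lowered)]
    if not hits:
        return None
    best = max(hits, key=len)
    return best, GAZETTEER[best]

def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def fuse(con_loc: Optional[LatLon], geo_loc: Optional[LatLon],
         max_gap_m: float = 1_000.0) -> Tuple[Optional[LatLon], str]:
    """Hypothetical fusion rules; the threshold and tie-breaking are assumptions."""
    if con_loc is not None and geo_loc is not None:
        if haversine_m(con_loc, geo_loc) <= max_gap_m:
            return con_loc, "agree: keep POI-level Con-Loc"
        return None, "conflict: flag for review"
    if con_loc is not None:
        return con_loc, "Con-Loc only"
    if geo_loc is not None:
        return geo_loc, "Geo-Loc only"
    return None, "no location"

hit = match_poi("Cycled to the Royal Botanic Gardens, great morning")
print(fuse(hit[1] if hit else None, (-37.8310, 144.9800)))
```

The "Con-Loc only" branch is what lets ungeotagged tweets enter the dataset at all.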
4,042 POI-level observations recovered from text content, where geotagging alone would have missed most of them.
(a) Spatial change in location ratio by LGA:
(b) Ratio comparison across LGAs:
📄 Publication (full paper included): Identifying active transport from spontaneous data sources with natural language processing, TRB 2024 — Google Scholar · PDF in repo
💡 Patent: A method for integrating geotagged location and text location information in social media — CN117236316B
Problem. Urban heat island (UHI) effects on active travel are typically modelled as spatially uniform — but heat exposure and behavioural response vary substantially across the city.
Contribution. First spatially heterogeneous assessment of UHI effects on active travel in Melbourne, using Multiscale Geographically Weighted Regression (MGWR) at Mesh Block resolution. MGWR lets each covariate (UHI, parkland, transit access, demographics, etc.) operate at its own optimal spatial scale, revealing that urban heat and the other drivers of active travel act over fundamentally different geographic ranges.
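For reference, the textbook MGWR specification is shown below, where $(u_i, v_i)$ are the coordinates of observation $i$ (here, a Mesh Block), $x_{ij}$ is covariate $j$ (with $x_{i0} = 1$ for the intercept) and $bw_j$ is a bandwidth selected separately for each covariate. Ordinary GWR is the special case in which every $bw_j$ is equal.

```math
y_i = \sum_{j=0}^{k} \beta_{bw_j}(u_i, v_i)\, x_{ij} + \varepsilon_i
```

With an adaptive bisquare kernel (a common choice, and as far as we know the default in the `mgwr` Python library; the thesis's exact kernel settings are not restated here), observation $l$ receives the following weight in the local fit at $i$, where $d_{i(bw_j)}$ is the distance from $i$ to its $bw_j$-th nearest neighbour, so a bandwidth is a neighbour count:

```math
w_{il} = \begin{cases} \left[1 - \left(d_{il} / d_{i(bw_j)}\right)^2\right]^2 & \text{if } d_{il} < d_{i(bw_j)} \\ 0 & \text{otherwise} \end{cases}
```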
UHI effects on all-day vs summer-only active-travel trips:
MGWR coefficients for built-environment & socio-demographic controls (Parkland, Tram stop density, Bus stops, Population density, Age 15–34, Household income, Unemployment, Higher education, Vehicle ownership). Each variable's panel reports its own optimal bandwidth (BW), i.e. the spatial scale over which its relationship with active travel varies: a small BW indicates a highly local effect, a large BW a near-global one:
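To show what a bandwidth does, the sketch below runs plain single-bandwidth GWR at one focal location with an adaptive bisquare kernel, on synthetic data whose true slope is 1 in the west half of a transect and 3 in the east half. It is a simplification, not MGWR: MGWR additionally fits a separate bandwidth per covariate by backfitting, which this code does not do. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def bisquare_weights(focal: Point, coords: Sequence[Point], bw: int) -> List[float]:
    """Adaptive bisquare kernel: observations closer than the bw-th nearest
    observation get weight (1 - (d/d_bw)^2)^2; all others get 0."""
    d = [math.dist(focal, c) for c in coords]
    d_bw = sorted(d)[bw - 1]                 # distance to the bw-th nearest observation
    return [(1 - (di / d_bw) ** 2) ** 2 if di < d_bw else 0.0 for di in d]

def local_slope(w: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Weighted least-squares slope of y on x (with intercept)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    return num / den

# Synthetic data (not thesis data): 10 locations on a line, two observations
# each (x = -1 and x = +1); true slope is 1 for locations 0-4 and 3 for 5-9.
coords, x, y = [], [], []
for i in range(10):
    beta = 1.0 if i < 5 else 3.0
    for xi in (-1.0, 1.0):
        coords.append((float(i), 0.0))
        x.append(xi)
        y.append(beta * xi)

west = (0.0, 0.0)
local = local_slope(bisquare_weights(west, coords, bw=10), x, y)   # small bandwidth
broad = local_slope(bisquare_weights(west, coords, bw=20), x, y)   # spans all 20 observations
print(round(local, 3), round(broad, 3))
```

With `bw=10` only the four nearest locations carry weight, so the local slope is exactly 1; letting the kernel span all observations blends in the eastern regime and pulls the estimate up to about 1.37. The same logic is why a covariate with a large MGWR bandwidth shows a smooth, near-uniform coefficient map.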
📄 Publication (full paper included): Assessing the spatial heterogeneous impacts of urban heat island effects on active travel by leveraging social media data, Sustainable Cities and Society, 2025 — ScienceDirect · PDF in repo
Problem. Linear models of micromobility speed miss the threshold and non-monotonic behaviours that matter most for infrastructure design.
Contribution. Interpretable ML on shared e-scooter & e-bike GPS trajectories (Lime, Melbourne CBD). SHAP reveals non-linear thresholds in shared-lane traffic speed, POI density, residential density, intersection volume, and air temperature.
Data preprocessing → map-matching → point-pairing → feature engineering → tree-based models → SHAP.
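Point-pairing is not spelled out here. One plausible reading, assumed for this sketch, is that consecutive (map-matched) GPS fixes of the same trip are paired into segments whose speed is distance over elapsed time. The `GpsPoint` fields, the 1 s minimum gap and the straight-line haversine distance are all assumptions; a map-matched pipeline might instead measure distance along the matched road geometry.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class GpsPoint:
    trip_id: str
    t: float      # seconds since trip start (assumed field)
    lat: float
    lon: float

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(h))

def pair_points(points: List[GpsPoint], min_dt: float = 1.0) -> List[dict]:
    """Pair consecutive fixes within each trip and derive segment speed in km/h."""
    segments = []
    ordered = sorted(points, key=lambda p: (p.trip_id, p.t))
    for a, b in zip(ordered, ordered[1:]):
        dt = b.t - a.t
        if a.trip_id != b.trip_id or dt < min_dt:
            continue  # never pair across trips; skip near-duplicate timestamps
        dist = haversine_m(a.lat, a.lon, b.lat, b.lon)
        segments.append({"trip_id": a.trip_id, "t_start": a.t,
                         "dist_m": dist, "speed_kmh": dist / dt * 3.6})
    return segments
```

Each segment then becomes one modelling observation, to which segment-level features (lane type, POI density, temperature, and so on) can be joined.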
Spatial distribution of shared e-scooter and e-bike trips across the study area.
E-scooter and e-bike show distinct temporal signatures: e-scooter is leisure-dominated (afternoon and late-night peaks, weekend > weekday), while e-bike weekday demand peaks sharply at 5pm — closer to a commuter pattern.
(a) Absolute hourly trip demand:
(b) Within-mode ratio by hour (normalised demand share):
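Panel (b) removes the volume difference between modes so that their daily shapes can be compared directly. A minimal sketch of that normalisation, assuming each trip is reduced to a `(mode, start_hour)` pair with hours 0 to 23; the input format and the toy trips are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def hourly_share(trips: Iterable[Tuple[str, int]]) -> Dict[str, Dict[int, float]]:
    """Per mode, the share of that mode's trips starting in each hour (0-23)."""
    counts: Dict[str, Counter] = {}
    for mode, hour in trips:
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
        counts.setdefault(mode, Counter())[hour] += 1
    shares = {}
    for mode, c in counts.items():
        total = sum(c.values())
        shares[mode] = {h: c[h] / total for h in range(24)}  # hours with no trips -> 0.0
    return shares

# Toy trips (hypothetical, not Lime data).
toy_trips = [("e-scooter", 17), ("e-scooter", 17), ("e-scooter", 23), ("e-bike", 17)]
shares = hourly_share(toy_trips)
```

Each mode's 24 shares sum to 1, so a 5pm spike in the e-bike profile reflects concentration of demand rather than higher volume.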
Left: per-observation SHAP values (violin plot). Right: mean absolute SHAP value (importance ranking).
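The right-hand ranking is the mean of |SHAP| per feature over all observations. A sketch of that aggregation on a toy SHAP matrix (rows = observations, columns = features); in practice the matrix would come from a tree explainer such as `shap.TreeExplainer`, and the feature names here are hypothetical stand-ins:

```python
from typing import List, Sequence, Tuple

def mean_abs_shap(shap_values: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Global importance: mean |SHAP| per feature, sorted from most to least important."""
    n = len(shap_values)
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = [t / n for t in totals]
    return sorted(zip(feature_names, means), key=lambda kv: kv[1], reverse=True)

features = ["traffic_speed", "poi_density", "air_temperature"]  # hypothetical column names
toy_shap = [[1.2, -0.3, 0.1],     # toy values, not model output
            [-0.8, 0.4, -0.2],
            [1.0, -0.5, 0.0]]
ranking = mean_abs_shap(toy_shap, features)
```

Taking absolute values discards direction: `traffic_speed` ranks first here even though its SHAP values are of mixed sign, which is exactly what the violin plot on the left reveals.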
Threshold and curvilinear effects of POI density, residential density, intersection volume, and air temperature on e-scooter speed.
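Thresholds like these are usually read off SHAP dependence plots (feature value against its SHAP value). One simple heuristic, assumed here and not necessarily the one used in the thesis: bin the feature, average SHAP per bin, and report where the binned curve first changes sign, i.e. where the feature switches from raising to lowering predicted speed (or vice versa). The data below are synthetic.

```python
from typing import List, Optional, Sequence, Tuple

def binned_dependence(feature: Sequence[float], shap: Sequence[float],
                      n_bins: int = 5) -> List[Tuple[float, float]]:
    """Equal-width bins over the feature range -> (bin centre, mean SHAP) for non-empty bins.
    Assumes the feature is not constant (otherwise the bin width is zero)."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for f, s in zip(feature, shap):
        k = min(int((f - lo) / width), n_bins - 1)  # the maximum falls into the last bin
        sums[k] += s
        counts[k] += 1
    return [(lo + (k + 0.5) * width, sums[k] / counts[k])
            for k in range(n_bins) if counts[k]]

def first_sign_change(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Midpoint between the first adjacent bins whose mean SHAP values strictly change sign."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if y0 * y1 < 0:
            return (x0 + x1) / 2
    return None

# Synthetic example: SHAP values decline and turn negative part-way along the range.
feature_vals = [float(v) for v in range(10)]
shap_vals = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, -0.2, -0.3, -0.4, -0.5]
curve = binned_dependence(feature_vals, shap_vals, n_bins=5)
threshold = first_sign_change(curve)  # about 5.4 for these toy values
```

Bin count is a bias/variance trade-off: too few bins blur the threshold, too many leave each mean resting on a handful of noisy points.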
📄 Related conference paper: Investigating the travel behaviour of e-scooter riders in Melbourne: A spatiotemporal analysis with PCA, ATRF 2024 — view
Languages — Python, SQL, R
Geospatial — ArcGIS Pro, QGIS, PostGIS, GeoPandas, Shapely
Spatial statistics — MGWR (mgwr Python library), Moran's I, spatial autocorrelation analysis
ML / NLP — PyTorch, Hugging Face Transformers, XGBoost, CatBoost, scikit-learn, SHAP
Data engineering — PySpark, ETL pipelines on large GPS trajectory datasets, map-matching with OSRM
Visualisation — ArcGIS cartography, matplotlib, Folium
For role discussions, collaborations, or methodology questions:
LinkedIn · Main GitHub · Email