liteng16/phd-portfolio

PhD Portfolio — Urban Mobility, Spatial Analysis & Geospatial Machine Learning

Visual portfolio of my PhD research at Monash University (Transport Engineering, 2021–2026).
This repo presents the frameworks, workflows, and key results from three case studies on active travel and shared micromobility in Melbourne, using social media data, urban heat island modelling, and interpretable machine learning.

📄 Publications: see Google Scholar
🔗 Profile: LinkedIn · Main GitHub


📌 Overview of Case Studies

| # | Topic | Data | Methods |
|---|-------|------|---------|
| 1 | NLP for active-travel extraction from social media | Twitter, Greater Melbourne, 2018–2021 | BERT classification · Named Entity Matching · Location fusion |
| 2 | Spatial heterogeneous impacts of urban heat island on active travel | Mesh Block UHI index + social-media-derived trips | Multiscale Geographically Weighted Regression (MGWR) |
| 3 | Non-linear effects on shared e-scooter speed | Lime e-scooter & e-bike GPS trajectories (Melbourne CBD) | Map-matching · XGBoost / Random Forest / CatBoost · SHAP |

🗺️ Study Area & Cartographic Work

All case studies are situated in Greater Melbourne. The maps below — produced in ArcGIS Pro and QGIS — characterise the study area's geography, population, and multi-modal transport networks. They demonstrate cartographic design choices made for the thesis: layer hierarchy, classification schemes, label placement, and consistent visual identity across figures.

Greater Melbourne — Study Region

LGA boundaries grouped into Inner / Middle / Outer Melbourne to support multi-scale analysis.

Melbourne map

Bicycle Infrastructure Network

Eight infrastructure types classified across Greater Melbourne (top) with a City-of-Melbourne zoom (bottom) showing the protected-lane backbone in the inner city.

Bicycle network

Tram Network

Tram stops and routes overlaid on satellite imagery, with an inset locator. Service is concentrated in the inner and middle ring.

Tram network

📂 More maps: Population, Train, Bus & hourly trip patterns

Population — Density and Count by Suburb

Two complementary normalisations: per-km² density (top) and absolute count (bottom). Density highlights inner-city intensity; count highlights outer suburbs with large populations.

Population distribution

Tram — Hourly Trip Profile

Trips per hour for four day types. Clear AM (7am) and PM (4–5pm) commuter peaks on weekdays; flatter midday-dominant profile on weekends.

Tram hourly

Train Stations

Metropolitan rail network across Greater Melbourne, with radial extensions to outer LGAs (Wyndham, Casey, Cardinia, Mornington Peninsula).

Train stations

Train — Hourly Trip Profile

Distinct weekday commuter peaks (7am, 4–5pm) contrast with the flatter, daytime-dominant pattern on weekends.

Train hourly

Bus Stops

Greater Melbourne bus stop coverage — denser than tram and rail, extending into outer suburbs where rail does not reach.

Bus stops

Bus — Hourly Trip Profile

Bus shows the strongest weekday/weekend contrast of the three modes, with a sharp 7am peak and a broad afternoon peak on weekdays.

Bus hourly


Case 1: Unveiling Active Travel from Social Media with NLP

Problem. Social media is a rich but noisy source of active-travel signals. Most tweets lack precise geo-coordinates, so spatial analysis is usually limited to the small fraction of geo-tagged posts.
Contribution. A BERT-based classification + content-based location extraction pipeline that turns ungeotagged tweets into mappable active-travel trips, expanding the usable dataset substantially.

Framework

End-to-end pipeline: data cleansing → tweet classification → location extraction (Con-Loc + Geo-Loc) → information fusion.

Framework
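
The content-based location step (Con-Loc) can be illustrated with a minimal gazetteer-matching sketch. The gazetteer entries, their coordinates, and the `match_locations` helper are hypothetical and stand in for whatever POI list and matching rules the actual pipeline uses.

```python
import re

# Hypothetical gazetteer: lower-cased place name -> (lat, lon).
# Names and coordinates are illustrative only.
GAZETTEER = {
    "albert park": (-37.8440, 144.9510),
    "albert park lake": (-37.8460, 144.9700),
    "yarra trail": (-37.8200, 145.0000),
}


def match_locations(text, gazetteer=GAZETTEER):
    """Return gazetteer names found in `text`, preferring the longest match."""
    lowered = text.lower()
    found = []
    # Try longer names first so "albert park lake" wins over "albert park".
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if pattern.search(lowered):
            found.append(name)
            # Blank out matched spans so shorter names cannot re-match them.
            lowered = pattern.sub(lambda m: " " * len(m.group()), lowered)
    return found


print(match_locations("Morning run around Albert Park Lake!"))
```

Matching longest names first is one simple way to avoid double-counting nested place names. A production matcher would also need to handle misspellings and ambiguous names.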

BERT-CLA Classifier Architecture

Transformer-based encoder + linear classification head for identifying active-travel-related tweets.

BERT-CLA
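
As a rough illustration of the classification head only, the sketch below applies a linear layer and a softmax to a pooled sentence embedding. The 4-dimensional embedding, the weights, and the two class labels are made-up toy values. The real encoder output is much larger and the head is trained, not hand-set.

```python
import math


def linear_head(pooled, weights, bias):
    """Map a pooled embedding h to class logits: logits = W h + b."""
    return [sum(w * h for w, h in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values (assumptions, not taken from the trained model).
pooled = [0.5, -1.0, 0.25, 2.0]
weights = [[0.1, 0.2, 0.0, -0.3],   # class 0: not active travel
           [-0.2, 0.1, 0.4, 0.5]]   # class 1: active travel
bias = [0.0, 0.1]

probs = softmax(linear_head(pooled, weights, bias))
predicted_class = probs.index(max(probs))
print(predicted_class, probs)
```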

Location Fusion Strategy

Reconciles content-based locations (extracted from tweet text via NEM) with geotag-based locations, with conflict-resolution rules.

(a) Decision flow for fusing Con-Loc and Geo-Loc:

Location fusion flowchart

(b) Worked example — from location-contained tweets to trip tabulation:

Location fusion example
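
The fusion idea can be sketched as a small decision function. The rules below (prefer the content-based location when both sources agree within a distance threshold, flag a conflict otherwise) and the 2 km threshold are assumptions for illustration. They are not the decision flow from the flowchart or the patent.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def fuse_location(con_loc, geo_loc, max_gap_km=2.0):
    """Return (location, source) for one tweet.

    con_loc / geo_loc are (lat, lon) tuples or None.
    max_gap_km is a hypothetical agreement threshold.
    """
    if con_loc is None and geo_loc is None:
        return None, "none"
    if con_loc is None:
        return geo_loc, "geo"
    if geo_loc is None:
        return con_loc, "content"
    if haversine_km(con_loc, geo_loc) <= max_gap_km:
        # Sources agree: keep the content-based point.
        return con_loc, "content+geo"
    # Sources disagree: flag for a dedicated conflict rule or manual review.
    return None, "conflict"


print(fuse_location((-37.8136, 144.9631), (-37.8150, 144.9660)))
```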

Results

Extracted active-travel locations across Greater Melbourne (2018–2021)

4,042 POI-level observations recovered from text content, where geotagging alone would have missed most of them.

Result map

Variation across 31 LGAs — Pre-COVID vs COVID period

(a) Spatial change in location ratio by LGA:

Result LGA map

(b) Ratio comparison across LGAs:

Result LGA chart

📄 Publication (full paper included): Identifying active transport from spontaneous data sources with natural language processing, TRB 2024 · Google Scholar · PDF in repo
💡 Patent: A method for integrating geotagged location and text location information in social media · CN117236316B


Case 2: Spatial Heterogeneous Impacts of Urban Heat Island on Active Travel

Problem. Urban heat island (UHI) effects on active travel are typically modelled as spatially uniform — but heat exposure and behavioural response vary substantially across the city.
Contribution. The first spatially heterogeneous assessment of UHI effects on active travel in Melbourne, using Multiscale Geographically Weighted Regression (MGWR) at Mesh Block resolution. MGWR lets each covariate (UHI, parkland, transit access, demographics, etc.) operate at its own optimal spatial scale, showing that urban heat and the other drivers of active travel act over markedly different geographic ranges.

Urban Heat Island Index — Mesh Block level, Greater Melbourne

UHI map

Spatial Heterogeneity of UHI Impact (MGWR coefficients)

UHI effects on all-day vs summer-only active-travel trips:

UHI heterogeneity

MGWR coefficients for built-environment & socio-demographic controls (Parkland, Tram stop density, Bus stops, Population density, Age 15–34, Household income, Unemployment, Higher education, Vehicle ownership). Each variable's panel shows its own optimal bandwidth (BW), indicating the spatial scale at which it most strongly affects active travel:

Other factor heterogeneity
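
To show how a bandwidth shapes a local coefficient, here is a minimal single-covariate sketch of a geographically weighted fit with a bisquare kernel. It is not MGWR: MGWR additionally selects a separate bandwidth per covariate, which this sketch does not attempt. The planar coordinates and the toy data are assumptions.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare kernel: weight falls to zero at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_slope(target, coords, x, y, bandwidth):
    """Weighted least-squares slope of y on x, centred on `target`."""
    w = [bisquare(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return cov / var


# Toy data: the x-y relationship is negative in the west, positive in the east.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [-1, -2, -3, -4, 2, 4, 6, 8]

west = local_slope((1.5, 0), coords, x, y, bandwidth=5)
east = local_slope((11.5, 0), coords, x, y, bandwidth=5)
wide = local_slope((6.5, 0), coords, x, y, bandwidth=100)
print(west, east, wide)
```

With a narrow bandwidth the two local slopes differ in sign. With a very wide bandwidth the fit approaches a single global slope, which is the contrast the bandwidth values on each panel summarise.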

📄 Publication (full paper included): Assessing the spatial heterogeneous impacts of urban heat island effects on active travel by leveraging social media data, Sustainable Cities and Society, 2025 — ScienceDirect · PDF in repo


Case 3: Non-linear Effects on Shared E-Scooter Speed

Problem. Linear models of micromobility speed miss the threshold and non-monotonic behaviours that matter most for infrastructure design.
Contribution. Interpretable ML on shared e-scooter & e-bike GPS trajectories (Lime, Melbourne CBD). SHAP reveals non-linear thresholds in shared-lane traffic speed, POI density, residential density, intersection volume, and air temperature.

Modelling Framework

Data preprocessing → map-matching → point-pairing → feature engineering → tree-based models → SHAP.

Framework
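
One plausible reading of the point-pairing step is that consecutive GPS fixes are paired and a speed is derived per pair. The sketch below assumes that reading. The `(timestamp, lat, lon)` record layout and the `pair_speeds` helper are hypothetical, and it runs on raw points without map-matching.

```python
import math
from datetime import datetime


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def pair_speeds(points):
    """Speed in km/h for each consecutive pair of time-ordered GPS fixes.

    points: list of (iso_timestamp, lat, lon).
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(points, points[1:]):
        dt = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        dist = haversine_m((lat0, lon0), (lat1, lon1))
        speeds.append(dist / dt * 3.6)
    return speeds


# Toy trajectory: about 111 m north every 20 s, plus one duplicate timestamp.
trajectory = [
    ("2023-01-01T10:00:00", -37.8140, 144.9630),
    ("2023-01-01T10:00:20", -37.8130, 144.9630),
    ("2023-01-01T10:00:20", -37.8130, 144.9630),
    ("2023-01-01T10:00:40", -37.8120, 144.9630),
]
print(pair_speeds(trajectory))
```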

Study Area — Melbourne CBD and surroundings

Study area

Data Overview — Spatial trip density (weekday vs weekend)

Spatial distribution of shared e-scooter and e-bike trips across the study area.

Data overview

Temporal Pattern — Hourly trip distribution

E-scooter and e-bike show distinct temporal signatures: e-scooter is leisure-dominated (afternoon and late-night peaks, weekend > weekday), while e-bike weekday demand peaks sharply at 5pm — closer to a commuter pattern.

(a) Absolute hourly trip demand:

Hourly trips count

(b) Within-mode ratio by hour (normalised demand share):

Hourly trips ratio
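
The within-mode ratio in panel (b) can be read as each hour's share of that mode's total trips. A minimal sketch follows, assuming trips are represented only by their start hour. The trip lists are invented toy data.

```python
from collections import Counter


def hourly_share(trip_hours):
    """Share of a mode's trips starting in each hour (0-23)."""
    counts = Counter(trip_hours)
    total = sum(counts.values())
    if total == 0:
        return {hour: 0.0 for hour in range(24)}
    return {hour: counts.get(hour, 0) / total for hour in range(24)}


# Toy start hours per mode (not real trip data).
escooter_hours = [14, 14, 17, 23]
ebike_hours = [8, 17, 17, 17]

escooter_share = hourly_share(escooter_hours)
ebike_share = hourly_share(ebike_hours)
print(escooter_share[14], ebike_share[17])
```

Normalising within each mode makes the two temporal profiles comparable even when one mode has far more trips than the other.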

SHAP-based Feature Importance

Left: per-observation SHAP values (violin plot). Right: mean absolute SHAP value (importance ranking).

SHAP violin   SHAP bar
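
The importance ranking in the right-hand panel is the mean absolute SHAP value per feature. A minimal sketch of that aggregation is below. The SHAP matrix is a made-up toy example, not values from the fitted models.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across observations, largest first."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Toy per-observation SHAP values: one row per observation, one column per feature.
features = ["POI density", "air temperature"]
shap_rows = [
    [0.5, -0.1],
    [-0.7, 0.2],
    [0.3, -0.3],
]

ranking = rank_by_mean_abs_shap(shap_rows, features)
print(ranking)
```

Taking absolute values first means that positive and negative contributions do not cancel, so a feature that pushes speed both up and down still ranks as important.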

Non-linear Partial Dependence (top continuous features)

Threshold and curvilinear effects of POI density, residential density, intersection volume, and air temperature on e-scooter speed.

SHAP PDP
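
For readers unfamiliar with dependence curves, the sketch below computes a classical partial dependence curve: the model's average prediction as one feature is swept over a grid while the others keep their observed values. This is a simpler relative of SHAP dependence plots and may not be how the figure was produced. The `toy_speed_model` and its threshold at 50 are invented for illustration.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over `rows` with one feature fixed to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_speed_model(row):
    """Hypothetical model: speed drops by 3 km/h once POI density exceeds 50."""
    poi_density, temperature = row
    base = 15.0 - (3.0 if poi_density > 50 else 0.0)
    return base + 0.1 * temperature


# Toy observations: [POI density, air temperature].
rows = [[10, 20], [80, 30]]
curve = partial_dependence(toy_speed_model, rows, feature_index=0, grid=[0, 50, 100])
print(curve)
```

A step in the curve, as between the last two grid points here, is the kind of threshold effect that a linear model cannot represent.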

📄 Related conference paper: Investigating the travel behaviour of e-scooter riders in Melbourne: A spatiotemporal analysis with PCA, ATRF 2024 · view


🛠️ Technical Stack

Languages — Python, SQL, R
Geospatial — ArcGIS Pro, QGIS, PostGIS, GeoPandas, Shapely
Spatial statistics — MGWR (mgwr Python library), Moran's I, spatial autocorrelation analysis
ML / NLP — PyTorch, Hugging Face Transformers, XGBoost, CatBoost, scikit-learn, SHAP
Data engineering — PySpark, ETL pipelines on large GPS trajectory datasets, map-matching with OSRM
Visualisation — ArcGIS cartography, matplotlib, Folium


📬 Contact

For role discussions, collaborations, or methodology questions:
LinkedIn · Main GitHub · Email
