Spatial Statistics and Epidemiological Modelling

Applied spatial statistics project in R including geostatistical interpolation, spatial point pattern analysis and spatial epidemiological modelling.

This repository reproduces realistic spatial data science workflows commonly used in environmental sciences, epidemiology, ecological modelling and geospatial analytics.

The project combines:

Geostatistical interpolation
Spatial point process analysis
Spatial autocorrelation modelling
Spatial epidemiology
Predictive spatial inference

Project Overview

The repository is organised into three complementary spatial modelling frameworks:

Geostatistical modelling and universal kriging: Modeling elevation with non-stationary trends.
Spatial point pattern analysis: Testing CSR and interaction scales in biological and social patterns.
Spatial epidemiological modelling: Analyzing lead prevalence using spatial regression and neighbor structures.

Repository Structure

spatial-statistics-modelling/
│
├── data/
│   ├── elevation_spatial_data.csv
│   │
│   └── point_patterns/
│       ├── nests.csv
│       ├── trees.csv
│       ├── robberies.csv
│       └── truffles.csv
│
├── outputs/
│   ├── figures/
│   │   ├── spatial_locations.png
│   │   ├── trend_surfaces.png
│   │   ├── raw_variograms.png
│   │   ├── residual_variogram.png
│   │   ├── variogram_model_comparison.png
│   │   ├── universal_kriging_maps.png
│   │   │
│   │   ├── point_patterns_overview.png
│   │   ├── quadrat_analysis.png
│   │   ├── h_function_analysis.png
│   │   ├── g_function_analysis.png
│   │   ├── f_function_analysis.png
│   │   └── kernel_intensity_maps.png
│   │
│   │   ├── lead_prevalence_map.jpg
│   │   ├── neighbour_structure.png
│   │   ├── residual_distribution.png
│   │   └── spatial_residuals_map.jpg
│   │
│   └── tables/
│       ├── summary_statistics.csv
│       ├── variogram_model_parameters.csv
│       ├── point_prediction.csv
│       │
│       ├── point_pattern_summary.csv
│       ├── csr_quadrat_tests.csv
│       ├── clustering_indices.csv
│       │
│       ├── epidemiology_summary.csv
│       ├── logistic_model_results.csv
│       ├── moran_statistics.csv
│       └── model_comparison.csv
│
├── scripts/
│   ├── geostatistics_kriging.R
│   ├── point_pattern_analysis.R
│   └── spatial_epidemiology.R
│
├── README.md
└── LICENSE

Phase 1 — Geostatistical Modelling and Universal Kriging

The first stage focuses on spatial interpolation under non-stationary conditions using geostatistical methodologies.

The objective is to model terrain elevation while accounting for large-scale spatial trends and residual spatial dependence.

Dataset

Observations: 100 spatial locations
Variables: spatial coordinates (x, y) and elevation (alt)
Response variable: terrain elevation (meters)

Summary Statistics

Metric	Value
Mean elevation	1143.79 m
Minimum elevation	885.11 m
Maximum elevation	1462.68 m
Standard deviation	142.55 m

The spatial sampling design provides homogeneous spatial coverage without major unsampled gaps.

Geostatistical Workflow

The analysis follows a classical geostatistical modelling pipeline:

Exploratory spatial analysis
Spatial trend surface modelling
Empirical variogram estimation
Covariance model fitting
Universal kriging interpolation
Spatial uncertainty quantification

Spatial Trend Modelling

Polynomial trend surfaces of degree 1 and degree 2 were evaluated to assess deterministic spatial variation.

A strong spatial trend in elevation was detected, violating the constant mean assumption required for ordinary kriging. While a degree 2 surface was utilized, care must be taken as higher-degree polynomials can artificially force curvature and over-extrapolate near the study area boundaries.

This justified the use of:

Universal kriging
Residual covariance modelling after detrending

Variogram Analysis

Raw Variograms

Empirical variograms computed on the original elevation values displayed continuously increasing semivariance without stabilisation.

This behaviour confirmed:

Non-stationarity in the mean
Strong deterministic spatial trend

Residual Variogram

After removing the polynomial trend surface, the residual variogram exhibited stationary behaviour:

Nugget effect near the origin
Increasing semivariance with distance
Stabilisation around the sill

This confirmed the existence of structured residual spatial dependence suitable for covariance modelling.

Covariance Model Comparison

Several theoretical covariance structures were compared:

Exponential
Spherical
Matérn
Powered Exponential

Main Spatial Parameters

Parameter	Approximate Range
Nugget effect	566 – 1089
Sill	2000 – 2600

The nugget effect—representing nearly 40-50% of the total variance—suggests:

Substantial local terrain variability and micro-scale heterogeneity
A highly "noisy" spatial process where a large portion of the variance is not spatially structured at the sampled scale
Potential measurement noise

Universal Kriging

The final interpolation stage combined:

Polynomial spatial trend
Residual covariance structure
Universal kriging prediction

Main Outputs

Elevation Prediction Surface

A continuous terrain elevation map was reconstructed across the study region.

Kriging Variance Surface

Prediction uncertainty was quantified spatially.

As expected:

Prediction uncertainty is lowest near sampled locations
Uncertainty increases near borders and sparsely sampled regions

Point Prediction

The final objective consisted of predicting elevation at:

Coordinates: (3, 3)

Universal Kriging Prediction

Metric	Value
Predicted elevation	1161.19 m
Standard error	52.80 m
95% confidence interval	[1057.70, 1264.69]

The prediction standard error (52.80 m) is substantial relative to the total standard deviation of the series (142.55 m). This reflects the high nugget effect observed in the variogram, indicating that the final interpolation relies heavily on the deterministic polynomial trend surface rather than strong local spatial correlation.

Phase 2 — Spatial Point Pattern Analysis

The second stage analyses spatial point configurations under different stochastic spatial processes.

The objective is to distinguish between:

Complete spatial randomness
Inhibitory spatial structures
Clustered spatial processes
Spatial heterogeneity and hotspot formation

Point Pattern Workflow

The analysis combines:

Exploratory spatial visualisation
Quadrat-based CSR testing
Nearest-neighbour statistics
Spatial interaction functions
Kernel intensity estimation

Four spatial point patterns were analysed:

Nests
Trees
Robberies
Truffles

Complete Spatial Randomness (CSR) Testing

CSR was evaluated using:

Quadrat chi-square tests
Clark–Evans nearest-neighbour index (evaluating multiple edge corrections such as Donnelly and Guard)

Main Findings

Pattern	Spatial Structure
Nests	Approximately random (CSR)
Trees	Inhibitory / regular
Robberies	Strong clustering
Truffles	Strong clustering

Key Statistical Results

Nests

The nests pattern behaved almost exactly as a random spatial process:

Quadrat test p-value: 0.91
Clark–Evans index: ~1.0

This indicates compatibility with Complete Spatial Randomness.

Trees

The tree distribution exhibited strong spatial inhibition:

Quadrat test p-value: 0.0006
Clark–Evans index: ~1.45

An index greater than 1 indicates that trees are systematically more separated than expected under random placement, pointing towards biological competition for resources.

Robberies and Truffles

Both patterns strongly rejected CSR:

Extremely small p-values
Severe spatial clustering at short scales
Presence of concentrated hotspots

Nearest-Neighbour Analysis

The mean nearest-neighbour distance provided additional evidence regarding local spatial interaction.

Main Finding

Trees displayed the largest nearest-neighbour distances:

Mean NN distance ≈ 0.071

This confirms the existence of inhibitory spatial interaction at a local scale.

Conversely:

Robberies exhibited extremely short neighbour distances
Strong local aggregation was observed, consistent with urban density and socio-economic hotspots.

Spatial Interaction Functions

Three complementary functional statistics were analysed:

G function
F function
H/L (Besag) function

G Function

The trees pattern displayed a deficit of short distances relative to CSR expectations.

This confirms:

Spatial inhibition at short radii
Repulsion between neighbouring events driven by ecological constraints

F Function

Robberies and truffles displayed rapid increases in the empty-space function.

This indicates:

Large empty regions
Concentrated spatial clusters
Strong spatial heterogeneity driven by environmental or social variables

H/L Function

The H/L function revealed strong positive deviations for robberies and truffles.

This confirms:

Multi-scale clustering
Significant departure from random spatial organisation, peaking at distinct interaction distances.

Kernel Intensity Estimation

Kernel density estimation transformed discrete point events into continuous hotspot maps.

Main Findings

Nests

Relatively homogeneous intensity surface
Weak spatial heterogeneity
Near-random organisation

Trees

Smooth and regular spatial intensity
Absence of strong hotspots
Inhibitory structure

Robberies

Strong hotspot concentration
Extreme spatial aggregation
Large low-density surrounding areas

Truffles

Multiple clustered density regions
Spatial gradients in intensity
Aggregated point structure

Phase 3 — Spatial Epidemiological Modelling

The third stage analyses blood lead concentration in children across municipalities in Castellón province.

The objective is to evaluate the relationship between lead prevalence and demographic or socioeconomic covariates while assessing residual spatial dependence.

Epidemiological Workflow

The analysis combines:

Exploratory prevalence analysis
Spatial visualisation of disease burden
Logistic regression modelling
Spatial neighbour structure construction
Moran’s I autocorrelation analysis
Spatial regression comparison
Residual spatial diagnostics

Epidemiological Dataset

Study Population

Metric	Value
Municipalities	135
Total participants	2001
Positive cases	1026
Global prevalence	51.27%

The prevalence map revealed heterogeneous spatial distribution of elevated blood lead prevalence across municipalities.

Logistic Regression Analysis

The epidemiological model evaluated the association between prevalence and several socioeconomic covariates:

Income
Ageing / demographic structure
Unemployment

Main Findings

Only demographic ageing (edad) showed statistical significance:

Variable	Statistical Significance
Edad	p = 0.039
Income	p = 0.42
Unemployment	p = 0.32

The results suggest that demographic composition explains prevalence variation more strongly than economic indicators. The lack of statistical significance in economic variables may also be influenced by underlying multicollinearity or unmeasured local factors.

Spatial Autocorrelation

Spatial neighbour structures were constructed to evaluate municipality-level dependence. Autocorrelation was assessed using Moran’s I statistics.

Main Results

Variable	Moran Test p-value
Disease prevalence	0.109
Model residuals	0.063

Interpretation

The residual Moran's I test yielded a marginal p-value (0.063). While not strictly significant at the classical 5% level, this borderline result suggests a potential latent spatial structure that warrants further spatial modelling, rather than an absolute absence of spatial spillover effects. (Note: This sensitivity is highly dependent on the chosen contiguity/weight matrix, which may treat large inland and small coastal municipalities differently.)

Spatial Model Comparison

A classical logistic regression model was compared against a spatially informed epidemiological model.

Main Results

Model	AIC
Classical logistic model	704.21
Spatial logistic model	700.06

The spatial model also achieved lower deviance values.

Interpretation

Despite the marginal Moran's I p-value, incorporating neighbourhood information actively improved the predictive and explanatory performance (lowering the AIC). This confirms the presence of subtle latent spatial structure not fully captured by standard demographic covariates.

Residual Spatial Diagnostics

Residual analysis confirmed that the final spatial model achieved relatively balanced error behaviour.

Main Findings

Residual distributions showed no major systematic bias
Spatial residual maps identified specific municipality clusters where local unexplained variation remained elevated
These areas likely represent unmeasured local environmental or socioeconomic factors not included in the primary dataset

Generated Outputs

Figures

Output	Description
`spatial_locations.png`	Spatial distribution of sampled elevation points
`trend_surfaces.png`	Polynomial spatial trend surfaces
`raw_variograms.png`	Variograms before detrending
`residual_variogram.png`	Residual stationary variogram
`variogram_model_comparison.png`	Covariance model comparison
`universal_kriging_maps.png`	Universal kriging prediction and uncertainty
`point_patterns_overview.png`	Visualisation of all point patterns
`quadrat_analysis.png`	Quadrat partition analysis
`h_function_analysis.png`	H/L spatial interaction functions
`g_function_analysis.png`	G nearest-neighbour functions
`f_function_analysis.png`	F empty-space functions
`kernel_intensity_maps.png`	Kernel hotspot intensity estimation
`lead_prevalence_map.jpg`	Municipal prevalence distribution
`neighbour_structure.png`	Spatial neighbour network
`residual_distribution.png`	Histogram of model residuals
`spatial_residuals_map.jpg`	Spatial distribution of residuals

Tables

Output	Description
`summary_statistics.csv`	Elevation descriptive statistics
`variogram_model_parameters.csv`	Covariance model parameters
`point_prediction.csv`	Kriging point prediction
`point_pattern_summary.csv`	Point pattern descriptive statistics
`csr_quadrat_tests.csv`	CSR chi-square tests
`clustering_indices.csv`	Nearest-neighbour clustering metrics
`epidemiology_summary.csv`	Epidemiological descriptive statistics
`logistic_model_results.csv`	Logistic regression coefficients
`moran_statistics.csv`	Spatial autocorrelation statistics
`model_comparison.csv`	Spatial vs classical model comparison

Technologies

R
geoR
spatstat
sf
spdep
spatialreg
ggplot2
dplyr
spatial statistics
geostatistics
kriging
point process analysis
spatial epidemiology
spatial autocorrelation
logistic regression

Key Spatial Methodologies

The repository demonstrates applied implementations of:

Universal kriging
Variogram modelling
Spatial covariance estimation
Point process analysis
Complete Spatial Randomness testing
Kernel density estimation
Moran’s I spatial autocorrelation
Spatial logistic regression
Neighbourhood graph modelling
Residual spatial diagnostics

Author

Adrián Gómez Conde

MSc Biostatistics Candidate

Spatial Statistics · Geostatistics · Spatial Epidemiology · Applied Data Analysis

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
data		data
outputs		outputs
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.es.md		README.es.md
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Spatial Statistics and Epidemiological Modelling

Project Overview

Repository Structure

Phase 1 — Geostatistical Modelling and Universal Kriging

Dataset

Summary Statistics

Geostatistical Workflow

Spatial Trend Modelling

Variogram Analysis

Raw Variograms

Residual Variogram

Covariance Model Comparison

Main Spatial Parameters

Universal Kriging

Main Outputs

Elevation Prediction Surface

Kriging Variance Surface

Point Prediction

Universal Kriging Prediction

Phase 2 — Spatial Point Pattern Analysis

Point Pattern Workflow

Complete Spatial Randomness (CSR) Testing

Main Findings

Key Statistical Results

Nests

Trees

Robberies and Truffles

Nearest-Neighbour Analysis

Main Finding

Spatial Interaction Functions

G Function

F Function

H/L Function

Kernel Intensity Estimation

Main Findings

Nests

Trees

Robberies

Truffles

Phase 3 — Spatial Epidemiological Modelling

Epidemiological Workflow

Epidemiological Dataset

Study Population

Logistic Regression Analysis

Main Findings

Spatial Autocorrelation

Main Results

Interpretation

Spatial Model Comparison

Main Results

Interpretation

Residual Spatial Diagnostics

Main Findings

Generated Outputs

Figures

Tables

Technologies

Key Spatial Methodologies

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages