Data Science · Built on UHI-Pipe

UHI-Explorer

Interactive satellite ML classifier for urban heat islands.

Three cities, three different ML models, each chosen for the specific physics of its UHI pattern. Rio's heat is driven by thermal radiance and building density. Santiago's is controlled by elevation and thermal inversions. Freetown has no labeled data at all, so the classifier transfers knowledge from the other two using an ensemble with specialist routing.

Best F1: 0.96
Features: 39
Grid points: 64,255
Cities: 3

Classification performance

From pixels to predictions

Satellite Extraction

UHI-Pipe pulls Sentinel-2 (optical), Landsat-8 (thermal), Copernicus DEM (elevation), and building footprints from Microsoft Planetary Computer. Each point on a 100m grid takes the median of a 5×5 pixel window.
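A minimal sketch of the 5×5 median step. It assumes each band has already been reprojected onto a common pixel grid and loaded as a 2D list, with cloud-masked pixels stored as `None`; `window_median` and `sample_grid` are hypothetical helper names, not UHI-Pipe's actual functions.

```python
from statistics import median


def window_median(raster, row, col, size=5):
    """Median of a size x size pixel window centred on (row, col).

    `raster` is a 2D list of pixel values. Pixels outside the raster or
    masked as None (e.g. clouds) are skipped; returns None if none remain.
    """
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        if not 0 <= r < len(raster):
            continue
        for c in range(col - half, col + half + 1):
            if 0 <= c < len(raster[r]) and raster[r][c] is not None:
                values.append(raster[r][c])
    return median(values) if values else None


def sample_grid(raster, pixel_size_m, grid_spacing_m=100, size=5):
    """Sample the raster every `grid_spacing_m` metres.

    Returns {(row, col): window median} keyed by the centre pixel of each
    grid point. Assumes square pixels of `pixel_size_m` metres.
    """
    step = max(1, round(grid_spacing_m / pixel_size_m))
    return {
        (r, c): window_median(raster, r, c, size)
        for r in range(0, len(raster), step)
        for c in range(0, len(raster[0]), step)
    }
```

With 10m Sentinel-2 pixels, `sample_grid(raster, pixel_size_m=10)` steps every 10 pixels to land on the 100m grid. A median is less sensitive than a mean to isolated outlier or partially masked pixels inside the window.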

Rio de Janeiro

XGBoost

Thermal radiance dominates Rio's UHI pattern. The city's dense urban core traps heat in concrete and asphalt, creating a clear spectral signature that XGBoost separates with high confidence. Building morphology features (compactness, sky view factor) add the physical mechanism that spectral bands alone cannot capture.

XGBoost
28,488 grid points at 100m resolution
F1 score: 0.96
n_estimators: 300
max_depth: 8
learning_rate: 0.1
eval_metric: mlogloss

Boosting iteratively corrects errors on Medium-class boundary pixels, exactly where Random Forest plateaus, raising F1 from 0.947 to 0.959.
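The Rio settings map directly onto XGBoost's scikit-learn wrapper. This is a sketch, not the project's training code: the integer label mapping, the explicit `multi:softprob` objective, and the helper names are assumptions, and passing `eval_metric` to the constructor requires xgboost 1.6 or later.

```python
# Settings from the Rio de Janeiro card.
RIO_XGB_PARAMS = {
    "n_estimators": 300,
    "max_depth": 8,
    "learning_rate": 0.1,
    "eval_metric": "mlogloss",  # multi-class log loss
}

# XGBoost's scikit-learn wrapper expects integer labels 0..K-1.
# This particular class-to-integer mapping is an assumption.
CLASS_TO_INT = {"Low": 0, "Medium": 1, "High": 2}
INT_TO_CLASS = {i: name for name, i in CLASS_TO_INT.items()}


def encode_labels(labels):
    """Map class names to the integer codes XGBoost trains on."""
    return [CLASS_TO_INT[name] for name in labels]


def train_rio_model(X, labels):
    """Fit a three-class XGBoost model on feature matrix X (hypothetical helper)."""
    from xgboost import XGBClassifier  # lazy import; needs xgboost >= 1.6

    model = XGBClassifier(objective="multi:softprob", **RIO_XGB_PARAMS)
    model.fit(X, encode_labels(labels))
    return model


def predict_classes(model, X):
    """Decode integer predictions back to class names."""
    return [INT_TO_CLASS[int(code)] for code in model.predict(X)]
```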

Top features

LST: 22.0%
LST × NDVI: 18.0%
LST × NDBI: 14.0%
NDMI: 10.0%
Building Compactness: 6.0%
NDBI: 5.0%
Elevation: 4.0%
NDVI: 4.0%
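The interaction features above are products of LST with a spectral index. A sketch for a single grid point, assuming surface reflectances in 0-1, LST in °C, and the standard index definitions NDVI = (NIR - Red) / (NIR + Red) and NDBI = (SWIR1 - NIR) / (SWIR1 + NIR). Key names such as `LST_x_NDVI` are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def rio_feature_row(red, nir, swir1, lst_c):
    """Build a subset of the Rio features for one grid point.

    Reflectances are assumed to be in 0-1 and LST in degrees Celsius.
    """
    ndvi = normalized_difference(nir, red)    # vegetation: higher over canopy
    ndbi = normalized_difference(swir1, nir)  # built-up: higher over impervious cover
    return {
        "LST": lst_c,
        "NDVI": ndvi,
        "NDBI": ndbi,
        "LST_x_NDVI": lst_c * ndvi,  # separates hot-but-vegetated from hot-and-bare
        "LST_x_NDBI": lst_c * ndbi,  # largest when a point is both hot and built-up
    }
```

One plausible reason these products rank so high: a tree can isolate "hot and built-up" with a single split on `LST_x_NDBI` instead of two successive splits on LST and NDBI.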

Ablation study (F1)

Spectral only (5 bands): 93.4%
+ Spectral indices: 93.6% (+0.2 pts)
+ All features (RF): 94.7% (+1.1 pts)
+ All features (XGBoost): 95.9% (+1.2 pts)

Confusion matrix (rows: actual, columns: predicted)

Actual \ Predicted   Low     Medium   High
Low                  4,521   87       12
Medium               102     3,845    78
High                 8       65       4,298
[Map: Rio de Janeiro classifications, 28,488 grid points, 91 mismatches. Legend: High, Medium, Low]

Only 91 mismatches in 28,488 grid points (99.7% agreement). Most errors cluster at coastal edges where water adjacency creates mixed pixels, and at forest-urban transitions where the 500m extraction footprint blends classes.

Santiago

Random Forest

Elevation, not thermal radiance, dominates Santiago's UHI pattern. The city sits in a basin surrounded by the Andes, where thermal inversions form: a layer of warmer air aloft caps cooler air near the basin floor. Temperature differences of 6-8°C can occur over short distances, making the Medium class genuinely ambiguous in spectral space.

Random Forest
21,662 grid points at 100m resolution
F1 score: 0.69
n_estimators: 500
max_depth: 20
min_samples_split: 25
ccp_alpha: 0.001
class_weight: balanced

The Medium class makes up 49.9% of the data, and unconstrained boosters overfit to this majority class. A constrained Random Forest with cost-complexity pruning and balanced class weights keeps per-class F1 balanced at 0.68-0.70.
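`class_weight="balanced"` reweights each class inversely to its frequency. The sketch below reproduces the weighting formula scikit-learn documents for that option and wraps the Santiago settings in a `RandomForestClassifier`. `random_state`, `n_jobs`, and the helper names are assumptions added for the example.

```python
from collections import Counter

# Settings from the Santiago card.
SANTIAGO_RF_PARAMS = {
    "n_estimators": 500,
    "max_depth": 20,
    "min_samples_split": 25,
    "ccp_alpha": 0.001,         # cost-complexity pruning strength
    "class_weight": "balanced",
}


def balanced_class_weights(labels):
    """n_samples / (n_classes * count), the formula documented for
    scikit-learn's class_weight="balanced"."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def train_santiago_model(X, labels):
    """Fit the constrained Random Forest (hypothetical helper)."""
    from sklearn.ensemble import RandomForestClassifier  # lazy import

    model = RandomForestClassifier(random_state=0, n_jobs=-1, **SANTIAGO_RF_PARAMS)
    return model.fit(X, labels)
```

With Medium at 49.9% of the samples, its weight comes out to about 1 / (3 × 0.499) ≈ 0.67, so each Medium point counts for less than a point from either of the two smaller classes.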

Top features

Elevation: 20.4%
Elev × LST: 17.4%
LST: 9.0%
NDVI: 7.0%
LST × NDBI: 6.0%
NDBI: 5.0%
Albedo: 4.0%
NDMI: 4.0%

Per-class F1 score

High: 0.68
Medium: 0.70
Low: 0.69

All three classes perform within 2 percentage points of each other. The balanced performance means the model is not sacrificing minority classes for overall accuracy.
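Per-class F1, its mean (macro F1), and the spread between the best and worst class can be computed from a confusion matrix laid out like the ones on this page, with rows as actual classes and columns as predicted. A stdlib-only sketch; the function names are hypothetical.

```python
def per_class_f1(matrix, classes):
    """Per-class F1 from a square confusion matrix (rows: actual, cols: predicted)."""
    f1 = {}
    for i, cls in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1[cls] = 2 * precision * recall / denom if denom else 0.0
    return f1


def macro_f1(f1_by_class):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_by_class.values()) / len(f1_by_class)


def f1_spread(f1_by_class):
    """Gap between the best and worst class; small means balanced."""
    return max(f1_by_class.values()) - min(f1_by_class.values())
```

For the per-class scores above, `f1_spread({"High": 0.68, "Medium": 0.70, "Low": 0.69})` is 0.02 up to floating-point rounding, and the macro average of the three is 0.69.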

Confusion matrix (rows: actual, columns: predicted)

Actual \ Predicted   Low     Medium   High
Low                  1,382   206      55
Medium               199     3,573    323
High                 48      292      1,584
[Map: Santiago classifications, 21,662 grid points. Legend: High, Medium, Low]

No single class dramatically underperforms. The F1 ceiling of about 0.69 is set by landscape physics, not algorithm choice: thermal inversions create zones where the ground truth itself is uncertain.

Freetown

Transfer Learning

Freetown has no labeled UHI data. The pipeline trains on Rio and Santiago, then applies the trained models to Freetown using only the 5 spectral bands that generalize across climates.

Why single-source transfer fails

Leave-One-City-Out cross-validation shows that training on one city and predicting another produces unusable results. Climate, vegetation type, and urban morphology are too different between cities.

Chile → Brazil: F1 0.467 (gap: -27.5%)
Brazil → Chile: F1 0.360 (gap: -54.1%)
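Leave-One-City-Out cross-validation holds out each city in turn and trains on the rest. A sketch of the fold logic: `fit` and `score` are caller-supplied placeholders rather than a specific library API, and the `(features, label)` sample format is an assumption.

```python
def leave_one_city_out(samples_by_city):
    """Yield (held_out_city, train, test) once per city.

    `samples_by_city` maps a city name to a list of (features, label) pairs.
    Training data is every other city's samples pooled together.
    """
    for held_out, test in samples_by_city.items():
        train = [
            sample
            for city, samples in samples_by_city.items()
            if city != held_out
            for sample in samples
        ]
        yield held_out, train, list(test)


def evaluate_transfer(samples_by_city, fit, score):
    """Return {held_out_city: score}; `fit(train) -> model` and
    `score(model, test) -> float` are supplied by the caller."""
    return {
        held_out: score(fit(train), test)
        for held_out, train, test in leave_one_city_out(samples_by_city)
    }
```

With only Brazil and Chile labeled, each fold is a single-source transfer: the fold that holds out Brazil trains on Chile alone, which corresponds to the Chile → Brazil result.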

Feature reduction

Total features: 39 (13 indices + 15 building + 6 interaction + 5 bands)
Transferable bands: 5 (lwir11, LST, SWIR2_NIR, blue, nir08)
Components retained: PCA(3), 96% variance explained
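A sketch of the reduction: keep the five transferable bands in a fixed column order, then standardize and project onto three principal components with scikit-learn. Standardizing before PCA, fitting the reducer on the pooled source cities, and the helper names are assumptions, not confirmed details of the pipeline.

```python
# The five bands kept for transfer, in a fixed column order.
TRANSFER_BANDS = ["lwir11", "LST", "SWIR2_NIR", "blue", "nir08"]


def transfer_matrix(rows):
    """Reduce per-point feature dicts to the five transferable bands.

    Spectral indices, building features, and interaction terms are dropped.
    """
    return [[row[band] for band in TRANSFER_BANDS] for row in rows]


def build_transfer_reducer(n_components=3):
    """Standardize the bands, then keep the top principal components."""
    from sklearn.decomposition import PCA  # lazy imports; need scikit-learn
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    return make_pipeline(StandardScaler(), PCA(n_components=n_components))
```

Usage under those assumptions: `reducer = build_transfer_reducer().fit(transfer_matrix(source_rows))` on Rio and Santiago rows, then `reducer.transform(transfer_matrix(freetown_rows))`; `reducer.named_steps["pca"].explained_variance_ratio_.sum()` reports the variance retained.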

All 13 spectral indices degraded transfer because their physical meaning is climate-specific. Rio's NDVI measures tropical broadleaf; Santiago's measures Mediterranean scrub. Building footprint data is unavailable for Freetown, eliminating 15 features. Only raw spectral bands generalize.

Specialist routing

Instead of one model for all classes, the ensemble routes each UHI class to the specialist model that handles it best. The routing is determined by which source city's training data best represents each class.

High UHI

Brazil XGBoost

Tropical thermal core matches informal settlements with dense impervious surfaces.

Medium UHI

Chile RF

Peri-urban transitions and mixed land cover are better captured by the constrained model.

Low UHI

Combined RF+XGB

Both models agree on cool vegetated surfaces. Ensemble averages for stability.
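One way to wire up the routing table, assuming each model outputs per-class probabilities for a grid point: take the High score from the Brazil XGBoost model, the Medium score from the Chile RF, and average both models for Low. The model names, the averaging rule, and the optional offsets are assumptions about how routing might be implemented, not the project's code.

```python
CLASSES = ("Low", "Medium", "High")

# Which model(s) supply the score for each class, per the routing table.
ROUTES = {
    "High": ("brazil_xgb",),
    "Medium": ("chile_rf",),
    "Low": ("brazil_xgb", "chile_rf"),  # averaged for stability
}


def routed_scores(probs_by_model):
    """probs_by_model maps a model name to {class: probability} for one point."""
    return {
        cls: sum(probs_by_model[m][cls] for m in models) / len(models)
        for cls, models in ROUTES.items()
    }


def route_predict(probs_by_model, offsets=None):
    """Hard label = class with the highest routed score plus an optional offset."""
    offsets = offsets or {}
    scores = routed_scores(probs_by_model)
    return max(CLASSES, key=lambda c: scores[c] + offsets.get(c, 0.0))
```

Because each class's score comes from a different model, routed scores need not sum to 1, which is one reason to adjust per-class offsets before assigning hard labels.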

RF + XGBoost Ensemble
14,105 grid points at 100m resolution
F1 score: 0.58
RF n_estimators: 300
RF max_depth: 10
XGB n_estimators: 300
XGB max_depth: 6
XGB learning_rate: 0.1

No ground-truth labels available. Single-source transfer fails badly (Chile→Brazil F1=0.467, Brazil→Chile F1=0.360). An ensemble routes each UHI class to the specialist model that handles it best.

Top features

lwir11: 31.0%
LST: 24.0%
SWIR2_NIR: 19.0%
blue: 14.0%
nir08: 12.0%

Important caveat

The F1 score of 0.58 is validated against KMeans clustering pseudo-labels, not ground truth. There are no labeled UHI zones for Freetown. The spatial pattern is physically plausible (High in coastal lowland informal settlements, Low in forested uplands), but true classification accuracy is unknown. Nelder-Mead probability calibration adjusts score offsets to match approximate known prevalence before assigning hard labels.
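The calibration step can be sketched as a search over per-class score offsets that brings predicted class shares close to a target prevalence, using SciPy's Nelder-Mead. Assumptions: scores arrive as rows ordered (Low, Medium, High); the loss is a squared difference of shares; the Low offset is pinned at 0, since adding the same constant to every class does not change the argmax; and `target_shares` is supplied by the caller, because the prevalence values are not given here. The loss is piecewise constant, so the initial simplex is set wide rather than relying on SciPy's small default steps.

```python
CLASSES = ("Low", "Medium", "High")


def label_with_offsets(score_rows, offsets):
    """Index of the highest (score + offset) for each row of class scores."""
    return [
        max(range(len(CLASSES)), key=lambda k: row[k] + offsets[k])
        for row in score_rows
    ]


def prevalence_loss(offsets, score_rows, target_shares):
    """Squared gap between predicted class shares and target prevalence."""
    labels = label_with_offsets(score_rows, offsets)
    n = len(labels)
    return sum(
        (labels.count(k) / n - target_shares[k]) ** 2 for k in range(len(CLASSES))
    )


def calibrate_offsets(score_rows, target_shares, step=0.1):
    """Search Medium/High offsets with Nelder-Mead; Low is pinned at 0."""
    from scipy.optimize import minimize  # lazy import; needs SciPy

    def loss(x):
        return prevalence_loss([0.0, x[0], x[1]], score_rows, target_shares)

    # Piecewise-constant loss: a wide starting simplex makes it more likely
    # that the vertices see different label counts.
    simplex = [[0.0, 0.0], [step, 0.0], [0.0, step]]
    result = minimize(loss, x0=[0.0, 0.0], method="Nelder-Mead",
                      options={"initial_simplex": simplex})
    return [0.0, float(result.x[0]), float(result.x[1])]
```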

[Map: Freetown UHI predictions, ensemble output, no ground truth available. Legend: High, Medium, Low]


This project was a collaborative effort, built as part of a team research initiative combining remote sensing, machine learning, and urban climate analysis.

Technologies

Satellite ML · Geospatial · Python · Environmental