Last-Mile Route Optimization: K-Means Clustering & Geospatial Analytics
Amazon × MIT · 2018 · 5 Metro Areas · 898K stops analyzed

Dashboard

Project overview, dataset context, and key performance metrics from the cleaned Amazon last-mile data.

Project overview
Last-mile delivery accounts for ~53% of total shipping costs. This analysis uses real Amazon delivery data (2018) across five US metros to improve routing efficiency, reduce operational cost, and lower emissions.
Dataset
Amazon × MIT Last Mile Routing Challenge — GPS coordinates, route records, stop-level delivery details
Methods
7-step cleaning pipeline · K-Means (50K sample) · Elbow & Silhouette · OR-Tools · EPA SmartWay 0.21 kg CO₂/km
Metros
Los Angeles, CA · Chicago, IL · Seattle, WA · Boston, MA · Austin, TX
Total stops
898,415
across 5 metro areas
6,112 routes processed
Avg stops / route
146.7
std dev ± 31 stops
range 32 – 237
Best efficiency
188.3 km
Los Angeles avg route
~110 km less than Boston per route
CO₂ gap / route
23.2 kg
Boston vs Los Angeles
Boston emits ~59% more CO₂ per route than Los Angeles

Routes

Data cleaning pipeline and the distribution of stops per route across 6,112 delivery routes.

7-step preprocessing
Millions of raw GPS records → 898,415 valid stops across 6,112 complete routes
Depot-only records removed · Duplicate stops removed · Missing GPS dropped · Invalid (0,0) coords removed · Outside continental US filtered · Unidentified city codes removed · Routes <3 stops removed · Extreme outlier distances removed
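The sketch below shows how a few of these filters might be chained together. It is a minimal illustration, not the report's actual pipeline: the record layout (route_id, stop_id, lat, lon), the bounding box used for "continental US", and the function name clean_stops are all assumptions, and the depot, city-code, and outlier-distance steps are omitted.

```python
from collections import Counter

# Assumed, approximate bounding box for the continental US (illustrative only).
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def clean_stops(stops, min_stops_per_route=3):
    """Apply a subset of the cleaning rules to a list of stop dicts.

    Each stop is assumed to be a dict with the hypothetical keys
    "route_id", "stop_id", "lat", and "lon".
    """
    seen = set()
    kept = []
    for s in stops:
        lat, lon = s.get("lat"), s.get("lon")
        if lat is None or lon is None:      # missing GPS
            continue
        if lat == 0 and lon == 0:           # invalid (0, 0) coordinates
            continue
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            continue                        # outside the assumed US bounding box
        key = (s["route_id"], s["stop_id"])
        if key in seen:                     # duplicate stop within a route
            continue
        seen.add(key)
        kept.append(s)

    # Drop routes that are left with fewer than the minimum number of stops.
    per_route = Counter(s["route_id"] for s in kept)
    return [s for s in kept if per_route[s["route_id"]] >= min_stops_per_route]
```

The route-length filter runs last so that it counts only stops that survived the earlier checks.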
Distribution of stops per route
Near-normal bell curve centered at 146.7 — highly standardized operations
6,112 routes

Cities

Delivery volume, route distance, and side-by-side comparison for all five metropolitan areas.

Delivery stops by city
LA 411,552 (46%) · Chicago ~161K · Seattle ~155K · Boston ~140K · Austin ~31K
Chart: stops by city. Los Angeles 411,552 · Chicago 161,408 · Seattle 154,702 · Boston 139,693 · Austin 31,060
Average route distance
Per city: similar stops per route, very different distances
Chart (km, ordered from efficient to inefficient): LA 188.3 · Seattle 196.2 · Austin 249.6 · Chicago 283.9 · Boston 298.8
Delivery volume share
% of total 898,415 stops
Los Angeles 45.8%
Chicago 18.0%
Seattle 17.2%
Boston 15.5%
Austin 3.5%
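These shares follow directly from the per-city stop counts. As a quick check, the sketch below recomputes them; the dictionary and function names are illustrative, and the counts are the ones reported on this page.

```python
# Stop counts per city, as reported on the dashboard.
stops_by_city = {
    "Los Angeles": 411_552,
    "Chicago": 161_408,
    "Seattle": 154_702,
    "Boston": 139_693,
    "Austin": 31_060,
}


def volume_share(counts):
    """Return each city's share of total stops, in percent (one decimal)."""
    total = sum(counts.values())
    return {city: round(100 * n / total, 1) for city, n in counts.items()}


shares = volume_share(stops_by_city)
for city, pct in shares.items():
    print(f"{city}: {pct}%")
```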
City comparison table
All key metrics side by side
City          Stops     Avg km   CO₂ kg   Status
Los Angeles   411,552   188.3    39.5     Efficient
Seattle       154,702   196.2    41.2     Efficient
Austin         31,060   249.6    52.4     Moderate
Chicago       161,408   283.9    59.6     Inefficient
Boston        139,693   298.8    62.7     Inefficient

Emissions

CO₂ impact per route by city using EPA SmartWay emission factors.

CO₂ emissions per route
EPA SmartWay factor · 0.21 kg CO₂ per km — Boston emits 59% more than LA
Chart (kg CO₂ per route, low to high): LA 39.5 · Seattle 41.2 · Austin 52.4 · Chicago 59.6 · Boston 62.7
Los Angeles — 39.5 kg/route
Lowest emissions despite highest delivery volume (411,552 stops).
Boston — 62.7 kg/route
~23.2 kg more CO₂ per route than LA. Routing improvements alone can cut this without new EVs.
Chicago — 59.6 kg/route
Second-highest emissions, aligned with long average route distance (283.9 km).
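The per-route figures above are consistent with multiplying each city's average route distance by the 0.21 kg CO₂/km factor. A minimal sketch of that arithmetic follows; the variable and function names are illustrative.

```python
EPA_SMARTWAY_KG_PER_KM = 0.21  # emission factor quoted on the dashboard

# Average route distance per city (km), as reported on the dashboard.
avg_route_km = {
    "Los Angeles": 188.3,
    "Seattle": 196.2,
    "Austin": 249.6,
    "Chicago": 283.9,
    "Boston": 298.8,
}


def co2_per_route(km, factor=EPA_SMARTWAY_KG_PER_KM):
    """CO2 emitted on one route, in kg, under a constant per-km factor."""
    return km * factor


co2 = {city: round(co2_per_route(km), 1) for city, km in avg_route_km.items()}
gap_kg = round(co2["Boston"] - co2["Los Angeles"], 1)
pct_more = round(100 * (avg_route_km["Boston"] / avg_route_km["Los Angeles"] - 1))
print(co2, gap_kg, pct_more)
```

Because the factor is constant, the relative gap in emissions equals the relative gap in distance, which is why the same ~59% figure appears for both.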

Optimizer

What-if calculator — estimate savings if Boston routes matched Los Angeles efficiency.

What-if savings calculator
If Boston routes matched LA efficiency, how much would be saved?
Interactive
Route optimization level: 50%
Distance saved per route
55.3 km
out of 110.5 km excess
CO₂ saved per route
11.6 kg
out of 23.2 kg excess
Fleet annual CO₂ savings
70.9K kg
assuming ~6,112 routes/yr (the dataset's total route count across all five metros, used here as the annual Boston volume)
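The calculator's outputs can be reproduced with simple proportional scaling. The function below is a sketch under that assumption; its name and parameters are hypothetical, and the 6,112 routes per year is the figure shown on the card rather than a measured Boston volume.

```python
def what_if_savings(level, boston_km=298.8, la_km=188.3,
                    kg_per_km=0.21, routes_per_year=6112):
    """Estimate savings if a fraction `level` (0 to 1) of the Boston-vs-LA
    distance gap were closed. Returns (km saved per route,
    kg CO2 saved per route, kg CO2 saved per year)."""
    excess_km = boston_km - la_km            # ~110.5 km
    km_saved = level * excess_km
    co2_saved = km_saved * kg_per_km
    return km_saved, co2_saved, co2_saved * routes_per_year


km, kg, annual_kg = what_if_savings(0.5)
print(f"{km:.2f} km, {kg:.2f} kg, {annual_kg / 1000:.1f}K kg per year")
```

At the 50% setting this gives about 55.25 km and 11.6 kg per route, in line with the values displayed above.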

Clustering

K-Means zone identification, silhouette validation, and delivery territory analysis.

K-Means silhouette score
Peaks at K=5 with score 0.96 — optimal clusters
Silhouette score by K (optimal at K=5): K=2 0.71 · K=3 0.88 · K=4 0.89 · K=5 0.96 · K=6 0.65 · K=7 0.62 · K=8 0.61 · K=9 0.61 · K=10 0.61
Elbow method — inertia
Steep drop K=2→5, then flattens — diminishing returns beyond 5 clusters
Clustering validation (report)
Both methods confirm optimal K = 5, matching the five metro regions
Elbow method
Inertia drops sharply between K=2 and K=5, then improvement flattens significantly.
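To illustrate the elbow pattern, the sketch below runs a from-scratch K-Means on synthetic points scattered around five approximate metro centers and reports inertia for several values of K. Everything here is an assumption for illustration: the center coordinates, the spread, the farthest-point initialisation, and the use of plain Euclidean distance on raw latitude/longitude. It is not the report's implementation or data.

```python
import random

# Approximate metro centers (lat, lon); illustrative values, not from the dataset.
CENTERS = [(34.05, -118.24), (41.88, -87.63), (47.61, -122.33),
           (42.36, -71.06), (30.27, -97.74)]


def make_points(per_city=60, spread=0.15, seed=0):
    """Generate synthetic stops as Gaussian noise around each center."""
    rng = random.Random(seed)
    return [(rng.gauss(lat, spread), rng.gauss(lon, spread))
            for lat, lon in CENTERS for _ in range(per_city)]


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    # Farthest-point initialisation: deterministic and adequate for a sketch.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[j])  # leave an empty cluster's centroid in place
        if new == centroids:              # converged
            break
        centroids = new
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


points = make_points()
inertias = {k: kmeans(points, k)[2] for k in range(1, 9)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

On data like this, inertia falls steeply until each metro has its own centroid and changes little afterwards, which is the shape the elbow chart describes.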
Silhouette score
Peaks at 0.96 for K=5 — exceptionally high for real-world geospatial clustering. Declines sharply beyond K=5.
K-Means on 50,000 randomly sampled delivery coordinates separated five distinct geographic zones, one per metro, with essentially zero overlap between them.
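For reference, the silhouette coefficient of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below computes the mean over all points in pure Python. It assumes Euclidean distance and at least two clusters, runs in quadratic time, and is meant only to show the definition, not how the report computed its 0.96.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance, O(n^2) time).

    Assumes at least two distinct labels and not all points identical.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two tight, well-separated pairs score close to 1;
# mixing the pairs across clusters drives the score negative.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
print(round(good, 2), round(bad, 2))
```

Scores near 1 indicate compact clusters that sit far apart, which is expected when the clusters are whole metro areas separated by hundreds or thousands of kilometres.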
K-Means delivery zones
5 clusters identified with silhouette score 0.96 — zero overlap between metro regions
Zone 1 — Los Angeles
Zone 2 — Chicago
Zone 3 — Seattle
Zone 4 — Boston
Zone 5 — Austin
Cluster quality metrics
Optimal K
K = 5
Silhouette score
0.96
Cluster overlap
~0%
Sample size
50K

Insights

Key findings, strategic recommendations, and overall conclusions from the Hack Report.

Boston routes are ~59% longer than LA's
Drivers travel ~110 km more per route than in LA while completing a similar number of stops, likely because of historic road layouts and water barriers.
No new EVs needed
Boston produces 23.2 kg more CO₂ per route than LA. Better routing alone — no new vehicles or infrastructure — can substantially cut emissions.
Clustering score of 0.96
K-Means cleanly separated all 5 metro zones with near-zero cluster overlap, supporting automated territory design over manual boundaries.
1. Prioritize Boston
Highest avg route distance (298.8 km) and CO₂ (62.7 kg/route). Redesign territories, improve stop sequencing, optimize depot coverage.
2. Benchmark Los Angeles
Highest volume (411,552 stops) yet shortest routes (188.3 km). Use LA as the efficiency benchmark for other metros.
3. Expand clustering
Natural delivery zones emerge from stop coordinates (silhouette 0.96). Use clustering for territory design instead of static admin boundaries.
4. Improve Austin depots
Lowest volume (31,060 stops) with high route distance (249.6 km). Reposition fulfillment hubs to reduce travel as demand grows.
Overall findings
Key conclusions from the Hack Report analysis
  • Geographic clustering is highly effective — K=5 with silhouette 0.96 validates automated territory planning.
  • Route inefficiency varies substantially by city — Boston routes are ~59% longer than LA with nearly identical stop counts (~146.7).
  • CO₂ emissions track routing efficiency — Boston emits ~23.2 kg more CO₂ per route than LA (59% higher).
  • Operations are highly standardized — bell curve centered at 146.7 stops (σ ±31), so optimization scales predictably fleet-wide.
  • Delivery density improves efficiency — LA’s high volume still yields the shortest average route distance when properly optimized.

Interactive App

Explore K-Means delivery clusters on an interactive dark map — zoom, pan, and click stops for details.

K-Means delivery clusters on map
Explore delivery stops colored by cluster zone across five U.S. metros. Click any point for zone, city, and stop details. Built with Folium + Leaflet (aws_delivery_map.html).
Zone 1: Los Angeles · Zone 2: Chicago · Zone 3: Seattle · Zone 4: Boston · Zone 5: Austin

Export

Print the full analysis or export raw metrics as JSON / CSV.

Download Hack Report
Use Print → Save as PDF for the full report, including all sections, charts, and insights.
JSON/CSV exports contain metrics for the 5 cities plus global stats (898,415 stops, 6,112 routes, K=5, silhouette 0.96).
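A sketch of what such an export might look like is shown below, using only the Python standard library. The field names and function names are hypothetical; the dashboard's actual export schema is not documented here. The values are copied from the city comparison table and the global stats above.

```python
import csv
import io
import json

# Hypothetical field names; values copied from the dashboard.
CITY_METRICS = [
    {"city": "Los Angeles", "stops": 411552, "avg_km": 188.3, "co2_kg": 39.5},
    {"city": "Seattle", "stops": 154702, "avg_km": 196.2, "co2_kg": 41.2},
    {"city": "Austin", "stops": 31060, "avg_km": 249.6, "co2_kg": 52.4},
    {"city": "Chicago", "stops": 161408, "avg_km": 283.9, "co2_kg": 59.6},
    {"city": "Boston", "stops": 139693, "avg_km": 298.8, "co2_kg": 62.7},
]
GLOBAL_STATS = {"total_stops": 898415, "routes": 6112,
                "optimal_k": 5, "silhouette": 0.96}


def export_json():
    """Serialize global stats and city rows as one JSON document."""
    return json.dumps({"global": GLOBAL_STATS, "cities": CITY_METRICS}, indent=2)


def export_csv():
    """Serialize the city rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(CITY_METRICS[0]))
    writer.writeheader()
    writer.writerows(CITY_METRICS)
    return buf.getvalue()


print(export_csv())
```

In this sketch the CSV carries only the per-city rows, while the JSON nests the global stats alongside them, since CSV has no natural place for document-level fields.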