Web pages of Lars Relund Nielsen

Distance matrix calculations in R

Many models in Operations Research needs a distance matrix between nodes in a network. If the distance matrix is based on road distances it can be found using R. Let us try to compute the distances between a set for zip codes in Denmark. First we load all zip codes for Jutland in Denmark:

options(stringsAsFactors = F)
zip<-read.csv2("zip_jutland.csv")
head(zip)
##    Zip       Area
## 1 5320    Agedrup
## 2 6753    Agerbæk
## 3 6534   Agerskov
## 4 8961 Allingåbro
## 5 6051     Almind
## 6 8592     Anholt
dim(zip)
## [1] 376   2

Note that we may have different zip codes for the same area, so we may use a single zip code for an area (this will reduce the number of calculations).

Using Google maps

First option is to use Google maps. We use the package ggmap to find the distance between two areas:

library(ggmap)
set<-1:5   # areas under considration
zip[set,]
##    Zip       Area
## 1 5320    Agedrup
## 2 6753    Agerbæk
## 3 6534   Agerskov
## 4 8961 Allingåbro
## 5 6051     Almind
distanceMat<-matrix(NA, nrow=length(set), ncol = length(set))
for(i in 1:length(set)) {
   for(j in 1:length(set)) {
      if (i==j) distanceMat[i,j] <- 0   
      else distanceMat[i,j] <- mapdist(paste(zip$Zip[set[i]],", Denmark", sep=""),
                                     paste(zip$Zip[set[j]], ", Denmark", sep=""),
                                     mode = 'driving', output = 'simple')$km 
   }
}
colnames(distanceMat)<-zip$Zip[set] 
rownames(distanceMat)<-zip$Zip[set] 
distanceMat
##         5320    6753    6534    8961    6051
## 5320   0.000 135.248 137.949 208.902  82.221
## 6753 136.759   0.000  71.860 178.424  58.829
## 6534 139.355  71.859   0.000 202.201  61.425
## 8961 210.037 179.297 201.797   0.000 146.069
## 6051  83.623  58.106  60.807 146.469   0.000
library(reshape2)
dat<-melt(distanceMat)
colnames(dat)<-c("from","to","km")
head(dat)
##   from   to      km
## 1 5320 5320   0.000
## 2 6753 5320 136.759
## 3 6534 5320 139.355
## 4 8961 5320 210.037
## 5 6051 5320  83.623
## 6 5320 6753 135.248

We use the zip as input to Google’s geocode function. We might have used the area name instead; however, it is not as stable since Google’s geocode function don’t like e.g. æ, ø and å. Two formats are returned which can be saved: a distance matrix and a long format.

Google’s API calculate shortest path distances based on Google maps. As a result the distances are not symmetric. If you can accept a small error replace if (i==j) distanceMatrix[i,j] <- 0 with if (i>j) next.
Note that the Google maps API limits to 2500 element queries a day. That is, for large distance matrices you have a problem. Your usage can be checked using:

distQueryCheck()
## 2480 distance queries remaining.

Using Bing maps

Bing maps don’t have a limit on the number of queries. However, it seems to be quite unstable (returning service unavaliable sometimes). We use the package taRifx.geo

library(taRifx.geo)
options(BingMapsKey="<Your Bing Maps API key>")
set<-1:5   # areas under considration
zip[set,]
##    Zip       Area
## 1 5320    Agedrup
## 2 6753    Agerbæk
## 3 6534   Agerskov
## 4 8961 Allingåbro
## 5 6051     Almind
distanceMatBing<-matrix(NA, nrow=length(set), ncol = length(set))
for (trys in 1:10) {   # number of times we try to connect
   for(i in set) {
      for(j in set) {
         if (i>j) {distanceMatBing[i,j] <- distanceMatBing[j,i]; next}   # assume symetric distances
         if (!is.na(distanceMatBing[i,j])) next   # value already calculated 
         if (i==j) {distanceMatBing[i,j] <- 0; next}
         result <- try(georoute( paste(c(zip$Zip[i], zip$Zip[j]), " Danmark", sep=""), returntype="distance",  service="bing" ) )
         if (class(result) != "try-error") {
            distanceMatBing[i,j] <- result[1,1]
         }
      }
   }
}
colnames(distanceMatBing)<-zip$Zip[set] 
rownames(distanceMatBing)<-zip$Zip[set] 
distanceMatBing
##         5320    6753    6534    8961    6051
## 5320   0.000 135.750 138.689 211.720  83.231
## 6753 135.750   0.000  81.983 179.541  59.115
## 6534 138.689  81.983   0.000 204.505  61.913
## 8961 211.720 179.541 204.505   0.000 148.726
## 6051  83.231  59.115  61.913 148.726   0.000
dat<-melt(distanceMatBing)
colnames(dat)<-c("from","to","km")
head(dat)
##   from   to      km
## 1 5320 5320   0.000
## 2 6753 5320 135.750
## 3 6534 5320 138.689
## 4 8961 5320 211.720
## 5 6051 5320  83.231
## 6 5320 6753 135.750

Note I have added a loop where we try to calculate the distance 10 times due to the unstable API. If we compare with the distances calculated using Google maps we see that there may be differences depending on which map engine we use.

I used the above code to find all distances between zips in jutland. The distance matrix is avaliable on GitHub.

Other solutions and resources

Note this way to calculate the distance matrix is brute force. A much better approach is to reoptimize the calculations using a All-to-All shortest path algorithm.

Leave a Reply